Loading an RDS file from a URL

A simple example showing how to read a dataset in the RDS format from a URL.


If the data to read is accessible at a particular URL, we can open it as a file using the function urllib.request.urlopen(). Thus, we need to import that function as well as the rdata package.

from urllib.request import urlopen

import rdata

For this example we will use a dataset hosted at Zenodo. This is a small dataset containing information about some fungal pathogens.

dataset_url = (
    "https://zenodo.org/records/7425539/files/core_fungal_pathogens.rds"
)

The object resulting from calling urlopen() can then be passed to parse_file() as if it were a normal file.
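The call that triggers the warning shown next does not appear on this page. A minimal sketch of it could look as follows, assuming only what the text states: that the response returned by urlopen() can be handed to rdata.parser.parse_file() like a file. The helper name parse_from_url is hypothetical, and rdata is imported inside it purely so that the helper can be defined even where the package is not installed.

```python
from urllib.request import urlopen


def parse_from_url(url):
    """Open `url` and pass the response to rdata's parser (hypothetical helper)."""
    import rdata  # third-party package, imported lazily for this illustration

    # urlopen() returns a file-like response that can be used in a `with` block.
    with urlopen(url) as dataset:
        # No extension is given here, so rdata has to guess the file type.
        return rdata.parser.parse_file(dataset)
```

Calling parse_from_url(dataset_url) without any further hint about the file type is the situation in which the warning below would be expected.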

/home/docs/checkouts/readthedocs.org/user_builds/rdata/envs/stable/lib/python3.9/site-packages/rdata/parser/_parser.py:1217: UserWarning: Unknown file type: assumed RDS
  warnings.warn("Unknown file type: assumed RDS")  # noqa: B028

RDS files do not have a magic number that identifies them. Thus, when reading an RDS file, rdata has to assume that the file is a valid RDS file, and warns about it. We can avoid this warning by explicitly passing the extension of the file.

with urlopen(dataset_url) as dataset:
    parsed = rdata.parser.parse_file(dataset, extension=".rds")
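To make the point about magic numbers concrete, the sketch below inspects the leading bytes of two in-memory payloads. It is an illustration only, not rdata's actual detection logic: the helper name guess_r_file_type and both sample payloads are made up for this example. It relies on two facts: gzip streams start with the bytes 1f 8b, and RData files written by R start with RDX2 or RDX3 followed by a newline, whereas an RDS stream has no such prefix.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
RDATA_MAGICS = (b"RDX2\n", b"RDX3\n")


def guess_r_file_type(raw: bytes) -> str:
    """Guess whether `raw` is an RData payload or (presumably) an RDS one."""
    # R gzip-compresses both formats by default, so look inside first.
    if raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)
    if raw.startswith(RDATA_MAGICS):
        return "rdata"
    # There is no signature to check, so the best we can do is assume RDS.
    return "assumed rds"


# Made-up payloads: only the leading bytes matter for this illustration.
fake_rda = gzip.compress(b"RDX3\nX\n" + b"\x00" * 8)
fake_rds = gzip.compress(b"X\n" + b"\x00" * 8)

print(guess_r_file_type(fake_rda))
print(guess_r_file_type(fake_rds))
```

The second payload falls through to the fallback branch, which is the same kind of guess that the extension argument makes unnecessary.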

This parsed object contains a lossless representation of the data stored in the file. This data mimics the internal format used in R, and is thus not directly usable. However, we can retrieve some information about the file that will be lost after the conversion to Python objects, such as the version of the format employed or the encoding used for the strings.
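The code that prints the two values shown next is not included on this page. The sketch below is one way it might be written. The attribute names versions.format and extra.encoding are assumptions about rdata's parser data classes and should be checked against the installed version; the helper name format_and_encoding is hypothetical.

```python
def format_and_encoding(parsed):
    """Return (format version, string encoding) of a parsed R file.

    Assumes `parsed` exposes `versions.format` and `extra.encoding`;
    these attribute names are not confirmed by this page.
    """
    return parsed.versions.format, parsed.extra.encoding
```

If the assumed attribute names are right, print(*format_and_encoding(parsed), sep="\n") on the object parsed above should reproduce the two lines of output that follow.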

3
UTF-8

In order to convert it to Python objects we need to use the function rdata.conversion.convert().

RDS files contain just one R object. In this particular case, it is an R data frame, which is converted to a pandas DataFrame by default.
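The conversion step itself is not shown on this page. A minimal sketch, assuming that rdata.conversion.convert() can be called with the parsed object as its only argument, could be the following. The helper name to_python is hypothetical, and the lazy import is there only so that the snippet can be defined without rdata installed.

```python
def to_python(parsed):
    """Convert a parsed R object into Python objects (hypothetical helper)."""
    import rdata  # third-party package, imported lazily for this illustration

    # With the default conversion rules, an R data frame is expected to
    # come back as a pandas DataFrame, as described in the text.
    return rdata.conversion.convert(parsed)
```

Applied to the object parsed earlier, to_python(parsed) would be expected to return the data frame displayed below.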

species_taxid assembly_accession refseq_category taxid organism_name infraspecific_name assembly_level seq_rel_date asm_name ftp_path Species source.taxid source.name human.pathogen animal.pathogen plant.pathogen plant.host putative.human.host putative.animal.host putative.plant.host human.pathogen.source animal.pathogen.source plant.pathogen.source plant.host.source putative.human.host.source putative.animal.host.source putative.plant.host.source atlas.synonym.name atlas.synonym.taxid
1 4754 GCA_001477545.1 representative genome 1408658 Pneumocystis carinii B80 strain=B80 Contig 16791.0 Pneu_cari_B80_V3 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Pneumocystis carinii Pneumocystis carinii <NA> True <NA> <NA> <NA> True <NA> Cissé, O. H., Ma, L., Dekker, J. P., Khil, P. ... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
3 4837 GCA_001638985.2 representative genome 763407 Phycomyces blakesleeanus NRRL 1555(-) strain=NRRL 1555(-) Scaffold 16927.0 Phybl2 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Phycomyces blakesleeanus 4837 Phycomyces blakesleeanus <NA> <NA> True True <NA> <NA> <NA> Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... None <NA>
4 4839 GCA_000611695.1 representative genome 1031333 Rhizomucor miehei CAU432 strain=CAU432 Scaffold 16160.0 RhzM_1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Rhizomucor miehei 4839 Rhizomucor miehei True <NA> <NA> <NA> True <NA> True Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
5 4840 GCA_900175165.2 representative genome 4840 Rhizomucor pusillus Scaffold 17781.0 FCH_5_7 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/9... Rhizomucor pusillus 4840 Rhizomucor pusillus True True <NA> <NA> True True <NA> Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Smith, J.M. (2006). Fungal Pathogens of Nonhum... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
6 4841 GCA_000697255.1 representative genome 1357677 Mucor racemosus B9645 strain=B9645 Contig 16223.0 MucRacB9645-1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Mucor racemosus 4841 Mucor racemosus True <NA> <NA> <NA> True True True Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1885 2714763 GCA_001636795.1 representative genome 1081108 Akanthomyces lecanii RCEF 1005 strain=RCEF 1005 Scaffold 16925.0 LEL 1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Akanthomyces lecanii Verticillium lecanii <NA> True <NA> <NA> <NA> True True D. Chandler, G. Davidson, J. K. Pell, B. V. Ba... Wardeh, M., Risley, C., McIntyre, M. et al. Da... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1886 2747968 GCA_000151355.1 representative genome 660122 Fusarium vanettenii 77-13-4 Scaffold 14483.0 v2.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Fusarium vanettenii 70791 Fusarium solani True <NA> True True <NA> <NA> True Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1888 2778779 GCA_004016085.1 representative genome 2778779 Claviceps quebecensis Contig 17910.0 ASM401608v1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Claviceps quebecensis 2778779 Claviceps quebecensis <NA> <NA> True True <NA> <NA> <NA> Farr, D.F., & Rossman, A.Y. Fungal Databases, ... Farr, D.F., & Rossman, A.Y. Fungal Databases, ... None <NA>
1890 2822231 GCA_001910725.1 representative genome 2822231 Phanerodontia chrysosporium strain=ATCC 20696 Contig 17155.0 ASM191072v1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Phanerodontia chrysosporium 2822231 Phanerodontia chrysosporium True <NA> <NA> <NA> True <NA> True Anuradha Chowdhary, Shallu Kathuria, Kshitij A... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1891 2867405 GCA_900239735.1 representative genome 546991 Blumeria hordei DH14 cultivar=Golden Promise Scaffold 17621.0 BGH_DH14_v4 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/9... Blumeria hordei 62688 Erysiphe graminis <NA> <NA> True True <NA> <NA> True Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>

954 rows × 29 columns



Since we usually just want to parse and convert a dataset, the convenience functions rdata.read_rds() and rdata.read_rda() can be used for that purpose.
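The corresponding call is not shown on this page. A minimal sketch, assuming that rdata.read_rds() accepts the open response in the same way parse_file() does, might be the following. The helper name read_rds_from_url is hypothetical, and rdata is imported lazily only for the sake of the illustration.

```python
from urllib.request import urlopen


def read_rds_from_url(url):
    """Parse and convert an RDS file fetched from `url` (hypothetical helper)."""
    import rdata  # third-party package, imported lazily for this illustration

    with urlopen(url) as dataset:
        # read_rds() performs the parsing and the conversion in one step.
        return rdata.read_rds(dataset)
```

With the URL defined earlier, read_rds_from_url(dataset_url) would be expected to yield the same data frame as the two-step approach.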

species_taxid assembly_accession refseq_category taxid organism_name infraspecific_name assembly_level seq_rel_date asm_name ftp_path Species source.taxid source.name human.pathogen animal.pathogen plant.pathogen plant.host putative.human.host putative.animal.host putative.plant.host human.pathogen.source animal.pathogen.source plant.pathogen.source plant.host.source putative.human.host.source putative.animal.host.source putative.plant.host.source atlas.synonym.name atlas.synonym.taxid
1 4754 GCA_001477545.1 representative genome 1408658 Pneumocystis carinii B80 strain=B80 Contig 16791.0 Pneu_cari_B80_V3 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Pneumocystis carinii Pneumocystis carinii <NA> True <NA> <NA> <NA> True <NA> Cissé, O. H., Ma, L., Dekker, J. P., Khil, P. ... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
3 4837 GCA_001638985.2 representative genome 763407 Phycomyces blakesleeanus NRRL 1555(-) strain=NRRL 1555(-) Scaffold 16927.0 Phybl2 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Phycomyces blakesleeanus 4837 Phycomyces blakesleeanus <NA> <NA> True True <NA> <NA> <NA> Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... None <NA>
4 4839 GCA_000611695.1 representative genome 1031333 Rhizomucor miehei CAU432 strain=CAU432 Scaffold 16160.0 RhzM_1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Rhizomucor miehei 4839 Rhizomucor miehei True <NA> <NA> <NA> True <NA> True Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
5 4840 GCA_900175165.2 representative genome 4840 Rhizomucor pusillus Scaffold 17781.0 FCH_5_7 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/9... Rhizomucor pusillus 4840 Rhizomucor pusillus True True <NA> <NA> True True <NA> Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Smith, J.M. (2006). Fungal Pathogens of Nonhum... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
6 4841 GCA_000697255.1 representative genome 1357677 Mucor racemosus B9645 strain=B9645 Contig 16223.0 MucRacB9645-1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Mucor racemosus 4841 Mucor racemosus True <NA> <NA> <NA> True True True Taylor, L. H., Latham, S. M., & Woolhouse, M. ... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1885 2714763 GCA_001636795.1 representative genome 1081108 Akanthomyces lecanii RCEF 1005 strain=RCEF 1005 Scaffold 16925.0 LEL 1.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Akanthomyces lecanii Verticillium lecanii <NA> True <NA> <NA> <NA> True True D. Chandler, G. Davidson, J. K. Pell, B. V. Ba... Wardeh, M., Risley, C., McIntyre, M. et al. Da... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1886 2747968 GCA_000151355.1 representative genome 660122 Fusarium vanettenii 77-13-4 Scaffold 14483.0 v2.0 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Fusarium vanettenii 70791 Fusarium solani True <NA> True True <NA> <NA> True Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1888 2778779 GCA_004016085.1 representative genome 2778779 Claviceps quebecensis Contig 17910.0 ASM401608v1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Claviceps quebecensis 2778779 Claviceps quebecensis <NA> <NA> True True <NA> <NA> <NA> Farr, D.F., & Rossman, A.Y. Fungal Databases, ... Farr, D.F., & Rossman, A.Y. Fungal Databases, ... None <NA>
1890 2822231 GCA_001910725.1 representative genome 2822231 Phanerodontia chrysosporium strain=ATCC 20696 Contig 17155.0 ASM191072v1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/0... Phanerodontia chrysosporium 2822231 Phanerodontia chrysosporium True <NA> <NA> <NA> True <NA> True Anuradha Chowdhary, Shallu Kathuria, Kshitij A... Conrad L Schoch, Stacy Ciufo, Mikhail Domrache... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>
1891 2867405 GCA_900239735.1 representative genome 546991 Blumeria hordei DH14 cultivar=Golden Promise Scaffold 17621.0 BGH_DH14_v4 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/9... Blumeria hordei 62688 Erysiphe graminis <NA> <NA> True True <NA> <NA> True Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Tao Lu, Bo Yao, Chi Zhang, DFVF: database of f... Wardeh, M., Risley, C., McIntyre, M. et al. Da... None <NA>

954 rows × 29 columns


