Overview of tools for importing and exporting data in bioinformatics research, including Java and Perl APIs, BioPAX export, multiple file formats, and connections to databases such as BioWarehouse.
Data Import / Export
Markus Krummenacker
Bioinformatics Research Group
SRI International
Q3 2012
Data Exchange Overview
• Java API and Perl API: read & modify
• BioPAX Export: since Pathway Tools 9.0
  • Biopax.org
• Export of entire PGDB as a set of Flatfiles
• Export of Reactions as SBML
  • sbml.org
• Import/Export of Pathways: between PGDBs
• Import/Export of Selected Frames, for Spreadsheets
• Import/Export of Compounds as Molfile, CML
• Registering/Publishing PGDBs on WWW
• Export PGDB as Genbank
• BioWarehouse: Loader for Flatfiles, SQL access
  • http://bioinformatics.ai.sri.com/biowarehouse/
Import/Export of Pathways, etc.
• Export selected pathways (and related objects) as a file
• Import this file into a different PGDB
• Can be used for submitting pathways to MetaCyc. See http://metacyc.org/MetaCycPosting.shtml
• Visit the page of the pathway (or object), right-click, and choose:
  • Edit->Add Object to File Export List
  • File->Export->Selected Objects to Lisp-Format File
  • File->Import->Frames from Lisp-Format File
Dump PGDB into Flatfiles
• Export of entire PGDB as Flatfiles
• Format Description: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
  • Column delimited: 1 line per frame
  • Attribute-value: 1 record per frame
• Multiple slot values:
  • Column delimited: several values per column
  • Attribute-value: several lines for several values
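The attribute-value layout described above (one record per frame, several lines for several values) can be read with a few lines of Python. This is a minimal sketch, not the official loader: it assumes that each line has the form `SLOT - value`, that a line containing only `//` ends a record, and that lines starting with `#` are comments. The function name `parse_attribute_value` and the sample frames are hypothetical; check the format description linked above before relying on these assumptions.

```python
from collections import defaultdict


def parse_attribute_value(text):
    """Parse attribute-value text into a list of frames.

    Each frame is a dict mapping a slot name to the list of its values,
    so a slot that appears on several lines keeps all of its values.
    Assumed layout (verify against the flatfile format description):
    "SLOT - value" lines, "//" as record terminator, "#" for comments.
    """
    frames = []
    current = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":
            if current:
                frames.append(dict(current))
            current = defaultdict(list)
            continue
        slot, separator, value = line.partition(" - ")
        if not separator:
            continue  # not a "SLOT - value" line; ignored in this sketch
        current[slot.strip()].append(value.strip())
    if current:  # tolerate a missing final terminator
        frames.append(dict(current))
    return frames


# Hypothetical sample data, invented for illustration only.
SAMPLE = """\
# example records
UNIQUE-ID - EXAMPLE-FRAME-1
COMMON-NAME - example compound
SYNONYMS - first synonym
SYNONYMS - second synonym
//
UNIQUE-ID - EXAMPLE-FRAME-2
COMMON-NAME - another example
//
"""

frames = parse_attribute_value(SAMPLE)
for frame in frames:
    print(frame["UNIQUE-ID"][0], frame.get("SYNONYMS", []))
```

Storing every slot as a list keeps single-valued and multi-valued slots uniform, which mirrors the "several lines for several values" convention.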
Frame Import/Export
• Import/Export of Selected Frames, for Spreadsheets
• Allows external editing of frames, and also frame creation
• Detailed Description: UG section 5.6
• Export: GUI for Frame selection, Slot selection
  • Slots depend on selected class
  • Caveat: value annotations in slots are lost!
  • Either the direct instances or all instances under a class can be exported
• Import: Many choices for merging or replacing data values
• File Format Choices like the Flatfiles:
  • Column delimited: 1 line per frame
  • Attribute-value: 1 record per frame
• Multiple slot values:
  • Column delimited: several values per column
  • Attribute-value: several lines for several values
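To illustrate the column-delimited choice (one line per frame, several values in one column), the sketch below writes a few frames as a tab-delimited table that a spreadsheet can open. It is an illustration under stated assumptions, not the exporter's actual behavior: the tab delimiter, the `VALUE_SEPARATOR` string used to pack several values into one cell, the function name `frames_to_table`, and the sample frames are all hypothetical. The real conventions are described in the UG section cited above.

```python
import csv
import io

# Hypothetical separator for packing several slot values into one cell.
VALUE_SEPARATOR = " | "


def frames_to_table(frames, slots):
    """Render frames as tab-delimited text: a header row, then one line per frame.

    `frames` is a list of dicts mapping slot name -> list of values.
    `slots` fixes the column order; a missing slot yields an empty cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(slots)
    for frame in frames:
        writer.writerow(
            [VALUE_SEPARATOR.join(frame.get(slot, [])) for slot in slots]
        )
    return buffer.getvalue()


# Hypothetical frames, invented for illustration only.
example_frames = [
    {
        "UNIQUE-ID": ["EXAMPLE-FRAME-1"],
        "SYNONYMS": ["first synonym", "second synonym"],
    },
    {"UNIQUE-ID": ["EXAMPLE-FRAME-2"]},
]

table = frames_to_table(example_frames, ["UNIQUE-ID", "SYNONYMS"])
print(table)
```

A flat table like this has one cell per slot and frame, so it has no place for anything beyond the slot values themselves.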
Misc.
• Export of a replicon as a Genbank file
  • PathoLogic is the inverse, “Import”
  • But: information loss, e.g. gene product comments have no feature qualifier in Genbank
• Importing protein features from UniProt
  • Connection to MySQL BioWarehouse needed
  • See UG section 5.8
• Importing Citations from PubMed