1 / 19

Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase

Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase. Eric Just dictyBase http://dictybase.org Center for Genetic Medicine Northwestern University. WHY?. dictyBase based on SGD Increase flexibility in feature storage

norina
Download Presentation

Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase Eric Just dictyBase http://dictybase.org Center for Genetic Medicine Northwestern University

  2. WHY? • dictyBase based on SGD • Increase flexibility in feature storage • Want to use CHADO for feature data, but ‘dicty’ SGD schema for the rest • ‘dicty’ SGD (Oracle) needs to link to CHADO Eric Just - dictyBase – Northwestern University

  3. Schema porting • SQL Fairy did most of this, but • Had to tweak Oracle Producer • Object name limited to 30 characters, systematically truncate names • Unique/primary keys on CLOBs (text) not allowed, changed to varchar2(4000) • ‘SYNONYM’ reserved name in Oracle, changed name to ‘SYNONYM _’ Eric Just - dictyBase – Northwestern University

  4. Class::DBI • Class::DBI provides nice ‘table level’ abstraction • CRUD, follow references WITHOUT WRITING SQL • Excellent tool for portability • GMOD ships with Class::DBI configured for CHADO • Had to fix/customize Oracle Driver Eric Just - dictyBase – Northwestern University

  5. AutoDBI • Package which loads Class::DBI classes for each table • Keep class name Chado::Synonym but call set_up_table( ‘synonym_’ ) • Made ‘residues’ a ‘lazy’ column of Chado::Feature • No other Significant porting needed Eric Just - dictyBase – Northwestern University

  6. Data Migration GFF3 CHADO ‘dicty’ SGD • Export chromosome sequences and locations in GFF3 • Load GFF3 into CHADO schema • Update references to features with new tables and id’s Eric Just - dictyBase – Northwestern University

  7. GBrowse porting • ‘rows’ method does not exist in Oracle DBI Driver if ($sth->rows() == 0) {…} my $rows_returned = @{$sth->fetchall_arrayref()}; $sth->execute or Bio::Root::Root->throw(); if ( $rows_returned== 0) {…} • Oracle fetchrow_hashref() is case sensitive $sth->fetchrow_hashref() $sth->fetchrow_hashref("NAME_lc") Eric Just - dictyBase – Northwestern University

  8. GBrowse porting - Queries • Oracle does not like anything in a ‘using’ clause to also be in the ‘where’ clause select f.feature_id, f.name, fl.fmin,fl.fmax from feature f join featureloc fl using (feature_id) where f.feature_id = 221659 and fl.rank=0; select f.feature_id, f.name, fl.fmin,fl.fmax from feature f join featureloc fl on f.feature_id = fl.feature_id where f.feature_id = 221659 and fl.rank=0; • ‘substring’ becomes ‘substr’ • Any SQL containing synonym table must be modified • Any procedural SQL must be reproduced, in some cases this can be avoided Eric Just - dictyBase – Northwestern University

  9. Tuning • Added is_deleted flag to feature table • Added some audit columns • Added audit table and triggers • Created Indexes Heuristically • Added hints to some difficult queries Eric Just - dictyBase – Northwestern University

  10. Integrating into dictyBase I Various middleware and presentation objects dictyBase Presentation Layer dictyBase Object Model ‘Dbtable’ database abstraction layer ‘dicty’ SGD Eric Just - dictyBase – Northwestern University

  11. Integrating into dictyBase II dictyBase Presentation Layer Various middleware and presentation objects dictyBase Object Model ‘Dbtable’ layer Class::DBI layer ‘dicty’ SGD CHADO Eric Just - dictyBase – Northwestern University

  12. dictyBase Objects • Retrieve, insert, update, delete • Interface ignorant of schema • No presentation in data classes • Easy to use interfaces • Tuned with lazy evaluation most accessors • 75 – 80% unit test coverage Eric Just - dictyBase – Northwestern University

  13. Use BioPerl • Use Bio::Seq to represent sequences • Use Bio::SeqFeatures to represent transcript and alignment locations • Harness the power of BioPerl for sequence tasks, file generation • NOTE: BioPerl only used for sequence and location Eric Just - dictyBase – Northwestern University

  14. Class Diagram Feature Aligned mRNA Contig Chromosome getOverlappingFeatures() getOverlappingAlignments() Bio::SeqFeature::Generic Bio::SeqFeature::Gene::Transcript Bio::SeqFeature::Generic Bio::Seq Eric Just - dictyBase – Northwestern University

  15. Object use case: Add an Exon, dbxref, and Description #!perl use dicty::Feature; my $transcript = new dicty::Feature( -feature_no => 218420 ); $transcript->description( ‘Gene model derived from AU12345' ); $transcript->add_external_id( -source => ‘GenBank Accession Number', -id => 'AU12345' ); $bioperl = $transcript->bioperl(); [$bioperl->exons()]->[2]->start( 281050 ); my $exon = Bio::SeqFeature::Gene::Exon->new( -start => 280921, -end => 280959, -strand => -1 ); $exon->is_coding(1); $bioperl->add_exon($exon); $transcript->update(); Eric Just - dictyBase – Northwestern University

  16. Using Apollo GenBank file GenBank file GenBank file GenBank file Object layer Chado • Request segment through SOAP message over HTTP • Object layer generates GenBank File • Send GenBank File via SOAP message • Modify, in Apollo send changed gene models back via SOAP • Adaptor changes gene models and updates the database Eric Just - dictyBase – Northwestern University

  17. New Curation Tools • Gene and Feature curation had to be rewritten • ‘Gene centric’ curation • Added more evidence qualifiers • Presentation classes that manipulate Object Layer Eric Just - dictyBase – Northwestern University

  18. Where Are We Going • Utilize the flexibility – New Feature Types, feature relations, and SO. • Contribute back to GMOD • Gradually port different areas into CHADO • Provide feedback and testing ground for database independence Eric Just - dictyBase – Northwestern University

  19. Acknowlegments Other Groups • Funding • NIH (NIGMS and NHGRI) • SGD • GMOD • CHADO • GBrowse • Apollo • BioPerl dictyBase • PIs • Rex Chisholm, PhD • Warren Kibbe, PhD • Programmer • Sohel Merchant • Curators • Petra Fey • Pascale Gaudet, PhD • Karen Pilcher • Bioinformatics Core at Northwestern Eric Just - dictyBase – Northwestern University

More Related