190 likes | 282 Views
Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase. Eric Just dictyBase http://dictybase.org Center for Genetic Medicine Northwestern University. WHY?. dictyBase based on SGD Increase flexibility in feature storage
E N D
Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase Eric Just dictyBase http://dictybase.org Center for Genetic Medicine Northwestern University
WHY? • dictyBase based on SGD • Increase flexibility in feature storage • Want to use CHADO for feature data, but ‘dicty’ SGD schema for the rest • ‘dicty’ SGD (Oracle) needs to link to CHADO Eric Just - dictyBase – Northwestern University
Schema porting • SQL Fairy did most of this, but • Had to tweak Oracle Producer • Object name limited to 30 characters, systematically truncate names • Unique/primary keys on CLOBs (text) not allowed, changed to varchar2(4000) • ‘SYNONYM’ reserved name in Oracle, changed name to ‘SYNONYM _’ Eric Just - dictyBase – Northwestern University
Class::DBI • Class::DBI provides nice ‘table level’ abstraction • CRUD, follow references WITHOUT WRITING SQL • Excellent tool for portability • GMOD ships with Class::DBI configured for CHADO • Had to fix/customize Oracle Driver Eric Just - dictyBase – Northwestern University
AutoDBI • Package which loads Class::DBI classes for each table • Keep class name Chado::Synonym but call set_up_table( ‘synonym_’ ) • Made ‘residues’ a ‘lazy’ column of Chado::Feature • No other Significant porting needed Eric Just - dictyBase – Northwestern University
Data Migration GFF3 CHADO ‘dicty’ SGD • Export chromosome sequences and locations in GFF3 • Load GFF3 into CHADO schema • Update references to features with new tables and id’s Eric Just - dictyBase – Northwestern University
GBrowse porting • ‘rows’ method does not exist in Oracle DBI Driver if ($sth->rows() == 0) {…} my $rows_returned = @{$sth->fetchall_arrayref()}; $sth->execute or Bio::Root::Root->throw(); if ( $rows_returned== 0) {…} • Oracle fetchrow_hashref() is case sensitive $sth->fetchrow_hashref() $sth->fetchrow_hashref("NAME_lc") Eric Just - dictyBase – Northwestern University
GBrowse porting - Queries • Oracle does not like anything in a ‘using’ clause to also be in the ‘where’ clause select f.feature_id, f.name, fl.fmin,fl.fmax from feature f join featureloc fl using (feature_id) where f.feature_id = 221659 and fl.rank=0; select f.feature_id, f.name, fl.fmin,fl.fmax from feature f join featureloc fl on f.feature_id = fl.feature_id where f.feature_id = 221659 and fl.rank=0; • ‘substring’ becomes ‘substr’ • Any SQL containing synonym table must be modified • Any procedural SQL must be reproduced, in some cases this can be avoided Eric Just - dictyBase – Northwestern University
Tuning • Added is_deleted flag to feature table • Added some audit columns • Added audit table and triggers • Created Indexes Heuristically • Added hints to some difficult queries Eric Just - dictyBase – Northwestern University
Integrating into dictyBase I Various middleware and presentation objects dictyBase Presentation Layer dictyBase Object Model ‘Dbtable’ database abstraction layer ‘dicty’ SGD Eric Just - dictyBase – Northwestern University
Integrating into dictyBase II dictyBase Presentation Layer Various middleware and presentation objects dictyBase Object Model ‘Dbtable’ layer Class::DBI layer ‘dicty’ SGD CHADO Eric Just - dictyBase – Northwestern University
dictyBase Objects • Retrieve, insert, update, delete • Interface ignorant of schema • No presentation in data classes • Easy to use interfaces • Tuned with lazy evaluation most accessors • 75 – 80% unit test coverage Eric Just - dictyBase – Northwestern University
Use BioPerl • Use Bio::Seq to represent sequences • Use Bio::SeqFeatures to represent transcript and alignment locations • Harness the power of BioPerl for sequence tasks, file generation • NOTE: BioPerl only used for sequence and location Eric Just - dictyBase – Northwestern University
Class Diagram Feature Aligned mRNA Contig Chromosome getOverlappingFeatures() getOverlappingAlignments() Bio::SeqFeature::Generic Bio::SeqFeature::Gene::Transcript Bio::SeqFeature::Generic Bio::Seq Eric Just - dictyBase – Northwestern University
Object use case: Add an Exon, dbxref, and Description #!perl use dicty::Feature; my $transcript = new dicty::Feature( -feature_no => 218420 ); $transcript->description( ‘Gene model derived from AU12345' ); $transcript->add_external_id( -source => ‘GenBank Accession Number', -id => 'AU12345' ); $bioperl = $transcript->bioperl(); [$bioperl->exons()]->[2]->start( 281050 ); my $exon = Bio::SeqFeature::Gene::Exon->new( -start => 280921, -end => 280959, -strand => -1 ); $exon->is_coding(1); $bioperl->add_exon($exon); $transcript->update(); Eric Just - dictyBase – Northwestern University
Using Apollo GenBank file GenBank file GenBank file GenBank file Object layer Chado • Request segment through SOAP message over HTTP • Object layer generates GenBank File • Send GenBank File via SOAP message • Modify, in Apollo send changed gene models back via SOAP • Adaptor changes gene models and updates the database Eric Just - dictyBase – Northwestern University
New Curation Tools • Gene and Feature curation had to be rewritten • ‘Gene centric’ curation • Added more evidence qualifiers • Presentation classes that manipulate Object Layer Eric Just - dictyBase – Northwestern University
Where Are We Going • Utilize the flexibility – New Feature Types, feature relations, and SO. • Contribute back to GMOD • Gradually port different areas into CHADO • Provide feedback and testing ground for database independence Eric Just - dictyBase – Northwestern University
Acknowlegments Other Groups • Funding • NIH (NIGMS and NHGRI) • SGD • GMOD • CHADO • GBrowse • Apollo • BioPerl dictyBase • PIs • Rex Chisholm, PhD • Warren Kibbe, PhD • Programmer • Sohel Merchant • Curators • Petra Fey • Pascale Gaudet, PhD • Karen Pilcher • Bioinformatics Core at Northwestern Eric Just - dictyBase – Northwestern University