Explore the steps to construct a Model Organism Database (MOD), including data sets, genetic maps, genome views, and software specifics.
How to Build a MOD • Lincoln Stein • Cold Spring Harbor Laboratory
What’s a MOD? • Model Organism Database • Repository for reagents • Stocks, vectors, clones • Genetic & physical maps • Large-scale data sets • Genome • EST sets, microarray results, 2-hybrid interactions • Literature • Ontologies & Nomenclature • Meetings, announcements
Found a Genetic Locus: mek-2 • mek-2 RNAi Studies • mek-2 Phenotype & Expr Pattern
How WormBase Works (diagram) • You • Web server • Perl scripts • Images, Movies • Database access library • ACeDB, MySQL • Genomic Data
WormBase Information Workflow (diagram) • .ace files from CalTech, Sanger, WashU, NCBI, CGC • Sanger • CalTech (Caltech.wormbase.org) • CSHL (www.wormbase.org)
Curating a Paper (diagram) • Clipping Service • Domain Expert • Database Entry: Gene Record, Cell Record, Mutant Record • .ACE Files • CalTechAce • .ACE File
Can You Reuse WormBase Software for your Favorite Organism? No!
Sorry Charlie • WormBase website difficult to install • Data model nematode-centric • Curators' tools very process-specific • Customization difficult • Software documentation uneven • Standard operating procedure documentation uneven
MOD Redux • SGD, MGD, FlyBase, TAIR… • The same basic idea as WormBase • Implementation entirely different • Wheel reinvented many times • Little software sharing • This madness must stop!
The GMOD Project • Portable, open source software to support model organism databases • Multiple MODs involved • Worm, fly, yeast, mouse, Arabidopsis, rat, monocot, [fugu], [E. coli] • Funded by NIH as of June 2002 • Programmers, coordinator, quarterly meetings • http://www.gmod.org
The GMOD Pyramid • Modular Applications • Modular Schema • Open Source DBMS & Middleware
A MOD Construction Set (diagram) • Data: genetic maps, literature, genome • Application Layer: map browser, map editor, annotation pipeline, genome browser, genome editor, citation browser, citation editor • Middleware Layer: Bioperl, BioJava, BioPython • Database Layer: genomes, maps, citations
Current GMOD Packages • Chado modular schema • Apollo genome annotation editor • GBrowse generic genome browser • PubSearch literature curation editor • CMAP comparative map browser • LabDoc standard operating procedure editor
Chado – Modular Schema • Immediate goal: common schema for use by FlyBase and WormBase • Ontology Driven • Small number of generic tables e.g. “feature” • Controlled vocabulary names subtypes & describes relationships among them • e.g. “transcript fg83.2 encodes protein fp1803” • Detail tables provide further information on subtypes
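The ontology-driven idea above can be sketched in a few lines of Python with SQLite. This is a simplified illustration, not the Chado DDL: the three tables (`cvterm`, `feature`, `feature_relationship`) and their columns are cut down to the minimum needed to store the slide's example, and the helper functions are hypothetical names introduced here.

```python
import sqlite3

# Simplified illustration only: a generic "feature" table whose subtypes and
# relationships are named by a controlled vocabulary. Not the full Chado schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id)
);
""")


def term_id(name):
    """Return the id of a vocabulary term, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_feature(name, type_name):
    """Store any kind of feature in the one generic table."""
    cur = conn.execute(
        "INSERT INTO feature (name, type_id) VALUES (?, ?)",
        (name, term_id(type_name)),
    )
    return cur.lastrowid


def relate(subject_id, relation, object_id):
    """Link two features with a relationship named by a vocabulary term."""
    conn.execute(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        (subject_id, term_id(relation), object_id),
    )


def describe_relationships():
    """Render each relationship as 'type name relation type name'."""
    rows = conn.execute("""
        SELECT st.name, s.name, r.name, ot.name, o.name
        FROM feature_relationship fr
        JOIN feature s  ON s.feature_id = fr.subject_id
        JOIN cvterm  st ON st.cvterm_id = s.type_id
        JOIN cvterm  r  ON r.cvterm_id  = fr.type_id
        JOIN feature o  ON o.feature_id = fr.object_id
        JOIN cvterm  ot ON ot.cvterm_id = o.type_id
    """).fetchall()
    return [" ".join(row) for row in rows]


transcript = add_feature("fg83.2", "transcript")
protein = add_feature("fp1803", "protein")
relate(transcript, "encodes", protein)
print(describe_relationships())
```

Adding a new feature subtype or relationship here means adding a vocabulary term, not a table, which is the point of keeping the tables generic.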
Apollo Data adapters • Parser -> data models -> display • Existing data adapters • GAME XML • GFF • Ensembl CGI server • DAS • Write your own data adapter! • Extend AbstractDataAdapter class • Display options defined in config file
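The parser -> data models -> display flow can be illustrated with a small Python sketch. Apollo itself is written in Java and its real base class is AbstractDataAdapter; everything below (`DataAdapter`, `TabAdapter`, `Feature`, `render`, and the four-column input format) is a hypothetical stand-in meant only to show the pattern of subclassing an abstract adapter.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """Minimal data model shared by every adapter (hypothetical)."""
    seqid: str
    type: str
    start: int
    end: int


class DataAdapter(ABC):
    """Hypothetical base class; stands in for the role of Apollo's Java class."""

    @abstractmethod
    def read(self, text: str) -> List[Feature]:
        """Parse source text into data-model objects."""


class TabAdapter(DataAdapter):
    """Example adapter for an invented 4-column, tab-separated format."""

    def read(self, text: str) -> List[Feature]:
        features = []
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            seqid, ftype, start, end = line.split("\t")
            features.append(Feature(seqid, ftype, int(start), int(end)))
        return features


def render(features: List[Feature]) -> List[str]:
    """Display step: works on the data model, not on any file format."""
    return [f"{f.seqid}:{f.start}..{f.end} [{f.type}]" for f in features]


print(render(TabAdapter().read("Chr1\tdeletion\t10\t20\n")))
```

Because `render` only sees `Feature` objects, supporting another source format means writing one more `DataAdapter` subclass and leaving the display code alone.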
Who is Using Apollo? • BDGP • Reannotated Drosophila genome • Bristol-Myers Squibb • Launching Apollo from web browser via mime types • GNF • JDBC adapter layer over BioSQL • Biogen • View human genome alignment between public and Biogen internal database • Connected BLAT pipeline to Apollo • HGMP-RC Fugu Genomics group • Displaying annotations on fugu scaffolds
Extensively Customizable • End-user • Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie) • Data provider • Change fonts, colors, text. • Change overview – genetic map, contigs, coverage, karyotype. • Define new tracks using simple config file. • Tinker with track appearance to heart's content.
Adding a New Track
(a) Create a GFF file named “deletions.gff”
    Chr1  targeted  deletion  1293224  1294901  .  .  .  Deletion d101k2
    Chr1  targeted  deletion  8239811  8241116  .  .  .  Deletion d680k2
    Chr2  targeted  deletion  5866382  5866500  .  .  .  Deletion d007k2
(b) Run the load_gff.pl script
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
    [Knockout]
    feature  = deletion
    glyph    = span
    fgcolor  = red
    key      = Knockouts
    link     = http://example.org/cgi-bin/knockout_details?$name
    citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
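Before running the loader it can help to sanity-check the GFF file. The Python sketch below is not part of GBrowse: it assumes the nine columns are tab-separated (the slide shows them space-aligned), that the last column has the form "Class name", and it imitates the `$name` substitution in the `link` option with `string.Template`. How GBrowse itself expands `$name` is not shown on the slide.

```python
from string import Template

# The three example features from the slide, written tab-separated (assumption).
GFF_TEXT = (
    "Chr1\ttargeted\tdeletion\t1293224\t1294901\t.\t.\t.\tDeletion d101k2\n"
    "Chr1\ttargeted\tdeletion\t8239811\t8241116\t.\t.\t.\tDeletion d680k2\n"
    "Chr2\ttargeted\tdeletion\t5866382\t5866500\t.\t.\t.\tDeletion d007k2\n"
)

# Link template copied from the [Knockout] stanza.
LINK = Template("http://example.org/cgi-bin/knockout_details?$name")


def parse_gff(text):
    """Parse 9-column GFF lines into dicts, raising on malformed input."""
    features = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(
                f"line {lineno}: expected 9 columns, got {len(cols)}"
            )
        seqid, source, ftype, start, end, score, strand, phase, group = cols
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"line {lineno}: start {start} > end {end}")
        # Split "Deletion d101k2" into its class and name parts.
        group_class, _, name = group.partition(" ")
        features.append({
            "seqid": seqid, "source": source, "type": ftype,
            "start": start, "end": end,
            "class": group_class, "name": name,
        })
    return features


def track_links(features, feature_type="deletion"):
    """Build the detail-page URL for each feature the track would show."""
    return [
        LINK.substitute(name=f["name"])
        for f in features
        if f["type"] == feature_type
    ]


features = parse_gff(GFF_TEXT)
print(len(features), "features parsed")
for url in track_links(features):
    print(url)
```

The `feature_type` filter mirrors the stanza's `feature = deletion` line: only features whose type column matches appear in the track.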
Extensively Extensible (diagram) • Apache Web Server • gbrowse CGI script • Plugins • Bio::Graphics library • Glyphs • BioPerl library • Adaptors: Oracle adaptor, Bio::DB::GFF adaptor, Chado adaptor, Flat File adaptor • Data stores: Oracle, MySQL/Postgres, Flat Files
GBrowse on GenBank? GBrowse on GenBank! (diagram) • Apache Web Server • gbrowse CGI script • Plugins • Bio::Graphics library • Glyphs • BioPerl library • Adaptors: GenBank Proxy Adaptor, Bio::DB::GFF adaptor • Data stores: GenBank, MySQL
Who is Using GBrowse? • GMOD Members • WormBase, FlyBase, RatDB • HGMP-RC Fugu genomics group • KEGG (multiple microorganisms) • Ingenium AG (mouse) • Bristol-Myers Squibb (Drosophila) • Texas A&M University (Salmonella) • Institute for Systems Biology (human)
Coming Soon to www.gmod.org • Biopipe – genome annotation pipeline • Insertional mutagenesis analysis pipeline • Tree browser • Pathway browsers • Generic MOD web site framework
Joining GMOD • Go to www.gmod.org • Examine software matrix • Find a project or suggest new one • Contact Scott Cain: cain@cshl.org • Or mail gmod-dev@lists.sourceforge.net
Credits • CSHL: Adrian Arva, Shuly Avraham, Scott Cain, Ken Clark, Allen Day • BDGP: Nomi Harris, Suzanna Lewis, Chris Mungall, John Richter, ShengQiang Shu, Colin Weil • EBI: Michele Clamp, Stephen Searle • Carnegie Institute: Sue Rhee, Danny Yoo • Harvard: David Emmert, Stan Letovsky • http://www.gmod.org