The GMOD Project • Lincoln Stein, Cold Spring Harbor Laboratory
Test Subject: Michael Caudy • Drosophila neurobiologist • Proneural differentiation: Notch pathway, HLH transcriptional activators/repressors, achaete/scute complex • No computer science training • Took my “bioinformatics for biologists” course
“Simple” Problem • Discover the transcription factor binding site code controlling proneural differentiation.
Regular Expression Search • Using the achaete promoter as an exemplar, search for combinations of known binding sites in particular architectures
Mike’s Got Lots of Data • 90–11,000 TF binding site clusters • 100s–1000s of genes • Millions of interactions • Which genes are involved in neural differentiation? • Which have interactions with the pathway? • Which have suggestive mutant phenotypes?
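A minimal Python sketch of this kind of search, assuming a plain forward-strand scan: motifs are written in IUPAC codes, translated into regular expressions, and reported as pairs when a second site starts within a fixed distance of the first. CANNTG is the E-box consensus bound by HLH proteins; the second motif (YGTGRGAA), the toy sequence, and the helper names are illustrative placeholders, not the motifs or code used in the actual study.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'CANNTG' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of every forward-strand match, overlaps included."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

def find_pairs(seq: str, first: str, second: str,
               max_gap: int) -> list[tuple[int, int]]:
    """Pairs (i, j) where a `second` site starts 1..max_gap bp after a `first` site."""
    firsts = find_sites(seq, first)
    seconds = find_sites(seq, second)
    return [(i, j) for i in firsts for j in seconds if 0 < j - i <= max_gap]

promoter = "TTCAGCTGAAACGTGGGAATT"  # toy sequence, not a real promoter
print(find_sites(promoter, "CANNTG"))                  # [2]
print(find_pairs(promoter, "CANNTG", "YGTGRGAA", 20))  # [(2, 11)]
```

A real scan would also search the reverse complement of each motif and would likely score clusters rather than report raw pairs.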
Mike Needs a Database • Database management system for proneural differentiation genes. • Visualization/exploration tools for the relationships of genes to putative TF clusters. • Literature citations • Link out to FlyBase, GenBank & other DBs. • Add notes and other annotations.
Try to Do It with FileMaker • “Cluster-centric” vs “gene-centric”? • Data import from FlyBase? • Storing images? • Maintaining relationships between genes & clusters? • Updates?
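In a relational design the gene-centric versus cluster-centric question largely disappears: genes and clusters live in their own tables, a link table records which genes go with which clusters, and either view is just a different join. A minimal sketch using Python's built-in sqlite3 module; the table and column names and the cluster coordinates are hypothetical, and only `ac` and `sc` (achaete and scute) are real gene symbols.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT UNIQUE NOT NULL
);
CREATE TABLE cluster (
    cluster_id INTEGER PRIMARY KEY,
    chrom      TEXT NOT NULL,
    start      INTEGER NOT NULL,
    stop       INTEGER NOT NULL
);
-- Link table: one gene may have many clusters and one cluster many genes.
CREATE TABLE gene_cluster (
    gene_id    INTEGER NOT NULL REFERENCES gene (gene_id),
    cluster_id INTEGER NOT NULL REFERENCES cluster (cluster_id),
    PRIMARY KEY (gene_id, cluster_id)
);
-- Free-text notes attached to a gene.
CREATE TABLE note (
    note_id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene (gene_id),
    body    TEXT NOT NULL
);
"""

def clusters_for_gene(conn, symbol):
    """Gene-centric view: ids of all clusters linked to one gene."""
    rows = conn.execute(
        """SELECT gc.cluster_id FROM gene_cluster gc
           JOIN gene g ON g.gene_id = gc.gene_id
           WHERE g.symbol = ? ORDER BY gc.cluster_id""", (symbol,))
    return [r[0] for r in rows]

def genes_for_cluster(conn, cluster_id):
    """Cluster-centric view: symbols of all genes linked to one cluster."""
    rows = conn.execute(
        """SELECT g.symbol FROM gene g
           JOIN gene_cluster gc ON gc.gene_id = g.gene_id
           WHERE gc.cluster_id = ? ORDER BY g.symbol""", (cluster_id,))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "ac"), (2, "sc")])
# Made-up coordinates, for illustration only.
conn.executemany("INSERT INTO cluster VALUES (?, ?, ?, ?)",
                 [(10, "X", 1000, 1200), (11, "X", 5000, 5150)])
conn.executemany("INSERT INTO gene_cluster VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 11)])
print(clusters_for_gene(conn, "ac"))  # [10, 11]
print(genes_for_cluster(conn, 11))    # ['ac', 'sc']
```

Literature citations would follow the same pattern: a `citation` table plus a link table joining citations to genes.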
Mike Needs a MOD • MOD = Model Organism Database • Repository for reagents: stocks, vectors, clones • Genetic & physical maps • Large-scale data sets: genome, EST sets, microarray results, two-hybrid interactions • Literature • Ontologies & nomenclature • Meetings, announcements
How WormBase Works • [Diagram components: You, Web server, Perl scripts, Database access library, ACeDB and MySQL (genomic data), images and movies]
Sorry Mike • WormBase website difficult to install • Data model nematode-centric • Data entry tools very process-specific • Customization difficult • Software documentation uneven • Standard operating procedure documentation uneven
MOD Redux • SGD, MGD, FlyBase, TAIR, RGD… • The same basic idea as WormBase • Implementations entirely different • Wheel reinvented many times • Little software sharing • This madness must stop!
The GMOD Project • Portable, open-source software to support model organism databases • Multiple MODs involved: worm, fly, yeast, mouse, Arabidopsis, rat, monocot, [fugu], [E. coli] • Funded by NIH as of June 2002 • Programmers, coordinator, quarterly meetings • http://www.gmod.org
The GMOD Pyramid (top to bottom) • Modular Applications • Modular Schema • Open Source DBMS & Middleware
A MOD Construction Set [diagram with three columns: genetic maps, literature, genome] • Application Layer: map browser, map editor, annotation pipeline, genome browser, genome editor, citation browser, citation editor • Middleware Layer: BioPerl, BioJava, Biopython • Database Layer: genomes, maps, citations
Chado – Modular Schema • Common schema for use by FlyBase and WormBase • Ontology-driven • Small number of generic tables, e.g. “feature” • Controlled vocabularies name object types and the relationships among them: • “achaete protein is a HLH activator” • “m8 protein inhibits achaete transcription” • Evidence-Savvy
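To make “a small number of generic tables” concrete, here is a heavily simplified sketch in Python with sqlite3. The table names echo Chado's `cvterm`, `feature`, and `feature_relationship`, but the columns are trimmed to the bare minimum and this is not the real Chado DDL; the relationship term `inhibits_transcription_of` is a hypothetical vocabulary entry. The point is that a new kind of object or relationship becomes a new row in `cvterm`, not a new table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
"""

def term(conn, name):
    """Id of a controlled-vocabulary term, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                        (name,)).fetchone()[0]

def add_feature(conn, name, type_name):
    """Every object, whatever its type, goes into the same feature table."""
    cur = conn.execute("INSERT INTO feature (name, type_id) VALUES (?, ?)",
                       (name, term(conn, type_name)))
    return cur.lastrowid

def relate(conn, subject_id, relation, object_id):
    """Record a typed relationship between two features."""
    conn.execute("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                 (subject_id, object_id, term(conn, relation)))

def statements(conn):
    """Render each relationship as 'subject [type] relation object [type]'."""
    rows = conn.execute("""
        SELECT s.name, st.name, r.name, o.name, ot.name
        FROM feature_relationship fr
        JOIN feature s ON s.feature_id = fr.subject_id
        JOIN feature o ON o.feature_id = fr.object_id
        JOIN cvterm r  ON r.cvterm_id = fr.type_id
        JOIN cvterm st ON st.cvterm_id = s.type_id
        JOIN cvterm ot ON ot.cvterm_id = o.type_id
        ORDER BY s.name, o.name""")
    return [f"{s} [{st}] {r} {o} [{ot}]" for s, st, r, o, ot in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
m8 = add_feature(conn, "m8", "protein")
ac = add_feature(conn, "achaete", "gene")
relate(conn, m8, "inhibits_transcription_of", ac)
print(statements(conn))
# ['m8 [protein] inhibits_transcription_of achaete [gene]']
```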
GMOD Applications • Apollo: genome annotation editor • GBrowse: generic genome browser • PubSearch: literature curation editor • CMap: comparative map browser • IMD: insertional mutagenesis database management system
Apollo Data Adapters • Parser → data models → display • Existing data adapters: GAME XML, GFF, Ensembl CGI server, DAS • Write your own data adapter! Extend the AbstractDataAdapter class • Display options defined in a config file
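Apollo itself is written in Java, and its real extension point is the `AbstractDataAdapter` class named on this slide; the Python sketch below only illustrates the same parser → data model → display split. The `DataAdapter` base class, its `load` method, and the `Feature` fields are hypothetical names chosen for this example. Each adapter turns one input format into a shared list of `Feature` objects, and the display code never looks at the format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Feature:
    """Format-independent data model shared by all adapters."""
    seqid: str
    kind: str
    start: int
    end: int
    name: str

class DataAdapter(ABC):
    """Hypothetical adapter interface: parse one format into Feature objects."""
    @abstractmethod
    def load(self, text: str) -> list[Feature]:
        ...

class GFFAdapter(DataAdapter):
    """Reads tab-delimited GFF2 lines; the name is the last word of column 9."""
    def load(self, text: str) -> list[Feature]:
        features = []
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            cols = line.split("\t")
            features.append(Feature(seqid=cols[0], kind=cols[2],
                                    start=int(cols[3]), end=int(cols[4]),
                                    name=cols[8].split()[-1]))
        return features

def display(adapter: DataAdapter, text: str) -> list[str]:
    """Display layer: depends only on Feature, never on the input format."""
    return [f"{f.seqid}:{f.start}-{f.end} {f.kind} {f.name}"
            for f in adapter.load(text)]

gff = "Chr1\ttargeted\tdeletion\t1293224\t1294901\t.\t.\t.\tDeletion d101k2"
print(display(GFFAdapter(), gff))
# ['Chr1:1293224-1294901 deletion d101k2']
```

Supporting another format means writing another `DataAdapter` subclass; `display` stays unchanged.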
Who is Using Apollo? • BDGP: reannotated the Drosophila genome • Bristol-Myers Squibb: launching Apollo from a web browser via MIME types • GNF: JDBC adapter layer over BioSQL • Biogen: viewing human genome alignments between public and Biogen internal databases; connected a BLAT pipeline to Apollo • HGMP-RC Fugu Genomics Group: displaying annotations on fugu scaffolds
Extensively Customizable • End user: turn tracks on and off, change their order, change packing & labeling attributes (stored in a cookie) • Data provider: change fonts, colors, text; change the overview (genetic map, contigs, coverage, karyotype); define new tracks using a simple config file; tinker with track appearance to your heart’s content
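The end-user settings only need a compact, ordered encoding to fit in a cookie. A minimal sketch using Python's standard `http.cookies` module; the cookie name `tracks` and the `name:flag|name:flag` format are assumptions made for illustration, not GBrowse's actual cookie layout, and track names are assumed to contain only letters, digits, and underscores.

```python
from http.cookies import SimpleCookie

def encode_tracks(tracks: list[tuple[str, bool]]) -> str:
    """(name, visible) pairs in display order -> 'tracks=Name:1|Other:0'."""
    cookie = SimpleCookie()
    cookie["tracks"] = "|".join(f"{name}:{int(visible)}"
                                for name, visible in tracks)
    return cookie.output(header="").strip()

def decode_tracks(cookie_header: str) -> list[tuple[str, bool]]:
    """Inverse of encode_tracks: recover the order and on/off state."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    pairs = (item.split(":") for item in cookie["tracks"].value.split("|"))
    return [(name, flag == "1") for name, flag in pairs]

header = encode_tracks([("Genes", True), ("Knockout", False)])
print(header)                 # tracks=Genes:1|Knockout:0
print(decode_tracks(header))  # [('Genes', True), ('Knockout', False)]
```

Keeping the list ordered means one cookie value records both which tracks are on and the order the user dragged them into.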
Adding a New Track
(a) Create a GFF file named “deletions.gff” (columns tab-separated):
    Chr1  targeted  deletion  1293224  1294901  .  .  .  Deletion d101k2
    Chr1  targeted  deletion  8239811  8241116  .  .  .  Deletion d680k2
    Chr2  targeted  deletion  5866382  5866500  .  .  .  Deletion d007k2
(b) Run the load_gff.pl script:
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file:
    [Knockout]
    feature  = deletion
    glyph    = span
    fgcolor  = red
    key      = Knockouts
    link     = http://example.org/cgi-bin/knockout_details?$name
    citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
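GBrowse performs the `$name` substitution itself, in Perl; the sketch below only illustrates the idea in Python. It reads the stanza from step (c) with the standard `configparser` module (interpolation switched off so `$` passes through untouched) and fills the `$name` placeholder in `link` with each deletion's name from step (a). Treating `$name` as the only placeholder is a simplifying assumption, and `feature_links` is a hypothetical helper.

```python
import configparser
from string import Template

# Track stanza from step (c), trimmed to the options used here.
STANZA = """
[Knockout]
feature = deletion
glyph   = span
fgcolor = red
key     = Knockouts
link    = http://example.org/cgi-bin/knockout_details?$name
"""

def feature_links(config_text: str, track: str,
                  names: list[str]) -> dict[str, str]:
    """Expand the track's link template once per feature name."""
    cfg = configparser.ConfigParser(interpolation=None)  # keep '$' literal
    cfg.read_string(config_text)
    template = Template(cfg[track]["link"])  # string.Template also uses $name
    return {name: template.safe_substitute(name=name) for name in names}

# Names taken from the last field of each line in deletions.gff.
links = feature_links(STANZA, "Knockout", ["d101k2", "d680k2", "d007k2"])
print(links["d101k2"])
# http://example.org/cgi-bin/knockout_details?d101k2
```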
Extensively Extensible • [Architecture diagram, layer by layer] Apache Web Server • gbrowse CGI script, extended by plugins • Bio::Graphics library, extended by glyphs • BioPerl library with Bio::DB::GFF, Chado, Oracle, and Flat File adaptors • Data stores: Oracle, MySQL/Postgres, flat files
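In GBrowse the glyphs are Perl classes in the Bio::Graphics library, chosen by the `glyph =` option of a track stanza. The Python sketch below is only an analogy for that name-to-implementation lookup: a decorator registers text-mode “glyphs” under a name, and the renderer picks one from the track's configuration, so adding a glyph never requires editing the renderer. The registry, `render`, and the 100 bp-per-character scale are hypothetical; the name `span` comes from the configuration example, and `box` is an arbitrary second name.

```python
from typing import Callable

# Registry mapping a glyph name (as written in a track stanza) to a drawing function.
GLYPHS: dict[str, Callable[[int], str]] = {}

def glyph(name: str):
    """Decorator that registers a drawing function under a glyph name."""
    def register(draw: Callable[[int], str]) -> Callable[[int], str]:
        GLYPHS[name] = draw
        return draw
    return register

@glyph("box")
def draw_box(width: int) -> str:
    return "[" + "=" * width + "]"

@glyph("span")
def draw_span(width: int) -> str:
    return "|" + "-" * width + "|"

def render(track: dict[str, str], features: list[tuple[int, int]],
           bp_per_char: int = 100) -> list[str]:
    """Draw each (start, end) feature with the glyph the track asks for."""
    draw = GLYPHS[track.get("glyph", "box")]  # fall back to 'box' if unset
    return [draw(max(1, (end - start) // bp_per_char))
            for start, end in features]

print(render({"glyph": "span"}, [(5866382, 5866500)]))  # ['|-|']
print(render({}, [(0, 300)]))                           # ['[===]']
```

A new glyph is just another decorated function; `render` does not change.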
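GBrowse on GenBank? GBrowse on GenBank! • [Architecture diagram, layer by layer] Apache Web Server • gbrowse CGI script, extended by plugins • Bio::Graphics library, extended by glyphs • BioPerl library with the Bio::DB::GFF adaptor and a GenBank Proxy Adaptor • Data sources: GenBank, MySQL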
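The slide does not say how the GenBank Proxy Adaptor works internally. One plausible reading of a proxy next to a local MySQL database is a read-through cache: look the record up locally, and only on a miss fetch it from the remote source and store it. The Python sketch below assumes exactly that; `ReadThroughCache`, the in-memory SQLite table standing in for MySQL, and the fake fetch function standing in for a GenBank query are all hypothetical.

```python
import sqlite3
from typing import Callable

class ReadThroughCache:
    """Serve records from a local table; contact the remote source only on a miss."""

    def __init__(self, fetch_remote: Callable[[str], str]):
        self.fetch_remote = fetch_remote
        self.remote_calls = 0
        # In-memory SQLite stands in for the local MySQL database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE record (accession TEXT PRIMARY KEY, body TEXT)")

    def get(self, accession: str) -> str:
        row = self.db.execute("SELECT body FROM record WHERE accession = ?",
                              (accession,)).fetchone()
        if row is not None:
            return row[0]                 # cache hit
        self.remote_calls += 1            # cache miss: go to the remote source
        body = self.fetch_remote(accession)
        self.db.execute("INSERT INTO record VALUES (?, ?)", (accession, body))
        return body

# Hypothetical stand-in for a GenBank lookup.
def fake_genbank(accession: str) -> str:
    return f"record for {accession}"

cache = ReadThroughCache(fake_genbank)
cache.get("ACC001")
cache.get("ACC001")
print(cache.remote_calls)  # 1
```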
Who is Using GBrowse? • GMOD members: WormBase, FlyBase, RatDB • HGMP-RC Fugu Genomics Group • KEGG (multiple microorganisms) • Ingenium AG (mouse) • Bristol-Myers Squibb (Drosophila) • Texas A&M University (Salmonella) • McGill University (human chromosome 7) • Institute for Systems Biology (human)