wFleaBase
Daphnia Genome Database from Common Components
Daphnia Genomic Consortium Meeting, Sept. 2003
Don Gilbert, gilbertd@indiana.edu
http://iubio.bio.indiana.edu/daphnia
A Replicable Genome infOrmation System (Argos)
http://eugenes.org/argos | flybase.net/flybase-ng

common/
    java/ ; perl/   -- program libraries and packages
    servers/        -- major programs (BLAST, MySQL/PostgreSQL, others)
    systems/        -- OS executables of programs
daphnia/ ..         -- implemented organism genome systems
eugenes/
flybase/
docs/ & install/    -- Argos instructions and usage
template/           -- structure for new projects
ROOT/               -- common directory of installed projects
Argos features

Common genome tool set
• Share benefits of “best of breed” genome tools
• Common parts are tested & maintained by others
• Minimal IT expertise (no compiles or system management)
• Choice of tools (existing or new genome DB use parts desired)

Flexible project packages
• Project needs specify tool set (compare EnsEMBL where all use one set)
• Own look’n’feel web pages, contents, functions
• Security for protected and public sections

Easy replication to any Unix computer
• ‘Live’ database system replication using rsync
• Keep remote servers up-to-date every day
• Local cluster/grid for high-volume traffic
• Works on common workstations, laptops
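The rsync-based replication mentioned above could look roughly like the sketch below. It only assembles an rsync command line; the host name, paths, and the helper name build_rsync_command are hypothetical and do not come from Argos itself. The options used (-a, -z, --delete, --dry-run) are standard rsync options.

```python
import shlex


def build_rsync_command(source, destination, delete=True, dry_run=False):
    """Return the argument list for mirroring `source` into `destination`.

    Hypothetical helper, not part of Argos.
    """
    args = ["rsync", "-a", "-z"]      # archive mode, compress during transfer
    if delete:
        args.append("--delete")       # remove files that vanished upstream
    if dry_run:
        args.append("--dry-run")      # report changes without applying them
    args += [source, destination]
    return args


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    cmd = build_rsync_command(
        "rsync://argos.example.org/daphnia/",
        "/bio/argos/daphnia/",
        dry_run=True,
    )
    print(shlex.join(cmd))
```

Running a command like this once a day from a scheduler such as cron is one way a remote server might be kept up-to-date; passing the list to subprocess.run would execute it.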
Argos - advanced features

Data mining
• Fulfill need to search & retrieve 1000s of genes
• Web Services, Grid Services and LDAP for large data sets
• Simple, computable, industry standards for query by criteria and retrieval of volumes of data
• Bypass time-consuming web pages made for people
• Use with personal, lab databases to keep genome links up-to-date
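As a rough illustration of "query by criteria" on bulk data, the sketch below filters a tab-delimited table that a program might retrieve in one request instead of paging through web pages. The column names, the sample rows, and the function select_records are invented placeholders, not wFleaBase data or an Argos API.

```python
import csv
import io

# Invented placeholder rows; not real Daphnia annotations.
SAMPLE = (
    "gene_id\tscaffold\tstatus\n"
    "g0001\tscaffold_1\tcurated\n"
    "g0002\tscaffold_1\tpredicted\n"
    "g0003\tscaffold_2\tcurated\n"
)


def select_records(tsv_text, **criteria):
    """Return rows whose fields equal every given criterion (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if all(row.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    for row in select_records(SAMPLE, status="curated"):
        print(row["gene_id"], row["scaffold"])
```

A personal or lab database could run a selection like this on a schedule to refresh its genome links.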
Argos common parts
• Java: common library, Ant builds, XML tools, Web Services (Axis), Lucene for “Google”-like searches
• Perl: common library of BioPerl, GBrowse, others
• Servers: Apache and Tomcat web servers; MySQL and PostgreSQL databases; BLAST (NCBI)
• Systems: compiled for apple-powerpc-darwin, intel-linux, sun-sparc-solaris
wFleaBase structure

Cgi-bin  -- Web programs (Perl)
Common   -- Link to common, shared tools
Conf     -- Site configurations for web, data
Data     -- Bulk data & FTP site folder
Dbs      -- Project databases: blast, lucene, mysql
Indices  -- Database indices
Lib      -- Program libraries
Web      -- Web structure and documents: Genomics, Sequences, Maps, Literature, Stocks, Docs, other; includes Public and Protected (project member only) parts
Webapps  -- Web programs (Java); includes Search system, Secure web and editing
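To make the layout concrete, here is a minimal sketch that creates an empty project skeleton with the folders listed above. The lowercase folder names and the function create_skeleton are assumptions for illustration; in the slide, Common is a link to the shared tools, whereas this sketch simply creates a plain directory in its place.

```python
import tempfile
from pathlib import Path

# Folder names taken from the slide, lowercased here as an assumption.
WFLEABASE_DIRS = [
    "cgi-bin", "common", "conf", "data", "dbs",
    "indices", "lib", "web", "webapps",
]


def create_skeleton(root, names=WFLEABASE_DIRS):
    """Create each folder under `root` and return the created paths."""
    root = Path(root)
    created = []
    for name in names:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_skeleton(Path(tmp) / "daphnia"):
            print(path.name)
```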
Where to put Daphnia Genome?

Database needs
• Automated annotation and curated updates
• Search and retrieve data subsets

Choices
• EnsEMBL - working now, Gramene & others use
• GMOD:Chado - in development (FlyBase, WormBase, ChlamyGenome, TIGR, others will use)
• Other choices?
Generic Model Organism Database Construction Set
www.gmod.org
• Genome+ Database (more than annotations)
• Genome visualization tools
• Genome annotation pipeline planned
• Literature curation and Gene Ontology tools
• Component system (pick and choose)
• Developing - more complete in 2004
EnsEMBL Genome Database
www.ensembl.org
• Genome annotation database
• Genome visualization tools
• Genome annotation pipeline
• Comprehensive system (all or none)
• Production - usable now
wFleaBase issues
• Basic web system ready for genome data?
• Start with EnsEMBL for management; move to GMOD:Chado if better choice?
• Add GMOD GBrowse; Apollo Editor with genome
• Add “Self-service” database features for?
    • Easy management by scientists
    • Genome data; stocks; research literature
• Add evolutionary, ecological, environmental data

Prototype at http://iubio.bio.indiana.edu/daphnia/