Value-adding, Access, and Use: Biological Databases as a Case Study. Genes make proteins. Proteins form complex 3D structures. Molecules interact. The right molecules need to be present at the right time.
EMBL-Bank (DNA sequences) • Array-Express (microarray expression data) • SWISS-PROT + TrEMBL • InterPro • EnsEMBL (metazoan genome gene annotation) • IntAct (protein-protein interaction data) • EMSD (macromolecular structure data)
Running a database project (diagram components) • Database design • Development DB • Production DB • Service DB • Submitters • Submission tools • Genomes, genes, patents, updates • Q/C etc. • Add value (computation) • Add value (review etc.) • Releases & updates • Data distribution • Data exchange • Other archives • Service tools • End users
EMBL Relational Schema • Sequence Info • Reference Info • Location Info • Taxonomy Info • Feature Info
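The five information groups named on this slide can be pictured as related tables. The sketch below is only an illustration, not the actual EMBL schema: every table name, column name, key relationship, and sample value is hypothetical, including the assumption that locations attach to features and that taxonomy is referenced from the sequence record.

```python
import sqlite3

# In-memory database; all names below are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomy_info (tax_id INTEGER PRIMARY KEY, organism TEXT);
CREATE TABLE sequence_info (
    seq_id INTEGER PRIMARY KEY,
    accession TEXT UNIQUE,
    tax_id INTEGER REFERENCES taxonomy_info(tax_id),
    residues TEXT
);
CREATE TABLE reference_info (
    ref_id INTEGER PRIMARY KEY,
    seq_id INTEGER REFERENCES sequence_info(seq_id),
    citation TEXT
);
CREATE TABLE feature_info (
    feature_id INTEGER PRIMARY KEY,
    seq_id INTEGER REFERENCES sequence_info(seq_id),
    feature_key TEXT
);
CREATE TABLE location_info (
    feature_id INTEGER REFERENCES feature_info(feature_id),
    start_pos INTEGER,
    end_pos INTEGER
);
""")

# Made-up sample entry: one sequence with one annotated feature.
conn.execute("INSERT INTO taxonomy_info VALUES (1, 'Example organism')")
conn.execute("INSERT INTO sequence_info VALUES (1, 'DEMO0001', 1, 'ATGGCCTAA')")
conn.execute("INSERT INTO feature_info VALUES (1, 1, 'CDS')")
conn.execute("INSERT INTO location_info VALUES (1, 1, 9)")

# Join the groups back together into one flat view of the entry.
rows = conn.execute("""
SELECT s.accession, t.organism, f.feature_key, l.start_pos, l.end_pos
FROM sequence_info s
JOIN taxonomy_info t ON t.tax_id = s.tax_id
JOIN feature_info f ON f.seq_id = s.seq_id
JOIN location_info l ON l.feature_id = f.feature_id
""").fetchall()
print(rows)
```

Splitting an entry across tables like this is what lets one taxonomy or reference record be shared by many sequences, at the cost of a join when a flat entry is needed.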
Data Access and Use • Network services • Sequence Retrieval System (SRS), integrating and linking the main nucleotide and protein databases plus many specialized databases • Database releases are produced quarterly and distributed via FTP (including mirror sites) and CD-ROM • Daily and cumulative updates via FTP • Sequence search servers
April 2003: TrEMBL 23.4 + SWISS-PROT 41.2 • 829,111 TrEMBL entries • 123,721 SWISS-PROT entries • weekly production of a non-redundant and comprehensive protein sequence database consisting of SWISS-PROT, TrEMBL, and TrEMBLnew: ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/
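The weekly non-redundant build mentioned above can be sketched as a priority-ordered merge. This is a minimal illustration that assumes "non-redundant" means one entry per identical sequence and that SWISS-PROT takes priority over TrEMBL and TrEMBLnew; the slides do not state the actual criterion, and the function name, entry IDs, and sequences are made up.

```python
def merge_non_redundant(*sources):
    """Merge (source_name, entries) pairs in priority order.

    Each entry is an (entry_id, sequence) tuple. The first source to
    contribute a given sequence wins; later duplicates are dropped.
    """
    kept = {}
    for source_name, entries in sources:
        for entry_id, sequence in entries:
            key = sequence.upper()  # treat case differences as the same sequence
            if key not in kept:
                kept[key] = (source_name, entry_id)
    return kept


# Hypothetical toy data, listed from highest to lowest priority.
swissprot = [("SP_A", "MKTAYIAK")]
trembl = [("TR_A", "MKTAYIAK"), ("TR_B", "MSLLTEVE")]
tremblnew = [("TN_A", "msllteve"), ("TN_B", "MAAAGGG")]

merged = merge_non_redundant(
    ("SWISS-PROT", swissprot),
    ("TrEMBL", trembl),
    ("TrEMBLnew", tremblnew),
)
for sequence, (source, entry_id) in merged.items():
    print(entry_id, source, sequence)
```

Five input entries collapse to three here, with the curated copy kept whenever a sequence appears in more than one source.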
Goals • High level of annotation • Minimal redundancy • High level of integration with other databases • Complete and up-to-date • Availability
Automatic annotation of TrEMBL • Data-mining to extract conditions from InterPro • Extract SWISS-PROT reference entries fulfilling the conditions • Extract common annotation • Store conditions and common annotation in RuleBase • Group TrEMBL by conditions • Add common annotation to TrEMBL (Diagram components: InterPro, SWISS-PROT, TrEMBL, RuleBase)
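The steps on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production RuleBase: a "condition" is reduced to a single InterPro ID, annotation is a set of keywords, and all entry IDs, InterPro IDs, keywords, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy entries; real records carry far more fields.
swissprot = [
    {"id": "SP1", "interpro": {"IPR_A"}, "annotation": {"kinase", "ATP-binding"}},
    {"id": "SP2", "interpro": {"IPR_A"}, "annotation": {"kinase", "membrane"}},
    {"id": "SP3", "interpro": {"IPR_B"}, "annotation": {"protease"}},
]
trembl = [
    {"id": "TR1", "interpro": {"IPR_A"}, "annotation": set()},
    {"id": "TR2", "interpro": {"IPR_B", "IPR_C"}, "annotation": set()},
    {"id": "TR3", "interpro": {"IPR_D"}, "annotation": set()},
]


def build_rulebase(reference_entries):
    """Map each condition to the annotation common to all reference entries meeting it."""
    by_condition = defaultdict(list)
    for entry in reference_entries:
        for condition in entry["interpro"]:
            by_condition[condition].append(entry["annotation"])
    # Common annotation = intersection across the reference entries in each group.
    return {cond: set.intersection(*anns) for cond, anns in by_condition.items()}


def annotate(entries, rulebase):
    """Add the stored common annotation to every entry that meets a condition."""
    for entry in entries:
        for condition in entry["interpro"]:
            entry["annotation"] |= rulebase.get(condition, set())
    return entries


rulebase = build_rulebase(swissprot)
annotated = {e["id"]: e["annotation"] for e in annotate(trembl, rulebase)}
print(rulebase)
print(annotated)
```

Taking the intersection means only annotation shared by every curated entry in a group is propagated, so an entry matching no stored condition (TR3 here) is left unannotated rather than guessed at.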
UniProt • UniProt NREF100, UniProt NREF90, UniProt NREF50 • UniProt Knowledgebase: TrEMBL + SWISS-PROT (literature-based annotation, automated annotation, classification) • UniProt Archive • Sources: DDBJ/EMBL/GenBank, SWISS-PROT, TrEMBL, RefSeq, PIR, EnsEMBL, PDB, Patent Data, Other Data…
Funding • EMBL • European Commission • NIH • Industrial licenses • MRC • IUPHAR
SWISS-PROT, TrEMBL, InterPro, etc., at EBI and SIB • Group leaders: Rolf Apweiler, Amos Bairoch • Co-ordinators: Wolfgang Fleischmann, Henning Hermjakob, Michele Magrane, Maria-Jesus Martin, Nicola Mulder, Claire O’Donovan, Manuela Pruess • Annotators/curators: Philippe Aldebert, Andrea Auchincloss, Kirsty Bates, Marie-Claude Blatter Garin, Brigitte Boeckmann, Silvia Braconi Quintaj, Paul Browne, Evelyn Camon, Danielle Coral, Elisabeth Coudert, Tania de Oliveria Lima, Kirill Degtyarenko, Sylvie Dethiollaz, Ann Estreicher, Livia Famiglietti, Nathalie Farriol-Mathis, Stephanie Federico, Serenella Ferro, Gill Fraser, Raffaella Gatto, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Vivien Junker, Youla Karavidopoulou, Maria Krestyaninova, Kati Laiho, Minna Lehvaslaiho, Karine Michoud, Virginie Mittard, Madelaine Moinat, Sandra Orchard, Sandrine Pilbout, Sylvain Poux, Sorogini Reynaud, Catherine Rivoire, Bernd Röchert, Michel Schneider, Christian Sigrist, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Sandra van den Broek, Bob Vaughan, Eleanor Whitfield • Programmers: Daniel Barrell, David Binns, Michael Darsow, Ujjwal Das, Eduardo de Castro, Alexander Fedotov, Astrid Fleischmann, Elisabeth Gasteiger, Alain Gateau, Andre Hackmann, Ivan Ivanyi, Eric Jain, Alexander Kanapin, Paul Kersey, Ernst Kretschmann, Corinne Lachaize, Chris Lewington, Xavier Martin, John Maslen, Peter McLaren, Rupinder Singh Mazara, Lorna Morris, John O’Rourke, Isabelle Phan, Astrid Rakow, Kai Runte, Florence Servant, Allyson Williams, Dan Wu • Research staff: Kristian Axelsen, Pierre-Alain Binz, Nicolas Hulo, Anne-Lise Veuthey • Clerical/secretarial assistance: Veronique Mangold, Claudia Sapsezian, Margaret Shore-Nye, Veronique Verbegue • Students: Pavel Dobrokhotov, Alexandre Gattiker, various MCF, etc.