ChEBI: The story so far
Paula de Matos
Private data and public data
The state of affairs of bioinformatics in 2002
• Bioinformatics is booming
• Human genome sequence rough draft announced June 2000
• Free resources and free data
A different story for chemoinformatics
• Private data and private software
Too hard to solve… let's put our heads in the sand
Bioinformatics data grew too large to keep track of the chemical compounds involved
• 100,000 protein entries in Swiss-Prot (2002)
• 20 million entries in the EMBL database (2002)
• Small databases were unable to keep track
• ENZYME resource: ~3,500 enzymatic reactions
New initiatives start up
• PubChem
  • Chemical repository, millions of entries, focus on screening assays
• ChEBI
  • Manually annotated database: nomenclature reference and compound database, tens of thousands of entries
Principles of foundation
• December 2002: email exchanges within the EBI to address the issue of chemistry
• Three principles outlined
“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”
“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”
“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”
We make a start using existing resources
• Integrate three resources
  • KEGG Compound
  • IntEnz
  • Chemical Ontology
• Annotation starts summer 2003
• Focus on nomenclature
Our first release was modest, but it was a start
• 21 July 2004
• 2,783 annotated entities
• Data:
  • ChEBI name, ChEBI ID
  • IUPAC names, synonyms
  • Formula
  • Cross-references
We introduce structures (Sep 2005)
• Molfiles
• InChI (IUPAC International Chemical Identifier)
• SMILES (Simplified Molecular Input Line Entry System)
• Image (PNG)
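An InChI is a layered string, which makes it easy to pull single pieces of information out of it. The formula layer is one example. The sketch below is a minimal illustration of that layering, assuming well-formed input. It uses the standard InChI for ethanol purely as an example, not a ChEBI entry. The helper name `inchi_layers` is hypothetical, and the parser is a simplification: it keeps only the last occurrence of a repeated layer prefix, such as those produced by fixed-H sublayers.

```python
def inchi_layers(inchi: str) -> dict:
    """Split an InChI into version, formula and prefixed layers (simplified sketch)."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {}
    for layer in rest:
        # Each later layer starts with a one-letter prefix:
        # 'c' = atom connections, 'h' = hydrogen atoms, 'q' = charge, ...
        # Simplification: a repeated prefix overwrites the earlier one.
        layers[layer[0]] = layer[1:]
    return {"version": version, "formula": formula, "layers": layers}


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(inchi_layers(ethanol))
# {'version': '1S', 'formula': 'C2H6O', 'layers': {'c': '1-2-3', 'h': '3H,2H2,1H3'}}
```

Because the formula always appears as the first layer, a service can check it against a curated formula field without a full chemistry toolkit.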
Marvin in ChEBI
We start editing the chemical ontology (Dec 2005)
Internationalisation of web pages (March 2006)
Internationalisation of data (Feb 2008)
Web services (Oct 2006)
• Programmatic access to a ChEBI entry
• SOAP-based Java implementation
• Clients currently available in Java and Perl
• Four methods with which to access data:
  • getLiteEntity
  • getCompleteEntity
  • getOntologyParents
  • getOntologyChildren
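getOntologyParents and getOntologyChildren are one-hop queries over the ontology graph. The sketch below is a minimal in-memory model of what such queries return, assuming the ontology is stored as (child, relation, parent) edges. It is NOT the ChEBI SOAP API. The class `ToyOntology`, its snake_case methods and the example edges are hypothetical illustrations, not data taken from ChEBI.

```python
from collections import defaultdict


class ToyOntology:
    """Hypothetical stand-in for ontology parent/child lookups (not the real web service)."""

    def __init__(self, edges):
        self._parents = defaultdict(list)
        self._children = defaultdict(list)
        for child, relation, parent in edges:
            # Index every edge in both directions so either lookup is O(1)
            self._parents[child].append((relation, parent))
            self._children[parent].append((relation, child))

    def get_ontology_parents(self, entity):
        # Direct parents only: one hop up the graph
        return list(self._parents.get(entity, []))

    def get_ontology_children(self, entity):
        # Direct children only: one hop down the graph
        return list(self._children.get(entity, []))


# Illustrative edges (chemically sensible, not copied from ChEBI)
edges = [
    ("succinic acid", "is_a", "dicarboxylic acid"),
    ("dicarboxylic acid", "is_a", "carboxylic acid"),
    ("malonic acid", "is_a", "dicarboxylic acid"),
]
onto = ToyOntology(edges)
print(onto.get_ontology_parents("succinic acid"))  # [('is_a', 'dicarboxylic acid')]
print(onto.get_ontology_children("dicarboxylic acid"))
```

Indexing each edge twice trades a little memory for constant-time lookups in both directions, which suits an interface that exposes parents and children as separate calls.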
Automated cross-references (Aug 2007)
• Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress
Chemical structure searching (May 2008)
After all this, where are we?
Annotation growth is linear
The number of web hits is growing
• Total pure entry hits in April: 42,612 / 273,219
• Total web services hits in April: 88,226
• Web hits for 2007: [chart not reproduced]
Diversity of users
• Constant challenge of balancing our users' varied interests
Our positives
• Nomenclature database
• Manually annotated data
• Attention to detail
• Free and accessible
• Loyal users
Our not-so-positives
• Size, for some people
• Not well integrated into other bioinformatics resources
• Community interaction
• No software publicly available to manipulate the database
Involve the community
• Create a web-based submission tool
• Users can easily submit their entities one at a time
• Also allow bulk submission from other resources
Improvements to data depth
• Addition of more cross-references: PDB, MACIE?
• Addition of more chemical attributes? Which chemical attributes?
• Text-mining projects to extract relevant chemical information from patents and journals
  • European Patent Office
Going open source
• Commercial software packages will be replaced with open-source alternatives
• Long-term goal: allow people to create a free local instance of ChEBI
• Distribution of data in useful formats: CML, SDF
Proposed changes to the ontology
• New relationships
  • “Is disjoint from”
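A relationship such as "is disjoint from" lets software catch classification errors automatically: no entity should fall under two classes declared disjoint. The sketch below shows one way to run that check, assuming is_a links are stored as a child-to-parents mapping. The function names are hypothetical. The disjoint pair "organic molecular entity" / "inorganic molecular entity" is assumed only for illustration, and "entity X" is a deliberately misclassified placeholder.

```python
def ancestors(entity, is_a):
    """All classes reachable from entity by following is_a links upwards."""
    seen, stack = set(), [entity]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(entities, is_a, disjoint_pairs):
    """Report (entity, class_a, class_b) for every entity under both classes of a disjoint pair."""
    violations = []
    for entity in entities:
        classes = ancestors(entity, is_a)
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                violations.append((entity, a, b))
    return violations


is_a = {
    "succinic acid": ["dicarboxylic acid"],
    "dicarboxylic acid": ["organic molecular entity"],
    "entity X": ["dicarboxylic acid", "inorganic molecular entity"],  # deliberately inconsistent
}
disjoint = [("organic molecular entity", "inorganic molecular entity")]
print(disjointness_violations(["succinic acid", "entity X"], is_a, disjoint))
# [('entity X', 'organic molecular entity', 'inorganic molecular entity')]
```

The check follows is_a links transitively, so an error is caught even when the conflicting classes sit several levels above the entity.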
Example: “Is alloprote of”
• succinate(2−) (CHEBI:30031) is alloprote of succinic acid (CHEBI:15741)
• succinic acid (CHEBI:15741) is alloprote of succinate(2−) (CHEBI:30031)
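In the succinate(2−)/succinic acid example, "is alloprote of" appears to be drawn in both directions. If the relation is treated as symmetric, curators only need to assert one direction and software can derive the other. The sketch below is a minimal illustration of that idea. The symmetry of the relation is an assumption, and the helper `close_symmetric` and the triple layout are hypothetical.

```python
def close_symmetric(triples, symmetric):
    """Return the triples plus the reverse of every triple whose relation is symmetric."""
    closed = set(triples)
    for subject, relation, obj in triples:
        if relation in symmetric:
            # (a, r, b) implies (b, r, a) when r is symmetric
            closed.add((obj, relation, subject))
    return closed


stated = {("CHEBI:30031", "is_alloprote_of", "CHEBI:15741")}  # succinate(2−) -> succinic acid
inferred = close_symmetric(stated, symmetric={"is_alloprote_of"})
print(sorted(inferred))
```

Storing one direction and deriving the other avoids the two stated triples drifting out of sync when an entry is edited.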
• From “Has biological role” to “Has biological role” and “Has application”
Encourage use of ChEBI nomenclature
• Currently working with the Swiss Institute of Bioinformatics to build a database of biochemical reactions called Rhea
• All reactions mapped to ChEBI
Example: EC 2.7.4.3 “ATP + AMP = 2 ADP”
  • ATP: CHEBI:15422, C10H16N5O13P3
  • AMP: CHEBI:16027, C10H14N5O7P
  • ADP: CHEBI:16761, C10H15N5O10P2
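Mapping every reaction participant to a ChEBI entry with a formula makes simple consistency checks possible. The sketch below checks that EC 2.7.4.3 (ATP + AMP = 2 ADP) is atom-balanced, using the formulas shown on this slide. It is a minimal illustration and not Rhea's actual validation procedure. It handles only flat formulas (no brackets, no charges) and does not check charge balance. The helper names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C10H16N5O13P3' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side, formulas):
    """Sum atom counts over (stoichiometric coefficient, participant) pairs."""
    total = Counter()
    for coefficient, name in side:
        for element, n in parse_formula(formulas[name]).items():
            total[element] += coefficient * n
    return total


formulas = {
    "ATP": "C10H16N5O13P3",  # CHEBI:15422
    "AMP": "C10H14N5O7P",    # CHEBI:16027
    "ADP": "C10H15N5O10P2",  # CHEBI:16761
}
left = side_total([(1, "ATP"), (1, "AMP")], formulas)
right = side_total([(2, "ADP")], formulas)
print(left == right)  # True: both sides are C20H30N10O20P4
```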
Acknowledgements
• IntEnz Team
  • Rafael Alcantara, Volker Ast, Kristian Axelsen, Anne Morgat
• EPO Collaborators
  • Helene Courrier, Stephane Nauche, Jeremy Parsons
• Database supporters
  • ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc.
• ChEBI Team
  • Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck
• Alumni
  • Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden
• ChEBI supporters
  • Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton
Discussion points
• Data depth
• Community
• New relationships
• Encourage nomenclature