
Content, Format, and Standards in Genomics Scale Data




  1. Content, Format, and Standards in Genomics Scale Data The ILSI – EBI Collaboration Wm. B. Mattes, PhD, DABT

  2. Outline • Why do we need a database for toxicogenomics? • How is it envisioned that this will be developed? • What are the issues for such a database? • Who is involved in such development?

  3. Traditional Biology One tree at a time

  4. “Omic” Biology Forests and Mountains

  5. Challenge of Genomics (diagram labels: Experiment, INFORMATICS ?, Biological Explanation) • “It’s the informatics, period!” • And it’s awfully tempting to take shortcuts!

  6. Why do we need a database? • Volume of data • Traditional endpoints per animal • <20 histopathology observations • <10 gross measurements (e.g. weights, food) • <25 serum measurements • <10 urinalysis measurements • Genomic endpoints per animal • 5,000-10,000 transcripts !!!
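
As a rough illustration of the scale gap on this slide, the per-animal figures can be tallied in a few lines of Python. The counts are the slide's own upper bounds and range; the variable names are illustrative only.

```python
# Upper bounds per animal, taken from the slide's "<N" figures.
traditional_endpoints = {
    "histopathology observations": 20,
    "gross measurements": 10,
    "serum measurements": 25,
    "urinalysis measurements": 10,
}
genomic_transcripts = (5_000, 10_000)  # range quoted on the slide

# Total traditional endpoints is an upper bound, since each count is "<N".
traditional_max = sum(traditional_endpoints.values())
ratio_low = genomic_transcripts[0] / traditional_max
ratio_high = genomic_transcripts[1] / traditional_max

print(f"Traditional endpoints per animal: fewer than {traditional_max}")
print(f"Genomic endpoints per animal: at least {ratio_low:.0f}x to {ratio_high:.0f}x as many")
```

Because the traditional counts are upper bounds, the true ratio is at least this large.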

  7. Why do we need a database? (cont) • Influence of technology details • Influence of probe sequence • Many genes are “alternatively spliced” – such events may not be detected unambiguously by a microarray

  8. Influence of Probe Sequence Most arrays target this region of the mRNA!

  9. Why do we need a database? (cont) • Influence of technology details • Influence of probe sequence • Many genes are “alternatively spliced” – such events may not be detected unambiguously by a microarray • For cDNA arrays, probes may hybridize to more than one sequence • A database that captures probe sequence is required to resolve discrepancies through automated bioinformatics
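
A minimal sketch of why stored probe sequences matter: once the sequence is available, a script can flag probes that match more than one transcript. The sequences and names below are invented toy data, and exact substring matching is a deliberate simplification. It does not model hybridization, and a real pipeline would use a sequence-alignment tool.

```python
def transcripts_matching_probe(probe: str, transcripts: dict[str, str]) -> list[str]:
    """Return the names of transcripts whose sequence contains the probe exactly."""
    probe = probe.upper()
    return sorted(name for name, seq in transcripts.items() if probe in seq.upper())


# Toy sequences, invented for illustration only.
SHARED_REGION = "TTAGGCTAACCG"    # present in two different transcripts
VARIANT2_REGION = "GATTACAGATTC"  # present only in the second splice variant

transcripts = {
    "geneA_variant1": "ATGGCC" + SHARED_REGION + "TTAGC",
    "geneA_variant2": "ATGGCC" + VARIANT2_REGION + "TTAGC",
    "geneB": "CCCGGG" + SHARED_REGION + "AAA",
}

ambiguous_hits = transcripts_matching_probe(SHARED_REGION, transcripts)
specific_hits = transcripts_matching_probe(VARIANT2_REGION, transcripts)

print("Probe on shared region matches:", ambiguous_hits)
print("Probe on variant-specific region matches:", specific_hits)
```

A probe that returns more than one name is ambiguous in this toy model. Without the probe sequence in the database, that ambiguity could not be detected automatically.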

  10. How are databases being developed? • Microarray Gene Expression Data Society - MGED Society • MIAME - Minimum Information About a Microarray Experiment • “the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction” • Essentially, what should go into the database

  11. How are databases being developed? • MIAME – Basic Areas • Experiment Design • Samples used, extract preparation and labeling • Hybridization procedures and parameters • Measurement data and specifications • Array Design
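
One way to picture the five areas is as a record with one slot per area, plus a check for what is still missing. This is only a sketch: the field names and example values are hypothetical and are not taken from the MIAME document.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MiameRecord:
    """One field per MIAME area listed on the slide (field names are illustrative)."""
    experiment_design: Optional[str] = None
    samples_extract_labeling: Optional[str] = None
    hybridization_procedures: Optional[str] = None
    measurement_data: Optional[str] = None
    array_design: Optional[str] = None


def missing_areas(record: MiameRecord) -> list[str]:
    """List the areas that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical submission with only two of the five areas filled in.
record = MiameRecord(
    experiment_design="dose-response, 3 doses, 2 time points",
    array_design="hypothetical rat array v1",
)
print(missing_areas(record))
```

A submission tool could refuse to accept a record until `missing_areas` returns an empty list.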

  12. How are databases being developed? (cont) • MGED Society • MAGE • Programming conventions and data structures to communicate Microarray Gene Expression data • MAGE-OM Object Model • MAGE-ML Markup Language • Essentially, how the data is exchanged / how the database is constructed

  13. How are databases being developed? (cont) • MGED Society • Ontology working group • Ontologies provide a vocabulary for representing and communicating knowledge about a topic, allowing interpretation and use by computers • MGED Ontology will provide standard terms for the annotation of microarray experiments, allowing: • structured queries • unambiguous descriptions of experiments

  14. How are databases being developed? (cont) • MGED Society • Data Transformation and Normalization Working Group • Standards for recording how microarray data are transformed and normalized.

  15. What are the issues for a toxicogenomics database? • Scope of the ILSI effort: • Genotoxicity Group • 10 array platforms • 11 compounds • >2 time points, up to 10 doses / compound • Nephrotoxicity Group • 6 array platforms • 3 compounds, 260 animals

  16. What are the issues for a toxicogenomics database? • Scope of the ILSI effort: • Hepatotoxicity Group • 8 array platforms • 2 compounds, 144 animals • 2 in-life studies / compound • ALL Groups • Analysis of each sample at multiple sites

  17. What are the issues for toxicogenomics databases? (cont) • Traditional toxicology endpoints are not currently covered by MAGE, MIAME, or the MGED Ontologies! • Organ weights • Clinical pathology • Histopathology • Etc.

  18. What are the issues for toxicogenomics databases? • Traditional toxicology endpoints are not standardized in nomenclature • Clinical pathology/chemistry • AACC • IUPAC • Histopathology • STP • WHO/IARC/RITA • NACAD • SNOMED • NTP, TDMS Database Pathology Code Table

  19. Who is involved in database development? • Private Companies • Genelogic, Iconix, Curagen • MSU - dbZach • NIEHS - CEBS • NCTR - ArrayTrack • ILSI - EBI

  20. ILSI-HESI and EBI collaboration • Establishment of database for toxicogenomics data • Capture, store and analyse gene expression data produced from many different toxicogenomic experiments, conducted in several different laboratories worldwide by the ILSI-HESI members • Interrogate the gene array data, integrating information from genomic, experimental and toxicological domains • Gain knowledge of possible links between gene expression changes and toxicological endpoints

  21. ILSI-HESI and EBI collaboration • Aims of the database and tools • Provide a way to integrate the different domains • Control the annotation to achieve data harmonization • Centralize the information to ease data access and data sharing • Improve array annotations as the genome assemblies are released • ALLOW data comparison

  22. ILSI-HESI and EBI collaboration • Main challenge • Get internally consistent data to allow comparability among the experiments and run complex queries across and within domains • Note: Experiments conducted in ~40 different sites, using different array platforms and terminologies, measuring parameters with different units and storing information in different formats!
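
The harmonization problem on this slide can be sketched with two sites reporting the same measurement under different names and units. The synonym table, unit table, site names, and values below are all hypothetical. A real system would draw its terms from an agreed controlled vocabulary instead of a hard-coded dictionary.

```python
# Hypothetical mapping from site-specific labels to one controlled term.
TERM_SYNONYMS = {
    "liver wt": "liver weight",
    "liver weight": "liver weight",
    "hepatic weight": "liver weight",
}
# How many of each unit make up one gram.
UNITS_PER_GRAM = {"g": 1, "mg": 1000}


def harmonize(record: dict) -> dict:
    """Map a site-specific record to the controlled term and to grams."""
    term = TERM_SYNONYMS[record["parameter"].strip().lower()]
    grams = record["value"] / UNITS_PER_GRAM[record["unit"]]
    return {"site": record["site"], "parameter": term, "value_g": grams}


# Invented records: same measurement, different label and unit per site.
site_records = [
    {"site": "A", "parameter": "Liver wt", "value": 12.5, "unit": "g"},
    {"site": "B", "parameter": "hepatic weight", "value": 12500, "unit": "mg"},
]
harmonized = [harmonize(r) for r in site_records]
for row in harmonized:
    print(row)
```

After harmonization both rows carry the same parameter name and unit, so they can be compared directly. An unknown label or unit raises a `KeyError`, which is one simple way to force curation rather than silently accept unmapped terms.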

  23. ILSI-HESI and EBI collaboration • ‘Simple’ question: • “Does gene X expression go up after treatment with compound Y with biological endpoint Z in experiments from ILSI-HESI members A and B?” • ‘Not simple’ question: • “Which are the most reproducible gene expression changes (and the quantitative measure of this reproducibility) for all experiments on the rat arrays, with biological endpoint X, which functional categories do these genes belong to, and which are the human homologues?”
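
To make the ‘simple’ question concrete, here is a sketch of it as a query over two toy tables using Python's built-in `sqlite3` module. The schema, column names, and data are assumptions made for illustration and do not describe the actual database design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-domain schema: gene expression and toxicological endpoints.
conn.executescript("""
CREATE TABLE expression (member TEXT, compound TEXT, gene TEXT, log2_fold_change REAL);
CREATE TABLE endpoint (member TEXT, compound TEXT, finding TEXT);
INSERT INTO expression VALUES
  ('A', 'Y', 'X', 1.8),
  ('B', 'Y', 'X', 1.2),
  ('B', 'Y', 'W', -0.4);
INSERT INTO endpoint VALUES
  ('A', 'Y', 'Z'),
  ('B', 'Y', 'Z');
""")

# "Does gene X go up after compound Y, with endpoint Z, for members A and B?"
rows = conn.execute(
    """
    SELECT e.member, e.log2_fold_change > 0 AS goes_up
    FROM expression AS e
    JOIN endpoint AS t ON t.member = e.member AND t.compound = e.compound
    WHERE e.gene = ? AND e.compound = ? AND t.finding = ?
      AND e.member IN ('A', 'B')
    ORDER BY e.member
    """,
    ("X", "Y", "Z"),
).fetchall()
print(rows)
conn.close()
```

The join only works because both tables spell the member and compound names identically, which is why controlled annotation comes before querying. The ‘not simple’ question would additionally need reproducibility statistics, functional annotation, and homology mappings, none of which this sketch attempts.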

  24. MIAME/Tox (diagram labels: EMBL-EBI, Toxicogenomics, NIEHS-NCT, ILSI-HESI) • An international effort aiming to • Share expertise • Encourage harmonization • Promote standardization initiative • A call for community participation!

  25. MIAME/Tox objectives • Standard contextual information • Establish worldwide scientific consensus on the minimal information descriptors for array-based toxicogenomics experiments • Data harmonization • Encourage use of controlled vocabularies for the toxicological assessments • Data integration and data sharing • Link data within a study • Link several studies from one institution • Exchange datasets among institutions • Data storage • Facilitate development of MIAME/Tox-compliant data management software and databases - ArrayExpress @ EBI and CEBS @ NIEHS-NCT

  26. MIAME/Tox document • Promote standard contextual information • Defining the core common to most experiments - Minimum/sufficient information • Structured information • Promote data harmonization, data capture and communication • MIAME/Tox is based on MIAME • Focus on toxicological domain • Sample treatment and conventional toxicology information - Clinical pathology, pathology, histopathology...

  27. MIAME/Tox document • Available at the MGED Society and ILSI-HESI web sites • Circulate for consensus • Toxicogenomics, pharmacogenomics and ecotoxicogenomics communities - Regulatory bodies • MGED Meeting (AAAS, Denver, Feb 2003; MGED6, France, Sept 2003) - Toxicology societies (SOT Meeting, Salt Lake City, March 2003) • Review and publish • Work closely with the MGED working groups • Ontology working group • Identify controlled vocabularies for toxicological metadata

  28. Data Input As a Key Step • Capture data in a standard manner • Tox-MIAMExpress • Store information domains in database • ArrayExpress • Compare/query across and within domains

  29. Protocols • Conventional toxicology tests • Microarray experiments Tox-MIAMExpress

  30. Tox-MIAMExpress • Array designs • A set of procedures for formatting the array design information into a standard referencing format (ADF) • A set of procedures to re-annotate or update the array designs via a link to another database at EBI (EnsMart)

  31. Tox-MIAMExpress • Experiment • Experiment design, quality controls, publications • Sample source and treatment • Conventional toxicology test data • Microarray hybridization data

  32. Tox-MIAMExpress

  33. Tox-MIAMExpress

  34. Tox-MIAMExpress

  35. ILSI-HESI and EBI collaboration • Status: • Interface and database infrastructure developed • Data input ongoing

  36. Acknowledgments • Microarray Informatics Team at EBI, in particular • Alvis Brazma (Team Leader and MGED Society President) • Susanna-Assunta Sansone • Philippe Rocca-Serra (Data Management) • NIEHS-NCT and NTP • ILSI-HESI EBI Steering Committee • ILSI-HESI Genomics Committee
