Annotating Microarray Data with the MGED Ontology. NCI Center for Bioinformatics April 15, 2004 P. L. Whetzel, A. Pizarro, E. Manduchi, J. Liu, H. He, G. Grant, M. Mailman, C. Stoeckert Center for Bioinformatics University of Pennsylvania. Science 298:601-604, 2002.
To compare experiments, you need some minimum information about the microarray experiments. Ivanova et al. Science 2003
Microarray Information to be Shared
Figure from: David J. Duggan et al. (1999) Expression profiling using cDNA microarrays. Nature Genetics 21: 10-14
MGED Society (www.mged.org)
• International organization
• Comprised of biologists, computer scientists, and data analysts
• Aims to facilitate the sharing and evaluation of microarray data
• Establish standards for microarray data annotation
• Create microarray databases
• Promote sharing of high quality, well-annotated data
• Generalize to data generated by functional genomics and proteomics experiments
MGED Standardization Efforts
• MIAME
  • The formulation of the minimum information about a microarray experiment required to interpret and verify the results. (Brazma et al. Nature Genetics 2001)
• MAGE-OM
  • The establishment of a data exchange format and object model for microarray experiments. (Spellman et al. Genome Biol. 2002)
• MGED Ontology
  • The development of an ontology for microarray experiment description and biological material (biomaterial) annotation in particular. (Stoeckert & Parkinson, Comp. Funct. Genom. 2003)
• Transformations
  • The development of recommendations regarding microarray data transformations and normalization methods.
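As a rough illustration of what a "minimum information" requirement means in practice, the sketch below checks an annotation record for the six parts named in the MIAME paper (Brazma et al. 2001). The dictionary layout, the function name, and the sample values are assumptions made for this example and are not part of any MGED specification.

```python
# Illustrative only: the six section names follow the MIAME paper;
# the record layout and the function name are assumptions.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization_controls",
)


def missing_miame_sections(annotation: dict) -> list:
    """Return the MIAME sections that are absent or empty in an annotation."""
    return [s for s in MIAME_SECTIONS if not annotation.get(s)]


# Hypothetical, partially annotated submission.
submission = {
    "experimental_design": "time course, 3 replicates",
    "array_design": "hypothetical-array-v1",
    "samples": ["sample A", "sample B"],
}

print(missing_miame_sections(submission))
```

A check of this kind only confirms that each section is present; it says nothing about whether the content is sufficient to interpret the experiment.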
MGED Ontology (MO)
• Purpose
  • Provide standard terms for the annotation of microarray experiments
  • Not to model biology but to provide descriptors for experiment components
• Benefits
  • Unambiguous description of how the experiment was performed
  • Structured queries can be generated
• Ontology concepts derived from the MIAME guidelines/MAGE-OM
MGED Ontology development
http://mged.sourceforge.net/ontologies/MGEDontology.php
• OILed
• File formats
  • DAML file
  • HTML file
• NCI DTS Browser
• Changes
• Notes
• Term Tracker
Relationship of MO to MAGE-OM
• MO class hierarchy follows that of MAGE-OM
• Association to OntologyEntry
• MO provides terms for these associations by:
  • Instances internal to MO
  • Instances from external ontologies
• Take advantage of existing ontologies
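The two ways of supplying a term can be pictured with a small data structure. This is a minimal sketch, not the MAGE-OM definition of OntologyEntry: the `source` and `accession` fields, the `is_external` helper, and the example class and term names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyEntry:
    """Simplified stand-in for an ontology annotation (fields are assumed)."""
    category: str                    # ontology class the term belongs to
    value: str                       # the term itself
    source: str = "MGED Ontology"    # where the term is defined
    accession: Optional[str] = None  # identifier in an external resource

    def is_external(self) -> bool:
        """True when the term comes from a resource other than MO."""
        return self.source != "MGED Ontology"


# A term supplied as an instance internal to MO (names are illustrative).
internal = OntologyEntry(category="Sex", value="female")

# A term taken from an external resource and referenced by accession.
external = OntologyEntry(
    category="Organism",
    value="Mus musculus",
    source="NCBI Taxonomy",
    accession="10090",
)

for entry in (internal, external):
    print(entry.category, entry.value, entry.is_external())
```

Keeping the source and accession alongside the value is what lets an annotation reuse an existing ontology instead of duplicating its terms.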
MGED Ontology Class Hierarchy
• MGED CoreOntology
  • Coordinated development with MAGE-OM
  • Ease of locating appropriate class to select terms from
• MGED ExtendedOntology
  • Classes for additional terms as the usage of genomics technologies expands
Main focus of MGED Ontology
• Structured and rich description of BioMaterials
[Diagram labels: BioMaterial, OntologyEntry, +characteristics, +associations]
Desirable Microarray Queries
• Return all experiments with species X examined at developmental stage Y
• Sort by platform type
• Which are untreated? Treated?
• Treated with what compound?
• How comparable are these?
• What can these experiments tell me?
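Once experiments are annotated with controlled terms, the first few queries above reduce to simple filters and sorts. The sketch below works on in-memory records; the `Experiment` fields, the function names, and the sample data are hypothetical and do not reflect the schema of any particular database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Experiment:
    """Hypothetical annotated experiment record."""
    name: str
    species: str
    developmental_stage: str
    platform_type: str
    treatment: Optional[str] = None  # None means untreated


def find_experiments(experiments: List[Experiment], species: str,
                     stage: str) -> List[Experiment]:
    """Experiments for one species and stage, sorted by platform type."""
    hits = [e for e in experiments
            if e.species == species and e.developmental_stage == stage]
    return sorted(hits, key=lambda e: e.platform_type)


def split_by_treatment(
        experiments: List[Experiment]
) -> Tuple[List[Experiment], List[Experiment]]:
    """Separate untreated experiments from treated ones."""
    untreated = [e for e in experiments if e.treatment is None]
    treated = [e for e in experiments if e.treatment is not None]
    return untreated, treated


# Made-up records for illustration.
experiments = [
    Experiment("exp1", "Mus musculus", "adult", "spotted cDNA array"),
    Experiment("exp2", "Mus musculus", "adult", "oligonucleotide array",
               treatment="compound Z"),
    Experiment("exp3", "Homo sapiens", "adult", "spotted cDNA array"),
]

hits = find_experiments(experiments, "Mus musculus", "adult")
untreated, treated = split_by_treatment(hits)
print([e.name for e in hits])
print([(e.name, e.treatment) for e in treated])
```

The exact-match comparisons only work because every record uses the same term for the same concept, which is the point of annotating with an ontology. The last two questions on the slide (comparability and interpretation) are not something a filter can answer.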
RAD: RNA Abundance Database
http://www.cbil.upenn.edu/RAD
• RAD is part of GUS (Genomics Unified Schema)
• The GUS platform maximizes the utility of stored data by warehousing them in a schema that integrates the genome, transcriptome, gene regulation and networks, ontologies and controlled vocabularies, and gene expression
• Relational schema (implemented in Oracle)
• Stores data from gene expression arrays and SAGE
• Comes with a suite of web-annotation forms (Study-Annotator)
• MAGE-RAD Translator (MR_T) generates MAGE-ML files for exports
• Manduchi et al. 2004 Bioinformatics 20:452-459.
GUS (Genomics Unified Schema) http://www.gusdb.org

Namespace  Domain                   Features
RAD        Gene Expression          MIAME/MAGE-OM
SRes       Shared Resources         Ontologies
DoTS       Sequence and annotation  Central dogma
Core       Data Provenance          Documentation
TESS       Gene regulation          Grammars
RAD Schema
• About 65 tables and 30 views
• Assay to Quantification tables
• Study Design tables
• BioMaterials tables
• Platform tables
• Quantification Result tables
• Processing tables
• Analysis Result tables
• Misc tables: Protocol, Contact*, Ontologies*
• Meta tables*: for data privacy and history tracking
• Integrity Checks tables
* These are used by RAD, but belong to common GUS components
[Figure legend: Tables populated by the Study-Annotator]
RAD Study-Annotator
• Covers all relevant parts of the MIAME checklist
• Exploits the MGED Ontology
• Allows entering of very specific details of an experiment
• Web-based forms:
  • Modular structure
  • Written in PHP
  • Front-end data integrity checks using JavaScript
• Manages Data Privacy based on Project/Group selections present in GUS schema
• Available at http://www.cbil.upenn.edu/RAD/RAD-installation.htm
RAD Study-Annotator Logical Flow
[Flow diagram. Recoverable labels: New User Registration; Login; Data Preferences (Project, Group); Study; Misc; From Assay to Quantification; Study Design; BioMaterials (samples, treatments); Module I; Module II; Module III]
Using the Ontologies
[Diagram. Recoverable labels: MGED Ontology; SRes (Anatomy, DevelopmentalStage, Disease, Lineage, PATOAttribute, Phenotype, Taxon); ExternalDatabases; RAD OntologyEntry; RAD Study-Annotator]
• Ontology instances propagated to annotation web forms
• New terms can be proposed
Sources of New Terms in OntologyEntry
• MGED Ontology
  • Continued development of new classes and terms
• Shared Resources (SRes)
  • Contains controlled vocabularies and ontologies
• External Database Sources
  • Annotated term provided by user
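One way to picture how a term might be traced back to one of these sources is an ordered lookup. This is a sketch under stated assumptions: the slide lists the sources but does not specify a search order, so the order used here, the `resolve_term` function, and the vocabulary contents are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Each source is a name plus a mapping from category to its known terms.
Source = Tuple[str, Dict[str, Set[str]]]


def resolve_term(category: str, value: str, sources: List[Source]) -> str:
    """Return the first source that defines the term, else 'user-provided'."""
    for name, vocabulary in sources:
        if value in vocabulary.get(category, set()):
            return name
    return "user-provided"


# Illustrative vocabularies; the search order is an assumption.
sources = [
    ("MGED Ontology", {"Sex": {"female", "male"}}),
    ("SRes", {"Taxon": {"Mus musculus", "Homo sapiens"}}),
    ("External database", {"Disease": {"example disease"}}),
]

print(resolve_term("Sex", "female", sources))
print(resolve_term("Taxon", "Mus musculus", sources))
print(resolve_term("Disease", "unlisted disease", sources))
```

Terms that fall through to the last case are the ones a curator would need to review before they are proposed as new ontology terms.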
Adding New Terms
1. Add term from SRes
2. Add term from External Database
Future Issues
• Burning Issues
  • Developing MO in sync with related efforts (MAGE-OM v.2.0)
  • Use/presentation in annotation forms
  • Coverage of other technologies and biological domains
• Flame retardant structure
  • ExtendedOntology
  • Space to add new classes, terms and their relationship to one another
A Functional Genomics View A. Jones et al. submitted
A Functional Genomics Object Model (FGE-OM)
• Separate out common components from technology-specific ones
• Allow new domains to be added as new modules to the model
• Incorporate ideas from SysBio-OM (Xirasagar et al. Bioinformatics in press)
Jones et al. Bioinformatics in press
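The separation of common and technology-specific components can be sketched as a shared core that technology modules attach to. The class and field names below are hypothetical and are not taken from FGE-OM; the example shows only the modular idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommonExperiment:
    """Components shared by every technology (names are assumed)."""
    name: str
    biomaterials: List[str]
    protocols: List[str] = field(default_factory=list)


@dataclass
class MicroarrayModule:
    """Technology-specific details for a microarray experiment."""
    array_design: str


@dataclass
class ProteomicsModule:
    """Technology-specific details for a proteomics experiment."""
    instrument: str


@dataclass
class FunctionalGenomicsExperiment:
    """Common core plus any number of technology modules."""
    common: CommonExperiment
    modules: Dict[str, object] = field(default_factory=dict)

    def add_module(self, domain: str, module: object) -> None:
        """Attach a new domain without changing the common core."""
        self.modules[domain] = module


experiment = FunctionalGenomicsExperiment(
    common=CommonExperiment(name="example study", biomaterials=["sample A"])
)
experiment.add_module("microarray", MicroarrayModule(array_design="design-1"))
experiment.add_module("proteomics", ProteomicsModule(instrument="instrument-1"))
print(sorted(experiment.modules))
```

Adding a further technology means writing one more module class; the shared description of biomaterials and protocols stays untouched.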
Proposed Development of FGE-OM
[Diagram. Recoverable labels: Microarray Standards (MIAME, MAGE-OM, MGED Ontology); Proteomics Standards (Pedro, MIAPE, MIAPE-OM); Functional Genomics Standards (MIAME, MIAME-Tox, MIAPE, FGE-OM, MGED Ontology); Use Cases; Informal specification; Formal specification; Strong type system; Immutable type system]
Acknowledgements
• MGED Ontology Working Group
  • Chris Stoeckert, Trish Whetzel (Penn)
  • Helen Parkinson (EBI)
  • Joe White (TIGR)
  • Gilberto Fragoso, Liju Fan, Mervi Heiskanen (NCI)
• Many others!