CEBS SysBio: Systeomics Object Model
Definition of Toxicogenomics
The collection, interpretation, and storage of information about gene and protein activity in order to identify toxic substances in the environment, and to help treat people at the greatest risk of diseases caused by environmental pollutants or toxicants.
NIEHS National Center for Toxicogenomics (NCT)
• Toxicogenomics Is Both a Discovery and a Hypothesis-Driven Science
• Seeks Mechanistic Understanding of Toxic Response and Disease Progression
• Eschews Reliance on Narrowly Focused (i.e., Gene by Gene) Approach
• Generates Novel Hypotheses Via Global Gene and Protein Expression Profiling
• Obtains High Level View of a Biological System
• View Composed of Several Biological Domains (RNA, DNA, PPI, etc.)
Requires Systems Biology or “Systeomics” Approach
CEBS Systeomics Technologies
• Transcriptomics
  • Microarray
• Proteomics
  • 2D Gel Electrophoresis – MS – DB Search (e.g., MASCOT)
  • Forward and Reverse Phase Protein Microarrays
• Metabonomics
  • Nuclear Magnetic Resonance (NMR)
  • GC/MS or LC/MS
CEBS Challenges and SysBio-OM Objectives
CEBS Knowledge Base Data Challenges
• Integrate Systems Biology (“Systeomics”) and Toxicology Data
• Capture Variety of Systeomics Data
• Knowledge Base Paradigm Requires Highly Inclusive Experiment Context
• Acquisition of Content Requires Facile Data Exchange with External Sources
• Scope of Data Population and Data Mining Task
SysBio-OM Requirements
• Extensible OM Enabling Precise and Accurate Modeling of Toxicology Data
• Unify Data Representation: Several Systeomics Technologies, One Object Model
• Must Model All Annotation and Data Required to Provide Full Experiment Context
• Standards-Based Design to Facilitate Data Exchange
• Leverage Existing Standards and Software Frameworks
• Foster Software Reuse and Facile Capture of Full Experiment Context
What Is An Object Model?
• Hides Data Source Specifics: RDBMS, Text or XML Files, URL, etc.
• Enables Specialization of Tasks: Not Everyone Must Be a DBA
• Access Data Via Domain Concepts, Not Data Source Queries
Domain concepts shown: Gene • Pathway • Protein • Sequence • SNP
Java Implementation
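A minimal Java sketch of the idea on this slide, under stated assumptions: `Gene`, `GeneSource`, and `InMemoryGeneSource` are hypothetical names invented for illustration and are not SysBio-OM or MAGEstk classes. It shows a caller asking for a domain concept while the data source (here an in-memory map standing in for an RDBMS, a file, or a URL) stays hidden behind an interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain concept; the slide lists Gene among the modeled concepts.
final class Gene {
    private final String symbol;
    private final String description;

    Gene(String symbol, String description) {
        this.symbol = symbol;
        this.description = description;
    }

    String getSymbol() { return symbol; }
    String getDescription() { return description; }
}

// Hypothetical access interface: callers ask for domain objects,
// not for SQL statements, file paths, or URLs.
interface GeneSource {
    Optional<Gene> findBySymbol(String symbol);
}

// One possible backing store (an assumption for this sketch):
// an in-memory map that could later be swapped for a database-backed source.
final class InMemoryGeneSource implements GeneSource {
    private final Map<String, Gene> genes = new LinkedHashMap<>();

    void add(Gene gene) {
        genes.put(gene.getSymbol(), gene);
    }

    @Override
    public Optional<Gene> findBySymbol(String symbol) {
        return Optional.ofNullable(genes.get(symbol));
    }
}
```

Code written against `GeneSource` would not need to change if the in-memory map were replaced by a relational or XML-backed implementation, which is the task specialization the slide describes.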
Leveraging Existing Standards
Currently No Standard Encompassing All Systeomics Domains
Transcriptomics
• Minimum Information About a Microarray Experiment (MIAME)
• Microarray Gene Expression Object Model (MAGE-OM)
• Microarray Gene Expression Markup Language (MAGEML)
Proteomics
• Proteomics Experiment Data Repository (PEDRo)
• Proteomics Experiment Markup Language (PEML)
Metabonomics
• No Known Standards
Extend One or More Existing Standards for SysBio-OM?
MAGE-OM: MIAME Software Implementation
• Very Expansive: Captures Full Microarray Experiment Context
• Abstracts Experimental Details via Hierarchical Object Model
• Models Several Technologies: Microarray, SAGE, and CGH
• Incorporates Standard Ontologies via MGED Ontology Effort
PEDRo: Proteomics Object Model
Highly Accurate and Precise Modeling of Proteomics Workflow
Workflow steps shown:
• 2D Gel Analysis or 1D Gel Analysis
• Image Scanning (Molecular Dynamics, BioRad)
• Image Analysis (PDQuest, BioRad)
• Spot Picking or Cut Bands (Manual, ProPic)
• Digestion of Gel Piece
• MS or MS-MS Analysis
• Protein Identification
PEDRo: Proteomics Object Model
PEDRo Focused Solely on Specifics of Proteomics
Diagram labels, grouped by section:
• Sample Generation: Sample Origin, Organism, Sample
• Sample Processing: Gel, Gel1D, Gel2D, Column (LC)
• MS: MSExperiment, MALDI, Electrospray, ToF
• MS Results Analysis: Peak List, DBSearch, Peptide Hit, Protein Hit, Protein
Reality Check on Existing Standards
MAGE-OM
• Expansive: Models Full Transcriptomics Experiment Context
• Extensible: Captures Microarray, SAGE, and CGH Data
• XML Implementation (MAGEML): Prospect for Facile Data Exchange
• Industry Standard for Capturing and Exchanging Microarray Data
• MGED Ontology: Controlled Vocabulary Greatly Enhances Queries
• Not Applicable to Non-Array Expression Data
PEDRo
• Accurately and Precisely Models Proteomics Workflow
• Models Non-Array Protein Expression Experiments
• XML Implementation (PEML): Potential Data Exchange Mechanism
• Not Expansive: Does Not Model Full Proteomics Experiment Context
Overall Strategy: Extend MAGE-OM by Integrating PEDRo
Implementation of SysBio Design Strategy
Integrated PEDRo Via Additional Classes and Packages
Diagram callouts: Additional Package, Additional Classes
Biomaterial: PEDRo Extension
• Capture PEDRo Elements in Additional Biomaterial Classes
• Output of One Analysis Step Is Input to Next (e.g., 2D Gel to MS)
• Data Must Be Linked to Each Resulting Biomaterial (Sample)
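The chaining described on this slide can be sketched in Java as follows. This is an illustration only: `BioMaterialSketch` and its methods are hypothetical names, not the MAGE-OM or SysBio-OM classes, and the string data references stand in for whatever data objects the real model links to each sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a biomaterial in an analysis chain.
final class BioMaterialSketch {
    private final String name;
    private final BioMaterialSketch source;            // null for the original sample
    private final List<String> dataRefs = new ArrayList<>();

    BioMaterialSketch(String name, BioMaterialSketch source) {
        this.name = name;
        this.source = source;
    }

    // The output of one analysis step becomes the input to the next:
    // the derived material remembers the material it came from.
    BioMaterialSketch derive(String derivedName) {
        return new BioMaterialSketch(derivedName, this);
    }

    // Data is linked to the specific biomaterial it was measured on.
    void attachData(String dataRef) {
        dataRefs.add(dataRef);
    }

    List<String> getDataRefs() {
        return Collections.unmodifiableList(dataRefs);
    }

    // Names from the original sample down to this material.
    List<String> lineage() {
        List<String> names = new ArrayList<>();
        for (BioMaterialSketch m = this; m != null; m = m.source) {
            names.add(m.name);
        }
        Collections.reverse(names);
        return names;
    }
}
```

For example, deriving a gel spot from a sample and a digest from that spot, then attaching a peak list to the digest, keeps the peak list tied to the digest while `lineage()` still traces back to the original sample.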
BioAssay: PEDRo Extension
• Capture PEDRo Elements in Additional BioAssay Classes
• Account for Non-Array BioAssay Classes
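One way to picture accounting for non-array assays is plain subclassing, sketched below in Java. All class names here (`AssaySketch`, `MicroarrayAssaySketch`, `Gel2DAssaySketch`, `MassSpecAssaySketch`, `AssayCatalogSketch`) are hypothetical and chosen for illustration; the slides do not list the actual SysBio-OM BioAssay classes, so this is a sketch of the general approach rather than the model itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base type shared by array and non-array assays.
abstract class AssaySketch {
    private final String name;

    AssaySketch(String name) { this.name = name; }

    String getName() { return name; }

    abstract boolean isArrayBased();
}

// Array-based assay, the case an array-centric model already covers.
final class MicroarrayAssaySketch extends AssaySketch {
    MicroarrayAssaySketch(String name) { super(name); }
    @Override boolean isArrayBased() { return true; }
}

// Non-array assays added alongside the array case.
final class Gel2DAssaySketch extends AssaySketch {
    Gel2DAssaySketch(String name) { super(name); }
    @Override boolean isArrayBased() { return false; }
}

final class MassSpecAssaySketch extends AssaySketch {
    MassSpecAssaySketch(String name) { super(name); }
    @Override boolean isArrayBased() { return false; }
}

final class AssayCatalogSketch {
    // Code that works on the base type handles every technology uniformly.
    static List<String> nonArrayNames(List<AssaySketch> assays) {
        List<String> names = new ArrayList<>();
        for (AssaySketch assay : assays) {
            if (!assay.isArrayBased()) {
                names.add(assay.getName());
            }
        }
        return names;
    }
}
```

The design choice illustrated is that new technologies arrive as additional subclasses, so existing code written against the shared base type keeps working.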
CommonBioAssayData: Proteomics Package
• Models Non-Array Proteomic and Metabonomic Data
SummaryData: Proteomics Package
• Enables Integrated, Cross-Technology View of Expression Data
Protocol: PEDRo-Related Subclasses
• Required Protocols Per PEDRo Component of SysBio
CEBS Implementation of SysBio-OM
Diagram labels:
• CEBS SysBio-OM Stk Extends MAGEstk
• CEBS Services
• Updates for OJB Persistence
• Common API for Data Access
• CEBS SysBio-OM APIs with OJB Fields
• Compile and Deploy to CEBS Application Server
• CEBS Application Server
• CEBS Relational Database
• OJB Relational Mapping
• MAGE XMI Generated from Rose
CEBS SysBio-OM Implementation Highlights
MAGEstk Code Generator
• Reuse of MGED Open Source MAGEstk API
• MAGEML Generation and Consumption Built into API
• Automated Code Generation to Support Model Updates
OJB (Object Relational Bridge)
• Apache Software Foundation (ASF) Open Source Project
• Abstract Object Model From Persistence Framework
• Persistence Broker: High Performance, Flexible API
• Highly Configurable via XML Files
• Available ODMG and JDO APIs
Reuse and Extend Open Source Software with Strong Communities
SysTox: SysBio for Toxicogenomics
Extension of SysBio to Implement MIAME/Tox
• Toxicological Experiment Design Descriptions
• Full Toxicology-Specific Sample Annotation
• Description of Toxicological Assessments
• Clinical Pathology and Clinical Chemistry Data and Descriptions
• Textual Toxicological Endpoint Data and Descriptions
Analogous to MAGE-OM Implementation of MIAME
Acknowledgements
SAIC: Denny Chan, Scott Gustafson *, Nick Xiao, Sandhya Xirasagar *, John Yost
NCT: Alex Merrick, Stan Stasiewicz, Ken Tomer, Mike Waters
Scripps: John Yates (SAIC Consultant)
Paradigm Genetics: Susan Sumner
CEBS Object Model for Systems Biology Data, CEBS MAGE SysBio-OM. Bioinformatics, Submitted.