Strategies to Enhance the Utility of Data in ImmPort Barry Smith http://ontology.buffalo.edu/smith
Pipeline • perform study & collect data • process & de-identify, analyze data (SAS …) • submit data to ImmPort • data in ImmPort: discover, aggregate, analyze
Pipeline Max & Mindy Northrop Grumman PIs, hospitals, biostatisticians
The problem • too many incompatible standards and terminologies at all stages in the pipeline • results in poorer quality of data available for analysis – requiring considerable manual effort • as more studies come online this will get worse
Training and Strategy Workshop for Rho Federal http://ncorwiki.buffalo.edu/index.php/ImmPort
Rho participants • David Ikle (Chief, Biostatistics, Rho Federal): Database Creation and Data Analysis Processes at Rho Federal • John Lim and Karen Kesler: Views from Rho of ImmPort Submission Process • Jeff Abolafia: CDISC and CDASH standards in Rho • ~20 biostatisticians and data managers at Rho Federal External participants • Ravi Shankar, Barry Smith, Jeff Wiser from BISC • Anna Maria Masci (Duke University): On submission of data to ImmPort for the Multiscale System Immunology project • Lindsay Cowell (UT Southwestern): Immunology Ontology
The solution(s) • Post-coordination • Pre-coordination
Pre- vs. Post-coordination Max & Mindy Northrop Grumman PIs, hospitals, biostatisticians, Rho …
Post-coordination = arm's-length enhancement of data PIs, hospitals, biostatisticians, Rho … Northrop Grumman Max & Mindy uniform standards applied post hoc
Post-coordination = arm's-length enhancement of data PIs, hospitals, biostatisticians, Rho … Northrop Grumman Max & Mindy Lots of free text, local formats, local standards, local terminologies operating here: LEAVE AS IS; uniform standards applied post hoc
Advantages: BISC controls all ImmPort data issues. Disadvantages: BISC bears all costs of data processing; data are divorced from source. PIs, hospitals, biostatisticians, Rho … Northrop Grumman Max & Mindy free text protocols, local formats, local standards, local terminologies Lots of free text, local formats, local standards, local terminologies operating here; uniform standards applied post hoc
Pre-coordination: apply uniform standards already here Northrop Grumman Max & Mindy PIs, hospitals, biostatisticians, Rho …
Advantages: higher quality data for integration and analysis; lower costs to BISC. Disadvantages: increased costs to data providers; which uniform standards will they accept? which ones should they accept? Same uniform standards applied across the whole pipeline Northrop Grumman Max & Mindy PIs, hospitals, biostatisticians, Rho … (= data providers)
Multiple moving parts PIs, hospitals, biostatisticians, Rho … Max & Mindy Northrop Grumman
Multiple time scales PIs, hospitals, biostatisticians, Rho … Max & Mindy Northrop Grumman *CDISC effort initiated 1997 †Medidata Rave only now being adopted by Rho
For Rho Federal, CDISC / FDA are of secondary importance. But they may adopt CDISC standards nonetheless, for the sake of uniformity, and because they may need to use Medidata. Currently, use of standards by Rho Federal: • is uncoordinated across different studies • involves standards of varying quality • is inefficient (costs money) • involves considerable post-coordination (e.g. of the sort used to package data for sending to ImmPort)
Goal of the Rho meeting • Devise strategy to optimize Rho-BISC collaboration • Rho has to pre-coordinate for ImmPort • If Rho can use ImmPort templates already in its day-to-day operations, this will make submission to ImmPort more effective and potentially improve quality of data along the whole pipeline --> Need for collaborative development of some standards, libraries and ontologies
Standards Example: Visit days
ITN Data Flow Cytometry data (yellow) Study Protocol, Operational data, Clinical data (blue) HLA data (purple) PCR data (green) Specimen Management Data (green)
What is in a visit name? (ITN) Day 0 0 Visit 0 v 0 Transplant v0
What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant [diagram: CRF Assays Kit Report Specimen Table Database Schedule of Events ImmunoTrak Tube Table CRO Day 0, Transplant Core Labs v0 Data Center Assay Group 0 Protocol Group Tube Manufacturer v 0 0 Cimarron v0, Visit 0 Operations Group v 0]
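The visit-name variants above can be collapsed to one canonical label with a small normalizer. This is a minimal sketch, not ITN or ImmPort code: the canonical form "Visit N", the function name `normalize_visit`, and the synonym table (which assumes that "Transplant" and "Day 0" both denote visit 0 in this study) are illustrative assumptions. In general a study day is not a visit number, so a real mapping would come from the schedule of events.

```python
import re

# Hypothetical canonical label; the slides do not prescribe one.
CANONICAL_PREFIX = "Visit"

# Assumption: in this study "Transplant" and "Day 0" name the same event as visit 0.
# Other study days are deliberately NOT mapped, since day N need not be visit N.
SYNONYMS = {"transplant": 0, "day 0": 0}

# Accepts "Visit 0", "v0", "v 0" and a bare "0" (case-insensitive).
_PATTERN = re.compile(r"^(?:visit|v)?\s*(\d+)$", re.IGNORECASE)


def normalize_visit(raw: str) -> str:
    """Map a locally formatted visit name to the canonical 'Visit N' form."""
    text = raw.strip()
    key = text.lower()
    if key in SYNONYMS:
        return f"{CANONICAL_PREFIX} {SYNONYMS[key]}"
    match = _PATTERN.match(text)
    if match is None:
        # Fail loudly rather than guess at an unknown local convention.
        raise ValueError(f"unrecognized visit name: {raw!r}")
    return f"{CANONICAL_PREFIX} {int(match.group(1))}"


variants = ["Day 0", "0", "Visit 0", "v 0", "Transplant", "v0"]
canonical = {normalize_visit(name) for name in variants}
```

Applied to the six variants on the slide, `canonical` contains the single value "Visit 0". Raising on unknown input is a design choice: silently passing through an unrecognized name would reintroduce the inconsistency the normalizer is meant to remove.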
Mappings between protocol, lab tests and mechanistic assays were missing • Lab Tests (Study Time collected) • Allergy Score (Study Collection Day) • Microarray Data (only Visit) • Flow (Collection_Study_day and Visit)
ImmPort Templates: How to specify “Subject Phenotype”?
ImmPort Adverse Event template Problems Runs together terms with what they describe ‘severity reported’ vs ‘severity preferred’ ‘outcome reported’ vs ‘outcome preferred’ Are there definitions?
ImmPort Adverse Event template Ontology: Ontology term proposals contributed by Yongqun He
Which standards do we need for mechanistic assays? Anna Maria Masci Department of Immunology Duke University
Standards needed for bench work • Purpose of the experiment • Model (in vivo: animal or in vitro: cell, protein etc.) • Method type (DNA sequencing, ELISA, in vivo microscopy) • Method specification (treatment, incubation time, instrument used) • Data format (Excel file, image) • Output (list of entities, OD value, fluorescence value)
Standards needed for statistical analysis • Data type: qualitative or quantitative • Normalization: Removal during data analysis of non-biological variations such as instrument variability, experimental protocol changes, and reagent changes • Population • Variable • Outcome • Statistical test
Experimental methodology ontology ASSAY • INPUT: ORGANISM, CELL, DNA, DRUG, REAGENT • TRANSFORMATION: TARGET, TREATMENT, INSTRUMENT • OUTPUT: DATA FORMAT, PROCESSED DATA
CFSE staining: • Input: • Organism: mouse • Cell: naïve B cell • Reagent: carboxyfluorescein succinimidyl ester • Transformation: • Target assay: cell cytosol • Reagent: carboxyfluorescein diacetate, succinimidyl ester • Cell treatment: none • Instrument: FACS • Output: • Data type: FACS histograms • Processed data: number of cell divisions
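The input / transformation / output breakdown of the staining example can be written down as a typed record, which is one way a template could enforce that every assay description fills the same slots. The class and field names below are illustrative assumptions, not ImmPort template fields; the values are the ones given on the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AssayInput:
    organism: str
    cell: str
    reagent: str


@dataclass(frozen=True)
class Transformation:
    target: str
    reagent: str
    cell_treatment: str
    instrument: str


@dataclass(frozen=True)
class AssayOutput:
    data_type: str
    processed_data: str


@dataclass(frozen=True)
class AssayRecord:
    """One assay described by the three slots of the methodology ontology."""
    name: str
    inputs: AssayInput
    transformation: Transformation
    outputs: AssayOutput


cfse = AssayRecord(
    name="CFSE staining",
    inputs=AssayInput(
        organism="mouse",
        cell="naive B cell",
        reagent="carboxyfluorescein succinimidyl ester",
    ),
    transformation=Transformation(
        target="cell cytosol",
        reagent="carboxyfluorescein diacetate, succinimidyl ester",
        cell_treatment="none",
        instrument="FACS",
    ),
    outputs=AssayOutput(
        data_type="FACS histograms",
        processed_data="number of cell divisions",
    ),
)

# Nested plain dict, e.g. for serializing the record into a submission file.
cfse_as_dict = asdict(cfse)
```

Free-text strings are used for the values here only to keep the sketch short; in a pre-coordinated workflow each value would more plausibly be an ontology term identifier.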
Need for supplementary ontology content to support design of ImmPort templates that can be useful already to Rho workflow • allow high quality interoperable standards which can • keep pace with current research • advance discoverability of ImmPort data by third parties • allow high-powered analysis by Max and Mindy • Examples: • planned Antibody Ontology to support automatic analysis of CyTOF results • Ontology for Biomedical Investigations (OBI)
ImmPort Antibody Registry (Diehl, et al) from BD Lyoplate Screening Panels Human Surface Markers
Ontology of Biomedical Investigations 3rd Workshop ImmPort Richard H. Scheuermann 29 JAN 2007
Semantic Query • Find all experiments in which IL2 mRNA levels were quantified • Infers that IL2 mRNA is the analyte and that SAGE, QPCR and microarrays are appropriate measurement techniques • Find all experiments that include samples from subjects with diseases like Type 1 diabetes • Infers that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves' disease or other autoimmune diseases of endocrine glands
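The second query ("diseases like Type 1 diabetes") amounts to subsumption reasoning over an is_a hierarchy: find the parent class of the named disease, then return every sample whose disease falls under that parent. The sketch below assumes a tiny hand-written hierarchy with a single parent per term and made-up sample identifiers; a real system would load the hierarchy from a disease ontology, where terms can have several parents.

```python
# Hypothetical is_a edges (child -> parent), single inheritance for brevity.
IS_A = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
}

# Hypothetical sample annotations (sample id -> disease of the source subject).
SAMPLES = {
    "S1": "type 1 diabetes mellitus",
    "S2": "Graves' disease",
    "S3": "rheumatoid arthritis",
    "S4": "influenza",
}


def ancestors(term):
    """Yield the term itself and then each is_a ancestor, nearest first."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def is_subsumed_by(term, parent):
    """True if `term` equals `parent` or lies below it in the hierarchy."""
    return parent in ancestors(term)


def samples_like(disease):
    """Samples whose disease shares the direct parent class of `disease`."""
    parent = IS_A.get(disease)
    if parent is None:
        return []
    return sorted(sid for sid, d in SAMPLES.items() if is_subsumed_by(d, parent))


matches = samples_like("type 1 diabetes mellitus")
```

Here `matches` is `["S1", "S2"]`: the rheumatoid arthritis sample is autoimmune but not endocrine, so it is excluded, and the influenza sample is not in the hierarchy at all. Reading "like" as "shares the direct parent" is an assumption; a query could equally climb further up the hierarchy.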
Applications of OBI to Functional Genomics Data Annotation and Integrative Tools for Protozoan Parasite Research Jie Zheng & Chris Stoeckert Center for Bioinformatics University of Pennsylvania School of Medicine 2011 San Diego OBI workshop
EuPathDB is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H. EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res. 2010
Ontology-based Representation of Isolate Data. Data collected in the submission form are shown in bold font. Fields that require ontology terms are shown in thick-bordered boxes.
Isolate Submission Form: supports submission of multiple sequences
Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data. Use OPL for annotation. Data collected in the submission form are shown in bold font. Fields that require ontology terms are shown in thick-bordered boxes. The Ontology for Parasite Lifecycle (OPL) will be used to annotate life cycle stage.
Phenotype Section • Cellular location • Biological process • Question: What relation should be used to link a quality (PATO: organismal quality), such as PATO: lethal, to a biological process, such as GO: growth?
Original strategy • Rho is introducing Medidata Rave as its Clinical Trial Management Platform • BISC will convince Rho and Medidata to adopt high-quality, computable ontologies of the sort which will enable automatic export of source data into ImmPort • This strategy will not work because Medidata is tied to CDISC (CDASH, ODM, ADaM ...), geared to FDA statistical analysis pipelines • Result: much of CDISC content is packaged in ways not conducive to secondary analysis
CDASH – Clinical Data Acquisition Standards Harmonization. Uses the Operational Data Model (ODM) [XML dialect], designed to facilitate the archive and interchange of the metadata and data for clinical research, its power being fully unleashed when data are collected from multiple sources. http://www.cdisc.org/stuff/contentmgr/files/0/f968ea2a3bdad76eb3e23e3c4978fff4/misc/odm1_3_1_final.htm Medidata Rave uses ODM
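To make the "XML dialect" point concrete, the sketch below flattens the clinical-data part of an ODM-style document into rows using only the Python standard library. The XML is a hand-written, heavily simplified fragment modelled on ODM 1.3 element names (`ClinicalData`, `SubjectData`, `StudyEventData`, `FormData`, `ItemGroupData`, `ItemData`); it has not been validated against the ODM schema, all OIDs and values are made up, and it is not an export from Medidata Rave.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the ODM 1.3 specification.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Simplified, hypothetical fragment; real ODM files carry many more attributes.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.VISIT0">
        <FormData FormOID="F.AE">
          <ItemGroupData ItemGroupOID="IG.AE">
            <ItemData ItemOID="IT.AESEV" Value="MILD"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def flatten(xml_text):
    """Return (subject, study event, item, value) tuples from ODM-style XML."""
    root = ET.fromstring(xml_text)
    rows = []
    for subject in root.iterfind(".//odm:SubjectData", NS):
        for event in subject.iterfind("odm:StudyEventData", NS):
            # Items sit below FormData/ItemGroupData, so search all descendants.
            for item in event.iterfind(".//odm:ItemData", NS):
                rows.append((
                    subject.get("SubjectKey"),
                    event.get("StudyEventOID"),
                    item.get("ItemOID"),
                    item.get("Value"),
                ))
    return rows


rows = flatten(SAMPLE)
```

The rows carry only opaque OIDs such as `IT.AESEV`; their meaning lives in the study metadata, which is part of why this packaging is awkward for secondary analysis without an additional mapping step.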
ODM links http://www.cdisc.org/stuff/contentmgr/files/0/fa3021351c086aeaaef00cd17feaef58/misc/cdash_std_1_1_2011_01_18.pdf http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash_ug_1_1_1_2012_04_12_final.pdf http://www.cdisc.org/stuff/contentmgr/files/0/919cb4ef843829170d470b37eb662aeb/misc/odm1_3_0_final.htm http://www.cdisc.org/stuff/contentmgr/files/0/464923b10ea16b477151fcaa9f465166/misc/define_xml_2_0_releasepackage20140424.zip http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash.odm_updated.xml
CDASH appears to be accessible to members only, but this is available from NCI: http://evs.nci.nih.gov/ftp1/CDISC/SDTM/CDASH%20Terminology.pdf