
Strategies to Enhance the Utility of Data in ImmPort



Presentation Transcript


  1. Strategies to Enhance the Utility of Data in ImmPort Barry Smith http://ontology.buffalo.edu/smith

  2. Pipeline: perform study & collect data → process & de-identify → analyze data (SAS …) → submit data to ImmPort → data in ImmPort → discover, aggregate, analyze data in ImmPort

  3. Pipeline: PIs, hospitals, biostatisticians → Northrop Grumman → Max & Mindy

  4. The problem • too many incompatible standards and terminologies at all stages in the pipeline • this results in poorer-quality data available for analysis, requiring considerable manual effort • as more studies come online, this will get worse

  5. Training and Strategy Workshop for Rho Federal http://ncorwiki.buffalo.edu/index.php/ImmPort

  6. Rho participants • David Ikle (Chief, Biostatistics, Rho Federal): Database Creation and Data Analysis Processes at Rho Federal • John Lim and Karen Kesler: Views from Rho of the ImmPort Submission Process • Jeff Abolafia: CDISC and CDASH standards in Rho • ~20 biostatisticians and data managers at Rho Federal. External participants • Ravi Shankar, Barry Smith, Jeff Wiser from BISC • Anna Maria Masci (Duke University): On submission of data to ImmPort for the Multiscale System Immunology project • Lindsay Cowell (UT Southwestern): Immunology Ontology

  7. The solution(s) • Post-coordination • Pre-coordination

  8. Pre- vs. Post-coordination Max & Mindy Northrop Grumman PIs, hospitals, biostatisticians, Rho …

  9. Post-coordination = arm's-length enhancement of data: PIs, hospitals, biostatisticians, Rho … → Northrop Grumman → Max & Mindy; uniform standards applied post hoc

  10. Post-coordination = arm's-length enhancement of data: PIs, hospitals, biostatisticians, Rho … → Northrop Grumman → Max & Mindy. At the data-provider stage, lots of free text, local formats, local standards and local terminologies are in operation: LEAVE AS IS; uniform standards applied post hoc

  11. Advantages: BISC controls all ImmPort data issues. Disadvantages: BISC bears all costs of data processing; data are divorced from source. Diagram: PIs, hospitals, biostatisticians, Rho … (free-text protocols, local formats, local standards, local terminologies) → Northrop Grumman → Max & Mindy; uniform standards applied post hoc

  12. Pre-coordination: apply uniform standards already at the data-provider stage (PIs, hospitals, biostatisticians, Rho …) → Northrop Grumman → Max & Mindy

  13. Advantages: higher-quality data for integration and analysis; lower costs to BISC. Disadvantages: increased costs to data providers; which uniform standards will they accept? Which ones should they accept? Same uniform standards applied across the whole pipeline: PIs, hospitals, biostatisticians, Rho … (= data providers) → Northrop Grumman → Max & Mindy
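Pre-coordination means the data provider checks values against the shared standard at the point of entry, instead of BISC repairing them afterwards. A minimal Python sketch, assuming a hypothetical controlled list of visit codes agreed across the whole pipeline; `VisitCode` and `validate_visit` are illustrative names, not part of any ImmPort, CDISC or Medidata API.

```python
from enum import Enum


class VisitCode(Enum):
    """Hypothetical shared visit codes agreed by all data providers."""
    V0 = "V0"  # visit 0
    V1 = "V1"
    V2 = "V2"


def validate_visit(raw: str) -> VisitCode:
    """Accept a visit code at data entry only if it is in the shared list."""
    try:
        return VisitCode(raw)
    except ValueError:
        allowed = ", ".join(code.value for code in VisitCode)
        raise ValueError(
            f"non-standard visit code {raw!r}; expected one of: {allowed}"
        ) from None


print(validate_visit("V0"))  # VisitCode.V0
```

The cost shifts to the provider (every entry must use the agreed code), which is exactly the trade-off listed on the slide.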

  14. Multiple moving parts PIs, hospitals, biostatisticians, Rho … Max & Mindy Northrop Grumman

  15. Multiple time scales PIs, hospitals, biostatisticians, Rho … Max & Mindy Northrop Grumman *CDISC effort initiated 1997 †Medidata Rave only now being adopted by Rho

  16. For Rho Federal, CDISC / FDA are of secondary importance. But Rho may adopt CDISC standards nonetheless, for the sake of uniformity, and because it may need to use Medidata. Currently, use of standards by Rho Federal: • is uncoordinated across different studies • involves standards of varying quality • is inefficient (costs money) • involves considerable post-coordination (e.g. of the sort used to package data for sending to ImmPort)

  17. Goal of the Rho meeting • Devise a strategy to optimize Rho-BISC collaboration • Rho has to pre-coordinate for ImmPort • If Rho can already use ImmPort templates in its day-to-day operations, this will make submission to ImmPort more effective and potentially improve data quality along the whole pipeline → Need for collaborative development of some standards, libraries and ontologies

  18. Standards Example: Visit days

  19. ITN Data Flow: Cytometry data (yellow); Study Protocol, Operational data, Clinical data (blue); HLA data (purple); PCR data (green); Specimen Management Data (green)

  20. What is in a visit name? (ITN): “Day 0”, “0”, “Visit 0”, “v 0”, “Transplant”, “v0”

  21. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant. CRF, Assays, Kit Report, Specimen Table, Database, Schedule of Events, ImmunoTrak, Tube Table. Groups and their labels (as extracted from the diagram): CRO Day 0, Transplant; Core Labs v0; Data Center; Assay Group 0; Protocol Group; Tube Manufacturer v 0; 0; Cimarron v0, Visit 0; Operations Group v 0
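Post-coordination of these labels means mapping every local spelling onto one canonical code after the fact. A minimal Python sketch, assuming a hypothetical canonical form `V<n>`; `SYNONYMS` and `normalize_visit` are illustrative names. Day numbers are not visit numbers in general, so only "Day 0" (which this example equates with Visit 0) is mapped explicitly.

```python
import re

# Hypothetical post hoc mapping for labels that are not "visit N" spellings.
# "Day 0" is listed explicitly because day numbers are not visit numbers in general.
SYNONYMS = {"transplant": "V0", "day 0": "V0"}


def normalize_visit(label: str) -> str:
    """Map a local visit label to a canonical 'V<n>' code (post-coordination)."""
    text = label.strip().lower()
    if text in SYNONYMS:
        return SYNONYMS[text]
    # Accept "visit 0", "v0", "v 0" and a bare "0".
    match = re.fullmatch(r"(?:visit|v)?\s*(\d+)", text)
    if match is None:
        raise ValueError(f"unmapped visit label: {label!r}")
    return f"V{int(match.group(1))}"


labels = ["Visit 0", "v0", "v 0", "0", "Day 0", "Transplant"]
print({normalize_visit(label) for label in labels})  # {'V0'}
```

Every new spelling that turns up downstream needs another rule, which is the ongoing cost of post-coordination.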

  22. Mappings between protocol, lab tests and mechanistic assays were missing. Timing fields by data type: Lab Tests (Study Time collected); Allergy Score (Study Collection Day); Microarray Data (only Visit); Flow (Collection_Study_day and Visit)
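The missing piece is a mapping from the protocol's schedule of events to each data type's timing field. A minimal Python sketch, assuming a hypothetical schedule (`SCHEDULE`, with made-up study days) and toy records shaped like the slide: lab tests carry a study day, microarray data carry only a visit. Real collection days usually fall within visit windows rather than on the nominal day, which this sketch ignores.

```python
# Hypothetical schedule of events: protocol visit -> nominal study day.
SCHEDULE = {"V0": 0, "V1": 7, "V2": 28}
DAY_TO_VISIT = {day: visit for visit, day in SCHEDULE.items()}

# Toy records: lab tests have a study day, microarray data only a visit.
lab_tests = [{"subject": "S1", "study_day": 7, "test": "WBC", "value": 5.1}]
microarray = [{"subject": "S1", "visit": "V1", "file": "S1_V1.cel"}]


def align(lab_rows, array_rows):
    """Join lab-test and microarray records on (subject, visit)."""
    arrays = {(row["subject"], row["visit"]): row for row in array_rows}
    joined = []
    for row in lab_rows:
        # None if the study day is not a nominal scheduled day.
        visit = DAY_TO_VISIT.get(row["study_day"])
        match = arrays.get((row["subject"], visit))
        if match is not None:
            joined.append({**row, "visit": visit, "array_file": match["file"]})
    return joined


print(align(lab_tests, microarray))
```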

  23. ImmPort Templates: How to specify “Subject Phenotype”?

  24. ImmPort Adverse Event template. Problems: • runs together terms with what they describe, e.g. ‘severity reported’ vs ‘severity preferred’, ‘outcome reported’ vs ‘outcome preferred’ • are there definitions?

  25. ImmPort Adverse Event template / Ontology: ontology term proposals contributed by Yongqun He

  26. Clinical Activities Library (from ITN, via Ravi)

  27. Which standards do we need for mechanistic assays? Anna Maria Masci Department of Immunology Duke University

  28. Standards needed for bench work • Purpose of the experiment • Model (in vivo: animal or in vitro: cell, protein etc.) • Method type (DNA sequencing, ELISA, in vivo microscopy) • Method specification (treatment, incubation time, instrument used) • Data format (Excel file, image) • Output (List of entities, OD value, fluorescence value)

  29. Standards needed for statistical analysis • Data type: qualitative or quantitative • Normalization: Removal during data analysis of non-biological variations such as instrument variability, experimental protocol changes, and reagent changes • Population • Variable • Outcome • Statistical test
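To make "normalization" concrete: one simple technique for removing non-biological variation such as reagent-lot or instrument-run effects is per-batch median centering. The slide does not name a method; this Python sketch is an illustration under the assumption that each batch contains biologically comparable samples, and `median_center_by_batch` is a hypothetical helper.

```python
from statistics import median


def median_center_by_batch(values, batches):
    """Subtract each batch's median so that all batches share a common center.

    values  -- measurements (e.g. OD or fluorescence values)
    batches -- batch label for each measurement (instrument run, reagent lot, ...)
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    medians = {batch: median(group) for batch, group in groups.items()}
    return [value - medians[batch] for value, batch in zip(values, batches)]


od = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4]
lot = ["A", "A", "A", "B", "B", "B"]
print(median_center_by_batch(od, lot))
```

After centering, the lot B values no longer sit about 1.0 units above lot A, so the offset that came from the reagent lot rather than the biology is removed.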

  30. Experimental methodology ontology: ASSAY • INPUT: ORGANISM, CELL, DNA, DRUG, REAGENT • TRANSFORMATION: TARGET, TREATMENT, INSTRUMENT • OUTPUT: DATA FORMAT, PROCESSED DATA

  31. CFSE staining: • Input: • Organism: mouse • Cell: naïve B cell • Reagent: carboxyfluorescein succinimidyl ester • Transformation: • Target assay: cell cytosol • Reagent: carboxyfluorescein diacetate, succinimidyl ester • Cell treatment: none • Instrument: FACS • Output: • Data type: FACS histograms • Processed data: number of cell divisions
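The input / transformation / output pattern above maps naturally onto a typed record. A minimal Python sketch using dataclasses, populated with the CFSE example; the class and field names are hypothetical, and plain strings stand in for the ontology term identifiers a real template would use.

```python
from dataclasses import dataclass, field


@dataclass
class AssayInput:
    organism: str
    cell: str
    reagents: list[str] = field(default_factory=list)


@dataclass
class Transformation:
    target: str
    reagents: list[str]
    treatment: str
    instrument: str


@dataclass
class AssayOutput:
    data_type: str
    processed_data: str


@dataclass
class Assay:
    name: str
    assay_input: AssayInput
    transformation: Transformation
    assay_output: AssayOutput


# Strings below are placeholders where ontology term IDs would go.
cfse = Assay(
    name="CFSE staining",
    assay_input=AssayInput(
        organism="mouse",
        cell="naive B cell",
        reagents=["carboxyfluorescein succinimidyl ester"],
    ),
    transformation=Transformation(
        target="cell cytosol",
        reagents=["carboxyfluorescein diacetate, succinimidyl ester"],
        treatment="none",
        instrument="FACS",
    ),
    assay_output=AssayOutput(
        data_type="FACS histograms",
        processed_data="number of cell divisions",
    ),
)
print(cfse.assay_output.processed_data)
```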

  32. Need for supplementary ontology content to support the design of ImmPort templates that are already useful in the Rho workflow and that • allow high-quality interoperable standards which can • keep pace with current research • advance discoverability of ImmPort data by third parties • allow high-powered analysis by Max and Mindy • Examples: • planned Antibody Ontology to support automatic analysis of CyTOF results • Ontology for Biomedical Investigations (OBI)

  33. ImmPort Antibody Registry (Diehl, et al) from BD Lyoplate Screening Panels Human Surface Markers

  34. Ontology for Biomedical Investigations: 3rd Workshop, ImmPort. Richard H. Scheuermann, 29 JAN 2007

  35. Semantic Query • Find all experiments in which IL2 mRNA levels were quantified • Infer that IL2 mRNA is the analyte and that SAGE, QPCR and microarrays are appropriate measurement techniques • Find all experiments that include samples from subjects with diseases like Type 1 diabetes • Infer that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves’ disease or another autoimmune disease of the endocrine glands
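The second query works because the ontology places Type 1 diabetes and Graves' disease under a common parent class, so "diseases like Type 1 diabetes" becomes "anything that is_a autoimmune disease of endocrine gland". A minimal Python sketch over a toy single-parent hierarchy; the labels follow the slide (plus rheumatoid arthritis as a contrasting example), while the structure, `IS_A` and helper names are hypothetical. Real ontologies allow multiple parents and use term IDs, not labels.

```python
# Toy is_a hierarchy (child -> single parent), for illustration only.
IS_A = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
}


def ancestors(term):
    """Return all superclasses of term by following is_a links upward."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_subclass(term, parent):
    """True if term is parent or inherits from it."""
    return term == parent or parent in ancestors(term)


samples = [
    {"id": "s1", "subject_disease": "type 1 diabetes mellitus"},
    {"id": "s2", "subject_disease": "Graves' disease"},
    {"id": "s3", "subject_disease": "rheumatoid arthritis"},
]

query = "autoimmune disease of endocrine gland"
print([s["id"] for s in samples if is_subclass(s["subject_disease"], query)])  # ['s1', 's2']
```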

  36. Applications of OBI to Functional Genomics Data Annotation and Integrative Tools for Protozoan Parasite Research. Jie Zheng & Chris Stoeckert, Center for Bioinformatics, University of Pennsylvania School of Medicine. 2011 San Diego OBI workshop

  37. EuPathDB is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites. EuPathDB: a portal to eukaryotic pathogen databases. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H. Nucleic Acids Res. 2010

  38. Ontology-based Representation of Isolate Data. The data collected in the submission form are in bold font. Fields that require ontology terms are in thick-bordered boxes.

  39. Isolate Submission Form: supports submission of multiple sequences

  40. Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data. Use OPL for annotation. The data collected in the submission form are in bold font. Fields that require ontology terms are in thick-bordered boxes. The Ontology for Parasite Lifecycle (OPL) will be used to annotate life cycle stage.

  41. Genetic Manipulation Section

  42. Phenotype Section: Cellular location; Biological process. Question: What relation should be used to link a quality (PATO: organismal quality) such as PATO: lethal to a biological process such as GO: growth?

  43. Original strategy: Rho is introducing Medidata Rave as its Clinical Trial Management Platform. BISC will convince Rho and Medidata to adopt high-quality, computable ontologies of the sort that will enable automatic export of source data into ImmPort. This strategy will not work because Medidata is tied to CDISC (CDASH, ODM, ADaM ...), which is geared to FDA statistical analysis pipelines. Result: much of CDISC content is packaged in ways not conducive to secondary analysis.

  44. CDASH (Clinical Data Acquisition Standards Harmonization) uses the Operational Data Model (ODM) [XML dialect], designed to facilitate the archive and interchange of the metadata and data for clinical research, its power being fully unleashed when data are collected from multiple sources. http://www.cdisc.org/stuff/contentmgr/files/0/f968ea2a3bdad76eb3e23e3c4978fff4/misc/odm1_3_1_final.htm Medidata Rave uses ODM.
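Because Medidata Rave uses ODM, it helps to see what an ODM export gives a downstream consumer. A minimal Python sketch with the standard-library ElementTree, assuming a hand-written, heavily trimmed ODM 1.3-style snippet (the OIDs, item and value are made up, and real exports carry many more required attributes); `item_values` is a hypothetical helper. It shows that each study event has one OID, with its human-readable name declared once in the metadata.

```python
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hand-written, trimmed ODM 1.3-style snippet for illustration only.
SNIPPET = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="ST.DEMO">
    <MetaDataVersion OID="MDV.1" Name="Demo">
      <StudyEventDef OID="SE.V0" Name="Visit 0" Repeating="No" Type="Scheduled"/>
    </MetaDataVersion>
  </Study>
  <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="001">
      <StudyEventData StudyEventOID="SE.V0">
        <FormData FormOID="F.LAB">
          <ItemGroupData ItemGroupOID="IG.LAB">
            <ItemData ItemOID="I.WBC" Value="5.1"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""


def item_values(odm_xml):
    """Yield (subject, event name, item OID, value) tuples from ODM clinical data."""
    root = ET.fromstring(odm_xml.strip())
    # Event OID -> human-readable name, declared once in the study metadata.
    event_names = {
        d.get("OID"): d.get("Name")
        for d in root.iterfind(".//odm:StudyEventDef", ODM_NS)
    }
    for subject in root.iterfind(".//odm:SubjectData", ODM_NS):
        for event in subject.iterfind("odm:StudyEventData", ODM_NS):
            name = event_names.get(event.get("StudyEventOID"))
            for item in event.iterfind(".//odm:ItemData", ODM_NS):
                yield subject.get("SubjectKey"), name, item.get("ItemOID"), item.get("Value")


print(list(item_values(SNIPPET)))  # [('001', 'Visit 0', 'I.WBC', '5.1')]
```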

  45. ODM links: • http://www.cdisc.org/stuff/contentmgr/files/0/fa3021351c086aeaaef00cd17feaef58/misc/cdash_std_1_1_2011_01_18.pdf • http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash_ug_1_1_1_2012_04_12_final.pdf • http://www.cdisc.org/stuff/contentmgr/files/0/919cb4ef843829170d470b37eb662aeb/misc/odm1_3_0_final.htm • http://www.cdisc.org/stuff/contentmgr/files/0/464923b10ea16b477151fcaa9f465166/misc/define_xml_2_0_releasepackage20140424.zip • http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash.odm_updated.xml

  46. CDASH

  47. CDASH seems to be closed to non-members, but there is this from NCI: http://evs.nci.nih.gov/ftp1/CDISC/SDTM/CDASH%20Terminology.pdf
