How to make ImmPort data fit for secondary use Barry Smith http://ontology.buffalo.edu/smith
Goals of ImmPort • Accelerate a more collaborative and coordinated research environment • Create an integrated database that broadens the usefulness of scientific data • Advance the pace and quality of scientific discovery • Integrate relevant data sets from participating laboratories, public and government databases, and private data sources • Promote rapid availability of important findings • Provide analysis tools to advance immunological research
Improve immunology research through enhanced • Collaboration • Coordination • Discoverability • Integration • Analyzability Hypothesis: all of these ends will be promoted by describing ImmPort data using terms from shared high quality ontologies
ImmPort data is already being tagged with ontology terms. For example: • where data is prepared to meet FDA requirements • where data is published to meet NIH mandates for reusability • in the post-submission phase, where data is analyzed by third parties. But this tagging is • partial • uncoordinated • reliant on ontologies and analysis tools of varying quality
SDY 165: Characterization of in vitro Stimulated B Cells from Human Subjects shared to Semi-Public Workspace (SPW) Project
SDY 165: Characterization of in vitro Stimulated B Cells from Human Subjects shared to Semi-Public Workspace (SPW) Project During the human B cell (Bc) recall response, rapid cell division results in multiple Bc subpopulations. RNA microarray and functional analyses showed that proliferating CD27lo cells are a transient pre-plasmablast population, expressing genes associated with Bc receptor editing. Undivided cells had an active transcriptional program of non-ASC B cell functions, including cytokine secretion and costimulation, suggesting a link between innate and adaptive Bc responses. Transcriptome analysis suggested a gene regulatory network for CD27lo and CD27hi Bc differentiation. • In vitro stimulated B cells from human subjects • B cell receptor editing
Discoverability: examples • Find [ImmPort] data pertaining to in vitro stimulated B cells from human subjects • Find studies of genes associated with B cell receptor editing in human subjects • Find all data in public and government databases relating to B cell receptor editing
Discoverability through literature search: two queries, “In vitro stimulated B cells from human subjects” and “B cell receptor editing”, run on • PubMed • MeSH (Medical Subject Headings) • Google
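The PubMed searches can also be run programmatically. A minimal sketch, assuming NCBI's E-utilities `esearch` endpoint and only the Python standard library; it builds the request URLs without fetching them, and hit counts returned today will differ from those reported in this deck. The helper name `pubmed_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint: returns matching PubMed IDs and a hit count.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a PubMed query string, requesting JSON output."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    # urlencode percent-encodes field tags such as [Author] and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"

queries = [
    "In vitro stimulated B cells from human subjects",
    "B cell receptor editing",
    "Zand[Author] AND In vitro stimulated B cells from human subjects",
]

for q in queries:
    print(pubmed_search_url(q))
```

Opening one of these URLs (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult` object carries the `count` of matching records and an `idlist` of PubMed IDs.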
PubMed retrieves 144 results for “In vitro stimulated B cells from human Subjects” – Zand paper not found
PubMed retrieves 0 results for “Zand[Author] AND In vitro stimulated B cells from human subjects”
PubMed retrieves 179 results for “B cell receptor editing” – Zand paper not found
MeSH results for “In vitro stimulated B cells from human subjects”
MeSH results for “in vitro stimulated B cells from human subjects”
Google retrieves 180 results for “In vitro stimulated B cells from human subjects” – Zand paper not found
How to make this [ImmPort data] discoverable? SDY 165: Characterization of in vitro Stimulated B Cells from Human Subjects shared to Semi-Public Workspace (SPW) Project During the human B cell (Bc) recall response, rapid cell division results in multiple Bc subpopulations. RNA microarray and functional analyses showed that proliferating CD27lo cells are a transient pre-plasmablast population, expressing genes associated with Bc receptor editing. Undivided cells had an active transcriptional program of non-ASC B cell functions, including cytokine secretion and costimulation, suggesting a link between innate and adaptive Bc responses. Transcriptome analysis suggested a gene regulatory network for CD27lo and CD27hi Bc differentiation.
B cell receptor editing GO:0002452
GO definition: GO provides a text definition of the term
and a position in the GO hierarchy; the hierarchy allows logical reasoning
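One concrete payoff of the hierarchy: data tagged with a specific term can still be retrieved by a query for a more general term, by following is_a links upward. A minimal sketch over a hypothetical three-term hierarchy; the term names are placeholders and do not reflect the actual GO structure around GO:0002452, and the study tags are invented for illustration.

```python
# Hypothetical is_a links (child -> direct parents); placeholders, not real GO terms.
IS_A = {
    "specific_process": ["intermediate_process"],
    "intermediate_process": ["general_process"],
    "general_process": [],
}

def ancestors(term: str) -> set[str]:
    """Return every term reachable from `term` by following is_a links upward."""
    seen: set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

# Hypothetical studies, each tagged with a single term.
TAGS = {"study_1": "specific_process", "study_2": "general_process"}

def find_studies(query_term: str) -> list[str]:
    """A study matches if its tag is the query term or any descendant of it."""
    return sorted(
        study for study, tag in TAGS.items()
        if tag == query_term or query_term in ancestors(tag)
    )

print(find_studies("general_process"))       # ['study_1', 'study_2']
print(find_studies("intermediate_process"))  # ['study_1']
```

A keyword search for the general term would miss `study_1`; the reasoning over is_a links is what brings it back.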
PubMed query: (B cell receptor editing Zand) AND ("Zand"[au]). Why are zero documents retrieved?
Proposal: 1. Tag ImmPort SDY abstracts with GO URIs 2. Publish the results to the GO Annotation database During the human B cell recall response, rapid cell division results in multiple B cell subpopulations. RNA microarray and functional analyses showed that proliferating CD27lo cells are a transient pre-plasmablast population, expressing genes associated with B cell receptor editing. Undivided cells had an active transcriptional program of non-ASC B cell functions, including cytokine secretion and costimulation, suggesting a link between innate and adaptive Bc responses. Transcriptome analysis suggested a gene regulatory network for CD27lo and CD27hi Bc differentiation.
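Step 1 of the proposal can be sketched as dictionary-based matching of GO term labels against abstract text. A minimal sketch, assuming the OBO Foundry PURL pattern `http://purl.obolibrary.org/obo/GO_NNNNNNN` for GO URIs and a hypothetical one-entry lexicon holding only the term cited above; a real tagger would load the full set of GO labels and synonyms.

```python
import re

# Hypothetical lexicon: lower-cased GO label -> GO CURIE (only the term cited in this deck).
LEXICON = {"b cell receptor editing": "GO:0002452"}

def curie_to_uri(curie: str) -> str:
    """Convert a CURIE such as GO:0002452 to its OBO PURL form."""
    prefix, local_id = curie.split(":", 1)
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"

def tag_abstract(text: str) -> list[tuple[str, str]]:
    """Return (matched text, GO URI) pairs for every lexicon label found in the text."""
    hits = []
    for label, curie in LEXICON.items():
        # Whole-phrase, case-insensitive match that tolerates any whitespace between words.
        pattern = r"\b" + r"\s+".join(map(re.escape, label.split())) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), curie_to_uri(curie)))
    return hits

abstract = (
    "RNA microarray and functional analyses showed that proliferating CD27lo cells "
    "are a transient pre-plasmablast population, expressing genes associated with "
    "B cell receptor editing."
)
print(tag_abstract(abstract))
# [('B cell receptor editing', 'http://purl.obolibrary.org/obo/GO_0002452')]
```

The abbreviated wording of the original abstract ("Bc receptor editing") would not match this lexicon without an added synonym, which is one reason the expanded form is used in the abstract above.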
But GO is not enough. See http://ncorwiki.buffalo.edu/index.php/Immunology_Ontologies for ontologies covering • immune disorders • infectious diseases • allergies • immune epitopes, etc. For the special case of flow cytometry and CyTOF: ImmPort Ontology Meeting, Stanford, September 4-5, 2013: http://x.co/1W1Om
lk_race.txt: • American Indian or Alaska Native • Asian • Black or African American • Native Hawaiian or Other Pacific Islander • Not_Specified • Other • Unknown • White
ImmPort Templates https://immport.niaid.nih.gov/immportWeb/experimental/displaySubmitTemplates.do
ImmPort Templates: Race https://immport.niaid.nih.gov/immportWeb/experimental/displaySubmitTemplates.do
ImmPort Templates: how should Race be specified if Race = ‘Other’?
ImmPort Templates: how should “Subject Phenotype” be specified?
NG / BISC proposal: create controlled vocabularies (ontology drop-down lists) for fields that submitters currently populate with free text
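A drop-down list restricts entry to permitted values; where free text has already been submitted, the same vocabulary can flag non-conforming entries. A minimal sketch using the lk_race.txt values listed earlier; the function name `invalid_values` and the sample submissions are hypothetical, and matching is deliberately exact (only surrounding whitespace is ignored) to mirror what a drop-down would allow.

```python
# Permitted values copied from the lk_race.txt lookup list.
LK_RACE = frozenset({
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "Not_Specified",
    "Other",
    "Unknown",
    "White",
})

def invalid_values(values: list[str], vocabulary: frozenset[str]) -> list[str]:
    """Return submitted values that are not in the controlled vocabulary.

    Leading and trailing whitespace is ignored; case and spelling must match exactly.
    """
    return [v for v in values if v.strip() not in vocabulary]

# Hypothetical free-text submissions.
submitted = ["Asian", "white", "Caucasian", "Unknown "]
print(invalid_values(submitted, LK_RACE))  # ['white', 'Caucasian']
```

Flagged entries such as "Caucasian" are exactly the cases a drop-down prevents at entry time.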
lk_sample_type proposal: where controlled vocabularies exist, provide definitions for all terms
Two kinds of definitions • human-readable definitions support consistency of data entry • logical definitions • allow logical analysis of data • support aggregation of data • allow automatic validation of data entry for consistency Definitions can often be reused from existing public domain ontologies such as GO • using ready-made definitions supports discoverability and creates automatic links to large bodies of public domain data
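The two kinds of definition can live side by side in one term record: a text definition for people, and a logical definition in genus-differentia form (a parent class plus distinguishing property-value pairs) for software. A minimal sketch; the `EX:` identifiers, labels, definitions, and property names are hypothetical and are not taken from lk_sample_type or any published ontology. It checks that every term has a text definition and uses the logical definition to validate a data record automatically.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    """A vocabulary term carrying both kinds of definition."""
    term_id: str
    label: str
    text_definition: Optional[str]                # human-readable definition
    genus: Optional[str] = None                   # logical definition: parent class ...
    differentia: tuple[tuple[str, str], ...] = ()  # ... plus distinguishing (property, value) pairs

def missing_text_definitions(terms: list[Term]) -> list[str]:
    """IDs of terms that lack a human-readable definition."""
    return [t.term_id for t in terms if not t.text_definition]

def conforms(term: Term, record: dict[str, str]) -> bool:
    """Check a data record against a term's logical definition.

    The record's 'type' must equal the genus, and every (property, value)
    pair in the differentia must appear in the record. Terms without a
    logical definition cannot be checked automatically and pass by default.
    """
    if term.genus is None:
        return True
    return record.get("type") == term.genus and all(
        record.get(prop) == value for prop, value in term.differentia
    )

# Hypothetical terms.
SERUM = Term("EX:0001", "serum sample", "A sample that consists of serum.",
             genus="sample", differentia=(("material", "serum"),))
UNDEFINED = Term("EX:0002", "other sample", None)

print(missing_text_definitions([SERUM, UNDEFINED]))               # ['EX:0002']
print(conforms(SERUM, {"type": "sample", "material": "serum"}))   # True
print(conforms(SERUM, {"type": "sample", "material": "plasma"}))  # False
```

Real ontologies express logical definitions in OWL and check them with a reasoner; the dictionary comparison here only illustrates the idea.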
ImmPort Antibody Registry (Diehl et al.), from BD Lyoplate Screening Panels: Human Surface Markers
CDISC • Clinical Data Interchange Standards Consortium • http://www.cdisc.org/
SDTM • Study Data Tabulation Model, developed by CDISC and adopted by FDA as a standard for regulatory submissions • includes Race, Gender, Ethnicity, … • no human-readable definitions • no logical definitions Jan 2013: CDISC2RDF (Kerstin Forsberg of AstraZeneca) released the CDISC SDTM model in RDF
PHUSE (EU, Roche, AstraZeneca, FDA, …) project to incorporate ontology technology into CDISC
BRIDG • http://bridgmodel.nci.nih.gov/files/BRIDG_Model_3.2_html/index.htm • Biomedical Research Integrated Domain Group (BRIDG) Project
Other strategies to simplify the creation of structured data for submission into ImmPort • ELN: Electronic Lab Notebooks • PRIME: “Contur ELN has been automating the process of data deposition into ImmPort, making it much easier for our researchers to submit data to ImmPort” • CTMS: Clinical Trial Management Systems • EHR: Electronic Health Records • experiments to prepopulate CTMS with EHR data, and from there case report forms (and ImmPort?) • Minimal Information Checklists
MIFLOWCYT: Minimal Information for a Flow Cytometry Experiment