Data, Data Everywhere, But Not a Byte to Eat
Michael F. Huerta, Ph.D.
Associate Director, National Library of Medicine
Director, Office of Health Information Programs Development
BRDI/NAS 2/26/13
Biomedical Research Enterprise - Today
• Lots and lots of data - in individual labs
• Few data broadly available to research community
  • Exceptions: genomic, human subject autism, particular research initiatives (e.g., ADNI, Human Connectome Project)
• For much of biomedical research enterprise
  • Major public products: concepts in scientific papers, not data
  • Biomedical research is concept-centric, not data-centric
Biomedical Research Enterprise - Tomorrow
• Liberated data - increase data sharing
• Advances in relevant data science and data tools
• Ways to make data
  • Discoverable
  • Useful to others
  • Citable
  • Linked to scientific literature
• Greater prominence of data in science & scholarship
From Today to Tomorrow: NIH Big Data to Knowledge Initiative for Research Data
NIH Big Data to Knowledge (BD2K)
• Data and Informatics Working Group (DIWG) of the Advisory Committee to the Director of NIH
  • D DeMets & L Tabak
• Recommendations for NIH Research Data:
  • Sharing & Standards
  • Tools
  • Workforce
• Implementation Groups
  • Eric Green (Acting Assoc Dir of NIH for Data Science)
  • Sharing & Standards: M Huerta & J Larkin
  • Tools (Software Development): V Bonazzi & J Couch
  • Tools+ (Centers): L Brooks, M Huerta, P Lyster & B Seto
  • Workforce: M Dunn
Sharing & Standards
• Policies to increase data sharing and change the culture
  • Changes will liberate data
• Frameworks for community-based standards efforts
  • Standards make data useful
  • Community base promotes their use
• Catalog of data set information for the research ecosystem
  • Discoverable, citable, and linked to the literature
Each adds value; together they create synergy
NIH Data Catalog: A Use Case
• Just before submitting a scientific paper to a journal, an NIH-funded investigator uploads minimal info about the data set to the NIH Data Catalog
• Minimal info includes:
  • Authors (for proper credit for data)
  • Data descriptors, controlled (for efficient search)
  • Data locus, availability & way to access (for sharing)
• Upload generates:
  • Data publication citation
  • Data Unique IDentifier (DUID)
• DUID is sent to the investigator
• Investigator submits manuscript to the scientific journal, with DUID in abstract & data publication cited in manuscript
• Journal paper is published & indexed in PubMed
• PubMed pulls DUID from abstract as a separate data element in the PubMed citation
• Data publication is also sent to PubMed for indexing
• PubMed also pulls DUID from data publication as an element of PubMed citation
• DUID is now in the PubMed citations of both the scientific publication & the data publication, forming a 2-way link
• PubMed uses same data descriptors as data publication for indexing data publication
• Use of same controlled terms in catalog and PubMed provides discoverability of info about data sets
• DUIDs, citations of data publications & scientific publications can be used in NIH administrative systems
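The use case above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, DUID format, and citation string are all hypothetical, and `PubMedCitation` is a toy stand-in, not the actual PubMed record format. It shows an upload producing a DUID and a data publication citation, and the shared DUID acting as the two-way link.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class CatalogEntry:
    """Minimal info uploaded to the catalog (field names are hypothetical)."""
    authors: list[str]        # proper credit for data
    descriptors: list[str]    # controlled terms, for efficient search
    locus: str                # where the data live and how to access them
    duid: str = ""            # filled in by the catalog on upload
    citation: str = ""        # data publication citation, also generated


@dataclass
class PubMedCitation:
    """Toy stand-in for a PubMed citation; not the real PubMed schema."""
    title: str
    kind: str                 # "scientific" or "data"
    duid: str                 # DUID held as a separate data element


class DataCatalog:
    def __init__(self):
        self._seq = count(1)
        self.entries = {}

    def upload(self, entry: CatalogEntry, year: int) -> CatalogEntry:
        # Assumed DUID format; the slides do not specify one.
        entry.duid = f"DUID-{next(self._seq):06d}"
        entry.citation = (
            f"{', '.join(entry.authors)} ({year}). "
            f"Data set {entry.duid}. NIH Data Catalog."
        )
        self.entries[entry.duid] = entry
        return entry


def linked_citations(pubmed, duid):
    """Follow the DUID in either direction: paper to data, or data to paper."""
    return [c for c in pubmed if c.duid == duid]


catalog = DataCatalog()
entry = catalog.upload(
    CatalogEntry(["A. Investigator"], ["autism", "MRI"], "https://example.org/data"),
    year=2013,
)
pubmed = [
    PubMedCitation("Scientific paper", "scientific", entry.duid),
    PubMedCitation("Data publication", "data", entry.duid),
]
kinds = sorted(c.kind for c in linked_citations(pubmed, entry.duid))
```

Because both citations carry the same DUID, a lookup starting from either one reaches the other without a separate link table.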
Bringing Data into the Research Ecosystem
• Data more available (policies) & useful (standards)
• Data sets are discoverable:
  • Same descriptors of data sets used in data catalog are used as index and search terms in PubMed
• Data sets are citable:
  • NIH Data Catalog produces citable data publications
  • Citability + proper credit create incentives related to data
• Data sets are linked with the literature
  • Common search & retrieval approach for scientific publications and data publications through PubMed
  • Use of DUID for direct, two-way linkage
• Information in ecosystem - use by NIH and 3rd parties
  • Trend analysis, etc.
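As a rough sketch of the discoverability point, the snippet below indexes scientific publications and data publications under the same controlled descriptors, so one search retrieves both. The records, identifiers, and terms are invented for illustration, and the per-year count is just one assumed form that "trend analysis" might take.

```python
from collections import Counter, defaultdict

# Hypothetical records: (identifier, kind, controlled descriptors, year)
RECORDS = [
    ("paper-1", "scientific", {"autism", "genomics"}, 2012),
    ("data-1", "data", {"autism", "genomics"}, 2012),
    ("paper-2", "scientific", {"neuroimaging"}, 2013),
    ("data-2", "data", {"neuroimaging", "autism"}, 2013),
]


def build_index(records):
    """Map each controlled term to the IDs of all records tagged with it."""
    index = defaultdict(set)
    for rec_id, _kind, terms, _year in records:
        for term in terms:
            index[term].add(rec_id)
    return index


def search(index, *terms):
    """Return IDs of records tagged with every given term."""
    hits = [index.get(t, set()) for t in terms]
    return set.intersection(*hits) if hits else set()


def data_sets_per_year(records):
    """Count data publications per year (a simple trend measure)."""
    return Counter(year for _id, kind, _terms, year in records if kind == "data")


index = build_index(RECORDS)
autism_hits = sorted(search(index, "autism"))
trend = data_sets_per_year(RECORDS)
```

Here a single query on `"autism"` returns papers and data sets together, which is the effect the shared vocabulary is meant to have.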