Harnessing Health.Data.gov Data to Address Diabetes in the US. Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ April 17, 2013
Harnessing Health.Data.gov Data to Address Diabetes in the US Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ April 17, 2013 http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov
Background
• HealthData.gov and Health Datapalooza III Knowledge Base and Data Ecosystem:
  • Two Published Stories, Two Spreadsheets, and Two Spotfire Dashboards. My Note: HealthData.gov 194 Data Sets in 2012 and 399 now in 2013.
• Health Datapalooza IV Technology Development Track:
  • Knowledge Graph, Metadata, RPI Watson, Bootcamp, and Linked Data. See Next Slide
• My Process:
  • Harness Data for Diabetes Knowledge Base
  • Data Ecosystem Spreadsheet
  • Data Ecosystem Spotfire
• My Results:
  • Story
  • Slides
  • Spotfire Dashboard
  • Research Notes
HealthData.gov and Health Datapalooza III Knowledge Base http://semanticommunity.info/HealthData.gov
HealthData.gov and Health Datapalooza III Spotfire Data Ecosystem https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?HealthData.gov-Spotfire
Health Datapalooza IV Technology Development Track
• Open Health Knowledge Graphs:
  • This session will describe healthdata.gov platform components, including new functionality that programmatically exposes tabular and graph-oriented data.
• Lifting Schemes:
  • We will describe the ‘bottom up’ automation tools and techniques employed in the winning submission for the healthdata.gov Metadata Domain Challenge.
• Open Government Data:
  • We will present emerging solution standards and transitioning academic technologies, including innovative work conducted by the ‘Watson’ research group at Rensselaer Polytechnic Institute on using Watson as a ‘data advisor’.
• Health Industry Bootcamp - A Real-World Crash Course:
  • An interactive, games-based bootcamp designed to get participants up and running the same day with their own real-world portfolio covering how to use public data to create market value, how to navigate perverse incentives in the industry, and how to deliver public and social good.
• Cooperation Without Coordination: Managing Distributed Clinical Trial Data:
  • TBA See http://health.data.gov/cqld/ and http://reference.data.gov/cqld/about.html
• Linked Data – Structured Data on the Web:
  • TBA See http://sw.appliedinformaticsinc.com/fct/facet_doc.html
http://healthdatapalooza.org/agenda/tech-development-track/
Vocab.Data.gov: Government Data Vocabulary http://vocab.data.gov/gd
Health Data Platform Metadata Challenge Mirrored http://hub.healthdata.gov to improve the CKAN-metadata and RDF. Created three levels of metadata for http://healthdata.gov datasets. Created a set of ontologies to link several datasets from HealthData.gov. http://www.health2con.com/devchallenge/health-data-platform-metadata-challenge/ http://www.healthdata.gov/blog/domain-challenge-1-metadata
IBM Watson at RPI
My Note: See Our Semantic Medline Work with New Cray Graph Computer.
• What is Watson?:
  • The underlying “DeepQA” architecture is designed to find the meaning behind a question posed in natural language and deliver a single, precise answer.
• IBM’s Watson goes to school: A Q&A with RPI’s Jim Hendler:
  • A version of the system similar to the one used on “Jeopardy!” will be housed at RPI for three years as part of a Shared University Research Award from IBM Research. The system at RPI will have 15 terabytes of hard disk storage and give 20 users access to the system simultaneously, making it, according to a release, "an innovation hub” for the campus.
  • One thing we want to explore is how Watson can interact with social media, especially things such as “tweets” where the language is not as carefully constructed as it is in the documents Watson has used in the Jeopardy game.
  • I run a group that does a lot of work with Open Government Data systems (like the US data.gov) and we’re excited about the possibility of using Watson to help researchers around the world find relevant government data and documents for their work.
  • Our goal for the next few years is to gain an understanding of what having the new ways of bringing unstructured data and documents into our computational lives will be.
http://watson.rpi.edu/
Health.Data.gov My Note: Promotes the Diabetes Challenge, But Does Not Provide Much Data For It! http://www.healthdata.gov/
Health.Data.gov: Search for Diabetes My Note: Found One Data Set and Downloaded Two Excel Files and Added Them to the Diabetes Ecosystem Spreadsheet. See Slide 18. http://www.healthdata.gov/dataset/search/diabetes http://statesnapshots.ahrq.gov/snaps09/allStatesallMeasures.jsp?menuId=63&state=
HealthData.gov Catalog Hub My Note: 402 datasets instead of 399. My Note: Found Same State AHRQ Snapshots and CDC WONDER Births. See Next Slide. http://hub.healthdata.gov/
HealthData.gov Catalog Hub: CDC WONDER Births http://hub.healthdata.gov/dataset/wonder-births
HealthData.tw.rpi.edu Catalog Hub: CDC WONDER Births “We mirrored the http://hub.healthdata.gov CKAN instance using its API to our own instance at http://healthdata.tw.rpi.edu/hub. This allowed us to both improve the CKAN-based metadata, including adding Data Dictionaries and Technical Documentation as Resources, and to improve the RDF generated by CKAN.” Source: Health Data Platform Metadata Challenge Source: See Next Slide http://healthdata.tw.rpi.edu/hub/dataset/wonder-births-1
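The mirroring described in the quote relies on the CKAN API. As a rough sketch only, a keyword search against a CKAN catalog could look like the following. It assumes the catalog exposes the CKAN version 3 action API (`package_search`), which the slides do not confirm for hub.healthdata.gov; the helper names and the sample response are hypothetical, and no network call is made here.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, rows=10):
    """Build a CKAN package_search URL (assumes the v3 action API is enabled)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract dataset names from a package_search JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [pkg["name"] for pkg in payload["result"]["results"]]


# Hypothetical response body, shaped like a CKAN package_search reply.
sample_response = json.dumps(
    {"success": True, "result": {"count": 1, "results": [{"name": "wonder-births"}]}}
)

url = build_search_url("http://hub.healthdata.gov/", "diabetes", rows=5)
names = dataset_names(sample_response)
```

Fetching `url` with any HTTP client and passing the body to `dataset_names` would list the matching catalog entries, if the endpoint behaves as assumed.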
CDC WONDER: Natality Information Live Births My Note: Data Description contains Maternal Risk Factors: Diabetes - Yes, No, Not Stated, Not Reported. My Note: A Data Access Agreement is Required. http://wonder.cdc.gov/natality.html
CDC WONDER: Natality Data Live Births - Diabetes http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C
CDC WONDER: Natality Data Live Births - Diabetes My Note: Export to Text File And Remove Metadata and Import to Spreadsheet. http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C
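The export-and-clean step in the note above can be sketched in a few lines. This is an illustration, not the procedure actually used for the slides: it assumes the exported text file is tab-delimited with quoted fields and that the trailing metadata starts at a line containing only `---`. The function name and the sample values are invented placeholders, not real CDC WONDER figures.

```python
import csv
import io


def strip_wonder_metadata(text):
    """Return the data rows of an exported text file, dropping trailing metadata.

    Assumption: data rows are tab-delimited and the metadata block begins
    at a line that contains only '---' (optionally quoted).
    """
    data_lines = []
    for line in text.splitlines():
        if line.strip().strip('"') == "---":
            break  # everything after this marker is treated as metadata
        if line.strip():
            data_lines.append(line)
    reader = csv.reader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return list(reader)


# Placeholder export; the counts are made up for illustration.
sample_export = (
    '"Notes"\t"Diabetes"\t"Births"\n'
    '\t"Yes"\t100\n'
    '\t"No"\t900\n'
    '"---"\n'
    '"Dataset: Natality"\n'
)

rows = strip_wonder_metadata(sample_export)
```

The resulting rows can be written out with `csv.writer` and opened in a spreadsheet.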
Harness Health.Data.gov Data to Address Diabetes in the US Knowledge Base My Note: Did not find CAHMI! My Note: Only found one! http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov
Diabetes Data Ecosystem Spreadsheet http://semanticommunity.info/@api/deki/files/23811/Diabetes.xlsx
NHQR State Snapshots 2009 https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp
AHRQ State Snapshots Conclusion • Getting started on quality improvement is not an easy task. One strategy a State may find helpful is to identify other States with populations similar to those targeted for a quality improvement effort. For example, a State seeking to improve rates of pneumonia vaccination for people discharged from hospitals may want to model its efforts on those of a State that has previously implemented an improvement program in this area and demonstrated success. • In many cases, the greatest value in comparison may lie in identifying States that have started from relatively low performance and made incremental improvements. The State with the greatest improvements may have the most to contribute in demonstrating to other States how to encourage delivery system change that improves quality of care. http://statesnapshots.ahrq.gov/snaps09/interpretation.jsp?menuId=67&state=AL#conclusion
AHRQ Quality of Care for Diabetes by Region and State for 2005-2006 by Conditions https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp
CDC WONDER Births Natality Diabetes https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp
Diabetes Data Ecosystem Spotfire My Note: Can See All the Data Sets and Their Data Elements To Do Joins, Mappings, and Rule-Driven Visualizations. https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp
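The joins mentioned in the note are done in Spotfire. Outside Spotfire, the same idea can be sketched as a simple inner join on a shared column. The column names (`State`, `DiabetesQualityMeasure`, `BirthsWithMaternalDiabetes`) and all values below are hypothetical placeholders, not taken from the AHRQ or CDC WONDER files.

```python
def inner_join(left, right, key):
    """Join two lists of row dicts on a shared key column (inner join)."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Merge the two rows; columns from `right` win on name clashes.
            joined.append({**row, **match})
    return joined


# Placeholder rows standing in for two data sets keyed by state.
quality_rows = [
    {"State": "AL", "DiabetesQualityMeasure": "placeholder-a"},
    {"State": "AK", "DiabetesQualityMeasure": "placeholder-b"},
]
births_rows = [
    {"State": "AL", "BirthsWithMaternalDiabetes": "placeholder-c"},
]

combined = inner_join(quality_rows, births_rows, "State")
```

Only states present in both inputs survive the join, which makes gaps between data sets easy to spot.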
Conclusions and Recommendations
• A Health.Data.gov search for “diabetes” returns only one data set; a search of the HealthData.gov Catalog Hub returns two.
• This work demonstrates the objectives of the Health Datapalooza IV Technology Development Track.
• I prefer having both human-readable and machine-readable metadata rather than just the latter, which is all I find at the HealthData.gov Catalog Hub.
• Next: First Lady Michelle Obama on exercise and Dr. Amen on natural supplements data in preventing and treating diabetes.