
FDA Data Innovation Lab Visualization Gallery

FDA Data Innovation Lab Visualization Gallery. Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/


Presentation Transcript


  1. FDA Data Innovation Lab Visualization Gallery Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup October 1, 2014

  2. Overview • Data Science White House Big Data Review & Brooke Aker: Big Data Lens on OpenFDA • July 7th Meetup • Meeting with Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer (CHIO) • September 23rd • HHS Idea Lab Demo Day with Bryan Sivak and Dr. Taha Kass-Hout • September 30th • FDA Data Innovation Lab and Predictive Analytics Meetup • October 6th Meetup • Data Science, Data Infrastructure, & Data Publications for the HHS IDEA Lab • December 1st Meetup

  3. HHS Ignite Application • Proposal Questions: • Executive Summary (Your Elevator Pitch) [ 500 characters ] • What’s the problem you’ve identified? [ 2000 characters ] • What’s your proposed solution? What do you want to accomplish within the 3 months? [ 1000 characters ] • Who is your target end-user / customer? [ 75 characters ] • Is there any other information you’d like us to know about? (optional) [ 500 characters ] Source: http://www.hhs.gov/idealab/what-we-do/hhs-ignite/

  4. HHS Ignite Evaluation Process • The Scoring Criteria and Selection Process: • The project’s importance to the Office, Agency and/or Department [20 points] • The potential impact of the proposed solution. [40 points] • The proposal’s understanding and explanation of the problem that needs to be solved. [20 points] • The proposal’s understanding of the customers that the project serves. [20 points] • Teams submitting the top proposals will be asked to present and discuss their project with members of the HHS Innovation Council. • The Council will make recommendations to the Secretary who will make the final selection.

  5. OpenFDA • OpenFDA is a new initiative to provide unprecedented access to FDA data and highlight projects in the public and private sector that use these data to further scientific research, educate the public, and save lives. • OpenFDA is an initiative of FDA’s Office of Informatics and Technology Innovation to provide a new level of access to a number of public high-value FDA datasets via RESTful APIs and structured raw file download. Currently, the project is in an early-development stage, with an alpha release of two datasets planned for spring 2014 and a larger public release later in the year. Additionally, openFDA will provide a platform for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data and creating new partnerships and opportunities between the public and private sector (BOLDING BY ME). • Presidential Innovation Fellow: Sean Herron is a Presidential Innovation Fellow serving at FDA (sean.herron@fda.hhs.gov | @seanherron). http://www.hhs.gov/idealab/innovate/openfda/
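To make the "RESTful APIs" point concrete, here is a minimal sketch of how a query URL for openFDA might be assembled. The host `https://api.fda.gov`, the `drug/event` endpoint, the `search` and `limit` parameters, and the field name in the example are assumptions taken from the public openFDA service as released after this talk; none of them appear in the slides. The helper `build_openfda_url` is hypothetical and only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumption: public openFDA host; not named anywhere in the slides.
OPENFDA_BASE = "https://api.fda.gov"


def build_openfda_url(endpoint, search=None, count=None, limit=None):
    """Build (but do not fetch) an openFDA-style query URL.

    `endpoint` is a path such as "drug/event"; the query parameters
    are only added when a value is supplied.
    """
    params = {}
    if search:
        params["search"] = search
    if count:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    url = f"{OPENFDA_BASE}/{endpoint}.json"
    return f"{url}?{query}" if query else url


# Illustrative query: adverse event reports mentioning a given product name.
url = build_openfda_url(
    "drug/event",
    search='patient.drug.medicinalproduct:"ASPIRIN"',
    limit=5,
)
print(url)
```

The resulting URL could be opened in a browser or passed to any HTTP client; keeping URL construction separate from fetching makes the query easy to inspect before it is run.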

  6. FDA's Path Forward for Open Data and Next Generation Sequencing • Utility NGS (Next Generation Sequencing) in the Internet cloud: FDA is facing growing NGS needs for processing internal genome sequencing data as well as the NGS data from industry submissions. The NGS initiative is planning and developing a cloud-based Big Data platform and analytics for robust, secure and controlled data storage, analysis, and collaboration, and potentially for sharing public-access genome sequencing information. • NGS is a Big Data Initiative. https://open.fda.gov/update/fda-path-forward-for-open-data-and-next-generation-sequencing/

  7. Data Science Data Publications for Big Data Analytics • New Government Data Science Best Practices: • Digital Government Strategy • Open Research Data Policy • Agency: HHS IdeaLab, NIH Data Commons, FDA Innovation Lab • White House NITRD Big Data Initiative and NSF Agency Strategic Plan: Data Science, Data Infrastructure, and Data Publications • New Government Data Science Publication Examples: • Federal Data Center Consolidation 2014 • Performance.gov • FDA Data and FDA Data Innovation Lab • National Science Board Science & Engineering Indicators

  8. Data Science Data Publications for FDA: Data Science Data Mining Process • Recall OpenFDA Knowledge Base for previous visualization and analytics: • Brooke Aker, Biplab Pal, and Brand Niemann. • Mined HealthData.gov for FDA data and built linked data spreadsheets (17) for Spotfire: • See next slides. • Mined FDA Site Map for data: • Found Two: Data Standards and FDA Drug Approvals & Databases. • Downloaded and inventoried files (41) (ZIP, CSV & XLS) for Spotfire. • Used for FDA Data Innovation Lab Visualization Gallery.
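The "downloaded and inventoried files" step can be sketched in a few lines. This is not the procedure actually used for the gallery (the slides describe a spreadsheet inventory); it is a minimal illustration, assuming the downloads sit in one local folder. The function name `inventory_files` and the demo file names are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

# File types named on the slide: ZIP, CSV and XLS.
DATA_EXTENSIONS = {".zip", ".csv", ".xls"}


def inventory_files(folder):
    """Return (sorted data files, count per lowercase extension) under `folder`."""
    root = Path(folder)
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_EXTENSIONS
    )
    counts = Counter(p.suffix.lower() for p in files)
    return files, counts


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.ZIP", "notes.txt"):
        (Path(tmp) / name).write_text("")
    files, counts = inventory_files(tmp)
    demo_names = [p.name for p in files]
    demo_counts = dict(counts)

print(demo_names, demo_counts)
```

Matching on the lowercase suffix means `b.ZIP` and `b.zip` are counted together, while files of other types (here `notes.txt`) are left out of the inventory.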

  9. Data Science for FDA Data: Excel Spreadsheet Data Ecosystem • FDA @ HealthData.gov • Summary FDA • FDA Site Map • FDA-TRACK • FDA Glossary • FDA-TRACK Research Glossary • FDA Drug Approvals & Databases • Summary All • Holdren Memo Agencies • HealthData.gov Subject 09172014 • HealthData.gov Agency 09172014 • HealthData.gov Date 09172014 • HealthData.gov Year 09172014 • HealthData.gov Period 09172014 • HealthData.gov Spatial 09172014 • HealthData.gov Start 09172014 • HealthData.gov Media 09172014 http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

  10. Data Science Data Publication: FDA Data in Spotfire • Cover Page-Performance Analytics: FDA TRACK • Most programs do not have a Strategic Plan! • Content Analytics: Summary Statistics • Of the 5 HHS agencies that come under the Holdren Memo, CDC and FDA have by far the most and almost equal number of data sets! • Content Analytics: HealthData.gov Statistics 09172014 • See how few of these data sets are in readily useable media! • Content Analytics: FDA @ HealthData.gov • A Dashboard to the FDA Dashboards! • Network Analytics: FDA Glossary & Site Map • The FDA Site Map and Glossary as a Linked Data Network! • Data Analytics: FDA Drug Approvals & Databases • The FDA Site Map and Glossary as a Linked Data Network!

  11. Cover Page-Performance Analytics: FDA TRACK My Note: Most programs do not have a Strategic Plan! Web Player

  12. FDA Data Innovation Lab Visualization Gallery: Spreadsheet Inventory My Note: Inventory to prioritize further data science data publication work! This inventory is updated as one drills down into the data sets! http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

  13. FDA Data Innovation Lab Visualization Gallery: File Folder My Note: Some folders contain multiple files!

  14. Suggestions • Help the FDA Data Innovation Lab with data publication gallery and wall posters. • Help the FDA Data Innovation Lab with their Open Data Lab Day. • Organize Joint Meetups and promote use of the FDA Data Innovation Lab. • Help form Data Science Teams to work on FDA big data problems.

  15. SEMOSS FDA Analytics • This video shows how SEMOSS can be used to analyze adverse drug reaction data from the Food and Drug Administration’s Adverse Event Reporting System. This database includes information on demographics, drugs, reactions, roles, outcomes and more. This is very useful data but even the FDA admits that “users of these files need to be familiar with creation of relational databases…A simple search of AERS/FAERS data cannot be performed with these files by persons who are not familiar with creation of relational databases.” In other words, this data is freely available but people can’t use it or analyze it very easily. In this video we show how we can easily ask questions of this data and arrive quickly at insights. These data and visualizations will be useful for patients, doctors, health administrators and other medical professionals.  • http://blog.semoss.org/2013/11/fda-analytics.html • My Note: I would like to try to reproduce these results with Spotfire, but as you will see the FAERS_ASCII_2013Q4 data set is not good.

  16. Edward Tufte: How to Create Trust in the Agency Pitch Process • "The Visual Display of Quantitative Information" is one of the most successful self-published books in history. • Tufte is generally considered to be the biggest thinker in the uber-trendy field of data visualization. • He thinks The Guardian is the best-designed newspaper and that The New York Times does the best visualizations by far. • He says: • Good content reasoners and presenters are rare, designers are not. • PowerPoint should be used solely as a projector operating system to show 100% content, without the "chartjunk" and "chartoons". • At NASA, where PowerPoint trumped rocket science -- and the Columbia Accident Investigation Board agreed with and published my analysis in their final Report. • Presenters need (1) to tell a coherent story and (2) to convince their audience of their credibility. Not a cherry-picker, but a master of detail. • Graphics are at their best for really large data sets, as in sparklines for time series and NASA's photographs of the Earth. • Sensibly-designed tables usually outperform graphics for data sets under 100 numbers. • I think designers and marketers greatly underestimate their audiences. http://adage.com/article/adagestat/edward-tufte-adagestat-q-a/230884/

  17. Cover Page: Spotfire Explanations: I inventoried, downloaded, unzipped, and imported the files in the table below into Spotfire and tried to understand the data and its usefulness. Web Player

  18. bmis: Spotfire No Column Names!

  19. CLIL: Spotfire Some Have No Column Names!

  20. drls_reg: Spotfire Three firms clearly stand out here!

  21. Drugsatfda: Spotfire Lots of data here, but not sure of its information value.

  22. EOBZIP_2013_09_16 This data set is unintelligible to me!

  23. EOBZIP_2013_09_16 (2): Spotfire These data sets have single columns or missing column definitions!

  24. FAERS_ASCII_2013Q4: Spotfire These data sets also have single columns or missing column definitions!
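One possible cause of the single-column result, not confirmed by the slides, is the field delimiter: the FAERS ASCII files separate fields with a dollar sign (`$`) rather than a comma, so an import that assumes commas puts each whole line into one column. The sketch below shows the difference on made-up sample rows; the helper `read_delimited` and the sample values are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter="$"):
    """Split delimited text into (header row, list of data rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return rows[0], rows[1:]


# Made-up sample in the "$"-delimited layout; not real FAERS records.
sample = "primaryid$caseid$drugname\n1001$77$ASPIRIN\n1002$78$IBUPROFEN\n"

# Read with the wrong delimiter: every line collapses into a single column.
header_comma, _ = read_delimited(sample, delimiter=",")

# Read with "$": the columns and their names are recovered.
header_dollar, records = read_delimited(sample)

print(len(header_comma), header_dollar, records)
```

If the delimiter is the issue, setting a custom separator at import time would be worth trying before concluding that the column definitions are missing.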

  25. pmc: Spotfire Most of the pmc commitments are pending (849 of 2,113) and have been received in 2013-2014.

  26. Data Exchange Standards Catalog: Spotfire The standard terminology code sets are listed in a separate tab of this worksheet. Please look at the "Terminology" tab to find standard terminology information. This is what GENIS is about.

  27. FDA Data Spreadsheets • Fraudulent H1N1 Products List 2009 • 185 rows and 6 columns • Hydrolyzed Vegetable Protein Products List 2010 • 177 rows and 9 columns • Infant Formula Recall List 2010 • 2173 rows and 8 columns • Milk Recall Products 2009 • 286 rows and 9 columns • Peanut Butter Products 2009 • 3918 rows and 9 columns • Pistachio 2009 • 662 rows and 9 columns • Shell Eggs Recall List 2010 • 94 rows and 8 columns Comment: These are 2009-2010. More recent?

  28. Fraudulent H1N1 Products List 2009: Spotfire Most of the products were approved, etc. content on the Web and Supplements

  29. FDA ndc: Spotfire The ndc package.txt and product.txt data sets were visualized in a separate Spotfire file, which showed that most rows are HUMAN PRESCRIPTION DRUG (43,458 out of 83,167 rows) and that most of the STARTMARKETINGDATE values were after the year 2000. Web Player
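The two counts on this slide could be reproduced outside Spotfire with a short script. This is a sketch under stated assumptions, not the analysis behind the slide: it assumes product.txt is tab-delimited, that the product type is in a column named `PRODUCTTYPENAME`, and that `STARTMARKETINGDATE` is stored as `YYYYMMDD`. The function `summarize_products` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_products(text):
    """Count rows per product type and rows first marketed after 2000."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    type_counts = Counter()
    total = 0
    after_2000 = 0
    for row in reader:
        total += 1
        type_counts[row["PRODUCTTYPENAME"]] += 1
        year = row["STARTMARKETINGDATE"][:4]
        # Rows with a blank or malformed date are counted in `total` only.
        if year.isdigit() and int(year) > 2000:
            after_2000 += 1
    return total, type_counts, after_2000


# Made-up rows in the assumed layout; not real NDC records.
sample = (
    "PRODUCTTYPENAME\tSTARTMARKETINGDATE\n"
    "HUMAN PRESCRIPTION DRUG\t20050101\n"
    "HUMAN OTC DRUG\t19951231\n"
    "HUMAN PRESCRIPTION DRUG\t20120615\n"
)
total, type_counts, after_2000 = summarize_products(sample)
print(total, type_counts.most_common(1), after_2000)
```

On the figures reported in the slide, 43,458 of 83,167 rows is about 52%, so HUMAN PRESCRIPTION DRUG is a narrow majority of the rows.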

  30. Conclusions • We have participated in Meetups & Demos to understand the OpenFDA Data & the HHS Ignite Application & Evaluation Criteria. • We have created an FDA Data Innovation Lab Visualization Gallery. There are some problems with the FDA Data sets. • We are creating Data Science Data Publications for FDA using the Data Science Data Mining Process. • Semantic Community has a platform for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data and creating new partnerships and opportunities between the public and private sector.
