Data Science for the National Big Data R&D Initiative
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
October 5, 2014
Background
• I received an email that called attention to this important Federal Register Notice, and the Knowledge Base for this story evolved to include:
  • Request for Input (RFI)-National Big Data R&D Initiative (Email and Web)
  • The National Big Data R&D Initiative (PDF)
  • GovLoop (Web and PDF):
    • The Journey to Big Data & Analytics Adoption
    • The Foundation For Data Innovation: The Enterprise Data Hub Report
  • Access and Use of NASA and Other Federal Earth Science Data (Email and Wiki)
  • Big Data in Materials Research and Development (Web and PDF)
Data Science Knowledge Base in MindTouch
Data Science for the National Big Data R&D Initiative
Data Science Data Publication: EPA Open Data Policy Inventory
• Use Project Open Data to find the EPA Public Excel File:
  • Start at: http://project-open-data.github.io/
  • Scroll down to find Example Data Hubs: http://project-open-data.github.io/data-hubs/
  • Scroll down to find Environmental Protection Agency: http://www.epa.gov/data/
  • See EPA's Public Excel File and download it: http://www2.epa.gov/sites/production/files/2013-12/usepa-pdl4odp-nov-2013-final.xlsx
• Answers to the Data Science Publication Questions:
  • How was the data collected? We do not know yet.
  • Where is the data stored? In an Excel spreadsheet.
  • What are the results? Not provided; we will do some analytics in Spotfire.
  • Why should we believe them? We do not know yet.
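The download step on this slide can also be scripted, which makes it easier to see what the spreadsheet actually contains before loading it into Spotfire. Below is a minimal sketch, assuming only the Python standard library: it fetches the workbook from the URL given on the slide (which may no longer resolve) and lists the header row of the first worksheet by reading the .xlsx package directly as a zip of XML parts. The part names `xl/worksheets/sheet1.xml` and `xl/sharedStrings.xml` follow the usual Office Open XML SpreadsheetML layout; the function names are hypothetical and not part of any EPA or Project Open Data tool.

```python
import io
import urllib.request
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by .xlsx worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# URL taken from the slide; it may have moved since 2014.
EPA_XLSX_URL = ("http://www2.epa.gov/sites/production/files/2013-12/"
                "usepa-pdl4odp-nov-2013-final.xlsx")


def fetch_workbook(url: str = EPA_XLSX_URL) -> bytes:
    """Download the workbook and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def shared_strings(zf: zipfile.ZipFile) -> list[str]:
    """Return the workbook's shared-string table, or [] if the part is absent."""
    try:
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    except KeyError:  # ZipFile.read raises KeyError for a missing member
        return []
    # Each <si> holds plain text in <t> or rich-text runs <r><t>; join every <t>.
    return ["".join(t.text or "" for t in si.iter(f"{{{NS['m']}}}t"))
            for si in root.findall("m:si", NS)]


def header_row(xlsx_bytes: bytes,
               sheet_part: str = "xl/worksheets/sheet1.xml") -> list[str]:
    """Return the values of the non-empty cells in the worksheet's first row."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        strings = shared_strings(zf)
        sheet = ET.fromstring(zf.read(sheet_part))
    first_row = sheet.find("m:sheetData/m:row", NS)
    if first_row is None:
        return []
    values = []
    for cell in first_row.findall("m:c", NS):
        kind = cell.get("t")
        if kind == "s":  # value is an index into the shared-string table
            v = cell.find("m:v", NS)
            values.append(strings[int(v.text)] if v is not None else "")
        elif kind == "inlineStr":  # text stored directly inside the cell
            values.append("".join(t.text or "" for t in cell.iter(f"{{{NS['m']}}}t")))
        else:  # numbers, booleans, formulas: keep the raw cached value
            v = cell.find("m:v", NS)
            values.append(v.text if v is not None and v.text else "")
    return values
```

Usage, when the file is reachable: `print(header_row(fetch_workbook()))`. Reading the XML parts directly avoids a third-party Excel library, at the cost of skipping empty cells and ignoring number formats; for real analytics a full spreadsheet reader would be the better choice.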
EPA Public Excel File
My Note: These links are not visible; see the next slide.
http://semanticommunity.info/@api/deki/files/31049/usepa-pdl4odp-nov-2013-final.xlsx
EnviroAtlas: Link to Summary
My Note: The link to the resource is not available.
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7b0068AF7C-DE6C-4510-806A-BC35F333FE24%7d
Data Science Analytics in Spotfire Web Player
Some Results and Next Steps
• Semantic Community has provided a response to the Request for Input (RFI)-National Big Data R&D Initiative, citing the Federal Big Data Working Group Meetup as an example.
• We have built a knowledge base of examples from government (NASA/EPA Workshop), academia (NAS Workshop), and industry (GovLoop: Cloudera and IBM) to highlight a key differentiator, namely data science data publications that answer four key questions.
• We have used the Project Open Data example data hubs to find and use the EPA Public Excel File to illustrate this.
• The result is that the EPA Public Excel File is insufficient to answer the four questions needed to produce a Data Science Data Publication, but the analytics have led us to the EPA EnviroAtlas “big data” sets (209), which can be used once the data have been reformatted for ease of use.
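The evaluation behind this result can be expressed as a simple checklist over the four questions from the Data Science Data Publication slide: a data set qualifies only when all four have answers. Below is a minimal sketch in Python, assuming a hypothetical `PublicationReview` record (not part of any Semantic Community, EPA, or Spotfire tool), filled in with the answers recorded on that slide for the EPA Public Excel File.

```python
from dataclasses import dataclass
from typing import Optional

# The four questions listed on the "Data Science Data Publication" slide.
QUESTIONS = (
    "How was the data collected?",
    "Where is the data stored?",
    "What are the results?",
    "Why should we believe them?",
)


@dataclass
class PublicationReview:
    """Hypothetical record of answers for one data set; None means unanswered."""
    name: str
    answers: dict[str, Optional[str]]

    def unanswered(self) -> list[str]:
        """Return the questions that still lack an answer, in slide order."""
        return [q for q in QUESTIONS if not self.answers.get(q)]

    def is_publication_ready(self) -> bool:
        """A data set qualifies only when all four questions are answered."""
        return not self.unanswered()


# Answers as recorded on the slide for the EPA Public Excel File.
epa_excel = PublicationReview(
    name="EPA Public Excel File (usepa-pdl4odp-nov-2013-final.xlsx)",
    answers={
        "How was the data collected?": None,   # "We do not know yet"
        "Where is the data stored?": "Excel spreadsheet",
        "What are the results?": None,         # "Not provided"
        "Why should we believe them?": None,   # "We do not know yet"
    },
)

for question in epa_excel.unanswered():
    print("Unanswered:", question)
```

Run as written, this reports three unanswered questions, which matches the conclusion that the Excel file alone is insufficient. An EnviroAtlas data set could be reviewed the same way once its answers are known.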