Build Your Own Data.gov and EPA Microsite with Semantics and Statistics in the Cloud Brand Niemann May 15, 2010 http://semanticommunity.net/ Disclaimer: These slides do not reflect the views of the U.S. Environmental Protection Agency and do not constitute endorsement by the EPA of the standards or products mentioned.
Preface • First there was: • Put Your Desktop in the Cloud to Support the Open Government Directive and Data.gov/semantic, April 19, 2010, Semantic Universe. • A Semantic Cloud Computing Desktop with Mobile Apps and Linked Open Data consists of the following: • A database of "things" referenced by URLs (e.g., Twitter); • A free wiki (Deki Express) that began as a "fork" of MediaWiki, evolved into a platform (web services with a wiki interface), and then evolved further into a Cloud Computing Internet Operating System Desktop; and • A semantic publishing environment that supports use on Mobile Apps (e.g., iPhone, iPad) and Linked Open Data through MindTouch Extensions (e.g., App Catalog and Deki Mobile), conversion of the MySQL database to an RDF triple-store (e.g., DBpedia), and use with spreadsheet tools (e.g., Cambridge Semantics, Extentech Sheetster).
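The conversion of a relational (MySQL) table to RDF triples mentioned above can be sketched roughly as follows. This is a minimal illustration, not the conversion tool the slides refer to: the `http://example.org/` namespace, the `pages` table, and its column names are all hypothetical, and the output is plain N-Triples text built with the standard library only.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict) into a list of N-Triples lines.

    The row's key becomes the subject URI, each other column becomes a
    predicate, and each cell value becomes a plain string literal.
    """
    subject = f"<{BASE}{quote(table)}/{quote(str(row[key_column]))}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; skip NULL cells
        predicate = f"<{BASE}{quote(table)}#{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical sample row standing in for a wiki page record.
rows = [{"page_id": 1, "title": "Air", "parent": "Report on the Environment"}]
ntriples = [line for row in rows for line in row_to_triples("pages", "page_id", row)]
```

Each row yields one triple per non-key column, so the "things referenced by URLs" idea carries over directly: the subject URI is the thing, and the columns describe it.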
Preface • First there was: • Put Your Desktop in the Cloud to Support the Open Government Directive and Data.gov/semantic, April 19, 2010, Semantic Universe. • Now that Google and other search engines are reorienting rankings to favor inclusion of semantics and RDFa, this becomes a very strong argument for Linked Open Data for the government. • This paper describes the overall use case submitted to the Federal Cloud Computing Advisory Committee and three progressive use cases for developing applications. This paper recommends continued work on actionable data publishing (e.g., data catalogs using RDF) of EPA and US federal government data with context, provenance, and quality information. This paper is part of the author’s Open Government Directive Plan (see http://semanticommunity.net).
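As a rough illustration of a data catalog entry in RDF that carries context, provenance, and quality information, the sketch below emits a small Turtle record. The `dcat:` and `dcterms:` prefixes are real vocabularies, but their use here is an assumption and not something the slides prescribe; the `ex:peerReviewed` property, the title, and the description text are hypothetical placeholders. The dataset and source URLs are taken from the data catalog slide.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix ex: <http://example.org/quality#> .\n"  # hypothetical vocabulary
)


@dataclass
class CatalogEntry:
    uri: str             # where the dataset can be downloaded
    title: str           # placeholder title
    description: str     # context: what the data is about
    source: str          # provenance: the resource the data was derived from
    peer_reviewed: bool  # quality flag (hypothetical property)

    def to_turtle(self):
        """Render the entry as Turtle; assumes no quotes in the text fields."""
        reviewed = "true" if self.peer_reviewed else "false"
        return (
            f"<{self.uri}> a dcat:Dataset ;\n"
            f'    dcterms:title "{self.title}" ;\n'
            f'    dcterms:description "{self.description}" ;\n'
            f"    dcterms:source <{self.source}> ;\n"
            f"    ex:peerReviewed {reviewed} .\n"
        )


entry = CatalogEntry(
    uri="http://epaontology.wik.is/@api/deki/files/8/=ROE2008Air.xls",
    title="Air indicators (placeholder title)",
    description="Placeholder description of the dataset",
    source="http://epaontology.wik.is/",
    peer_reviewed=True,
)
turtle = PREFIXES + "\n" + entry.to_turtle()
```

Keeping context, provenance, and quality as separate fields makes it harder to publish a catalog entry that silently omits one of them.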
Preface • First there was: • Put Your Desktop in the Cloud to Support the Open Government Directive and Data.gov/semantic, April 19, 2010, Semantic Universe. • Comment submitted May 14, 2010: • The need for statistics/data quality as well as semantics for Linked Open Data: • I am schooled as a scientist and statistician in the basic scientific method and the Data Quality Objectives (DQO) process, which is the formal mechanism for implementing the scientific method and identifying important information that needs to be known in order to make decisions based on the outcome of the data collection itself - e.g., was the data collected and handled in such a way as to produce the information you need to make a decision and, in this case, were multiple data sets, collected by multiple processes, not all of which one controls, collected in such a way as to make linking (mashups) meaningful for decision making - a tall order. My approach - feeling my way forward - has been to start with, say, all the high-quality environmental data that EPA controls and has had peer reviewed and created metadata for (the Report on the Environment), and to use statistical visualization tools (e.g., S-PLUS for Spotfire) to do those controlled mashups for our Statistics Users Group to look at and see how they suggest we proceed. I think this needs the support and input of the statistics community of experts to ultimately succeed with decision makers, or it will be dismissed as just a (really neat) semantic technology thing.
Introduction • Where Data.gov fails the 'mom test', FCW, May 7, 2010: • Speakers at open government conference say much of the data isn't usable to everyday people or developers. • Data.gov provides: • Raw data (with very limited or incomplete metadata) • Tools (with limited instructions and training to use them) • Geospatial data (with limited tools to use them with the raw data) • What is needed is a way to combine all three in a usable way for everyday people and developers: • This tutorial shows how two “cloud tools” have been applied to building both a Data.gov and an EPA Microsite with semantics and statistics. • Microsites provide everything EPA knows about a subject, regardless of which office owns the information. Microsites will incorporate social media and multimedia, and feature content written for target audiences.
The Cloud Tools http://cloud.mindtouch.com/
The Cloud Tools http://epaontology.wik.is/
The Cloud Tools http://spotfire.tibco.com/
The Cloud Tools http://ondemand.spotfire.com/public/Help/index.htm
Data.gov with Semantics and Statistics • U.S. EPA Report on the Environment Indicators: • Wiki: • http://epaontology.wik.is/ • Glossary: • http://epaontology.wik.is/Apendices/Appendix_A_Acronyms_and_Glossary/Glossary • Data Dictionary: • http://epaontology.wik.is/Data_Dictionary (in process) • Metadata Registry: • http://epaontology.wik.is/Metadata_Registry • Data Catalog / Databases: • Air - http://epaontology.wik.is/@api/deki/files/8/=ROE2008Air.xls • Taxonomy: • http://epaontology.wik.is/Special:Sitemap • Ontology: • http://epaontology.wik.is/ • Business Intelligence Data Applications: • Air – Web Player • Tutorial (Earlier): • http://epaontology.wik.is/@api/deki/files/189/=BrandNiemann03052010.ppt
Data.gov with Semantics and Statistics Business Intelligence Data Applications: Spotfire Analytics Save Dialogue Screen • Note: This uses the “linked data”, but not with RDF at present. • This is “open” in the sense that the Spotfire file (.dxp) can be downloaded from the Web and a 30-day free trial client can be used to perform additional analyses including exporting the data back to Excel, etc. for use in your own applications.
U.S. EPA Report on the Environment Indicators Mindtouch on the Desktop in the Cloud http://epaontology.wik.is/2_Air
U.S. EPA Report on the Environment Indicators Spotfire on the Desktop
U.S. EPA Report on the Environment Indicators Spotfire Silver Analytics Beta in the Cloud (Web Player)
U.S. EPA Report on the Environment Indicators Spotfire Silver Analytics Beta in the Cloud (Web Player): Exhibit 2-27 SO2 emissions in the U.S. by source category, 1990 and 1996-2002
EPA Statistics Users Group • Wiki: • http://epadata.wik.is/Statistics_Users_Group • Glossary: • Data Dictionary: • Metadata Registry: • Data Catalog / Databases: • http://epadata.wik.is/Statistics_Users_Group/EnvironmentalStats_for_S-PLUS • Taxonomy: • http://epadata.wik.is/Statistics_Users_Group#Meetings • Ontology: • None yet. • Business Intelligence Data Applications: • Interpretation of Environmental Data – Web Player • Tutorial: • http://epadata.wik.is/@api/deki/files/184/=BrandNiemann04282010.ppt
EPA Statistics Users Group http://epadata.wik.is/Statistics_Users_Group
EPA Statistics Users Group http://epadata.wik.is/Statistics_Users_Group/Environmental_Statistics_Training/Interpretation_of_Environmental_Data
EPA Statistics Users Group Spotfire Silver Analytics Beta in the Cloud Web Player
EPA Statistics Users Group Try Slider to Change Bin Size! Spotfire Silver Analytics Beta in the Cloud Web Player: Scenario 1
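To see why the slider matters, here is a minimal sketch of how the same values look under two bin widths, assuming the slider controls the bin width of a histogram. It is not Spotfire code; the `histogram` helper and the sample values are made up for illustration.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * width, (k + 1) * width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(math.floor(v / bin_width) for v in values)
    # Key each bin by its lower edge, in ascending order.
    return {k * bin_width: counts[k] for k in sorted(counts)}


sample = [1, 2, 2, 3, 5, 8, 8, 9]  # made-up values, not EPA data
narrow = histogram(sample, 2)  # {0: 1, 2: 3, 4: 1, 8: 3}
wide = histogram(sample, 5)    # {0: 4, 5: 4}
```

With narrow bins the gap between 6 and 8 is visible; with wide bins the two halves look identical, which is the kind of effect the slider lets a viewer explore.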
EPA Statistics Users Group http://epadata.wik.is/Statistics_Users_Group/EnvironmentalStats_for_S-PLUS
EPA Statistics Users Group http://epadata.wik.is/Statistics_Users_Group/EnvironmentalStats_for_S-PLUS/Getting_Started
EPA Statistics Users Group Spotfire Silver Analytics Beta in the Cloud Web Player: EnvironmentalStats for S-Plus Databases with Spotfire
EPA Statistics Users Group http://epadata.wik.is/Statistics_Users_Group/Environmental_Statistics_Training/Spotfire/Mapping
EPA Statistics Users Group Use Filters To Get Individual States, etc. Spotfire Silver Analytics Beta in the Cloud Web Player: Appendix II Mapped
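Filtering a mapped data set down to individual states amounts to keeping only the rows whose state value is in a chosen set. The sketch below shows that idea outside Spotfire; the `filter_rows` helper, the column names, and the values are hypothetical and are not taken from Appendix II.

```python
def filter_rows(rows, column, allowed):
    """Keep the rows whose value in `column` is one of the allowed values."""
    allowed = set(allowed)
    return [row for row in rows if row.get(column) in allowed]


# Made-up rows standing in for a state-level table.
rows = [
    {"state": "VA", "value": 1.0},
    {"state": "MD", "value": 2.0},
    {"state": "VA", "value": 3.0},
]
virginia = filter_rows(rows, "state", ["VA"])
both = filter_rows(rows, "state", ["VA", "MD"])
```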
Some Next Steps • Apply this to build additional Data.gov sites and EPA Microsites with semantics and statistics: • Annual Statistical Abstract • Community Health Data • CIA Fact Book • Etc.