
W3C Standards for Data Interoperability @ Recovery and Data

W3C Standards for Data Interoperability @ Recovery.gov and Data.gov. Brand Niemann & Rick Murphy (unable to attend) Data Architecture Subcommittee Meeting June 11, 2009 http://federaldata.wik.is/Federal_Enterprise_Architecture_Reference_Model_Revision_Submission_Form. Overview. Brief History


Presentation Transcript


  1. W3C Standards for Data Interoperability @ Recovery.gov and Data.gov • Brand Niemann & Rick Murphy (unable to attend) • Data Architecture Subcommittee Meeting • June 11, 2009 • http://federaldata.wik.is/Federal_Enterprise_Architecture_Reference_Model_Revision_Submission_Form

  2. Overview • Brief History • Steps in the Semantic Web @ EPA • April 10th Governance Subcommittee Data Reference Model Maintenance Submission • May 20th Open Group Internet Workshop: Enterprise Vocabulary: Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data • Steps in Transition/Target Data Architecture @ EPA • Some Data.Gov Comments • Nextgov.com • Owen Ambur • Recovery.gov & Data.gov Pilots: • Web 2.0 • Web 3.0

  3. Steps in the Semantic Web @ EPA (1) See Semantic Web Project Methodology

  4. Getting to Web Semantics for Spreadsheets in the U.S. Government • Every year, the U.S. Census Bureau publishes the Annual Statistical Abstract, "the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States" as a large set of downloadable Excel spreadsheet files. This government data is not readily accessible to Web search engines and cannot readily be shared, reused, and analyzed in new contexts. • This talk will present joint efforts between Cambridge Semantics, the U.S. EPA, and the Federal Semantic Interoperability Community of Practice (SICoP) to integrate semantic technologies, spreadsheets, and the Web to overcome many of these shortcomings. In particular, by representing information in the Census Bureau's spreadsheets as RDF data backed by definitions in a common semantic repository, shared concepts and relationships between different agencies' data are easily discovered and exploited. And by treating the spreadsheet as a user interface for manipulating semantic data, the data can easily be presented on the Web, where it is automatically updated when the underlying data tables change. This presentation will demonstrate the following in the context of the data that comprises the U.S. Government's Annual Statistical Abstract: • The use of Cambridge Semantics' SHAPE middleware platform to extract semantic information from Microsoft Excel spreadsheets. • A semantic repository containing shared definitions of data table columns that can be created, extended, and reused via a tightly integrated user interface in Excel. • Real-time changes to information that are reflected in other spreadsheets. • Repurposing the spreadsheet-based data tables onto the Web, while maintaining a live connection to the authoritative spreadsheet tables. • Guided search and query across the data from different spreadsheets.
• http://www.semantic-conference.com/2008/session/588/index.html • Lee Feigenbaum, VP Technology and Standards, Cambridge Semantics Inc. • 2008 Semantic Technology Conference, May 18-22, 2008, San Jose, California, Wednesday, May 21, 2008, 08:30 AM - 09:30 AM.
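The idea in slide 4 of representing spreadsheet rows as RDF, with column names resolved against shared definitions, can be sketched roughly as follows. This is not the SHAPE platform or its API; it is a minimal stdlib-only illustration. The two `example.org` namespaces, the function name `rows_to_ntriples`, and the sample values are all assumptions made up for the sketch.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces (not from the talk).
VOCAB = "http://example.org/vocab/"   # shared definitions of table columns
DATA = "http://example.org/table/"    # one URI per spreadsheet row


def rows_to_ntriples(csv_text, key_column):
    """Emit one N-Triples statement per non-empty, non-key cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = DATA + quote(row[key_column].replace(" ", "_"), safe="")
        for column, value in row.items():
            if column == key_column or not value:
                continue
            predicate = VOCAB + quote(column.replace(" ", "_"), safe="")
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'<{subject}> <{predicate}> "{literal}" .')
    return triples


# Made-up sample values, for illustration only.
SAMPLE = "Pollutant,Unit,Monitoring sites\nCarbon monoxide,ppm,100\nOzone,ppm,200\n"

for line in rows_to_ntriples(SAMPLE, "Pollutant"):
    print(line)
```

Because every spreadsheet that uses the column "Unit" maps to the same predicate URI, rows from different tables become comparable without any per-table code, which is the property the abstract attributes to a common semantic repository.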

  5. Real World Semantic Query of Organizational Data • Our experience in enterprise data integration over many years has taught us that for a new technology such as the Semantic Web to succeed, we need a solution offering zero programming to implement; we deem this an essential prerequisite for mainstream adoption. We have built such a solution and show it in action providing a query-able interface to some 300+ Environmental Protection Agency spreadsheets and Oracle RDBMS. We believe this is the first time that the benefit of the Semantic Web in this context - making it completely possible for end users to ask any query across dozens of spreadsheets and databases via an Ontology - has been exposed to a mainstream audience. • http://www.semantic-conference.com/session/1559/ • Brian Donnelly, CEO, Semantic Discovery System. • 2009 Semantic Technology Conference, June 14-18, 2009, San Jose, California, Wednesday, June 17, 2009, 05:00 PM - 06:00 PM.

  6. April 10th Governance Subcommittee Data Reference Model Maintenance Submission • Brief History: • DRM 1.0 – Mid-2005 (not accepted) • DRM 2.0 – December 2006 (widely accepted) • DRM 3.0 – June 2007 and Recently (Best Practices Committee) • Workshops: February 6, 2007, February 5, 2008, and February 17, 2009. • Lucian Russell wrote White Paper: Ontologies in the OWL-DL sense should be created or referenced for each data item as needed, but class names should only be nouns. Non-lexical terms should only be specified as a specialization of a lexical term and specific inclusion/exclusion rules should be provided. • Best Practice: NASA Global Change Master Directory • Professor Selmer Bringsjord: Using Sorted Logic to overcome schema mismatch for semantic interoperability (ontology) across multiple relational databases.

  7. April 10th Governance Subcommittee Data Reference Model Maintenance Submission • Federal Enterprise Architecture Reference Model Revision Submission (April 10th): • Data Description: • Uniform Resource Identifiers (URI) • Data Context: • Taxonomy/Ontology: • Information: Topic and Subtopic • Data: Data Table and Data Elements • Information and Data Modeling: Build on David Hay’s “Data Model Patterns” (2009) • Data Sharing: • Data and Metadata “Travel Together”

  8. May 20th Open Group Internet Workshop: Enterprise Vocabulary: Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data • 1. Some Examples: • Dublin Core, FOAF, and DOAP: Metadata, People, & Projects • SKOS: Semantic Web Topic Hierarchy • Gist: “The Minimalist Upper Ontology” (Organizations) • 2. U.S. Federal Data Reference Model: • SICoP Special Conferences: February 6, 2007, February 5, 2008, and February 17, 2009 • Semantic Technology Conferences 2008 and 2009 • DRM 3.0, Data.Gov, and Data Modeling • 3. Recent Activities: • DAMA Data Management Body of Knowledge Glossary • Interagency Working Group on Digital Data • 2009 Ontology Summit (April 5-6th) Pilot Projects • Vocabulary Camp (May 30th) • Next Workshop: June 3rd.
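As an illustration of how a lightweight vocabulary such as SKOS can express a topic hierarchy, the sketch below emits N-Triples for one topic and one subtopic taken from the pilot later in this deck. The SKOS and RDF namespace URIs are the standard W3C ones; the `example.org` base URI and the helper names are hypothetical, and the code is only one possible encoding, not the one used in the workshop.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/topic/"  # hypothetical base URI for concepts


def concept_uri(label):
    """Build a concept URI from a human-readable topic label."""
    return BASE + quote(label.replace(" ", "_"), safe="")


def describe(label):
    """Declare a skos:Concept and give it an English preferred label."""
    uri = concept_uri(label)
    return [
        f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        f'<{uri}> <{SKOS}prefLabel> "{label}"@en .',
    ]


def skos_triples(hierarchy):
    """hierarchy maps a topic label to a list of narrower subtopic labels."""
    triples = []
    for topic, subtopics in hierarchy.items():
        triples += describe(topic)
        for sub in subtopics:
            triples += describe(sub)
            # skos:broader points from the subtopic up to its parent topic.
            triples.append(
                f"<{concept_uri(sub)}> <{SKOS}broader> <{concept_uri(topic)}> ."
            )
    return triples


HIERARCHY = {
    "Statistical Abstract of the United States: 2009": [
        "Section 6. Geography and Environment",
    ],
}

for line in skos_triples(HIERARCHY):
    print(line)
```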

  9. Steps in Transition/Target Data Architecture @ EPA • Our first transition last year was to put EPA metadata at a well-defined URL (e.g. http://epametadata.wik.is). That was followed by our Data Standards Registry Program recently agreeing to have the OneData software re-programmed so that it provides EPA metadata at well-defined URLs. • Another transition earlier this year was to use the Semantic Web “dynamic ontology methodology” to search and integrate across two or more relational databases (this work is still in progress).

  10. Steps in Transition/Target Data Architecture @ EPA • A more recent transition was to move toward a more Web 2.0/3.0 version of the DRM by supporting the DRM Revision Process and Data.gov. In this transition, the Data Description has a well-defined URL, the Data Context uses an Ontology, and the Data Sharing uses RDF. • The most recent transition was the Datafinder application in support of Data.gov that 'connects people to EPA data sources and whose future versions could link people to the specific environmental datasets'. This application does not yet support the first three transitions.

  11. Some Questions and Answers • What is the significance of the Data.gov announcement for enterprise data management? • Kundra, former CIO of the District of Columbia, told the MAVA audience that the aim of Data.gov will be to improve government transparency by releasing these data sets so that citizens are able to analyze them and build mash-up applications - see http://www.techjournalsouth.com/news/article.html?item_id=7501 • Why should enterprises care? • Kundra also said "By democratizing data and making it available to the public and private sector, we can tap into that ingenuity." • Does government data transparency have legs beyond government? • Non-government groups like the Sunlight Foundation are using government data to promote transparency - see http://www.sunlightfoundation.com/ • What are the enterprise governance implications of more widespread data publication? • More widespread data publication using RDF promotes "connected" governance with linked data across the Web.

  12. Some Recovery.gov Comments • FCW, May 18, 2009: • Stimulus site pressed for time: • Eric Gillespie, Chief Information Officer at Onvia, who has set up a private Recovery.org Web site to report on the money, said establishing a federal system to record and track all contracts and subcontracts by October is an extremely difficult task. • Onvia Web Site: http://www.onvia.com/governmentstimulus/fp/default.aspx • NPR Interview: http://www.npr.org/templates/player/mediaPlayer.html?action=1&t=1&islist=false&id=104679087&m=104679733

  13. Some Recovery.gov Comments http://www.recovery.org/

  14. Some Data.Gov Comments • Nextgov.com, May 21, 2009: • “great idea, weak execution” • “Mmmm….data?” • “the initial offering is a bit of a let down” • “already a very powerful set of tools” • “providing raw data is inherent to establishing trust”

  15. Some Data.Gov Comments • Owen Ambur, W3C eGov Interest Group, May 23, 2009: • to a large degree, Data.gov duplicates another good site that has been available for a number of years but which also happens to be a data stovepipe: http://www.fedstats.gov/ • I understand the Data.gov folks started with the Dublin Core but implemented Data.gov's metadata in a stovepipe fashion.

  16. Web 2.0 Data.gov Implementation Pilot • 31 Sections: Topic Taxonomy • Self-describing URIs • http://federaldata.wik.is/Statistical_Abstract_of_the_United_States%3a_2009

  17. Web 2.0 Data.gov Implementation Pilot • 206 Sub-Sections: Sub-Topic Taxonomy • Self-describing URIs • http://federaldata.wik.is/Statistical_Abstract_of_the_United_States%3a_2009/Section_6._Geography_and_Environment

  18. Web 2.0 Data.gov Implementation Pilot • About 1500 Data Tables • Self-describing URIs • http://federaldata.wik.is/Statistical_Abstract_of_the_United_States:_2009/Section_6._Geography_and_Environment/Tables
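The "self-describing" quality of the pilot URIs in slides 16 to 18 can be shown by recovering the taxonomy levels directly from the path. The sketch below uses only Python's `urllib.parse`; the level names in `LEVELS` and the function name `describe_uri` are hypothetical labels chosen for the example, not terms defined by the pilot.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical names for the path levels seen in slides 16-18.
LEVELS = ("topic", "subtopic", "collection", "table")


def describe_uri(uri):
    """Split a pilot URI into human-readable taxonomy levels."""
    path = urlsplit(uri).path
    # Decode percent-escapes (e.g. %3a -> ":") and turn underscores into spaces.
    segments = [unquote(s).replace("_", " ") for s in path.split("/") if s]
    return dict(zip(LEVELS, segments))


TABLES_URI = (
    "http://federaldata.wik.is/Statistical_Abstract_of_the_United_States:_2009"
    "/Section_6._Geography_and_Environment/Tables"
)

for level, label in describe_uri(TABLES_URI).items():
    print(f"{level}: {label}")
```

The same function gives an identical topic label for the `%3a` form of the URI used on slides 16 and 17, since percent-encoded and literal colons decode to the same character.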

  19. Web 2.0 Data.gov Implementation Pilot • Six Standard Table Metadata Properties • RDF/XML • Open Linked Web Data • Self-describing URIs • http://federaldata.wik.is/Statistical_Abstract_of_the_United_States:_2009/Section_6._Geography_and_Environment/Tables/Table_357_-_National_Ambient_Air_Pollutant_Concentrations_by_Type_of_Pollutant

  20. Web 2.0 Data.gov Implementation Pilot • Thousands of Data Elements with Metadata • Self-describing URIs • Six Standard Table Metadata Properties • RDF/XML • Open Linked Web Data • http://federaldata.wik.is/Statistical_Abstract_of_the_United_States:_2009/Section_6._Geography_and_Environment/Tables/Table_357_-_National_Ambient_Air_Pollutant_Concentrations_by_Type_of_Pollutant
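A rough idea of what table metadata published as RDF/XML might look like is sketched below with Python's standard `xml.etree.ElementTree`. The transcript does not list the six standard table metadata properties, so the three Dublin Core elements used here (`title`, `source`, `subject`) are assumptions chosen only for illustration, as is the function name. The table URI is the one from slides 19 and 20 with the line break removed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes instead of auto-generated ns0/ns1.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def table_metadata_rdfxml(table_uri, properties):
    """Serialize Dublin Core properties about one table as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": table_uri}
    )
    for name, value in properties.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


TABLE_URI = (
    "http://federaldata.wik.is/Statistical_Abstract_of_the_United_States:_2009"
    "/Section_6._Geography_and_Environment/Tables"
    "/Table_357_-_National_Ambient_Air_Pollutant_Concentrations_by_Type_of_Pollutant"
)

# Assumed property choice; values are taken from the slide titles and URIs.
PROPS = {
    "title": "Table 357 - National Ambient Air Pollutant Concentrations by Type of Pollutant",
    "source": "Statistical Abstract of the United States: 2009",
    "subject": "Section 6. Geography and Environment",
}

print(table_metadata_rdfxml(TABLE_URI, PROPS))
```

Keeping the description keyed on the table's own URI is one way to read the earlier "Data and Metadata Travel Together" point: anything that dereferences the URI can also find the statements made about it.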

  21. Web 3.0 Recovery.gov Pilot http://federaldata.wik.is/May_13%2c_2009_Semantic_Web_Meetup

  22. Web 3.0 Recovery.gov Pilot See http://federaldata.wik.is/@api/deki/files/88/=SemanticTechnologySolutions_for_RecoveryGov_and_DataGov_with_Transparency_Openness_and_Collaboration_Davis2009_web.pdf
