
Brand Niemann Senior Enterprise Architect U.S. EPA, Washington, DC May 20, 2009

Open Group Internet Workshop: Enterprise Vocabulary. Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data. http://semanticommunity.net


Presentation Transcript


  1. Open Group Internet Workshop: Enterprise Vocabulary. Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data. Brand Niemann, Senior Enterprise Architect, U.S. EPA, Washington, DC, May 20, 2009. http://semanticommunity.net

  2. Background • The Open Group: • The Open Group is a vendor-neutral and technology-neutral consortium, whose vision of Boundaryless Information Flow™ will enable access to integrated information, within and among enterprises, based on open standards and global interoperability. • Semantic Interoperability Work Group: • The Internet and the World-Wide Web have solved the basic problems of information transmission; the next major advance will come from resolving the deeper issues of semantic interoperability.

  3. Background • The workshop will include a series of brief contributions on vocabularies and their use, including from: • Dennis Attinger, Philips, on why Philips should use vocabularies; • Ron Schuldt, Lockheed Martin, on controlled vocabularies; • Brand Niemann, EPA, on Lightweight Vocabularies/Ontologies for the Semantic Web / Web of Data.

  4. Background • These contributions will be the basis for discussion of: • The Problem Space: Why enterprises use vocabularies, how enterprises use vocabularies, and what problems enterprises have in using vocabularies; • What should vocabularies contain? • Are there common principles that apply to the seemingly different approaches? • The discussions and conclusions will be summarised in a report which will be distributed to attendees and others that have provided input. • Proceedings (password required)

  5. Overview • 1. Some Examples: • Dublin Core, FOAF, and DOAP: Metadata, People, & Projects • SKOS: Semantic Web Topic Hierarchy • Gist: “The Minimalist Upper Ontology” (Organizations) • 2. U.S. Federal Data Reference Model: • SICoP Special Conferences: February 6, 2007, February 5, 2008, and February 17, 2009 • Semantic Technology Conferences 2008 and 2009 • DRM 3.0, Data.Gov, and Data Modeling • 3. Recent Activities: • DAMA Data Management Body of Knowledge Glossary • Interagency Working Group on Digital Data • 2009 Ontology Summit (April 5-6th) Pilot Projects • Vocabulary Camp (May 30th)

  6. 1. Some Examples • Dublin Core: • The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find. Dublin Core is widely used to describe digital materials such as video, sound, image, text, and composite media like web pages. Implementations of Dublin Core typically use XML and are based on the Resource Description Framework (RDF). Dublin Core is defined in ISO Standard 15836 and NISO Standard Z39.85-2007. • The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights. • See http://en.wikipedia.org/wiki/Dublin_Core
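As a rough illustration of the 15-element set described on this slide, the sketch below builds a small Dublin Core record as XML using only the Python standard library. The `metadata` wrapper element, the `dublin_core_record` helper, and the sample values are assumptions made for illustration; only the element names and the `http://purl.org/dc/elements/1.1/` namespace come from the Dublin Core element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Simple Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(fields):
    """Build a wrapper element holding one dc:* child per supplied field."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values are illustrative only.
record = dublin_core_record({
    "title": "Lightweight Vocabularies / Ontologies for the Semantic Web",
    "creator": "Brand Niemann",
    "date": "2009-05-20",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Rejecting names outside the 15-element list keeps the record within Simple Dublin Core; a fuller implementation might also allow repeated elements, which this sketch does not.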

  7. 1. Some Examples • FOAF: • FOAF (an acronym of Friend of a Friend) is a machine-readable ontology describing persons, their activities, and their relations to other people and objects. Anyone can use FOAF to describe himself or herself. FOAF allows groups of people to describe social networks without the need for a centralised database. • FOAF is a descriptive vocabulary expressed using RDF (Resource Description Framework) and OWL (Web Ontology Language). • See: http://en.wikipedia.org/wiki/FOAF_(software)
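To make the FOAF description more concrete, here is a minimal sketch that writes FOAF statements in Turtle syntax from plain Python. The helper names (`foaf_person`, `foaf_document`) and the `example.org` URIs are hypothetical; the `foaf:Person`, `foaf:name`, and `foaf:knows` terms and the namespace URI are part of the FOAF vocabulary. Literal escaping is deliberately left out, so names containing quotation marks would need extra handling.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_person(uri, name, knows=()):
    """Return Turtle text describing one foaf:Person and the people they know."""
    lines = [f"<{uri}> a foaf:Person ;", f'    foaf:name "{name}"']
    for friend_uri in knows:
        lines[-1] += " ;"  # more statements follow for this subject
        lines.append(f"    foaf:knows <{friend_uri}>")
    lines[-1] += " ."  # close the description of this subject
    return "\n".join(lines)


def foaf_document(people):
    """Join person descriptions under a shared FOAF prefix declaration."""
    header = f"@prefix foaf: <{FOAF_NS}> .\n"
    return header + "\n" + "\n\n".join(people) + "\n"


# Hypothetical people and URIs, for illustration only.
doc = foaf_document([
    foaf_person(
        "http://example.org/people/alice#me",
        "Alice",
        knows=["http://example.org/people/bob#me"],
    ),
    foaf_person("http://example.org/people/bob#me", "Bob"),
])
print(doc)
```

Because each person is identified by a URI, the two descriptions could just as well live in separate files on separate servers, which is the sense in which no centralised database is needed.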

  8. 1. Some Examples • DOAP: • Description of a Project (DOAP) is an RDF schema and XML vocabulary to describe open-source projects. It was created and initially developed by Edd Dumbill to convey semantic information associated with open-source software projects. It is currently used in the Mozilla Foundation's project page and in several other software repositories. • There are currently generators, validators, viewers, and converters that enable more projects to be included in the Semantic Web. • See http://en.wikipedia.org/wiki/DOAP

  9. 1. Some Examples • SKOS: • Simple Knowledge Organization System (SKOS) is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is built upon RDF and RDFS, and its main objective is to enable easy publication of controlled structured vocabularies for the Semantic Web. SKOS is currently developed within the W3C framework. • See http://en.wikipedia.org/wiki/SKOS

  10. 1. Some Examples • SKOS: • Semantic Web Topic Hierarchy (in OWL): • Taxonomy of Semantic Web Topics adopted by the European Projects Knowledge Web: • 1.0 Foundations • 2.0 Semantic Web: Core Topics • 3.0 Semantic Web Special Topics • and REWERSE: • Knowledge Engineering / Ontology Engineering • Knowledge Representation and Reasoning • Basic Web Technologies • Information Access • Ontologies on the Semantic Web • Rules • Security / Trust / Privacy in the Semantic Web • Application Domains • Special Topics
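The slide presents this topic hierarchy in OWL. As a hedged sketch of how a small part of it might instead be written with SKOS terms, the code below emits `skos:Concept` entries in Turtle. The three top-level labels come from the slide; placing "Rules" and "Security / Trust / Privacy in the Semantic Web" under particular top-level topics is an assumption for illustration, as are the `example.org` base URI and the helper names.

```python
from collections import defaultdict

# (concept id, preferred label, broader concept id or None for a top concept)
CONCEPTS = [
    ("foundations", "Foundations", None),
    ("core", "Semantic Web: Core Topics", None),
    ("special", "Semantic Web Special Topics", None),
    # Assumed placement in the hierarchy, for illustration only:
    ("rules", "Rules", "core"),
    ("trust", "Security / Trust / Privacy in the Semantic Web", "special"),
]


def narrower_index(concepts):
    """Invert the broader links into a lookup of narrower concepts."""
    index = defaultdict(list)
    for concept_id, _label, broader in concepts:
        if broader is not None:
            index[broader].append(concept_id)
    return index


def to_turtle(concepts, base="http://example.org/topics/"):
    """Serialise the concepts as SKOS statements in Turtle syntax."""
    out = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for concept_id, label, broader in concepts:
        out.append(f"<{base}{concept_id}> a skos:Concept ;")
        if broader is None:
            out.append(f'    skos:prefLabel "{label}"@en .')
        else:
            out.append(f'    skos:prefLabel "{label}"@en ;')
            out.append(f"    skos:broader <{base}{broader}> .")
    return "\n".join(out)


turtle = to_turtle(CONCEPTS)
narrower = narrower_index(CONCEPTS)
print(turtle)
```

Only `skos:broader` is stored; the narrower direction is derived on demand, which mirrors the fact that SKOS defines `skos:narrower` as the inverse of `skos:broader`.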

  11. 1. Some Examples • Gist: The Minimalist Upper Ontology: • Introduced in 2006 and at the 2007 Ontology Summit and Semantic Technology Conference: • It is different from other upper ontologies in that we have attempted to do two things simultaneously: • cover a very broad range of future applications • cover them with the fewest number of concepts • See http://www.gist-ont.com/

  12. 1. Some Examples • Gist: The Minimalist Upper Ontology: • gist is an OWL 2 upper ontology with fewer than 200 concepts. It is freely available to use or modify. We have used it on several client engagements and have found that it covers most of the concepts needed for a large enterprise. Most of the distinctions we find are specializations of gist concepts rather than new concepts. • In this session we will briefly survey the world of upper ontologies. We will describe the structure and organization of gist. We will then show several design patterns contained in gist, including those that make use of OWL 2 features. • At the conclusion of this talk participants should be able to: • Access gist and use it as a learning aid • Understand how to specialize gist for an enterprise • Appreciate the need for some of the new OWL 2 features • Understand why committing to a minimalist upper ontology will reduce integration effort internally and externally • This talk assumes some previous knowledge of OWL • See http://www.semantic-conference.com/session/2054/ • Dave McComb, President, Semantic Arts • 2009 Semantic Technology Conference, June 14-18, 2009, San Jose, California, June 16, 2009, 2:15-3:15 p.m.

  13. Gist: the minimalist upper ontology Entities (130) http://www.gist-ont.com/data/50643f60-cdd7-4e9d-b080-a543eb7c62a1/files//Gist%20Diagram3.vsd

  14. Gist: the minimalist upper ontology Properties (86) http://www.gist-ont.com/data/50643f60-cdd7-4e9d-b080-a543eb7c62a1/files//Gist%20Diagram3.vsd

  15. 2. U.S. Federal Data Reference Model • Brief History: • DRM 1.0 – Mid-2005 (not accepted) • DRM 2.0 – December 2006 (widely accepted) • DRM 3.0 – June 2007 and Recently (Best Practices Committee) • Workshops: February 6, 2007, February 5, 2008, and February 17, 2009. • Lucian Russell wrote a white paper: Ontologies in the OWL-DL sense should be created or referenced for each data item as needed, but class names should only be nouns. Non-lexical terms should only be specified as a specialization of a lexical term, and specific inclusion/exclusion rules should be provided. • Best Practice: NASA Global Change Master Directory • Professor Selmer Bringsjord: Using Sorted Logic to overcome schema mismatch for semantic interoperability (ontology) across multiple relational databases.

  16. 2. U.S. Federal Data Reference Model • Federal Enterprise Architecture Reference Model Revision Submission (April 10th): • Data Description: • Uniform Resource Identifiers (URI) • Data Context: • Taxonomy/Ontology: • Information: Topic and Subtopic • Data: Data Table and Data Elements • Information and Data Modeling: Build on David Hay’s “Data Model Patterns” (2009) • Data Sharing: • Data and Metadata “Travel Together”

  17. 2. U.S. Federal Data Reference Model • Work on DRM 3.0: • 2008: Getting to Web Semantics for Spreadsheets in the U.S. Government • 2009: Real World Semantic Query of Organizational Data • Recovery.gov and Data.gov Pilots

  18. Steps in the Semantic Web @ EPA (1) See Semantic Web Project Methodology

  19. Getting to Web Semantics for Spreadsheets in the U.S. Government • Every year, the U.S. Census Bureau publishes the Annual Statistical Abstract, "the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States" as a large set of downloadable Excel spreadsheet files. This government data is not readily accessible to Web search engines and cannot readily be shared, reused, and analyzed in new contexts. • This talk will present joint efforts between Cambridge Semantics, the U.S. EPA, and the Federal Semantic Interoperability Community of Practice (SICoP) to integrate semantic technologies, spreadsheets, and the Web to overcome many of these shortcomings. In particular, by representing information in the Census Bureau's spreadsheets as RDF data backed by definitions in a common semantic repository, shared concepts and relationships between different agencies' data are easily discovered and exploited. And by treating the spreadsheet as a user interface for manipulating semantic data, the data can easily be presented on the Web, where it is automatically updated when the underlying data tables change. This presentation will demonstrate the following in the context of the data that comprises the U.S. Government's Annual Statistical Abstract: • The use of Cambridge Semantics' SHAPE middleware platform to extract semantic information from Microsoft Excel spreadsheets. • A semantic repository containing shared definitions of data table columns that can be created, extended, and reused via a tightly integrated user interface in Excel. • Real-time changes to information that are reflected in other spreadsheets. • Repurposing the spreadsheet-based data tables onto the Web, while maintaining a live connection to the authoritative spreadsheet tables. • Guided search and query across the data from different spreadsheets. • http://www.semantic-conference.com/2008/session/588/index.html • Lee Feigenbaum, VP Technology and Standards, Cambridge Semantics Inc. and Brand Niemann, Senior Enterprise Architect, US EPA. • 2008 Semantic Technology Conference, May 18-22, 2008, San Jose, California, Wednesday, May 21, 2008, 08:30 AM - 09:30 AM.
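The general idea on this slide, representing spreadsheet rows as RDF backed by shared column definitions, can be sketched in a few lines. This is not the SHAPE platform or its API; it is a standard-library illustration in which the column-to-property mapping, the `example.org` URIs, the function names, and the sample table values are all made up.

```python
import csv
import io

# Hypothetical shared definitions: column header -> property URI.
COLUMN_DEFINITIONS = {
    "State": "http://example.org/def/stateName",
    "Population": "http://example.org/def/residentPopulation",
}

# Made-up table standing in for one exported spreadsheet sheet.
SAMPLE_TABLE = "State,Population\nExampleState,1000\nOtherState,2500\n"


def table_to_triples(csv_text, table_uri):
    """Turn each cell into a (subject, property, value) triple.

    Each row becomes one subject; each column header is looked up in the
    shared definitions to find the property that gives the cell its meaning.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{table_uri}#row{row_number}"
        for column, value in row.items():
            triples.append((subject, COLUMN_DEFINITIONS[column], value))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples lines with plain string literals."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


triples = table_to_triples(SAMPLE_TABLE, "http://example.org/tables/population")
print(to_ntriples(triples))
```

Two spreadsheets whose columns map to the same property URI become directly comparable, which is the sense in which shared definitions make relationships between different agencies' data discoverable.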

  20. Real World Semantic Query of Organizational Data • Our experience in enterprise data integration over many years has taught us that for a new technology such as the Semantic Web to succeed, we need a solution offering zero programming to implement; we deem this an essential prerequisite for mainstream adoption. We have built such a solution and show it in action providing a query-able interface to some 300+ Environmental Protection Agency spreadsheets and Oracle RDBMS. We believe this is the first time that the benefit of the Semantic Web in this context - making it completely possible for end users to ask any query across dozens of spreadsheets and databases via an Ontology - has been exposed to a mainstream audience. • http://www.semantic-conference.com/session/1559/ • Brian Donnelly, CEO, Semantic Discovery System, and Brand Niemann, Senior Enterprise Architect, US EPA. • 2009 Semantic Technology Conference, June 14-18, 2009, San Jose, California, Wednesday, June 17, 2009, 05:00 PM - 06:00 PM.

  21. Recovery.gov and Data.gov Pilots http://federaldata.wik.is/May_13%2c_2009_Semantic_Web_Meetup

  22. 3. Recent Activities • The DAMA Dictionary of Data Management (Version 1.0) was announced at the DAMA International Symposium & Wilshire Meta Data Conference in San Diego March 16-20, 2008: • Over 800 terms defining a common data management vocabulary for IT professionals, data stewards and business leaders. • Over 40 topics including finance and accounting, knowledge management, architecture, data modeling, XML, and analytics. • DAMA Dictionary of Data Management was developed as the glossary for the DAMA-DMBOK Guide. Version 1.1 of the Dictionary will be published in conjunction with The DAMA-DMBOK Guide in 2009.

  23. 3. Recent Activities • Interagency Working Group on Digital Data (IWGDD): • Formed under the auspices of the NSTC Committee for Science, the group is to develop and promote the implementation of a strategic plan for the Federal government to cultivate an open interoperable framework that will ensure reliable preservation and effective access to digital data for research, development, and education in science, technology, and engineering. • See Harnessing the Power of Digital Data for Science and Society • (1) http://federaldata.wik.is/The_Recovery_Dialogue_on_IT_Solutions

  24. 3. Recent Activities • 2009 Ontology Summit (April 5-6th) Pilot Projects: • Connecting ISO/IEC 11179 to Data Sets: • ISO/IEC 11179-3 Edition 3 is expected to provide a standard metamodel for (among other things) defining the semantics of Data Elements in terms of formally defined concepts, as defined by formal ontologies. The connection between Data Elements and the actual data is, however, beyond the scope of 11179. Realization of the "Data Web" will require closing this gap, to connect datasets with the ontologies that define their semantics. A complete solution will need to address an array of dataset forms including XBRL, SDMX, domain-specific XML schemas and "microformats", and relational and non-relational DBMSs. Some of this may be supported by OMG CWM and/or forthcoming IMM standards, but a broader framework is called for. Details here.
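One way to picture the gap described on this slide is as a missing link between three things: a registered data element, the ontology concept that defines its meaning, and the dataset columns that actually hold its values. The sketch below is purely illustrative and is not drawn from ISO/IEC 11179 or any of the standards named above; the class names, identifiers, file names, and URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A registry entry whose meaning is given by an ontology concept."""
    identifier: str
    name: str
    concept_uri: str


@dataclass(frozen=True)
class ColumnBinding:
    """Links one column (or field) of one dataset to a registered data element."""
    dataset: str
    column: str
    element_id: str


# Hypothetical registry and bindings, for illustration only.
REGISTRY = {
    "DE-001": DataElement("DE-001", "Facility name", "http://example.org/onto#Facility"),
    "DE-002": DataElement("DE-002", "Reporting year", "http://example.org/onto#Year"),
}

BINDINGS = [
    ColumnBinding("emissions.csv", "facility", "DE-001"),
    ColumnBinding("emissions.csv", "yr", "DE-002"),
    ColumnBinding("permits.xml", "FacilityName", "DE-001"),
]


def columns_for_concept(concept_uri):
    """Find every dataset column whose data element is defined by the concept."""
    return sorted(
        (binding.dataset, binding.column)
        for binding in BINDINGS
        if REGISTRY[binding.element_id].concept_uri == concept_uri
    )


print(columns_for_concept("http://example.org/onto#Facility"))
```

The bindings are kept separate from the registry on purpose: the registry says what a data element means, while the bindings record where its values live, and that second part is what the slide describes as out of scope for 11179.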

  25. 3. Recent Activities • 2009 Ontology Summit (April 5-6th) Pilot Projects: • Suggested Structure for Work: • Of – Existing Standards: Re-engineering, FEA Reference Models (e.g. FEA Reference Model Ontology), etc. • For – New Standards: Compliance, Privacy (e.g. Rick Murphy Ontology of Privacy Act of 1974), etc. • and By – New Standards: Harmonization, Executable, Acquisition (e.g. TopQuadrant Work for GSA), etc.

  26. 3. Recent Activities • Vocabulary Camp, May 30-31, 2009, Washington DC: • Level of knowledge about ontologies: • Beginner should denote that you are completely new to ontologies and semantic web standards in general. • Intermediate notes that you have familiarity with ontologies and semantic web standards but are inexperienced building ontologies in practice. • Expert is anyone who has actually developed an ontology before. • Mike Lang of Revelytix is the organizer. You can contact him at mikelangjr@revelytix.com. • See http://vocamp.org/wiki/VoCampDCMay2009
