
DRM & Semantic Advances




Presentation Transcript


  1. DRM & Semantic Advances Lucian Russell, PhD Expert Reasoning & Decisions LLC Feb 17th, 2009

  2. DRM 2.0 Semantic Baseline • DRM 2.0’s purpose was to advance Data Sharing among agencies • The writing team wanted to avoid a document that created unnecessary work for Federal agencies • It took the “do no harm” approach • The DRM audience was Enterprise Architects and Data Architects • It emphasized Communities of Interest

  3. The Environment Was Hostile • Data Sharing was of great interest, but it was hard to do because: • There was preparation work but no budget for it • The technology was decades old • Search engine vendors claimed they could do everything, so Political Appointees heard a case for slashed budgets • Metadata specialists were under attack

  4. The Approach Was to Build • There was a new technology based on web interfaces, Service Oriented Architecture (SOA), which could overcome the “plumbing” issues preventing sharing • Specialists who knew their data collections were motivated to apply that knowledge • DRM 2.0 organized some concepts that represented best practice and suggested that the specialists be allowed to use them

  5. What Changed? • On April 19th 2006 the DNI DTO (now DNI IARPA) announced the success of the IKRIS project (it was briefed at a SICoP workshop in October 2006) • Taken in concert with other IARPA advances, it changed what could be expected from Semantic technology • The advances were briefed on Feb 6th, 2007

  6. Key Elements from Feb 6th ‘07 • A vision of DRM 2.0 moving forward was presented by Chapter 3 & 4 specialist Lucian Russell and Chapter 5 specialist Bryan Aucoin (the “how to” chapters) • A vision of DRM 3.0 with a combined Data Context and Data Description was laid out • Speakers described the enabling technology

  7. The Specialist Speakers • From the DNI DTO (IARPA) work: • Dr. Christiane Fellbaum – Princeton’s WordNet program • Lola Olson – NASA Goddard Master Directory program • Dr. John Prange – Language Computer Corporation • Dr. Michael Witbrock – CYCORP

  8. English as an Exact Language • The first key advance was from WordNet – Dr. Christiane Fellbaum • Heretofore English was considered too ambiguous • Now the 115K major words of English had been analyzed carefully and all meanings distinguished, and more words could be added as needed • Nouns are not verbs! (sorry, OWL specialists) • The result: unambiguous English could be generated for use by computers • Note: this is not ambiguity resolution!
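The WordNet idea of distinguishing every sense of a word can be sketched in miniature. The structure below is illustrative, not the WordNet API: each (word, part-of-speech) pair maps to numbered, glossed senses, so downstream text can cite a sense ID instead of an ambiguous surface word, and nouns and verbs live in separate indexes.

```python
# Illustrative sketch (not the real WordNet data or API): each surface
# word maps, per part of speech, to numbered senses with glosses, the
# way WordNet separates "bank" the financial institution from "bank"
# the river edge.
SENSES = {
    ("bank", "noun"): [
        ("bank.n.01", "a financial institution that accepts deposits"),
        ("bank.n.02", "sloping land beside a body of water"),
    ],
    ("bank", "verb"): [
        ("bank.v.01", "to deposit money in a bank"),
    ],
}

def senses(word, pos):
    """Return the distinguished senses of a word for one part of speech."""
    return SENSES.get((word, pos), [])

# Unambiguous text can then reference a sense ID rather than a raw word:
sense_id, gloss = senses("bank", "noun")[1]
print(sense_id)  # bank.n.02 -- the river-bank reading, not the financial one
```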

  9. Weaving Words and Directories • The NASA Goddard Master Directory program describes how to access 18 petabytes of data • It is already massive, multi-agency data sharing, and it has worked for 10+ years • It combines topic words with data collection descriptions • DRM 2.0 did not directly address massive data collections, but the wording of its guidance indirectly admitted this approach

  10. Parsing Meaning from Documents • English documents can already be parsed to detect meaning • Language Computer Corporation had scored best in the DTO (IARPA) program, as measured through NIST challenge competitions (TREC) • The team showed that there are some 40 major relationship classes in English
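A toy sketch of what extracting relationship classes from English looks like. The patterns and class names below are invented for illustration; real systems such as LCC's used deep linguistic analysis, while this only shows the shape of the output, a set of typed triples.

```python
import re

# Hypothetical miniature of relation extraction: a few surface
# patterns, each tagged with a broad relationship class of the kind
# the slide mentions (e.g. PART-WHOLE, CAUSE-EFFECT).
PATTERNS = [
    (re.compile(r"(\w+) is part of (\w+)"), "PART-WHOLE"),
    (re.compile(r"(\w+) causes (\w+)"), "CAUSE-EFFECT"),
]

def extract_relations(text):
    """Return (subject, relation, object) triples found in the text."""
    triples = []
    for pattern, relation in PATTERNS:
        for subj, obj in pattern.findall(text):
            triples.append((subj, relation, obj))
    return triples

print(extract_relations("smoking causes cancer"))
# [('smoking', 'CAUSE-EFFECT', 'cancer')]
```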

  11. Putting Meanings Together • CYCORP has a unique Ontology, CYC, of millions of assertions about facts in the real world and in others (e.g. mythology) • The concepts uncovered by the prior work could be woven together in the CYC Ontology • CYC has logic constructs that exceed First Order Logic, and because IKRIS showed CYC to be interoperable with other Knowledge Constructs, the CYC Ontology is the most powerful known

  12. DRM 2.0 Could Be Improved! • The final writing team inherited a DRM draft that covered only 3 of the 8 major categories of government information and offered little technical guidance • New technology existed that could advance the State of the Art for Data – that was the takeaway on Feb 6th • It was put in a White Paper on June 18th ‘07

  13. What Was Next? Feb 5th 2008 • Improvements on February 6th • Further elucidation of the role of Ontology in data sharing • Further elucidation of the role of DRM artifacts in Data Sharing and a suggestion of their content • A new approach, via Sorted Logic, on the challenge of Schema Mismatch: a barrier to Data Sharing for fixed field databases

  14. Data Sharing: Alpha & Beta • In the context of Relational Databases, current practice is to develop data models that are sparse in semantics – a practice that dates back to the 1970s, when storage was expensive • This leads to two types of errors that can disable Data Sharing • Type Alpha errors: A and B are the same, but we miss the equivalence • Type Beta errors: A and B look the same, but are different
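The two error types can be made concrete with a hypothetical pair of agency schemas; every table, column, and unit name below is invented for illustration.

```python
# Type alpha: the same quantity hides behind different column names.
agency_a = {"table": "personnel", "column": "annual_salary_usd"}
agency_b = {"table": "staff",     "column": "yearly_pay"}
# Both columns hold the same thing, but a name-based match misses it.

# Type beta: identical-looking columns that mean different things.
dataset_1 = {"column": "temperature", "unit": "fahrenheit"}
dataset_2 = {"column": "temperature", "unit": "celsius"}
# A name-based match wrongly treats these as equivalent.

def names_match(x, y):
    """The naive, semantics-free comparison that produces both errors."""
    return x["column"] == y["column"]

assert not names_match(agency_a, agency_b)  # alpha: equivalence missed
assert names_match(dataset_1, dataset_2)    # beta: false equivalence
assert dataset_1["unit"] != dataset_2["unit"]
```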

  15. Type Beta Errors – Data Context • DRM 2.0 explained that Data Context artifacts should be used for Data Assets, such as relational databases, so as to provide more information. The goal is to be able to distinguish different instances of data that appear the same • In relational databases this would mean adding semantic content to distinguish schemas – metadata about the data

  16. How Can This Be Done? • Data models themselves provide too little information – deliberately so • What is needed is a new semantic artifact, a “Data Descriptor” among the Data Description (Chapter 3) artifacts that explains the processes (using verbs) behind the collection of the data. • The context can be abstracted from the Data Descriptor
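A minimal sketch of what such a “Data Descriptor” artifact might look like. The field names are illustrative assumptions, not drawn from the DRM itself: the descriptor pairs the bare schema with the verb phrases describing the processes behind the data's collection, and a Data Context is abstracted from it.

```python
from dataclasses import dataclass, field

# Hypothetical Data Descriptor: a Data Description artifact that
# records the processes (verbs) that produced a data asset, beyond
# what the sparse data model itself says.
@dataclass
class DataDescriptor:
    asset_name: str
    schema: dict                                    # column name -> declared type
    processes: list = field(default_factory=list)   # verb phrases

    def context(self):
        """Abstract a minimal Data Context from the descriptor."""
        return {"asset": self.asset_name, "produced_by": list(self.processes)}

desc = DataDescriptor(
    asset_name="inspection_results",
    schema={"site_id": "int", "passed": "bool"},
    processes=["inspect facility", "record outcome", "certify report"],
)
print(desc.context()["produced_by"][0])  # inspect facility
```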

  17. The Logic Transformation! • The biggest technical challenge in Data Sharing for Relational Databases is schema mismatch – the source of type alpha errors • Dr. Selmer Bringsjord, a DTO and especially IKRIS investigator, showed how transformations of database schemas can be used to detect that two databases with different schemas are in fact identical
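The idea behind such schema transformations can be gestured at with a toy canonicalization: map each schema's column names onto shared semantic concepts, then compare the canonical forms. The concept table here is hand-written for illustration; the actual approach used logic-based transformations, which this sketch does not attempt.

```python
# Hand-written mapping from column names to shared semantic concepts
# (all names invented for illustration).
CONCEPTS = {
    "annual_salary_usd": "SALARY",
    "yearly_pay": "SALARY",
    "emp_name": "EMPLOYEE_NAME",
    "staff_member": "EMPLOYEE_NAME",
}

def canonical(schema_columns):
    """Reduce a column list to its set of semantic concepts."""
    return frozenset(CONCEPTS.get(col, col) for col in schema_columns)

# Two mismatched schemas that hold the same information:
schema_a = ["emp_name", "annual_salary_usd"]
schema_b = ["staff_member", "yearly_pay"]

print(canonical(schema_a) == canonical(schema_b))  # True: the alpha error is caught
```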

  18. The Conclusion • With more careful semantics, disambiguated words, and an Ontology that supports process descriptions and counterfactuals: • It is possible to build more advanced mechanisms to support data sharing in the Federal Government • These must be the basis for DRM 3.0 • Because they involve just using English better, they are far less expensive than was feared!
