New Directions In Semantic Interoperability. Lucian Russell, PhD, Expert Reasoning & Decisions LLC. SICoP Special Conference 2: Building Knowledgebases for Cross-Domain Semantic Interoperability. April 25th, 2007
While you were not watching … • In 2000 the Intelligence Community set in motion the Advanced Research and Development Activity (ARDA). • Some programs placed no restrictions on their researchers' ability to publish. • A multi-discipline activity was started in Information Exploitation. • One major program was the AQUAINT program, established by Dr. John Prange. • This program, Advanced Question Answering for Intelligence, was bold, and its goals for advancing the State of the Art seemed almost too ambitious to reach. • Fortunately they were not: the program is now in its Third Phase. • It remains unclassified, just not widely known. • Another major program was NIMD. It looked at a number of issues, including reasoning. Its findings are generally labeled For Official Use Only, but fortunately one no longer is: IKRIS. • IKRIS, the Interoperable Knowledge Representation for Intelligence Support, is a new extension of logic, incorporating OWL, ISO Common Logic and other features. • It is the new features that enable a breakthrough in Semantics.
Why You Should Care • Def: Semantic Interoperability is a state of an information system artifact. When an Artifact A is semantically interoperable, a service that wishes to discover the meaning of data associated with the artifact can do so precisely. • We do not have semantic interoperability today. • XML is a message format. • UDDI and WSDL are means for communicating pre-agreed data descriptions. • OWL is a formalism for describing IS-A relationships, including Functions. • Semantic Interoperability requires • Computers that understand human language • Schema descriptions that are precise. • Prior to the Information Exploitation (Info-X) work we had neither. • On April 19th, 2006 it became possible to develop Semantic Interoperability. • WARNING: Computers cannot detect lies and miscommunication, and cannot compensate for incorrect or intentionally ambiguous language.
Barriers to Interoperability: Texts and Schemas • Ideally an English language document describing an Artifact should suffice to describe it for SI purposes. • Databases could be defined clearly as to the nature and purposes of their data elements. • Text documents could be read by the computer, summarized, and have their key concepts extracted. • Barriers: • Human language is ambiguous • Google gets around this by using the "MySpace" model for Web Pages, a social engineering construct, plus paid placements. • Lacking URLs and reference frequencies, one is left with pre-culled word lists reduced to stems whose frequency is used as a surrogate for significance. • Well-meaning attempts by non-linguists to create OWL Ontologies do not get at the real problem of correctly specifying concepts. • The Schema Mismatch problem • Database schemas use names for Entities and Attributes that are too abbreviated to be of use, by themselves, to computers or even to computer professionals. Data Dictionaries, though helpful, are rarely implemented. • There are syntax mismatches (e.g. the same SSN written in different formats), and what is an Entity in one schema can be an Attribute, or even a Value, in another.
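The SSN syntax mismatch can be made concrete with a minimal sketch. It assumes two hypothetical databases that store the same SSN differently, one as dashed text and one as an integer (which silently drops a leading zero); the column names `SSN` and `soc_sec_no`, the values, and the helper `normalize_ssn` are illustrative inventions, not drawn from any real system.

```python
import re

# Hypothetical rows from two databases holding the same person's SSN.
# Column names and values are illustrative only.
row_a = {"SSN": "012-34-5678"}     # text with dashes
row_b = {"soc_sec_no": 12345678}   # integer: the leading zero is lost


def normalize_ssn(value) -> str:
    """Reduce an SSN in either form to a canonical 9-digit string."""
    if isinstance(value, int):
        digits = f"{value:09d}"              # restore leading zeros
    else:
        digits = re.sub(r"\D", "", value)    # drop dashes, spaces, etc.
    if len(digits) != 9:
        raise ValueError(f"not a 9-digit SSN: {value!r}")
    return digits


def same_ssn(a, b) -> bool:
    """True if two SSN values, in any supported syntax, denote the same number."""
    return normalize_ssn(a) == normalize_ssn(b)


print(same_ssn(row_a["SSN"], row_b["soc_sec_no"]))  # True
```

This repairs only the syntax. Knowing that `SSN` and `soc_sec_no` denote the same thing at all is the semantic half of the mismatch, and that is what precise schema descriptions are meant to supply.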
This has an impact on the DRM • Version 2.0 had three sections: Data Description, Data Context and Data Sharing. We can now see these going from three to two: Data Sharing and Intelligent Awareness. • The key is unifying descriptions in language and logic. • The English language is now far better understood as a formal construct; it can be used precisely to augment Data Modeling for fixed-field databases. • A Logic formalism is now available that unifies First Order Logic (and Description Logic), some second order logic predicates, and non-monotonic reasoning. • The new results allow, for the first time, a chance to make progress on the Schema Mismatch problem, which has stymied Data Sharing among fixed-field databases since 1991. • It is now cost-effective to build a set of data artifacts describing databases, because the new tools can process them and enable a new set of Data Sharing Services. • A starting point is the interwoven set of data descriptions and keywords in the Global Change Master Directory, a multi-agency index to 18 Petabytes of data maintained at NASA.
The Data Reference Model 3.0, Web 3.0 & SOAs [Figure 3-1: DRM standardization Areas. Diagram labels: Data & Information & Knowledge Repository; Data Resource Awareness Agent; Language; Logic; dynamic; static]
The First Building Block: WordNet • WordNet is found at (http://wordnet.princeton.edu/) • WordNet disambiguates the English language by listing all the senses of the most common words in English. • Synset: a set of words that can be considered synonyms; each synset has a number. • With nouns these are generally interchangeable. • With verbs the situation is not so precise: there may be a shade of difference. • Every entry for a word has an associated phrase, a gloss, showing how it is used. • Four parts of speech are covered: nouns, verbs, adjectives and adverbs. • WordNet started in 1990, but it has EVOLVED. • Although the project remains the same, the content of the system is very different. • When the book on WordNet was published 10 years ago there was WordNet 1.6. • The WordNet system and database is now at Release 3.0 (free download). • All glosses consist of words that are marked up with their synset numbers. • Category words are distinguished from instances, e.g. "Atlantic" as a noun is an instance: the Atlantic Ocean. • WordNet might be more aptly named Wordnets.
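To show what "listing all the senses" of a word looks like as data, here is a minimal sketch of a synset record with a gloss and an instance flag. The identifiers such as `bank-n-1` and the glosses are hypothetical paraphrases for illustration, not actual WordNet entries (WordNet itself identifies synsets by numeric offsets), and `senses` is an invented helper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Synset:
    synset_id: str              # hypothetical id; WordNet itself uses numeric offsets
    pos: str                    # "n", "v", "a" or "r"
    lemmas: tuple[str, ...]     # the synonyms in this synset
    gloss: str                  # paraphrased here, not quoted from WordNet
    is_instance: bool = False   # True for instances such as a named ocean


# Toy lexicon for illustration only.
SYNSETS = [
    Synset("bank-n-1", "n", ("bank",), "sloping land beside a body of water"),
    Synset("bank-n-2", "n", ("bank", "depository financial institution"),
           "an institution that accepts deposits and lends money"),
    Synset("atlantic-n-1", "n", ("Atlantic", "Atlantic Ocean"),
           "the ocean between the Americas and Europe and Africa",
           is_instance=True),
]


def senses(word: str, pos: Optional[str] = None) -> list[Synset]:
    """Return every synset whose lemmas include `word`, optionally filtered by POS."""
    w = word.lower()
    return [s for s in SYNSETS
            if any(lemma.lower() == w for lemma in s.lemmas)
            and (pos is None or s.pos == pos)]


for s in senses("bank"):
    print(s.synset_id, "-", s.gloss)
print(senses("atlantic")[0].is_instance)  # True
```

Against the real database, the NLTK library's `wordnet.synsets("bank")` (with the WordNet corpus installed) returns the actual senses. The toy version only shows the shape of the data: one text string, several numbered senses.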
In what ways are words networked? • There are at least 10 kinds of relations in WordNet 3.0 • Nouns • Hypernyms are more general terms and hyponyms the more specific ones • Holonyms are higher-level wholes and meronyms are lower-level parts • Telic relationships: "A chicken is a bird" vs. "A chicken is a food", which is short for "A chicken is used as a food". The latter is a telic relationship. • Synonyms • Antonyms • Verbs • Hypernyms exist, but there are 4 types of "hyponyms": different aspects (in time) of entailment. • Holonyms are higher-level wholes and meronyms are lower-level parts of a process • Coming Soon: others, e.g. noun to verb forms • HOWEVER: • A NOUN IS NOT A VERB • A VERB IS NOT A NOUN • Nouns have inheritance properties with their hypernyms that differ from those of verbs with their hypernyms. Do not put a verb in an OWL-DL Ontology.
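The relations on this slide can be sketched as a typed graph. This is a minimal sketch under stated assumptions: synset names like `chicken.n`, the relation label `used_as` for the telic link, and the `Lexicon` class are all hypothetical. It shows three points from the slide: hypernym links refuse to cross parts of speech, holonyms are stored as inverses of meronyms, and a telic "used as food" link is not followed when computing IS-A ancestors.

```python
from __future__ import annotations

from collections import defaultdict


class Lexicon:
    """Toy WordNet-style graph with typed relations between synsets."""

    def __init__(self) -> None:
        self.pos: dict[str, str] = {}
        # (relation, synset) -> set of related synsets
        self.links: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add_synset(self, sid: str, pos: str) -> None:
        self.pos[sid] = pos

    def _link(self, rel: str, inverse: str, a: str, b: str) -> None:
        self.links[(rel, a)].add(b)
        self.links[(inverse, b)].add(a)

    def add_hypernym(self, child: str, parent: str) -> None:
        # A NOUN IS NOT A VERB: hierarchies never cross parts of speech.
        if self.pos[child] != self.pos[parent]:
            raise ValueError(f"{child} and {parent} have different parts of speech")
        self._link("hypernym", "hyponym", child, parent)

    def add_meronym(self, whole: str, part: str) -> None:
        self._link("meronym", "holonym", whole, part)

    def add_telic(self, thing: str, purpose: str) -> None:
        # "A chicken is a food" means "is used as a food": not an IS-A link.
        self._link("used_as", "use_of", thing, purpose)

    def related(self, rel: str, sid: str) -> set[str]:
        return set(self.links.get((rel, sid), set()))

    def ancestors(self, sid: str) -> set[str]:
        """All transitive hypernyms of `sid` (telic links are not followed)."""
        seen: set[str] = set()
        stack = [sid]
        while stack:
            for parent in self.related("hypernym", stack.pop()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lex = Lexicon()
for sid, pos in [("chicken.n", "n"), ("bird.n", "n"), ("animal.n", "n"),
                 ("food.n", "n"), ("wing.n", "n"), ("fly.v", "v")]:
    lex.add_synset(sid, pos)
lex.add_hypernym("chicken.n", "bird.n")
lex.add_hypernym("bird.n", "animal.n")
lex.add_meronym("bird.n", "wing.n")
lex.add_telic("chicken.n", "food.n")

print(sorted(lex.ancestors("chicken.n")))  # ['animal.n', 'bird.n']
print(lex.related("holonym", "wing.n"))    # {'bird.n'}
```

Calling `lex.add_hypernym("fly.v", "bird.n")` raises `ValueError`, which is the toy equivalent of "do not put a verb in an OWL-DL Ontology."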
Document Reading and AQUAINT • Given the new WordNet and the results from AQUAINT-funded projects, documents can now be read (i.e. their content understood). • To answer a question posed by a user, a system must be able to • Understand the question • Determine whether it entails sub-questions and, if so, identify them • Read each document to find whether it has the answers • Evaluate the answer from each one • Combine the results • Provide the user with the reasoning behind the answer • Obviously WordNet and sources like FrameNet, VerbNet and other reference collections must be examined and their results combined. • The key to understanding the content is two-fold: • WordNet, to distinguish the meanings of words that share the same text string • A markup that correctly describes relationships in TIME, because • AQUAINT has found that there are dozens of logical forms of sentences, and distinguishing them means understanding real-world temporal relationships. • AQUAINT has funded a project that has provided a new, more powerful markup language for time.
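The question-answering steps listed above can be laid out as a pipeline skeleton. This is a deliberately naive sketch, not how AQUAINT systems work: "decomposition" just splits on " and ", and "reading" is word overlap, standing in for sense-disambiguated, time-aware understanding. All function names and the sample documents are hypothetical; step 1 (understanding the question) is not modeled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    source: str
    text: str
    score: float


def tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def decompose(question: str) -> list[str]:
    """Step 2 (stand-in): split a conjunctive question into sub-questions."""
    return [p.strip(" ?") + "?" for p in question.split(" and ") if p.strip(" ?")]


def read(source: str, text: str, sub_q: str) -> Optional[Candidate]:
    """Step 3 (stand-in): 'read' a document by word overlap with the sub-question.
    A real reader would use sense-tagged words and temporal markup instead."""
    q = tokens(sub_q)
    overlap = q & tokens(text)
    return Candidate(source, text, len(overlap) / len(q)) if overlap else None


def answer(question: str, docs: dict[str, str]) -> list[tuple[str, Candidate, str]]:
    results = []
    for sub_q in decompose(question):
        candidates = [c for src, txt in docs.items()
                      if (c := read(src, txt, sub_q)) is not None]
        if not candidates:
            continue
        best = max(candidates, key=lambda c: c.score)        # step 4: evaluate
        reason = f"best word overlap ({best.score:.2f}) in {best.source}"
        results.append((sub_q, best, reason))                # step 5: combine
    return results                                           # step 6: with reasons


docs = {
    "d1": "WordNet 3.0 was completed in 2006.",
    "d2": "TimeML was developed by James Pustejovsky at Brandeis.",
}
for sub_q, best, reason in answer(
        "When was WordNet 3.0 completed and where was TimeML developed?", docs):
    print(sub_q, "->", best.text, f"[{reason}]")
```

The value of the skeleton is the shape, not the scoring: every stage produces something the next stage consumes, and the final output keeps a reason next to each answer.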
What this means to SI • We now have a means of creating a text document that is precise • All the word meanings are disambiguated • All the time relationships are correctly stated • For previously generated text documents, a correct identification of concepts is much more likely. • For previously generated Relational or Object databases, it is now worth the effort to describe the data precisely • The full relationship of Attributes to their Entities can be described. • Inter-relationships among heretofore ambiguous dates can be clearly stated. • Language Computer Corporation has a suite of tools that are State of the Art in realizing this capability, and it works across languages. • In other words, the schema mismatch problem can be addressed directly: the semantics of such databases can be precisely specified, and one can reason about the different forms of the data. • WHY? Because of IKRIS.
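Explicitly stating the relationship among dates can be sketched as a schema annotation that a program can check. This is a minimal sketch, assuming day-precision dates treated as points. It borrows three relation names (BEFORE, AFTER, SIMULTANEOUS) from TimeML's TLINK types, but it is not TimeML markup. The columns `order_date` and `ship_date` and the rule linking them are hypothetical.

```python
from __future__ import annotations

from datetime import date
from enum import Enum


class TRel(Enum):
    """A few temporal relation names borrowed from TimeML's TLINK types."""
    BEFORE = "BEFORE"
    AFTER = "AFTER"
    SIMULTANEOUS = "SIMULTANEOUS"


def relation(a: date, b: date) -> TRel:
    """Relation between two day-precision dates treated as points in time."""
    if a < b:
        return TRel.BEFORE
    if a > b:
        return TRel.AFTER
    return TRel.SIMULTANEOUS


# Hypothetical schema annotation stating how two date columns must relate:
# an order is placed before, or on the same day as, its shipment.
SCHEMA_TLINKS = [("order_date", {TRel.BEFORE, TRel.SIMULTANEOUS}, "ship_date")]


def violations(row: dict[str, date]) -> list[str]:
    """List every stated temporal relation that a row fails to satisfy."""
    problems = []
    for left, allowed, right in SCHEMA_TLINKS:
        rel = relation(row[left], row[right])
        if rel not in allowed:
            problems.append(f"{left} {rel.value} {right}")
    return problems


print(violations({"order_date": date(2007, 4, 20), "ship_date": date(2007, 4, 25)}))  # []
print(violations({"order_date": date(2007, 4, 26), "ship_date": date(2007, 4, 25)}))
# ['order_date AFTER ship_date']
```

Once the intended relationship is written down rather than implied by column names, another service can both check the data and reason about whether two databases' dates mean compatible things.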
What is IKRIS and why is it a Breakthrough? • The IKRIS project has created IKL, the IKRIS Knowledge Language • Using IKL one can specify • Any construct in any version of OWL; according to Dr. Chris Welty, IKRIS Co-PI, OWL is First Order Predicate Calculus without Variables • Any construct in First Order Logic • Certain expressions in Second Order Logic • CONTEXT assumptions, which allow expressions in Non-Monotonic Logic • How has this been used? One way is to show interoperability among different languages that specify processes, e.g. • CYC-L, the language used by CYCORP for its massive Ontology • PSL, the manufacturing process language developed at NIST • SOA-S, the proposed Ontology for IT Services • MORE IMPORTANT: Combine Context and Process specification to create the Contrafactual Conditional!
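Dr. Welty's remark can be illustrated with the standard translation of description-logic (OWL-DL) axioms into first-order logic. The OWL axiom mentions only class and property names, while the first-order version needs explicit variables. The names Chicken, Bird, hasPart and Wing are illustrative examples, not taken from any particular ontology.

```latex
\begin{align*}
\mathit{Chicken} \sqsubseteq \mathit{Bird}
  \quad&\text{translates to}\quad
  \forall x\,\bigl(\mathit{Chicken}(x) \rightarrow \mathit{Bird}(x)\bigr)\\
\mathit{Bird} \sqsubseteq \exists\,\mathit{hasPart}.\mathit{Wing}
  \quad&\text{translates to}\quad
  \forall x\,\bigl(\mathit{Bird}(x) \rightarrow \exists y\,(\mathit{hasPart}(x,y) \wedge \mathit{Wing}(y))\bigr)
\end{align*}
```

The second line shows why the variables matter: the existential restriction hides a second individual $y$ that first-order logic must name explicitly.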
The What? • The Contrafactual Conditional is a logical statement that is contrary to fact. It is used to specify scientific laws, e.g. • "Glass is Brittle": "Were a pane of glass to be struck by a hammer, it would shatter" • Question: "Is this pane of glass brittle?" We cannot tell, because it is intact! • Using the CONTEXT clause we make a logical assertion: • “At Time T the pane of glass G is “…”” where P is the process “hit by a hammer H”. • Conclusion: G is shattered into a set of shards {Si} • Reasoning: • Goal: Find a process Q which keeps the glass G intact. • Conclusion: do not select process P as an instance of Q. What's Important: Any model of the Real World that can be described in a database can be subjected to real-world reasoning by a computer that has the relevant collection of Real-World laws!
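The glass example can be sketched in code under loud assumptions. The "law" here is a single hand-coded rule, and this is not IKL syntax: time T and the shards {Si} are not modeled. The CONTEXT clause is imitated by evaluating the process on a copy of the object, so the program can answer a question about an intact pane without breaking it. All class and function names are hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Process:
    name: str
    impact: bool  # does the process deliver a sharp blow?


@dataclass
class PhysicalObject:
    name: str
    properties: set[str] = field(default_factory=set)
    state: str = "intact"


def apply_laws(obj: PhysicalObject, p: Process) -> None:
    """Hypothetical real-world law for 'Glass is Brittle':
    a brittle object subjected to an impact shatters."""
    if "brittle" in obj.properties and p.impact:
        obj.state = "shattered"


def in_context(obj: PhysicalObject, p: Process) -> str:
    """Evaluate 'were P to happen to obj' on a copy, leaving the real object untouched."""
    scratch = copy.deepcopy(obj)
    apply_laws(scratch, p)
    return scratch.state


def keeps_intact(obj: PhysicalObject, candidates: list[Process]) -> list[Process]:
    """Goal: find processes Q that keep obj intact; reject the rest."""
    return [q for q in candidates if in_context(obj, q) == "intact"]


G = PhysicalObject("pane of glass G", {"brittle"})
P = Process("hit by a hammer H", impact=True)
wipe = Process("wiped with a cloth", impact=False)

print(in_context(G, P))                               # shattered
print(G.state)                                        # intact
print([q.name for q in keeps_intact(G, [P, wipe])])   # ['wiped with a cloth']
```

The design choice mirrors the slide's reasoning: the conditional is evaluated in a hypothetical context, the actual pane stays intact, and P is excluded as a candidate for Q.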
So we can fully describe database schemas and … • The Data Dictionary can be implemented as texts that describe the data in a relational database as a collection of information about static states that result from real-world processes. • The processes that create the data can be described accurately. • The processes that update the data can be described accurately. • Therefore a Service can be created that reasons about whether the real-world processes that created database A are suited to the needs of the creators of database B. • Semantic Interoperability can now be realized. • Further, there can be a real Service Oriented Architecture, not just application programs repackaged as web services. • It also means that ill-formed OWL Ontologies, ones that do not correspond to linguistic principles, can be replaced by Knowledge Representations in which accurate OWL-DL structures are specified on the one hand, and the processes that use them are described separately using a more powerful representation.
Build knowledge bases - NOW • There are two types of knowledge bases needed for Semantic Interoperability: linguistic and Real-World. • CYCORP has the capabilities that are used by IKL, and has them now. • This means there is no reason not to use CYC for building knowledge bases, because its representation can be converted by IKL to any other suitably powerful representation. • It also means that, whatever other tools are used, the knowledge bases created by them can be shared using IKL as a translator.
In Summary • Prior to 2006 Semantic Interoperability was stalled • The principles of Computer Science needed to do the job right were not present • As usual, people did the best they could with the tools at hand. • Many low-level computer processes were incorrectly labeled "Semantic" when they were not. They were "gilded farthings". • In 2006 there were four new developments • IKRIS was specified • TimeML was developed by James Pustejovsky at Brandeis • WordNet 3.0 was completed • The AQUAINT Phase II projects to understand language were completed. • What could only be wished for became possible. Note: The Computer Science breakthroughs were paid for by your tax dollars! There IS a role for government funding.