International Technology Alliance Programme: Fact Extraction using a Controlled Natural Language
David Mott, Dave Braines, ETS, Hursley, IBM UK

Team
• Dave Braines, David Mott: IBM, Hursley
• Steve Poteet, Ping Xue, Anne Kao: Boeing, Seattle
• Paul Smart, Antonio Penta, Ron Tasker: University of Southampton

International Technology Alliance (ITA) in network and information sciences
• How can coalition operations be assisted by networks of computer systems?
• US/UK academic/industry collaboration
• 10-year programme ending in May 2016
• Sponsored by UK MOD and US ARL
• Research must be scientific, fundamental, reviewed by academic peers, and published

Fundamental Research Issues
• How do we assist people to create and use applications that reason?
  • Modelling concepts, relationships and rules of inference
  • Grasping the basic logic of the model and rules
  • Understanding the reasoning performed by others
  • Sharing understanding across the human team
  • Sharing reasoning and artefacts across different systems

Supporting the "analyst"
[Diagram labels: doc27 (three documents), Requirements, Assumptions, NLP, Analysts, Conceptual Model, CE Facts, Product, Query, Inference, Rationale, Uncertainty, Argumentation, CNL Tools]

Analyst's "Conceptual Model"
• Analyst represents specialist knowledge as concepts, facts and rules for inference
  • a conceptual model
  • a common set of concepts
• The system must "understand" the conceptual model
  • assist analyst to search for patterns, deduce information
• A language to build the conceptual model
  • analyst: easy to understand
  • system: readable, unambiguous and formal
• We use Controlled English to express the model

Controlled English
• A Controlled Natural Language, being a subset of English
  • limited syntax, but still readable as English
  • meanings of the expressions unambiguously defined
• Avoids the complexity of a real Natural Language
  • computer systems can read, interpret and apply it
• Retains the appearance of a real language
  • humans can naturally use it, without learning "computer speak"
• The analyst may use Controlled English to construct their Conceptual Model
Based on work by John Sowa
Example: the person John is married to the person Jane and has red as hair colour.
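
To illustrate why a limited syntax lets a computer system read such a sentence, here is a minimal Python sketch that splits the example sentence into subject/relation/value triples. It is not the project's CE parser: the function name `parse_ce_sentence`, the triple layout and the two clause patterns are assumptions made for this example, and the real CE grammar covers far more than these two forms.

```python
import re

SENTENCE = "the person John is married to the person Jane and has red as hair colour."


def parse_ce_sentence(sentence):
    """Split one simple CE sentence into (subject, relation, value) triples.

    Hypothetical helper: handles only the two clause forms in the slide example.
    """
    body = sentence.strip().rstrip(".")
    # The sentence opens with "the <concept> <name>", which names the subject.
    head = re.match(r"the (\w+) (\w+) (.*)", body)
    if head is None:
        raise ValueError("not a recognised CE sentence: " + sentence)
    concept, subject, rest = head.groups()
    facts = [(subject, "is a", concept)]
    for clause in rest.split(" and "):
        # Attribute form: "has <value> as <property>".
        attr = re.fullmatch(r"has (.+) as (.+)", clause)
        # Relationship form: "<verb phrase> the <concept> <name>".
        rel = re.fullmatch(r"(.+) the (\w+) (\w+)", clause)
        if attr:
            value, prop = attr.groups()
            facts.append((subject, prop, value))
        elif rel:
            verb, obj_concept, obj = rel.groups()
            facts.append((obj, "is a", obj_concept))
            facts.append((subject, verb, obj))
        else:
            raise ValueError("unrecognised clause: " + clause)
    return facts


for fact in parse_ce_sentence(SENTENCE):
    print(fact)
```

Because every clause follows one of a small number of fixed patterns, the sketch needs no disambiguation step, which is the property the slide attributes to a controlled language.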

CE for Reasoning
• CE used to define:
  • "propositions", facts, assumptions
  • logical rules
  • queries
  • meta model of concepts
• Inference engines constructed to apply logical rules
  • Specific Prolog implementations
  • CE Store based on Java and SQL
• Rationale may be constructed:
  • presented to users for hybrid man/machine reasoning
  • to determine dependencies
• Formal semantics for CE
  • (partially defined) in FOPL
• Applications
  • analysis of information
  • societal and open government data
  • planning and resource allocation
  • (in progress) NLP
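
The idea of an inference engine that applies logical rules and keeps a rationale can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Prolog or CE Store implementation: `forward_chain`, the triple representation and the `symmetric_marriage` rule are all hypothetical names invented here, and the rule itself (marriage is symmetric) is an assumed example rather than one taken from the slides.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear.

    Returns a dict mapping each fact to its rationale: the name of the rule
    that produced it and the premises it depended on. Asserted facts get
    the rationale ("asserted", ()).
    """
    rationale = {fact: ("asserted", ()) for fact in facts}
    changed = True
    while changed:
        changed = False
        for name, rule in rules:
            # Each rule sees a snapshot of the facts known so far.
            for conclusion, premises in rule(set(rationale)):
                if conclusion not in rationale:
                    rationale[conclusion] = (name, premises)
                    changed = True
    return rationale


def symmetric_marriage(facts):
    """Hypothetical rule: if X is married to Y then Y is married to X."""
    for subject, relation, obj in facts:
        if relation == "is married to":
            yield (obj, relation, subject), ((subject, relation, obj),)


known = {("John", "is married to", "Jane")}
result = forward_chain(known, [("symmetric marriage", symmetric_marriage)])
for fact, (rule_name, premises) in result.items():
    print(fact, "because", rule_name, premises)
```

Storing the premises alongside each derived fact is what allows the rationale to be shown to a user or used to trace dependencies, the two uses listed on the slide.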

Fact Extraction using Controlled Natural Language
• As the target of the NL processing
  • facts in documents can be used for further reasoning
• As a means of describing the NL processing
  • to share understanding of the linguistic processing
  • to help configure NL tooling

Controlled English is "Curiously Useful": Why?
• perhaps because humans are naturally good at using language to model, understand and reason
• we can build upon "literary devices" already developed to solve problems in expressing knowledge
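
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923).]
[Diagram: a triangle with corners "meaning" (labelled Conceptual Model(s)), "thing" and "symbol". The symbol expresses the meaning, the meaning conceptualises the thing, and the symbol stands for the thing.]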

Our focus is on the semantics of the conceptual model

Current NL Processing
[Diagram labels, processing pipeline: SYNCOIN Reports, Message PreProcessor, Proper Nouns (places, units), Stanford Parser, Named Entity Extractor, Situation Extractor, CEStore, CE Aggregator, "Stylistic" CE, For Analysis. The Conceptual Model (concepts, logical rules, linguistic expression) is shown alongside the pipeline.]

General Semantics: Containers

Rule 1:
  if
    ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
    ( the noun phrase NP2 stands for the thing T2 )
  then
    ( the thing T2 is a container ).

Rule 2:
  if
    ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and
    ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
    ( the noun phrase NP2 stands for the container T2 )
  then
    ( the thing T1 is contained in the container T2 ).

Example: "the patrol in East Rashid discovers the facility."
[Diagram: the noun phrase np1 has as dependent the prepositional phrase pp1. pp1 has as head the word |in| and has as object the noun phrase np2. np1 stands for the thing t1 and np2 stands for the thing t2. The thing t1 is contained in the thing t2, and t2 is a container.]
Least Commitment approach: don't say what sort of container
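
The two container rules can be traced by hand on the example sentence. The Python sketch below does so over a set of triples; it is an illustrative assumption, not the CE Store rule engine, and the names `objects_of` and `apply_container_rules` as well as the triple layout are invented for this example. The identifiers np1, pp1, np2, t1 and t2 follow the diagram.

```python
# Parse facts for "the patrol in East Rashid discovers the facility."
PARSE_FACTS = {
    ("np1", "has as dependent", "pp1"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np1", "stands for", "t1"),
    ("np2", "stands for", "t2"),
}


def objects_of(facts, subject, relation):
    """All values X such that (subject, relation, X) is a known fact."""
    return {o for s, r, o in facts if s == subject and r == relation}


def apply_container_rules(facts):
    """Hypothetical hand-coding of the two container rules on the slide."""
    derived = set()
    # Both rules start from a prepositional phrase headed by the word |in|.
    in_phrases = {s for s, r, o in facts if r == "has as head" and o == "|in|"}
    for pp in in_phrases:
        for np2 in objects_of(facts, pp, "has as object"):
            for t2 in objects_of(facts, np2, "stands for"):
                # Rule 1: the object of 'in' stands for a container.
                derived.add((t2, "is a", "container"))
                # Rule 2: whatever the governing noun phrase stands for
                # is contained in that container.
                governors = {s for s, r, o in facts
                             if r == "has as dependent" and o == pp}
                for np1 in governors:
                    for t1 in objects_of(facts, np1, "stands for"):
                        derived.add((t1, "is contained in", t2))
    return derived


for fact in sorted(apply_container_rules(PARSE_FACTS)):
    print(fact)
```

Note that the derived facts say only that t2 is a container, with no commitment to what kind of container it is, in line with the least commitment approach.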

Specific Semantics: Entities from Noun Phrases

  if
    ( the noun phrase NP has the noun N as head and stands for the thing T ) and
    ( the noun N expresses the entity concept C )
  then
    ( the thing T realises the entity concept C ).

Example: "the patrol in East Rashid discovers the facility."
[Diagram: the noun phrase np1 has as head the noun |patrol| and stands for the thing s1. The noun |patrol| expresses the entity concept 'patrol unit' (link supplied by the Analyst's helper). The thing s1 realises the entity concept 'patrol unit', so s1 is a patrol unit.]
Requires "expresses" link between words and concepts
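
A minimal Python sketch of this rule follows, assuming the "expresses" links are held in a simple word-to-concept dictionary. The names `EXPRESSES` and `realised_concepts`, the dictionary-based noun phrase records, and the decision to return unlinked head nouns separately are all assumptions made for illustration; np2 and s2 are hypothetical identifiers for the second noun phrase of the example sentence.

```python
# Hypothetical lexicon of "expresses" links: noun -> entity concept.
EXPRESSES = {"patrol": "patrol unit"}

# Noun phrases from "the patrol in East Rashid discovers the facility."
NOUN_PHRASES = [
    {"id": "np1", "head": "patrol", "thing": "s1"},
    {"id": "np2", "head": "facility", "thing": "s2"},
]


def realised_concepts(noun_phrases, expresses):
    """Apply the rule: a thing realises the concept its head noun expresses.

    Returns the derived facts plus the head nouns that had no "expresses"
    link, so that they can be referred to the analyst.
    """
    realised, unlinked = [], []
    for phrase in noun_phrases:
        concept = expresses.get(phrase["head"])
        if concept is None:
            unlinked.append(phrase["head"])
        else:
            realised.append((phrase["thing"], "realises", concept))
    return realised, unlinked


facts, missing = realised_concepts(NOUN_PHRASES, EXPRESSES)
print(facts)
print(missing)
```

The sketch makes the dependency explicit: without an entry in the lexicon, no concept can be assigned, which is why the "expresses" link between words and concepts is required.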

Only the analyst knows what the concepts mean
[Diagram labels, "Analyst's Helper": conceptual model, MetaModel generator, meta information, Analyst Helper, NL parser, semantic rules, "expresses", Proper Names, Analyst, wordnet/etc, ITAnet, gazetteers etc, translate. Example statements shown: "the word |www| expresses the concept yyy" and "the word |xxx| is an unrecognised word".]

Current question
• How should the "expresses" link be made more expressive?
  • conditional rules to handle ambiguous words
  • selectional constraints based on semantics of models?
  • introduce VerbNet, etc?
  • ...

CE needs to be enhanced: the ambiguity barrier
• we start from basic CE and move towards full English
• Can we control the crossing of the ambiguity barrier?
[Diagram: an "Ambiguity" axis running from Basic CE to Full English, with an Ambiguity Barrier between them. Language features placed along the axis: domain specific syntax, sub clauses, anaphoric reference, verb inflections, prepositional phrases, flexible identities.]

"Identical" NL and CNL parsers
[Diagram labels: stylistically expressive CE, NLP, Reference English Grammar, CNL Parser, NL Parser, lexicon, Semantic Theory, conceptual model, basic CE or predicate logic or CE-in-Java, stylistically expressive CE]
• Better understanding of linguistics
• Increase stylistic expressibility of CE

Linguistic Frame for semantics

  there is a linguistic frame named vp0 that
    has 'is the dog Fido' as example and
    defines the verb phrase VP_vp0 and
    has the sequence ( the copula BE_vp0 , and the noun phrase OBJ_vp0 ) as syntactic pattern and
    is predicated on the thing T and
    has the statement that
      ( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and
      ( the thing T is the same as the thing OBJ )
    as semantic statement.

  the word |is| belongs to the linguistic category 'copula'.
  the word |dog| is a noun.
  the entity concept ce:Dog is expressed by the word |dog| and has 'dog' as concept term.

[Diagram: the phrase "is the dog fido" analysed as a verb phrase made of a copula ("is") and a noun phrase ("the dog fido"), with a syntax layer and a semantics layer. Semantic annotations: v(T) T=OBJ,... on the verb phrase and v(OBJ), dog(OBJ).. on the noun phrase. The frame belongs to the Linguistic Model and the concept to the Analyst's Conceptual Model.]
We want exactly the same logic here as in the real NL processing

Could we?
• use LKB instead of the Stanford Parser?
• use the ERG instead of WordNet etc?
• where does the Analyst's Helper fit in?
• improve our linguistic model to take account of LKB semantic theory?
• represent MRS in CE?
• represent linguistic rules in CE?