
Using the English Resource Grammar to extend fact extraction capabilities

v1.1. David Mott, IBM UK; Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology; Ann Copestake, University of Cambridge. ITA Fall Meeting, October 2013.



  1. Using the English Resource Grammar to extend fact extraction capabilities (v1.1)
  David Mott, IBM UK; Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology; Ann Copestake, University of Cambridge
  ITA Fall Meeting, October 2013

  2. Research Objectives
  • Extraction of facts in Controlled English from Natural Language documents
    • express the document in a formal but still readable way
    • extracted facts can be used to infer new information
  • Facilitate configuration of NL processing tools in CE
    • the human analyst can be more involved in the NL processing
    • a common model of linguistics, grammar, and semantics
  • Provide rationale for linguistic and analytic processing
    • the human can better understand and review the reasoning
    • facilitate evaluation of the quality of the reasoning
  We are not tasked with creating fundamental breakthroughs in the theory of NL processing.

  3. Supporting the analyst
  [Architecture diagram: documents (with their requirements and assumptions), structured data, reference data, linked data and other data are processed by NLP and the CE tools, using a conceptual model, inference, uncertainty and argumentation handling, query and rationale, to produce CE facts and a product for the analysts.]
  The analyst does not have time to read all the reports.

  4. Working Scenario
  • Imagine you are an analyst in a team, being asked to provide high-value information about events on the ground
  • Based upon reports and background reference material:
    • You want to extract basic facts from these reports and to infer new information
    • You want to have “new ideas” and implement them quickly without IT involvement
    • You want to understand and review the collaborative reasoning of the team, which may contain differing skills
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  Source: SYNCOIN simulated reports. Graham, Rimland, & Hall (2011). A COIN-inspired Synthetic Dataset for Qualitative Evaluation of Hard and Soft Fusion Systems. Proc. 14th International Conference on Information Fusion, Chicago, IL.

  5. The state of the BPP11 research
  [Pipeline diagram: Text → Phrase structures → Generic Semantics → Domain Semantics → Facts, all expressed in Controlled English and feeding the analysts' reasoning to produce high-value facts.]
  • We are using CE
    • as the target language for expressing facts
    • as the shared model of the concepts being expressed
    • as the language to configure NL systems
      • detecting structures in phrases
      • mapping language expressions to concepts
    • as the way to reveal reasoning performed by a collaborative team

  6. Motivation for using DELPH-IN linguistics
  [Diagram of the DELPH-IN components: Linguistic Knowledge Builder (Cambridge), Typed Feature Structures, English Resource Grammar (Stanford), PET parser, Minimal Recursion Semantics (Cambridge), and translation grammars for Japanese, German, Norwegian, Thai, Chinese, Spanish, ...]
  • Collaborate with the DELPH-IN consortium, to extend our NL and fact extraction capabilities
    • the ERG is a high-coverage, high-precision English grammar, developed over 20 years
    • MRS is the representation of semantics
    • the PET parser is an efficient parser
  • Explore Controlled English as a possible facilitator for the use of DELPH-IN linguistic resources
  • Provide an opportunity for research into deeper semantic processing
    • contribute to the NL research community

  7. Integrating CE and the ERG
  • Use the ERG (and PET) to parse sentences and provide phrase structures
  • Use MRS to express generic semantics
  • Represent domain semantics in MRS, by extending generic semantics
  • Research into the integration of domain semantics and linguistic processing
  [Pipeline diagram: Text → Phrase structures (ERG) → Generic Semantics (MRS) → Domain Semantics → Facts, in Controlled English, feeding the analyst's reasoning to produce high-value facts.]

  8. Raw ERG system output
  [Screenshot of the raw output: the parse tree (syntax) and the MRS (semantics).]
  We will turn this into CE.

  9. Defining the ERG lexicon in CE
  • Transformation between the ERG structures (Typed Feature Structures) and CE:
  checkpoint_n1 := n_-_c_le &
    [ ORTH < "checkpoint" >,
      SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
               PHON.ONSET con ] ].
  there is a count noun named checkpoint_n1 that is written as the word |checkpoint| and is a form of the noun sense '_checkpoint_n_1_rel'.
  Is this easier to understand?
  • Mapping between generic semantics and specific semantics (the user has to define this link):
  the noun sense '_checkpoint_n_1_rel' expresses the entity concept 'checkpoint'.
  the noun sense '_carpet_n_1_rel' expresses the entity concept 'carpet'.
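  To make the transformation concrete, the following is a minimal Python sketch of how such a TDL lexical entry could be rewritten as the CE sentence above. It is illustrative only: the function name and the regex-based extraction are assumptions, not the project's implementation, which works over full typed feature structures.

    import re

    # Hypothetical sketch: extract the pieces of a simple TDL noun entry
    # and emit the corresponding Controlled English sentence.
    TDL_ENTRY = '''checkpoint_n1 := n_-_c_le &
      [ ORTH < "checkpoint" >,
        SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
                 PHON.ONSET con ] ].'''

    def tdl_noun_to_ce(entry: str) -> str:
        name = re.match(r'(\w+)\s*:=', entry).group(1)                 # checkpoint_n1
        orth = re.search(r'ORTH\s*<\s*"([^"]+)"', entry).group(1)      # checkpoint
        pred = re.search(r'KEYREL\.PRED\s*"([^"]+)"', entry).group(1)  # _checkpoint_n_1_rel
        return (f"there is a count noun named {name} that is written as the "
                f"word |{orth}| and is a form of the noun sense '{pred}'.")

    print(tdl_noun_to_ce(TDL_ENTRY))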

  10. Defining ERG grammar rules in CE
  The subcomponents of the phrase are the "head daughter" followed by the "non-head" daughter:
  basic_head_initial := basic_binary_headed_phrase &
    [ HD-DTR #head,
      NH-DTR #non-head,
      ARGS < #head, #non-head > ].
  there is a linguistic frame named f1 that defines the basic-head-initial PH and has the sequence ( the sign A0 , and the sign A1 ) as subcomponents and has the statement that ( the basic-head-initial PH has the sign A0 as HD-DTR and has the sign A1 as NH-DTR ) as semantics.
  [Feature structure diagram: a basic-head-initial thing with a sign as HD-DTR, a sign as NH-DTR, and an ARGS list whose 0th and 1st elements are those signs.]
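  As an illustration of the frame's meaning (not part of the project's tooling), the sketch below models the ordered subcomponents of a basic-head-initial phrase as a small Python structure; the class and field names are assumptions.

    from dataclasses import dataclass

    @dataclass
    class Sign:
        label: str

    @dataclass
    class BasicHeadInitialPhrase:
        hd_dtr: Sign    # head daughter: first subcomponent
        nh_dtr: Sign    # non-head daughter: second subcomponent

        @property
        def args(self):
            # ARGS is the ordered sequence: head daughter, then non-head daughter
            return [self.hd_dtr, self.nh_dtr]

    phrase = BasicHeadInitialPhrase(Sign("verb 'need'"), Sign("NP 'new carpet'"))
    print([s.label for s in phrase.args])   # ["verb 'need'", "NP 'new carpet'"]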

  11. Three stage approach to defining MRS in CE
  • Generate a raw representation of:
    • elementary predications (EPs) as objects with predicate and arguments
    • scope information between EPs
    • features of the entities involved
  • Extract intermediate, but generic, concepts describing the raw MRS:
    • patterns of quantification
    • …
  • Transform into domain-specific CE concepts:
    • using links between the predicate and the CE concept
    • …
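  The three stages can be pictured as a simple pipeline; the sketch below is an outline only, with the real logic elided, and the function names and signatures are assumptions for illustration.

    def stage1_raw_mrs(mrs_output):
        """Re-express the parser's MRS output as raw CE: EPs, scope constraints, entity features."""
        ...

    def stage2_intermediate(raw_ce_facts):
        """Detect generic patterns over the raw MRS facts, e.g. kinds of quantification."""
        ...

    def stage3_domain(intermediate_facts, expresses_links):
        """Apply the 'predicate X expresses concept Y' links to produce domain CE facts."""
        ...

    def mrs_to_domain_ce(mrs_output, expresses_links):
        return stage3_domain(stage2_intermediate(stage1_raw_mrs(mrs_output)), expresses_links)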

  12. Step 1 - CE version of raw MRS
  [Screenshot of the raw MRS expressed in CE, annotated: x5 is "I", x9 is "new carpet", and x5 "needs" x9.]
  This still needs to be turned into more understandable concepts…

  13. 3 Steps to Domain Semantics
  Raw:
  the mrs elementary predication #ep7_3 is an instance of the mrs predicate '_udef_q_rel' and has the thing x9_8 as zeroth argument.
  the mrs elementary predication #ep7_5 is an instance of the mrs predicate '_carpet_n_1_rel' and has the thing x9_8 as zeroth argument.
  the mrs elementary predication #ep7_3 equals modulo quantifiers the mrs elementary predication #ep7_5.
  A rule detects the quantifier pattern in the MRS, giving the intermediate representation.
  Intermediate:
  there is an indefinite quantification named q2 that is on the thing x9_8 and has the mrs predicate '_carpet_n_1_rel' as sense.
  the mrs predicate '_carpet_n_1_rel' expresses the entity concept 'carpet'.
  if ( there is an indefinite quantification Q that is on the thing T and has the mrs predicate MRS as sense ) and ( the mrs predicate MRS expresses the entity concept EC ) then ( the thing T is an EC ).
  Domain:
  the thing x9_8 is a carpet.
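  The same three steps can be sketched in Python as a standalone illustration (not the project's CE engine): the EPs are simplified to (label, predicate, zeroth-argument) triples, and the EXPRESSES table stands in for the user-written "expresses" statements.

    # Raw: the two elementary predications that share the variable x9_8
    RAW_EPS = [
        ("ep7_3", "_udef_q_rel",     "x9_8"),
        ("ep7_5", "_carpet_n_1_rel", "x9_8"),
    ]
    # Links between MRS predicates and domain concepts (user-defined)
    EXPRESSES = {"_carpet_n_1_rel": "carpet"}

    def indefinite_quantifications(eps):
        # Intermediate: pair each _udef_q_rel with the non-quantifier EP on the same variable
        for (_, qpred, qvar) in eps:
            if qpred != "_udef_q_rel":
                continue
            for (_, npred, nvar) in eps:
                if npred != "_udef_q_rel" and nvar == qvar:
                    yield qvar, npred          # (thing, sense)

    def domain_facts(eps):
        # Domain: mirror of the CE rule "if ... indefinite quantification ... then the thing T is an EC"
        for thing, sense in indefinite_quantifications(eps):
            concept = EXPRESSES.get(sense)
            if concept:
                yield f"the thing {thing} is a {concept}."

    print(list(domain_facts(RAW_EPS)))         # ['the thing x9_8 is a carpet.']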

  14. Facts extracted from example sentence
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  This requires a number of linguistic and domain-specific steps. If other reports can add information about the man x5_8, then we may know who is requiring new carpet, and could perhaps predict future events.

  15. Discussion
  • The DELPH-IN community has developed excellent Natural Language capabilities
  • We are integrating the “ERG system” and expressing the lexicon, grammar rules and semantics in CE
  • However, in the ERG system the semantics are not completely separated from the linguistic structures
    • we propose intermediate semantic structures in CE, bridging the gap between generic and domain semantics
  • We are introducing domain semantics to represent facts in CE
    • this provides a “target” for the output of the ERG system
    • and an opportunity to explore how it can affect the parsing of sentences
  • Much needs to be done:
    • improve the integration
    • extend the intermediate MRS
    • obtain rationale
    • feed semantic reasoning back into the parsing
    • mechanisms to help with adding and understanding rules

  16. Extra

  17. Information Flow
  [Diagram of the information flow: text is parsed by the PET parser (using the ERG rules, types and lexicon) into a PET parse tree and MRS, which are expressed as raw MRS in CE; the Stanford Parser provides shallow processing, with its parse tree also expressed in CE; the ERG lexicon and the ERG rules & types are transformed (using the same transformation, to be consistent) into a CE lexicon and CE linguistic frames under the conceptual model, and CE facts are produced. Red links in the diagram have been partially implemented.]

  18. Rationale
  “the group of things x10 has the entity concept survey as categorisation.”
  • The rationale from the elementary predicates is shown [in the screenshot on the slide].
  • How do we get the rationale FOR the elementary predicates?
    • we could follow the parse tree plus the TFS definitions, but we need a link between the parse tree and the MRS, which is so far not available

  19. A layered Conceptual Model
  Our semiotic triangle, based on Ogden, C. K. and Richards, I. A. (1923).

  20. The ERG system architecture
  • PET is run under Linux (Debian) in an Oracle VirtualBox image
  • A Prolog program provides a web service for parsing sentences and turning the result into CE
  • Aiming to integrate with our CE Store
  [Diagram: a sentence is sent to the PET parser with the ERG; the parse tree and MRS are passed to the Prolog CE generator, and the Prolog web service returns the parse tree and MRS as CE.]
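  A client of such a web service could look like the sketch below; the endpoint URL, port, parameter name and response format are assumptions for illustration only, since the slide states just that a Prolog program exposes parsing as a web service returning CE.

    import urllib.parse
    import urllib.request

    SERVICE_URL = "http://localhost:8085/parse"   # hypothetical endpoint

    def parse_to_ce(sentence: str) -> str:
        # POST the sentence to the parsing service and return the CE for the parse tree and MRS
        data = urllib.parse.urlencode({"sentence": sentence}).encode()
        with urllib.request.urlopen(SERVICE_URL, data=data) as resp:
            return resp.read().decode()

    # Example (requires the service to be running in the VirtualBox image):
    # print(parse_to_ce("I will need new carpet for my house."))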

  21. Feedback of domain reasoning to the parsing?
  • We want the domain to affect the parse, e.g.:
    • creating new lexical entries and grammar rules prior to parsing
  • But we also want arbitrary domain reasoning to affect the parse at runtime
  • Could this:
    • rule out inconsistent parses?
    • provide disambiguations, and dialogue context?
  [Diagram: the domain model supplies lexical entries and grammar rules to the ERG; the domain reasoner exchanges facts and constraints on linguistic phenomena with ERG/PET.]

  22. Linking text to domain situations

  23. Working out the “requirer”
  • This can only be done by analysis of the communications as a whole (including anaphoric reference)
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  [Step A and Step C are marked on the report.]
  • Step A needs linguistic knowledge
  • Step C needs knowledge of the structure of the report and of communications

  24. Example CE rules
  Domain rule:
  if ( the communication C has the agent A as initiator ) and ( the agent A is located in the place P ) then ( the communication C is from the place P ).
  Linguistic rule:
  if ( the mrs elementary predication EP is an instance of the mrs predicate '_in_p_rel' and has the thing T as first argument and has the thing C as second argument ) then ( the thing T is contained in the container C ).
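  The domain rule above amounts to a simple forward-chaining step over extracted facts; a minimal Python sketch follows, where the fact tuples and identifiers are illustrative rather than the project's CE store.

    # Facts as (relation, subject, object) triples; the names are hypothetical
    facts = {
        ("initiator", "call_1", "male_7115452376"),
        ("located in", "male_7115452376", "Bayaa"),
    }

    def apply_domain_rule(facts):
        # if C has A as initiator and A is located in P then C is from P
        inferred = set()
        for (rel1, c, a) in facts:
            if rel1 != "initiator":
                continue
            for (rel2, a2, p) in facts:
                if rel2 == "located in" and a2 == a:
                    inferred.add(("is from", c, p))
        return inferred

    print(apply_domain_rule(facts))   # {('is from', 'call_1', 'Bayaa')}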

  25. Domain Situations
  [Diagram of the domain situations, each of which has a material: a requirement (requested by an agent, needed by an agent, with the material requested from an agent), a production (produced by an agent), a delivery (delivered by an agent, delivered to an agent), and a usage (performed by an agent). Annotation: are these the same agent?]

  26. CE representation for parse tree

  27. Defining ERG grammar rules in CE
  Analysis of the rules for hd_cmp_u_c:
  basic_head_initial := basic_binary_headed_phrase &
    [ HD-DTR #head,
      NH-DTR #non-head,
      ARGS < #head, #non-head > ].
  (an ordered sequence of subcomponents: the head daughter followed by the non-head daughter)
  headed_phrase := phrase &
    [ SYNSEM.LOCAL [ CAT [ HEAD head & #head,
                           HC-LEX #hclex ],
                     AGR #agr, CONJ #conj ],
      HD-DTR.SYNSEM.LOCAL local & [ CAT [ HEAD #head,
                                          HC-LEX #hclex ],
                                    AGR #agr, CONJ #conj ] ].
  (some information is passed up from the head daughter to “this” phrase)

  28. Example CE rules
  Domain rule:
  if ( the communication C has the agent A as initiator ) and ( the agent A is located in the place P ) then ( the communication C is from the place P ).
  Linguistic rule:
  if ( the mrs elementary predication EP is an instance of the mrs predicate '_in_p_rel' and has the thing T as first argument and has the thing C as second argument ) then ( the thing T is contained in the container C ).

  29. Calling the ERG system from Word
