
Using the English Resource Grammar to extend fact extraction capabilities

v1.1. David Mott, IBM UK; Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology; Ann Copestake, University of Cambridge. ITA Fall Meeting, October 2013.



  1. Using the English Resource Grammar to extend fact extraction capabilities (v1.1)
  David Mott, IBM UK; Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology; Ann Copestake, University of Cambridge
  ITA Fall Meeting, October 2013

  2. Research Objectives
  • Extraction of facts in Controlled English from Natural Language documents
    • express the document in a formal but still readable way
    • extracted facts can be used to infer new information
  • Facilitate configuration of NL processing tools in CE
    • the human analyst can be more involved in the NL processing
    • a common model of linguistics, grammar, and semantics
  • Provide rationale for linguistic and analytic processing
    • the human can better understand and review the reasoning
    • facilitate evaluation of the quality of the reasoning
  We are not tasked with creating fundamental breakthroughs in the theory of NL processing.

  3. Supporting the analyst
  [Architecture diagram: documents (with their requirements and assumptions), structured data, reference data, linked data and other data are processed by NLP and the CE tools, using a conceptual model, inference, uncertainty and argumentation handling, query and rationale, to produce CE facts and a product for the analysts.]
  The analyst does not have time to read all the reports.

  4. Working Scenario
  • Imagine you are an analyst in a team, being asked to provide high-value information about events on the ground
  • Based upon reports and background reference material:
    • You want to extract basic facts from these reports and to infer new information
    • You want to have “new ideas” and implement them quickly without IT involvement
    • You want to understand and review the collaborative reasoning of the team, which may contain differing skills
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  Source: SYNCOIN simulated reports. Graham, Rimland, & Hall (2011). A COIN-inspired Synthetic Dataset for Qualitative Evaluation of Hard and Soft Fusion Systems. Proc. 14th International Conference on Information Fusion, Chicago, IL.

  5. The state of the BPP11 research
  [Pipeline diagram: Text → Phrase structures → Generic Semantics → Domain Semantics → Facts, all expressed in Controlled English and feeding the analysts' reasoning to produce high-value facts.]
  • We are using CE
    • as the target language for expressing facts
    • as the shared model of the concepts being expressed
    • as the language to configure NL systems
      • detecting structures in phrases
      • mapping language expressions to concepts
    • as the way to reveal reasoning performed by a collaborative team

  6. Motivation for using DELPH-IN linguistics
  [Diagram of the DELPH-IN components: Linguistic Knowledge Builder (Cambridge), Typed Feature Structures, English Resource Grammar (Stanford), PET parser, Minimal Recursion Semantics (Cambridge), and translation grammars for Japanese, German, Norwegian, Thai, Chinese, Spanish, ...]
  • Collaborate with the DELPH-IN consortium, to extend our NL and fact extraction capabilities
    • the ERG is a high-coverage, high-precision English grammar, developed over 20 years
    • MRS is the representation of semantics
    • the PET parser is an efficient parser
  • Explore Controlled English as a possible facilitator for the use of DELPH-IN linguistic resources
  • Provide an opportunity for research into deeper semantic processing
    • contribute to the NL research community

  7. Integrating CE and the ERG
  • Use the ERG (and PET) to parse sentences and provide phrase structures
  • Use MRS to express generic semantics
  • Represent domain semantics in MRS, by extending generic semantics
  • Research into the integration of domain semantics and linguistic processing
  [Pipeline diagram: Text → Phrase structures (ERG) → Generic Semantics (MRS) → Domain Semantics → Facts, in Controlled English, feeding the analyst's reasoning to produce high-value facts.]

  8. Raw ERG system output
  [Screenshot of the raw output: the parse tree (syntax) and the MRS (semantics).]
  We will turn this into CE.

  9. Defining the ERG lexicon in CE
  • Transformation between the ERG structures (Typed Feature Structures) and CE:
  checkpoint_n1 := n_-_c_le &
    [ ORTH < "checkpoint" >,
      SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
               PHON.ONSET con ] ].
  there is a count noun named checkpoint_n1 that is written as the word |checkpoint| and is a form of the noun sense '_checkpoint_n_1_rel'.
  Is this easier to understand?
  • Mapping between generic semantics and specific semantics (the user has to define this link):
  the noun sense '_checkpoint_n_1_rel' expresses the entity concept 'checkpoint'.
  the noun sense '_carpet_n_1_rel' expresses the entity concept 'carpet'.
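  To make the transformation concrete, the following is a minimal Python sketch of how such a TDL lexical entry could be rewritten as the CE sentence above. It is illustrative only: the function name and the regex-based extraction are assumptions, not the project's implementation, which works over full typed feature structures.

    import re

    # Hypothetical sketch: extract the pieces of a simple TDL noun entry
    # and emit the corresponding Controlled English sentence.
    TDL_ENTRY = '''checkpoint_n1 := n_-_c_le &
      [ ORTH < "checkpoint" >,
        SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
                 PHON.ONSET con ] ].'''

    def tdl_noun_to_ce(entry: str) -> str:
        name = re.match(r'(\w+)\s*:=', entry).group(1)                 # checkpoint_n1
        orth = re.search(r'ORTH\s*<\s*"([^"]+)"', entry).group(1)      # checkpoint
        pred = re.search(r'KEYREL\.PRED\s*"([^"]+)"', entry).group(1)  # _checkpoint_n_1_rel
        return (f"there is a count noun named {name} that is written as the "
                f"word |{orth}| and is a form of the noun sense '{pred}'.")

    print(tdl_noun_to_ce(TDL_ENTRY))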

  10. Defining ERG grammar rules in CE
  The subcomponents of the phrase are the "head daughter" followed by the "non-head" daughter:
  basic_head_initial := basic_binary_headed_phrase &
    [ HD-DTR #head,
      NH-DTR #non-head,
      ARGS < #head, #non-head > ].
  there is a linguistic frame named f1 that defines the basic-head-initial PH and has the sequence ( the sign A0 , and the sign A1 ) as subcomponents and has the statement that ( the basic-head-initial PH has the sign A0 as HD-DTR and has the sign A1 as NH-DTR ) as semantics.
  [Feature structure diagram: a basic-head-initial thing with a sign as HD-DTR, a sign as NH-DTR, and an ARGS list whose 0th and 1st elements are those signs.]
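  As an illustration of the frame's meaning (not part of the project's tooling), the sketch below models the ordered subcomponents of a basic-head-initial phrase as a small Python structure; the class and field names are assumptions.

    from dataclasses import dataclass

    @dataclass
    class Sign:
        label: str

    @dataclass
    class BasicHeadInitialPhrase:
        hd_dtr: Sign    # head daughter: first subcomponent
        nh_dtr: Sign    # non-head daughter: second subcomponent

        @property
        def args(self):
            # ARGS is the ordered sequence: head daughter, then non-head daughter
            return [self.hd_dtr, self.nh_dtr]

    phrase = BasicHeadInitialPhrase(Sign("verb 'need'"), Sign("NP 'new carpet'"))
    print([s.label for s in phrase.args])   # ["verb 'need'", "NP 'new carpet'"]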

  11. Three stage approach to defining MRS in CE
  • Generate a raw representation of:
    • elementary predications (EPs) as objects with predicate and arguments
    • scope information between EPs
    • features of the entities involved
  • Extract intermediate, but generic, concepts describing the raw MRS:
    • patterns of quantification
    • …
  • Transform into domain-specific CE concepts:
    • using links between the predicate and the CE concept
    • …
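  The three stages can be pictured as a simple pipeline; the sketch below is an outline only, with the real logic elided, and the function names and signatures are assumptions for illustration.

    def stage1_raw_mrs(mrs_output):
        """Re-express the parser's MRS output as raw CE: EPs, scope constraints, entity features."""
        ...

    def stage2_intermediate(raw_ce_facts):
        """Detect generic patterns over the raw MRS facts, e.g. kinds of quantification."""
        ...

    def stage3_domain(intermediate_facts, expresses_links):
        """Apply the 'predicate X expresses concept Y' links to produce domain CE facts."""
        ...

    def mrs_to_domain_ce(mrs_output, expresses_links):
        return stage3_domain(stage2_intermediate(stage1_raw_mrs(mrs_output)), expresses_links)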

  12. Step 1 - CE version of raw MRS
  [Screenshot of the raw MRS expressed in CE, annotated: x5 is "I", x9 is "new carpet", and x5 "needs" x9.]
  This still needs to be turned into more understandable concepts…

  13. 3 Steps to Domain Semantics
  Raw:
  the mrs elementary predication #ep7_3 is an instance of the mrs predicate '_udef_q_rel' and has the thing x9_8 as zeroth argument.
  the mrs elementary predication #ep7_5 is an instance of the mrs predicate '_carpet_n_1_rel' and has the thing x9_8 as zeroth argument.
  the mrs elementary predication #ep7_3 equals modulo quantifiers the mrs elementary predication #ep7_5.
  A rule detects the quantifier pattern in the MRS, giving the intermediate representation.
  Intermediate:
  there is an indefinite quantification named q2 that is on the thing x9_8 and has the mrs predicate '_carpet_n_1_rel' as sense.
  the mrs predicate '_carpet_n_1_rel' expresses the entity concept 'carpet'.
  if ( there is an indefinite quantification Q that is on the thing T and has the mrs predicate MRS as sense ) and ( the mrs predicate MRS expresses the entity concept EC ) then ( the thing T is an EC ).
  Domain:
  the thing x9_8 is a carpet.
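  The same three steps can be sketched in Python as a standalone illustration (not the project's CE engine): the EPs are simplified to (label, predicate, zeroth-argument) triples, and the EXPRESSES table stands in for the user-written "expresses" statements.

    # Raw: the two elementary predications that share the variable x9_8
    RAW_EPS = [
        ("ep7_3", "_udef_q_rel",     "x9_8"),
        ("ep7_5", "_carpet_n_1_rel", "x9_8"),
    ]
    # Links between MRS predicates and domain concepts (user-defined)
    EXPRESSES = {"_carpet_n_1_rel": "carpet"}

    def indefinite_quantifications(eps):
        # Intermediate: pair each _udef_q_rel with the non-quantifier EP on the same variable
        for (_, qpred, qvar) in eps:
            if qpred != "_udef_q_rel":
                continue
            for (_, npred, nvar) in eps:
                if npred != "_udef_q_rel" and nvar == qvar:
                    yield qvar, npred          # (thing, sense)

    def domain_facts(eps):
        # Domain: mirror of the CE rule "if ... indefinite quantification ... then the thing T is an EC"
        for thing, sense in indefinite_quantifications(eps):
            concept = EXPRESSES.get(sense)
            if concept:
                yield f"the thing {thing} is a {concept}."

    print(list(domain_facts(RAW_EPS)))         # ['the thing x9_8 is a carpet.']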

  14. Facts extracted from example sentence
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  This requires a number of linguistic and domain-specific steps. If other reports can add information about the man x5_8, then we may know who is requiring new carpet, and could perhaps predict future events.

  15. Discussion
  • The DELPH-IN community has developed excellent Natural Language capabilities
  • We are integrating the “ERG system” and expressing the lexicon, grammar rules and semantics in CE
  • However, in the ERG system the semantics are not completely separated from the linguistic structures
    • we propose intermediate semantic structures in CE, bridging the gap between generic and domain semantics
  • We are introducing domain semantics to represent facts in CE
    • this provides a “target” for the output of the ERG system
    • and an opportunity to explore how it can affect the parsing of sentences
  • Much needs to be done:
    • improve the integration
    • extend the intermediate MRS
    • obtain rationale
    • feed semantic reasoning back into the parsing
    • mechanisms to help with adding and understanding rules

  16. Extra

  17. Information Flow
  [Diagram of the information flow: text is parsed by the PET parser (using the ERG rules, types and lexicon) into a PET parse tree and MRS, which are expressed as raw MRS in CE; the Stanford Parser provides shallow processing, with its parse tree also expressed in CE; the ERG lexicon and the ERG rules & types are transformed (using the same transformation, to be consistent) into a CE lexicon and CE linguistic frames under the conceptual model, and CE facts are produced. Red links in the diagram have been partially implemented.]

  18. Rationale
  “the group of things x10 has the entity concept survey as categorisation.”
  • The rationale from the elementary predicates is shown [in the screenshot on the slide].
  • How do we get the rationale FOR the elementary predicates?
    • we could follow the parse tree plus the TFS definitions, but we need a link between the parse tree and the MRS, which is so far not available

  19. A layered Conceptual Model
  Our semiotic triangle, based on Ogden, C. K. and Richards, I. A. (1923).

  20. The ERG system architecture
  • PET is run under Linux (Debian) in an Oracle VirtualBox image
  • A Prolog program provides a web service for parsing sentences and turning the result into CE
  • Aiming to integrate with our CE Store
  [Diagram: a sentence is sent to the PET parser with the ERG; the parse tree and MRS are passed to the Prolog CE generator, and the Prolog web service returns the parse tree and MRS as CE.]
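  A client of such a web service could look like the sketch below; the endpoint URL, port, parameter name and response format are assumptions for illustration only, since the slide states just that a Prolog program exposes parsing as a web service returning CE.

    import urllib.parse
    import urllib.request

    SERVICE_URL = "http://localhost:8085/parse"   # hypothetical endpoint

    def parse_to_ce(sentence: str) -> str:
        # POST the sentence to the parsing service and return the CE for the parse tree and MRS
        data = urllib.parse.urlencode({"sentence": sentence}).encode()
        with urllib.request.urlopen(SERVICE_URL, data=data) as resp:
            return resp.read().decode()

    # Example (requires the service to be running in the VirtualBox image):
    # print(parse_to_ce("I will need new carpet for my house."))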

  21. Feedback of domain reasoning to the parsing?
  • We want the domain to affect the parse, e.g.:
    • creating new lexical entries and grammar rules prior to parsing
  • But we also want arbitrary domain reasoning to affect the parse at runtime
  • Could this:
    • rule out inconsistent parses?
    • provide disambiguations, and dialogue context?
  [Diagram: the domain model supplies lexical entries and grammar rules to the ERG; the domain reasoner exchanges facts and constraints on linguistic phenomena with ERG/PET.]

  22. Linking text to domain situations

  23. Working out the “requirer”
  • This can only be done by analysis of the communications as a whole (including anaphoric reference)
  02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
  [Step A and Step C are marked on the report.]
  • Step A needs linguistic knowledge
  • Step C needs knowledge of the structure of the report and of communications

  24. Example CE rules
  Domain rule:
  if ( the communication C has the agent A as initiator ) and ( the agent A is located in the place P ) then ( the communication C is from the place P ).
  Linguistic rule:
  if ( the mrs elementary predication EP is an instance of the mrs predicate '_in_p_rel' and has the thing T as first argument and has the thing C as second argument ) then ( the thing T is contained in the container C ).
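  The domain rule above amounts to a simple forward-chaining step over extracted facts; a minimal Python sketch follows, where the fact tuples and identifiers are illustrative rather than the project's CE store.

    # Facts as (relation, subject, object) triples; the names are hypothetical
    facts = {
        ("initiator", "call_1", "male_7115452376"),
        ("located in", "male_7115452376", "Bayaa"),
    }

    def apply_domain_rule(facts):
        # if C has A as initiator and A is located in P then C is from P
        inferred = set()
        for (rel1, c, a) in facts:
            if rel1 != "initiator":
                continue
            for (rel2, a2, p) in facts:
                if rel2 == "located in" and a2 == a:
                    inferred.add(("is from", c, p))
        return inferred

    print(apply_domain_rule(facts))   # {('is from', 'call_1', 'Bayaa')}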

  25. Domain Situations
  [Diagram of the domain situations, each of which has a material: a requirement (requested by an agent, needed by an agent, with the material requested from an agent), a production (produced by an agent), a delivery (delivered by an agent, delivered to an agent), and a usage (performed by an agent). Annotation: are these the same agent?]

  26. CE representation for parse tree

  27. Defining ERG grammar rules in CE
  Analysis of the rules for hd_cmp_u_c:
  basic_head_initial := basic_binary_headed_phrase &
    [ HD-DTR #head,
      NH-DTR #non-head,
      ARGS < #head, #non-head > ].
  (an ordered sequence of subcomponents: the head daughter followed by the non-head daughter)
  headed_phrase := phrase &
    [ SYNSEM.LOCAL [ CAT [ HEAD head & #head,
                           HC-LEX #hclex ],
                     AGR #agr, CONJ #conj ],
      HD-DTR.SYNSEM.LOCAL local & [ CAT [ HEAD #head,
                                          HC-LEX #hclex ],
                                    AGR #agr, CONJ #conj ] ].
  (some information is passed up from the head daughter to “this” phrase)

  28. Example CE rules
  Domain rule:
  if ( the communication C has the agent A as initiator ) and ( the agent A is located in the place P ) then ( the communication C is from the place P ).
  Linguistic rule:
  if ( the mrs elementary predication EP is an instance of the mrs predicate '_in_p_rel' and has the thing T as first argument and has the thing C as second argument ) then ( the thing T is contained in the container C ).

  29. Calling the ERG system from Word
