
RMRS


bermane




Presentation Transcript


  1. RMRS: some background and current work

  2. Talk overview
     • RMRS: integrating processors via semantics
     • Underspecified semantics from shallow processing
     • Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)
     • Planned work

  3. Integrating processing
     • No single system can do everything: deep and shallow processing have inherent strengths and weaknesses
     • Domain-dependent and domain-independent processing must be linked
     • Parsers and generators
     • Common representation for processing ‘above sentence level’ (e.g., anaphora)

  4. Compositional semantics as a common representation
     • Systems need a common representation language: pairwise compatibility between systems is too limiting
     • Syntax is theory-specific and unnecessarily language-specific
     • The eventual goal should be semantics
     • Core idea: shallow processing gives an underspecified semantic representation, so deep and shallow systems can be integrated
     • A full interlingua / common lexical semantics is too difficult (certainly at present), but predicates can be linked to ontologies, etc.

  5. Shallow processing and underspecified semantics
     • Integrated parsing: shallow-parsed phrases incorporated into deep-parsed structures
     • Deep parsing invoked incrementally in response to information needs
     • Reuse of knowledge sources:
        • domain knowledge, recognition of named entities, transfer rules in MT
     • Integrated generation
     • Formal properties clearer, representations more generally usable
     • Deep semantics taken as normative

  6. RMRS approach: current and planned applications
     • Question answering:
        • Cambridge CSTIT: deep parse questions, shallow parse answers
        • QA from structured knowledge: Frank et al.
     • Information extraction:
        • Deep Thought
        • Chemistry texts (SciBorg (?))
     • Dictionary definition parsing for Japanese and English
        • Bond and Flickinger
     • Rhetorical structure, multi-document summarization, email response ...
     • Also LOGON: semantic transfer. MRSs from LFG used in HPSG generator.

  7. RMRS: Extreme underspecification
     • Goal is to split up semantic representation into minimal components (cf. Verbmobil VITs)
     • Scope underspecification (MRS)
     • Splitting up predicate-argument structure
     • Explicit equalities
     • Hierarchies for predicates and sorts
     • Compatibility with deep grammars:
        • Sorts and (some) closed-class word information in SEM-I (API for grammar, more later)
        • No lexicon for shallow processing (apart from POS tags and possibly closed-class words)

  8. RMRS principles
     • Split up information content as much as possible
     • Accumulate information monotonically by simple operations
     • Don’t represent what you don’t know, but preserve everything you do know
     • Use a flat representation to allow pieces to be accessed individually

  9. Separating arguments
     lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2, h8=lb5
     goes to:
     lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6),
     lb2:cat(x), lb5:dog1(y),
     lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7),
     lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y),
     h9=lb2, h8=lb5
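The transformation on this slide is mechanical: each EP keeps only its first (characteristic) variable, and every other argument becomes a separate relation that names the EP's label. A minimal Python sketch, assuming the caller flags quantifiers and that a quantifier's two extra arguments are RSTR and BODY in that order; `EP`, `ArgRel`, `split_arguments` and the formatting helpers are hypothetical names, not an existing RMRS API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """An elementary predication: label, predicate and ordered arguments."""
    label: str
    pred: str
    args: tuple  # args[0] is the characteristic variable


@dataclass(frozen=True)
class ArgRel:
    """A separated argument relation such as RSTR(lb1,h9) or ARG1(lb3,x)."""
    name: str
    label: str
    value: str


def split_arguments(ep, is_quantifier):
    """Return the EP reduced to its characteristic variable, plus ArgRels."""
    head = EP(ep.label, ep.pred, ep.args[:1])
    rest = ep.args[1:]
    if is_quantifier:
        if len(rest) != 2:
            raise ValueError("a quantifier should have exactly RSTR and BODY")
        names = ("RSTR", "BODY")
    else:
        # Assumption: non-quantifier arguments are numbered ARG1, ARG2, ...
        names = tuple(f"ARG{i}" for i in range(1, len(rest) + 1))
    return head, [ArgRel(n, ep.label, v) for n, v in zip(names, rest)]


def fmt_ep(ep):
    return f"{ep.label}:{ep.pred}({','.join(ep.args)})"


def fmt_arg(rel):
    return f"{rel.name}({rel.label},{rel.value})"


if __name__ == "__main__":
    mrs = [
        (EP("lb1", "every", ("x", "h9", "h6")), True),
        (EP("lb2", "cat", ("x",)), False),
        (EP("lb3", "chase", ("e", "x", "y")), False),
    ]
    for ep, quant in mrs:
        head, rels = split_arguments(ep, quant)
        # e.g. "lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6)"
        print(", ".join([fmt_ep(head)] + [fmt_arg(r) for r in rels]))
```

Because the output is a flat list of small facts, a shallow processor can emit `lb3:chase(e)` alone and leave the ARG relations out, which is exactly the kind of underspecification the previous slides call for.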

  10. Naming conventions: predicate names without a lexicon
      lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
      lb2:_cat_n(x2sg),
      lb5:_dog_n_1(x4sg),
      lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),
      lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg),
      h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
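Here each quantifier gets its own variable (x1sg, x3sg), and explicit equalities such as x1sg=x2sg link it to the noun's variable. Resolving those equalities into groups is a standard union-find problem. A sketch of that step only; `canonicalise` is a hypothetical helper, and a fuller version would presumably also check that merged variables have compatible sorts and features (e.g. sg against pl):

```python
def canonicalise(equalities):
    """Group variables linked by explicit equalities (union-find sketch).

    Returns a dict mapping every variable mentioned to one representative.
    The alphabetically smallest name in each group is kept as the root, so
    the result does not depend on the order of the equalities.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in equalities:
        ra, rb = find(a), find(b)
        if ra != rb:
            keep, drop = min(ra, rb), max(ra, rb)
            parent[drop] = keep
    return {v: find(v) for v in list(parent)}


if __name__ == "__main__":
    eqs = [("h9", "lb2"), ("h8", "lb5"), ("x1sg", "x2sg"), ("x3sg", "x4sg")]
    canon = canonicalise(eqs)
    print(canon["x2sg"], canon["x4sg"], canon["lb2"])  # x1sg x3sg h9
```

Keeping equalities as separate facts, rather than renaming variables at construction time, lets each processor add only the links it is sure of.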

  11. POS output as underspecification
      DEEP: lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
      POS: lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)

  12. POS output as underspecification
      DEEP: lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x3sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
      POS: lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)
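The POS output differs from the deep output by being less specific: no RSTR/BODY/ARG relations, and `_dog_n` where the deep grammar has `_dog_n_1`. If predicate names follow the `_lemma_pos_sense` pattern used on these slides, a simple compatibility test compares the name parts left to right, treating a missing part as unspecified. This is a sketch under that naming assumption (and assuming lemmas contain no underscore); `parse_pred` and `subsumes` are hypothetical helpers, and a real predicate hierarchy may be richer:

```python
def parse_pred(name):
    """Split '_dog_n_1' into ('dog', 'n', '1'); absent parts become None.

    Assumes the lemma itself contains no underscore.
    """
    parts = name.lstrip("_").split("_")
    parts += [None] * (3 - len(parts))
    return tuple(parts[:3])


def subsumes(general, specific):
    """True if every part given in `general` matches the part in `specific`."""
    return all(g is None or g == s
               for g, s in zip(parse_pred(general), parse_pred(specific)))


if __name__ == "__main__":
    print(subsumes("_dog_n", "_dog_n_1"))   # True: POS output is less specific
    print(subsumes("_dog_n_1", "_dog_n"))   # False: deep output is more specific
    print(subsumes("_cat_n", "_dog_n_1"))   # False: different lemma
```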

  13. Semantics from RASP
      • RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)
      • Can’t produce conventional semantics, because it has no subcategorization information
      • Can often identify arguments:
         • S -> NP VP: NP supplies ARG1 for V
      • Potential for partial identification:
         • VP -> V NP
         • S -> NP S
         • NP might be ARG2 or ARG3

  14. Underspecification of arguments
      [Figure: hierarchy of argument labels, from most general to most specific: ARGN; ARG1or2, ARG2or3; ARG1, ARG2, ARG3]
      • RASP arguments can be specified as ARGN, ARG2or3, etc.
      • Also useful for Japanese deep parsing?
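A label such as ARG2or3 is only useful if a consumer can tell which deep labels it is compatible with. A sketch, assuming the hierarchy the label names suggest: ARGN above everything, ARG1or2 and ARG2or3 in the middle, and ARG2 under both middle labels. The `PARENTS` table and the helper names are hypothetical:

```python
# Assumed hierarchy: each label maps to the labels immediately above it.
PARENTS = {
    "ARGN": set(),
    "ARG1or2": {"ARGN"},
    "ARG2or3": {"ARGN"},
    "ARG1": {"ARG1or2"},
    "ARG2": {"ARG1or2", "ARG2or3"},
    "ARG3": {"ARG2or3"},
}


def ancestors(label):
    """All labels that subsume `label`, including `label` itself."""
    seen, stack = set(), [label]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen


def compatible(a, b):
    """Two labels are compatible if some label in the hierarchy falls under both."""
    return any({a, b} <= ancestors(t) for t in PARENTS)


if __name__ == "__main__":
    print(compatible("ARG2or3", "ARG2"))     # True: RASP label refines to ARG2
    print(compatible("ARG1or2", "ARG2or3"))  # True: both cover ARG2
    print(compatible("ARG1", "ARG2or3"))     # False
```

Compatibility here means "has a common subtype", which is what monotonic accumulation needs: combining ARG1or2 with ARG2or3 can only ever yield ARG2.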

  15. RMRS construction
      • ERG etc.: uses MRS -> RMRS converter
         • argument splitting etc.
         • also RMRS -> MRS conversion
      • POS-RMRS: tag lexicon
      • RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules to match ERG
         • defaults when no rule RMRS specified
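POS-RMRS needs nothing but a tag lexicon: each tagged, lemmatized token becomes one EP whose predicate name and variable sort come from its tag. A sketch, assuming CLAWS-style tags and lemmas supplied by a morphological analyser; `TAG_LEXICON` and `pos_rmrs` are hypothetical and the entries are illustrative, since the talk does not show the actual tag lexicon. Unlike the POS line on the earlier slides, this version numbers event variables too (`e1past`):

```python
from itertools import count

# Hypothetical tag lexicon using CLAWS-style tags (an assumption for
# illustration): tag -> (predicate-name suffix, variable sort, feature string).
TAG_LEXICON = {
    "AT1": ("q", "x", ""),      # e.g. "every"
    "DD": ("q", "x", ""),       # e.g. "some"
    "NN1": ("n", "x", "sg"),    # singular common noun
    "VVD": ("v", "e", "past"),  # past-tense lexical verb
}


def pos_rmrs(tokens):
    """Build one EP string per (lemma, tag) token; tags not in the lexicon
    contribute nothing."""
    labels = count(1)
    counters = {"x": count(1), "e": count(1)}  # separate numbering per sort
    eps = []
    for lemma, tag in tokens:
        entry = TAG_LEXICON.get(tag)
        if entry is None:
            continue
        suffix, sort, feat = entry
        var = f"{sort}{next(counters[sort])}{feat}"
        eps.append(f"lb{next(labels)}:_{lemma}_{suffix}({var})")
    return eps


if __name__ == "__main__":
    sentence = [("every", "AT1"), ("cat", "NN1"), ("chase", "VVD"),
                ("some", "DD"), ("dog", "NN1")]
    print(", ".join(pos_rmrs(sentence)))
```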

  16. RMRS composition with non-lexicalized grammars
      • MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)
      • RMRS with non-lexicalized grammars has a similar basic algebra
      • Without lexical subcategorization, rely on grammar rules to provide the ARGs
      • ‘Anchors’ rather than slots, to ground the ARGs (single anchor for RASP)
      • Developed on the basis of a semantic test suite
      • Most rules written by Anna Ritchie

  17. Some cat sleeps (in RASP)
      sleeps: [h3,e], <h3>, {h3:_sleep(e)}
      some cat: [h,x], <h1>, {h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
      S -> NP VP: Head=VP, ARG1(<VP anchor>, <NP hook.index>)
      some cat sleeps: [h3,e], <h3>, {h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
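The rule on this slide does three things: the result takes the head daughter's (VP's) hook and anchor, the EPs and argument relations of both daughters are pooled, and one new relation ARG1(VP anchor, NP index) is added. A sketch of that simplified rule (not the full XML rule, which also adds a proposition EP and a qeq constraint); the `Sem` class and `s_np_vp` are hypothetical names, with hook, anchor and relations held as plain strings:

```python
from dataclasses import dataclass, field


@dataclass
class Sem:
    """A hook [label, index], an anchor, and flat lists of EPs and ARG relations."""
    label: str
    index: str
    anchor: str
    eps: list = field(default_factory=list)
    rargs: list = field(default_factory=list)


def s_np_vp(np, vp):
    """S -> NP VP with the VP as head: the result takes the VP's hook and
    anchor, and the NP's hook index becomes ARG1 of the VP's anchor."""
    return Sem(
        label=vp.label,
        index=vp.index,
        anchor=vp.anchor,
        eps=vp.eps + np.eps,  # new lists: the daughters are not modified
        rargs=vp.rargs + np.rargs + [f"ARG1({vp.anchor},{np.index})"],
    )


if __name__ == "__main__":
    some_cat = Sem("h", "x", "h1", ["h1:_some(x)", "h2:_cat(x)"], ["RSTR(h1,h2)"])
    sleeps = Sem("h3", "e", "h3", ["h3:_sleep(e)"])
    s = s_np_vp(some_cat, sleeps)
    print(f"[{s.label},{s.index}], <{s.anchor}>, {{{', '.join(s.eps + s.rargs)}}}")
```

Only appending to the daughters' material keeps composition monotonic: nothing the NP or VP contributed is ever removed.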

  18. Real rule ...
      <rule>
        <name>S/np_vp</name>
        <dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
        <head>RULE</head>
        <semstruct>
          <hook><index>E</index><label>H1</label></hook>
          <slots><noanchor/></slots>
          <ep><gpred>PRPSTN_M_REL</gpred><label>H1</label><var>H2</var></ep>
          <rarg><rargname>ARG1</rargname><label>H3</label><var>X</var></rarg>
          <hcons hreln='qeq'><hi><var>H2</var></hi><lo><var>H</var></lo></hcons>
        </semstruct>
        <equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
        <equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
        <equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
        <equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
      </rule>
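The rule format is plain XML, so it can be read with a standard parser. A sketch using Python's `xml.etree.ElementTree`, with the rule's `<semstruct>` left out for brevity. Reading each `<equalities>` element as binding a rule variable (`<rv>`) to an element of a daughter's hook (`<dh>`) is our interpretation of the element names, not documented behaviour; `read_rule` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# The S/np_vp rule, with <semstruct> omitted for brevity.
RULE_XML = """
<rule>
  <name>S/np_vp</name>
  <dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
  <head>RULE</head>
  <equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
  <equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
  <equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
  <equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
</rule>
"""


def read_rule(xml_text):
    """Return the rule name, daughters, head and a map from each rule
    variable to the (daughter, hook element) it is equated with."""
    root = ET.fromstring(xml_text.strip())
    equalities = {
        eq.findtext("rv"): (eq.findtext("dh/dtr"), eq.findtext("dh/he"))
        for eq in root.findall("equalities")
    }
    return {
        "name": root.findtext("name"),
        # Only the top-level daughters, not the <dtr> elements inside <dh>.
        "dtrs": [d.text for d in root.findall("dtrs/dtr")],
        "head": root.findtext("head"),
        "equalities": equalities,
    }


if __name__ == "__main__":
    rule = read_rule(RULE_XML)
    for var, (dtr, elem) in rule["equalities"].items():
        print(f"{var} = {dtr}.{elem}")  # e.g. "X = NP.INDEX"
```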

  19. ERG-RMRS / RASP-RMRS

  20. Inchoative

  21. Infinitival subject (unbound in RASP-RMRS)

  22. Ditransitive: missing ARG3

  23. Mismatch: Expletive it

  24. Mismatch: larger numbers

  25. Comments on RASP-RMRS
      • Fast enough (RMRS construction time is insignificant compared to RASP processing time, because there is no ambiguity)
      • Too many RASP rules! Need to generalise over classes.
      • Requires SEM-I: API for MRS/RMRS from deep grammar
      • RASP and ERG may change:
         • compatible test suites: semi-automatic rule update?
         • alternative technique for composition?
      • Parse selection: need to generalise over RMRSs
         • weighted intersections of RMRSs (cf. RASP grammatical relations)
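The slide only names the idea of weighted intersections of RMRSs for parse selection. One plausible reading, sketched under our own assumptions: each candidate parse contributes its RMRS as a set of already-aligned, variable-free fact strings, and a fact's weight is the normalised probability mass of the parses that contain it. The function name, threshold and the PP-attachment example ("saw the man with a telescope") are illustrative, not from the talk:

```python
from collections import defaultdict


def weighted_intersection(parses, threshold=0.5):
    """parses: list of (probability, set of RMRS facts) for one sentence.

    Each fact is weighted by the normalised total probability of the parses
    containing it; facts at or above `threshold` are returned with weights.
    """
    total = sum(p for p, _ in parses)
    weight = defaultdict(float)
    for p, facts in parses:
        for fact in facts:
            weight[fact] += p
    return {f: w / total for f, w in weight.items() if w / total >= threshold}


if __name__ == "__main__":
    shared = {"_see_v(e)", "_man_n(x)", "_with_p(e2)"}
    parses = [
        (0.6, shared | {"ARG1(_with_p,_see_v)"}),  # PP attaches to the verb
        (0.4, shared | {"ARG1(_with_p,_man_n)"}),  # PP attaches to the noun
    ]
    for fact, w in sorted(weighted_intersection(parses).items()):
        print(f"{w:.2f} {fact}")
```

Because RMRS is flat, facts shared by all parses survive with full weight while only the disputed attachment is down-weighted, which is the benefit of intersecting rather than picking a single parse.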

  26. SEM-I: semantic interface
      • Meta-level: manually specified ‘grammar’ relations (constructions and closed-class)
      • Object-level: linked to lexical database for deep grammars
         • Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because types can contribute relations)
         • Validation of other lexicons
      • Need closed-class items for RMRS construction from shallow processing
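One use of an object-level SEM-I mentioned here is validating other lexicons, or shallow output, against the deep grammar. A sketch, assuming the SEM-I can be reduced to a table from predicate names to the argument names they take; `SEMI` and `unlicensed` are hypothetical, and the entries are illustrative rather than copied from the ERG:

```python
# Hypothetical SEM-I fragment: predicate -> argument names it licenses.
SEMI = {
    "_sleep_v": {"ARG1"},
    "_chase_v": {"ARG1", "ARG2"},
    "_give_v": {"ARG1", "ARG2", "ARG3"},
}


def unlicensed(pred, arg_names, semi=SEMI):
    """Return, sorted, the argument names used with `pred` that the SEM-I
    does not license; raise KeyError if `pred` is unknown."""
    if pred not in semi:
        raise KeyError(f"{pred} is not in the SEM-I")
    return sorted(set(arg_names) - semi[pred])


if __name__ == "__main__":
    print(unlicensed("_chase_v", ["ARG1", "ARG2"]))  # []
    print(unlicensed("_sleep_v", ["ARG1", "ARG2"]))  # ['ARG2']
```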

  27. Alignment and XML
      • Efficient comparison of RMRSs for the same text uses characterization
         • labels RMRSs according to their source in the text
         • currently character positions, but byte offsets? Japanese etc.?
      • RMRS-XML
      • RMRS seen as levels of mark-up: standoff annotation
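Characterization makes comparison cheap: if every EP records the span of text it came from, EPs from different processors can be paired by span instead of by matching structure. A sketch, assuming each EP carries a (cfrom, cto) character span with an exclusive end; `align` is a hypothetical helper. Python string indices count code points rather than bytes, so a tool working on UTF-8 bytes would produce different numbers for Japanese text, which is the issue the slide raises:

```python
from collections import defaultdict


def align(deep, shallow):
    """deep, shallow: lists of (cfrom, cto, pred) with cto exclusive.

    Returns (deep_pred, shallow_pred) pairs whose character spans coincide.
    """
    by_span = defaultdict(list)
    for cfrom, cto, pred in shallow:
        by_span[(cfrom, cto)].append(pred)
    return [(pred, other)
            for cfrom, cto, pred in deep
            for other in by_span.get((cfrom, cto), [])]


if __name__ == "__main__":
    text = "every cat chased some dog"
    spans = [(0, 5), (6, 9), (10, 16), (17, 21), (22, 25)]
    deep_preds = ["_every_q", "_cat_n", "_chase_v", "_some_q", "_dog_n_1"]
    pos_preds = ["_every_q", "_cat_n", "_chase_v", "_some_q", "_dog_n"]
    deep = [(a, b, p) for (a, b), p in zip(spans, deep_preds)]
    shallow = [(a, b, p) for (a, b), p in zip(spans, pos_preds)]
    for d, s in align(deep, shallow):
        print(text[spans[deep_preds.index(d)][0]:spans[deep_preds.index(d)][1]], d, s)
```

Pairs found this way (e.g. `_dog_n_1` with `_dog_n`) can then be checked for compatibility one by one.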

  28. SciBorg: Chemistry texts
      • eScience project starting in October at Cambridge
      • Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)
      • Aims:
         • Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.
         • Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.
         • Model scientific argumentation and citation purpose in order to support novel modes of information access.
         • Demonstrate the applicability of this infrastructure in a real-world eScience environment.

  29. Research markup
      • Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins via standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.
      • Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.

  30. RMRS and research markup
      • Specify cues in RMRS
      • Deep processing of cues: feasible because the cues are domain-independent
         • more general and reliable than shallow techniques
         • allows for complex interrelationships
      • Use zones for advanced citation maps and other enhancements to repositories

  31. Conclusions
      • RMRS: semantic representation language allowing linking of deep and shallower processors
      • RMRS construction: phrase-level compatibility between processors
      • Many potential applications
