Extreme underspecification Using semantics to integrate deep and shallow processing
Acknowledgements • Alex Lascarides, Ted Briscoe, Simone Teufel, Dan Flickinger, Stephan Oepen, John Carroll, Anna Ritchie, Ben Waldron • Deep Thought project members • Cambridge Masters students … • Other colleagues at Cambridge, Saarbrücken, Edinburgh, Brighton, Sussex and Oxford
Talk overview • Why integrate deep and shallow processing? • and why use compositional semantics? • Semantics from shallow processing • Flattening deep semantics • Underspecification • Minimal semantic units • Composition without lambdas • Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP) • How does this fit with deeper semantics?
Deep processing • Detailed, linguistically motivated, e.g., HPSG, LFG, TAG, varieties of CG • Precise; detailed compositional semantics possible; generation as well as parsing • Some are broad-coverage and fast enough for real-time applications • BUT: not robust (coverage gaps, ill-formed input); too slow for tasks such as IE; massive ambiguity
Shallow (and intermediate) processing • Shallow: e.g., POS tagging, NP chunking • Intermediate: e.g., grammars with only a POS-tag lexicon (RASP) • Fast; robust; integrated stochastic techniques for disambiguation • BUT: no long-distance dependencies; accept ungrammatical input (so of limited use for generation); no conventional semantics without subcategorization
Why integrate deep and shallow processing? • Complementary strengths and weaknesses • Weaknesses of each are inherent: more complexity means larger search space, greater information requirement • hand-coding vs machine learning is not the main issue – treebanking costs, sparse data problems • Lexicon is the crucial resource difference between deep and shallow approaches
Applications that may benefit from integrated approaches • Summarization: • shallow parsing to identify possible key passages, deep processing to check and combine • Email response: • deep parser uses shallow parsing for disambiguation, back off when parse failure • Information extraction: • shallow first (as summarization), named entities • Question answering: • deep parse questions, shallow parse answers
Compositional semantics as the common representation • Need a common representation language: pairwise compatibility between systems is too limiting • Syntax is theory-specific • Eventual goal should be semantics • Crucial idea: shallow processing gives underspecified semantic representation
Shallow processing and underspecified semantics • Integrated parsing: shallow parsed phrases incorporated into deep parsed structures • Deep parsing invoked incrementally in response to information needs • Reuse of knowledge sources: • domain knowledge, recognition of named entities, transfer rules in MT • Integrated generation • Formal properties clearer, representations more generally usable
Semantics from POS tagging • every_AT1 cat_NN1 chase_VVD some_AT1 dog_NN1 • _every_q(x1), _cat_n(x2sg), _chase_v(epast), _some_q(x3), _dog_n(x4sg) • Tag lexicon: AT1 -> _lemma_q(x); NN1 -> _lemma_n(xsg); VVD -> _lemma_v(epast)
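To make the tag-lexicon idea concrete, here is a minimal Python sketch (TAG_LEXICON and pos_semantics are invented names; the three tag entries follow the slide):

```python
# Minimal sketch: underspecified predicates from POS tags alone.
TAG_LEXICON = {
    "AT1": lambda lemma, i: f"_{lemma}_q(x{i})",      # singular determiner
    "NN1": lambda lemma, i: f"_{lemma}_n(x{i}sg)",    # singular noun
    "VVD": lambda lemma, i: f"_{lemma}_v(e{i}past)",  # past-tense verb
}

def pos_semantics(tagged):
    """Map (lemma, tag) pairs to a bag of underspecified predicates."""
    preds = []
    for i, (lemma, tag) in enumerate(tagged, start=1):
        entry = TAG_LEXICON.get(tag)
        if entry is not None:              # unknown tags contribute nothing
            preds.append(entry(lemma, i))
    return preds

print(pos_semantics([("every", "AT1"), ("cat", "NN1"), ("chase", "VVD"),
                     ("some", "AT1"), ("dog", "NN1")]))
# ['_every_q(x1)', '_cat_n(x2sg)', '_chase_v(e3past)', '_some_q(x4)', '_dog_n(x5sg)']
```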
Deep parser output • Conventional semantic representation of Every cat chased some dog:
every(xsg, cat(xsg), some(ysg, dog1(ysg), chase(esp,xsg,ysg)))
some(ysg, dog1(ysg), every(xsg, cat(xsg), chase(esp,xsg,ysg)))
• Compositional: reflects morphology and syntax • Scope ambiguity
Modifying syntax of deep grammar semantics: overview • Underspecification of quantifier scope: in this talk, using Minimal Recursion Semantics (MRS) • Robust MRS • Separating relations • Explicit equalities • Conventions for predicate names and sense distinctions • Hierarchy of sorts on variables
Scope underspecification • Standard logical forms can be represented as trees • Underspecified logical forms are partial trees (or descriptions of sets of trees) • Constraints on scope control how trees may be reconstructed
Logical forms
Generalized quantifier notation, with event variables (e.g., chase(e,x,y)):
every(xsg, cat(xsg), some(ysg, dog1(ysg), chase(esp,xsg,ysg)))
i.e., forall x [cat(x) implies exists y [dog1(y) and chase(e,x,y)]]
some(ysg, dog1(ysg), every(xsg, cat(xsg), chase(esp,xsg,ysg)))
i.e., exists y [dog1(y) and forall x [cat(x) implies chase(e,x,y)]]
PC trees [figure: the two scoped logical forms for "Every cat chased some dog", drawn as trees with every/some at the root]
PC trees share structure [figure: the two scope trees overlaid, sharing the cat, dog1 and chase subtrees]
Bits of trees [figure: the trees split into separate fragments for every/cat, some/dog1 and chase(e,x,y)] Reconstruction conditions: tree-ness, variable binding
Label nodes and holes [figure: the fragments with labels lb1–lb5 on nodes and holes h6–h9 at missing arguments; h0 is the hole corresponding to the top of the tree] Valid solutions: equate holes and labels
Maximize splitting [figure: fully split fragments lb1:every, lb2:cat, lb3:chase(e,x,y), lb4:some, lb5:dog1 with holes h6–h9] Constraints: h8=lb5, h9=lb2
Notation for underspecified scope
lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y)
top: h0
h9=lb2, h8=lb5
MRS actually uses: h9 qeq lb2, h8 qeq lb5
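As a data structure, the underspecified MRS above might look like this (a sketch; the EP and MRS classes are invented for illustration, not the LKB's actual encoding):

```python
from dataclasses import dataclass, field

@dataclass
class EP:
    """Elementary predication: a labelled relation with its arguments."""
    label: str
    pred: str
    args: tuple

@dataclass
class MRS:
    top: str                 # hole at the top of the tree (h0)
    eps: list
    qeqs: list = field(default_factory=list)  # (hole, label): equality modulo quantifiers

mrs = MRS(
    top="h0",
    eps=[EP("lb1", "every", ("x", "h9", "h6")),
         EP("lb2", "cat",   ("x",)),
         EP("lb4", "some",  ("y", "h8", "h7")),
         EP("lb5", "dog1",  ("y",)),
         EP("lb3", "chase", ("e", "x", "y"))],
    qeqs=[("h9", "lb2"), ("h8", "lb5")],
)
```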
Extreme underspecification • Splitting up predicate argument structure • Explicit equalities • Hierarchies for predicates and sorts • Goal is to split up semantic representation into minimal components
Separating arguments lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2,h8=lb5 goes to: lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e),ARG1(lb3,x),ARG2(lb3,y), h9=lb2,h8=lb5
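A sketch of this factorization step (separate_args is an invented helper): each n-ary relation keeps only its distinguished variable, and the remaining arguments become binary facts on the label.

```python
def separate_args(label, pred, args, arg_names):
    """Factorize lb3:chase(e,x,y) into lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y)."""
    head, rest = args[0], args[1:]
    facts = [f"{label}:{pred}({head})"]
    facts += [f"{name}({label},{v})" for name, v in zip(arg_names, rest)]
    return facts

print(separate_args("lb3", "chase", ("e", "x", "y"), ("ARG1", "ARG2")))
# ['lb3:chase(e)', 'ARG1(lb3,x)', 'ARG2(lb3,y)']
print(separate_args("lb1", "every", ("x", "h9", "h6"), ("RSTR", "BODY")))
# ['lb1:every(x)', 'RSTR(lb1,h9)', 'BODY(lb1,h6)']
```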
Explicit equalities lb1:every(x1), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x2), lb5:dog1(x4), lb4:some(x3), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e),ARG1(lb3,x2),ARG2(lb3,x4), h9=lb2,h8=lb5,x1=x2,x3=x4
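One standard way to resolve such sets of explicit equalities is union-find; a sketch (not claimed to be the actual implementation):

```python
def resolve(equalities, variables):
    """Map each variable to a canonical representative of its equality class."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in equalities:
        parent[find(a)] = find(b)
    return {v: find(v) for v in variables}

print(resolve([("x1", "x2"), ("x3", "x4")], ["x1", "x2", "x3", "x4", "e"]))
# {'x1': 'x2', 'x2': 'x2', 'x3': 'x4', 'x4': 'x4', 'e': 'e'}
```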
Naming conventions lb1:_every_q(x1sg),RSTR(lb1,h9),BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg),RSTR(lb4,h8),BODY(lb4,h7), lb3:_chase_v(esp),ARG1(lb3,x2sg),ARG2(lb3,x4sg) h9=lb2,h8=lb5, x1sg=x2sg,x3sg=x4sg
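The naming convention lends itself to a trivial constructor; a sketch (pred_name is an invented helper):

```python
def pred_name(lemma, pos, sense=None):
    """Build predicate names of the form _lemma_POS or _lemma_POS_sense."""
    name = f"_{lemma}_{pos}"
    return f"{name}_{sense}" if sense is not None else name

assert pred_name("dog", "n", 1) == "_dog_n_1"   # deep parser: sense-distinguished
assert pred_name("chase", "v") == "_chase_v"    # tagger: lemma and POS only
```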
POS output as underspecification
DEEP – lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
POS – lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg) (as on the POS-tagging slide, but with labels added)
Hierarchies • esp (simple past) is defined as a subtype of epast • in general, a hierarchy of sorts is defined as part of the semantic interface (SEM-I) • _dog_n_1 is a subtype of _dog_n • by convention, lemma_POS_sense is a subtype of lemma_POS (see the sketch below)
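A sketch of how a SEM-I-style hierarchy supports these compatibility checks (the PARENT table and subsumes are illustrative; the subtype facts come from the slide):

```python
PARENT = {
    "esp": "epast",         # simple past < past
    "epast": "e",
    "_dog_n_1": "_dog_n",   # lemma_POS_sense < lemma_POS, by convention
}

def subsumes(general, specific):
    """True iff `specific` is `general` or lies below it in the hierarchy."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

assert subsumes("epast", "esp")           # deep 'esp' is compatible with POS 'epast'
assert subsumes("_dog_n", "_dog_n_1")     # the deep sense refines the shallow predicate
assert not subsumes("_dog_n_1", "_dog_n") # but not vice versa
```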
Extreme Underspecification • Factorize deep representation to minimal units • Only represent what you know for each type of processor • Compatibility: • Sorts and (some) closed class word information in SEM-I for consistency • No lexicon for shallow processing (apart from POS tags)
Semantics from RASP • RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll) • can't produce conventional semantics because there is no subcategorization • can sometimes identify arguments: S -> NP VP (the NP supplies ARG1 for the V) • partial identification: VP -> V NP, S -> NP S (the NP might be ARG2 or ARG3)
Underspecification of arguments • Hierarchy: ARGN at the top, with ARG1or2 and ARG2or3 below it; ARG1 and ARG2 under ARG1or2, ARG2 and ARG3 under ARG2or3 • RASP arguments can be specified as ARGN, ARG2or3, etc. (see the sketch below) • Also useful for Japanese deep parsing?
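The ARG hierarchy needs multiple parents (ARG2 sits under both ARG1or2 and ARG2or3), so a sketch with parent sets (all names here are illustrative):

```python
PARENTS = {
    "ARG1or2": {"ARGN"}, "ARG2or3": {"ARGN"},
    "ARG1": {"ARG1or2"}, "ARG3": {"ARG2or3"},
    "ARG2": {"ARG1or2", "ARG2or3"},   # ARG2 has two parents
}

def arg_subsumes(general, specific):
    """True iff `specific` is `general` or reachable below it."""
    if general == specific:
        return True
    return any(arg_subsumes(general, p) for p in PARENTS.get(specific, ()))

assert arg_subsumes("ARG2or3", "ARG2")    # a deep ARG2 satisfies RASP's ARG2or3
assert arg_subsumes("ARGN", "ARG3")       # ARGN is compatible with anything
assert not arg_subsumes("ARG1or2", "ARG3")
```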
Software etc • Open Source LinGO English Resource Grammar (ERG) • LKB system: parsing and generation, now includes MRS-RMRS interconversion • RMRS output as XML • RMRS comparison • Preliminary RASP-RMRS • First version of SEM-I
Composition without lambdas • Formalized, consistent composition • integration at the subsentential level • standardization • Traditional lambda calculus unsuitable: • doesn't allow underspecification • syntactic requirements mixed up with the semantics • The algebra is a rational reconstruction of a feature-structure approach to composition
Lexicalized composition [h,e1], {[h3,x]subj}, {h:_probably(h2), h3:_sleep(e), ARG1(h3,x)}, {e1=e}, {h2 qeq h3} • hook: externally accessible information • slots: when acting as a functor, a slot is equated with the argument's hook • relations: accumulated monotonically • equalities: record hook-slot equations (not shown from now on) • scope constraints: (ignored from now on)
probably sleeps
sleeps: [h3,e], {[h3,x]subj}, {h3:_sleep(e), ARG1(h3,x)}
probably: [h,e1], {[h2,e1]mod}, {h:_probably(h2)}
Syntax defines probably as the semantic head; composition uses the mod slot:
probably sleeps: [h,e1], {[h3,x]subj}, {h:_probably(h3), h3:_sleep(e1), ARG1(h3,x)}
(a sketch follows)
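A sketch of slot-filling composition for this example (the dict representation and compose are invented; the hooks, slots and relations follow the slide):

```python
def compose(functor, argument, slot):
    """Fill the functor's named slot with the argument's hook;
    relations accumulate monotonically, open slots are inherited."""
    slots = dict(functor["slots"])
    slot_label, slot_index = slots.pop(slot)
    hook_label, hook_index = argument["hook"]
    return {
        "hook": functor["hook"],                     # functor keeps the hook
        "slots": {**slots, **argument["slots"]},     # e.g. subj stays open
        "rels": functor["rels"] + argument["rels"],
        "eqs": functor["eqs"] + argument["eqs"]
               + [(slot_label, hook_label), (slot_index, hook_index)],
    }

sleeps   = {"hook": ("h3", "e"), "slots": {"subj": ("h3", "x")},
            "rels": ["h3:_sleep(e)", "ARG1(h3,x)"], "eqs": []}
probably = {"hook": ("h", "e1"), "slots": {"mod": ("h2", "e1")},
            "rels": ["h:_probably(h2)"], "eqs": []}

# 'probably' is the semantic head; filling its mod slot with the hook of
# 'sleeps' records h2=h3 and e1=e, giving the combined sign on the slide.
result = compose(probably, sleeps, "mod")
```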
Non-lexicalized grammars • Lexicalized approach is a rational reconstruction of semantic composition in the ERG (Copestake et al., 2001) • Without lexical subcategorization, rely on grammar rules to provide the ARGs • 'anchors' rather than slots, to ground the ARGs (single anchor for RASP)
Some cat sleeps (in RASP)
sleeps: [h3,e], <h3>, {h3:_sleep(e)}
some cat: [h,x], <h1>, {h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
Rule S -> NP VP: Head = VP, ARG1(<VP anchor>, <NP hook.index>)
some cat sleeps: [h3,e], <h3>, {h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
(a sketch follows)
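A sketch of rule-driven composition for RASP (apply_rule is invented): the grammar rule, not the lexicon, contributes the ARG, grounded on the head daughter's anchor.

```python
def apply_rule(head, dependent, argname):
    """e.g. S -> NP VP: add ARG1(<VP anchor>, <NP hook index>)."""
    _, dep_index = dependent["hook"]
    return {
        "hook": head["hook"],
        "anchor": head["anchor"],
        "rels": head["rels"] + dependent["rels"]
                + [f"{argname}({head['anchor']},{dep_index})"],
    }

sleeps  = {"hook": ("h3", "e"), "anchor": "h3", "rels": ["h3:_sleep(e)"]}
somecat = {"hook": ("h1", "x"), "anchor": "h1",
           "rels": ["h1:_some(x)", "RSTR(h1,h2)", "h2:_cat(x)"]}

result = apply_rule(sleeps, somecat, "ARG1")
# result["rels"] now includes ARG1(h3,x) alongside the NP's relations
```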
Deep Thought • Saarbrücken, Sussex, Cambridge, NTNU, Xtramind, CELI • Objectives: demonstrate utility of deep processing in IE and email response • German, Norwegian, Italian and English • October 2002 – October 2004
Integrated IE: a scenario • Example: I don't like the PBX 30 • Shallow processing finds interesting sentences • Named entity system isolates entities • h1:name(x,"PBX-30") • Deep processor identifies relationships, modals, negation etc • h2:neg(h3), h3:_like(y,x), h1:name(x,"PBX-30")
Some issues • 'shallow' processors can sometimes be deeper: e.g., h1:model-name(x,"PBX-30") • Compatibility and standardization: defining the SEM-I (semantic interface) • Limits on compatibility: e.g., the causative-inchoative alternation • Efficiency of comparison: indexing representations by character position (a sketch follows)
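A sketch of span indexing for efficient comparison (a plausible scheme with invented names, not the Deep Thought code):

```python
from collections import defaultdict

def index_by_span(eps):
    """eps: (predicate, start_char, end_char) triples -> span-keyed index."""
    idx = defaultdict(list)
    for pred, start, end in eps:
        idx[(start, end)].append(pred)
    return idx

deep    = index_by_span([("neg", 2, 7), ("_like_v", 8, 12), ("name", 17, 23)])
shallow = index_by_span([("_like_v", 8, 12), ("name", 17, 23)])

# compare only predications whose character spans coincide
for span in sorted(set(deep) & set(shallow)):
    print(span, deep[span], shallow[span])
```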
The bigger picture ... • 'deep' processing reflects syntax and morphology, but lexical semantics is limited • conventional vs predictable: • count/mass: lentils/rice, furniture, lettuce • adjectives: heavy defeat, ?large problem • prepositions and particles: up
Incremental development of wide-coverage semantics • corpus-based acquisition techniques: shallow processing • eventual integration with deep processing • statistical models of predicates: e.g., large_j_rel as a pointer into a vector space • logic isn't enough, but it is needed
Conclusion [figure: the two scope trees split into minimal fragments, as on the earlier slides]
lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
Conclusion: extreme underspecification • Split up information content as much as possible • Accumulate information by simple operations • Don’t represent what you don’t know but preserve everything you do know • Use a flat representation to allow pieces to be accessed individually