
Grammar Engineering: Parsing with HPSG Grammars


  1. Grammar Engineering: Parsing with HPSG Grammars • Miguel Hormazábal

  2. Overview • The Parsing Problem • Parsing with constraint-based grammars • Advantages and drawbacks • Three different approaches

  3. The Parsing Problem • Given a grammar and a sentence: can the grammar <S, Θ> generate the input string, or does it rule it out? • A candidate sentence must satisfy all the principles of the grammar • Coreferences are the main explanatory mechanism in HPSG

  4. Parsing with Constraint-based Grammars • Object-based formalism • Complex specifications on signs • Structure sharing imposed by the theory • Feature structures • Sort-resolved and well-typed • Multiple information levels (PHON, SYNSEM) • Universal and language-specific principles must be met

  5. Advantages and Drawbacks • Pros: • A common formalism for all levels of linguistic information • All information is available simultaneously • Cons: • Hard to modularize • Computational overhead for the parser

  6. 1st Approach: Distributed Parsing • Two kinds of constraints: • Genuine: syntactic; they work as filters on the input • Spurious: semantic; they build representational structures • A single parser cannot distinguish between analytical and structure-building constraints • VERBMOBIL implementation: • Input: word lattices of speech recognition hypotheses • The parser identifies those paths that form acceptable utterances • Lattices can contain hundreds of hypotheses, most of them ungrammatical • Goal: distribute the labour of evaluating the constraints in the grammar over several processes

  7. Distributed Parsing • Analysis strategy: two parser units: • SYN-Parser: • Works directly on the word lattices • Acts as a filter for the SEM-Parser • SEM-Parser: • Works only on successful analysis results • Runs under the control of the SYN-Parser

  8. Distributed Parsing • Processing requirements: • Incrementality: • The SYN-Parser must not hold back its results until it has a complete analysis, since that would force the SEM-Parser to wait • Interactivity: • The SEM-Parser must report back to the SYN-Parser when one of its hypotheses has failed • An efficient communication system between the parsers, based on the common grammar
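
The filter arrangement of slides 7 and 8 can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the VERBMOBIL implementation: the names `syn_parser`, `sem_parser`, `toy_syntax` and `toy_semantics` are hypothetical, lattice paths are reduced to word tuples, and the two parsers run in one process instead of communicating as separate processes. A Python generator stands in for incrementality (each syntactically acceptable path is handed on as soon as it is found), and the list of failed hypotheses stands in for the failure reports.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Path = Tuple[str, ...]  # one word sequence through the lattice (simplified)

def syn_parser(paths: Iterable[Path],
               is_grammatical: Callable[[Path], bool]) -> Iterator[Path]:
    """Yield each syntactically acceptable path as soon as it is found."""
    for path in paths:
        if is_grammatical(path):
            yield path  # handed on immediately; no waiting for the full lattice

def sem_parser(hypotheses: Iterable[Path],
               build_semantics: Callable[[Path], Optional[Dict[str, str]]]):
    """Consume hypotheses one by one; collect successes and failures."""
    accepted: List[Tuple[Path, Dict[str, str]]] = []
    failed: List[Path] = []  # these would be reported back to the SYN-Parser
    for hyp in hypotheses:
        sem = build_semantics(hyp)
        if sem is None:
            failed.append(hyp)
        else:
            accepted.append((hyp, sem))
    return accepted, failed

# Toy stand-ins for the syntactic and semantic constraints (assumptions).
lattice_paths = [("the", "example"), ("example", "the"), ("the", "idea")]
toy_syntax = lambda p: p[0] == "the" and p[-1] != "the"
toy_semantics = lambda p: {"pred": p[-1]} if p[-1] in {"example"} else None

accepted, failed = sem_parser(syn_parser(lattice_paths, toy_syntax), toy_semantics)
```

Here the syntactic filter discards ("example", "the") before any semantic work is done, which is the point of letting the SYN-Parser act as a filter.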

  9. Distributed Parsing • [Figure: centralized parsing compared with distributed parsing]

  10. Distributed Parsing • Bottom-up hypotheses • Emitted by the SYN-Parser and sent to the SEM-Parser for semantic verification • Top-down hypotheses • Emitted by the SEM-Parser; failures are reported back to the SYN-Parser • Completion history: • C-hist(NP-DET-N) := ((DET t0 t1) (N t’1 t2)) • C-hist(det) := ((“the” t0 t1)) • C-hist(N) := ((“example” t’1 t2))

  11. Distributed Parsing • Compilation of subgrammars • Both subgrammars are derived from a common source grammar • Straightforward option: split the grammar into a syntax stratum and a semantics stratum • Grammar rules and lexical entries are manipulated to obtain Gsyn and Gsem

  12. 2nd Approach: Data-Oriented Parsing • Main goal: achieve domain adaptation to improve the efficiency of HPSG parsing • Assumption: exploiting the frequency and plausibility of linguistic structures within a certain domain will give better results • DOP processes new input by combining structure fragments from a treebank • DOP allows probabilities to be assigned to arbitrarily large syntactic constructions

  13. Data-Oriented Parsing • Procedure: • Parse all sentences from a training corpus using an HPSG grammar and parser • Automatically acquire a stochastic lexicalized tree grammar (SLTG) • Decompose each parse tree into a set of subtrees • Assign a probability to each subtree

  14. Data-Oriented Parsing • Implementation using LKB, a platform for unification-based grammars, parsing and generation • First, parse each sentence of the training corpus • The resulting feature structure contains the parse tree • Each non-terminal node contains the label of the HPSG rule schema applied • Each terminal node contains the lexical type of the corresponding feature structure • After this, each parse tree is processed further

  15. Data-Oriented Parsing • 1. Decomposition, two operations: • Root: creates ‘passive’ (closed, complete) fragments by extracting substructures • Frontier: creates ‘active’ (open, incomplete) fragments by deleting pieces of substructure • Each non-head subtree is cut off, and the cutting point is marked for substitution
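
The cutting step of slide 15 can be illustrated on plain labelled trees. This is a simplified sketch, not the HPSG-DOP code: the `Node` class, the `head` index convention and the `*` marker for substitution sites are all hypothetical, trees carry category labels only (no feature structures), and the Root and Frontier operations are merged into a single traversal that follows the head daughter and cuts off every non-head daughter.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    head: int = 0  # index of the head daughter (hypothetical convention)

def decompose(tree: Node) -> List[Node]:
    """Cut off every non-head subtree; return the resulting fragments."""
    fragments: List[Node] = []

    def cut(node: Node) -> Node:
        if not node.children:
            return Node(node.label)
        kids = []
        for i, child in enumerate(node.children):
            if i == node.head:
                kids.append(cut(child))             # keep following the head path
            else:
                kids.append(Node(child.label + "*"))  # mark the substitution site
                fragments.append(cut(child))          # the cut-off part is its own fragment
        return Node(node.label, kids, node.head)

    fragments.insert(0, cut(tree))
    return fragments

def show(node: Node) -> str:
    """Render a fragment in bracket notation."""
    if not node.children:
        return node.label
    return f"{node.label}({' '.join(show(c) for c in node.children)})"

# S -> NP VP with VP as head; NP -> DET N with N as head; VP -> V NP with V as head.
parse = Node("S", [Node("NP", [Node("DET"), Node("N")], head=1),
                   Node("VP", [Node("V"), Node("NP", [Node("N")])], head=0)], head=1)
fragment_strings = [show(f) for f in decompose(parse)]
```

For this toy tree the fragments are `S(NP* VP(V NP*))`, `DET`, `NP(DET* N)` and `NP(N)`: the first keeps the head path of the sentence with two open substitution sites, and the others are the subtrees that were cut off.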

  16. Data-Oriented Parsing • 2. Specialization • The rule labels of the root node and of the substitution nodes are replaced with a corresponding category label. Example: signs whose local.cat.head value is of type noun and whose local.cat.val.subj feature is the empty list are classified as NPs. • 3. Probability • Count the total number n of all trees with the same root label α • Divide the frequency m of a tree t with root α by n: $p(t) = m / n$ • The probabilities of all trees $t_i$ with root α sum to 1: $\sum_{t_i :\, \mathrm{root}(t_i) = \alpha} p(t_i) = 1$
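
The probability step of slide 16 is a relative-frequency estimate and can be sketched in a few lines. The function name `fragment_probabilities` and the representation of a fragment as a (root label, tree string) pair are assumptions made for the example; the counts are invented toy data, not results from any corpus.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Fragment = Tuple[str, str]  # (root label, some identifier of the tree)

def fragment_probabilities(observed: Iterable[Fragment]) -> Dict[Fragment, float]:
    """p(t) = m / n, where m counts tree t and n counts all trees with t's root."""
    observed = list(observed)
    tree_counts = Counter(observed)                       # m for each tree
    root_counts = Counter(root for root, _ in observed)   # n for each root label
    return {frag: m / root_counts[frag[0]] for frag, m in tree_counts.items()}

# Toy observations: three NP-rooted fragments and one S-rooted fragment.
observed = [("NP", "NP(DET* N)"), ("NP", "NP(DET* N)"), ("NP", "NP(N)"),
            ("S", "S(NP* VP(V NP*))")]
probs = fragment_probabilities(observed)
np_total = sum(p for (root, _), p in probs.items() if root == "NP")
```

With these toy counts, `NP(DET* N)` gets 2/3 and `NP(N)` gets 1/3, so the NP-rooted probabilities sum to 1 as the slide requires.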

  17. Data-Oriented Parsing • This implementation for the VerbMobil project uses a chart-based, agenda-driven bottom-up parser • Step 1: Select the set of SLTG trees associated with the lexical items in the input sentence • Step 2: Parse the sentence with respect to this set • Step 3: “Expand” each SLTG parse tree by unifying the feature constraints into the parse tree • If unification succeeds, the result is a complete, valid feature structure • Otherwise, the next most likely tree is expanded
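
Step 3 of slide 17 amounts to trying candidate trees in order of decreasing probability until one of them expands successfully. A minimal sketch, assuming a hypothetical `expand` callback that returns a feature structure or `None` when unification fails; real unification is replaced by a toy stand-in.

```python
from typing import Any, Callable, List, Optional, Tuple

def first_valid_expansion(candidates: List[Tuple[float, Any]],
                          expand: Callable[[Any], Optional[Any]]):
    """Expand trees from most to least probable; return the first success."""
    for _prob, tree in sorted(candidates, key=lambda c: c[0], reverse=True):
        feature_structure = expand(tree)
        if feature_structure is not None:
            return tree, feature_structure  # complete, valid feature structure
    return None  # no candidate survived unification

# Toy data: the most probable tree fails to unify, so the next one is used.
candidates = [(0.2, "tree_a"), (0.5, "tree_b"), (0.3, "tree_c")]
toy_expand = lambda tree: None if tree == "tree_b" else {"tree": tree}
result = first_valid_expansion(candidates, toy_expand)
```

The design choice this illustrates is that the expensive feature-structure work is postponed until after the cheap tree-level parse, and is only repeated when the best candidate is rejected.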

  18. 3rd Approach: Probabilistic CFG Parsing • Main goal: obtain the Viterbi parse (the parse with the highest probability) given an HPSG and a probabilistic model • One way: • Parse the input without using probabilities • Then select the most probable parse by looking at every result • Cost: exponential search space • This approach: • Define an equivalence class function (feature structure reduction) • Integrate semantic and syntactic preferences into figures of merit (FOMs)

  19. Probabilistic CFG Parsing • Probabilistic model: • HPSG grammar: $G = \langle L, R \rangle$, where $L = \{\, l = \langle w, F \rangle \mid w \in W,\ F \in \mathcal{F} \,\}$ is a set of lexical entries • $R$ is a set of grammar rules, i.e., each $r \in R$ is a partial function $\mathcal{F} \times \mathcal{F} \to \mathcal{F}$

  20. Probabilistic CFG Parsing • Probabilistic HPSG: the probability p(F | w) of a feature structure F assigned to a given sentence: [equation not preserved in the transcript] where λi is a model parameter, si is a fragment of a feature structure, and σ(si, F) is a function giving the number of occurrences of the fragment si in F • The probabilities represent syntactic/semantic preferences expressed in a feature structure

  21. Probabilistic CFG Parsing • Implementation: iterative CYK parsing algorithm • Edges are pruned during parsing • The best N parses are tracked • Feature structures are reduced through equivalence classes • The reduction must neither overgenerate nor undergenerate • FOMs computed with the reduced feature structures are equivalent to those of the original • The parser computes the Viterbi parse by taking the maximum of the probabilities for the same nonterminal symbol at each point
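
The last bullet of slide 21, taking the maximum probability per nonterminal at each chart cell, is the core of Viterbi CYK. The sketch below shows it for an ordinary PCFG in Chomsky normal form; it is not the iterative, FOM-guided algorithm over reduced feature structures described in the slides, and it does no pruning or N-best tracking. The function name `viterbi_cyk`, the rule tables and the toy grammar are assumptions made for the example.

```python
from collections import defaultdict

def viterbi_cyk(words, lexical, binary, start="S"):
    """Return (probability, tree) of the best parse, or None if there is none.

    lexical: {(A, word): prob}   for rules A -> word
    binary:  {(A, B, C): prob}   for rules A -> B C
    """
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (prob, backpointer)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][a] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        q = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                        # Viterbi step: keep only the maximum for nonterminal a.
                        if q > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (q, (k, b, c))

    def build(i, j, a):
        _p, bp = best[(i, j)][a]
        if isinstance(bp, str):          # lexical entry: backpointer is the word
            return (a, bp)
        k, b, c = bp
        return (a, build(i, k, b), build(k, j, c))

    if start not in best[(0, n)]:
        return None
    return best[(0, n)][start][0], build(0, n, start)

# Toy grammar (invented for illustration).
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
parse = viterbi_cyk(["she", "eats", "fish"], lexical, binary)
```

For the toy sentence the best parse has probability 0.25 (1.0 × 0.5 for the subject × 0.5 for the verb phrase). Because each cell stores one entry per nonterminal, the chart stays polynomial in size even when the number of full parses is exponential.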

  22. Assessment • All three approaches aim to make the parsing process more efficient • Distributed Parsing: • Unification and copying are faster • The soundness of the grammar is affected: L(G) ⊂ L(Gsyn) ∩ L(Gsem) • DO Parsing: • Fragments at the right level of generality • Straightforward probability computation • PCFG Parsing: • Highly efficient CYK parsing implementation through reduced feature structures and edge pruning

  23. References • Pollard, C. and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press. • Richter, F. (2004b). A Web-based Course in Grammar Formalisms and Parsing. Textbook, MiLCA project A4, SfS, Universität Tübingen. http://milca.sfs.uni-tuebingen.de/A4/Course/PDF/gramandpars.pdf. • Levine, R. and Meurers, D. (2006). Head-Driven Phrase Structure Grammar: Linguistic Approach, Formal Foundations, and Computational Realization. In Keith Brown (Ed.): Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier. • Diagne, A. K., Kasper, W., and Krieger, H.-U. (1995). Distributed Parsing With HPSG Grammars. In Proceedings of the 4th International Workshop on Parsing Technologies, IWPT-95, pages 79–86. • Neumann, G. (2002). HPSG-DOP: data-oriented parsing with HPSG. Unpublished manuscript, presented at the 9th Int. Conf. on HPSG, HPSG-2002, Seoul, South Korea. • Tsuruoka, Y., Miyao, Y., and Tsujii, J. (2003). Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. Proceedings of IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.
