An Extended GHKM Algorithm for Inducing λ-SCFG Peng Li pengli09@gmail.com Tsinghua University
Semantic Parsing • Mapping a natural language (NL) sentence to its computable meaning representation (MR) NL: Every boy likes a star MR: predicate variable
Motivation • Common way: inducing a probabilistic grammar • PCFG: Probabilistic Context-Free Grammar • CCG: Combinatory Categorial Grammar • SCFG: Synchronous Context-Free Grammar
Motivation • State of the art: SCFG + λ-calculus (λ-SCFG) • Major challenge: grammar induction • Finding the correspondence between an NL sentence and its MR is much harder than finding it between two NL sentences • SCFG rule extraction is well studied in machine translation (MT) • GHKM is the most widely used algorithm • We want to adapt GHKM to semantic parsing • Experimental results show that our approach achieves state-of-the-art performance
Background • State of the art: SCFG + λ-calculus (λ-SCFG) • λ-calculus • λ-expression: • β-conversion: bound variable substitution • α-conversion: bound variable renaming
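The two conversions named above can be illustrated with a minimal sketch. The tuple encoding of terms, the function names, and the fresh-name scheme are assumptions made for illustration; they are not taken from the slides.

```python
import itertools

# Hypothetical term encoding: ("var", x) | ("lam", x, body) | ("app", fn, arg)
_fresh = itertools.count()

def free_vars(t):
    """Return the set of variable names that occur free in term t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def substitute(t, name, value):
    """Capture-avoiding substitution t[name := value]."""
    if t[0] == "var":
        return value if t[1] == name else t
    if t[0] == "app":
        return ("app", substitute(t[1], name, value), substitute(t[2], name, value))
    _, bound, body = t
    if bound == name:
        return t  # name is shadowed here, nothing to replace below
    if bound in free_vars(value):
        # Rename the binder first so it cannot capture a free variable of value.
        # The generated name is assumed not to clash with existing names.
        fresh = f"{bound}_{next(_fresh)}"
        body = substitute(body, bound, ("var", fresh))
        bound = fresh
    return ("lam", bound, substitute(body, name, value))

def alpha_convert(lam, new_name):
    """alpha-conversion: rename the bound variable of a lambda term."""
    _, old, body = lam
    assert new_name not in free_vars(body), "new name would capture a free variable"
    return ("lam", new_name, substitute(body, old, ("var", new_name)))

def beta_reduce(t):
    """One beta-conversion step at the root: (λx.body) arg becomes body[x := arg]."""
    if t[0] == "app" and t[1][0] == "lam":
        _, x, body = t[1]
        return substitute(body, x, t[2])
    return t

# λx.boy(x), with boy(x) encoded as an application
boy = ("lam", "x", ("app", ("var", "boy"), ("var", "x")))
renamed = alpha_convert(boy, "y")                    # λy.boy(y)
applied = beta_reduce(("app", boy, ("var", "john")))  # boy(john)
```

Here `renamed` differs from `boy` only in the name of its bound variable, and `applied` is the body of `boy` with `x` replaced by the argument.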
λ-SCFG Rule Extraction • Outline • Building training examples • Transforming logical forms to trees • Aligning trees with sentences • Identifying frontier nodes • Extracting minimal rules • Extracting composed rules
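The "identifying frontier nodes" step can be sketched with the standard GHKM criterion from machine translation: a node is a frontier node when the closure of its span does not overlap its complement span. This is a sketch of that standard criterion only; the extended algorithm in these slides may differ. The `Node` class, the toy tree, and the word alignments below are invented for illustration, and nodes with an empty span are assumed not to be frontier nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    aligned: frozenset = frozenset()  # word positions aligned directly to this node
    children: list = field(default_factory=list)

def compute_spans(node, spans):
    """Span of a node: all word positions aligned to it or to its descendants."""
    span = set(node.aligned)
    for child in node.children:
        span |= compute_spans(child, spans)
    spans[id(node)] = span
    return span

def frontier_nodes(root):
    """Return labels of frontier nodes in pre-order."""
    spans = {}
    compute_spans(root, spans)
    result = []

    def visit(node, outside):
        # outside is the complement span: positions covered by nodes that are
        # neither ancestors nor descendants of this node.
        span = spans[id(node)]
        if span:
            closure = set(range(min(span), max(span) + 1))
            if not (closure & outside):
                result.append(node.label)
        for i, child in enumerate(node.children):
            siblings = set()
            for j, other in enumerate(node.children):
                if j != i:
                    siblings |= spans[id(other)]
            visit(child, outside | siblings)

    visit(root, set())
    return result

# Toy example over "Every(0) boy(1) likes(2) a(3) star(4)"; alignments are made up.
tree = Node("root", children=[
    Node("boy", frozenset({1})),
    Node("like", frozenset({2})),
    Node("and", children=[
        Node("human", frozenset({4})),
        Node("pop", frozenset({4})),
    ]),
])
frontier = frontier_nodes(tree)
```

In this toy tree `human` and `pop` are both aligned to the same word, so each falls inside the other's span and neither is a frontier node, while their parent `and` is. Minimal rules are then extracted by cutting the tree at frontier nodes.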
Building Training Examples NL: Every boy likes a star MR:
Building Training Examples [Figure: the predicates boy, human, pop, like shown with the sentence "Every boy likes a star"]
Modeling • Log-linear model + MERT training • Target
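A log-linear model scores each candidate derivation by a weighted sum of feature values and normalizes with a softmax. The sketch below shows only that scoring step. The feature names and weights are hypothetical, and MERT itself, which tunes the weights against an evaluation metric on held-out data, is not implemented here.

```python
import math

def loglinear_probs(candidates, weights):
    """candidates: list of feature dicts, one per derivation.
    Returns p(d) proportional to exp(sum_i weights[i] * h_i(d))."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for feats in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def best_candidate(candidates, weights):
    """Index of the highest-probability derivation."""
    probs = loglinear_probs(candidates, weights)
    return max(range(len(candidates)), key=probs.__getitem__)

# Hypothetical features and weights, for illustration only
candidates = [
    {"rule_logprob": -1.0, "rule_count": 3},
    {"rule_logprob": -2.0, "rule_count": 2},
]
weights = {"rule_logprob": 1.0, "rule_count": -0.1}
probs = loglinear_probs(candidates, weights)
best = best_candidate(candidates, weights)
```

Since decoding only needs the argmax, the normalization can be skipped in practice; it is kept here to make the model form explicit.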
Experiments • Dataset: GEOQUERY • 880 English questions with corresponding Prolog logical forms • Metric
Experiments CCG PCFG SCFG
Experiments • F-measure for different languages * en - English, ge - German, el - Greek, th - Thai
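The metric definitions are not shown on the slides. A minimal sketch, assuming the definitions commonly used in semantic parsing evaluations: precision is the fraction of produced parses that are correct, recall is the fraction of all test questions answered correctly, and F-measure is their harmonic mean. The function name and the counts in the example are made up.

```python
def precision_recall_f(num_correct, num_parsed, num_total):
    """Assumed definitions:
    precision = correct / questions for which a parse was produced
    recall    = correct / all questions in the test set
    F-measure = harmonic mean of precision and recall"""
    precision = num_correct / num_parsed if num_parsed else 0.0
    recall = num_correct / num_total if num_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Made-up counts, not results from the experiments
p, r, f = precision_recall_f(num_correct=80, num_parsed=90, num_total=100)
```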