Learning to Transform Natural to Formal Languages

Rohit J. Kate, Yuk Wah Wong, Raymond J. Mooney. July 13, 2005

Introduction • Semantic Parsing: Transforming natural language sentences into complete, executable formal representations • Different from Semantic Role Labeling, which involves only shallow semantic analysis • Two application domains: • CLang: RoboCup Coach Language • GeoQuery: A Database Query Application

CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated players • The coaching instructions are given in a formal language called CLang • Example (the coach's advice for the simulated soccer field, via semantic parsing): NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half. CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

GeoQuery: A Database Query Application • Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996] • Example (the user's question, via semantic parsing): NL: How many cities are there in the US? Query: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))

Outline • Semantic Parsing using Transformation Rules • Learning Transformation Rules • Experiments • Conclusions

Semantic Parsing using Transformation Rules • SILT (Semantic Interpretation by Learning Transformations) • Uses pattern-based transformation rules that map natural language phrases to formal language constructs • Transformation rules are repeatedly applied to the sentence to construct its formal language expression

Formal Language Grammar • NL: If our player 4 has the ball, our player 4 should shoot. • CLang: ((bowner our {4}) (do our {4} shoot)) • CLang Parse: (RULE (CONDITION bowner (TEAM our) (UNUM 4)) (DIRECTIVE do (TEAM our) (UNUM 4) (ACTION shoot))) • Non-terminals: RULE, CONDITION, ACTION… • Terminals: bowner, our, 4… • Productions: RULE → CONDITION DIRECTIVE; DIRECTIVE → do TEAM UNUM ACTION; ACTION → shoot

Transformation Rule Representation • Rule has two components: a natural language pattern and an associated formal language template • Two versions of SILT: • String-based rules: used to convert a natural language sentence directly to formal language • Tree-based rules: used to convert a syntactic tree to formal language [Figure: example tree pattern (S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball)))); example string pattern annotated with a "word gap"]

Example of Semantic Parsing
• Sentence: If our player 4 has the ball, our player 4 should shoot.
• "our" → TEAM (our): If TEAM player 4 has the ball, TEAM player 4 should shoot.
• "player 4" → UNUM (4): If TEAM UNUM has the ball, TEAM UNUM should shoot.
• "shoot" → ACTION (shoot): If TEAM UNUM has the ball, TEAM UNUM should ACTION.
• "TEAM UNUM has the ball" → CONDITION (bowner our {4}): If CONDITION, TEAM UNUM should ACTION.
• "TEAM UNUM should ACTION" → DIRECTIVE (do our {4} shoot): If CONDITION, DIRECTIVE.
• "If CONDITION, DIRECTIVE." → RULE ((bowner our {4}) (do our {4} shoot))
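The walkthrough above can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions, not SILT's implementation: the rule list `RULES`, the helper names `matches`, `apply_once` and `parse`, and the convention that upper-case pattern items stand for non-terminals are all hypothetical, and word gaps are left out for brevity.

```python
from typing import List, Optional, Tuple, Union

# A token is either a plain word or a (NONTERMINAL, formal-language string) pair.
Token = Union[str, Tuple[str, str]]

# Hypothetical hand-written rules: (pattern, non-terminal, template).
# Upper-case pattern items match already-built non-terminals; the rest match words.
RULES = [
    (["our"], "TEAM", "our"),
    (["player", "4"], "UNUM", "4"),
    (["shoot"], "ACTION", "shoot"),
    (["TEAM", "UNUM", "has", "the", "ball"], "CONDITION", "(bowner {0} {{{1}}})"),
    (["TEAM", "UNUM", "should", "ACTION"], "DIRECTIVE", "(do {0} {{{1}}} {2})"),
    (["if", "CONDITION", ",", "DIRECTIVE", "."], "RULE", "({0} {1})"),
]


def matches(item: str, token: Token) -> bool:
    """A pattern item matches a non-terminal by label and a word case-insensitively."""
    if isinstance(token, tuple):
        return item == token[0]
    return item == token.lower()


def apply_once(tokens: List[Token]) -> Optional[List[Token]]:
    """Apply the first rule that matches anywhere; return None if none matches."""
    for pattern, label, template in RULES:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if all(matches(p, t) for p, t in zip(pattern, window)):
                # Fill the template with the meanings of the matched non-terminals.
                args = [t[1] for t in window if isinstance(t, tuple)]
                return tokens[:i] + [(label, template.format(*args))] + tokens[i + n:]
    return None


def parse(sentence: str) -> Optional[str]:
    """Rewrite repeatedly; succeed only if one non-terminal spans the sentence."""
    tokens: List[Token] = sentence.replace(",", " , ").replace(".", " . ").split()
    while True:
        rewritten = apply_once(tokens)
        if rewritten is None:
            break
        tokens = rewritten
    if len(tokens) == 1 and isinstance(tokens[0], tuple):
        return tokens[0][1]
    return None


print(parse("If our player 4 has the ball, our player 4 should shoot."))
```

Each rewrite replaces matched words with a single non-terminal token that carries its partial formal expression, which is why the loop terminates: a sentence either collapses to one token or reaches a state where no rule applies.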

Learning Transformation Rules • SILT induces rules from a corpus of NL sentences paired with their formal representations • Patterns are learned for each production by bottom-up rule learning • For every production: • Call those sentences positives whose formal representations' parses use that production • Call the remaining sentences negatives
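The positive/negative split for one production is simple to state in code. The sketch below assumes each training example already carries the set of productions used in the parse of its formal representation; the function name `split_examples`, the corpus layout, and the production strings are illustrative assumptions rather than SILT's actual data format.

```python
from typing import List, Set, Tuple

# (NL sentence, productions used in the parse of its formal representation)
Example = Tuple[str, Set[str]]


def split_examples(corpus: List[Example], production: str) -> Tuple[List[str], List[str]]:
    """Sentences whose parse uses the production are positives; the rest are negatives."""
    positives = [sentence for sentence, used in corpus if production in used]
    negatives = [sentence for sentence, used in corpus if production not in used]
    return positives, negatives


# Toy corpus; the production labels are assumed for illustration only.
corpus: List[Example] = [
    ("If the ball is in REGION then player 3 should intercept the ball.",
     {"CONDITION -> (bpos REGION)"}),
    ("If our player 6 has the ball then he should take a shot on goal.",
     {"CONDITION -> (bowner TEAM {UNUM})"}),
]

positives, negatives = split_examples(corpus, "CONDITION -> (bpos REGION)")
print(len(positives), len(negatives))
```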

Rule Learning for a Production • SILT applies a greedy-covering, bottom-up rule induction method that repeatedly generalizes positives until they start covering negatives • Production: CONDITION → (bpos REGION)
Positives:
• The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
• If the ball is in REGION and not in REGION then player 3 should intercept the ball.
• During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .
• When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if it is in RP18.
• If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10 should position itself at REGION with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if it is in REGION .
Negatives:
• If our player 6 has the ball then he should take a shot on goal.
• If player 4 has the ball , it should pass the ball to player 2 or 10.
• If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.
• During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.
• If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .
• If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .
• If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.
• If Pass_11 then player 11 should pass to player 9 and no one else.

Generalization of String Patterns • Production: ACTION → (pos REGION) • Pattern 1: Always position player UNUM at REGION . • Pattern 2: Whenever the ball is in REGION , position player UNUM near the REGION . • Find the highest-scoring common subsequence • Generalization: position player UNUM [2] REGION .
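One way to reproduce this generalization is a longest-common-subsequence alignment followed by gap counting. The sketch below is an assumption-laden stand-in: the slides say "highest scoring" without giving the scoring function, so plain LCS length is used here, and a gap `[k]` is taken to be the larger number of words skipped in either pattern. The names `lcs_alignment` and `generalize` are hypothetical.

```python
from typing import List, Tuple


def lcs_alignment(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def generalize(pattern1: str, pattern2: str) -> str:
    """Keep the common tokens; insert [k] where up to k words were skipped."""
    a, b = pattern1.split(), pattern2.split()
    out: List[str] = []
    prev = None
    for i, j in lcs_alignment(a, b):
        if prev is not None:
            gap = max(i - prev[0] - 1, j - prev[1] - 1)
            if gap:
                out.append(f"[{gap}]")
        out.append(a[i])
        prev = (i, j)
    return " ".join(out)


print(generalize(
    "Always position player UNUM at REGION .",
    "Whenever the ball is in REGION , position player UNUM near the REGION .",
))
# prints: position player UNUM [2] REGION .
```

The `[2]` arises because Pattern 1 skips one word ("at") and Pattern 2 skips two ("near the") between UNUM and REGION.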

Generalization of Tree Patterns • Production: REGION → (penalty-area TEAM) • Pattern 1: (NP (PRP$ TEAM) (NN penalty) (NN area)) • Pattern 2: (NP (NP TEAM (POS 's)) (NN penalty) (NN box)) • Find common subgraphs • Generalization: (NP (* TEAM) (NN penalty) NN)

Rule Learning for a Production • Production: CONDITION → (bpos REGION)
Positives:
• The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
• If the ball is in REGION and not in REGION then player 3 should intercept the ball.
• During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .
• When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if it is in REGION.
• If the ball is inside REGION then player 10 should position itself at REGION with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if it is in REGION .
Negatives:
• If our player 6 has the ball then he should take a shot on goal.
• If player 4 has the ball , it should pass the ball to player 2 or 10.
• If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.
• During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.
• If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .
• If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .
• If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.
• If Pass_11 then player 11 should pass to player 9 and no one else.
Bottom-up Rule Learner

Rule Learning for a Production • Production: CONDITION → (bpos REGION)
Positives:
• The CONDITION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
• If the CONDITION and not in REGION then player 3 should intercept the ball.
• During normal play if the CONDITION then player 7 , 9 and 11 should dribble the ball to the REGION .
• When the play mode is normal and the CONDITION then our player 2 should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if CONDITION.
• If the CONDITION then player 10 should position itself at REGION with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if CONDITION .
Negatives:
• If our player 6 has the ball then he should take a shot on goal.
• If player 4 has the ball , it should pass the ball to player 2 or 10.
• If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.
• During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.
• If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .
• If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .
• If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.
• If Pass_11 then player 11 should pass to player 9 and no one else.
Bottom-up Rule Learner
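The bottom-up learner for a single production might be sketched as follows. This is a deliberately simplified illustration, not SILT's algorithm: it starts from the positive sentences as maximally specific patterns, merges pairs by their longest common subsequence, and keeps a merge only if the result covers no negative. Gap limits and rule scoring are ignored, and the names `lcs`, `covers` and `learn_patterns` are hypothetical.

```python
from typing import List


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists (simple dynamic program)."""
    m, n = len(a), len(b)
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = [a[i]] + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1], key=len)
    return dp[0][0]


def covers(pattern: List[str], sentence: List[str]) -> bool:
    """True if the pattern occurs in the sentence as a subsequence (gaps unbounded)."""
    remaining = iter(sentence)
    return all(token in remaining for token in pattern)


def learn_patterns(positives: List[str], negatives: List[str]) -> List[List[str]]:
    """Greedily generalize pairs of patterns while no negative gets covered."""
    patterns = [s.split() for s in positives]
    negs = [s.split() for s in negatives]
    changed = True
    while changed:
        changed = False
        for i in range(len(patterns)):
            for j in range(i + 1, len(patterns)):
                merged = lcs(patterns[i], patterns[j])
                if merged and not any(covers(merged, n) for n in negs):
                    rest = [p for k, p in enumerate(patterns) if k not in (i, j)]
                    patterns = rest + [merged]
                    changed = True
                    break
            if changed:
                break
    return patterns


POS = [
    "If the ball is in REGION and not in REGION then player 3 should intercept the ball .",
    "During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .",
    "When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .",
]
NEG = [
    "If our player 6 has the ball then he should take a shot on goal .",
    "During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11 .",
]

for pattern in learn_patterns(POS, NEG):
    print(" ".join(pattern))
```

Because every merged pattern is a subsequence of the patterns it replaces, each positive stays covered by some pattern, and the negative check keeps all accepted patterns consistent with the negatives.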

Rule Learning for All Productions • Transformation rules for productions should cooperate globally to generate complete semantic parses • Redundantly cover every positive example by the β = 5 best rules • Find the subset of these rules that best cooperate to generate complete semantic parses on the training data [Figure labels: accuracy, coverage]

Experimental Corpora • CLang • 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition • 22.52 words on average in NL sentences • 14.24 tokens on average in formal expressions • GeoQuery [Zelle & Mooney, 1996] • 250 queries for the given U.S. geography database • 6.87 words on average in NL sentences • 5.32 tokens on average in formal expressions

Experimental Methodology • Evaluated using standard 10-fold cross validation • Syntactic parses needed by the tree-based version were obtained by training Collins' parser [Bikel, 2004] on the WSJ treebank and gold-standard parses of training sentences • Correctness • CLang: output exactly matches the correct representation • GeoQuery: the resulting query retrieves the same answer as the correct representation • Metrics
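The metric formulas on this slide did not survive in the transcript. The sketch below assumes the usual definitions for semantic parsers that may fail to produce an output: precision is measured over sentences that received a complete parse, recall over all test sentences, and F-measure is their harmonic mean. These definitions and the function name `scores` are assumptions, not taken from the slides.

```python
from typing import Tuple


def scores(correct: int, completed: int, total: int) -> Tuple[float, float, float]:
    """Assumed definitions: precision = correct / completed parses,
    recall = correct / all sentences, F = harmonic mean of the two."""
    precision = correct / completed if completed else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for one cross-validation fold, for illustration only.
print(scores(correct=6, completed=8, total=10))
```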

Compared Systems • CHILL • Learns control rules for shift-reduce parsing using Inductive Logic Programming (ILP) • CHILLIN [Zelle & Mooney, 1996] • COCKTAIL [Tang & Mooney, 2001] • GEOBASE • Hand-built parser for GeoQuery [Borland International, 1988]

Precision Learning Curves for CLang

Recall Learning Curves for CLang

Precision Learning Curves for GeoQuery

Recall Learning Curves for GeoQuery

Related Work • SCISSOR [Ge & Mooney, 2005] • Integrates semantic and syntactic statistical parsing • Requires extensive annotations but gives better results • PRECISE [Popescu et al., 2003] • Designed to work specifically on NL database interfaces • Speech Recognition Community [Zue & Glass, 2000] • Simpler queries in ATIS corpus

Conclusions • New approach for semantic parsing, SILT, which uses transformation rules • SILT learns transformation rules by doing bottom-up rule induction exploiting the target language grammar • Tested on two very different domains, performs better than previous ILP-based approaches

Thank You! Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html Questions??

F-measure Learning Curves for CLang

F-measure Learning Curves for GeoQuery

Extra Slide: Average Training Time in Minutes

Extra Slide: Variations of Rule Representation • Context in the patterns: TEAM UNUM has the ball in REGION, for the production CONDITION → (bpos REGION) • Templates with multiple productions:

Extra Slide: Experimental Methodology • Correctness • CLang: output exactly matches the correct representation • GeoQuery: the resulting query retrieves the same answer as the correct representation • Example: If the ball is in our penalty area, all our players except player 4 should stay in our half. Correct: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our)))) Output: ((bpos (penalty-area opp)) (do (player-except our {4}) (pos (half our))))

Extra Slide: Future Work • Hard-matching symbolic patterns are sometimes too brittle; exploit string and tree kernels as classifiers [Lodhi et al., 2002] • Unified implementation of string-based and tree-based versions for direct comparisons
