Using String-Kernels for Learning Semantic Parsers Rohit J. Kate and Raymond J. Mooney
Semantic Parsing • Semantic Parsing: Transforming natural language (NL) sentences into computer-executable complete meaning representations (MRs) for some application • Example application domains • CLang: RoboCup Coach Language • Geoquery: A Database Query Application
CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org] • The coaching instructions are given in a formal language called CLang [Chen et al., 2003] • Example: "If the ball is in our goal area then player 1 should intercept it." → Semantic Parsing → (bpos (goal-area our) (do our {1} intercept)) (CLang)
Geoquery: A Database Query Application • Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996] • Example: "Which rivers run through the states bordering Texas?" → Semantic Parsing → Query: answer(traverse(next_to(stateid(‘texas’)))) → Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …
Learning Semantic Parsers • We assume meaning representation languages (MRLs) have deterministic context-free grammars • True for almost all computer languages • MRs can be parsed unambiguously
NL: Which rivers run through the states bordering Texas? MR: answer(traverse(next_to(stateid(‘texas’)))) Parse tree of MR: [tree with the non-terminals at internal nodes and the terminals at the leaves] Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID Terminals: answer, traverse, next_to, stateid, ‘texas’ Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), STATE → STATEID, TRAVERSE → traverse, NEXT_TO → next_to, STATEID → ‘texas’
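Since the MRL grammar is a deterministic context-free grammar, its productions can be represented directly as data. A minimal sketch in Python listing the Geoquery productions from the example above (illustrative names only, not KRISP's actual data structures):

```python
# Productions of the example Geoquery MR, stored as (LHS, RHS) pairs.
# Because the MRL grammar is deterministic, every MR parses into a
# unique tree over productions like these.
GEOQUERY_PRODUCTIONS = [
    ("ANSWER",   "answer(RIVER)"),
    ("RIVER",    "TRAVERSE(STATE)"),
    ("STATE",    "NEXT_TO(STATE)"),
    ("STATE",    "STATEID"),
    ("TRAVERSE", "traverse"),
    ("NEXT_TO",  "next_to"),
    ("STATEID",  "'texas'"),
]
```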
Learning Semantic Parsers • Assume meaning representation languages (MRLs) have deterministic context-free grammars • True for almost all computer languages • MRs can be parsed unambiguously • Training data consists of NL sentences paired with their MRs • Induce a semantic parser which can map novel NL sentences to their correct MRs • The learning problem differs from that of syntactic parsing, where training data has trees annotated over the NL sentences
KRISP: Kernel-based Robust Interpretation for Semantic Parsing • Learns semantic parser from NL sentences paired with their respective MRs given MRL grammar • Productions of MRL are treated like semantic concepts • SVM classifier with string subsequence kernel is trained for each production to identify if an NL substring represents the semantic concept • These classifiers are used to compositionally build MRs of the sentences
Overview of KRISP • Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers → semantic parser; the best MRs it produces (correct and incorrect) are fed back to collect new examples • Testing: novel NL sentences → semantic parser → best MRs
KRISP’s Semantic Parsing • We first define Semantic Derivation of an NL sentence • We next define Probability of a Semantic Derivation • Semantic parsing of an NL sentence involves finding its Most Probable Semantic Derivation • Straightforward to obtain MR from a semantic derivation
Semantic Derivation of an NL Sentence • MR parse with non-terminals (ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID) on the nodes, over the sentence: Which rivers run through the states bordering Texas?
Semantic Derivation of an NL Sentence • MR parse with productions on the nodes: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’, over the sentence: Which rivers run through the states bordering Texas?
Semantic Derivation of an NL Sentence • Semantic Derivation: each node covers an NL substring: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’, over the sentence: Which rivers run through the states bordering Texas?
Semantic Derivation of an NL Sentence • Semantic Derivation: each node contains a production and the substring of the NL sentence it covers: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) over: Which rivers run through the states bordering Texas? (word positions 1..9)
Semantic Derivation of an NL Sentence • Substrings in the NL sentence may be in a different order: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’, over the sentence: Through the states that border Texas which rivers run?
Semantic Derivation of an NL Sentence • Nodes are allowed to permute the child productions from the original MR parse: (ANSWER → answer(RIVER), [1..10]), (RIVER → TRAVERSE(STATE), [1..10]), (STATE → NEXT_TO(STATE), [1..6]), (TRAVERSE → traverse, [7..10]), (NEXT_TO → next_to, [1..5]), (STATE → STATEID, [6..6]), (STATEID → ‘texas’, [6..6]) over: Through the states that border Texas which rivers run? (word positions 1..10)
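A semantic derivation can thus be viewed as a tree whose nodes pair an MRL production with the (possibly reordered) span of sentence words it covers. A minimal sketch of such a node in Python (hypothetical structure, not KRISP's implementation):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DerivationNode:
    """One node of a semantic derivation: an MRL production plus the
    (1-based, inclusive) span of sentence words it covers."""
    production: Tuple[str, str]                     # e.g. ("NEXT_TO", "next_to")
    span: Tuple[int, int]                           # e.g. (5, 7) -> words 5..7
    children: List["DerivationNode"] = field(default_factory=list)

# The node (NEXT_TO -> next_to, [5..7]) from the example derivation:
node = DerivationNode(("NEXT_TO", "next_to"), (5, 7))
```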
Probability of a Semantic Derivation • Let Pπ(s[i..j]) be the probability that production π covers the substring s[i..j] of sentence s • For example, P_{NEXT_TO → next_to}("the states bordering") = 0.99 for the node (NEXT_TO → next_to, [5..7]) covering words 5..7, "the states bordering" • Obtained from the string-kernel-based SVM classifiers trained for each production π • Assuming independence, the probability of a semantic derivation D is the product of its node probabilities: P(D) = ∏_{(π,[i..j]) ∈ D} Pπ(s[i..j])
Probability of a Semantic Derivation contd. • Example derivation with node probabilities: (ANSWER → answer(RIVER), [1..9]) 0.98, (RIVER → TRAVERSE(STATE), [1..9]) 0.9, (TRAVERSE → traverse, [1..4]) 0.95, (STATE → NEXT_TO(STATE), [5..9]) 0.89, (NEXT_TO → next_to, [5..7]) 0.99, (STATE → STATEID, [8..9]) 0.93, (STATEID → ‘texas’, [8..9]) 0.98, over: Which rivers run through the states bordering Texas? (word positions 1..9)
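Under the independence assumption, the derivation's probability is simply the product of its node probabilities. A small sketch checking the example values above:

```python
from math import prod

# P_pi(s[i..j]) values from the example derivation above
node_probs = {
    "ANSWER -> answer(RIVER)   [1..9]": 0.98,
    "RIVER -> TRAVERSE(STATE)  [1..9]": 0.90,
    "TRAVERSE -> traverse      [1..4]": 0.95,
    "STATE -> NEXT_TO(STATE)   [5..9]": 0.89,
    "NEXT_TO -> next_to        [5..7]": 0.99,
    "STATE -> STATEID          [8..9]": 0.93,
    "STATEID -> 'texas'        [8..9]": 0.98,
}

# P(D) = product over all nodes of P_pi(s[i..j])
p_derivation = prod(node_probs.values())
print(round(p_derivation, 3))   # ~0.673
```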
Computing the Most Probable Semantic Derivation • The task of semantic parsing is to find the most probable semantic derivation of the NL sentence given all the probabilities Pπ(s[i..j]) • Implemented by extending Earley’s [1970] context-free grammar parsing algorithm • Resembles PCFG parsing but differs because: • Probability of a production depends on which substring of the sentence it covers • Leaves are not terminals but substrings of words
Computing the Most Probable Semantic Derivation contd. • Does a greedy approximation search, with beam width ω=20, and returns the ω most probable derivations it finds • Uses a threshold θ=0.05 to prune low-probability trees
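The extended Earley search itself is more involved, but its pruning step can be sketched on its own (ω = 20 and θ = 0.05 as above; the derivation objects with a `probability` attribute are hypothetical):

```python
def prune(derivations, beam_width=20, threshold=0.05):
    """Keep at most `beam_width` of the most probable partial derivations,
    discarding any whose probability falls below `threshold`."""
    kept = [d for d in derivations if d.probability >= threshold]
    kept.sort(key=lambda d: d.probability, reverse=True)
    return kept[:beam_width]
```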
Overview of KRISP • Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers, giving Pπ(s[i..j]) → semantic parser; the best semantic derivations it produces (correct and incorrect) are fed back to collect new examples • Testing: novel NL sentences → semantic parser → best MRs
KRISP’s Training Algorithm • Takes NL sentences paired with their respective MRs as input • Obtains MR parses • Induces the semantic parser and refines it in iterations • In the first iteration, for every production π: • Call those sentences positives whose MR parses use that production • Call the remaining sentences negatives
KRISP’s Training Algorithm contd. • First Iteration, production STATE → NEXT_TO(STATE) • Positives: which rivers run through the states bordering texas? ; what is the most populated state bordering oklahoma ? ; what is the largest city in states that border california ? ; … • Negatives: what state has the highest population ? ; what states does the delaware river run through ? ; which states have cities named austin ? ; what is the lowest point of the state with the largest area ? ; … → String-kernel-based SVM classifier
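In code, this first-iteration split just checks whether a sentence's MR parse uses the production. A sketch assuming a hypothetical helper `productions_of(mr)` that returns the productions appearing in an MR's (unambiguous) parse:

```python
def first_iteration_examples(production, training_pairs, productions_of):
    """Split training sentences into positives (their MR parse uses
    `production`) and negatives (it does not)."""
    positives, negatives = [], []
    for sentence, mr in training_pairs:
        if production in productions_of(mr):
            positives.append(sentence)
        else:
            negatives.append(sentence)
    return positives, negatives
```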
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = ?
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states K(s,t) = 1+?
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next K(s,t) = 2+?
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = to K(s,t) = 3+?
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next K(s,t) = 4+?
String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = 7
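A brute-force sketch of this (unnormalized, gap-unweighted) count of common word subsequences; it reproduces K(s,t) = 7 on the example, although KRISP relies on Lodhi et al.'s efficient dynamic-programming formulation rather than enumeration:

```python
from itertools import combinations

def subsequences(tokens):
    """All distinct non-empty word subsequences of a token list."""
    return {tuple(tokens[i] for i in idx)
            for r in range(1, len(tokens) + 1)
            for idx in combinations(range(len(tokens)), r)}

def subseq_kernel(s, t):
    """Number of common word subsequences between two strings."""
    return len(subsequences(s.split()) & subsequences(t.split()))

print(subseq_kernel("states that are next to", "the states next to"))  # 7
```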
String Subsequence Kernel contd. • The kernel is normalized to remove any bias due to different string lengths • Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel • Used for Text Categorization [Lodhi et al., 2002] and Information Extraction [Bunescu & Mooney, 2005]
String Subsequence Kernel contd. • The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products • Example phrases mapped into this space: "state with the capital of", "states with area larger than", "the states next to", "states that border", "states bordering", "states through which", "states that share border"
Support Vector Machines • SVMs find a separating hyperplane such that the margin is maximized • [Figure: the example phrases above plotted on either side of the separating hyperplane; a new phrase "states that are next to" falls on the positive side with estimate 0.97] • A probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]
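One way such a per-production classifier could be trained is with an off-the-shelf SVM over a precomputed subsequence-kernel Gram matrix, using Platt scaling for probability estimates. The sketch below uses scikit-learn and illustrative toy phrases; it is not KRISP's actual implementation, and probabilities estimated from so little data are unreliable:

```python
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def subseqs(s):
    toks = s.split()
    return {tuple(toks[i] for i in idx)
            for r in range(1, len(toks) + 1)
            for idx in combinations(range(len(toks)), r)}

def kernel(a, b):
    """Normalized common-subsequence count (see the earlier sketch)."""
    k = lambda x, y: len(subseqs(x) & subseqs(y))
    return k(a, b) / np.sqrt(k(a, a) * k(b, b))

# Toy substrings for the production STATE -> NEXT_TO(STATE)
phrases = ["the states bordering texas", "state bordering oklahoma",
           "states that border california", "states next to texas",
           "the highest population", "the delaware river run through",
           "cities named austin", "the largest area"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

gram = np.array([[kernel(a, b) for b in phrases] for a in phrases])

# probability=True enables Platt-scaled probability estimates [Platt, 1999]
clf = SVC(kernel="precomputed", probability=True).fit(gram, labels)

test = "the states next to"
row = np.array([[kernel(test, p) for p in phrases]])
print(clf.predict_proba(row))   # estimate of P_pi("the states next to")
```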
KRISP’s Training Algorithm contd. • First Iteration, production STATE → NEXT_TO(STATE) • Positives: which rivers run through the states bordering texas? ; what is the most populated state bordering oklahoma ? ; what is the largest city in states that border california ? ; … • Negatives: what state has the highest population ? ; what states does the delaware river run through ? ; which states have cities named austin ? ; what is the lowest point of the state with the largest area ? ; … → String-kernel-based SVM classifier giving P_{STATE → NEXT_TO(STATE)}(s[i..j])
KRISP’s Training Algorithm contd. • Using these classifiers Pπ(s[i..j]), obtain the ω best semantic derivations of each training sentence • Some of these derivations will give the correct MR, called correct derivations; some will give incorrect MRs, called incorrect derivations • For the next iteration, collect positives from the most probable correct derivation • The extended Earley’s algorithm can be forced to derive only correct derivations by making sure all subtrees it generates exist in the correct MR parse • Collect negatives from incorrect derivations with higher probability than the most probable correct derivation
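The positive/negative collection in later iterations might look roughly like the sketch below (derivation objects and helpers are hypothetical; as the next slides show, actual KRISP restricts negatives to productions that cover the marked words differently rather than taking all nodes):

```python
def collect_examples(sentence_words, correct_derivs, incorrect_derivs):
    """One refinement step: positives from the most probable correct
    derivation, negatives from incorrect derivations that outscore it."""
    best_correct = max(correct_derivs, key=lambda d: d.probability)

    positives = [(n.production, " ".join(sentence_words[n.start - 1:n.end]))
                 for n in best_correct.nodes()]

    negatives = []
    for d in incorrect_derivs:
        if d.probability > best_correct.probability:
            negatives.extend(
                (n.production, " ".join(sentence_words[n.start - 1:n.end]))
                for n in d.nodes())
    return positives, negatives
```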
KRISP’s Training Algorithm contd. • Most probable correct derivation; collect positive examples from its nodes: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) over: Which rivers run through the states bordering Texas? (word positions 1..9)
KRISP’s Training Algorithm contd. • Incorrect derivation with probability greater than the most probable correct derivation; collect negative examples: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) over: Which rivers run through the states bordering Texas? (word positions 1..9) • Incorrect MR: answer(traverse(stateid(‘texas’)))
KRISP’s Training Algorithm contd. • Traverse both trees in breadth-first order till the first nodes where their productions differ are found. • Most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) over: Which rivers run through the states bordering Texas? • Incorrect derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) over the same sentence
KRISP’s Training Algorithm contd. • Mark the words under these nodes (the words covered by the first differing nodes in the correct and incorrect derivations above).
KRISP’s Training Algorithm contd. • Consider all the productions covering the marked words. • Collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.