Efficient kernels for sentence pair classification. Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete, University of Rome “Tor Vergata”, Roma, Italy. Motivation. Classifying sentence pairs is an important activity in many NLP tasks, e.g.: Textual Entailment Recognition
Efficient kernels for sentence pair classification
Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”, Roma, Italy

Motivation
• Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
  • Textual Entailment Recognition
  • Machine Translation
  • Question Answering
• Classifiers need suitable feature spaces

Motivation
For example, in textual entailment…
Training examples:
• P1 = (T1, H1): T1 “Farmers feed cows animal extracts”, H1 “Cows eat animal extracts”
• P2 = (T2, H2): T2 “They feed dolphins fish”, H2 “Fish eat dolphins”
• P3 = (T3, H3): T3 “Mothers feed babies milk”, H3 “Babies eat milk”
Relevant features: first-order rules, e.g. feed X Y → X eat Y
[Diagram: training examples → relevant features (first-order rules) → classification]

In this talk…
• First-order rule (FOR) feature spaces: a challenge
• Tripartite Directed Acyclic Graphs (tDAGs) as a solution:
  • for modelling FOR feature spaces
  • for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
• An efficient algorithm for computing kernels in FOR spaces
• Experimental and comparative assessment of the computational efficiency of the proposed algorithm

First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function
K(P1, P2) = |S(P1) ∩ S(P2)|
which counts the first-order rules activated by both P1 and P2, where S(P) is the set of rules activated by the pair P.
Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
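
To make the kernel definition concrete, here is a minimal sketch that computes K(P1, P2) = |S(P1) ∩ S(P2)| when the rule sets are small enough to enumerate explicitly, and checks that it equals the dot product of binary feature vectors. The rule strings and the names `rule_kernel`, `explicit_dot_product`, `S_PA` and `S_PB` are hypothetical illustrations, not the features used in the talk, where rules are pairs of tree fragments and S(P) is never enumerated.

```python
def rule_kernel(s_p1, s_p2):
    """K(P1, P2) = |S(P1) ∩ S(P2)| for explicitly enumerated rule sets."""
    return len(set(s_p1) & set(s_p2))


def explicit_dot_product(s_p1, s_p2):
    """Same value, computed as a dot product of binary feature vectors."""
    vocabulary = sorted(set(s_p1) | set(s_p2))
    v1 = [1 if rule in s_p1 else 0 for rule in vocabulary]
    v2 = [1 if rule in s_p2 else 0 for rule in vocabulary]
    return sum(a * b for a, b in zip(v1, v2))


# Hypothetical, hand-written rule strings standing in for tree-fragment pairs.
S_PA = {"feed X Y -> X eat Y", "feed X -> X eat", "feed cows -> cows eat"}
S_PB = {"feed X Y -> X eat Y", "feed X -> X eat", "feed babies -> babies eat"}

print(rule_kernel(S_PA, S_PB))
print(explicit_dot_product(S_PA, S_PB))
```

The explicit version only works for toy sets: the number of tree-fragment pairs grows very quickly with sentence length, which is why the kernel is defined implicitly.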

Observations
• … using the kernel trick:
  • define the similarity function K(P1, P2)
  • instead of defining the features
Example: K((T1, H1), (T1, H2))

First-order rule (FOR) feature spaces: challenges
T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
• Adding placeholders: cows = 1, animal = 2, extracts = 3 in both T1 and H1
• Propagating placeholders: the placeholders are carried up to the constituents of the parse trees (e.g. NP 1, NP 3)
[Figure: Pa = the pair of parse trees of T1 and H1 with placeholders; S(Pa) = {…}, the set of tree-fragment pairs of Pa, such as the feed/eat fragments over placeholders 1 and 3]

First-order rule (FOR) feature spaces: challenges
T3: “Mothers feed babies milk”
H3: “Babies eat milk”
• Placeholders: babies = 1, milk = 2 in both T3 and H3
[Figure: Pb = the pair of parse trees of T3 and H3 with placeholders; S(Pb) = {…}, the set of tree-fragment pairs of Pb, such as the feed/eat fragments over placeholders 1 and 2]

First-order rule (FOR) feature spaces: challenges
K(Pa, Pb) = |S(Pa) ∩ S(Pb)|
[Figure: S(Pa) and S(Pb) side by side. The feed/eat fragment pair of Pa (placeholders 1 and 3) and the feed/eat fragment pair of Pb (placeholders 1 and 2) are the same first-order rule once placeholders are renamed to variables: X = 1 and Y = 3 in Pa, X = 1 and Y = 2 in Pb.]

A step back…
• FOR feature spaces can be modelled with particular graphs
• We call these graphs tripartite directed acyclic graphs (tDAGs)
• Observations:
  • tDAGs are not trees
  • tDAGs can be used to model both rules and sentence pairs
  • unifying rules in sentences is a graph matching problem

Tripartite Directed Acyclic Graphs (tDAG)
As for Feature Structures…
[Figure: the feed/eat first-order rule with variables X and Y, and the sentence pair (“Farmers feed cows animal extracts”, “Cows eat animal extracts”) with placeholders 1, 2, 3, each drawn as a pair of trees]

Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph G = (N, E) where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges E is partitioned into four sets Et, Eg, EA(t), and EA(g)
where t = (Nt, Et) and g = (Ng, Eg) are two trees and
EA(t) = {(x, y) | x ∈ Nt and y ∈ A}
EA(g) = {(x, y) | x ∈ Ng and y ∈ A}
[Figure: the two tree fragments of the feed/eat rule]
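
The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `TDag`, its field names, the helper `is_tree`, and the node identifiers such as `t.VP` are all hypothetical, and node labels are left out to keep the sketch short.

```python
from dataclasses import dataclass

Edge = tuple[str, str]  # (parent id, child id)


def is_tree(nodes: set[str], edges: set[Edge]) -> bool:
    """True if (nodes, edges) is a rooted tree with edges directed away from the root."""
    if not nodes:
        return False
    parents = {n: [] for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        if parent not in nodes or child not in nodes:
            return False
        parents[child].append(parent)
        children[parent].append(child)
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1 or any(len(ps) > 1 for ps in parents.values()):
        return False
    # Every node must be reachable from the single root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == nodes


@dataclass
class TDag:
    nodes_t: set[str]          # Nt
    nodes_g: set[str]          # Ng
    anchors: set[str]          # A
    edges_t: set[Edge]         # Et
    edges_g: set[Edge]         # Eg
    anchor_edges_t: set[Edge]  # EA(t): from Nt to A
    anchor_edges_g: set[Edge]  # EA(g): from Ng to A

    def is_well_formed(self) -> bool:
        disjoint = not (self.nodes_t & self.nodes_g
                        or self.nodes_t & self.anchors
                        or self.nodes_g & self.anchors)
        anchored_t = all(x in self.nodes_t and y in self.anchors
                         for x, y in self.anchor_edges_t)
        anchored_g = all(x in self.nodes_g and y in self.anchors
                         for x, y in self.anchor_edges_g)
        return (disjoint and anchored_t and anchored_g
                and is_tree(self.nodes_t, self.edges_t)
                and is_tree(self.nodes_g, self.edges_g))


# Hypothetical encoding of the rule "feed X Y -> X eat Y" as a tDAG.
RULE = TDag(
    nodes_t={"t.VP", "t.VB", "t.NP1", "t.NP2", "t.feed"},
    nodes_g={"g.S", "g.NP1", "g.VP", "g.VB", "g.NP2", "g.eat"},
    anchors={"X", "Y"},
    edges_t={("t.VP", "t.VB"), ("t.VP", "t.NP1"), ("t.VP", "t.NP2"), ("t.VB", "t.feed")},
    edges_g={("g.S", "g.NP1"), ("g.S", "g.VP"), ("g.VP", "g.VB"),
             ("g.VP", "g.NP2"), ("g.VB", "g.eat")},
    anchor_edges_t={("t.NP1", "X"), ("t.NP2", "Y")},
    anchor_edges_g={("g.NP1", "X"), ("g.NP2", "Y")},
)

print(RULE.is_well_formed())
```

The anchors X and Y are shared by both trees, which is what makes the whole structure a DAG rather than a tree.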

Tripartite Directed Acyclic Graphs (tDAGs)
Alternative definition
A tDAG is a pair of extended trees G = (t, g) where:
t = (Nt ∪ At, Et ∪ EA(t)) and g = (Ng ∪ Ag, Eg ∪ EA(g)).
[Figure: the two tree fragments of the feed/eat rule, drawn first without and then with the anchor nodes X and Y]

Again challenges
Computing the implicit kernel function K(P1, P2) = |S(P1) ∩ S(P2)| involves general graph matching.
This is an exponential problem.
Yet… tDAGs are particular graphs and we can define an efficient algorithm.
We will analyze the isomorphism among tDAGs and we will derive an algorithm for …

Isomorphism between tDAGs
Isomorphism between graphs
G1 = (N1, E1) and G2 = (N2, E2) are isomorphic if:
• |N1| = |N2| and |E1| = |E2|
• among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
  • for each n1 in N1, Label(n1) = Label(f(n1))
  • for each (na, nb) in E1, (f(na), f(nb)) is in E2
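
The definition of isomorphism between labelled graphs translates directly into a brute-force check that tries every bijection. The sketch below is a naive illustration of why general graph matching is expensive, not the algorithm proposed in the talk; the function name `find_isomorphism` and the toy graphs are assumptions.

```python
from itertools import permutations


def find_isomorphism(labels1, edges1, labels2, edges2):
    """Return a bijection f: N1 -> N2 preserving labels and edges, or None.

    labels1, labels2: dict mapping node id -> label
    edges1, edges2:   set of (source id, target id) pairs
    Tries all |N1|! bijections, so it is only usable on tiny graphs.
    """
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return None
    nodes1 = sorted(labels1)
    for image in permutations(sorted(labels2)):
        f = dict(zip(nodes1, image))
        same_labels = all(labels1[n] == labels2[f[n]] for n in nodes1)
        same_edges = all((f[a], f[b]) in edges2 for a, b in edges1)
        if same_labels and same_edges:
            return f
    return None


# Two hypothetical three-node graphs with the same shape and labels.
LABELS_1 = {"a": "VP", "b": "VB", "c": "NP"}
EDGES_1 = {("a", "b"), ("a", "c")}
LABELS_2 = {1: "NP", 2: "VP", 3: "VB"}
EDGES_2 = {(2, 3), (2, 1)}

print(find_isomorphism(LABELS_1, EDGES_1, LABELS_2, EDGES_2))
```

Because the edge counts are equal and f is a bijection, mapping every edge of E1 into E2 is enough to make the edge sets correspond exactly.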

Isomorphism between tDAGs
Isomorphism adapted to tDAGs
G1 = (t1, g1) and G2 = (t2, g2) are isomorphic if these two properties hold:
• Partial isomorphism
  • g1 and g2 are isomorphic
  • t1 and t2 are isomorphic
  • This property generates two functions fg and ft
• Constraint compatibility
  • fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, fg(n) = ft(n)

Isomorphism between tDAGs
• Partial isomorphism
  [Figure: Pa = (ta, ga), the feed/eat fragment pair with placeholders 1 and 3, and Pb = (tb, gb), the feed/eat fragment pair with placeholders 1 and 2]
  Ct = {(1, 1), (3, 2)}
  Cg = {(1, 1), (3, 2)}
• Constraint compatibility: Ct = Cg
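
The compatibility test can be sketched in a few lines, assuming the two partial isomorphisms ft and fg are already available as dictionaries that are defined on every anchor of A1. The names `constraint` and `compatible`, and the node identifiers in the example mappings, are hypothetical; only the anchor pairs (1, 1) and (3, 2) come from the slide.

```python
def constraint(f, anchors):
    """Restrict a node mapping f to the anchors: the set of pairs (n, f(n))."""
    return {(n, f[n]) for n in anchors}


def compatible(f_t, f_g, anchors_1):
    """ft and fg are compatible if they agree on every anchor n in A1."""
    return all(f_t[n] == f_g[n] for n in anchors_1)


A1 = {1, 3}  # placeholders of Pa used by the feed/eat fragments

# Hypothetical node mappings; tree nodes are named by strings, anchors by integers.
F_T = {"t.VP": "t.VP", "t.VB": "t.VB", 1: 1, 3: 2}
F_G = {"g.S": "g.S", "g.VP": "g.VP", 1: 1, 3: 2}
F_G_SWAPPED = {"g.S": "g.S", "g.VP": "g.VP", 1: 2, 3: 1}

print(constraint(F_T, A1), constraint(F_G, A1))
print(compatible(F_T, F_G, A1))
print(compatible(F_T, F_G_SWAPPED, A1))
```

Comparing the two constraints Ct and Cg as sets gives the same answer as the element-wise check whenever both mappings cover all of A1.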

Ideas for building the kernel
We define K(P1, P2) = |S(P1) ∩ S(P2)| using the isomorphism between tDAGs.
The idea: reverse the order of isomorphism detection
• First, constraint compatibility
  • Building a set C of all the relevant alternative constraints
  • Finding subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C
• Second, partial isomorphism detection
[Diagram: constraint compatibility → alternative constraints → subsets of S(P1) ∩ S(P2) → partial isomorphism]

Ideas for building the kernel
K(Pa, Pb) = |S(Pa) ∩ S(Pb)|
[Figure: Pa = (ta, ga) and Pb = (tb, gb), two pairs of small trees rooted in A (children B, C) and I (children M, N), with placeholders 1, 2 in Pa and 1, 2, 3 in Pb]
C = {c1, c2} = { {(1, 1), (2, 2)}, {(1, 1), (2, 3)} }

Ideas for building the kernel
K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|
C = {c1, c2}, c1 = {(1, 1), (2, 2)}
[Figure: Pa, Pb and the fragment pairs enumerated in (S(Pa) ∩ S(Pb))_c1]

Ideas for building the kernel
K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|
C = {c1, c2}, c2 = {(1, 1), (2, 3)}
[Figure: Pa, Pb and the fragment pairs enumerated in (S(Pa) ∩ S(Pb))_c2]

Ideas for building the kernel
K(Pa, Pb) = |⋃_{c∈C} (S(Pa) ∩ S(Pb))_c| = |⋃_{c∈C} (S(ta) ∩ S(tb))_c × (S(ga) ∩ S(gb))_c|
For example: (S(Pa) ∩ S(Pb))_c1 = (S(ta) ∩ S(tb))_c1 × (S(ga) ∩ S(gb))_c1
[Figure: the fragment pairs of (S(Pa) ∩ S(Pb))_c1 rewritten as the product of a set of fragments of the first trees and a set of fragments of the second trees]

Kernel on FOR feature spaces
The general equation
K(P1, P2) = |⋃_{c∈C} (S(t1) ∩ S(t2))_c × (S(g1) ∩ S(g2))_c|
can be computed using:
• KS (kernel function for trees), introduced in (Duffy & Collins, 2001) and refined in (Moschitti & Zanzotto, 2007)
• the inclusion-exclusion principle
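
The role of the inclusion-exclusion principle can be illustrated on explicit toy sets. The sketch relies on the set identity (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D), so the size of each intersection of products is a product of two sizes. In the talk those sizes would presumably come from the tree kernel KS rather than from enumerated sets; the function names and fragment names below are hypothetical.

```python
from itertools import combinations, product


def union_size_by_inclusion_exclusion(t_sets, g_sets):
    """Size of the union over c of t_sets[c] x g_sets[c], without building the union."""
    constraints = sorted(t_sets)
    total = 0
    for k in range(1, len(constraints) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(constraints, k):
            t_common = set.intersection(*[t_sets[c] for c in subset])
            g_common = set.intersection(*[g_sets[c] for c in subset])
            # |(T1 x G1) ∩ ... ∩ (Tk x Gk)| = |T1 ∩ ... ∩ Tk| * |G1 ∩ ... ∩ Gk|
            total += sign * len(t_common) * len(g_common)
    return total


def union_size_brute_force(t_sets, g_sets):
    """Reference value: enumerate every pair and count the distinct ones."""
    pairs = set()
    for c in t_sets:
        pairs |= set(product(t_sets[c], g_sets[c]))
    return len(pairs)


# Hypothetical fragments compatible with each constraint c1, c2.
T_SETS = {"c1": {"t_a", "t_b"}, "c2": {"t_a", "t_c"}}
G_SETS = {"c1": {"g_a", "g_b"}, "c2": {"g_a"}}

print(union_size_by_inclusion_exclusion(T_SETS, G_SETS))
print(union_size_brute_force(T_SETS, G_SETS))
```

The number of terms grows as 2^|C| − 1, so this approach stays practical only while the set C of alternative constraints is small.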

Computational Efficiency Analysis
• Comparison kernel: (Zanzotto & Moschitti, Coling-ACL 2006), (Moschitti & Zanzotto, ICML 2007)
• Test-bed corpus: Recognizing Textual Entailment (RTE) challenge data

Computational Efficiency Analysis
[Figure: execution time in seconds (s) for all of RTE2 with respect to different numbers of allowed placeholders]

Accuracy Comparison
• Training: RTE 1, 2, 3
• Testing: RTE 4

Conclusions
• We reduced kernels in first-order feature spaces to graph-matching problems
• We defined a new class of graphs, tDAGs
• We presented an efficient algorithm for computing kernels in FOR feature spaces