History of Major NLP Products & Services in Toshiba
1978 「JW-10」: Japanese Word Processor
1985 「ASTRANSAC EJ」: EtoJ MT System
1989 「ASTRANSAC JE」: JtoE MT System
1995 「The 翻訳」: PC MT System (Internet & Personal)
1996 「News Watch」: Information Filtering Service
1999 「Fresh Eye」: Internet Search Engine/Portal
2001 「KnowledgeMeister」: KM Support System
2005 Chinese-Japanese Translation Service
2006 「KnowledgeMeister - Succeed」
29 January, 2008 Integrated Use of Phrase Structure Forest and Dependency Forest in Preference Dependency Grammar (PDG) Toshiba of Europe Ltd. Hideki Hirakawa
Agenda • Phrase Structure and Dependency Structure Analysis • Overview of the Preference Dependency Grammar(PDG) • Packed Shared Data Structure “Dependency Forest” • Evaluation of Dependency Forest • Conclusion
Phrase Structure (PS) and Dependency Structure (DS): two major syntactic representation schemes
Information explicitly expressed by PS
- Phrases (non-terminal nodes)
- Structural categories (non-terminal labels)
Information explicitly expressed by DS
- Head-dependent relations (directed arcs)
- Functional categories (arc labels)
[Figure: phrase structure tree (non-terminals s, vp, pp, np over the POS tags n, v, pre, det) and dependency tree (arcs sub, vpp, pre, det) for "time fly like an arrow"]
Relation between PS (Constituency) and DS
• Constituency and dependency describe different dimensions.
• A phrase-structure tree (PST) is closely related to a derivation, whereas a dependency tree rather describes the product of a process of derivation.
• Constituency and dependency are not adversaries; they are complementary notions. Using them together, we can overcome the problems that each notion has individually.
Formal & Computational Aspects of Dependency Grammar [Kruijff 02]
Integrated Use of Phrase and Dependency Structures
Phrase structure analysis
- Lexicalized PCFG: lexical information (including dependency relations) improves PS analysis accuracy (e.g. Charniak 1997; Collins 1999; Bikel 2004)
- Use of dependency relations as discriminative features of a maximum entropy phrase structure parser (e.g. HPSG parser (Oepen 2002), reranking parser (Charniak and Johnson 2005))
- Use of another independent shallow dependency parser (Sagae et al. 2007)
Dependency analysis
- Almost no use of phrase structure information (Kakari-uke parsers, MSTParser (McDonald 2005), Malt parser (Nivre 2004))
Integration requires mapping
Integration of PS and DS requires a mapping between the two structures of a sentence, because a sentence analyzer cannot combine linguistic information from both without a correspondence between them.
Mapping between PS and DS (previous research)
• Conversion between PS and DS based on heuristics: Phrase Structure Tree (PST) → Dependency Tree (DT) [Collins 99], DT → PST [Xia&Palmer 00] ⇒ Measurement of parse accuracy, treebank creation, etc.
• Grammar equivalence: [Gaifman 65] and [Abney 94] studied the equivalence relation between PSG (CFG) and DG (Tesnière-model DG) ⇒ DG is strongly equivalent to only a subclass of CFG*1
• Structure mapping based on packed shared data structures: partial structure mapping framework based on the Syntactic Graph [Seo&Simmons 89]. Creates mappings between PSTs and DTs based on partial structure mapping rules (described later) ⇒ The syntactic graph generates inappropriate mappings [Hirakawa06]
Complete mapping based on the “Dependency Forest” ⇒ Integrated use of PS and DS (described later)
Agenda • Phrase Structure and Dependency Structure Analysis • Overview of the Preference Dependency Grammar(PDG) • Packed Shared Data Structure “Dependency Forest” • Evaluation of Dependency Forest • Conclusion
Basic Sentence Analysis Model
• Generation Knowledge: generates all possible interpretations
• Interpretation Space: prescribed by the interpretation description scheme
• Constraint Knowledge: rejection of interpretations (accept / reject)
• Preference Knowledge: preference order of interpretations
• Optimum Interpretation Extraction: yields the optimum interpretation
Legend: ◎ correct, ○ plausible, × implausible (preference order ◎ > ○ > ×)
[Figure: a sentence is expanded into an interpretation space; constraint and preference knowledge narrow it down to the optimum interpretation]
Example (1): Probabilistic Context Free Grammar (PCFG)
• Generation Knowledge: CFG rules
• Interpretation Space: phrase structure (parse tree)
• Constraint Knowledge: no constraints
• Preference Knowledge: probabilities of the CFG rules
• Optimum Interpretation Extraction: the Viterbi algorithm
Basic Sentence Analysis Model of PDG: Multilevel Packed Shared Data Connection Model
• (a) NLA system with multilevel interpretation space
• (b) Packed shared data structure and interpretation mapping
• (c) Interpretations are externalizations of the lower level interpretations
Two components: 1. Data Structure, 2. Optimum Solution Search
PK: Preference Knowledge, CK: Constraint Knowledge, GK: Generation Knowledge, IS: Interpretation Space
[Figure: three interpretation spaces IS1, IS2, IS3 (Level 1, Level 2 and Level 3 interpretations), each with its own GK, CK and PK, connected by interpretation mappings; optimum interpretation extraction selects the optimum interpretation for the sentence]
PDG Implementation Model (data structure)
PDG is an all-pair dependency analysis method with a three-level architecture utilizing three packed shared data structures.
• Morphological layer: WPP trellis (all WPP sequences)
• Syntactic layer: phrase structure forest (all PSTs) and dependency forest (all DTs)
• Interpretation mapping between WPP sequences, PSTs and DTs
• Integrated use of the PS and DS levels in the syntactic layer
WPP = Word POS Pair, Phrase structure forest (PSF) = (packed shared) parse forest
[Figure: WPP trellis, phrase structure forest and dependency forest for the sentence “Time flies” (WPPs time/n, time/v, fly/v, fly/n; arcs sub, obj, top), with the optimum dependency tree and its corresponding PST and WPP sequence]
Comparison with other dependency analysis methods
• CDG: no CFG grammar. Sentence → all DS interpretations → well-formed interpretations. Problem: combinatorial explosion.
• MSTParser: no CFG grammar. Sentence → 1-best morphological interpretation → all interpretations with no POS ambiguities → optimum interpretation. Problem: over-pruning.
• PDG: CFG filtering. Sentence → all morphological interpretations → all PS interpretations → all DS interpretations → optimum interpretation.
CDG: Constraint Dependency Grammar, MSTParser: Maximum Spanning Tree Parser
PDG Implementation Model (optimum solution search)
• Integration of preference knowledge: preference scores based on the multilevel data structures (WPP sequence score, phrase structure score, dependency score) are integrated into scores on a DF
• Optimum solution search: Graph Branch Algorithm
[Figure: for the sentence “Time flies”, scores from the WPP trellis (morphological layer), the PS forest and the dependency forest (syntactic layer) are integrated, and the optimum dependency tree is selected]
PDG Analysis Flow
• Forest generation: dependency forest generation by the extended chart parser
• Scoring: preference score integration
• Optimum tree search: optimum tree search based on CM and PM
[Figure: data flow through Sentence, WPP Trellis, PS Forest, Dependency Forest, Scored Dependency Forest (Co-occurrence Score Matrix) and The Optimum Tree]
Agenda • Phrase Structure and Dependency Structure Analysis • Overview of the Preference Dependency Grammar(PDG) • Packed Shared Data Structure “Dependency Forest” • Evaluation of Dependency Forest • Conclusion
Partial Structure Mapping Method [Seo&Simmons 89]
• Grammar rule: partial structure mapping rule, which maps a headed CFG rule Y/wh → X1/w1 … Xh/wh … Xn/wn to a partial dependency tree in which wh is the head and the other words w1 … wn depend on it through arcs d1 … dn
• The parser produces a packed shared phrase structure (phrase structure forest) = set of phrase structure trees, and a packed shared dependency structure (Syntactic Graph) = set of dependency trees
• The two packed structures are connected by the mapping
Syntactic Graph
• Packed shared data structure for dependency trees
• Encompasses all dependency trees corresponding to phrase structure trees in the parse forest for a sentence
• Node: WPP, Arc: dependency relation
• Used together with an Exclusion Matrix
[Figure: syntactic graph for “Time flies like an arrow”, with nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n] and arcs labelled snp, vnp, vpp, ppn, npp, det, mod, S]
Completeness and Soundness of the Syntactic Graph
• Definitions
Completeness: For every parse tree in the forest, there is a syntactic reading from the syntactic graph that is structurally equivalent to that parse tree.
∀PST: Phr. Str. Tree ∃DT: Dep. Tree such that PST corresponds to DT
Soundness: For every syntactic reading from the syntactic graph, there is a parse tree in the forest that is structurally equivalent to that syntactic reading.
∀DT: Dep. Tree ∃PST: Phr. Str. Tree such that PST corresponds to DT
• Problem of the syntactic graph: violation of soundness [Hirakawa06]
[Figure: correspondence between phrase structure trees (PT) in the phrase structure forest and dependency trees (DT) in the syntactic graph; completeness holds (○) but soundness is violated (×)]
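The two definitions can be written down directly as quantified checks. The sketch below is illustrative only: the tree identifiers, the `corresponds` relation and both function names are hypothetical, and real trees are replaced by plain string labels.

```python
def is_complete(parse_trees, dep_trees, corresponds):
    """Every parse tree has at least one corresponding dependency tree."""
    return all(any(corresponds(pst, dt) for dt in dep_trees) for pst in parse_trees)


def is_sound(parse_trees, dep_trees, corresponds):
    """Every dependency tree has at least one corresponding parse tree."""
    return all(any(corresponds(pst, dt) for pst in parse_trees) for dt in dep_trees)


# Hypothetical toy data: three parse trees, and four readings produced by a
# packed dependency structure, one of which has no parse tree behind it.
MAPPING = {"PST_a": "DT_a", "PST_b": "DT_b", "PST_c": "DT_c"}
parse_trees = set(MAPPING)
readings = {"DT_a", "DT_b", "DT_c", "DT_d"}


def corresponds(pst, dt):
    return MAPPING.get(pst) == dt


print(is_complete(parse_trees, readings, corresponds))  # True
print(is_sound(parse_trees, readings, corresponds))     # False
```

The extra reading `DT_d` is the kind of case the slide calls a violation of soundness: the packed structure licenses a dependency tree that no phrase structure tree in the forest supports.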
Example of the violation of soundness
Input: “Tokyo taxi driver call center”
• (a), (b), (c): dependency trees that correspond to phrase structure trees in the phrase structure forest (noun phrase rules np1, np2, np3), built from the arcs nc-1, nc-2, nc-3, nj-4, nj-5, nc-6, nj-7 and rt-8
• (d): dependency tree consisting of the arcs nc-1, nc-2, nc-3, nc-6 and rt-8
The syntactic graph / exclusion matrix for (a), (b) and (c) generates (d), which has no corresponding phrase structure tree in the phrase structure forest.
Dependency Forest [Hirakawa 06]
• Packed shared data structure for dependency trees
• Dependency Forest (DF) = Dependency Graph (DG) + Co-occurrence Matrix (CM)
• CM: defines the arc co-occurrence relation (equivalent arcs are allowed in a DF)
[Figure: dependency forest (dependency graph and co-occurrence matrix) for “Time flies like an arrow.” Nodes: root, 0,time/n, 0,time/v, 1,fly/v, 1,fly/n, 2,like/p, 2,like/v, 3,an/det, 4,arrow/n. Arcs: rt29, rt31, rt32, sub23, sub24, obj4, obj16, obj25, nc2, det14, pre15, vpp18, vpp20, npp19]
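A dependency forest can be pictured as a list of arcs plus a symmetric co-occurrence relation over arc identifiers. The sketch below is a simplification under stated assumptions: the class and field names are hypothetical, the arc identifiers are invented, and "well-formed" is reduced to "one incoming arc per word position, with every pair of chosen arcs co-occurring in the CM". The example uses the two readings of “Time flies” from the PDG implementation model slide.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Arc:
    arc_id: int   # hypothetical identifier
    label: str    # functional category, e.g. "sub"
    dep: tuple    # (position, "word/pos") of the dependent
    head: tuple   # (position, "word/pos") of the head, or ("root", "root")


class DependencyForest:
    """Dependency graph (arcs) plus co-occurrence matrix (pairs of arc ids)."""

    def __init__(self, arcs, cooccurring_pairs):
        self.arcs = list(arcs)
        self.cm = {frozenset(pair) for pair in cooccurring_pairs}

    def cooccur(self, a, b):
        return frozenset((a.arc_id, b.arc_id)) in self.cm

    def trees(self):
        """Yield arc tuples with one arc per word position, pairwise co-occurring."""
        by_position = {}
        for arc in self.arcs:
            by_position.setdefault(arc.dep[0], []).append(arc)
        for choice in product(*by_position.values()):
            if all(self.cooccur(a, b) for a, b in combinations(choice, 2)):
                yield choice


ROOT = ("root", "root")
arcs = [
    Arc(1, "sub", (0, "time/n"), (1, "fly/v")),
    Arc(2, "root", (1, "fly/v"), ROOT),
    Arc(3, "root", (0, "time/v"), ROOT),
    Arc(4, "obj", (1, "fly/n"), (0, "time/v")),
]
forest = DependencyForest(arcs, cooccurring_pairs=[(1, 2), (3, 4)])
for tree in forest.trees():
    print([arc.label for arc in tree])
```

Here the CM alone rules out mixed readings such as `sub` together with `obj`, which would combine time/n with fly/n. In this simplified picture the CM is what keeps the packed graph from licensing trees without a phrase structure counterpart.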
Features of the Dependency Forest
• Mapping is assured (phrase structure tree ⇔ dependency tree) → usable for the multilevel packed shared data connection model
• High flexibility in describing constraints, e.g. non-projective dependency structures*1
*1: a dependency structure violating at least one of the following projectivity conditions: “no cross dependency exists”, “no dependency covers the top node”
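The two projectivity conditions in the footnote can be checked mechanically. This is a minimal sketch, assuming arcs are given as (dependent position, head position) pairs with the root arc left out; the function names are hypothetical. The non-projective example uses word positions from the sentence “She saw the cat curiously which was Persian”, treating the relative clause as a single node at position 5.

```python
from itertools import combinations


def crosses(a, b):
    """True if the spans of two arcs overlap without one containing the other."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def is_projective(arcs, top):
    """Check 'no cross dependency exists' and 'no dependency covers the top node'."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    if any(left < top < right for left, right in spans):
        return False
    return not any(crosses(a, b) for a, b in combinations(spans, 2))


# (dependent, head) pairs; position 1 ("saw") is the top node.
non_projective = [(0, 1), (2, 3), (3, 1), (4, 1), (5, 3)]  # sub, det, obj, adv, rel
projective = [(0, 1), (2, 3), (3, 1), (5, 3)]              # same tree without the adverb

print(is_projective(non_projective, top=1))  # False: adv (4→1) crosses rel (5→3)
print(is_projective(projective, top=1))      # True
```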
Generation Flow of Phrase Structure Forest and Dependency Forest
PDG analysis process → PDG data structure
Input sentence
(1) Morphological Analysis (Dictionary) → WPP Trellis
(2) Extended CFG Chart Parsing → Parse Forest
(3) DF Extraction → Initial Dependency Forest
(4) DF Reduction → Dependency Forest
Optimum Solution Search → Dependency Tree
PDG Grammar Rule
Extended CFG rule with phrase head and mapping to dependency structure
General form: y/Xh → x1/X1,...,xn/Xn : [arc(arcname1,Xi,Xj),...,arc(arcnamen-1,Xk,Xl)]
• CFG rewriting rule part: y/Xh → x1/X1,...,xn/Xn
• Dependency structure part: [arc(arcname1,Xi,Xj),...,arc(arcnamen-1,Xk,Xl)]
• Xi: variable
• Xh (phrase head): “Xh” is one of “X1” .. “Xn”
• Dependency tree: nodes X1, ..., Xn; top node Xh
ex. vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
For “see a girl in the forest”, the constituents v/V (= see/v), np/NP (= girl/n) and pp/PP (= in/pre) form the phrase vp/V (= see/v) with the arcs obj (girl/n → see/v) and vpp (in/pre → see/v).
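Applying such a rule amounts to binding each variable to the head of the matching constituent and then instantiating the arc templates. The following sketch shows this for the `vp/V → v/V, np/NP, pp/PP` example; the `Rule` class, its field names and `apply_rule` are hypothetical names chosen for illustration, not part of PDG itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str        # phrase category, e.g. "vp"
    head_var: str   # variable naming the phrase head, e.g. "V"
    rhs: tuple      # ((category, variable), ...)
    arcs: tuple     # ((arc name, dependent variable, head variable), ...)


def apply_rule(rule, constituents):
    """Bind variables to constituent heads; return (phrase, arcs).

    `constituents` is a list of (category, head WPP) pairs in RHS order.
    """
    if len(constituents) != len(rule.rhs):
        raise ValueError("wrong number of constituents")
    binding = {}
    for (category, var), (found_category, head) in zip(rule.rhs, constituents):
        if category != found_category:
            raise ValueError(f"expected {category}, found {found_category}")
        binding[var] = head
    arcs = [(name, binding[dep], binding[head]) for name, dep, head in rule.arcs]
    return (rule.lhs, binding[rule.head_var]), arcs


VP_RULE = Rule(
    lhs="vp",
    head_var="V",
    rhs=(("v", "V"), ("np", "NP"), ("pp", "PP")),
    arcs=(("obj", "NP", "V"), ("vpp", "PP", "V")),
)

phrase, arcs = apply_rule(VP_RULE, [("v", "see/v"), ("np", "girl/n"), ("pp", "in/pre")])
print(phrase)  # ('vp', 'see/v')
print(arcs)    # [('obj', 'girl/n', 'see/v'), ('vpp', 'in/pre', 'see/v')]
```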
Standard Chart Parsing: Structure of Standard Edge
EDGE <0,2, s → np・vp pp>
• Start position: 0
• End position: 2
• Head category: s
• Found constituents: np
• Remaining constituents: vp pp
Edge types for the input “a cat chases …” (input positions 0 1 2 3)
• Active edge: <0,2, s → np ・ vp pp>
• Inactive edge: <0,2, np → det noun ・>
• Lexical edges: <0,1,det → [a]・>, <1,2,n → [cat]・>, <2,3,v → [chase]・>
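The edge structure maps naturally onto a small record type. The sketch below also includes the combination step of standard chart parsing (an active edge absorbs an adjacent inactive edge of the category it is waiting for); the slide does not show this step, so it is included here as textbook background, and the names `Edge` and `combine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int
    end: int
    head: str          # head category (left-hand side of the rule)
    found: tuple       # constituents already found (left of the dot)
    remaining: tuple   # constituents still needed (right of the dot)

    @property
    def is_active(self):
        return bool(self.remaining)


def combine(active, inactive):
    """Extend an active edge with an adjacent inactive edge, or return None."""
    if (
        active.is_active
        and not inactive.is_active
        and active.end == inactive.start
        and active.remaining[0] == inactive.head
    ):
        return Edge(
            active.start,
            inactive.end,
            active.head,
            active.found + (inactive.head,),
            active.remaining[1:],
        )
    return None


active = Edge(0, 2, "s", ("np",), ("vp", "pp"))   # <0,2, s → np ・ vp pp>
inactive_vp = Edge(2, 3, "vp", ("v",), ())        # <2,3, vp → v ・>
print(combine(active, inactive_vp))               # <0,3, s → np vp ・ pp>
```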
Structure of PDG Edge
Two extensions to the standard edge structure
(1) Mapping to dependency structure
PDG single edge = Standard edge + Phrase head + Dependency structure (tree)
<0,2, s/V → np/[cat-n-1]・vp/V pp/PP : [arc(obj,[cat-n-1],V), arc(vpp,PP,V)]>
<0,2, np/[cat-n-1] → det/[a-det-0] noun/[cat-n-1] ・ : arc(det,[a-det-0],[cat-n-1])>
<0,1,det → [a]・ : [a-det-0]>
<1,2,n → [cat]・ : [cat-n-1]>
<2,3,v → [chase]・ : [chase-v-2]>
Input: “a cat chases” (input positions 0 1 2 3)
(2) Packing of inactive edges
A PDG (packed) edge is a set of sharable PDG single edges
Generation of Phrase Structure Forest and Initial Dependency Forest
• Bottom-up chart parser using the Agenda
• Terminates when the Agenda becomes empty (Agenda = φ)
• Chart: active edges (e.g. <E12 s2→…・…>) and inactive edges (e.g. <Er root→[s]・>, <E2 s2→…・>, <E52 np2→…・>)
• Phrase Structure Forest: a set of inactive edges reachable from the root edge, e.g.
<Eroot root→[s1 s2][ds1 ds2]>
<E1 s1→[[np1 vp1]][ds11]>
<E1 s1→[[np1 vp1 pp1]][Arc1,Arc2]>
<E2 s2→…>
<E3 np1→[[det1 n1]]: [ds31]>
<E4 vp1→[[v1 np2] [v1 np3 pp1]]: [ds41 ds42]>
• Initial Dependency Graph: a set of arcs in the PS forest, e.g.
arc(root-17,[like]-v-2,[root]-x), arc(root-24,[flies]-v-1,[root]-x), arc(root-27,[time]-v-0,[root]-x), arc(sub-16,[flies]-n-1,[like]-v-2), arc(nc-4,[time]-n-0,[flies]-n-1), arc(obj-14,[arrow]-n-4,[like]-v-2), ...
• Initial Co-occurrence Matrix: CM setting conditions CM1 to CM3
CM1: between arcs in DS
CM2: between arcs in DS and arcs governed by constituents
CM3: between arcs governed by different constituents
Generation of Phrase Structure Forest and Initial Dependency Forest (example)
[Figure: chart for “Time flies like an arrow” showing the phrase structure forest (numbered root, s, vp, pp and np edges) over the WPP nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n], together with the initial dependency forest (initial dependency graph: a set of arcs in the PS forest; initial co-occurrence matrix: CM setting conditions CM1 to CM3)]
Reduction of the Initial Dependency Forest
• Two or more equivalent arcs are merged into one arc without increasing the number of generalized dependency trees in the dependency forest
• Equivalent arcs are generated, for example, from two grammar rules:
vp/V → v/V,np/NP : [arc(obj,NP,V)]
vp/V → v/V,np/NP,pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
[Figure: part of the dependency graph before and after reduction (nodes 0,time/n, 0,time/v, 1,fly/v, 1,fly/n; arcs sub23, sub24, obj4, nc2, vpp18, npp19), with obj25 marked as an equivalent arc]
Completeness and Soundness of the Dependency Forest
Completeness: All phrase structure trees in the parse forest have corresponding dependency trees in the dependency forest.
∀PT: phrase structure tree ∃DT: dependency tree such that dep_tree(PT) = DT
Soundness: Every phrase structure tree corresponding to a dependency tree in the dependency forest exists in the phrase structure forest.
∀DT: dependency tree ∃PT: phrase structure tree such that dep_tree(PT) = DT
• The correspondence between PTs and DTs is 1:N in general
• The completeness and soundness of the dependency forest are assured [Hirakawa 06]
[Figure: correspondence between phrase structure trees (PT) in the phrase structure forest and dependency trees (DT) in the dependency forest]
Evaluation of the Dependency Forest Framework • Analysis of prototypical ambiguous sentences • 1 to N / N to 1 correspondence between phrase structure tree/trees and dependency trees/tree • Generation of Non-projective dependency tree
Grammar for Ambiguous Sentences
Grammar rules for typical ambiguities (PP-attachment, Coordination, be-verb usage)
=========== s/Sentence ===========
(R1) s/VP→ np/NP,vp/VP : [arc(sub,NP,VP)] % Declarative sentence
(R2) s/VP→ vp/VP : [] % Imperative sentence
========= np/Noun phrase ========
(R3) np/N→ n/N : [] % Single noun
(R4) np/N2→ n/N1,n/N2 : [arc(nc,N1,N2)] % Compound noun
(R5) np/N→ det/DET,n/N : [arc(det,DET,N)] % Determiner
(R6) np/NP→ np/NP,pp/PP : [arc(npp,PP,NP)] % Prepositional phrase attachment
(R7) np/N→ ving/V,n/N : [arc(adjs,V,N)] % Adjectival usage (subject)
(R8) np/N→ ving/V,n/N : [arc(adjo,V,N)] % Adjectival usage (object)
(R9) np/V→ ving/V,np/NP : [arc(obj,NP,V)] % Gerund phrase
(R10) np/V→ ving/V,np/NP,pp/PP : [arc(obj,NP,V),arc(vpp,PP,V)] % Gerund phrase with PP
(R11) np/NP→ np/NP0,and/AND,np/NP : [arc(and,NP0,NP),arc(cnj,AND,NP0)] % Coordination (and)
(R12) np/NP→ np/NP0,or/OR,np/NP : [arc(or,NP0,NP),arc(cnj,OR,NP0)] % Coordination (or)
========= vp/Verb phrase ========
(R13) vp/V→ v/V : [] % Intransitive verb
(R14) vp/V→ v/V,np/NP : [arc(obj,NP,V)] % Transitive verb
(R15) vp/V→ be/BE,ving/V,np/NP : [arc(obj,NP,V),arc(prg,BE,V)] % Progressive
(R16) vp/BE→ be/BE,np/NP : [arc(dsc,NP,BE)] % Copular
(R17) vp/VP→ vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP→ adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
(R19) vp/V→ v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)] % Non-projective pattern
======== pp/Prepositional phrase ========
(R20) pp/P→ pre/P,np/NP : [arc(pre,NP,P)]
PP-attachment Ambiguity
Input sentence: I saw a girl with a telescope in the forest.
Five well-formed dependency trees
Nodes: 0,I : [i]-n-0; 1,saw : [saw]-v-1; 2,a : [a]-det-2; 3,girl : [girl]-n-3; 4,with : [with]-pre-4; 5,a : [a]-det-5; 6,telescope : [telescope]-n-6; 7,in : [in]-pre-7; 8,the : [the]-det-8; 9,forest : [forest]-n-9; root : [root]-x-root
Arcs (as labelled in the figure): root23,0; sub33,20; obj6,20; det4,0; det11,0; det42,0; pre12,10; pre24,10; vpp16,15; vpp27,5; npp14,10; npp26,5; npp29,5
Constraints indicated in the figure: crossing, single role
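The count of five readings can be reproduced with a small enumeration. This sketch rests on assumptions that are not spelled out in the slide: each preposition attaches to any preceding verb or noun head (vpp or npp arc), the remaining arcs are fixed, and the only constraint applied is that arcs must not cross. Positions follow the node list for the sentence, and all names in the code are hypothetical.

```python
from itertools import combinations, product

# Candidate heads per preposition position: "with" (4) -> saw (1) or girl (3);
# "in" (7) -> saw (1), girl (3) or telescope (6).
CANDIDATE_HEADS = {4: [1, 3], 7: [1, 3, 6]}

# Unambiguous (dependent, head) arcs: sub, det, obj, det, pre, det, pre.
FIXED = [(0, 1), (2, 3), (3, 1), (5, 6), (6, 4), (8, 9), (9, 7)]


def crosses(a, b):
    """True if the spans of two arcs overlap without one containing the other."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def readings():
    """Return the (head of 'with', head of 'in') pairs that give non-crossing trees."""
    result = []
    for with_head, in_head in product(CANDIDATE_HEADS[4], CANDIDATE_HEADS[7]):
        arcs = FIXED + [(4, with_head), (7, in_head)]
        if not any(crosses(a, b) for a, b in combinations(arcs, 2)):
            result.append((with_head, in_head))
    return result


print(readings())
```

Of the six combinations, only the one that attaches “with” to “saw” and “in” to “girl” is dropped, because those two arcs would cross. That leaves five trees, which matches the number stated on the slide.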
Coordination Scope Ambiguity
Input sentence: Earth and Moon or Jupiter and Ganymede.
Five well-formed dependency trees
Nodes: 0,earth : [earth]-n-0; 1,and : [and]-and-1; 2,moon : [moon]-n-2; 3,or : [or]-or-3; 4,jupiter : [jupiter]-n-4; 5,and : [and]-and-5; 6,ganymede : [ganymede]-n-6; root : [root]-x-root
Arcs (as labelled in the figure): root26,0; and12,10; and14,5; and18,12; and25,20; or9,4; or22,3; cnj2,0; cnj6,0; cnj14,0
Constraints indicated in the figure: crossing, single role
Structural Interpretation Ambiguity and PP-attachment Ambiguity
Input sentence: My hobby is watching birds with telescope
Ten well-formed dependency trees
Nodes: 0,my : [my]-det-0; 1,hobby : [hobby]-n-1; 2,is : [is]-be-2; 3,watching : [watching]-ving-3; 4,birds : [birds]-n-4; 5,with : [with]-pre-5; 6,telescope : [telescope]-n-6; root : [root]-x-root
Arcs (as labelled in the figure): root41,0; root44,0; sub5,5; sub35,1; sub38,10; dsc33,8; dsc36,10; obj6,15; prg2,10; adj4,12; det1,0; pre22,0; vpp24,7; npp23,5; npp27,3
N to 1 Correspondence from PSTs to One DT (1)
• Spurious ambiguity (Eisner96), (Noro05)
(R17) vp/VP→ vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP→ adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
Input: “She curiously saw a cat in the forest”
• Rule application R18 → R17 and rule application R17 → R18 produce two different phrase structure trees
• Both phrase structure trees correspond to one dependency tree, in which “curiously” (adv) and “in the forest” (vpp) both depend on “saw”
N to 1 Correspondence from PSTs to One DT (2)
• Modification scope problem (Mel'čuk 88)
A dependency structure has ambiguities in modification scope when a head word has dependents located on both its right-hand side and its left-hand side.
ex. Earth and Jupiter in Solar System.
・ Introduction of “Grouping” (coordination and operator words (e.g. not, only)) [Mel'čuk 88]
・ Japanese has no modification scope problem because it has no right-to-left dependency.
[Figure: two phrase structure trees for “Earth and Jupiter in Solar System” that correspond to one dependency tree over the nodes 0,Earth, 1,and, 2,Jupiter, 3,in, 4,Solar System with arcs root12,0; and4,20; cnj2,0; npp8,0; pre7,0]
Generation of Non-projective Dependency Tree
• Grammar rule for non-projective dependency tree
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)]
Input sentence: She saw the cat curiously which was Persian*1
Nodes: 0,She; 1,saw; 2,the; 3,cat; 4,curiously; 5,which was Persian; root
Arcs (as labelled in the figure): root14,0; sub12,20; obj6,20; det4,0; adv10,15; rel11,10
*1: Artificial example for showing the rule applicability
Conclusion
• The dependency forest is a packed shared data structure
- Bridge between phrase structure and dependency structure, usable for the Multilevel Packed Shared Data Connection Model of PDG
- High flexibility in describing constraints
Future work
• Extension of the framework for the modification scope problem (Grouping)
• Real-world system implementation