Applications of Computational Linguistics


Presentation Transcript


  1. Applications of computational linguistics • Machine translation • Dialogue systems (possibly spoken, possibly multimodal) • Text understanding systems • Information extraction • Information retrieval • Grammar checking • Computer-assisted language learning • etc. FST - Torbjörn Lager, UU

  2. Computational-linguistic 'component technologies' • Speech analysis and generation • Part-of-speech tagging • Morphological analysis and generation • Syntactic analysis (parsing) • Semantic interpretation • Reference resolution • Planning and plan recognition • Knowledge representation and inference • etc.

  3. Part-of-speech tagging: Example 1 • He can can a can • He/pron can/aux can/vb a/det can/n • He/{pron} can/{aux,n} can/{vb} a/{det} can/{n,vb}

  4. Part-of-speech tagging: Example 2 (tags to fill in) • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/PRP can/__ light/__ a/DT fire/NN and/CC you/PRP can/__ open/__ a/DT can/__ of/IN beans/NNS ./. Now/RB the/DT can/__ is/VBZ open/__ and/CC we/PRP can/__ eat/VB in/IN the/DT light/__ of/IN the/DT fire/NN ./.

  5. Part-of-speech tagging: Example 2 (tags filled in) • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./.

  6. Different kinds of relevant information • Lexical information • Contextual information

  7. Why do part-of-speech tagging? • Corpus-linguistic research • A preliminary step to word sense disambiguation • A preliminary step to parsing • ?

  8. Part-of-speech tagging [Diagram labels: Text, Processor, Knowledge, POS-tagged text] • Needed: a strategy for representing the knowledge • a method for acquiring the knowledge • a method for applying the knowledge
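
The three needs on this slide (representing, acquiring and applying knowledge) can be illustrated with a deliberately naive tagger. The sketch below is not a method from the slides: it assumes the knowledge is represented as per-word tag counts, acquired from the tagged Example 2 text, and applied by picking the most frequent tag. The function names and the `NN` fallback for unknown words are hypothetical.

```python
from collections import Counter, defaultdict

# Tagged training text: the filled-in tagging of Example 2 from the slides.
TAGGED = (
    "I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB "
    "a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ "
    "and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./."
)

def split_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def acquire_lexicon(pairs):
    """Acquire knowledge: count how often each word carries each tag."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return counts

def most_frequent_tag(counts, word, default="NN"):
    """Apply knowledge: pick the tag seen most often with this word."""
    if word not in counts:
        return default  # hypothetical fallback for unknown words
    return counts[word].most_common(1)[0][0]

counts = acquire_lexicon(split_tagged(TAGGED))
print(dict(counts["can"]))
print(most_frequent_tag(counts, "can"))
```

This uses lexical information only, so every occurrence of "can" gets the same tag. Contextual information is what the tagging methods on the later slides add.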

  9. Some POS-tagging issues [Diagram labels: Text, Processor, Knowledge, POS-tagged text] • Accuracy • Speed • Space requirements • Robustness • Learning

  10. Common classifications • Tagging methods: rule-based; statistical • Learning methods: supervised learning; unsupervised learning

  11. Formal tools • Formal logic • Probability theory and statistics • Automata theory and mathematical linguistics • Algorithm and complexity theory

  12. Tagging methods • HMM tagging: statistical (probabilistic); supervised learning • Brill tagging: rule-based; supervised learning • Constraint Grammar tagging: rule-based; no learning

  13. Hidden Markov Modelling • Statistical • The bold approach: "Weigh the available lexical and contextual information together, then guess!"
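
A common way to make this weighing concrete in an HMM tagger is Viterbi decoding, which multiplies emission probabilities (lexical information) by tag-transition probabilities (contextual information) and keeps the best path. The slides do not give an algorithm or any numbers, so the sketch below is an assumption throughout: the tag set is cut down to five tags and every probability is invented for illustration.

```python
TAGS = ["PRP", "MD", "VB", "DT", "NN"]

# All probabilities below are made up for illustration, not estimated from data.
START = {"PRP": 0.5, "DT": 0.5}
TRANS = {  # TRANS[previous_tag][tag]: contextual information
    "PRP": {"MD": 0.7, "VB": 0.3},
    "MD": {"VB": 1.0},
    "VB": {"DT": 0.5, "NN": 0.5},
    "DT": {"NN": 0.9, "MD": 0.05, "VB": 0.05},
    "NN": {"MD": 0.5, "VB": 0.5},
}
EMIT = {  # EMIT[tag][word]: lexical information
    "PRP": {"we": 1.0},
    "MD": {"can": 1.0},
    "VB": {"eat": 0.6, "can": 0.1, "light": 0.3},
    "DT": {"the": 1.0},
    "NN": {"can": 0.1, "fire": 0.9},
}

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    # Each column maps a tag to (probability of the best path ending here, previous tag).
    columns = [{t: (START.get(t, 0.0) * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            column[t] = max(
                (columns[-1][p][0] * TRANS[p].get(t, 0.0) * EMIT[t].get(word, 0.0), p)
                for p in TAGS
            )
        columns.append(column)
    # Start from the best final tag and follow the back-pointers.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["we", "can", "eat"]))
print(viterbi(["the", "can"]))
```

In this toy model "can" is emitted most readily by `MD`, yet after "the" the transition probabilities outweigh that preference and the decoder chooses `NN`.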

  15. Brill tagging • Strategy: "Guess first, then change your mind if necessary" • A simple 'heuristic' lexicon • A sequence of transformation rules conditioned on local context • Example rule: tag:vb>nn <- tag:dt@[-1] (change vb to nn if the preceding tag is dt)

  16. Brill tagging, step 1 • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/PRP can/MD light/JJ a/DT fire/NN and/CC you/PRP can/MD open/JJ a/DT can/MD of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ and/CC we/PRP can/MD eat/VB in/IN the/DT light/JJ of/IN the/DT fire/NN ./.

  17. Transformation-based tagging • Representational strategy: simple lexica; ordered lists of transformations, conditioned on small amounts of local context • Learning strategy: transformation-based learning

  18. Transformation-based tagging • Three steps: lexical look-up; lexical rule application for unknown words; contextual rule application
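
The three steps can be sketched as a small pipeline. This is an illustration under assumptions, not the Brill tagger itself: the lexicon is the one listed in the slides for Brill tagging, while `guess_unknown` and the three contextual rules are hypothetical and were chosen only because they suit the example sentences.

```python
# Step 1 knowledge: the lexicon given in the slides for Brill tagging.
LEXICON = {
    "I": "PRP", "Now": "RB", "a": "DT", "and": "CC", "beans": "NNS",
    "can": "MD", "eat": "VB", "fire": "NN", "in": "IN", "is": "VBZ",
    "light": "JJ", "of": "IN", "open": "JJ", "the": "DT", "we": "PRP",
    "you": "PRP", ".": ".",
}

def guess_unknown(word):
    """Step 2 (hypothetical lexical rules): guess a tag from the word form."""
    if word[0].isupper():
        return "NNP"
    if word.endswith("s"):
        return "NNS"
    return "NN"

# Step 3 (hypothetical rules): (old tag, new tag, required previous tag),
# applied in this order. The first reads: change JJ to VB after MD.
CONTEXT_RULES = [("JJ", "VB", "MD"), ("MD", "NN", "DT"), ("JJ", "NN", "DT")]

def tag(words):
    """Tag `words` by look-up, unknown-word guessing, then contextual rules."""
    tags = [LEXICON.get(w) or guess_unknown(w) for w in words]
    for old, new, prev in CONTEXT_RULES:
        for i in range(1, len(tags)):
            if tags[i] == old and tags[i - 1] == prev:
                tags[i] = new
    return list(zip(words, tags))

print(tag("we can open the can of peas .".split()))
```

Here "peas" is missing from the lexicon, so it goes through step 2, and both "open" and the second "can" are first guessed wrongly and then corrected by contextual rules.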

  19. Transformation-based tagging [Figure after K. Samuel 1998; colour labels in the figure: blue, yellow, blue, brown, red, red, blue, blue, brown, green]
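
Transformation-based learning can be sketched as a greedy loop: tag the text with the lexicon, compare with a reference tagging, and repeatedly adopt the rule that removes the most errors. The code below is a minimal sketch under assumptions. It uses a single hypothetical rule template ("change tag A to B when the preceding tag is C"), the Example 2 tagging from the slides as reference, and the Brill-tagging lexicon from the slides as the initial guess; real systems use many more templates and much more data.

```python
REFERENCE = (
    "I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB "
    "a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ "
    "and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./."
)
LEXICON = {
    "I": "PRP", "Now": "RB", "a": "DT", "and": "CC", "beans": "NNS",
    "can": "MD", "eat": "VB", "fire": "NN", "in": "IN", "is": "VBZ",
    "light": "JJ", "of": "IN", "open": "JJ", "the": "DT", "we": "PRP",
    "you": "PRP", ".": ".",
}

def apply_rule(tags, rule):
    """Return a copy of `tags` with `old` changed to `new` after tag `prev`."""
    old, new, prev = rule
    return [new if i > 0 and t == old and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]

def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn(tags, gold):
    """Greedily collect rules, best error reduction first, until none helps."""
    rules = []
    while True:
        # One candidate rule per error site (single hypothetical template).
        candidates = sorted({(tags[i], gold[i], tags[i - 1])
                             for i in range(1, len(tags)) if tags[i] != gold[i]})
        if not candidates:
            break
        base = count_errors(tags, gold)
        gain, rule = max(((base - count_errors(apply_rule(tags, r), gold), r)
                          for r in candidates), key=lambda scored: scored[0])
        if gain <= 0:
            break
        rules.append(rule)
        tags = apply_rule(tags, rule)
    return rules, tags

pairs = [tuple(token.rsplit("/", 1)) for token in REFERENCE.split()]
words = [w for w, _ in pairs]
gold = [t for _, t in pairs]
initial = [LEXICON[w] for w in words]
rules, final = learn(initial, gold)
print(rules)
```

The learned list is ordered, which matters when rules are later applied to new text: each rule sees the output of the rules before it.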

  20. Lexicon for Brill tagging • I PRP • Now RB • a DT • and CC • beans NNS • can MD • eat VB • fire NN • in IN • is VBZ • light JJ • of IN • open JJ • the DT • we PRP • you PRP • . .

  21. Constraint Grammar tagging • Rule-based • The cautious approach: "Don't guess! Just eliminate the impossible!"

  22. Part-of-speech tagging: Example 2 (all possible tags per word) • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/{PRP} can/{MD,NN} light/{JJ,NN,VB} a/{DT} fire/{NN} and/{CC} you/{PRP} can/{MD,NN} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ,VB} and/{CC} we/{PRP} can/{MD,NN} eat/{VB} in/{IN} the/{DT} light/{JJ,NN,VB} of/{IN} the/{DT} fire/{NN} ./{.}
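
The cautious approach can be sketched as elimination over these tag sets: a constraint removes a reading only when its context is unambiguous, and the last remaining reading is never removed. The three constraints below are hypothetical and far simpler than real Constraint Grammar rules. They are chosen only to show that some words end up fully disambiguated while others stay ambiguous.

```python
# Possible tags per word, as listed on the slide.
READINGS = {
    "I": {"PRP"}, "can": {"MD", "NN"}, "light": {"JJ", "NN", "VB"},
    "a": {"DT"}, "fire": {"NN"}, "and": {"CC"}, "you": {"PRP"},
    "open": {"JJ", "VB"}, "of": {"IN"}, "beans": {"NNS"}, ".": {"."},
    "Now": {"RB"}, "the": {"DT"}, "is": {"VBZ"}, "we": {"PRP"},
    "eat": {"VB"}, "in": {"IN"},
}

# Hypothetical constraints: (tag to remove, tag the previous word must
# carry unambiguously). E.g. a modal cannot directly follow a determiner.
CONSTRAINTS = [("MD", "DT"), ("NN", "PRP"), ("VB", "DT")]

def disambiguate(words):
    """Remove impossible readings until no constraint applies any more."""
    cohorts = [set(READINGS[w]) for w in words]
    changed = True
    while changed:
        changed = False
        for i in range(1, len(cohorts)):
            for remove, prev in CONSTRAINTS:
                if (cohorts[i - 1] == {prev} and remove in cohorts[i]
                        and len(cohorts[i]) > 1):
                    cohorts[i].discard(remove)
                    changed = True
    return list(zip(words, cohorts))

sentence = "Now the can is open and we can eat in the light of the fire .".split()
for word, tags in disambiguate(sentence):
    print(word, sorted(tags))
```

With only these constraints, "open" and "light" remain ambiguous. A tagger of this kind leaves them that way until a constraint is added that can rule a reading out safely.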

  23. Problems • Ambiguity • Unknown words • Rare words • Rare contexts

  24. Assessing the Brill tagger • Accuracy: 96.5% • Speed: very fast • Space requirements: moderate • Robustness: robust • Learning: yes
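
Tagging accuracy is normally reported as the share of tokens whose tag matches a reference tagging. As a small worked example (unrelated to the 96.5% figure, whose test data the slides do not describe), the sketch below scores the step-1 Brill tagging of Example 2 against the fully tagged version, both copied from the slides. The function names are hypothetical.

```python
STEP_1 = (
    "I/PRP can/MD light/JJ a/DT fire/NN and/CC you/PRP can/MD open/JJ "
    "a/DT can/MD of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ "
    "and/CC we/PRP can/MD eat/VB in/IN the/DT light/JJ of/IN the/DT fire/NN ./."
)
REFERENCE = (
    "I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB "
    "a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ "
    "and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./."
)

def split_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def accuracy(predicted, reference):
    """Fraction of tokens whose predicted tag equals the reference tag."""
    if len(predicted) != len(reference):
        raise ValueError("taggings must cover the same tokens")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

predicted = split_tagged(STEP_1)
reference = split_tagged(REFERENCE)
errors = [(word, tag, ref_tag)
          for (word, tag), (_, ref_tag) in zip(predicted, reference)
          if tag != ref_tag]
print(f"{accuracy(predicted, reference):.1%}")
print(errors)
```

Listing the mismatches next to the score shows which words the later contextual rules still have to repair.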

  25. Part-of-speech tagging: some approaches • The bold approach: "Weigh the available information together, then guess!" • The cautious approach: "Don't guess! Just eliminate the impossible!" • The vacillating approach: "Guess first, then change your mind if necessary"

  26. Parsing • 'Classical' parsing with phrase structure grammar • Shallow parsing

  27. A simple phrase structure grammar • Fragment: lisa springer ('Lisa runs'); lisa skjuter en älg ('Lisa shoots a moose') • Grammar:
        s --> np, vp.
        np --> pn.
        np --> det, n.
        vp --> v.
        vp --> v, np.
        pn --> [kalle].
        pn --> [lisa].
        det --> [en].
        n --> [älg].
        v --> [springer].
        v --> [skjuter].

  28. Recognition and parsing • Recognition:
        ?- s([lisa,springer],[]).
        yes
        ?- s([springer,lisa],[]).
        no
      • Parsing:
        ?- s(Tree,[lisa,springer],[]).
        Tree = s(np(pn(lisa)),vp(v(springer)))
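
For readers without a Prolog system at hand, recognition with the simple phrase structure grammar can be imitated in Python. This is a sketch under assumptions: the grammar is stored as a dictionary, and a top-down, left-to-right search with backtracking (a generator) stands in for Prolog's execution of the DCG. The names `GRAMMAR`, `derive` and `recognize` are hypothetical.

```python
# The simple phrase structure grammar from the slides; keys are nonterminals,
# every other symbol is treated as a word.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["pn"], ["det", "n"]],
    "vp": [["v"], ["v", "np"]],
    "pn": [["kalle"], ["lisa"]],
    "det": [["en"]],
    "n": [["älg"]],
    "v": [["springer"], ["skjuter"]],
}

def derive(symbols, words):
    """Yield what is left of `words` after `symbols` has consumed a prefix."""
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each rule body in turn (backtracking).
        for body in GRAMMAR[first]:
            yield from derive(body + rest, words)
    elif words and words[0] == first:
        # Terminal: it must match the next input word.
        yield from derive(rest, words[1:])

def recognize(words):
    """True if the whole word list can be derived from s."""
    return any(remainder == [] for remainder in derive(["s"], words))

print(recognize(["lisa", "springer"]))
print(recognize(["springer", "lisa"]))
```

Like the DCG, this approach would loop on left-recursive rules. The grammar here has none.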

  29. Parsing • Phrase structure

  30. Building trees in an argument • Grammar:
        s(s(NP,VP)) --> np(NP), vp(VP).
        np(np(PN)) --> pn(PN).
        np(np(DET,N)) --> det(DET), n(N).
        vp(vp(V)) --> v(V).
        vp(vp(V,NP)) --> v(V), np(NP).
        pn(pn(lisa)) --> [lisa].
        det(det(en)) --> [en].
        n(n(älg)) --> [älg].
        v(v(går)) --> [går].
        v(v(skjuter)) --> [skjuter].

  31. Building trees in an argument • Parsing:
        ?- s(Tree,[lisa,skjuter,en,älg],[]).
        Tree = s( np( pn(lisa)), vp( v(skjuter), np( det(en), n(älg))))

  32. Parsing with a meta-interpreter
        s --> np, vp.
        np --> pn.
        np --> det, n.
        vp --> v, np.
        det --> [en].
        n --> [älg].
        v --> [skjuter].
        pn --> [lisa].
        ?- parse(s,[lisa,skjuter,en,älg],[],Tree).
        Tree = s/(np/pn/lisa,vp/(v/skjuter,np/(det/en,n/älg)))

  33. Parsing with a meta-interpreter
        parse(A,P0,P,A/Trees) :-
            (A --> B),
            parse(B,P0,P,Trees).
        parse((B,Bs),P0,P,(Tree,Trees)) :-
            parse(B,P0,P1,Tree),
            parse(Bs,P1,P,Trees).
        parse([Word],[Word|P],P,Word).
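
The same idea, a generic procedure that interprets grammar rules stored as data and builds the tree as it goes, can be sketched in Python. This is an analogue under assumptions, not a translation of the Prolog: rules are `(head, body)` pairs, trees are nested `(label, children)` tuples instead of `/` terms, and generators stand in for backtracking. All names are hypothetical.

```python
# Grammar rules as data, mirroring the rules used with the meta-interpreter.
RULES = [
    ("s", ["np", "vp"]), ("np", ["pn"]), ("np", ["det", "n"]),
    ("vp", ["v", "np"]), ("det", ["en"]), ("n", ["älg"]),
    ("v", ["skjuter"]), ("pn", ["lisa"]),
]
NONTERMINALS = {head for head, _ in RULES}

def parse(symbol, words):
    """Yield (tree, remaining words) for each way `symbol` can start `words`."""
    if symbol not in NONTERMINALS:
        # A word: succeeds only if it is the next input word.
        if words and words[0] == symbol:
            yield symbol, words[1:]
        return
    for head, body in RULES:
        if head == symbol:
            for subtrees, rest in parse_sequence(body, words):
                yield (symbol, subtrees), rest

def parse_sequence(symbols, words):
    """Parse the symbols of a rule body one after another."""
    if not symbols:
        yield [], words
        return
    for tree, rest in parse(symbols[0], words):
        for trees, remainder in parse_sequence(symbols[1:], rest):
            yield [tree] + trees, remainder

def parse_sentence(words):
    """All trees for s that cover the whole input."""
    return [tree for tree, rest in parse("s", words) if not rest]

print(parse_sentence(["lisa", "skjuter", "en", "älg"]))
```

Because the grammar is plain data, the rules can be changed or extended without touching the parsing procedure, which is the main attraction of the meta-interpreter design.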

  34. Structural ambiguity • Den gamla damen träffade killen med handväskan ('The old lady met (or hit) the guy with the handbag') • John saw a man in the park with a telescope • Råttan åt upp osten och hunden och katten jagade råttan ('The rat ate the cheese and the dog and the cat chased the rat')

  35. Local ambiguity • The old man the boats • The horse raced past the barn fell

  36. Some parsing issues [Diagram labels: Text, Processor, Knowledge, Parsed text] • Accuracy • Speed • Space requirements • Robustness • Learning

  37. Problems with traditional parsers • Correct low-level parses are often rejected because they do not fit into a global parse -> brittleness • Ambiguity -> indeterminism -> search -> slow parsers • Ambiguity -> sometimes hundreds of thousands of parse trees, and what can we do with these?
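
How quickly the number of parse trees grows can be checked by counting them with a chart. The sketch below is an assumption-laden illustration: the small binary grammar and lexicon are hypothetical, written only to cover the "telescope" sentence, and the counting follows the CKY scheme (multiply the counts of the two halves, sum over split points and rules).

```python
from collections import defaultdict

# Hypothetical lexicon and binary grammar, just enough for the example.
LEXICON = {
    "John": "NP", "saw": "V", "a": "Det", "the": "Det", "man": "N",
    "park": "N", "telescope": "N", "in": "P", "with": "P",
}
BINARY_RULES = [
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "NP", "PP"), ("NP", "Det", "N"), ("PP", "P", "NP"),
]

def count_parses(words, goal="S"):
    """Count the distinct parse trees of `words` rooted in `goal`."""
    n = len(words)
    # chart[(i, j, A)] = number of trees with root A covering words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        chart[(i, i + 1, LEXICON[word])] = 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, goal)]

for sentence in ("John saw a man",
                 "John saw a man in the park",
                 "John saw a man in the park with a telescope"):
    print(count_parses(sentence.split()), sentence)
```

Each added prepositional phrase multiplies the attachment options, which is how longer sentences under broad-coverage grammars reach the very large tree counts mentioned on the slide. Counting needs only the chart, while enumerating every tree does not scale.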

  38. Another strategy (Abney) • Start with the simplest constructions ('easy-first parsing') and be as careful as possible when parsing them -> 'islands of certainty' • 'Islands of certainty' -> do not reject these parses even if they do not fit into a global parse -> robustness • When you are almost sure of how to resolve an ambiguity, do it! -> determinism • When you are uncertain of how to resolve an ambiguity, don't even try! -> 'containment of ambiguity' -> determinism • Determinism -> no search -> speed

  39. Shallow syntax • Analyses that are less complete than conventional parser output • Identifies some phrasal constituents (e.g. NPs), without indicating their internal structure or their function in the sentence • Or identifies the functional role of some of the words, such as the main verb and its direct arguments

  40. Deterministic bottom-up parsing • Adapted from Karttunen 1996:
        define NP [(d) a* n+] ;
        regex NP @-> "[NP" ... "]" .o. v "[NP" NP "]" @-> "[VP" ... "]" ;
        apply down dannvaan
        [NP dann][VP v [NP aan]]
      • Note the use of the longest-match operator!
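
The two-stage cascade can be approximated with Python's `re` module. This is a sketch under assumptions, not a port of the finite-state calculus: each letter stands for one tag (d, a, n, v, as in the example), the function name `chunk` is hypothetical, and Python's greedy left-to-right matching is used in place of the longest-match replace operator. For the pattern `d?a*n+` the two coincide, but that does not hold for regular expressions in general.

```python
import re

# Stage 1: an NP is an optional d, any number of a, and at least one n.
NP_PATTERN = re.compile(r"d?a*n+")
# Stage 2: a v followed by an already bracketed NP becomes a VP.
VP_PATTERN = re.compile(r"v(\[NP [^\]]*\])")

def chunk(tags):
    """Bracket NPs first, then wrap each 'v + NP' sequence in a VP."""
    with_nps = NP_PATTERN.sub(lambda m: "[NP " + m.group(0) + "]", tags)
    return VP_PATTERN.sub(r"[VP v \1]", with_nps)

print(chunk("dannvaan"))
```

Each stage is a single deterministic pass over the output of the previous one, so no search is involved. A tag sequence that fits neither pattern is simply left unbracketed rather than causing the analysis to fail.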
