
CSA3050: Natural Language Algorithms




  1. CSA3050: Natural Language Algorithms Finite State Devices

  2. Sources • Blackburn & Striegnitz Ch. 2 CSA3050 NLP Algorithms

  3. Parsers vs. Recognisers • Recognisers tell us whether a given input is accepted by some finite state automaton. • Often we would like an explanation of why it was accepted. • Parsers give us that kind of explanation. • What form does it take?

  4. Finite State Parser • The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4]. • The technique in Prolog for turning a recogniser into a parser is to add one or more extra arguments to keep track of the structure that was found.

  5. Base Case
     Recogniser:
         recognize1(Node,[]) :-
             final(Node).
     Parser:
         parse1(Node,[],[Node]) :-
             final(Node).

  6. Recursive Case
     Recogniser:
         recognize1(Node1, String) :-
             arc(Node1,Node2,Label),
             traverse1(Label, String, NewString),
             recognize1(Node2, NewString).
     Parser:
         parse1(Node1, String, [Node1,Label|Path]) :-
             arc(Node1,Node2,Label),
             traverse1(Label, String, NewString),
             parse1(Node2, NewString, Path).
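
The two parse1 clauses from slides 5 and 6 can be tried on the laughing automaton mentioned on slide 4. What follows is a minimal sketch, not the course's own file: the arc/3 facts are an assumption, reconstructed from the expected output [1,h,2,a,3,h,2,a,3,!,4], and traverse1/3 is never shown on the slides, so the one-line definition used here is also an assumption.

```prolog
% Laughing automaton (assumed: reconstructed from the parse on slide 4).
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,4,'!').
arc(3,2,h).

% Assumed helper: consume one input symbol if it equals the arc label.
traverse1(Label, [Label|Symbols], Symbols).

% Parser from slides 5 and 6: the third argument collects nodes and labels.
parse1(Node, [], [Node]) :-
    final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
```

Under these assumptions the query ?- parse1(1, [h,a,h,a,'!'], P). should bind P to [1,h,2,a,3,h,2,a,3,!,4]: each recursive step prepends the current node and the label of the arc it took, and the base case contributes the final node.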

  7. Complex Labels • So far we have only considered transitions with single-character labels. • More complex labels are possible, e.g. symbols comprising several characters. • We can construct an FSA recognizing English noun phrases that can be built from the words: the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

  8. FSA for Noun Phrases [state diagram; not preserved in this transcript]

  9. FSA for NPs in Prolog
         initial(1).
         final(3).
         arc(1,2,a).
         arc(1,2,the).
         arc(2,2,brave).
         arc(2,2,fast).
         arc(2,3,witch).
         arc(2,3,wizard).
         arc(2,3,broomstick).
         arc(2,3,rat).
         arc(1,3,harry).
         arc(1,3,ron).
         arc(1,3,hermione).
         arc(3,1,with).

  10. Parsing a Noun Phrase
         testparse1(Symbols,Parse) :-
             initial(Node),
             parse1(Node,Symbols,Parse).
         ?- testparse1([the,fast,wizard],Z).
         Z = [1, the, 2, fast, 2, wizard, 3]
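
To see the noun-phrase parser run end to end, the facts from slide 9 and the clauses from slides 5, 6 and 10 can be put in one file. This is a sketch under one assumption: traverse1/3 does not appear on the slides, so the definition below (match the label against the next word) is supplied for illustration.

```prolog
% FSA for noun phrases (slide 9).
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

% Assumed helper: the arc label must equal the next word of the input.
traverse1(Label, [Label|Symbols], Symbols).

% Parser (slides 5 and 6).
parse1(Node, [], [Node]) :-
    final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% Driver (slide 10): start parsing from the initial node.
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
```

Besides the query on slide 10, the with arc from node 3 back to node 1 lets the same automaton parse longer phrases: ?- testparse1([harry,with,the,fast,broomstick],Z). should give Z = [1,harry,3,with,1,the,2,fast,2,broomstick,3].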

  11. Rewriting Categories • It is also possible to obtain a more abstract parse, e.g.
         ?- testparse2([the,fast,wizard],Z).
         Z = [1, det, 2, adj, 2, cn, 3]
      • What changes are required to obtain this behaviour?

  12. 1. Changes to the FSA
      % FSA
         initial(1).
         final(3).
         arc(1,2,det).
         arc(2,2,adj).
         arc(2,3,cn).
         arc(1,3,pn).
         arc(3,1,prep).
      % Lexicon
         lex(a,det).
         lex(the,det).
         lex(fast,adj).
         lex(brave,adj).
         lex(witch,cn).
         lex(wizard,cn).
         lex(broomstick,cn).
         lex(rat,cn).
         lex(harry,pn).
         lex(hermione,pn).
         lex(ron,pn).
         lex(with,prep).

  13. Changes to the Parser
      Parse1:
         parse1(Node1, String, [Node1,Label|Path]) :-
             arc(Node1,Node2,Label),
             traverse1(Label, String, NewString),
             parse1(Node2, NewString, Path).
      Parse2:
         parse2(Node1, String, [Node1,Label|Path]) :-
             arc(Node1,Node2,Label),
             traverse2(Label, String, NewString),
             parse2(Node2, NewString, Path).
         traverse2(Label,[Symbol|Symbols],Symbols) :-
             lex(Symbol,Label).
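
Putting the category-labelled FSA, the lexicon and parse2 together gives a small runnable program. This is a sketch: the slides show only the recursive clause of parse2, so the base case and the driver testparse2/2 below are assumptions modelled on parse1 and testparse1.

```prolog
% Category-labelled FSA (slide 12).
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon (slide 12): word, category.
lex(a,det).        lex(the,det).
lex(fast,adj).     lex(brave,adj).
lex(witch,cn).     lex(wizard,cn).
lex(broomstick,cn). lex(rat,cn).
lex(harry,pn).     lex(hermione,pn).  lex(ron,pn).
lex(with,prep).

% An arc labelled with a category is traversed by any word of that category.
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).

% Base case (assumed, by analogy with parse1).
parse2(Node, [], [Node]) :-
    final(Node).
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

% Driver (assumed, by analogy with testparse1).
testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).
```

With these definitions, ?- testparse2([the,fast,wizard],Z). should yield Z = [1,det,2,adj,2,cn,3]. The only real change from parse1 is in the traverse step, which now looks the word up in the lexicon instead of comparing it with the label directly, so new words can be added without touching the arcs.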

  14. Handling Jumps
         traverse3('#',String,String).
         traverse3(Cat,[Word|Words],Words) :-
             lex(Word,Cat).
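
A jump arc labelled '#' moves between nodes without consuming input. The sketch below uses it to make the determiner optional. The names parse3/3 and testparse3/2 and the extra arc(1,2,'#') are assumptions introduced for illustration; only traverse3/3 comes from the slide, and the automaton is cut down to a few arcs to keep the example short.

```prolog
% Reduced noun-phrase FSA with one jump arc (the '#' arc is an assumed addition).
initial(1).
final(3).
arc(1,2,det).
arc(1,2,'#').      % jump: skip the determiner
arc(2,2,adj).
arc(2,3,cn).

lex(the,det).
lex(fast,adj).
lex(wizard,cn).

% traverse3 from slide 14: a jump leaves the input untouched.
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).

% Hypothetical parser and driver, copied from parse2 with traverse3 swapped in.
parse3(Node, [], [Node]) :-
    final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).
```

Here ?- testparse3([fast,wizard],Z). should give Z = [1,#,2,adj,2,cn,3], with the jump recorded in the parse like any other label. One caution about this design: because a jump consumes nothing, a cycle made only of jump arcs (for instance arc(2,2,'#')) would send this depth-first parser into an infinite loop.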

  15. Finite State Transducers • A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. • The most common way to think about transducers is as a kind of "translating machine" that works by reading from one tape and writing onto the other.

  16. A Translator from a to b • Initial state: arrowhead. • Final state: double circle. • a:b: read a from the first tape and write b to the second tape.

  17. Prolog Representation
         :- op(250,xfx,:).
         initial(1).
         final(1).
         arc(1,1,a:b).

  18. Modes of Operation • Generation mode: it writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length. • Recognition mode: it accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's. • Translation mode (left to right): it reads a's from the first tape and writes a b onto the second tape for every a that it reads. • Translation mode (right to left): it reads b's from the second tape and writes an a onto the first tape for every b that it reads.

  19. Transducers and Jumps • Transducers can make jumps, going from one state to another without doing anything on either one or both of the tapes. • So transitions of the form a:# or #:a or #:# are possible.

  20. Simple Transducer in Prolog
         transduce1(Node,[],[]) :-
             final(Node).
         transduce1(Node1,Tape1,Tape2) :-
             arc(Node1,Node2,Label),
             traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
             transduce1(Node2,NewTape1,NewTape2).

  21. Traverse for FST
         traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
         testtrans1(Tape1,Tape2) :-
             initial(Node),
             transduce1(Node,Tape1,Tape2).
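
The four modes of operation listed on slide 18 all fall out of one definition, depending on which arguments are bound when testtrans1 is called. The following sketch simply collects the code from slides 17, 20 and 21 into one file; nothing is added except comments. It assumes a Prolog system that allows the op/3 directive from slide 17 (many systems already treat : as an infix operator).

```prolog
% Transducer from a to b (slide 17).
:- op(250, xfx, :).
initial(1).
final(1).
arc(1,1,a:b).

% Read L1 from the first tape and L2 from the second (slide 21).
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

% Both tapes must be exhausted in a final state (slide 20).
transduce1(Node, [], []) :-
    final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
```

Translation left to right is ?- testtrans1([a,a,a],T)., which should give T = [b,b,b]; right to left is ?- testtrans1(T,[b,b])., giving T = [a,a]. Recognition is a call with both tapes bound, such as ?- testtrans1([a,a],[b,b])., and generation leaves both unbound, for example ?- length(X,2), testtrans1(X,Y). to ask for the pair of length two.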

  22. Handling Jumps: 4 Cases • Jump on both tapes. • Jump on the first but not on the second tape. • Jump on the second but not on the first tape. • Jump on neither tape (this is what traverse1 does).

  23. 4 Corresponding Clauses
         traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).
         traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).
         traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).
         traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
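
To show the four traverse2 clauses at work, the sketch below pairs them with a transducer that rewrites a as b and deletes c by jumping on the second tape. The predicate transduce2/3 and the arc c:'#' are assumptions made for this example (transduce2 is transduce1 with traverse2 swapped in); only the traverse2 clauses come from the slide.

```prolog
:- op(250, xfx, :).
initial(1).
final(1).
arc(1,1,a:b).
arc(1,1,c:'#').    % assumed arc: read c, write nothing

% The four cases from slide 23.
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

% Hypothetical transducer: same shape as transduce1, using traverse2.
transduce2(Node, [], []) :-
    final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).
```

The query ?- once(transduce2(1,[a,c,a],T)). should give T = [b,b]. The once/1 wrapper matters: the last clause of traverse2 also unifies with a label such as c:'#', so on backtracking the same query would offer a second answer with a literal # written to the second tape. Taking only the first solution is one way to sidestep that; guarding the last clause against '#' would be another.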

  24. Morphological Analysis with FSTs • Morphology is concerned with the internal structure of words. • How can a word be decomposed into morphemes? • How do the morphemes combine? • What are legitimate combinations? • The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa. • Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.

  25. Plural Nouns in English • Regular forms • add an s as in wizard+s. • add -es as in witch+es. • Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative. • Irregular forms • mouse/mice • automaton/automata • Handled on a case-by-case basis. • We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.

  26. FST for English Plurals [transducer diagram; not preserved in this transcript]

  27. FST in Prolog
         lex(wizard:wizard,'STEM-REG1').
         lex(witch:witch,'STEM-REG2').
         lex(automaton:automaton,'IRREG-SG').
         lex(automata:'automaton-PL','IRREG-PL').
         lex(mouse:mouse,'IRREG-SG').
         lex(mice:'mouse-PL','IRREG-PL').
