CSA305: Natural Language Algorithms
Deterministic and Non-Deterministic Recognition
CSA3050 NLP Algorithms
Acknowledgement
• Material presented adapted from Jurafsky and Martin, Ch. 2.
Representation of Automata using Transition Tables
Transition Table Representation in Prolog
% State  b     a     !      (none marks an empty cell, i.e. no transition)
s(0, 1,    none, none).
s(1, none, 2,    none).
s(2, none, 3,    none).
s(3, none, 3,    4).
s(4, none, none, none).
next(OldState, b, NewState)   :- s(OldState, NewState, _, _), NewState \== none.
next(OldState, a, NewState)   :- s(OldState, _, NewState, _), NewState \== none.
next(OldState, '!', NewState) :- s(OldState, _, _, NewState), NewState \== none.
A Better Representation
• Only the arcs that actually exist are stored, so no placeholder is needed for empty cells.
s(0, b, 1).
s(1, a, 2).
s(2, a, 3).
s(3, a, 3).
s(3, '!', 4).
next(OldState, Sym, NewState) :- s(OldState, Sym, NewState).
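To compare the two encodings, here is a minimal Python sketch (an illustration, not part of the course code) that stores the dense table with `None` for empty cells and derives the sparse arc list from it. The names `SYMBOLS`, `DENSE_TABLE`, `to_arcs` and `next_state` are hypothetical; `next_state` plays the role of `next/3`. The automaton is the one encoded by the s/3 facts: b, then two or more a's, then !.

```python
# Dense transition table for b a a+ !, one row per state.
# Columns follow SYMBOLS; None marks an empty cell (no transition).
SYMBOLS = ("b", "a", "!")
DENSE_TABLE = {
    0: (1, None, None),
    1: (None, 2, None),
    2: (None, 3, None),
    3: (None, 3, 4),
    4: (None, None, None),
}


def to_arcs(table, symbols):
    """Convert the dense table into a sparse list of (state, symbol, next_state) arcs."""
    arcs = []
    for state, row in sorted(table.items()):
        for symbol, target in zip(symbols, row):
            if target is not None:  # empty cells do not become arcs
                arcs.append((state, symbol, target))
    return arcs


def next_state(arcs, state, symbol):
    """Counterpart of next/3: the target of the matching arc, or None if there is none."""
    for old, sym, new in arcs:
        if old == state and sym == symbol:
            return new
    return None


ARCS = to_arcs(DENSE_TABLE, SYMBOLS)
print(ARCS)
```

The printed list, `[(0, 'b', 1), (1, 'a', 2), (2, 'a', 3), (3, 'a', 3), (3, '!', 4)]`, matches the five s/3 facts one for one; the ten empty cells simply disappear.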
The Process of Recognition 1
• Start in the initial state, at the first symbol of the word.
• If there is an arc from the current state labelled with that symbol, the machine moves to the state at the end of that arc, and the symbol is consumed.
• The process continues with successive symbols until ...
The Process of Recognition 2
The process stops when one or more of these conditions holds:
• A. All symbols in the input have been consumed.
  • If the current state is final, succeed; otherwise fail.
• B. There is no transition out of the current state for the current symbol.
  • Fail.
Deterministic Recognition
• A deterministic algorithm is one that has no choice points.
• The following algorithm takes as input a tape and an automaton.
• It returns accept if the string on the tape is recognised by the automaton, and reject otherwise.
DETERMINISTIC FSA RECOGNITION
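The deterministic algorithm can be sketched in Python as a single loop. This is a minimal sketch, assuming the automaton encoded by the s/3 facts (b, then two or more a's, then !) stored as a dictionary keyed by (state, symbol); the names `TRANSITIONS`, `FINALS` and `d_recognize` are hypothetical. Each step consults exactly one table cell, so there are no choice points, and the loop can only stop through conditions A and B of the recognition process.

```python
# Sparse transition table for b a a+ !: (state, symbol) -> next state.
# A missing key means "no transition" (an empty cell in the table).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START = 0
FINALS = {4}


def d_recognize(tape, transitions=TRANSITIONS, start=START, finals=FINALS):
    """Deterministic recognition: return "accept" or "reject"."""
    state = start
    for symbol in tape:
        key = (state, symbol)
        if key not in transitions:
            # Condition B: no transition out of this state for this symbol.
            return "reject"
        state = transitions[key]  # consume the symbol and move on
    # Condition A: all input consumed; succeed only in a final state.
    return "accept" if state in finals else "reject"


for word in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
    print(word, d_recognize(word))
```

Here baa! and baaaa! are accepted; ba! and abaa! are rejected under condition B, and baa is rejected under condition A because it ends in the non-final state 3.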
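Skeleton of Prolog Implementation
% drec(+Tape, +Machine, +State, -Result): Result is yes if Machine, started in State, accepts Tape, else no.
drec([], M, S, Result) :- final(M, S), !, Result = yes.                    % tape consumed in a final state
drec([H|T], M, S, Result) :- tran(M, S, H, N), !, drec(T, M, N, Result).   % commit to the single arc for H
drec(_, _, _, no).                                                          % no arc for H, or tape ends in a non-final state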
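Failure States
• We can regard failure as a special state (often called a fail or sink state).
• That state is reached via supplementary arcs, added for every state and symbol that have no transition, i.e. arcs that represent invalid input.
• The failure state is non-final and every arc out of it leads back to it, so once it is entered the input is bound to be rejected.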
Adding a Failure State
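A minimal Python sketch of completing a table with a failure state, assuming the automaton encoded by the s/3 facts, the alphabet b, a, ! and a hypothetical sink state named `"fail"`; `add_failure_state` and `recognize_complete` are illustrative names. Once every cell is filled, the recogniser needs no separate "no transition" check: rejection simply means ending outside the final states.

```python
SYMBOLS = ("b", "a", "!")
STATES = range(5)
FINALS = {4}
# Partial table for b a a+ !: (state, symbol) -> next state; missing keys are empty cells.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
FAIL = "fail"  # hypothetical name for the failure (sink) state


def add_failure_state(transitions, states, symbols, fail=FAIL):
    """Fill every empty cell with an arc to `fail`, and make `fail` loop to itself."""
    complete = dict(transitions)
    for state in [*states, fail]:
        for symbol in symbols:
            complete.setdefault((state, symbol), fail)
    return complete


def recognize_complete(tape, table, start=0, finals=FINALS):
    """With a complete table every lookup succeeds; only the last state decides.
    Assumes the tape uses only symbols from SYMBOLS (others raise KeyError)."""
    state = start
    for symbol in tape:
        state = table[(state, symbol)]
    return "accept" if state in finals else "reject"


COMPLETE = add_failure_state(TRANSITIONS, STATES, SYMBOLS)
print(len(COMPLETE))  # 6 states x 3 symbols = 18 cells
print(recognize_complete("baa!", COMPLETE), recognize_complete("bab!", COMPLETE))
```

The input bab! now runs through the table without any lookup failing: the third symbol sends the machine to `"fail"`, where it stays, so the word is rejected.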
Deterministic versus Non-Deterministic Recognition
• The behaviour of the automata we have considered so far is fully determined by the current state and the input symbol.
• The recognition process is therefore said to be deterministic.
• This is not always the case. An automaton may have:
  • several arcs with the same label leaving the same state;
  • ε-transitions: arcs with no label, which can be followed without consuming any input.
• Automata like this are called non-deterministic.
Non-Deterministic FAs
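The two sources of non-determinism can be checked mechanically. A minimal Python sketch, assuming transitions are stored as a mapping from (state, symbol) to a set of target states, with `EPSILON` (here `None`) labelling an ε-arc. `NFSA_SAME_LABEL` and `NFSA_EPSILON` are illustrative automata for the same language as the s/3 facts, not a transcription of the figure.

```python
EPSILON = None  # label used here for an ε-arc (an arc with no symbol)

# Illustrative NFSA for b a a+ !: two arcs labelled "a" leave state 2.
NFSA_SAME_LABEL = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}

# Illustrative NFSA for b a a+ !: an ε-arc leads from state 3 back to state 2.
NFSA_EPSILON = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, EPSILON): {2},
    (3, "!"): {4},
}

# The deterministic automaton of the s/3 facts, in the same format.
DFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, "a"): {3},
    (3, "!"): {4},
}


def is_deterministic(transitions):
    """True unless some state has an ε-arc or several arcs with the same label."""
    for (_, symbol), targets in transitions.items():
        if symbol is EPSILON or len(targets) > 1:
            return False
    return True


print(is_deterministic(DFSA), is_deterministic(NFSA_SAME_LABEL), is_deterministic(NFSA_EPSILON))
```

All three automata accept the same words; only the first is deterministic, since in the others the current state and input symbol do not fix a unique next move.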
Non-Deterministic Recognition
• There are three standard ways of dealing with non-deterministic recognition:
  • Backtracking: at every choice point, record the current state, the input position and the as-yet-unexplored choices, so that an alternative can be tried if the current path fails.
  • Lookahead: peek ahead n symbols in the input in order to decide which path to take.
  • Parallel search: look at every alternative path in parallel.
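The parallel-search option can be sketched in Python by tracking the set of all states the machine could currently be in, instead of a single state. A minimal sketch under stated assumptions: the NFSA below (with an ε-arc from state 3 back to state 2, labelled `None`) is an illustrative automaton for b a a+ !, and `epsilon_closure` and `parallel_recognize` are hypothetical names.

```python
EPSILON = None  # label used here for an ε-arc

# Illustrative NFSA for b a a+ ! with an ε-arc from state 3 back to state 2.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, EPSILON): {2},
    (3, "!"): {4},
}
START = 0
FINALS = {4}


def epsilon_closure(states, transitions):
    """All states reachable from `states` by following only ε-arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in transitions.get((state, EPSILON), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def parallel_recognize(tape, transitions=NFSA, start=START, finals=FINALS):
    """Follow every path at once by keeping the set of possible current states."""
    current = epsilon_closure({start}, transitions)
    for symbol in tape:
        moved = set()
        for state in current:
            moved |= transitions.get((state, symbol), set())
        current = epsilon_closure(moved, transitions)
        if not current:  # every path has failed
            return "reject"
    return "accept" if current & finals else "reject"
```

For example, after reading baa the machine could be in state 2 or state 3, so `parallel_recognize("baa")` rejects (neither is final) while `parallel_recognize("baa!")` accepts.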
ND-RECOGNISE
function ND-RECOGNISE(tape, machine) returns accept or reject
  agenda ← {(q0(machine), 0)}
  search_state ← NEXT(agenda)
  loop
    if ACCEPT-STATE?(search_state) = true then
      return accept
    else
      agenda ← agenda ∪ GENERATE-NEW-STATES(search_state)
    if agenda is empty then
      return reject
    else
      search_state ← NEXT(agenda)
  end
ACCEPT-STATE?
function ACCEPT-STATE?(search_state)
  mstate ← first(search_state)
  tape_pos ← second(search_state)
  if tape[tape_pos] = end_input and IS-FINAL?(mstate) then
    return true
  else
    return false
GENERATE-NEW-STATES
function GENERATE-NEW-STATES(search_state)
  mstate ← first(search_state)
  tape_pos ← second(search_state)
  return {(x, tape_pos) | x ∈ trantable[mstate, ε]}
         ∪ {(x, tape_pos + 1) | x ∈ trantable[mstate, tape[tape_pos]]}
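A minimal Python rendering of ND-RECOGNISE, ACCEPT-STATE? and GENERATE-NEW-STATES. Assumptions: the NFSA maps (state, symbol) to a set of states, `None` stands for ε, and the end of the tape is detected as `tape_pos == len(tape)` in place of `end_input`. The agenda is a Python list used as a stack, which gives the backtracking (depth-first) behaviour; the example automaton is illustrative.

```python
EPSILON = None  # stands for an ε-arc label

# Illustrative NFSA for b a a+ ! (state 2 has two arcs labelled "a").
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
START = 0
FINALS = {4}


def accept_state(search_state, tape, finals=FINALS):
    """ACCEPT-STATE?: at the end of the tape and in a final machine state."""
    mstate, tape_pos = search_state
    return tape_pos == len(tape) and mstate in finals


def generate_new_states(search_state, tape, transitions=NFSA):
    """GENERATE-NEW-STATES: ε-arcs keep tape_pos, symbol arcs advance it by one."""
    mstate, tape_pos = search_state
    new_states = [(x, tape_pos) for x in transitions.get((mstate, EPSILON), ())]
    if tape_pos < len(tape):  # no symbol left to read at the end of the tape
        symbol = tape[tape_pos]
        new_states += [(x, tape_pos + 1) for x in transitions.get((mstate, symbol), ())]
    return new_states


def nd_recognize(tape, transitions=NFSA, start=START, finals=FINALS):
    """ND-RECOGNISE with the agenda kept as a stack (depth-first, i.e. backtracking)."""
    agenda = [(start, 0)]
    search_state = agenda.pop()  # NEXT(agenda)
    while True:
        if accept_state(search_state, tape, finals):
            return "accept"
        agenda.extend(generate_new_states(search_state, tape, transitions))
        if not agenda:
            return "reject"
        search_state = agenda.pop()  # NEXT(agenda)
```

Because every arc in this example consumes a symbol, each new search state is further along the tape and the search terminates. With a cycle of ε-arcs the same search states could be regenerated indefinitely, so a set of already-visited search states would be needed.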
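Recognition as Search
• Recognition can be regarded as a search problem, with:
  • an initial state: the search state (q0(machine), 0);
  • a goal state: any search state for which ACCEPT-STATE? is true;
  • rules: GENERATE-NEW-STATES, which produces the successors of a search state;
  • a strategy: the order in which NEXT takes search states from the agenda.
• Different search behaviours can be evoked by managing the agenda in different ways: treating it as a stack gives depth-first search, treating it as a queue gives breadth-first search.
• See Jurafsky & Martin sect 2.2.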
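The effect of the agenda policy can be seen by running the same search with a stack and with a queue. A minimal Python sketch, assuming an illustrative NFSA for b a a+ ! with no ε-arcs (state 2 has two arcs labelled a) and a hypothetical `search` function that also returns the order in which search states were taken from the agenda. `collections.deque` serves as both a stack (`pop`) and a queue (`popleft`).

```python
from collections import deque

# Illustrative NFSA for b a a+ ! with no ε-arcs (state 2 has two arcs labelled "a").
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
START, FINALS = 0, {4}


def successors(search_state, tape):
    """Search states reachable by reading one symbol (sorted for a reproducible order)."""
    mstate, tape_pos = search_state
    if tape_pos >= len(tape):
        return []
    targets = NFSA.get((mstate, tape[tape_pos]), ())
    return [(x, tape_pos + 1) for x in sorted(targets)]


def search(tape, strategy):
    """Return (result, order in which search states were taken from the agenda).
    strategy "depth-first" uses the deque as a stack; anything else uses it as a queue."""
    agenda = deque([(START, 0)])
    visited = []
    while agenda:
        search_state = agenda.pop() if strategy == "depth-first" else agenda.popleft()
        visited.append(search_state)
        mstate, tape_pos = search_state
        if tape_pos == len(tape) and mstate in FINALS:
            return "accept", visited
        agenda.extend(successors(search_state, tape))
    return "reject", visited


for strategy in ("depth-first", "breadth-first"):
    print(strategy, search("baaa!", strategy))
```

On the tape baaa! both strategies accept, but depth-first reaches the goal after taking 7 search states from the agenda and breadth-first after 8: with no ε-arcs, breadth-first examines every search state at one tape position before moving on to the next.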
Deterministic and Non-Deterministic FSAs
• The class of languages recognisable by NDFSAs is identical to the class recognisable by DFSAs.
• For every NDFSA ND there is an equivalent deterministic FSA D.
• Each state of D corresponds to a set of states of ND (the subset construction).
• If N is the number of states in ND, the number of states in D is at most 2^N.
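The subset construction behind this result can be sketched in Python. Assumptions: the NDFSA has no ε-arcs (with ε-arcs each subset would also have to be closed under ε-moves), it is an illustrative automaton for b a a+ ! in which state 2 has two arcs labelled a, and `subset_construction` is a hypothetical name. Each DFSA state is a `frozenset` of NDFSA states, and only subsets reachable from the start are built.

```python
from itertools import chain

# Illustrative NDFSA for b a a+ ! with no ε-arcs (state 2 has two arcs labelled "a").
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
SYMBOLS = ("b", "a", "!")
START, FINALS = 0, {4}


def subset_construction(transitions, symbols, start, finals):
    """Build an equivalent DFSA whose states are frozensets of NDFSA states."""
    d_start = frozenset({start})
    d_states = {d_start}
    d_trans = {}
    unprocessed = [d_start]
    while unprocessed:
        current = unprocessed.pop()
        for symbol in symbols:
            # Union of the NDFSA moves on `symbol` from every state in the subset.
            target = frozenset(chain.from_iterable(
                transitions.get((q, symbol), ()) for q in current))
            if not target:
                continue  # the empty subset plays the role of the failure state
            d_trans[(current, symbol)] = target
            if target not in d_states:
                d_states.add(target)
                unprocessed.append(target)
    # A subset is final if it contains at least one final NDFSA state.
    d_finals = {s for s in d_states if s & finals}
    return d_states, d_trans, d_start, d_finals


D_STATES, D_TRANS, D_START, D_FINALS = subset_construction(NFSA, SYMBOLS, START, FINALS)
print(sorted(sorted(s) for s in D_STATES))
```

For this 5-state NDFSA the bound allows up to 2^5 = 32 subsets, but only 5 are reachable: {0}, {1}, {2}, {2, 3} and {4}, with {4} the only final one. The subset {2, 3} is where the two a-arcs out of state 2 are merged into a single deterministic move.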