Ling 570 Day #2
Tokenizing and evaluating tokenization
Tokenization
After coming close to a partial settlement a year ago, shareholders who filed civil suits against Ivan F. Boesky and the partnerships he once controlled again are approaching an accord, people familiar with the case said. Meanwhile, within the next few weeks, the limited partners in Ivan F. Boesky & Co. L.P. are expected to reach a partial settlement with Drexel Burnham Lambert Inc. regarding the distribution of the $330 million in partnership assets, said one of the individuals. One individual said the shareholders' accord was "well worked out." There are at least 27 class-action shareholder suits that have been consolidated in federal court in New York under U.S. District Judge Milton Pollack.
Tokenize • After coming close to a partial settlement a year ago, shareholders who filed civil suits against Ivan F. Boesky and Co. L.P. Drexel’s plaintiffs’ …
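Cases like "F.", "Co.", "L.P.", "Drexel’s" and "plaintiffs’" show why splitting on whitespace and stripping punctuation is not enough. One way to handle them is a regular expression whose alternatives are tried in priority order. The sketch below is a hypothetical pattern, not the course's tokenizer. It assumes Penn Treebank-style splitting of possessives ("Drexel’s" → "Drexel" + "’s") and uses a small, hand-picked abbreviation list. Under this pattern the dollar sign also becomes its own token.

```python
import re

# Hypothetical token pattern. Alternatives are tried left to right, so the
# abbreviation rules must come before the plain-word and punctuation rules.
TOKEN_RE = re.compile(r"""
      (?:[A-Za-z]\.){2,}            # multi-period abbreviations: U.S., L.P.
    | (?:Inc|Co|Corp|Mr|Mrs|Dr)\.   # assumed short list of known abbreviations
    | [A-Z]\.(?=\s)                 # single initials such as F. (followed by a space)
    | \d+(?:[.,]\d+)*               # numbers: 27, 330, 3.5, 1,000
    | [A-Za-z]+(?:-[A-Za-z]+)*      # words, including hyphenated ones (class-action)
    | ['’][sS]\b                    # possessive clitic split off as its own token
    | [^\w\s]                       # any other single punctuation mark
""", re.VERBOSE)

def tokenize(text: str) -> list[str]:
    # All groups are non-capturing, so findall returns the full matches.
    return TOKEN_RE.findall(text)

print(tokenize("Ivan F. Boesky & Co. L.P. said Drexel’s plaintiffs’ $330 million."))
```

Ordering carries most of the logic here. Because "Co." is tried before the plain-word rule, the period stays attached to the abbreviation. A sentence-final "million." falls through to the word rule, so its period is split off.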
FSAs Formally • A Finite-State Automaton (FSA) is a 5-tuple: • A set of states Q = {q0,q1,q2,q3,q4} • A finite alphabet Σ = {b,a,!} • A start state q0 • A set of accepting states F = {q4} • A transition function δ: Q × Σ → Q
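The 5-tuple translates almost directly into a data structure. The slide gives Q, Σ, the start state and the accepting set, but not the arcs. The arcs below are an assumption: they are the sheep-talk automaton for baa+! familiar from Jurafsky & Martin, which matches this Q and Σ. The transition function is stored as a partial dict, so a missing arc acts as an implicit dead state.

```python
from dataclasses import dataclass

@dataclass
class FSA:
    """A deterministic FSA as the 5-tuple (Q, Σ, q0, F, δ)."""
    states: set[str]
    alphabet: set[str]
    start: str
    accepting: set[str]
    delta: dict[tuple[str, str], str]  # partial δ: a missing arc means "reject"

    def accepts(self, s: str) -> bool:
        state = self.start
        for ch in s:
            if ch not in self.alphabet or (state, ch) not in self.delta:
                return False  # symbol outside Σ, or no arc: implicit dead state
            state = self.delta[(state, ch)]
        return state in self.accepting

# Assumed arcs (sheep talk, baa+!); the slide does not list the transitions.
SHEEP = FSA(
    states={"q0", "q1", "q2", "q3", "q4"},
    alphabet={"b", "a", "!"},
    start="q0",
    accepting={"q4"},
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",  # self-loop: any number of extra a's
        ("q3", "!"): "q4",
    },
)

print(SHEEP.accepts("baaa!"), SHEEP.accepts("ba!"))
```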
FSA Example • An automaton: • Σ = {a,b} • Q = {q0,q1}; start: q0; final: {q1} • Regex = a*b+
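The slide's diagram did not survive, so the arcs in the sketch below are reconstructed from the regex alone: q0 loops on a, q0 moves to q1 on b, and q1 loops on b. To check that this two-state automaton accepts exactly a*b+, the sketch compares it with Python's `re.fullmatch` on every string over {a,b} up to a given length. The helper `agrees_with_regex` is hypothetical and exists only for this check.

```python
import itertools
import re

# Reconstructed arcs: q0 --a--> q0, q0 --b--> q1, q1 --b--> q1 (no arc on a from q1).
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
START, FINAL = "q0", {"q1"}

def accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:  # no arc for this symbol: implicit dead state, reject
            return False
    return state in FINAL

def agrees_with_regex(max_len: int) -> bool:
    """Compare the automaton with re.fullmatch on all strings over {a,b} up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            s = "".join(chars)
            if accepts(s) != bool(re.fullmatch(r"a*b+", s)):
                return False
    return True

print(accepts("aabbb"), accepts("aba"), agrees_with_regex(8))
```

The empty string is correctly rejected because q0 is not final. This matches the b+ requirement of at least one b.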
Another FSA Example • Another automaton:
Two Views of FSAs • Recognition: An FSA is a model that, given an input string, accepts the string if it is in the language and rejects it otherwise • Generation: An FSA m is a model that can generate all and only the strings in L(m).
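One automaton supports both views. The sketch below uses the a*b+ automaton, with arcs reconstructed from the regex. `recognize` follows arcs for a given string. `generate` runs a breadth-first search over (state, prefix) pairs and emits a prefix whenever it reaches a final state. L(m) is infinite here, so generation needs a length bound. The `max_len` parameter is our assumption, not part of the formal definition.

```python
from collections import deque

# The a*b+ automaton again (arcs reconstructed from the regex).
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
START, FINAL = "q0", {"q1"}

def recognize(s: str) -> bool:
    """Recognition view: accept s iff following its arcs ends in a final state."""
    state = START
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINAL

def generate(max_len: int):
    """Generation view: yield every string of L(m) up to max_len, shortest first."""
    queue = deque([(START, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in FINAL:
            yield prefix
        if len(prefix) == max_len:
            continue  # do not extend past the length bound
        for (src, sym), dst in sorted(DELTA.items()):
            if src == state:
                queue.append((dst, prefix + sym))

print(list(generate(3)))
```

Because the automaton is deterministic, each string is reached by exactly one path and is generated once. Every generated string is also recognized, which mirrors the "all and only" wording.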
FSTs • A finite automaton that maps between two strings • An automaton with two labels per arc • input:output
FST Applications • Tokenization • Segmentation • Morphological analysis • Transliteration • Translation • Speech recognition • Spoken language understanding
Approaches to FSTs • FST as recognizer: • Takes a pair of input:output strings • Accepts if the pair is in the language, otherwise rejects • FST as generator: • Outputs pairs of strings in the language • FST as translator: • Reads an input string and prints an output string • FST as set relator: • Computes relations between sets
FST as Translator • FR: ce bill met de le baume sur une blessure • EN: this bill puts balm on a sore wound
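The translator and recognizer views can be shown with a deliberately tiny, hypothetical word-level FST. It has a single state, and every arc loops back to that state with an input:output label. The lexicon is reverse-engineered from this one sentence pair only, and real translation systems do not work this way. Two kinds of arc are needed: "de" and "le" emit ε (written here as ""), and "blessure" emits two words.

```python
from typing import Optional

# Hypothetical one-state, word-level FST built only to reproduce the slide's
# sentence pair. Each entry is an arc input:output; "" stands for ε (no output).
ARCS = {
    "ce": "this", "bill": "bill", "met": "puts",
    "de": "", "le": "",            # these input words emit nothing
    "baume": "balm", "sur": "on", "une": "a",
    "blessure": "sore wound",      # one input word emits two output words
}

def translate(src: str) -> Optional[str]:
    """Translator view: read the input words and print the output words."""
    out = []
    for word in src.split():
        if word not in ARCS:
            return None  # no arc for this input symbol, so the FST rejects
        if ARCS[word]:
            out.append(ARCS[word])
    return " ".join(out)

def recognize(src: str, tgt: str) -> bool:
    """Recognizer view: accept the pair src:tgt iff the FST relates them.
    Comparing against translate() is valid only because this FST is deterministic."""
    return translate(src) == tgt

print(translate("ce bill met de le baume sur une blessure"))
```

For a nondeterministic FST, where one input can have several outputs, the recognizer would instead have to search over paths that read both strings together.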
FST Application Examples • Case folding: • He said → he said • Tokenization: • “He ran.” → “ He ran . ” • POS tagging: • They can fish → PRO VERB NOUN
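Case folding and this simple tokenization can each be written as a one-state FST, where every input symbol c follows a self-loop arc c:f(c). The sketch below models an arc set as a Python function. The name `run_one_state_fst` is hypothetical. The tokenization arcs are an assumption chosen to reproduce the slide's example: quotes are padded on their inner side and periods are split off.

```python
def run_one_state_fst(arc, text: str) -> str:
    """Translate text with a one-state FST: input symbol c follows the arc c:arc(c)."""
    return "".join(arc(c) for c in text)

def fold_case(c: str) -> str:
    # Case-folding arcs: H:h, e:e, ... every symbol maps to its lowercase form.
    return c.lower()

# Hypothetical tokenization arcs: pad quotes on their inner side, split off periods.
SPLIT_ARCS = {"“": "“ ", "”": " ”", ".": " ."}

def split_punct(c: str) -> str:
    return SPLIT_ARCS.get(c, c)  # all other symbols copy through unchanged (c:c)

print(run_one_state_fst(fold_case, "He said"))
print(run_one_state_fst(split_punct, "“He ran.”"))
```

A single state has no memory of context. As a result, `split_punct` also turns "U.S." into "U .S .". Keeping abbreviations intact requires more states, or a pattern-based approach.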
FST Application Examples • Pronunciation: • B AH T EH R → B AH DX EH R • Morphological generation: • Fox + s → Foxes • Morphological analysis: • cats → cat + s
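Generation and analysis are the same relation read in opposite directions. An FST can be run either way, and the plain-Python sketch below imitates that behavior. `generate` applies an e-insertion rule. `analyze` proposes candidate splits and keeps only those that generate the surface form. Both the two-stem lexicon and the list of sibilant endings are hypothetical simplifications, and the code is not a compiled transducer.

```python
LEXICON = {"cat", "fox"}                        # hypothetical stem lexicon
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")  # assumed contexts that trigger e-insertion

def generate(stem: str, suffix: str) -> str:
    """Generation direction: underlying stem + suffix -> surface form."""
    if suffix == "s" and stem.endswith(SIBILANT_ENDINGS):
        return stem + "es"  # e-insertion: fox + s -> foxes
    return stem + suffix    # plain concatenation: cat + s -> cats

def analyze(surface: str) -> list[tuple[str, str]]:
    """Analysis direction: every (stem, suffix) in the lexicon that generates surface."""
    candidates = [(surface, "")]
    if surface.endswith("s"):
        candidates.append((surface[:-1], "s"))
    if surface.endswith("es"):
        candidates.append((surface[:-2], "s"))
    return [
        (stem, suffix)
        for stem, suffix in candidates
        if stem in LEXICON and generate(stem, suffix) == surface
    ]

print(generate("fox", "s"), analyze("cats"))
```

Checking each candidate against `generate` keeps analysis consistent with generation. For example, "cates" gets no analysis, because cat + s generates "cats" rather than "cates".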
Next Class • Stemming/WFSTs/Markov Chains