Problem 1: Word Segmentation. whatdoesthisreferto. what does this refer to. Application: Chinese Text. Application: Internet Domain Names. www. visitbritain .com. Visit Britain. Statistical Machine Learning. Best segmentation = one with highest probability
Problem 1: Word Segmentation whatdoesthisreferto what does this refer to
Application: Internet Domain Names www.visitbritain.com Visit Britain
Statistical Machine Learning • Best segmentation = one with highest probability • Probability of a segmentation = P(first word) × P(rest of segmentation) • P(word) = estimated by counting
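The three bullets above can be sketched in a few lines of Python. This is only an illustration: the toy corpus and the names `p_word` and `p_segmentation` are assumptions made for the example, not taken from the slides.

```python
from collections import Counter

# Toy corpus, invented for illustration; a real model would count a large text collection.
corpus = "now is the time for all good people to choose spain now".split()
counts = Counter(corpus)
total = sum(counts.values())

def p_word(word):
    """P(word), estimated by counting: relative frequency in the toy corpus."""
    return counts[word] / total

def p_segmentation(words):
    """P(first word) x P(rest of segmentation), applied recursively."""
    if not words:
        return 1.0
    return p_word(words[0]) * p_segmentation(words[1:])
```

With these toy counts, `p_segmentation(["choose", "spain"])` is larger than `p_segmentation(["chooses", "pain"])`, because the second candidate contains words the corpus never saw.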
Statistical Machine Learning • choosespain • Choose Spain • Chooses pain • P(“Choose Spain”) > P(“Chooses Pain”)
Example • segment(“nowisthetime…”) • Pf(“n”) × Pr(“owisthetime…”) • Pf(“no”) × Pr(“wisthetime…”) • Pf(“now”) × Pr(“isthetime…”) • Pf(“nowi”) × Pr(“sthetime…”) • ……
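The enumeration on this slide (try every first word, then segment the rest) might look like the following in Python. It is a minimal sketch under stated assumptions: the counts in `COUNTS` are made up, and the penalty for unseen strings in `p_word` is a choice made for this example, not something the slides specify.

```python
from functools import lru_cache

# Hypothetical toy counts; a real system would count words in a large corpus.
COUNTS = {"now": 50, "is": 100, "the": 200, "time": 40, "no": 30, "a": 90, "i": 60}
TOTAL = sum(COUNTS.values())

def p_word(word):
    """Counted probability; unseen strings get a penalty that shrinks with length (assumed)."""
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

def p_words(words):
    """Probability of a segmentation: product of the word probabilities."""
    prob = 1.0
    for word in words:
        prob *= p_word(word)
    return prob

def splits(text, max_len=20):
    """All (first, rest) pairs with a non-empty first part of at most max_len characters."""
    return [(text[:i], text[i:]) for i in range(1, min(len(text), max_len) + 1)]

@lru_cache(maxsize=None)
def segment(text):
    """Return the highest-probability segmentation of text as a tuple of words."""
    if not text:
        return ()
    candidates = [(first,) + segment(rest) for first, rest in splits(text)]
    return max(candidates, key=p_words)

print(segment("nowisthetime"))  # ('now', 'is', 'the', 'time')
```

Memoising `segment` with `lru_cache` matters here: without it, the same suffixes would be re-segmented many times over.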
Performance • Accuracy = 98% • Trained on 1.7B words (English) • Typical errors: • baseratesoughtto → base rate sought to • smallandinsignificant → small and in significant • ginormousego → g in or mouse go
Some Results • whorepresents.com → [“who”, “represents”] • therapistfinder.com → [“therapist”, “finder”] • expertsexchange.com → [“experts”, “exchange”] • speedofart.net → [“speed”, “of”, “art”] • penisland.com → error: expected [“pen”, “island”]
Problem 2: Spelling Correction • Mehran Salami • Typical word processor: Tehran Salami • But Google can …
Statistical Machine Learning • Best correction = one with highest probability • Probability of a spelling correction c = P(c as a word) × P(original is a typo for c) • P(c as a word) = estimated by counting • P(original is a typo for c) = depends on the number of changes (fewer changes = more likely)
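The scoring rule above can be sketched as follows. Everything concrete here is an assumption for illustration: the toy `COUNTS`, the per-change probability `P_EDIT`, the limit of two changes, and the choice of single-letter deletes, swaps, replacements and insertions as the "changes".

```python
import string

# Hypothetical toy counts; a real corrector would count words in a large corpus.
COUNTS = {"spelling": 40, "spewing": 2, "the": 500, "they": 80, "then": 90}
TOTAL = sum(COUNTS.values())
P_EDIT = 0.01  # assumed probability of a single change; not a value from the slides

def edits1(word):
    """All strings one change away: delete, swap adjacent letters, replace, or insert."""
    letters = string.ascii_lowercase
    pairs = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in pairs if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in pairs if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in pairs if b for c in letters]
    inserts = [a + c + b for a, b in pairs for c in letters]
    return set(deletes + swaps + replaces + inserts)

def candidates(word):
    """Map each known word within two changes to its smallest number of changes."""
    found = {}
    if word in COUNTS:
        found[word] = 0
    one_away = edits1(word)
    for w in one_away:
        if w in COUNTS:
            found.setdefault(w, 1)
    for w1 in one_away:
        for w2 in edits1(w1):
            if w2 in COUNTS:
                found.setdefault(w2, 2)
    return found

def correct(word):
    """Pick c maximising P(c as a word) x P(original is a typo for c)."""
    found = candidates(word)
    if not found:
        return word  # nothing known nearby; leave the input unchanged
    return max(found, key=lambda c: (COUNTS[c] / TOTAL) * P_EDIT ** found[c])

print(correct("speling"))  # spelling
```

Both "spelling" and "spewing" are one change away from "speling"; the counted word probability is what breaks the tie in favour of "spelling".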
Problem 3: Speech Recognition • An informal, incomplete grammar of the English language runs over 1,700 pages. • Invariably, simple models and a lot of data trump more elaborate models based on less data.
Problem 3: Speech Recognition • If you have a lot of data, memorisation is a good policy. • For many tasks such as speech recognition, once we have a billion or so examples, we essentially have a closed set that represents (or at least approximates) what we need, without general rules.
Problem 3: Speech Recognition “Every time I fire a linguist, the performance of our speech recognition system goes up.” --- Fred Jelinek
Conclusion (Statistical) [Machine] Learning Is The Ultimate Agile Development Tool Peter Norvig (Director of Research, Google)