Improving Speech Recognition with SVM
Jerry Zhu, CALD KDD Lab, 2001/2/23
(Many thanks to all my reviewers!)
What's inside a speech recognizer?
Spoken sentence: "It cites class size, quality of life as problems."
The recognizer considers many hypotheses S, each scored by the acoustic model P(A|S):

P(A|S)      S
-5522539    it sites class eyes quality of life as problems
-5556088    it sites class size quality of life has problems
-5556088    it cites class size quality of life has problems
-5622228    it sites klas eyes quality of life has problems
-5653812    it sites class size quality of life as problems
-5653812    it cites class size quality of life as problems
........    (many, many other hypotheses)

Language Model
• The language model supplies P(S), and decoding picks
  S* = argmaxS P(S|A) = argmaxS P(A|S)*P(S)
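To make the decoding rule concrete, here is a minimal sketch of N-best rescoring, assuming we are handed a list of (acoustic score, hypothesis) pairs; the scores and the `lm_logprob` stub are illustrative, not the recognizer's actual values or API.

```python
# Minimal N-best rescoring sketch (illustrative values, not the real recognizer).
# Each entry is (acoustic log-score for P(A|S), hypothesis S).
nbest = [
    (-101.2, "it sites class eyes quality of life as problems"),
    (-103.5, "it cites class size quality of life as problems"),
    (-103.9, "it sites class size quality of life has problems"),
]

def lm_logprob(sentence: str) -> float:
    """Stand-in for the language model's log P(S); a real system would
    query a trained n-gram model here."""
    return -2.0 * len(sentence.split())

# S* = argmax_S P(A|S) * P(S), computed in log space to avoid underflow.
best_score, best_sentence = max(
    (acoustic + lm_logprob(s), s) for acoustic, s in nbest
)
print(best_sentence)
```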
Trigram Language Model
• Trigram language model: P(S) = P(w1…wn) ≈ ∏i P(wi|wi-1,wi-2)
• Widely used, but short-sighted: it gives high P(S) to bad sentences, e.g.:
he took over or is it by all human beings but the abortion debate would develop of mark you take the fifth time on foreign relations committee senator dole does have knowledge of forty years so that's one reason i came in for purposes of this i mean the emergency workers working like little crimes the defense lawyers and they have them yesterday do we told you earlier on inside politics weekend
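As a toy illustration of the trigram factorization, the sketch below estimates trigram probabilities from a tiny corpus with add-one smoothing; the corpus and smoothing choice are assumptions for illustration, not the model behind the examples above.

```python
import math
from collections import Counter

# Toy corpus; a real trigram LM is estimated from millions of words.
corpus = "he took over the committee and the committee took over the debate".split()

trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts  = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def p_trigram(w, prev1, prev2):
    """P(w | prev2 prev1) with add-one smoothing."""
    return (trigram_counts[(prev2, prev1, w)] + 1) / (bigram_counts[(prev2, prev1)] + len(vocab))

def sentence_logprob(words):
    """log P(S) under the trigram factorization; the first two words are
    treated as given (a real LM would use sentence-start symbols)."""
    return sum(math.log(p_trigram(words[i], words[i - 1], words[i - 2]))
               for i in range(2, len(words)))

print(sentence_logprob("he took over the debate".split()))
```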
Why is this bad?
• We may pick the wrong sentence during decoding.
• Idea: penalize P(S) when S looks like a bad sentence:
  S* = argmax P(A|S)*P(S)*P(S is bad)
• We need a classifier!
(Diagram: an SVM takes a sentence S, is trained to separate natural sentences from trigram-generated sentences, and outputs P(S is bad).)
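One way to realize the penalty in code (a sketch, not the talk's exact formulation) is to subtract a weighted classifier score from the log-space decoding objective. The `badness` stub stands in for the trained SVM's output, and the weight is an illustrative assumption.

```python
def lm_logprob(sentence: str) -> float:
    """Stand-in for log P(S) from the trigram LM."""
    return -2.0 * len(sentence.split())

def badness(sentence: str) -> float:
    """Stand-in for the classifier's estimate that S is a bad sentence, in [0, 1];
    the trained SVM would be applied to the sentence's feature vector here."""
    return 0.9 if "has problems" in sentence else 0.1

def rescore(nbest, weight=3.0):
    """Keep the hypothesis with the best combined score.  Subtracting
    weight * badness(S) is one possible penalty; the exact combination
    is an assumption, not the talk's formula."""
    return max(nbest, key=lambda h: h[0] + lm_logprob(h[1]) - weight * badness(h[1]))

nbest = [(-101.2, "it sites class size quality of life has problems"),
         (-101.4, "it cites class size quality of life as problems")]
print(rescore(nbest)[1])  # the penalty flips the choice to the second hypothesis
```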
Why SVM?
• It's cool.
• It performs very well on document classification.
• Its kernel trick allows sentence-level interactions.
• But: how do we represent a sentence as a vector?

Bag-of-words vector (doesn't work)
S = "Let me stop you at that point"
<a, aardvark, aardwolf, … at … let … me … that … zoo>
<0, 0,        0,        … 1  … 1   … 1  … 1    … 0>
Values can be binary, raw counts, or frequencies.
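A sketch of the bag-of-words representation just described, using the slide's example sentence and a tiny stand-in vocabulary (a real vocabulary would span the whole lexicon):

```python
# Build a bag-of-words vector over a fixed vocabulary (illustrative subset).
vocab = ["a", "aardvark", "aardwolf", "at", "let", "me", "point", "stop", "that", "you", "zoo"]
index = {w: i for i, w in enumerate(vocab)}

def bow_vector(sentence: str, binary: bool = False):
    """Raw-count (or binary) bag-of-words vector; out-of-vocabulary words are dropped."""
    vec = [0] * len(vocab)
    for w in sentence.lower().split():
        if w in index:
            vec[index[w]] += 1
    if binary:
        vec = [1 if c > 0 else 0 for c in vec]
    return vec

print(bow_vector("Let me stop you at that point"))
# -> [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
```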
Part-of-speech sequence vector
S = "Let me stop you at that point"
pos = "VB PRP VB PRP IN DT NN"
Vector space = all POS sequences of length k, say k = 3:
<… PRP-IN-NN … VB-PRP-PRP … VB-PRP-VB …>
<… 6         … 4          … 3         …>
Intuition: sentences with similar POS sequences fall in the same class.
This doesn't work either (accuracy 58%).
(Detail: excluding the trigram influence actually hurts.)
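A sketch of extracting the POS-trigram features described above; the tag sequence is the slide's example, and only the non-zero dimensions of the (huge) length-k sequence space are stored.

```python
from collections import Counter

def pos_trigram_features(pos_tags):
    """Count every consecutive length-3 POS sequence in the tag string."""
    return Counter(
        "-".join(pos_tags[i:i + 3]) for i in range(len(pos_tags) - 2)
    )

# The slide's example: "Let me stop you at that point"
tags = "VB PRP VB PRP IN DT NN".split()
print(pos_trigram_features(tags))
# five trigram features (VB-PRP-VB, PRP-VB-PRP, ...), each with count 1
```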
What to try next
• POS with stopwords, e.g. "VB me VB you at that NN" (see the sketch below)
• Parsing
• Semantic coherence, e.g. "zip-lock" and "Japanese bank"
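For the first item, a sketch of the stopword-preserving representation, assuming an illustrative stopword list and the slide's example tags:

```python
# Sketch of "POS with stopwords": content words are replaced by their POS tag,
# stopwords are kept as-is.  The stopword list is illustrative, not the author's.
STOPWORDS = {"me", "you", "at", "that", "the", "a", "of"}

def pos_with_stopwords(words, tags):
    return " ".join(w if w.lower() in STOPWORDS else t
                    for w, t in zip(words, tags))

words = "Let me stop you at that point".split()
tags  = "VB PRP VB PRP IN DT NN".split()
print(pos_with_stopwords(words, tags))
# -> "VB me VB you at that NN"
```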