The 1980’s

The 1980’s • Collection of large standard corpora • Front ends: auditory models, dynamics • Engineering: scaling to large vocabulary continuous speech • Second major (D)ARPA ASR project • HMMs become ready for prime time

Standard Corpora Collection • Before 1984, chaos • TIMIT • RM (later WSJ) • ATIS • NIST, ARPA, LDC

Front Ends in the 1980’s • Mel cepstrum (Bridle, Mermelstein) • PLP (Hermansky) • Delta cepstrum (Furui) • Auditory models (Seneff, Ghitza, others)

Mel Frequency Scale
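
The mel scale appears on this slide only as a figure. As a rough illustration, one widely used approximation maps frequency f in Hz to mels as 2595 * log10(1 + f / 700). Whether the lecture used this exact formula is an assumption, and the function names below are illustrative only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mels with the common 2595*log10(1 + f/700) approximation.

    Assumption: this particular formula is one of several in use; the slide
    does not say which one it plots.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # The mapping is close to linear at low frequencies and compresses
    # high frequencies roughly logarithmically.
    for f in (100, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz -> {hz_to_mel(f):7.1f} mel")
```

Under this approximation 1000 Hz lands at very nearly 1000 mel, and equal steps in mel cover progressively wider bands in Hz.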

Spectral vs Temporal Processing [Figure: two time-frequency panels. Spectral processing: analysis (e.g., cepstral) along the frequency axis. Temporal processing: processing (e.g., mean removal) along the time axis.]
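
Mean removal, the temporal-processing example named on this slide, can be sketched as follows. This is a minimal sketch that assumes the features are cepstral vectors stored as a list of frames and that the mean is taken over the whole utterance; the function name is hypothetical.

```python
def remove_cepstral_mean(frames):
    """Subtract the per-coefficient mean, taken over time, from every frame.

    frames: list of equal-length lists, one cepstral vector per frame.
    Assumption: the mean is computed over the whole utterance.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # One mean per cepstral coefficient, averaged along the time axis.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


if __name__ == "__main__":
    utterance = [[1.0, 2.0], [3.0, 6.0]]
    print(remove_cepstral_mean(utterance))
```

Because a fixed linear channel adds a constant offset to each cepstral coefficient, adding the same offset to every frame leaves the output of this function unchanged.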

Dynamic Speech Features • temporal dynamics useful for ASR • local time derivatives of cepstra • “delta” features estimated over multiple frames (typically 5) • usually augments static features • can be viewed as a temporal filter

“Delta” impulse response [Figure: filter taps ranging from .2 to -.2 (vertical axis) plotted against frames -2 to 2 (horizontal axis)]
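
The delta computation can be sketched as a five-frame linear-regression slope, which is one standard way to estimate a local time derivative. The regression form, the edge handling (repeating the first and last frames), and the function name are assumptions, not details taken from the slides.

```python
def delta(frames, half_window=2):
    """Estimate local time derivatives of each coefficient by linear regression.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)

    With N = 2 (a 5-frame window) the weights on c[t-2..t+2] are
    -0.2, -0.1, 0, 0.1, 0.2. Assumption: edges repeat the first/last frame.
    """
    n_frames = len(frames)
    if n_frames == 0:
        return []
    n_coeffs = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, half_window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(n_coeffs):
            acc = 0.0
            for n in range(1, half_window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    # Feeding a unit impulse exposes the filter's impulse response.
    impulse = [[0.0]] * 4 + [[1.0]] + [[0.0]] * 4
    print([round(d[0], 2) for d in delta(impulse)])
```

Applied to a unit impulse, the sketch produces taps of .2, .1, 0, -.1, -.2 at frames -2 to 2 around the impulse, the same range of values as the axes in the figure above. The delta vector is usually appended to the static features rather than replacing them.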

HMMs for Continuous Speech • Using dynamic programming for continuous speech (Vintsyuk, Bridle, Sakoe, Ney, ...) • Application of Baker-Jelinek ideas to continuous speech (IBM, BBN, Philips, ...) • Multiple groups developing major HMM systems (CMU, SRI, Lincoln, BBN, ATT) • Engineering development - coping with data, fast computers
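
The dynamic-programming idea behind HMM decoding can be sketched with a basic Viterbi search over log probabilities. This is a textbook-style illustration, not a reconstruction of any of the systems named above; the tiny two-state model in the demo is hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_init[s]     : log P(start in state s)
    log_trans[p][s] : log P(state s | previous state p)
    log_emit[t][s]  : log P(observation at frame t | state s)
    Assumption: at least one frame of observations is given.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Keep only the best-scoring predecessor for each state.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backptrs.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    # Hypothetical left-to-right model: state 0 may stay or move to state 1.
    log_init = [0.0, NEG_INF]
    log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
    log_emit = [[math.log(0.9), math.log(0.1)]] * 2 \
             + [[math.log(0.1), math.log(0.9)]] * 2
    print(viterbi(log_init, log_trans, log_emit))
```

Working in log probabilities avoids numerical underflow on long utterances, which matters once the search spans thousands of frames.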

2nd (D)ARPA Project • Common task • Frequent evaluations • Convergence to good, but similar, systems • Lots of engineering development - now up to 60,000 word recognition, in real time, on a workstation, with less than 10% word error • Competition inspired others not in project - Cambridge did HTK, now widely distributed

Knowledge vs. Ignorance • Using acoustic-phonetic knowledge in explicit rules • Ignorance represented statistically • Ignorance-based approaches (HMMs) “won”, but • Knowledge (e.g., segments) becoming statistical • Statistics incorporating knowledge

Some 1990’s Issues • Independence to long-term spectrum • Adaptation • Effects of spontaneous speech • Information retrieval/extraction with broadcast material • Query-style systems (e.g., ATIS) • Applying ASR technology to related areas (language ID, speaker verification)

Where Pierce Letter Applies • We still need science • Need language, intelligence • Acoustic robustness still poor • Perceptual research, models • Fundamentals of statistical pattern recognition for sequences • Robustness to accent, stress, rate of speech, ...

Progress in 25 Years • From digits to 60,000 words • From single speakers to many • From isolated words to continuous speech • From no products to many products, some systems actually saving LOTS of money

Real Uses • Telephone: phone company services (collect versus credit card) • Telephone: call centers for query information (e.g., stock quotes, parcel tracking) • Dictation products: continuous recognition, speaker dependent/adaptive

But: • Still <97% accurate on “yes” for telephone • Unexpected rate of speech causes doubling or tripling of error rate • Unexpected accent hurts badly • Accuracy on unrestricted speech at 60% • Don’t know when we know • Few advances in basic understanding

Confusion Matrix for Digit Recognition

Class     1    2    3    4    5    6    7    8    9    0   Error Rate (%)
1       191    0    0    5    1    0    1    0    2    0    4.5
2         0  188    2    0    0    1    3    0    0    6    6.0
3         0    3  191    0    1    0    2    0    3    0    4.5
4         8    0    0  187    4    0    1    0    0    0    6.5
5         0    0    0    0  193    0    0    0    7    0    3.5
6         0    0    0    0    1  196    0    2    0    1    2.0
7         2    2    0    2    0    1  190    0    1    2    5.0
8         0    1    0    0    1    2    2  196    0    0    2.0
9         5    0    2    0    8    0    3    0  179    3   10.5
0         1    4    0    0    0    1    1    0    1  192    4.5

Overall error rate 4.85%
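
Per-class and overall error rates follow from a confusion matrix by comparing the diagonal against the row totals. The sketch below assumes that each row holds the spoken class and each column the recognized class; the three-class counts in the demo are made up for illustration and are not taken from the slide.

```python
def per_class_error_rates(confusion):
    """Percent of each class's tokens that were recognized as another class.

    Assumption: confusion[i][j] counts tokens of true class i recognized as j.
    """
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        rates.append(100.0 * (total - row[i]) / total if total else 0.0)
    return rates


def overall_error_rate(confusion):
    """Percent of all tokens that fall off the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(row[i] for i, row in enumerate(confusion))
    return 100.0 * (total - correct) / total


if __name__ == "__main__":
    # Hypothetical 3-class counts, 20 tokens per class.
    toy = [
        [18, 1, 1],
        [0, 20, 0],
        [2, 2, 16],
    ]
    print(per_class_error_rates(toy))
    print(overall_error_rate(toy))
```

The same arithmetic applies to the digit table: in the row for "1", 9 of 200 tokens fall off the diagonal, which gives the listed 4.5%.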

Large Vocabulary CSR Error Rate [Figure: error rate (%) versus year, ’88 to ’94, with one curve for RM (1K words, PP 60) and one for WSJ0, WSJ1 (5K, 20-60K words, PP 100)]
