EE 516 Lecture 1

EE 516 Lecture 1 Geoffrey Zweig Microsoft Research 4/2/2009

Our Topics Introducing today! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Topic Coverage By Day
• Data Representations and Models (4/23)
  • Vector Quantization
  • Gaussian Mixtures
  • The EM Algorithm
• Speaker Identification (5/7)
• Language Identification (5/7)
• Hidden Markov Models (5/14)
  • Dynamic Programming
• Building a Speech Recognizer (5/14)

Language Identification – Why Do It?
• Multi-lingual society
  • Applications should be able to deal with anyone
• Businesses
  • Automated help systems
  • Reservations, account access, etc.
• Travel
  • Airport kiosks
  • Train stations
• Government
  • Funds research to identify languages
  • Runs language identification evaluations

How Do You Do It?
[Diagram: one acoustic model per language (English Acoustic Model, French Acoustic Model, …, Tamil Acoustic Model); output the likeliest.]
Gaussian Mixture Models - 4/23
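
The one-model-per-language scheme on this slide can be sketched in a few lines: score the utterance's feature frames under each language's Gaussian mixture model and output the likeliest language. This is only a minimal sketch under stated assumptions. The diagonal covariances, the two-dimensional features, the function names, and the hand-picked toy parameters in `MODELS` are all hypothetical; a real system would train the mixtures on speech features.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a list of frames under one GMM.

    gmm is a list of (weight, mean, var) components; frames are treated
    as independent, so per-frame log-likelihoods are summed.
    """
    total = 0.0
    for x in frames:
        # log-sum-exp over mixture components for numerical stability
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total

def identify_language(frames, models):
    """Return (likeliest language, per-language scores)."""
    scores = {lang: gmm_log_likelihood(frames, g) for lang, g in models.items()}
    return max(scores, key=scores.get), scores

# Toy models with hand-picked (hypothetical) parameters, not trained on speech.
MODELS = {
    "English": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "French": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
    "Tamil": [(0.5, [-4.0, 4.0], [1.0, 1.0]), (0.5, [-5.0, 5.0], [1.0, 1.0])],
}

utterance = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]
best_language, language_scores = identify_language(utterance, MODELS)
print(best_language)
```

Working in the log domain keeps long utterances from underflowing, since the likelihood of many frames is a product of small numbers.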

How Do You Do It? (2)
• “p ih n s” – probably English…
• “k r p s t” – probably Czech…
Simple HMMs – 5/14
Language Models – 4/30
After Zissman 1996
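
The intuition here is that phone sequences such as "p ih n s" are typical of one language and rare in another. A hedged sketch of that idea: train a phone bigram language model per language and pick the language whose model gives the recognized phone string the highest probability. The add-one smoothing, the function names, and the tiny made-up training strings below are assumptions for illustration only; the phone recognizer that would produce the phone string is not shown.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Train an add-one smoothed phone bigram model; return a scoring function."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    # Any phone in the vocabulary, or the end marker, may follow a context.
    num_outcomes = len(vocab) + 1

    def logprob(seq):
        padded = ["<s>"] + seq + ["</s>"]
        return sum(
            math.log((pair_counts[(p, c)] + 1) / (context_counts[p] + num_outcomes))
            for p, c in zip(padded, padded[1:])
        )

    return logprob

# Tiny made-up phone strings per language (hypothetical training data).
TRAINING = {
    "English": ["p ih n s", "s p ih n", "t ih p s", "n ih p"],
    "Czech": ["k r p s t", "s t r k", "p r s t", "k r k"],
}
VOCAB = sorted({ph for strings in TRAINING.values() for s in strings for ph in s.split()})
LANGUAGE_MODELS = {
    lang: train_bigram([s.split() for s in strings], VOCAB)
    for lang, strings in TRAINING.items()
}

def identify(phone_string):
    """Return the language whose bigram model scores the phone string highest."""
    phones = phone_string.split()
    return max(LANGUAGE_MODELS, key=lambda lang: LANGUAGE_MODELS[lang](phones))

print(identify("p ih n s"), identify("k r p s t"))
```

Smoothing matters even in a toy like this: without it, a single unseen phone pair would give a language zero probability.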

How Do You Do It? (3)
Same methods multiple times
Acero et al., Chapter 4 – 4/23
After Zissman 1996

How Do You Do It? (4)
Run a complete speech recognizer in each language
And we will see several other ways, and combinations!
After Zissman 1996

Gauging Progress – The NIST Evaluations
• National Institute of Standards and Technology
• Has sponsored benchmark tests in multiple language processing areas for over a decade
  • Topic Detection & Tracking
  • Content Extraction
  • Video Analysis
  • Speech Recognition
  • Language Identification
  • Speaker Identification
  • Machine Translation
• http://www.itl.nist.gov/iad/mig/tests/
• Coordination with site funding by Defense Advanced Research Projects Agency (DARPA)
• Along with business interest, the driving force in advancing the state of the art

For Example, Progress in Speech Recognition

Language Identification - How Well Can It Be Done – Who Salutes? From NIST 2007 LRE Website

How Well Can it Be Done – What Languages? From NIST 2007 LRE Website

How Well Can It Be Done? – Testing Conditions
• 26 languages and dialects
• Telephone speech
• Multiple duration conditions
  • 3, 10, 30 seconds
• Detection Error Tradeoff (DET) curves used to measure performance
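
A DET curve plots the miss rate against the false-alarm rate as the decision threshold is swept, usually on normal-deviate axes. As a rough sketch of how the underlying points could be computed from detection scores (the function names and the toy scores are hypothetical, and plotting is omitted):

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, p_miss, p_false_alarm) at each candidate threshold.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_miss, p_fa))
    return points

def approx_equal_error_rate(points):
    """Approximate the equal error rate from the point where the two rates are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2.0

# Toy detection scores (hypothetical): higher means "more likely the target language".
TARGET_SCORES = [2.0, 1.5, 0.9, 0.3]
NONTARGET_SCORES = [1.0, 0.2, -0.5, -1.2]

POINTS = det_points(TARGET_SCORES, NONTARGET_SCORES)
print(approx_equal_error_rate(POINTS))
```

Raising the threshold trades false alarms for misses, which is the tradeoff the curve makes visible; the equal error rate is one common single-number summary of it.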

How Well Can it Be Done – Some Numbers From NIST 2007 LRE Website

Language Identification Project
• Build a language ID system with the CallFriend data set
• Implement several of the main techniques
• Set up a demo on your laptop that will recognize someone’s language

Flavors of Speaker Recognition Our Focus! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Speaker Recognition – Why Do It?
• Personal Applications
  • Voice-print passwords
  • Voicemail transcription – who left that message?
• Business Applications
  • Calling your bank
• Government
  • Is that Osama calling from Pakistan?
  • Prison call monitoring
  • Automated parolee calling – is he where you think?

How Do You Do It?
The most basic approach: Gaussian Mixture Models - 4/23
More recently: Support vector machines operating on GMMs (!)
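
For verification, one common way to use GMMs is a likelihood-ratio test: compare how well the claimed speaker's model explains the speech against a general background model, and accept if the ratio is high enough. The slide does not spell this out, so treat the sketch below as an assumption about one possible setup. The background model, the scalar features, the zero threshold, and all parameter values are hypothetical.

```python
import math

def gmm_logpdf(x, gmm):
    """Log density of scalar x under a 1-D GMM given as (weight, mean, var) tuples."""
    comps = [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in gmm
    ]
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))

def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio exceeds threshold.

    Returns (accepted, average log-likelihood ratio per frame).
    """
    llr = sum(
        gmm_logpdf(x, speaker_gmm) - gmm_logpdf(x, background_gmm) for x in frames
    ) / len(frames)
    return llr > threshold, llr

# Hand-picked toy models (hypothetical, not trained on real speech).
SPEAKER_GMM = [(0.6, 1.0, 0.25), (0.4, 2.0, 0.25)]
BACKGROUND_GMM = [(0.5, 0.0, 4.0), (0.5, 3.0, 4.0)]

accepted, score = verify([1.1, 1.8, 0.9, 2.1], SPEAKER_GMM, BACKGROUND_GMM)
print(accepted)
```

Averaging over frames keeps the score comparable across utterances of different lengths, which matters when a single threshold is used for all trials.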

How Do You Do It? (2) Also use high-level information! From JHU 2002 SuperSID Final Presentation – Reynolds et al.

How Well Can It Be Done – Who Salutes? From NIST 2008 SRE Presentation, Martin & Greenberg

More Salutes From NIST 2008 SRE Presentation, Martin & Greenberg

From Europe From NIST 2008 SRE Presentation, Martin & Greenberg

More From Europe From NIST 2008 SRE Presentation, Martin & Greenberg

U.S. Entries From NIST 2008 SRE Presentation, Martin & Greenberg

How Well Can It Be Done – Testing Conditions
• Conditions for different amounts of data
  • 10 sec.
  • 3-5 minutes
  • 8 minutes
• Separate channel and summed channel conditions
• English speakers, non-English speakers, multilingual speakers

How Well Can It Be Done?

Speaker Verification Project
• Implement a Speaker-ID system
  • Template based
  • GMM based
  • SVM based
  • Vector space model
• Demonstrate it:
  • NIST data, e.g. 2001 Evaluation
  • Your own voice – implement on laptop

Speech Recognition Project
• Implement an HMM-based recognition system
• Use, e.g., the Phonebook isolated-word data set or the Aurora digit set
• Write features with existing front-end
• Build your own HMM trainer/decoder
• Set it up on your laptop for online word recognition (?!)
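
To give a feel for the decoder half of this project, here is a minimal sketch of isolated-word recognition with Viterbi decoding: each word has its own HMM, and the recognizer picks the word whose best state path scores highest. It is an illustration under simplifying assumptions, not the course's reference design. Observations are discrete symbols rather than real front-end features, the two toy word models are hand-set rather than trained, and all names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def viterbi(obs, start, trans, emit):
    """Return (best path log-probability, best state path) for a discrete HMM."""
    n = len(start)
    score = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + safe_log(trans[r][s]))
            score.append(prev[best] + safe_log(trans[best][s]) + safe_log(emit[s][o]))
            ptr.append(best)
        back.append(ptr)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):  # trace back-pointers from the final state
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

# Two hand-set left-to-right word models over symbols {0, 1} (hypothetical).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
WORD_MODELS = {
    "A": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT, "emit": [[0.9, 0.1], [0.2, 0.8]]},
    "B": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT, "emit": [[0.1, 0.9], [0.8, 0.2]]},
}

def recognize(obs, word_models):
    """Return the word whose HMM gives the highest Viterbi score for obs."""
    return max(
        word_models,
        key=lambda w: viterbi(obs, word_models[w]["start"],
                              word_models[w]["trans"], word_models[w]["emit"])[0],
    )

print(recognize([0, 0, 1, 1], WORD_MODELS))
```

The same dynamic-programming recursion carries over to continuous features once the emission table is replaced by a density such as a Gaussian mixture.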

Highlights of Syllabus
• Required Texts:
  • Huang, Acero, Hon: Spoken Language Processing
  • Deng and O’Shaughnessy: Speech Processing
  • EE516 Reader, at Professional Copy ‘n Print, 4200 University Way
• Grading:
  • Projects: 50%
  • Final Exam: 30%
  • Homework: 20%
• Projects:
  • Small team or individual
  • Teams are self-forming
  • Presentation times TBD
  • Read ahead & pick an area!!!
  • Talk to relevant instructor
  • Suggest deciding no later than 4/30
• Office Hours at end of class and by appointment
• Please sign in on email list!
