EE 225D Audio Signal Processing in Humans and Machines Prof. N. Morgan and friends MW 4:00-5:30 http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html
Textbook: Speech and Audio Signal Processing, Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011 • Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor
Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!
What should we be able to do (automatically)? • Human example suggests: plenty • What was said • Who said it • When they said it • What it meant • How to respond
Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone
Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning
Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation
Section I: Broad background • Synthesis/vocoding history (chaps 2&3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)
Section II: Scientific background • Pattern classification (chaps 8 and 9) • Ear physiology (chap 14) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23)
Section IIIa: Engineering Apps • Signal processing “front end” (chaps 19-22) • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)
Section IIIb: Engineering Apps • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25,26) • Discriminant methods and adaptation (chaps 27,28)
Section IIIc: Engineering Apps • Speech synthesis (chap 30) • Spoken dialog systems (chap 29++) • Speaker verification (chap 41) • Speaker diarization (chap 42)
Course grading • Quizzes/assignments (for first half): 30% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 40%
Course location • After today, 6th floor ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15-minute walk from Cory) • Office hour: one hour before each class