EE 225D: Audio Signal Processing in Humans and Machines
Prof. N. Morgan and friends
MW 4:00-5:30
http://www.icsi.berkeley.edu/eecs225d/spr14/overview.html
http://www.icsi.berkeley.edu/eecs225d/spr14/slides/
Textbook
• Speech and Audio Signal Processing, Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011
Prerequisites
• EE123 or equivalent, and Stat 200A or equivalent; or graduate standing and consent of instructor
Speech and audio signal processing: why does this material matter?
• Speech without visuals vs. visuals without speech
• Requires both DSP and machine learning
• Multidisciplinary tasks are good training
• Many applications!
What should we be able to do (automatically)?
• The human example suggests plenty:
  • What was said
  • Who said it
  • When they said it
  • What it meant
  • How to respond
Why is it hard?
• Speaker variability (within and between speakers)
• Noise, reverberation, and channel effects
• Confusable vocabulary
• Meaning and tone
Course Philosophy I
• People can do these tasks effortlessly
• So the course includes psychoacoustics and physiology
• Also some acoustics
• But of course, also DSP and machine learning
Course Philosophy II
• The first part of the course covers the basics
• The rest covers applications
• Much of the course grade (60%) is based on an original project
• Some practice in oral presentation
• In the middle of the course, students present the material (slides from previous classes can help)
Section I: Broad background
• Synthesis/vocoding history (chaps 2 and 3)
• Recognition history (chap 4)
• Machine recognition basics (chap 5)
• Human recognition basics (chap 18)
Section II: Scientific background
• Pattern classification (chaps 8 and 9)
• Acoustics (chaps 10 and 13)
• Linguistic sound categories (chap 23)
• (Auditory neurophysiology late in the course)
Section IIIa: Engineering Apps: Speech recognition
• Signal processing “front end” (chaps 19-22)
• Deterministic sequence recognition (chap 24)
• Statistical modeling and inference (chaps 25 and 26)
• Discriminant methods and adaptation (chaps 27 and 28)
• Speech recognition and understanding (chap 29)
Section IIIb: Engineering Apps: Other speech applications
• Speech synthesis (chap 30)
• Speaker verification (chap 41)
Section IIIc: Engineering Apps: Other audio applications
• Perceptual audio coding (chap 35)
• Music signal analysis (chap 37)
• Source separation (chap 39)
Section IV: Hearing [presented by Prof. Oded Ghitza, Boston University]
• Auditory physiology (chap 14)
• Psychoacoustics (chaps 15 and 16)
Section V: Student Projects
• Project proposal: by spring break, iterate on the proposed project
• In the last week of class, students present their projects in a format modeled on ICASSP or Interspeech conference presentations
• During finals week, submit a written version of the project and schedule demos
• Any topic in speech, music, or general audio is potentially OK, including a tutorial or original research
Course grading
• Quizzes/homeworks (first half of the course): 20%
• Student presentations/participation: 20%
• Project proposal: 10%
• Project oral presentation: 20%
• Project write-up & results: 30%
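The grading weights sum to 100%, and the three project components together account for 60% of the grade (10% + 20% + 30%). As an illustrative sketch only, assuming each component is scored on a 0-100 scale (the slides do not say how components are scored), the final score is a weighted average of the components. The component names below are hypothetical labels, not official course categories.

```python
# Illustrative sketch: weights taken from the grading slide; component names
# are hypothetical labels, and scores are assumed to be on a 0-100 scale.
WEIGHTS: dict[str, float] = {
    "quizzes_homeworks": 0.20,
    "presentations_participation": 0.20,
    "project_proposal": 0.10,
    "project_oral_presentation": 0.20,
    "project_writeup_results": 0.30,
}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted average of per-component scores (each 0-100)."""
    # Refuse to compute a grade if any weighted component is missing.
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum; because the weights sum to 1, the result stays on 0-100.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Share of the grade that comes from the project components alone.
PROJECT_SHARE = sum(w for name, w in WEIGHTS.items() if name.startswith("project"))
```

For example, scoring 80 on every component yields a final score of 80 (up to floating-point rounding), since the weights sum to 1.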
Course location
• After today, class meets on the 6th floor of ICSI
• 1947 Center Street, between Milvia and MLK
• Class will start at 4:15 instead of 4:10 (it is a 15-minute walk from Cory)
• Office hour: one hour before each class