EE 225D

EE 225D: Audio Signal Processing in Humans and Machines • Prof. N. Morgan and friends • MW 4:00-5:30 • http://www.icsi.berkeley.edu/eecs225d/spr14/overview.html • http://www.icsi.berkeley.edu/eecs225d/spr14/slides/

Textbook: Speech and Audio Signal Processing, by Gold, Morgan, and Ellis. Wiley & Sons, 2nd edition, 2011

Prerequisites: EE 123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!

What should we be able to do (automatically)? • The human example suggests: plenty • What was said • Who said it • When they said it • What it meant • How to respond

Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone

Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning

Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation • Middle of the course has students presenting the material (slides from previous classes can help)

Section I: Broad background • Synthesis/vocoding history (chaps 2 and 3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)

Section II: Scientific background • Pattern classification (chaps 8 and 9) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23) • (Auditory neurophysiology late in the course)

Section IIIa: Engineering Apps: Speech recognition • Signal processing “front end” (chaps 19-22) • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25, 26) • Discriminant methods and adaptation (chaps 27, 28) • Speech recognition and understanding (chap 29)

Section IIIb: Engineering Apps: Other speech applications • Speech synthesis (chap 30) • Speaker verification (chap 41)

Section IIIc: Engineering Apps: Other audio applications • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)

Section IV: Hearing [presented by Prof. Oded Ghitza, Boston University] • Auditory physiology (chap 14) • Psychoacoustics (chaps 15, 16)

Section V: Student Projects • Project proposal: By spring break, iterate on proposed project • Last week of class, students present their projects, modeled after ICASSP or Interspeech • Finals week, submit written version of project, schedule demos • Any topic in speech/music/general audio potentially OK, including tutorial or original research

Course grading • Quizzes/homeworks (for first half): 20% • Student presentations/participation: 20% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 30%

Course location • After today, 6th floor ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15 minute walk from Cory) • Office hour, one hour before each class
