EE 225D

EE 225D Audio Signal Processing in Humans and Machines Prof. N. Morgan and friends MW 4:00-5:30 http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html

Textbook: Speech and Audio Signal Processing, Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011 • Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!

What should we be able to do (automatically)? • The human example suggests: plenty • What was said • Who said it • When they said it • What it meant • How to respond

Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone

Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning

Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation

Section I: Broad background • Synthesis/vocoding history (chaps 2&3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)

Section II: Scientific background • Pattern classification (chaps 8 and 9) • Ear physiology (chap 14) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23)

Section IIIa: Engineering Apps • Signal processing “front end” (chaps 19-22) • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)

Section IIIb: Engineering Apps • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25 and 26) • Discriminant methods and adaptation (chaps 27 and 28)

Section IIIc: Engineering Apps • Speech synthesis (chap 30) • Spoken dialog systems (chap 29++) • Speaker verification (chap 41) • Speaker diarization (chap 42)

Course grading • Quizzes/assignments (for first half): 30% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 40%

Course location • After today, 6th floor ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15 minute walk from Cory) • Office hour, one hour before each class

