
EE 225D

EE 225D: Audio Signal Processing in Humans and Machines. Prof. N. Morgan and friends. MW 4:00-5:30. http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html. Textbook: Speech and Audio Signal Processing, by Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011.


Presentation Transcript


  1. EE 225D Audio Signal Processing in Humans and Machines Prof. N. Morgan and friends MW 4:00-5:30 http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html

  2. Textbook and prerequisites • Textbook: Speech and Audio Signal Processing, by Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011 • Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

  3. Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!

  4. What should we be able to do (automatically)? • The human example suggests: plenty • What was said • Who said it • When they said it • What it meant • How to respond

  5. Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone

  6. Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning

  7. Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation

  8. Section I: Broad background • Synthesis/vocoding history (chaps 2&3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)

  9. Section II: Scientific background • Pattern classification (chaps 8 and 9) • Ear physiology (chap 14) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23)

  10. Section IIIa: Engineering Apps • Signal processing “front end” (chaps 19-22) • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)

  11. Section IIIb: Engineering Apps • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25,26) • Discriminant methods and adaptation (chaps 27,28)

  12. Section IIIc: Engineering Apps • Speech synthesis (chap 30) • Spoken dialog systems (chap 29++) • Speaker verification (chap 41) • Speaker diarization (chap 42)

  13. Course grading • Quizzes/assignments (for first half): 30% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 40%
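The four weights on the grading slide sum to 100%. As a rough illustration only, here is a minimal Python sketch of how they might combine into one total, assuming each component is scored on a 0-100 scale and the total is a plain weighted sum. The slide does not say how the components are actually combined, and the names `WEIGHTS` and `course_grade` and the dictionary keys are hypothetical labels, not anything from the course.

```python
# Weights come from the grading slide; the key names are hypothetical labels.
WEIGHTS = {
    "quizzes_assignments": 0.30,  # quizzes/assignments (first half)
    "project_proposal": 0.10,
    "project_oral": 0.20,
    "project_writeup": 0.40,      # write-up and results
}


def course_grade(scores):
    """Return the weighted total (0-100) for per-component scores (0-100).

    Assumes a simple linear combination, which the slide does not confirm.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under this assumption, the project components together carry 70% of the total, so the write-up alone outweighs all quizzes and assignments.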

  14. Course location • After today, 6th floor ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15 minute walk from Cory) • Office hour, one hour before each class

