
EE 225D

EE 225D. Audio Signal Processing in Humans and Machines Prof. N. Morgan and friends MW 4:00-5:30 http://www.icsi.berkeley.edu/eecs225d/spr14/overview.html http://www.icsi.berkeley.edu/eecs225d/spr14/slides/. Textbook. Speech and Audio Signal Processing Gold, Morgan, and Ellis


Presentation Transcript


  1. EE 225D: Audio Signal Processing in Humans and Machines • Prof. N. Morgan and friends • MW 4:00-5:30 • http://www.icsi.berkeley.edu/eecs225d/spr14/overview.html • http://www.icsi.berkeley.edu/eecs225d/spr14/slides/

  2. Textbook • Speech and Audio Signal Processing • Gold, Morgan, and Ellis • Wiley & Sons, 2nd edition, 2011

  3. Prerequisites • EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

  4. Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!

  5. What should we be able to do (automatically)? • The human example suggests: plenty • What was said • Who said it • When they said it • What it meant • How to respond

  6. Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone

  7. Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning

  8. Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation • Middle of the course has students presenting the material (slides from previous classes can help)

  9. Section I: Broad background • Synthesis/vocoding history (chaps 2&3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)

  10. Section II: Scientific background • Pattern classification (chaps 8 and 9) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23) • (Auditory neurophysiology late in the course)

  11. Section IIIa: Engineering Apps: Speech recognition • Signal processing “front end” (chaps 19-22) • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25, 26) • Discriminant methods and adaptation (chaps 27, 28) • Speech recognition and understanding (chap 29)

  12. Section IIIb: Engineering Apps: Other speech applications • Speech synthesis (chap 30) • Speaker verification (chap 41)

  13. Section IIIc: Engineering Apps: Other audio applications • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)

  14. Section IV: Hearing [presented by Prof. Oded Ghitza, Boston University] • Auditory physiology (chap 14) • Psychoacoustics (chaps 15, 16)

  15. Section V: Student Projects • Project proposal: By spring break, iterate on proposed project • Last week of class, students present their projects, modeled after ICASSP or Interspeech • Finals week, submit written version of project, schedule demos • Any topic in speech/music/general audio potentially OK, including tutorial or original research

  16. Course grading • Quizzes/homeworks (for first half): 20% • Student presentations/participation: 20% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 30%

  17. Course location • After today, 6th floor ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15-minute walk from Cory) • Office hour: one hour before each class

