[Advanced] Speech & Audio Signal Processing
ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS
02 February 2006
State of the Art in Speech/Audio
• Speech and audio processing may be divided into “low-level” and “high-level” inference
• Speech enhancement, compression, and coding are all widely used technologies
  • This low-level work is the most mature
• High-level tasks will drive future advances
  • Speech/music database information retrieval
  • Automatic speaker and speech recognition
• But low-level issues also remain…
Fundamental Questions
• How to obtain highly structured representations of speech and audio signals?
  • Time-frequency “atoms” as building blocks
• How can statistical inference enable advances in speech signal processing?
  • A means to obtain an “atomic decomposition”
  • Statistical modeling of time-frequency coefficients provides a principled solution
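To make the idea of a time-frequency “atom” concrete, here is a minimal sketch that assumes Gabor atoms (Gaussian-windowed complex sinusoids). The slides do not say which atom family is meant, so the atom shape, the function names `gabor_atom` and `coefficient`, and the toy signal are all illustrative assumptions. Each coefficient is the inner product of the signal with one atom; it is large only when the atom matches the signal in both time and frequency.

```python
import cmath
import math


def gabor_atom(n_samples, center, freq, width):
    """Unit-norm Gaussian-windowed complex sinusoid (an assumed atom shape).

    center and width are in samples; freq is in cycles per sample.
    """
    atom = [
        math.exp(-0.5 * ((n - center) / width) ** 2)
        * cmath.exp(2j * math.pi * freq * n)
        for n in range(n_samples)
    ]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]


def coefficient(signal, atom):
    """Inner product <signal, atom>: one time-frequency coefficient."""
    return sum(s * a.conjugate() for s, a in zip(signal, atom))


# Toy signal: a short tone burst at 0.1 cycles/sample, centered at sample 128.
N = 256
signal = [
    math.exp(-0.5 * ((n - 128) / 20) ** 2) * math.cos(2 * math.pi * 0.1 * n)
    for n in range(N)
]

# Compare an atom matched to the burst with atoms shifted in frequency or time.
matched = abs(coefficient(signal, gabor_atom(N, 128, 0.1, 20)))
off_freq = abs(coefficient(signal, gabor_atom(N, 128, 0.3, 20)))
off_time = abs(coefficient(signal, gabor_atom(N, 30, 0.1, 20)))

print(matched, off_freq, off_time)
```

The matched atom yields a much larger coefficient than the mismatched ones. An atomic decomposition would look for a small set of atoms that together explain the signal, and the statistical modeling mentioned on the slide would act on coefficients like these.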
Representative Applications
• Missing data in the context of VoIP
  • Audio demos: original, missing, restored
• Source / Speaker Separation
  • Audio demos: source 1, source 2, mixture 1, mixture 2, recovery 1, recovery 2
Time-Scale Modification
• Speech and Quasi-Periodic Audio
• Sinewave-based Modification
• Voicing-dependent Rate Factor
• Audio demos
  • Male & female speaker: original, fast, faster, slower
  • Trumpet: original, fast, slow
More Time-Scale Modification
• Complex Non-Speech Signals
• Phase-Vocoder-based Modification
• Event-Dependent Phase Coherence
• Audio demos
  • Falling can, bongo drums, loon: original, slow
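The core step of a textbook phase vocoder can be sketched as follows. This is only the standard per-bin phase-propagation rule, not the event-dependent phase-coherence scheme named on the slide, whose details are not given here. The function names and the parameter values (`n_fft`, `hop_a`, `hop_s`) are assumptions chosen for illustration. For each frequency bin, the phase change between two analysis frames gives an estimate of the instantaneous frequency, and the synthesis phase is advanced by that frequency over the (different) synthesis hop.

```python
import math


def princarg(phase):
    """Wrap a phase value into the principal range around zero."""
    return phase - 2 * math.pi * round(phase / (2 * math.pi))


def advance_phase(prev_phase, cur_phase, prev_synth_phase,
                  bin_index, n_fft, hop_a, hop_s):
    """Propagate the phase of one STFT bin from analysis to synthesis.

    hop_a is the analysis hop and hop_s the synthesis hop, both in samples.
    Returns (new synthesis phase, estimated frequency in rad/sample).
    """
    omega = 2 * math.pi * bin_index / n_fft      # bin center frequency
    expected = omega * hop_a                     # phase advance if exactly on-bin
    deviation = princarg(cur_phase - prev_phase - expected)
    inst_freq = omega + deviation / hop_a        # instantaneous frequency
    return prev_synth_phase + inst_freq * hop_s, inst_freq


# Toy check: a sinusoid lying between bins 10 and 11 of a 256-point FFT.
n_fft, hop_a, hop_s = 256, 64, 128               # hop_s / hop_a = 2: twice as slow
true_freq = 2 * math.pi * 10.3 / n_fft
prev_phase = 0.4
cur_phase = princarg(prev_phase + true_freq * hop_a)

new_phase, est_freq = advance_phase(
    prev_phase, cur_phase, prev_phase, 10, n_fft, hop_a, hop_s
)
print(est_freq, true_freq)
```

The estimate recovers the true frequency because the deviation from the bin center, accumulated over one analysis hop, stays within half a cycle. A full time-scale modifier would apply this to every bin of every frame and then overlap-add the inverse transforms at the synthesis hop.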
Pitch and Vocal Tract Change
• Sinewave-based Modification
• Audio demos
  • Male & female speaker: original, low pitch/long vocal tract, high pitch/short vocal tract
  • Male speaker: original, monotone
Speech Coding
• Sinewave-based
• Code-Excited Linear Prediction
• Audio demos
  • Female speaker: original, CELP 8000 bps, sine 4800 bps, sine 2400 bps
  • Male speaker: original, CELP 8000 bps, sine 4800 bps, sine 2400 bps
Noise Reduction
• Adaptive Wiener Filter
• Adaptation Based on Spectral Change
• Audio demos
  • Cell phone noise, cocktail party, automobile noise: original, enhanced
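As a point of reference for the Wiener filter bullet, the sketch below computes a basic per-bin Wiener gain from power spectra. It assumes the noise power is already known and uses simple power subtraction to estimate the clean-speech power. The adaptation rule based on spectral change is not specified on the slide and is not modeled here; the function name `wiener_gain` and the `floor` parameter are illustrative assumptions.

```python
def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Per-bin Wiener gain S / (S + N) from power spectra.

    noisy_power and noise_power are sequences of non-negative values, one per
    frequency bin. The clean power S is estimated by subtraction and clipped
    at zero. floor (an assumed parameter) limits the maximum attenuation.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)                # estimated clean-speech power
        total = p_s + p_n
        gain = p_s / total if total > 0 else 0.0
        gains.append(max(gain, floor))
    return gains


# Toy spectra: one bin dominated by speech, two bins at or below the noise level.
noisy = [10.0, 1.0, 0.5]
noise = [1.0, 1.0, 1.0]
gains = wiener_gain(noisy, noise)
print(gains)
```

Bins with high estimated signal-to-noise ratio pass nearly unchanged, while noise-dominated bins are attenuated down to the floor. An adaptive version would update the gain or its smoothing over time, for example in response to how quickly the spectrum is changing.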
Compression
• Reduction of Peak-to-RMS amplitude ratio
• Based on Sinewave Analysis/Synthesis
• Audio demos
  • Low-noise case: original, 1.5 dB reduction, 3.0 dB reduction
  • High-noise case: original, 1.5 dB reduction, 3.0 dB reduction
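The peak-to-RMS ratio (crest factor) can be computed directly, and a small experiment shows why a sinewave representation is a natural place to reduce it. The sketch below sums the same set of equal-amplitude harmonics twice: once with all phases aligned, and once with quadratic (Schroeder-style) phases. The phase rule is an assumption chosen for illustration, not the method behind the demos, and the names `crest_factor_db` and `harmonic_sum` are hypothetical.

```python
import math


def crest_factor_db(x):
    """Peak-to-RMS amplitude ratio of a sequence, in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(peak / rms)


def harmonic_sum(n_harmonics, n_samples, phases):
    """One period of a sum of unit-amplitude harmonics with the given phases."""
    return [
        sum(
            math.cos(2 * math.pi * k * n / n_samples + phases[k - 1])
            for k in range(1, n_harmonics + 1)
        )
        for n in range(n_samples)
    ]


K, N = 16, 1024

# All phases zero: every harmonic peaks at the same instant.
aligned = crest_factor_db(harmonic_sum(K, N, [0.0] * K))

# Quadratic phases (assumed rule) spread the energy across the period.
schroeder = [-math.pi * k * (k - 1) / K for k in range(1, K + 1)]
dispersed = crest_factor_db(harmonic_sum(K, N, schroeder))

print(aligned, dispersed)
```

Both signals have identical harmonic amplitudes, so their magnitude spectra match, but the dispersed version has a much lower peak for the same RMS level. Changing only the sinewave phases is therefore enough to lower the peak-to-RMS ratio, which leaves room to raise the overall level without clipping.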