
Christophe d’Alessandro LIMSI-CNRS Orsay, France




  1. New paradigms for speech analysis and processing: the source-filter model revisited and gesture-controlled analysis-by-synthesis. Christophe d’Alessandro, LIMSI-CNRS, Orsay, France. Speech Analysis and Processing for Knowledge Discovery

  2. Acknowledgements. Contributions of Boris Doval, Nathalie Henrich, Baris Bozkurt, Thierry Dutoit, Nicolas Sturmel, Albert Rilliard and Sylvain Le Beux are gratefully acknowledged.

  3. Voice, Speech, Singing, Meaning and Expression • Functions of voice in communication: • Linguistic and pragmatic functions: to convey linguistic meaning (ideas, concepts, facts …), to perform speech acts (command, promise …). Mainly associated with phonemes and words (double articulation). Can be notated in writing. • Expressive function: to make audible attitudes, feelings, emotions, personality, mood. Speech beyond (or below) linguistic meaning. Mainly associated with prosody and voice quality. Difficult to notate in writing. “The music of speech”. • Musical function: singing, non-linguistic but highly structured communication

  4. Challenges in speech processing • Ubiquitous speech processing, human-machine dialogue • Expressive speech synthesis (speaking machines vs. reading machines) • Recognition of emotion, attitudes, moods, aging: robustness in Automatic Speech Recognition • Speaker-dependent features: speaker identification, voice aging • Voice pathology and diagnosis

  5. Static and dynamic features in speech. Knowledge discovery in speech analysis and processing is based on both static and dynamic features of speech signals. Static features correspond to parameters of a model, or “settings”. Dynamic features correspond to parameter trajectories, or “gestures”.

  6. Content of the talk • Introduction • Voice source, voice quality and parameter estimation • Real-time instruments for synthesis of voice quality and real-time analysis/modification/synthesis of prosody

  7. Emergence of voice quality in speech studies. Until recently, voice quality and its functions in speech communication have been only marginally considered in the speech communication community. However, there is some evidence that voice quality settings and voice quality modulations play a central role in human voice-based communication, i.e. speech, singing and other kinds of expressive vocalizations.

  8. Voice source analysis • Voice source analysis is an important but difficult issue for speech processing: • Source/tract decomposition (formant estimation, front-end for speech recognition, low-rate coding, etc.) • Voice source parameter estimation (prosodic analysis, diagnosis, voice quality, speaker characterisation, singing …) • Pitch period marking (speech synthesis, prosodic analysis, pitch-synchronous processing …)

  9. Voice source analysis • Voice source analysis is an important but difficult issue for speech processing: • No reference for the “true” source and vocal tract components • Rapidly time-varying signals • Wide individual and inter-subject variations • Source/tract interactions

  10. The source-filter model revisited • Two aspects of voice source analysis recently developed at LIMSI (Orsay) and FPMs (Mons) are discussed. • Causal-Anticausal Linear Model (CALM) of glottal flow signals (Doval, d’Alessandro, Henrich, 2003, 2006) • Zeros of the Z-Transform (ZZT) speech representation (Bozkurt, Doval, d’Alessandro, Dutoit, Sturmel, 2005, 2006, 2008)

  11. The Spectrum of Glottal Flow Models: Causal-Anticausal Linear Model (CALM) (Doval, d’Alessandro, Henrich, Acta Acustica united with Acustica 2006; ISCA-ITRW Voqual’03)

  12. Voice source models • Signal models of glottal waveforms: a few acoustic parameters describing the shape of one cycle of the glottal flow • Physical models of speech production: more than 19 physical parameters governing the behavior of the glottis [Figure: 2-mass model (IF 1972), with labels k1, r1, k2, r2, kc, d1, d2, lg, Psubglottal]

  13. Glottal flow models • KLGLOTT88 (Klatt & Klatt, JASA 1990) • Rosenberg C (Rosenberg, JASA 1971) • LF model (Liljencrants, Fant, Lin, KTH-STL, 1985)

  14. Glottal flow models: time domain. Examples: • Rosenberg C (Rosenberg, 1971) • LF (Liljencrants & Fant, 1985) • Klatt (Klatt & Klatt, 1990) • R++ (Veldhuis, 1998)

  15. A unified set: 5 time-domain parameters (Doval, d’Alessandro & Henrich, Acta Acustica 2006) • T0, fundamental period • Av, voiced amplitude • Oq, open quotient • αm, asymmetry coefficient (equivalent to speed quotient) • Qa, return phase quotient. Other parameters of interest: • J, total flow of a single pulse • E, negative peak amplitude of the glottal flow derivative
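The remark that the asymmetry coefficient is "equivalent to speed quotient" can be made explicit with a short worked relation. This is a sketch under stated assumptions: T_p (duration of the opening phase) and S_q (speed quotient, opening time over closing time) are symbols introduced here for illustration, and abrupt closure (Qa = 0) is assumed so that the open phase lasts O_q T_0.

```latex
% Assumed definitions (not stated on the slide):
%   T_p     : duration of the opening phase (flow rising to its maximum)
%   O_q T_0 : duration of the open phase, for abrupt closure (Q_a = 0)
\alpha_m = \frac{T_p}{O_q T_0},
\qquad
S_q = \frac{T_p}{O_q T_0 - T_p} = \frac{\alpha_m}{1 - \alpha_m},
\qquad
\alpha_m = \frac{S_q}{1 + S_q}.
% Example: \alpha_m = 2/3 gives S_q = 2 (opening phase twice as long as closing phase).
```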

  16. Voice quality dimensions. Four main dimensions: • Voice registers (voice “mechanisms”): creak, modal, falsetto, whistle • Noise: breathiness, hoarseness • Pressure: pressed/lax voice, “strangled” tones • Effort: accentuation, force

  17. Time-domain equations. In the case of Qa = 0 (abrupt closure), the glottal flow models (GFMs) can all be expressed in terms of a normalized glottal flow model ng(x, αm), which depends on the model. [Equation image not preserved in the transcript.]
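As an illustration of the normalized-model idea, here is a minimal Python sketch. It assumes the scaled form U_g(t) = Av · ng(t / (Oq · T0)) on the open phase, which is a reading of the slide rather than a formula preserved in it, and it uses a KLGLOTT88-style cubic pulse ng(x) = (27/4) x² (1 − x), normalized to peak at 1, as one possible choice of ng. The function names and the numeric values are hypothetical.

```python
import numpy as np

def normalized_pulse(x):
    """Cubic normalized pulse n_g(x) = (27/4) x^2 (1 - x) on [0, 1], zero elsewhere.

    Assumption: a KLGLOTT88-style shape; its asymmetry is fixed (peak at x = 2/3).
    """
    x = np.asarray(x, dtype=float)
    in_open_phase = (x >= 0.0) & (x <= 1.0)
    return np.where(in_open_phase, 6.75 * x**2 * (1.0 - x), 0.0)

def glottal_flow(t, T0, Av, Oq):
    """One period of glottal flow with abrupt closure (Qa = 0).

    Assumed form: U_g(t) = Av * n_g(t / (Oq * T0)).
    """
    return Av * normalized_pulse(np.asarray(t, dtype=float) / (Oq * T0))

# Hypothetical settings: 100 Hz voice sampled at 16 kHz, open quotient 0.6.
fs = 16000
T0 = 0.01
n_samples = round(fs * T0)          # 160 samples in one period
t = np.arange(n_samples) / fs
flow = glottal_flow(t, T0=T0, Av=1.0, Oq=0.6)

peak_index = int(np.argmax(flow))   # instant of maximum flow
```

With these settings the open phase covers the first 60 % of the period, and changing Oq only stretches or compresses the same normalized shape in time.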

  18. Glottal flow models: frequency domain. Expressions are given for the glottal flow and for the glottal flow derivative [equation images not preserved in the transcript], where Ng(x, αm) is the Fourier transform of ng(x, αm) and N’g(x, αm) is the Fourier transform of n’g(x, αm). These two functions depend on the model.
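The frequency-domain expressions can be sketched from the Fourier scaling property. The derivation below assumes the time-domain form U_g(t) = A_v n_g(t/(O_q T_0); α_m) for abrupt closure; that form is an assumption here, not a formula recovered from the slide, and the tilde notation for spectra is introduced only for this sketch.

```latex
% Assumed time-domain form (abrupt closure, Q_a = 0):
U_g(t) = A_v \, n_g\!\left(\frac{t}{O_q T_0};\, \alpha_m\right)
% Scaling property: x(t/a) <-> a X(a f), with a = O_q T_0 > 0
\tilde{U}_g(f) = A_v \, O_q T_0 \; N_g\!\left(f \, O_q T_0;\, \alpha_m\right)
% Glottal flow derivative: differentiate in time, then transform
U'_g(t) = \frac{A_v}{O_q T_0} \, n'_g\!\left(\frac{t}{O_q T_0};\, \alpha_m\right)
\quad\Longrightarrow\quad
\tilde{U}'_g(f) = A_v \, N'_g\!\left(f \, O_q T_0;\, \alpha_m\right)
               = j 2 \pi f \, \tilde{U}_g(f)
```

Under this assumption, Oq and T0 act only as a frequency scaling of the model-dependent spectrum, while Av is a pure gain.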

  19. Glottal flow models: spectral description • “Glottal formant” • Spectral slope [expressions not preserved in the transcript]

  20. Spectral / time domain: open quotient, asymmetry

  21. Spectral / time domain: spectral tilt. Effect of E and spectral tilt

  22. Glottal flow model and anticausal linear filter

  23. Causal-Anticausal Linear voice source Model (CALM) (Doval, d’Alessandro, Henrich, 2003) [Figure panels: anticausal filter; causal filter; convergence region for a stable CALM; frequency response; glottal pulse (CALM vs. R++)]

  24. Conclusions • A unified view of glottal flow models • A unified set of time-domain and spectral parameters • Links between time-domain and spectral parameters • A causal-anticausal linear model of the glottal flow signal

  25. Zeros of the Z-Transform (ZZT) Representation of Speech

  26. A new signal representation method (Bozkurt, Doval, d’Alessandro & Dutoit, IEEE Sig. Proc. Lett., 2005): the ZZT (Zeros of the Z-Transform) representation • Inspired by: • Mixed-phase nature of speech (causal/anticausal voice model, CALM) • Group delay representation • A remark by Yegnanarayana and Murthy (1989), explaining that group delay is noisy because of roots of the z-transform polynomial close to the unit circle

  27. Spectral analysis of signals • z-transform • Fourier transform • Causality: time and phase (or group delay) domains

  28. Zeros of the Z-Transform (ZZT) representation. Almost impossible to study analytically for most functions; numerical methods are therefore used (the roots function of Matlab). Basic elementary signal: power series
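The power-series case is one of the few that can be checked by hand, which makes it a convenient test of the numerical approach. The sketch below uses NumPy's `numpy.roots` in place of the Matlab roots function named on the slide; the helper name `zzt` and the values of `a` and `N` are assumptions for illustration. For x[n] = aⁿ, n = 0 … N−1, the z-transform is a truncated geometric series, so its N−1 zeros lie on a circle of radius a at angles 2πk/N, k = 1 … N−1, with a gap at angle 0.

```python
import numpy as np

def zzt(frame):
    """Zeros of X(z) = sum_n x[n] z^(-n) for a finite frame x[0..N-1].

    Multiplying X(z) by z^(N-1) gives a polynomial in z whose coefficients,
    highest power first, are exactly x[0], x[1], ..., x[N-1].
    """
    return np.roots(np.asarray(frame, dtype=float))

# Hypothetical example values: decaying power series of length 16.
a, N = 0.9, 16
x = a ** np.arange(N)

zeros = zzt(x)
radii = np.abs(zeros)      # expected: all close to a
angles = np.angle(zeros)   # expected: multiples of 2*pi/N, none at 0
```

A decaying series (a < 1) therefore puts all its zeros inside the unit circle, and a growing one (a > 1) puts them outside, which is the elementary fact behind the causal/anticausal reading of zero patterns.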

  29. Zero patterns for the LF model [Figure panels: first phase; return phase]

  30. ZZT representation of speech • The first phase of the glottal flow adds zeros outside the unit circle • Periodicity results in many zeros on the unit circle • Vocal tract response zeros lie inside the unit circle
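The zero locations listed above suggest a simple separation rule: zeros outside the unit circle are attributed to the anticausal (glottal opening) component, and zeros inside to the causal components. The following is a minimal sketch of that idea on a toy frame, not the authors' implementation; the function names, the toy zero positions and the FFT size are all assumptions, and a real analysis would first apply GCI-synchronous windowing to a speech frame.

```python
import numpy as np

def split_zeros(frame):
    """Split the ZZT of a frame into zeros outside and inside the unit circle."""
    zeros = np.roots(np.asarray(frame, dtype=float))
    radii = np.abs(zeros)
    return zeros[radii > 1.0], zeros[radii <= 1.0]

def magnitude_from_zeros(zeros, n_fft=512):
    """Magnitude spectrum (up to a gain) of the polynomial with the given zeros.

    Evaluates prod_k |e^{jw} - z_k| on n_fft // 2 + 1 frequencies in [0, pi].
    For frames with many zeros a log-domain sum would be safer numerically.
    """
    omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    points = np.exp(1j * omega)
    distances = np.abs(points[:, None] - zeros[None, :])
    return np.prod(distances, axis=1)

# Toy mixed-phase frame (hypothetical zero positions, not real speech):
# one conjugate pair outside the unit circle, three zeros inside.
anticausal = np.real(np.poly(1.25 * np.exp(1j * np.array([0.3, -0.3]))))
causal = np.real(np.poly([0.8 * np.exp(1j), 0.8 * np.exp(-1j), 0.5]))
frame = np.convolve(anticausal, causal)

outside, inside = split_zeros(frame)
source_mag = magnitude_from_zeros(outside)   # anticausal contribution
tract_mag = magnitude_from_zeros(inside)     # causal contribution
```

Because convolution in time multiplies the z-transforms, the zeros of the frame are the union of the zeros of its components, so the two magnitude spectra multiply back to the spectrum of the whole frame.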

  31. ZZT of windowed speech [Figure panels: non-GCI-synchronous windowing vs. GCI-synchronous windowing, rectangular window in both cases; GCI = Glottal Closure Instant]

  32. ZZT of windowed speech [Figure panels: rectangular window vs. Blackman window]

  33. Source/tract decomposition algorithm

  34. Example of decomposition

  35. ZZT for source-tract separation [Figure panels, real speech: original windowed speech; original amp. spectrum; ZZT; zero-decomposition; reconstructed glottal excitation; reconstructed glottal amp. spectrum; reconstructed vocal tract response; reconstructed tract transfer function; copy-synthesis; noise-excited tract]

  36. Comparison of ZZT and LPC • Tested methods: • LPC autocorrelation • LPC covariance • PSIAIF • ZZT

  37. Comparison of ZZT and LPC • Tested methods: • LPC autocorrelation • LPC covariance • PSIAIF • ZZT • Spectral distance analysis: • ZZT performs much better than inverse filtering for open quotient and asymmetry estimation • Robustness to noise: ZZT is not better than inverse filtering

  38. Glottal formant: ZZT estimation. Fg = f(F0, 1/OpenQuotient, Asym.) [Figure panels: synthetic vowels ‘a’, ‘u’, ‘i’; real speech; OQ from EGG; f0 = 100 Hz; f0 = 200 Hz]

  39. Conclusions • A new speech signal representation exploiting the phase structure of glottal flow signals • An associated estimation algorithm • Applications to source/tract decomposition • Applications to voice source parameter estimation • Better than inverse filtering (LPC) for glottal flow estimation

  40. Source-filter model revisited • Glottal flow models can be represented by a causal-anticausal (mixed phase) filter • Then a method designed for causal-anticausal decomposition is proposed • This method is applied to source/tract decomposition • … and voice source parameter estimation

  41. Real-time synthesis of expressive voice: vocal instruments • Real-time vocal instrument: voice quality synthesis • Real-time intonation synthesis: a study of intonation gestures, towards modeling intonation in terms of movements (kinematic modeling)

  42. Aims of real-time voice synthesis • A gesture interface for driving (“conducting”) a speech synthesis system • Aim: add expression and emotion to the speech flow • Real-time modification of voice synthesis • Gesture interpretation algorithms and speech signal modification algorithms

  43. Vocal instruments: • A short historical review • Application to voice quality synthesis • Computerized chironomy: hand-controlled vocal instruments • Experiments in intonation reiteration

  44. Vocal instruments. Mechanical instrument: the von Kempelen machine (Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine, 1791). Machine in the Deutsches Museum, Munich. Liénard’s reconstruction, 1968.

  45. Vocal instruments. Electrical instrument: the Voder (1939)

  46. Recent vocal instruments • Sidney Fels (Univ. of British Columbia): Glove Talk (1993)

  47. Recent vocal instruments • Perry Cook (Comp. Sci., Princeton): Lisa (2001)

  48. Mapping [Diagram layers: Devices; Voice dimensions: noise, vowel, pressure, effort, intonation; Model parameters: formants, Oq, αm, Ta, AN, F0, AV, Bg, Fg, shimmer, tilt, jitter; Synthesizers]

  49. CALM synthesis algorithm

  50. Real-time voice quality synthesis • Non-preferred hand: joystick • Central button: 2-D vocalic space • Front-rear: structural noise • Right-left: lax-tense (+ additive noise) • Preferred hand: Wacom tablet • X-axis -> F0 • Y-axis -> vocal effort • Z-axis -> amplitude
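The preferred-hand mapping can be sketched as a small function from tablet coordinates to control values. Everything numeric here is an assumption made for illustration: inputs are taken as normalized to [0, 1], the Z-axis is read as pen pressure, F0 follows a logarithmic (octave-based) scale starting at a hypothetical 80 Hz, and the names `VoiceControls` and `map_tablet` do not come from the talk.

```python
from dataclasses import dataclass

@dataclass
class VoiceControls:
    f0_hz: float          # fundamental frequency
    vocal_effort: float   # normalized effort, 0 (soft) to 1 (loud)
    amplitude: float      # normalized output amplitude

def clamp01(value):
    """Clamp a raw controller value to the range [0, 1]."""
    return min(1.0, max(0.0, value))

def map_tablet(x, y, z, f0_min_hz=80.0, octaves=2.0):
    """Map tablet axes to controls: X -> F0, Y -> vocal effort, Z -> amplitude.

    Assumption: a log-frequency mapping, so equal pen displacements
    correspond to equal musical intervals.
    """
    x, y, z = clamp01(x), clamp01(y), clamp01(z)
    f0 = f0_min_hz * 2.0 ** (octaves * x)
    return VoiceControls(f0_hz=f0, vocal_effort=y, amplitude=z)

low = map_tablet(0.0, 0.0, 0.0)
high = map_tablet(1.0, 0.5, 2.0)   # out-of-range Z is clamped to 1.0
```

In a complete instrument, a second mapping layer would turn vocal effort into voice source parameters such as open quotient and spectral tilt; that layer is left out here because the slide does not specify it.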
