Spoken Language Technologies: A review of application areas and research issues. Analysis and synthesis of F0 contours. Agnieszka Wagner, Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań. Humboldt-Kolleg, Słubice, 13-15 November 2008.

Presentation Transcript


  1. Spoken Language Technologies: A review of application areas and research issues
     Analysis and synthesis of F0 contours
     Agnieszka Wagner, Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań
     Humboldt-Kolleg, Słubice, 13-15 November 2008

  2. Introduction
     The need for and increasing interest in SLT systems:
     • oral information is more efficient than a written message
     • speech is the easiest and fastest way of communication (man – man, man – machine)
     Progress in the field:
     • technological advances in computer science
     • availability of specialized speech analysis and processing tools
     • collection and management of large speech corpora
     • investigation of the acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics
     Spoken Language Technologies: Introduction (1)

  3. The tasks of SLT systems (TTS and ASR)
     Speech synthesis (TTS, text-to-speech) systems
     • generate a speech signal for a given input text
     • example: BOSS (Polish module developed at the Dept. of Phonetics in cooperation with IKP, Uni Bonn)
     • ECESS (European Centre of Excellence in Speech Synthesis): standards for the development of language resources, tools, modules and systems
     Automatic speech recognition (ASR) systems
     • provide the text of the input speech signal
     • example: Jurisdic (first Polish ASR system for the needs of the Police, Public Prosecutors and Administration of Justice)
     Spoken Language Technologies: Introduction (2)

  4. Application areas
     Speech synthesis
     • telecommunications (access to textual information over the telephone)
     • information retrieval
     • measurement and control systems
     • fundamental & applied research on speech and language
     • a tool of communication, e.g. for the visually handicapped
     Speech recognition & related technologies
     • text dictation
     • information retrieval & management
     • man-machine communication (together with speech synthesis):
       - dialogue systems,
       - speech-to-speech translation,
       - Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed in the scope of the EURONOUNCE project)
     Spoken Language Technologies: Application areas

  5. Performance
     Generally, the output quality is high as regards generation/recognition of the linguistic propositional content of speech
     Speech synthesis
     • high intelligibility and naturalness in limited domains (e.g. broadcast news)
     Speech recognition
     • the best results are obtained for small-vocabulary tasks
     • state-of-the-art speaker-independent LVCSR systems achieve a word-error rate of 3%
     Spoken Language Technologies: Performance of TTS and ASR systems
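The word-error rate quoted on slide 5 is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The slide does not say how the 3% figure was measured, so the following is only a minimal sketch of that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.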

  6. Limitations
     • insufficient knowledge about methods for processing the non-verbal content of speech, i.e. affective information: speaker’s attitude, emotional state, mood, interpersonal stances & personality traits
     Speech synthesis
     • lack of variability in speaking style, which encodes affective information, can be detrimental to communication (e.g. in speech-to-speech translation)
     • the data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly
     Speech recognition
     • transcription of conversational and expressive speech: substantially higher word-error rate
     Spoken Language Technologies: Limitations of TTS and ASR systems

  7. Progress
     • the need to model the non-verbal content of speech, i.e. affective information
     Applications:
     • high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)
     • commerce: monitoring of agent-customer interactions, information retrieval and management (e.g. QA5)
     • public security, criminology: secured-area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)
     Spoken Language Technologies: Progress in the field (1)

  8. Progress
     Prosodic features: fundamental frequency (F0, the central acoustic variable that underlies intonation), intensity, duration and voice quality -> encoding and decoding of affective information
     Emotion: Anger, Fear, Elation
     • higher mean F0
     • higher F0 variability
     • higher intensity
     • increased speaking rate
     Emotion: Sadness, Boredom
     • lower mean F0
     • lower F0 variability
     • lower intensity
     • decreased speaking rate
     Intonation models:
     • hierarchical, sequential, acoustic-phonetic, phonological, etc.
     • linguistic variation: well handled
     • affective, emotional variation: unaccounted for
     Spoken Language Technologies: Progress in the field (2)
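Slide 8 contrasts emotion groups by mean F0 and F0 variability. As a minimal sketch of how those two quantities might be computed from an F0 track, the code below takes the mean and the sample standard deviation over voiced frames. It assumes that unvoiced frames are coded as 0 Hz and that standard deviation is the variability measure; the presentation specifies neither, and the name `f0_summary` is hypothetical.

```python
import statistics


def f0_summary(f0_hz):
    """Summarize an F0 track given as per-frame values in Hz.

    Assumption: frames with a value <= 0 are unvoiced and are skipped.
    """
    voiced = [float(f) for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean_hz": statistics.mean(voiced),  # overall pitch level
        "sd_hz": statistics.stdev(voiced),   # variability (sample std. dev.)
        "range_hz": max(voiced) - min(voiced),
    }


# Made-up toy track, for illustration only (not data from the presentation).
toy_track = [0, 100, 110, 120, 0]
summary = f0_summary(toy_track)
```

Comparing such summaries across utterances only makes sense within one speaker, or after some speaker normalization, because baseline pitch differs between speakers.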

  9. The comprehensive intonation model: Components
     [Diagram labels: analysis (encoding), intonation description, F0 generation (decoding)]
     • a module of F0 contour analysis
     • a module of F0 contour synthesis
     • description of intonation
       - discrete tonal categories (higher-level, access to the meaning of the utterance)
       - acoustic parameters (low-level)

  10. Automatic analysis of F0 contours: Summary
     • results comparable to inter-labeler consistency in manual annotation of intonation
     • high accuracy achieved using small vectors of acoustic features
     • statistical modeling techniques
     • application: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
     Automatic synthesis of F0 contours: Summary
     • estimation of F0 values with a regression model
     • results comparable to those reported in the literature
     • natural F0 contours (similar to the original ones) for synthesis of high-quality, comprehensible speech (confirmed in perception tests)
     The comprehensive intonation model: Analysis and Synthesis
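Slide 10 mentions estimating F0 values with a regression model but gives neither its form nor its predictors. Purely as an illustration of the general idea, the sketch below fits a one-predictor least-squares line. The predictor (relative position of a syllable in the phrase), the function names, and the numbers are all invented for this example and are not taken from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_f0(model, x):
    """Apply a fitted (slope, intercept) pair to a new predictor value."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical training data: relative position in phrase (0..1) -> F0 in Hz.
positions = [0.0, 0.5, 1.0]
f0_targets = [200.0, 180.0, 160.0]
model = fit_line(positions, f0_targets)
estimate = predict_f0(model, 0.25)
```

A practical model would presumably use several predictors, such as the tonal category from the intonation description, but that choice is not documented in the slides.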

  11. Audio (1)
     Mean opinion in the perception test: no audible difference
     The comprehensive intonation model: Synthesis example (1)

  12. Audio (2)
     Mean opinion in the perception test: very good quality
     The comprehensive intonation model: Synthesis example (2)

  13. Future research
     Extensive and systematic investigation of the mechanisms in voice production and perception of affective speech:
     • contribution from other knowledge domains (psychology)
     • affective speech data collection
     • classification of affective states
     • types of acoustic parameters
     • measurement of affective inferences
     Spoken Language Technologies: Future research issues

  14. THANK YOU FOR YOUR ATTENTION!