Dive into the domain of Speech Research and Technologies, focusing on NLP and Computational Linguistics, exploring Automatic Speech Recognition, Speech-to-Text, and more. Understand acoustic/prosodic cues and current approaches through corpus studies and machine learning.
Advanced NLP: Speech Research and Technologies
Julia Hirschberg
CS 6998
Spoken Natural Language Processing
• NLP/Computational Linguistics: historically text-oriented
• Speech research: the domain of EE and Linguistics
• 1980s: DARPA efforts to bring the two together
• Today: applications motivate collaboration
• Automatic Speech Recognition (ASR)
• Text/Concept-to-Speech (TTS/CTS)
• Spoken Dialogue Systems (SDS), Speech-to-Speech Translation, Speech Search/Data Mining
Studying Speech is Different
• Understanding input and generating output are more complicated
• ASR errors and lack of formatting cues
• TTS/CTS naturalness issues
• But there is also more information to take advantage of
• Pitch variation, loudness, rate, voice quality
• Filled pauses, self-repairs
Acoustic/Prosodic Cues Can Convey…
• Topic Structure
• Information Status: What’s shared knowledge? What’s important?
• Speaker State/Emotion
• Speech Acts
• Syntactic Structure
• Semantic Information
Current Approaches
• Corpus-based studies
• Hand-labeled/automatically-labeled data
• Tools:
  • Analysis (pitch tracks, spectrograms…)
  • ASR toolkits
  • TTS systems
• Machine learning
• Laboratory studies
• Evaluation
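To make the "pitch track" idea concrete, here is a minimal sketch of estimating fundamental frequency (F0) for a single frame by picking the peak of its autocorrelation. It is an illustration only and is not taken from the course tools: the function names `estimate_f0` and `sine_frame`, the 75-400 Hz search range, and the synthetic 200 Hz test tone are all assumptions. Practical pitch trackers typically add voicing decisions and smoothing across frames.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    The 75-400 Hz search range is an assumed default, not a standard.
    Returns None when no lag has a positive autocorrelation (e.g. silence).
    """
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def sine_frame(freq_hz, sample_rate, n_samples):
    """Synthesize a pure tone as a stand-in for a voiced speech frame."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


SAMPLE_RATE = 8000  # Hz, assumed for the example
frame = sine_frame(200.0, SAMPLE_RATE, 400)  # 50 ms of a 200 Hz tone
f0 = estimate_f0(frame, SAMPLE_RATE)
print(f"estimated F0: {f0:.1f} Hz")
```

Running the estimator on successive overlapping frames of a recording and plotting the results over time would give a rough pitch track of the kind the analysis tools display.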
CS 6998
• Requirements:
• Class Participation:
• Questions for class discussion
• Helping lead a class
• Lab exercises
• Project
• Literature review
• Data collection and/or analysis from a corpus
• Building a system or system component (e.g., a preprocessor to assign intonation in a generation system)
• Conducting an experiment: perception or production
• Examples:
• How do people convey contrast?
• How do people convey given/new information?
• What tells people that they can ‘take the next turn’?
• What is the relationship between syntactic structure and intonation?
• How do people convey anger? Uncertainty? Other emotions?
• How can you tell if people are deceiving you?
• How might we recognize disfluencies?
Next Week
• Read Hirschberg 2003 and the ToBI conventions
• Make sure you have access to the supplementary readings if you need them
• Bring 3 discussion questions to class
• Check your access on the CS servers to the corpora and /proj/nlp/tools/mathTools/
• Xwaves (Solaris and Linux): esps531.sol, esps531.linux (also downloadable from KTH)
• wavesurfer (Windows, Linux, Mac): available from KTH
Projects:
• Start thinking about what area you want to work in for your project and what type of project you’d like to do