L2F - Spoken Language Systems Lab
Genesis • Created in January 2001 • As a result of a major restructuring of several groups within and outside INESC ID Lisbon • Goal • Bring together several research groups to make relevant contributions to the computational processing of spoken language for European Portuguese • United by the problem we want to solve, not by the technology we share • People • About 10 PhD researchers, 10 PhD students, 3 MSc students, 12 undergraduate students • Formal cooperation with CLUL (Center of Linguistics of the University of Lisbon)
Mission Creating technology to bridge the gap between natural spoken language and the underlying semantic information
Lines of Activity • Priority • Semantic processing of multimedia contents • Spoken dialogue system platforms • Emerging • Computer-enhanced human-to-human communication • Automatic transcription of meetings • Speech-to-speech translation • Continuing • Processing other varieties of Portuguese • E-inclusion • E-learning
Core technologies • Speech Coding • Speech Synthesis • DIXI+ • Speech Recognition • AUDIMUS • Language / Accent Identification • Natural Language Processing • Dialogue Management
DIXI+ • Continuation of the DIXI project (1991) • Synthesis by concatenation, instead of by rule • More elaborate prosodic models • Developed within the Festival framework • Focused on alternative and augmentative communication applications • Currently under development
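Synthesis by concatenation builds an utterance by joining prerecorded waveform units instead of generating the signal from rules. A minimal sketch of the joining step only, assuming each unit is a plain list of samples and using a linear crossfade over a hypothetical `overlap` length to smooth each join; this is not the unit selection or joining code of DIXI+ or Festival.

```python
def concatenate_units(units, overlap):
    """Join waveform units (lists of float samples) end to end.

    The last `overlap` samples of the output so far are linearly crossfaded
    with the first `overlap` samples of the next unit, which reduces
    audible discontinuities at the joins. `overlap` is an assumed parameter.
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never crossfade more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            idx = len(out) - n + k
            out[idx] = out[idx] * (1 - w) + unit[k] * w
        out.extend(unit[n:])
    return out
```

With `overlap=0` this degenerates to plain concatenation; real systems also choose units and join points to minimise spectral and prosodic mismatch, which this sketch does not attempt.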
AUDIMUS • Continuous speech recognition system for European Portuguese • Hybrid system combining Multilayer Perceptrons and Hidden Markov Models (MLP/HMM) • Vocabularies of 5K, 64K, ... words, depending on the task • Stochastic N-gram language model • Speaker-independent or speaker-adapted, depending on the task • First application: radiology reports
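In the classic hybrid MLP/HMM approach, the MLP estimates per-frame state posteriors P(q|x), and dividing them by the state priors P(q) gives "scaled likelihoods" proportional to p(x|q) that the HMM uses as emission scores during Viterbi decoding. A minimal sketch of that general technique, assuming toy two-state inputs; the function names and numbers are illustrative and do not describe AUDIMUS internals.

```python
import math


def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame MLP posteriors P(q|x_t) into log P(q|x_t) - log P(q),
    which equals log p(x_t|q) up to a per-frame constant."""
    return [[math.log(p) - math.log(priors[q]) for q, p in enumerate(frame)]
            for frame in posteriors]


def viterbi(emissions, log_trans, log_init):
    """Most likely HMM state sequence given per-frame log emission scores,
    a log transition matrix log_trans[from][to] and log initial probabilities."""
    n_states = len(log_init)
    score = [log_init[q] + emissions[0][q] for q in range(n_states)]
    backptrs = []
    for frame in emissions[1:]:
        new_score, ptrs = [], []
        for q in range(n_states):
            # Best predecessor state for reaching q at this frame.
            best_prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][q])
            new_score.append(score[best_prev] + log_trans[best_prev][q] + frame[q])
            ptrs.append(best_prev)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=lambda q: score[q])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

For example, posteriors `[[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]` with uniform priors and "sticky" transitions (0.7 stay, 0.3 switch) decode to the state path `[0, 0, 1]`.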
[Figure: AUDIMUS results on broadcast news (BN) speech recognition, Word Error Rate (WER %)]
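The Word Error Rate used to report these results is the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch of the standard Levenshtein-based computation over whitespace-separated words; the function name is illustrative and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

For instance, `word_error_rate("o presidente falou hoje", "o presidente fala hoje")` gives `25.0` (one substitution in four reference words). Note that WER can exceed 100% when the hypothesis contains many insertions.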
Semantic processing of multimedia contents • ALERT: Selective Dissemination of Multimedia Information • IPSOM: Indexing, Integration and Sound Retrieval in Multimedia Documents • Improved access to spoken books for the visually impaired (indexing words, sentences, topics) • Development of multimedia interfaces for accessing and retrieving spoken books (didactic applications, etc.)
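Indexing words in a spoken book amounts to mapping each word to the points in the audio where it is spoken, so a reader can jump straight to them. A minimal inverted-index sketch, assuming a hypothetical input of `(word, start_seconds)` pairs such as a recognizer or aligner might produce; this is not the IPSOM implementation.

```python
from collections import defaultdict


def build_word_index(aligned_words):
    """Map each lower-cased word to the list of start times (in seconds)
    at which it occurs. `aligned_words` is an assumed list of
    (word, start_seconds) pairs in playback order."""
    index = defaultdict(list)
    for word, start in aligned_words:
        index[word.lower()].append(start)
    return dict(index)


def find(index, word):
    """Return every start time of `word`, or an empty list if it never occurs."""
    return index.get(word.lower(), [])
```

The same structure extends to sentences or topics by indexing those units instead of single words.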
[Figure: ALERT system. Components: media watch; multimedia document; video-based segmentation and image/video processing (if video is present); audio-based segmentation and speech processing with transcription (if audio is present); keywords (if text is present); automatic topic detection; matching of detected topics against user profiles to alert specific users; multimedia document database and label database.]
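The final stage of this pipeline compares the topics detected in a document with each user's profile and alerts the users who match. A minimal sketch, assuming topics and profiles are plain sets of topic labels and that one shared topic is enough to trigger an alert; the matching rule and names are hypothetical, not ALERT's actual criteria.

```python
def users_to_alert(document_topics, user_profiles):
    """Return (user, matched_topics) for every user whose profile shares
    at least one topic with the document. Users and topics are sorted so
    the output is deterministic."""
    doc = set(document_topics)
    alerts = []
    for user, interests in sorted(user_profiles.items()):
        matched = doc & set(interests)
        if matched:
            alerts.append((user, sorted(matched)))
    return alerts
```

A real system would likely weight topics by detection confidence and use a threshold rather than a bare set intersection.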
Spoken dialogue systems Goal: to develop spoken dialogue systems and intelligent multimodal interfaces: • a phone-based information system; • an "intelligent" demo room controllable by voice; • a storyteller: a fully embodied conversational agent that reads stories to children.
118 - Telephone number synthesizer (example reply): "The requested number is xx-xxx-xx-xx, repeat, xx-xxx-xx-xx"
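The reply groups the digits so the synthesizer can read them in short chunks with pauses, then repeats the whole number. A minimal sketch of building that reply text, assuming a 9-digit number and the 2-3-2-2 grouping shown in "xx-xxx-xx-xx"; the function names are hypothetical and the synthesis step itself is not shown.

```python
GROUPS = (2, 3, 2, 2)  # grouping taken from the "xx-xxx-xx-xx" pattern


def group_digits(number, groups=GROUPS):
    """Keep only the digits of `number` and join them in the given chunk sizes."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != sum(groups):
        raise ValueError(f"expected {sum(groups)} digits, got {len(digits)}")
    parts, pos = [], 0
    for size in groups:
        parts.append(digits[pos:pos + size])
        pos += size
    return "-".join(parts)


def reply_text(number):
    """Reply text to pass to a text-to-speech engine, stating the number twice."""
    grouped = group_digits(number)
    return f"The requested number is {grouped}, repeat, {grouped}"
```

For example, `reply_text("123456789")` returns `"The requested number is 12-345-67-89, repeat, 12-345-67-89"`.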
Speech-based interface for a dialogue system [Figure components: telephone; AUDIMUS (speech to text); dialogue; SQL database with an updater fed from the Internet; DIXI+ (text to speech)] • Telephone speech • Speech recognition (AUDIMUS) of natural language queries • Query understanding and information retrieval from the database • Generation of a natural language reply • Text-to-speech synthesis (DIXI+) adapted to the limited domain
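The middle of this chain (query understanding, database retrieval and reply generation) can be illustrated with a toy example. A minimal sketch, assuming the recognizer has already produced text, a hypothetical `directory(name, number)` table, and a single regular-expression pattern standing in for real language understanding; recognition (AUDIMUS) and synthesis (DIXI+) are not modelled.

```python
import re
import sqlite3


def understand(query):
    """Toy understanding: extract the name after 'number of', if present."""
    match = re.search(r"number of (.+?)\??$", query.strip(), re.IGNORECASE)
    return match.group(1) if match else None


def answer(query, conn):
    """Map a recognized query to an SQL lookup and a natural language reply."""
    name = understand(query)
    if name is None:
        return "Sorry, I did not understand the request."
    row = conn.execute(
        "SELECT number FROM directory WHERE lower(name) = lower(?)", (name,)
    ).fetchone()
    if row is None:
        return f"I could not find a number for {name}."
    return f"The number of {name} is {row[0]}."
```

Usage with an in-memory database and an invented entry: after `conn.execute("INSERT INTO directory VALUES (?, ?)", ("Maria Silva", "12-345-67-89"))`, the call `answer("What is the number of Maria Silva?", conn)` returns `"The number of Maria Silva is 12-345-67-89."`, which would then be handed to the synthesizer.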
Speech-based control system for a hi-fi • The user spoke... "Hi-Fi, turn on and play CD 1." • Speech was recognized... "TURN ON PLAY CD ONE" • The computer interprets the command... "Hi-Fi: turn on and play CD one" • ...and sends the IR command
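The interpretation step turns the recognized word string into device actions before any infrared code is sent. A minimal sketch, assuming a tiny hypothetical command grammar and made-up action names such as `POWER_ON` and `PLAY_CD_1`; sending the actual IR codes is hardware-specific and not shown.

```python
NUMBER_WORDS = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5}


def interpret(recognized):
    """Convert recognized text into a list of hypothetical action names,
    skipping words the toy grammar does not cover (e.g. "HI-FI", "AND")."""
    words = recognized.upper().split()
    actions, i = [], 0
    while i < len(words):
        if words[i:i + 2] == ["TURN", "ON"]:
            actions.append("POWER_ON")
            i += 2
        elif words[i:i + 2] == ["TURN", "OFF"]:
            actions.append("POWER_OFF")
            i += 2
        elif (words[i:i + 2] == ["PLAY", "CD"] and i + 2 < len(words)
              and words[i + 2] in NUMBER_WORDS):
            actions.append(f"PLAY_CD_{NUMBER_WORDS[words[i + 2]]}")
            i += 3
        else:
            i += 1
    return actions
```

For example, `interpret("TURN ON PLAY CD ONE")` returns `["POWER_ON", "PLAY_CD_1"]`; each action would then be looked up in a table of IR codes for the specific device.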
Processing other varieties of Portuguese • Research topics: • Multi-accent corpora • Robust multi-accent speaker-independent ASR • Language and accent ID • Computer-Aided Language Learning (CALL) / e-Learning
E-inclusion: Eugénio - the word genius • Word prediction tool for people with motor impairments • Cooperation with cerebral palsy centers • Public domain tool • New version released in 2003
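Word prediction saves keystrokes by proposing completions of the word being typed, so a user with motor impairments can select a suggestion instead of typing every letter. A minimal sketch of frequency-ranked prefix completion, assuming word counts learned from some training text; this is a generic technique, not Eugénio's actual prediction model, and the class name is hypothetical.

```python
from collections import Counter


class WordPredictor:
    """Suggest completions of a typed prefix, most frequent words first."""

    def __init__(self, corpus_text):
        self.counts = Counter(corpus_text.lower().split())

    def suggest(self, prefix, k=3):
        prefix = prefix.lower()
        candidates = [w for w in self.counts if w.startswith(prefix)]
        # Higher frequency first; ties broken alphabetically for stable output.
        candidates.sort(key=lambda w: (-self.counts[w], w))
        return candidates[:k]
```

For example, trained on `"casa carro casa cama casa carro gato"`, `suggest("ca")` returns `["casa", "carro", "cama"]`. Practical predictors usually also condition on the previous words (an N-gram model) and adapt to the user's own vocabulary.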
Synergies with other INESC ID Groups: Spoken Language Systems linked with • Agents • Multimodal HCI • Information Retrieval • Speech Mining • Computer Graphics • Source Separation • Electronics • Signal Processing
More information at: www.l2f.inesc-id.pt info@l2f.inesc-id.pt