ICSLP’ 98 CONTROLLING A HIFI WITH A CONTINUOUS SPEECH UNDERSTANDING SYSTEM

J. Ferreiros, J. Colás, J. Macías-Guarasa, A. Ruiz, J. M. Pardo
Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica
E.T.S.I. Telecomunicación - Universidad Politécnica de Madrid
Ciudad Universitaria s/n, 28040 Madrid, Spain

General Architecture
[Block diagram. Blocks shown:
• Modules: Speech Recogniser, Tagger, Tags Refiner, Understanding, Actuator, IR-LED, Speech Generation Module, Text to Speech
• Resources: SCHMM + Word Pair, Tagged Dictionary, Context Dependent Rules (shown twice), HIFI Status, Alternative Expressions]

Speech recogniser
• Characteristics:
  • Continuous speech commands
  • One-pass search with word-pair grammar
  • 163 words
  • SCHMM phone models
• Implementation:
  • Front-end: DSP LSI board
  • Rest of processing: PC
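A word-pair grammar only states which word may follow which, without probabilities. The following is a minimal sketch of that constraint, assuming a toy vocabulary; the word list, the `<s>`/`</s>` boundary markers and the `is_allowed` helper are illustrative and not taken from the system itself.

```python
# Hypothetical toy word-pair grammar: each word maps to the set of words
# allowed to follow it. The real system uses 163 words; these are examples.
WORD_PAIRS = {
    "<s>": {"please", "set", "switch"},
    "please": {"set", "switch"},
    "set": {"the"},
    "switch": {"the"},
    "the": {"volume", "radio"},
    "volume": {"higher", "</s>"},
    "radio": {"on", "</s>"},
    "higher": {"</s>"},
    "on": {"</s>"},
}


def is_allowed(words):
    """Return True if every adjacent word pair is licensed by the grammar."""
    path = ["<s>", *words, "</s>"]
    return all(b in WORD_PAIRS.get(a, set()) for a, b in zip(path, path[1:]))
```

During a one-pass search, a check of this kind would prune word transitions that the grammar does not license.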

Speech understanding (I)
• TAGGER:
  • 78 semantic tags
  • Several tags applied to each word
  • “garbage” tag used for no-meaning words
    • Gives robustness against speech recogniser errors
    • Will allow OOV in the recognised string
      “Please, set the volume higher”
  • Tagging directly specified in the lexicon
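The tagger can be pictured as a lexicon lookup with a “garbage” fallback. This is a minimal sketch under that assumption; the lexicon entries and the tag names other than “garbage”, “position” and “increment” are hypothetical, since the 78 actual tags are not listed here.

```python
GARBAGE = "garbage"

# Hypothetical lexicon: each word lists all the semantic tags it may carry.
LEXICON = {
    "set": ["action"],
    "volume": ["parameter"],
    "higher": ["increment"],
    "right": ["position", "increment"],  # ambiguous word, resolved later
}


def tag(words):
    """Attach every lexicon tag to each word; unknown words get 'garbage'."""
    return [(w, list(LEXICON.get(w.lower(), [GARBAGE]))) for w in words]
```

With this toy lexicon, “Please” and “the” in “Please, set the volume higher” receive the “garbage” tag, so they can be discarded later without affecting the meaning.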

Speech understanding (II)
• TAGS REFINER:
  • Aims:
    • Number processing
    • Disambiguation of words with several tags
    • “garbage” removal
  • May change the literal of the words
    “two five” → “25”
  • May introduce new refined semantic tags
  • Context dependent rules
    word: “right”
    tags: “position increment”
    rule: “if there exists any other word tagged as a tape parameter, then the word right is the position of this tape, else it is an increment indicator”
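The three aims of the refiner can be sketched as one pass over the tagged words. The code below is an illustration only: the `tape_parameter` and `number` tag names, the digit table and the `refine` function are assumptions, and only the “right” rule and the “two five” → “25” example come from the slide.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def refine(tagged):
    """Refine a list of (word, tags) pairs produced by the tagger."""
    # 1. Number processing: merge runs of adjacent digit words ("two five" -> "25").
    merged, prev_was_digit = [], False
    for word, tags in tagged:
        if word in DIGITS:
            if prev_was_digit:
                merged[-1] = (merged[-1][0] + DIGITS[word], ["number"])
            else:
                merged.append((DIGITS[word], ["number"]))
            prev_was_digit = True
        else:
            merged.append((word, list(tags)))
            prev_was_digit = False

    # 2. Context dependent rule for "right": position if a tape parameter
    #    appears anywhere in the sentence, increment otherwise.
    has_tape = any("tape_parameter" in tags for _, tags in merged)
    refined = []
    for word, tags in merged:
        if word == "right" and set(tags) == {"position", "increment"}:
            tags = ["position"] if has_tape else ["increment"]
        # 3. "garbage" removal: drop words that carry no meaning.
        if tags != ["garbage"]:
            refined.append((word, tags))
    return refined
```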

Speech understanding (III)
• UNDERSTANDING STAGE:
  • Context dependent rules
  • Gives independence from the order of the concepts
  • Tries to fill in frames:
    SUBSYSTEM=(radio,cd-player,cassette,...)
    PARAMETER=(volume,tone,broadcast station,song,...)
    VALUE=(higher,number,...)
  • One or several frames for each command
  • More specific rules: first to be executed
  • We also fill in message strings
    • With the “reasoning”
    • With the problems in the understanding stage
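Frame filling that does not depend on word order can be sketched as follows. This is a simplified illustration: the mapping from tags to slots and the wording of the messages are assumptions, and the sketch does not model the rule specificity ordering mentioned on the slide. Only the slot names SUBSYSTEM, PARAMETER and VALUE are taken from it.

```python
# Hypothetical mapping from refined semantic tags to frame slots.
SLOT_FOR_TAG = {
    "subsystem": "SUBSYSTEM",
    "parameter": "PARAMETER",
    "increment": "VALUE",
    "number": "VALUE",
}


def fill_frame(refined):
    """Fill one frame from (word, tags) pairs, regardless of their order."""
    frame = {"SUBSYSTEM": None, "PARAMETER": None, "VALUE": None}
    messages = []  # records the "reasoning" and any problems found
    for word, tags in refined:
        for t in tags:
            slot = SLOT_FOR_TAG.get(t)
            if slot and frame[slot] is None:
                frame[slot] = word
                messages.append(f"{word!r} fills {slot}")
                break
    missing = [slot for slot, value in frame.items() if value is None]
    if missing:
        messages.append("missing: " + ", ".join(missing))
    return frame, messages
```

Because each word is routed to a slot by its tag alone, “volume higher” and “higher volume” produce the same frame.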

Speech understanding (IV)
• ACTUATOR:
  • Sends IR commands to the HIFI set
  • Keeps track of the set status
  • Informs the user of the actions performed or the problems found
    USER: “switch the radio on”
    ACTUATOR: “The radio was already on”
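The actuator behaviour in the dialogue above can be sketched as a small state tracker in front of the IR sender. The class name, the `send_ir` callback, the IR command string and the “is now on” message are hypothetical; only the “was already on” reply comes from the slide.

```python
class Actuator:
    """Tracks the HIFI status and only sends IR commands when needed."""

    def __init__(self, send_ir):
        self.send_ir = send_ir  # callback standing in for the IR-LED driver
        self.status = {}        # subsystem name -> True (on) / False (off)

    def switch(self, subsystem, on):
        state = "on" if on else "off"
        if self.status.get(subsystem, False) == on:
            # Nothing to send: report the problem back to the user.
            return f"The {subsystem} was already {state}"
        self.send_ir(f"{subsystem}_power")  # hypothetical IR command name
        self.status[subsystem] = on
        return f"The {subsystem} is now {state}"
```

The returned string would be handed to the speech generation module rather than printed.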

Speech generation
• Input: pattern string of both literals and concepts coming from the rest of the architecture
• Performs random concepts substitution by text to achieve a certain degree of naturalness / variety
  Input: “C_SEEING the word higher with an increment meaning, C_THINK that put means an increasing action”
  C_SEEING: “As I can see”, “As I have discovered”, “As it appears”, ...
  C_THINK: “I think”, “I imagine”, “I suppose”, ...
• Output through a text-to-speech subsystem
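Random concept substitution can be sketched with a regular expression over the pattern string. The alternatives listed are those shown on the slide; the `generate` function, the `C_` token pattern and the optional `rng` parameter are assumptions made for illustration.

```python
import random
import re

# Alternative expressions for each concept, as listed on the slide.
ALTERNATIVES = {
    "C_SEEING": ["As I can see", "As I have discovered", "As it appears"],
    "C_THINK": ["I think", "I imagine", "I suppose"],
}


def generate(pattern, rng=random):
    """Replace each known C_ concept with a randomly chosen expression."""
    def substitute(match):
        concept = match.group(0)
        # Unknown concepts are left untouched rather than guessed.
        return rng.choice(ALTERNATIVES.get(concept, [concept]))

    return re.sub(r"\bC_[A-Z]+\b", substitute, pattern)
```

Calling `generate` twice on the same pattern may give different sentences, which is the source of the variety the slide describes. Passing a seeded `random.Random` instance as `rng` makes the output repeatable for testing.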

CONCLUSIONS & FUTURE WORK
• Supporting ideas of the system:
  • Semantic-like tagging
  • Context dependent rules
  • “garbage” tag
  • Pattern-based generation
  • Random concepts substitution for generation
• Desirable new aspects:
  • Use of more information from the recognised sentences
  • Handle more complex commands
  • Introduce semantic-syntactic parsing of the sentence structure
  • Introduce dialogue to complete information that was not understood or not given, and as a confirmation strategy
