This study tests large vocabulary automatic speech recognition (ASR) in the "stir-sir" paradigm using a newly trained English-English ASR system. The experiments adapt the ASR models to different conditions and evaluate free recognition performance. The results suggest that further improvements are needed to approach the performance of human listeners.
Experiments on the "stir-sir" paradigm using large vocabulary ASR
Kalle Palomäki
Adaptive Informatics Research Centre, Helsinki University of Technology
Introduction
• Aim: test large vocabulary ASR in the "stir-sir" paradigm
• Motivation: large vocabulary ASR has learned phoneme models close to those of humans
• ASR: a newly trained English-English large vocabulary recogniser
  • Trained on read Wall Street Journal articles
  • Sampling rate: 16 kHz
ASR details
• Standard features: Mel-frequency cepstral coefficients (MFCCs) + power + deltas + accelerations
• Triphone HMMs with acoustic likelihoods modelled by Gaussian mixture models
• Supervised adaptation using constrained maximum likelihood linear regression (CMLLR)
  • Can be formulated as a linear feature transformation
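Since CMLLR can be expressed as a feature-space transform, it amounts to mapping each feature frame x to Ax + b. A minimal sketch, assuming 39-dimensional MFCC+delta feature frames; the transform matrix A and bias b here are placeholder values (a real system estimates them from adaptation data by maximising likelihood under the HMM):

```python
import numpy as np

def apply_cmllr(features: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply a CMLLR-style feature transform: each frame x becomes A @ x + b."""
    return features @ A.T + b

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 39))  # 100 frames of 39-dim MFCC+delta features
A = np.eye(39)                           # identity transform as an illustrative stand-in
b = np.zeros(39)                         # zero bias as an illustrative stand-in
adapted = apply_cmllr(frames, A, b)
print(adapted.shape)
```

With the identity transform the features pass through unchanged; in practice A and b shift the model's feature space towards the adaptation condition (e.g. near-near or far-far).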
Experiments
• Three quantities measured:
  • Free recognition result
  • Recognizer's choice between "next_you'll_get_sir_to_click_on" and "next_you'll_get_stir_to_click_on"
  • Temporally averaged log-probability of "t"
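The forced-choice measure above can be sketched as comparing the total log-likelihoods of the two sentence hypotheses, and the "t" measure as a temporal average of per-frame log-probabilities. The per-frame values below are illustrative placeholders; a real system would obtain them from HMM alignment:

```python
import numpy as np

def choose_sentence(loglik_sir, loglik_stir):
    """Pick the hypothesis with the higher total log-likelihood."""
    return "sir" if sum(loglik_sir) > sum(loglik_stir) else "stir"

def avg_logprob_t(frame_logprobs_t):
    """Temporally averaged log-probability over the frames aligned to 't'."""
    return float(np.mean(frame_logprobs_t))

# Placeholder per-frame log-likelihoods for the two sentence hypotheses.
loglik_sir = [-3.1, -2.8, -3.0]
loglik_stir = [-3.5, -3.2, -3.4]
print(choose_sentence(loglik_sir, loglik_stir))   # sir
print(avg_logprob_t([-4.0, -3.0, -2.0]))          # -3.0
```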
Experiments
• Experiment 1: "dry" models with no adaptation
• Experiment 2: "dry" models adapted to the matching condition
  • Near-near adapted with near-near, far-far adapted with far-far
  • Supervised adaptation with utterances at the ends of the continuum
• Experiment 3: "dry" models adapted to both "near-near" and "far-far"
  • Supervised adaptation with utterances at the ends of the continuum
Exp. 1: "dry" models, no adaptation
• Free recognition:
  • Near-near: "nantz two-a-days so far", "nursing care so far"
  • Far-far: "nantz th", "NMS death", ""
• Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and a silence model:
  • Near-near: change between conditions 08 and 09
  • Far-far: everything recognised as silence
Exp. 2: "dry" models, adapted to the matching condition
• Free recognition:
  • Near-near: "next month though the khon"
  • Far-far: "next he'll throw the khon"
• Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and a silence model:
  • Near-near: change between conditions 03 and 04
  • Far-far: "sir" all the time
Exp. 3: "dry" models, adapted to both
• Free recognition:
  • Near-near: "next month though the khon"
  • Far-far: "next month khon" or "nantz khon"
• Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and a silence model:
  • Switches erratically between the two sentences
Discussion & future directions
• Results currently "unconvincing":
  • Poor free recognition performance, especially in the far-far condition
  • May be hard to obtain sensitivity similar to that of human listeners
• Tricks to get around the poor performance:
  • Cooke (2006) uses a priori masks in order to find glimpses of speech
  • Choose between two sentences rather than free recognition
  • Measure log-probability instead of recognition performance
• How to model compensation, which is the main issue
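The a priori mask idea from Cooke (2006) can be sketched as a binary spectro-temporal mask that keeps only "glimpses" where speech dominates noise. The 0 dB threshold and the toy spectrogram values below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def a_priori_mask(speech_spec: np.ndarray, noise_spec: np.ndarray,
                  threshold_db: float = 0.0) -> np.ndarray:
    """Binary mask: 1 where local SNR (speech vs. noise power) exceeds the threshold."""
    eps = 1e-10  # avoid log of zero
    local_snr_db = 10.0 * np.log10((speech_spec + eps) / (noise_spec + eps))
    return (local_snr_db > threshold_db).astype(float)

# Toy 2x2 power spectrograms: glimpses survive where speech power > noise power.
speech = np.array([[4.0, 0.1],
                   [0.2, 9.0]])
noise = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
mask = a_priori_mask(speech, noise)
print(mask)
```

Recognition would then be restricted to the masked (glimpsed) regions, sidestepping the frames where the recogniser's acoustic models are dominated by noise.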