A Prototype Personal Dictation System Adam Janin janin@icsi.berkeley.edu
Final Goal – A Portable Meeting Recorder • Record impromptu meetings in a natural environment. • Detect multiple speakers. • Allow correction and annotation. • Support indexing and searching. • Self-contained (using IRAM).
Intermediate Goal – A Personal Dictation System • Record a single user dictating text. • Allow correction and editing. • Hosted system: • ASR runs on workstation. • GUI runs on Pilot. • Communicate via wired network. • Close-talking mic. • Limited domain (Broadcast News).
Asides... • Why not Wizard of Oz? • Structure of correction mechanism is recognizer specific. • Develop infrastructure. • Produce a working demo. • Informal user study, mostly with speech researchers.
Architecture • Palm Pilot: correct transcripts, edit transcripts, create new text. • Sun Workstation: audio frontend, speech recognizer, correction server.
Correcting and Editing • Correcting – informing the recognizer that it has made an error. • If recognizer has a good idea of alternatives, it may be faster to correct than to edit. • Recognizer can adapt to user and vocabulary. • Editing – changing the output. • “That’s not what I meant to say”. • Text vs. speech input.
Correction Methods: Background • Lattice contains recognizer’s best guesses. • More compact than N-best lists. • Contains word order and timing. • Example hypotheses: 1) the records … 2) a rack … 3) the wreck or … 4) a record …
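To make the lattice versus N-best comparison concrete, here is a minimal sketch of a word lattice as a list of timed, scored edges. The `Edge` type, the node numbering, the times, and the log scores are all invented for illustration and are not taken from the recognizer used in this system.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: int        # node where the word starts
    dst: int        # node where the word ends
    word: str
    start: float    # start time in seconds (made-up value)
    end: float      # end time in seconds (made-up value)
    logprob: float  # log score of the word (made-up value)


# Toy lattice loosely based on the example hypotheses on the slide.
EDGES = [
    Edge(0, 1, "the", 0.0, 0.2, -0.4),
    Edge(0, 1, "a", 0.0, 0.2, -1.1),
    Edge(1, 3, "records", 0.2, 0.8, -0.5),
    Edge(1, 3, "record", 0.2, 0.8, -0.9),
    Edge(1, 3, "rack", 0.2, 0.8, -1.3),
    Edge(1, 2, "wreck", 0.2, 0.6, -1.0),
    Edge(2, 3, "or", 0.6, 0.8, -0.8),
]


def n_best(edges, start, final):
    """Expand the lattice into every complete path, best score first."""
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e.src, []).append(e)

    results = []

    def walk(node, words, score):
        if node == final:
            results.append((score, words))
            return
        for e in outgoing.get(node, []):
            walk(e.dst, words + [e.word], score + e.logprob)

    walk(start, [], 0.0)
    return sorted(results, key=lambda r: r[0], reverse=True)


PATHS = n_best(EDGES, 0, 3)
```

In this toy case seven edges expand into eight separate hypotheses, and the gap widens quickly as utterances get longer. The edges also keep per-word timing, which a flat list of word strings does not.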
Correction Methods: Selecting Hypotheses • User corrects “records”. • System picks all words that overlap in time. • Presents in order from most likely to least. • Note: full overlap is probably not optimal. 1) the records … 2) a rack … 3) the wreck or … 4) a record …
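The selection step can be sketched as follows, assuming each lattice word carries a start time, an end time, and a log score. The `Arc` type, the helper names, and all numbers are hypothetical; the overlap test used here (any shared time span) is one possible reading of "overlap in time", not necessarily the one the system uses.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    word: str
    start: float    # seconds (made-up value)
    end: float      # seconds (made-up value)
    logprob: float  # log score (made-up value)


ARCS = [
    Arc("the", 0.0, 0.2, -0.4),
    Arc("a", 0.0, 0.2, -1.1),
    Arc("records", 0.2, 0.8, -0.5),
    Arc("record", 0.2, 0.8, -0.9),
    Arc("rack", 0.2, 0.7, -1.3),
    Arc("wreck", 0.2, 0.6, -1.0),
    Arc("or", 0.6, 0.8, -1.5),
]


def overlaps(a, b):
    """True if the two arcs share any stretch of time."""
    return a.start < b.end and b.start < a.end


def alternatives(arcs, selected):
    """Words overlapping the selected arc, most likely first."""
    best = {}
    for arc in arcs:
        if arc.word == selected.word or not overlaps(arc, selected):
            continue
        # Keep the best score seen for each distinct word.
        if arc.word not in best or arc.logprob > best[arc.word]:
            best[arc.word] = arc.logprob
    return sorted(best, key=best.get, reverse=True)


CHOICES = alternatives(ARCS, ARCS[2])  # the user tapped "records"
```

With this rule the short fragment "or" shows up as a candidate for replacing "records", which hints at why a plain overlap test may offer the user unhelpful choices.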
Correction Methods: Rescoring • User corrects “records” to “record”. • Select only paths with “record”. • Rescore lattice. • Unexpected changes! 1) the records … 2) a rack … 3) the wreck or … 4) a record …
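A minimal sketch of rescoring after a correction, under simplifying assumptions: the lattice is a small list of scored edges, rescoring just means picking the best remaining path, and the constraint is "the path contains the corrected word" rather than a check tied to a time slot. The `Edge` type, `best_path`, and every score are invented for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: int
    dst: int
    word: str
    logprob: float  # made-up log score


# Toy lattice in which "record" scores better after "a" than after "the".
EDGES = [
    Edge(0, 1, "the", -0.4),
    Edge(0, 2, "a", -1.1),
    Edge(1, 3, "records", -0.5),
    Edge(1, 3, "record", -2.0),
    Edge(2, 3, "record", -0.6),
    Edge(2, 3, "rack", -1.3),
]


def best_path(edges, start, final, required=None):
    """Best-scoring word sequence, optionally restricted to paths with `required`."""
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e.src, []).append(e)

    paths = []

    def walk(node, words, score):
        if node == final:
            paths.append((score, words))
            return
        for e in outgoing.get(node, []):
            walk(e.dst, words + [e.word], score + e.logprob)

    walk(start, [], 0.0)
    if required is not None:
        # Keep only the paths that agree with the user's correction.
        paths = [p for p in paths if required in p[1]]
    if not paths:
        return None
    return max(paths, key=lambda p: p[0])[1]


BEFORE = best_path(EDGES, 0, 3)
AFTER = best_path(EDGES, 0, 3, required="record")
```

Here the unconstrained best path is "the records", but once the user picks "record" the best remaining path is "a record". The word the user never touched changes too, which is the kind of unexpected change the slide warns about.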
Editing • Allows user to add or edit text arbitrarily. • Must synchronize with correction server. • Edit vs. Correct is currently implemented modally with push buttons on-screen. • Gestural interface for correcting and editing would be preferable.
Details... • Correction allows for words not in lattice. • Tap to correct worked better than press-and-hold. • System updates text when user pauses. • Doesn’t handle punctuation, paragraphs, etc. • Correction is fast, but dictation is slow.
Future Work • “Real” user studies. • Experiment more with correction mechanisms. • Implement editing synchronization. • Implement gestures. • Move to wireless network and mic. • Add punctuation, paragraphs, etc.