310 likes | 494 Views
Error Correction of Continuous Handwriting Recognition by Multimodal Fusion. Xiang Ao 11/4/2014. Error correction by speech. Why error correction matters?. Correction of recognition errors is important for a recognition-based interfaces, because Recognition errors are inevitable.
E N D
Error Correction of Continuous Handwriting Recognition by Multimodal Fusion Xiang Ao 11/4/2014
Why error correction matters? • Correction of recognition errors is important for a recognition-based interfaces, because • Recognition errors are inevitable. • Usually, these errors needs correction. • User satisfaction is not only determined by recognition accuracy, but also by • the complexity of error correction dialogues • the amount gained for the effort of correction.
Our approach Existing Correction Techniques • Respeaking • N-best List • Adaptive modalities • Mutimodal correction
Why speech? • We use speech to correct handwriting recognition errors because: • It is natural • It mimics our habit of proofreading. • It is efficient • It needs little effort • It does not make busy hands busier. • It is effective • Complimentarity and redundancy of different modalities • cross-modal dependency
Find the handwriting recognition result whose pronunciation best matches the speech.
The fusion • Task: Find the handwriting recognition result whose pronunciation best matches the speech.
The fusion – the keywords • Find the handwriting recognition result whose pronunciation best matches the speech. • “handwriting recognition result” • What is the search space? • “matches” • “Matching” implies “comparing”. How is the “comparing”? • “Find” • How to make the searching efficient?
is recognized as “棍”. However, it is “概” segmented as should be Handwriting recognition errors and candidates • Handwriting recognition errors • Character recognition errors • Character segmentation (extraction) errors
棍 概 椒 橄 k candidates M Handwriting recognition errors and candidates • Handwriting recognition candidates • Character recognition candidates
Over-segmentation fragment Handwriting recognition errors and candidates • Handwriting recognition candidates • Character segmentation candidates Six graphemic pattern
The number of paths: Handwriting recognition errors and candidates Fragment graph
Handwriting recognition errors and candidates For a text line with T fragments, the number of recognition candidates is:
The fusion – the keywords • Find the handwriting recognition result whose pronunciation best matches the speech. • “handwriting recognition result” • What is the search space? • “matches” • “Matching” implies “comparing”. How is the “comparing”? • “Find” • How to make the searching efficient?
Phoneme • Hanyu pinyin is used as a symbolized pronunciation of a word. • A pinyin is composed of an initial, a final and a tone. • A phoneme is defined as a pair: [initial, final] Initial: t 逃 táo Phoneme: [t, ao] Finla: ao
The fusion – phoneme sequences’ similarity • A phoneme sequence is written as • Similarity of two phoneme sequence is defined as their Levenshtein distance (Edit distance). kitten → sitten (substitution of 's' for 'k') sitten → sittin (substitution of 'i' for 'e') sittin → sitting (insert 'g' at the end)
The fusion – the keywords • Find the handwriting recognition result whose pronunciation best matches the speech. • “handwriting recognition result” • What is the search space? • “matches” • “Matching” implies “comparing”. How is the “comparing”? • “Find” • How to make the searching efficient?
Fusion by an Exhaustive Search S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11,… Compare Speech:
Fusion by an Exhaustive Search • The time complexity of the exhaustive search:
Fusion by a Divide-Conquer Search Over-segmentation Speech:
[0,3],[4,7] [0,2],[3,7] [0,1],[2,7] [0,4],[5,7] [0,5],[6,7] q Fusion by a Divide-Conquer Search
Fusion by a Divide-Conquer Search • The time complexity of the divide-conquer search:
Weighted Phoneme • Speech recognition has errors, which make its phonemes inaccurate. • Candidates of speech recognition could improve the phoneme representation of speech. • Weighted Phoneme
“逃” Weighted Phoneme
Null phonemes Weighted Phoneme • Weighted phonemes can also represent different segmentations in speech recognition
Weighted Phoneme • Similarity of weighted phonemes
The fusion - summary • Find the handwriting recognition result whose pronunciation best matches the speech. • “handwriting recognition result” • Candidates of segmentation and recogntion. • “matches” • Phoneme • Weighted phoneme • Similarity of (weighted) phoneme sequences • “Find” • A divide-conqure search