Speech Recognition Application
Voice Enabled Phone Directory - Yousef Rabah
Process of Speech Recognition
• Speaker dependent vs. Speaker Independent
• Vocabulary Isolated vs. Continuous
• Frequency changes
• Pronunciation
• Speech Processing
• HMM – Probabilities, Parameters, Training
• Phonemes to words
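To make the HMM bullet concrete, here is a minimal sketch of Viterbi decoding over a toy discrete HMM. Everything in it is an illustrative assumption rather than something from the slides: the two states are simply the phones of TWO (T, UW), the observation labels `burst` and `vowel` are made up, and the probabilities are hand-picked instead of trained.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    All probabilities used here must be non-zero, since the sketch works in log space.
    """
    # best[t][s] = (log prob of the best path ending in state s at time t, previous state)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return best[-1][last][0], path

# Toy model (hypothetical numbers, not trained parameters).
STATES = ["T", "UW"]
START_P = {"T": 0.9, "UW": 0.1}
TRANS_P = {"T": {"T": 0.5, "UW": 0.5}, "UW": {"T": 0.1, "UW": 0.9}}
EMIT_P = {"T": {"burst": 0.8, "vowel": 0.2}, "UW": {"burst": 0.1, "vowel": 0.9}}

log_prob, path = viterbi(["burst", "burst", "vowel", "vowel"],
                         STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

A real recognizer uses continuous acoustic features and parameters estimated during training; the sketch only shows how the most likely state sequence is recovered once those probabilities exist.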
Problem
• Automated, speech-driven phone directory assistance with no human operator involved.
Automatic Speech Recognition - Sphinx
• Acoustic modeling
• Language Model
  • Unigrams: <s> & </s>
  • Bigrams: P(word2 | word1)
  • Trigrams: P(word3 | word1, word2)
• Lexicon Structure
  • ZERO Z IH R OW
  • ONE W AH N
  • TWO T UW
  • <sil>
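The language model and lexicon bullets can be illustrated with a small sketch. It estimates bigram probabilities P(word2 | word1) by plain maximum likelihood from a made-up three-sentence corpus, and expands words into phones using the three lexicon entries from the slide. The corpus and the function names are assumptions for illustration; a real Sphinx language model also applies smoothing and backoff, which this omits.

```python
from collections import Counter

# Lexicon entries copied from the slide: word -> phone sequence.
LEXICON = {
    "ZERO": ["Z", "IH", "R", "OW"],
    "ONE": ["W", "AH", "N"],
    "TWO": ["T", "UW"],
}

def train_bigrams(sentences):
    """Maximum-likelihood bigram estimates, with <s> and </s> as sentence markers."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        history_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

def to_phones(words):
    """Expand a word sequence into its phone sequence via the lexicon."""
    return [phone for word in words for phone in LEXICON[word]]

# Hypothetical training corpus.
CORPUS = [["ONE", "TWO"], ["ONE", "ZERO"], ["TWO", "ONE"]]
BIGRAMS = train_bigrams(CORPUS)

print(BIGRAMS[("<s>", "ONE")])    # P(ONE | <s>)
print(to_phones(["ONE", "TWO"]))
```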
Input / Output
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace(null)
 LatID  SFrm  EFrm      AScr      LScr  Type
   254     0    45   -391470    -74100    -1  <sil>
   594    46    81   -472155   -148846     0  H
  1291    82   102   -288621   -148846     0  E
  1850   103   126   -235274   -148846     0  L
  2599   127   147   -430694   -148846     0  L
  2650   148   148         0   -148846     0  </s>
           0   148  -1818214   -818330  (Total)
FWDVIT: H E L L (null)
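As a way to read the backtrace, the sketch below parses its rows and checks that the per-word acoustic scores (AScr) and language scores (LScr) add up to the (Total) line. The row layout is an assumption inferred from the slide output (six numeric columns followed by the word), not a documented Sphinx format, and `parse_backtrace` is a hypothetical helper.

```python
# Backtrace rows as shown on the slide, one word segment per line.
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        lat_id, sfrm, efrm, ascr, lscr, wtype, word = line.split()
        rows.append({
            "lat_id": int(lat_id),
            "start_frame": int(sfrm),
            "end_frame": int(efrm),
            "ascr": int(ascr),
            "lscr": int(lscr),
            "type": int(wtype),
            "word": word,
        })
    return rows

ROWS = parse_backtrace(BACKTRACE)
TOTAL_ASCR = sum(row["ascr"] for row in ROWS)
TOTAL_LSCR = sum(row["lscr"] for row in ROWS)

# Drop silence and the end-of-sentence marker to get the recognized letters.
HYPOTHESIS = " ".join(
    row["word"] for row in ROWS if row["word"] not in ("<sil>", "</s>")
)

print(TOTAL_ASCR, TOTAL_LSCR, HYPOTHESIS)
```

The summed scores match the (Total) row, and the remaining words reproduce the FWDVIT hypothesis.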
Difficulties
• Hardware issues
• ASR software issues
• Letter phonemes: the "E-set" (B, C, D, E, G, P, T, V, Z) sounds alike and is easily confused
• Time
Solution
• Database (PostgreSQL)
• Names
• Numbers
• Phone number
• Fast access
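The database bullets suggest a table of names and phone numbers with an index for fast lookup. The sketch below shows that idea using Python's built-in `sqlite3` as a stand-in so it runs without a server; it is not the PostgreSQL setup from the slides. The table name, column names, and sample entries are all hypothetical.

```python
import sqlite3

def open_directory():
    """Create an in-memory directory with an index on the name column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE directory (name TEXT NOT NULL, phone TEXT NOT NULL)")
    # The index is what keeps name lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_directory_name ON directory (name)")
    conn.executemany(
        "INSERT INTO directory (name, phone) VALUES (?, ?)",
        [("SAM", "555-0100"), ("PAT", "555-0101")],  # made-up sample data
    )
    conn.commit()
    return conn

def lookup(conn, name):
    """Return all phone numbers stored for a name (empty list if none)."""
    cursor = conn.execute(
        "SELECT phone FROM directory WHERE name = ? ORDER BY phone", (name,)
    )
    return [row[0] for row in cursor.fetchall()]

CONN = open_directory()
print(lookup(CONN, "SAM"))
```

The same schema and parameterized query would carry over to PostgreSQL with a different driver and placeholder style.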
Solution
Architecture of application
• User interaction
• Connects to database
• Communicates with Sphinx
• Uses C, Perl, and shell scripts
Example (general idea):
…
PC: Say the letters of the first name; press the space bar before and after you speak.
User: S AA EM
PC: Did you say SAM?
…
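The confirmation step in the example dialog could be sketched as follows. This assumes the recognizer has already produced a list of letters, and it treats the rhyming "E-set" letters as interchangeable when matching against directory names, so that a misheard letter still yields candidates to confirm. The function names and the matching rule are illustrative assumptions, not the application's actual logic.

```python
# Letters that rhyme with "E" and are commonly confused by recognizers.
E_SET = set("BCDEGPTVZ")

def letters_match(recognized, name):
    """True if each recognized letter equals the name's letter,
    or both letters belong to the confusable E-set."""
    if len(recognized) != len(name):
        return False
    return all(
        heard == actual or (heard in E_SET and actual in E_SET)
        for heard, actual in zip(recognized, name)
    )

def candidates(recognized, names):
    """Directory names compatible with the recognized letters, in input order."""
    return [name for name in names if letters_match(recognized, name)]

def confirmation_prompt(name):
    """Build the prompt the PC speaks back to the user."""
    return f"Did you say {name}?"

# Hypothetical directory contents.
NAMES = ["SAM", "PAT", "BAT", "PAM"]

for name in candidates(["S", "A", "M"], NAMES):
    print(confirmation_prompt(name))
```

When several names survive the match (for example PAT and BAT), the application can ask about each in turn instead of guessing.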
Check List
• Reading
• ASR system
• Database - PSQL
• Applications in C, Perl, PHP, vxml, shell