Speech Recognition Application: Voice Enabled Phone Directory - Yousef Rabah (رباح يوسف)
Why a Speech-Enabled Phone Directory? • Growing technology • Easy access • Mainly used for: • Educational purposes • People with certain disabilities • Mobile use
Problem • Automatic, speech-driven phone directory assistance
Automatic Speech Recognition - Sphinx • Speaker Dependent vs. Independent • Acoustic modeling • Isolated vs. Continuous • HMM: Probabilities, Parameters, Training • Language Model • Unigrams: <s> & </s> • Bigrams: P(word2 | word1) • Phonemes • Lexicon Structure • ZERO: Z IH R OW • TWO: T UW • H A HEIGH H
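The bigram term P(word2 | word1) on the slide above can be illustrated with a small maximum-likelihood estimate. This is only a sketch in Python, not the Sphinx training tools and not the Perl used by the application; the function name `bigram_probabilities` and the toy corpus of spelled names are assumptions made for illustration.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(word2 | word1) by maximum likelihood from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # <s> and </s> mark the start and end of each utterance.
        tokens = ["<s>"] + list(words) + ["</s>"]
        # A token counts as a history only when another token follows it.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Hypothetical training data: names spelled letter by letter.
corpus = [["S", "A", "M"], ["S", "U", "E"], ["A", "L"]]
probs = bigram_probabilities(corpus)
print(probs[("<s>", "S")])  # two of the three utterances start with S
print(probs[("S", "A")])    # S is followed by A in one of two cases
```

A real language model would also smooth these estimates so that unseen letter pairs do not get probability zero; that step is omitted here.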
Input / Output
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace (null)
LatID  SFrm  EFrm  AScr      LScr     Type
254    0     45    -391470   -74100   -1   <sil>
594    46    81    -472155   -148846  0    H
1291   82    102   -288621   -148846  0    E
1850   103   126   -235274   -148846  0    L
2599   127   147   -430694   -148846  0    L
2650   148   148   0         -148846  0    </s>
       0     148   -1818214  -818330  (Total)
FWDVIT: H E L L (null)
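As a quick check on how to read the backtrace, the (Total) row is the column-wise sum of the per-word acoustic (AScr) and language (LScr) scores. The Python sketch below copies the six rows by hand and recomputes both totals; the variable names are illustrative and are not part of Sphinx.

```python
# Rows copied from the backtrace: (word, acoustic score, language score).
backtrace = [
    ("<sil>", -391470, -74100),
    ("H", -472155, -148846),
    ("E", -288621, -148846),
    ("L", -235274, -148846),
    ("L", -430694, -148846),
    ("</s>", 0, -148846),
]

total_ascr = sum(ascr for _, ascr, _ in backtrace)
total_lscr = sum(lscr for _, _, lscr in backtrace)
print(total_ascr, total_lscr)  # -1818214 -818330, matching the (Total) row

# The final hypothesis keeps only real words, dropping <sil> and </s>.
hypothesis = " ".join(word for word, _, _ in backtrace if not word.startswith("<"))
print(hypothesis)  # H E L L
```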
Difficulties • Hardware issues • ASR software issues • Letter phonemes • Time
Solution • 4-stage process:
Solution • Database (PostgreSQL) • Names • Phone numbers • Fast access
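A minimal sketch of what such a directory table might look like. The slides do not show the real schema, so the table name `people`, its columns, and the index are all assumptions. Python's built-in `sqlite3` stands in for PostgreSQL here so the example runs without a database server; the SQL is kept simple enough to carry over with minor changes.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per directory entry.
conn.execute(
    """
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL,
        phone TEXT NOT NULL
    )
    """
)
# An index on the searched column is one way to keep lookups fast.
conn.execute("CREATE INDEX idx_people_first_name ON people (first_name)")

# Sample entry taken from the example session in the slides.
conn.execute(
    "INSERT INTO people (first_name, last_name, phone) VALUES (?, ?, ?)",
    ("SAM", "SMITH", "765-973-2145"),
)

# Parameterized query: the recognized name is passed as data, not as SQL text.
rows = conn.execute(
    "SELECT first_name, last_name, phone FROM people WHERE first_name = ?",
    ("SAM",),
).fetchall()
print(rows)
```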
Solution • Architecture of application: db.pm, people.pm, people.pl, record.pl, wav_to_raw.pl, get_speech.pl, display_speech.pm, display_speech.pl, VEPD.pm, VEPD.pl
Example: …
PC: press space bar before and after you speak:
User: S AH EM
PC: Decoded as, SAM ?
Results | 1
1. SAM |SMITH | 765-973-2145
…
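The example session suggests a final step in which the letter-by-letter hypothesis is joined into a name and looked up in the directory. The sketch below illustrates that step in Python rather than the application's Perl. The function names, the in-memory directory, and the reading of "Results | 1" as a match count are all assumptions; the actual modules may work differently.

```python
def letters_to_name(hypothesis):
    """Join a letter-by-letter hypothesis such as '<sil> S A M' into 'SAM'."""
    # Skip filler tokens such as <sil>, <s> and </s>.
    return "".join(tok for tok in hypothesis.split() if not tok.startswith("<"))


def search(directory, first_name):
    """Return every (first, last, phone) entry whose first name matches."""
    return [entry for entry in directory if entry[0] == first_name]


def format_results(matches):
    """Render matches in the layout shown in the example session."""
    lines = [f"Results | {len(matches)}"]
    for number, (first, last, phone) in enumerate(matches, start=1):
        lines.append(f"{number}. {first} |{last} | {phone}")
    return "\n".join(lines)


# Hypothetical in-memory directory holding the entry from the example.
directory = [("SAM", "SMITH", "765-973-2145")]

name = letters_to_name("<sil> S A M")
print(f"Decoded as, {name} ?")
print(format_results(search(directory, name)))
```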
Results • A first step towards a hands-free, speech-enabled phone directory • Speaker independent • Application features: • Adding a user • Retrieving a user (via speech) • Manual search • Viewing the current phone directory
Possible Future Enhancements • ASR enabled for: • Adding users • Phone number search • Word recognition (instead of letters) • More accurate ASR (as the technology grows) • Graphical interface (via Perl/Tk) • Communication through VoiceXML
Special Thanks • To friends and family • Jim Rogers • Hassan Halta • Skylar Thompson • Kushboo Goel • Rabah family • El-Shabab el-taybeh