Wake-up Word Detector

Wake-up Word Detector. Douglas Rauscher, ECE5525, April 30, 2008. Introduction: The purpose of this project is to generate feature vectors and Hidden Markov Models for a single word. Data is processed using Sphinx and Matlab. The Wake-up Word chosen is “Help”.


Presentation Transcript


  1. Wake-up Word Detector
     Douglas Rauscher, ECE5525, April 30, 2008

  2. Introduction
     • The purpose of this project is to generate feature vectors and Hidden Markov Models for a single word.
     • Data is processed using Sphinx and Matlab.
     • The Wake-up Word chosen is “Help”.

  3. Corpus
     • The corpus used is the original WUW_Corpus, provided on the ECE5526 server: ftp://163.118.203.219/CORPORA/WUW_Corpora/WUW_Corpus/
     • This corpus was used because single utterances of the word “Help” were frequent in the data set.
     • Data is in µ-law format.

  4. File lists & Transcriptions
     • Before processing in Sphinx, “transcription” and “fileids” files need to be created:
         • wuw_corpus_train.fileids
         • wuw_corpus_train.transcription
         • wuw_corpus_test.fileids
         • wuw_corpus_test.transcription
     • These were created in Matlab by searching the given “|”-delimited file for “Help” utterances.
     • 80% of the “Help” utterances were used in the training list. The remaining 20% were used in the test list.
     • All utterances that did not contain “Help” were also included in the test set to test for false alarms.
     • A handful of the utterances in the original .trans file were manually removed from the list for one of the following reasons:
         • They had no data bytes in the file.
         • Sphinx had trouble with the sound quality.
         • The utterance was cut off in such a way that Sphinx threw an error.

  5. dcr_extract.m

         close all; clear all; clc;

         % Read every '|'-delimited field of the transcript into one cell array
         A = textread('C:\CMUtutorial\WUW_Corpus\wuw.trans','%s','delimiter','|');

         % Each record starts at its gender field; the other fields follow in order
         idx = 1:length(A);
         idx = idx((strcmp(A,'Male')+strcmp(A,'Female'))>0);
         gender     = A(idx);
         dialect    = A(idx+1);
         phone_type = A(idx+2);
         filename   = A(idx+3);
         CallNO     = A(idx+4);
         UttNO      = A(idx+5);
         Ortho      = A(idx+6);

         AllIdx     = 1:length(Ortho);
         HelpIdx    = AllIdx(strcmp(Ortho,'Help'));
         NotHelpIdx = AllIdx(~strcmp(Ortho,'Help'));

         % The first 80% of the "Help" utterances form the training list
         N = floor(length(HelpIdx)*0.8);

         fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.fileids','w');
         ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.transcription','w');
         for k=1:N
             fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
                 char(filename(HelpIdx(k))),...
                 char(CallNO(HelpIdx(k))));
             fprintf(ftsn,'<s> %s </s> (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
                 char(filename(HelpIdx(k))),...
                 char(CallNO(HelpIdx(k))));
         end
         fclose(fout);
         fclose(ftsn);

         fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.fileids','w');
         ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.transcription','w');

         % Remaining "Help"
         for k=(N+1):length(HelpIdx)
             fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
                 char(filename(HelpIdx(k))),...
                 char(CallNO(HelpIdx(k))));
             fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
                 char(filename(HelpIdx(k))),...
                 char(CallNO(HelpIdx(k))));
         end

         % Other utterances
         for k=1:length(NotHelpIdx)
             fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(NotHelpIdx(k))),...
                 char(filename(NotHelpIdx(k))),...
                 char(CallNO(NotHelpIdx(k))));
             fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(NotHelpIdx(k)))),...
                 char(filename(NotHelpIdx(k))),...
                 char(CallNO(NotHelpIdx(k))));
         end
         fclose(fout);
         fclose(ftsn);
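The split performed by dcr_extract.m can be illustrated on a handful of made-up records. The sketch below is in Python rather than the Matlab used in the slides, and it is only an illustration of the 80/20 logic and the two line formats: the record fields (`ortho`, `filename`, `call_no`), the function names, and the sample word "Operator" are hypothetical and are not taken from the corpus.

```python
import math

def split_help(records, train_frac=0.8):
    """Return (train, test): the first train_frac of the "Help" records go to
    training; the remaining "Help" records plus all other records go to test."""
    help_recs = [r for r in records if r["ortho"] == "Help"]
    others = [r for r in records if r["ortho"] != "Help"]
    n_train = math.floor(len(help_recs) * train_frac)
    return help_recs[:n_train], help_recs[n_train:] + others

def fileid_line(r):
    """One .fileids entry, in the calls/<dir>/WUW<dir>_<call> form used in the slides."""
    return "calls/{0}/WUW{0}_{1}".format(r["filename"], r["call_no"])

def transcription_line(r, sentence_markers):
    """One .transcription entry; the training list wraps the word in <s> ... </s>."""
    text = r["ortho"].upper()
    if sentence_markers:
        text = "<s> " + text + " </s>"
    return "{0} (WUW{1}_{2})".format(text, r["filename"], r["call_no"])

# Hypothetical records: five "Help" utterances and one other word
records = [{"ortho": "Help", "filename": "00001", "call_no": "%03d" % i}
           for i in range(5)]
records.append({"ortho": "Operator", "filename": "00002", "call_no": "007"})

train, test = split_help(records)
for r in train:
    print(fileid_line(r), "|", transcription_line(r, True))
for r in test:
    print(fileid_line(r), "|", transcription_line(r, False))
```

With five "Help" records, four land in the training list; the fifth joins the non-"Help" record in the test list, mirroring the false-alarm test described on slide 4.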

  6. Data preparation
     • Corpus data was originally:
         • File extension .ulaw
         • 8-bit µ-law format
         • 8 kHz sample rate
     • This data must be converted, as .ulaw files are not readable by Sphinx.
     • Format chosen to convert to:
         • File extension .raw
         • 16-bit linear quantization
         • 16 kHz sample rate (linearly interpolated)

  7. ulaw2raw.m

         for k=0:252
             ulaw2raw(sprintf('C:\\CMUtutorial\\WUW_Corpus\\calls\\%05d\\',k),0);
         end

         function ulaw2raw(filepath,playflag)
         % ulaw2raw('C:\CMUtutorial\WUW_Corpus\calls\00000\');
         cd_save = cd;
         cd(filepath);
         files = dir;
         % US standard u-law coeff
         u=255;
         for k=3:length(files)
             if (files(k).isdir==0) && (strcmp(files(k).name(end-4:end),'.ulaw'))
                 disp(files(k).name);
                 fin = fopen(files(k).name,'r');
                 A = fread(fin,'int8');
                 % move data to proper sign
                 A1 = A.*(A<=0)+(127-A).*(A>0);
                 % remove u-law
                 B1 = sign(A1).*(1/u).*(((1+u).^abs(A1/128))-1);
                 B2 = reshape([B1,((B1+[B1(2:end);0])./2)].',1,[]);
                 if(playflag)
                     sound(B2,16000)
                     pause(length(B2)/16000);
                 end
                 fclose(fin);
                 generateRawWav(files(k).name(1:end-5),B2);
             end
         end
         cd(cd_save);

         function generateRawWav(filename,data)
         fout = fopen(strcat(filename,'.raw'),'w');
         dataq = round(32768.*data./128);
         fwrite(fout,dataq,'int16');
         fclose(fout);
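The two numerical steps in ulaw2raw.m, µ-law expansion and 2x linear interpolation, can be sketched in isolation. The Python below is an illustration, not a port of the script: it uses the continuous µ-law expansion formula on values already normalized to [-1, 1], it is not a bit-exact G.711 decoder, and it skips the byte-level sign handling and the output scaling used in the Matlab code. The function names are hypothetical.

```python
import math

MU = 255  # mu-law companding constant, the same value as `u` in ulaw2raw.m

def mu_expand(y):
    """Continuous mu-law expansion: map a companded value y in [-1, 1]
    back to a linear amplitude in [-1, 1]."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def upsample2_linear(samples):
    """Double the sample rate by inserting, after each sample, the midpoint
    between it and the next one. The last inserted value averages the final
    sample with 0, which matches the zero padding in ulaw2raw.m."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else 0.0
        out.extend([s, (s + nxt) / 2])
    return out

def to_int16(x):
    """Quantize a linear amplitude in [-1, 1] to a 16-bit signed integer."""
    return max(-32768, min(32767, round(x * 32767)))

# Hypothetical companded samples at 8 kHz
companded = [0.0, 0.5, 1.0, -0.5]
linear = [mu_expand(y) for y in companded]
pcm_16k = [to_int16(x) for x in upsample2_linear(linear)]
print(pcm_16k)
```

Linear interpolation doubles the sample count without adding information above 4 kHz; it simply lets Sphinx read the data at 16 kHz, as described on slide 6.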

  8. Language model creation
     • For a Wake-up Word recognizer, a language model is not particularly useful for detecting the word.
     • Sphinx allows you to weight the priority of the language model in its calculations, but does not appear to allow the user to disable the language model altogether.
     • Therefore, to avoid errors, a custom language model had to be created.
     • The lmtool generator was used to convert a text file that contained only the word “Help” to a .lm file: http://www.speech.cs.cmu.edu/tools/lmtool.html
     • The lm3g2dmp tool was used to convert the .lm file to .lm.DMP format:

         run cmd
         cd C:\CMUtutorial\lm3g2dmp\Debug
         > lm3g2dmp 7092.lm ./

  9. Training the Model
     • The Sphinx training configuration file was edited to use the proper input files.
     • The maximum number of Gaussians was set to 8.
     • The number of HMM states was increased from 3 to 5, without significant improvement.
     • Sphinx commands:

         cd c:/CMUtutorial/WUW_Corpus/
         perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_train.fileids -cfg etc/sphinx_train.cfg -param etc/feat.params
         perl scripts_pl/RunAll.pl
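To make the "number of HMM states" setting concrete, the sketch below builds the transition matrix of a simple left-to-right HMM in which each state can either repeat or advance to the next one. It is written in Python as a generic illustration; it assumes a plain no-skip topology with a uniform self-loop probability, and it does not reproduce how Sphinx initializes or trains its own transition matrices. The function name and the `p_stay` parameter are hypothetical.

```python
def left_to_right_transitions(n_states, p_stay=0.5):
    """Return an n_states x (n_states + 1) row-stochastic matrix.
    Column n_states is a non-emitting exit state; each emitting state i
    either stays in place (p_stay) or moves on to state i + 1."""
    rows = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = p_stay            # self-loop
        row[i + 1] = 1.0 - p_stay  # advance to the next state (or exit)
        rows.append(row)
    return rows

# Compare the 3-state and 5-state topologies mentioned on the slide
for n in (3, 5):
    print(n, "states")
    for row in left_to_right_transitions(n):
        print("  ", row)
```

Going from 3 to 5 states gives the model more sequential segments with which to describe the word, at the cost of more parameters to estimate from the same training utterances.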

  10. Testing the Model
     • The Sphinx testing configuration file was edited to use the proper input files.
     • The language model weight was set to “1” (the lowest allowable setting).
     • The number of Gaussians was set to 8 to match the training configuration.
     • Sphinx commands:

         perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_test.fileids -cfg etc/sphinx_decode.cfg -param etc/feat.params
         perl scripts_pl/decode/slave.pl

  11. Sphinx Output
     • Sphinx was used to calculate acoustic scores only, not to perform thresholding.
     • The resulting scores were parsed in Matlab, and PDF/CDF plots were generated.
     • See the attached output document for the raw Cygwin output.

  12. plotDistributions.m

         % plotDistributions
         clear all; clc; close all;
         fn = 'C:\CMUtutorial\WUW_Corpus\logdir\decode\wuw_corpus-1-1.log';
         RawText = textread(fn,'%s');

         % Collect the 8 tokens of every "fv:" result line whose hypothesis is HELP.
         % The loop stops 7 short of the end so that k+7 stays in range.
         idx = [];
         for k=1:(length(RawText)-7)
             if(~isempty(findstr(char(RawText(k)),'fv:')) &&...
                     strcmp(char(RawText(k+1)),'HELP'))
                 idx = [idx; k:k+7];
             end
         end
         RawText = RawText(idx);

         % fetch and plot Acoustic Score histograms
         HelpAScr = [];
         FalsAScr = [];
         for k=1:size(RawText,1)
             if(findstr(char(RawText(k,1)),'_008>'))
                 % True HELP
                 HelpAScr = [HelpAScr str2num(char(RawText(k,5)))];
             else
                 % Not a HELP
                 FalsAScr = [FalsAScr str2num(char(RawText(k,5)))];
             end
         end

         % Normalized histograms over a common set of 101 bin centers
         mn = min(min(HelpAScr),min(FalsAScr));
         mx = max(max(HelpAScr),max(FalsAScr));
         vals = mn:((mx-mn)/100):mx;
         HelpAScrHist = hist(HelpAScr,vals);
         HelpAScrHist = HelpAScrHist./sum(HelpAScrHist);
         FalsAScrHist = hist(FalsAScr,vals);
         FalsAScrHist = FalsAScrHist./sum(FalsAScrHist);

         % Cumulative sum from the left for "Help", from the right for the others
         for k=1:length(vals)
             HelpAScrCDF(k) = sum(HelpAScrHist(1:k));
             FalsAScrCDF(k) = sum(FalsAScrHist(k:end));
         end

         figure;
         subplot(2,1,1);
         plot(vals,HelpAScrHist,'b',vals,FalsAScrHist,'r');
         title('Probability Density Function')
         legend('Help','Other Utterances')
         axis([mn,mx,0,1.1*max(max(HelpAScrHist),max(FalsAScrHist))]);
         subplot(2,1,2);
         plot(vals,HelpAScrCDF, 'b',vals,FalsAScrCDF, 'r');
         title('Cumulative Distribution Function')
         axis([mn,mx,0,1.1]);

  13. plotDistributions.m
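Slide 11 notes that Sphinx was used for acoustic scoring only, with no thresholding. A minimal sketch of how a threshold could be chosen from the two score sets is given below, in Python rather than the Matlab used in the slides. It assumes that “Help” utterances tend to score higher than other utterances, which is consistent with the way plotDistributions.m accumulates one curve from the left and the other from the right. The function names and the example scores are hypothetical; they are not values from this project.

```python
def error_rates(help_scores, other_scores, threshold):
    """Accept an utterance as "Help" when its score is >= threshold.
    Returns (miss_rate, false_alarm_rate)."""
    miss = sum(s < threshold for s in help_scores) / len(help_scores)
    false_alarm = sum(s >= threshold for s in other_scores) / len(other_scores)
    return miss, false_alarm

def equal_error_threshold(help_scores, other_scores):
    """Among the observed scores, return the threshold at which the miss
    rate and the false-alarm rate are closest to each other."""
    candidates = sorted(set(help_scores) | set(other_scores))

    def gap(t):
        miss, false_alarm = error_rates(help_scores, other_scores, t)
        return abs(miss - false_alarm)

    return min(candidates, key=gap)

# Hypothetical acoustic scores (log domain, so negative)
help_scores = [-100, -90, -80, -70]
other_scores = [-200, -150, -120, -95]

t = equal_error_threshold(help_scores, other_scores)
print(t, error_rates(help_scores, other_scores, t))
```

The returned threshold corresponds to the point where the two cumulative curves in the second subplot cross. A deployed detector might prefer a different operating point, for example one that trades a higher miss rate for fewer false alarms.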

  14. Conclusions
     • Sphinx had problems correctly detecting the word “Help” in this test, but a decent model was clearly created.
     • The test set was rather constrained and limited, and would benefit from a much larger sampling of “Help” utterances.
     • Sphinx features that would have been nice:
         • Native .ulaw file input
         • A simpler mechanism to input the sample rate
         • Native text file input for the language model, by integrating the .lm generator and .lm.DMP converter into Sphinx
         • Better handling of utterance fragments
