Research Presentation:

Research Presentation: On a Utility for Speaker Verification
Seungchan Lee
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering

Set up a standard IES environment
• My first impression of CAVS was good.
• The first thing to do was to set up the IES environment.
• Create an enlistment.
• Our production system consists of many classes.
• I was surprised at the structure of our software environment. Even though much work has already been done, I need to keep my copy of the system consistent with the work of the other IFCers.
• GroupWise: good communication and schedule-management tools within our group.
• After that, I could write a program and compile it on my local machine.
[Figure: CVS client and server repository]

First IFC program, instructions
• My first simple IFC program performs the following steps:
- Reads a 3×3 float matrix from an Sof file.
- Reads a 3×1 float vector from an Sof file.
- Multiplies the matrix and the vector using the equation Z = alpha*A*B.
- Writes the result to an Sof file.
- Allows the value of alpha to be set from the command line:
foo.exe -alpha 2.0 input.sof output.sof

First IFC program, flow
foo.exe -alpha 2.0 input_file output_file
Read input Sof file → read 3×3 float matrix → read 3×1 float vector → multiply the matrix and the vector → write to output Sof file
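The matrix-vector step and the -alpha option can be sketched in plain C++ as follows. This is a minimal sketch, not the IFC program itself. The Sof reading and writing are left out because the ISIP Sof calls are not shown here. The names scaledMatVec and parseAlpha, the std::array types, and the caller-supplied default alpha are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <string>

using Matrix3 = std::array<std::array<float, 3>, 3>;  // A: 3x3 matrix
using Vector3 = std::array<float, 3>;                 // B and Z: 3x1 vectors

// Z = alpha * A * B for a 3x3 matrix A and a 3x1 vector B.
Vector3 scaledMatVec(float alpha, const Matrix3& a, const Vector3& b) {
  Vector3 z{};  // zero-initialized accumulator
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) z[i] += a[i][j] * b[j];
    z[i] *= alpha;
  }
  return z;
}

// Returns the value that follows "-alpha" on the command line,
// or defaultAlpha (a hypothetical fallback) if the option is absent.
float parseAlpha(int argc, const char* const argv[], float defaultAlpha) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (std::strcmp(argv[i], "-alpha") == 0) return std::stof(argv[i + 1]);
  }
  return defaultAlpha;
}
```

For the command line foo.exe -alpha 2.0 input.sof output.sof, parseAlpha(argc, argv, 1.0f) returns 2.0. The remaining two arguments would name the input and output Sof files.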

First IFC program
• After completing my first IFC program, I am more familiar with our production system.
• When I have questions about the production system, the experienced members of our group always help me.
• It is good to study alone, but sometimes it is better to ask an expert in programming.
• The more I learn about our production system, the more questions I have.

First IFC program
• First question
- How can we view the contents of a class?
• Answer
- Use the debug method. Viewing the contents of an Sof object in a debugger is very hard, so instead I used the debug method included in the source code. This can slow down debugging somewhat, but it is the best approach I have found so far. With it, I can see which data an Sof object contains. The same applies to the variables of all other classes.

First IFC program
• Second question
- Using the debug method, why does the string capacity change?
SysString token = "oscar had a heap of apples"
• Using the debug method, we can see each value:
<SysString::token> value_d = (5 >= 5) oscar
<SysString::token> value_d = (5 >= 3) had
<SysString::token> value_d = (5 >= 1) a
<SysString::token> value_d = (5 >= 4) heap
<SysString::token> value_d = (5 >= 2) of
<SysString::token> value_d = (6 >= 6) apples
• Answer
- In the expression (n >= m), n is the total capacity of the data structure and m is the current length. So for the line
<SysString::token> value_d = (5 >= 3) had
the capacity of the SysString is 5 and the current length is 3, which is obvious from the string "had". The capacity stays at 5 for the shorter tokens and grows to 6 only when "apples" (6 characters) is stored.
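The grow-only behavior in the debug output can be reproduced with a small hypothetical class. TokenBuffer is not SysString. Its policy is inferred only from the six debug lines above: capacity rises to the longest value assigned so far and is never reduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for SysString's storage policy (an assumption
// inferred from the debug output, not the real implementation).
class TokenBuffer {
 public:
  void assign(const std::string& s) {
    capacity_ = std::max(capacity_, s.size());  // grow when needed, never shrink
    value_ = s;
  }
  std::size_t capacity() const { return capacity_; }
  std::size_t length() const { return value_.size(); }

  // Mimics the "(n >= m) value" part of a debug line.
  std::string debug() const {
    assert(capacity_ >= value_.size());  // invariant: capacity covers the length
    return "(" + std::to_string(capacity_) + " >= " + std::to_string(length()) + ") " + value_;
  }

 private:
  std::size_t capacity_ = 0;
  std::string value_;
};
```

Assigning oscar, had, a, heap, of, and apples in turn gives capacities 5, 5, 5, 5, 5, 6, which matches the debug lines.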

First IFC program
• Third question
- What does "L" mean?
- In our production system, string literals passed to our classes use the "L" prefix. For example:
SysString file1;
file1.assign(L"/tmp/foo_bin.sof");
- I did not figure out exactly why this "L" is used.
• Answer
→ The "L" is a C++ string-literal prefix, not a macro. It tells the compiler that the following string is a wide-character (wchar_t) string, which is how Unicode text is passed to classes such as SysString.
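The effect of the prefix can be checked at compile time with standard C++ alone. The sketch uses std::wstring as a stand-in because SysString's own storage type is not shown here, and widePath is a hypothetical helper. The width of wchar_t depends on the platform: it is typically 4 bytes on Linux and 2 bytes on Windows.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// L"ab" is a wide string literal: an array of 3 wchar_t (2 characters + terminator).
static_assert(std::is_same<decltype(L"ab"), const wchar_t (&)[3]>::value,
              "L\"ab\" is an array of wchar_t");
static_assert(sizeof(L"ab") == 3 * sizeof(wchar_t), "each element is one wchar_t");

// Without the prefix, the same text is an ordinary narrow char array.
static_assert(std::is_same<decltype("ab"), const char (&)[3]>::value,
              "\"ab\" is an array of char");

// std::wstring accepts an L"..." literal directly (hypothetical helper).
std::wstring widePath() { return L"/tmp/foo_bin.sof"; }
```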

ISIP_VERIFY
• Basic work flow
- Decide what to add and remove in the new version
- Analyze the old version
- Draw a class diagram
- Design the new version
- Code and compile
- Test and fix bugs

ISIP_VERIFY
• Decide what to add and remove in the new version
- Currently, isip_verify performs speaker verification but uses only the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, and RVM algorithms. The new version of isip_verify will therefore be a more general utility than the old version.
• Analyze the old version
- The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and it does both training and testing. Unlike the GMM case, the SVM statistical model has two separate utilities, isip_svm_learn and isip_svm_classify. isip_svm_learn performs training, while isip_svm_classify performs testing.

ISIP_VERIFY
• The problem:
1. isip_verify can only use the GMM statistical model.
2. We do not have an RVM routine that does the same job as the SVM utilities.
• Solution:
1. Add SVM and RVM routines to isip_verify.
2. Add the same functionality to the RVM class.
3. Modify the SpeakerVerifier class.
• With these changes, we can make one utility that performs all of the functions mentioned above.
• To begin with, I drew class block diagrams of each utility to make sure I understood the relationships among the classes and functions. After that, I could understand these utilities more easily. Next, I drew the flow chart of the new utility.

ISIP_VERIFY Class Block Diagram
[Figure: class block diagram of the old isip_verify]
• isip_verify (util/speech): parameter check
• SpeakerVerifier (asr): run(); if algorithm = HMM, use VerifyHMM (pr) and HiddenMarkovModel (pr)
- If algorithm = TRAIN: implementation = BAUM WELCH; train and create the model
- If algorithm = VERIFY: set algorithm, set implementation, call verify()
- If implementation = LIKELIHOOD: verifyl(); if implementation = LIKELIHOOD RATIO: verifylr()
- linearDecoder(), run()
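The two verification modes in the diagram, LIKELIHOOD and LIKELIHOOD RATIO, can be illustrated with the usual scoring rules from speaker verification. This is a generic sketch, not the code of verifyl() or verifylr(). It makes three assumptions:
- Per-frame log-likelihoods are already available from a claimed-speaker model and, for the ratio, a background model.
- Scores are averaged over frames.
- The averaged score is compared with a threshold.

All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// LIKELIHOOD-style score: average per-frame log-likelihood under the speaker model.
double likelihoodScore(const std::vector<double>& speakerLogLik) {
  assert(!speakerLogLik.empty());
  return std::accumulate(speakerLogLik.begin(), speakerLogLik.end(), 0.0) /
         speakerLogLik.size();
}

// LIKELIHOOD RATIO-style score: average of log p(x|speaker) - log p(x|background).
double likelihoodRatioScore(const std::vector<double>& speakerLogLik,
                            const std::vector<double>& backgroundLogLik) {
  assert(!speakerLogLik.empty() && speakerLogLik.size() == backgroundLogLik.size());
  double sum = 0.0;
  for (std::size_t t = 0; t < speakerLogLik.size(); ++t)
    sum += speakerLogLik[t] - backgroundLogLik[t];
  return sum / speakerLogLik.size();
}

// Accept the claimed identity when the score reaches the threshold.
bool accept(double score, double threshold) { return score >= threshold; }
```

The ratio form normalizes the speaker score against a model of other speakers. This makes a single threshold less sensitive to how well the audio matches speech in general.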

ISIP_VERIFY: ISIP_SVM_LEARN
[Figure: class block diagram of isip_svm_learn]
• isip_svm_learn (util/speech): parameter check
• SupportVectorMachine (pr)
- loadFeature(): positive examples, negative examples
- train(): if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION, call sequentialMinimalOptimization() to determine the support vectors
- writeModel()
• StatisticalModel (stat), derived from StatisticalModelBase, with type SupportVectorModel
• SupportVectorModel (stat): getSupportVectorModel(), getBias(), getKernels(), getAlphas(), getSupportVectors(), write()

ISIP_VERIFY: ISIP_SVM_CLASSIFY
[Figure: class block diagram of isip_svm_classify]
• isip_svm_classify (util/speech)
• Classes used: StatisticalModel (stat), FeatureFile (mmedia), AudioDatabase (mmedia)
• Methods called: read(), open(), getBufferData(), getRecord(), getSupportVectorModel(), getDistance()
• Write the distance to the output file
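isip_svm_classify writes out a distance that comes from the support vector model built by isip_svm_learn: the support vectors, their alphas, the bias, and a kernel. A common form of that score is sketched below. Several details are assumptions, because the slide does not say which kernel or conventions SupportVectorModel uses:
- the RBF kernel and its gamma parameter,
- the convention that each alpha already carries its class label,
- the sign of the bias.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Radial basis function kernel K(a, b) = exp(-gamma * |a - b|^2),
// one common SVM kernel (an assumption for this sketch).
double rbfKernel(const Vec& a, const Vec& b, double gamma) {
  assert(a.size() == b.size());
  double d2 = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
  return std::exp(-gamma * d2);
}

// Decision value f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b, where each
// entry of signedAlphas is assumed to be alpha_i * y_i already.
double svmDistance(const std::vector<Vec>& supportVectors, const Vec& signedAlphas,
                   double bias, const Vec& x, double gamma) {
  assert(supportVectors.size() == signedAlphas.size());
  double f = bias;
  for (std::size_t i = 0; i < supportVectors.size(); ++i)
    f += signedAlphas[i] * rbfKernel(supportVectors[i], x, gamma);
  return f;
}
```

A positive value places x on the side of the positive (target speaker) examples, and a negative value places it on the side of the negative examples.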

ISIP_VERIFY Flow Chart
[Figure: flow chart of the new isip_verify]
isip_verify -param.sof .... -algo_type [hmm, svm, rvm] -mode [train, test]
• Check the algo_type option. If no algo_type is specified, HMM is chosen.
• HMM: check that statistical_model = GMM; if not, error (model incorrect). As in the old isip_verify, the VerifyHMM class processes the parameter file and does both training and testing: gmmVerify().
• SVM: check that statistical_model = SVM; if not, error (model incorrect). Then check the mode option: train → svmTrain(), test → svmTest(); otherwise error ("You must specify mode").
• RVM: check that statistical_model = RVM; if not, error (model incorrect). Then check the mode option: train → rvmTrain(), test → rvmTest(); otherwise error ("You must specify mode").
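The decision logic in the flow chart can be written as one dispatch function. This sketch does not call gmmVerify(), svmTrain(), svmTest(), rvmTrain(), or rvmTest(). Instead, it returns which of them would run. The Action enum, the chooseAction name, and the use of exceptions for the error boxes are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Which routine the new isip_verify would run; names follow the flow chart.
enum class Action { GMM_VERIFY, SVM_TRAIN, SVM_TEST, RVM_TRAIN, RVM_TEST };

// algoType may be empty (defaults to hmm); statisticalModel is the model
// type read from the parameter file; mode is "train" or "test".
Action chooseAction(std::string algoType, const std::string& statisticalModel,
                    const std::string& mode) {
  if (algoType.empty()) algoType = "hmm";  // no -algo_type given: HMM is chosen
  if (algoType == "hmm") {
    if (statisticalModel != "GMM") throw std::invalid_argument("model incorrect");
    return Action::GMM_VERIFY;  // old path: handles training and testing itself
  }
  const bool svm = (algoType == "svm");
  if (!svm && algoType != "rvm") throw std::invalid_argument("unknown algo_type");
  if (statisticalModel != (svm ? "SVM" : "RVM"))
    throw std::invalid_argument("model incorrect");
  if (mode == "train") return svm ? Action::SVM_TRAIN : Action::RVM_TRAIN;
  if (mode == "test") return svm ? Action::SVM_TEST : Action::RVM_TEST;
  throw std::invalid_argument("You must specify mode");
}
```

Keeping the dispatch in one place means each new algorithm adds one branch. The shared checks (model type, mode) stay the same for SVM and RVM.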

ISIP_VERIFY
• Coding and compilation
1. Add and remove parameters, and check the parameters (Won)
2. Combine the three functionalities
- The new isip_verify calls its run() method, and run() calls a support vector machine object or a relevance vector machine object, which then performs training. This lets us implement all three models in one utility.
3. SpeakerVerifier class
- Include the SVM and RVM classes
- Modify the parameter check
- Modify the run(sdb) method
- Add a run(pos_sdb, neg_sdb) method
4. RVM class
- Add training and testing modules (Sridhar)

ISIP_VERIFY
• Problems during coding and compilation
- How do we verify the SpeakerVerifier class?
After modifying an existing class, we need to verify its correctness. In our production system, the diagnose method performs this check. It is implemented in the *_02.cc file of every class. After compiling the class, we run "make test", which automatically checks every function in that class.
- How can we resolve a segmentation fault?
Finding the cause is one of the most difficult things. Comment out all new modules, then add one module back, compile the class, and test it. Repeat until every new module has been tested.
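A diagnose method is a self-test built into the class. The hypothetical RunningMean class below sketches the pattern: a static diagnose() exercises each public method, logs what failed, and returns pass or fail. The real ISIP diagnose signature and conventions may differ.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical class illustrating a built-in self-test.
class RunningMean {
 public:
  void add(double x) { sum_ += x; ++count_; }
  double mean() const { return count_ == 0 ? 0.0 : sum_ / count_; }
  void clear() { sum_ = 0.0; count_ = 0; }

  // Exercises every public method; returns false at the first failure.
  static bool diagnose(std::ostream& log) {
    RunningMean m;
    if (m.mean() != 0.0) { log << "mean() of empty object failed\n"; return false; }
    m.add(2.0);
    m.add(4.0);
    if (m.mean() != 3.0) { log << "add()/mean() failed\n"; return false; }
    m.clear();
    if (m.mean() != 0.0) { log << "clear() failed\n"; return false; }
    log << "RunningMean: diagnostics passed\n";
    return true;
  }

 private:
  double sum_ = 0.0;
  long count_ = 0;
};
```

Because the test lives next to the class, a "make test"-style target only needs to call each class's diagnose() and check the result.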

ISIP_VERIFY
• Problems during coding and compilation
- Compilation and debugging time
- When developing a new program, one of the most time-consuming tasks is compiling and debugging.
- In our production system, compiling and debugging a program takes a long time because there are many linking steps.
- How can we resolve this?
→ It is faster to work in our local repository.

ISIP_VERIFY
• Testing and fixing bugs
- This step is as important as the previous ones, because it is where we find faults and missing pieces.
• Problem:
- What happens to the sdb object? Normally, the sdb object contains every command-line option (except parameters). However, the sdb object loses its contents when it is passed to the SpeakerVerifier class.
- How can we fix that? Comment out all code except the control code. Doing this showed that the cause was that I had not given the list file option.

Software Release
• What do we need to know for a software release?
- Varmint utility: to track down all problems
- Production system: to understand our system better, I have studied and will continue to study the following:
Data preparation
Feature extraction
Recognition
Acoustic modeling
Language modeling
→ These are explained in more detail after this topic.

Software Release
• ProductionRuleTokenType class
- It uses many if-else statements in its read/write functions.
- Instead, we can use the NameMap class.
- To do that, declare a NameMap object and modify the related modules.
• Problem: I encountered run-time errors.
• Solution: I wrote a simple program that includes the diagnose method in prtt_02.cc. After tracking down the failing function, I found the cause.
• This was the first class I checked in to our production system.
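Replacing the if-else chains can be sketched with one table that both read and write consult. NameMap's actual interface is not shown here, so a plain std::array stands in for it. The TokenType values and their names are hypothetical rather than the real ProductionRuleTokenType values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token types; NUM_TYPES marks the table size.
enum class TokenType { TERMINAL, NON_TERMINAL, EPSILON, NUM_TYPES };

// One name per enum value, in enum order (stand-in for a NameMap).
const std::array<const char*, static_cast<std::size_t>(TokenType::NUM_TYPES)> TOKEN_NAMES = {
    {"terminal", "non_terminal", "epsilon"}};

// write(): enum -> name, replacing one if-else chain.
std::string tokenName(TokenType type) {
  const auto index = static_cast<std::size_t>(type);
  assert(index < TOKEN_NAMES.size());
  return TOKEN_NAMES[index];
}

// read(): name -> enum, replacing the other chain; false if the name is unknown.
bool tokenFromName(const std::string& name, TokenType& type) {
  for (std::size_t i = 0; i < TOKEN_NAMES.size(); ++i) {
    if (name == TOKEN_NAMES[i]) {
      type = static_cast<TokenType>(i);
      return true;
    }
  }
  return false;
}
```

With a table, adding a token type means adding one enum value and one name, instead of editing two if-else chains.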

Software Release
• isip_lm_tester
- This utility randomly generates sentences based on a language model file and uses them to test the language model.
• Problem:
- Currently, state transcription generation does not go past the first symbol at the highest level.
• What to do?
- I need to track down this problem, but that requires an understanding of language modeling.
→ I will read and study our tutorial on the production system thoroughly, and then take part in fixing the bugs in isip_lm_tester.

Production System
• In this part, I go from data preparation to feature extraction.
• How can we better understand our production system?
- Data preparation
- Feature extraction
- Recognition
- Acoustic modeling
- Language modeling

Production System, Data Preparation
• Data preparation
• Why is it difficult for a beginner?
- In ordinary programming, preparing input data is not hard.
- In our production system, however, it is not easy for a beginner.
→ It requires knowledge of speech, including speech file formats, file conversion, and sampling.
• Speech file
- Header + sampled data
- Sampled data only → raw files
[Figure: file layouts for WAV and Sof, SPHERE and AU (header + data), and raw (data only)]

Production System, Data Preparation
• Sof format
- Stores information about the location of each object in the file, along with the corresponding object data.
- Supports two basic storage formats:
text: human-readable files
binary: sampled data
- Used by all data objects in the ISIP environment to unify and simplify I/O.
• Binary format:
- Handles machine architecture differences with automatic byte transformations.
- Used for large quantities of data because of the obvious efficiency gains.
- The objects are stored in a binary tree, and a symbol table holds the object class names.

Production System, Data Preparation
• Text format:
- Used for user input parameter files in the ISIP environment.
- A simple format that consists of object names and tags, followed by the object data.
- Example:
@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
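To show how simple the text format is, here is a minimal reader for the example above. It is a sketch, not the ISIP Sof parser. It assumes one object header or value line per line, treats every value as a double, and ignores everything else the real format supports. SofObject and parseTextSof are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One object from a text Sof file: class name, tag, and values.
struct SofObject {
  std::string name;
  long tag = 0;
  std::vector<double> values;
};

std::vector<SofObject> parseTextSof(const std::string& text) {
  std::vector<SofObject> objects;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.size() >= 2 && line.front() == '@' && line.back() == '@') {
      // Object header of the form "@ ClassName tag @".
      std::istringstream header(line.substr(1, line.size() - 2));
      SofObject obj;
      header >> obj.name >> obj.tag;
      if (obj.name == "Sof") continue;  // file header "@ Sof v1.0 @", not an object
      objects.push_back(obj);
    } else if (line.rfind("value =", 0) == 0 && !objects.empty()) {
      // Data line "value = a,b,...;" belongs to the most recent object.
      std::string list = line.substr(7);  // text after "value ="
      const std::size_t semi = list.find(';');
      if (semi != std::string::npos) list.erase(semi);
      std::istringstream items(list);
      std::string item;
      while (std::getline(items, item, ','))
        objects.back().values.push_back(std::stod(item));
    }
  }
  return objects;
}
```

For the example file, this returns three objects: Float with 1 value, VectorFloat with 3 values, and VectorLong with 2 values.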

Production System, Data Preparation
• Converting from an external format (e.g., SPHERE, WAV) to raw format: speech.sph → speech.raw
1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode:
→ w_decode -o pcm speech.sph speech-nb.sph
2. Strip the file's header using h_strip:
→ h_strip speech-nb.sph speech.raw
3. The result, speech.raw, is identical to the decoded file except that the first 1024 bytes of header information are missing.
4. One-line command combining steps 1 and 2:
w_decode -o pcm speech.sph - | h_strip - - > speech.raw
[Figure: header + data → data only]
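The header-stripping step can be sketched in C++. A NIST SPHERE header starts with the line NIST_1A, and its second line gives the header length in bytes (1024 here). This sketch only removes that header, as h_strip does. It does not decode compressed data, which is the job of w_decode. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Header size declared by a SPHERE file: line 1 is "NIST_1A",
// line 2 holds the header length in bytes, padded with blanks.
std::size_t sphereHeaderSize(const std::vector<char>& file) {
  const std::size_t n = std::min<std::size_t>(file.size(), 16);
  const std::string start(file.begin(), file.begin() + static_cast<std::ptrdiff_t>(n));
  if (start.rfind("NIST_1A\n", 0) != 0) throw std::runtime_error("not a NIST SPHERE file");
  return std::stoul(start.substr(8));  // std::stoul skips the leading blanks
}

// Like h_strip: keep only the bytes after the header.
std::vector<char> stripSphereHeader(const std::vector<char>& file) {
  const std::size_t headerBytes = sphereHeaderSize(file);
  if (headerBytes > file.size()) throw std::runtime_error("truncated SPHERE file");
  return std::vector<char>(file.begin() + static_cast<std::ptrdiff_t>(headerBytes), file.end());
}
```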

Production System, Data Preparation
• Verifying the conversion to raw
• SoX: audio playback
→ sox -t .sw -r 16000 speech.raw -t .au speech.au
→ audioplay speech.au
• File size comparison using "ls -l"
→ ls -l speech.*
-rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw
-rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph
→ The fifth field is the file size. speech.raw is 1024 bytes smaller than speech.sph (98510 - 97486 = 1024).
• Octal dump (od): listing values
→ od -t d2 speech.raw
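What od -t d2 lists can be reproduced by reading pairs of bytes as signed 16-bit samples. The sketch makes two assumptions:
- little-endian byte order, as on an x86 host (a particular raw file may use a different order);
- mono audio at the 16 kHz rate given to sox.

Under those assumptions, the 97486-byte speech.raw holds 48743 samples, or about 3.05 seconds. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode raw 16-bit signed PCM, assuming little-endian byte order.
std::vector<std::int16_t> decodeRaw16LE(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::int16_t> samples;
  samples.reserve(bytes.size() / 2);
  for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
    // Low byte first, then high byte.
    const auto u = static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
    samples.push_back(static_cast<std::int16_t>(u));
  }
  return samples;
}

// Duration in seconds of a raw file of 16-bit mono samples.
double rawDurationSeconds(std::size_t fileBytes, double sampleRate) {
  return static_cast<double>(fileBytes / 2) / sampleRate;
}
```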

Production System, Data Preparation
• Creating an Sof file: raw file → Sof file
• Use isip_make_sof. Type the following:
isip_make_sof speech.raw
This creates a binary Sof file.
• To create a text version, type the following:
→ isip_make_sof -type text -suffix _text speech.raw
[Figure: raw data → isip_make_sof → Sof file (header + data)]

Production System, Feature Extraction
• What is feature extraction?
- A speech recognizer does not understand the human voice directly.
- Only certain features of the human voice are useful for recognizer decoding.
- These features must be numerically measured and stored → feature vector.
- The process of taking these measurements is known as feature extraction. It includes:
converting the signal to a digital form
measuring some important characteristics of the signal
augmenting these measurements
[Figure: human voice → microphone → digital signal]

Production System, Feature Extraction
• Frame
- The typical frame duration in speech recognition is 10 ms.
- Determines how often we produce a feature vector.
• Window
- The typical window duration is 25 ms.
- Surrounds the frame to give a smoother representation of the speech data.
- Determines the number of samples used for each measurement.
• Sampling rate
- The number of samples per second taken from a continuous signal to make a discrete signal.
- Example: with an 8 kHz sampling rate and a frame duration of 10 ms, each frame spans 80 samples (8000 × 0.010), and one feature vector is produced per frame. With the typical 25 ms window, each measurement uses 200 samples (8000 × 0.025).
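The arithmetic in the example can be written as two small helpers. The names are hypothetical. This sketch also assumes that a partial final frame produces no feature vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Samples spanned by a duration at a given sampling rate, rounded to the
// nearest integer (e.g. 8000 Hz * 0.010 s = 80 samples).
std::size_t samplesIn(double sampleRate, double seconds) {
  return static_cast<std::size_t>(std::lround(sampleRate * seconds));
}

// Feature vectors produced for a signal: one per complete frame.
std::size_t frameCount(std::size_t numSamples, double sampleRate, double frameSeconds) {
  const std::size_t step = samplesIn(sampleRate, frameSeconds);
  assert(step > 0);
  return numSamples / step;
}
```

For instance, samplesIn(8000, 0.010) gives 80 samples per frame and samplesIn(8000, 0.025) gives 200 samples per window. One second of 8 kHz audio yields 100 feature vectors.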

Production System, Feature Extraction, Signal Flow Graph
• Basic process of extracting a single feature
- Input: speech data stored in digital form on a computer.
- Energy: a computer program or algorithm specifically designed to measure energy values in the speech data.
- Output: a computer file that stores the feature measurements.
• Including a window
- Determines the number of samples used to calculate the energy measurements.
[Figure: input → Energy → output; input → Window → Energy → output]
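The Window → Energy graph can be sketched as a function that computes one energy value per frame by summing squared samples over a window centered on the frame. Several choices here are assumptions, and the production system's Energy component may differ:
- centering the window on the frame,
- truncating windows at the signal edges,
- using linear rather than log energy.

energyTrack is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One energy value per complete frame of frameStep samples. The window of
// windowLen samples is centered on the frame and truncated at the edges.
std::vector<double> energyTrack(const std::vector<double>& x, std::size_t frameStep,
                                std::size_t windowLen) {
  assert(frameStep > 0 && windowLen > 0);
  std::vector<double> energies;
  const long size = static_cast<long>(x.size());
  for (std::size_t f = 0; (f + 1) * frameStep <= x.size(); ++f) {
    const long center = static_cast<long>(f * frameStep + frameStep / 2);
    const long first = center - static_cast<long>(windowLen / 2);
    const long last = first + static_cast<long>(windowLen);  // one past the window
    double e = 0.0;
    for (long i = std::max(first, 0L); i < std::min(last, size); ++i) {
      const double s = x[static_cast<std::size_t>(i)];
      e += s * s;  // energy = sum of squared samples
    }
    energies.push_back(e);
  }
  return energies;
}
```

For 8 kHz audio, the 10 ms frame and 25 ms window from the previous slide correspond to energyTrack(x, 80, 200).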

Production System, Feature Extraction, Signal Flow Graph
• Process of computing the frequency spectrum of a speech signal
- Energy is a time-domain measurement.
- To convert signals from the time domain to the frequency domain, Spec computes the Fourier transform.
• Additional methods are needed to fully measure the features that a speech recognizer needs.
- Further analyze the FFT of the speech signal.
- MFCC (mel-frequency cepstral coefficients): uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log spectrum of the speech signal.
[Figure: input → Window → Spec → output; input → Window → Spec → Ceps → output]
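The Spec and Ceps stages can be illustrated with a direct discrete Fourier transform and a real cepstrum, which is the inverse transform of the log magnitude spectrum. This naive O(N²) sketch only shows the math. It omits the mel-scale filtering used for MFCCs, and the 1e-12 floor that keeps log() finite is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive discrete Fourier transform (an FFT computes the same values faster).
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> X(n);
  for (std::size_t k = 0; k < n; ++k)
    for (std::size_t t = 0; t < n; ++t)
      X[k] += x[t] * std::polar(1.0, -2.0 * pi * k * t / n);
  return X;
}

// Real cepstrum: inverse DFT of the log magnitude spectrum. For real input the
// log magnitude is symmetric, so the inverse transform reduces to a cosine sum.
std::vector<double> realCepstrum(const std::vector<double>& x) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  const std::vector<std::complex<double>> X = dft(x);
  std::vector<double> logMag(n);
  for (std::size_t k = 0; k < n; ++k)
    logMag[k] = std::log(std::max(std::abs(X[k]), 1e-12));  // floor avoids log(0)
  std::vector<double> c(n, 0.0);
  for (std::size_t t = 0; t < n; ++t) {
    for (std::size_t k = 0; k < n; ++k) c[t] += logMag[k] * std::cos(2.0 * pi * k * t / n);
    c[t] /= n;
  }
  return c;
}
```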

Production System, Feature Extraction, Signal Flow Graph
• Recipe
- The information for each component is stored in a single entity:
format of the speech input
algorithms for extracting the features
format of the output
- Make a recipe using isip_transform.
- Example: a simple signal flow graph for extracting energy
[Figure: Recipe1: inp → Engy → out, saved as a recipe file]

Production System, Feature Extraction, Signal Flow Graph
• More complex recipes
- A single recipe file is produced for the entire graph.
[Figure: Recipe2: graph with inp, Wind, Engy, Ceps, and out components, saved as one recipe file]

Q & A
• 1. Ordinary data types and functions
- In our production system, our own data types are used in all classes. Instead of float, why do we use Float?
- This confused me. When I tried to write a command-line interface, I used the cout and cin functions of C++. However, the situation is different in our system.

Reference
• Production System Tutorial: http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/
