
Research Presentation:

Research Presentation: On a Utility for Speaker Verification. Seungchan Lee, Intelligent Electronic Systems, Human and Systems Engineering, Department of Electrical and Computer Engineering.


Presentation Transcript


  1. Research Presentation: On a Utility for Speaker Verification • Seungchan Lee • Intelligent Electronic Systems • Human and Systems Engineering • Department of Electrical and Computer Engineering

  2. Set up a standard IES environment • My first impression of CAVS is good. • The first thing to do is to set up the IES environment. • Create an enlistment. • Our production system consists of many classes. • I was surprised by the structure of our software environment. Even though much work has already been done, I need to consolidate my work on the system with the other IFCers. • GroupWise: a good communication and schedule management tool within our group. • After that, I could write a program and compile it on my local machine. • Diagram: SERVER repository ↔ CVS client

  3. First IFC program, instructions • The first simple IFC program does the following: • Reads a 3×3 float matrix from an Sof file. • Reads a 3×1 float vector from an Sof file. • Multiplies the matrix and the vector using the equation Z = alpha*A*B. • Writes the result to an Sof file. • Allows the value of alpha to be set from the command line: • foo.exe -alpha 2.0 input.sof output.sof
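
The arithmetic and the command-line handling on this slide can be sketched in a few lines. This is only an illustration in Python, not the IFC program itself: reading and writing Sof files is left out because it depends on the ISIP classes, the matrix and vector values are made up, and the default alpha of 1.0 is an assumption.

```python
import argparse


def scaled_matvec(alpha, A, B):
    """Return Z = alpha * A * B for a 3x3 matrix A and a length-3 vector B."""
    return [alpha * sum(a * b for a, b in zip(row, B)) for row in A]


def parse_args(argv):
    # "-alpha" mirrors the flag on the slide; the default of 1.0 is an assumption.
    parser = argparse.ArgumentParser(prog="foo")
    parser.add_argument("-alpha", type=float, default=1.0)
    parser.add_argument("input")
    parser.add_argument("output")
    return parser.parse_args(argv)


# Hypothetical values standing in for the contents of input.sof.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
B = [1.0, 1.0, 1.0]

args = parse_args(["-alpha", "2.0", "input.sof", "output.sof"])
Z = scaled_matvec(args.alpha, A, B)
print(Z)
```

Keeping the multiplication in its own function makes it easy to test separately from file I/O and option parsing.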

  4. First IFC program, flow • foo.exe -alpha 2.0 input_file output_file • Read input Sof file → Read 3×3 float matrix → Read 3×1 float vector → Multiply the matrix and the vector → Write to output Sof file

  5. First IFC program • After completing the first IFC program, I am more familiar with our production system. • When I have questions about our production system, our experienced group members always help me. • It is good to study alone, but sometimes it is better to ask an expert in the programming environment. • The more I learn about our production system, the more questions I have.

  6. First IFC program • First question: How can we view the contents of a class? • Answer: It is possible through the debug method. • The contents of an Sof object are hard to inspect in the debugger. Instead, I used the debug method that is included in the source code. Sometimes this slows down debugging, but it is the best way I know so far. With it, I can see which data the Sof object contains. The same approach works for the variables of all other classes.

  7. First IFC program • Second question: When using the debug method, why does the string capacity change? • SysString token = “oscar had a heap of apples” • Using the debug method, we can see each value: • <SysString::token> value_d = (5 >= 5) oscar • <SysString::token> value_d = (5 >= 3) had • <SysString::token> value_d = (5 >= 1) a • <SysString::token> value_d = (5 >= 4) heap • <SysString::token> value_d = (5 >= 2) of • <SysString::token> value_d = (6 >= 6) apples • Answer: In the expression (n >= m), 'n' is the total capacity of the data structure and 'm' is the current length. So for the line <SysString::token> value_d = (5 >= 3) had, the capacity of the SysString is 5 and the current length is 3, which matches the string 'had'. The capacity changes to 6 only when 'apples' needs six characters.
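
The capacity and length behaviour in this debug output can be reproduced with a toy buffer. The sketch below is written in Python and does not reflect how SysString is implemented; the growth rule (grow only when the new text does not fit, never shrink) is an assumption chosen because it matches the six lines on the slide, and the class name TokenBuffer is hypothetical.

```python
class TokenBuffer:
    """Toy stand-in for a string class that tracks capacity separately from length."""

    def __init__(self):
        self.capacity = 0
        self.value = ""

    def assign(self, text):
        # Assumed policy: grow only when the new text does not fit; never shrink.
        if len(text) > self.capacity:
            self.capacity = len(text)
        self.value = text

    def debug(self):
        # Same "(capacity >= length) value" layout as the debug output on the slide.
        return f"value_d = ({self.capacity} >= {len(self.value)}) {self.value}"


token = TokenBuffer()
lines = []
for word in "oscar had a heap of apples".split():
    token.assign(word)
    lines.append(token.debug())
print("\n".join(lines))
```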

  8. First IFC program • Third question: What does “L” mean? • - In our production system, all of the classes use the “L” character before string literals. • For example, • SysString file1; • file1.assign(L"/tmp/foo_bin.sof"); • At first I could not figure out why this “L” is used. • Answer: → The "L" is not a macro but a C++ literal prefix. It tells the compiler that the following string is a wide-character (wchar_t) string literal, which is how Unicode strings are written here.

  9. ISIP_VERIFY • Basic Work Flow • - Decide what to add and remove in the new version • - Analyze the old version • - Draw class diagrams • - Design the new version • - Coding and compilation • - Testing and fixing bugs

  10. ISIP_VERIFY • Decide what to add and remove in the new version • - Currently, isip_verify does speaker verification, but uses only the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, or RVM algorithm. This means the new version of “isip_verify” will be a more general utility than the old version. • Analyze the old version • - The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and does both training and testing. Unlike the “GMM” case, the “SVM” statistical model has two separate utilities, “isip_svm_learn” and “isip_svm_classify”: “isip_svm_learn” handles training, and “isip_svm_classify” handles testing.

  11. ISIP_VERIFY • The problems: • 1. isip_verify can run only with the “GMM” statistical model. • 2. We do not have an “RVM” routine that provides the same functions as the “SVM” utilities. • Solution: • 1. Add SVM and RVM routines to isip_verify. • 2. Add the same functionality to the RVM class. • 3. Modify the SpeakerVerifier class. • With these changes, we can make one utility that performs all of the functions mentioned above. • To begin with, I drew class block diagrams of each utility to confirm the relationships between classes and functions. After that, the utilities were easier to understand. Next, I drew the flow chart of the new utility.

  12. ISIP_VERIFY Class Block Diagram • ISIP_VERIFY (util/speech): Parameter check • SpeakerVerifier (asr): run(); if algorithm = HMM, train and model creation • VerifyHMM (pr), HiddenMarkovModel (pr) • If algorithm = TRAIN: algorithm = TRAIN, implementation = BAUM WELCH; else: set algorithm, set implementation • If algorithm = VERIFY: Verify(); if implementation = LIKELIHOOD or LIKELIHOOD RATIO: Verifyl() or Verifylr() • linearDecoder() • Run()

  13. ISIP_VERIFY: ISIP_SVM_LEARN • isip_svm_learn (util/speech): Parameter check • SupportVectorMachine (pr): loadFeature() for positive and negative examples; train(); if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION, sequentialMinimalOptimization() determines the support vectors; writeModel() • StatisticalModelBase • StatisticalModel (stat), SupportVectorModel type • SupportVectorModel (stat) • Methods shown: getSupportVectorModel(), getBias(), write(), getKernels(), getAlphas(), getSupportVectors()

  14. ISIP_VERIFY: ISIP_SVM_CLASSIFY • isip_svm_classify (util/speech) • Classes: StatisticalModel (stat), FeatureFile (mmedia), AudioDatabase (mmedia) • Methods shown: read(), open(), open(), getBufferData(), getRecord(), getSupportVectorModel(), getDistance() • Output: write the distance to the output file

  15. ISIP_VERIFY FLOW CHART • isip_verify (new version): isip_verify -param.sof .... -algo_type [hmm,svm,rvm] -mode [train, test] • Check the “algo_type” option. If no algo_type is specified, the HMM algo_type is chosen. • HMM branch: check statistical_model = “GMM”. No: error (Model incorrect). Yes: gmmVerify(), which is isip_verify (old version): the verifyHMM class processes the parameter file for isip_verify, which can do both training and testing. • SVM branch: check statistical_model = “SVM”. No: error (Model incorrect). Yes: check the “mode” option. train: svmTrain(); test: svmTest(); otherwise: error (You must specify mode). • RVM branch: check statistical_model = “RVM”. No: error (Model incorrect). Yes: check the “mode” option. train: rvmTrain(); test: rvmTest(); otherwise: error (You must specify mode).
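
The decision logic of this flow chart can be written as a small dispatch function. The sketch below is in Python and only returns the name of the routine that would be reached; the function name dispatch, the pairing of each algo_type with its expected statistical model, and the use of exceptions for the error boxes are assumptions read off the chart, not the isip_verify source.

```python
# Expected statistical model for each algorithm type, as read from the flow chart.
EXPECTED_MODEL = {"hmm": "GMM", "svm": "SVM", "rvm": "RVM"}


def dispatch(statistical_model, algo_type=None, mode=None):
    """Return the name of the routine the flow chart would reach."""
    algo = algo_type or "hmm"  # no algo_type given: HMM is chosen
    if algo not in EXPECTED_MODEL:
        raise ValueError("unknown algo_type")
    if statistical_model != EXPECTED_MODEL[algo]:
        raise ValueError("Model incorrect")
    if algo == "hmm":
        return "gmmVerify"  # old isip_verify path, which both trains and tests
    if mode not in ("train", "test"):
        raise ValueError("You must specify mode")
    return f"{algo}{mode.capitalize()}"


print(dispatch("GMM"))
print(dispatch("SVM", "svm", "train"))
print(dispatch("RVM", "rvm", "test"))
```

Checking the statistical model before the mode follows the order of the boxes in the chart, so a mismatched parameter file is reported before a missing mode.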

  16. ISIP_VERIFY • Coding and Compilation • 1. Add and remove parameters, and check the parameters (Won) • 2. Combine the three functionalities • - The new “isip_verify” calls the run() method, which calls the support vector machine object or the relevance vector machine object and then performs training. This lets us implement three models in one utility. • 3. SpeakerVerifier class • - include the SVM and RVM classes • - modify the parameter check • - modify the run(sdb) method • - add the run(pos_sdb,neg_sdb) method • 4. RVM class • - add training and testing modules (Sridhar)

  17. ISIP_VERIFY • Problems during coding and compilation • - How do we verify the SpeakerVerifier class? • After modifying an existing class, we need to verify its correctness. • The diagnose method provides this functionality in our production system. • This method is implemented in *_02.cc in every class. • After compiling the class, we execute “make test”. • This automatically checks every function in that class. • - How can we resolve a segmentation fault? • Finding the cause is one of the most difficult things. • Comment out all new modules, then add one module back, compile the class, and test it. Repeat this until every new module has been tested.

  18. ISIP_VERIFY • Problems during coding and compilation • - Compilation and debugging time • - When developing a new program, compiling and debugging are among the most time-consuming tasks. • - In our production system, it takes a long time to compile and debug a program, because compiling a program involves many linking steps. • - How can we resolve it? → It is faster to work in our local repository.

  19. ISIP_VERIFY • Testing and fixing bugs • This part is as important as the previous steps. • We can find faults and missing pieces during this step. • Problem: • What happens to the sdb object? • Normally, the sdb object contains every command-line option (except parameters). • However, the sdb object lost its contents when it was passed to the SpeakerVerifier class. • How can we fix that? • Comment out all code except the control code. • The cause was that I did not give the list file option.

  20. Software Release • What do we need to know for a software release? • Varmint utility: used to track down all problems • Production system: • To better understand our system, I have done and will do the following: • Data preparation • Feature extraction • Recognition • Acoustic modeling • Language modeling • → These are explained in more detail after this topic.

  21. Software Release • ProductionRuleTokenType class • It uses many if-else statements in its read/write functions. • Instead, we can use the NameMap class. • To do that, I declared a NameMap object and modified the related modules. • Problem: I ran into run-time errors. • Solution: • I made a simple program that includes the diagnose method in prtt_02.cc. • After tracing the function, I found the cause. • This class was the first one I checked in to our production system.

  22. Software Release • isip_lm_tester • This utility randomly generates sentences based on the language model file and tests the language model. • Problem: • Currently, generating state transcriptions does not generate anything past the first symbols at the highest level. • What to do? • I need to track down this problem, but that requires an understanding of the language model. • → Read and study our production system tutorial thoroughly, and then get involved in fixing bugs in isip_lm_tester.

  23. Production System • In this part, I will go from Data preparation to Feature extraction. • How can we better understand our production system? • - Data Preparation • - Feature extraction • - Recognition • - Acoustic modeling • - Language modeling

  24. Production System, Data Preparation • Data Preparation • Why is it difficult for a beginner? • In ordinary programming, preparing input data is not hard. • But in our production system, it is not easy for a beginner. • → It requires knowledge of speech data. • This includes speech file formats, file conversion, and sampling. • Speech file • - Header + sampled data • - Sampled data only → raw file • Diagram: WAV, Sof, SPHERE, and AU files each consist of a header followed by data; a raw file contains data only.

  25. Production System, Data Preparation • Sof Format • An Sof file holds information about the location of each object stored in the file, and the corresponding object data. • Supports two basic storage formats: • - text: human-readable files • - binary: sampled data • Used by all data objects in the ISIP environment to unify and simplify I/O. • Binary format: • - Handles machine architecture differences with automatic byte transformations. • - Used for large quantities of data for efficiency. • - The objects are stored in a binary tree, and a symbol table is used to hold the object class names.

  26. Production System, Data Preparation • Text format: • - Used for user input parameter files in the ISIP environment. • - Simple format that consists of object names and tags, followed by the object data. • - Example:
     @ Sof v1.0 @
     @ Float 0 @
     value = 1.3;
     @ VectorFloat 0 @
     value = 3.5,5.7,3.8;
     @ VectorLong 0 @
     value = 2,3;
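
To make the "object name and tag, followed by object data" layout concrete, here is a small reader for text shaped like the example on this slide. It is a Python sketch under the assumption that the format is exactly what the slide shows (one "@ Name tag @" line per object, then "value = ...;"); it is not the ISIP Sof parser, the function name parse_sof_text is hypothetical, and every value is read as a float for simplicity.

```python
import re

SOF_TEXT = """@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
"""

# Matches lines such as "@ Float 0 @": a class name followed by a tag.
HEADER = re.compile(r"^@\s*(\w+)\s+(\S+)\s*@$")


def parse_sof_text(text):
    """Return a list of (class_name, tag, values) from slide-style Sof text."""
    objects = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            name, tag = match.groups()
            if name == "Sof":  # file header line; its second field is the version
                continue
            current = (name, int(tag), [])
            objects.append(current)
        elif line.startswith("value") and current is not None:
            raw = line.split("=", 1)[1].rstrip(";")
            current[2].extend(float(v) for v in raw.split(","))
    return objects


objects = parse_sof_text(SOF_TEXT)
for name, tag, values in objects:
    print(name, tag, values)
```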

  27. Production System, Data Preparation • Converting from an external format (e.g., SPHERE, WAV) to raw format: speech.sph → speech.raw • 1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode: • w_decode -o pcm speech.sph speech-nb.sph • 2. Strip the file's header using h_strip: • h_strip speech-nb.sph speech.raw • 3. The result is speech.raw, which is identical to the decoded file except that the first 1024 bytes of header information are missing. • 4. One-line command combining the w_decode and h_strip steps: • w_decode -o pcm speech.sph - | h_strip - - > speech.raw • Diagram: header + data → data
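
The header-stripping step amounts to dropping a fixed number of leading bytes. The Python sketch below illustrates only that idea on synthetic bytes, assuming the 1024-byte header size quoted on the slide; it does not replace h_strip, does not decode SPHERE data, and the function name strip_header is hypothetical.

```python
HEADER_BYTES = 1024  # header size quoted on the slide


def strip_header(data, header_bytes=HEADER_BYTES):
    """Return the sample data that follows a fixed-size header."""
    if len(data) < header_bytes:
        raise ValueError("file is shorter than its header")
    return data[header_bytes:]


# Synthetic stand-in for a decoded file: 1024 header bytes plus four 16-bit samples.
one_sample = (1).to_bytes(2, "little", signed=True)
fake_file = b"H" * HEADER_BYTES + one_sample * 4

raw = strip_header(fake_file)
print(len(fake_file) - len(raw))
```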

  28. Production System, Data Preparation • Verification of Conversion to Raw • SoX: audio playback • sox -t .sw -r 16000 speech.raw -t .au speech.au • audioplay speech.au • File size comparison using "ls -l": • ls -l speech.* • -rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw • -rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph • → The fifth field is the file size: speech.raw is 1024 bytes smaller than speech.sph. • Octal dump (od): listing sample values • od -t d2 speech.raw
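
The size check on this slide is simple arithmetic, and the same numbers also give the length of the recording. The Python sketch below uses the two file sizes from the listing, the 16000 Hz rate from the sox command, and 2 bytes per sample for 16-bit data; the duration is derived here and is not stated on the slide.

```python
SPH_BYTES = 98510      # size of speech.sph in the listing
RAW_BYTES = 97486      # size of speech.raw in the listing
SAMPLE_RATE = 16000    # from the sox "-r 16000" option
BYTES_PER_SAMPLE = 2   # 16-bit linear samples

header_bytes = SPH_BYTES - RAW_BYTES
num_samples = RAW_BYTES // BYTES_PER_SAMPLE
duration_s = num_samples / SAMPLE_RATE

print(header_bytes, num_samples, round(duration_s, 3))
```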

  29. Production System, Data Preparation • Creating an Sof file: raw file → Sof file • Using isip_make_sof, type the following: • isip_make_sof speech.raw • This creates a binary file. • To create a text version, type the following: • isip_make_sof -type text -suffix _text speech.raw • Diagram: data → isip_make_sof → header + data

  30. Production System, Feature Extraction • What is feature extraction? • A speech recognizer does not understand the human voice directly. • Only certain features of the human voice are useful for recognizer decoding. • They must be numerically measured and stored → feature vector. • The process of taking these measurements is known as feature extraction. • It includes the following: • converting the signal to digital form • measuring some important characteristics of the signal • augmenting these measurements • Diagram: human voice → microphone → digital signal

  31. Production System, Feature Extraction • Frame • A typical frame duration in speech recognition is 10 ms. • It determines how often we produce a feature vector. • Window • A typical window duration is 25 ms. • It surrounds the frame for a smoother representation of the speech data. • It determines the number of samples used for each measurement. • Sampling rate: • The number of samples per second taken from a continuous signal to make a discrete signal. • Example: with an 8 kHz sampling rate and a frame duration of 10 ms, measurements are taken over 80 samples (8000 samples/s × 0.010 s) to produce one feature vector.
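
The relation between durations, sampling rate, and sample counts can be checked with a short Python sketch. The function names are hypothetical, and centering each window on its frame is an assumption made for illustration (the slide only says the window surrounds the frame).

```python
def samples_for(duration_ms, sample_rate_hz):
    """Number of samples covered by a duration at a given sampling rate."""
    return sample_rate_hz * duration_ms // 1000


def frame_windows(num_samples, sample_rate_hz, frame_ms=10, window_ms=25):
    """Yield (start, end) sample indices of the window around each frame."""
    frame = samples_for(frame_ms, sample_rate_hz)
    window = samples_for(window_ms, sample_rate_hz)
    pad = (window - frame) // 2  # assumed: window is centered on the frame
    for begin in range(0, num_samples - frame + 1, frame):
        # Clip at the signal edges instead of padding with zeros.
        yield max(0, begin - pad), min(num_samples, begin + frame + pad)


print(samples_for(10, 8000))   # frame length in samples at 8 kHz
print(samples_for(25, 8000))   # window length in samples at 8 kHz
windows = list(frame_windows(800, 8000))
print(len(windows), windows[:2])
```

With these settings, 800 samples (100 ms at 8 kHz) give ten frames, so ten feature vectors would be produced.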

  32. Production System, Feature Extraction, Signal Flow Graph • Basic process of extracting a single feature • Input: speech data stored in digital form on a computer. • Energy: a computer program or algorithm specifically designed to measure energy values in the speech data. • Output: a computer file that stores the feature measurements. • Including a window • The window determines the number of samples used to calculate the energy measurements. • Diagrams: input → Energy → output; input → Wind → Energy → output
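
A minimal version of the input → Wind → Energy → output graph can be sketched in Python as follows. Energy is taken here as the mean of the squared samples in a rectangular window, which is one common definition and an assumption on my part; the slide does not say which energy measure or window shape the production system uses.

```python
def frame_energy(samples, frame_len, window_len):
    """Return one energy value (mean of squared samples) per frame."""
    pad = (window_len - frame_len) // 2
    energies = []
    for begin in range(0, len(samples) - frame_len + 1, frame_len):
        lo = max(0, begin - pad)
        hi = min(len(samples), begin + frame_len + pad)
        window = samples[lo:hi]  # rectangular window around the frame
        energies.append(sum(x * x for x in window) / len(window))
    return energies


# Toy signal: eight samples of amplitude 1 followed by eight samples of silence.
signal = [1.0, -1.0] * 4 + [0.0] * 8
energies = frame_energy(signal, frame_len=4, window_len=4)
print(energies)
```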

  33. Production System, Feature Extraction, Signal Flow Graph • Process of computing the frequency spectrum of a speech signal • Energy is a time-domain measurement. • Spec converts the signal from the time domain to the frequency domain; it represents the Fourier transform. • Additional methods are needed to fully measure the features needed by a speech recognizer. • These further analyze the FFT of the speech signal. • MFCC: uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal. • Diagrams: input → Wind → Spec → output; input → Wind → Spec → Ceps → output
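
The cepstrum definition on this slide (inverse Fourier transform of the log-spectrum) can be written out directly. The Python sketch below uses a naive DFT so that it needs no external library, and it computes the plain real cepstrum of one frame. MFCC as commonly defined also applies a mel-scale filter bank, which is omitted here, and the small floor that avoids log(0) is an assumption.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Inverse transform of the log magnitude spectrum of one frame."""
    log_spectrum = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [c.real for c in idft(log_spectrum)]


# An impulse of height e has a flat spectrum of magnitude e, so its log-spectrum
# is 1 everywhere and the cepstrum is 1 at index 0 and 0 elsewhere.
impulse = [math.e] + [0.0] * 7
cepstrum = real_cepstrum(impulse)
print([round(c, 6) for c in cepstrum])
```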

  34. Production System, Feature Extraction, Signal Flow Graph • Recipe: • The information for each component is stored in a single entity: • - format of the speech input • - algorithms for extracting the features • - format of the output • - make the recipe using isip_transform • - Example: simple signal flow graph for extracting energy • Diagram (Recipe1, stored in a recipe file): inp → Engy → out

  35. Production System, Feature Extraction, Signal Flow Graph • More complex recipes • A single recipe file is produced for the entire graph. • Diagram (Recipe2, stored in a recipe file): components inp, Wind, Engy, Ceps, out

  36. Q & A • 1. Ordinary data types and functions • - In our production system, our own classes are used for all data types. • Why do we use Float instead of float? • This confused me. When I tried to write a command-line interface, I used the cout and cin functions as in ordinary C++. However, the situation is different in our system.

  37. Reference • Production System Tutorial http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/
