Linguist Module in Sphinx-4
By Sonthi Dusitpirom
Objective
• How to change the dictionary in Sphinx-4
Sphinx-4
• Sphinx-4 is an open source speech recognition framework, written in Java, that supports research on speech recognition systems.
• Sphinx-4 has 3 main components
  • The FrontEnd
  • The Decoder
  • The Linguist
Sphinx-4
• In this project we focus on the Linguist component, which has 3 subcomponents
  • The Acoustic Model
    • The acoustic model describes the pronunciation of individual sound units, known as phonemes.
  • The Dictionary
    • The dictionary gives the pronunciation of every word that the system can recognize.
  • The Language Model
    • The language model describes what the grammar of the language looks like.
Acoustic Model
• The acoustic model in Sphinx-4 consists of a set of left-to-right Hidden Markov Models for basic sound units. The units represent phones in a triphone context.
• The acoustic model in Sphinx-4 is packaged in a JAR file. The advantage is that the JAR file can be added to the classpath and referenced in the configuration file, so that a Sphinx-4 application can use it.
Acoustic Model
• Sphinx-4 has two important acoustic models that serve different purposes
  • TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed for numbers. Use this model if you need to recognize numbers.
  • WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed for text. Use this model if you want to recognize text.
Dictionary
• The dictionary provides pronunciations for the words found in the language model. Each pronunciation splits a word into a sequence of phonemes that are found in the acoustic model.
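To make the word-to-phoneme mapping concrete, here is a minimal sketch in plain Java that parses one dictionary line. It assumes the common layout of a word followed by whitespace-separated phonemes; the class name `PronunciationEntry` and the sample entry are illustrative and are not part of Sphinx-4.

```java
import java.util.Arrays;
import java.util.List;

/** One dictionary entry: a word and the phonemes of its pronunciation (hypothetical helper). */
final class PronunciationEntry {
    final String word;
    final List<String> phonemes;

    PronunciationEntry(String word, List<String> phonemes) {
        this.word = word;
        this.phonemes = phonemes;
    }

    /** Parses a line such as "HELLO HH AH L OW": the word first, then its phonemes. */
    static PronunciationEntry parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected a word and at least one phoneme: " + line);
        }
        List<String> phonemes = Arrays.asList(tokens).subList(1, tokens.length);
        return new PronunciationEntry(tokens[0], phonemes);
    }

    public static void main(String[] args) {
        // Illustrative entry; check the real dictionary file for the exact phoneme set.
        PronunciationEntry entry = parse("HELLO HH AH L OW");
        System.out.println(entry.word + " -> " + entry.phonemes);
    }
}
```

Every phoneme that appears in an entry has to be one that the acoustic model knows, which is why the dictionary and the acoustic model are chosen together.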
Language Model
• There are two types of model that describe a language
  • Grammar language model
    • Grammars describe very simple types of languages for command and control. They are usually written by hand or generated automatically with plain code.
  • Statistical language model
    • Statistical language models estimate the probability distribution of word sequences in natural language. The most widely used statistical language model is the N-gram model.
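As a rough illustration of the N-gram idea, the sketch below estimates bigram (N = 2) probabilities from raw counts in plain Java. It is a simplified maximum-likelihood estimate with no smoothing and no sentence-boundary markers; the class `BigramModel` and the training sentences are made up for this example and are not how Sphinx-4 builds its language models.

```java
import java.util.HashMap;
import java.util.Map;

/** Maximum-likelihood bigram estimates from whitespace-tokenized sentences (illustrative only). */
final class BigramModel {
    // How often each word appears as the first word of a bigram.
    private final Map<String, Integer> contextCounts = new HashMap<>();
    // How often each "previous next" word pair appears.
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Adds the bigrams of one training sentence to the counts. */
    void addSentence(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            contextCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** P(next | previous) = count(previous next) / count(previous as context); 0 if unseen. */
    double probability(String previous, String next) {
        int contextCount = contextCounts.getOrDefault(previous, 0);
        if (contextCount == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(previous + " " + next, 0) / (double) contextCount;
    }

    public static void main(String[] args) {
        BigramModel model = new BigramModel();
        model.addSentence("open the door");
        model.addSentence("open the window");
        model.addSentence("close the door");
        // "the" is followed by "door" in 2 of its 3 occurrences.
        System.out.println(model.probability("the", "door"));
    }
}
```

A trigram model works the same way but conditions each word on the two preceding words.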
Create a new dictionary
• Sphinx-4 already includes a dictionary. These are the steps to change it
  • Extract WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, found in the lib directory.
  • Go to the dict folder and open the cmudict.0.6d file in that folder.
  • Insert the new words and their phonemes into the cmudict.0.6d file and save it.
  • Zip the extracted folder into a zip file.
  • Remove WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar from the libraries in the build path and add the zip file to the libraries in the build path.
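The step of inserting words into the dictionary file can also be scripted. The sketch below is a hedged example in plain Java that appends one entry to a dictionary text file; it assumes one `WORD PH1 PH2 ...` entry per line and a UTF-8 (ASCII-compatible) file. The class `DictionaryAppender`, its method names, and the sample phonemes are hypothetical, and `main` works on a temporary file rather than the real cmudict.0.6d.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Appends pronunciation entries to a dictionary text file (hypothetical helper). */
final class DictionaryAppender {
    /** Formats one entry as "WORD PH1 PH2 ...". */
    static String formatEntry(String word, List<String> phonemes) {
        return word + " " + String.join(" ", phonemes);
    }

    /** Returns the dictionary text with the new entry added on its own line. */
    static String appendEntry(String dictionaryText, String word, List<String> phonemes) {
        // Start a new line first if the existing text does not end with one.
        boolean needsNewline = !dictionaryText.isEmpty() && !dictionaryText.endsWith("\n");
        return dictionaryText + (needsNewline ? "\n" : "") + formatEntry(word, phonemes) + "\n";
    }

    /** Reads the file, appends the entry, and writes the file back. */
    static void appendEntryToFile(Path file, String word, List<String> phonemes) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, appendEntry(text, word, phonemes).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the extracted dictionary file; the entries are illustrative.
        Path copy = Files.createTempFile("cmudict-copy", ".txt");
        Files.write(copy, "HELLO HH AH L OW".getBytes(StandardCharsets.UTF_8));
        appendEntryToFile(copy, "SPHINX", Arrays.asList("S", "F", "IH", "NG", "K", "S"));
        System.out.print(new String(Files.readAllBytes(copy), StandardCharsets.UTF_8));
    }
}
```

After the file is updated, the remaining steps (zipping the folder and swapping the library in the build path) are unchanged.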
XML Configuration File
• The configuration of a particular Sphinx-4 system is determined by a configuration file. This configuration file defines the following
  • The names and types of all of the components of the system.
  • The connectivity of these components, that is, which components talk to each other.
  • The detailed configuration for each of these components.
XML Configuration File
• The configuration file is used for
  • Determining which components are to be used in the system.
  • Determining the detailed configuration of each of these components.
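To see the names and types that a configuration file declares, the file can be inspected with the JDK's standard XML parser. The following is a minimal sketch that does not use the Sphinx-4 API: the class `ConfigComponents` is hypothetical, and the embedded XML is a cut-down sample that assumes a `<config>` root element containing `<component>` elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Lists the components declared in a configuration XML document (hypothetical helper). */
final class ConfigComponents {
    /** Returns component name -> type for every <component> element, in document order. */
    static Map<String, String> componentTypes(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList components = document.getElementsByTagName("component");
            Map<String, String> types = new LinkedHashMap<>();
            for (int i = 0; i < components.getLength(); i++) {
                Element component = (Element) components.item(i);
                types.put(component.getAttribute("name"), component.getAttribute("type"));
            }
            return types;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Cut-down sample; a real configuration file declares many more components.
        String xml = "<config>"
                + "<component name=\"dictionary\""
                + " type=\"edu.cmu.sphinx.linguist.dictionary.FastDictionary\">"
                + "<property name=\"unitManager\" value=\"unitManager\"/>"
                + "</component>"
                + "</config>";
        System.out.println(componentTypes(xml));
    }
}
```

Connectivity shows up in the same file: a property whose value is the name of another component (such as `unitManager` above) is a reference to that component.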
Use a Model in Sphinx-4
• There are three steps to use a new model in Sphinx-4
  • Defining a language model.
  • Defining a dictionary.
  • Defining an acoustic model.
Define a Language Model
<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="grammarLocation" value="the path to the grammar folder"/>
    <property name="dictionary" value="dictionary"/>
    <property name="grammarName" value="the name of grammar"/>
    <property name="logMath" value="logMath"/>
</component>
Define a Language Model
<component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value="0.7"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="location" value="the name of the language model file"/>
</component>
Define a Dictionary
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="the name of the dictionary file"/>
    <property name="fillerPath" value="the name of the filler file"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>
Define an Acoustic Model
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location" value="the path to the model folder"/>
</component>
<component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>