Two digits recognition By: Meghal Bhatt
Sphinx4 • Sphinx4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. • The design of Sphinx4 is based on patterns that have emerged from the design of past systems, as well as on new requirements arising from areas that researchers currently want to explore. • Sphinx4 also includes several implementations of both simple and state-of-the-art techniques.
Sphinx4 • It has several parts: 1) Recognizer 2) Decoder 3) Linguist 4) Acoustic model 5) Front end 6) Instrumentation
Recognizer • It recognizes the audio signal spoken by a person and then searches for the result in the transcript file. • It is capable of recognizing both discrete and continuous speech.
Decoder • The decoder of the Sphinx-4 speech recognition system incorporates several new design strategies that have not been used in HMM-based large-vocabulary speech recognition systems. • It contains the search manager, which performs the search using algorithms such as breadth-first search, best-first search and depth-first search, and it also contains the feature scorer and the pruner. • It introduces new aspects of graph construction, using multi-level parallel decoding with independent simultaneous feature streams without the use of a compound HMM structure.
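The role of the pruner in the search can be illustrated with a minimal beam-pruning sketch. This is not the Sphinx-4 pruner itself: the class `BeamPrunerSketch`, the `Token` type and the parameters `relativeBeam` and `maxTokens` are hypothetical names chosen for illustration, and scores are assumed to be log-domain values where higher is better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of beam pruning; not the Sphinx-4 Pruner interface.
final class BeamPrunerSketch {

    /** A search hypothesis: a label plus a log-domain score (higher is better). */
    static final class Token {
        final String label;
        final double logScore;

        Token(String label, double logScore) {
            this.label = label;
            this.logScore = logScore;
        }
    }

    /**
     * Keeps the tokens whose score lies within relativeBeam of the best score,
     * and never more than maxTokens of them. The result is sorted best first.
     */
    static List<Token> prune(List<Token> active, double relativeBeam, int maxTokens) {
        List<Token> sorted = new ArrayList<>(active);
        sorted.sort(Comparator.comparingDouble((Token t) -> t.logScore).reversed());

        List<Token> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double threshold = sorted.get(0).logScore - relativeBeam;
        for (Token t : sorted) {
            if (kept.size() >= maxTokens || t.logScore < threshold) {
                break; // everything after this point is worse, so stop here
            }
            kept.add(t);
        }
        return kept;
    }
}
```

A wider beam keeps more hypotheses alive and costs more time; a narrower beam is faster but risks dropping the path that would have turned out best.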
Front end • Performs digital signal processing on the incoming data. The sequence of operations performed by the Sphinx-4 front end creates mel-cepstra from an audio file. • Its processing stages include Hamming window, FFT, mel frequency filter bank, discrete cosine transform, cepstral mean normalization, and feature extraction of cepstra and delta cepstra features.
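As an illustration of one front-end stage, the sketch below applies a Hamming window to a single frame of samples using the standard formula w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The class and method names are hypothetical, and this is not the Sphinx-4 windowing code, only a minimal sketch of the technique.

```java
// Hypothetical illustration of the Hamming window stage; not Sphinx-4 code.
final class HammingWindowSketch {

    /** Returns the Hamming window coefficients for a frame of the given size. */
    static double[] window(int size) {
        double[] w = new double[size];
        if (size == 1) {
            w[0] = 1.0; // a single-sample window is defined as 1 to avoid dividing by zero
            return w;
        }
        for (int n = 0; n < size; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (size - 1));
        }
        return w;
    }

    /** Multiplies each sample of the frame by the matching window coefficient. */
    static double[] apply(double[] frame) {
        double[] w = window(frame.length);
        double[] out = new double[frame.length];
        for (int n = 0; n < frame.length; n++) {
            out[n] = frame[n] * w[n];
        }
        return out;
    }
}
```

The window tapers the edges of each frame so that the FFT stage that follows sees less spectral leakage from the abrupt frame boundaries.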
Acoustic model • In Sphinx-4 there are two important models that serve different purposes. • TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800.jar is designed for numbers; use it as the acoustic model when recognizing digits. • WSJ_8gau_13dCep_16k_40mel_130Hz_6800.jar is designed for text data; use it as the acoustic model when recognizing text.
Dictionary • The dictionary provides pronunciations for the words found in the language model. A pronunciation splits a word into a sequence of phonemes, which are found in the acoustic model. • Its main task is to define how each word is pronounced.
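The word-to-phoneme mapping can be sketched as a simple lookup table. The class name `DigitDictionarySketch` is hypothetical, and the ARPAbet-style phoneme strings are illustrative assumptions; they are not copied from the dictionary file that accompanies the TIDIGITS model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a pronunciation dictionary; not Sphinx-4 code.
final class DigitDictionarySketch {

    // Illustrative ARPAbet-style pronunciations (assumed, not taken from a model file).
    private static final Map<String, String[]> PRONUNCIATIONS = new LinkedHashMap<>();
    static {
        PRONUNCIATIONS.put("zero", new String[] {"Z", "IH", "R", "OW"});
        PRONUNCIATIONS.put("one", new String[] {"W", "AH", "N"});
        PRONUNCIATIONS.put("two", new String[] {"T", "UW"});
        PRONUNCIATIONS.put("three", new String[] {"TH", "R", "IY"});
        PRONUNCIATIONS.put("four", new String[] {"F", "AO", "R"});
        PRONUNCIATIONS.put("five", new String[] {"F", "AY", "V"});
        PRONUNCIATIONS.put("six", new String[] {"S", "IH", "K", "S"});
        PRONUNCIATIONS.put("seven", new String[] {"S", "EH", "V", "AH", "N"});
        PRONUNCIATIONS.put("eight", new String[] {"EY", "T"});
        PRONUNCIATIONS.put("nine", new String[] {"N", "AY", "N"});
    }

    /** Expands a whitespace-separated word sequence into its phoneme sequence. */
    static List<String> toPhonemes(String words) {
        List<String> phonemes = new ArrayList<>();
        for (String word : words.trim().toLowerCase().split("\\s+")) {
            String[] units = PRONUNCIATIONS.get(word);
            if (units == null) {
                // Rejects words that have no pronunciation entry.
                throw new IllegalArgumentException("No pronunciation for: " + word);
            }
            for (String unit : units) {
                phonemes.add(unit);
            }
        }
        return phonemes;
    }
}
```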
Language model • It represents the probability of occurrence of words. There are basically two types of model that describe a language: • Statistical language model: estimates the probability distribution of natural language. The most widely used statistical language model is the N-gram. • Grammar language model: a grammar describes very simple types of language for command and control; grammars are written by hand or generated automatically by plain code.
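To make the N-gram idea concrete, here is a minimal bigram (N = 2) sketch that estimates P(next | previous) by counting word pairs in training sentences. The class name and the toy sentences in the usage note are hypothetical, and no smoothing is applied, so unseen pairs get probability zero; real language models handle that case more carefully.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a maximum-likelihood bigram model; not Sphinx-4 code.
final class BigramModelSketch {
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts every adjacent word pair in the whitespace-separated sentences. */
    BigramModelSketch(List<String> sentences) {
        for (String sentence : sentences) {
            String[] words = sentence.trim().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                historyCounts.merge(words[i], 1, Integer::sum);
                bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
    }

    /** Returns count(previous, next) / count(previous), or 0.0 for an unseen history. */
    double probability(String previous, String next) {
        int history = historyCounts.getOrDefault(previous, 0);
        if (history == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(previous + " " + next, 0) / (double) history;
    }
}
```

For example, trained on the three sentences "one two", "one three" and "one two", the model estimates P(two | one) as 2/3 and P(three | one) as 1/3.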
XML configuration file • The configuration file determines the configuration of the open-source Sphinx-4 framework. It defines the following: • The names and types of the components. • The connectivity between the components, that is, how they relate to each other. • The detailed configuration of each of these components.
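The three things the configuration file defines can be read back with a few lines of standard JDK XML parsing. The sketch below is not how Sphinx-4 loads its configuration; `ConfigInspectorSketch` and its methods are hypothetical helpers that assume the file uses `component` elements with `name` and `type` attributes and nested `property` elements with `name` and `value` attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for inspecting a configuration file; not the Sphinx-4 loader.
final class ConfigInspectorSketch {

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    /** Maps each component name to its type (the implementing class name). */
    static Map<String, String> componentTypes(String xml) {
        Map<String, String> types = new LinkedHashMap<>();
        NodeList components = parse(xml).getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            types.put(c.getAttribute("name"), c.getAttribute("type"));
        }
        return types;
    }

    /** Maps each component name to the other components its property values refer to. */
    static Map<String, List<String>> references(String xml) {
        Map<String, String> types = componentTypes(xml);
        Map<String, List<String>> refs = new LinkedHashMap<>();
        NodeList components = parse(xml).getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            List<String> targets = new ArrayList<>();
            NodeList props = c.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                String value = ((Element) props.item(j)).getAttribute("value");
                if (types.containsKey(value)) {
                    targets.add(value); // the value names another component: a connection
                }
            }
            refs.put(c.getAttribute("name"), targets);
        }
        return refs;
    }
}
```

Property values that do not name another component, such as file paths or flags, are that component's own detailed settings.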
To use a model in Sphinx-4 • Basically there are three steps to use a new model in Sphinx-4: • Defining a language model. • Defining a dictionary. • Defining an acoustic model.
Defined language model
<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="grammarLocation" value="the path to the grammar folder"/>
    <property name="dictionary" value="dictionary"/>
    <property name="grammarName" value="the name of grammar"/>
    <property name="logMath" value="logMath"/>
</component>
Defined acoustic model
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location" value="the path to the model folder"/>
</component>
<component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>
Defined dictionary model
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="the name of the dictionary file"/>
    <property name="fillerPath" value="the name of the filler file"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>