Two digits recognition

Two digits recognition By: Meghal Bhatt

Sphinx4
• Sphinx4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language.
• The design of Sphinx4 is based on patterns that have emerged from the design of past systems, as well as on new requirements from areas that researchers currently want to explore.
• Sphinx4 also includes several implementations of both simple and state-of-the-art techniques.

Sphinx4
• It has different parts:
1) Recognizer
2) Decoder
3) Linguist
4) Acoustic model
5) Front end
6) Instrumentation

Recognizer
• It recognizes the audio signal spoken by a human and then searches for it in the transcript file.
• It is capable of recognizing both discrete and continuous speech.

Decoder
• The decoder of the Sphinx4 speech recognition system incorporates several new design strategies that have not previously been used in HMM-based large-vocabulary speech recognition systems.
• It contains the search manager, which performs the search using algorithms such as breadth-first search, best-first search, and depth-first search. It also contains the feature scorer and the pruner.
• It takes a new approach to graph construction, using multilevel parallel decoding with independent simultaneous feature streams, without the use of compound HMM structures.
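To make the role of the pruner concrete, here is a minimal sketch of beam pruning over a list of hypothesis scores. It is not Sphinx4's own pruner: the class name `BeamPrunerSketch`, the method `prune`, and the parameters `maxActive` and `relativeBeam` are hypothetical names chosen for illustration, and scores are assumed to be log-probabilities (higher is better).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative beam pruner (assumption: not the Sphinx4 implementation).
final class BeamPrunerSketch {

    // Keeps the best hypotheses: at most maxActive of them, and only those
    // whose log score is within relativeBeam of the best score.
    static List<Double> prune(List<Double> logScores, int maxActive, double relativeBeam) {
        List<Double> sorted = new ArrayList<>(logScores);
        sorted.sort(Comparator.reverseOrder()); // best (highest) score first
        List<Double> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double threshold = sorted.get(0) - relativeBeam;
        for (double score : sorted) {
            if (kept.size() >= maxActive || score < threshold) {
                break; // everything after this point is worse, so stop
            }
            kept.add(score);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(-10.0, -12.0, -30.0, -11.0, -50.0);
        System.out.println(prune(scores, 3, 15.0));
    }
}
```

A narrower beam or a smaller `maxActive` makes the search faster but raises the chance of discarding the hypothesis that would have turned out to be correct.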

FRONT END
• Performs digital signal processing on the incoming data. The sequence of operations performed by the Sphinx4 front end creates mel-cepstra from an audio file.
• It includes pluggable stages for the Hamming window, FFT, mel-frequency filter bank, discrete cosine transform, cepstral mean normalization, and feature extraction of cepstra and delta cepstra features.
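As an illustration of one front-end stage, the sketch below computes the textbook Hamming window, w[n] = 0.54 - 0.46 cos(2πn / (N - 1)), and applies it to a single frame of samples. The class and method names are hypothetical, and the code is a standalone example rather than the Sphinx4 front-end code.

```java
// Illustrative Hamming window (assumption: standalone sketch, not Sphinx4 code).
final class HammingWindowSketch {

    // Returns the coefficients w[n] = 0.54 - 0.46 * cos(2*pi*n / (size - 1)).
    static double[] hammingWindow(int size) {
        if (size < 2) {
            throw new IllegalArgumentException("window size must be at least 2");
        }
        double[] window = new double[size];
        for (int n = 0; n < size; n++) {
            window[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (size - 1));
        }
        return window;
    }

    // Multiplies one frame of audio samples by the window, element by element.
    static double[] applyWindow(double[] frame, double[] window) {
        if (frame.length != window.length) {
            throw new IllegalArgumentException("frame and window must have the same length");
        }
        double[] shaped = new double[frame.length];
        for (int i = 0; i < frame.length; i++) {
            shaped[i] = frame[i] * window[i];
        }
        return shaped;
    }

    public static void main(String[] args) {
        double[] frame = {1.0, 1.0, 1.0, 1.0, 1.0};
        for (double value : applyWindow(frame, hammingWindow(frame.length))) {
            System.out.printf("%.4f%n", value);
        }
    }
}
```

The window tapers each frame toward its edges, which reduces spectral leakage before the FFT stage.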

Acoustic model
• Sphinx4 has two important models that serve different purposes.
• TIDIGITS_8GAU_13dcep_16K_40mel_130Hz_6800.jar is designed for numbers; use it as the acoustic model when recognizing digits.
• WSJ_8gau_13dCep_16k_40mel_130Hz_6800.jar is designed for text data; use it as the acoustic model when recognizing text.

Dictionary
• The dictionary provides pronunciations for the words found in the language model. Each pronunciation splits a word into a sequence of phonemes that are found in the acoustic model.
• Its main task is to define how each word is pronounced.
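The mapping from words to phonemes can be sketched as a simple lookup table. The example below assumes a plain-text dictionary with one entry per line, a word followed by its phonemes (the two entries use CMU-style phoneme symbols). The class and method names are hypothetical, and this is not the Sphinx4 dictionary class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative pronunciation dictionary (assumption: not the Sphinx4 implementation).
final class PronunciationDictionarySketch {

    // Parses lines of the form "WORD PH1 PH2 ..." into a word-to-phonemes map.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> dictionary = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            List<String> phonemes =
                    new ArrayList<>(Arrays.asList(Arrays.copyOfRange(parts, 1, parts.length)));
            dictionary.put(parts[0], phonemes);
        }
        return dictionary;
    }

    // Expands a word sequence into the phoneme sequence the acoustic model would score.
    static List<String> phonemesFor(Map<String, List<String>> dictionary, List<String> words) {
        List<String> phonemes = new ArrayList<>();
        for (String word : words) {
            List<String> pronunciation = dictionary.get(word.toUpperCase());
            if (pronunciation == null) {
                throw new IllegalArgumentException("no pronunciation for: " + word);
            }
            phonemes.addAll(pronunciation);
        }
        return phonemes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = parse("ONE W AH N\nTWO T UW\n");
        System.out.println(phonemesFor(dictionary, Arrays.asList("one", "two")));
    }
}
```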

Language model
• It contains a representation of the probability of occurrence of words. There are basically two types of model that describe the language:
• Statistical language model: estimates the probability distribution of word sequences in natural language. The most widely used statistical language model is the N-gram.
• Grammar language model: grammars describe very simple types of languages for command and control. They are usually written by hand or generated automatically by plain code.
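For the statistical case, the simplest N-gram beyond single words is the bigram, estimated by maximum likelihood as P(next | previous) = count(previous next) / count(previous as a history). The sketch below is a toy estimator with hypothetical class and method names and a made-up four-sentence corpus. It applies no smoothing, which a real model would need.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative maximum-likelihood bigram model (assumption: toy example, no smoothing).
final class BigramModelSketch {
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    // Counts every adjacent word pair in a whitespace-separated sentence.
    void addSentence(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            historyCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    // Returns count(previous next) / count(previous), or 0.0 for an unseen history.
    double probability(String previous, String next) {
        int history = historyCounts.getOrDefault(previous, 0);
        if (history == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(previous + " " + next, 0) / (double) history;
    }

    public static void main(String[] args) {
        BigramModelSketch model = new BigramModelSketch();
        model.addSentence("one two");
        model.addSentence("one three");
        model.addSentence("one two");
        model.addSentence("two one");
        System.out.println(model.probability("one", "two")); // 2 of the 3 pairs starting with "one"
    }
}
```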

XML configuration file
• The configuration file determines the configuration of the open-source Sphinx4 framework. It defines the following:
• The names and types of the components.
• The connectivity between the components, that is, how they relate to each other.
• The detailed configuration of each of these components.

Using a model in Sphinx4
• There are basically three steps to using a new model in Sphinx4:
• Defining a language model.
• Defining a dictionary.
• Defining an acoustic model.

Defining the language model
<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="grammarLocation" value="the path to the grammar folder"/>
    <property name="dictionary" value="dictionary"/>
    <property name="grammarName" value="the name of grammar"/>
    <property name="logMath" value="logMath"/>
</component>
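The grammar file that this component points to could, for a two-digit task, be generated by plain code. The sketch below builds a small JSGF grammar as a string. The class name, the rule names `<twoDigits>` and `<digit>`, and the ten-word digit list are assumptions for illustration, and every word used must also appear in the dictionary. The grammar name passed in would need to match the `grammarName` property, and the file would normally be saved under that name with a `.gram` extension in the grammar folder.

```java
// Illustrative JSGF grammar generator (assumption: rule names and word list are hypothetical).
final class TwoDigitGrammarSketch {
    static final String[] DIGITS = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"
    };

    // Builds a JSGF grammar that accepts exactly two spoken digits in a row.
    static String buildGrammar(String grammarName) {
        StringBuilder grammar = new StringBuilder();
        grammar.append("#JSGF V1.0;\n\n");
        grammar.append("grammar ").append(grammarName).append(";\n\n");
        grammar.append("public <twoDigits> = <digit> <digit>;\n\n");
        grammar.append("<digit> = ").append(String.join(" | ", DIGITS)).append(";\n");
        return grammar.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildGrammar("digits"));
    }
}
```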

Defining the acoustic model
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location" value="the path to the model folder"/>
</component>
<component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>

Defining the dictionary
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="the name of the dictionary file"/>
    <property name="fillerPath" value="the name of the filler file"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>

Thank you
