Speech Recognition - Ajay Iyer
Outline
• What is a Spectrogram?
• Types of Spectrograms
• Linguistic and Acoustic Categories
• Prosodic Analysis
• Pitch Estimation
What is a Spectrogram?
• A spectrogram is a visual representation of an acoustic signal.
• It displays the amplitude of the signal as a function of frequency and time.
• Depending on the length of the Fourier analysis window, different resolutions in frequency and time are achieved.
• A long analysis window resolves frequency at the expense of time, giving a “narrowband spectrogram”.
• A short analysis window resolves time at the expense of frequency, giving a “wideband spectrogram”.
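The window-length trade-off can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the slides: the 16 kHz sampling rate and the 30 ms and 5 ms window lengths are assumed values chosen for illustration, and `window_resolution` is a hypothetical helper. It approximates frequency resolution by the DFT bin spacing Fs/N; the effective resolution also depends on the window shape.

```python
def window_resolution(window_ms, fs):
    """Return (samples, time_resolution_s, freq_resolution_hz) for one analysis window.

    Frequency resolution is approximated by the DFT bin spacing fs / N;
    the true main-lobe width depends on the window function used.
    """
    n = round(window_ms * 1e-3 * fs)   # window length in samples
    time_res = n / fs                  # window duration in seconds
    freq_res = fs / n                  # bin spacing in Hz
    return n, time_res, freq_res


FS = 16000  # assumed sampling rate in Hz (not given in the slides)

# Assumed, illustrative window lengths: long -> narrowband, short -> wideband.
for label, ms in (("narrowband", 30.0), ("wideband", 5.0)):
    n, t, f = window_resolution(ms, FS)
    print(f"{label}: {n} samples, {t * 1e3:.1f} ms in time, about {f:.1f} Hz in frequency")
```

Under these assumptions the long window separates components roughly 33 Hz apart but smears events over 30 ms, while the short window localizes events to 5 ms but only separates components about 200 Hz apart.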
Types of Spectrograms
[Figures: narrowband spectrogram; wideband spectrogram]
Linguistic/Acoustic Categories
• Labeling phonemes with their linguistic and/or acoustic categories speeds up the search and decoding algorithms, because impossible and highly unlikely phoneme combinations can be discarded.
• Implementation: the given phoneme is compared against the categories defined in the TIMIT lexicon.
• The category thus obtained is displayed along with the phoneme, as shown in the following slide.
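The lookup described above might be sketched as follows. This is an illustration only: the table is a small hand-written subset using TIMIT-style phone symbols, not the TIMIT lexicon itself, and the names `CATEGORIES`, `PHONE_TO_CATEGORY`, and `label_phoneme` are hypothetical.

```python
# Illustrative subset of broad phonetic categories with TIMIT-style phone symbols.
# Assumption: this table is hand-written for the example, not read from the lexicon.
CATEGORIES = {
    "stop": ("b", "d", "g", "p", "t", "k"),
    "affricate": ("jh", "ch"),
    "fricative": ("s", "sh", "z", "zh", "f", "th", "v", "dh"),
    "nasal": ("m", "n", "ng"),
    "semivowel": ("l", "r", "w", "y"),
    "vowel": ("iy", "ih", "eh", "ae", "aa", "ah", "uw", "uh"),
}

# Invert the table once so that each lookup is a single dictionary access.
PHONE_TO_CATEGORY = {
    phone: category
    for category, phones in CATEGORIES.items()
    for phone in phones
}


def label_phoneme(phone):
    """Return the phoneme together with its category, e.g. 'sh (fricative)'."""
    category = PHONE_TO_CATEGORY.get(phone.lower(), "unknown")
    return f"{phone} ({category})"


for p in ("sh", "iy", "ng"):
    print(label_phoneme(p))
```

Phones missing from the table are labeled "unknown" rather than raising an error, so an incomplete table does not interrupt the display.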
Prosodic Analysis
• Acoustically speaking, prosody refers to variations in syllable duration, loudness, pitch, and the formant frequencies of the speech signal.
• Prosodic features are suprasegmental, i.e., they are not restricted to any one segment of speech. They occur at a higher level of an utterance.
• Examples: “No!”, “Don’t!”
Pitch
• Of the various prosodic features, the most important is pitch.
• Knowing the pitch makes it possible to differentiate between the contexts in which a word is spoken, namely alerting and referential contexts.
• Thus, incorporating pitch information increases the accuracy of the recognizer.
Implementation
• The pitch.m file uses cepstral analysis to extract pitch information.
• pitch.m performs the analysis on a single analysis frame.
• Frame-based analysis has been coded to estimate the pitch over the entire speech signal.
• The estimated fundamental frequency (pitch) is assigned to the time instant t_pitch = t_interval(frameNum - 1) + f_o/F_s
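A frame-based cepstral pitch estimator of the kind outlined above could look roughly like the sketch below. It is written in Python and is not the pitch.m code, which the slides do not show. All names (`fft`, `cepstral_pitch`, `track_pitch`) and all parameter values (8 kHz sampling rate, 512-sample frames, 256-sample hop, 80 to 400 Hz search range) are assumptions. The sketch makes no voicing decision, and it stamps each estimate with the frame start time only, leaving out the additional offset term of the slide formula because that term is not defined in the slides.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def cepstral_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate f0 (Hz) of one frame from the peak of its real cepstrum."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Log-magnitude spectrum; the small constant guards against log(0).
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(windowed)]
    # The log magnitude of a real signal is real and even-symmetric,
    # so its inverse FFT equals the real part of its forward FFT divided by n.
    cepstrum = [c.real / n for c in fft(log_mag)]
    # Search only the quefrencies (in samples) that match the assumed pitch range.
    q_lo = int(fs / fmax)
    q_hi = min(int(fs / fmin), n // 2 - 1)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return fs / peak


def track_pitch(signal, fs, frame_len=512, hop=256):
    """Return a list of (frame_start_time_s, f0_hz), one entry per frame."""
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f0 = cepstral_pitch(signal[start:start + frame_len], fs)
        track.append((start / fs, f0))
    return track


# Synthetic test signal (assumption): an impulse train with a 64-sample period,
# i.e. 8000 / 64 = 125 Hz at the assumed 8 kHz sampling rate.
FS = 8000
signal = [1.0 if i % 64 == 0 else 0.0 for i in range(2048)]
for t, f0 in track_pitch(signal, FS):
    print(f"t = {t:.3f} s, f0 = {f0:.1f} Hz")
```

Because there is no voicing decision, silent or unvoiced frames still return a value. A practical estimator would compare the cepstral peak height against a threshold before accepting the estimate.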