Acoustic Analysis of Speech • Robert A. Prosek, Ph.D. • CSD 301
Acoustic Analysis • Instrumental acoustical analyses have been used for over 100 years • Analog techniques dominated the first 60 of these years • More recently, digital techniques have dominated the field • We will begin by introducing a few of the important analog methods, then turn to the digital
Oscillograph/Oscillogram • Any device that can display a waveform is an oscillograph • The output (display or hardcopy) is an oscillogram • There is limited information available in a waveform • silence • burst • noise • periodicity
Filter Bank Analysis • In this procedure, a filter bank or a single filter is used to divide the signal energy into frequency bands • The output energy is displayed for each band • This is a form of spectral analysis • The output typically is displayed in the form of a histogram • The technique is very common in audiology and hearing applications
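The band-energy idea on this slide can be sketched in a few lines. This is only an illustration: it swaps the analog filters for a direct discrete Fourier transform and sums the power that falls inside each band. The function name `band_energies`, the band edges, and the two-tone test signal are assumptions made for the example, not part of the lecture.

```python
import cmath
import math


def band_energies(samples, sampling_rate_hz, band_edges_hz):
    """Sum spectral power inside each band [edge[b], edge[b + 1]).

    A DFT stands in for the filter bank here (an assumption of this
    sketch); a real filter bank would use one band-pass filter per band.
    """
    n = len(samples)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n // 2 + 1):
        freq = k * sampling_rate_hz / n
        coeff = sum(samples[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n))
        power = abs(coeff) ** 2
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += power
                break
    return energies


# Hypothetical test signal: a 1000 Hz tone plus a weaker 3000 Hz tone.
fs = 8000.0
signal = [math.cos(2 * math.pi * 1000 * m / fs)
          + 0.5 * math.cos(2 * math.pi * 3000 * m / fs)
          for m in range(80)]
energies = band_energies(signal, fs, [0.0, 500.0, 2000.0, 4000.0])
```

Plotting `energies` as bars, one per band, gives the histogram-style display the slide describes: the middle band holds the most energy, the top band less, and the lowest band almost none.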
Sound Spectrograph/Spectrogram • The instrument is called a spectrograph • The output (usually a hardcopy) is a spectrogram • This is the most commonly used device in speech research • The spectrograph can capture the dynamics of speech • Acoustic signals vary only in frequency, amplitude and time • The sound spectrograph captures all of these
Sound Spectrogram • Abscissa is time • Ordinate is frequency • Intensity is shown as shades of gray • Black areas indicate the highest amplitudes • White areas indicate the noise floor • Amplitudes between these extremes are shown in varying shades of gray • The more intense the signal is at a particular frequency and time, the darker the trace
Digital Signal Processing (1) • In the late 1960s, general-purpose digital computers made it possible to analyze acoustic signals on the computer • These techniques are necessarily discrete as well as digital • Once in discrete form, the signal can be stored conveniently and analyzed in many ways that were not possible with analog techniques
Digital Signal Processing (2) • Presampling or brickwall filtering • Nyquist Theorem • In order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency • The brickwall filter removes all of the energy above the Nyquist frequency (half the sampling rate) • The clinician/researcher determines the Nyquist frequency by choosing the sampling rate • Some knowledge of speech and of speech and language disorders is required
Digital Signal Processing (3) • Sampling • Analog-to-digital conversion • Signal must be sampled at no less than the Nyquist rate • Sampling decides the times at which the signal will be measured • Sampling converts the acoustic signal into a series of numbers • Instead of amplitudes at all instants of time, no matter how small the time interval, amplitudes in the digital world exist only at the sampling interval • Aliasing: energy above the Nyquist frequency that is not filtered out appears at a false, lower frequency
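A short sketch may make aliasing concrete. It assumes a hypothetical 10 kHz sampling rate and shows that a 7 kHz cosine, which lies above the 5 kHz Nyquist frequency, produces the same samples as a 3 kHz cosine. The function names are illustrative only.

```python
import math


def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sampling_rate_hz / 2.0


def apparent_frequency(signal_hz, sampling_rate_hz):
    """Frequency at which a sampled sinusoid appears after sampling.

    Components above the Nyquist frequency fold back into the range
    0 .. sampling_rate_hz / 2.
    """
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


def sample_cosine(signal_hz, sampling_rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at the times t = n / sampling_rate_hz."""
    return [math.cos(2 * math.pi * signal_hz * n / sampling_rate_hz)
            for n in range(n_samples)]


fs = 10000.0                                # assumed sampling rate
above = sample_cosine(7000.0, fs, 50)       # above the Nyquist frequency
below = sample_cosine(3000.0, fs, 50)       # its alias
largest_difference = max(abs(a - b) for a, b in zip(above, below))
```

Here `largest_difference` is zero apart from rounding error, which is why the energy above the Nyquist frequency has to be removed by the presampling filter before conversion: once sampled, the two tones cannot be told apart.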
Digital Signal Processing (4) • Quantization • Discrete number of amplitude levels • The more quantizer levels available, the more closely the discrete signal represents the original analog signal • In our applications, 16-bit quantizers over a 20-volt range are typical • This yields an amplitude resolution of about 300 μV (20 V divided by 65,536 levels) and a signal-to-noise ratio of about 96 dB
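The two figures on this slide follow from simple arithmetic, sketched below. The sketch assumes the 20-volt range is centered on zero (that is, -10 V to +10 V), which the slide does not state, and uses a simple mid-rise quantizer; the function names are illustrative.

```python
import math


def quantizer_step(voltage_range, bits):
    """Voltage spanned by one quantizer level."""
    return voltage_range / (2 ** bits)


def quantizer_dynamic_range_db(bits):
    """Ratio of the full range to one step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)


def quantize(voltage, voltage_range, bits):
    """Map a voltage to the center of its quantizer level.

    Assumes the range is centered on zero; values outside it are clipped.
    """
    step = quantizer_step(voltage_range, bits)
    levels = 2 ** bits
    index = math.floor((voltage + voltage_range / 2) / step)
    index = max(0, min(levels - 1, index))
    return -voltage_range / 2 + (index + 0.5) * step


step = quantizer_step(20.0, 16)                # about 0.000305 V
dynamic_range = quantizer_dynamic_range_db(16)  # about 96.3 dB
error = abs(quantize(1.2345, 20.0, 16) - 1.2345)
```

The quantization error for any in-range voltage is never more than half a step, which is the sense in which more levels give a closer representation of the analog signal.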
Digital Signal Processing (5) • After A/D conversion • the signal is stored as a stream of numbers • the time of each sample is its index divided by the sampling rate • the amplitude is the stored number • in this form, many operations can be performed
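The relation between sample index and time can be written out directly. This is a minimal sketch; the 22,050 Hz sampling rate is only an assumed example value.

```python
def sample_time(index, sampling_rate_hz):
    """Time in seconds of the sample stored at a given index."""
    return index / sampling_rate_hz


def sample_index(time_s, sampling_rate_hz):
    """Index of the stored sample nearest to a given time."""
    return round(time_s * sampling_rate_hz)


fs = 22050.0                          # assumed sampling rate
one_second = sample_time(22050, fs)   # 1.0
halfway = sample_index(0.5, fs)       # 11025
```

A duration measurement on the waveform is then just the difference between two indices divided by the sampling rate.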
Waveform Display • Duration measurements • speech changes gradually • some consistent rules need to be adopted • Signal editing • again, some consistent rules need to be adopted • Amplitude measurements • rms is the most common • vocal fundamental frequency
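As a sketch of the RMS amplitude measurement mentioned above: the samples are squared, averaged, and the square root is taken. The test tone and the choice of 1.0 as the dB reference are assumptions for illustration.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a stretch of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_db(samples, reference=1.0):
    """RMS amplitude in dB relative to an assumed reference value."""
    return 20 * math.log10(rms(samples) / reference)


# One full cycle of a sine wave with peak amplitude 2.0 (hypothetical).
cycle = [2.0 * math.sin(2 * math.pi * n / 100) for n in range(100)]
level = rms(cycle)   # peak / sqrt(2), about 1.414
```

For a sinusoid measured over whole cycles, the RMS value is the peak amplitude divided by the square root of 2, so the measurement depends on where the segment boundaries are placed. That is one reason consistent editing rules matter.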
Digital Spectrum Analysis • The Fourier Transform revisited (FFT) • Periodic waveforms can be thought of as a series of sinusoids • each sinusoid has an amplitude and a phase • The Fourier Transform and the Inverse Fourier Transform allow powerful analysis-by-synthesis techniques
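The sketch below illustrates the transform pair using the direct discrete Fourier transform. The FFT computes the same values much faster; the slow form is used here only because it fits in a few lines. The test signal and function names are assumptions for the example.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform (same result as an FFT, slower)."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n)
                for k in range(n)).real / n
            for m in range(n)]


def amplitude_and_phase(spectrum, k):
    """Amplitude and phase (cosine reference) of bin k, for 0 < k < n/2."""
    n = len(spectrum)
    return 2 * abs(spectrum[k]) / n, cmath.phase(spectrum[k])


# Hypothetical signal: 4 cycles in 64 samples, amplitude 0.5, phase pi/3.
n_samples = 64
signal = [0.5 * math.cos(2 * math.pi * 4 * m / n_samples + math.pi / 3)
          for m in range(n_samples)]
spectrum = dft(signal)
amplitude, phase = amplitude_and_phase(spectrum, 4)
rebuilt = idft(spectrum)
```

`amplitude` and `phase` come back close to 0.5 and pi/3, and `rebuilt` matches `signal` to within rounding error. That round trip is what makes analysis-by-synthesis possible: a spectrum can be modified and then turned back into a waveform.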
Digital Spectrograph • This is a series of spectra based on the FFT or LPC (see below) • The amplitude is depicted as shades of gray • PRAAT is an example of a digital spectrograph • Speech Filing System, Speech Station 2, Wavesurfer, and many other free or commercial spectrographs are available
Linear Predictive Coding (1) • Speech is highly predictable over the short term • It is not hard to predict the amplitude of the next time sample of the speech waveform from a knowledge of the previous amplitudes • As few as 10 to 15 previous samples are required
LPC (2) • From statistics, we know that: • $y = a_0 + a_1 x_{-1} + a_2 x_{-2} + \dots + a_n x_{-n}$ • where $y$ is the amplitude of the next sample • and $x_{-k}$ is the sample $k$ steps before it • This is linear prediction
LPC (3) • Linear Predictive Coding (LPC) is one of the most powerful techniques in speech analysis • The a's in the prediction equation can be used to estimate the resonances (formants) of the vocal tract • They can also be related to the sections of a tube model of the vocal tract
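A deliberately small sketch of linear prediction follows. It fits only two coefficients (with the constant term assumed to be zero) by least squares, so the solution can be written in closed form. Speech analysis would use the 10 to 15 coefficients mentioned earlier and a dedicated solver. The function names and the 500 Hz test tone are assumptions for illustration.

```python
import cmath
import math


def fit_second_order_predictor(samples):
    """Least-squares fit of x[n] ~ a1 * x[n-1] + a2 * x[n-2] (a0 = 0)."""
    c11 = c12 = c22 = r1 = r2 = 0.0
    for n in range(2, len(samples)):
        x0, x1, x2 = samples[n], samples[n - 1], samples[n - 2]
        c11 += x1 * x1
        c12 += x1 * x2
        c22 += x2 * x2
        r1 += x0 * x1
        r2 += x0 * x2
    det = c11 * c22 - c12 * c12
    a1 = (r1 * c22 - r2 * c12) / det
    a2 = (c11 * r2 - c12 * r1) / det
    return a1, a2


def resonance_frequency(a1, a2, sampling_rate_hz):
    """Frequency of the pole of the predictor, i.e. a root of
    z**2 - a1 * z - a2 = 0, converted from angle to Hz."""
    pole = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2
    return abs(cmath.phase(pole)) * sampling_rate_hz / (2 * math.pi)


fs = 8000.0                                   # assumed sampling rate
tone = [math.sin(2 * math.pi * 500 * n / fs + 0.3) for n in range(200)]
a1, a2 = fit_second_order_predictor(tone)
predicted = a1 * tone[-2] + a2 * tone[-3]     # prediction of tone[-1]
resonance = resonance_frequency(a1, a2, fs)   # close to 500 Hz
```

For a pure tone, two previous samples predict the next one almost exactly, and the frequency recovered from the coefficients matches the tone. With more coefficients, the same idea yields several pole frequencies, which serve as estimates of the formants.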
Wideband versus Narrowband Spectrograms • Wideband (window lengths of 0.005, 0.007, or 0.009 s) • Short time window • Good for measuring formant frequencies • Narrowband (window lengths of 0.1 or 0.05 s) • Long time window • Good for showing and measuring harmonics
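The trade-off can be illustrated with a rule of thumb: the analysis bandwidth is roughly the reciprocal of the window length. The exact constant depends on the window shape, so the sketch below is an approximation. It also assumes that the values on the slide are window lengths in seconds and uses a hypothetical fundamental frequency of 120 Hz.

```python
def analysis_bandwidth_hz(window_length_s):
    """Rule-of-thumb bandwidth: roughly 1 / window length."""
    return 1.0 / window_length_s


def resolves_harmonics(window_length_s, f0_hz):
    """True when the bandwidth is narrower than the harmonic spacing."""
    return analysis_bandwidth_hz(window_length_s) < f0_hz


f0 = 120.0                                      # hypothetical speaker F0
wide = analysis_bandwidth_hz(0.005)             # roughly 200 Hz
narrow = analysis_bandwidth_hz(0.05)            # roughly 20 Hz
wide_shows_harmonics = resolves_harmonics(0.005, f0)
narrow_shows_harmonics = resolves_harmonics(0.05, f0)
```

With these values the short window is too broad to separate harmonics 120 Hz apart, so it smears them into formant bands, while the long window separates them at the cost of time resolution.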