220 likes | 403 Views
Acoustics of the Vocal Tract. Before we take our discussion of acoustics into the vocal tract, we need to consider one more phenomenon: RESONANCE .
E N D
Before we take our discussion of acoustics into the vocal tract, we need to consider one more phenomenon: RESONANCE Recall our tuning fork discussion from last time:Striking the tuning fork sets the tuning fork prongs into vibration, and also sets air particles in the vicinity into vibration. These are two different types of vibration: In FREE VIBRATION (e.g,. the tuning fork), a given mass always vibrates at the same frequency (or frequencies); this is the natural frequency of the mass. The amplitude of that vibration depends on applied force. In FORCED VIBRATION (e.g., surrounding air molecules), a mass vibrates at the frequency of the source of motion; the amplitude of that vibration depends on applied force and the applied frequency.
The amplitude of the forced vibration depends in part on applied frequencybecause: If a mass is forced to vibrate at its NATURAL FREQUENCY, the vibration will have greater amplitude than at another frequency. This phenomenon: RESONANCE To think about…Given: a glass with a natural, resonant frequency of 1050 HzANDa person singing, high c (525 Hz)If the singer is close enough to the glass, the glass will vibrate. (Probably none of you are old enough to remember a demonstration of this in Memorex tape commercials on TV…)WHY?
1st natural frequency 2nd natural frequency Amplitude Amplitude 384 Frequency (Hz) Frequency (Hz) RESONATOR: something set into vibration by another vibration (e.g., the glass just discussed or the vocal tract). RESONANCE CURVE: representation of potential amplitude of response of a resonator to different driving or input frequencies Tuning fork: Certain tubes:
100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 Input Resonance curve Output (=source frequency) (= filter)
When a source wave is passed through a resonator (filter): FREQUENCY of components is determined by SOURCE. AMPLITUDE of components is determined by FILTER. Source-Filter Theory of Speech Production (Johannes Müller, 1848): FILTER: vocal tract SOURCE: for voiced sounds: vocal cord vibration
500 1000 1500 2000 2500 500 1000 1500 2000 2500 F0 = 200 HZ F0 = 100 HZ SOURCE CHARACTERISTICS GLOTTAL TONE: 1) F0 depends on rate of vocal cord vibration 2) Harmonics occur at every whole # multiple of F0 3) Harmonics decrease in amplitude as they increase in frequency e.g.,
FILTER CHARACTERISTICS The VOCAL TRACT changes its filter characteristics depending on its shape (e.g., tongue, lip, jaw, velum position). We’ll begin with the vocal tract in its “neutral” configuration: [´]. VOCAL TRACT in neutral ([´]) configuration ≈ tube closed at one end (glottis) and open at other (lips). Also, in neutralconfiguration, tube has roughly uniform cross-sectional area. When the vocal tract is in a neutral configuration, its resonant frequencies are determined as follows: NF1 depends on tube length: NF1 = c / = c / 4 x tube lengthLet’s see why...
Standing wave: a condition in which multiple reflections of a wave are self-reinforcing. Pressure is zero at the node, while pressure is at a maximum at the antinode. In a tube closed at one end and open at the other, the lowest frequency that gives rise to a standing wave can fit 1/4 of its wavelength into the tube.
Calculating NF1 for [´]: NF1 = c / = c / 4 * tube length c = velocity of sound in air = 34,400 cm/sec = wavelength; here = 4 * length of vocal tract average male vocal tract: 17 cm long So: NF1 = 34,400 / 4 x 17 = ~ 500 Hz for “average” male. All other NFs for [´] are at odd-numbered multiples of NF1 NF2 = 3 * NF1 (~ 1500 Hz for “average” male) NF3 = 5 * NF1(~ 2500 Hz for “average” male) because standing waves occur at these multiples (check out Fig. 5.10 in Johnson’s text, p. 95).
So the general formula for calculating the natural frequencies of the neutral vocal tract is: frequency = c * (2n - 1) 4l n = 1, 2, 3 . . . (1st, 2nd, 3rd ... vocal tract resonance) c = velocity of sound in air l = length of vocal tract
F0 = 200 Hz F0 = 100 Hz Glottal tone Vocal tract resonances NF3 NF2 NF1 NF3 NF2 NF1 500 1000 1500 2000 2500 500 1000 1500 2000 2500 Spectrum of output Effect of passing glottal tone through vocal tract in [´] configuration
F2 F1 F2 F1 F3 F3 F0 = 200 Hz F0 = 100 Hz Spectrum of output h1 h5 h1 h7 h15 Natural resonant frequencies of the vocal tract are formants In periodic speech sounds: Fundamental frequency (F0) is determined by rate of vocal cord vibration. Formant frequencies are determined by vocal tract configuration. Therefore: Harmonic frequencies are determined by rate of vocal cord vibration. Harmonic amplitudes are determined by vocal tract configuration.
DIGITAL SIGNAL PROCESSING We need a basic understanding of how computers perform acoustic analysis and how acoustic information is represented in different types of spectral displays. DSP converts a continuous, analog signal (the original speech stream) into a discrete, digital signal. Because the information in between discrete signals is lost, we need to assure that the conversion process retains auditorily relevant information—in our case, information relevant to human ears. The conversion process requires us to make informed decisions about sampling rate (# of samples/sec) and quantization (amplitude accuracy).
.01 sec Time: Sampling rate How many samples per second is “enough”? The answer depends on the signal being converted. We need at least 2 samples per cycle to determine that there is a component at that frequency. Capturing a 100 Hz sine wave requires sampling 200 times per second. The Nyquist frequency is the highest-frequency component allowed by a given sampling rate, and is 1/2 the sampling rate.
Aliasing: If the input analog signal contains component frequencies that are higher than the Nyquist frequency, the sampled waveform will misrepresent these components as having a lower frequency, an error known as aliasing (one frequency looks like another). Anti-aliasing: To prevent aliasing, we must filter out frequencies that are higher than the Nyquist frequency (i.e., higher than 1/2 the sampling rate). So, if we want to analyze frequency components up to 10 kHz in speech, we would use a low-pass filter that would block all components above 10 kHz and sample at 20 kHz (actually, a bit higher). The human ear can detect frequencies up to 20 kHz, so a sampling rate of ~ 40 kHz would always be “enough”. The downside is that such a high rate takes up a lot of computer space. Some researchers use different rates, depending on the signal being analyzed (e.g., vowels vs. fricatives). PRAAT has 3 sampling rates: 11, 22, and 44 kHz.
Amplitude: Quantization How many levels of amplitude are “enough”? Regardless of the number of amplitude levels or steps chosen, there will always be a mismatch between the analog and digital signals: the waveform of the analog signal varies smoothly while that of the digital signal is “stepped” due to limited precision. This mismatch is quantization noise. So the question is how much noise is acceptable. Obviously the more amplitude steps we use, the less noise (or mismatch) there will be: 8 bits: 256 amplitude steps 12 bits: 4,096 amplitude steps 16 bits: 65,536 amplitude steps
Time/Frequency Tradeoff Filters have a tradeoff between the resolution of time and frequency. The wider the frequency bandwidth of the filter, the poorer the frequency resolution, but the better the time resolution. And vice versa: the narrower the frequency bandwidth of the filter, the better the frequency resolution, but the poorer the time resolution. Let’s consider the consequences of this tradeoff for the types of spectral representations that we’ll be using in here.
Sampling Window Duration Interval between Rate (Hz) Points ms estimates (Hz) 40,000 20,000 10,000 Power spectra: FFTs FFT spectra give a freq. x amplitude display of the signal at a particular slice in time. The duration of the time slice has consequences for frequency resolution.
Spectrograms Spectrograms provide a time by frequency by amplitude representation of the signal. Amplitude: relative darkness of acoustic energy Frequency Time Analog spectrograms: narrowband: 45 Hz bandwidth filter; 20 msec time resolution broadband: 300 Hz bandwidth filter; 3 msec time resolution Digital spectrograms: Effective width of filter is manipulated by changing # of samples in analysis window. (In PRAAT: select “spectrogram settings” under View and change the window length: 5 ms (0.005) for broadband and 29 ms (0.029) for narrowband.)
Amplitude: relative darkness of acoustic energy Frequency Time Where are the formants? Fundamental frequency? Formants: Because amplitude is represented by relative darkness, darker horizontal bands correspond to formants. F0: Narrowband spectrogram: F0determined from narrow horizontal striations ( = harmonics). Broadband spectrogram: F0 determined from vertical striations which to opening of the vocal cords. (But if F0 > 300 Hz, vertical striations smear, and instead horizontal striations corresponding to harmonics are seen.)