Computer Science 121

Scientific Computing, Winter 2012
Chapter 13: Sounds and Signals

Background: Sounds and Signals
• Recall transducer view of computer: convert input signal into numbers.
• Signal: a quantity that changes over time
  • Body temperature
  • Air pressure (sound)
  • Electrical potential on skin (electrocardiogram)
  • Seismological disturbances
• We will study audio signals (sounds), but the same issues apply across a broad range of signal types.

13.1 Basics of Computer Sound
>> [x, fs, bits] = wavread('FH.wav');
>> size(x)
ans =
       41777           1
>> [max(x) min(x)]
ans =
    0.9922   -1.0000
>> fs
fs =
       11025
>> bits
bits =
     8
>> sound(x, fs)


13.1 Basics of Computer Sound
• x contains the sound waveform (signal): essentially, voltage levels representing transduced air pressure on the microphone.
• fs is the sampling frequency: how many times per second (Hertz, Hz) did we measure the voltage?
• bits is the number of bits used to represent each sample.
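The three quantities fit together with a little arithmetic. The sketch below uses only the numbers reported in the session above (it does not read FH.wav); the variable names nSamples, duration, nLevels, and step are illustrative, and the step size assumes the levels are spread evenly over the range -1 to +1.

```matlab
% Values reported in the session above for FH.wav
nSamples = 41777;   % size(x, 1)
fs       = 11025;   % samples per second (Hz)
bits     = 8;       % bits per sample

duration = nSamples / fs;   % length of the recording in seconds
nLevels  = 2^bits;          % number of distinct values a sample can take
step     = 2 / nLevels;     % spacing between levels, assuming they evenly cover [-1, 1)

fprintf('duration = %.3f s, levels = %d, step = %.7f\n', duration, nLevels, step);
```

Under that assumption the largest representable value is 1 - step, which is consistent with the max(x) of 0.9922 shown in the session.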

Questions
• Why does the sound waveform range from -1 to +1?
  • These values are essentially arbitrary. One nice feature of a ±x representation is that zero means silence.
• What role does the sampling frequency play in the quality of the sound?
  • The more samples per second, the closer the sound is to a “perfect” recording.
• What happens if we double (or halve) the sampling frequency at playback, and why?
• What is it about the waveform that determines the sound we're hearing (which vowel), and the speaker's voice?
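One way to explore the playback question is to work out what changing the rate does to duration and pitch: the same samples are sent out faster or slower, so time compresses or stretches and every frequency scales by the same factor. The sketch below is only arithmetic on the session's values; the 440 Hz component is a hypothetical example, not something measured from FH.wav.

```matlab
fs = 11025;                   % recording rate from the session above (Hz)
n  = 41777;                   % number of samples in x
f  = 440;                     % hypothetical frequency present in the recording (Hz)

rates     = [fs/2, fs, 2*fs]; % playback rates to compare
durations = n ./ rates;       % same samples, different clock: seconds of playback
pitches   = f .* rates / fs;  % each frequency scales with the playback rate

% To hear the effect (needs an audio device and the waveform x):
%   sound(x, 2*fs)   % half as long, one octave higher
%   sound(x, fs/2)   % twice as long, one octave lower
```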

Questions
• What is it about the waveform that determines the sound we're hearing (which vowel), and the speaker's voice?
  • Most of this information is encoded in the frequencies that make up the waveform (roughly, the spacing between successive peaks), and not in the actual waveform values themselves.
• We can do some useful processing on the “raw” waveform, however, e.g., counting syllables:

Syllable Counting by Smoothing and Peak-Picking

function res = syllables(x, fs)
% SYLLABLES(X, FS) counts syllables in speech waveform X by peak-picking
% on smoothed rectified signal. FS is sampling rate.

% how much higher a peak must be than its neighbors
DIFF = .001;

% size of moving-average "window" around each point, empirically determined
winsize = fix(fs / 20);

% rectify signal
x = abs(x);

% create smoothed signal from rectified
y = zeros(1, fix(length(x)/winsize));
for i = winsize:winsize:length(x)-winsize
    y(fix(i/winsize)) = mean(x(i-winsize+1:i+winsize));
end

plot(y)
hold on

% pick peaks in smoothed
peaks = find((y(2:end-1)-y(1:end-2))>DIFF & (y(2:end-1)-y(3:end))>DIFF) + 1;
plot(peaks, y(peaks), 'ro')

res = length(peaks);
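To see the smoothing and peak-picking steps at work without a recording, the sketch below applies the same steps inline (plotting omitted) to a synthetic signal. The test signal is an assumption made for illustration: a 200 Hz tone whose loudness rises and falls three times in 1.5 seconds, standing in for three syllables.

```matlab
fs  = 8000;                        % assumed sampling rate (Hz)
t   = (0:1.5*fs - 1) / fs;         % 1.5 seconds of sample times
env = sin(2*pi*t).^2;              % three loudness bumps, centered at 0.25, 0.75, 1.25 s
x   = env .* sin(2*pi*200*t);      % hypothetical 200 Hz "voice" shaped by the envelope

DIFF    = .001;                    % how much higher a peak must be than its neighbors
winsize = fix(fs / 20);            % moving-average window, as in syllables

x = abs(x);                        % rectify
y = zeros(1, fix(length(x)/winsize));
for i = winsize:winsize:length(x)-winsize
    y(fix(i/winsize)) = mean(x(i-winsize+1:i+winsize));   % smooth
end

% a peak is a point that exceeds both neighbors by more than DIFF
peaks = find((y(2:end-1)-y(1:end-2))>DIFF & (y(2:end-1)-y(3:end))>DIFF) + 1;
nSyllables = length(peaks);        % one peak per loudness bump
```

Real speech is far less tidy than this signal, which is why DIFF and the window size are described as empirically determined.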

13.2 Perception and Generation of Sound
• Sound is the perception of small, rapid vibrations in air pressure on the ear.
• Simplest model of sound is a function P(t) expressing pressure P at time t:
  P(t) = A sin(2πft + φ)
  where
  A = amplitude (roughly, loudness)
  f = frequency (cycles per second)
  φ = phase (roughly, starting point)
• This is the equation for a pure musical tone (just one pitch).
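The role of the phase φ as a "starting point" can be checked numerically. This is a minimal sketch, assuming a 500 Hz tone sampled at 10 kHz (values chosen for illustration); P is an anonymous function written here for the example.

```matlab
A = 1.0;                                   % amplitude
f = 500;                                   % frequency (Hz)
P = @(t, phi) A * sin(2*pi*f*t + phi);     % P(t) = A sin(2*pi*f*t + phi)

t  = (0:19) / 10000;     % one period of the tone, sampled at 10 kHz
p0 = P(t, 0);            % phase 0: starts at zero and rises
p1 = P(t, pi/2);         % phase pi/2: starts at the peak, i.e. a cosine

phaseCheck = max(abs(p1 - A*cos(2*pi*f*t)));   % difference from a cosine
```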

13.2 Perception and Generation of Sound
• The inverse of frequency is the period, T = 1/f (the time between successive peaks).
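A short numerical check of the period, under assumed values (a 500 Hz tone sampled at 10 kHz): the waveform should repeat after T seconds, which is FS/f samples. The time vector here uses a spacing of exactly 1/FS so that one period is a whole number of samples.

```matlab
FS = 10000;                    % assumed sampling frequency (Hz)
f  = 500;                      % tone frequency (Hz)
T  = 1 / f;                    % period in seconds
samplesPerPeriod = FS / f;     % period measured in samples

t  = (0:FS-1) / FS;            % 1 second, samples exactly 1/FS apart
Pt = sin(2*pi*f*t);

% shifting by one period should reproduce the same waveform
shiftError = max(abs(Pt(1:end-samplesPerPeriod) - Pt(1+samplesPerPeriod:end)));
```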

13.2 Perception and Generation of Sound
• E.g., whistling a musical scale:

13.2 Perception and Generation of Sound (ignore textbook)
• Most real sounds are complicated mixtures of many frequencies (no pure tones in nature).
• Still, we can learn some basic concepts by experimenting with pure tones:
>> FS = 10000;               % sampling frequency
>> f = 500;                  % sound frequency
>> A = 1.0;                  % amplitude
>> t = linspace(0,1,FS);     % 1 sec at 10 kHz
>> Pt = A * sin(2*pi*f*t);   % ignore phase

13.2 Perception and Generation of Sound (ignore textbook)
>> Pt = A * sin(2*pi*f*t);
>> plot(t, Pt)
>> xlim([0 .01])   % plot from 0 to .01 sec

Multiplying the frequency by k gives us k times as many cycles in the same amount of time:
>> Pt = A * sin(2*pi*3*f*t);   % k = 3
>> plot(t, Pt), xlim([0 .01])

Multiplying the amplitude by a number between 0 and 1 adjusts the loudness (volume) of the sound:
>> Pt = 0.5 * A * sin(2*pi*3*f*t);   % half the amplitude
>> plot(t, Pt), xlim([0 .01])
>> ylim([-1 1])   % keep Y axis scaling
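Both scaling claims can be checked without a plot by counting peaks and measuring the largest value in the first 0.01 seconds. This is a sketch; countPeaks is a helper written for this example (not a built-in), and the time vector uses a spacing of exactly 1/FS.

```matlab
FS = 10000;  f = 500;  A = 1.0;
t  = (0:99) / FS;                        % the first 0.01 sec, 100 samples

P1    = A * sin(2*pi*f*t);               % original tone
P3    = A * sin(2*pi*3*f*t);             % frequency multiplied by k = 3
Phalf = 0.5 * A * sin(2*pi*3*f*t);       % same tone at half the amplitude

% hypothetical helper: count samples that are higher than both neighbors
countPeaks = @(p) sum(p(2:end-1) > p(1:end-2) & p(2:end-1) > p(3:end));

cycles1 = countPeaks(P1);                % cycles of the 500 Hz tone in 0.01 sec
cycles3 = countPeaks(P3);                % three times as many
peakRatio = max(abs(Phalf)) / max(abs(P3));
```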

13.3 Synthesizing Complex Sounds (ignore textbook)
• Any sound can (in principle) be expressed as the sum of a set of pure tones of various frequencies, amplitudes, and phases.
• People are (arguably) insensitive to phase distinctions, so we will ignore phase here.
• Consider a sound containing a 500 Hz component and a 1200 Hz component at half the amplitude...

>> FS = 10000;
>> t = linspace(0, 1, FS);
>> f = 500;
>> A = 1.0;
>> Pt = A * sin(2*pi*f*t);
>> f2 = 1200;
>> A2 = 0.5;
>> Pt2 = A2 * sin(2*pi*f2*t);
>> Pt3 = Pt + Pt2;
>> plot(t, Pt3), xlim([0 .01])

13.3 Synthesizing Complex Sounds
• More generally, we have the formula
$$P(t) = \sum_{i=1}^{n} A_i \sin(2\pi f_i t + \phi_i)$$
• With all $\phi_i$ typically set to zero.
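The general formula translates directly into a loop over components. The sketch below uses vectors A, f, and phi (names chosen for this example) holding the 500 Hz and 1200 Hz components used earlier, with all phases set to zero.

```matlab
FS  = 10000;
t   = (0:FS-1) / FS;          % 1 second of sample times

A   = [1.0 0.5];              % amplitudes A_i
f   = [500 1200];             % frequencies f_i (Hz)
phi = [0 0];                  % phases phi_i, all zero

Pt = zeros(size(t));
for i = 1:numel(A)
    Pt = Pt + A(i) * sin(2*pi*f(i)*t + phi(i));   % add the i-th pure tone
end
```

Adding a third component is then a matter of appending one entry to each of the three vectors.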

13.4 Transducing and Recording Sound
• Convert sound pressure to voltage, then digitize the voltage into N discrete values in the interval [xmin, xmax], by sampling at frequency Fs.
• This is done by an analog/digital (A/D) converter.
• Another device must pre-amplify the sound to match the input expectations of the A/D converter.
• N is typically a power of 2, so we can use bits to express sampling precision (minimum 8 for decent quality). Mapping the voltage onto these N values is called quantization.
• For Matlab, xmin = -1.0, xmax = +1.0.
• Various things can go wrong if we don't choose these values wisely...

13.4 Transducing and Recording Sound
Figure 13.5. A segment of the sound “OH” transduced to voltage. Top: The preamplifier has been set appropriately so that the analog voltage signal takes up a large fraction of the A/D voltage range. The digitized signal closely resembles the analog signal even though the A/D conversion is set to 8 bits. Bottom: The preamplifier has been set too low. Consequently, there are effectively only about 3 bits of resolution in the digitized signal; most of the range is unused.
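The effect of a low preamplifier setting can be imitated by quantizing the same tone at two gains and counting how many of the 256 levels actually get used. This is a sketch under stated assumptions: the 97 Hz tone, the gains 0.9 and 0.03, and the quantize8 helper (a simple rounding scheme with codes from -128 to 127) are all chosen for illustration and are not taken from the figure.

```matlab
FS = 10000;
t  = (0:FS-1) / FS;
f  = 97;                                    % hypothetical test tone (Hz)

% hypothetical 8-bit quantizer: map [-1, 1) onto integer codes -128..127
quantize8 = @(x) min(127, max(-128, round(x * 128)));

good = 0.9  * sin(2*pi*f*t);                % preamp set appropriately (assumed gain)
low  = 0.03 * sin(2*pi*f*t);                % preamp set too low (assumed gain)

nGood = numel(unique(quantize8(good)));     % distinct levels used at the good setting
nLow  = numel(unique(quantize8(low)));      % distinct levels used at the low setting

bitsGood = log2(nGood);                     % effective bits of resolution
bitsLow  = log2(nLow);
```

With these assumed gains the quiet signal occupies only a handful of levels, which is the situation the bottom panel of the figure describes.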

13.4 Transducing and Recording Sound
Figure 13.6. Clipping of a signal (right) when the preamplifier has been set too high, so that the signal is outside of the −5 to 5 V range of the A/D converter.
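Clipping is easy to imitate: any voltage outside the converter's range is recorded as the nearest limit. The sketch below assumes a 100 Hz tone amplified to ±8 V (a made-up example) and the −5 to 5 V range from the caption.

```matlab
FS = 10000;
t  = (0:FS-1) / FS;

v = 8 * sin(2*pi*100*t);             % hypothetical over-amplified signal, +/-8 V
vClipped = min(5, max(-5, v));       % converter records nothing beyond +/-5 V

fracClipped = mean(abs(v) > 5);      % fraction of samples flattened by clipping
```

The flattened tops change the shape of the waveform, so clipping adds frequencies that were not in the original sound.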

13.5 Aliasing and the Sampling Frequency
• Someone has an alias when they use more than one name (representation).
• In the world of signals, this means having more than one representation of an analog signal, because of inadequate sampling frequency.
• Familiar visual aliasing from the movies (when 24 frames per second is too slow):
  • Wagon wheel / propeller going backwards
  • Scan lines appearing on computer screen
• Inadequate Fs can result in aliasing for sounds too.


13.5 Aliasing and the Sampling Frequency
Figure 13.8. Aliasing. A set of samples marked as circles. The three sine waves plotted are of different frequencies, but all pass through the same samples. The aliased frequencies are F + m/∆T, where m is any integer and ∆T is the sampling interval. The sine waves shown are m = 0, m = 1, and m = 2.
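The caption's claim can be checked numerically. The sketch below assumes F = 100 Hz and a sampling interval of 1 ms (so 1/∆T = FS = 1000 Hz); these values are illustrative and not taken from the figure.

```matlab
FS = 1000;                 % assumed sampling frequency, so 1/dT = FS
F  = 100;                  % assumed base frequency (Hz)
n  = 0:19;                 % sample indices
t  = n / FS;               % sample times, spaced dT = 1/FS apart

s0 = sin(2*pi*(F + 0*FS)*t);   % m = 0:  100 Hz
s1 = sin(2*pi*(F + 1*FS)*t);   % m = 1: 1100 Hz
s2 = sin(2*pi*(F + 2*FS)*t);   % m = 2: 2100 Hz

% all three sine waves pass through the same samples
aliasError = max([max(abs(s1 - s0)), max(abs(s2 - s0))]);
```

The samples agree because each extra m/∆T adds a whole number of cycles between consecutive samples, which the sine function cannot distinguish.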

13.5 Aliasing and the Sampling Frequency
• Nyquist's Theorem tells us that Fs should be at least twice the maximum frequency Fmax we wish to reproduce.
• Intuitively, we need two values to represent a single cycle: one for the peak, one for the valley.
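A small experiment shows what happens on either side of the limit. The sketch below assumes Fs = 10 kHz (so the limit is 5 kHz) and a 7 kHz tone as the example of a frequency that is too high; both values are chosen for illustration.

```matlab
FS = 10000;                       % assumed sampling frequency (Hz)
n  = 0:99;
t  = n / FS;

fHigh  = 7000;                    % above the limit FS/2 = 5000 Hz
fAlias = FS - fHigh;              % 3000 Hz

xHigh  = sin(2*pi*fHigh*t);       % samples of the 7000 Hz tone
xAlias = -sin(2*pi*fAlias*t);     % a 3000 Hz tone with inverted sign
aliasError = max(abs(xHigh - xAlias));   % the two sets of samples coincide

% right at the limit: two samples per cycle, one at the peak, one at the valley
edge = cos(2*pi*(FS/2)*t);        % alternates +1, -1, +1, ...
```

Once sampled, the 7000 Hz tone is indistinguishable from a 3000 Hz tone, so that is the pitch that would be heard at playback.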

Aliasing in the Time Domain
