Sound in Multimedia and HCI • sound • The physical characteristics of sound • The psychological characteristics of sound • Quality • Sound file formats • Sound on the Internet
Sound • Sound is a continuous wave that travels through the air • The wave is made up of pressure differences. • Sound is detected by measuring the pressure level at a location • Sound waves have normal wave properties (reflection, refraction, diffraction etc.)
Physical characteristics of sound (I) • Sound – is a pressure wave which travels in air at about 330 m/s – is audible at frequencies between 20 and 20,000 Hz (variations/second) • Sound is a perceptual effect caused by a pressure wave with a frequency between 20 Hz and 20 kHz being detected at the ear.
Physical characteristics of sound (II) • The pressure wave has two physical characteristics: • Amplitude – The measure of displacement of the air pressure wave • Frequency – Represents the number of periods in a second. – and is measured in hertz (Hz) or cycles per second.
Devices for sound generation and transduction • For input to a computer, the pressure wave is – converted to an analogue electrical signal by the microphone (transduced) – converted to a digital signal (digitized) by an Analogue to Digital Converter (ADC) • For output from a computer, the digitized signal is – converted to an analogue signal by a Digital to Analogue Converter (DAC) – converted to a pressure wave by the loudspeaker
Psychological characteristics of sound (I) • Sound has three defining characteristics: 1- loudness – how loud (intense) the sound appears 2- pitch – how high or low the sound appears; put simply, the pitch of a sound is determined by its frequency 3- timbre – the nature of the sound (e.g. distinctive timbre of instruments) • All sounds have a loudness, – but many are unpitched • Timbre is often used as a catch-all term to describe those aspects of the sound not captured by loudness and pitch.
Measuring loudness • Our ears have (essentially) a logarithmic response (with respect to sound amplitude) – loudness depends on power: proportional to amplitude * amplitude – actually, (real, perceptual) loudness is difficult to compute • Decibels – the ratio of the power of two signals is measured in decibels (dB)
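As a small worked example of the decibel definition above: a power ratio in dB is 10 times the base-10 logarithm of the ratio, and since power is proportional to amplitude squared, an amplitude ratio uses a factor of 20. The function names below are illustrative only, not taken from the slides.

```python
import math

def power_ratio_db(p1, p2):
    """Ratio of two signal powers, expressed in decibels."""
    return 10.0 * math.log10(p1 / p2)

def amplitude_ratio_db(a1, a2):
    """Ratio of two amplitudes in decibels (power ~ amplitude squared)."""
    return 20.0 * math.log10(a1 / a2)

# Doubling the power adds roughly 3 dB; doubling the amplitude roughly 6 dB.
print(power_ratio_db(2.0, 1.0))
print(amplitude_ratio_db(2.0, 1.0))
```

Because the scale is logarithmic, each tenfold increase in power adds the same 10 dB step, which matches the roughly logarithmic response of the ear.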
Psychological characteristics of sound (II) • Direction – Sound is normally generated by some source • and normally there are lots of concurrent sources – and each source has some location – so that the sound from it is perceived to come from some direction
Basically, the brain identifies the source of a sound on the basis of differences in intensity and phase between the signals received by the left (L) and right (R) ears: the sound arrives earlier and louder at the ear nearer the source.
Psychological characteristics of sound (III) • Distance – we can often also tell (roughly) the distance of the sound source – this comes partly from the loudness of the sound, and partly from other characteristics of the sound • Physical correlates of distance are – reflections and spectral shape
Psychological characteristics of sound (IV) • Associations – many sounds have associations – these may be obvious (and usable)… • breaking glass • scream • door slamming – or may be personal (and different for each individual) • dog barking
Quality • Whenever sound is transduced, digitized, or reconverted to analogue, the original signal is altered in some way. • Digitizing sound – sound is digitized using an analogue to digital converter (ADC) – sound is converted back to analogue using a digital to analogue converter (DAC) – Both forms of conversion can introduce alterations in the sound • but the ADC is the more problematic. • Analogue to digital conversion has two parameters: – sampling rate (determined by the sampling process) – sample size (determined by the quantization process)
Sampling and Quantizing • To include sound in a multimedia application, the sound waves must be converted from analog to digital form • This conversion is called sampling – every fraction of a second a sample of the sound is recorded in digital bits • Sampling – process of acquiring an analogue signal • Quantizing – conversion of held signal into sequence of digital values
Sampling and Quantization • Sampling rate: number of samples per second (measured in Hz) • 3-bit quantization gives 8 possible sample values [Figure: sampling and 3-bit quantization]
Sampling rate • Sampling rate describes how frequently the analogue signal is converted (i.e. analogue signal’s value is measured at discrete intervals) – Normally measured in samples/second • conversion is done regularly, at a fixed number of samples/second – sampling rate must be at least twice the highest frequency of interest in the signal • Nyquist sampling theorem • otherwise aliasing can occur - see later
Nyquist Theorem • The sampling frequency determines the limit of audio frequencies that can be reproduced digitally. One of the most important rules of sampling is called the Nyquist Theorem, which states that the highest frequency which can be accurately represented is less than one-half of the sampling rate. So, if we want a full 20 kHz audio bandwidth, we must sample at least twice that fast, i.e. over 40 kHz. If we don't, bad things happen. Here's our example sine wave
Consider a sine wave • The dashed vertical lines are sample intervals, and the blue dots are the crossing points, i.e. the actual samples taken by the conversion process. The sampling rate here is below the Nyquist rate (twice the signal frequency), so when we reconstruct the waveform we see the problem quite readily. • For lossless digitization, the sampling rate should be at least twice the maximum frequency present in the signal.
What happens if the sampling rate is not high enough? • A high-frequency signal sampled at too low a rate looks like a lower-frequency signal (aliasing).
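The effect can be checked numerically. The sketch below assumes illustrative values that are not in the slides: a 7 kHz sine sampled at 8 kHz, which is well below the 14 kHz the Nyquist theorem requires. The samples turn out to be identical to those of an (inverted) 1 kHz sine, so after sampling the two signals cannot be told apart.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a sine at f_signal after sampling at f_sample."""
    k = round(f_signal / f_sample)  # nearest multiple of the sampling rate
    return abs(f_signal - k * f_sample)

fs = 8000  # sampling rate in Hz (assumed for illustration)
high = [math.sin(2 * math.pi * 7000 * n / fs) for n in range(16)]
low = [-math.sin(2 * math.pi * 1000 * n / fs) for n in range(16)]

print(alias_frequency(7000, fs))  # 1000
print(all(abs(h - l) < 1e-9 for h, l in zip(high, low)))  # True
```

A 3 kHz sine sampled at the same rate is below half the sampling rate, so `alias_frequency(3000, 8000)` returns 3000 and the signal is represented correctly.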
Application of Nyquist Theorem • The Nyquist theorem is used to calculate the minimum sampling rate needed to obtain good audio quality. • The CD standard sampling rate of 44,100 Hz means that the waveform is sampled 44,100 times per second. • Audible sound has a bandwidth of 20 Hz to 20 kHz, so sampling at just over twice the maximum frequency (40 kHz) would already give good audio quality; the CD rate leaves some margin above this.
Sample size (II) • Quantization may be linear or logarithmic – Linear: the levels to which a signal is quantized are linearly spaced. – Logarithmic: provides more resolution at lower levels. The idea is to use non-linearly spaced quantization levels, with the higher levels spaced further apart than the low ones, so quieter sounds are represented in greater detail than louder ones.
3-bit Quantization • A 3-bit binary (base 2) number has 2^3 = 8 values (levels 0 to 7). • The amplitude is measured at each tick of the sample clock. • The result is a rough approximation of the waveform. [Figure: amplitude against time, 8 levels]
4-bit Quantization • A 4-bit binary number has 2^4 = 16 values. • The amplitude is measured at each tick of the sample clock. • The result is a better approximation of the waveform. [Figure: amplitude against time, 16 levels]
16-bit Sample Word Length • A 16-bit integer can represent 2^16, or 65,536, values (amplitude points). • We typically use signed 16-bit integers, and center the 65,536 values around 0, giving a range from -32,768 to 32,767.
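To make the effect of word length concrete, here is a minimal sketch of linear quantization. It assumes the input amplitude has been normalized to the range -1.0 to 1.0, and the function name `quantize` is hypothetical.

```python
def quantize(x, bits):
    """Map an amplitude x in [-1.0, 1.0] to an integer level in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Scale to [0, levels) and truncate; the top edge is folded into the last level.
    return min(levels - 1, int((x + 1.0) / 2.0 * levels))

for bits in (3, 4, 16):
    print(bits, "bits:", 2 ** bits, "levels, 0.3 ->", quantize(0.3, bits))
```

With 3 bits every amplitude is forced onto one of only 8 levels, which is why the reconstructed waveform is a rough staircase; each extra bit doubles the number of levels and halves the step size.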
Audio Sampling Variables • Three main criteria: • How many samples? OR “sampling rate” • How much data per sample? OR “bit depth” • How many channels sampled?
Audio Quality • Factors involved: – The quality of the original audio source – The quality of the capture device and supporting hardware – The characteristics used for capture: frequency, data rate (amplitude), number of channels – The capability of the playback environment
How Many Samples? Audio “Sampling Rates” • Digital Video • CD Quality • Stereo • FM Radio • AM Radio • Telephone
Other Sample Rates • Sample rate (Hz) and corresponding quality, lowest first: • 8000: Telephone • 11000: AM Radio • 16000: FM Radio • 22050: Stereo
How Much Data per Sample? • Common Sampling “Bit Depth” • 8 bits of data per sample • 16 bits of data per sample
How Many Channels Sampled • Number of Channels • Stereo (2 channels) • Mono (1 channel) • Multiple tracks
Audio Sampling Variables • Record settings affect both sound quality and file size • The settings are: • Sample Rate • Bit Depth • Number of Channels
Audio Record Rate • More audio samples mean larger audio files
Audio File Size • File size is determined by a combination of: • Sample Rate • Bit Depth • No of Channels • Length in Minutes
Audio File Size • Variables of concern: • Various Sample Rates • 8 bit or 16 bit • Stereo or Mono • Length in Minutes
Audio File Size CD characteristics… - Sampling rate: 44,100 samples per second (44.1 kHz) - Sample word length: 16 bits (i.e., 2 bytes) per sample - Number of channels: 2 (stereo) How big is a 5-minute CD-quality sound file?
Audio File Size • How big is a 5-minute CD-quality sound file? • 44,100 samples per second * 2 bytes per sample * 2 channels = 176,400 bytes per second • 5 minutes * 60 seconds per minute = 300 seconds • 300 seconds * 176,400 bytes per second = 52,920,000 bytes, or about 50.5 megabytes (MB, counting 1 MB as 1,048,576 bytes)
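The same arithmetic can be written as a short function, which makes it easy to try other settings. This is only a sketch of the calculation for uncompressed audio; the function name is made up for illustration.

```python
def pcm_size_bytes(sample_rate, bytes_per_sample, channels, seconds):
    """Size in bytes of uncompressed audio with the given record settings."""
    return sample_rate * bytes_per_sample * channels * seconds

cd_5min = pcm_size_bytes(44100, 2, 2, 5 * 60)
print(cd_5min)                      # 52920000
print(round(cd_5min / 2 ** 20, 1))  # 50.5 (megabytes of 1,048,576 bytes)

# Telephone-style settings for comparison: 8000 Hz, 8 bit, mono, 5 minutes.
print(pcm_size_bytes(8000, 1, 1, 5 * 60))  # 2400000
```

Halving any one of the variables (sample rate, bit depth, channels, or length) halves the file size.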
Sound compression (I) • Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are typically referred to as audio codecs. As with other specific forms of data compression, there exist many "lossless" and "lossy" algorithms to achieve the compression effect.
• Compression of sound data requires different techniques from those for graphical data • Requirements are less stringent than for video – data rate for CD quality audio is much less than for video, but still exceeds the capacity of dial-up Internet connections • Data rate is 44,100 * 2 * 2 bytes/sec = 176,400 bytes/sec = 1.41 Mbits/sec – a 3-minute song recorded in stereo occupies about 31 Mbytes
• Sound is difficult to compress using lossless methods – complex and unpredictable nature of sound waveforms • Different requirements depending on the nature of the sound – speech – music – natural sounds – …and on the nature of the application
Sound compression (II) • A simple lossless compression method is to record the length of a period of silence – no need to record 44,100 samples of value zero for each second of silence – form of run-length encoding – in reality this is not lossless, as “silence” rarely corresponds to sample values of exactly zero; rather some threshold value is applied
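A minimal sketch of this idea follows. The marker tuple `("silence", n)` and the function names are hypothetical choices for illustration, not part of any real file format. With a threshold of zero the scheme is lossless; with a non-zero threshold, quiet samples are replaced by exact zeros on decoding, which is the loss the slide describes.

```python
def encode_silence(samples, threshold=0):
    """Replace each run of near-silent samples with a ("silence", length) marker."""
    out, run = [], 0
    for s in samples:
        if abs(s) <= threshold:
            run += 1  # extend the current silent run
        else:
            if run:
                out.append(("silence", run))
                run = 0
            out.append(s)
    if run:
        out.append(("silence", run))
    return out

def decode_silence(encoded):
    """Expand silence markers back into runs of zero-valued samples."""
    out = []
    for item in encoded:
        if isinstance(item, tuple):
            out.extend([0] * item[1])
        else:
            out.append(item)
    return out

data = [5, 0, 0, 0, -3, 1, 0, 0]
print(encode_silence(data))               # [5, ('silence', 3), -3, 1, ('silence', 2)]
print(encode_silence(data, threshold=1))  # [5, ('silence', 3), -3, ('silence', 3)]
```

One second of true silence at CD rate would shrink from 44,100 samples to a single marker.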
• Difference between how we perceive sounds and images results in different lossy compression techniques for the two media – high spatial frequencies can be discarded in images – high sound frequencies, however, are highly significant • So what can we discard from sound data?
1- Companding • Non-linear quantization developed by telephone companies – known as companding (compressing/expanding) • mu-law (µ-law) • A-law • Telephone signals are sampled at 8 kHz. At this rate, µ-law compression is able to squeeze a dynamic range of 12 bits into just 8 bits, giving a one-third reduction in data rate.
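The shape of the µ-law curve can be sketched with the standard continuous companding formula, F(x) = sgn(x) · ln(1 + µ|x|) / ln(1 + µ), with µ = 255 as used in telephony. This is the smooth curve only, assuming amplitudes normalized to -1.0 to 1.0; it is not the segmented 8-bit encoding that telephone codecs actually transmit.

```python
import math

MU = 255  # value used for mu-law in telephony

def mu_compress(x, mu=MU):
    """Compress an amplitude x in [-1.0, 1.0] with the continuous mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress: recover the original amplitude."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)

# A quiet sample (1% of full scale) is lifted to over a fifth of the output
# range, so it receives many more quantization levels than it would linearly.
print(mu_compress(0.01))
print(mu_expand(mu_compress(0.3)))
```

Quantizing the compressed value linearly to 8 bits and expanding on playback gives the logarithmic spacing of levels described earlier: fine steps for quiet sounds, coarse steps for loud ones.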
2- Pulse-code Modulation (PCM) • PCM is a digital representation of an analog signal in which the magnitude of the signal is sampled regularly at uniform intervals, then quantized to a series of symbols in a digital (usually binary) code. PCM has been used in digital telephone systems and is also the standard form for digital audio in computers and the Compact Disc Red Book format.
3- Linear Predictive Coding • Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. • It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good quality speech at a low bit rate. It also provides very accurate estimates of speech parameters.
Resource Interchange File Format (RIFF) • The Resource Interchange File Format (RIFF), a tagged file structure, is a general specification upon which many file formats can be defined. The main advantage of RIFF is its extensibility; file formats based on RIFF can be future-proofed, as format changes can be ignored by existing applications. • The RIFF file format is suitable for the following multimedia tasks: • Playing back multimedia data • Recording multimedia data • Exchanging multimedia data between applications and across platforms
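The tagged structure can be seen by walking through the chunks of a small WAV file, which is a RIFF file with form type `WAVE`. The sketch below builds a tiny file in memory with Python's standard `wave` module and then reads the chunk headers by hand; each chunk starts with a 4-byte identifier and a 4-byte little-endian length. The function name `riff_chunks` is illustrative.

```python
import io
import struct
import wave

# Build a tiny mono, 16-bit, 8 kHz WAV file in memory (8 silent frames).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8)
data = buf.getvalue()

def riff_chunks(data):
    """Return the RIFF form type and a list of (chunk id, chunk size) pairs."""
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    chunks, pos, end = [], 12, 8 + riff_size
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return form_type.decode("ascii"), chunks

print(riff_chunks(data))  # ('WAVE', [('fmt ', 16), ('data', 16)])
```

Because every chunk carries its own length, a reader that does not recognize a chunk identifier can simply skip over it, which is what makes RIFF-based formats extensible.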
Sound files and formats • It is important to distinguish between a file format and a codec. Though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does. • There are three major groups of audio file formats: • uncompressed formats, such as WAV, AIFF and AU. • formats with lossless compression, such as Monkey's Audio (filename extension APE) and lossless Windows Media Audio (WMA). • formats with lossy compression, such as MP3, lossy Windows Media Audio (WMA) and AAC.
There are many sound file formats • Windows PCM waveform (.wav): a form of the RIFF specification; basically uncompressed data. • Windows ADPCM waveform (.wav): another form of RIFF file, but compressed to 4 bits per sample. • CCITT mu-law and A-law waveforms (.wav): another form, using 8-bit logarithmic compression. • NeXT/Sun file format (.snd or .au): actually many different varieties; a header is followed by data, which may be in many forms (linear, mu-law, etc.).