EE2F1 Multimedia (1): Speech & Audio Technology • Lecture 10: Audio Technology • Martin Russell • Electronic, Electrical & Computer Engineering, School of Engineering, The University of Birmingham • See http://www.howstuffworks.com
Human Hearing – reminder • Human ear – quoted as 20Hz to 20kHz range • Upper limit drops with age – many can’t hear above 15kHz • Other animals can hear up to 50kHz (ultrasonic whistle) • Lower limit – below 40Hz we ‘feel’ the frequency with our bodies – the ear becomes less sensitive at low frequencies • For speech, we need 300Hz to 3.4kHz – based on intelligibility tests – hence telephone lines have a bandwidth of 3.1kHz
Human hearing • We have two ears! • Allows direction finding – two methods are used • Ear nearest to the sound hears the sound both: • Louder since we have one on each side of our head • Sooner since the travel time of the sound is less (this also produces a phase shift between the ears) • Using this we can identify the direction of the sound along the Left-Right axis • ….but the ear can also identify front-back direction (but not as accurately as L-R) • ….and also to a lesser extent up-down
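As a rough illustration of the "sooner" cue, the sketch below estimates the arrival-time difference between the two ears and the phase shift it produces at a given frequency. The ear spacing (0.2 m), the speed of sound (343 m/s) and the straight-line path model are assumptions made for illustration, not figures from the lecture.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)
EAR_SPACING = 0.2       # m, assumed distance between the ears


def interaural_delay(angle_deg):
    """Arrival-time difference in seconds for a distant source.

    angle_deg is measured from straight ahead (0) towards one side (90).
    Uses the simple path difference d * sin(angle) and ignores the way
    sound bends around the head.
    """
    return EAR_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def interaural_phase_deg(angle_deg, freq_hz):
    """Phase shift in degrees between the ears at one frequency."""
    return 360.0 * freq_hz * interaural_delay(angle_deg)


for angle in (0, 30, 90):
    delay_us = interaural_delay(angle) * 1e6
    phase = interaural_phase_deg(angle, 500.0)
    print(f"{angle:3d} deg: delay {delay_us:6.1f} us, phase at 500 Hz {phase:5.1f} deg")
```

The delay is zero for a source straight ahead and largest for a source directly to one side, which is why this cue only resolves the Left-Right axis.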
Analogue sound • 1877 – Thomas Edison – metal cylinder phonograph. Record and playback in mono. Low bandwidth (few kHz) • 1885 – Wax cylinders for general use • 1900 – 78’s – 10 inch disc, holds 3 minutes of audio. Mechanical playback (motor and horn). Low bandwidth (5 kHz) • 1948 – LP’s 33RPM, 20 minutes per side, electrically assisted playback. Stereo. High bandwidth (18 kHz) • 1949 – 45’s – single track per side • See http://www.history-of-rock.com/record_formats.htm
Mechanical modulation • All these formats relied on a modulated groove which moved a mechanical stylus. • Prone to wear – mechanical contact • Mechanical damage produces scratches, clicks and pops • Modulated groove produces movement in the stylus which moves a magnet. Coil is placed next to magnet and current is induced producing electrical signal
Analogue Radio • First was AM radio - Medium wave (520-1630kHz) and long wave (150-300kHz) • Mono, 4.5kHz bandwidth. • In USA, stereo AM introduced. Many competing standards, but still low quality. • FM radio – VHF (88-108MHz) • Stereo, 15kHz bandwidth. • Ideal would be to send Left and Right as 2 channels – but mono receivers would need to be stereo and add channels together. Mono cost > stereo cost! • Solution: send L+R and L-R. Mono just receives L+R • Stereo decodes both and uses op-amps to de-multiplex L and R
See http://www.st-andrews.ac.uk/~www_pa/Scots_Guide/RadCom/part21/page1.html
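The sum-and-difference idea can be sketched in a few lines. This is a minimal illustration on lists of sample values, assuming ideal channels; the function names are hypothetical, and the way a real FM broadcast carries the L-R signal (on a subcarrier alongside L+R) is left out.

```python
def encode_stereo(left, right):
    """Form the sum signal (what a mono receiver plays) and the difference signal."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def decode_stereo(total, diff):
    """Recover the channels: L = (sum + diff) / 2, R = (sum - diff) / 2."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right


left_in = [1.0, 0.5, -0.25]
right_in = [0.0, 0.5, 0.75]
total, diff = encode_stereo(left_in, right_in)
print("mono (L+R):", total)
print("decoded:", decode_stereo(total, diff))
```

A mono receiver needs no extra circuitry because it simply ignores the difference signal, which keeps the cost of stereo in the stereo receiver.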
Magnetic Tape • All need electronics – tape is magnetic • 1940’s – commercial use of magnetic tape. Reel-to-reel, not suitable for domestic use • 1964 – Philips introduce the “Cassette” for dictation (speech only) machines. Mono and low quality by design! • 1966 – 8-track cartridge for stereo use, mainly in cars. Competed with cassette for several years, but Philips improved cassette (stereo and hi-fi) and it won.
Magnetic tape • Thin plastic tape coated with ferric oxide powder • If exposed to magnetic field, Fe2O3 is permanently magnetized See http://howstuffworks.com
Digital Audio - CD’s • 1982 - First digital audio introduced to consumers • 12cm single sided disc, stereo, 44.1kHz, 16 bit • Main marketing claim – indestructible • Sales slow to start – Audiophiles didn’t like the sound • Eventually convenience becomes main marketing force • Technology: Laser pickup reads ‘pits’ and ‘bumps’ representing 1’s and 0’s
CD construction • CD is sandwich of plastic with reflective aluminium • Laser focuses on one bit at a time See www.howstuffworks.com/cd.htm
Pits and bumps – 1’s and 0’s • Reflective layer has small bumps 125nm high – this is ¼ wavelength of the laser • Laser beam covers both the bump and the surrounding area • Light from the surrounding areas is reflected, plus light from the bump – but the two paths differ by 125 + 125nm = 250nm (half a wavelength), so the light from the bump is 180 degrees out of phase with the surround light and cancels it out – a 0 • Where there is no bump, all the light is reflected (in phase) – a 1 • Tracking – laser needs to stay over the bumps – most common way is the Three-beam pickup
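The cancellation argument can be checked numerically. The sketch below assumes equal amounts of light return from the bump and from the surrounding area, and takes the bump height as a quarter of the laser wavelength (125 nm and 500 nm are used as illustrative values); the function name is hypothetical.

```python
import math


def reflected_intensity(bump_height_nm, wavelength_nm):
    """Relative intensity (0 to 1) when bump and surround reflections combine.

    The round-trip path difference is twice the bump height. Two equal
    waves with phase difference p combine to intensity (1 + cos p) / 2
    relative to the fully in-phase case.
    """
    phase = 2 * math.pi * (2 * bump_height_nm) / wavelength_nm
    return (1 + math.cos(phase)) / 2


print("over a bump:", reflected_intensity(125, 500))
print("no bump:    ", reflected_intensity(0, 500))
```

A quarter-wave bump gives a half-wave round trip, so the two reflections cancel; with no bump the reflections add in phase.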
Data rate • CD’s are uncompressed – 16 bit, 44.1kHz sampling, stereo, 74 minute max length • Capacity = 16 * 44100 * 2 * 60 * 74 / 8 = 783,216,000 bytes, i.e. about 783 Mbytes (roughly 747 Mbytes if 1 Mbyte = 2^20 bytes) • Error checking is built in – but for audio some errors can be hidden by interpolation • E.g. samples go 341, 368, (error), 422, 448 • Simply average the samples around the error • (368+422)/2 = 395 – linear interpolation • Note Data CD’s are lower capacity – more error checking
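Both calculations on this slide are easy to reproduce. The sketch below assumes the standard CD sampling rate of 44.1 kHz and marks a bad sample with `None`; the function names are hypothetical, and the concealment handles only isolated errors that have a good sample on each side.

```python
def cd_capacity_bytes(bits=16, rate_hz=44100, channels=2, minutes=74):
    """Uncompressed audio capacity in bytes."""
    return bits * rate_hz * channels * 60 * minutes // 8


def conceal_errors(samples):
    """Replace each None with the average of its two neighbours."""
    out = list(samples)
    last = len(samples) - 1
    for i, value in enumerate(samples):
        if value is None:
            if i == 0 or i == last or samples[i - 1] is None or samples[i + 1] is None:
                raise ValueError("needs a good sample on each side of the error")
            out[i] = (samples[i - 1] + samples[i + 1]) / 2
    return out


print(cd_capacity_bytes(), "bytes")
print(conceal_errors([341, 368, None, 422, 448]))
```

Linear interpolation works for audio because neighbouring samples are strongly correlated; the same trick would corrupt a data CD, which is why data discs spend more capacity on error correction.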
Mono and stereo • Mono – one sound source in front of the listener • Information can be delivered – mono still used for speech • No directional information – not relevant for many apps • Stereo – we have two ears, so use two sound sources • Directional information – can localise sound L/R making a ‘soundstage’ • Most modern analogue sources are stereo
Stereo basics • Two discrete channels for Left and Right • Sound of equal amplitude and in phase appears to come from the centre between the two speakers • Sound louder on one channel and in phase appears to come from between the speakers but closer to the louder one • Sounds that are out of phase appear to come from nowhere in particular – ambience or the surroundings • Fundamental idea is amplitude variations steer sound left to right, phase variations steer sound front to surround
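Amplitude steering between two speakers can be sketched with a pan control. The constant-power (sine/cosine) pan law used here is an assumption chosen for illustration, not something stated in the lecture, and `pan_gains` is a hypothetical name.

```python
import math


def pan_gains(pan):
    """Return (left_gain, right_gain) for pan in [-1, 1].

    -1 is fully left, 0 is centre, +1 is fully right. The gains follow
    a cosine/sine law so that left^2 + right^2 stays equal to 1.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1 and 1")
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


for pan in (-1.0, 0.0, 0.5, 1.0):
    left, right = pan_gains(pan)
    print(f"pan {pan:+.1f}: left {left:.3f}, right {right:.3f}")
```

Equal gains place the sound in the centre, and raising one gain relative to the other moves the image towards that speaker, matching the in-phase cases described above.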
Multi channel sound • Human ear detects direction of sound (front to back) • Outer ear (earlobe and surrounding folds) produces a frequency signature depending on direction – hence its complex shape – a mechanical filter • Hafler Stereo – uses a third loudspeaker to reproduce the out of phase sounds
Cinema Audio • “Don Juan” Warner Brothers 1926 – first commercial film with sound (music but no dialogue) • “The Jazz Singer”, Warner Brothers 1927 – (music, effects plus some dialogue) • Sound-on-disc record player playing wax record synchronised to projector • Sound-on-disc replaced with sound-on-film in 1930s See http://www.howstuffworks.com
Sound-on-film • Two technologies: • Optical • Magnetic • Optical technology: • Transparent strip recorded on side of film • Width of strip varies – variable area soundtrack • Light source focussed onto strip – shines through onto photocell • Varying width gives varying current • A variation is the variable density soundtrack, where the darkness of the strip varies instead of its width
Magnetic Sound-on-Film • Became popular in 1950s • Advantages • Stereo, better sound quality • Drawbacks • Added after filming, more expensive, shorter life, more easily damaged • Up to 6 tracks, but too expensive • Stereo optical tracks too noisy until 1965
Dolby noise reduction • Split audio into 4 bands • Pre-emphasis • Companding • Re-combination See http://www.howstuffworks.com
Surround Stereo • Two optical strips used to encode 4 channels: • Left • Right • Centre • Rear • Encoding based on Boolean logic See http://www.howstuffworks.com
Dolby Stereo • [Diagram: cinema speaker layout, labelled screen, Left, Centre, Right, Surround and Audience]
Home Systems • Dolby Surround • Dolby ProLogic • Four tracks of information on two physical tracks • ‘Backwards compatible’ with stereo • Dolby Surround uses Left and Right speakers to produce ‘phantom’ centre speaker • ProLogic sends centre sound to actual centre speaker See http://www.howstuffworks.com
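Carrying four tracks on two physical tracks can be sketched as a simple matrix. The coefficients below (centre and surround mixed in at 1/sqrt(2), surround with opposite sign in the two tracks) are a common simplified form and are an assumption here; the phase shifting and other processing of a real Dolby encoder are omitted, and the function names are hypothetical.

```python
import math

G = 1 / math.sqrt(2)  # about -3 dB, assumed mixing gain


def encode_4_2(left, centre, right, surround):
    """Fold four channel samples into two 'total' tracks (Lt, Rt)."""
    lt = left + G * centre + G * surround
    rt = right + G * centre - G * surround
    return lt, rt


def decode_passive(lt, rt):
    """Passive decode: centre from the sum, surround from the difference."""
    return {
        "left": lt,
        "right": rt,
        "centre": G * (lt + rt),
        "surround": G * (lt - rt),
    }


print(decode_passive(*encode_4_2(0.0, 1.0, 0.0, 0.0)))  # centre-only input
print(decode_passive(*encode_4_2(1.0, 0.0, 0.0, 0.0)))  # left-only input
```

Played on an ordinary stereo system, Lt and Rt are just a stereo pair, which is the backwards compatibility mentioned above. The left-only case shows the cost of a passive decode: some of the left signal leaks into the centre and surround outputs.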
Digital sound • First used in “Jurassic Park” (1993) • DTS – Digital Theater Systems • Optical track on film synchronises with audio on CD • Six audio tracks compressed onto 1 or 2 CDs • Initially viewed as temporary solution, but has lasted much longer than expected See http://www.howstuffworks.com
Digital sound-on-film • Dolby Digital • Space between sprocket holes used to encode information See http://www.howstuffworks.com
TV and videotape audio • Analogue TV – 15 kHz bandwidth, mono. Outside UK, stereo analogue TV introduced • Videotape (V2000, Betamax, VHS) – all used a simple linear audio track – similar to cassette. Low quality • Stereo video introduced in late 1970’s – simply split the mono track in half, so quality even poorer • Hi-Fi introduced in 1984 – first was Sony Betamax, VHS followed a year later. Note that modern Hi-Fi VHS machines use analogue FM for audio – not NICAM!
Nicam 728 - Stereo TV • Near Instantaneous Companded Audio Multiplex • http://tallyho.bc.nu/~steve/nicam.html • Sound is digitised to 14 bits accuracy at a sampling rate of 32kHz. • The upper frequency limit of a sound channel is 15kHz due to anti-aliasing filters at the encoder. • The 14 bit original sound samples are companded digitally to 10 bits for transmission. (Digital companding ensures that the encoding and decoding algorithms can track perfectly).
Nicam (continued) • Near Instantaneous Companding – 1ms worth of sound data has to be input before the companding process can do its work. The "Audio Multiplex" term implies that the system is not limited just to stereo operation. • Each 1ms packet consists of a 24 bit header (8 bit frame alignment word, 5 control bits, 11 additional data bits) • Plus a 'payload' of 704 data bits – total 24 + 704 = 728 bits per ms, i.e. 728kbit/s • [Diagram: NICAM packet – 24 bit header followed by 704 bits of data] • Header contains information to enable data to be de-compressed
Companding; Non-linear coding • Takes 1 ms of raw 14-bit samples (i.e. 32 samples per channel) and reduces them from 14 bits to 10. • The numerically largest (positive or negative) sample of the 32 is used to choose which 10 bits of all 32 samples in the block will actually be transmitted. • 3 range bits signal which 10 bits were chosen; the sign is kept, so negative values are handled as well • Compression factor is 10/14, about 0.7 – but it is lossy – at the decoder, zeros replace the discarded Least Significant Bits
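The block companding described above can be sketched as follows. This is a simplified illustration, assuming signed 14-bit input samples and a range code that is simply the number of low-order bits dropped (0 to 4); parity and the way the real NICAM bitstream signals the range are omitted, and the function names are hypothetical.

```python
BLOCK_SIZE = 32  # 1 ms of audio at 32 kHz
MAX_SHIFT = 4    # 14-bit input minus 10-bit output


def compress_block(samples):
    """Return (shift, codes): the range code and the 10-bit values sent.

    The smallest shift that lets every sample fit in a signed 10-bit
    value (-512..511) is chosen, so the largest sample sets the range.
    """
    if len(samples) != BLOCK_SIZE:
        raise ValueError("a block is 32 samples")
    for shift in range(MAX_SHIFT + 1):
        codes = [s >> shift for s in samples]
        if all(-512 <= c <= 511 for c in codes):
            return shift, codes
    raise ValueError("samples exceed the signed 14-bit range")


def expand_block(shift, codes):
    """Decoder: shift back up, so zeros fill the dropped low-order bits."""
    return [c << shift for c in codes]


quiet = [341, 368, 395, 422] * 8
loud = [8191, -8192] + [100] * 30
for name, block in (("quiet", quiet), ("loud", loud)):
    shift, codes = compress_block(block)
    restored = expand_block(shift, codes)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"{name}: shift {shift}, worst error {worst}")
```

A quiet block passes through unchanged, while a loud block loses its low-order bits for every sample in that millisecond, including the small ones. That is acceptable because the loud sample masks the added error.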
Compression • Human ear is non-linear – can tolerate more distortion (or less accuracy) for louder sounds – basic companding • Masking – loud sound can mask lower-amplitude sound of similar frequency • Perceptual coding – MP3’s!
Digital broadcast formats - MPEG • Digital TV (Sky, Cable, ITV Digital) - all provide stereo only (now….) • MPEG-1 (ISO/IEC 11172-3) provides single-channel ('mono') and two-channel ('stereo' or 'dual mono') coding at 32, 44.1, and 48 kHz sampling rate. The predefined bit rates range from 32 to 448 kbit/s for Layer I, from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for Layer III.
MPEG (continued) • MPEG-2 BC (ISO/IEC 13818-3) provides a backwards compatible multichannel extension to MPEG-1; up to 5 main channels plus a 'low frequency enhancement' (LFE) channel can be coded; the bit rate range is extended up to about 1 Mbit/s • It also extends MPEG-1 towards lower sampling rates of 16, 22.05, and 24 kHz, for bit rates from 32 to 256 kbit/s (Layer I) and from 8 to 160 kbit/s (Layer II & Layer III) • http://mpeg.telecomitalialab.com/faq/audio.htm
Summary • Analogue audio • Digital audio • Cinema audio • Home systems • Video and TV audio