CGMB 324: Multimedia System Design Chapter 05: Multimedia Element III – Audio
Objectives Upon completing this chapter, you should be able to: • Understand how audio data is represented in computers • Understand the MIDI file format • Understand how to apply audio in multimedia systems
Digitization Of Sound Facts about sound • Sound is a continuous wave that travels through the air at about 1,235 km/h (343 m/s at room temperature). • Without air there is no sound! (as in space) • The wave is made up of pressure differences. • Sound is detected by measuring the pressure level at a certain location. • Sound waves have normal wave properties: • Reflection (bouncing) • Refraction (change of angle when entering a medium with a different density) • Diffraction (bending around an obstacle) • These properties make the design of "surround sound" possible
Digitization Of Sound • Human ears can hear frequencies from about 20 Hz (a deep rumble) up to about 20 kHz, at intensities from roughly −10 dB SPL (at the ear's most sensitive frequencies) up to around 125 dB SPL. • This range changes with age, as the sensitivity of our hearing usually decreases. • The intensity of sound is measured as Sound Pressure Level (SPL) in decibels (dB).
Digitization Of Sound • A turbojet engine might be as loud as 165 dB. • A car driving on the highway: about 100 dB. • A whisper: around 35 dB. • Frequencies above or below our hearing range exist, but we simply cannot hear them.
Digitization Of Sound • The sounds around us span a wide range of frequencies and intensities. • For example, rustling leaves are very soft (very low intensity, or amplitude), but they contain high frequencies. • A jet engine is a very high amplitude sound that also sits in the high frequency area. • A cargo truck is very loud (high intensity), but lies in the low frequency area. • Our speech spreads across many frequencies and varies in intensity.
Digitization Of Sound Digitization In General • Microphones and video cameras produce analog signals (continuous-valued voltages)
Digitization Of Sound • In audio digitization, two components form the composite sinusoidal signal of the actual sound: the fundamental sine wave and the harmonics. • Amplitude = intensity • Frequency = pitch • The sine wave pattern derives from the graph of y = sin x.
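To make this concrete, here is a minimal sketch (in Python with NumPy, an assumed choice for these notes) that builds a composite signal from a fundamental sine wave plus two harmonics:

import numpy as np

fs = 8000                       # sampling rate in Hz (chosen only for illustration)
t = np.arange(0, 0.01, 1 / fs)  # 10 ms time axis

f0 = 440                                              # fundamental frequency (A4)
fundamental = 1.0 * np.sin(2 * np.pi * f0 * t)
harmonic2 = 0.5 * np.sin(2 * np.pi * 2 * f0 * t)      # 2nd harmonic, half amplitude
harmonic3 = 0.25 * np.sin(2 * np.pi * 3 * f0 * t)     # 3rd harmonic, quarter amplitude

composite = fundamental + harmonic2 + harmonic3       # the composite "actual sound"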
[Diagram: Analog-to-Digital Converter (ADC) — a voice signal enters as a sine wave and leaves the ADC as a bit stream, e.g. 10010010000100]
Digitization Of Sound • To get audio or video into a computer, we must digitize it (convert it into a string of numbers) • So, we have to understand discrete sampling • Sampling: divide the horizontal axis (the time dimension) into discrete pieces. Uniform sampling (samples spaced evenly in time) is used almost universally.
Digitization Of Sound • Quantization (sampling along the amplitude axis): divide the vertical axis (signal strength) into pieces. Sometimes a non-linear function is applied. • 8-bit quantization divides the vertical axis into 256 levels. • 16-bit quantization gives you 65,536 levels. • Digital audio represents sound discretely, in the form of bits and bytes.
[Figure: Quantization & Sampling — quantization levels along the amplitude axis, sampling along the time axis t (seconds)]
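A minimal quantization sketch (Python/NumPy assumed): snap each sample of a signal in the range −1..1 to the nearest of 2^bits levels:

import numpy as np

def quantize(signal, bits):
    # Uniformly quantize a signal in [-1.0, 1.0] to 2**bits levels.
    levels = 2 ** bits                  # 256 levels for 8 bits, 65,536 for 16
    step = 2.0 / levels                 # width of one quantization step
    q = np.round(signal / step) * step  # snap each sample to the nearest level
    return np.clip(q, -1.0, 1.0 - step)

fs = 44100
t = np.arange(0, 0.001, 1 / fs)
x = np.sin(2 * np.pi * 1000 * t)
x8 = quantize(x, 8)    # coarse: audible quantization noise
x16 = quantize(x, 16)  # fine: CD-quality resolution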
Quantization & Sampling Question: Given a 16-bit, CD-quality musical piece sampled at 44.1 kHz for 10 minutes, what is the file size in mono and in stereo? (Use ×1024 conversion) Answer: 44.1 × 10³ samples/s × 16 bits × (10 × 60) s ÷ 8 bits/byte ≈ 50.468 MB For a stereo (left/right) recording, the amount is doubled: 50.468 MB × 2 ≈ 100.937 MB
Quantization & Sampling 44.1 × 10³ × 16 bits × (10 × 60 s) = 423,360,000 bits (÷8) = 52,920,000 bytes (mono) (÷1024) = 51,679.688 KB (÷1024) ≈ 50.468 MB Stereo (mono × 2) = 50.468 × 2 ≈ 100.937 MB That's why we need COMPRESSION!
Digitizing Audio Questions for producing digital audio (Analog-to-Digital Conversion): • How often do you need to sample the signal? • How good is the signal? • How is audio data formatted?
Nyquist Theorem • The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow when digitizing analog signals. • For analog-to-digital conversion (ADC) to result in a faithful reproduction of the signal, slices of the analog waveform, called samples, must be taken frequently. • The number of samples per second is called the sampling rate or sampling frequency. • Suppose we are sampling a waveform. How often do we need to sample it to figure out its frequency?
Nyquist Theorem • If we sample only once per cycle (blue area), we may think the signal is a constant.
Nyquist Theorem • If we sample at another low rate, e.g., 1.5 times per cycle, we may think it's a lower frequency waveform
Nyquist Theorem • Nyquist rate -- It can be proven that a bandwidth-limited signal can be fully reconstructed from its samples, if the sampling rate is at least twice the highest frequency in the signal. • The highest frequency component, in hertz, for a given analog signal is fmax. • According to the Nyquist Theorem, the sampling rate must be at least 2(fmax), or twice the highest analog frequency component
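A small aliasing sketch (Python/NumPy assumed): sampling a 3 kHz tone at only 4 kHz, below its 6 kHz Nyquist rate, makes it numerically indistinguishable from a 1 kHz tone:

import numpy as np

fs = 4000                    # sampling rate: below the Nyquist rate for 3 kHz
n = np.arange(32)
t = n / fs

tone_3k = np.sin(2 * np.pi * 3000 * t)  # true signal: 3 kHz
tone_1k = np.sin(2 * np.pi * 1000 * t)  # alias: |3000 - 4000| = 1000 Hz

# The sampled values coincide (up to sign), so the frequencies are
# indistinguishable after sampling:
print(np.allclose(tone_3k, -tone_1k))   # True -- 3 kHz aliases onto 1 kHz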
Typical Audio Formats • Popular audio file formats include .au (Unix workstations), .aiff (Mac), and .wav (PC, etc.) • A simple and widely used audio compression method is Adaptive Differential Pulse Code Modulation (ADPCM). • Based on past samples, it predicts the next sample and encodes the difference between the actual value and the predicted value.
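The core idea can be sketched with plain (non-adaptive) delta coding: predict each sample as the previous one and store only the difference. This is a minimal illustration of the predict-and-encode-the-difference principle, not the actual ADPCM algorithm:

def delta_encode(samples):
    # Store each sample as the difference from its predecessor.
    deltas, prev = [], 0
    for s in samples:
        deltas.append(s - prev)  # small values when the signal changes slowly
        prev = s
    return deltas

def delta_decode(deltas):
    # Reverse the process by accumulating the differences.
    samples, prev = [], 0
    for d in deltas:
        prev += d
        samples.append(prev)
    return samples

pcm = [0, 3, 7, 8, 6, 2]
assert delta_decode(delta_encode(pcm)) == pcm  # lossless round trip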
Audio Quality vs. Data Rate • For 44.1 kHz CD-quality recording, one sample is taken every 1 / 44.1×10³ s ≈ 22.68 µs (microseconds) for a single channel. • For 192 kHz DVD-quality recording, one sample is taken every 1 / 192,000 s ≈ 5.21 µs for a single channel. • You can expect it to have more than 4 times the temporal resolution of a CD!
Applying Digital Audio In MM Systems • When using audio in a multimedia system, you have to consider several factors: • The format of the original file (e.g., WAV) • The overall amplitude (is it loud enough?) • Trimming (how long do you want it to be?) • Fade in & fade out • Time stretching • Frequency and channels (e.g., 44.1 kHz, stereo) • Effects • File size
Applying Digital Audio In MM Systems • The format we usually work with is WAV. • It is uncompressed but allows us to preserve the highest quality. • If the waveform isn't loud enough, you can 'normalize' it, which raises the amplitude as high as it can go without clipping (pops). • If it still isn't loud enough, you might need dynamics processing, which can, for example, enhance vocals over instruments. • This is useful for making heavy metal songs (which often swing between whispered passages and loud guitar solos) more palatable to the audience of such a MM system.
[Figure: Normalization — the same waveform before and after normalization]
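A minimal normalization sketch (Python/NumPy assumed): scale the waveform so its loudest sample reaches full scale without clipping:

import numpy as np

def normalize(signal, peak=1.0):
    # Scale so the loudest sample hits `peak`, without clipping.
    max_amp = np.max(np.abs(signal))
    if max_amp == 0:
        return signal            # silence: nothing to scale
    return signal * (peak / max_amp)

quiet = 0.2 * np.sin(2 * np.pi * 440 * np.arange(0, 1, 1 / 44100))
loud = normalize(quiet)          # peak amplitude is now 1.0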
Applying Digital Audio In MM Systems • Sometimes, we may only want a portion of the music file, so we need to ‘trim’ it. • This can be done by selecting the portion you want to keep and then, executing the ‘trim’ command. • Usually, after trimming, we’re left with a sample that ‘just starts’ and ‘suddenly ends’. • To make it more pleasing to the ear, fade-ins and fade-outs are performed. • This is usually done to the first and last 5 seconds of the waveform (depending on how long your sample is), to make it seem as if the clip is just starting and ends properly
Trimming • A portion of the waveform is selected • The 'trim' function then removes everything else, leaving just the part we require
Fades • The front portion (usually the first 5 seconds) of the trimmed clip is selected • Then, the 'fade in' function is executed • The resulting clip
Fades • The end portion (usually the last 5 seconds) of the trimmed clip is selected • Then, the 'fade out' function is executed • The resulting clip
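Fade-ins and fade-outs are simple amplitude ramps. A sketch (Python/NumPy assumed) that applies a linear ramp to both ends of a clip:

import numpy as np

def fade(signal, fs, seconds=5.0):
    # Apply a linear fade-in and fade-out of `seconds` each.
    out = signal.copy()
    n = min(int(fs * seconds), len(out) // 2)  # don't let the fades overlap
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp         # fade in: ramp volume up from silence
    out[-n:] *= ramp[::-1]  # fade out: ramp volume back down
    return out

fs = 44100
clip = np.sin(2 * np.pi * 440 * np.arange(0, 30, 1 / fs))  # 30 s test tone
smooth = fade(clip, fs)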
Applying Digital Audio In MM Systems • Some sound files end abruptly with a simple drum beat or scream. • If it's too short, you can always 'time-stretch' that portion. • This powerful function usually distorts the waveform a little. • However, for a sharp, loud shriek at the end (of a song, for example), stretching it to last maybe 0.25 seconds longer might actually make it sound better. • Time stretching is also useful when your audio stream doesn't quite match the video stream. • It can then be used to match their lengths for synchronization. The better way, of course, is to adjust the video frame rate or delete some scenes, but that is another story.
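Proper time stretching changes duration without changing pitch, which takes more than simple resampling. A hedged sketch, assuming the librosa library is available ("clip.wav" is a placeholder filename):

import librosa

# Load a clip as a mono float array at its native sample rate.
y, sr = librosa.load("clip.wav", sr=None, mono=True)

# rate < 1.0 slows the audio down (longer); rate > 1.0 speeds it up.
stretched = librosa.effects.time_stretch(y, rate=0.9)  # roughly 11% longer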
Applying Digital Audio In MM Systems • You also need to decide what frequency you wish to keep the audio sample at. • If it's a song, CD quality (known as Red Book audio, standardized as IEC 60908) is standard. • If it's merely narration, a lower frequency of about 11.025 kHz and a single channel (mono) is sufficient. • You can do the same for simple music, e.g. reducing the frequency to 22.05 kHz, which still sounds good.
Applying Digital Audio In MM Systems • Another technique is resampling. • This involves reducing the bit depth, e.g. from 16 to 8 bits, or from 24 to 16, and so forth. • Reducing the bit depth usually has a greater effect on the overall sound quality (fewer bits mean poorer quality). • So, it's often best to retain the bit depth but reduce the sampling frequency. • All of this is done to save precious space and memory.
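A rough sketch of both reductions (Python/NumPy assumed). Note that a production resampler would low-pass filter before decimating to avoid aliasing; this naive version skips that step:

import numpy as np

def reduce_bit_depth(pcm, from_bits=16, to_bits=8):
    # Requantize, e.g. 16-bit -> 8-bit, by dropping the least significant bits.
    shift = from_bits - to_bits
    return (pcm.astype(np.int32) >> shift) << shift

def halve_sample_rate(pcm):
    # Naive decimation: keep every 2nd sample (44.1 kHz -> 22.05 kHz).
    return pcm[::2]

pcm16 = (np.sin(2 * np.pi * 440 * np.arange(0, 1, 1 / 44100)) * 32767).astype(np.int16)
pcm8ish = reduce_bit_depth(pcm16)   # 8-bit resolution, same sample rate
pcm22k = halve_sample_rate(pcm16)   # 22.05 kHz, same bit depth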
Applying Digital Audio In MM Systems • There are times when you need to add certain effects to your audio. • Unlike the things we can do with images, audio effects are not always so obvious. • For example, we can mimic the stereo effect of an inherently mono signal by duplicating the waveform (creating two channels from the single one), and then, slightly delaying one of them (left or right) to create a pseudo-stereo effect. • Other effects or functions like reverb, allow you to make a studio recording sound like a live performance or simply make your own voice recording sound a little better than it actually is.
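The pseudo-stereo trick described above, as a sketch (Python/NumPy assumed; the 15 ms delay is an illustrative value):

import numpy as np

def pseudo_stereo(mono, fs, delay_ms=15.0):
    # Duplicate a mono signal and slightly delay one channel.
    d = int(fs * delay_ms / 1000)                      # delay in samples
    left = mono
    right = np.concatenate([np.zeros(d), mono[:-d]])   # delayed copy, same length
    return np.stack([left, right], axis=1)             # shape: (samples, 2)

fs = 44100
mono = np.sin(2 * np.pi * 440 * np.arange(0, 1, 1 / fs))
stereo = pseudo_stereo(mono, fs)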
Applying Digital Audio In MM Systems • Finally, the file size is important. If you saved as much memory as you can by using a lower frequency, shorter clip and lower bitrate, you can still save more by using a good compression scheme like mp3, which gives you a ratio of about 12:1. • There are often other audio compression codecs available to you as well. • Remember though, that compression has its price – the user will need to be able to decode that particular codec and usually, more processing power is required to play the file. • It may also cause synchronization problems with video. All things considered, developers usually still compress their audio.
MIDI • What is MIDI? MIDI is an acronym for Musical Instrument Digital Interface. • Definition of MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other. • It is a set of instructions telling a computer how musical instruments should be played.
History of MIDI • MIDI is a standard method for electronic musical equipment to pass messages to each other. • These messages can be as simple as "play middle C until I tell you to stop" or more complex, like "adjust the VCA bias on oscillator 6 to match oscillator 1". • MIDI was developed in the early 1980s and proceeded to completely change the musical instrument market and the course of music. • Its growth exceeded the wildest dreams of its inventors, and today MIDI is entrenched to the point that you cannot buy a professional electronic instrument without MIDI capabilities.
Terminologies: Synthesizer: • It is a sound generator (various pitch, loudness etc.). • A good (musician's) synthesizer often has a microprocessor, keyboard, control panels, memory, etc.
Terminologies: Sequencer: • It can be a stand-alone unit or a software program for a personal computer. (It used to be a storage server for MIDI data. Nowadays it is more a software music editor on the computer.) • It has one or more MIDI INs and MIDI OUTs.
Terminologies: Track: • Track in the sequencer is used to organize the recordings. • Tracks can be turned on or off on recording or playback. • To illustrate, one might record an oboe melody line on Track Two, then record a bowed bass line on Track Three. • When played, the sounds can be simultaneous. • Most MIDI software now accommodates 64 tracks of music, enough for a rich orchestral sound. • Important: Tracks are purely for convenience; channels are required.
Terminologies: Channel: • MIDI channels are used to separate information in a MIDI system. • There are 16 MIDI channels per cable. • Each channel addresses one MIDI instrument. • Channel numbers are coded into each MIDI message.
Terminologies: Timbre: • The quality of the sound, e.g., flute sound, cello sound, etc. • Multi-timbral -- capable of playing many different sounds at the same time (e.g., piano, brass, drums, etc.) Pitch: • musical note that the instrument plays
Terminologies: Voice: • Voice is the portion of the synthesizer that produces sound. • Synthesizers can have many (16, 20, 24, 32, 64, etc.) voices. • Each voice works independently and simultaneously to produce sounds of different timbre and pitch. Patch: • the control settings that define a particular timbre.
General MIDI • MIDI + Instrument Patch Map + Percussion Key Map → a piece of MIDI music (usually) sounds the same anywhere it is played • The instrument patch map is a standard program list consisting of 128 patch types. • The percussion key map specifies 47 percussion sounds. • Key-based percussion is always transmitted on MIDI channel 10.
General MIDI Requirements for General MIDI compatibility: • Support all 16 channels. • Each channel can play a different instrument/program (multi-timbral). • Each channel can play many voices (polyphony). • Minimum of 24 fully dynamically allocated voices.
General MIDI • MIDI playback will only be fully accurate if the playback device is identical to the one used for production. • Even with the General MIDI standard, the sound of a MIDI instrument varies with the electronics of the playback device and the sound generation method it uses. • MIDI is also unsuitable for spoken dialog. • MIDI usually requires a certain amount of knowledge of music theory.
Application Of MIDI • Is MIDI suitable for MM systems? • Sometimes it is, sometimes not. • A webpage is a valid MM system. • If the required music is simple and repetitive, MIDI is ideal for playing in the background, because its files are small and widely compatible. • Otherwise, most MM systems work with digital audio, such as WAV, mp3, etc.
Example • A musician pushes down (and holds down) the middle C key on a keyboard. • This causes a MIDI Note-On message to be sent out of the keyboard's MIDI OUT jack. • That message is received by the second instrument which sounds its middle C in unison.
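A sketch of the actual bytes involved. A Note-On message is a status byte (0x90 plus the 0-based channel number) followed by the note number and velocity; middle C is note 60:

def note_on(channel, note, velocity):
    # Build a 3-byte MIDI Note-On message (channel is 0-15).
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    # Note-Off: status byte 0x80; release velocity conventionally 0.
    return bytes([0x80 | channel, note, 0])

msg = note_on(0, 60, 100)   # middle C on channel 1, moderately loud
print(msg.hex())            # "903c64"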