
Chapter 12 Sound




  1. Chapter 12 Sound • Multimedia Systems

  2. Key Points • Sound is a complex mixture of physical and psychological factors, which is difficult to model accurately. • Sounds can be characterized by their waveforms, which plot amplitude against time. • CD quality sound is sampled at 44.1 kHz, using a sample size of 16 bits. Multimedia productions may have to use lower sampling rates and smaller sample sizes.

  3. Key Points • The quality of digitized sound can be improved by dithering — adding a small quantity of noise to randomize the quantization error. • Software can provide the functions of a recording studio, including multi-track recording, mixing and effects, on a desktop computer. • The most vexatious aspect of recording is getting the levels right. • Audio filters are used to remove noise and unwanted frequency components.

  4. Key Points • Digital versions of established effects, such as reverb and envelope shaping, are used to alter the quality of sounds. Digital technology permits new kinds of alteration, including time stretching and pitch alteration. • Speech data can be compressed using established technology, including µ-law and A-law companding and ADPCM. • MPEG-1 Layer 3 audio (MP3) is a lossy method of audio compression that uses a psycho-acoustical model to determine which information to discard.

  5. Key Points • Each of the three major platforms has its own sound file format: AIFF for MacOS, WAV for Windows, and AU for Unix. RealAudio is used for streaming audio. • MIDI (the Musical Instrument Digital Interface) provides a standard for controlling digital instruments and communicating between them and computers running sequencer programs. • When sound is combined with video, synchronization must be established and maintained.

  6. The Nature of Sound • All sounds are produced by the conversion of energy into vibrations in the air or some other elastic medium • e.g. tuning forks and guitars • A good tuning fork produces a pure tone at a single frequency; most other sound sources vibrate in more complicated ways. • A single note is composed of several components at frequencies that are multiples of the fundamental pitch of the note.

  7. Harmonic • The spectrum of a single note from a musical instrument usually has a set of peaks at (approximately) harmonic ratios. • That is, if the fundamental frequency is f, there are peaks at f, and also at (about) 2f, 3f, 4f, etc. • The pitch of a note refers to the fundamental frequency with which the source of the tone resonates.
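
Below is a minimal sketch (in Python, not taken from the chapter) of how such a note could be synthesized as a sum of harmonics at f, 2f, 3f, ...; the 1/n amplitude fall-off and all parameter values are arbitrary choices for illustration.

```python
import math

def harmonic_note(fundamental_hz, num_harmonics=4, sample_rate=44100, duration_s=0.01):
    """Build a toy 'instrument' note as the sum of components at f, 2f, 3f, ...
    The 1/n amplitude fall-off is an illustrative assumption, not a rule."""
    samples = []
    for i in range(int(sample_rate * duration_s)):
        t = i / sample_rate
        value = sum((1.0 / n) * math.sin(2 * math.pi * n * fundamental_hz * t)
                    for n in range(1, num_harmonics + 1))
        samples.append(value)
    return samples

# A 440 Hz note: its spectrum would show peaks near 440, 880, 1320 and 1760 Hz.
note = harmonic_note(440.0)
```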

  8. Frequency Spectrum • Percussive sounds and most natural sounds do not even have a single identifiable fundamental frequency, but can still be decomposed into a collection of frequency components. • Frequency spectrum: the relative amplitudes of a sound's frequency components

  9. The Nature of Sound • The human ear is able to detect frequencies in the range between 20 Hz and 20 kHz • Upper limit decreases with increasing age • We can display the waveform of any sound by plotting its amplitude against time • Figs. 12.1-7: some waveforms for a range of types of sound

  10. Speech • A speaker repeats "Feisty teenager" twice, then a more distant voice responds. • The second repetition is faster and more emphatic • Recorded in the open air, so there is background noise. • Speech can be compressed by removing the silences

  11. Instruments • Figs. 12.2-5: Didgeridoo, Boogie-woogie

  12. Violin, cello and piano • "Men grow cold..."

  13. Water sounds • A trickling stream • The sea

  14. Stereophony • One of the most useful illusions in sound perception is stereophony. • The brain identifies the source of a sound on the basis of the differences in intensity and phase between the signals received by the left and right ears.

  15. Digitizing Sound • Sampling • The selection of the sampling rate • If the upper limit of hearing is 20 kHz, a minimum rate of 40 kHz is required by the Sampling Theorem. • The sampling rate of audio CDs is 44.1 kHz • 22.05 kHz is commonly used for Internet audio; 11.025 kHz for speech • DAT (digital audio tape): 48 kHz

  16. Sampling • How does sampling work in a computer system? • Sound card • Digital audio inputs are uncommon • The analogue line output of a DAT or CD player is re-digitized by the sound card • Incompatible rates: re-sampling • Jitter: drift in the intervals between samples

  17. Sampling • If the sampling rate is 40 kHz, components above 20 kHz (though inaudible) will manifest as aliasing when the signal is reconstructed. • A filter is used to remove any frequencies greater than half the sampling rate before the signal is sampled.
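
A small illustration (Python, assuming a 40 kHz sampling rate) of why such a filter is needed: a tone above half the sampling rate yields the same samples as a lower-frequency alias, so after sampling it cannot be distinguished from it.

```python
import math

SAMPLE_RATE = 40_000  # Hz; the Nyquist limit is then 20 kHz

def sample_tone(freq_hz, n_samples=8):
    """Sample a pure sine tone at SAMPLE_RATE and return the sample values."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

# A 30 kHz tone "folds back" to 40 - 30 = 10 kHz: up to floating-point rounding,
# its samples are identical to those of a phase-inverted 10 kHz tone, which is
# why components above the Nyquist limit must be filtered out before sampling.
print(sample_tone(30_000))
print([-s for s in sample_tone(10_000)])
```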

  18. Digitizing Sound • Quantization • CD audio usually uses 65,536 quantization levels • 16 bits • Undersampling a pure sine wave • An analogue signal will be coarsely approximated by samples that jump between just a few quantized values • Dithering • A small amount of random noise is added to the analogue signal before sampling
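
A rough Python sketch of the idea: quantize a sine wave coarsely, with and without a small amount of random noise added first. The 3-bit resolution and the noise level (about one quantization step) are illustrative assumptions.

```python
import math
import random

def quantize(value, bits=3):
    """Round a value in [-1, 1] to one of 2**bits levels (deliberately coarse)."""
    step = 2.0 / (2 ** bits)
    return round(value / step) * step

def digitize(signal, bits=3, dither=False):
    """Quantize a list of samples, optionally adding noise of roughly one
    quantization step beforehand -- the essence of dithering: the error
    becomes noise-like rather than correlated distortion."""
    step = 2.0 / (2 ** bits)
    out = []
    for s in signal:
        if dither:
            s = s + random.uniform(-step / 2, step / 2)
        out.append(quantize(s, bits))
    return out

sine = [0.3 * math.sin(2 * math.pi * 440 * i / 44100) for i in range(200)]
plain = digitize(sine)                   # jumps between a few fixed levels
dithered = digitize(sine, dither=True)   # error randomized into broadband noise
```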

  19. Quantization • Figure: undersampling a pure sine wave

  20. Dithering • Figure: dithering

  21. Dithering • The effect of sampling and dithering on the frequency spectrum

  22. Processing Sound • The modern multi-track recording studio • There is presently no single sound application that has achieved de facto standard status. • MIDI sequencing • Multi-track recording • Video editing packages include some integrated sound editing and processing facilities.

  23. Recording and Importing Sound • Sampling rate and sample size • If the level of the signal is too low, the resulting recording will be quiet. • If the level is too high, clipping will occur. • Fig. 12.10 • A gain control can be used to alter the level. • Automatic gain control
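
A hypothetical Python helper to make the problem concrete: it inspects one block of normalized samples and reports clipping or a low level. The thresholds and suggested gain are arbitrary; real software meters the signal continuously while recording.

```python
def check_level(samples, clip=0.99, quiet=0.1):
    """Crude level check for a block of samples normalized to [-1, 1]."""
    peak = max(abs(s) for s in samples)
    if peak >= clip:
        return "clipping: reduce the input gain"
    if peak < quiet:
        return f"level low (peak {peak:.2f}): raise the gain by about {1.0 / peak:.1f}x"
    return "level OK"
```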

  24. Sound Editing and Effects • Interface: timeline • Tracks • Creation of loops • Very short loops are needed to create voices for the electronic musical instruments known as samplers. • Longer loops are used in certain styles of dance music • Post-production • Correct defects, enhance quality, modify the character of sounds. • Premiere’s effects plug-in format is widely used. • Professional level: Cubase VST, Digidesign Pro Tools

  25. Removal of unwanted noise • Noise gate • Eliminates all samples whose value falls below a specified threshold • Specify a minimum time that must elapse before a sequence of low-amplitude samples counts as a silence, and a similar limit before a sequence whose values exceed the threshold counts as sound. • This prevents the gate being turned on or off by transient glitches (brief bursts of interference).
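
A much-simplified Python sketch of such a gate. The threshold and the two sample-count limits are arbitrary, and real gates use look-ahead and smooth fades rather than hard zeroing.

```python
def noise_gate(samples, threshold=0.02, min_silence=100, min_sound=5):
    """Zero out runs of low-level samples, but only close the gate after
    `min_silence` consecutive quiet samples, and only reopen it after
    `min_sound` consecutive loud ones, so brief glitches cannot toggle it."""
    out = list(samples)
    gate_open = True
    below = above = 0
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            below += 1
            above = 0
            if below >= min_silence:
                gate_open = False
        else:
            above += 1
            below = 0
            if above >= min_sound:
                gate_open = True
        if not gate_open:
            out[i] = 0.0
    return out
```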

  26. Noise Gate • Since the noise gate has no effect while the speaker is talking, the background noise will cut in and out as he speaks. • Noise that is combined with the signal cannot be removed this way • Noise gate: all-or-nothing filtering • Low-pass, high-pass, notch filters • Specialized filters • De-esser: removes the sibilance (hissing ‘s’ sounds) that results from speaking or singing into a microphone placed too close to the performer • Click repairer • Removes clicks from recordings taken from damaged or dirty vinyl records.

  27. A single effect may be used in different ways, depending on the values of its parameters • Reverb effect • Small delay and low reflectivity: inside a small room • Longer reverb times: a concert hall or stadium

  28. Graphic Equalization • Transforms the spectrum of a sound using a bank of filters, each controlled by its own slider and each affecting a fairly narrow band of frequencies.

  29. Envelope Shaping • Changing the outline (envelope) of a waveform • Allows the user to draw a new envelope around the waveform, altering its attack and decay and introducing arbitrary fluctuations of amplitude. • Fader: a specialized version of envelope shaping • Allows the volume to be gradually increased or decreased • Tremolo • Causes the amplitude to oscillate periodically from zero to its maximum value
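
Two tiny Python sketches of envelope shaping: a linear fade-in, and a tremolo produced by multiplying the signal with a slow sinusoidal envelope. The rate and depth values are arbitrary illustrations.

```python
import math

def fade_in(samples, fade_len):
    """Linear fade-in: scale the first `fade_len` samples by a ramp from 0 to 1."""
    return [s * min(1.0, i / fade_len) for i, s in enumerate(samples)]

def tremolo(samples, rate_hz=6.0, depth=1.0, sample_rate=44100):
    """Make the amplitude oscillate periodically; with depth=1 the envelope
    swings all the way from zero to the original level."""
    out = []
    for i, s in enumerate(samples):
        env = (1.0 - depth) + depth * 0.5 * (1.0 + math.sin(2 * math.pi * rate_hz * i / sample_rate))
        out.append(s * env)
    return out
```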

  30. Time stretching and pitch alteration are two closely related effects • With analogue recordings, these can only be achieved by altering the speed at which the recording is played back, and this alters the pitch. • With digital sound, the duration can be changed without altering the pitch by inserting or removing samples. • The pitch can also be altered without affecting the duration • Time stretching is required when a sound is being synchronized to video or to another sound.
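
A crude Python sketch of changing duration by repeating or skipping short blocks of samples, so the pitch within each block is untouched. The block length is arbitrary, and unlike real time-stretching tools this does not overlap and cross-fade the blocks to hide the joins.

```python
def stretch(samples, factor, block=441):
    """Lengthen (factor > 1) or shorten (factor < 1) a sound by stepping
    through it in ~10 ms blocks (at 44.1 kHz) and copying each block out,
    advancing the read position by block/factor samples each time."""
    out = []
    pos = 0.0
    while pos < len(samples):
        start = int(pos)
        out.extend(samples[start:start + block])
        pos += block / factor
    return out
```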

  31. Compression • 3 minutes of CD-quality stereo: roughly 30 MBytes uncompressed • Huffman coding • Run-length encoding of silences
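
The size figure follows from simple arithmetic; a worked calculation in Python:

```python
# Uncompressed size of three minutes of CD-quality stereo sound:
bytes_per_second = 44_100 * 2 * 2        # sample rate x 2 bytes per sample x 2 channels
total_bytes = bytes_per_second * 3 * 60  # 31,752,000 bytes, roughly 30 MB
print(f"{total_bytes:,} bytes (~{total_bytes / 2**20:.1f} MiB)")
```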

  32. Speech Compression • Telephone companies, 1960s • Companding: compressing/expanding • Non-linear quantization: Fig. 12.11 • G.711: µ-law, used in North America and Japan (and in Sun’s AU format) • A-law • ADPCM, adaptive differential pulse code modulation • Differential pulse code modulation • Linear Predictive Coding • Uses a mathematical model of the state of the vocal tract as its representation of speech • 2.4 kbps, machine-like quality
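
A sketch of the continuous µ-law companding curve in Python (µ = 255, the value used by G.711). Note that G.711 itself applies a segmented, piecewise-linear 8-bit approximation of this curve; only the underlying idea is shown here.

```python
import math

MU = 255  # mu value used by the North American / Japanese G.711 standard

def mu_law_compress(x):
    """Map a sample in [-1, 1] non-linearly so that quiet values receive
    finer quantization than loud ones."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping, applied when the signal is played back."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```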

  33. Perceptually Based Compression • Threshold of hearing: the minimum level at which a sound can be heard • Fig. 12.12, the threshold of hearing • A very low or high frequency sound must be much louder than a mid-range tone to be heard. • Psycho-acoustical model • A mathematical description of aspects of the way the ear and brain perceive sounds • Loud tones can obscure softer tones that occur at the same time • Depends on the relative frequencies of the two tones

  34. Masking • A modification of the threshold of hearing curve in the region of a loud tone • Fig. 12.13: the threshold is raised in the neighborhood of the masking tone • The raised portion, or masking curve, is non-linear and asymmetrical, rising faster than it falls • Any sound that lies within the masking curve will be inaudible, even though it rises above the unmodified threshold of hearing. • Because masking hides noise as well as some components of the signal, quantization noise can be masked. • Where a masking sound is present, the signal can be quantized relatively coarsely, using fewer bits than would otherwise be needed, because the resulting quantization noise can be hidden under the masking curve.

  35. Compression • Use a bank of filters to split the signal into bands of frequencies; 32 bands are commonly used. • The average signal level in each band is calculated, and using these values and a psycho-acoustical model, a masking level for each band is computed.
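
A toy Python stand-in for the first step, computing an average level per band from one block of samples. It uses numpy and a plain FFT rather than the polyphase filter bank a real MPEG encoder employs, and the psycho-acoustical model that turns these levels into per-band masking thresholds is not reproduced.

```python
import numpy as np

def band_levels(block, num_bands=32):
    """Split one block's magnitude spectrum into `num_bands` equal-width
    bands and return the average level in each."""
    spectrum = np.abs(np.fft.rfft(block))
    bands = np.array_split(spectrum, num_bands)
    return [float(b.mean()) for b in bands]
```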

  36. MPEG Audio • 3 layers • Layer 1: 192 kbps for each channel • Layer 2: 128 kbps for each channel • Layer 3: 64 kbps for each channel • MP3 = MPEG-1 Layer 3; compression ratio ≈ 10:1

  37. Formats • AIFF for MacOS • WAV for Windows • AU for Unix • Each can store audio data at a variety of commonly used sampling rates and sample sizes. • Each supports uncompressed or compressed data with a range of compressors.

  38. Streaming Audio • Sound is delivered over a network and played as it arrives, without having to be stored on the user's machine first. • Because of the lower bandwidth required by audio, streaming is more successful for sound than it is for video. • Real Networks’ RealAudio • Streaming QuickTime • Play on demand

  39. MIDI • The Musical Instrument Digital Interface • A standard protocol for communicating between electronic instruments, such as synthesizers, samplers, and drum machines. • MIDI allows instruments to be controlled automatically by devices that can be programmed to send out sequences of MIDI instructions.

  40. MIDI Messages • An instruction that controls some aspect of the performance of an instrument • Status byte = type of message; one or two data bytes give the values of parameters • Note On, Note Off, Key Pressure • Running status • MIDI data is transmitted using a 10-bit packet that includes a start and a stop bit • The Note On message is followed by two data bytes, as is the Note Off message.
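
For concreteness, a small Python sketch of how Note On and Note Off messages are laid out: a status byte whose high nibble gives the message type and whose low nibble gives the channel, followed by two 7-bit data bytes.

```python
def note_on(channel, note, velocity):
    """Note On: status nibble 0x9, then note number and velocity (0-127 each)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """Note Off: status nibble 0x8, with the same two data bytes."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

# Middle C (note 60) on channel 0 at velocity 100:
msg = note_on(0, 60, 100)   # b'\x90<d'
```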

  41. General MIDI and QuickTime • General MIDI specifies 128 standard voices, Table 12.1 • Drum machines and percussion samplers • Drum kits, Table 12.2 • There is no guarantee that identical sounds will be generated for each name by different instruments. • A good sampler may use high quality samples of the corresponding real instruments. • QuickTime: MIDI-like functionality

  42. MIDI Software • MIDI sequencing programs • Capture and editing functions equivalent to those of video editing software. • Multiple tracks • Composition • Music can be captured as it is played from MIDI controllers attached to a computer via a MIDI interface. • Punch in • The start and end points of a defective passage are marked; the sequencer starts playing before the beginning, then switches to record mode, allowing a new version of the passage to be recorded to replace the original.

  43. Sequencers • Quantize tempo during recording, fitting the length of notes to exact sixteenth notes, or eighth-note triplets, or whatever duration is specified. • Most programs allow music to be entered using classical music notation. • Some allow printed sheet music to be scanned, performing optical character recognition to transform the music into MIDI. • The opposite transformation, from MIDI to a printed score, is also often provided, enabling transcriptions of performed music to be made automatically.
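
A minimal Python sketch of tempo quantization, assuming note start times are measured in ticks at some pulses-per-quarter-note resolution (480 is an arbitrary but common value). A sixteenth-note grid divides the quarter note by 4; eighth-note triplets divide it by 3.

```python
def quantize_ticks(start_tick, ppq=480, division=4):
    """Snap a note's start time to the nearest grid position
    (division=4 -> sixteenth notes, division=3 -> eighth-note triplets)."""
    grid = ppq // division
    return round(start_tick / grid) * grid

# A note played at tick 1015 snaps to tick 960 on a sixteenth-note grid:
print(quantize_ticks(1015))
```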

  44. Piano-roll interface, Fig. 12.14 • A major limitation of MIDI is the impossibility of representing vocals • MIDI can be transformed into audio. • The reverse transformation is sometimes supported, although it is more difficult to implement.

  45. Computer Sequencing Software

  46. Music Notation Software

  47. Combining Sound and Picture • Voice-overs should match the picture they describe, music will often be related to edits, and natural sounds will be associated with events on screen. • Synchronization, timecode • If sound and video are physically independent, synchronization will sometimes be lost. • Audio and video data streams must carry the equivalent of timecode, so that their synchronization can be checked. • Audio and video play from the local hard disk • For short clips, it is possible to load the entire sound track into memory before playback begins. • This is impractical for movies. For these, it is normal to interleave the audio and video.
