Sound and Music for Video Games

Sound and Music for Video Games: Technology Overview • Roger Crawfis, Ohio State University

Overview • Fundamentals of Sound • Psychoacoustics • Interactive Audio • Applications

What is sound? • Sound is the sensation perceived by the sense of hearing • Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves

Dual Nature of Sound • Transfer of sound and physical stimulation of ear • Physiological and psychological processing in ear and brain (psychoacoustics)

Transmission of Sound • Requires a medium with elasticity and inertia (air, water, steel, etc.) • Movements of air molecules result in the propagation of a sound wave

Longitudinal Motion of Air

Wavefronts and Rays

Reflection of Sound

Absorption of Sound • Some materials readily absorb the energy of a sound wave • Example: carpet, curtains at a movie theater

Refraction of Sound

Diffusion of Sound • Not analogous to diffusion of light • Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies • Nearly full diffusion of sound requires a reflection phase grating (Schroeder Diffuser)

The Inverse-Square Law (Attenuation) • I = W / (4πr^2) • I is the sound intensity in W/cm^2 • W is the sound power of the source in W • r is the distance from the source in cm
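A minimal sketch of the attenuation above, assuming the usual point-source form I = W / (4πr^2), in which the source power spreads evenly over a sphere of radius r. The function names are illustrative and do not come from the slides.

```python
import math


def intensity(power_w: float, distance: float) -> float:
    """Intensity of a point source radiating equally in all directions.

    power_w  -- source power W, in watts
    distance -- distance r from the source (cm gives W/cm^2, m gives W/m^2)
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return power_w / (4.0 * math.pi * distance ** 2)


def level_change_db(r_near: float, r_far: float) -> float:
    """Change in intensity level, in dB, when moving from r_near to r_far."""
    return 10.0 * math.log10((r_near / r_far) ** 2)


if __name__ == "__main__":
    print(intensity(1.0, 100.0))      # 1 W source heard from 100 cm away
    print(level_change_db(1.0, 2.0))  # doubling the distance costs about 6 dB
```

Doubling the distance quarters the intensity, which is why simple distance-based volume falloff in a game is often expressed as roughly 6 dB per doubling.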

The Skull • Occludes wavelengths “small” relative to the skull • Causes diffraction around the head (helps amplify sounds) • Wavelengths much larger than the skull are not affected (explains how low frequencies are not directional)
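Whether a wavelength counts as "small" relative to the skull can be estimated from wavelength = speed of sound / frequency. The sketch below assumes a speed of sound of 343 m/s and a head width of 0.18 m. Both values, and the hard threshold itself, are illustrative assumptions rather than figures from the slides.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
HEAD_WIDTH_M = 0.18         # assumed: rough adult head width


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres of a sound wave at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_S / frequency_hz


def casts_head_shadow(frequency_hz: float) -> bool:
    """Crude test: the head shadows waves shorter than its own width."""
    return wavelength_m(frequency_hz) < HEAD_WIDTH_M


if __name__ == "__main__":
    for f in (100.0, 1000.0, 6000.0):
        print(f, wavelength_m(f), casts_head_shadow(f))
```

In reality the transition is gradual, so the boolean should be read as a rule of thumb only.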

The Pinna

Ear Canal and Skull • (A) Dark line – ear canal only • (B) Dashed line – ear canal and skull diffraction

Auditory Area (20Hz-20kHz)

Spatial Hearing • Ability to determine direction and distance from a sound source • The process is not fully understood • However, some cues have been identified as useful

The “Duplex” Theory of Localization • Interaural Intensity Differences (IIDs) • Interaural Arrival-Time Differences (ITDs)

Interaural Intensity Difference • The skull produces a sound shadow • Intensity difference results from one ear being shadowed and the other not • The IID does not apply to frequencies below 1000 Hz (wavelengths similar to or larger than the size of the head) • Sound shadowing can result in drops of up to ~20 dB for frequencies >= 6000 Hz • The Inverse-Square Law can also affect intensity

Head Rotation or Tilt • Rotation or tilt can alter the interaural spectrum in a predictable manner

Interaural Arrival-Time Difference • Perception of phase difference between ears caused by arrival-time delay (ITD) • Ear closest to sound source hears the sound before the other ear
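A rough feel for the size of the ITD comes from a simplified far-field model: the extra path to the far ear is about d·sin(azimuth), so the delay is that distance divided by the speed of sound. This model is a swapped-in simplification that ignores diffraction around the head. The ear spacing of 0.18 m and the speed of sound of 343 m/s are assumptions.

```python
import math


def itd_seconds(azimuth_deg: float,
                ear_spacing_m: float = 0.18,
                speed_of_sound_m_s: float = 343.0) -> float:
    """Approximate interaural arrival-time difference.

    azimuth_deg -- source direction: 0 is straight ahead, 90 is directly
                   to one side (sign indicates which side)
    """
    path_difference_m = ear_spacing_m * math.sin(math.radians(azimuth_deg))
    return path_difference_m / speed_of_sound_m_s


if __name__ == "__main__":
    for angle in (0.0, 30.0, 90.0):
        print(angle, itd_seconds(angle) * 1e6, "microseconds")
```

Under these assumptions the largest delay, for a source directly to one side, is on the order of half a millisecond.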

Digital Sound • Remember that sound is an analogue process (like vision). • Computers need to deal with digital processes (like digital images). • Many similar properties between computer imagery and computer sound processing.

Class or Semantics • Sample • Stream Sounds • Music • Tracks • MIDI

Sound for Games • Stereo doesn’t cut it anymore – you need positional audio. • Positional audio increases immersion • The Old: Vary volume as position changes • The New: Head-Related Transfer Functions (HRTF) for 3d positional audio with 2-4 speakers • Games use: • Dolby 5.1: requires lots of speakers • Creative’s EAX: “environmental audio” • Aureal’s A3D: good positional audio • DirectSound3D: Microsoft’s answer • OpenAL: open, cross-platform API

Audio Basics • A sound wave has two fundamental physical properties • Frequency (the pitch of the wave, in oscillations per second (Hertz)) • Amplitude (the loudness or strength of the wave, in decibels)

Sampling • A sound wave is “sampled” • measurements of amplitude taken at a “fast” rate • results in a stream of numbers
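As a sketch of what sampling produces, the function below measures a sine wave at regular intervals and returns the resulting stream of numbers. The sine source and the parameter names are illustrative assumptions. A real recording would read the amplitude from an analogue-to-digital converter instead.

```python
import math


def sample_sine(frequency_hz: float, sample_rate_hz: int,
                duration_s: float, amplitude: float = 1.0) -> list:
    """Sample a sine wave at sample_rate_hz and return the amplitudes."""
    count = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(count)
    ]


if __name__ == "__main__":
    samples = sample_sine(1000.0, 8000, 0.001)
    print(len(samples), samples)  # 8 measurements cover one 1 kHz cycle
```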

Data Rates for Sound • Human ear can hear frequencies between ?? and ??. • Must sample at twice the highest frequency. • Assume stereo (two channels) • Assume a 44.1 kHz sampling rate (CD sampling rate) • Assume 2 bytes per channel per sample • How much raw data is required to record 3 minutes of music?
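The question above is plain multiplication: samples per second × channels × bytes per sample × seconds. The sketch below works it through, taking the CD sampling rate as 44,100 Hz. The function name is illustrative.

```python
def raw_audio_bytes(sample_rate_hz: int, channels: int,
                    bytes_per_sample: int, seconds: int) -> int:
    """Size in bytes of uncompressed PCM audio."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


if __name__ == "__main__":
    size = raw_audio_bytes(44_100, 2, 2, 3 * 60)
    print(size, "bytes")             # 31,752,000 bytes
    print(size / 1_000_000, "MB")    # about 31.8 MB for three minutes
```

At roughly 10 MB per minute, uncompressed CD-quality audio quickly dominates a game's storage budget, which motivates the compression material later in the deck.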

Waveform Sampling: Quantization • Quantization introduces noise • Examples: 16, 12, 8, 6, 4 bit music • 16, 12, 8, 6, 4 bit speech
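Quantization noise can be seen by rounding each sample to a fixed number of levels and measuring the error that rounding leaves behind. The sketch below assumes samples normalized to the range -1.0 to 1.0 and a uniform quantizer. The helper names are illustrative.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to a signed uniform grid of 2**bits levels."""
    levels = 2 ** (bits - 1)                    # steps per unit of amplitude
    code = round(sample * levels)
    code = max(-levels, min(levels - 1, code))  # clamp to the signed range
    return code / levels


def snr_db(signal: list, bits: int) -> float:
    """Signal-to-quantization-noise ratio in dB for the given bit depth."""
    noise_power = sum((s - quantize(s, bits)) ** 2 for s in signal)
    signal_power = sum(s * s for s in signal)
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    tone = [0.9 * math.sin(2 * math.pi * 440 * i / 44_100) for i in range(4410)]
    for bits in (16, 12, 8, 6, 4):
        print(bits, "bits:", round(snr_db(tone, bits), 1), "dB")
```

Each bit removed costs roughly 6 dB of signal-to-noise ratio, which is why the lower bit depths in the listening examples sound progressively noisier.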

Limits of Human Hearing • Time and Frequency • Events longer than 0.03 seconds are resolvable in time; shorter events are perceived as features in frequency • 20 Hz < Human Hearing < 20 kHz (for those under 15 or so) • “Pitch” is PERCEPTION related to FREQUENCY • Human Pitch Resolution is about 40 - 4000 Hz

Limits of Human Hearing • Amplitude or Power??? • “Loudness” is PERCEPTION related to POWER, not AMPLITUDE • Power is proportional to (integrated) square of signal • Human Loudness perception range is about 120 dB, where +10 dB = 10 x power ≈ 3.16 x amplitude (and +20 dB = 100 x power = 10 x amplitude) • Waveform shape is of little consequence. Energy at each frequency, and how that changes in time, is the most important feature of a sound.
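Because power goes as the square of amplitude, decibels use a factor of 10 for power ratios and a factor of 20 for amplitude ratios. The helpers below are an illustrative sketch of those standard conversions. The names are not from the slides.

```python
import math


def power_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of two powers."""
    return 10.0 * math.log10(ratio)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of two amplitudes (power goes as amplitude squared)."""
    return 20.0 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio corresponding to a level change in dB."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    print(power_ratio_to_db(10.0))      # ten times the power is +10 dB
    print(db_to_amplitude_ratio(10.0))  # +10 dB is about 3.16 times the amplitude
    print(amplitude_ratio_to_db(10.0))  # ten times the amplitude is +20 dB
```

A 120 dB range therefore corresponds to a power ratio of 10^12 and an amplitude ratio of 10^6.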

Limits of Human Hearing • Waveshape or Frequency Content?? • Here are two waveforms with identical power spectra, and which are (nearly) perceptually identical: Wave 1 Wave 2 Magnitude Spectrum

Limits of Human Hearing • Masking in Amplitude, Time, and Frequency • Masking in Amplitude: Loud sounds ‘mask’ soft ones. • Example: Quantization Noise • Masking in Time: A soft sound just before a louder sound is more likely to be heard than if it is just after. • Example (and reason): Reverb vs. “Preverb” • Masking in Frequency: Loud ‘neighbor’ frequency masks soft spectral components. Low sounds mask higher ones more than high sounds mask low ones.

Limits of Human Hearing • Masking in Amplitude • Intuitively, a soft sound will not be heard if there is a competing loud sound. Reasons: • Gain controls in the ear (stapedius reflex and more) • Interaction (inhibition) in the cochlea • Other mechanisms at higher levels

Limits of Human Hearing • Masking in Time • In the time range of a few milliseconds: • A soft event following a louder event tends to be grouped perceptually as part of that louder event • If the soft event precedes the louder event, it might be heard as a separate event (become audible)

Limits of Human Hearing • Masking in Frequency • Only one component in this spectrum is audible because of frequency masking

Sampling Rates • For Cheap Compression, Look at Lowering the Sampling Rate First • 44.1 kHz 16 bit = CD Quality • 8 kHz 8 bit μ-law = Phone Quality • Examples: • Music: 44.1, 32, 22.05, 16, 11.025 kHz • Speech: 44.1, 32, 22.05, 16, 11.025, 8 kHz
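The phone-quality format gets by with 8 bits because the samples are companded before quantization: quiet samples are stretched and loud ones squeezed, so the few available levels are spent where the ear is most sensitive. The sketch below uses the continuous μ-law curve with μ = 255. It is not a bit-exact telephony codec, and the function names are illustrative.

```python
import math

MU = 255.0  # companding constant commonly used for 8-bit mu-law


def mulaw_compress(x: float, mu: float = MU) -> float:
    """Map a sample in [-1.0, 1.0] through the mu-law companding curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y: float, mu: float = MU) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5, 1.0):
        y = mulaw_compress(x)
        print(x, round(y, 3), round(mulaw_expand(y), 3))
```

A quiet sample of 0.01 lands at roughly a fifth of full scale after compression, so it survives the later rounding to 8 bits far better than it would on a linear scale.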

Views of Digital Sound • Two (mainstream) views of sound and their implications for compression • 1) Sound is Perceived • The auditory system doesn’t hear everything present • Bandwidth is limited • Time resolution is limited • Masking in all domains • 2) Sound is Produced • “Perfect” model could provide perfect compression

Production Models • Build a model of the sound production system, then fit the parameters • Example: If signal is speech, then a well-parameterized vocal model can yield highest quality and compression ratio • Benefits: Highest possible compression • Drawbacks: Signal source(s) must be assumed, known, or identified

MIDI and Other ‘Event’ Models • Musical Instrument Digital Interface • Represents Music as Notes and Events and uses a synthesis engine to “render” it. • An Edit Decision List (EDL) is another example. • A history of source materials, transformations, and processing steps is kept. Operations can be undone or recreated easily. Intermediate non-parametric files are not saved.
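To make "notes and events" concrete, the sketch below builds the three-byte MIDI Note On and Note Off channel messages and converts a note number to a pitch. The pitch conversion assumes twelve-tone equal temperament with A4 (note 69) at 440 Hz, which is the usual convention and not something the event itself carries. The function names are illustrative.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Three-byte MIDI Note On message (status byte 0x90 plus channel)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Three-byte MIDI Note Off message (status byte 0x80 plus channel)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    return bytes([0x80 | channel, note, velocity])


def note_to_frequency_hz(note: int, a4_hz: float = 440.0) -> float:
    """Pitch of a MIDI note number under equal temperament (assumed tuning)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


if __name__ == "__main__":
    print(note_on(0, 60, 100).hex())  # middle C, moderately loud
    print(note_to_frequency_hz(60))   # about 261.6 Hz
```

Six bytes describe a complete note, against tens of thousands of bytes for the same second of sampled audio. The catch is that the receiving synthesis engine decides what the note actually sounds like.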

Event Based Compression • A Musical Score is a very compact representation of music • Benefits: • Highest possible compression • Drawbacks: • Cannot guarantee the “performance” • Cannot assure the quality of the sounds • Cannot make arbitrary sounds

Event Based Compression • Enter General MIDI • Guarantees a base set of instrument sounds, and a means for addressing them, but doesn’t guarantee any quality • Better Yet, Downloadable Sounds • Download samples for instruments • Benefits: Does more to guarantee quality • Drawbacks: Samples aren’t reality

Event Based Compression • Downloadable Algorithms • Specify the algorithm, the synthesis engine runs it, and we just send parameter changes • Part of “Structured Audio” (MPEG-4) • Benefits: • Can upgrade algorithms later • Can implement scalable synthesis • Drawbacks: • Different algorithm for each class of sounds (but can always fall back on samples)

Compressed Audio Formats

To be continued … • Stop here • Sound Group Technical Presentations. • Suggested Topics: • Compression • Controlling the Environment • ToolKit I features • ToolKit II features • Examples and Demos

Environmental Effects • Obstruction/Occlusion • Reverberation • Doppler Shift • Atmospheric Effects

Obstruction • Same as sound shadowing • Generally approximated by a ray test and a low pass filter • High frequencies should get shadowed while low frequencies diffract
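The low pass filter half of this approximation can be sketched with a one-pole filter, the simplest filter that passes low frequencies and attenuates high ones. The ray test is assumed to have already reported that the path is blocked. The cutoff value and the function names are illustrative assumptions, not part of any particular audio API.

```python
import math


def low_pass(samples: list, cutoff_hz: float, sample_rate_hz: float) -> list:
    """One-pole low-pass filter (exponential smoothing) over a sample list."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)    # smoothing factor between 0 and 1
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move part of the way toward the input
        out.append(y)
    return out


def apply_obstruction(samples: list, obstructed: bool,
                      sample_rate_hz: float = 44_100.0,
                      cutoff_hz: float = 1000.0) -> list:
    """Muffle the sound only when the (hypothetical) ray test found a blocker."""
    if not obstructed:
        return list(samples)
    return low_pass(samples, cutoff_hz, sample_rate_hz)
```

A game would typically lower the cutoff further as more of the path is blocked, and the same filter with a lower cutoff and reduced gain is a common stand-in for full occlusion.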

Obstruction

Occlusion • A completely blocked sound • Example: A sound that penetrates a closed door or a wall • The sound will be muffled (low pass filter)

Reverberation • Effects from sound reflection • Similar to echo • Static reverberation • Dynamic reverberation
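The link between echo and reverberation can be sketched with a feedback delay (a comb filter): each output sample adds a scaled copy of the output from a fixed delay earlier, so a single click becomes a decaying train of repeats. This is only one building block of classic reverberators and is offered as an assumption about how such an effect might be approximated, not as the method the slides have in mind.

```python
def feedback_echo(samples: list, delay_samples: int,
                  feedback: float = 0.5) -> list:
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    if delay_samples <= 0:
        raise ValueError("delay_samples must be positive")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for the echo to decay")
    out = [float(s) for s in samples]
    for i in range(delay_samples, len(out)):
        out[i] += feedback * out[i - delay_samples]
    return out


if __name__ == "__main__":
    click = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(feedback_echo(click, 2))  # repeats of the click, halving each time
```

With delays of tens of milliseconds the repeats are heard as discrete echoes. Several shorter, mutually detuned delays running in parallel begin to blur into a reverberant tail.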
