Maximizing Use of Sound in Games

Use of Sound in Games CIS 487/587 Bruce R. Maxim UM-Dearborn

Speech Technology • Discrete word recognition • Continuous speech recognition • Speech store and forward • Speech generation

Discrete Word Recognition • 90 to 98% reliability for small vocabulary • Usually requires speaker-dependent training • Most people would rather type than dictate

When should you use it? • Speaker's hands are busy • Mobility required • Speaker's eyes are occupied • Harsh or cramped conditions prevent use of keyboard

Speech Store and Forward • Voice mail type technology • Video games • Low cost • Resource intensive

When to use computer generated speech? • Message is simple • Message is short • Message will not be referred to later • Message deals with events in time • Message requires immediate response • Visual communications channels are overloaded • Environment lighting is bad • User must move around • User subjected to high G forces or lack of oxygen

Sound • Sound travels more slowly than light (this makes coordination tricky) • Sound waves travel at a constant speed and can be specified with two parameters: • Amplitude (wave height, volume of air moved) • Frequency (number of complete cycles per second)

The next 4 slides are from Rabin’s book

Digital Representation of a Sound Wave • Most common technique known as sampling • Sampling involves measuring the amplitude of the analog wave at discrete intervals • The frequency of sampling is known as the sampling rate • Each sample is typically stored in a value ranging from 4 to 24 bits in size • The size of the sample value in bits is known as the ‘bit depth’ • Music CDs have a sample rate and bit depth of 44.1 kHz (samples/sec) and 16 bits (sample size)
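
As a rough illustration of how sample rate and bit depth translate into storage, the sketch below computes the uncompressed data rate of PCM audio. The function names `pcm_bytes_per_second` and `pcm_clip_bytes` are hypothetical, and the two-channel (stereo) figure used for CD audio in the note afterwards is an assumption that the slide does not state.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed PCM data rate in bytes per second.
// Assumes bit_depth is a multiple of 8 (e.g. 8, 16, 24).
std::uint64_t pcm_bytes_per_second(std::uint32_t sample_rate_hz,
                                   std::uint32_t bit_depth,
                                   std::uint32_t channels)
{
    assert(bit_depth % 8 == 0 && channels > 0);
    return static_cast<std::uint64_t>(sample_rate_hz) * (bit_depth / 8) * channels;
}

// Total size in bytes of a clip lasting a whole number of seconds.
std::uint64_t pcm_clip_bytes(std::uint32_t sample_rate_hz,
                             std::uint32_t bit_depth,
                             std::uint32_t channels,
                             std::uint32_t seconds)
{
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * seconds;
}
```

With CD settings and two channels, `pcm_bytes_per_second(44100, 16, 2)` gives 176,400 bytes per second, or roughly 10 MB per minute of audio.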

Bit Depth and Signal Noise • Bit depth of sample data affects signal noise • Signal-to-noise ratio = number of available values (2^bits) : 1 • For example, 8-bit samples have a 256:1 SNR (~48 dB), and 16-bit samples have a 65,536:1 SNR (~96 dB) • Decibel ratio is calculated using 20 x log10 (ratio) or 8.685890 x log e (ratio)
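
The decibel figures quoted for 8-bit and 16-bit samples can be checked with a few lines of code. This is a minimal sketch using the amplitude-ratio form 20 x log10(ratio), which equals 8.685890 x ln(ratio) and reproduces the ~48 dB and ~96 dB values; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Number of representable levels for a given bit depth, i.e. 2^bits.
// The SNR is this value to one.
double snr_ratio(int bit_depth)
{
    assert(bit_depth > 0 && bit_depth <= 32);
    return std::pow(2.0, bit_depth);
}

// Convert an amplitude ratio to decibels: 20 * log10(ratio).
double ratio_to_db(double ratio)
{
    assert(ratio > 0.0);
    return 20.0 * std::log10(ratio);
}
```

Each extra bit doubles the ratio and therefore adds about 6.02 dB.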

Sampling Frequency and Frequency Reproduction • Sampling frequency affects range and quality of high-frequency reproduction • Nyquist Limit • Frequencies up to one-half the sampling rate can be reproduced • Audio quality degrades as frequency approaches this limit
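
To make the Nyquist limit concrete, the following sketch computes the limit for a sampling rate and the frequency that would actually be heard if a pure tone above that limit were sampled without an anti-aliasing filter. Both function names are hypothetical, and the folding model assumes an ideal sampler.

```cpp
#include <cassert>
#include <cmath>

// Highest frequency that a given sampling rate can represent.
double nyquist_limit_hz(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return sample_rate_hz / 2.0;
}

// Frequency heard when a pure tone of tone_hz is sampled at
// sample_rate_hz with no anti-aliasing filter: the tone folds back
// into the range [0, sample_rate_hz / 2].
double aliased_frequency_hz(double tone_hz, double sample_rate_hz)
{
    assert(tone_hz >= 0.0 && sample_rate_hz > 0.0);
    double folded = std::fmod(tone_hz, sample_rate_hz);  // now in [0, rate)
    if (folded > sample_rate_hz / 2.0)
        folded = sample_rate_hz - folded;                // mirror around the limit
    return folded;
}
```

For example, a 7000 Hz tone sampled at 11025 Hz lies above the 5512.5 Hz limit and would be reproduced as a 4025 Hz tone.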

Modern Audio Hardware • Samples are piped into sound “channels” • Often a hardware pipeline from this point • Various operations, such as volume, pan, and pitch may be applied • 3D sounds may apply HRTF algorithms and/or mix the sound into final output buffers.

Wave Shapes • Sine wave (pure sound) • Square wave • Saw tooth wave • Half-rectified sine wave • Most sounds are mixtures of several waves, their spectrums (frequency distributions) look quite ragged • To make realistic sounds we need to replicate this spectrum
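
The four basic wave shapes can be written as functions of phase, where phase is the fraction of one cycle that has elapsed. This is only an illustrative sketch: the function names, the [-1, 1] amplitude range, and the harmonic weights in `mixed_tone` are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>

const double kTwoPi = 6.283185307179586;

// Pure tone: a single frequency.
double sine_wave(double phase) { return std::sin(kTwoPi * phase); }

// High for the first half of the cycle, low for the second half.
double square_wave(double phase)
{
    assert(phase >= 0.0 && phase < 1.0);
    return phase < 0.5 ? 1.0 : -1.0;
}

// Rises linearly from -1 to 1 over one cycle, then drops back.
double sawtooth_wave(double phase)
{
    assert(phase >= 0.0 && phase < 1.0);
    return 2.0 * phase - 1.0;
}

// Sine wave with the negative half-cycle clipped to zero.
double half_rectified_sine(double phase)
{
    double s = std::sin(kTwoPi * phase);
    return s > 0.0 ? s : 0.0;
}

// A fundamental mixed with two weaker harmonics, giving a spectrum with
// three components instead of one (weights are arbitrary).
double mixed_tone(double phase)
{
    return 0.6 * sine_wave(phase)
         + 0.3 * sine_wave(2.0 * phase)
         + 0.1 * sine_wave(3.0 * phase);
}
```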

Computer Sound • Two types of computer generated sounds • Digital (recordings of sound) • Used for sound effects and people talking • Synthesized (programmed reproductions of sounds) • Might only be used for music

Digital Sound • Analog-to-digital • Created by converting the analog sound vibrating a microphone into a bit string that can be written to disk • Sample rate (frequency) – should be at least two times the highest frequency of the original sound (e.g. 400 Hz for male voice) • Amplitude resolution (8 bits for games and 16 bits for professional sounds and music)
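
Amplitude resolution determines how a measured value is rounded to a stored sample. The sketch below quantizes a normalized amplitude in [-1, 1] to 8-bit and 16-bit PCM; the function names are hypothetical. It relies on the .WAV convention that 8-bit samples are unsigned with 128 as silence, while 16-bit samples are signed with 0 as silence.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Limit x to the range [-1, 1] so out-of-range input cannot overflow.
static double clamp_unit(double x)
{
    return x < -1.0 ? -1.0 : (x > 1.0 ? 1.0 : x);
}

// 8-bit PCM: unsigned, 128 is the midpoint (silence).
std::uint8_t quantize_u8(double x)
{
    assert(std::isfinite(x));
    return static_cast<std::uint8_t>(std::lround(clamp_unit(x) * 127.0) + 128);
}

// 16-bit PCM: signed, 0 is the midpoint (silence).
std::int16_t quantize_s16(double x)
{
    assert(std::isfinite(x));
    return static_cast<std::int16_t>(std::lround(clamp_unit(x) * 32767.0));
}
```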

Synthesized Sound • Not as good as digital • People are used to hearing 16 to 32 different tones in a sound’s spectrum (not just one pure note) • FM synthesizers use feedback to synthesize additional “background” noise in a sound’s spectrum

MIDI • Musical Instrument Digital Interface • Example (using English): • Turn on channel 1 using an A • Turn on channel 2 using a C# • Turn off channel 1 • Turn off all channels • Good for music, bad for explosions
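
The English example above corresponds to a handful of three-byte MIDI channel messages. This sketch builds them by hand; the helper names are hypothetical, and the octave (A4 = note 69, C#5 = note 73) and velocity are assumptions, since the slide only names the pitches. Channels are numbered 0 to 15 on the wire, so the slide's "channel 1" is channel 0 here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using MidiMessage = std::array<std::uint8_t, 3>;

// Note On: status byte 0x90 | channel, then note number and velocity.
MidiMessage note_on(int channel, int note, int velocity)
{
    assert(channel >= 0 && channel < 16);
    assert(note >= 0 && note < 128 && velocity >= 0 && velocity < 128);
    MidiMessage m = {{ static_cast<std::uint8_t>(0x90 | channel),
                       static_cast<std::uint8_t>(note),
                       static_cast<std::uint8_t>(velocity) }};
    return m;
}

// Note Off: status byte 0x80 | channel, then note number and release velocity 0.
MidiMessage note_off(int channel, int note)
{
    assert(channel >= 0 && channel < 16 && note >= 0 && note < 128);
    MidiMessage m = {{ static_cast<std::uint8_t>(0x80 | channel),
                       static_cast<std::uint8_t>(note), 0 }};
    return m;
}

// All Notes Off: Control Change (0xB0 | channel), controller 123, value 0.
// It is sent once per channel to silence every channel.
MidiMessage all_notes_off(int channel)
{
    assert(channel >= 0 && channel < 16);
    MidiMessage m = {{ static_cast<std::uint8_t>(0xB0 | channel), 123, 0 }};
    return m;
}
```

The whole example is then `note_on(0, 69, 100)`, `note_on(1, 73, 100)`, `note_off(0, 69)`, followed by `all_notes_off(ch)` for each channel: twelve bytes for the first three messages, which shows why MIDI is compact for music but has no way to describe an arbitrary sound such as an explosion.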

Wave Table Synthesis • Mix between synthesis and digital recording • Real sampled sounds are stored in a wave table to be played back by the DSP (digital signal processor)
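
In software, the core of wave table playback is a phase accumulator that steps through a stored cycle at a rate set by the desired pitch. The class below is a minimal sketch of that idea, not a description of any particular sound card; the class name and the choice of linear interpolation are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plays one stored cycle of a waveform at an arbitrary pitch.
class WavetableOscillator {
public:
    WavetableOscillator(std::vector<float> table, double sample_rate_hz)
        : table_(table), sample_rate_hz_(sample_rate_hz), phase_(0.0), step_(0.0)
    {
        assert(!table_.empty() && sample_rate_hz_ > 0.0);
    }

    // Number of table positions to advance per output sample.
    void set_frequency(double hz)
    {
        assert(hz >= 0.0);
        step_ = hz * static_cast<double>(table_.size()) / sample_rate_hz_;
    }

    // Return the next output sample, interpolating linearly between the
    // two nearest table entries, then advance and wrap the phase.
    float next()
    {
        std::size_t i0 = static_cast<std::size_t>(phase_);
        std::size_t i1 = (i0 + 1) % table_.size();
        double frac = phase_ - static_cast<double>(i0);
        float out = static_cast<float>(table_[i0] + frac * (table_[i1] - table_[i0]));
        phase_ = std::fmod(phase_ + step_, static_cast<double>(table_.size()));
        return out;
    }

private:
    std::vector<float> table_;
    double sample_rate_hz_;
    double phase_;   // current read position in table entries
    double step_;    // table entries advanced per output sample
};
```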

Wave-Guide Synthesis • Uses DSP chips and special hardware • Sound synthesizer can generate a mathematical model of a virtual instrument and play it • Most game companies buy sound libraries for sound effects

Getting Sample Sounds • Sample from real world using a microphone • Buy a sample sound library (e.g. LaMothe CD) • Use a synthesizer (e.g. Sound Forge) to create them at either 22 kHz or 11 kHz using either 8-bit or 16-bit samples

Recording Your Own Sounds • Use 16-bit samples, 22 kHz, mono (two microphones need wide separation to notice stereo effect) • Clean up sounds with Sound Forge • Apply effects (frequency shift, echoes, distortions) • Write your processed sounds using the same settings as recording

The next 10 slides are from Rabin’s book

Sound Playback Techniques • Two basic playback methods: 1. Play sample entirely from memory buffer 2. Stream data in real-time from storage medium • Streaming is more memory efficient for very large audio files, such as music tracks, dialogue, etc • Streaming systems use either a circular buffer with read-write pointers, or a double-buffering algorithm
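
A streaming system built on a circular buffer can be sketched as follows: a loader thread or callback writes decoded audio at the write position while the mixer consumes it from the read position. The class and method names are hypothetical, the byte-at-a-time copy is chosen for clarity rather than speed, and no thread synchronization is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class StreamRingBuffer {
public:
    explicit StreamRingBuffer(std::size_t capacity)
        : data_(capacity), read_pos_(0), write_pos_(0), used_(0)
    {
        assert(capacity > 0);
    }

    std::size_t available() const  { return used_; }                 // bytes ready to play
    std::size_t free_space() const { return data_.size() - used_; }  // bytes that can be filled

    // Producer side: copy up to n bytes from src; returns bytes accepted.
    std::size_t write(const std::uint8_t* src, std::size_t n)
    {
        std::size_t count = n < free_space() ? n : free_space();
        for (std::size_t i = 0; i < count; ++i) {
            data_[write_pos_] = src[i];
            write_pos_ = (write_pos_ + 1) % data_.size();  // wrap at the end
        }
        used_ += count;
        return count;
    }

    // Consumer side: copy up to n bytes into dst; returns bytes delivered.
    std::size_t read(std::uint8_t* dst, std::size_t n)
    {
        std::size_t count = n < used_ ? n : used_;
        for (std::size_t i = 0; i < count; ++i) {
            dst[i] = data_[read_pos_];
            read_pos_ = (read_pos_ + 1) % data_.size();    // wrap at the end
        }
        used_ -= count;
        return count;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t read_pos_;
    std::size_t write_pos_;
    std::size_t used_;
};
```

Only a small window of the file is resident at any time, which is where the memory saving over whole-sample playback comes from.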

Sample Playback and Manipulation • Three basic operations you should know • Panning is the attenuation of left and right channels of a mixed sound • Results in spatial positioning within the aural stereo field • Pitch allows the adjustment of a sample’s playback frequency in real-time • Volume control typically attenuates the volume of a sound • Amplification is generally never supported
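
The three operations can be illustrated in software, although hardware pipelines normally perform them. In this sketch the names are hypothetical, pan runs from -1 (full left) to 1 (full right) and only attenuates the opposite channel, volume runs from 0 to 1 with no amplification, and pitch is changed by resampling with linear interpolation. Those conventions are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct StereoFrame { float left; float right; };

// Attenuate a mono sample by volume, then attenuate one channel by pan.
StereoFrame apply_volume_and_pan(float sample, float volume, float pan)
{
    assert(volume >= 0.0f && volume <= 1.0f);
    assert(pan >= -1.0f && pan <= 1.0f);
    float left_gain  = pan > 0.0f ? 1.0f - pan : 1.0f;  // panning right lowers left
    float right_gain = pan < 0.0f ? 1.0f + pan : 1.0f;  // panning left lowers right
    StereoFrame f = { sample * volume * left_gain, sample * volume * right_gain };
    return f;
}

// Change pitch by reading the input at a different rate.
// ratio 2.0 plays twice as fast (one octave up); 0.5 plays at half speed.
std::vector<float> resample(const std::vector<float>& in, double ratio)
{
    assert(ratio > 0.0);
    std::vector<float> out;
    if (in.empty()) return out;
    double last = static_cast<double>(in.size() - 1);
    for (double pos = 0.0; pos <= last; pos += ratio) {
        std::size_t i0 = static_cast<std::size_t>(pos);
        std::size_t i1 = i0 + 1 < in.size() ? i0 + 1 : i0;
        double frac = pos - static_cast<double>(i0);
        out.push_back(static_cast<float>(in[i0] + frac * (in[i1] - in[i0])));
    }
    return out;
}
```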

Compressed Audio Format • Compressed audio formats allow sound and music to be stored more compactly • Bit reduction codecs generally are lightweight • ADPCM compression is implemented in hardware on all the major current video game console systems • Psycho-acoustic codecs often have better compression • Require substantially more computational horsepower to decode

MP3, Ogg Vorbis, Licensing & Patent Issues • The MP3 format is patented • Any commercial game is subject to licensing terms as determined by Fraunhofer & Thomson Multimedia, the holders of the patents • Ogg Vorbis is similar to MP3 in many ways • Open source and patent-free (royalty-free) • Be aware of patent and license restrictions when using 3rd party software

Programming Music Systems • Two common music systems • MIDI-based systems • (Musical Instrument Digital Interface) • Digital audio streaming systems • (CD audio, MP3 playback, etc)

Advantages and Disadvantages of MIDI • Actual music data size is negligible • Easy to control, alter, and even generate in real-time • High quality music is more difficult to compose and program • Only effective if you can guarantee playback of a common instrument set

Other MIDI-based technologies to be aware of • DLS (DownLoadable Sound) Format • A standardized format for instrument definition files • iXMF (Interactive eXtensible Music Format) • New proposed standard for a container format for interactive music

Advantages / Disadvantages of Digital Audio Streams • Superb musical reproduction is guaranteed • Allows composers to work with any compositional techniques • Some potential interactivity is sacrificed for expediency and musical quality • Generally high storage requirements

A Conceptual Interactive Music Playback System • Divide music into small two to eight-bar chunks that we’ll call segments. • A network of transitions from segment to segment (including loops and branches) is called a theme. • Playing music is now as simple as choosing a theme to play. The transition map tracks the details.
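
One way to picture the transition map is as a lookup from (current segment, game cue) to the next segment. The class below is a hypothetical sketch of that structure; the names `Theme`, `add_transition`, and `next_segment`, and the idea of an empty cue meaning the default transition, are assumptions and not part of the system described on the slide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A theme: a network of transitions between named music segments.
class Theme {
    typedef std::map<std::pair<std::string, std::string>, std::string> Map;

public:
    // When segment `from` finishes and the game has raised `cue`,
    // continue with segment `to`. An empty cue is the default transition.
    void add_transition(const std::string& from, const std::string& cue,
                        const std::string& to)
    {
        assert(!from.empty() && !to.empty());
        transitions_[std::make_pair(from, cue)] = to;
    }

    // Returns the segment to play next, or an empty string when the theme ends.
    std::string next_segment(const std::string& current, const std::string& cue) const
    {
        Map::const_iterator it = transitions_.find(std::make_pair(current, cue));
        if (it == transitions_.end())  // no branch for this cue: use the default
            it = transitions_.find(std::make_pair(current, std::string()));
        return it == transitions_.end() ? std::string() : it->second;
    }

private:
    Map transitions_;
};
```

A loop is a segment whose default transition points back to itself, and a branch is an extra transition keyed on a cue such as "victory".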

API Choices • DirectSound (part of DirectX API) • Only available on Windows platforms • OpenAL • Newer API • Available on multiple platforms • Proprietary APIs • Typically available on consoles • 3rd Party Licensable APIs • Can offer broad cross-platform solutions

DirectSound • IDirectSound object – need one for each installed sound card (emulation possible if sound card is missing) • IDirectSoundBuffer – use primary and secondary buffers like DirectDraw • IDirectSoundCapture – used to record sounds and for speech recognition • IDirectSoundNotify – used to send messages in complex systems

LaMothe Examples

Game_Init( )
// create directsound object and test for error
if (DirectSoundCreate(NULL, &lpds, NULL) != DS_OK)
    return(0);
// set cooperation level to normal priority
if (lpds->SetCooperativeLevel(main_window_handle, DSSCL_NORMAL) != DS_OK)
    return(0);

Game_Shutdown( )
// release the directsound object
if (lpds != NULL)
    lpds->Release();

Generating a Sound
// this example does everything: it sets up directsound,
// creates a secondary buffer, loads it with a synthesized
// sine wave, and plays it
void *audio_ptr_1 = NULL,        // used to lock memory
     *audio_ptr_2 = NULL;
DWORD dsbstatus;                 // status of sound buffer
DWORD audio_length_1 = 0,        // length of locked memory
      audio_length_2 = 0,
      snd_buffer_length = 64000; // working buffer
// allocate memory for buffer
UCHAR *snd_buffer_ptr = (UCHAR *)malloc(snd_buffer_length);

Generating a Sound
// we need some data for the buffer; you could load a .VOC or .WAV,
// but as an example, let's synthesize the data
// fill buffer with a synthesized 100 Hz sine wave
for (int index = 0; index < (int)snd_buffer_length; index++)
    snd_buffer_ptr[index] = (UCHAR)(128 + 127*sin(6.28*((float)(index % 110))/(float)110));
// note the math: 127 is the scale or amplitude,
// 128 is the midpoint (silence) of unsigned 8-bit samples,
// 6.28 (approx. 2*pi) is to convert to radians,
// (index % 110): read below
// we are playing at 11025 Hz, or 11025 samples/sec; therefore, in 1 sec
// we want 100 cycles of our synthesized sound, thus 11025/100 is approx.
// 110, thus we want the waveform to repeat every 110 steps of index, so
// normalize to 110
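
The period arithmetic in the comments can be written out as follows, with f_s the playback rate, f the target frequency, and N the period in samples. Rounding the period down to a whole number of samples makes the tone slightly sharper than the 100 Hz target, and the 64000-byte buffer holds a little under six seconds of 8-bit mono audio at this rate.

```latex
\[
N = \frac{f_s}{f} = \frac{11025}{100} = 110.25 \approx 110 \text{ samples per cycle},
\qquad
f_{\text{actual}} = \frac{f_s}{N} = \frac{11025}{110} \approx 100.23\ \text{Hz}
\]
\[
t_{\text{buffer}} = \frac{64000\ \text{bytes}}{11025\ \text{bytes/sec}} \approx 5.8\ \text{sec}
\]
```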

Generating a Sound
// set cooperation level
if (lpds->SetCooperativeLevel(main_window_handle, DSSCL_NORMAL) != DS_OK)
    return(0);
// set up the format data structure
memset(&pcmwf, 0, sizeof(WAVEFORMATEX));
pcmwf.wFormatTag      = WAVE_FORMAT_PCM;
pcmwf.nChannels       = 1;
pcmwf.nSamplesPerSec  = 11025;
pcmwf.nBlockAlign     = 1;
pcmwf.nAvgBytesPerSec = pcmwf.nSamplesPerSec * pcmwf.nBlockAlign;
pcmwf.wBitsPerSample  = 8;
pcmwf.cbSize          = 0;

Generating a Sound
// create the secondary buffer (no need for a primary)
memset(&dsbd, 0, sizeof(DSBUFFERDESC));
dsbd.dwSize        = sizeof(DSBUFFERDESC);
dsbd.dwFlags       = DSBCAPS_CTRLDEFAULT | DSBCAPS_STATIC | DSBCAPS_LOCSOFTWARE;
dsbd.dwBufferBytes = snd_buffer_length+1;
dsbd.lpwfxFormat   = &pcmwf;
if (lpds->CreateSoundBuffer(&dsbd, &lpdsbsecondary, NULL) != DS_OK)
    return(0);
// copy data into sound buffer
if (lpdsbsecondary->Lock(0, snd_buffer_length,
                         &audio_ptr_1, &audio_length_1,
                         &audio_ptr_2, &audio_length_2,
                         DSBLOCK_FROMWRITECURSOR) != DS_OK)
    return(0);

Generating a Sound
// copy first section of circular buffer
CopyMemory(audio_ptr_1, snd_buffer_ptr, audio_length_1);
// copy last section of circular buffer
CopyMemory(audio_ptr_2, (snd_buffer_ptr+audio_length_1), audio_length_2);
// unlock the buffer
if (lpdsbsecondary->Unlock(audio_ptr_1, audio_length_1,
                           audio_ptr_2, audio_length_2) != DS_OK)
    return(0);
// play the sound in looping mode
if (lpdsbsecondary->Play(0, 0, DSBPLAY_LOOPING) != DS_OK)
    return(0);
// release the memory since DirectSound has made a copy of it
free(snd_buffer_ptr);
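
The reason the lock hands back two pointers and two lengths is that the sound buffer is circular: a write that starts near the end has to wrap around to the beginning. The portable sketch below shows only that arithmetic. It illustrates the idea and is not DirectSound's implementation; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct LockedRegions {
    std::size_t offset_1, length_1;  // first region: from the cursor toward the end
    std::size_t offset_2, length_2;  // second region: wraps to the start (may be empty)
};

// Split a request for `bytes` bytes starting at `cursor` into at most
// two contiguous regions of a circular buffer of `buffer_size` bytes.
LockedRegions split_lock(std::size_t buffer_size, std::size_t cursor, std::size_t bytes)
{
    assert(buffer_size > 0 && cursor < buffer_size && bytes <= buffer_size);
    LockedRegions r;
    std::size_t to_end = buffer_size - cursor;      // room before the wrap point
    r.offset_1 = cursor;
    r.length_1 = bytes < to_end ? bytes : to_end;
    r.offset_2 = 0;
    r.length_2 = bytes - r.length_1;                // whatever did not fit
    return r;
}
```

The caller copies `length_1` bytes to the first region and the remaining `length_2` bytes to the second, which mirrors the two copy calls in the example.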

Reading from Files • DirectSound has no support for reading .WAV files from disk, so loading a sound involves 2 steps • Reading the .WAV file header to get format info (e.g. # channels, bits/channel, playback rate, length of sample sound) • Loading the sound data
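
The header-reading step can be sketched without any Windows API. The code below parses the simplest .WAV layout, a 44-byte header for uncompressed PCM with the "fmt " chunk immediately followed by the "data" chunk. Real files may contain extra chunks, so this layout is an assumption; the struct and function names are hypothetical, and this is not the multimedia I/O approach that the LaMothe loader uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct WavInfo {
    std::uint16_t channels;
    std::uint32_t sample_rate;      // playback rate in samples per second
    std::uint16_t bits_per_sample;
    std::uint32_t data_bytes;       // length of the sample data
};

// .WAV fields are stored little-endian.
static std::uint16_t read_u16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t read_u32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse a canonical 44-byte PCM header. Returns false if the layout differs.
bool parse_wav_header(const unsigned char* bytes, std::size_t size, WavInfo* out)
{
    assert(bytes != nullptr && out != nullptr);
    if (size < 44) return false;
    if (std::memcmp(bytes, "RIFF", 4) != 0) return false;
    if (std::memcmp(bytes + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(bytes + 12, "fmt ", 4) != 0) return false;
    if (read_u32(bytes + 16) != 16) return false;   // size of a plain PCM fmt chunk
    if (read_u16(bytes + 20) != 1) return false;    // format tag 1 = uncompressed PCM
    if (std::memcmp(bytes + 36, "data", 4) != 0) return false;
    out->channels        = read_u16(bytes + 22);
    out->sample_rate     = read_u32(bytes + 24);
    out->bits_per_sample = read_u16(bytes + 34);
    out->data_bytes      = read_u32(bytes + 40);
    return true;
}
```

The sample data begins at byte 44 and runs for `data_bytes` bytes; those values are what a loader needs to size and fill a sound buffer.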

DSound_Load_WAV • Open .WAV file and extract header info • Create and fill DirectSound buffer • Stores info in an open slot in sound_fx[ ] and returns the sound ID as an index • Sound can be played at any time using sound_fx[sound_id].dsbuffer->Play(0,0,DSBPLAY_LOOPING);

Globals
// this holds a single sound
typedef struct pcm_sound_typ
{
    LPDIRECTSOUNDBUFFER dsbuffer; // the ds buffer containing the sound
    int state;                    // state of the sound
    int rate;                     // playback rate
    int size;                     // size of sound
    int id;                       // id number of the sound
} pcm_sound, *pcm_sound_ptr;

Globals
LPDIRECTSOUND       lpds;                 // directsound interface pointer
DSBUFFERDESC        dsbd;                 // directsound description
DSCAPS              dscaps;               // directsound caps
HRESULT             dsresult;             // general directsound result
DSBCAPS             dsbcaps;              // directsound buffer caps
LPDIRECTSOUNDBUFFER lpdsbprimary,         // you won't need this normally
                    lpdsbsecondary;       // the sound buffers
WAVEFORMATEX        pcmwf;                // generic waveformat structure
pcm_sound           sound_fx[MAX_SOUNDS]; // array of secondary sound buffers
HWND                freq_hwnd,            // window handles for controls
                    volume_hwnd,
                    pan_hwnd;
int                 sound_id = -1;        // id of sound we load for demo

Game_Init( )
// create a directsound object
if (DirectSoundCreate(NULL, &lpds, NULL) != DS_OK)
    return(0);
// set cooperation level
if (lpds->SetCooperativeLevel(main_window_handle, DSSCL_NORMAL) != DS_OK)
    return(0);
// clear array out
memset(sound_fx, 0, sizeof(pcm_sound)*MAX_SOUNDS);

Game_Init( )
// initialize the sound fx array
for (int index = 0; index < MAX_SOUNDS; index++)
{
    // test if this sound has been loaded
    if (sound_fx[index].dsbuffer)
    {
        // stop the sound
        sound_fx[index].dsbuffer->Stop();
        // release the buffer
        sound_fx[index].dsbuffer->Release();
    } // end if
    // clear the record out
    memset(&sound_fx[index], 0, sizeof(pcm_sound));
    // now set up the fields
    sound_fx[index].state = SOUND_NULL;
    sound_fx[index].id    = index;
} // end for index

Game_Init( )
// load a wav file in
if ((sound_id = DSound_Load_WAV("FLIGHT.WAV")) != -1)
{
    // start the sound playing in looping mode
    sound_fx[sound_id].dsbuffer->Play(0, 0, DSBPLAY_LOOPING);
} // end if

Game_Shutdown( )
// release the sound buffer (sound_id is still -1 if the load failed)
if (sound_id != -1 && sound_fx[sound_id].dsbuffer)
    sound_fx[sound_id].dsbuffer->Release();
// release the directsound object
if (lpds != NULL)
    lpds->Release();

DSound_Load_WAV( )
HMMIO        hwav;              // handle to wave file
MMCKINFO     parent,            // parent chunk
             child;             // child chunk
WAVEFORMATEX wfmtx;             // wave format structure
int          sound_id = -1,     // id of sound to be loaded
             index;             // looping variable
UCHAR       *snd_buffer,        // temporary sound buffer to hold sound data
            *audio_ptr_1=NULL,  // data ptr to first write buffer
            *audio_ptr_2=NULL;  // data ptr to second write buffer
DWORD        audio_length_1=0,  // length of first write buffer
             audio_length_2=0;  // length of second write buffer
