Cosc 6326/Psych6750X Audition and Auditory Displays
Sound in information display • speech provides a high bandwidth communication channel • audition is a long distance sense without field of view restrictions • Sound is useful for information display (Cohen & Wenzel 1995) • when origin of message is a sound (voice, music)
when message is simple and short (e.g. event markers) • when message will not be referred to later (e.g. time) • when message deals with events in time • warnings or prompts (hearing is always on, no field of view issues) • continuously changing information (e.g. countdown) • when other systems (e.g. vision) are overloaded
when verbal response is required (compatibility) • when illumination or disability prevents vision (e.g. alarm clock, limited field of view, blindness) • when the user moves from place to place (sound as a ubiquitous I/O channel)
Sonification • In ‘visualization’ situations, ‘sonification’ of data can assist in the exploration of complex datasets • In these applications ‘realism’ is typically not a major issue • Sound can help interpret complex or multidimensional data; can provide an independent display dimension
In addition to information display, in immersive displays sound contributes to: • realism, situational awareness and presence • ambience and emotive context • cueing visual attention • natural communication • space perception
Realism and ambience • High quality sound improves perceived ‘quality’ of visual displays • Sounds in the environment provide vital information that contributes to situational awareness • Persistence of sounds of objects out of field of view may help maintain object permanence
Sound is believed to be vital for conveying emotion and ambience in movies • Ambient sounds can be realistic or abstract (e.g. music to set mood) • Absence of appropriate sound degrades realism
If background sounds are not well matched to the visuals, the participant may feel detached and ‘presence’ may be degraded • Relation between presence and realism is not straightforward (later lecture) • Sound is an omni-directional sense and may help the user feel immersed in the VE • Auditory collision cues may help when navigating a VE (especially with HMDs)
Sound • Sound is “mechanical vibrations and waves of an elastic medium, particularly in the frequency range of human hearing (16 Hz to 20 kHz)” • Normally, the medium is air. Sound is an air pressure wave. • Sound is usually used to describe the physical stimulus.
Audition refers to perception. • An auditory event is usually elicited by a sound event. • A sinusoidal pressure wave is known as a pure tone.
Sinusoid • $x(t) = A\cos(2\pi f_0 t + \phi)$, where $A$ is amplitude, $f_0$ is frequency and $\phi$ is phase • $T_0 = 1/f_0$ is the period • $\phi$ is related to the time shift of the peak
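The sinusoid on this slide can be sampled directly. The following is a minimal sketch, assuming a 44.1 kHz sample rate and a 440 Hz tone; the names `pure_tone` and `peak_time_shift` are hypothetical helpers chosen for illustration and do not come from the lecture.

```python
import math

def pure_tone(amplitude, f0, phase, duration, sample_rate=44100):
    """Sample x(t) = A*cos(2*pi*f0*t + phase) at the given sample rate."""
    n = int(round(duration * sample_rate))
    return [amplitude * math.cos(2 * math.pi * f0 * k / sample_rate + phase)
            for k in range(n)]

def peak_time_shift(f0, phase):
    """Time of the peak nearest t = 0 (the cosine peaks where its argument is 0)."""
    return -phase / (2 * math.pi * f0)

f0 = 440.0                      # assumed example frequency in Hz
period = 1.0 / f0               # T0 = 1/f0
tone = pure_tone(1.0, f0, -math.pi / 2, duration=0.01)
shift = peak_time_shift(f0, -math.pi / 2)   # a phase of -pi/2 delays the peak
```

With a phase of -π/2 the peak moves a quarter of a period later, which is the sense in which phase is related to the time shift of the peak.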
Dimensions of sound • Harmonic content: pitch, melody, harmony, waveshape, timbre, vibrato • Timing: duration, tempo, rhythm • Loudness, envelope • Spatial: azimuth, elevation, distance • Ambience: resonance, reverberation, spaciousness • Representation: literal, auditory icons, abstract
Perceptual and physical dimensions are analogous but distinct • pitch and frequency (directly related for pure tones) • loudness and intensity • timbre and complexity
Physiology and psychophysics • Cochlea performs mechanical spectral analysis of sound signal • Pure tone induces traveling wave in basilar membrane. • maximum mechanical displacement along membrane is function of frequency (place coding) • Displacement of basilar membrane changes with compression and rarefaction (frequency coding)
Matlin and Foley, Sensation and Perception; Kandel et al., Principles of Neural Science
Perception of pitch • Along the basilar membrane, hair cell response is tuned to frequency • each neuron in the auditory nerve responds to acoustic energy near its preferred frequency • preferred frequency is place coded along the cochlea. Frequency coding believed to have a role at lower frequencies • Higher auditory centers maintain frequency selectivity and are ‘tonotopically mapped’
Pitch is related to frequency for pure tones. • For periodic or quasi-periodic sounds the pitch typically corresponds to inverse of period • Some have no perceptible pitch (e.g. clicks, noise) • Sounds can have same pitch but different spectral content, temporal envelope … timbre
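To illustrate the relation between pitch and the inverse of the period, here is a rough sketch that estimates the repetition rate of a waveform from its autocorrelation. It is an engineering approximation under stated assumptions, not a model of how the auditory system works; the function name `estimate_period_hz`, the 8 kHz sample rate and the test signal are all assumptions for illustration.

```python
import math

def estimate_period_hz(signal, sample_rate, f_min=50.0, f_max=1000.0):
    """Return 1/period, where the period is the lag that maximises the autocorrelation."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[k] * signal[k + lag]
                    for k in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

sample_rate = 8000
# Sum of components at 400, 600 and 800 Hz: the waveform repeats every 1/200 s
# even though it contains no component at 200 Hz.
signal = [sum(math.cos(2 * math.pi * f * k / sample_rate) for f in (400, 600, 800))
          for k in range(800)]
estimated = estimate_period_hz(signal, sample_rate)
```

The estimate follows the period of the whole waveform rather than the frequency of any single component, which matches the slide's statement for periodic sounds.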
Perception of loudness • Intensity is measured on a logarithmic scale in decibels • Range from threshold to pain is about 120 dB-SPL • Loudness is related to intensity but also depends on many other factors (attention, frequency, harmonics, …)
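The decibel scale mentioned above can be made concrete with a short sketch. It assumes the conventional reference pressure of 20 µPa for dB SPL in air; the function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL in air: 20 micropascals

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def pressure_from_spl(level_db):
    """RMS pressure in pascals corresponding to a level in dB SPL."""
    return P_REF * 10 ** (level_db / 20.0)

threshold_pa = pressure_from_spl(0.0)    # the reference pressure itself
pain_pa = pressure_from_spl(120.0)       # a million times the reference pressure
```

A 120 dB range therefore corresponds to a factor of 10^6 in pressure, or 10^12 in intensity, which is why a logarithmic scale is used.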
Spatial hearing • Auditory events can be perceived in all directions from observer • Auditory events can be localized internally or externally at various distances • Audition also supports motion perception • change in direction • Doppler shift
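The Doppler shift listed above as a motion cue can be sketched with the textbook formula for a source moving directly toward or away from a stationary listener. The speed of sound (343 m/s) and the function name are assumptions for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def doppler_frequency(source_hz, radial_speed_ms, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    radial_speed_ms > 0 means the source approaches the listener,
    radial_speed_ms < 0 means it recedes.
    """
    if abs(radial_speed_ms) >= c:
        raise ValueError("this simple model only covers subsonic sources")
    return source_hz * c / (c - radial_speed_ms)

approaching = doppler_frequency(440.0, 10.0)    # heard higher than 440 Hz
receding = doppler_frequency(440.0, -10.0)      # heard lower than 440 Hz
```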
Ability to localize depends on sound source and environment • a tone in reverberant room is difficult to locate in time and space • a click in an anechoic chamber, on the other hand, is precisely located and time limited
Auditory Scene Analysis • Process of separating out the different sources present in the environment • Detection and segregation of distinct sources • Grouping of sounds in spatial and temporal proximity into single streams
Cocktail party effect • In environments with many sound sources it is easier to process auditory streams if they are separated spatially • Spatial sound techniques can help in sound discrimination, detection and speech comprehension in busy immersive environments
Spatial Auditory Cues • Two basic types of head-centric direction cues • binaural cues • spectral cues
Binaural Directional Cues • When a source is located eccentrically it is closer to one ear than the other • sound arrives later and weaker at one ear • head ‘shadow’ also weakens sound arriving at the opposite ear • Binaural cues are robust but ambiguous
Interaural time differences (ITD) • ITD increases with directional deviation from the median plane. It is about 600 µs for a source located directly to one side. • Humans are sensitive to as little as 10 µs ITD. Sensitivity decreases with ITD. • For a given ITD, phase difference is a linear function of frequency • For pure tones, phase based ITD is ambiguous
At low to moderate frequencies phase difference can be detected. At high frequencies the ITD of the signal envelope can be used. • ITD cues appear to be integrated over a window of 100-200 ms (binaural sluggishness, Kollmeier & Gilkey, 1990)
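The ITD figures above can be approximated with Woodworth's spherical-head formula, ITD = (a/c)(θ + sin θ). This is a sketch under assumptions not stated in the lecture: a rigid spherical head of radius 8.75 cm, a distant source, and a speed of sound of 343 m/s. It also shows why phase-based ITD becomes ambiguous for pure tones: the interaural phase difference grows linearly with frequency and eventually exceeds half a cycle.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed

def woodworth_itd(azimuth_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source.

    Azimuth 0 is straight ahead and 90 is directly to one side;
    the formula is intended for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))

def interaural_phase(frequency_hz, itd_s):
    """Interaural phase difference in radians: linear in frequency for a fixed ITD."""
    return 2.0 * math.pi * frequency_hz * itd_s

def phase_is_ambiguous(frequency_hz, itd_s):
    """True when the phase difference exceeds half a cycle (pi radians)."""
    return interaural_phase(frequency_hz, itd_s) > math.pi

itd_side = woodworth_itd(90.0)   # roughly 0.65 ms under these assumptions
```

Under these assumptions a source directly to one side gives an ITD in the 600 to 700 µs range, in line with the figure quoted on the slide, and a pure tone at 1000 Hz already has an ambiguous interaural phase for that ITD while a 500 Hz tone does not.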
Interaural intensity differences (IID) • With lateral sources head shadow reduces intensity at the opposite ear • Effect of head shadow is most pronounced for high frequencies. • IID cues are most effective above about 2000 Hz • IIDs of less than 1 dB are detectable. At 4000 Hz a source located at 90° gives about 30 dB IID (Matlin and Foley, 1993)
Ambiguity and Lateralization Goldstein, Sensation and Perception
Ambiguity and Lateralization • These binaural cues are ambiguous. The same ITD/IID can arise from sources anywhere along a ‘cone of confusion’ • Spectral cues and changes in ITD/IID with observer/object motion can help disambiguate • When directional cues are used in headphone systems, sounds are lateralised left versus right but seem to emanate from inside the head (not localised)
Also, for near sources (less than 1 m) there is significant IID due to differences in distance to each ear, even at lower frequencies (Shinn-Cunningham et al 2000) • Intersection of these ‘near field’ IID curves with cones of confusion constrains them to toroids of confusion
Spectral Cues • Pinnae (outer ears) and head shadow each ear and create frequency-dependent attenuation of sounds that depends on the direction of the source • Because the pinnae are relatively small, spectral cues are effective predominantly at higher frequencies (i.e. above 6000 Hz)
Direction estimation requires separation of the spectrum of the sound source from spectral shaping by the pinnae • Shape of the pinnae shows large individual differences, which are reflected in differences in spectral cues
Distance Cues • anechoic • intensity decreases with distance • attenuation is higher at high frequency • confound with spectrum and intensity of source • Near field IID http://headwize.com/tech/aureal1_tech.htm
reverberation • ratio of direct to reverberant energy indicates distance wrt environment • reverberation pattern indicates ‘spaciousness’ of the environment • reverberation is more realistic but can degrade localisation, speech recognition …
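The two distance cues above (intensity falling with distance, and the ratio of direct to reverberant energy) can be sketched numerically. This assumes a point source in a free field obeying the inverse-square law, and a reverberant level that is roughly constant throughout the room; both are simplifying assumptions, and the reference levels are invented for illustration.

```python
import math

def direct_level_db(distance_m, ref_level_db=70.0, ref_distance_m=1.0):
    """Level of the direct sound under the inverse-square law (assumed point source)."""
    return ref_level_db - 20.0 * math.log10(distance_m / ref_distance_m)

def direct_to_reverberant_db(distance_m, reverberant_level_db=60.0):
    """Direct-to-reverberant ratio, assuming a constant reverberant level in the room."""
    return direct_level_db(distance_m) - reverberant_level_db

drop_per_doubling = direct_level_db(1.0) - direct_level_db(2.0)   # about 6 dB
near_ratio = direct_to_reverberant_db(1.0)
far_ratio = direct_to_reverberant_db(8.0)
```

Intensity alone is confounded with the unknown level of the source, as the slide notes, whereas the direct-to-reverberant ratio in this sketch does not depend on the source level: raising `ref_level_db` while raising `reverberant_level_db` by the same amount leaves it unchanged.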
Visual-Auditory Interactions • Auditory cues associated with visual targets can cue visual attention • Latency for audition is less than vision • A sound associated with visual target • can speed visual search • can reduce response times • facilitate saccadic eye movements • can cue attention outside the field of view
Ventriloquism and visual capture • When a visual and auditory source are grouped, the sound is usually perceived in the direction of the visual target
Headphone displays • Precise independent control of inputs to each ear. • Individual display. • Closed ear type can exclude external sounds. Reduces interference from external sources; simplifies AR systems. • Entail an encumbrance. • Diotic, dichotic (stereo) and spatialised displays • Head fixed frame of reference. Display needs to be head tracked to register with virtual world.
Speaker systems • Simpler, less encumbrance, multi-user • Cannot ‘occlude’ real world sounds but can sometimes mask • Complication with echoes and cross-coupling between channels • Interference from/with visual displays • World frame of reference. • Subwoofer allows for deep bass. Could augment headphones
Spatialised audio • simple ITD, IID cues in a display lateralize a sound. Sound is not ‘externalized’ • spatialised audio: generate most of the spatial cues in real world environment using signal processing • with appropriate modeling of sound sources and user tracking can provide a compelling illusion of spatial sound in a VE
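The simple ITD/IID display described in the first bullet can be sketched as a delay and a gain applied to the ear farther from the source. The function `lateralize` and its parameters are hypothetical and the delay is rounded to whole samples; as the slide notes, this only lateralizes the sound and does not externalize it.

```python
def lateralize(mono, itd_s, iid_db, toward_right=True, sample_rate=44100):
    """Return (left, right) channels with the far ear delayed and attenuated.

    itd_s and iid_db are non-negative magnitudes; toward_right selects
    which ear is treated as the near ear.
    """
    delay = int(round(itd_s * sample_rate))      # whole-sample delay
    gain = 10 ** (-iid_db / 20.0)                # attenuation at the far ear
    near = list(mono) + [0.0] * delay            # pad so both channels match in length
    far = [0.0] * delay + [gain * s for s in mono]
    return (far, near) if toward_right else (near, far)

click = [1.0, 0.0, 0.0, 0.0]
left, right = lateralize(click, itd_s=500e-6, iid_db=6.0)
```

Here the right channel carries the click immediately at full level, while the left channel carries it 22 samples later at about half the amplitude.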
Binaural recording http://www.engr.sjsu.edu/~knapp/HCIROD3D/3D_sys1/binaural.htm
Head related transfer function (HRTF) • describes how sound at a given location is transformed (by pinnae etc.) as it travels to the ear, as a function of frequency • function of source direction and distance and frequency (4D) • equivalent to the Fourier transform of the response to an impulse source at the desired position
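The last bullet suggests how an HRTF is used in practice: the impulse response measured at each ear is Fourier transformed to obtain the HRTF, and a source is spatialised by filtering it with that response. The sketch below uses plain Python and toy impulse responses invented purely for illustration (they are not measured data); `dft` and `convolve` are hypothetical helper names.

```python
import cmath

def dft(x):
    """Discrete Fourier transform of a short sequence (direct O(n^2) form)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def convolve(signal, impulse_response):
    """Time-domain convolution: filters the signal with the impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Toy head-related impulse responses for a source on the right (invented values):
# the left ear receives the sound later and weaker.
hrir_left = [0.0, 0.0, 0.5, 0.2]
hrir_right = [1.0, 0.3, 0.0, 0.0]

hrtf_left = dft(hrir_left)      # HRTF = Fourier transform of the impulse response
hrtf_right = dft(hrir_right)

source = [1.0, -1.0]            # a short test signal
left_ear = convolve(source, hrir_left)
right_ear = convolve(source, hrir_right)
```

A real system would use measured responses for many directions (and ideally for the individual listener), select or interpolate the pair matching the tracked source direction, and typically perform the filtering with fast convolution.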