Presentation Transcript


  1. A Perceptual Study of the Effects of Localized Audio in Increasing the Human Participation in Videoconferencing and Virtual Reality Environments John A. Greenfield 11/05/03

  2. Contents • Problem Description • Approach • Pilot Experiments • Localization Accuracy • Visual Localization • Main Experiment • Discussion • Conclusion and Future Work

  3. Video-conferencing History • Began in the 1960s with the AT&T Picturephone. • Until the 1990s, video-conferencing was designed for small screens. • Limited to 1-3 on-screen videos and 3-6 individuals. • Typically expensive and in limited use.

  4. Large Scale Videoconference • In the late 1990s, large-screen videoconferencing with large numbers of on-screen participants was introduced. • Mbone – University College London, 1996 • MASH – UC Berkeley, 1997 • AccessGrid – Argonne National Lab, 1999 • Internet-based, inexpensive, widely used.

  5. Virtual Reality Video-conferencing • Video-conferencing has been implemented in Virtual Reality environments • Real video • Avatars based on real video • Avatars based on motion tracking • Collaborative Virtual Environments • Avatar-based, similar to a videoconference

  6. Video-conference System: AccessGrid Studio

  7. Video-conferencing: AccessGrid Session

  8. Large-scale Video-conferencing Features • 10-30 on-screen sub-windows • 1-10 people per sub-window • Multiple people may speak simultaneously • Both presentations and discussions • 200+ installed AccessGrid studios

  9. Limitations • It is difficult to identify which image on-screen contains the currently speaking person. • It can take from one second to several minutes to scan all the faces and see which lips are moving. • Interferes with communication and comprehension.

  10. Benefits of Identifying Speaking Person • Not frustrated by search • Expression and gestures add information • Argyle, M. Bodily Communication. 1975. • Visual context identifies speaking person site • Comfort with conversation enhanced • Improved feeling of “presence,” sense of being there.

  11. Measuring Effectiveness • Less time to find the speaking person’s image on screen = less frustration for the listener • Less time to find image = more visual information obtained.

  12. Identification Approaches • Display only the speaking person • Eliminates information about other participants • Multiple speaking people can be confusing • Highlight the speaking person's sub-window • Can be a distraction, and the user still needs to search • Requires looking at the screen • Multiple speaking people can be confusing • Localized Sound

  13. Localized Sound Benefits • Works for multiple simultaneous speaking people • Works even if the listener looks away from the screen • No visual distraction added • Can enhance tracking of multiple conversations – the Cocktail Party Effect • Bolia, et al. “Asymmetric Performance in the Cocktail Party Effect.” Human Factors, 2001. • Can enhance comprehension of conversation • Baldis, J. “Effects of Spatial Audio on Memory, Comprehension and Preference During Desktop Conferences.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2001. • Works even if the speaking person is off-screen • Closer to real life

  14. Hypothesis • Axiom: Localized sound makes the speaker’s voice appear to come from their image on-screen • Hypothesis: The addition of spatially localized sound to large format video-conferencing systems will significantly reduce the visual search times of users.

  15. Localized Sound Implementation • Stereo Panning • Head-related Transfer Functions in headphones • Surround sound or 3-D panning • Wall of Sound Transducers

  16. Localized Sound Used • Stereo Panning • Sound transducers, tracked headphones, or a head-mounted display (HMD) give horizontal localization • Head-Related Transfer Function (HRTF) • Tracked headphones or an HMD give both horizontal and vertical localization
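The slides do not specify the pan law, but one common realization of stereo panning is a constant-power law that maps a source's horizontal angle to left/right channel gains. A minimal sketch, assuming azimuth runs from -90° (full left) to +90° (full right); the function name and mapping are illustrative, not from the study:

```python
import numpy as np

def constant_power_pan(mono, azimuth_deg, max_az=90.0):
    """Pan a mono signal with a constant-power law.

    azimuth_deg: source angle from -max_az (full left) to +max_az (full right).
    Returns (left, right) channel arrays.
    """
    # Map azimuth to a pan angle in [0, pi/2]; 0 degrees azimuth -> pi/4 (center).
    theta = (azimuth_deg / max_az + 1.0) * np.pi / 4.0
    return np.cos(theta) * mono, np.sin(theta) * mono
```

Because cos² + sin² = 1, the summed channel power (and hence perceived loudness) stays roughly constant as the source sweeps across the screen.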

  17. Human Hearing

  18. HRTF • Head-Related Transfer Function • The head and outer ear alter a sound's frequency content and amplitude depending on its azimuth and elevation relative to the ear. • Convolves the audio with the transfer function of a standard human ear to mimic the frequency and amplitude effects of sound coming from a specific physical location. • Produces 3-D localization using tracked headphones
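In the time domain, this convolution amounts to filtering the mono source with a pair of measured head-related impulse responses (HRIRs), one per ear, chosen for the target azimuth and elevation. A hedged sketch; the HRIR arrays would come from a measured data set, and the function and argument names are illustrative:

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_hrtf(mono, hrir_left, hrir_right):
    """Spatialize a mono signal by convolving it with the head-related
    impulse responses measured for the desired azimuth/elevation.

    Returns an (N, 2) stereo array suitable for headphone playback.
    """
    left = fftconvolve(mono, hrir_left)    # left-ear filtering
    right = fftconvolve(mono, hrir_right)  # right-ear filtering
    return np.stack([left, right], axis=-1)
```

With tracked headphones, the azimuth/elevation (and hence the HRIR pair) would be updated as the listener's head moves, keeping the virtual source fixed in space.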

  19. Contents • Problem Description • Approach • Pilot Experiments • Localization Accuracy • Visual Localization • Main Experiment • Discussion • Conclusion and Future Work

  20. Approach • Implement a simplified video-conference in a Virtual Reality system • Fewer variables than real video-conference systems • Represent faces with simple cartoon animation • Test with human subjects: search time and accuracy

  21. Scope of Experiments • Compare search times for non-localized and localized sound, with three independent variables: • Levels of visual complexity (# faces) • Levels of visual distraction (# blinking eyes) • Levels of sound distraction (second voice) • These variables are primary features of the video-conference situation

  22. Contents • Problem Description • Approach • Pilot Experiments • Localization Accuracy • Visual Localization • Main Experiment • Discussion • Conclusion and Future Work

  23. Pilot Experiments • Sound Localization • Determine accuracy of localization • Visual Localization with no distracters • Determine localization times • Visual Localization with blinking • Determine localization times with blinking eyes • Distracters added because they exist in video-conferencing application.

  24. Sound Localization

  25. Sound Localization Results • Stereo: error more prevalent to the right than to the left • HRTF: error more prevalent to the left than to the right • [Chart: localization error in degrees, -9 to +9, for Stereo vs. HRTF]

  26. Visual Localization Experiment

  27. Visual Localization Animation

  28. Visual Localization Details • Animated face mouth • Sound 1 • Sound 2

  29. Visual Localization Experiment Features • Within-subject tests • Localized and non-localized trials interspersed • Video contrast uneven across locations • Might increase delay for some locations • Mouse selection of column • Compare search times for localized and unlocalized sound

  30. Visual Localization Analysis • Paired t-test, two-tailed • H0: mean search time with localized sound equals mean search time with unlocalized sound • H1: the mean search times differ • p-value of 0.05 or less considered statistically significant • 95% confidence interval
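The paired, two-tailed test compares each subject's localized and unlocalized search times. A minimal sketch with SciPy; the numbers below are placeholders, not the study's measurements:

```python
from scipy import stats

# Hypothetical per-subject mean search times in seconds (one pair per
# subject); real values would come from the experiment logs.
localized   = [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, 2.3, 1.7, 2.0, 2.1, 1.9]
unlocalized = [2.5, 2.1, 2.6, 2.4, 2.2, 2.7, 2.5, 2.0, 2.3, 2.6, 2.2]

# Paired, two-tailed t-test across subjects.
t_stat, p_value = stats.ttest_rel(localized, unlocalized)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p <= 0.05
```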

  31. Visual Localization Sample Size • Sample size: • No-blink: 11 subjects • Blink: 10 subjects • 20 iterations localized sound per subject. • 20 iterations unlocalized sound per subject • 5 practice iterations

  32. Visual Localization Results

  33. Visual Localization Experiment Results • Non-blinking • Statistically significant improvement (0.3 seconds) • Blinking • Statistically significant improvement with larger magnitude (2.7 seconds)

  34. Conclusions from Pilot Experiments • Localization accuracy sufficient for the experiments • Localized sound has a significant positive effect, particularly in the presence of visual distracters • The two pilot conditions bracket a typical videoconference: no-blink is less difficult, all-blinking is more difficult

  35. Contents • Problem Description • Approach • Pilot Experiments • Localization Accuracy • Visual Localization • Main Experiment • Discussion • Conclusion and Future Work

  36. Main Experiments • Determine how the effect depends on the levels and types of distracters • Numbers of faces: 40, 30, 12 • Levels of blinking: 50% or more, less than 50% • Audio distracter: with, without • Variable face sizes: • Large • Large & small • Very large & large & small

  37. Main Experiments (cont.) • Stereo/HRTF • One face size • 40, 30, and 12 faces • 2 blink levels • Audio distracter • Variable face size • All large • Large & small • Very large & large & small • 100% blink level

  38. 40 Face Display

  39. Display Animation

  40. Main Experiment Details • Sound 1 • Sound 2 • Audio Distracter

  41. Small Face Displays

  42. Variable Size Face Displays

  43. Main Experiments Details • Within-subjects tests • Same order of trials for all participants • Participants don't know whether the sound is localized or not

  44. Form of Expected Results

  45. Main Experiment Variables • Gender • Age • Experience • # faces • Blink level • Audio Distracter • Face sizes

  46. Main Experiments Details • For each # faces and blink level: • 10 localized with audio distracter • 10 unlocalized with audio distracter • 10 localized without audio distracter • 10 unlocalized without audio distracter
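Taken together with the condition levels on slide 36, this yields a full factorial trial list. A sketch of how such a list might be enumerated; the dictionary representation is illustrative, not taken from the study's software:

```python
import itertools

# Condition levels from slides 36 and 46.
FACE_COUNTS = [40, 30, 12]
BLINK_LEVELS = ["<50%", ">=50%"]
REPS = 10  # trials per condition cell

trials = [
    {"faces": f, "blink": b, "localized": loc, "distracter": dis, "rep": r}
    for f, b, loc, dis, r in itertools.product(
        FACE_COUNTS, BLINK_LEVELS, (True, False), (True, False), range(REPS))
]
# 3 face counts x 2 blink levels x 2 x 2 x 10 reps = 240 trials per subject,
# presented in the same fixed order for every participant (slide 43).
```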

  47. Stereo Results • 31 participants • 23 inexperienced • 8 experienced – participated in the pilots • Significant: • Localization for # faces (Sig < 0.001) • Localization for blink level (Sig < 0.001) • Overall localization (Sig < 0.001)

  48. Stereo Errors

  49. Stereo Means

  50. Stereo Results: Variable Face Size • No audio distracter, 100% blinking eyes • Significant: • Localization for face size (Sig 0.002) • Overall localization (Sig < 0.001)
