AudioSense: A Simulation


Presentation Transcript


  1. AudioSense: A Simulation • Progress Report • EECS 578 • Allan Spale

  2. Background of Concept • Taking the train home and listening to the sounds around me • How would deaf people be able to perceive the environment? • What assistance would be useful in helping people adapt to the environment?

  3. Project Goals • Develop a CAVE application that will simulate aspects of audio perception • Display the text of “speaking” objects in space • Display the description text of “non-speaking” objects in space • Display visual cues of multiple sound sources • Allow the user to selectively listen to different sound sources

  4. Topics in the Project • Augmented reality • Illustrated by objects in a virtual environment • 3D sound • Simulated by an object’s interaction property • Speech recognition • Simulated by text near the object • Will remain static during simulation • Virtual reality / CAVE • Method for presenting the project • Not discussed in this presentation

  5. Augmented Reality • Definition • “…provides means of intuitive information presentation for enhancing situational awareness and perception by exploiting the natural and familiar human interaction modalities with the environment.” -- Behringer et al. 1999

  6. Augmented Reality:Device Diagnostics • Architecture components aid in performing diagnostic tests • Computer vision used to track the object in space • Speech recognition (command-style) used for user interface • 3D graphics (wireframe and shaded objects) to illustrate an object’s internal structure • 3D audio emitted from an item lets the user find its location within the object

  7. Augmented Reality • Device diagnostics

  8. Augmented Reality • Device diagnostics

  9. Augmented Reality:Device Diagnostics • Summary • Providing 3D graphics and sound helps the user better diagnose items • Might also want text information on the display • Tracking methodology still needs improvement • Speech recognition of commands could be expanded to include annotation • Utilize IP connection to distribute computing power from the wearable computer

  10. Augmented Reality:Multimedia Presentations in the Real World • Mobile Augmented Reality System (MARS) • Tracking performed by Global Positioning System (GPS) and another device • Display is see-through and head-mounted • Interaction based on location and gaze • Additional interaction provided by hand-held device

  11. Augmented Reality:Multimedia Presentations in the Real World • System overview • Selection occurs through proximity or gaze direction followed by a menu system • Information presentation • Video (on hand-held device) or images accompanied by narration (on head-mounted display) • Virtual reality (for places that cannot be visited) • Augmented reality (illustrates where items were)

  12. Augmented Reality • Multimedia presentations in the real world

  13. Augmented Reality • Multimedia presentations in the real world

  14. Augmented Reality:Multimedia Presentations in the Real World • Conclusions • Current system is too heavy and visually undesirable • Might want to make hand-held display a palm-top computer • Permit authoring of content • Create a collaboration between indoor and outdoor system users

  15. 3D Sound:Audio-only Web Browsing • Must overcome difficulties with utilizing 3D sound • Sounds placed along the X axis are identifiable; sounds placed along the Y and Z axes are not • Need exists to create structure in audio-rendered web pages • Document reading appears spatially from left to right in an adequate amount of time • Utilize earcons and selective listening • Provide meta-content for quick document overview

  16. 3D Sound • Audio-only Web browsing

  17. 3D Sound:Audio-only Web Browsing • Future work • Improve link information that extends beyond web page title and time duration • Benefits of auditory browsing aids • Improved comprehension • Better browsing experience for visually impaired and sighted users

  18. 3D Sound:Interactive 3D Sound Hyperstories • Hyperstories • Story occurring in a hypermedia context • Forms a “nested context model” • World objects can be passive, active, static, or dynamic

  19. 3D Sound:Interactive 3D Sound Hyperstories • AudioDoom • Like the computer game Doom, but different • All world objects represented with sound • Sound represented in a “volume” almost parallel to the user’s eyes • User interacts with the world objects using an ultrasonic joystick with haptic functionality • Organized by partitioned spaces

  20. 3D Sound • Interactive 3D sound hyperstories

  21. 3D Sound • Interactive 3D sound hyperstories

  22. 3D Sound:Interactive 3D Sound Hyperstories • Despite elapsed time between sessions, users remembered the world structure well • Authors illustrate the possibility of “render[ing] a spatial navigable structure by using only spatialized sound.” • Opens the possibilities for educational software for the blind within the hyperstory context

  23. Speech Recognition:Media retrieval and indexing • Problems with media retrieval and indexing • Lots of media being generated; too costly and time-consuming to index manually • Ideal system design • Speaker independence • Noisy-recording environment capability • Open vocabulary

  24. Speech Recognition:Media retrieval and indexing • Using Hidden Markov Models the system achieved the results in Table 1 • To improve results, “using string matching techniques” will help overcome recognition stream errors

  25. Speech Recognition:Media retrieval and indexing • String matching strategy • Develop the search term • Divide the recognition stream into a set of sub-strings • Implement an initial filter process • “Identify edit operations for remaining sub-strings in [the] recognition stream” • Calculate the similarity measure for the search term and matched strings
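The string matching steps on this slide can be illustrated with a minimal sketch. It is not the cited authors' implementation: it assumes word-level sub-strings, a simple length check as the initial filter, Levenshtein distance as the count of edit operations, and a normalized similarity score. The names `edit_distance`, `similarity`, `find_matches`, and `max_edits` are all hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(term: str, candidate: str) -> float:
    """Assumed similarity measure: 1.0 for identical strings, lower with more edits."""
    longest = max(len(term), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(term, candidate) / longest


def find_matches(term: str, stream: str, max_edits: int = 1) -> List[Tuple[int, str, float]]:
    """Return (word index, sub-string, similarity) for sub-strings close to the term."""
    words = stream.split()
    width = len(term.split())
    hits = []
    # Divide the recognition stream into sub-strings of the same word count as the term.
    for start in range(len(words) - width + 1):
        sub = " ".join(words[start:start + width])
        # Initial filter: a large length difference already implies too many edits.
        if abs(len(sub) - len(term)) > max_edits:
            continue
        # Count edit operations for the remaining sub-strings.
        if edit_distance(term, sub) <= max_edits:
            hits.append((start, sub, similarity(term, sub)))
    return hits
```

For the made-up stream `"the speach recognizer heard peach"` and the search term `"speech"`, `max_edits=1` returns only `"speach"`, while `max_edits=2` also returns `"peach"`. Permitting more operations therefore finds more candidates but also admits a likely false match.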

  26. Speech Recognition • Media retrieval and indexing

  27. Speech Recognition:Media retrieval and indexing • Results of implementing the string matching strategy • Permitting more operations improved recall performance but degraded precision performance • Despite low performance rates, a system performing these tasks will be commercially viable
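The recall and precision trade-off mentioned above can be made concrete with the standard definitions. The sketch below is illustrative only: the helper name `precision_recall` and the example sets are invented for this note and are not results from the cited paper.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision = correct hits / all hits; recall = correct hits / all relevant items."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical segment ids: a strict matcher returns few, mostly correct hits,
# while a permissive matcher (more edit operations allowed) returns more of both kinds.
relevant_segments = {1, 2, 3, 4}
strict_hits = {1, 2}
permissive_hits = {1, 2, 3, 5, 6, 7}

strict = precision_recall(strict_hits, relevant_segments)
permissive = precision_recall(permissive_hits, relevant_segments)
```

In this invented example the permissive matcher raises recall from 0.5 to 0.75 while precision falls from 1.0 to 0.5, which is the pattern the slide describes.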

  28. Speech Recognition:Continuous Speech Recognition • Problems with continuous speech recognition • Has unpredictable errors that are unlike other “predictable” user input errors • The absence of context aids makes recognition difficult for the computer • Speech user interfaces are still in a developmental stage and will improve over time

  29. Speech Recognition:Continuous Speech Recognition • Two modes • Keyboard-mouse and speech • Two tasks • Composition and transcription • Results • Keyboard-mouse tasks were faster and more efficient than speech tasks

  30. Speech Recognition:Continuous Speech Recognition • Correction methods • Two general correction methods • Inline correction, separate proofreading • Speech inline correction methods • Select text and reenter, delete text and reenter, use correction box, correct problems during correction

  31. Speech Recognition • Continuous speech recognition

  32. Speech Recognition • Continuous speech recognition

  33. Speech Recognition:Continuous Speech Recognition • Discussion of errors • Inline correction is preferred by users regardless of modality • Proofreading had increased usage with speech because of unpredictable system errors • Keyboard-mouse involved deleting and reentering the word • Despite ability to correct inline with speech, errors typically occurred during correction • Dialog boxes used as a last resort

  34. Speech Recognition:Continuous Speech Recognition • Discussion of results • Users still do not feel that they can be productive using a speech interface for continuous recognition • More studies must be conducted to improve the speech interface for users

  35. Project Implementation • Write a CAVE application using YG • 3D objects simulate sound producing objects • No speech recognition will occur since predefined text will be attached to each object • Objects will move in space • Objects will not always produce sound • Objects may not be in the line of sight

  36. Project Implementation • Write a CAVE application using YG • Sound location • Show directional vectors for each object that emits a sound • The longer the vector, the farther away the object is from the user • X, Y will use arrowheads, Z will use dot / "X" symbol • Dot is for an object behind the user, "X" symbol is for an object in front of the user • Only visible if sound can be “heard” by the user
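One way to read the sound location rules on this slide is sketched below in plain Python rather than YG. The coordinate convention is an assumption (user-relative axes with positive Z in front of the user), and `LocationCue`, `location_cue`, and `length_per_unit` are hypothetical names, not part of YG or the CAVE libraries.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LocationCue:
    arrow_xy: Tuple[float, float]  # unit direction of the arrowhead in the X-Y plane
    length: float                  # vector length; grows with distance to the source
    depth_symbol: str              # "X" = in front of the user, "dot" = behind the user


def location_cue(user: Vec3, source: Vec3, audible: bool,
                 length_per_unit: float = 1.0) -> Optional[LocationCue]:
    """Build the visual cue for one sound source, or None if it cannot be 'heard'."""
    if not audible:
        return None  # cue is only visible for audible sounds
    dx, dy, dz = (s - u for s, u in zip(source, user))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    planar = math.hypot(dx, dy)
    # A source directly ahead or behind has no X-Y direction to draw.
    arrow = (dx / planar, dy / planar) if planar > 0 else (0.0, 0.0)
    # Assumption: positive Z (and exactly zero) counts as in front of the user.
    symbol = "X" if dz >= 0 else "dot"
    return LocationCue(arrow, distance * length_per_unit, symbol)
```

A renderer would draw an arrow of `length` along `arrow_xy` and place the dot or "X" symbol next to it; how that is done in YG is outside the scope of this sketch.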

  37. Project Implementation • Write a CAVE application using YG • Sound properties • Represented using a square • Size represents volume/amplitude (probably will not consider distance that affects volume) • Color represents pitch/frequency • Only visible if sound can be “heard” by the user
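The mapping from sound properties to a square could look like the following sketch. The slide only says that size represents volume and color represents pitch; the specific choices here are assumptions: amplitude is normalized to 0..1, pitch is mapped on a logarithmic scale over 20 Hz to 20 kHz, and low pitch is drawn blue while high pitch is drawn red. `SoundSquare` and `sound_square` are hypothetical names.

```python
import colorsys
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SoundSquare:
    size: float                      # edge length of the square
    rgb: Tuple[float, float, float]  # color components in 0..1


def sound_square(amplitude: float, frequency_hz: float, audible: bool,
                 max_size: float = 1.0,
                 f_min: float = 20.0, f_max: float = 20000.0) -> Optional[SoundSquare]:
    """Map volume to square size and pitch to color; None if the sound is not 'heard'."""
    if not audible:
        return None
    # Size follows amplitude only; distance attenuation is deliberately ignored.
    size = min(max(amplitude, 0.0), 1.0) * max_size
    # Position of the pitch on a log scale: 0.0 at f_min, 1.0 at f_max.
    f = min(max(frequency_hz, f_min), f_max)
    t = math.log(f / f_min) / math.log(f_max / f_min)
    # Assumed palette: hue 2/3 (blue) for low pitch down to hue 0 (red) for high pitch.
    hue = (1.0 - t) * (2.0 / 3.0)
    return SoundSquare(size, colorsys.hsv_to_rgb(hue, 1.0, 1.0))
```

A logarithmic scale is used because perceived pitch is roughly logarithmic in frequency; a linear mapping would squeeze most everyday sounds into one end of the palette.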

  38. Project Implementation • Write a CAVE application using YG • Simulate “cocktail party effect” • Allow user to enlarge text from an object that is far away • Provide configuration section to ignore certain sound properties • Volume/amplitude • Pitch/frequency
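A rough sketch of the selective listening idea follows. It interprets the configuration section as a filter that keeps only sources whose volume and pitch fall inside user-chosen ranges, and it enlarges text in proportion to distance so far-away text stays readable. Both interpretations are assumptions, and `SoundSource`, `ListenFilter`, `attended`, and `text_scale` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Optional[Tuple[float, float]]


@dataclass
class SoundSource:
    label: str
    distance: float      # distance from the user
    amplitude: float     # normalized volume, 0..1
    frequency_hz: float  # dominant pitch


@dataclass
class ListenFilter:
    amplitude_range: Range = None  # None means "do not filter on volume"
    frequency_range: Range = None  # None means "do not filter on pitch"


def _in_range(value: float, bounds: Range) -> bool:
    return bounds is None or bounds[0] <= value <= bounds[1]


def attended(sources: List[SoundSource], flt: ListenFilter) -> List[SoundSource]:
    """Keep only the sources the user has chosen to listen to."""
    return [s for s in sources
            if _in_range(s.amplitude, flt.amplitude_range)
            and _in_range(s.frequency_hz, flt.frequency_range)]


def text_scale(distance: float, reference_distance: float = 1.0) -> float:
    """Enlargement factor that keeps text at roughly constant apparent size."""
    return max(1.0, distance / reference_distance)
```

For example, with a train, a voice, and a bird in the scene, a filter with `frequency_range=(200.0, 1000.0)` would keep only the voice, and `text_scale` would then enlarge its caption according to how far away the speaker is.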

  39. Project Tasks Completed • Basic project design • Have read some documentation about YG • Tested functionality of YG in my account • Established contacts with people who have programmed CAVE applications using YG • They will provide 3D models and code demonstrating some YG features upon request • They will help by answering questions and by demonstrating and explaining features of YG

  40. Project Timeline • Week of March 25 • Practice modifying existing YG programs • Collect needed 3D models for program • Week of April 1 • Code objects and their accompanying text • Implement movement patterns for objects

  41. Project Timeline • Week of April 8 • Attempt to “turn on and off” the sound of objects • Work with interaction properties of objects that will determine visualizing sound properties • Week of April 15 • Continue working on visualizing sound properties • Work on “enlarging/reducing” text of an object

  42. Project Timeline • Week of April 22 • Create simple sound filtering menus • Test program in CAVE • EXAM WEEK: Week of April 29 • Practice presentation • Present project

  43. Bibliography • Behringer, R., Chen, S., Sundareswaran, V., Wang, K., and Vassiliou, M. (1999). A Novel Interface for Device Diagnostics Using Speech Recognition, Augmented Reality Visualization, and 3D Audio Auralization, in Proceedings of IEEE International Conference on Multimedia Computing and Systems Vol I, Institute of Electrical and Electronics Engineers, Inc., 427-432. • Goose, S. and Moller, C. (1999). A 3D Audio Only Interactive Web Browser: Using Spatialization to Convey Hypermedia Document Structure, in Proceedings of the seventh ACM international conference on Multimedia (Orlando FL, October 1999), ACM Press, 363-371.

  44. Bibliography • Hollerer, T., Feiner, S., and Pavlik, J. (1999). Situated Documentaries: Embedding Multimedia Presentations in the Real World, in Proceedings of the 3rd International Symposium on Wearable Computers (San Francisco CA, October 1999), Institute of Electrical and Electronics Engineers, Inc., 1-8. • Karat, C.-M., Halverson, C., Horn, D., and Karat, J. (1999). Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems, in CHI '99, Proceeding of the CHI 99 conference on Human factors in computing systems: the CHI is the limit (Pittsburgh PA, May 1999), ACM Press, 568-575.

  45. Bibliography • Lumbreras, M. and Sanchez, J. (1999). Interactive 3D Sound Hyperstories for Blind Children, in CHI '99, Proceeding of the CHI 99 conference on Human factors in computing systems: the CHI is the limit (Pittsburgh PA, May 1999), ACM Press, 318-325. • Robertson, J., Wong, W. Y., Chung, C., and Kim, D. K. (1998). Automatic Speech Recognition for Generalised Time Based Media Retrieval and Indexing, in Proceedings of the sixth ACM international conference on Multimedia (Bristol UK, September 1998), ACM Press, 241-246.
