CMPUT 301: Lecture 31Out of the Glass Box Martin Jagersand Department of Computing Science University of Alberta
Overview • Idea: • why only use the sense of vision in user interfaces? • increase the bandwidth of the interaction by using multiple sensory channels, instead of overloading the visual channel
Overview • Multi-sensory systems: • use more than one sensory channel in interaction • e.g., sound, video, gestures, physical actions, etc.
Overview • Usable senses: • sight, sound, touch, taste, smell • also haptics, proprioception, and acceleration • each is important on its own • together, they provide a fuller interaction with the natural world
Overview • Usable senses: • computers rarely offer such a rich interaction • we can use sight, sound, and sometimes touch • flight simulators and some games use acceleration to create a multimodal, immersive experience • we cannot (yet) use taste or smell
Overview • Multi-modal systems: • use more than one sense in the interaction • e.g., sight and sound: a word processor that speaks the words as well as rendering them on the screen
Overview • Multi-media systems: • use a number of different media to communicate information • e.g., a computer-based teaching system with video, animation, text, and still images
Speech • Human speech: • natural mastery of language • instinctive, taken for granted • difficult to appreciate the complexities • potentially a useful way to extend human-computer interaction
Speech • Structure: • phonemes (English) • 40 (24 consonant and 16 vowel sounds) • basic atomic units of speech • sound slightly different depending on context …
Speech • Structure: • allophones: • 120 to 130 • all the sounds in the language • count depends on accents
Speech • Structure: • morphemes • basic atomic units of language • part or whole words • formed into sentences using the rules of grammar
Speech • Prosody: • variations in emphasis, stress, pauses, and pitch to impart more meaning to sentences • Co-articulation: • the effect of context on the sound • transforms phonemes into allophones
Speech Recognition • Problems: • different people speak differently(e.g., accent, stress, volume, etc.) • background noises • “ummm …” and “errr …” • speech may conflict with complex cognition
Speech Recognition • Issues: • recognizing words is not enough • need to extract meaning • understanding a sentence requires context, such as information about the subject and the speaker
Speech Recognition • Phonetic typewriter: • developed for Finnish (a phonetic language) • trained on one speaker, then tries to generalize to others • uses a neural network that clusters similar sounds together, one cluster per character • poor performance on speakers it has not been trained on • requires a large dictionary of minor variations
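The phonetic typewriter used a Kohonen self-organizing map; as a rough stand-in for that idea, the sketch below clusters short-time speech features with k-means from scikit-learn, so that each cluster could then be hand-labeled with a character. It assumes the third-party librosa library for feature extraction, and the file name is hypothetical.

```python
# Rough stand-in for the phonetic typewriter's sound clustering:
# k-means over MFCC frames instead of a self-organizing map.
# Assumes librosa and scikit-learn; "finnish_speech.wav" is hypothetical.
import librosa
from sklearn.cluster import KMeans

y, sr = librosa.load("finnish_speech.wav", sr=None)     # keep native sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T    # one feature row per frame

# One cluster per expected sound class; labels would be assigned by hand
# from training data, which is why a new speaker degrades accuracy.
clusters = KMeans(n_clusters=40, n_init=10).fit_predict(mfcc)
print(clusters[:20])  # sequence of sound-class indices over time
```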
Speech Recognition • Currently: • single-user, limited-vocabulary systems can work satisfactorily • no general-user, general-vocabulary system is commercially successful yet • current commercial examples: • simple telephone-based UIs, such as train schedule information systems
Speech Recognition • Potential: • for users with physical disabilities • for lightweight, mobile devices • for when the user's hands are already occupied with a manual task (e.g., an auto mechanic or a surgeon)
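To make the single-user, general-vocabulary case concrete, here is a minimal dictation sketch in Python. It assumes the third-party SpeechRecognition library, a working microphone, and the free Google Web Speech demo endpoint; it is an illustration, not a system from the lecture.

```python
# Minimal dictation loop, assuming the SpeechRecognition library
# (pip install SpeechRecognition pyaudio) and a microphone.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # compensate for background noise
    print("Say something...")
    audio = recognizer.listen(source)

try:
    # Recognition via a remote service; works best for a single, clear speaker
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Could not understand the audio")  # the "ummm"/"errr" problem in practice
except sr.RequestError as e:
    print("Recognition service error:", e)
```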
Speech Synthesis • What: • computer-generated speech • natural and familiar way of receiving information
Speech Synthesis • Problems: • humans find it difficult to adjust to monotonic, non-prosodic speech • the computer needs to understand natural language and the domain • speech is transient (hard to review or browse) • produces noise in the workplace or requires headphones (intrusive)
Speech Synthesis • Potential: • screen readers • read a textual display to a visually impaired person • warning signals • spoken information especially for aircraft pilots whose visual and haptic channels are busy
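A minimal screen-reader-style sketch, assuming the third-party pyttsx3 text-to-speech library; the file name is hypothetical. It reads a text file aloud, line by line.

```python
# Screen-reader-style sketch, assuming pyttsx3 (pip install pyttsx3).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 160)  # words per minute; slower is easier to follow

with open("document.txt") as f:  # hypothetical file to be read aloud
    for line in f:
        engine.say(line)

engine.runAndWait()  # block until all queued speech has been spoken
```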
Speech Synthesis • Virtual newscaster (Ananova)
Uninterpreted Speech • What: • fixed, recorded speech • e.g., played back in airport announcements • e.g., attached as voice annotation to files
Uninterpreted Speech • Digital processing: • change playback speed without changing pitch • to quickly scan phone messages • to manually transcribe voice to text • to figure out the lyrics and chords of a song • spatialization and environmental effects
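As an illustration of changing playback speed without changing pitch (time-scale modification), here is a minimal sketch assuming the third-party librosa and soundfile libraries; the file names are hypothetical.

```python
# Speed up recorded speech without raising its pitch,
# assuming librosa and soundfile (pip install librosa soundfile).
import librosa
import soundfile as sf

y, sr = librosa.load("message.wav", sr=None)       # keep the original sample rate
fast = librosa.effects.time_stretch(y, rate=1.5)   # 1.5x faster, same pitch
sf.write("message_fast.wav", fast, sr)
```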
Non-Speech Sound • What: • boings, bangs, squeaks, clicks, etc. • commonly used in user interfaces to provide warnings and alarms
Non-Speech Sound • Why: • fewer typing mistakes with audible key clicks • video games are harder without sound
Non-Speech Sound? • D’oh!
Non-Speech Sound • Dual mode displays: • information presented along two different sensory channels • e.g., sight and sound • allows for redundant presentation • user uses whichever they find easiest • allows for resolution of ambiguity in one mode through information in the other
Non-Speech Sound • Dual mode displays: • humans can react faster to auditory than to visual stimuli • sound is especially good for transient information that would otherwise clutter a visual display • non-speech sound is more language- and culture-independent than speech
Non-Speech Sound • Auditory icons: • use natural sounds to represent different types of objects and actions in the user interface • e.g., a breaking glass sound when deleting a file • direction and volume of sounds can indicate position and importance/size • e.g., the SonicFinder • not all actions have an intuitive sound
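A sketch of the auditory-icon idea in Python, assuming the third-party simpleaudio library; the event names and .wav files are hypothetical stand-ins, not SonicFinder's actual mapping.

```python
# Auditory icons: map interface events to natural sounds.
# Assumes simpleaudio (pip install simpleaudio); files are hypothetical.
import simpleaudio as sa

AUDITORY_ICONS = {
    "file_deleted":  "breaking_glass.wav",   # breaking glass on delete
    "file_copied":   "pouring_liquid.wav",
    "window_opened": "door_creak.wav",
}

def play_icon(event: str) -> None:
    wave = sa.WaveObject.from_wave_file(AUDITORY_ICONS[event])
    wave.play()  # returns immediately; the sound plays asynchronously

play_icon("file_deleted")
```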
Non-Speech Sound • Earcons: • synthetic sounds used to convey information • structured combinations of motives (musical notes) to provide rich information
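For illustration, a minimal sketch of synthesizing an earcon motive in Python, using only numpy and the standard-library wave module; the note choices and the "open"/"close" meanings are hypothetical.

```python
# A simple earcon: a short rising three-note motive, synthesized with
# numpy and written out with the standard wave module.
import wave
import numpy as np

SAMPLE_RATE = 44100

def tone(freq_hz: float, dur_s: float = 0.15) -> np.ndarray:
    t = np.linspace(0, dur_s, int(SAMPLE_RATE * dur_s), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

# A rising motive could mean "open"; the reversed motive could mean "close".
motive = np.concatenate([tone(440), tone(554), tone(659)])  # A4, C#5, E5
samples = (motive * 32767).astype(np.int16)

with wave.open("earcon_open.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)            # 16-bit samples
    w.setframerate(SAMPLE_RATE)
    w.writeframes(samples.tobytes())
```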
Handwriting Recognition • Handwriting: • text and graphic input • complex strokes and spaces • natural
Handwriting Recognition • Problems: • variation in handwriting between users • variation from day to day and over years for a single user • variation of letters depending on nearby letters
Handwriting Recognition • Currently: • limited success with systems trained on a few users, with separated letters • generic, multi-user, cursive text recognition systems are not accurate enough to be commercially successful • current applications include pre-sorting of mail (but a human has to assist with failures)
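The "separated letters" case can be made concrete with a toy sketch: classifying isolated handwritten digits with scikit-learn's bundled dataset and a support vector machine. Cursive, multi-writer text is far harder than this setting.

```python
# Isolated-character recognition sketch using scikit-learn's
# built-in 8x8 handwritten-digit images and an SVM classifier.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()  # 1,797 images of isolated digits, one writer style per sample
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001).fit(X_train, y_train)
print("accuracy on held-out digits:", clf.score(X_test, y_test))
```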
Handwriting Recognition • Newton: • printing or cursive writing recognition • dictionary of words • contextual recognition • fine-tune spacing and letter shapes • fine-tune recognition speed • learn handwriting over time
End • What did I learn today? • What questions do I still have?