1. Leveraging Human Capabilities in Perceptual Interfaces
George G. Robertson
Microsoft Research
2. Outline and Goal What are perceptual interfaces?
Perceptive vs perceptual
Multimodal interfaces
Challenge: Do our interfaces work?
How do we find out?
Challenge: Broaden our scope
Leverage other natural human capabilities
Begin with a review of what we are really talking about
Describe the difference between perceptive and perceptual UI
Describe multimodal UI and how it fits with perceptual UI
Challenge #1
Do our perceptual interfaces really work?
How do we find out?
Challenge #2
We have been focusing mostly on perceptive UI
I will argue that we should consider other human capabilities, particularly other perceptual and cognitive skills
3. Perceptive to Perceptual Perceptive UI: aware of user
Input to computer: use human motor skills
Multimodal UI: use communication skills
We use multiple modalities to communicate
Perceptual UI: use many human abilities
Perception, cognition, motor, communication
Perceptive interfaces
Make computer more aware of user
Vision-based (or other sensors)
Makes use of what user is doing with hands (motor skills)
Much of last year's PUI focused on this
Multimodal interfaces:
Human face to face communication is multimodal (speech & gesture)
MM UI attempts to use those communication skills
Mostly MM input
But there are many other uses of the term
Some of last year's PUI focused on this
Perceptual UI
Take advantage of other natural human abilities
Broaden focus of PUI
More consideration for the computer to human part of the equation
4. What are Modalities? What is a modality
In human communication
Take sensory input (particularly hearing and seeing)
Map input into human communication channels
A modality is one of those mappings
From computer point of view:
Multimodal output: audio and video
Multimodal input: speaking and gesturing
5. What are Multimodal Interfaces? Attempts to use human communication skills
Provide user with multiple modalities
May be simultaneous or not
Fusion vs Temporal Constraints
Multiple styles of interaction
What are multimodal interfaces?
Unfortunately, the term has been used in the literature in many different ways
Basic:
Take advantage of natural human communication skills
Provide multiple channels (or modalities)
These may or may not be simultaneous
Most published literature is about multimodal input
Some analysis has been done to understand when the modalities are
fused (used as one)
used in sequences
used independently
Some people use the term to describe systems that support multiple styles of interaction
6. Examples Bolt, SIGGRAPH80
Put That There
Speech and gestures used simultaneously
I'm going to quickly review some of the literature
Earliest published system
Richard Bolt, MIT
Put That There
Speech and gestures used simultaneously
7. Put That There Here is the Media Room for the Put That There system in 1980
Had large rear-projected wall-sized display
As well as two smaller monitors
User moved objects by combination of spoken commands and pointing
8. Examples (continued) Buxton and Myers, CHI86
Two-handed input
Cohen et al, CHI89
Direct manipulation and NL
Hauptmann, CHI89
Speech and gestures
The term multimodal interfaces has been used in a wide variety of work
CHI86: Buxton and Myers introduced two-handed input
CHI89
Cohen et al discussed combination of direct manipulation and NL
Hauptmann discussed combination of speech and gestures
9. Examples (continued) Bolt, UIST92
Two-handed gestures and Gaze
Blattner & Dannenberg, 1992 book
Hanne: text & gestures (interaction styles)
Pausch: selection by multimodal input
Rudnicky: speech, gesture, keyboard
Bier et al, SIGGRAPH93
Tool Glass; two-handed input
1992:
Bolt described system that combined two-handed gestures and gaze at UIST92
Blattner & Dannenberg published a book on multimedia which included three papers on multimodal interfaces
1993:
Bier et al published the Tool Glass work, which used two-handed input
10. Examples (continued) Balboa & Coutaz, Intelligent UI93
Taxonomy and evaluation of MMUI
Walker, CHI94
Facial expression (multimodal output)
Nigay & Coutaz, CHI95
Architecture for fused multimodal input
Also in 1993
Balboa & Coutaz published a taxonomy of MMUI along with some evaluation
1994
Walker published on multimodal output, generating facial expressions
1995
Nigay & Coutaz published a general architecture for fused multimodal input
Also,
Cohen & Oviatt have published on multimodal integration
and Oviatt has published an analysis of the myths of multimodal UI
11. Why Multimodal Interfaces? Now fall far short of human capabilities
Higher bandwidth is possible
Different modalities excel at different tasks
Errors and disfluencies reduced
Multimodal interfaces are more engaging
Why so much interest?
It is clear that our current interfaces fall far short of what the human can do
Much higher bandwidth is possible with these systems
Different modalities excel at different tasks
And this may change over time and user
Errors and disfluencies can be reduced dramatically
Multimodal interfaces are more natural, so they are more engaging
12. Leverage Human Capabilities Leverage senses and perceptual system
Users perceive multiple things at once
So how do multimodal systems work?
Multimodal output leverages the human senses and perceptual system
Because we can perceive multiple things at once
Multimodal input leverages the human motor capabilities and communication skills
Because we can do multiple things at once
13. Senses and Perception Use more of the user's senses
Not just vision
Sound
Tactile feedback
Taste and smell (maybe in the future)
Users perceive multiple things at once
e.g., vision and sound
For multimodal output
We currently use mostly the visual modality
We use audio some, but not nearly enough
We don't use the tactile channel much at all
We don't use taste or smell
So, there is room to do much more than we are currently doing
14. Motor & Effector Capabilities Currently: pointing or typing
Much more is possible:
Gesture input
Two-handed input
Speech and NL
Body position, orientation, and gaze
Users do multiple things at once
e.g., speak and use hand gestures
For multimodal input
Most interfaces still use pointing and typing
Some are using pointing and speech
Few pay any attention to what we do with our bodies
Position, pose, orientation, and gaze
So, there is room for more here also
15. Simultaneous Modalities? Single modality at a time
Adapt to display characteristics
Let user determine input mode
Redundant, but only one at a time
Multiple simultaneous modalities
Two-handed input
Speech and hand gestures
Graphics and sound
Some talk about multimodal systems, but intend that only one modality be used at a time
In some cases, the different modalities are used to adapt to hardware configurations
In some cases, the user is simply given a choice
Other systems use multiple modalities together
either simultaneously or nearly so
Two-handed input combines actions of the two hands
Speech and hand gestures are the most obvious
Graphics and sound together (multimodal output)
16. Taxonomy (Balboa, 1993) The taxonomy by Balboa & Coutaz at IUI93 shows this chart of
temporal constraints along the x axis
Independent vs sequential vs concurrent
degree of fusion along the y axis
The degree to which the modalities are fused into one action
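To make the two axes concrete, here is a minimal sketch of classifying a pair of modality events; the class names, the binary treatment of fusion, and the example timings are illustrative assumptions, not taken from the Balboa & Coutaz paper.

from dataclasses import dataclass
from enum import Enum, auto

class Temporal(Enum):
    INDEPENDENT = auto()   # modalities used at unrelated times
    SEQUENTIAL = auto()    # one modality after the other
    CONCURRENT = auto()    # overlapping in time

class Fusion(Enum):
    NONE = auto()          # each modality interpreted on its own
    COMBINED = auto()      # modalities merged into a single command

@dataclass
class ModalityEvent:
    modality: str          # e.g. "speech" or "gesture"
    start: float           # seconds
    end: float

def classify(a: ModalityEvent, b: ModalityEvent, fused: bool) -> tuple[Temporal, Fusion]:
    # Simplified: only the sequential/concurrent distinction is derived here;
    # a full treatment would also recognize independent use.
    overlap = a.start < b.end and b.start < a.end
    temporal = Temporal.CONCURRENT if overlap else Temporal.SEQUENTIAL
    return temporal, (Fusion.COMBINED if fused else Fusion.NONE)

# "Put that there": speech and pointing overlap and fuse into one command.
speech = ModalityEvent("speech", 0.0, 1.2)
point = ModalityEvent("gesture", 0.4, 0.9)
print(classify(speech, point, fused=True))   # concurrent, combined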
17. Modality = Style of Interaction Many styles exist
Command interface
NL
Direct manipulation (WIMP and non-WIMP)
Conversational (with an interface agent)
Collaborative
Mixed styles produce multimodal UI
Direct manipulation and conversational agent
Finally, a small camp uses the term in an entirely different way
Here we are talking about styles of interaction
typed commands
NL
direct manipulation
conversation
Some describe mixing these styles in a single interface as MMUI
18. Multimodal versus Multimedia Multimedia is about media channels
Text, graphics, animation, video: all visual media
Multimodal is about sensory modalities
Visual, auditory, tactile,
Multimedia is a subset of Multimodal Output
We've talked a lot about multimodal interfaces
How does multimedia fit in?
Multimedia is technology driven
It is about particular media channels that are available
For example, text, graphics, animation, video are all visual media
they are different media
but the same modality
Multimodal is human driven
It is about human sensory modalities
All of the visual media are using the visual modality
In this sense, multimedia systems are a subset of multimodal output
19. How Do The Pieces Fit? So, here is how I see these various pieces fitting together
A lot of work on multimodal input has been done
Recent work on adding awareness,
or what I've called Perceptive UI
is partly multimodal input
and some additional work
A lot of work has been done on multimedia
Logically, this is a subset of multimodal output
Not much has been done on multimodal output outside of multimedia
Perceptual UI is really about both input and output, and includes all of what we have been discussing
20. Challenge Do our interfaces actually work?
How do we find out?
So, we are assuming that perceptive, multimodal, and perceptual interfaces are actually better because they come closer to human-human communication.
Is that assumption correct?
Often our papers describe techniques that intuitively solve some problems
Often we get excited because an interface looks cool
Rarely do our papers provide any proof that the intuition is correct
or that the cool effect is useful and usable
21. Why Test For Usability? Commercial efforts require proof
Cost benefit analysis before investment
Intuitions are great for design
But intuition is not always right!
Peripheral Lens
Large scale commercial efforts require some proof
It is clear that these techniques are hard to get right
It is clear that some of them are hard to implement
Before a large commercial effort will invest what it takes for this to succeed, some kind of proof is needed
Intuition is wonderful for design insights
But it isn't always right
Last year, I spent some time implementing an idea that seemed really right, called the Peripheral Lens
22. Peripheral Vision Does peripheral vision make navigation easier?
Can we simulate peripheral vision?
Intuition:
Peripheral vision helps with real world navigation
e.g., makes locomotion through real hallways work
Peripheral vision may be part of what makes VR more immersive than desktop graphics
We ought to be able to simulate peripheral vision in desktop graphics and make navigation easier
23. A Virtual Hallway Here is a virtual hallway connected to other hallways
The letters were part of an experiment, using a visual search task
Note where M is.
There is one letter to its right
24. Peripheral Lenses Peripheral Lens uses 3 cameras
Two side cameras are adjusted so they look to the side
Rendering time is about 2x slower (not 3x because of lower fill)
There is about 3x more information in the periphery
25. Peripheral Lens This shows the spatial relationship of the 3 cameras.
The UIST97 paper gives details about computing these angles
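As a rough illustration of the three-camera idea (not the actual computation from the UIST97 paper, which gives the real derivation), here is a minimal sketch assuming three cameras with equal fields of view that tile side by side.

import math

def peripheral_lens_yaws(center_fov_deg: float) -> tuple[float, float, float]:
    """Yaw angles (degrees) for left, center, and right cameras that share the
    same horizontal field of view and tile side by side without gaps."""
    # Rotating each side camera by exactly one field of view makes its inner
    # frustum edge line up with the outer edge of the center frustum.
    return (-center_fov_deg, 0.0, +center_fov_deg)

def look_direction(yaw_deg: float) -> tuple[float, float]:
    """Unit view direction in the ground plane for a given yaw (0 = straight ahead)."""
    r = math.radians(yaw_deg)
    return (math.sin(r), math.cos(r))   # (x, z), with +z pointing forward

for yaw in peripheral_lens_yaws(60.0):
    x, z = look_direction(yaw)
    print(f"camera yaw {yaw:+5.1f} deg -> view direction ({x:+.3f}, {z:+.3f})")
# Three 60-degree cameras cover 180 degrees horizontally; the two side views
# are rendered into narrower viewports, which compresses the periphery.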
26. Peripheral Lens Intuitions Locomotion should be easier
Especially around corners
Wayfinding should be easier
You can see far sooner
Turning corners is particularly hard in a virtual environment
You keep hitting the corner
Peripheral vision should help avoid that problem
Peripheral Lenses make it possible to see around a corner earlier
So, you see distant objects sooner
That should help with wayfinding
27. Peripheral Lens Findings Lenses were about the same speed
Harder to use for inexperienced people
Corner turning was not faster
Users were about the same speed with or without Peripheral Lenses
They were harder to use for people with no 3D graphics experience
We ran an additional study on corner turning behavior, and found that corner turning was not any faster with Lenses
28. The Lesson Do not rely solely on intuition
Test for usability!
It is a mistake to rely only on our intuition
We must make evaluation a standard part of what we do!
Ideal approach:
Usability testing is part of research and design cycle
Papers that report on a new interface should provide some evaluation as part of the work
29. Challenge Are we fully using human capabilities?
Perceptive UI is aware of the body
Multimodal UI is aware that we use multiple modalities, sometimes simultaneously
Perceptual UI should go beyond both of these
Second challenge
We focus mainly on the perceptive interfaces and multimodal input
Other human capabilities can be brought to bear
To make interfaces more effective
Particularly the human input
Most interfaces have focused on the visual channel
But other perceptual channels can be used in parallel
And other human abilities can be more fully used
Also, it is likely that the interface we present for our visualizations itself presents a distraction
Can our UI take advantage of other human capabilities?
30. Research Strategy Approach to fixing these problems
Proposed research strategy for UI work in general
Three parts:
First, identify and leverage natural human capabilities
Second, look for technology discontinuities and exploit them
e.g., 3D graphics about to be ubiquitous
Third, pick a task domain that will make a major difference
Information access is the driving application of this decade
Probably will be next decade as well
Over half of current GNP comes from information tasks
Key: look at the intersection. If you are only task driven, or only technology driven, you are likely to miss the mark
31. Engaging Human Abilities understand complexity
new classes of tasks
less effort
Key to all this:
identify and leverage natural human capabilities
These will make it possible to
understand added complexity
understand new classes of tasks
with less effort
What follows are examples from each of these areas
32. Examples: Communication Language
Gesture
Awareness
Emotion
Multimodal
First, look at how we communicate with each other
Natural language is flexible
Paraphrasing leads to robust communication
Dialog to resolve ambiguity
33. Examples: Communication Language
Gesture
Awareness
Emotion
Multimodal
When we communicate face to face, what we do with our
hands,
body, and
face
convey an enormous amount of information
consider the effects of eye contact, frowns, smiles, looking away
current interfaces pay no attention to gesture
34. Camera-Based Conversational Interfaces Leverage face to face communication skills
Face to face communication
Body posture
Hand gestures
Facial expressions
Eye Contact: Embodied agent can follow you as you move
Facial expression to determine mood or emotion
Happy, sad, angry, puzzled,
Interface that responds to mood will be perceived as more trustworthy and friendly
Drive improvements to speech recognition
Lip reading
Steering phased array microphones
35. Examples: Communication Language
Gesture
Awareness
Emotion
Multimodal
Computer has little awareness of what we are doing
Restrooms have more awareness...
Are we in the room?
Are we sitting at the computer?
Are we facing the computer?
Is there anyone else there?
Are we busy doing something else (talking on the phone)?
What are we looking at?
Interface that is aware will be perceived as more engaging.
36. Camera-Based Awareness What is the user doing?
What is the user doing?
Is the user in the room?
At the computer?
Facing the computer?
Is anyone else there?
Is the user talking on the phone?
What is the user looking at?
System can be more responsive if it is aware of user actions
Suspend activity while user is
on the phone
talking to someone who just came into the office
A responsive system that is aware of the user will be perceived as more engaging and engaged
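A minimal sketch of how such awareness signals might drive behavior; the signal names and the suspend policy here are illustrative assumptions, not a description of any specific system.

from dataclasses import dataclass

@dataclass
class Awareness:
    user_in_room: bool
    facing_computer: bool
    on_phone: bool
    visitor_present: bool

def should_suspend(a: Awareness) -> bool:
    """Hold non-urgent activity (alerts, speech output, etc.) while the user
    is away, on the phone, or talking to someone who came into the office."""
    if not a.user_in_room:
        return True
    if a.on_phone or a.visitor_present:
        return True
    return not a.facing_computer

print(should_suspend(Awareness(user_in_room=True, facing_computer=True,
                               on_phone=True, visitor_present=False)))   # True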
37. Examples: Communication Language
Gesture
Awareness
Emotion
Multimodal
Nass and Reeves (Stanford)
Social (or emotional) response to computers
User perceives emotion and personality in computer regardless of what the designer does
Current interfaces perceived as being
cold, uncaring, non-communicative
Simple change:
careful choice of dialogue text can convey a different personality
We can build interfaces that detect our emotional state,
and adapt to respond to that state
38. Examples: Communication Language
Gesture
Awareness
Emotion
Multimodal
Natural communication uses multiple modalities
speech
gesture
Discussed at length already
Bolt 1980 Put That There (Speech and gesture)
User has a choice of modality
Errors and disfluencies are reduced
Higher bandwidth is possible
Different modalities excel at different tasks
Multimodal interfaces are more engaging
39. Examples: Motor Skills Bimanual skills
Muscle memory
Multimodal Map Manipulation
Two hands
Speech
Bimanual skills are very natural
Non-dominant hand for gross positioning
Dominant hand for fine manipulation
Ken Hinckley's multimodal map manipulation
2-hands for zooming and panning
Speech for long distance jumps
modalities are complementary
40. Camera-Based Navigation How do our bodies move when we navigate?
How do our bodies move when we navigate?
Observe how Nintendo user leans into a turn
Use forward/backward motion for relative speed control
Use side-to-side motion to control turns
User's hands are free for other purposes
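A minimal sketch of the lean-to-steer mapping described above; the dead zone and gain values are illustrative assumptions, not measured from any deployed system.

def lean_to_navigation(dx: float, dz: float,
                       dead_zone: float = 0.03,
                       speed_gain: float = 4.0,
                       turn_gain: float = 90.0) -> tuple[float, float]:
    """dx, dz: tracked head offset in meters from the rest position;
    dz > 0 means leaning toward the screen, dx > 0 means leaning right.
    Returns (forward speed in m/s, turn rate in deg/s)."""
    def shaped(v: float) -> float:
        # Ignore small jitter around the rest position.
        if abs(v) < dead_zone:
            return 0.0
        return v - dead_zone if v > 0 else v + dead_zone

    return speed_gain * shaped(dz), turn_gain * shaped(dx)

# Leaning in and slightly to the right: move forward while turning right.
print(lean_to_navigation(dx=0.08, dz=0.15))   # ~ (0.48 m/s forward, 4.5 deg/s right)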
41. Examples: Perception Spatial relationships
Pattern recognition
Object constancy
Parallax
Other Senses
Cone Tree: part of Information Visualizer developed at Xerox PARC
designed to visualize hierarchical information structures
3D perception helps user understand spatial relationships
based on relative size and other depth cues (e.g., occlusion)
pre-attentive, no cognitive load
Patterns become apparent
especially if search result is shown in context of structure
Object constancy
ability to perceive a moving object as one object
makes possible complex changes without cognitive load
EXAMPLE (next slide):
without animation, selection yields something unobvious
with animation, user understands without thinking about it
42. Cone Tree Object constancy
ability to perceive a moving object as one object
makes possible complex changes without cognitive load
EXAMPLE:
without animation, selection yields something unobvious
with animation, user understands without thinking about it
43. Examples: Perception Spatial relationships
Pattern recognition
Object constancy
Parallax
Other Senses
Key 3D depth cue
Sensor issues
Camera-based head-motion parallax
Motion parallax is one of the most effective 3D depth cues
More effective than stereopsis (Colin Ware)
Head-motion parallax is one key way to get motion parallax
VR gets some of its power from this
But, many users are not willing or able to wear sensors
Camera-based head-motion parallax may be answer
Could make desktop 3D graphics more usable
44. Camera-Based Head-Motion Parallax Motion parallax is one of the strongest 3D depth cues
Motion parallax is very effective 3D depth cue
More effective than stereopsis (Colin Ware)
Head-motion is one good way to get motion parallax
VR gets some of its power from this
BUT: Most desktop graphics users are not willing to wear sensors
Camera-based tracking can solve the problem
Can extend usefulness of desktop graphics
Horizontal head-motion
In plane of body? Look-at point?
Rotation about center of object? Non-linear?
Vertical
Not as natural; not as much range
Increased noise
Zoom: forward/backward motion to drive zoom
Awareness: Shouldn't track when user turns away
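A minimal sketch of the basic mapping: tracked head offsets move the virtual eye while the look-at point stays fixed, which approximates rotation about the object of interest; the gains and camera placement are illustrative assumptions.

def parallax_camera(head_dx: float, head_dy: float, head_dz: float,
                    base_eye=(0.0, 0.0, 2.0),
                    look_at=(0.0, 0.0, 0.0),
                    lateral_gain: float = 1.5,
                    vertical_gain: float = 0.75,
                    zoom_gain: float = 1.0):
    """head_dx/dy/dz: head offset in meters from its rest position, as reported
    by the camera tracker; +dz means moving toward the screen.
    Returns (eye, look_at) for the rendering camera."""
    eye = (base_eye[0] + lateral_gain * head_dx,
           base_eye[1] + vertical_gain * head_dy,   # vertical: less range, noisier, so lower gain
           base_eye[2] - zoom_gain * head_dz)       # lean in to zoom in
    return eye, look_at   # fixed look-at point: the scene pivots about the object center

# Head moves 5 cm to the right and 10 cm toward the screen.
print(parallax_camera(0.05, 0.0, 0.10))
# ~ ((0.075, 0.0, 1.9), (0.0, 0.0, 0.0))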
45. Examples: Perception Spatial relationships
Pattern recognition
Object constancy
Parallax
Other Senses
Auditory
Tactile
Kinesthetic
Vestibular
Taste
Olfactory
Most work to date has focused on the visual channel
Auditory channel:
Reinforcement of what happens in visual channel
Objects become more real (take on weight, substance)
Attention (alerts)
Tactile channel:
Much work on force feedback devices (Some in game applications)
Fred Brooks (molecular docking with atomic forces)
Also some work on passive haptics
Kinesthetic (muscle movement and body position)
Tool belt in VR
Vestibular (balance)
Reinforcement for sense of locomotion (location-based entertainment)
Taste: ??
Olfactory channel: Maybe soon
46. Examples: Perception Olfactory? Maybe soon?
Olfactory displays are an active area of work
But little published progress in the last 2-3 years
47. Examples: Cognition Spatial memory
Cognitive chunking
Attention
Curiosity
Time Constants
3D layout designed so that user places objects
Assumption: spatial memory works in virtual environment
users will remember where they put objects
Maya Design's Workscape
Xerox PARC Web Forager
At MSR, we have been studying this with a visualization we call the Data Mountain
Now have good evidence that spatial memory does work in 3D virtual environments
48. Data Mountain Favorites Management
Exploits:
Spatial memory
3D perception
Pattern recognition
Advantages:
Spatial organization
Not page at a time
3D advantages with 2D interaction
Document management -- IE Favorites, window management
Pages of interest are placed on a mountain side (a tilted plane in the initial prototype)
Act of placing page makes it easier to remember where it is
Usability test: Storage & retrieval test for 100 pages; ~26% faster
By exploiting 3D perception and spatial memory
User can organize documents spatially
Can get more info in same space with no additional cognitive load
Can see multiple pages at a time
Can see patterns of related documents
Advantages of 3D with 2D interaction technique
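A minimal sketch of the "3D advantages with 2D interaction" idea: the 2D mouse position is cast as a ray and intersected with the tilted plane, so a simple drag places a page at a 3D spot where perspective supplies the relative-size cue. The camera placement and tilt angle are illustrative assumptions, not the Data Mountain's actual parameters.

import math

def mouse_to_mountain(mx: float, my: float,
                      eye=(0.0, 0.5, 2.0),
                      tilt_deg: float = 25.0):
    """mx, my in [-1, 1]: normalized screen coordinates of the mouse.
    The 'mountain' is a plane through the origin, inclined tilt_deg from
    horizontal and rising away from the viewer. Returns the 3D point where
    the mouse ray hits the plane, or None if it misses."""
    t = math.radians(tilt_deg)
    normal = (0.0, math.cos(t), math.sin(t))     # plane: normal . p = 0
    ray = (mx, my, -1.0)                         # through an image plane 1 unit ahead of the eye
    denom = sum(n * d for n, d in zip(normal, ray))
    if abs(denom) < 1e-9:
        return None                              # ray parallel to the plane
    s = -sum(n * e for n, e in zip(normal, eye)) / denom
    if s <= 0:
        return None                              # plane is behind the eye
    return tuple(e + s * d for e, d in zip(eye, ray))

# Dragging the mouse higher on the screen pushes the page farther up the
# mountain, where perspective renders it smaller -- the relative-size cue.
print(mouse_to_mountain(0.0, 0.0))    # mid-slope
print(mouse_to_mountain(0.0, 0.3))    # higher on screen -> farther away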
49. Sample User Reaction Here is a sample user layout
100 pages
Markings are landmarks with no intrinsic meaning
May look random, but in fact makes a lot of sense to this user
Typical comments
Strongest cue is the relative size
I know where that is
50.
51. Data Mountain Usability Spatial memory works in virtual environments!
26% faster than IE4 Favorites
2x faster with Implicit Query
We have run a series of studies
reported in UIST next month
submitted to CHI
future submissions
Basic findings
Spatial memory does work in virtual environments
It works over extended time periods
DM is significantly faster than IE4 Favorites (26%)
We have additional techniques that get us up to 2x faster
52. Implicit Query Visualization Highlight related pages
Slightly slower for storage
Over 2x faster for retrieval
Implicit Query
When user selects a page, system finds related pages and highlights them
Based on similar contents (word frequency analysis)
No action is required by user
Highlight is designed to avoid distraction
Notice the entertainment related pages to the left of the selected page
We tested two versions of Implicit Query with the Data Mountain
IQ1 was based on simple word frequency (vector space model)
IQ2 was based on proximity analysis of previous users layouts
(i.e., two pages are similar if they are spatially close together in several previous users' layouts)
Users took longer to store pages
Created more categories
Were more consistent in their categories
Users were more than 2x faster on retrieval
Since typical use patterns suggest that each page is used about 5 times
Overall performance will be about 2x faster
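A minimal sketch of the IQ1 variant described above (simple word frequency with a vector space model); the tokenizer and the highlight threshold are illustrative assumptions.

import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    """Very crude tokenizer: lowercase alphabetic word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pages_to_highlight(selected_text: str, stored: dict[str, str],
                       threshold: float = 0.2) -> list[str]:
    """Titles of stored pages similar enough to the selected page that the
    visualization would highlight them (no action required by the user)."""
    query = term_vector(selected_text)
    return [title for title, text in stored.items()
            if cosine(query, term_vector(text)) >= threshold]

stored = {
    "movie reviews": "reviews of new movies and film festival coverage",
    "stock quotes": "market data stock quotes and company earnings",
}
print(pages_to_highlight("film and movie news with reviews", stored))
# ['movie reviews']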
53. Examples: Cognition Spatial memory
Cognitive chunking
Attention
Curiosity
Time Constants
Chunking: conscious perception of subtasks depends on the input device
Consider map manipulation
with mouse it takes 3 or 4 steps to pan and zoom
with 2 handed technique, it is one movement
54. Examples: Cognition Spatial memory
Cognitive chunking
Attention
Curiosity
Time Constants
Motion attracts attention (evolved survival trait)
Prey uses it to spot predator
Predator uses it to spot prey
Implication:
animation can be used to focus attention
will distract if used inappropriately (e.g., spinning or blinking web objects)
Peripheral vision particularly tuned for motion detection
May be one reason that VR appears immersive
We are exploring ways to enhance that experience on the desktop
Focus in context displays
Much of IV work was trying to develop these kind of displays
Want focus seamlessly integrated with context
Avoid shift of attention
EXAMPLE
55. Focus in Context Focus in context displays EXAMPLE: Cone Tree
Looking at large hierarchy
2D layout causes you to scroll and lose context
You can fit it all on the screen, but lose details
Wrapped in 3D, you get the details and always see the context
This is an example of focus seamlessly integrated with context
56. Examples: Cognition Spatial memory
Cognitive chunking
Attention
Curiosity
Time Constants
Discoverability is a key problem in current interfaces
Market drives addition of new functionality
Discoverability gets worse
Fear keeps us from natural exploration
Will my action be reversible?
Will I destroy my work?
Universal undo
Suggested by Raj Reddy
Would remove the fear
Could also allow us to remove Save commands
But, it is hard to implement
57. Examples: Cognition Spatial memory
Cognitive chunking
Attention
Curiosity
Time Constants
From Allen Newell's levels of cognition
0.1 s -- Perceptual fusion
-> Animations must be 10 fps or faster
1.0 s -- Immediate response
-> Unless a response is planned, cannot respond faster than this
-> Ideal for short animations
slow enough to get some significant animation
fast enough that the user doesn't feel like he is waiting
EXAMPLE:
Cone Tree uses 1 second animation to show complex rotation
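A minimal sketch of applying these two time constants to an animated transition; the easing curve and frame rate are illustrative choices, not prescribed by the talk.

def animation_frames(duration_s: float = 1.0, fps: int = 30):
    """Yield an interpolation parameter in [0, 1] for each rendered frame of a
    transition lasting about one second (the immediate-response window)."""
    assert fps >= 10, "below roughly 10 fps the motion no longer fuses perceptually"
    n = max(1, round(duration_s * fps))
    for i in range(n + 1):
        t = i / n
        yield t * t * (3 - 2 * t)    # smoothstep easing: gentle start and stop

# Example: rotate a tree layout by 120 degrees over one second.
angles = [120.0 * u for u in animation_frames()]
print(len(angles), angles[0], angles[-1])   # 31 frames, from 0.0 to 120.0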
58. Summary: Recommendations Broaden scope!
Identify and engage human abilities
Go beyond the perceptive and multimodal
Test for usability!
Case for broadening our scope
A lot of work has been done on multimodal input
Some work on perceptive interfaces
A lot of work on multimedia
Only a little on broader multimodal output
And many human abilities are not leveraged at all
Need to pull together all of these to build perceptual interfaces
Fully engaging human abilities will simplify UI and let user focus on the real task
We need to change the way we do our research and report on it.
Testing for usability should be something we routinely do.
Every new technique should be tested in some way before being reported in the literature.