Eye-Based Interaction in Graphical Systems: Theory & Practice

Part I: Introduction to the Human Visual System

A: Visual Attention “When the things are apprehended by the senses, the number of them that can be attended to at once is small, ‘Pluribus intentus, minor est ad singula sensus’” (William James) • Loose translation of the Latin: “Many filtered into few for perception” • Visual scene inspection is performed minutatim (piecemeal), not in toto

A.1: Visual Attention—chronological review • Qualitative historical background: a dichotomous theory of attention, the “what” and “where” of (visual) attention • Von Helmholtz (ca. 1900): mainly concerned with eye movements to spatial locations, the “where”, i.e., attention as an overt mechanism (eye movements) • James (ca. 1900): defined attention mainly in terms of the “what”, i.e., attention as a more internal, covert mechanism

A.1: Visual Attention—chronological review (cont’d) • Broadbent (ca. 1950): defined attention as “selective filter” from auditory experiments; generally agreeing with Von Helmholtz’s “where” • Deutsch and Deutsch (ca. 1960): rejected “selective filter” in favor of “importance weightings”; generally corresponding to James’ “what” • Treisman (ca. 1960): proposed unified theory of attention—attenuation filter (the “where”) followed by “dictionary units” (the “what”)

A.1: Visual Attention—chronological review (cont’d) • Main debate at this point: is attention parallel (the “where”) or serial (the “what”) in nature? • Gestalt view: recognition is a holistic process (e.g., Kanizsa figure) • Theories advanced through early recordings of eye movements

A.1: Visual Attention—chronological review (cont’d) • Yarbus (ca. 1967): demonstrated sequential, but variable, viewing patterns over particular image regions (akin to the “what”) • Noton and Stark (ca. 1970): showed that subjects tend to fixate identifiable regions of interest, containing “informative details”; coined term “scanpath” describing eye movement patterns • Scanpaths helped cast doubt on the Gestalt hypothesis

A.1: Visual Attention—chronological review (cont’d) Fig.2: Yarbus’ early scanpath recording: • trace 1: examine at will • trace 2: estimate wealth • trace 3: estimate ages • trace 4: guess previous activity • trace 5: remember clothing • trace 6: remember position • trace 7: time since last visit

A.1: Visual Attention—chronological review (cont’d) • Posner (ca. 1980): proposed attentional “spotlight”, a covert mechanism independent of eye movements (akin to the “where”) • Treisman (ca. 1986): once again unified the “what” and “where” dichotomy by proposing the Feature Integration Theory (FIT), describing attention as a “glue” which integrates features at particular locations to allow holistic perception

A.1: Visual Attention—chronological review (cont’d) • Summary: the “what” and “where” dichotomy provides an intuitive sense of the attentional, foveo-peripheral visual mechanism • Caution: the “what/where” account is probably overly simplistic and is but one theory of visual attention

B: Neurological Substrate of the Human Visual System (HVS) • Any theory of visual attention must address the fundamental properties of early visual mechanisms • Examination of the neurological substrate provides evidence of limited information capacity of the visual system—a physiological reason for an attentional mechanism

B.1: The Eye Fig. 3: The eye—“the world’s worst camera” • suffers from numerous optical imperfections... • ...endowed with several compensatory mechanisms

B.1: The Eye (cont’d) Fig. 4: Ocular optics

B.1: The Eye (cont’d) • Imperfections: • spherical aberrations • chromatic aberrations • curvature of field • Compensations: • iris: acts as a stop • focal lens: sharp focus • curved retina: matches curvature of field

B.2: The Retina • Retinal photoreceptors constitute the first stage of visual perception • Photoreceptors are transducers converting light energy to electrical impulses (neural signals) • Photoreceptors are functionally classified into two types: rods and cones

B.2: The Retina—rods and cones • Rods: sensitive to dim and achromatic light (night vision) • Cones: respond to brighter, chromatic light (day vision) • Retinal construction: 120M rods, 7M cones arranged concentrically

B.2: The Retina—cellular makeup • The retina is composed of 3 main layers of different cell types (a 3-layer “sandwich”) • Surprising fact: the retina is “inverted”— photoreceptors are found in the bottom layer (furthest away from incoming light) • Connection bundles between layers are called plexiform or synaptic layers

B.2: The Retina—cellular makeup (cont’d) Fig.5: The retinocellular layers (w.r.t. incoming light): • ganglion layer • inner synaptic plexiform layer • inner nuclear layer • outer synaptic plexiform layer • outer layer

B.2: The Retina—cellular makeup (cont’d) Fig.5 (cont’d): The neuron: • all retinal cells are types of neurons • certain neurons mimic a “digital gate”, firing when activation level exceeds a threshold • rods and cones are specific types of dendrites
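
The “digital gate” behavior mentioned above can be illustrated with a minimal threshold unit. This is only a sketch, not a physiological model: the function name `fires`, the weights, and the threshold value are all hypothetical choices made for illustration.

```python
def fires(inputs, weights, threshold):
    """Return True when the weighted sum of inputs exceeds the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation > threshold


# Hypothetical unit: two excitatory inputs (+0.6 each), one inhibitory input (-0.5).
weights = [0.6, 0.6, -0.5]

# Both excitatory inputs active: activation 1.2 exceeds the threshold of 1.0.
print(fires([1, 1, 0], weights, threshold=1.0))

# Inhibitory input also active: activation 0.7 stays below the threshold.
print(fires([1, 1, 1], weights, threshold=1.0))
```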

B.2: The Retina—retinogeniculate organization (from outside in, w.r.t. cortex) • Outer layer: rods and cones • Inner layer: horizontal cells, laterally connected to photoreceptors • Ganglion layer: ganglion cells, connected (indirectly) to horizontal cells, project via the myelinated pathways to the Lateral Geniculate Nuclei (LGN) in the thalamus

B.2: The Retina—receptive fields • Receptive fields: collections of interconnected cells within the inner and ganglion layers • Field organization determines impulse signature of cells, based on cell types • Cells may depolarize due to light increments (+) or decrements (-)

B.2: The Retina—receptive fields (cont’d) Fig.6: Receptive fields: • signal profile resembles a “Mexican hat” • receptive field sizes vary concentrically • color-opposing fields also exist
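
A “Mexican hat” profile is commonly approximated as a difference of two Gaussians: a narrow excitatory center minus a broad inhibitory surround. The sketch below assumes that approximation; the slides do not specify a formula, and the two sigma values are arbitrary placeholders rather than measured receptive field sizes.

```python
import math


def gaussian(r, sigma):
    """Radially symmetric 2D Gaussian, normalized to unit volume."""
    return math.exp(-r * r / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def center_surround(r, sigma_center=1.0, sigma_surround=3.0):
    """Difference-of-Gaussians response at radial distance r (arbitrary units)."""
    return gaussian(r, sigma_center) - gaussian(r, sigma_surround)


# Positive near the center, negative in the surround, fading toward zero far out.
for r in (0.0, 1.0, 2.0, 3.0, 5.0, 10.0):
    print(f"r = {r:4.1f}  response = {center_surround(r):+.4f}")
```

Swapping the sign of the result gives the complementary off-center, on-surround profile.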

B.3: Visual Pathways • Retinal ganglion cells project to the LGN along two major pathways, distinguished by morphological cell types: α and β cells • α cells project to the magnocellular (M-) layers • β cells project to the parvocellular (P-) layers • Ganglion cells are functionally classified by three types: X, Y, and W cells

B.3: Visual Pathways—functional response of ganglion cells • X cells: sustained stimulus, location, and fine detail • project along both M- and P-pathways • Y cells: transient stimulus, coarse features, and motion • project along only the M-pathway • W cells: coarse features and motion • project to the Superior Colliculus (SC)

B.3: Visual Pathways (cont’d) Fig.7: Optic tract and radiations (visual pathways): • The LGN is of particular clinical importance • M- and P-cellular projections are clearly visible under microscope • Axons from M- and P-layers of the LGN terminate in area V1

B.3: Visual Pathways (cont’d) Table 1: Functional characteristics of ganglionic projections

B.4: The Occipital Cortex and Beyond Fig.8: The brain and visual pathways: • the cerebral cortex is composed of numerous regions classified by their function

B.4: The Occipital Cortex and Beyond (cont’d) • M- and P- pathways terminate in distinct layers of cortical area V1 • Cortical cells (unlike center-surround ganglion receptive fields) respond to orientation-specific stimulus • Pathways emanating from V1 joining multiple cortical areas involved in vision are called streams

B.4: The Occipital Cortex and Beyond—directional selectivity • Cortical Directional Selectivity (CDS) of cells in V1 contributes to motion perception and control of eye movements • CDS cells establish a motion pathway from V1 projecting to areas V2 and MT (V5) • In contrast, Retinal Directional Selectivity (RDS) may not contribute to motion perception, but is involved in eye movements

B.4: The Occipital Cortex and Beyond—cortical cells • Two consequences of the visual system’s motion-sensitive, single-cell organization: • due to motion sensitivity, the eyes are never perfectly still (instead a tiny jitter is observed, termed microsaccades); if the eyes were stabilized, the image would fade! • due to single-cell organization, representation of natural images is quite abstract: there is no “retinal buffer”

B.4: The Occipital Cortex and Beyond—2 attentional streams • Dorsal stream: • V1, V2, MT (V5), MST, Posterior Parietal Cortex • sensorimotor (motion, location) processing • the attentional “where”? • Ventral (temporal) stream: • V1, V2, V4, Inferotemporal Cortex • cognitive processing • the attentional “what”?

B.4: The Occipital Cortex and Beyond—3 attentional regions • Posterior Parietal Cortex (dorsal stream): • disengages attention • Superior Colliculus (midbrain): • relocates attention • Pulvinar (thalamus; colocated with LGN): • engages, or enhances, attention

C: Visual Perception (with emphasis on foveo-peripheral distinction) • Measurable performance parameters may often (but not always!) fall within ranges predicted by known limitations of the neurological substrate • Example: visual acuity may be estimated by knowledge of density and distribution of the retinal photoreceptors • In general, performance parameters are obtained empirically

C.1: Spatial Vision • Main parameters sought: visual acuity, contrast sensitivity • Dimensions of retinal features are measured in terms of the scene projected onto the retina, in units of degrees visual angle, $A = 2 \arctan\left(\frac{S}{2D}\right)$, where S is the object size and D is the distance to the object
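
As a small worked example, the visual angle can be computed from S and D using the standard relation A = 2 arctan(S / 2D). The function names and the sample viewing geometry below are illustrative assumptions, not values taken from the slides.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle in degrees subtended by an object of extent `size`
    viewed from `distance` (both in the same units)."""
    if size < 0 or distance <= 0:
        raise ValueError("size must be non-negative and distance positive")
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def size_for_angle(angle_deg, distance):
    """Inverse relation: object extent that subtends `angle_deg` at `distance`."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


# Hypothetical case: a 1 cm target viewed from 57 cm subtends roughly one degree.
print(visual_angle_deg(1.0, 57.0))

# Hypothetical case: extent of a 5 degree region on a screen 60 cm away.
print(size_for_angle(5.0, 60.0))
```

The inverse form is handy when sizing on-screen stimuli so that they cover a chosen retinal region at a known viewing distance.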

C.1: Spatial Vision—visual angle Fig.9: Visual angle

C.1: Spatial Vision—common visual angles Table 2: Common visual angles

C.1: Spatial Vision—retinal regions • Visual field: 180° horiz. × 130° vert. • Fovea Centralis (foveola): highest acuity • 1.3° visual angle; 25,000 cones • Fovea: high acuity (at 5°, acuity drops to 50%) • 5° visual angle; 100,000 cones • Macula: within “useful” acuity region (to about 30°) • 16.7° visual angle; 650,000 cones • Hardly any rods in the foveal region

C.1: Spatial Vision—visual angle and receptor distribution Fig.10: Retinotopic receptor distribution

C.1: Spatial Vision—visual acuity Fig.11: Visual acuity at eccentricities and light levels: • at photopic (day) light levels, acuity is fairly constant within central 2° • acuity drops off linearly to 5°; drops sharply (exp.) beyond • at scotopic (night) light levels, acuity is poor at all eccentricities
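
The photopic falloff described above can be sketched as a piecewise function of eccentricity. This is a rough illustration only: the 2° and 5° breakpoints come from the figure description, the 50% value at 5° from the retinal regions slide, and the exponential decay constant `decay_per_deg` is a placeholder assumption, not a measured value. Eccentricity is assumed to be measured in degrees from the center of gaze.

```python
import math


def relative_acuity(ecc_deg, decay_per_deg=0.2):
    """Illustrative relative photopic acuity (1.0 = best) at a given eccentricity."""
    ecc = abs(ecc_deg)
    if ecc <= 2.0:
        # Fairly constant within the central region.
        return 1.0
    if ecc <= 5.0:
        # Linear drop from 1.0 at 2 degrees to 0.5 at 5 degrees.
        return 1.0 - 0.5 * (ecc - 2.0) / 3.0
    # Exponential drop beyond 5 degrees (decay constant is an assumption).
    return 0.5 * math.exp(-decay_per_deg * (ecc - 5.0))


for ecc in (0, 2, 3.5, 5, 10, 20):
    print(f"{ecc:5.1f} deg -> {relative_acuity(ecc):.3f}")
```

A gaze-contingent display could use a curve of this general shape to decide how much detail to render at each distance from the point of regard, with the actual parameters obtained empirically.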

C.1: Spatial Vision—measuring visual acuity • Acuity roughly corresponds to receptor distribution in the fovea, but not necessarily in the periphery • Due to various contributing factors (synaptic organization and later-stage neural elements), effective relative visual acuity is generally measured by psychophysical experimentation

C.2: Temporal Vision • Visual response to motion is characterized by two distinct facts: persistence of vision (POV) and the phi phenomenon • POV: essentially describes human temporal sampling rate • Phi: describes threshold above which humans detect apparent movement • Both facts exploited in media to elicit motion perception

C.2: Temporal Vision—persistence of vision Fig.12: Critical Fusion Frequency: • stimulus flashing at about 50-60 Hz appears steady • CFF explains why flicker is not seen when viewing a sequence of still images • cinema: 24 fps × 3 = 72 Hz due to 3-bladed shutter • TV: 60 fields/sec, interlaced

C.2: Temporal Vision—phi phenomenon • Phi phenomenon explains why motion is perceived in cinema, TV, graphics • Besides necessary flicker rate (60Hz), illusion of apparent, or stroboscopic, motion must be maintained • Similar to old-fashioned neon signs with stationary bulbs • Minimum rate: 16 frames per second
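
The two temporal requirements, a flicker rate at or above the fusion frequency and a frame rate at or above the minimum for apparent motion, can be checked with simple arithmetic. The sketch below uses the 60 Hz and 16 fps figures quoted in the slides as default thresholds; the function names and the example display configurations are hypothetical.

```python
def flicker_rate_hz(frames_per_second, flashes_per_frame=1):
    """Rate at which the image is flashed, e.g. 24 fps with a 3-bladed shutter."""
    return frames_per_second * flashes_per_frame


def supports_motion_illusion(frames_per_second, flashes_per_frame=1,
                             min_frame_rate=16.0, min_flicker_hz=60.0):
    """True when both the apparent-motion and flicker-fusion conditions hold."""
    fast_enough = frames_per_second >= min_frame_rate
    steady = flicker_rate_hz(frames_per_second, flashes_per_frame) >= min_flicker_hz
    return fast_enough and steady


# Cinema: 24 fps, each frame flashed 3 times by the shutter.
print(flicker_rate_hz(24, 3), supports_motion_illusion(24, 3))

# Same film without the multi-bladed shutter: motion is seen, but it flickers.
print(flicker_rate_hz(24, 1), supports_motion_illusion(24, 1))

# Interlaced TV treated as 30 frames/sec shown as 2 fields each.
print(flicker_rate_hz(30, 2), supports_motion_illusion(30, 2))
```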

C.2: Temporal Vision—peripheral motion perception • Motion perception is not homogeneous across visual field • Sensitivity to target motion decreases with retinal eccentricity for slow motion... • higher rate of target motion (e.g., spinning disk) is needed to match apparent velocity in fovea • …but, motion is more salient in periphery than in fovea (easier to detect moving targets than stationary ones)

C.2: Temporal Vision—peripheral sensitivity to direction of motion Fig.13: Threshold isograms for peripheral rotary movement: • periphery is twice as sensitive to horizontal-axis movement as to vertical-axis movement • (numbers in diagram are rates of pointer movement in rev./min.)

C.3: Color Vision—cone types • foveal color vision is facilitated by three types of cone photoreceptors • a good deal is known about foveal color vision, but relatively little is known about peripheral color vision • of the 7,000,000 cones, most are packed tightly into the central 30° foveal region Fig.14: Spectral sensitivity curves of cone photoreceptors

C.3: Color Vision—peripheral color perception fields • blue and yellow fields are larger than red and green fields • most sensitive to blue, up to 83°; red up to 76°; green up to 74° • chromatic fields do not have definite borders, sensitivity gradually and irregularly drops off over 15-30° range Fig.15: Visual fields for monocular color vision (right eye)

C.4: Implications for Design of Attentional Displays • Need to consider distinct characteristics of foveal and peripheral vision, in particular: • spatial resolution • temporal resolution • luminance / chrominance • Furthermore, gaze-contingent systems must match dynamics of human eye movement

D: Taxonomy and Models of Eye Movements • Eye movements are mainly used to reposition the fovea • Five main classes of eye movements: • saccadic • smooth pursuit • vergence • vestibular • physiological nystagmus • (fixations) • Other types of movements are non-positional (adaptation, accommodation)

D.1: Extra-Ocular Muscles Fig.16: Extrinsic muscles of the eyes: • in general, eyes move within 6 degrees of freedom (6 muscles)

D.1: Oculomotor Plant Fig.17: Oculomotor system: • eye movement signals emanate from three main distinct regions: • occipital cortex (areas 17, 18, 19, 22) • superior colliculus (SC) • semicircular canals (SCC)
