Attention, Selection and Nonconceptual Reference

An empirically-motivated proposal concerning the nonconceptual link between the perceived world and its conceptual representation. Zenon Pylyshyn, Rutgers Center for Cognitive Science

Focal attention: What is it for? Perceptual selection and perceptual demonstratives The principal function of focal attention is to select. But why do we need to select? • We must select because our capacity to process information is limited. • We also must select because we need to be able to mark certain tokens in the perceived world and to refer to the marked tokens qua individuals (e.g., as in counting things). • Another way to put this is that we need to select in order to refer to things and we need to refer to things whenever we detect relational properties among them (Collinear, Inside, Part-of, Connected-to, ...) • An important reason for early selection is that it provides a way to group properties appropriately at the earliest (nonconceptual) stages of perception – and thus to help solve the binding problem • That’s what this talk is about: but first some background

Some background …. The early origins and motivation for the view that there is nonconceptual selection … a personal introduction

Why do we need to be able to pick out individuals without concepts? • We need to make nonconceptual contact with the world through perception in order to stop the regress of concepts being defined in terms of other concepts which are defined in terms of still other concepts – sometimes called the symbol grounding problem • Sensory transduction appears to be the universal, though typically tacit, assumption about how grounding occurs, at least in psychology and artificial intelligence. Yet most concepts cannot be reduced to sensory transduction. • My proposal is that nonconceptual selection of individual objects is the primitive basis for all conceptualization and predication • The argument for nonconceptual selection of token objects as the primitive operation is primarily empirical. • I begin with a personal experience in developing a model for reasoning about geometry by drawing a diagram.

Begin by drawing a line….

Now draw a second line….

And draw a third line….

Notice what you have so far….(noticings are local – you encode what you attend to) There is an intersection of two lines… But which of the two lines you drew are they? There is no way to indicate which individual things are seen again unless there is a way to refer to individual things

Look around some more to see what is there …. [Figure labels: L3, L6] Here is another intersection of two lines… Is it the same intersection as the one seen earlier? To be able to tell without a reference to individuals you would have to encode unique properties of the individual lines. Which properties should you encode?

Keeping track by encoding unique properties of individual items will not work in general • No description can keep picking out the same individual when it is changing its location or appearance unpredictably • But a perceptual representation is always changing since it is always built up over time as properties are noticed – so you need a way to find the representation of a particular token element when new properties of that particular token element are noticed • Many writers have postulated a “marking” process for computing relational predicates. But where is the “mark” placed? It can’t be placed in the representation, because its purpose is to keep track of which things in the world correspond to which things in the representation (e.g. counting). • People can pick out several individual items even if they are in a field of identical individuals – e.g., pick out a dot in a uniform field of dots so the picking out cannot be done solely by direction of gaze.
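The contrast between re-identifying an item through a stored description and holding a direct reference to it can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: the `Item` class, the `find_by_description` helper and the toy scene are hypothetical names invented for the illustration, not part of any published model.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: items compare by identity, not by their properties
class Item:
    x: float
    y: float
    color: str


def find_by_description(scene, description):
    """Return every item whose *current* properties match a stored description."""
    return [item for item in scene
            if all(getattr(item, key) == value for key, value in description.items())]


# A toy scene (hypothetical data).
scene = [Item(0, 0, "red"), Item(5, 5, "red"), Item(9, 1, "green")]
target = scene[0]

# Two ways of "keeping track" of the target:
description = {"x": 0, "y": 0, "color": "red"}  # properties encoded when first noticed
handle = target                                 # a bare reference; encodes no properties

# The target now changes location and appearance unpredictably.
target.x, target.y, target.color = 7, 3, "blue"

matches = find_by_description(scene, description)  # the stored description is now stale
still_same_individual = handle is scene[0]         # the reference still picks out the item
```

In this sketch the description stops picking out anything once the item changes, while the reference continues to pick out the same individual. The analogy is loose in one respect: a programming language keeps references valid for free, whereas on the view described here the visual system has to do real work to stay attached to a moving object.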

Footnote Notice that in the previous example it would not help if you labeled the diagram as you drew it. Why not? • Because to refer to the line with label L1 you would have to be able to think “This is line L1” and you could not think that unless you had a way to think “this” and the label would not help you to do that! • Being able to think “this” is another way to view the very problem I will be concerned with in this talk. You need an independent way to pick out and refer to an individual element – even if it is labeled! (I will also provide evidence that you need to do this for several individuals simultaneously). • This is exactly the point of Kaplan’s and Perry’s claim about the “essential indexical”

The requirements for picking out individual things and keeping track of them reminded me of an early comic book character called “Plastic Man”

Imagine being able to place several of your fingers on things in the world without being able to detect their properties in this way, but being able to refer to those things so you could move your gaze or attention to them. If you could you would possess FINgers of INSTantiation = FINSTs!

Outline of remainder of this talk • Selection: What is selected? • Places vs ‘Objects’ (Posner & analogue attention movement) • Evidence in favor of object-based selection • Selection and demonstrative reference • Multiple selection • FINST Theory and Object Files • Multiple Object Tracking (MOT) and FINST Indexes as direct (non-conceptually-mediated) reference • Selection and the Binding Problem • Implication for philosophical ideas about individuals, tracking and nonconceptual representation

Covert movement of attention [Figure] Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

Extension of Posner’s demonstration of attention switch Does the improved detection in intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

But the enhancement of intermediate locations does not require a continuous analogue movement of attention through empty space • When attention is attracted by an onset event, the appearance of analogue movement of focal attention can be explained by a punctate (quantal) theory of attention-switching. Sperling & Weichselgartner (1995): an episodic theory of attention shift • This raises the possibility that in shifting between two objects, attention does not actually move through empty space • Maybe attention is allocated to objects rather than locations?

Evidence for Objects as the basis for selection • Single-Object Advantage: pairs of judgments are faster when both judgments concern the same perceived object • Entire objects acquire enhanced sensitivity from the allocation of focal attention to part of the object • Single-Object Advantage occurs even with generalized “objects” defined in feature space (Blaser & Pylyshyn, 2000) and even when the object is distributed over time-slices (Flombaum & Scholl, 2006) • Clinical (brain damage) syndromes such as Simultanagnosia and Hemispatial Neglect show object-based properties • Attention moves with Moving Objects • Inhibition of Return (IOR) • Object Files • Multiple Object Tracking MOT (and generalization to movement in feature space)

Single-object superiority even when the shapes are controlled There are a large number of published experiments showing that when several perceptual judgments are made they are faster when they pertain to the same object, even when all other factors are controlled

Attention spreads over perceived objects [Figure panels: “Spreads to B and not C”; “Spreads to C and not B”] Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more than to equally distant parts of different objects.

Objecthood endures over space-time Several studies have shown that what counts as the same object endures over time and location: Object-specific priming (Kahneman; Scholl); Inhibition of return (Tipper). Inhibition of return is object-based. Certain forms of disappearance-reappearance preserve objecthood. Multiple Object Tracking, MOT (Scholl, Keane); Apparent motion (Kolers, Yantis); Tunnel Effect (Michotte, 1953; Flombaum & Scholl, 2006). This identity constancy gives “visual objects” a real physical-object character and is one of the reasons why psychologists refer to them as “objects”.

Objects endure despite changes in location; and they carry their history with them! Object File Theory of Kahneman & Treisman Letters are faster to read if they appear in the same box in which they had appeared initially. Priming travels with the object. According to the theory, when an object first appears, a file is created for it and the properties of the object are encoded and subsequently accessed through this object-file.

Inhibition of return appears to be object-based • Inhibition of return is thought to help in visual search since it prevents previously visited objects from being revisited • The original study used static objects. Then Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object.

IOR appears to be object-based (it travels with the object that was attended)

There is also evidence from clinical studies supporting object-based selection • Hemispatial Neglect • Balint and simultanagnosia syndromes

An empirical hypothesis: To select is to refer • When we select an object with focal attention we thereby refer to it. Consequently we can, e.g., • Entertain thoughts about it (“this is red”) • Carry out certain actions towards it (e.g., move our gaze to it) • But we can select several (n ≤ 4) objects at once, so: • We can have demonstrative thoughts about several objects: “this1 is above this2” • Having selected several objects we can evaluate predicates over them or move focal attention to them • We can also subitize them or search through them <experiments> • We can keep track of selected objects if we or they move unpredictably or change their properties <MOT>

Pick out 3 dots I will cue and keep track of them • In a field of identical elements you can select several of them and move your attention among them (e.g., “move one up” or “move 2 right”, etc.) so long as at no time do you have to hold on to more than 3 or 4 dots

Subset selection for search Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.

Subset search results: • Only properties of the subset matter • If the subset is a single-feature search it is fast and parallel • If the subset is a conjunction search set, finding the target takes longer and is a serial search (RT increases with set size) • The distance between targets does not matter, so observers don’t seem to be scanning the display looking for the target but can switch their attention directly to the subset items. • This finding supports the claim that we have a small number of FINST indexes that can be captured by sudden onsets and can serve to direct focal attention

Individuals and patterns • Vision does not recognize patterns by applying templates but rather by decomposing them into parts Recognition-By-Parts (Biederman, 2000) • A pattern is encoded over time (and often over different views separated by saccades), so the visual system must keep track of the individual parts and merge descriptions of the same part at different times and stages of encoding • In recognizing a pattern, the visual system must pick out individual parts and bind them to the representation being constructed

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments • The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first. Respond: Inside-same contour? On-same contour?

When items cannot be individuated, predicates over them cannot be evaluated ● Do these figures contain one or two distinct curves? ● Individuating these curves requires a “curve tracing” operation, so Number_of_curves (C1, C2, …) takes time proportional to the length of the shortest curve.

The figure on the left is one continuous curve, the one on the right is two distinct curves – as shown in color.

Signature ‘subitizing’ phenomena only appear when objects are automatically individuated and indexed [Figure labels: counting slope; subitizing slope] Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

Demonstrations of MOT *These require a Quicktime Viewer • Basic MOT with repulsion: Basic Early MOT with repulsion between items • MOT with no restrictions Basic MOT without repulsion • MOT with occluding surfaces Objects can be tracked even if they briefly disappear • Tracking without keeping track of identities Track these and recall what label they had initially

Explaining Multiple Object Tracking • Do we track by storing and updating objects’ locations? • Not likely: the possibility that locations of targets are encoded and updated through serial visitation by focal attention was excluded in an early study • This supports the idea that the FINST mechanism automatically keeps track of objects as long as there are 4 or fewer of them (in other words indexes are “sticky”).

Other findings using MOT There have been dozens of studies using MOT with many surprising findings. Here are a few: • Tracking performance is not affected if objects continually change their color or shape during a tracking trial (whether the change is synchronous or asynchronous) • If objects do change their color or shape the change is not noticed • Tracking is not disrupted if objects disappear briefly but totally behind opaque strips or if they all disappear together • Targets can be selected automatically (by flashing) and also voluntarily. If selected voluntarily they have to be visited serially (while indexes are “dropped off”)

Review: A FINST is a mechanism that: • Picks out, and keeps track of, individual distal objects • It does so directly – without the mediation of concepts and without using any encoded property of the indexed objects • In other words, FINSTs pick out and track objects as individuals rather than as bearers of certain properties • Because FINSTs do not pick out and track individuals as members of any category (including the category object), their connection to the world is transparent and nonconceptual. It is not an opaque “selecting as” relation; • Consequently a person may literally not know what he has selected (although indexes do make it possible for properties of the objects to be subsequently encoded into Object Files) • Pace John Campbell (2002, p. 134), “conscious experience of an object explains how you know the reference of a demonstrative”, we may not know the reference of a (perceptual) demonstrative

More on FINSTs • A FINST is a numerically limited mechanism for selecting individual visual objects currently in view. It works just the way that a pointer in a computer data structure works: It provides epistemic access to a particular item without representing the item’s location or other properties; • Although a FINST does not pick out an object in terms of its represented properties, there are properties that cause an index to be assigned (cf Kripke’s distinction between properties that fix a referent vs properties of the referent). There are also properties (maybe different properties) that allow objects to be tracked; • A FINST is usually captured or grabbed by an object that suddenly appears. But its attachment to particular items can be voluntarily enabled by moving unitary focal attention to the desired objects, thus precipitating the capture of an index
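The pointer analogy above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the theory's mechanism: the `IndexPool` class and its `capture`, `release` and `deref` methods are hypothetical names, and the default capacity of 4 simply follows the "n ≤ 4" estimate mentioned in the talk.

```python
class IndexPool:
    """A small, fixed number of indexes; each is either free or bound to one object."""

    def __init__(self, capacity=4):  # capacity is an assumption taken from "n <= 4"
        self.slots = [None] * capacity

    def capture(self, obj):
        """Bind a free index to obj (e.g., on a sudden onset).

        Returns the index number, or None if every index is already in use.
        Only the reference is stored, never a location or any other property.
        """
        for i, bound in enumerate(self.slots):
            if bound is None:
                self.slots[i] = obj
                return i
        return None

    def release(self, i):
        """Free index i so that another object can capture it."""
        self.slots[i] = None

    def deref(self, i):
        """Follow index i to whatever object it is currently bound to."""
        return self.slots[i]


# Six hypothetical objects appear; only four of them can capture an index.
objects = [{"pos": (k, k)} for k in range(6)]
pool = IndexPool()
handles = [pool.capture(obj) for obj in objects]

# One indexed object moves; the index still provides access to it,
# and its new position can be read off only by following the index.
objects[2]["pos"] = (40, 7)
position_via_index = pool.deref(2)["pos"]
```

The design point is that the pool never inspects the objects it is bound to, so access is by reference rather than by description. As with any pointer analogy, the sketch gets the "stickiness" of the binding for free from the language, whereas the theory treats that as an empirical property of the visual system.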

A fundamental problem of perception: Encoding conjunctions of properties • Finally this brings me to an important function that FINST indexes provide – a way to solve the ubiquitous binding problem in perception • Since we can distinguish between one combination of properties and another, early vision (sensation?) cannot simply announce the presence of properties for which there are sensors. It must provide additional information that allows the reconstruction of which properties ‘go with’ which. • The almost universal assumption about how this is done is that in early vision properties are encoded as being at particular locations • Treisman’s Feature Integration Theory • Strawson’s (and Clark’s) use of Feature Placing Theory

The role of location in Treisman’s Feature Integration Theory

But in encoding properties, early vision can’t just bind them together according to their spatial co-occurrence – even their co-occurrence within some region. That’s because the relevant region depends on the object. So the selection and binding must be according to the objects that have those properties

Binding conjunctions by the location of the conjuncts does not work when feature location is not punctate, and it becomes even more problematic if the features are co-located – e.g., if their relation is “inside”

An alternative: In computing conjunctions of properties attention is directed at objects since it is objects that have conjoined properties • Instead of being like a spotlight beam that can be scanned around a scene, and can be zoomed to cover a larger or smaller area, maybe attention can only be directed to occupied places – i.e., to visual objects • A large experimental literature shows that attention is Object-Based • This suggests an alternative view of how the binding problem is solved in early vision – through the prior selection of perceptual objects • But selection does not have to depend only on unitary focal attention. FINSTs allow multiple objects to be selected.

Object Files and the binding problem • Suppose that only properties of indexed objects are conceptually encoded and that these are stored in object files associated with each object. • Then properties that belong to the same object are stored in the same object file (which may be empty, as they are in MOT). • This automatically solves the binding problem since it connects encoded properties to their visual object • This view comes out of both FINST Theory (Pylyshyn, 1989) and Object File Theory (Kahneman et al., 1992)
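A small Python sketch may help show why filing properties under the object they belong to settles which properties go together. It is a toy illustration only: the feature reports, the `pool_features`, `build_object_files` and `has_conjunction` helpers, and the use of integers as indexes are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical reports from early vision for a red square and a green circle.
# Each report is (object_index, feature_dimension, value).
reports = [
    (0, "color", "red"),
    (1, "shape", "circle"),
    (0, "shape", "square"),
    (1, "color", "green"),
]


def pool_features(reports):
    """Unbound encoding: records which features are present, not which go together."""
    return {value for _, _, value in reports}


def build_object_files(reports):
    """Object-based encoding: each feature is filed under the index of its object."""
    files = defaultdict(dict)
    for index, dimension, value in reports:
        files[index][dimension] = value
    return dict(files)


def has_conjunction(files, **properties):
    """True if some single object file contains all of the given properties."""
    return any(all(f.get(key) == value for key, value in properties.items())
               for f in files.values())


present = pool_features(reports)
files = build_object_files(reports)
files[2] = {}  # an indexed object with an empty file, as described for MOT

red_square = has_conjunction(files, color="red", shape="square")
red_circle = has_conjunction(files, color="red", shape="circle")
```

The pooled set contains "red" and "circle" and so cannot rule out a red circle, whereas the object files can, because each conjunction is checked within a single file. The empty file for index 2 shows that an object can be indexed without any of its properties being encoded.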

FINSTs and Object Files form the link between the world and its conceptualization

Some open questions • We have arrived at the view that only properties of selected (indexed) objects enter into subsequent conceptualization and perception-based thought (i.e., only information in object files is made available to cognition) • So what happens to the rest of the visual information? • Visual information seems rich and fine-grained while this theory only allows for the properties of 4 or 5 objects to be encoded! • The present view leaves no room for nonconceptual representations whose content corresponds to the content of conscious experience • According to the present view, the only content that nonconceptual representations have is the demonstrative content of indexes that refer to perceptual objects • Question: Why do we need any more than that?

An intriguing possibility…. Maybe the theoretically relevant information we take in is less than (or at least different from) what we experience • This possibility has received attention recently with the discovery of various “blindnesses” (e.g., change-blindness, inattentional blindness, blindsight…) as well as the discovery of independent vision systems (e.g., recognition and motor control) • The qualitative content of conscious experience may not play a role in explanations of cognitive processes • Even if unconceptualized information enters into causal processes (e.g., motor control) it may not be represented or made available to the cognitive mind – not even as a nonconceptual representation • For something to be a representation its content must figure in explanations – it must capture generalizations. It must have truth conditions and therefore allow for misrepresentation. It is an empirical question whether current proposals do (e.g., primal sketch, scenarios). cf Devitt: Pylyshyn’s Razor

Vision science has always been deeply ambivalent about the role of conscious experience Isn’t how things appear one of the things that our theories must explain? Answer: There is no a priori ‘must explain’! • The content of subjective experience is a major type of evidence. But it may turn out not to be the most reliable source for inferring the relevant functional states. It competes with other types of evidence. • How things appear cannot be taken at face value: it carries substantive theoretical assumptions. It also draws on many levels of processing. • It was a serious obstacle to early theories of vision (Kepler) • It has been a poor guide in the case of theories of mental imagery (e.g., color mixing, image size, image distances). ‘Reading X off an image’ is an illusion. • It seems likely that vision science will use evidence of conscious experience the way linguistics uses evidence of grammatical intuitions – only as it is filtered through developing theories. • The questions a science is expected to answer cannot be set in advance – they change as the science develops.
