The Illusion of Mental Pictures

Zenon Pylyshyn, Rutgers University, Center for Cognitive Science, http://ruccs.rutgers.edu/faculty/pylyshyn.html

The illusion of mental pictures • There is no question that we (all but about 2% of us) experience mental images and, in some sense, use them to recall, anticipate and enjoy life in the absence of the things and people that we imagine. • Not only are we able to “picture” some object or scene in our “mind’s eye”, but it seems that we must do so in order to solve certain kinds of problems. • Books are full of examples of how images helped people make discoveries in science and create works of art – none of which would have happened without the capacity to use mental imagery. I will not rehearse all the examples, but they include Einstein, Kekulé, …

The illusion about the causal role of mental pictures in thought • It is important that, as scientists, we consider what is assumed when we speak of creating, recalling, examining and transforming mental images. • I have argued that there is a powerful illusion behind not only our folk understanding of mental imagery but also our attempts to build scientific theories of it, and this illusion is not just a way of speaking or a handy metaphor. It is an essential part of our understanding of imagery. In fact, the very term “imagery” betrays an assumption about what it is like. • The illusion is that when we engage in what we call imaging or visualizing, there is, somewhere in our head, something that we see more or less the way we see the world, and which resembles a possible or actual visual scene about which we are thinking.

Some common mistakes in thinking about mental imagery • The intentional or phenomenological fallacy: confusing properties of the imagined world with properties of our images. Examples: size, distance, and especially temporal duration. • Task demands: the rational interpretation of the task of imagining something is to pretend you are perceiving it. “Imagine X” is understood as “Pretend that you are seeing X happening.”

Intuitions about which property in the world maps onto the same property in its representation: • Temperature, weight… • Brightness • 3D depth? • Shape, color • Size • Motion • Duration • Metrical properties such as distance? • Metrical axioms, Euclidean properties (Pythagoras’ theorem)

Examples to probe your intuition and your tacit knowledge. Imagine seeing these events unfolding… • You hit a baseball. What shape of trajectory does it trace? It is coming towards you: where would you run to catch it? If you have ever played baseball, you would have a great deal of “tacit knowledge” of what to do in such well-studied cases. • You drop a rubber ball on the pavement. Tap a button every time it hits the ground and bounces. Plot bounce height vs. time. Suppose you get this pattern: [Figure: bounce height plotted against time since first drop.] What is responsible for this pattern in your image? • Drop a heavy steel ball at the same time as a light ball from, say, the Leaning Tower of Pisa. Indicate when they hit the ground. Repeat for different heights and weights. (It turns out that people are Aristotelian rather than Galilean.)
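For comparison with what the image produces, the Galilean answer to the Pisa question follows from h = ½gt², so t = √(2h/g), with no mass term at all. The Python sketch below is a minimal illustration under stated assumptions: free fall from rest, no air resistance (which can matter for very light balls), and g = 9.81 m/s². The function name and its unused mass parameter are hypothetical, included only to make the point that mass does not enter the prediction.

```python
import math

G = 9.81  # assumed gravitational acceleration near Earth's surface, m/s^2


def galilean_fall_time(height_m: float, mass_kg: float) -> float:
    """Seconds for an object dropped from rest to fall height_m metres.

    Air resistance is ignored. mass_kg is accepted but deliberately unused:
    in the Galilean (vacuum) prediction, fall time does not depend on mass.
    """
    if height_m < 0:
        raise ValueError("height must be non-negative")
    return math.sqrt(2 * height_m / G)


if __name__ == "__main__":
    for height in (5.0, 20.0, 50.0):
        light = galilean_fall_time(height, mass_kg=0.1)
        heavy = galilean_fall_time(height, mass_kg=10.0)
        print(f"{height:5.1f} m: light {light:.2f} s, heavy {heavy:.2f} s")
```

An Aristotelian image, in which the heavy ball lands first, has to come from somewhere other than this relation, which is the point of the demonstration.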

What color do you see when two colored light beams overlap? • Two complementary colored light beams => white • Two complementary colored filters or paints => black

Where would the water go if you poured it into a beaker already full of sugar? Is there conservation of volume in your image? If not, why not?

What do these image behaviors have in common? • Objects in your image do whatever you believe those objects would have done had you watched them under the same set of circumstances in reality. • Finding that your image mimics nature is not a discovery about images. It is a discovery about your tacit beliefs of what would happen in the world under conditions similar to those of the imagery experiment.

The most interesting questions about mental imagery come together in the problem of representing spatial patterns. Representation of Space in Mental Images: this is the issue I am most interested in, because it bears on questions about how visual information is encoded, as well as on the vexing question of the role of conscious experience in cognitive science.

Spatial character of mental images • Some of the more impressive experimental results on mental imagery (mental rotation, mental scanning, mental size effects) appear to suggest that images have spatial properties. • It is no accident that we can reason by imagining things laid out in space and then examining the layout pattern to see the solution. Yet there have been few attempts to say exactly what “being laid out in space” means, either formally or physically. • One of the most explicit has been a statement by Steve Kosslyn about what he calls the depictive nature of mental images. Since it shows the intimate connection of images with spatial cognition I begin with this quote.

Images as depictive representations (Kosslyn, 1994, p. 5) “A depictive representation is a type of picture, which specifies the locations and values of configurations of points in a space. … In a depictive representation, not only is the shape of the represented parts immediately available to appropriate processes, but so is the shape of the empty space … Moreover, one cannot represent a shape in a depictive representation without also specifying a size and orientation….” • This is the claim that the form of representation of images compels certain properties to be represented. The reason for this assumption goes back to what “image” means to many people – and to the underlying mental picture assumption.

Images as displayed in “functional space” • “The space in which the points appear need not be physical…, but can be like an array in a computer, which specifies spatial relations purely functionally. That is, the physical locations in the computer of each point in an array are not themselves arranged in an array; it is only by virtue of how this information is ‘read’ and processed that it comes to function as if it were arranged into an array (with some points being close, some far, some falling along a diagonal, etc).” (Kosslyn, 1994, p. 5) • But it is important to ask why the information is ‘read’ in one way rather than another, since that is what gives the account the appearance of being principled and explanatory. • To understand why the picture theory does not offer an explanation, one needs to understand the functional space proposal and its assumptions.

Do images have (or just represent) size? • There are many studies showing that when subjects imagine something small, it takes them longer to detect small features (e.g., a mouse’s whiskers) than when they imagine it as large. What does this tell us about the representation of size? • There are two possibilities: the “size” is either the size of the image or the size of the thing imagined. • The first needs either a physical size or some still-unknown variables that obey the law Time = Distance/Speed. • The second can yield the observed result simply because people know what it would be like to view the object: if it is small, the details will not be as clear, or you will need to ‘zoom’ in on the object to see them. (Ask yourself: what if it were faster for the small image? What would you conclude?) • Suppose, instead, the experiment asked you to report details in a large blurred or low-definition image as opposed to a small high-definition image. What would you predict, and why?

One of the least controversial examples of image transformation: mental rotation • The time to judge whether figures (a) and (b), or (b) and (c), are the same except for orientation increases linearly with the angle between them (Shepard & Metzler, 1971).

What do you do to judge whether these two figures are the same shape? Is this how the process looked to you? When you make it rotate in your mind, does it seem to retain its rigid 3D shape without re-computing it?

The important distinction between architecture and represented content • A pattern is obligatory only if it is caused by fixed properties of the architecture, as opposed to properties of what is represented (i.e., what the observer tacitly knows about the behavior of the thing represented). • If it is obligatory only because the theorist says it is, then score that as a free empirical parameter (a wild card). • The important consequence is that if we allow one theory to stipulate what is obligatory without a principle that mandates it, then any other theory can stipulate the same thing. Such theories are unconstrained and explain nothing. • This failure of image theories is quite general: all picture theories suffer from the same lack of principled constraints.

How are these ‘obligatory’ constraints realized? • Image properties, such as size and rigidity, are assumed to be inherent in the architecture (of the ‘display’). • That raises the question of what kind of architecture could possibly enforce rigidity of shape. • Notice that neither a spatial display nor a functional space makes it obligatory that shape be rigidly maintained as orientation is changed. Only certain physical properties can explain rigidity. • Such rigidity could not be part of the architecture of an imagery system, because we can easily imagine objects for which rigidity does not hold (e.g., imagine a rotating snake!). • There is also evidence that ‘mental rotation’ is incremental, not holistic, and that the speed of rotation depends on the conceptual complexity of the shape and of the comparison task.
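The point about rigidity can be made concrete with a toy “functional space”: a list of 2-D coordinates standing in for an imagined shape. In this hypothetical Python sketch (the names rotate, wiggle and is_rigid are illustrative, not taken from any imagery model), a rotation preserves every pairwise distance only because its update rule was written that way; the same data structure accepts a non-rigid, snake-like update just as readily.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def rotate(points: list[Point], angle_deg: float) -> list[Point]:
    """Rigid rotation about the origin: every pairwise distance is preserved."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(cos_a * x - sin_a * y, sin_a * x + cos_a * y) for x, y in points]


def wiggle(points: list[Point], amplitude: float) -> list[Point]:
    """Hypothetical non-rigid 'snake' update: each point shifts by a different amount."""
    return [(x, y + amplitude * math.sin(i)) for i, (x, y) in enumerate(points)]


def pairwise_distances(points: list[Point]) -> list[float]:
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def is_rigid(before: list[Point], after: list[Point], tol: float = 1e-9) -> bool:
    """True if the update left every pairwise distance unchanged (within tol)."""
    return all(abs(d0 - d1) <= tol
               for d0, d1 in zip(pairwise_distances(before), pairwise_distances(after)))


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(is_rigid(square, rotate(square, 37.0)))  # rigid, because rotate() was written that way
    print(is_rigid(square, wiggle(square, 0.5)))   # non-rigid, yet stored just as easily
```

Nothing in the list of coordinates distinguishes the two updates; rigidity lives entirely in the choice of update rule, which is the sense in which it must be stipulated rather than explained by the format.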

Example 2: Mental Scanning • Hundreds of experiments have now shown that it takes longer to scan attention between places that are farther apart in the imagined scene. In fact, the time-distance relation is linear. • These have been reviewed in: Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the mind. Cahiers de Psychologie Cognitive / Current Psychology of Cognition, 18(4), 409-465. • Rarely cited are experiments by Liam Bannon and me (described in Pylyshyn, 1981), which I will summarize for you.

Studies of mental scanning: does it show that images have metrical space? (Pylyshyn & Bannon, described in Pylyshyn, 1981) • Conclusion: the image scanning effect is Cognitively Penetrable, • i.e., it depends on Tacit Knowledge.

The central problem with imagistic explanations… What is assumed in the mental picture explanations of mental scanning? • In actual vision, it takes longer to scan a greater distance because real distance, real motion and real time are involved, so the following relation holds by natural law: Time = Distance/Speed. But what ensures that a corresponding relation holds in an image? The obvious answer is: because the image is laid out in real space! • But what if that option is closed for empirical reasons? Then you might appeal to a “functional space”, which imagists liken to a matrix data structure in which some pairs of cells are closer and others farther away, and in which, to move from one cell to another, it seems natural to pass through intermediate cells. • Question: what makes these sorts of properties “natural” in a matrix data structure?

What warrants the ‘obligatory’ constraint? • To use Prinz’s term, it is not obligatory that the well-known relation between distance, speed and time hold in functional space or in a matrix. There is no natural law or principle that requires it. You could imagine an object moving instantly, or according to any motion relation you like, and the functional space would then be made to comply, since it has no constraints of its own. • So why is it natural to imagine a moving object traversing intermediate empty space when getting from A to B? • Because that’s how real objects move through real space!

Why is it ‘natural’ to assume that functional space is like real space? There are at least two possible reasons why a functional space, such as a matrix data structure, appears to have natural spatial properties (e.g., distances, size, empty places): • Because when we think of incarnations of functional space, such as a matrix, we think of how we picture them on paper. • In fact a matrix does not intrinsically have distance, empty places, direction or any other such property, except in the mind of the person who draws it or uses it! • Moving from one cell to another does not require passing through intermediate cells unless we stipulate that it does. A computer is quite happy to go directly from one cell to any other cell. The same goes for the very concept of ‘intermediate cell’.
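The claim that a matrix carries no intrinsic distances can be illustrated with a minimal Python sketch, assuming a dict keyed by (row, col) as the “matrix”; the grid, the landmarks and both function names are hypothetical. Lookup costs the same for every cell, and a time-distance relation appears only once we add a traversal rule that steps through neighbouring cells, a rule the modeller writes in rather than one the grid provides.

```python
from itertools import product

# Hypothetical 'functional space': a 10 x 10 grid stored as a dict keyed by (row, col).
SIZE = 10
grid = {(r, c): None for r, c in product(range(SIZE), repeat=2)}
grid[(0, 0)] = "A"  # an imagined landmark
grid[(9, 9)] = "B"  # another landmark, 'far' from A only by convention


def direct_access(grid, target):
    """Fetch a cell in a single lookup, however 'far' it is from anything."""
    return grid[target], 1


def stipulated_scan(grid, start, target):
    """Step to a neighbouring cell until the target is reached.

    Passing through intermediate cells is a rule written into this function,
    not a property of the grid; each step moves at most one row and one column.
    """
    (r, c), steps = start, 0
    while (r, c) != target:
        r += (target[0] > r) - (target[0] < r)  # move one row toward the target
        c += (target[1] > c) - (target[1] < c)  # move one column toward the target
        _ = grid[(r, c)]  # 'visit' the intermediate cell
        steps += 1
    return grid[target], steps


if __name__ == "__main__":
    for target in [(0, 1), (0, 5), (9, 9)]:
        print(target, direct_access(grid, target)[1],
              stipulated_scan(grid, (0, 0), target)[1])
```

Only the second function yields a time-distance relation, and even then the “distance” it counts is the chessboard (Chebyshev) number of steps rather than Euclidean distance, because diagonal moves were allowed. Which metric the scan respects is one more choice made by the modeller.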

Why is it ‘natural’ to assume a matrix … • Because when we think of a functional space, such as a matrix, we think of it as a way of simulating real (cortical) space, making it more convenient to think about the consequences of the cortical space assumption. • This is why we think of some cells as being ‘between’ others, some as being farther away, etc. This makes properties like distances seem natural, because we interpret the matrix as standing in for real space. • In that case we are not appealing to a functional space in explaining the scanning effect, the size effect, etc. The explanatory force comes from the real space that we are assuming. • This is just another way of assuming a real space (in the brain) where representations of objects are located in neural space. • We will see that all the reasons for the failure of the assumption that images are laid out on the surface of visual cortex apply equally to this ‘functional space.’

What next? • We turn now to the only way in which we might be able to explain the experimental imagery results in terms of pictorial properties, as assumed by picture theorists. That’s to locate the picture in the brain – because it is the only place where there is a literal physical space that could underwrite such operations as scanning or rotation or properties such as size or shape in the terms assumed by picture theorists.

The good news for picture theories: what are some plausible reasons why we might find a mechanism of imagery in visual cortex? • There is neuroanatomical evidence for a retinotopic layout in the earliest visual area of the brain (V1). • Neural imaging data show that V1 is more active during mental imagery than during other forms of thought. • Transcranial magnetic stimulation (TMS) of visual areas interferes more with imagery than with other forms of thought. • Clinical cases of visual agnosia show that some impairments of vision have associated impairments of imagery (Bisiach, Farah). • Recent psychophysical observations of imagery show parallels with corresponding observations of vision, and these can be related in both cases to certain cells in V1 (e.g., the oblique effect).

Neuroscience evidence shows that the retinal pattern of activation is displayed on the surface of the cortex • There is a topographical projection of retinal activity onto the visual cortex of the cat and monkey. Tootell, R. B., Silverman, M. S., Switkes, E., & de Valois, R. L. (1982). Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science, 218, 902-904.

The bad news for picture theories: drawing conclusions about the form of visual images from neuroscience data faces many hurdles. • The capacities for imagery and for vision are independent. All imagery results are observed in the blind as well as in patients with no visual cortex, so there is nothing visual about them. • Cortical topography is 2-D, but mental images are 3-D: all phenomena (e.g., rotation) occur in depth as well as in the plane. • Patterns in the visual cortex are in retinal coordinates, whereas images are primarily in world coordinates. • Unless you make a special effort, your image of parts of the room stays fixed in room coordinates when you move your eyes, turn your head or walk around the room.

…Problems with drawing conclusions about mental imagery from neuroscience data • Accessing and manipulating information in an image is very different from accessing it from the perceived world. Order of access from images is highly constrained. • Some have tried to explain this by postulating rapid decay of images, but the times involved in these demonstrations are not consistent with the data (e.g., times for reporting letters are comparable to those involving size or mental scanning). • Conceptual rather than graphical properties are relevant to image complexity (e.g., in mental rotation), suggesting that image representations are conceptual. • If images consist of patterns on visual cortex, those patterns behave differently from the same patterns acquired through vision. For example, the important Emmert’s law applies to retinal and cortical images but not to mental images, a fact that has gone largely unnoticed.

…Problems with drawing conclusions about mental imagery from neuroscience data • The signature properties of vision (e.g., spontaneous 3D interpretation, automatic reversals, apparent motion, motion aftereffects, etc.) are absent in images. • A cortical display account of most imagery findings is incompatible with the cognitive penetrability of mental imagery phenomena, such as scanning and image size effects. • The fact that the Mind’s Eye is so much like a real eye (e.g., oblique effect, resolution fall-off) should serve to warn us that we may be studying what observers know about how the world looks to them, rather than what form their images take (unless the Mind’s Eye is exactly the same as the real eye!). • I will consider a possible neural explanation of the oblique effect later.

…Problems with drawing conclusions about mental imagery from neuroscience data • Many clinical cases cited by image theorists can be explained by appeal to tacit knowledge and attention. • The ‘tunnel effect’ found in vision and imagery (Farah) is plausibly due to the patient knowing how things looked to her after surgery (the experiments were done a year later). • Hemispatial neglect seems to be an attention deficit, which explains the neglect in imagery reported by Bisiach. A recent study shows that image neglect does not appear if patients have their eyes closed (Bartolomeo & Chokron, 2002). This fits well with the account I have offered, in which the spatial character of mental images derives from concurrently perceived space (I will give examples later).

A more detailed look at two examples where neuroscience evidence is used • Claims that fMRI and PET evidence supports the assumption that larger mental images have correspondingly larger regions of cortical excitation. • Claims that the Oblique Effect in imagery supports the assumption that images are laid out on the visual cortex.

1. Image size and the visual cortex • There is evidence that when imagining “large” objects that overflow one’s phenomenal image, a different pattern of activation in visual cortex occurs than when imagining a small object. • This in itself is not remarkable, since all scientists accept that a difference in mental experience must be accompanied by some difference in the neural state – this is called the supervenience assumption: no mental differences without physical differences. This also follows from materialism.

Image size and neural encoding • In vision: cells in the parafoveal area of the retina project onto the more frontal parts of the visual cortex. Thus when objects are large enough that they fall onto the parafovea, they will activate frontal parts of the visual cortex. • In imagery: it is claimed that imagining large objects (which fill the visual field) leads to increased activity in the frontal part of the visual cortex. Some have taken this as prima facie evidence that perceived (large) size is neurally encoded in the same way as imagined (large) size.

Image size and the visual cortex… • But the explanation for why large visual objects activate more frontal parts of the visual cortex depends on the fact that fibers from parafoveal cells connect to these frontal areas. This can’t be the case with mental images unless they are also on the retina! • And in any case, how does the fact that large mental images activate frontal parts of the visual cortex explain why small details are easier to detect in large mental images? Or how does it explain why scanning across a large image takes longer just because it happens to lie in the more frontal visual cortex? All picture-theory explanations make essential reference to distances and sizes. • Many neuroscience explanations for imagery findings make exactly the same mistake of citing activation patterns that arise from connections to the retina, and which therefore do not work unless mental images are projected onto the retina. I will give just one more example of such a neural explanation, because the error in that case is particularly egregious.

2. The oblique effect and visual cortex • In vision, when a set of lines is to be discriminated (distinguished from a single blur) the discrimination is better when the lines are vertical or horizontal than when they are at a 45° angle. This is called the Oblique Effect. It is a low-level effect that occurs in the early vision module. • Does the Oblique effect occur with mental images?

Do images have low-level visual properties? • Imagine a grating in which the bars are: (1) horizontal, (2) vertical, (3) oblique (45°). • Imagine the bars getting closer and closer together. In which of these displays do the bars blur together first? • In vision, the oblique bars blur sooner (the oblique effect). • In imagery, a similar result was reported by Kosslyn et al.

Neurological explanations for both cases? • An accepted explanation of the psychophysical case (where lines are seen) is that in primary visual cortex (V1) there are more cells tuned to horizontal and vertical orientations than to oblique orientations, so horizontal and vertical discrimination is more sensitive. Can this fact also explain why imagined bars show the same pattern? Kosslyn et al. claim that it does, and that this provides further support for the view that images are laid out in visual cortex. • But this argument rests on a misunderstanding of how the orientation-specific cells come to be tuned to specific orientations: the tuning comes from the way they are connected to photoreceptive cells on the retina. Vertical cells are more often connected to columns of photocells, while horizontal cells are more often connected to rows of photocells (relative to the retina).

Neurological explanations for both cases? • If patterns of bars were activated on the surface of cortex by mental imagery, as assumed by picture-theorists, then no overall bias toward vertical-horizontal bars would occur. Horizontal cells would be no more likely to be activated by horizontal patterns on the surface of the visual cortex than by vertical patterns. The only way that images of horizontal bars would preferentially activate horizontal cells is if the images were on the retina!

What happens when horizontal/vertical cells are activated by means other than retinal patterns? [Figure: cells on the cortical surface, labelled 9 vertical, 9 horizontal, 5 oblique.] The proportion of vertical, horizontal and oblique cells remains the same in all cases: they are located at random on the surface of visual cortex!
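The argument of the last two slides can be checked with a toy simulation. This Python sketch assumes, as the slide does, that a cell’s orientation tuning is unrelated to its position on the cortical surface, and “draws” horizontal, vertical and oblique gratings directly on a simulated sheet of cells. The grid size, the grating period, the function names and the 9:9:5 tuning mix (borrowed from the figure’s labels) are all hypothetical choices for illustration.

```python
import random

# Hypothetical cortical sheet. Following the slide, each cell's preferred
# orientation is drawn at random, independently of where the cell sits.
# The 9:9:5 mix echoes the counts shown in the figure and is illustrative only.
TUNINGS = ["vertical"] * 9 + ["horizontal"] * 9 + ["oblique"] * 5


def make_sheet(size, rng):
    """A size x size grid of cells, each with a random preferred orientation."""
    return [[rng.choice(TUNINGS) for _ in range(size)] for _ in range(size)]


def bar_pattern(size, orientation, period=4):
    """Locations covered by a grating 'drawn' on the cortical surface itself."""
    covered = set()
    for r in range(size):
        for c in range(size):
            if orientation == "horizontal":
                phase = r % period
            elif orientation == "vertical":
                phase = c % period
            else:  # oblique: stripes running at 45 degrees
                phase = (r + c) % period
            if phase < period // 2:
                covered.add((r, c))
    return covered


def tuning_mix(sheet, covered):
    """Fraction of the covered (activated) cells with each preferred orientation."""
    counts = {"vertical": 0, "horizontal": 0, "oblique": 0}
    for r, c in covered:
        counts[sheet[r][c]] += 1
    return {k: n / len(covered) for k, n in counts.items()}


if __name__ == "__main__":
    sheet = make_sheet(200, random.Random(0))
    for bars in ("horizontal", "vertical", "oblique"):
        mix = tuning_mix(sheet, bar_pattern(200, bars))
        print(bars, {k: round(v, 3) for k, v in mix.items()})
```

Whatever the orientation of the drawn bars, the activated cells show roughly the population mix of tunings, so under this assumption a horizontal pattern on the cortex recruits no more horizontal-tuned cells than a vertical one does.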

An overarching consideration: • What if colored three-dimensional images were found in visual cortex? What would that tell you about the role of mental images in reasoning? • Would this require a homunculus?

Should we welcome back the homunculus? • In the limit, if the visual cortex mapped the contents of one’s conscious images, we would need an interpreter to “see” this display in visual cortex. • But we will never have to face this prospect, because experiments show that the contents of mental images are already conceptual (or, as Kosslyn puts it, ‘predigested’) and therefore unlike any picture. • Finally, you can make your image do whatever you want, and have whatever properties you wish. There are no known constraints on mental images that cannot be attributed to lack of knowledge of the imagined situation (e.g., imagining a 4-dimensional object).

What is the alternative to a picture in V1? • Even accepting the tacit knowledge explanation of the scanning result, an open question remains: how is the right amount of time computed in the scanning experiment? I don’t claim that observers just stand by idly until the right amount of time has passed and then click the button indicating that the scan has reached its goal (even though psychophysical studies show that they are capable of doing so). • I think there is something to the scanning explanation, except that the space being scanned is not in the head but in the concurrently perceived real space.

Are there any ways of representing spatial layouts that are possible, given these problems? • Maybe we have been looking in the wrong place for things that fall under the formal requirements of being spatial. Maybe they are not in the head after all. • I have sketched a way of looking at this problem that locates the spatial character of thought in the concurrently-perceived world (see Chapter 5 of Things and Places). I will end with just a hint of this approach. It relies on findings from the study of the interaction among perceptual modalities and imagery as well as with motor actions and also neuroscience findings concerned with coordinate transformation mechanisms in the brain.

Another chapter in the imagery debate: the relation of images to vision and motor control • It has always seemed to me that one of the properties of mental images that makes them appear spatial is that they connect in certain ways not only with vision but also with the motor system: • We can point to things in our image! • We can “project” our images onto perceived space – even space perceived in different modalities. I believe that this observation is the key to the spatial character of images. • This projection does not require a picture to be projected, only the location of a small number of features. Over the past few decades I have been studying a mechanism called a visual index, or FINST, that is well suited for this task.

Using a concurrently perceived room to anchor FINSTs tagged with map labels

Studies of mental scanning: does it show that images have metrical space? • The image scanning effect was shown to be Cognitively Penetrable. But what allows a smooth scan across the image is the perceptual display. Without the perceived map, scanning would not be smooth and continuous, and the timing would not be accurate (Pylyshyn & Cohen, 1999).

Where do we stand? • It seems that a literal picture-in-the-brain theory is untenable for many reasons – including the major empirical differences between mental images and cortical images. A serious problem with any format-based explanation of mental imagery is the cognitive penetrability of many of the imagery demonstrations. • The pictorial quality of images may be an illusion that arises from the similarity of the experience of imaging and of seeing • So how do we explain the similarity of the experience of imagining and of seeing – the fact that they both seem to involve a pictorial panoramic display? • It is very likely that neither experience directly reveals the form of the representation.
