Concepts: Conceptualizing the world
Cognition represents things “under a description” – it represents them as something (we say it “conceptualizes” the world).
• What do concepts represent? i.e., what determines the content of concepts?
• Complex concepts get their content from their constituents (and the way they are put together).
• Simple (basic) concepts get their content from…..?
Conceptualizing and “parsing”; “picking out” things-in-the-world
• The most basic cognitive operation involves the picking out that occurs in the formulation of the predicate-argument structures of judgments. This is closely related to figure-ground differentiation.
• The predicate-argument structure logically precedes judgments (or thoughts about the perceived world), since judgments presuppose that what judgments are about has been picked out.
• A better way to put this is that the arguments of perceptual predicates P(x, y, z, …) must be bound to things in the world in order for the judgment to have perceptual content.
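A minimal sketch of the binding idea, using hypothetical names that are not part of the theory's formal apparatus: the arguments of a perceptual predicate are filled by picked-out tokens, not by descriptions of them.

```python
# Hypothetical illustration: a perceptual predicate whose arguments are
# bound to picked-out tokens (indexes), not to encoded descriptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """A picked-out thing-in-the-world; it carries no encoded properties."""
    index: int  # an internal label, nothing more

def collinear_judgment(x: Token, y: Token, z: Token) -> dict:
    """Represents the judgment COLLINEAR(x, y, z).

    The judgment has perceptual content only because x, y and z are already
    bound to individuated tokens; the predicate does not do the picking out.
    """
    return {"predicate": "COLLINEAR", "arguments": (x, y, z)}

# Three tokens picked out (e.g., by attentional selection), then judged:
a, b, c = Token(1), Token(2), Token(3)
judgment = collinear_judgment(a, b, c)
```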
How do we “pick out” things that will serve as the arguments of visual predicates?
• Simple empirical examples: attentional selection; multiple attentional selection. Picking out is different from discrimination. Pick out the n’th element to the right of a grating, or pick out elements forming a line or a square from a random texture pattern or wallpaper tessellation (Intriligator examples; Trick example of subitizing).
• “Visual routine” examples from Ullman (collinearity).
• Picking out, or individuating-plus-indexing, is one of the most basic functions of perception. A central thesis of this course is that “picking out” is preconceptual and not mediated by the prior detection and encoding of any visual properties!
Example of the need for multiple individuation of visual tokens
Subitizing (and when it fails)
• Visual Indexing theory provides an account of the relation between subitizing and counting.
• In subitizing only active indexes need be counted; there is no need to consult the display, so patterns and locations should not matter.
• The effect of cue validity is strong in the counting range but weak in the subitizing range: subitizing always occurs in the (n < 4) range, even with invalid location cues.
• There is no subitizing when focal attention is required to individuate items (e.g., connected points, embedded rectangles, conjunction-feature targets).
Individuation and subitizing The “subitizing” phenomenon only occurs with figures that can be preattentively individuated (as on the right).
Subitizing graphs for concentric and non-concentric squares
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Another case in which items that require attention to individuate cannot be subitized
Can subitize: count all black squares. Cannot subitize: count all squares on the S-grid.
Other “visual routines” that require objects to be picked out
Assumptions sketched so far
• Primitive individuation is preconceptual.
• When visual properties are detected they are detected as properties-of-individuated-things. We call these individuals Primitive Visual Objects. The relation between PVOs and bona fide physical objects or “Spelke Objects” will be discussed later.
• Only individuated visual objects, either primitive or complex, can be the subject of visual judgments or the subject of motor commands.
• The mechanism for individuation and argument-binding is called a Visual Index or FINST.
Another argument for preconceptual individuation and indexing: Incremental construction of perceptual representations
• Percepts do not come into existence all at once; they are built up over time (with or without saccadic eye movements).
• This is an example of one of several related phenomena that give rise to a “correspondence problem”: when P(x) is encoded, we need to assign this new property to the appropriate previously encoded object.
• Without a preconceptual reference we would be faced with an intractable matching problem.
What is the evidence for (or against) the assumption that individuation does not involve the prior detection of some property?
The strongest contender for the role of mediating property in picking out and binding token individuals (PVOs) is location.
The pro-location argument:
• Treisman’s Feature Integration Theory and the “binding problem” assume that location across feature maps provides the means for accomplishing property-conjunction binding: two properties are conjoined if they have the same location.
• Balint-syndrome patients are very poor at spatial tasks and are also very poor at finding property conjunctions. They suffer from conjunction illusions (CI).
• A number of people, including Mary-Jo Nissen, have explicitly tested the location-mediation assumption.
Evidence for the priority of location in property encoding
• Nissen (1985) measured the probability of correctly reporting shape and location, given a color cue, P(S & L | C). If retrieving shape and location given a color cue depends on using the color cue to first find the location and then using the location to find the shape, then we should have: P(S & L | C) = P(L | C) × P(S | L). Nissen estimated P′(S | L) from other data and concluded that the above relation did hold.
• Pashler (1998, pp. 98-99) reviewed the literature on feature encoding in search and concluded that location is special because “… when an observer detects a target defined by other dimensions, this provides information about the location of the stimulus being detected.” Location is also special in that errors in reporting the identity of cued letters tend to consist of reports of nearby letters, and when the task is to report a letter of a particular color or shape together with as many other letters as possible, the latter tend to be nearby letters (Snyder, 1972).
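A small sketch of the multiplicative prediction Nissen tested. The numbers here are made up for illustration; they are not Nissen's data.

```python
# Illustrative check of the location-mediation prediction tested by Nissen (1985).
# All probabilities below are invented for illustration, not Nissen's actual values.

p_loc_given_color = 0.80   # P(L | C): report location correctly given the color cue
p_shape_given_loc = 0.75   # P(S | L): report shape correctly given the location
p_joint_observed = 0.60    # P(S & L | C): observed joint report accuracy

# If shape is retrieved *via* location, the joint probability should factor:
p_joint_predicted = p_loc_given_color * p_shape_given_loc

print(f"predicted P(S & L | C) = {p_joint_predicted:.2f}, observed = {p_joint_observed:.2f}")
# Nissen's conclusion was that predicted and observed values agreed,
# supporting the claim that location mediates access to shape.
```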
But….
Although there is a great deal of evidence, such as that described earlier, for the priority of location in accessing features, in (almost) every case “location” is confounded with individuality, because objects have fixed locations: being a different individual usually entails being at a different location.
• Consequently the mediation might be by individual rather than by location.¹
There are two possible ways to control for the possible mediation of location:
• use moving objects
• use objects that are not defined spatially
¹ This overstates the case. There is clearly some location-focusing of attention, since we can look at different parts of a display. But this may be a second-order effect.
1. Moving objects
• Priming across moving objects (“reviewing object files”)
• Inhibition of Return
• Multiple Object Tracking (MOT)
2. Spatially coincident objects
• tracking in “feature space”
Inhibition-of-return is object-based Klein (2000) showed that when an item is attended and attention shifts away from it, then after about 300 ms – 900 ms, it takes longer to shift attention back to that item than to shift to a new item. This is called inhibition of return (IOR) and is thought to be useful for searching and foraging. Tipper, Driver & Weaver (1991) showed that IOR is largely object-based by showing that if the object that was attended moves, IOR moves with it.
Inhibition of return moves with the object that is inhibited
Object File Theory (Kahneman, Treisman & Gibbs, 1992)
• Information is encoded and stored in files that are specific to particular individual objects.
• When an object is encountered, an attempt is made to solve the correspondence problem and assign it to an existing object file, based mainly on spatiotemporal properties.
• When an assignment to an existing object file succeeds, the information in the existing file is first reviewed and is used as the default properties of that object. Thus there is a processing benefit for recognizing those properties that are listed in the object file for that object.
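A rough sketch of the object-file idea, in my own notation rather than Kahneman, Treisman & Gibbs's formalism: files store properties per individual, and a newly encountered object is matched to an existing file mainly by spatiotemporal continuity (the distance threshold and field names here are illustrative assumptions).

```python
# Sketch of an "object file" and a spatiotemporal correspondence step.
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    file_id: int
    last_position: tuple                               # spatiotemporal record used for matching
    properties: dict = field(default_factory=dict)     # e.g., {"letter": "A"}

def assign_to_file(files, position, max_displacement=50.0):
    """Attempt to solve the correspondence problem for a newly encountered object.

    Returns the existing file whose last recorded position is closest and close
    enough (spatiotemporal match), or None if a new file must be opened.
    """
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    candidates = [f for f in files if dist(f.last_position, position) <= max_displacement]
    return min(candidates, key=lambda f: dist(f.last_position, position)) if candidates else None
```

On this picture, the "reviewing" benefit corresponds to reading properties off the matched file as defaults before re-encoding the object from scratch.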
Multiple-Object Tracking The MOT paradigm was not invented primarily to test the thesis that objects can be selected based on their individuality rather than on their featural (or local) properties. But it has turned out to be particularly useful for that purpose because it appears to provide an illustration of selection and re-selection (tracking or individuality-maintenance) that uses only the continuing history of an enduring object as the basis for its execution. The reason is that the only thing that defines the targets to be tracked is their history as individuals over time. Try to think of what other basis there might be for tracking the targets in MOT.
A possible location-based tracking algorithm
1. While the targets are visually distinct, scan attention to each target in turn and encode its location in a list.
2. When the targets begin to move, check the n’th position in the list and go to the location encoded there: Loc(n).
3. Find the element closest to Loc(n).
4. Update position n of the list with the actual location of the element found in step 3: this becomes the new value of Loc(n).
5. Move attention to the location encoded in the next list position, Loc(n+1).
6. Repeat from step 3 until the elements stop moving.
7. Report the elements whose locations are on the list.
Use of the above algorithm assumes that (1) focal attention is required to encode locations (i.e., encoding is not parallel), and (2) focal attention is unitary and has to be scanned continuously from location to location. It assumes no encoding (or dwell) time at each element. A sketch of the algorithm appears below.
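A rough sketch of the serial algorithm just listed. The scan-rate parameter and the reachability test are my own illustrative additions, standing in for the assumption that unitary attention moves at a finite speed; they are not values from the experiments.

```python
# Sketch of the serial, location-based tracking algorithm described above.
import math

def serial_track(initial_targets, get_all_positions, n_steps, scan_rate=250.0, dt=0.05):
    """Track targets by repeatedly revisiting stored locations.

    initial_targets:   stored target locations Loc(1)..Loc(n) (steps 1-2)
    get_all_positions: function(step) -> list of all element positions at that step
    scan_rate:         how far attention can move per unit time (a free parameter)
    """
    locs = list(initial_targets)
    for step in range(n_steps):
        elements = get_all_positions(step)
        for i in range(len(locs)):                        # step 5: next list position
            # step 3: find the element closest to the stored location Loc(i)
            nearest = min(elements, key=lambda e: math.dist(e, locs[i]))
            # attention succeeds only if it can reach the element in the time available
            if math.dist(nearest, locs[i]) <= scan_rate * dt:
                locs[i] = nearest                         # step 4: update Loc(i)
            # otherwise the stored location goes stale and tracking degrades
    return locs                                           # step 7: report these locations
```

Whether such an algorithm can keep up depends on how fast unitary attention can be scanned, which is what the prediction in the next figure plots.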
Predicted performance for the serial tracking algorithm as a function of the speed of movement of attention
A new argument against location being the basis for access to properties
• It might be argued that location is being recorded somehow (e.g., in parallel) and that it therefore might be used to track.
• But Blaser, Pylyshyn & Holcombe (2000) showed that one can track dynamic patterns that are superimposed at the same fixed location.
How does one explain the capacity to track in MOT?
• Assume that individuating and maintaining individuality is a primitive operation of the encapsulated early-vision system.
• This is one of the basic ideas behind the FINST Visual Index Theory.
Some Assumptions of the Visual Indexing Theory (FINST Theory)
1) Early vision processes segment the visual field into feature clusters automatically and in parallel. The ensuing clusters are ones that tend to be reliably associated with distinct token individuals in the distal scene. The distal counterparts of these clusters are referred to as Primitive Visual Objects (or sometimes FINGs, i.e., things indexed by FINSTs), indicating our provisional assumption that these clusters typically correspond to the proximal projections of physical objects in the world.
2) The clusters are activated (also in parallel) to a degree that depends on such properties as their distinctiveness within a local spatiotemporal neighborhood, including sudden onsets.
3) Based on their degree of activation, these clusters compete for a finite pool of internal Indexes (FINSTs). These indexes are assigned in parallel and in a stimulus-driven manner. Since the supply of Indexes is limited (to about 4 or 5), this is a resource-constrained process.
Assumptions of FINST Theory (continued)
4) Although the assignment of indexes is primarily stimulus-driven, there are certain restricted ways in which cognition can influence this process. One way is by scanning focal attention until an object with specified properties is located, at which time an index may spontaneously be assigned to it.
5) An index remains bound to the same visual object as that object changes its properties and its location on the retina (within certain constraints). In fact this is what makes it the “same” visual object. On the assumption that Primitive Visual Objects are reliably associated with prototypical real distal objects, the indexes can then functionally "point to" objects in a scene without identifying what is being pointed to — serving like the demonstratives "this" or "that".
Assumptions of FINST Theory (continued)
6) It is an empirical question what kinds of patterns can be indexed. It appears that they need not be spatially punctate. Current evidence suggests that the onset of a new visual object is an index-grabbing event. Perhaps the appearance of a new object within focal attention while the latter is being scanned is another such event (thus allowing for some cognitive control of index assignment, as in assumption 4).
7) Only indexed tokens can enter into subsequent cognitive processing: e.g., relational properties like INSIDE(x,y), PART-OF(x,y), ABOVE(x,y), COLLINEAR(x,y,z), … can only be encoded if tokens corresponding to x, y, z, … are bound by indexes (see the sketch below).
8) Only indexed tokens can be the object of an action, such as moving focal attention to them – except when the attention movement is guided by strategies, such as moving in a certain direction, that do not make reference to a target object.
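A rough sketch of assumptions 1-3 and 7: clusters compete for a small pool of indexes, and relational predicates can be evaluated only over tokens that currently hold an index. The function names, activation values, and pool size of exactly 4 are illustrative assumptions, not part of the theory's formal statement.

```python
# Sketch: a limited, stimulus-driven pool of FINST indexes.
N_INDEXES = 4  # the theory's limit of roughly 4-5 indexes

def assign_indexes(clusters):
    """clusters: list of (cluster_id, activation) pairs, where activation reflects
    local distinctiveness and events such as sudden onsets (stimulus-driven)."""
    ranked = sorted(clusters, key=lambda c: c[1], reverse=True)
    return {cluster_id: finst for finst, (cluster_id, _) in enumerate(ranked[:N_INDEXES])}

def can_encode_relation(arguments, indexed):
    """A relational predicate like INSIDE(x, y) or COLLINEAR(x, y, z) can be
    encoded only if every argument token is currently bound to an index."""
    return all(arg in indexed for arg in arguments)

indexed = assign_indexes([("a", 0.9), ("b", 0.7), ("c", 0.6), ("d", 0.5), ("e", 0.4)])
print(can_encode_relation(("a", "b"), indexed))   # True: both hold indexes
print(can_encode_relation(("a", "e"), indexed))   # False: "e" lost the competition
```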
Tracking without Keeping Track
A puzzle about tracking! (or is it?)
More on how tracking is accomplished We have argued that, under certain assumptions about the movement of unitary focal attention, tracking could not be accomplished by encoding and storing target locations, and serially visiting the stored locations. Since each target’s location (and trajectory) is the only unique property that distinguishes a target from a distractor, this raises the question of how a target can be tracked. What target property allows it to be tracked? We have proposed that MOT is accomplished by a primitive mechanism that directly keeps track of an object’s individuality, as such. The mechanism is called a Visual Index (or FINST). What is at issue in the present context is one crucial aspect of the theory. We call this assumption: “The Internal Name Assumption”
A logical requisite for tracking
Whatever the exact nature of the mechanism used for tracking in MOT, it must be able to keep track of the individuality or enduring objecthood of particular token items. But the only thing that makes object Xn(t) a target is that it traces its history back to an object that at the outset was visibly a target. In other words, Xn is a target if it satisfies the following recursive definition:
(1) Xn(0) is visually identified as a target
(2) If Xn(t) is a target, then Xn(t+Δt) is a target
But this means that there must be a mechanism for determining whether the second (recursive) step holds – i.e., whether an object Xn(t) is the same-individual-object as Xn(t−Δt). This is known as the correspondence problem. Solving this correspondence problem is equivalent to assigning a distinct internal name n to each target object.
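A sketch of the recursive definition above: being a target at t+Δt is inherited from being the same individual that was a target at t. The nearest-neighbour matching used here is purely for illustration; the theoretical claim is only that some such correspondence step must exist and that it amounts to carrying a distinct internal name across time.

```python
# Sketch: propagating internal names across one time step.
import math

def propagate_names(named_objects, next_positions):
    """named_objects:  dict mapping internal name n -> position of X_n(t)
    next_positions: list of all object positions at t + dt
    Returns the same internal names bound to the corresponding objects at t + dt."""
    updated = {}
    for name, pos in named_objects.items():
        # Solve the correspondence problem: which object at t + dt is X_n(t)?
        updated[name] = min(next_positions, key=lambda p: math.dist(p, pos))
    return updated   # each target keeps its internal name n across the step
```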
Successful tracking of a particular target implies that a unique internal identifier has been (temporarily) assigned to that target If a distinct internal name (say n1, n2, n3, or n4) has been assigned to each target, it would then be possible to assign overt labels to the targets by merely learning a list of pairs: n1-r1, n2-r2, n3-r3, n4-r4, where ri are external labels or overt responses. Thus if a target is initially given a unique label, and if that target is successfully tracked, an observer should be able to report its label – simply as a consequence of having tracked it.