Michael Arbib: CS564 - Brain Theory and Artificial Intelligence, University of Southern California, Fall 2001 • Lecture 17. From the FARS model to the Evolution of Language • Reading Assignment: • Arbib, M.A., 2001, The Mirror System, Imitation, and the Evolution of Language, in Imitation in Animals and Artifacts (Chrystopher Nehaniv and Kerstin Dautenhahn, Editors), The MIT Press, to appear.
Two stages in the Evolution of Human Language • Biological Evolution: Yielding a language-ready brain: • Language-readiness: the brain capacity needed to acquire and use language. • Cultural Evolution: From hominids with a language-ready brain and rudimentary manual-vocal communication to humans with full language capability • Social organization of the developing brain. • Stressing the rich historical processes whereby groups of languages arose and “cross-pollinated”.
5 million years of hominid evolution • [Figure adapted from Clive Gamble, Timewalkers, Figure 4.6] • What were the biological changes supporting language-readiness? • What were the cultural changes extending the utility of language as a socially transmitted vehicle for communication and representation? • How did biological and cultural change interact “in a spiral” prior to the emergence of Homo sapiens?
Deep Time • The divergence of the Romance languages took about one thousand years. • The divergence of the Indo-European languages with their immense diversity (Hindi, German, Italian, English, ...) took about 6,000 years. • How can we imagine what has changed since the emergence of Homo sapiens some 200,000 years ago? • Or in 5,000,000 years of prior hominid evolution?
Broca’s and Wernicke’s Aphasias • Warning: Localization of aphasias is HIGHLY variable • Wernicke: “Perception” • Broca: “Production” • [Figure: Wernicke’s original drawing (wrong hemisphere!), labeled “Perception” and “Production”] • [Figure: MRI scans from Keith A. Johnson, M.D. and J. Alex Becker, The Whole Brain Atlas, http://www.med.harvard.edu./AANLIB/home.html. Slice viewed from below, so “right” is left. Labels: Wernicke’s Area; Broca’s Area (negative image)]
An Observation/Execution Matching System in Humans • Rizzolatti, Fadiga, Matelli, Bettinardi, Perani, and Fazio: • Broca's region is activated by observation of hand gestures: a PET study. • PET study of human brain with 3 experimental conditions: • Object observation (control condition) • Grasping observation • Object prehension. • The most striking result was highly significant activation in the rostral part of Broca's area. • Other PET data, from Petrides et al., showed that during execution of a sequence of self-ordered hand movements there was a highly significant activation of Broca's area. • A key language area!!!
Neural Substrate of Vocalization • The neural substrate for primate calls is in a region of cingulate cortex distinct from F5, the monkey homologue of human Broca's area. • For most humans, language is heavily intertwined with speech. • Why is F5, rather than the cingulate area already involved in monkey vocalization, homologous to Broca's area's substrate for language?
A New Approach to the Evolution of Human Language • Rizzolatti, G., Fadiga L., Gallese, V., and Fogassi, L., 1996, Premotor cortex and the recognition of motor actions. Cogn Brain Res., 3: 131-141. • Rizzolatti, G, and Arbib, M.A., 1998, Language Within Our Grasp, Trends in Neurosciences, 21(5):188-194: • The Mirror System Hypothesis: Human Broca’s area contains a mirror system for grasping which is homologous to the F5 mirror system of monkey, and this provides the evolutionary basis for language parity, i.e., an utterance means roughly the same for both speaker and hearer. • This adds a neural “missing link” to the tradition that roots speech in a prior system for communication based on manual gesture. [See most recently: William C. Stokoe (2001) Language in Hand: Why Sign Came Before Speech.]
Monkey and Human • My goal: a fully articulated model of the monkey mirror system (grounded in neurophysiology of macaque [and other?] monkeys); a cooperative computation model of interacting brain regions for human neurolinguistics as well as human mirror systems; and a coherent evolutionary framework which links them, both by synthetic brain imaging and by brain imaging across monkeys, chimps, and other primates. • [Figure labels: “Not AIP Homologue: Let’s discuss this!”; “F5 Homologue”]
A Gene for Universal Language? • Lai, C.S.L., Fisher, S.E., Hurst, J.A., Vargha-Khadem, F., & Monaco, A.P., 2001, A forkhead-domain gene is mutated in a severe speech and language disorder, Nature 413:519-523 • We have studied a unique three-generation pedigree, KE, in which a severe speech and language disorder is transmitted as an autosomal-dominant monogenic trait. Our previous work mapped the locus responsible, SPCH1, to a 5.6-cM interval of region 7q31 on chromosome 7. We also identified an unrelated individual, CS, in whom speech and language impairment is associated with a chromosomal translocation involving the SPCH1 interval. Here we show that the gene FOXP2, which encodes a putative transcription factor containing a polyglutamine tract and a forkhead DNA-binding domain, is directly disrupted by the translocation breakpoint in CS. We suggest that FOXP2 is involved in the developmental process that culminates in speech and language.
Criteria for Language-Readiness: A hypothesis on which human brain mechanisms underlie language • Properties Supporting Prelanguage Communication: • Symbolization: The ability to associate an arbitrary symbol with a class of episodes, objects or actions. (At first, these symbols may not have been words in the modern sense. Nor need they have been vocalized.) • Intentionality: Extension of communication to be intended by the utterer to have a particular effect on the recipient. • Parity: What counts for the speaker must count for the listener (Mirror Property) • More General Properties: • Hierarchical Structuring: Perception and action involving components with sub-parts (Action-oriented perception) • Temporal Ordering: Coding hierarchical structures “of the mind” • Beyond the Here-and-Now: The ability to recall past events or imagine future ones. • Paedomorphy and Sociality: Conditions for complex social learning
Criteria for Language: What cultural evolution and learning add to the brain’s capabilities • On the basis of the Parity, Hierarchical Structuring, and Temporal Ordering of Language-Readiness: • Symbolization: The symbols become words in the modern sense, interchangeable and composable in the expression of meaning. • Recursivity of Syntax and Semantics: The matching of syntactic to semantic structures co-evolves with the fractionation of utterances • Beyond the Here-and-Now: Verb tenses or other circumlocutions express the ability to recall past events or imagine future ones. • Learnability: To qualify as a human language, it must contain a significant subset of symbolic structures learnable by most human children. [It is not true that children master a language by 5 or 7 years of age.]
From Grasp to Language: Seven hypothesized stages of evolution • Pre-Hominid: • 1. grasping • 2. a mirror system for grasping (i.e., a system that matches observation and execution) [Shared with common ancestor of human and monkey] • 3. a simple imitation system for grasping [Shared with common ancestor of human and chimpanzee] • Hominid Evolution: • 4. a complex imitation system for grasping • 5. a manual-based communication system, breaking through the fixed repertoire of primate vocalizations to yield an open repertoire • 6. proto-speech resting on the "invasion" of the vocal apparatus by collaterals from the communication system based on F5/Broca's area • Cultural Evolution in Homo sapiens: • 7. language: the change from action-object frames to verb-argument structures to syntax and semantics: Co-evolution of cognitive and linguistic complexity
Stage 3: Simple Imitation • Masako Myowa-Yamakoshi: • the form of “imitation” employed by chimpanzees is a long and laborious process compared to the rapidity with which humans can acquire novel sequences; • the focus is on moving objects to objects rather than on the structure of movements per se. • Monkeys less so and chimpanzees more so (and, presumably, the common ancestor of humans and chimpanzees) have simple imitation: imitating simple novel behaviors but only through repeated exposure.
Stage 4: Complex Imitation • Humans have complex imitation: they can acquire (longer) novel sequences in a single trial if the sequences are not too long and the components are relatively familiar. • The very structure of these sequences can serve as the basis for immediate imitation or for the immediate construction of an appropriate response, as well as contributing to the longer-term enrichment of experience • Extension of the mirror system from single actions to compound actions adequate to support complex imitation was an evolutionary change of key relevance to language-readiness • Hypothesis: This emerged on the hominid line after the divergence from the common ancestor of humans and chimpanzees.
Two Roles for Imitation in the Evolution of Manual-Based Communication • 1. Extending imitation to pantomime to provide ad hoc gestures that may convey a situation to the observer • 2. And then extending the mirror system from the grasping repertoire to mediate imitation of gestures to support the transition from ad hoc gestures to conventional signs which can reduce ambiguity and extend the semantic range.
“Beyond” the Mirror System • F5 alone is not the “full” mirror system • We want not only the “unit actions” but also sequences and more general patterns • The FARS model sketched how to generate a sequence, positing roles for SMA (supplementary motor area) and BG (basal ganglia). • Our proposed mirror model must match this with a model of how • the units of a sequence (cf. current MNS model) and • their order/interweaving (extending the MNS model) • can be recognized and imitated. • This new model requires recognition of a complex behavior on multiple occasions with increasing success in recognizing component actions and in linking them together.
The Ancestral Communication System • For want of better data, we will assume that our common human-monkey ancestors shared with monkeys the following: • Primate Call System: a limited set of species-specific calls • Oro-Facial Gesture System: a limited set of gestures expressive of emotion and related social indicators • Note the linkage between the two systems: communication is inherently multi-modal. • Note the role of body posture as well. • Combinatorial properties for the openness of communication are virtually absent in basic primate calls and oro-facial communication, though individual calls may be graded.
From Praxis to Communication • Stage 5: Gestural Communication Emerges • Our hypothetical sequence for manual gesture: • pragmatic action directed towards a goal object • pantomime in which similar actions are produced away from the goal object • Imitation is the generic attempt to reproduce movements performed by another, whether to master a skill or simply as part of a social interaction. By contrast, pantomime is performed with the intention of getting the observer to think of a specific action or event. It is essentially communicative in its nature. The imitator observes; the pantomimic intends to be observed. • abstract gestures divorced from their pragmatic origins (if such existed) and available as elements for the formation of compounds which can be paired with meanings in more or less arbitrary fashion. • A distinct manuo-brachial communication system evolved to complement the primate calls/oro-facial communication system • On this view, the "speech" area of early hominids, i.e., the area somewhat homologous to monkey F5 and human Broca’s area, is not yet even a proto-speech area!
Noun/Verb pairs differentiated by movement • A change in the speed of movement will change the meaning of a sign • A change in the extent of movement will change the meaning of a sign • [Figures: Stokoe, Language in Hand, Figures 1, 3, and 6] • Here the noun is characterized by short, repeated movements, while the verb is characterized by a single, prolonged movement
Facial Expressions as Components of Sign • Here facial expression changes a statement to a question. • [Figure: Stokoe, Language in Hand, Figure 7]
Stage 6: From Manual Gesture to Proto-Speech • The "generativity" which some see as the hallmark of language is present in manual behavior. Combinatorial properties are inherent in the manuo-brachial system. This provided the evolutionary opportunity for: • Stage 6. The manual-orofacial symbolic system then “recruited” vocalization. Association of vocalization with manual gestures allowed them to assume a more open referential character. • This explains why F5, rather than the primate call area, provides the evolutionary substrate for speech. • This yields our explanation for the evolutionary prevalence of the lateral motor system over the medial (emotion-related) primate call system in becoming the main communication channel in humans.
Gesture Remains • McNeill has used videotape analysis to show the crucial use that people make of gestures synchronized with speech • Even blind people use manual gestures when speaking • Sign languages are full human languages rich in lexicon, syntax, and semantics. • Moreover: not only deaf people use sign language, so do some aboriginal Australian tribes, and some native populations in North America • All this suggests that we locate phonology in a speech-manual-orofacial gesture complex.
Not three separate systems but a single system operating in multiple motor and sensory modalities • Primate Call System: a limited set of species-specific calls • Oro-Facial Gesture System: a limited set of gestures expressive of emotion and related social indicators • Manual Gesture System: an open set of communicative gestures • Speech System: an open set of communicative gestures • [Figure labels: Larynx and Vocal Cords; Facial Muscles; Arm and Hand; Genuine Cooperation] • Caution: One system but many brain regions, each with its own evolutionary story.
Linking the “F5-Broca” and Vocalization Systems • Rizzolatti & Arbib (1998) thus showed why speech did not evolve “simply” by extending the classic primate vocalization system. • We now note the co-evolution of the two systems: • Lesions centered in the anterior cingulate cortex and supplementary motor areas of the brain can cause mutism in humans, similar to the effects produced in muting monkey vocalizations • I hypothesize cooperative computation between cingulate cortex and Broca’s area, • with cingulate cortex involved in breath groups and emotional shading (and imprecations!), and • Broca’s area providing the motor control for rapid production and interweaving of elements of an utterance.
Language acquisition • Locating phonology in a speech-manual-orofacial gesture complex, we see that language acquisition takes various forms: • a hearing person shifts the major information load of language (but by no means all of it) into the speech domain, whereas • for a deaf person the major information load is removed from speech and taken over by hand and orofacial gestures • Note also that blind children accompany speech with hand movements.
From Action-Object Frame to Verb-Argument Structure to Syntax and Semantics
The Action-Object Frame • [Figure: Cognitive Structures (Schema Assemblages), Semantic Structures (Hierarchical Constituents expressing objects, actions and relationships), and “Phonological” Structures (Ordered Expressive Gestures), linked by Perception and Production paths] • The action-object frame is non-linguistic: the representation of an action involving one or more objects and agents. (Composing them yields “schema assemblages”.) • Verb-argument structure is an overt linguistic representation; in modern human languages, generally the action is named by a verb and the objects are named by nouns (or noun phrases). (Composing them yields semantic structures.) • A grammar for a language is then a specific mechanism (whether explicit or implicit) for converting semantic structures into strings of words, and vice versa. • Cautionary Note: In the brain there is probably no single grammar, but rather a “direct model/grammar” for production and an “inverse model/grammar” for perception.
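The idea of a grammar as a pair of mappings can be made concrete with a toy sketch. Everything here is a hypothetical illustration, not part of the lecture's model: the `VerbArgument` class, the fixed agent-action-patient word order, and the function names `produce` (a stand-in for the "direct" grammar) and `perceive` (a stand-in for the "inverse" grammar) are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VerbArgument:
    """Toy verb-argument structure (hypothetical): who did what to what."""
    agent: str
    action: str
    patient: str


def produce(sem: VerbArgument) -> List[str]:
    """Direct model: semantic structure -> string of words.

    Assumes a fixed agent-action-patient word order.
    """
    return [sem.agent, sem.action, sem.patient]


def perceive(words: List[str]) -> VerbArgument:
    """Inverse model: string of words -> semantic structure."""
    if len(words) != 3:
        raise ValueError("this toy grammar only handles three-word utterances")
    agent, action, patient = words
    return VerbArgument(agent=agent, action=action, patient=patient)


frame = VerbArgument(agent="monkey", action="grasps", patient="raisin")
utterance = produce(frame)
print(utterance)                     # ['monkey', 'grasps', 'raisin']
print(perceive(utterance) == frame)  # True
```

The two functions are written separately on purpose: nothing forces production and perception to share one rule set, which is the point of the cautionary note. In this sketch they happen to be exact inverses, so a round trip recovers the original structure.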
The Biological Basis of Language-Readiness 1 • “Knowing there are things and events”: The ability for perception of Action-Object Frames, in which an actor, an action, and related role players can be perceived in relationship, was well established in the primate line • Recognizing action-object frames • Extending the mirror system beyond single actions to a repertoire of action-object frames which is unbounded a priori. • Naming action-object frames (the “names” can be manual/oro-facial) • creation of a “symbol toolkit” of meaningless [less so for sign; very much so {phonemes} for speech; cf./cx. Chinese] elements from which an open-ended class of symbols can be generated • abstract symbols are grounded in action-oriented perception • Note that such naming does not imply separate names for the actions and objects or their attributes; i.e., it does not entail that utterances of prelanguage were compounded from words akin to those we see in, e.g., the Indo-European languages.
The Biological Basis of Language-Readiness 2 • Hypothesis: The ability to communicate a fair number of action-object frames was established prior to Homo sapiens. • The Transition to Language • Fractionation of symbols to yield symbols for actions and objects, yielding verb-argument structures linked to action-object frames • The ability to compound those structures in diverse ways. • Recognition of hierarchical structure rather than mere sequencing could provide the bridge to constituent analysis in language: • Relating particular subactions (themselves further decomposable) to achievement of certain subgoals in a complex manipulation. • Abstraction and compounding of more generic verb-argument structure • Syntax and semantics: compounding utterances, “going recursive”
Claim: Homo sapiens had a language-ready brain but did not have language • Grounding Hypothesis: Many ways of expressing relationships were the discovery of Homo sapiens: adjectives, conjunctions such as “but”, “and”, or “or”, and “that”, “unless”, or “because”, etc., might well have been “post-biological” in their origin. • The one word “ripe” halves the number of fruit names to be learned • Separating verbs from nouns lets one learn only m+n words to be able to form m*n*m of the most basic utterances. • The result: A spiraling co-evolution of communication and representation, extending the repertoire of achievable, recognizable and describable actions.
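The counting argument can be checked with a short sketch. It reads m as the number of nouns and n as the number of verbs, and takes a "basic utterance" to be a noun-verb-noun triple; that reading is an assumption, as are the example words and the function names. The fruit example assumes that, without a word for "ripe", each fruit would need two separate names (one ripe, one unripe).

```python
from itertools import product


def vocabulary_size(m: int, n: int) -> int:
    """Words to learn once nouns (m) and verbs (n) are separate: m + n."""
    return m + n


def basic_utterances(m: int, n: int) -> int:
    """Noun-verb-noun utterances formable from m nouns and n verbs: m * n * m."""
    return m * n * m


def fruit_words(num_fruits: int, has_ripe_word: bool) -> int:
    """Words needed to name every fruit in both its ripe and unripe state."""
    if has_ripe_word:
        return num_fruits + 1   # one name per fruit plus the single word "ripe"
    return 2 * num_fruits       # a separate name for each fruit in each state


nouns = ["ape", "fruit", "stone"]   # m = 3 (hypothetical vocabulary)
verbs = ["takes", "hits"]           # n = 2
utterances = [" ".join(t) for t in product(nouns, verbs, nouns)]

print(vocabulary_size(len(nouns), len(verbs)))        # 5
print(len(utterances))                                # 18
print(fruit_words(10, False), fruit_words(10, True))  # 20 11
```

With only five words the toy vocabulary already covers eighteen utterances, and the gap widens quickly as m and n grow. The fruit count shows that "halves" is approximate: ten fruits need 11 words rather than 20.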
The spatial basis for “prepositions” • Consideration of the spatial basis for “prepositions” may help show how visuomotor coordination underlies some aspects of language and makes clear the “naturalness” of sign. • [Figure: Stokoe, Language in Hand, Figure 10. The addition of movement transforms IN to INTO and exemplifies the differences in meaning between the two signs] • However, the basic semantic-syntactic correspondences have been overlaid by a multitude of later innovations and borrowings.
The Mirror Neuron System (MNS) Model (work with Erhan Oztop) • [Figure: MNS model diagram. Regions labeled: visual cortex, cIPS, AIP, 7a, 7b (PF/PG), STS, F4, F5canonical, F5mirror, M1. Functions labeled: object features, object affordance extraction, object location, hand shape recognition, hand motion detection, hand-object spatial relation analysis, object affordance/hand state association, integrate temporal association, motor program (Grasp), motor program (Reach), action recognition (mirror neurons), mirror feedback, motor execution] • If the monkey needs so many brain regions for the mirror system for grasping, just think how many more brain regions we will need for an account of language-readiness that goes beyond the mirror to develop a full neurolinguistic model that extends the linkages far beyond the F5-Broca’s area homology
"What" versus "How" • AT (Jeannerod et al.): Inability to preshape (except for objects with size “in the semantics”) • DF (Goodale and Milner): Inability to verbalize or pantomime size or orientation • [Figure: Visual Cortex feeds a dorsal “How” path through Parietal Cortex (reach programming, grasp programming) and a ventral “What” path through Inferotemporal Cortex]
Goodale and Milner • Our evolutionary theory suggests a progression from action to pantomime to (proto)language • object → AIP → F5canonical: pragmatics • action → PF → F5mirror: action understanding • scene → Wernicke’s → Broca’s: utterance • The "zero order” model of AT and DF data is: • Parietal “affordances” → preshape • IT “perception of object” → pantomime or verbally describe size • Inference: one cannot pantomime or verbalize an affordance; one needs a "unified view of the object" (IT) to express attributes. • The problem with this is that the “language” path as shown in the zero-order model is completely independent of the parietal → F5 system, and so the data seem to contradict our view of the progression above.
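The zero-order model amounts to two independent pathways, which a few lines of code can make explicit. This is a deliberately crude sketch under stated assumptions: the function name, the boolean "intact" flags standing in for lesions, and the dictionary keys are all hypothetical, and object size is the only attribute modeled.

```python
def zero_order_model(object_size_cm: float,
                     parietal_intact: bool = True,
                     it_intact: bool = True) -> dict:
    """Toy two-pathway model: which behaviors remain available.

    Parietal pathway: extracts an affordance used only to preshape the hand.
    IT pathway: builds an object percept used only to pantomime or describe.
    The pathways share no information, which is the feature the slide
    identifies as the problem.
    """
    affordance = object_size_cm if parietal_intact else None
    object_percept = object_size_cm if it_intact else None
    return {
        "preshape": affordance is not None,
        "pantomime_or_describe_size": object_percept is not None,
    }


print(zero_order_model(5.0))
# {'preshape': True, 'pantomime_or_describe_size': True}
print(zero_order_model(5.0, parietal_intact=False))
# {'preshape': False, 'pantomime_or_describe_size': True}
print(zero_order_model(5.0, it_intact=False))
# {'preshape': True, 'pantomime_or_describe_size': False}
```

Because neither flag affects the other output, the sketch reproduces the double dissociation but leaves the description path with no route through the parietal and F5 system. That independence is the tension the slide points out.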
Recall: FARS (Fagg-Arbib-Rizzolatti-Sakata) Model Overview • AIP extracts a set of affordances but • IT and PFC are crucial to F5’s selection of the affordance to execute • Dorsal Stream (AIP): Affordances, “Ways to grab this ‘thing’” • Ventral Stream (IT): Recognition, “It’s a mug” • [Figure labels: PFC; Task Constraints (F6); Working Memory (46?); Instruction Stimuli (F2)]
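The division of labor on this slide (AIP proposes affordances, while object identity and task context bias which one F5 executes) can be sketched as a simple selection function. This is only an illustration, not the FARS implementation: the `PREFERRED_GRASP` table, the grasp names, and the `f5_select` function are hypothetical stand-ins, and the fall-back to the first affordance is an arbitrary choice.

```python
from typing import List, Optional

# Hypothetical bias table standing in for IT (object identity) plus
# PFC (task constraints): which grasp suits this object for this task.
PREFERRED_GRASP = {
    ("mug", "drink"): "handle",
    ("mug", "wash"): "rim",
}


def f5_select(affordances: List[str],
              object_identity: Optional[str],
              task: Optional[str]) -> Optional[str]:
    """Pick one affordance to execute.

    `affordances` plays the role of the dorsal (AIP) output: ways to grab
    the thing. The identity/task lookup plays the role of the ventral and
    prefrontal bias. Without a usable bias, fall back to the first affordance.
    """
    if not affordances:
        return None
    preferred = PREFERRED_GRASP.get((object_identity, task))
    if preferred in affordances:
        return preferred
    return affordances[0]


mug_affordances = ["body", "handle", "rim"]
print(f5_select(mug_affordances, "mug", "drink"))  # handle
print(f5_select(mug_affordances, "mug", "wash"))   # rim
print(f5_select(mug_affordances, None, None))      # body
```

The last call shows the case where recognition or task information is unavailable: a grasp is still selected from the dorsal affordances alone, but it is no longer tuned to what the object is or what it is for.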
An Early Pass on the AT/DF Challenge • [Figure: diagram with boxes Visual Input, STS, IT, Prefrontal, AIP, PF, F5canonical, F5mirror, Wernicke’s Area, and Broca’s Area, and labels “Choosing an Action”, “Recognizing an Action”, “Describing an Object or an Action”, “Recognizing an Object or an Action”, and “Memory”] • Do these link the right boxes? What is the relationship? • Is PF a homologue of Wernicke’s area? • How does the role of PFC in the FARS model relate to its roles in the mirror system of monkey and in language? • To be continued ...
Many Challenges Lie Ahead at the Interface between Computer Science and Cognitive Neuroscience • Analyze the brain regions involved in tasks involving varied combinations of action, vision and language, to probe the overlapping or distinctive roles of specific regions. • Develop a neurally plausible model which explains how, given a video, the viewer's attention may be drawn to a specific object or action, and then expands that attention to determine the minimal subscene containing that focus of attention. • Develop a neurally plausible model for how, given a minimal subscene, the viewer generates sentences to describe it, analyzing the extent to which the initial focus of attention biases the type of sentence structure used for the description. • Develop a neurally plausible model for how a question about a visual scene provides a top-down influence on mechanisms of attention as the viewer examines the scene in preparation to answer the question. • Explore how the expansion of attention in detail, space, time and factuality grounds a cognitive-based functional expansion of syntax and semantics. • Use comparative and historical studies to tease out the universals of language and test them against functional explanations versus Universal Grammar. • Explore the relation between the child’s exploration of its world and the influence of mother and community in the acquisition of language.
An Invitation • All these topics will be explored in CS 664 • taught by Michael Arbib and Laurent Itti • in the Spring of 2002.