
Dynamics of Gestures: Temporal Patterning

Work supported by NIH grant DC-03663. Dynamics of Gestures: Temporal Patterning. Elliot Saltzman Boston University & Haskins Laboratories. Colleagues. Dani Byrd University of Southern California, USA Louis Goldstein Yale University & Haskins Laboratories, USA Hosung Nam


Presentation Transcript


  1. Work supported by NIH grant DC-03663 Dynamics of Gestures: Temporal Patterning Elliot Saltzman Boston University & Haskins Laboratories

  2. Colleagues Dani Byrd University of Southern California, USA Louis Goldstein Yale University & Haskins Laboratories, USA Hosung Nam Yale University & Haskins Laboratories, USA

  3. Question: What is being learned when we learn a skilled behavior? • Answer: The dynamical system, or coordinative structure, that shapes functional, coordinated activity defined across animal and environment • But what is a dynamical system? • Roughly, it is a system of interacting variables whose changes over time are shaped by laws or rules of motion • What types of variables? • What types of rules of motion?

  4. System states, parameters, and graphs, and their dynamics • Any dynamical system can be completely characterized according to three types of variables (state, parameter, and graph) and their dynamics (Farmer, 1986) • State variables: a system’s active degrees of freedom • defined by the number of autonomous 1st order equations used to describe the system • Ex) position & velocity of the mass in a damped mass-spring system • Ex) activations of nodes in a connectionist network • State dynamics: the “forces” (velocity vector field) defined in the space of state variables (state space) that shape motion patterns of the state variables
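The damped mass-spring example can be sketched in code. This is only an illustrative sketch: the function name `mass_spring_field` and the values of m, b, k, and the target are assumptions, not taken from the presentation. It writes the system as two autonomous first-order equations, so the state has two variables (position and velocity), and the returned vector is the velocity vector field evaluated at one point of state space.

```python
import numpy as np

def mass_spring_field(state, m=1.0, b=0.5, k=4.0, target=0.0):
    """Velocity vector field of a damped mass-spring system.

    state = (x, v): the two state variables (position, velocity).
    Returns (dx/dt, dv/dt), i.e. the two first-order equations.
    Parameter values are illustrative assumptions.
    """
    x, v = state
    dx = v
    dv = (-b * v - k * (x - target)) / m
    return np.array([dx, dv])

# Mass displaced by 1 from the target and at rest.
state = np.array([1.0, 0.0])

# The "force" acting at this point of state space.
flow = mass_spring_field(state)

# The number of first-order equations is the number of state variables.
n_state_variables = flow.size

# One small Euler step moves the state along the vector field.
dt = 0.01
next_state = state + dt * flow
```

Here m, b, k, and the target are system parameters in the sense of the next slide: they stay fixed while the state evolves.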

  5. System states, parameters, and graphs, and their dynamics (cont.) • System parameters • Ex) m, b, k, and escapement strength in a limit cycle equation • Ex) target position in a point attractor equation • Ex) pendulum length • Ex) inter-node synaptic connection strength in a connectionist network • Parameter dynamics: the “forces”/processes that shape motion patterns of the system parameters • Ex) intentional changes in oscillation frequency in finger-wiggling experiment • Ex) actor-environment field equation for specifying target position in reaching • Ex) changing system eigenfrequency due to alteration of pendulum lengths in pendulum-swinging experiment • Ex) connectionist learning algorithms for changing system weights to solve a given computational task

  6. System states, parameters, and graphs, and their dynamics (cont.) • System graph: “Architecture” of the system’s equation of motion • the parameterized set of relationships defined among a system’s state variables • Ex) “circuit diagram” (e.g., Simulink) representation of mbk equation of motion • Ex) node/connection diagram in a connectionist network

  7. System states, parameters, and graphs, and their dynamics (cont.) • Graph dynamics: the “forces”/processes that change the system graph • number of state variables (i.e., system dimensionality) • Ex) recruitment/selection/assembly of degrees of freedom appropriate for task in a particular actor-environment context • e.g., recruitment of trunk leaning or body twisting for reaching, depending on distance to target • interconnection/linkage structure defined across state variables • Ex) learning/discovering appropriate interlimb oscillator coupling functions to perform bimanual m:n rhythms • Ex) “constructivist” connectionist learning algorithms that add/delete nodes and/or connections to implement “grammar” appropriate for learning given class of functions

  8. Outline of Remaining Presentation • Part 1: Overview and review of task-dynamic model of speech production • Four types of timing phenomena: Intragestural, transgestural, intergestural, and global • Hybrid dynamical model: Task dynamics + recurrent connectionist network • Part 2: Focus on system graphs and intergestural timing/phasing in speech production • Influence of system graph on patterns of relative timing between vowels and consonants in syllables • Competitive, coupled oscillator model of syllable structure • task-dynamic model of intergestural phasing (Saltzman & Byrd, 2000)

  9. Outline of Remaining Presentation (cont.) • Part 3: State and/or parameter dynamics and transgestural timing • Phrasal boundary effects on local speaking rate • Prosodic gestures (π-gestures) induce local slowings of central “clock” • Part 4: Intragestural timing: Gestural anticipation intervals • Self-organization of gestural onsets given required times of target attainment • Constrained temporal elasticity of anticipation intervals

  10. Part 1: Overview and Review • General Theoretical Question • How can we characterize the dynamics that underlie the temporal coordination among the units (gestures) of speech?

  11. Dynamics Defined • Dynamics • Laws or rules that specify the “forces” that change a system’s variables (system state) from one moment to the next

  12. Speech Gestures • Equivalence classes of goal-directed actions by different sets of articulators in the vocal tract • examples: • /p/, /b/, /m/—Upper lip, lower lip, and jaw work together to close the lips. • /a/, /o/—Tongue body and jaw work together to position and shape the tongue dorsum (surface) for the vowel.

  13. Articulatory Phonology (Catherine Browman and Louis Goldstein) • Speech can be described with a unitary structure that captures both phonological and physical properties. • Act of speaking can be decomposed into atomic units, or gestures. • Units of information: Linguistic primitives of speech production • Units of action: Dynamically-controlled constriction actions of distinct vocal tract organs (e.g., lips, tongue tip, tongue body, velum, glottis) • Coordinated into larger ‘molecular’ structures

  14. Four Aspects of Speech Timing • Intragestural: variations of temporal patterns of individual gestures • Ex. Temporal asymmetry of velocity profiles • Intergestural: relative phasing among gestures • Sequencing and partial temporal overlap (coproduction) of vowel and consonant gestures in the word (and syllable) /pub/ • Transgestural: modulations of temporal patterns of all active gestures during a relatively localized portion of an utterance • Ex. Temporally localized slowing of all gestures in neighborhood of phrasal boundaries • Global: temporal pattern of entire utterance • Ex. Overall speaking rate or style

  15. Overview: Hybrid Dynamical Model • Modeling dynamics of speech production with a hybrid dynamical model: • 2 components: • Task-dynamic component: shapes articulatory trajectories given gestural timing information as input. Uses tract-variable and model articulator coordinates. • Recurrent neural network: provides a dynamics of gestural timing. Uses activation coordinates.

  16. Tract Variable & Model Articulator Coordinates

  17. Gestural Activation • A gesture’s dynamics influence vocal tract activity for a discrete interval of time. • Activations wax and wane gradually at edges. • A gesture’s strength is defined by its activation level (range: 0-1) [Figure: gestural activations over time for “bad”]
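A gradually waxing and waning activation in the range 0-1 could be sketched as follows. The raised-cosine edge shape, the function name `activation`, and the `ramp` parameter are assumptions made for illustration. The slide only states that activations change gradually at their edges.

```python
import numpy as np

def activation(t, onset, offset, ramp):
    """Gestural activation in [0, 1] with gradual edges.

    The gesture is active between `onset` and `offset` (seconds).
    `ramp` is the duration of the rise and of the fall; the
    raised-cosine shape is an illustrative assumption.
    """
    t = np.asarray(t, dtype=float)
    rise = np.clip((t - onset) / ramp, 0.0, 1.0)   # 0 -> 1 after onset
    fall = np.clip((offset - t) / ramp, 0.0, 1.0)  # 1 -> 0 before offset
    level = np.minimum(rise, fall)
    return 0.5 * (1.0 - np.cos(np.pi * level))

# Activation sampled every 10 ms for a gesture active from 0.1 s to 0.5 s.
times = np.arange(0.0, 0.7, 0.01)
profile = activation(times, onset=0.1, offset=0.5, ramp=0.1)
```

The profile is 0 outside the interval, 1 in its interior, and passes through 0.5 halfway along each ramp.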

  18. Gestures as Dynamical Systems • Gestural activations are used to define gesture-specific control dynamics in goal/task space coordinates • point attractor dynamics of damped mass-spring systems in the task-space • constriction space (tract variables): closing the lips, raising the tongue tip, etc. • constriction target is approached regardless of initial conditions or perturbations along the way
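The point-attractor property (the target is reached regardless of initial conditions or perturbations) can be illustrated with a single tract variable. This is a minimal sketch, not the task-dynamic model itself: the stiffness, the critical damping, the target value, and the velocity "kick" used as a perturbation are all illustrative assumptions.

```python
import numpy as np

def simulate_point_attractor(x0, target, k=100.0, m=1.0, dt=0.001,
                             t_end=2.0, kick_at=None, kick=0.0):
    """Integrate a critically damped mass-spring system toward `target`.

    x0 is the initial position (initial velocity is zero).
    If `kick_at` is given, the velocity is perturbed by `kick` at that time.
    Returns the final position.
    """
    b = 2.0 * np.sqrt(k * m)  # critical damping: approach without oscillation
    x, v = x0, 0.0
    kick_step = None if kick_at is None else int(round(kick_at / dt))
    for i in range(int(round(t_end / dt))):
        if i == kick_step:
            v += kick  # transient perturbation along the way
        a = (-b * v - k * (x - target)) / m
        v += dt * a   # semi-implicit Euler step
        x += dt * v
    return x

target = 0.5
finals = [simulate_point_attractor(x0, target) for x0 in (-1.0, 0.0, 2.0)]
perturbed = simulate_point_attractor(2.0, target, kick_at=0.5, kick=5.0)
```

All three starting positions, and the perturbed run, end at the same target value to within numerical tolerance.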

  19. Gestural Equation of Motion • Total gestural acceleration is the sum of the constriction gesture and neutral gesture acceleration components. • Constriction gesture • Neutral gesture (governs return to neutral posture)

  20. Hybrid Model: Three Coordinate Systems

  21. Hybrid Dynamical Model: Overall Structure • Gestural Activation Dynamics (Recurrent Network) • input: “plan”, phonetic and prosodic structure (constant over course of utterance) • state: “clock”, provides temporal context for gestural sequences • hidden units: mapping from input and clock to appropriate sequencing and timing • output: activation of basic phonetic and prosodic gestures • Interarticulator Coordination Dynamics (Task Dynamics) • task-space dynamics: tract variables, point attractors in constriction or acoustic coordinates • internal model: “model” articulators • plant: “real” articulators, reflex and musculoskeletal dynamics

  22. Part 2: Intergestural Timing, System Graphs, and Syllable Structure • Phenomenon: Vowel and consonant gestures within syllables show characteristic “signatures” of relative timing/phasing • We hypothesized that these different patterns were due to corresponding differences in intergestural coupling graphs • coupling graphs were implemented in simulations • simulations were compared with actual data

  23. Syllable Structure: Some Definitions • The vowel and consonant gestures in a syllable can be partitioned into three components: Onset, Nucleus, & Coda • C1C2…Cn V C1C2…Cm • Onset (initial consonant cluster): C1C2…Cn • Nucleus (vowel): V • Coda (final consonant cluster): C1C2…Cm

  24. Relative Timing in Syllables • There is an asymmetry in patterns of relative timing displayed within syllable-initial (onset) and syllable-final (coda) consonant clusters • C-center effect on mean values of intergestural relative phase • c-center pattern occurs syllable-initially in onsets but not syllable-finally in codas • Browman & Goldstein (1988), Byrd (1995) • Stability of relative phasing • Greater stability (lower standard deviation) of relative phasing occurs syllable-initially in onsets than syllable-finally in codas • Byrd (1996), Cho (2001) • Both effects are hypothesized to emerge from appropriate dynamic coordination of gestures viewed in an oscillatory framework

  25. C-center Effect in Onsets, not Codas • Browman & Goldstein (2000)’s data: “sayed” (onset s), “spayed” (onset sp), “splayed” (onset spl) [Figure: timing of the onset consonants and their c-center relative to the vowels in each word] • Hypothetical model • C-V phasing • What if an additional coordination (C-C phasing) is added? • C-C phasing separates the consonants in timing • C-V and C-C phasings are in competition • But C-V phasing is preserved as global c-center-to-V coordination

  26. Why C-center Effect in Onsets and not Codas? • Browman & Goldstein (2000)’s Hypothesis • there are different coupling structures (system graphs) for onsets (C1,o C2,o V) and codas (V C1,c C2,c) • there is C1,o-V coupling in onsets, but there is no V-C2,c coordination (coupling) in codas • as a result, there is competition between VC and CC phasings for onsets, but not for codas

  27. Proposed Coupling Graphs: CCV vs. VCC • CCV (# C1C2V): competitive coupling structure • VCC (VC1C2 #): no V-C2 coordination, no competition

  28. Stability of Relative Phasing • Browman & Goldstein (2000) additionally hypothesized that: • Competitive coupling structures in syllable initial position may also help explain the greater stability of intergestural phasing in onsets than in codas

  29. Outline of Simulation Experiments • C-center effect in CCV but not VCC? • Greater stability (lower variability) between consonants in CCV than VCC? • Effect of syllable boundary in heterosyllabic CC sequences

  30. What do Oscillators Have to do with Speech? • Oscillatory units have a well defined variable representing time—phase • dynamics of coupled limit cycle oscillators allows their relative timing to emerge in a self-organized manner due to intrinsic oscillator dynamics and the nature of the coupling. • the best developed theories of inter-unit timing come from work in (non-speech) rhythmic movement

  31. What do oscillators have to do with speech? (cont.) • Phase has also been adopted as a measure of intrinsic gestural time in speech gestures (Browman & Goldstein, Kröger, et al.) • although point attractor models have been used to model these gestures, intrinsic gestural phase has been defined relative to an associated abstract, underlying gestural oscillator • Previously, the coordination of gestures in terms of their relative phase has been specified “by hand” in models of word production • we have been pursuing a model of speech timing that allows relative phasing to self-organize as it does in oscillatory systems

  32. Task-dynamics of Intergestural Phasing • We assume that rhythmic and non-rhythmic speech behavior have a common underlying dynamical organization • here, we attempt to reconcile work in coupled oscillator dynamics and intergestural timing in speech. • Saltzman & Byrd (2000) implemented a task-dynamic approach to controlling (generalized) relative phase and (m:n) frequency ratio in a single pair of coupled nonlinear oscillators • For a pair of oscillators in 1:1 frequency locking • the component oscillators must be coupled to one another in a manner specific to the desired relative phasing • We have generalized the Saltzman & Byrd (2000) model to implement intergestural coupling among multiple (>2) gestures (Nam, Saltzman, & Goldstein, 2003)

  33. Control of Relative Phase: General Approach • Intergestural coupling is defined in a pairwise manner among a set of oscillators in three steps: • 1st: define a set of task-space potential functions, $V(\psi)$ • the state variable $\psi$ represents relative phase ($\psi = \phi_i - \phi_j$) • the point minimum corresponds to the desired relative phase value, $\psi_0$ • 2nd: define the corresponding task-space (relative phase) dynamics • 3rd: transform these dynamics into the required coupling forces between the component oscillators • see Saltzman & Byrd (2000) for details
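The three steps can be sketched for one pair of oscillators. This is a simplified illustration under stated assumptions: it uses pure phase oscillators rather than the nonlinear limit-cycle oscillators of Saltzman & Byrd (2000), a cosine potential with its minimum at the target relative phase, and an even split of the relative-phase force between the two oscillators. The names `psi`, `psi0`, `a`, and `omega` are chosen here for illustration.

```python
import numpy as np

def potential(psi, psi0, a=5.0):
    """Step 1: task-space potential with its minimum at psi = psi0."""
    return -a * np.cos(psi - psi0)

def relative_phase_force(psi, psi0, a=5.0):
    """Step 2: task-space dynamics, d(psi)/dt = -dV/dpsi."""
    return -a * np.sin(psi - psi0)

def simulate_pair(psi0, omega=2.0 * np.pi, a=5.0, dt=0.001, t_end=5.0):
    """Step 3: turn the relative-phase force into coupling terms.

    psi = phi_i - phi_j, so adding +F/2 to oscillator i and -F/2 to
    oscillator j makes the relative phase follow d(psi)/dt = F.
    Returns the final relative phase (radians).
    """
    phi_i, phi_j = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        force = relative_phase_force(phi_i - phi_j, psi0, a)
        phi_i += dt * (omega + 0.5 * force)
        phi_j += dt * (omega - 0.5 * force)
    return phi_i - phi_j

# Both oscillators start in phase and settle at a 90-degree relative phase.
final_psi = simulate_pair(psi0=np.pi / 2)
```

Both oscillators keep cycling at their shared frequency. Only their relative phase is driven to the potential minimum.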

  34. Simulation Experiment 1: C-center effect in CCV • Target relative phase • C1-V = 50 • C2-V = 50 • C1-C2 = 30 • Resultant relative phase (final output) • C1-V = 59.94 • C2-V = 39.96 • C1-C2 = 19.98 • Competition yields a c-center effect [Figure: timing of C1, C2, and V, with the c-centers and the mean of c-centers marked]

  35. Simulation Experiment 1: No C-center effect in VCC • Target relative phase • V-C1 = 50 • V-C2 = none • C1-C2 = 30 • Resultant relative phase (final output) • V-C1 = 49.96 • V-C2 = 79.90 • C1-C2 = 29.94 • No competition, so no c-center effect [Figure: timing of V, C1, and C2, with the c-centers and the mean of c-centers marked]
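The contrast between the two coupling graphs can be reproduced qualitatively with a small sketch. It is not the authors' simulation. It assumes simple phase oscillators, equal coupling strength on every edge, a sine coupling force derived from a cosine potential, and the convention that "X-Y" means the phase of X minus the phase of Y in degrees. Under those assumptions the competitive CCV graph settles near 60/40/20, while the VCC chain simply meets its two targets.

```python
import numpy as np

C1, C2, V = 0, 1, 2

# Coupling graphs: {(i, j): target for phase_i - phase_j, in degrees}.
ccv_targets = {(C1, V): 50.0, (C2, V): 50.0, (C1, C2): 30.0}  # competitive
vcc_targets = {(V, C1): 50.0, (C1, C2): 30.0}                  # no V-C2 edge

def settle(targets, n=3, a=1.0, dt=0.01, steps=5000):
    """Relax coupled phases until the pairwise forces balance.

    Each edge pulls its relative phase toward its own target; when the
    targets are mutually incompatible, the result is a compromise.
    The shared oscillation frequency is omitted because it cancels out
    of every relative phase.
    """
    phi = np.zeros(n)
    for _ in range(steps):
        dphi = np.zeros(n)
        for (i, j), psi0 in targets.items():
            force = -a * np.sin(phi[i] - phi[j] - np.radians(psi0))
            dphi[i] += force
            dphi[j] -= force
        phi += dt * dphi
    return phi

def rel(phi, i, j):
    """Relative phase of i with respect to j, in degrees."""
    return float(np.degrees(phi[i] - phi[j]))

ccv = settle(ccv_targets)
vcc = settle(vcc_targets)
```

In the CCV result the mean of the two consonant-to-vowel phasings stays at the 50-degree target even though neither consonant individually meets it, which is the c-center pattern. In the VCC result V-C2 is just the sum of the two coupled phasings.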

  36. Adding noise: Simulation Experiment 2 • Source of noise: slight differences in frequencies of oscillators (detuning) • Noise modeled by adding a linear function to the potential energy function: $V(\psi) = -a\cos(\psi - \psi_0) + b(\psi - \psi_0)$ • b represents the amount of inter-oscillator detuning, which perturbs the location of the potential minimum • b was randomly varied across simulation “trials” within conditions defined by a given standard deviation • the standard deviation of b was manipulated across simulation conditions
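How the detuning term displaces the potential minimum can be worked out directly. The step below is a sketch that assumes $a > 0$ and $|b| \le a$, with $\psi$ the relative phase and $\psi_0$ its target value; $\psi^{*}$ is a symbol introduced here for the displaced minimum.

```latex
\begin{align*}
V(\psi) &= -a\cos(\psi - \psi_0) + b\,(\psi - \psi_0) \\
\frac{dV}{d\psi} &= a\sin(\psi - \psi_0) + b = 0
  \;\Longrightarrow\;
  \psi^{*} = \psi_0 - \arcsin\!\left(\frac{b}{a}\right) \\
\psi^{*} &\approx \psi_0 - \frac{b}{a} \qquad \text{for } |b| \ll a
\end{align*}
```

On this branch the second derivative, $a\cos(\psi^{*} - \psi_0)$, is positive, so $\psi^{*}$ is a minimum. For small detuning the shift is proportional to $b$ and inversely proportional to the coupling strength $a$, so trial-to-trial variation in $b$ translates into variation of the settled relative phase.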

  37. Results: Simulation Experiment 2 • Interconsonant phasing is more variable in syllable-final position [Figure: std. of CC phase (radians, up to 1.0) vs. std. of detuning b (.05 to .85), for onsets and codas] • Browman & Goldstein’s hypothesis proved correct: “Onsets in competition show greater stability”

  38. Simulation Experiment 3: Generalizing the Model to Hetero-Syllabic Consonant Sequences • V # C C V (e.g. a scab) • V C C # V (e.g. mask amp) • V C # C V (e.g. bag sab) • Across boundaries: only V-V coupling is added; there is no cross-boundary C-C coupling

  39. Results: Simulation Experiment 3 • C-to-C phasing is more variable across boundaries [Figure: std. of CC phase (radians, up to 1.0) vs. std. of detuning b (.05 to .85), for onsets, codas, and cross-boundary sequences] • The result (V#CCV < VCC#V < VC#CV) corresponds to Byrd (1994)’s findings

  40. Conclusion: Importance of System Graph • Dynamic structure (the system graph’s coupling structure) generates observed phonetic asymmetries of intergestural phasing (mean patterns and their stability) • Competitive coupling structure in onsets: • C-center effect (mean relative phasing) • Greater temporal stability • Consonants not directly coupled across boundaries: • Effect of boundaries (greater variability)

  41. Future Directions: Where are the Underlying Oscillators? • Hypothesis: Underlying oscillators “live” at the state-unit level of the hybrid model’s recurrent network as members of an entrained oscillatory ensemble • Question: Is there a 1:1 association between oscillators and gestures? • Question: How are the mappings learned between oscillators and gestural activations?

  42. Part 3: Transgestural Effects of Phrasal Boundaries • It has been shown that prosodic boundaries induce temporally local contextual variation in ongoing articulation • prosodic boundaries are boundaries between words and higher order phrases in speech • Boundary effects on articulation include: • lengthening of gestural durations • decreased overlap (coarticulation) between adjacent gestures • spatially larger gestures in phrase-initial positions • Boundary effects appear to be graded • stronger boundaries induce greater lengthening

  43. Boundary Adjacent Slowing • It has been shown that speech gestural durations lengthen in the region of word and phrase boundaries • It also appears that stronger boundaries induce greater lengthening • Example (Byrd & Saltzman 1998)

  44. Boundary Adjacent Slowing (Byrd & Saltzman 1998)

  45. Boundary Adjacent Slowing (Byrd & Saltzman 1998) [Figure: pre-boundary lip opening duration and post-boundary lip closing duration (0 to 300 ms) in [mə#mi] for Speakers J and K, by boundary type: none, word, list, vocative, utterance]

  46. Boundary Adjacent Relative Timing • Additionally, evidence suggests that phrase boundaries affect the relative timing (i.e. overlap) between gestures. • Chitoran, Goldstein & Byrd (to appear), Byrd (1996), Hardcastle (1985), Byrd, Kaun, Narayanan, & Saltzman (2000), Jun (1993), Keating et al. (in press) [Figure: time between displacement extrema in [C#C], mean duration (0 to 70 ms), for [m#n] and [n#m] at small and large phrase boundaries; /n/: tongue tip closure peak; /m/: upper lip vertical minimum; error bars: ± 1 standard error; from Byrd, Kaun, Narayanan, & Saltzman (2000)]

  47. Approach: Prosodic (π)-gestures • Question: How can we account for the variations of gestural timing associated with prosodic context? • π-gestures (prosodic gestures) influence the expression of all constriction gestures which are concurrently active with the π-gestures • Transgestural effect • Effect in proportion to the activation level of the π-gesture. • π-gesture activation determined by boundary strength. Byrd, Kaun, Narayanan, & Saltzman (2000), Byrd (2000), Byrd & Saltzman (submitted)
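One way to picture a transgestural effect that scales with π-gesture activation is a central clock whose rate is locally reduced while the π-gesture is active. The sketch below is an illustration only: the rate law `1 - alpha * activation`, the value of `alpha`, and the triangular activation shape are assumptions, not the authors' formulation.

```python
import numpy as np

def clock_time(t, pi_activation, alpha=0.5):
    """Elapsed clock time when the clock is slowed by a pi-gesture.

    Assumed rate law: rate(t) = 1 - alpha * pi_activation(t), so the
    slowing is proportional to the pi-gesture's activation level.
    Uses trapezoidal integration over the sample times `t`.
    """
    rate = 1.0 - alpha * np.asarray(pi_activation, dtype=float)
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))

t = np.linspace(0.0, 1.0, 1001)

# Triangular activation centered on a boundary at 0.5 s (0.3 s to 0.7 s).
bump = np.clip(1.0 - np.abs(t - 0.5) / 0.2, 0.0, 1.0)

no_boundary = clock_time(t, 0.0 * bump)
weak = clock_time(t, 0.5 * bump)    # weaker boundary: lower activation
strong = clock_time(t, 1.0 * bump)  # stronger boundary: higher activation
```

Over the same second of real time the clock advances least with the strong boundary. Any gesture timed by that clock is therefore stretched more near a stronger boundary, and all concurrently active gestures are stretched together.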

  48. Two constrictions spanning a phrase boundary [Figure: constriction 1 and constriction 2 on either side of the boundary, with the π-gesture domain of effect marked]

  49. How is this Prosodic Action Effected? Parameter Dynamics: Stiffness Lowering • Lowering of gestural stiffness values has been hypothesized to underlie gestural lengthening adjacent to phrasal boundaries. • Beckman et al. 1992, Byrd & Saltzman 1997 • Local, transgestural on-line modulation of gestural parameter values. • E.g. locally lower stiffness → local slowing
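The link between lower stiffness and slower movement can be checked on a single point-attractor gesture. This is a minimal sketch under assumptions: unit mass, critical damping that is rescaled along with the stiffness, and "duration" defined as the time to come within 5% of the target. None of these values come from the presentation.

```python
import numpy as np

def time_to_target(k, m=1.0, x0=1.0, target=0.0, tol=0.05,
                   dt=0.0005, t_max=5.0):
    """Time for a critically damped mass-spring to reach its target.

    Returns the first time at which the remaining distance falls to
    `tol` times the initial distance.
    """
    b = 2.0 * np.sqrt(k * m)  # keep critical damping as k changes (assumption)
    x, v, t = x0, 0.0, 0.0
    while abs(x - target) > tol * abs(x0 - target) and t < t_max:
        a = (-b * v - k * (x - target)) / m
        v += dt * a
        x += dt * v
        t += dt
    return t

normal_duration = time_to_target(k=100.0)
lowered_duration = time_to_target(k=50.0)   # stiffness halved near a boundary
ratio = lowered_duration / normal_duration
```

With critical damping the movement time scales with 1/sqrt(k), so halving the stiffness lengthens the gesture by a factor close to sqrt(2).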

  50. But... • Changes in both duration and relative timing occur at phrase boundaries. • Stiffness scaling does not account for changes in relative timing. • It modulates point-attractor parameter values, but does not specifically influence the domain of gestural activation.
