
Jessica S. Horst (jessica-horst@uiowa.edu), Bob McMurray, Larissa K. Samuelson, Dept. of Psychology

Connectionist Time and Dynamic Systems Time in One Architecture? Modeling Word Learning at Two Timescales. Jessica S. Horst (jessica-horst@uiowa.edu) Bob McMurray Larissa K. Samuelson Dept. of Psychology University of Iowa. Two Time Scales in Neural Networks.


Presentation Transcript


  1. Connectionist Time and Dynamic Systems Time in One Architecture? Modeling Word Learning at Two Timescales
     Jessica S. Horst (jessica-horst@uiowa.edu), Bob McMurray, Larissa K. Samuelson
     Dept. of Psychology, University of Iowa

  2. Two Time Scales in Neural Networks
     • Connectionist and dynamical systems accounts:
       • stress change over time
       • complement each other in timescale
     • Dynamic Systems: online processes
     • Connectionist Networks: long-term learning
     Many domains of development require both timescales:
     • Example: language development requires
       • sensitivity to the brief and sequential nature of the input
       • slower developmental processes.

  3. Two Time Scales in Language Acquisition
     Word learning is often attributed to fast mapping: a quick link between a novel name and a novel object (e.g., Carey, 1978).
     But recent empirical data suggest that fast mapping and word learning may represent two distinct time scales (Horst & Samuelson, April, 2005).
     - Fast Mapping: quick process emerging in the moment.
     - Word Learning: gradual process over the course of development.
     We capture both timescales in a recurrent network…

  4. The Architecture
     [Figure: network diagram with Auditory Inputs, Decision Units (Hidden) Layer, and Visual Inputs. Initial State (Before Learning).] (McMurray & Spivey, 2000)
     • Activation feeds from input layers to decision layers.
     • Decision units compete via inhibition.
     • Activation feeds back to input layers.
     • Cycle continues until the system settles.
     • Unsupervised Hebbian learning occurs on every cycle.
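The processing cycle listed on this slide can be sketched in a few lines of Python. This is only a minimal illustration: the slides give no equations, so the mean-based lateral inhibition, the multiplicative feedback, and every parameter (`beta`, `fb`, `lr`, `tol`, `max_cycles`) are assumptions made here, and `settle` and `feed_back` are hypothetical names rather than the authors' code.

```python
import random

def settle(aud, vis, w_aud, w_vis, beta=0.9, fb=0.5, lr=0.01,
           max_cycles=50, tol=1e-4):
    """Run one trial; return (decision activations, cycles used).

    Update rules and parameter values are illustrative assumptions.
    """
    n_dec = len(w_aud[0])
    aud, vis = list(aud), list(vis)          # do not mutate the caller's inputs
    prev = [1.0 / n_dec] * n_dec
    for cycle in range(1, max_cycles + 1):
        # 1. Feedforward: both input layers drive every decision unit.
        net = [sum(a * row[j] for a, row in zip(aud, w_aud)) +
               sum(v * row[j] for v, row in zip(vis, w_vis))
               for j in range(n_dec)]
        # 2. Competition: each unit is inhibited by the mean of its rivals,
        #    then activations are rescaled to sum to 1.
        total = sum(net)
        dec = [max(0.0, x - beta * (total - x) / (n_dec - 1)) for x in net]
        norm = sum(dec) or 1.0
        dec = [d / norm for d in dec]
        # 3. Feedback: decision units boost the inputs that support them.
        aud = feed_back(aud, w_aud, dec, fb)
        vis = feed_back(vis, w_vis, dec, fb)
        # 4. Hebbian learning on every cycle: co-active units strengthen links.
        for layer, weights in ((aud, w_aud), (vis, w_vis)):
            for i, act in enumerate(layer):
                if act > 0.0:
                    for j, d in enumerate(dec):
                        weights[i][j] += lr * act * d
        # 5. Stop once the decision layer has stopped changing.
        if max(abs(d - p) for d, p in zip(dec, prev)) < tol:
            break
        prev = dec
    return dec, cycle

def feed_back(layer, weights, dec, fb):
    """Scale each active input by its support from the decision layer,
    keeping the layer's total activation constant."""
    boosted = [act * (1.0 + fb * sum(d * w for d, w in zip(dec, row)))
               for act, row in zip(layer, weights)]
    total = sum(boosted)
    scale = sum(layer) / total if total > 0 else 1.0
    return [b * scale for b in boosted]
```

Calling `settle` with `lr=0` runs the fast dynamics alone, which makes it easy to separate in-the-moment processing from the weight changes that accumulate across trials.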

  5. [Figure: activation (0 to 1) over processing cycles (0 to 18).]
     • Online decision dynamics reflect auditory and visual competitors.

  6. The Model
     [Figures: network diagrams labeled Intermediate State During Learning and End State Post Learning.]
     • 15 Auditory & 15 Visual units
     • 90 Decision units
     • Names presented singly with a variable number of objects
     • Name-Decision & Object-Decision associations strengthened via learning
     • After 4000 training trials, the network forms localist representations
     • Learns name-object links and learns to ignore visual competitors
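One way to picture the training input described on this slide is the sketch below. It assumes localist input coding (one auditory and one visual unit per word), assumes the named object is always among the objects shown, and uses a hypothetical `max_competitors` parameter because the slides only say the number of objects was "variable".

```python
import random

N_WORDS = 15  # 15 auditory and 15 visual units; one unit per word is an assumption

def make_trial(target, max_competitors=2, rng=random):
    """Return (auditory, visual) input vectors for one training trial."""
    aud = [0.0] * N_WORDS
    aud[target] = 1.0                          # names are presented singly
    vis = [0.0] * N_WORDS
    vis[target] = 1.0                          # assumed: the named object is present
    others = [i for i in range(N_WORDS) if i != target]
    n_comp = rng.randint(0, max_competitors)   # variable number of extra objects
    for i in rng.sample(others, n_comp):
        vis[i] = 1.0                           # visual competitors
    return aud, vis

def training_schedule(n_trials=4000, rng=random):
    """Yield training trials, drawing the target word at random each time."""
    for _ in range(n_trials):
        yield make_trial(rng.randrange(N_WORDS), rng=rng)
```

Passing a seeded `random.Random` instance as `rng` makes a training run reproducible.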

  7. [Figure: two connection-strength matrices (color scale 0.05 to 0.2) with Auditory Input units (1 to 10) against Decision Units (10 to 90); decision units 9, 16, 26, 30, 32, 39, 41, 49, 65, and 67 are labeled.]

  8. Two Time Scales
     Fast: Moment by Moment
     • Online information integration and constraint satisfaction (e.g., McClelland & Elman, 1986; Dell, 1986)
     • Reaches a pattern of stable activation based on auditory and visual inputs and stored knowledge (weights)
     • Model makes correct name-object links based on the latest input
     Slow: Over the Long-Term
     • Unsupervised Hebbian Learning
     • Associates words with visual targets
     • Learns to ignore visual competitors

  9. Dependent Time Scales
     • The two time scales are not independent.
     • Long-term learning depends critically on the dynamics of the fast time scale.
     • Competition between decision units ensures pseudo-localist representations, which are critical for Hebbian learning (e.g., Rumelhart & Zipser, 1986).
     • Learning occurs on each cycle and influences processing cycle-by-cycle and trial-by-trial.
     • Accumulated learning across trials leads to learning on the long-term time scale (i.e., word learning).
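A toy calculation can illustrate why competition matters for Hebbian learning. The sketch below assumes the plain Hebbian rule (weight change = learning rate × input activation × decision activation), which the slides do not spell out, and uses made-up activation patterns; `hebbian_delta` and `winner_share` are hypothetical helper names.

```python
def hebbian_delta(inputs, decision, lr=0.1):
    """Plain Hebbian rule: delta_w[i][j] = lr * inputs[i] * decision[j]."""
    return [[lr * a * d for d in decision] for a in inputs]

def winner_share(delta):
    """Fraction of the total weight change that lands on the strongest decision unit."""
    col_sums = [sum(col) for col in zip(*delta)]
    total = sum(col_sums)
    return max(col_sums) / total if total else 0.0

inputs = [1.0, 0.0, 1.0]       # made-up pattern: one name unit and one object unit active
sharp = [0.9, 0.05, 0.05]      # strong competition: one decision unit dominates
diffuse = [0.4, 0.3, 0.3]      # weak competition: activation is spread out

share_sharp = winner_share(hebbian_delta(inputs, sharp))
share_diffuse = winner_share(hebbian_delta(inputs, diffuse))
```

With the sharp pattern most of the weight change goes to a single decision unit, so repeated trials build a stable name-object link; with the diffuse pattern the same amount of learning is smeared across units.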

  10. Empirical Results

  11. Fast Time Scale
      [Figure: example display with Cow (familiar), Block (familiar), and Yok (novel); results plot marked *** ***.]
      • 24-month-old children
      • Saw 2 familiar & 1 novel objects
      • Asked to get familiar and novel objects (e.g., “get the cow!” or “get the yok!”)
      • Children were excellent at fast mapping (finding the referent of novel and familiar words in the moment).

  12. Slow Time Scale
      [Figure: example display with Fode (named foil), unnamed foil (prev. seen), and Yok (target); results plot marked *** ***.]
      • After a 5-minute delay, children were asked to pick the referent of a newly fast-mapped name (e.g., “get the yok!”)
      • Children were unable to retain mappings after a 5-minute delay.

  13. Replication
      • Initial findings replicated with simpler tasks:
        • effect of number of names or trials?
      • Children’s difficulty in retaining newly fast-mapped names is not related to the number of names or trials
      Replication #1 (N = 12)
        • 1 Novel Name
        • 8 Familiar Names
        • 7 Preference Trials
      Replication #2 (N = 12)
        • 1 Novel Name
        • 2 Familiar Names
      * Binomial, p < .05, ** Binomial, p < .01
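For readers unfamiliar with the binomial test named in the footnote, a minimal sketch of the exact one-sided version follows. The chance level of 1/3 assumes a three-object display like the one on the retention slide, and the count of 8 out of 12 is a made-up illustration, not a result from the study; the slides do not say which counts entered the test.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the exact one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 8 correct choices out of 12 when chance is 1/3.
p_value = binomial_tail(8, 12, 1 / 3)   # about 0.019, below .05
```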

  14. Simulations

  15. • 20 networks initialized with random weights
      • 15 word lexicon (names & objects):
        • 5 familiar words
        • 5 novel words
        • 5 held out
      • Trained on 5 familiar items for 5000 epochs
      • Items presented in random order
      • Run in the Fast Mapping Experiment:
        • 10 fast mapping trials (5 familiar, 5 novel)
        • 5 retention trials
      • Learning was not turned off during experiment.
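The simulation design on this slide can be laid out as a small data structure. The sketch below is a guess at the bookkeeping only: it assumes words are assigned to the familiar, novel, and held-out sets at random for each network, and that each of the 5 retention trials probes one of the fast-mapped novel words. `build_design` and its field names are hypothetical.

```python
import random

def build_design(n_networks=20, n_words=15, seed=0):
    """Return one lexicon split and trial list per simulated network."""
    rng = random.Random(seed)
    designs = []
    for net_id in range(n_networks):
        words = list(range(n_words))
        rng.shuffle(words)                      # assumed: random split per network
        familiar, novel, held_out = words[:5], words[5:10], words[10:]
        # 10 fast mapping trials: 5 familiar and 5 novel names, random order.
        fast_mapping = ([("fast_mapping", "familiar", w) for w in familiar] +
                        [("fast_mapping", "novel", w) for w in novel])
        rng.shuffle(fast_mapping)
        # 5 retention trials: assumed to test each fast-mapped novel name once.
        retention = [("retention", "novel", w) for w in novel]
        rng.shuffle(retention)
        designs.append({"network": net_id, "familiar": familiar, "novel": novel,
                        "held_out": held_out, "trials": fast_mapping + retention})
    return designs

designs = build_design()
```

Because learning stays on during the experiment, any model run over these trial lists would keep updating its weights on every trial.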

  16. How The Model Behaves
      • Fast Time Scale:
        • Model succeeded on both types of fast-mapping trials
        • Model behavior patterned with empirical results

  17. Slow Time Scale:
      • The model fails to “retain” the newly learned words after a “delay”
      [Figure label: Chance]

  18. How The Model “Thinks”
      Temporal dynamics of processing
      [Figure: activation (0 to 1) over cycles (0 to 20), one panel for familiar words and one for novel words.]
      • Analyses of weight matrices revealed that relatively little learning occurred during the test phase.
      [Figure: change (RMS) in portions of the weight matrix, plotted as squared deviations. One panel (scale 0 to 0.000005) shows Familiar Words, Novel Words, and Control Words after test; a second panel (scale 0 to 2) shows familiar, novel, and control words.]
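The weight-change measure in the figure could be computed along the following lines. This is a sketch under assumptions: it treats a "portion" of the weight matrix as the rows belonging to one word set (familiar, novel, or control) and reports the root mean square of the differences between two snapshots; `rms_change` is a hypothetical helper, not the authors' analysis code.

```python
from math import sqrt

def rms_change(before, after, rows):
    """Root-mean-square difference between two weight snapshots,
    restricted to the given rows (one row per input unit)."""
    squared = [(a - b) ** 2
               for r in rows
               for b, a in zip(before[r], after[r])]
    return sqrt(sum(squared) / len(squared))

# Toy snapshots: only the first input unit's weights changed.
w_before = [[0.0, 0.0], [1.0, 1.0]]
w_after = [[3.0, 4.0], [1.0, 1.0]]
changed = rms_change(w_before, w_after, rows=[0])     # sqrt((9 + 16) / 2)
unchanged = rms_change(w_before, w_after, rows=[1])   # 0.0
```

Comparing this value for the same rows before and after the test phase, and before and after training, gives a direct sense of how little the test trials move the weights relative to long-term learning.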

  19. [Figure: connection-strength matrices (color scale 0.05 to 0.2) over Decision Units (10 to 90), shown Prior to Experiment and After Experiment; decision units 1, 4, 66, 80, and 86 are labeled.]

  20. Conclusions
      • Two time scales captured in a single architecture:
        • Fast, online: fast mapping
        • Slow, long-term: word learning
      • The model replicated the empirical findings:
        • Excellent word learning and fast mapping
        • Poor “retention”
      • Has sufficient knowledge to select the referent at a given moment in time, given auditory and visual input and stored knowledge (weights).
      • But not enough to subsequently “know” the word.

  21. Conclusions
      • In-the-moment learning:
        • Subtly biases behavior
        • Combined with activation dynamics, yields correct response.
        • Does not provide robust, context-independent word knowledge (in the short term)
      • Continued training on fast-mapped words (i.e., 5000 epochs) makes them familiar words.
      • Accumulation of this learning provides robust context-independent word knowledge over development.

  22. Take-Home Messages
      • 1) A fast-mapped word is not a known word…
        • …but a known word is known because it has been fast-mapped many, many times.
      • 2) Understanding development requires models that integrate both short-term dynamic processes and long-term learning.

  23. References
      Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan & A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp. 264-293). Cambridge, MA: MIT Press.
      Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.
      Horst, J. S. & Samuelson, L. K. (2005, April). Slow Down: Understanding the Time Course Behind Fast Mapping. Poster session presented at the 2005 Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.
      McClelland, J. & Elman, J. (1986). The TRACE Model of Speech Perception. Cognitive Psychology, 18(1), 1-86.
      McMurray, B. & Spivey, M. (2000). The Categorical Perception of Consonants: The Interaction of Learning and Processing. The Proceedings of the Chicago Linguistics Society, 34(2), 205-220.
      Rumelhart, D. & Zipser, D. (1986). Feature Discovery by Competitive Learning. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1. Cambridge, MA: MIT Press.

      Acknowledgements
      The authors would like to thank Joseph Toscano for programming assistance and support. This work was supported by NICHD Grant R01-HD045713 to LKS.
