This presentation explores the SuperToy interface and its capabilities in mimicking human baby communication and thinking. It examines studies in psychology, implications for AGI development, and different approaches in AGI. The demonstration includes the learning agent interface, Baby LyLy.
SuperToy : Using Early Childhood Development to Guide AGI Scott Settembre University at Buffalo, SNePS Research Group ss424@cse.buffalo.edu June 24, 2008
Overview • Demonstration of the SuperToy interface • Explore capabilities and progression of human baby communication and thinking • What studies have been done in psychology? • Implications to developing an AGI? • How is AGI different from classical AI? • Different approaches in AGI • My approach and observations • Demonstrate • Baby LyLy : the learning agent interface
SuperToy Interface • Visual Interface • 3D Head, ability to mimic natural movements • Accurate lip-syncing • Ability to display emotional state • Listening Interface • Speech Recognition, allowing for hand-coded grammars or free-speech • Ability to view HMM probabilities (and choose for myself) • Talking Interface • Realistic voice, child-like voice preferred • Ability to change prosody (rhythm, stress and intonation)
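The "view HMM probabilities (and choose for myself)" capability can be illustrated with a small sketch. It assumes the recognizer exposes an n-best list of (text, log-probability) pairs; the function name `choose_hypothesis`, the `known_words` set, and the `min_logprob` threshold are hypothetical and not part of the SuperToy interface described in the slides.

```python
from typing import List, Optional, Set, Tuple

# Assumed shape of one recognizer hypothesis: (recognized text, log-probability).
Hypothesis = Tuple[str, float]


def choose_hypothesis(
    n_best: List[Hypothesis],
    known_words: Set[str],
    min_logprob: float = -10.0,
) -> Optional[str]:
    """Pick a hypothesis from an n-best list instead of trusting the top one.

    Preference order:
      1. the most probable hypothesis made only of already-known words,
      2. otherwise the most probable hypothesis above the threshold,
      3. otherwise None (nothing was recognized confidently enough).
    """
    candidates = [h for h in n_best if h[1] >= min_logprob]
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda h: h[1], reverse=True)
    for text, _ in ranked:
        if all(word in known_words for word in text.lower().split()):
            return text
    return ranked[0][0]
```

With a vocabulary containing "ball", a second-ranked "ball" would be preferred over a top-ranked but unknown "bawl", which is one way a learning agent could bias recognition toward words it has already acquired.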
Child Psychology Studies • Purpose: understanding how one AGI develops may indicate a learning progression that is possible (or perhaps necessary) • Progression in world representation/objects • Progression in language development • Interesting indication of dependency • What does this imply in terms of • Conversation and interaction • Mental representation and manipulation
Newborns • 12 hr old newborns can distinguish between native and foreign languages† • Maybe not so innate, since they probably learn rhythms of language in the womb • 24 hr old – “Lexical Words” experiment† • Can distinguish between content words, like nouns/verbs, and words with no meaning of their own, like prepositions/articles. (Occurs even in different languages) † Studies done by Janet Werker at the University of British Columbia, Canada.
Newborns • 10 days old – “Face Recognition” experiment† • Newborn prefers correct orientation of features, but favors contrast over features (A & C) • 6 weeks old – same experiment • Baby prefers features over contrasts (A & D) [Figure: face stimuli labeled A, B, C, and D] † Study done by Daphne Maurer, McMaster University, Canada.
Newborns • What is innate? • Crying – an innate signal of distress • Soothing – mother’s voice will soothe baby, but any human voice will work • Attention – visual attention drawn to contrasts within first two weeks • Language development? We will return to this…
Few week old Babies • Develop simple rules for conversation • Gaze of the eyes, directed at speaker • Facial expressions, match emotion in voice • “Eye gaze” experiment † • Baby gets frustrated if mother averts her eyes when talking to the baby. • “Upside-down face” experiment † • An inverted face of mother is not recognized • “Happy-Sad face” experiment † • Emotions of face and in voice should match † Study done by Darwin Muir lab, Queen’s University, Canada.
Few week old Babies • 8 weeks old • Connections are being made between modalities • Correlations between sights and sounds • Beginning to understand what an object is • 10 mo. - “Object Permanence” experiment † • Moving a doll behind an obstruction, baby expects doll not to move magically – “an object must continue to exist in time and space” † Study done by Andrea Aguiar, University of Waterloo, Canada.
Few month old Babies • Ability to “search” and understand objects increases • 6 mo. – “Hidden Object” experiment † • Toy hidden under a blanket cannot be found, but can at 8 mo. • 9 mo. – “The Search” experiment † • Toy hidden behind wall cannot be found at 8 mo., but can at 9 mo. † Study done by Andrea Aguiar, University of Waterloo, Canada.
Few month old Babies • 18 mo. – “Object Permanence” experiment † • Cannot find a toy twice hidden. Hidden under hand, then hand under blanket. Once hand is removed, baby thinks it is still in hand. • These experiments are significant because although objects are understood at 2.5 months, not all properties & physical laws are. † Study done by Andrew Meltzoff, Center for Mind, Brain & Learning, U. of Washington.
Object Properties Timeline • 2.5 mo. – understand there are objects¹ • 5 mo. – width of objects, no big into small¹ • 6 mo. – simple addition, 1 toy and 1 toy = 2 toys² • 7.5 mo. – height of objects, no long into short¹ • 9 mo. – occlusion, can find obscured object¹ • 9 mo. – motion, can predict path of ball³ • But cannot do occlusion and motion at same time • 11 mo. – tool use⁴ • 18+ mo. – twice hidden object can be found⁵
Object Learning - Summary • Children first learn by observation, then by experimenting • How objects behave • Size and Shape • Under, behind, inside • Falling, inclines, planes, volume, liquids… etc. • They learn properties/categories one at a time • At some stages, they cannot apply two physical laws/properties/categories at the same time
Initial Language Development • 6 mo. old – still able to distinguish all sounds † • 10 mo. old – loses this ability † • Beginning to filter out sounds that are not part of the language they hear all the time. • Implies that babies “start out as universal linguists and then become culture bound specialists” † Study done by Andrea Aguiar, University of Waterloo, Canada.
New Words and Pointing • 13 mo. old – “Joint Visual Attention” ex.† • Uniquely human gesture of pointing • Pointing must coincide with the gaze of the speaker in order to be meaningful • How to do this in a text based interface? • Even Helen Keller was able to select things to be named (or have them brought to her attention by touch and then named) † Study done by Andrea Aguiar, University of Waterloo, Canada.
Conversation and Turn Taking • 14 mo. old – “The Robot” experiment† • Green fur makes noise after baby makes noise, and they begin taking turns making noise. • 18 mo. old – “Imitation” experiment‡ • Taking turns with a toy may be important to understand taking turns with a conversation. † Study done by Susan Johnson, Stanford University. ‡ Study done by Andrew Meltzoff, Center for Mind, Brain & Learning, U. of Washington.
New Words and Shape • 17-24 mo. olds – “Shape Bias” experiment † • Identifies object by shape, not color or texture • Getting a child to pay attention to shape can increase vocabulary 3x faster • Implication? Perhaps language in humans is intertwined with the learning of objects and properties of the world. • Is “Language Explosion” at two years old due to understanding objects and categories better? † Study done by Susan Jones, Indiana University.
Language Learning Timeline • 24 week old fetus – can hear voices • Melody and rhythms of speech make heart beat faster • Birth – crying, mother’s voice soothes baby • 12 hrs old – distinguish native language¹ • 24 hrs old – distinguish between POS¹ • Weeks old – needs eye contact and matching voice & face emotions² • 6 mo old – can distinguish all language sounds¹ • 10 mo old – filters out non-native language sounds¹ • 13 mo old – uses gestures and pointing to understand³ • 14 mo old – understands give and take of conversation⁴ • 18 mo old – imitation useful in learning how to converse⁵ • 18 mo-2 years – Everything has a name, shape bias⁶ • “Language explosion” – micro-sentences, mimic actions
Language Development - Summary • Language initially taught by Mother-ese and Father-ese • Sing song quality, pitched higher • Sentences reduced to short phrases • Exaggerated words, stretched vowels • Repeating of the words • Emphasis on most important “meaningful” words • Gesturing and facial expressions • Can convey intentions, feelings, desires • Imitation and repetition • “primary engine for learning” language • Understand more words than can use • At 2 yrs old child uses 300 words, but understands 1000.
Anything Applicable to AGI • An underlying representation of objects helps learn a language • Maybe words are “verbal objects” and are subject to the same rules that objects are to categories? • Maybe grammar is equivalent to physical laws? • Maybe articles/prepositions, non-content words, are properties/relations between objects?
…Applications cont. • Humans have a clear progression in object understanding and language development • Is this sequential progression a limitation of neural architecture? • Is this sequential progression necessary for any kind of language learning and development?
…Applications cont. • Are we having difficulty because we do not process visual data in connection with language development? • Forms of gesturing and pointing are necessary in language development (required) • At early ages (weeks old) eye contact and matching emotions of the voice and face occurs (must be advantageous?) • Shape bias occurs at/around the time of the “language explosion” (coincidence?)
Artificial General Intelligence • “…the construction of a software program that can solve a variety of complex problems in a variety of different domains, and that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions.” † • Whereas normal AI would be “creating programs that demonstrate intelligence in one or another specialized area” • † Section content from “Contemporary Approaches to Artificial General Intelligence” by Cassio Pennachin and Ben Goertzel
Taxonomy of AGI Approaches • AGI approaches: • symbolic • symbolic and probability- or uncertainty-focused • neural net-based • evolutionary • artificial life • program search based • embedded • integrative
AGI - Symbolic • GPS – General Problem Solver • Uses heuristic search, break goals into subgoals • No learning involved • Doug Lenat’s CYC project • Encodes all common sense knowledge in first-order predicate logic • Uses humans to encode knowledge in CycL • New effort called “CognitiveCyc” to address creating autonomous, creative, interactive intelligence • Allen Newell’s SOAR • No real autonomy or self-understanding • Used as limited-domain, problem solving tool
AGI – Symbolic & Probabilistic • ACT-R framework – similar to SOAR • Modeling of human performance on relatively narrow and simple tasks [modularity of mind mimicked] • Uses probability, similar to some human cognition tasks • Bayesian networks • Embody knowledge about probabilities and dependencies between events in the world • Learning the probabilities (what to learn) is a problem • NARS - uncertainty-based, symbolic AI system • uncertain logic – with fuzzy logic, certainty theory • Failed Japanese 5th Generation Computer System
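To make the Bayesian network bullet concrete, here is a minimal sketch of inference in the smallest possible network: one cause node and one observed effect node. The scenario (a hungry baby that cries) and every probability in the usage note are invented for illustration; none of them come from the studies cited in this presentation.

```python
def posterior(prior: float, likelihood_true: float, likelihood_false: float) -> float:
    """Bayes' rule for a two-node network: Cause -> Effect.

    prior            = P(cause)
    likelihood_true  = P(effect | cause)
    likelihood_false = P(effect | not cause)
    Returns P(cause | effect).
    """
    joint_true = prior * likelihood_true            # P(cause, effect)
    joint_false = (1.0 - prior) * likelihood_false  # P(not cause, effect)
    return joint_true / (joint_true + joint_false)
```

For example, with the made-up values P(hungry) = 0.3, P(cry | hungry) = 0.8, and P(cry | not hungry) = 0.1, calling `posterior(0.3, 0.8, 0.1)` gives 0.24 / 0.31. The hard part noted on the slide is untouched by this sketch: the three input probabilities have to come from somewhere, and deciding which ones to learn is the open problem.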
AGI – Neural net-based • Attempts include using: • Network models carrying out specialized functions modeled on particular brain regions • Cooperative use of a variety of different neural net learning algorithms • Evolving differing net-assemblies and piecing them together • Physical simulation of brain tissue
AGI – Evolutionary • John Holland’s Classifier System • Hybridization of evolutionary algorithms and probabilistic-symbolic AI • Specifically oriented toward integrating memory, perception, and cognition to allow an AI system to act in the world • CAM-Brain machine – Hugo de Garis • Evolving differing net-assemblies and piecing them together
AGI – Artificial Life • Approaches are interesting, but not fruitful above a basic level of cognition • “Creatures” games (social) • “Network Tierra” project (multicellular) • “AlChemy” project (lower biological processes) • No Alife agent with significant general intelligence
AGI - Program search • General approach • Begins with a formal theory of general intelligence • Defines impractical algorithms that are provably known to achieve general intelligence • Then approximate these algorithms with related, but less comprehensive, algorithms • There is actually a solution to AGI, but the search for the algorithm takes a long time • AIXI (Marcus Hutter) can work, but requires infinite memory and infinitely fast processor!
AGI – Integrative • Taking elements from various approaches and creating a combined, synergistic system • Create a unified knowledge representation and dynamics framework • Manifest the core ideas of the various AI paradigms within the universal framework. • Novamente project – Goertzel/Pennachin • Uses Semantic Networks, Genetic Algorithms, Description Logic, Probability integration, Fuzzy Truth values • Uses psynet model of mind http://www.goertzel.org/books/wild/chapPsynet.html
My Approach • Develop two systems • Top down – hand coded AI and KR&R • Bottom up – learn through interaction • Key idea is to use the same representation scheme for both systems • My intent is to develop an environment in which the needs of each system will constrain or expand the representation for the benefit of the other system
My Approach – cont. • Visual modality as input is out • Too costly in processing and effort • This means there will be: • No eye gaze consideration • No visual emotional content • No processing of gestures/pointing • No shape bias benefits • These are serious drawbacks that will need to be compensated for
My Approach – cont. • Verbal processing partly done by Speech Recognition engine • Benefits include • No need for learning to become a “culture dependent language specialist”, it is already done • Drawbacks include: • No verbal emotional cues (face/voice matching active within first few weeks, so might be important) • No distinguishing different people from voice alone
My Approach – cont. • Speech output done by TTS engine • Benefits include: • Speech production learning is unnecessary • We can still produce emotional output in the speech for the benefit of listener
Baby LyLy • Purposes: • Playmate for Snow • Interact with Snow the way Snow interacts • Mimicry – of words, micro-sentences • Association – one word recognized may induce another word • Imitation – repeat last word(s) of longer sentence • Repetition – repeat one word over and over until repeated • Emotional – frustration, happiness, boredom, clinical/serious • Acquire words and micro-sentence use similar to Snow • Only verbal interface for learning • Restrict teaching and feedback to speech • Discover what is necessary in such an interface
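Two of the behaviors listed above, imitation (repeating the last words of a longer sentence) and association (one recognized word inducing another), can be sketched in a few lines. This is only an illustration under simple assumptions, not the Baby LyLy implementation: the class name `BabyAgentSketch`, its methods, and the choice of adjacent-word counts as the association mechanism are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class BabyAgentSketch:
    """Toy speech-only learner: echoes sentence endings and links adjacent words."""

    def __init__(self, max_echo_words: int = 2) -> None:
        # How many trailing words to echo back; assumed to be at least 1.
        self.max_echo_words = max_echo_words
        # associations[a][b] = number of times word b directly followed word a.
        self.associations: Dict[str, Dict[str, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def hear(self, utterance: str) -> None:
        """Update word-to-word association counts from one heard utterance."""
        words = utterance.lower().split()
        for first, second in zip(words, words[1:]):
            self.associations[first][second] += 1

    def imitate(self, utterance: str) -> str:
        """Imitation: repeat only the last word(s) of a longer sentence."""
        words = utterance.lower().split()
        return " ".join(words[-self.max_echo_words:])

    def associate(self, word: str) -> Optional[str]:
        """Association: return the word most often heard right after `word`."""
        followers = self.associations.get(word.lower())
        if not followers:
            return None
        return max(followers, key=followers.get)
```

After hearing "want more milk" and "more milk please", `associate("more")` returns "milk", and `imitate("do you want the red ball")` returns "red ball". Repetition and emotional state would need extra bookkeeping (for instance, a counter of unanswered utterances) that this sketch leaves out.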
Adult LyLy • Purposes: • Productive interaction with OS/humans • Hand-coded knowledge and conversation protocols • Innate knowledge of environment and use of some web • Integration with various tools (like email and schedule) • Specialized grammar and modules • Kitchen - recipe, calorie, timers • Living room – TV schedules, movie database, lyric lookup • Office – web search, question answering, calculator • Bedroom – read stories, play games • Assist in development of high-level knowledge representation and reasoning framework • Needs dictate KR scheme • Needs dictate specialized AI functions and tools to use
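The room-specific modules listed above suggest a simple dispatch structure, sketched below. The room and module names follow the slide, but the trigger keywords, the `ROOM_MODULES` table, and the `route` function are assumptions made for illustration; plain keyword spotting stands in here for the specialized hand-coded grammars.

```python
from typing import Dict, Optional, Tuple

# Room -> module -> trigger keywords. The keywords are invented placeholders.
ROOM_MODULES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "kitchen": {
        "recipe": ("recipe", "cook"),
        "calorie": ("calorie",),
        "timers": ("timer",),
    },
    "living room": {
        "tv schedules": ("tv", "channel"),
        "movie database": ("movie",),
        "lyric lookup": ("lyric", "song"),
    },
    "office": {
        "web search": ("search",),
        "question answering": ("what", "who", "when"),
        "calculator": ("plus", "minus", "times"),
    },
    "bedroom": {
        "read stories": ("story",),
        "play games": ("game",),
    },
}


def route(room: str, utterance: str) -> Optional[str]:
    """Return the first module in `room` whose keyword appears in the utterance.

    Only modules registered for the current room are considered, so the same
    utterance can be handled differently (or ignored) in different rooms.
    """
    words = utterance.lower().split()
    for module, keywords in ROOM_MODULES.get(room.lower(), {}).items():
        if any(keyword in words for keyword in keywords):
            return module
    return None
```

Restricting the active modules by room keeps each grammar small, which is the practical benefit of the per-room split: `route("kitchen", "set a timer for ten minutes")` selects the timers module, while the same request in the bedroom matches nothing.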
Additional Considerations • Forms of communication from computer to human we can exploit (common references) • Visual • Facial emotions and actions • Diagrams, pictures, timelines, representations • Auditory • Speech – with prosody • Recorded sound effects and speech • Other • Sliders/progress bars to indicate intensity of emotion, how well something is understood, certainty