Tangible User Interfaces and Reinforcement Learning (Smart Toys)
An honours thesis presentation by… Trent Apted <tapted@it.usyd.edu.au>
Supervised by A/Prof Bob Kummerfeld
Smart Internet Technology Research Group

Tangible User Interfaces
• Not just a mouse
  • Although he can advance my slides
• Facilitate a more intimate interaction with the user
• Mainly targeted towards children
• Huggable, cute and cuddly
• Develop a relationship with the user
• Play games

Toys - Motivation
• Plush (soft and furry) toys account for around 25% of toy store sales
• Over 17 million Furby toys were sold between October 1998 and December 1999
  • They had primitive learning capabilities
  • Mostly robot-like in appearance
  • They were also relatively cheap (unlike Sony’s Aibo ~$2,000+)

Toys - Challenges
• Want to (cheaply) make a Smart Toy, derived from a plush doll
• Don’t want to adversely affect the original function
  • Namely, being soft, cute and cuddly
• Also want to be able to detect the usual ‘plush toy’ interactions
  • E.g. squeeze, carry, lie down with
• I am not an engineer…

Reinforcement Learning
• Like training a dog with a ‘clicker’
• Need to associate the reward (click) with behaviour in a nearby temporal window
  • How to represent the behaviour
  • How to determine the window
• Apply learning that attempts to maximise all possible future rewards
• Many techniques
  • Q-learning, TD(λ), Bayesian models, Markov models, neural networks, actor-critic, hierarchical
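
The clicker idea can be illustrated with a minimal sketch of tabular Q-learning in which a late click is credited to the last few behaviours. This is not the thesis implementation: the class name, the fixed-length window, the age-based decay of the reward, and the example states and actions are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class ClickerQLearner:
    """Tabular Q-learning with a short history so a late 'click' can be credited.

    Hypothetical sketch: the window length and the age-based reward decay
    are illustrative assumptions, not taken from the presentation.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, window=3):
        self.actions = list(actions)
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.q = defaultdict(float)           # (state, action) -> value
        self.history = deque(maxlen=window)   # recent (state, action, next_state)

    def record(self, state, action, next_state):
        """Remember a behaviour so a later click can still reward it."""
        self.history.append((state, action, next_state))

    def best_value(self, state):
        """Highest estimated value over all actions available in a state."""
        return max(self.q[(state, a)] for a in self.actions)

    def click(self, reward=1.0):
        """Apply a Q-learning update to every step in the window.

        The newest step receives the full reward; older steps receive
        a reward decayed by gamma for each step of age.
        """
        for age, (s, a, s2) in enumerate(reversed(self.history)):
            r = reward * (self.gamma ** age)
            target = r + self.gamma * self.best_value(s2)
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def greedy(self, state):
        """Action with the highest estimated value in a state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])


learner = ClickerQLearner(["wave", "sleep"])
learner.record("squeezed", "wave", "idle")
learner.click()                      # the user rewards the recent behaviour
print(learner.greedy("squeezed"))
```

Choosing the window length is exactly the open question the slide raises: too short and a slow click misses the behaviour, too long and unrelated behaviours get rewarded.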

Reinforcement Learning - Challenges
• Not all techniques can be applied to this scenario
  • Infinite: no end to training examples
  • Interactive: need to wait for the user to determine the reward
  • Discrete: few training examples
  • Future use: a (cheap) toy cannot hold a lot of state
• Sensors are unsophisticated (Boolean)
• Also needs to be fun
  • Non-determinism
  • Anticipate possible actions without stimuli
• It may also not be possible to punish the model
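
Two of these constraints, Boolean sensors with little room for state and the need for some non-determinism, can be sketched as follows. The sensor names are hypothetical, and epsilon-greedy selection is only one common way to add randomness; the presentation does not say which method was used.

```python
import random

# Hypothetical sensor names; the real toy's sensors are not listed in the slides.
SENSORS = ("squeeze", "tilt", "lying_down")


def encode_state(readings):
    """Pack Boolean sensor readings into one small integer (one bit per sensor)."""
    state = 0
    for bit, name in enumerate(SENSORS):
        if readings.get(name, False):
            state |= 1 << bit
    return state


def choose_action(q_row, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: usually the best-valued action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda i: q_row[i])


state = encode_state({"squeeze": True, "lying_down": True})
print(state)                                  # bits 0 and 2 are set
print(choose_action([0.1, 0.9, 0.3], epsilon=0.0))
```

With three Boolean sensors there are only eight possible states, so a full value table stays small enough for a cheap device.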

My Contributions – Hardware / Systems
• Design and implementation of the circuitry and sensors
• Integration into a plush toy
• A hardware/software interface (via parallel port) and event model
• Many lessons learnt
  • E.g. limitations of high-level hardware (PDA)
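
One plausible shape for such an event model is to poll a byte of sensor bits and emit an event whenever a bit changes. The sketch below assumes exactly that; the function names, the bit layout, and the `read_byte` callable standing in for the parallel port read are all hypothetical and not taken from the thesis.

```python
def diff_events(previous, current, names):
    """Compare two sensor bytes and return (name, 'on' | 'off') for each changed bit."""
    changed = previous ^ current
    events = []
    for bit, name in enumerate(names):
        mask = 1 << bit
        if changed & mask:
            events.append((name, "on" if current & mask else "off"))
    return events


def poll(read_byte, names, samples):
    """Read the port `samples` times and yield an event for every bit transition.

    `read_byte` is a stand-in for whatever actually reads the port.
    """
    previous = 0
    for _ in range(samples):
        current = read_byte()
        for event in diff_events(previous, current, names):
            yield event
        previous = current


# Simulated port readings instead of real hardware.
readings = iter([0b00, 0b01, 0b11, 0b10])
for event in poll(lambda: next(readings), ("squeeze", "tilt"), 4):
    print(event)
```

Emitting only transitions rather than raw levels keeps the learner's input sparse, which suits a toy that spends most of its time untouched.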

My Contributions – Software
• Reinforcement learning in the context of a Smart Toy
• Flexible learning architecture for further research and exploration (in other contexts)
• Evaluation of the reinforcement learning techniques implemented
• Implementation of a number of simple games to motivate learning of the toy (fun?)

Some Results and Analysis
• Increasing the state space and re-presenting examples does not help interactive learning
• ‘Snapshot’ environments perform poorly and do not benefit from increasing the learner complexity
• Q-learning combined with Markov models performs well

Future Work
• Improve the abilities of the toy
  • There are spare wires, so a speaker would be easy to add
  • Speech recognition would be harder
• Wireless
  • Remove the tether for more natural interaction
  • Power source and increased expense
• Collaboration
  • ‘Talking’ to other Smart Toys, collaborating in games
  • Collaborative learning
• Examine more learning models
• Psychological / Sociological aspects