Making a Robotic Dog See and Hear. Daniel D. Lee. World of Science 2000.
Face recognition [Figure: original image and alternative images]
Terminator • Arnold is looking for you...
Robots • Hollywood versus reality [Figure: Gort, Data, HAL]
Deep Blue • Computer beats world chess champion Garry Kasparov
Complexity • Tic Tac Toe is easy to program using brute force • Deep Blue evaluated 200 million chess positions per second [Figure: number of configurations, Tic Tac Toe]
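To make the brute-force point concrete, here is a minimal sketch that plays out every possible Tic Tac Toe game by recursion and counts them. The function names are illustrative and not from the talk. The total is small enough that an ordinary computer finishes in about a second, which is what makes brute force workable for this game.

```python
# Rows, columns, and diagonals that win the game.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board=None, player="X"):
    """Count every distinct sequence of moves that ends in a win or a draw."""
    if board is None:
        board = [None] * 9
    if winner(board) is not None or all(cell is not None for cell in board):
        return 1  # the game is over: one finished game
    total = 0
    for square in range(9):
        if board[square] is None:
            board[square] = player  # try the move
            total += count_games(board, "O" if player == "X" else "X")
            board[square] = None    # undo the move
    return total


print(count_games())  # 255168 complete games
```

The same exhaustive search is hopeless for chess, which is why Deep Blue needed both special hardware and selective search.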
Images • Vector representation of pixel values (white = 0.0, black = 1.0). [Figure: pixel vector]
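As a small illustration of the pixel vector, the sketch below flattens a tiny grayscale image into one list of numbers, row by row. The 3 x 3 image and the function name are made up for the example. Only the convention white = 0.0, black = 1.0 comes from the slide.

```python
def image_to_vector(image):
    """Flatten a 2-D grid of pixel values into a single list, row by row."""
    return [float(pixel) for row in image for pixel in row]


# A hypothetical 3 x 3 image of a black plus sign on a white background.
image = [
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
]

vector = image_to_vector(image)
print(vector)
```

An image with 400 pixels becomes a vector with 400 entries in the same way.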
Combinatorial explosion • Impossible for a computer to search all possible images [Figure: number of possible images with 2, 3, and 400 pixels, compared with the age of the universe in seconds]
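A rough calculation shows how quickly the number of images grows. This sketch assumes each pixel is only black or white, so n pixels give 2^n images. It also assumes an age of the universe of about 4.35e17 seconds (roughly 13.8 billion years) and borrows the Deep Blue rate of 200 million evaluations per second quoted earlier in the talk. None of these specific figures are taken from the slide itself.

```python
AGE_OF_UNIVERSE_SECONDS = 4.35e17     # assumption: about 13.8 billion years
EVALUATIONS_PER_SECOND = 200_000_000  # Deep Blue's quoted rate


def count_binary_images(num_pixels):
    """Number of distinct images when each pixel is black or white (assumption)."""
    return 2 ** num_pixels


def universe_ages_to_search(num_pixels):
    """How many ages of the universe an exhaustive search would take."""
    seconds = count_binary_images(num_pixels) / EVALUATIONS_PER_SECOND
    return seconds / AGE_OF_UNIVERSE_SECONDS


for n in (2, 3, 400):
    print(n, "pixels:", count_binary_images(n), "images")

print("ages of the universe for 400 pixels:", universe_ages_to_search(400))
```

Even a tiny 20 x 20 binary image is far beyond exhaustive search, which motivates learning instead of enumeration.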
The brain • Vision occupies a large fraction of our brains
Neurons • Approximately 10^12 neurons in a human brain
Neuronal properties • Neurons communicate with each other using action potentials
Circuit diagram (Felleman & Van Essen, 1991) • Complex and hierarchical organization.
Artificial neuron • Unit sums inputs x with synaptic weights w • Nonlinear transformation [Figure: input activities x1 to x5, synaptic weights w1 to w5, summation (+), squashing function, output]
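The artificial neuron can be sketched in a few lines. The slide only says "squashing function", so the logistic sigmoid used here is an assumption, as are the example numbers.

```python
import math


def squash(activation):
    """Logistic sigmoid (an assumed choice of squashing function).

    Written with tanh so that large inputs do not overflow.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * activation))


def neuron_output(inputs, weights):
    """Weighted sum of the inputs followed by the squashing function."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(x * w for x, w in zip(inputs, weights))
    return squash(total)


# Five input activities x1..x5 and five synaptic weights w1..w5 (made-up values).
x = [1.0, 0.0, 0.5, 1.0, 0.0]
w = [0.8, -0.4, 0.3, -1.2, 0.5]
print(neuron_output(x, w))
```

The output always lies between 0 and 1, no matter how large the weighted sum becomes.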
Artificial neural network • Transformation of input into output. • Change synaptic weights to maximize performance. • Labelled data: input, output. [Figure: network with input layer x1 to xN, hidden layer, output layer t1 and t2, and weights W11 to WNM]
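The transformation of input into output can be sketched as a forward pass through two layers of units. The layer sizes, the weight values, and the logistic squashing function are all assumptions made for the example.

```python
import math


def squash(activation):
    """Logistic sigmoid, an assumed squashing function."""
    return 0.5 * (1.0 + math.tanh(0.5 * activation))


def layer(inputs, weights):
    """Compute one layer; weights[j][i] connects input i to unit j."""
    return [squash(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(inputs, hidden_weights, output_weights):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_weights)
    return layer(hidden, output_weights)


# Hypothetical network: 4 inputs, 3 hidden units, 2 outputs (t1, t2).
hidden_weights = [
    [0.5, -0.2, 0.1, 0.0],
    [-0.3, 0.8, 0.0, 0.4],
    [0.2, 0.2, -0.5, 0.1],
]
output_weights = [
    [1.0, -1.0, 0.5],
    [-0.5, 0.3, 0.9],
]

outputs = forward([1.0, 0.0, 1.0, 0.5], hidden_weights, output_weights)
print(outputs)
```

Learning then means adjusting the numbers in `hidden_weights` and `output_weights` until the outputs match the labelled data.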
Learning • How to set the connections between neurons to have the network do the right thing? [Figure: network with input layer x1 to xN, hidden layer, output layer t1 and t2, and weights W11 to WNM]
Optimization • Gradient ascent: like climbing a mountain blindfolded. • Small steps until top is reached. [Figure: Mount Everest]
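The blindfolded climb can be sketched as follows: at each step, feel the local slope and move a little way uphill. The hill used here is a made-up smooth bump with its top at (1, -2), and the step size and number of steps are arbitrary choices for the example.

```python
def gradient_ascent(gradient, start, step_size=0.1, num_steps=200):
    """Take small uphill steps, using only the local slope at each position."""
    position = list(start)
    for _ in range(num_steps):
        slope = gradient(position)
        position = [p + step_size * g for p, g in zip(position, slope)]
    return position


def hill_height(position):
    """A hypothetical hill whose single peak is at (1, -2)."""
    x, y = position
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2


def hill_gradient(position):
    """Slope of hill_height in the x and y directions."""
    x, y = position
    return [-2.0 * (x - 1.0), -2.0 * (y + 2.0)]


top = gradient_ascent(hill_gradient, start=[5.0, 5.0])
print(top, hill_height(top))
```

On a hill with several peaks the same procedure can stop on a lower summit, since the climber never sees the whole landscape.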
Robotic dog • Doesn’t have a name yet… any suggestions?
Artificial sensorimotor system • Total cost of parts ~ $700 • You too can build your own!
Video processing • Conversion of video images into luminance, color, and motion channels.
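A minimal sketch of such a conversion is shown below. It is not the robot's actual pipeline. It assumes RGB pixels with values between 0 and 1, the common Rec. 601 luminance weights, simple color-difference channels, and motion measured as the change in luminance between two consecutive frames.

```python
def luminance(pixel):
    """Brightness of an (r, g, b) pixel using Rec. 601 weights (an assumption)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def color_difference(pixel):
    """Two color channels: blue minus luminance and red minus luminance."""
    r, g, b = pixel
    y = luminance(pixel)
    return (b - y, r - y)


def split_channels(previous_frame, current_frame):
    """Return luminance, color, and motion channels for the current frame."""
    lum = [[luminance(p) for p in row] for row in current_frame]
    color = [[color_difference(p) for p in row] for row in current_frame]
    prev_lum = [[luminance(p) for p in row] for row in previous_frame]
    # Motion: how much each pixel's brightness changed since the last frame.
    motion = [
        [abs(now - before) for now, before in zip(now_row, before_row)]
        for now_row, before_row in zip(lum, prev_lum)
    ]
    return lum, color, motion


# Two hypothetical 1 x 2 frames: the left pixel turns from black to white.
previous_frame = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
current_frame = [[(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
lum, color, motion = split_channels(previous_frame, current_frame)
print(lum, color, motion)
```

Only the left pixel shows motion, and both pixels have near-zero color channels because they are gray.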
Face recognition neural network • Learns to associate saliency with face.
Unsupervised learning • Database containing many different faces.
Parts representation • Computer automatically decomposes the images into their constituent parts. [Figure: original image X = W V; W: 49 hidden units]
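One technique that produces this kind of parts-based decomposition is non-negative matrix factorization, sketched below with multiplicative updates. The slide does not name the algorithm, so treating X = W V as a non-negative factorization is an assumption, and the toy data and function names are invented for the example. Each column of X is one image, each column of W is a part, and V says how much of each part every image contains.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


def squared_error(x, w, v):
    """Sum of squared differences between X and its reconstruction W V."""
    approx = matmul(w, v)
    return sum((x[i][j] - approx[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def factorize(x, num_parts, num_steps=200, seed=0, eps=1e-9):
    """Approximate X by W V, keeping every entry of W and V non-negative."""
    rng = random.Random(seed)
    num_pixels, num_images = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(num_parts)] for _ in range(num_pixels)]
    v = [[rng.random() + 0.1 for _ in range(num_images)] for _ in range(num_parts)]
    for _ in range(num_steps):
        # Update V: scale each entry by (W^T X) / (W^T W V).
        wt = transpose(w)
        numer, denom = matmul(wt, x), matmul(matmul(wt, w), v)
        v = [[v[a][j] * numer[a][j] / (denom[a][j] + eps)
              for j in range(num_images)] for a in range(num_parts)]
        # Update W: scale each entry by (X V^T) / (W V V^T).
        vt = transpose(v)
        numer, denom = matmul(x, vt), matmul(w, matmul(v, vt))
        w = [[w[i][a] * numer[i][a] / (denom[i][a] + eps)
              for a in range(num_parts)] for i in range(num_pixels)]
    return w, v


# Toy data: 4 pixels, 3 images. Image 3 is the sum of the parts in images 1 and 2.
x = [[1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]
w, v = factorize(x, num_parts=2)
print(squared_error(x, w, v))
```

Because nothing can be subtracted, the parts can only be added together, which tends to give localized pieces rather than whole-image templates.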
Eye movements • Fast eye movements to scan visual environment [Figure: eye muscles] (Yarbus, 1967)
Neural integrator (Pastor et al., 1994)
Vestibular system • Sense of balance and seasickness
Auditory localization (Konishi, 1990) [Figure: barn owl]
Language • Model text document as collections of words. [Figure: text corpus of Doc #1 to Doc #5; one text document shown as the words lazy, brown, fox, dogs, jumped]
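Treating a document as a collection of words can be sketched with a simple word counter. Word order is thrown away and only the counts are kept. The example sentence is an assumption built around the words shown on the slide.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring case, order, and punctuation."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def word_frequencies(text):
    """Word counts divided by the total number of words in the document."""
    counts = bag_of_words(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


document = "The quick brown fox jumped over the lazy dogs"  # assumed example text
counts = bag_of_words(document)
print(counts.most_common(3))
print(word_frequencies(document)["fox"])
```

The resulting counts play the same role for a document that pixel values play for an image.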
Text and images analogy • Represent documents with word frequencies. • Analogy between learning algorithms. • Text corresponds to images: words to pixels, a document to a picture, and word frequency to grayscale intensity.
Learned semantic topics • Grolier encyclopedia: 15276 words, 30991 articles. • Semantic features, word sense disambiguation.
• Word counts for the entry on “Constitution of the United States”: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49).
• Topic: court, government, council, culture, supreme, constitutional, rights, justice.
• Topic: president, served, governor, secretary, senate, congress, presidential, elected.
• Topic: flowers, leaves, plant, perennial, flower, plants, growing, annual.
• Topic: disease, behavior, glands, contact, symptoms, skin, pain, infection.
• Word lists containing “lead”: metal, process, method, paper, …, glass, copper, lead, steel; and person, example, time, people, …, rules, lead, leads, law.
Multimodal integration • Vision, hearing and language combined (Knudson, 1997)
Summary • Adaptation and learning in biological systems important for vision, hearing, motor control. • Mimic neural systems in computer algorithms. • Robotic systems can learn from experience. • But still cannot compete with your family dog or cat...