Advanced Topics in Robotics CS493/790 (X) Lecture 2 Instructor: Monica Nicolescu
Robot Components • Sensors • Effectors and actuators • Used for locomotion and manipulation • Controllers for the above systems • Coordinating information from sensors with commands for the robot’s actuators • Robot = an autonomous system which exists in the physical world, can sense its environment and can act on it to achieve some goals CS 493/790(X) - Lecture 2
Sensors • Sensor = physical device that provides information about the world • Process is called sensing or perception • Robot sensing depends on the task • Sensor (perceptual) space • All possible values of sensor readings • One needs to “see” the world through the robot’s “eyes” • Grows quickly as you add more sensors CS 493/790(X) - Lecture 2
State Space • State: A description of the robot (of a system in general) • State space: All possible states a robot could be in • E.g.: a light switch has two states, ON and OFF; a light switch with a dimmer has a continuous state (possibly infinitely many states) • Different from the sensor/perceptual space! • Internal state may be used to store information about the world (maps, location of “food”, etc.) • How intelligent a robot appears depends strongly on how much and how fast it can sense about its environment and about its own state CS 493/790(X) - Lecture 2
Representation • Internal state that stores information about the world is called a representation or internal model • Self: stored proprioception, goals, intentions, plans • Environment: maps • Objects, people, other robots • Task: what needs to be done, when, in what order • Representations and models influence the complexity of a robot’s “brain” CS 493/790(X) - Lecture 2
Action • Effectors: devices of a robot that have an impact on the environment (legs and wings in animals; robotic legs and propellers in robots) • Actuators: mechanisms that allow the effectors to do their work (muscles in animals; motors in robots) • Robotic actuators are used for • locomotion (moving around, going places) • manipulation (handling objects) CS 493/790(X) - Lecture 2
Autonomy • Autonomy is the ability to make one’s own decisions and act on them • For robots: taking the appropriate action in a given situation • Autonomy can be complete (R2D2) or partial (teleoperated robots) CS 493/790(X) - Lecture 2
Control Architectures • Robot control is the means by which the sensing and action of a robot are coordinated • Controllers enable robots to be autonomous • Play the role of the “brain” and nervous system in animals • Controllers need not (should not) be a single program • Typically there is more than one controller; each processes information from sensors and decides what actions to take • Should control modules be centralized? • Challenge: how do all these controllers coordinate with each other? CS 493/790(X) - Lecture 2
Spectrum of robot control From “Behavior-Based Robotics” by R. Arkin, MIT Press, 1998 CS 493/790(X) - Lecture 2
Robot control approaches • Reactive Control • Don’t think, (re)act. • Deliberative (Planner-based) Control • Think hard, act later. • Hybrid Control • Think and act separately & concurrently. • Behavior-Based Control (BBC) • Think the way you act. CS 493/790(X) - Lecture 2
Thinking vs. Acting • Thinking/Deliberating • slow, speed decreases with complexity • involves planning (looking into the future) to avoid bad solutions • thinking too long may be dangerous • requires (a lot of) accurate information • flexible for increasing complexity • Acting/Reaction • fast, regardless of complexity • innate/built-in or learned (from looking into the past) • limited flexibility for increasing complexity CS 493/790(X) - Lecture 2
Reactive Control: Don’t think, react! • Technique for tightly coupling perception and action to provide fast responses to changing, unstructured environments • Collection of stimulus-response rules • Limitations • No/minimal state • No memory • No internal representations of the world • Unable to plan ahead • Unable to learn • Advantages • Very fast and reactive • Powerful method: animals are largely reactive CS 493/790(X) - Lecture 2
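To make the stimulus-response idea concrete, here is a minimal reactive-controller sketch in Python; the sensor and actuator names (front_distance, left_distance, set_velocity) are hypothetical placeholders rather than any specific robot API.

```python
# Minimal reactive controller sketch: a fixed set of stimulus-response rules.
# Sensor and actuator names (front_distance, left_distance, set_velocity) are
# hypothetical placeholders, not a specific robot API.

def reactive_step(front_distance, left_distance, set_velocity):
    """Map the current sensor readings directly to an action (no memory)."""
    if front_distance < 0.3:          # obstacle ahead -> turn away
        set_velocity(linear=0.0, angular=1.0)
    elif left_distance < 0.3:         # too close to the wall -> veer right
        set_velocity(linear=0.2, angular=-0.5)
    else:                             # default rule -> go forward
        set_velocity(linear=0.5, angular=0.0)

# The controller is just this rule table applied at every sensing cycle;
# it never plans ahead and keeps no internal representation of the world.
```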
Deliberative Control: Think hard, then act! • In deliberative control the robot uses all the available sensory information and stored internal knowledge to create a plan of action: the sense → plan → act (SPA) paradigm • Limitations • Planning requires search through potentially all possible plans, which takes a long time • Requires a world model, which may become outdated • Too slow for real-time response • Advantages • Capable of learning and prediction • Finds strategic solutions CS 493/790(X) - Lecture 2
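For contrast with the reactive rules above, a schematic sense → plan → act loop might look like the sketch below; sense, build_world_model, plan, and execute are hypothetical stand-ins that only illustrate the structure of the SPA cycle.

```python
# Schematic sense-plan-act (SPA) loop. All functions are hypothetical
# placeholders illustrating the structure, not a real planning library.

def spa_loop(sense, build_world_model, plan, execute, goal):
    while True:
        readings = sense()                    # sense: gather all sensor data
        world = build_world_model(readings)   # update the internal world model
        steps = plan(world, goal)             # search for a complete plan (slow)
        for action in steps:
            execute(action)                   # act only after planning finishes
```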
Hybrid Control: Think and act independently & concurrently! • Combination of reactive and deliberative control • Reactive layer (bottom): deals with immediate reaction • Deliberative layer (top): creates plans • Middle layer: connects the two layers • Usually called “three-layer systems” • Major challenge: design of the middle layer • Reactive and deliberative layers operate on very different time-scales and representations (signals vs. symbols) • These layers must operate concurrently • Currently one of the two dominant control paradigms in robotics CS 493/790(X) - Lecture 2
Behavior-Based Control: Think the way you act! • Behaviors: concurrent processes that take inputs from sensors and other behaviors and send outputs to a robot’s actuators or other behaviors in order to achieve some goals • An alternative to hybrid control, inspired by biology • Has the same capabilities as hybrid control: • Act reactively and deliberatively • Also built from layers • However, there is no intermediate layer • Components have a uniform representation and time-scale CS 493/790(X) - Lecture 2
Fundamental Differences of Control • Time-scale: How fast do things happen? • how quickly the robot has to respond to the environment, compared to how quickly it can sense and think • Modularity: What are the components of the control system? • Refers to the way the control system is broken up into modules and how they interact with each other • Representation: What does the robot keep in its brain? • The form in which information is stored or encoded in the robot CS 493/790(X) - Lecture 2
Behavior Coordination • Behavior-based systems require consistent coordination between the component behaviors for conflict resolution • Coordination of behaviors can be: • Competitive: one behavior’s output is selected from multiple candidates • Cooperative: blend the output of multiple behaviors • Combination of the above two CS 493/790(X) - Lecture 2
Competitive Coordination • Arbitration: a winner-take-all strategy in which only one response is chosen • Behavioral prioritization • Subsumption Architecture • Action selection/activation spreading (Pattie Maes) • Behaviors actively compete with each other • Each behavior has an activation level driven by the robot’s goals and sensory information • Voting strategies • Behaviors cast votes on potential responses CS 493/790(X) - Lecture 2
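As a concrete illustration of winner-take-all arbitration, the sketch below gives each behavior a fixed priority and executes the first one that proposes an action; the behavior names and sensor readings are made up for illustration and do not reproduce any particular architecture.

```python
# Fixed-priority (winner-take-all) arbitration sketch. Behaviors return an
# action or None; the first non-None proposal in priority order is executed.
# Behavior names and readings are illustrative, not a specific architecture.

def avoid_obstacle(readings):
    return {"turn": 1.0} if readings["front"] < 0.3 else None

def follow_wall(readings):
    return {"forward": 0.3, "turn": -0.2} if readings["left"] < 0.5 else None

def wander(readings):
    return {"forward": 0.5}          # always has an opinion (lowest priority)

BEHAVIORS = [avoid_obstacle, follow_wall, wander]   # highest priority first

def arbitrate(readings):
    for behavior in BEHAVIORS:
        action = behavior(readings)
        if action is not None:       # winner-take-all: first non-None wins
            return action
```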
Cooperative Coordination • Fusion: concurrently use the output of multiple behaviors • The major difficulty is finding a uniform command representation amenable to fusion • Fuzzy methods • Formal methods • Potential fields • Motor schemas • Dynamical systems CS 493/790(X) - Lecture 2
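One common fusion scheme is the potential-field / motor-schema style shown in the sketch below, where every behavior outputs a motion vector and the weighted vectors are summed into a single command; the gains and the two-behavior setup are illustrative assumptions.

```python
# Cooperative fusion sketch (potential-field / motor-schema style):
# every behavior outputs a 2D motion vector and the weighted vectors are
# summed into one command. Gains and the behavior set are illustrative only.

import math

def goal_attraction(robot_xy, goal_xy):
    dx, dy = goal_xy[0] - robot_xy[0], goal_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy) or 1e-6
    return (dx / dist, dy / dist)                 # unit vector toward the goal

def obstacle_repulsion(robot_xy, obstacle_xy):
    dx, dy = robot_xy[0] - obstacle_xy[0], robot_xy[1] - obstacle_xy[1]
    dist = math.hypot(dx, dy) or 1e-6
    strength = 1.0 / dist ** 2                    # stronger when closer
    return (strength * dx / dist, strength * dy / dist)

def fuse(robot_xy, goal_xy, obstacle_xy, w_goal=1.0, w_obst=0.5):
    gx, gy = goal_attraction(robot_xy, goal_xy)
    ox, oy = obstacle_repulsion(robot_xy, obstacle_xy)
    return (w_goal * gx + w_obst * ox, w_goal * gy + w_obst * oy)
```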
Example of Behavior Coordination • Fusion: flocking (formations) • Arbitration: foraging (search, coverage) CS 493/790(X) - Lecture 2
Learning & Adaptive Behavior • Learning produces changes within an agent that over time enable it to perform more effectively within its environment • Adaptation refers to an agent’s learning by making adjustments in order to be more attuned to its environment • Phenotypic (within an individual agent) or genotypic (evolutionary) • Acclimatization (slow) or homeostasis (rapid) CS 493/790(X) - Lecture 2
Learning Learning can improve performance in additional ways: • Introduce new knowledge (facts, behaviors, rules) • Generalize concepts • Specialize concepts for specific situations • Reorganize information • Create or discover new concepts • Create explanations • Reuse past experiences CS 493/790(X) - Lecture 2
Learning Methods • Reinforcement learning • Neural network (connectionist) learning • Evolutionary learning • Learning from experience • Memory-based • Case-based • Learning from demonstration • Inductive learning • Explanation-based learning • Multistrategy learning CS 493/790(X) - Lecture 2
Reinforcement Learning (RL) • Motivated by psychology (the Law of Effect, Thorndike 1911): Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response decreases that probability • One of the most widely used methods for adaptation in robotics CS 493/790(X) - Lecture 2
Reinforcement Learning • Goal: learn an optimal policy that chooses the best action for every set of possible inputs • Policy: state/action mapping that determines which actions to take • Desirable outcomes are strengthened and undesirable outcomes are weakened • Critic: evaluates the system’s response and applies reinforcement • external: the user provides the reinforcement • internal: the system itself provides the reinforcement (reward function) CS 493/790(X) - Lecture 2
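A tabular Q-learning update is one concrete instance of this idea: the critic’s reward strengthens or weakens the value of the action just taken, and the policy reads the best action out of the table. The sketch below is a minimal illustration; the learning rate, discount, and exploration values are illustrative defaults, not values from the lecture.

```python
# Minimal tabular Q-learning sketch. States, actions, and the reward signal
# come from the task; the numbers below are illustrative defaults.

import random
from collections import defaultdict

Q = defaultdict(float)                      # Q[(state, action)] -> estimated value
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # learning rate, discount, exploration

def choose_action(state, actions):
    """Epsilon-greedy policy: mostly exploit the table, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """Strengthen or weaken Q[(state, action)] based on the critic's reward."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```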
Learning to Walk • Maes, Brooks (1990) • Genghis: hexapod robot • Learned stable tripod stance and tripod gait • Rule-based subsumption controller • Two sensor modalities for feedback: • Two touch sensors to detect hitting the floor: negative feedback • Trailing wheel to measure progress: positive feedback CS 493/790(X) - Lecture 2
Learning to Walk • Nate Kohl & Peter Stone (2004) CS 493/790(X) - Lecture 2
Supervised Learning • Supervised learning requires the user to give the exact solution to the robot in the form of the error direction and magnitude • The user must know the exact desired behavior for each situation • Supervised learning involves training, which can be very slow; the user must supervise the system with numerous examples CS 493/790(X) - Lecture 2
Neural Networks • One of the most widely used supervised learning methods • Used for approximating real-valued and vector-valued target functions • Inspired by biology: learning systems are built from complex networks of interconnected neurons • The goal is to minimize the error between the network output and the desired output • This is achieved by adjusting the weights on the network connections CS 493/790(X) - Lecture 2
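The weight-adjustment idea can be illustrated with a single linear unit trained by gradient descent on the squared error; a full neural network backpropagates the same error signal through many layers. The sketch below is a minimal illustration with made-up training data.

```python
# One-neuron gradient-descent sketch: adjust weights to reduce the squared
# error between the unit's output and the desired (supervised) output.
# A real neural network backpropagates this error through many layers.

def train_linear_unit(samples, n_inputs, lr=0.01, epochs=100):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:           # supervised pairs (x, desired y)
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - output              # error signal from the "teacher"
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error                   # move weights to reduce the error
    return weights, bias

# Example: approximately learn y = 2*x1 - x2 from a few supervised samples.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
print(train_linear_unit(data, n_inputs=2))
```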
ALVINN • ALVINN (Autonomous Land Vehicle in a Neural Network) • Dean Pomerleau (1991) • Pittsburgh to San Diego: 98.2% autonomous CS 493/790(X) - Lecture 2
Learning from Demonstration & RL • S. Schaal (’97) • Pole balancing, pendulum-swing-up CS 493/790(X) - Lecture 2
Learning from Demonstration • Inspiration: human-like teaching by demonstration • (Demonstration vs. robot performance) CS 493/790(X) - Lecture 2
Learning from Robot Teachers • Transfer of task knowledge from humans to robots • (Human demonstration vs. robot performance) CS 493/790(X) - Lecture 2
Classical Conditioning • Pavlov 1927 • Assumes that unconditioned stimuli (e.g., food) automatically generate an unconditioned response (e.g., salivation) • A conditioned stimulus (e.g., ringing a bell) can, over time, become associated with the unconditioned stimulus and come to elicit the same response CS 493/790(X) - Lecture 2
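A standard computational account of how this association builds up is the Rescorla-Wagner model (Rescorla & Wagner, 1972), which is not named on the slide but captures the same idea: the associative strength of the conditioned stimulus grows toward an asymptote set by the unconditioned stimulus. A minimal sketch:

```python
# Sketch of associative-strength learning in the Rescorla-Wagner style
# (a standard model of classical conditioning; not taken from the slides).
# V is the strength of the bell->salivation association; lam is the maximum
# strength supported by the food (unconditioned stimulus).

def condition(trials, alpha=0.3, lam=1.0):
    V = 0.0
    history = []
    for _ in range(trials):
        V += alpha * (lam - V)        # association grows toward its asymptote
        history.append(V)
    return history

print(condition(10))                  # strength rises with repeated pairings
```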
Darwin’s Perceptual Categorization • Two types of stimulus blocks • 6 cm metallic cubes • Blobs: low conductivity (“bad taste”) • Stripes: high conductivity (“good taste”) • Instead of hard-wiring stimulus-response rules, develop these associations over time • (Figures: early training; after the 10th stimulus) CS 493/790(X) - Lecture 2
Genetic Algorithms • Inspired by evolutionary biology • Individuals in a population have a particular fitness with respect to a task • Individuals with the highest fitness are kept as survivors • Individuals with poor performance are discarded: the process of natural selection • Evolutionary process: search through the space of solutions to find the one with the highest fitness CS 493/790(X) - Lecture 2
Genetic Operators • Knowledge is encoded as bit strings: chromosomes • Each bit represents a “gene” • Biologically inspired operators are applied to yield better generations CS 493/790(X) - Lecture 2
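A minimal sketch of the resulting loop over bit-string chromosomes is shown below; the toy fitness function (count of 1-bits), population size, and mutation rate are illustrative assumptions.

```python
# Minimal genetic-algorithm sketch over bit-string chromosomes.
# The fitness function (count of 1-bits), rates, and sizes are illustrative.

import random

def fitness(chromosome):
    return sum(chromosome)                         # toy fitness: number of 1s

def crossover(a, b):
    point = random.randrange(1, len(a))            # single-point crossover
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.01):
    return [1 - g if random.random() < rate else g for g in chromosome]

def evolve(pop_size=20, length=16, generations=50):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]     # natural selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

print(evolve())                                    # best chromosome found
```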
Evolving Structure and Control • Karl Sims 1994 • Evolved morphology and control for virtual creatures performing swimming, walking, jumping, and following • Genotypes encoded as directed graphs are used to produce 3D kinematic structures • Genotypes encode points of attachment • Sensors used: contact sensors, joint-angle sensors and photosensors CS 493/790(X) - Lecture 2
Evolving Structure and Control • Jordan Pollack • Real structures CS 493/790(X) - Lecture 2
Readings • F. Martin: Sections 1.1, 1.2.3, 5 • M. Matarić: Chapters 1, 3, 10 CS 493/790(X) - Lecture 2