Quantizing Behavioral Heterogeneity. Jon Beckham 11/21/02. Papers to Cover. “Measuring Robot Group Diversity”, Balch “Design & Evaluation of Robust Behavior-Based Controllers”, Goldberg & Mataric
Papers to Cover • “Measuring Robot Group Diversity”, Balch • “Design & Evaluation of Robust Behavior-Based Controllers”, Goldberg & Mataric • “Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning”, Zinkevich & Balch
Quantizing • “Measuring Robot Group Diversity”, Tucker Balch
Purpose • To suggest a standard way of quantitatively measuring diversity. • Allows for more accurate, effective analysis. • By establishing a standard metric, we can establish a baseline for comparison.
Sources • Simple Social Entropy • Adapted from Shannon’s Information Entropy • Behavioral Difference • Quantitative measure between different robots. • Hierarchic Social Entropy • Combination of the above.
Diversity • To quote Tucker, who quotes Webster… di·verse adj. 1: differing from one another: unlike. 2: composed of distinct or unlike elements or qualities.
The Discrete Approach • Assume robots are either alike or different; thus assume subsets of identical robots.
Simple Social Entropy • First, some notation: • $R$ is a society of $N$ agents, thus $R = \{r_1, r_2, \ldots, r_N\}$ • $C$ is a classification of $R$ into $M$ subsets • $c_i$ is an individual subset of $C$ • Thus $C = \{c_1, c_2, \ldots, c_M\}$ • $p_i$ is the proportion of agents in the $i$th subset. • Thus, the sum of all $p_i$ is 1.
Social Entropy’s Requirements • Continuous ($H$ must be continuous in the $p_i$) • Monotonic (with all $p_i$ equal, $H$ must be a monotonically increasing function of $M$) • Recursive ($H$ must be the weighted sum of the $H$ of its subsets) • $H = 0$ when the system is homogeneous • $H$ is maximized when all $p_i$ are equal for a given $M$ • Any change to the $p_i$ that moves them toward equality increases $H$.
Thus… • $H(X) = -K \sum_{i=1}^{M} p_i \log_2(p_i)$ • REMEMBER THIS! • Also know that it’s the only equation to satisfy the first three properties (as proven by Shannon in his information entropy work).
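The formula above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `simple_social_entropy`, the use of one class label per robot to represent the classification C, and the default K = 1 are all assumptions made for the example.

```python
import math
from collections import Counter


def simple_social_entropy(labels, k=1.0):
    """Return H = -K * sum(p_i * log2(p_i)) for a classified society.

    `labels` holds one class label per robot (hypothetical representation
    of the classification C); `k` is the positive constant K.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("society must contain at least one agent")
    # p_i is the proportion of agents that fall in the i-th subset.
    proportions = [count / n for count in Counter(labels).values()]
    return k * sum(-p * math.log2(p) for p in proportions)


# A homogeneous group has zero entropy; an even split maximizes it for a given M.
print(simple_social_entropy(["forager"] * 4))
print(simple_social_entropy(["forager", "forager", "guard", "guard"]))
```

With K = 1 the result is measured in bits, so two equally sized castes give an entropy of 1 and four singleton castes give 2.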
Limitations of Simple Social Entropy • Loses data by munging $p_i$ and $M$ into a single value. • Only works for discrete systems.
What About C? • The classification into subsets… • Taxonomy • Clustering
More on Taxonomy • Classification at varying levels through a “dendrogram”.
Which Brings Us To Hierarchic Social Entropy • Simple Social Entropy is only a “snapshot” at a particular level of clustering. • To achieve a continuous metric, we use a plot of entropy at all taxonomic levels. • Good because it gives data at all clustering resolutions, putting to rest the clustering issue.
Another Formula • This time for hierarchic social entropy. • $S(R) = \int_0^{\infty} H(R, h)\,dh$
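One way to evaluate this integral numerically is sketched below. It assumes a symmetric matrix `dist` of pairwise behavioral differences and uses plain single-linkage clustering at level h as a stand-in for the taxonomy; the slides do not specify the clustering method, so that choice and every function name here are assumptions. Because H(R, h) only changes where h crosses one of the pairwise differences, the integral reduces to a finite sum of rectangles.

```python
import math
from itertools import combinations


def entropy_from_sizes(sizes):
    """Simple social entropy (K = 1) from a list of cluster sizes."""
    n = sum(sizes)
    return sum(-(s / n) * math.log2(s / n) for s in sizes)


def cluster_sizes_at_level(dist, h):
    """Single-linkage clusters (an assumed method): robots are merged
    whenever a chain of pairwise differences <= h connects them."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(n), 2):
        if dist[a][b] <= h:
            parent[find(a)] = find(b)
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return list(sizes.values())


def hierarchic_social_entropy(dist):
    """Integrate H(R, h) over h; H is constant between consecutive
    distinct pairwise differences and zero past the largest one."""
    n = len(dist)
    levels = sorted({dist[a][b] for a, b in combinations(range(n), 2)} | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        total += entropy_from_sizes(cluster_sizes_at_level(dist, lo)) * (hi - lo)
    return total


# Two robots whose behaviors differ by 0.5: H = 1 bit for h in [0, 0.5).
print(hierarchic_social_entropy([[0.0, 0.5], [0.5, 0.0]]))
```

The upper limit of the integral causes no trouble in practice: once h reaches the largest linking difference, every robot sits in one cluster and the integrand is zero.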
Branching the Taxonomy? • How to get that pretty 2D mapping… • Evaluation Chamber? • In real world, this requires: • Fixed policies • Mechanically Homogeneous • Policy is reflected directly in overt behavior
Placing Numerical Value on Behavioral Differences • More notation • $i$ is a robot’s perceptual state • $a$ is the action (behavioral assemblage) selected by a robot’s control system based on the input $i$. • $\pi_j$ is $r_j$’s policy; $a = \pi_j(i)$ • $p_i^j$ is the number of times $r_j$ has encountered perceptual state $i$ divided by the total number of times all states have been encountered
Simple Behavioral Difference Metric • Continuous • $D'(r_a, r_b) = \frac{1}{n} \int |\pi_a(i) - \pi_b(i)| \, di$ • Discrete • $D'(r_a, r_b) = \frac{1}{n} \sum_i |\pi_a(i) - \pi_b(i)|$ ($\frac{1}{n}$ is the normalization factor)
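The discrete form can be sketched as follows. The slides do not say how the difference between two actions is scored, so this example assumes a 0/1 comparison (0 when both robots pick the same behavior in a state, 1 otherwise) and represents each policy as a dictionary from perceptual state to behavior name; both choices, and the function name, are illustrative.

```python
def simple_behavioral_difference(policy_a, policy_b):
    """Discrete D'(r_a, r_b): fraction of perceptual states in which
    the two policies choose different behaviors.

    Assumes |pi_a(i) - pi_b(i)| is 0 for equal actions and 1 otherwise.
    """
    if policy_a.keys() != policy_b.keys():
        raise ValueError("policies must cover the same perceptual states")
    n = len(policy_a)
    if n == 0:
        raise ValueError("policies must cover at least one state")
    return sum(policy_a[i] != policy_b[i] for i in policy_a) / n


# Hypothetical policies over four perceptual states.
r1 = {"s0": "wander", "s1": "acquire_red", "s2": "deliver_red", "s3": "wander"}
r2 = {"s0": "wander", "s1": "acquire_blue", "s2": "deliver_red", "s3": "wander"}
print(simple_behavioral_difference(r1, r2))
```

Every state counts equally here, including states a robot never actually visits.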
Behavioral Difference • Continuous • $D(r_a, r_b) = \int \frac{p_i^a + p_i^b}{2} \, |\pi_a(i) - \pi_b(i)| \, di$ • Discrete • $D(r_a, r_b) = \sum_i \frac{p_i^a + p_i^b}{2} \, |\pi_a(i) - \pi_b(i)|$
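A sketch of the discrete, frequency-weighted form follows. It assumes each robot keeps a count of how often it has visited each perceptual state, that the state frequencies are normalized per robot, and that the action difference is scored 0/1; the slides leave these details open, and all names are hypothetical.

```python
def state_frequencies(visits):
    """Turn raw visit counts into frequencies that sum to 1."""
    total = sum(visits.values())
    if total == 0:
        raise ValueError("robot has not visited any state")
    return {state: count / total for state, count in visits.items()}


def behavioral_difference(policy_a, policy_b, visits_a, visits_b):
    """Discrete behavioral difference: each state's action mismatch is
    weighted by the mean of the two robots' frequencies for that state.

    Assumes a 0/1 action difference; unvisited states get frequency 0.
    """
    freq_a = state_frequencies(visits_a)
    freq_b = state_frequencies(visits_b)
    return sum(
        (freq_a.get(i, 0.0) + freq_b.get(i, 0.0)) / 2 * (policy_a[i] != policy_b[i])
        for i in policy_a
    )


# The robots disagree only in a state that both rarely encounter.
pol_a = {"common": "wander", "rare": "acquire_red"}
pol_b = {"common": "wander", "rare": "acquire_blue"}
print(behavioral_difference(pol_a, pol_b,
                            {"common": 9, "rare": 1},
                            {"common": 7, "rare": 3}))
```

The weighting means a disagreement in a rarely visited state contributes little, whereas the unweighted metric would score these two robots as differing in half of their states.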
Definitions • Absolutely behaviorally equivalent • Iff two robots select the same behavior in every perceptual state. • ε-equivalent if $D(r_a, r_b) < \varepsilon$. • $\equiv_\varepsilon$ indicates ε-equivalence • A group of robots, $R$, is ε-homogeneous if for all $r_a, r_b$ in $R$, $r_a \equiv_\varepsilon r_b$.
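These definitions translate directly into a pairwise check. The sketch below takes any difference function as a parameter, so it makes no assumption about how D is computed; the function names are illustrative.

```python
from itertools import combinations


def is_epsilon_equivalent(robot_a, robot_b, difference, epsilon):
    """True when D(r_a, r_b) < epsilon for the supplied difference function."""
    return difference(robot_a, robot_b) < epsilon


def is_epsilon_homogeneous(robots, difference, epsilon):
    """True when every pair of robots in the group is epsilon-equivalent."""
    return all(
        is_epsilon_equivalent(a, b, difference, epsilon)
        for a, b in combinations(robots, 2)
    )
```

Every pair has to be tested because ε-equivalence is not transitive: two robots can each be within ε of a third and still differ from each other by more than ε.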
Experiments (briefly) • Multiforaging • Behaviors • wander • stay_near_home • acquire_red • acquire_blue • deliver_red • deliver_blue • Perceptual Features • red_visible • blue_visible • red_visible_outside_homezone • blue_visible_outside_homezone • red_in_gripper • blue_in_gripper • close_to_homezone • close_to_red_bin • close_to_blue_bin
Methods • Local performance-based reinforcement • Global performance-based reinforcement • Local shaped reinforcement
Summary • Diversity is good in soccer, bad in simple foraging. • Diversity • Globally Rewarded, most diverse • Locally Rewarded • Shaped, least diverse
Conclusions • Diversity as an independent variable • Simple social entropy • Hierarchic social entropy
Problems? • Only deterministic policies • Analysis limited to behavioral diversity
Applying • “Design and Evaluation of Robust Behavior-Based Controllers”, Dani Goldberg and Maja J. Mataric
The Goal • To design multirobot controllers that: • Exhibit group-level robustness to robot failures and noise. • Are easily modified.
Focus • Simple Foraging
Controllers • One Homogeneous • Two Heterogeneous • Pack • Caste
Homogeneous Controller • Act concurrently and independently. • Behaviors • Avoiding • Wandering • Puck Detecting • Puck Grabbing • Homing • Boundary • Buffer • Creeping • Home Detector • Exiting • Reverse Homing • Heading
Heterogeneous Pack Controller • Uses temporal arbitration • SPST → SPDT • Dominance hierarchy based on capabilities or arbitrary assignment • Only one robot can deliver a puck at a time • Same controller as homogeneous, but uses ‘message passing’ to figure out which robot should deliver first. • Uses communication to determine failed or active.
Heterogeneous Caste Controller • Uses spatial arbitration • SPST → DPST • Robots are differentiated into sub-groups or castes • Act concurrently and independently, but in different regions of the task space • May have heterogeneous behavior in addition to spatial heterogeneity • No reliance on communication • (Not implemented, but communication could be used to balance caste ratios in case of failure.)
Interference Graphs • Homogeneous • Heterogeneous Pack • Heterogeneous Caste
Analysis Metrics • Inter-robot collisions • Distance traveled by each robot • Time-to-completion
Statistics… Goldberg & Mataric: “We have performed hypothesis tests using Student’s t, 1-factor analysis of variance (ANOVA), and 2-factor ANOVA, in order to verify that the differences between the results of the implementations were in fact statistically significant.” Tucker:
Conclusions • Attempted to apply Balch’s SSE and HSE, but because of vague definitions no clear conclusion could be reached. • Attempted several calculations, but no conclusive relation to performance. • Partly because no best controller.
Flaws • Use of communication in Pack controller, but nowhere else. • Allowed pack controller to keep track of state of other robots (working or non-working).