Introduction to Collectives
Kagan Tumer
NASA Ames Research Center
kagan@ptolemy.arc.nasa.gov
http://ic.arc.nasa.gov/~kagan
http://ic.arc.nasa.gov/projects/COIN/index.html
(Joint work with David Wolpert)
Outline
• Introduction to collectives
  • Definition / Motivation
  • A naturally occurring example
• Illustration of theory of collectives I
  • Central equation of collectives
• Interlude 1:
  • Autonomous defects problem (Johnson and Challet)
• Illustration of theory of collectives II
  • Aristocrat utility
  • Wonderful life utility
• Interlude 2:
  • El Farol bar problem: System equilibria and global optima
  • Collective of rovers: Scientific return maximization
• Final thoughts
K. Tumer
Motivation
• Most complex systems not only can be, but need to be, viewed as collectives. Examples include:
  • Control of a constellation of communication satellites
  • Routing data/vehicles over a communication network/highway
  • Dynamic data migration over large distributed databases
  • Dynamic job scheduling across a (very) large computer grid
  • Coordination of rovers/submersibles on Mars/Europa
  • Control of the elements of an amorphous computer/telescope
  • Construction of parallel algorithms for optimization problems
  • Autonomous defects problem
Collectives
• A collective is:
  • A (perhaps massive) set of agents;
  • All of which have “personal” utilities they are trying to achieve;
  • Together with a world utility function measuring the full system’s performance.
• Given that the agents are good at optimizing their personal utilities, the crucial problem is an inverse problem: How should one set (and potentially update) the personal utility functions of the agents so that they “cooperate unintentionally” and optimize the world utility?
Natural Example: Human Economy
• World utility is GDP
• Agents are the individual humans
• Agents try to maximize their own “personal” utilities
• Design problem is:
  • How to modify the personal utilities of the agents through incentives or regulations (e.g., tax breaks, SEC regulations against insider trading, antitrust laws) to achieve high GDP?
  • Note: A. Greenspan does not tell each individual what to do.
• Economics is hamstrung by “pre-set agents”
• No such restrictions apply to an artificial collective
Nomenclature
• h: an agent
• z: state of all agents across all time
• z_{h,t}: state of agent h at time t
• z_{^h,t}: state of all agents other than h at time t
Key Concepts for Collectives
• Factoredness: Degree to which an agent’s personal utility is aligned with the world utility (e.g., quantifies the “if you get rich, the world benefits” concept).
• Intelligence: Percentage of states that would have resulted in agent h having a worse utility (e.g., an SAT-like percentile concept).
• Learnability: Signal-to-noise measure. Quantifies how sensitive an agent’s personal utility function is to a change in its state.
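The percentile reading of intelligence can be made concrete with a small sketch. This is only an illustration under assumptions that are not on the slide: the joint state is encoded as a tuple of action indices, "worse" is taken as strictly lower utility, and `toy_utility` is a made-up utility.

```python
from typing import Callable, Sequence, Tuple

JointState = Tuple[int, ...]  # hypothetical encoding: one action index per agent


def intelligence(
    utility: Callable[[JointState], float],
    z: JointState,
    h: int,
    alternatives: Sequence[int],
) -> float:
    """Fraction of agent h's alternative actions that give a strictly worse utility."""
    actual = utility(z)
    worse = sum(
        1 for alt in alternatives if utility(z[:h] + (alt,) + z[h + 1:]) < actual
    )
    return worse / len(alternatives)


def toy_utility(z: JointState) -> float:
    """Made-up utility: best when the actions sum to 3."""
    return -abs(sum(z) - 3)


# Agent 0 currently plays action 1; three of its four options would do worse.
score = intelligence(toy_utility, (1, 2, 0), h=0, alternatives=[0, 1, 2, 3])
print(score)
```

Calling the same function once with the personal utility g and once with the world utility G gives the two intelligences that the next slide compares.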
Central Equation of Collectives
• Our ability to control the system consists of setting some parameters s (e.g., agents' goals):
• ε_G and ε_g are intelligences for the agents w.r.t. the world utility (G) and their personal utilities (g), respectively
• Term labels on the slide: Explore vs. Exploit, Factoredness, Learnability
• Related fields named on the slide: Operations Research, Economics, Machine Learning
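The equation itself is not reproduced in this transcript. In Wolpert and Tumer's published work on collectives it is usually given in the form below; the vector notation and the ordering of the factors follow that literature and are an assumption about what the slide showed.

```latex
P(G \mid s) \;=\; \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s)
\int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s)
```

Read this way, the first factor is the explore vs. exploit term, the second is the factoredness term (how well high intelligence with respect to g translates into high intelligence with respect to G), and the third is the learnability term (how likely the agents are to achieve high intelligence with respect to g).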
Autonomous Defects Problem
• Given a collection of faulty devices, how to choose the subset of those devices that, when combined with each other, gives optimal performance (Johnson & Challet).
  • a_j: distortion of component j
  • n_k: action of agent k (n_k = 0 or 1)
• Collective approach: Identify each agent with a component.
• Question: what utility should each agent try to maximize?
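A small sketch of the selection problem may help. The objective below, |Σ n_j a_j| / Σ n_j, is the aggregate distortion usually associated with the Challet and Johnson formulation; it is assumed here because the slide's equation is not preserved. The brute-force search is only for illustration on tiny inputs and is not the collective-based method the talk describes.

```python
from itertools import product
from typing import Sequence, Tuple


def aggregate_distortion(a: Sequence[float], n: Sequence[int]) -> float:
    """Assumed objective: |sum_j n_j a_j| / sum_j n_j (infinite if nothing is chosen)."""
    chosen = sum(n)
    if chosen == 0:
        return float("inf")
    return abs(sum(nk * ak for nk, ak in zip(n, a))) / chosen


def best_subset(a: Sequence[float]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive search over all 2^N choices of n; feasible only for small N."""
    best = min(
        product((0, 1), repeat=len(a)),
        key=lambda n: aggregate_distortion(a, n),
    )
    return best, aggregate_distortion(a, best)


# Hypothetical distortions in arbitrary units.
choice, error = best_subset([5, -4, 3, -1])
print(choice, error)
```

Exhaustive search scales as 2^N, which is why the N=100 and N=1000 cases on the following slides call for a distributed approach in which each component decides for itself.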
Autonomous Defects Problem (N=100)
Autonomous Defects Problem (N=1000)
Autonomous Defects Problem: Scaling
Personal Utility
• Recall the central equation, in particular its learnability and factoredness terms
• Solve for the personal utility g that maximizes learnability, while constrained to the set of factored utilities
Aristocrat Utility
• One can solve for the factored U with maximal learnability, i.e., a U with good terms 2 and 3 in the central equation:
• Intuitively, AU reflects the difference between the actual G and the average G (averaged over all actions you could take).
• For simplicity, when evaluating AU here, we make the following approximation: p_i(z_h) = 1 / (number of possible actions for h)
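The intuition above, actual G minus G averaged over the agent's own actions, can be sketched in a few lines using the uniform approximation from the slide. The tuple encoding of the joint state and the toy world utility are assumptions made for the example, not part of the talk.

```python
from typing import Callable, Sequence, Tuple

JointState = Tuple[int, ...]  # hypothetical encoding: one action index per agent


def aristocrat_utility(
    G: Callable[[JointState], float],
    z: JointState,
    h: int,
    actions: Sequence[int],
) -> float:
    """AU for agent h: G(z) minus G averaged uniformly over h's possible actions."""
    average = sum(G(z[:h] + (a,) + z[h + 1:]) for a in actions) / len(actions)
    return G(z) - average


def toy_world_utility(z: JointState) -> float:
    """Made-up world utility: best when the actions sum to 3."""
    return -abs(sum(z) - 3)


# Agent 0 plays action 1; the other agents' actions are held fixed.
au = aristocrat_utility(toy_world_utility, (1, 2, 0), h=0, actions=[0, 1, 2, 3])
print(au)
```

Because the subtracted average does not depend on which action h actually took, any change in h's action moves AU and G by the same amount, which is the sense in which AU is factored.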
Clamping
• Clamping parameter CL_h^v: replace h’s state (taken to be a unary vector) with the constant vector v
• Clamping creates a new “virtual” worldline
• In general, v need not be a “legal” state for h
• Example: four agents, three actions. Agent h2 clamps to the “average action” vector a = (.33 .33 .33)
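As a sketch of the clamping operation, the joint state can be held as one unary (one-hot) row per agent, with clamping replacing a single row. The particular starting state below is invented for illustration; the matrix shown on the slide is not recoverable from the transcript.

```python
from typing import List, Sequence

State = List[List[float]]  # one row per agent, one column per action


def clamp(z: Sequence[Sequence[float]], h: int, v: Sequence[float]) -> State:
    """Return a copy of z in which agent h's row is replaced by the vector v."""
    if len(v) != len(z[h]):
        raise ValueError("clamping vector must have one entry per action")
    clamped = [list(row) for row in z]
    clamped[h] = list(v)
    return clamped


# Hypothetical state: four agents, three actions, each row a unary vector.
z = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
average = [1 / 3, 1 / 3, 1 / 3]

# Index 1 stands for the second agent (h2 on the slide).
clamped = clamp(z, 1, average)
print(clamped[1])
```

The clamped row is not a legal unary vector, which matches the slide's remark that v need not be a legal state, and the original z is left untouched because clamping only describes a virtual worldline.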
Wonderful Life Utility
• The Wonderful Life Utility (WLU) for h is given by:
• Clamping to the “null” action (v = 0) removes the player from the system (hence the name).
• Clamping to the “average” action disturbs the overall system minimally (can be viewed as an approximation to AU).
• Theorem: WLU is factored regardless of v
• Intuitively, WLU measures the impact of agent h on the world
  • Difference between the world as it is, and the world without h
  • Difference between the world as it is, and the world where h takes the average action
• WLU is a “virtual” operation. The system is not re-evolved.
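A minimal sketch, assuming WLU takes the form G(z) minus G evaluated on the clamped state, which is how the bullets describe it; the slide's own equation is not preserved. The congestion-style `world_utility` and the example state are hypothetical. The final check illustrates the factoredness claim: when h changes its action, WLU and G change by the same amount, whatever v is.

```python
import math
from typing import Callable, List, Sequence

State = List[List[float]]  # one unary (one-hot) row per agent


def clamp(z: Sequence[Sequence[float]], h: int, v: Sequence[float]) -> State:
    """Copy of z with agent h's row replaced by v."""
    clamped = [list(row) for row in z]
    clamped[h] = list(v)
    return clamped


def world_utility(z: Sequence[Sequence[float]], c: float = 2.0) -> float:
    """Hypothetical G: each action's load x contributes x * exp(-x / c)."""
    loads = [sum(column) for column in zip(*z)]
    return sum(x * math.exp(-x / c) for x in loads)


def wlu(
    G: Callable[[Sequence[Sequence[float]]], float],
    z: Sequence[Sequence[float]],
    h: int,
    v: Sequence[float],
) -> float:
    """Assumed form of WLU: G(z) - G(z with h clamped to v). Nothing is re-simulated."""
    return G(z) - G(clamp(z, h, v))


z_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
z_b = [[1, 0, 0], [0, 0, 1], [0, 0, 1], [1, 0, 0]]  # agent 1 picks another action
null, average = [0, 0, 0], [1 / 3, 1 / 3, 1 / 3]

for v in (null, average):
    delta_wlu = wlu(world_utility, z_a, 1, v) - wlu(world_utility, z_b, 1, v)
    delta_g = world_utility(z_a) - world_utility(z_b)
    print(math.isclose(delta_wlu, delta_g, abs_tol=1e-12))
```

The clamped term is identical for both of h's actions, so it cancels in the difference; that cancellation is the whole argument behind the theorem.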
El Farol Bar Problem
• Congestion game: A game where agents share the same action space, and world utility is a function purely of how many agents take each action.
• Illustrative example: Arthur’s El Farol bar problem:
  • At each time step, each agent decides whether to attend a bar:
    • If the agent attends and the bar is below capacity, the agent gets a reward
    • If the agent stays home and the bar is above capacity, the agent gets a reward
• The problem is particularly interesting because rational agents cannot all correctly predict attendance:
  • If most agents predict attendance will be low and therefore attend, attendance will be high
  • If most agents predict high attendance and therefore do not attend …
Modified El Farol Bar Problem
• Each week agents select one of seven nights to attend a bar
• Quantities labeled in the slide's reward equation: attendance for night k at week t; capacity of the bar; reward for night k at week t; R_t: reward for week t
• Further modifications:
  • Each week each agent selects two nights to attend the bar.
  • ...
  • Each week each agent selects six nights to attend the bar.
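To make the reward structure concrete, here is a sketch that assumes the nightly reward x · exp(−x / c) used in Wolpert and Tumer's published bar-problem experiments, with x the attendance and c the capacity. The slide's own equation is not preserved, so this form is an assumption, as are the function names.

```python
import math
from typing import Sequence


def night_reward(attendance: float, c: float) -> float:
    """Assumed nightly reward: attendance * exp(-attendance / c)."""
    return attendance * math.exp(-attendance / c)


def week_reward(attendance_by_night: Sequence[float], c: float) -> float:
    """R_t under the same assumption: the sum of the nightly rewards for the week."""
    return sum(night_reward(x, c) for x in attendance_by_night)


# With c = 3, a night's reward peaks when exactly c agents attend.
for x in (2, 3, 4):
    print(x, round(night_reward(x, 3), 4))

# Hypothetical week: 21 agents spread evenly over seven nights.
print(round(week_reward([3] * 7, 3), 4))
```

Under this form an empty night earns nothing and an overcrowded night earns little, so the world utility rewards spreading agents so that each night sits near capacity.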
Personal Utility Functions
• Two conventional utilities:
  • Uniform Division (UD): Divide each night’s total reward among all agents that attended that night (the “natural” reward)
  • Team Game (TG): Total world reward at time t (R_t)
• Three collective-based utilities:
  • WL_0: WL utility with clamping parameter set to a vector of 0s (world utility minus “world utility without me”)
  • WL_1: WL utility with clamping parameter set to a vector of 1s (world utility minus “world utility where I attend every night”)
  • WL_a: WL utility with clamping parameter set to the average-action vector (world utility minus “world utility where I do what is expected of me”)
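The five utilities can be compared side by side in a short sketch for the one-night-per-week version. It assumes the nightly reward x · exp(−x / c) and treats clamping as removing the agent's own attendance and adding the clamping vector to the attendance counts; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def night_reward(x: float, c: float) -> float:
    """Assumed nightly reward for attendance x and capacity c."""
    return x * math.exp(-x / c)


def world_utility(attendance: Sequence[float], c: float) -> float:
    """R_t: total reward over all nights of the week."""
    return sum(night_reward(x, c) for x in attendance)


def uniform_division(attendance: Sequence[float], night: int, c: float) -> float:
    """UD: the night's reward shared equally among that night's attendees."""
    return night_reward(attendance[night], c) / attendance[night]


def team_game(attendance: Sequence[float], night: int, c: float) -> float:
    """TG: every agent receives the full world reward, whatever night it chose."""
    return world_utility(attendance, c)


def wonderful_life(
    attendance: Sequence[float], night: int, c: float, v: Sequence[float]
) -> float:
    """WL: world utility minus world utility with this agent clamped to v."""
    clamped: List[float] = list(attendance)
    clamped[night] -= 1  # take the agent's real choice out
    clamped = [x + vk for x, vk in zip(clamped, v)]  # put the clamped vector in
    return world_utility(attendance, c) - world_utility(clamped, c)


attendance = [5, 1, 1, 1, 1, 1, 1]  # hypothetical week, capacity c = 3
vectors = {"WL_0": [0] * 7, "WL_1": [1] * 7, "WL_a": [1 / 7] * 7}
for name, v in vectors.items():
    crowded = wonderful_life(attendance, 0, 3, v)
    quiet = wonderful_life(attendance, 1, 3, v)
    print(name, round(crowded, 4), round(quiet, 4))
```

In this example an agent on the crowded night gets a negative WL_0 while an agent on a quiet night gets a positive one, whereas TG gives both the same number and so says little about any single agent's contribution.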
Bar Problem: Utility Comparison (Attend one night, 60 agents, c=3)
Typical Daily Bar Attendance (c=6; t=1000 s ; Number of agents = 168)
Scaling Properties (attend one night) c=2,3,4,6,8,10,15, respectively
Performance vs. # of Nights to Attend 60 agents; c= 3,6,8,10,10,12,15 respectively
Collectives of Rovers
• Design a collective of autonomous agents to gather scientific information (e.g., rovers on Mars, submersibles under Europa)
• Some areas have more valuable information than others
• World utility: Total importance-weighted information collected
• Both the individual rovers and the collective need to be flexible so they can adapt to new circumstances
• Collective-based payoff utilities result in better performance than more “natural” approaches
World Utility
• Token value function:
  • L: location matrix for all agents
  • L_h: location matrix of agent h
  • L^a_{h,t}: location matrix of agent h at time t, had it taken action a at t-1
  • Q: initial token configuration
• World utility:
• Note: Agents’ payoff utilities reduce to figuring out what “L” to use.
Payoff Utilities
• Selfish utility:
• Team game utility:
• Collectives-based utility (theoretical):
• Collectives-based utility (practical):
Utility Comparison in Rover Domain 100 rovers on a 32x32 grid
Scaling Properties in Rover Domain
Summary
• Given a world utility, deploying RL algorithms provides a solution to the distributed design problem. But what utilities does one use?
• The theory of collectives shows how to configure and/or update the personal utilities of the agents so that they “unintentionally cooperate” to optimize the world utility
• Personal utilities based on collectives have been successfully applied to many domains (e.g., autonomous rovers, constellations of communication satellites, data routing, autonomous defects)
• Performance gains due to using collectives-based utilities increase with the size of the problem
• A fully fleshed-out science of collectives would benefit from, and have applications to, many other sciences