Artificial Agents Play the Beer Game Eliminate the Bullwhip Effect and Whip the MBAs Steven O. Kimbrough D.-J. Wu Fang Zhong
The MIT Beer Game • Players • Retailer, Wholesaler, Distributor and Manufacturer. • Goal • Minimize system-wide (chain) long-run average cost. • Information sharing: Mail. • Demand: Deterministic. • Costs • Holding cost: $1.00/case/week. • Penalty cost: $2.00/case/week. • Leadtime: 2 weeks physical delay
Timing 1. New shipments delivered. 2. Orders arrive. 3. Fill orders plus backlog. 4. Decide how much to order. 5. Calculate inventory costs.
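The five timing steps can be sketched for a single stage as follows. This is a minimal illustration, not the authors' simulator: the `Stage` class, the `play_week` function and the `order_rule` callback are hypothetical names, and the only figures taken from the slides are the $1.00 holding cost and $2.00 penalty cost.

```python
from dataclasses import dataclass

HOLDING_COST = 1.00   # $/case/week, from the game setup
PENALTY_COST = 2.00   # $/case/week, from the game setup


@dataclass
class Stage:
    """State of one player (hypothetical container, not from the slides)."""
    inventory: int = 0
    backlog: int = 0


def play_week(stage, arriving_shipment, incoming_order, order_rule):
    """Run the five timing steps once; return (shipped, order_placed, cost)."""
    # 1. New shipments delivered.
    stage.inventory += arriving_shipment
    # 2. Orders arrive.
    owed = stage.backlog + incoming_order
    # 3. Fill orders plus backlog, limited by what is on hand.
    shipped = min(stage.inventory, owed)
    stage.inventory -= shipped
    stage.backlog = owed - shipped
    # 4. Decide how much to order.
    order_placed = order_rule(incoming_order)
    # 5. Calculate inventory costs for the week.
    cost = HOLDING_COST * stage.inventory + PENALTY_COST * stage.backlog
    return shipped, order_placed, cost
```

For example, a stage holding 4 cases that receives 4 more and faces an order for 10 ships 8, carries a backlog of 2, and pays 2 × $2.00 in penalty cost for that week.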
The Bullwhip Effect • Order variability is amplified upstream in the supply chain. • Industry examples (P&G, HP).
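One common way to quantify this amplification is the ratio of order variance at a stage to the order variance at the stage just downstream of it; a ratio above 1 indicates bullwhip. The slides do not state which measure the authors used, so the function below (with the hypothetical name `bullwhip_ratios`) is only an illustrative sketch.

```python
from statistics import pvariance


def bullwhip_ratios(orders_by_stage):
    """Order-variance ratio of each stage to the stage just downstream.

    `orders_by_stage` lists order histories from Retailer up to Manufacturer.
    Assumes every downstream series has non-zero variance.
    """
    variances = [pvariance(series) for series in orders_by_stage]
    return [up / down for down, up in zip(variances, variances[1:])]


# Retailer orders swing by +/-1 around 5, Wholesaler orders by +/-3.
retailer = [4.0, 6.0, 4.0, 6.0]
wholesaler = [2.0, 8.0, 2.0, 8.0]
ratios = bullwhip_ratios([retailer, wholesaler])
```

Here the Wholesaler's order variance is nine times the Retailer's, so the single ratio returned is 9.0.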
Bullwhip Effect Example (P & G) Lee et al., 1997, Sloan Management Review
Analytic Results: Deterministic Demand • Assumptions: • Fixed lead time. • Players work as a team. • Manufacturer has unlimited capacity. • “1-1” policy is optimal -- order whatever amount is ordered from your customer.
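A small simulation can illustrate (not prove) why "1-1" works under deterministic demand. The sketch below makes several assumptions that are not in the slides: each stage starts with empty inventory and a full pipeline of `start_level` cases per week, an order is seen by the upstream stage in the same week it is placed, and the Manufacturer's own order comes back after the same 2-week delay. `simulate_chain` is a hypothetical name.

```python
from collections import deque


def simulate_chain(demand, order_rule, weeks, lead_time=2, start_level=4):
    """Total chain cost when all four stages use the same `order_rule`."""
    n = 4  # Retailer, Wholesaler, Distributor, Manufacturer
    inventory = [0] * n
    backlog = [0] * n
    # Shipments already in transit to each stage (assumed initial condition).
    pipeline = [deque([start_level] * lead_time) for _ in range(n)]
    total_cost = 0.0
    for week in range(weeks):
        incoming = demand(week)  # customer demand reaches the Retailer
        for i in range(n):
            inventory[i] += pipeline[i].popleft()   # 1. shipment delivered
            owed = backlog[i] + incoming            # 2. order arrives
            shipped = min(inventory[i], owed)       # 3. fill orders plus backlog
            inventory[i] -= shipped
            backlog[i] = owed - shipped
            if i > 0:
                pipeline[i - 1].append(shipped)     # goods travel downstream
            order = order_rule(incoming)            # 4. decide how much to order
            if i == n - 1:
                pipeline[i].append(order)           # unlimited capacity at the top
            incoming = order                        # upstream sees it this week (assumption)
            total_cost += 1.00 * inventory[i] + 2.00 * backlog[i]  # 5. costs
    return total_cost


one_to_one_cost = simulate_chain(lambda week: 4, lambda x: x, weeks=20)
x_plus_one_cost = simulate_chain(lambda week: 4, lambda x: x + 1, weeks=20)
```

With a constant demand of 4 and these starting conditions, "1-1" keeps every stage at zero inventory and zero backlog, so its chain cost is 0. Since costs cannot be negative, no other rule does better in this setting. The "x+1" rule, by contrast, already creates a backlog at the Wholesaler in the first week.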
Analytic Results: Stochastic Demand (Chen, 1999, Management Science) • Additional assumptions: • Only the Retailer incurs penalty cost. • Demand distribution is common knowledge. • Fixed information lead time. • Decreasing holding costs upstream in the chain. • Order-up-to (installation base-stock) policy is optimal.
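An order-up-to rule in its textbook form raises the inventory position (on hand plus on order minus backlog) back to a target level each period. The sketch below shows only that generic rule; the base-stock levels derived by Chen (1999) are not given in the slides, and the function and parameter names are hypothetical.

```python
def order_up_to(base_stock, on_hand, on_order, backlog):
    """Order enough to bring the inventory position up to `base_stock`."""
    inventory_position = on_hand + on_order - backlog
    # Never place a negative order if the position already exceeds the target.
    return max(0, base_stock - inventory_position)


needed = order_up_to(base_stock=20, on_hand=5, on_order=8, backlog=0)
surplus = order_up_to(base_stock=20, on_hand=25, on_order=0, backlog=0)
```

With a target of 20 and an inventory position of 13, the rule orders 7; with a position of 25 it orders nothing.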
Agent-Based Approach • Agents work as a team. • No agent has knowledge of the demand distribution. • No information sharing among agents. • Agents learn via genetic algorithms. • Fixed or stochastic lead time.
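The slides do not describe the genetic algorithm itself, so the following is only a generic sketch of how bit-string rules could be evolved against a cost function. Truncation selection, one-point crossover, bit-flip mutation, elitism, and all parameter values are assumptions, and `evolve` and `cost_fn` are hypothetical names.

```python
import random


def evolve(cost_fn, n_bits=5, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Search for a low-cost bit string; returns (best_bits, best_cost)."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost_fn)
        parents = pop[: pop_size // 2]      # truncation selection
        children = [pop[0]]                 # elitism: keep the best rule unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability `mutation_rate`.
            child = "".join(
                ("1" if c == "0" else "0") if rng.random() < mutation_rate else c
                for c in child
            )
            children.append(child)
        pop = children
    best = min(pop, key=cost_fn)
    return best, cost_fn(best)


# Toy cost for demonstration only: the number of 1-bits in the rule.
best_bits, best_cost = evolve(lambda bits: bits.count("1"))
```

In the Beer Game setting, `cost_fn` would decode the bit string into an ordering rule, run it through a simulation of the chain, and return the resulting average cost.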
Research Questions • Can the agents track the demand? • Can the agents eliminate the Bullwhip effect? • Can the agents discover the optimal policies if they exist? • Can the agents discover reasonably good policies under complex scenarios where analytical solutions are not available?
Agents Coding Strategy • Bit-string representation with fixed length n. • Leftmost bit represents the sign (“+” or “-”). • The remaining bits represent how much to order. • Rule “x+1” means “if demand is x then order x+1”. • Rule search space is 2^(n-1) – 1.
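The encoding can be sketched as follows. The slides do not say which bit value stands for "+" or how a negative order quantity is handled, so this sketch assumes that a leading 0 means "+", a leading 1 means "-", and orders are clamped at zero. `decode_rule` and `apply_rule` are hypothetical names.

```python
def decode_rule(bits):
    """Turn a bit string such as '0001' into a signed offset such as +1."""
    sign = -1 if bits[0] == "1" else 1   # assumed convention: 1 means "-"
    magnitude = int(bits[1:], 2)         # remaining bits read as a binary number
    return sign * magnitude


def apply_rule(bits, demand):
    """Order quantity under rule 'x + offset', never below zero (assumption)."""
    return max(0, demand + decode_rule(bits))


offset_plus_one = decode_rule("0001")    # rule "x+1"
offset_minus_three = decode_rule("1011") # rule "x-3"
order = apply_rule("0001", 4)            # demand 4 under rule "x+1"
```

Under this convention the string `0000` decodes to an offset of 0, which is the "1-1" rule: order exactly what was demanded.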
Experiment 1a: First Cup • Environment: • Deterministic demand with fixed leadtime. • Fix the policy of Wholesaler, Distributor and Manufacturer to be “1-1”. • Only the Retailer agent learns. • Result: Retailer Agent finds “1-1”.
Experiment 1b • All four Agents learn under the environment of experiment 1a. • Über rule for the team. • All four agents find “1-1”.
Artificial Agents Whip the MBAs in Playing the MIT Beer Game
Result of Experiment 1b All four agents can find the optimal “1-1” policy
Stability (Experiment 1b) • Fix any three agents to be “1-1”, and allow the fourth agent to learn. • The fourth agent minimizes its own long-run average cost rather than the team cost. • No agent has any incentive to deviate once the others are playing “1-1”. • Therefore “1-1” appears to be a Nash equilibrium.
Experiment 2: Second Cup • Environment: • Demand uniformly distributed on [0,15]. • Fixed lead time. • All four Agents make their own decisions as in experiment 1b. • Agents eliminate the Bullwhip effect. • Agents find better policies than “1-1”.
Artificial agents discover a better policy than “1-1” when facing stochastic demand with penalty costs for all players.
Experiment 3: Third Cup • Environment: • Lead time uniformly distributed on [0,4]. • The rest as in experiment 2. • Agents find better policies than “1-1”. • No Bullwhip effect. • The policies discovered by agents are Nash.
Artificial agents discover policies that are better than “1-1” and stable when facing stochastic demand and stochastic lead time.
Artificial Agents are able to eliminate the Bullwhip effect when facing stochastic demand with stochastic leadtime.
The Columbia Beer Game • Environment: • Information lead time: (2, 2, 2, 0). • Physical lead time: (2, 2, 2, 3). • Initial conditions set as in Chen (1999). • Agents find the optimal policy: order whatever is ordered with time shift, i.e., Q_1 = D(t-1), Q_i = Q_{i-1}(t - l).
Ongoing Research: More Beer • Value of information sharing. • Coordination and cooperation. • Bargaining and negotiation. • Alternative learning mechanisms: Classifier systems.
Summary • Agents are capable of playing the Beer Game • Track demand. • Eliminate the Bullwhip effect. • Discover the optimal policies if they exist. • Discover good policies under complex scenarios where analytical solutions are not available. • Intelligent and agile supply chain. • Multi-agent enterprise modeling.