260 likes | 479 Views
Trading Agent Competition (TAC). Jon Lerner, Silas Xu, Wilfred Yeung CS286r, 3 March 2004. TAC Overview. International Competition Intended to spur research into trading agent design First held in July 2000 TAC Classic and TAC SCM Scenarios. TAC Classic.
E N D
Trading Agent Competition(TAC) Jon Lerner, Silas Xu, Wilfred Yeung CS286r, 3 March 2004
TAC Overview • International Competition • Intended to spur research into trading agent design • First held in July 2000 • TAC Classic and TAC SCM Scenarios
TAC Classic • Each team in charge of virtual travel agent • Agents try to find travel packages for virtual clients • All clients wish to travel over same five day period • Clients not all equal, each has different preferences for certain types of travel packages
Travel Packages • Each contains flight info, hotel type, and entertainment tickets • To gain positive utility from client, agents must construct feasible packages. Feasible means: • Arrival date strictly less than departure date • Same hotel reserved during all intermediate nights • At most one entertainment event per night • At most one of each type of entertainment ticket
Flights • Clients have preferences for ideal arrival/departure dates • Infinite supply of flights sold through continuously clearing auctions • Prices set by a random walk • Prices later set to drift upwards to discourage waiting • No resale or exchange of flights permitted
Hotels • Two hotels – high quality and low quality, 16 rooms per hotel per night • Sold through ascending, multi-unit, sixteenth-price auctions: one auction for all rooms for single hotel on single night • Periodically a random auction closes to encourage agents to bid • Clients have different values for high and low quality hotels
Entertainment • Three types of entertainment available • Clients have value for each type • Each agent has initial endowment of tickets • Buy and sell tickets through continuous double auction
Agent Themes • Agents have to address: • When to Bid • What to Bid On • How Much to Bid • Combinatorial preferences, but not combinatorial auctions
Strategies • What strategies come to mind? • What AI techniques might be useful? • Simple vs. Complicated Strategies • How quickly should you adapt as game progresses? • Use of historical data vs. Focus on current game only • Play the game vs. Play the players
living agents (Living Systems AG)Winner: TAC 2001 • Makes two assumptions • 1. Steadily increasing flight prices favor early decisions for flight tickets. • 2. Especially the good performing teams are following a strategy to maximize their own utility. They are not trying to take the risk to reduce other team’s utility. • Simple strategy • Makes substantial use of historical data. • Barely any monitoring/adapting to changing conditions • Benefits from other agents’ complicated algorithms to control price; Open-loop, Play the Players
living agents: Determining Hotel and Flight Bids • Assume hotel auction will clear at historical levels • Using these as hotel prices, initial flight prices, and client preferences, determine optimal client trips • Immediately place bids based on this optimum • Purchase corresponding flights immediately • Place offers for required hotels at prices high enough to ensure successful acquisition
Entertainment Auction • Immediately makes fixed decision as to which entertainment to attempt to buy/sell assuming the historical clearing price of about $80. • Opportunistically buy and sell around this point • Put in final reservation prices at seven minute mark.
How good is living agents? • Risky • If hotel bids are not high enough, fails to complete trips, resulting in huge loss of points. • If hotel clears at living agents’ bid, potentially pays much more than necessary • After placing initial bid, does not monitor hotel or flight auctions at all • Clearly not all agents could use this strategy (Hotel auctions) • Simple • Buys flights immediately, avoiding cost of waiting • Relies on historical data • Contains information from many games • But how sensitive is evolution of game to changes in client preferences, or changes in opponents’ strategy?
Applicability • Use of historical data for predictive information • Feasibility of simple strategies that ignore feedback • Play against the players (not prices), under the assumption that other agents keep things relatively efficient.
ATTac (AT&T Research)Winner: TAC 2002 • Uses sophisticated machine-learning techniques to predict future hotel prices based on the current situation • Buys flights based on cost-benefit analysis of committing versus waiting • Minute-by-minute reoptimization of bids based on holdings and predictions
The heart of ATTac • Assumption: Because of many unknowns, exactly predicting the price of a hotel room is hopeless. • Instead, regard the closing price as a random variable that needs to be estimated, conditional on our current state of knowledge • Number of minutes remaining in game • Ask price of each hotel • Flight prices • Historical Date • Construct a model of the probability distribution over clearing prices (based on a boosting algorithm), stochastically sample prices, and compute expected profit
The high-level algorithm • Denote the most profitable allocation of goods at any time by G* • When first flight quotes are posted: • Compute G* with current holdings and expected prices • Buy the flights in G* for which the expected cost of postponing commitment exceeds the expected benefit of postponing commitment • Starting 1 minute before each hotel close: • Compute G* with current holdings and expected prices • Buy the flights in G* for which expected cost of postponing commitment exceeds expected benefit of postponing commitment • Bid hotel room expected marginal values given holdings, new flights, and expected hotel purchases • Last minute: Buy remaining flights as needed by G* • In parallel (continuously): Buy/sell entertainment tickets base on their expected values
The boosting algorithm: solving conditional density estimation problems • Start with ordered pairs (x,y), with x being a vector that describes auction-specific features, y being the difference between closing price and current price • Aim of boosting is, given current x, to estimate the conditional distribution of y • Construct conditional distribution function that minimize the sum of negative log likelihood of y given x, for all training samples. • Use this condition distribution function to map x to y
living agents vs. ATTac • Two very different approaches • Statistically insignificant difference in scores in TAC2001
Open and Closed Loop Processes • Closed-loop: system feeds information back into itself. Examines the world in an effort to validate the world model. • appropriate for real-world environments in which feedback is necessary to validate agent actions. • Open-loop: no feedback from the environment to the agent. Output from processes are considered complete upon execution. • appropriate for simulated rather than real environments (tasks not performed perfectly by agent generally.) • generally more efficient for the same reason.
Walverine: (Closed-loop) • Model Based: Flight and Hotel • Predicts hotel prices by Walrasian equilibrium • Derives expected demand from 64 clients’ preferences and initial flight prices, which influence clients’ choice of travel days, and • Construct bids that max expected value of bid • Model Free: Entertainment • Q-Learning from thousands of auction instances (aside on model vs model-free learning) • No empirically tuned parameters
SouthamptonTAC: (Closed-loop) • Adaptive agent, varies strategy to mkt cond. • 3 classifications for environments: • Non-competitive (agent gets hotel at low prices) • Semi-competitive (medium prices) • Competitive (prices of hotels high) • Based on curr game and outcomes of recent games • Non-competitive: • Buys all flights at beginning of game • Never change itinerary of clients
SouthamptonTAC: (Closed-loop) • Competitive: • Rapidly rising prices – buy at beginning • Stagnant prices – buy near the end • Fuzzy reasoning to predict hotel clearing prices • 3 rule bases • Factors inc: price of hotel, counterpart, price change in prev minute, price change in counterpart hotel in prev minute • Continuously assesses game type
ROXY-BOT: (Open-loop) • Two phase bidding policy: • Solve completion problem • Optimization based on a tree structure using beam search that only partially expands the tree. [Greenwald] • Valuate goods in that set • Marginal utility calculator MU(x) = V(N) – V(N|x) • Computing Prices: (historical data) • Point estimates (’00) • Estimated price distributions (’01) • Averaging MU across many samples of estimated price dist • Monte-Carlo simulation to evaluate bidding policy (’02)
Whitebear (Winner in ’02, Open-loop) • Flights: • A: buy everything • B: buy only what is absolutely necessary • Combination: buy everything except dangerous tickets • Hotels: (predictions simply historical averages) • A: bid small increment greater than current prices • B: bid marginal utility • Combination: Use A, unless MU is high, use B • Domain specific, extensive experimentation • No necessarily optimal set of goods, no learning
Summary: Open vs Closed • All else equal open-strategy better: • Simple • Avoids waiting costs (higher prices) • Predictability of price is determining factor • Perfectly predictable – open-loop • Large price variance – closed-loop • Open-loop picks the good at the start and may pay a lot • Small price variance – optimal closed loop • But complexity for potentially small benefit