Guiding Combinatorial Optimization with UCT
Ashish Sabharwal and Horst Samulowitz, IBM Watson Research Center (presented by Raghuram Ramanujan)
MCTS Workshop at ICAPS-2011, June 12, 2011

MCTS and Combinatorial Search
• Monte Carlo Tree Search (MCTS): widely used in a variety of domains in AI
• Upper Confidence bounds on Trees (UCT): a form of MCTS, especially successful in two-agent game tree search, e.g., Go, Kriegspiel, Mancala, General Game Playing
  • Based on single-agent tree search: one multi-armed bandit at each node of a tree
  • Goal: find the most “rewarding” root-to-leaf path in the tree
• Combinatorial Search
  • A discrete search space, e.g., {0,1}^N or {R, G, B}^N
  • A “feasible” subspace of interest: typically defined indirectly by a finite set of constraints
  • Goal: find a solution, i.e., an element of the discrete space that satisfies all constraints
  • If a utility function / objective function is given: find an optimal solution
  • E.g., Boolean Satisfiability (SAT), Graph Coloring (COL), Constraint Satisfaction Problems (CSPs), Constraint Optimization, Integer Programming (IP)
Can MCTS/UCT-inspired techniques be used to improve the performance of combinatorial search algorithms?
[Figure: graph coloring]

Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
• MIP: linear inequality constraints, continuous & discrete variables
  • Typically with a linear (or quadratic) objective function
  • NP-hard; highly useful, with several academic and commercial solvers available
• MIP search appears much more suitable than, e.g., SAT for applying UCT!
• Opportunity for applying UCT
  • MIP solvers such as IBM ILOG’s CPLEX, Gurobi, etc.:
    • maintain a “frontier” of open nodes, exploring them with a combination of best-first search, “diving” to the bottom of the tree, etc.
    • rely on spending substantial effort per node, e.g., computing the LP relaxation to obtain a bound on the objective value in the subtree: an estimate of the true value
  • In contrast, state-of-the-art SAT solvers are not easily adapted to UCT; they:
    • are based on enhancements to basic depth-first search traversal
    • rely on processing nodes extremely fast (~2000-5000 per second)
Can we improve CPLEX by letting UCT decide search tree exploration order?

Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
• Challenges and Differences from the “usual” setup for UCT
  • Biggest success of UCT so far: two-agent game tree search, rather than single-agent
  • Random playouts are costly to implement in MIP search
  • Unlike game tree search, too costly to create a full UCT tree at each node
  • Exploitation isn’t very meaningful after the true value of a node is revealed: no reason to repeatedly visit that node even if it is optimal
  • LP relaxation: available for “free”, provides a guaranteed bound on the true value, so averaging backups may not be the best strategy!
  • Highly optimized commercial MIP solvers such as CPLEX are very hard to improve upon!
  • Implementation: no easy access to CPLEX’s internal data structures; must maintain our own “shadow tree” for exploring UCT strategies, at additional overhead
Main Finding: Guidance near the top of the tree can improve performance across a variety of instances!

How does Search in CPLEX (roughly) work?
• CPLEX explores the search tree by alternating between two operations:
  • Node Selection: select the next open search node to continue the search on; CPLEX selects the node with the best estimate E
  • Branching: select the next variable to branch on (assume binary branching)
[Figure: search tree growing from the Root-Node. Step 1: node selection (initially only one node can be selected), then branch on variable x. Steps 2-4: node selection by estimate, then branch on variables y, z, and v in turn. Legend: CPLEX open nodes, each with the quality estimate of its underlying sub-tree (e.g., LP objective value); CPLEX closed nodes.]

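The alternation between node selection and branching can be sketched as a plain best-first loop over a frontier of open nodes. This is only an illustration of the two operations named on the slide, not CPLEX's actual procedure: the names `OpenNode`, `best_first`, `estimate_fn`, and `branch_var_fn` are hypothetical, a minimization problem is assumed (lower estimate is better), and the toy estimate stands in for an LP relaxation value.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class OpenNode:
    priority: float                      # quality estimate; lower is better (assumed minimization)
    tie: int                             # insertion counter, breaks ties between equal estimates
    fixed: dict = field(compare=False)   # variable -> 0/1 decisions made so far


def best_first(root_estimate, estimate_fn, branch_var_fn, max_nodes):
    """Alternate node selection and binary branching; return the expansion order."""
    tick = count()
    frontier = [OpenNode(root_estimate, next(tick), {})]
    order = []
    while frontier and len(order) < max_nodes:
        node = heapq.heappop(frontier)        # node selection: best estimate first
        order.append(node.fixed)
        var = branch_var_fn(node.fixed)       # branching: pick the next variable
        if var is None:                       # nothing left to branch on
            continue
        for val in (0, 1):                    # binary branching creates two open nodes
            child = {**node.fixed, var: val}
            heapq.heappush(frontier, OpenNode(estimate_fn(child), next(tick), child))
    return order


# Toy problem (hypothetical): setting a variable to 1 adds its cost to the estimate.
COSTS = {"x": 3.0, "y": 1.0}


def toy_estimate(fixed):
    return sum(COSTS[v] * val for v, val in fixed.items())


def toy_branch_var(fixed):
    return next((v for v in COSTS if v not in fixed), None)


order = best_first(0.0, toy_estimate, toy_branch_var, max_nodes=10)
```

In the toy run, the whole `x = 0` subtree is expanded before the `x = 1` node because its estimates are lower, which is the behaviour a pure best-first frontier gives.
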
Guiding Node Selection in CPLEX with UCT
• Node Selection with UCT
  • Idea: expand nodes in the order in which UCT would expand them
  • Traverse the search tree from the root to a current leaf node (i.e., an “open” node), at each node selecting the child that has the highest UCT score s
  • UCT score s: combines an estimate of the “quality” of a node (the same estimate CPLEX uses) with how often this node has already been visited
  • Goal: balance exploration and exploitation in CPLEX search
• Tree Update Phase
  • When node selection reaches a leaf node:
    • compute its quality estimate (e.g., objective value of the LP relaxation) and propagate it upwards towards the root
    • branch on this node using the default variable/value selection of CPLEX
  • Update rule / backup operator: max of the two children (no averaging!) for a maximization problem; min for a minimization problem
  • Result: the estimate at each node N along this leaf-to-root path equals the best value seen in the entire sub-tree under N

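The root-to-leaf descent described above might look roughly like the following sketch. The score used here is a standard UCB1-style expression, swapped in as an assumption; it is not necessarily the score used in the talk. `ShadowNode`, `ucb1_style_score`, and `select_open_node` are hypothetical names, a maximization problem is assumed, and the default `c = 0.7` simply borrows the constant reported for the experiments.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ShadowNode:
    estimate: float                      # quality estimate, higher is better (assumed)
    visits: int = 1
    children: list = field(default_factory=list)


def ucb1_style_score(node, parent, c):
    """Estimate plus an exploration bonus that shrinks as the node is visited more."""
    return node.estimate + c * math.sqrt(math.log(parent.visits) / node.visits)


def select_open_node(root, c=0.7):
    """Walk from the root to a leaf, always taking the child with the highest score."""
    path = [root]
    node = root
    while node.children:
        parent = node
        node = max(parent.children, key=lambda ch: ucb1_style_score(ch, parent, c))
        path.append(node)
    return path


# Toy tree (hypothetical numbers): child a looks better but has been visited often.
a = ShadowNode(estimate=0.90, visits=8)
b = ShadowNode(estimate=0.85, visits=1)
root = ShadowNode(estimate=0.90, visits=9, children=[a, b])

explore_path = select_open_node(root)          # bonus favours the rarely visited b
greedy_path = select_open_node(root, c=0.0)    # no bonus: pure exploitation picks a
```

With the exploration term switched off the descent reduces to following the best estimate, which is why the constant matters for how far the order departs from best-first.
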
Guiding Search in CPLEX with UCT
• Node Selection
  • Node selection is now guided by UCT scores (as illustrated below)
  • The UCT score is based on the estimate E and the number of visits to a search node
  • In order to employ UCT, one needs to maintain a shadow tree of CPLEX’s search tree
  • CPLEX maintains just a frontier of open nodes; the underlying search tree only exists implicitly
[Figure: search tree growing from the Root-Node. Step 1: node selection (initially only one node can be selected), then branch on variable x. Later steps: select the node with the highest UCT score (the labels the score is based on were not recovered), then branch on variable y, and so on. Legend: CPLEX open nodes, each with the quality estimate of its underlying sub-tree (e.g., LP objective value); CPLEX closed nodes.]

Guiding Search in CPLEX with UCT
• Tree Update Phase
  • After selecting a node N and branching on a variable, two child nodes N_left and N_right are created with their corresponding estimates E_left and E_right
  • When propagating estimates upwards, we only consider the best estimate (i.e., no averaging)
  • Update using the “backup operator”: propagate a new estimate to the parent, and onwards, as long as it improves the current best estimate at a node on the path to the root
  • However, visit counts are updated for each node on the path to the root
[Figure: search tree from the Root-Node with a worked propagation example; the node labels and the condition in the example were not recovered. Legend: CPLEX open nodes, each with the quality estimate of its underlying sub-tree (e.g., LP objective value); CPLEX closed nodes.]

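One possible reading of this update phase, as a minimal sketch: take the better of the two child estimates, push it upwards only while it improves the stored estimate, and increment the visit count of every node on the path regardless. The `Node` class and `backup` function are hypothetical names, and the stopping rule is an interpretation of the slide rather than a confirmed implementation detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    estimate: float
    parent: Optional["Node"] = None
    visits: int = 0


def backup(expanded, e_left, e_right, maximize=True):
    """Propagate the better child estimate from the expanded node towards the root."""
    best = max(e_left, e_right) if maximize else min(e_left, e_right)  # no averaging
    improving = True
    node = expanded
    while node is not None:
        node.visits += 1                      # visit counts are always updated
        if improving:
            better = best > node.estimate if maximize else best < node.estimate
            if better:
                node.estimate = best
            else:
                improving = False             # stop changing estimates further up
        node = node.parent


# Toy path root -> mid -> leaf (hypothetical numbers, maximization assumed).
root = Node(estimate=5.0)
mid = Node(estimate=4.0, parent=root)
leaf = Node(estimate=3.0, parent=mid)
backup(leaf, e_left=4.5, e_right=2.0)
```

Here the value 4.5 replaces the estimates at `leaf` and `mid` but not at `root`, whose stored estimate of 5.0 is already better, while all three visit counts go up by one.
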
UCT Score: “Epsilon Greedy” Variant of UCB1
• UCT score computation: [formula not recovered]
  • N = tree node under consideration
  • P = parent of N
  • [symbol not recovered] = a constant balancing exploration and exploitation (0.7 in experiments)
  • [symbol not recovered] = theoretically a number decreasing inversely proportional to visits(N) (a constant set to 0.01 in experiments)
• Fast and accurate enough for our purposes, compared to the standard UCB1 formula

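For comparison, the standard UCB1 rule that this slide measures its variant against is usually written as below, using the slide's N and P together with a hypothetical exploration constant c. This is the textbook formula only; it is not the epsilon-greedy variant used in the talk.

```latex
\mathrm{score}_{\mathrm{UCB1}}(N) \;=\; \mathrm{estimate}(N) \;+\; c\,\sqrt{\frac{\ln \mathrm{visits}(P)}{\mathrm{visits}(N)}}
```

Setting c = \sqrt{2} gives the original UCB1 bonus for rewards scaled to [0, 1]; the logarithm and square root are the per-node cost that a cheaper variant would avoid.
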
Experimental Evaluation
• Starting with 1,024 publicly available MIP instances, we removed:
  • All instances solved by default CPLEX within 10 seconds (too easy)
  • All instances not solved by default CPLEX within 900 seconds (too hard)
• Experimental evaluation is based on the 170 remaining instances
  • Spanning a variety of domains
  • Experimentation not limited to any particular instance family (e.g., TSP instances, set covering, etc.)
• Experiments were conducted on:
  • Intel Xeon CPU E5410, 2.33GHz with 8 cores, and 32GB of memory
  • Only a single run per machine, since multiple CPLEX runs on one machine can (and often do!) interfere with each other
  • OS: Ubuntu

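The instance filter above amounts to a simple runtime window on default CPLEX. A minimal sketch, assuming runtimes are recorded in seconds and unsolved instances are stored as `None`; `select_instances` and the instance names are hypothetical, and the handling of exact boundary values is a guess.

```python
def select_instances(default_runtimes, too_easy=10.0, too_hard=900.0):
    """Keep instances that default CPLEX solves in more than 10 s but within 900 s."""
    return sorted(
        name
        for name, seconds in default_runtimes.items()
        if seconds is not None and too_easy < seconds <= too_hard
    )


# Hypothetical runtimes of default CPLEX, in seconds (None = not solved).
default_runtimes = {
    "inst_a": 4.2,      # too easy
    "inst_b": 37.0,     # kept
    "inst_c": 880.5,    # kept
    "inst_d": None,     # too hard: never solved
    "inst_e": 1200.0,   # too hard: solved, but not within 900 s
}
kept = select_instances(default_runtimes)
```
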
Experimental Evaluation: Solvers
• Default CPLEX
  • Uses various strategies, including a combination of best-first node selection and depth-first “diving” to reach a leaf node from each best node
  • Highly optimized; very challenging to beat by a large margin across a large variety of problem domains
• CPLEX with node selection guided by UCT
  • Best results when guidance is limited to the top 5 levels of the tree; the solver then reverts to the default node selection of CPLEX
• Other standard exploration schemes
  • Best-first
  • Breadth-first
  • Depth-first

Preliminary Experimental Results
• [timeout: 600 sec]
• Promising performance:
  • UCT guidance results in the fewest instances timing out (8)
  • Fastest on 39 instances
  • Lowest average runtime (albeit only by a few seconds)

Preliminary Experimental Results
• Pairwise performance measure (timeout: 600 sec):
  • How often does the row solver outperform the column solver?
  • E.g., UCT guidance outperforms default CPLEX on 64 instances; 52 times vice versa
• Promising performance:
  • UCT guidance outperforms default CPLEX and other natural alternatives

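A pairwise measure of this kind can be computed from per-instance runtimes as in the sketch below. It assumes that "outperforms" means a strictly lower runtime after capping at the timeout, so two timeouts on the same instance count as a tie; the talk does not spell out this rule. `pairwise_wins` and all numbers are hypothetical and are not the reported results.

```python
def pairwise_wins(runtimes, timeout=600.0):
    """wins[row][col] = number of instances on which row is strictly faster than col."""
    solvers = list(runtimes)
    wins = {a: {b: 0 for b in solvers if b != a} for a in solvers}
    n_instances = len(next(iter(runtimes.values())))
    for i in range(n_instances):
        for a in solvers:
            for b in solvers:
                if a == b:
                    continue
                # Cap at the timeout so that two timed-out runs compare as equal.
                if min(runtimes[a][i], timeout) < min(runtimes[b][i], timeout):
                    wins[a][b] += 1
    return wins


# Hypothetical runtimes in seconds, one entry per instance.
runtimes = {
    "uct": [12.0, 600.0, 45.0, 700.0],
    "default": [15.0, 300.0, 45.0, 900.0],
}
wins = pairwise_wins(runtimes)
```

In this toy data each solver wins once: the third instance is an exact tie and the fourth is a timeout for both, so neither counts.
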
Conclusion
• Explored the use of MCTS/UCT in a combinatorial search setting
  • Specifically, for mixed integer programming (MIP) search, with CPLEX
  • Typical “random playouts” are very costly, but the LP relaxation objective value serves as a good estimate: a guaranteed one-sided bound!
  • Max-style update rule performs better here than the usual averaging backups
• Guiding combinatorial search with UCT holds promise!
  • Improving performance of highly optimized MIP solvers across a variety of problem domains is a huge challenge
  • UCT-inspired guidance for node selection shows promise
  • Most benefit when UCT is used only near the top of the search tree
• Further exploration along these lines appears fruitful, e.g.:
  • using UCT for variable or value selection (rather than node selection)
  • building a “full” UCT tree at each search tree node before branching