No-Regret Algorithms for Online Convex Programs Geoffrey J. Gordon Carnegie Mellon University Presented by Nicolas Chapados 21 February 2007
Outline • Online learning setting • Definition of Regret • Safe Set • Lagrangian Hedging (gradient form) • Lagrangian Hedging (optimization form) • Mention of Theoretical Results • Application: One-Card Poker
Online Learning • Sequence of trials t = 1, 2, … • At each trial we must pick a hypothesis y_t • The correct answer is then revealed in the form of a convex loss function l_t(y_t) • Just before seeing the t-th example, the total loss is L_t = l_1(y_1) + … + l_{t-1}(y_{t-1})
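To make the protocol concrete, here is a minimal Python sketch of the trial loop with linear losses. The cost vectors and the learner's choice rule are placeholders, not something specified on the slides.

    import numpy as np

    def online_loop(cost_vectors, pick_hypothesis):
        """Online protocol: pick y_t, then the loss l_t(y_t) = c_t . y_t is revealed."""
        total_loss = 0.0
        history = []                         # (c_t, y_t) pairs, used later to measure regret
        for c_t in cost_vectors:
            y_t = pick_hypothesis(history)   # hypothesis chosen before c_t is revealed
            total_loss += c_t @ y_t          # loss incurred on this trial
            history.append((c_t, y_t))
        return total_loss, history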
Goal of Paper • Introduce Lagrangian Hedging algorithm • Generalization of other algorithms • Hedge (Freund and Schapire) • Weighted Majority (Littlestone and Warmuth) • External-regret Matching (Hart and Mas-Colell) • (CMU Technical Report is much clearer than NIPS paper)
Regret • If we had used a fixed hypothesis y, the loss would have been l_1(y) + … + l_{t-1}(y) • The regret is the difference between the total loss of the adaptive and fixed hypotheses: rho_t(y) = L_t − [l_1(y) + … + l_{t-1}(y)] • Positive regret means that we should have preferred the fixed hypothesis
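A short sketch of computing the regret from the same history, against the best hypothesis in a finite set of candidates (e.g. the corners of the simplex); the candidate set is an illustrative assumption.

    import numpy as np

    def regret(history, candidates):
        """Regret = adaptive loss minus the loss of the best fixed hypothesis in `candidates`."""
        adaptive_loss = sum(c @ y for c, y in history)
        best_fixed_loss = min(sum(c @ y_fixed for c, _ in history) for y_fixed in candidates)
        return adaptive_loss - best_fixed_loss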
Hypothesis Set • Assume that hypothesis set Y is a convex subset of Rd • For example, the simplex of probability distributions • The corners of Y represent pure actions and the middle region a probability distribution over actions
Loss Function • Minimize a linear loss l_t(y) = c_t · y, where c_t is the cost vector revealed at trial t
Regret Vector • Keeps the state of the learning algorithm • Summarizes the actual losses and the gradients of the loss functions • Define the regret vector s_t by the recursion s_t = s_{t-1} + (c_t · y_t) u − c_t, with s_0 = 0 (c_t is the cost vector, i.e. the gradient of the linear loss) • Here u is an arbitrary vector which satisfies u · y = 1 for all y ∈ Y • Example: if y is a probability distribution, then u can be the vector of all ones.
Use of Regret Vector • Given any hypothesis y, we can use the regret vector to compute its regret: rho_t(y) = s_t · y (using u · y = 1)
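A minimal numerical sketch of the recursion, checking that s_t · y reproduces the directly computed regret of a fixed hypothesis; the cost vectors and plays are made up, and u is the all-ones vector as in the simplex example.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 3
    u = np.ones(d)                          # u . y = 1 for every probability vector y
    s = np.zeros(d)                         # s_0 = 0
    y_fixed = np.array([0.0, 1.0, 0.0])     # an arbitrary fixed hypothesis
    direct_regret = 0.0

    for _ in range(5):
        c_t = rng.normal(size=d)            # cost vector revealed by the environment
        y_t = rng.dirichlet(np.ones(d))     # whatever the learner happened to play
        s = s + (c_t @ y_t) * u - c_t       # recursion s_t = s_{t-1} + (c_t . y_t) u - c_t
        direct_regret += c_t @ y_t - c_t @ y_fixed

    assert np.isclose(s @ y_fixed, direct_regret)   # s_t . y equals the regret against y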
Safe Set • Region of the regret space in which the regret is guaranteed to be nonpositive for all hypotheses • Goal of the Lagrangian Hedging algorithm is to keep its regret vector "near" the safe set
Safe Set (continued) • [Figure: the hypothesis set Y and its corresponding safe set S in regret space]
Unnormalized Hypotheses • Consider the cone of unnormalized hypotheses: Y_hat = { λ y : y ∈ Y, λ ≥ 0 } • The safe set is the cone polar to this cone of unnormalized hypotheses: S = { s : s · y ≤ 0 for all y ∈ Y_hat }
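For the simplex, the polar-cone condition simplifies: the cone of unnormalized hypotheses is the nonnegative orthant, so its polar (the safe set) is just the set of vectors with no positive coordinate. A tiny sketch of that membership test (a simplex-specific simplification, not from the slides):

    import numpy as np

    def in_safe_set_simplex(s, tol=1e-12):
        """For Y = probability simplex, the safe set is { s : s_i <= 0 for all i }."""
        return bool(np.all(s <= tol))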
Lagrangian Hedging (Setting) • At each step, the algorithm chooses its play according to the current regret vector and a closed convex potential function F(s) • Define the (sub)gradient of F(s) as f(s) • The potential function is what defines the problem to be solved • E.g. Hedge / Weighted Majority correspond to an exponential potential, F(s) = (1/η) ln[ Σ_i exp(η s_i) ] (up to an additive constant), whose gradient is a softmax of the regrets
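Below is a sketch of the gradient form of Lagrangian Hedging on the simplex with the exponential potential above, under which the play reduces to Hedge-style exponential weights. The learning rate eta and the softmax-normalized play rule are assumptions made for this illustration.

    import numpy as np

    def lagrangian_hedging_gradient_form(cost_vectors, d, eta=0.5):
        """Gradient form of LH on the simplex with the exponential potential
        F(s) = (1/eta) * log(sum_i exp(eta * s_i)); its gradient f(s) is a softmax."""
        u = np.ones(d)                       # u . y = 1 on the simplex
        s = np.zeros(d)                      # regret vector, s_0 = 0
        plays = []
        for c_t in cost_vectors:
            w = np.exp(eta * (s - s.max()))  # f(s) up to normalization (shifted for stability)
            y_t = w / w.sum()                # play = f(s) rescaled so that u . y_t = 1
            plays.append(y_t)
            s = s + (c_t @ y_t) * u - c_t    # regret-vector recursion
        return plays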
Optimization Form • In practice, it may be difficult to define, evaluate and differentiate an appropriate potential function • Optimization form: same pseudo-code as previously, but define F in terms of a simpler hedging function W • Example corresponding to the previous exponential potential: the scaled negative entropy W(y) = (1/η) Σ_i y_i ln y_i (plus a constant), defined on the probability simplex
Optimization Form (cont’d) • Then we may obtain F as the convex conjugate F(s) = sup over unnormalized hypotheses y of [ s · y − W(y) ] • And the (sub)gradient f(s) as the maximizing y in that optimization • Which we may plug into the previous pseudo-code
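As a concrete (and simplified) instance, the sketch below takes W to be the scaled negative entropy from the previous slide and restricts the inner maximization to the simplex rather than the full cone of unnormalized hypotheses; under that assumption the maximizer of s · y − W(y) has a softmax closed form, which the random check at the end confirms numerically.

    import numpy as np

    def W_entropy(y, eta=0.5):
        """Hedging function: scaled negative entropy on the simplex (0 log 0 taken as 0)."""
        return float(np.sum(y * np.log(np.where(y > 0, y, 1.0))) / eta)

    def f_optimization_form(s, eta=0.5):
        """f(s) = argmax_y [ s . y - W(y) ] over the simplex; the maximizer is a softmax."""
        w = np.exp(eta * (s - s.max()))
        return w / w.sum()

    # Crude sanity check: no random simplex point should beat the softmax maximizer.
    rng = np.random.default_rng(1)
    s = rng.normal(size=4)
    best = f_optimization_form(s)
    best_val = s @ best - W_entropy(best)
    for _ in range(2000):
        y = rng.dirichlet(np.ones(4))
        assert s @ y - W_entropy(y) <= best_val + 1e-9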
One-Card Poker • Hypothesis space is the set of sequence weight vectors • These encode, for each player i, when it is that player's turn to move and the actions available at that time • Two players: gambler and dealer • Each player antes $1 and is dealt one card from a 13-card deck • Betting order: gambler bets / dealer bets / gambler bets • A player may fold • If neither folds: the player with the highest card wins the pot
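As a rough illustration only, here is a sketch of one hand under one plausible (Kuhn-poker-style) reading of the rules above, with a $1 ante and $1 bets; the exact betting tree used in the paper may differ, and the random policies are placeholders.

    import random

    def play_hand(gambler_policy, dealer_policy, rng):
        """One hand of simplified one-card poker; returns the gambler's net winnings."""
        gambler_card, dealer_card = rng.sample(range(1, 14), 2)   # 13-card deck
        stake = 1                                                 # $1 ante from each player

        if gambler_policy(gambler_card, "open") == "bet":
            if dealer_policy(dealer_card, "facing_bet") == "fold":
                return 1                                          # dealer forfeits the ante
            stake = 2                                             # dealer calls
        elif dealer_policy(dealer_card, "open") == "bet":
            if gambler_policy(gambler_card, "facing_bet") == "fold":
                return -1                                         # gambler forfeits the ante
            stake = 2                                             # gambler calls

        return stake if gambler_card > dealer_card else -stake    # showdown

    rng = random.Random(0)
    def random_policy(card, stage):
        return rng.choice(["bet", "check"]) if stage == "open" else rng.choice(["call", "fold"])

    print(play_hand(random_policy, random_policy, rng))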
Why is it interesting? • Elements of more complicated games: • Incomplete information • Chance events • Multiple stages • Optimal play requires randomization and bluffing