
Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments

Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments. Pr(success_1, …, success_n) = ?? Michael J. Neely, University of Southern California, http://www-rcf.usc.edu/~mjneely. Information Theory and Applications Workshop (ITA), UCSD, Feb. 2009.


Presentation Transcript


  1. Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments. Pr(success_1, …, success_n) = ?? Michael J. Neely, University of Southern California, http://www-rcf.usc.edu/~mjneely. Information Theory and Applications Workshop (ITA), UCSD, Feb. 2009. *Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525.

  2. • Slotted system, slots t in {0, 1, 2, …}
     • Network queues: Q(t) = (Q_1(t), …, Q_L(t))
     • 2-stage control decision every slot t:
       1) Stage 1 decision: k(t) in {1, 2, …, K}. Reveals random vector w(t) (i.i.d. given k(t)); w(t) has unknown distribution F_k(w).
       2) Stage 2 decision: I(t) in I (a possibly infinite set). Affects queue rates A(k(t), w(t), I(t)), m(k(t), w(t), I(t)), and incurs a “penalty vector” x(t) = x(k(t), w(t), I(t)).

  3. Stage 1: k(t) in {1, …, K}. Reveals random w(t). Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)). Goal: Choose stage 1 and stage 2 decisions over time so that the time average penalties x̄ solve: Minimize f(x̄) subject to h_n(x̄) ≤ b_n for all n, and all queues stable. Here f(x), h_n(x) are general convex functions of several variables.

  4. Motivating Example 1: Min Power Scheduling with Channel Measurement Costs. [Figure: arrivals A_1(t), …, A_L(t) and channel states S_1(t), …, S_L(t)] Minimize avg. power subject to stability. If channel states are known every slot: can schedule without knowing channel statistics or arrival rates! (EECA: Neely 2005, 2006) (Georgiadis, Neely, Tassiulas F&T 2006)

  5. Motivating Example 1: Min Power Scheduling with Channel Measurement Costs. [Figure: arrivals A_1(t), …, A_L(t) and channel states S_1(t), …, S_L(t)] Minimize avg. power subject to stability. If there is a “cost” to measuring, we make a 2-stage decision:
     • Stage 1: Measure or not? (reveals channels w(t))
     • Stage 2: Transmit over a known channel? A blind channel?
     - Li and Neely (07)
     - Gopalan, Caramanis, Shakkottai (07)
     Existing solutions require a-priori knowledge of the full joint channel state distribution! (2^L, 1024^L ?)

  6. Motivating Example 2: Diversity Backpressure Routing (DIVBAR) [Neely, Urgaonkar 2006, 2008]. [Figure: broadcast transmission to nodes 1, 2, 3, with an “error” label] Networking with lossy channels & multi-receiver diversity:
     • DIVBAR Stage 1: Choose commodity and transmit
     • DIVBAR Stage 2: Get success feedback, choose next hop
     If there is a single commodity (no stage 1 decision), we do not need success probabilities! If there are two or more commodities, we need the full joint success probability distribution over all neighbors!

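  7. Stage 1: k(t) in {1, …, K}. Reveals random w(t). Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)). Goal: Minimize f(x̄) subject to h_n(x̄) ≤ b_n for all n, and all queues stable. Equivalent to: Minimize the time average of f(g(t)) subject to the time average of h_n(g(t)) ≤ b_n for all n, ḡ = x̄, and all queues stable. Here g(t) is an auxiliary vector that is a proxy for x(t).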

  8. Stage 1: k(t) in {1, …, K}. Reveals random w(t). Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)). Technique for the equivalent goal: form a virtual queue for each constraint.
     • U_n(t), with input h_n(g(t)) and output b_n: U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0]
     • Z_m(t), with input x_m(t) and output g_m(t): Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t) (possibly negative)
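The two virtual-queue recursions on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function name `update_virtual_queues` and the use of plain lists are assumptions made here, and the values of h_n(g(t)), b_n, g_m(t) and x_m(t) are taken as already computed for the slot.

```python
from typing import List, Sequence, Tuple


def update_virtual_queues(
    U: Sequence[float],
    Z: Sequence[float],
    h_of_g: Sequence[float],
    b: Sequence[float],
    g: Sequence[float],
    x: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """One-slot update of the virtual queues (hypothetical helper).

    U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0]
    Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t)   (may go negative)
    """
    # Constraint queues: clipped at zero like an ordinary queue.
    U_next = [max(u + h - bn, 0.0) for u, h, bn in zip(U, h_of_g, b)]
    # Proxy queues: no clipping, so they track x - g in both directions.
    Z_next = [z - gm + xm for z, gm, xm in zip(Z, g, x)]
    return U_next, Z_next
```

Because Z_m(t) is not clipped at zero, it penalizes the proxy g_m(t) for drifting away from x_m(t) in either direction, whereas U_n(t) only grows when h_n(g(t)) exceeds b_n.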

  9. Use the Stochastic Lyapunov Optimization technique: [Neely 2003], [Georgiadis, Neely, Tassiulas F&T 2006]
     • Define: Θ(t) = all queue states = [Q(t), Z(t), U(t)]
     • Define: L(Θ(t)) = (1/2)[sum of squared queue sizes]
     • Define: D(Θ(t)) = E{L(Θ(t+1)) - L(Θ(t)) | Θ(t)}
     Schedule using the modified “Max-Weight” rule: every slot t, observe queue states and make a 2-stage decision to minimize the “drift plus penalty”:
     Minimize: D(Θ(t)) + V f(g(t))
     where V is a constant control parameter that affects proximity to optimality (and a delay tradeoff).
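As a rough numerical illustration of these definitions, the sketch below computes the Lyapunov function for a combined queue vector [Q, Z, U] and the drift-plus-penalty value for one observed slot. The function names are assumptions made here. Note that the drift on the slide is a conditional expectation, while this code evaluates a single realization of it.

```python
from typing import Sequence


def lyapunov(queues: Sequence[float]) -> float:
    """L = (1/2) * (sum of squared queue sizes) over the combined queue vector."""
    return 0.5 * sum(q * q for q in queues)


def drift_plus_penalty_sample(
    queues_now: Sequence[float],
    queues_next: Sequence[float],
    V: float,
    f_of_g: float,
) -> float:
    """One-slot realization of L(t+1) - L(t) + V * f(g(t)).

    Averaging this quantity over many realizations that start from the same
    queue state would approximate the conditional drift plus penalty.
    """
    return lyapunov(queues_next) - lyapunov(queues_now) + V * f_of_g
```

A larger V puts more weight on the penalty term f(g(t)) relative to the queue drift, which is the optimality versus delay tradeoff mentioned on the slide.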

  10. How to (try to) minimize D(Θ(t)) + V f(g(t)), where Θ(t) = [Q(t), Z(t), U(t)]: The proxy variables g(t) appear separably, and their terms can be minimized without knowing the system stochastics! Minimize: Subject to: [Z_m(t) and U_n(t) are known queue backlogs for slot t]
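To see why the proxy terms separate, one can expand the squared virtual queues using the updates U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0] and Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t). The derivation below is a sketch under those updates only; it does not reproduce the constraint set for g(t) from the slide, which is missing from this transcript.

```latex
\begin{align*}
\tfrac12 Z_m(t+1)^2 - \tfrac12 Z_m(t)^2
  &= \tfrac12\bigl(x_m(t) - g_m(t)\bigr)^2 + Z_m(t)\bigl(x_m(t) - g_m(t)\bigr), \\
\tfrac12 U_n(t+1)^2 - \tfrac12 U_n(t)^2
  &\le \tfrac12\bigl(h_n(g(t)) - b_n\bigr)^2 + U_n(t)\bigl(h_n(g(t)) - b_n\bigr).
\end{align*}
% The inequality uses (max[a, 0])^2 <= a^2.
% Collecting the first-order terms that depend on g = g(t), together with the
% penalty term V f(g), gives the expression the proxy decision would minimize:
\[
  V f(g) \;+\; \sum_n U_n(t)\, h_n(g) \;-\; \sum_m Z_m(t)\, g_m .
\]
```

Every coefficient in the last expression is a queue backlog observed at slot t, so no distribution of w(t) is needed to choose g(t). The squared terms are treated as bounded constants here, which assumes bounded x, g and h_n.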

  11. Minimizing the Remaining Terms: Minimize: D(Θ(t)) + V f(g(t)), where Θ(t) = [Q(t), Z(t), U(t)]

  12. Solution: Define g^(mw)(t), I^(mw)(t), k^(mw)(t) as the ideal max-weight decisions (minimizing the drift expression). Define e_k(t): Then:
     • k^(mw)(t) = argmin_{k in {1, …, K}} e_k(t) (Stage 1)
     • I^(mw)(t) = argmin_{I in I} Y_{k(t)}(w(t), I, Θ(t)) (Stage 2), where Θ(t) = [Q(t), Z(t), U(t)]
     • g^(mw)(t) = solution to the proxy problem

  13. Approximation Theorem (related to Neely 2003, G-N-T F&T 2006): If actual decisions satisfy: With: (related to slackness of constraints) Then:
     - All constraints satisfied.
     - Average queue sizes < [B + C + c_0 V] / min[e_max - e_Q, s - e_Z]
     - Penalty satisfies: f(x̄) < f*_optimal + O(max[e_Q, e_Z]) + (B + C)/V

  14. It all hinges on our approximation of e_k(t): Declare a “type k exploration event” independently with probability q > 0 (small). We must use k(t) = k here. Approach 1: {w_1^(k)(t), …, w_W^(k)(t)} = samples over the past W type-k exploration events.
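The bookkeeping behind the exploration events can be sketched as follows. This is a minimal illustration with assumptions stated up front: the names `ExplorationWindows` and `draw_exploration_type` are hypothetical, and the way events are drawn (each type k fires with probability q, at most one type per slot, which needs K*q <= 1) is one possible reading of "independently with probability q", not something the slide specifies.

```python
import random
from collections import deque
from typing import Optional


class ExplorationWindows:
    """Keeps the W most recent samples w observed at type-k exploration events."""

    def __init__(self, K: int, W: int) -> None:
        # One fixed-length window per stage-1 option k = 1..K.
        self.windows = {k: deque(maxlen=W) for k in range(1, K + 1)}

    def record(self, k: int, w) -> None:
        # Appending to a full deque silently drops the oldest sample.
        self.windows[k].append(w)

    def samples(self, k: int) -> list:
        return list(self.windows[k])


def draw_exploration_type(K: int, q: float, rng: random.Random) -> Optional[int]:
    """Return k if a type-k exploration event is declared this slot, else None.

    Assumption: type k fires when (k-1)*q <= u < k*q for a uniform u, so each
    type has probability q and at most one type fires per slot.
    """
    u = rng.random()
    for k in range(1, K + 1):
        if u < k * q:
            return k
    return None
```

On a slot where `draw_exploration_type` returns some k, the controller would set k(t) = k, observe w(t), and call `record(k, w)`; on all other slots the windows are left unchanged.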

  15. It all hinges on our approximation of e_k(t): Declare a “type k exploration event” independently with probability q > 0 (small). We must use k(t) = k here. Approach 2: {w_1^(k)(t), …, w_W^(k)(t)} = samples over the past W type-k exploration events; {Q_1^(k)(t), …, Q_W^(k)(t)} = queue backlogs at these sample times.

  16. Analysis (Approach 2): Subtleties: 1) The “Inspection Paradox” issue requires use of samples at exploration events, so that {w_1^(k)(t), …, w_W^(k)(t)} are i.i.d. 2) Even so, {w_1^(k)(t), …, w_W^(k)(t)} are correlated with queue backlogs at time t, and so we cannot directly apply the Law of Large Numbers!

  17. Analysis (Approach 2): Use a “Delayed Queue” analysis: can apply LLN. [Figure: timeline from t_start to t showing samples w_1(t), w_2(t), w_3(t), …, w_W(t)]

  18. Max-Weight Learning Algorithm (Approach 2): (No knowledge of probability distributions is required!)
     - Have random exploration events (prob. q).
     - Choose Stage-1 decision k(t) = argmin_{k in {1, …, K}} [e_k(t)]
     - Use I^(mw)(t) for the Stage-2 decision: I^(mw)(t) = argmin_{I in I} Y_{k(t)}(w(t), I, Θ(t)), where Θ(t) = [Q(t), Z(t), U(t)]
     - Use g^(mw)(t) for proxy variables.
     - Update the virtual queues and the moving averages.
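The two decision steps of the algorithm can be sketched in Python as below. Several pieces are assumptions rather than content from the slides: the exact form of Y_k and of the moving-average estimate of e_k(t) is not given in this transcript, so `Y` is a user-supplied function, the estimate is taken to be the window average of min over I of Y(k, w_i, I, Θ_i) with Θ_i the queue state stored at sample time i, and the stage-2 set is assumed finite so it can be enumerated.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Y(k, w, action, queue_state) -> float; hypothetical signature for Y_k(w, I, Θ).
YFunc = Callable[[int, Any, Any, Any], float]


def estimate_e(k: int, samples: Sequence[Tuple[Any, Any]],
               actions: Sequence[Any], Y: YFunc) -> float:
    """Moving-average estimate of e_k from stored (w_i, queue_state_i) pairs.

    Assumed form: mean over the window of min_I Y(k, w_i, I, queue_state_i).
    An empty window returns +inf so that k is only picked after exploration.
    """
    if not samples:
        return float("inf")
    total = sum(min(Y(k, w, a, qs) for a in actions) for w, qs in samples)
    return total / len(samples)


def stage1_decision(windows: Dict[int, List[Tuple[Any, Any]]],
                    actions: Sequence[Any], Y: YFunc) -> int:
    """k(t) = argmin over k of the estimated e_k(t)."""
    estimates = {k: estimate_e(k, s, actions, Y) for k, s in windows.items()}
    return min(estimates, key=estimates.get)


def stage2_decision(k: int, w: Any, queue_state: Any,
                    actions: Sequence[Any], Y: YFunc) -> Any:
    """I(t) = argmin over I of Y_k(w(t), I, queue state at slot t)."""
    return min(actions, key=lambda a: Y(k, w, a, queue_state))
```

On an exploration slot the stage-1 choice would be overridden by the explored type k, and the observed pair (w(t), queue state) appended to that type's window before the virtual queues are updated.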

  19. Theorem (Fixed W, V): With window size W we have:
     - All constraints satisfied.
     - Average queue sizes < [B + C + c_0 V] / min[e_max - e_Q, s - e_Z]
     - Penalty satisfies: f(x̄) < f*_q + O(1/sqrt{W}) + (B + C)/V

  20. Concluding Theorem (Variable W, V): Let 0 < b1 < b2 < 1. Define V(t) = (t+1)^b1, W(t) = (t+1)^b2. Then under the Max-Weight Learning Algorithm:
     - All constraints are satisfied.
     - All queues are mean rate stable*:
     - Average penalty gets exact optimality (subject to random exploration events): f(x̄) = f*_q
     *Mean rate stability does not imply finite average congestion and delay. In fact, average congestion and delay are necessarily infinite when exact optimality is reached.
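The growing schedules V(t) = (t+1)^b1 and W(t) = (t+1)^b2 are easy to tabulate. The sketch below is illustrative: the function name `control_schedule` is hypothetical, and rounding W(t) up to an integer is an assumption made here because a window size has to be a whole number of samples.

```python
import math
from typing import Tuple


def control_schedule(t: int, b1: float, b2: float) -> Tuple[float, int]:
    """Return (V(t), W(t)) with V(t) = (t+1)**b1 and W(t) = ceil((t+1)**b2).

    Requires 0 < b1 < b2 < 1, as in the theorem statement.
    """
    if not (0.0 < b1 < b2 < 1.0):
        raise ValueError("need 0 < b1 < b2 < 1")
    V = (t + 1) ** b1
    # Assumption: round the window size up so it is a positive integer.
    W = math.ceil((t + 1) ** b2)
    return V, W
```

With b1 < b2 the window W(t) grows faster than V(t), so the O(1/sqrt{W}) estimation error from the fixed-parameter theorem shrinks while the (B+C)/V term also vanishes.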
