Explore the application of reinforcement learning in network routing, covering supervised, unsupervised, and reinforcement learning methods, including key concepts like Markov Decision Processes, dynamic programming, Monte Carlo methods, and Q-routing algorithms.
Application of Reinforcement Learning in Network Routing By Chaopin Zhu
Machine Learning • Supervised Learning • Unsupervised Learning • Reinforcement Learning
Supervised Learning • Feature: Learning with a teacher • Phases • Training phase • Testing phase • Applications • Pattern recognition • Function approximation
Unsupervised Learning • Feature: Learning without a teacher • Applications • Feature extraction • Other preprocessing
Reinforcement Learning • Feature: Learning with a critic • Applications • Optimization • Function approximation
Elements of Reinforcement Learning • Agent • Environment • Policy • Reward function • Value function • Model of environment (optional)
Markov Decision Process (MDP) Definition: A reinforcement learning task that satisfies the Markov property • Transition probabilities: P^a_xy = Pr{ x_{t+1} = y | x_t = x, a_t = a }
Markov Decision Process (cont.) • Parameters: expected rewards R^a_xy = E{ r_{t+1} | x_t = x, a_t = a, x_{t+1} = y } • Value functions: state value V^π(x) = E_π{ Σ_k γ^k r_{t+k+1} | x_t = x } and action value Q^π(x,a) = E_π{ Σ_k γ^k r_{t+k+1} | x_t = x, a_t = a }
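To make these parameters concrete, here is a minimal Python sketch (a made-up two-state, two-action MDP of our own; none of the numbers come from the slides) encoding the transition probabilities and expected rewards as nested dicts:

# Hypothetical MDP: P[x][a][y] = Pr{ x_{t+1} = y | x_t = x, a_t = a },
# R[x][a][y] = expected reward for the transition x --a--> y.
P = {0: {"stay": {0: 1.0}, "go": {0: 0.2, 1: 0.8}},
     1: {"stay": {1: 1.0}, "go": {0: 0.8, 1: 0.2}}}
R = {0: {"stay": {0: 0.0}, "go": {0: 0.0, 1: 1.0}},
     1: {"stay": {1: 0.0}, "go": {0: 0.0, 1: 0.0}}}

# Sanity check: every P[x][a] must be a probability distribution.
assert all(abs(sum(dist.values()) - 1.0) < 1e-9
           for acts in P.values() for dist in acts.values())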
Elementary Methods for the Reinforcement Learning Problem • Dynamic Programming • Monte Carlo Methods • Temporal-Difference Learning
Dynamic Programming Methods • Policy evaluation • Policy improvement
Dynamic Programming (cont.) • E: policy evaluation, I: policy improvement • Policy Iteration: π_0 --E--> V^π_0 --I--> π_1 --E--> V^π_1 --I--> … --I--> π* --E--> V* • Value Iteration
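The E/I alternation can be written out directly. Below is a minimal policy-iteration sketch of ours in Python, reusing the hypothetical two-state MDP from the previous sketch; γ = 0.9 and the stopping threshold are assumed values:

P = {0: {"stay": {0: 1.0}, "go": {0: 0.2, 1: 0.8}},
     1: {"stay": {1: 1.0}, "go": {0: 0.8, 1: 0.2}}}
R = {0: {"stay": {0: 0.0}, "go": {0: 0.0, 1: 1.0}},
     1: {"stay": {1: 0.0}, "go": {0: 0.0, 1: 0.0}}}
states, actions, gamma, theta = (0, 1), ("stay", "go"), 0.9, 1e-8

def backup(x, a, V):
    # One-step lookahead: expected reward plus discounted next-state value
    return sum(p * (R[x][a][y] + gamma * V[y]) for y, p in P[x][a].items())

policy = {x: "stay" for x in states}
V = {x: 0.0 for x in states}
while True:
    while True:                              # E: iterative policy evaluation
        delta = 0.0
        for x in states:
            v = backup(x, policy[x], V)
            delta = max(delta, abs(v - V[x]))
            V[x] = v
        if delta < theta:
            break
    stable = True                            # I: greedy policy improvement
    for x in states:
        best = max(actions, key=lambda a: backup(x, a, V))
        if best != policy[x]:
            policy[x], stable = best, False
    if stable:                               # policy is now optimal
        break
print(policy, {x: round(V[x], 3) for x in states})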
Monte Carlo Methods • Feature • Learning from experience • No need for complete transition probabilities • Idea • Partition experience into episodes • Average sample returns • Update on an episode-by-episode basis (see the sketch below)
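These ideas fit in a few lines. Below is a first-visit Monte Carlo sketch of ours in Python; the environment is a hypothetical five-state random walk (start at 2, terminate at 0 or 4, reward 1 only on reaching state 4), not anything from the slides:

import random

def episode():
    # One episode of the random walk under a fixed (random) policy
    x, traj = 2, []
    while 0 < x < 4:
        y = x + random.choice((-1, 1))
        traj.append((x, 1.0 if y == 4 else 0.0))   # (state, reward)
        x = y
    return traj

returns = {x: [] for x in (1, 2, 3)}
for _ in range(5000):
    traj = episode()
    first = {}
    for t, (x, _) in enumerate(traj):
        first.setdefault(x, t)                 # first visit to each state
    G = 0.0
    for t in range(len(traj) - 1, -1, -1):     # accumulate return backwards (γ = 1)
        x, r = traj[t]
        G = r + G
        if first[x] == t:
            returns[x].append(G)               # sample return from first visit
V = {x: round(sum(g) / len(g), 2) for x, g in returns.items()}
print(V)   # true values are 0.25, 0.5, 0.75 for states 1, 2, 3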
Temporal-Difference Learning • Features (combination of Monte Carlo and DP ideas) • Learn from experience (Monte Carlo) • Update estimates based in part on other learned estimates (DP) • The TD(λ) algorithm seamlessly integrates TD and Monte Carlo methods
TD(0) Learning
Initialize V(x) arbitrarily, and let π be the policy to be evaluated
Repeat (for each episode):
  Initialize x
  Repeat (for each step of episode):
    a ← action given by π for x
    Take action a; observe reward r and next state x'
    V(x) ← V(x) + α[ r + γ V(x') − V(x) ]
    x ← x'
  until x is terminal
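A runnable Python rendering of the TD(0) pseudocode above, on the same hypothetical random walk with a fixed random policy; the step size α = 0.1 and γ = 1 are assumed constants:

import random

alpha, gamma = 0.1, 1.0
V = {x: 0.0 for x in range(5)}                 # V of terminal states stays 0

for _ in range(5000):                          # repeat (for each episode)
    x = 2                                      # initialize x
    while 0 < x < 4:                           # repeat (for each step)
        y = x + random.choice((-1, 1))         # a given by π; observe next state x'
        r = 1.0 if y == 4 else 0.0             # observe reward r
        V[x] += alpha * (r + gamma * V[y] - V[x])   # TD(0) update
        x = y                                  # x ← x'
print({x: round(V[x], 2) for x in (1, 2, 3)})  # ≈ 0.25, 0.5, 0.75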
Q-Learning
Initialize Q(x,a) arbitrarily
Repeat (for each episode):
  Initialize x
  Repeat (for each step of episode):
    Choose a from x using a policy derived from Q (e.g., ε-greedy)
    Take action a; observe r, x'
    Q(x,a) ← Q(x,a) + α[ r + γ max_a' Q(x',a') − Q(x,a) ]
    x ← x'
  until x is terminal
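And a matching Python sketch of the Q-learning loop, again on the hypothetical random walk; the ε-greedy exploration and all constants are assumptions of ours:

import random

alpha, gamma, eps = 0.1, 1.0, 0.1
actions = (-1, +1)
Q = {(x, a): 0.0 for x in range(5) for a in actions}

for _ in range(5000):
    x = 2
    while 0 < x < 4:
        # choose a from x using an ε-greedy policy derived from Q
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(x, b)])
        y = x + a                                    # take action a
        r = 1.0 if y == 4 else 0.0                   # observe r, x'
        best_next = 0.0 if y in (0, 4) else max(Q[(y, b)] for b in actions)
        Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
        x = y
print({x: round(max(Q[(x, b)] for b in actions), 2) for x in (1, 2, 3)})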
Q-Routing
Q_x(y,d): estimated time for a packet to reach the destination node d from the current node x via x's neighbor node y
T_y(d): y's estimate of the time remaining in the trip, T_y(d) = min_z Q_y(z,d)
q_y: queuing time at node y
T_xy: transmission time between x and y
Algorithm of Q-Routing
1. Set initial Q-values at each node
2. Get the first packet from the packet queue of node x
3. Choose the best neighbor node ŷ = argmin_y Q_x(y,d) and forward the packet to node ŷ
4. Get the estimated value T_ŷ(d) = min_z Q_ŷ(z,d) back from node ŷ
5. Update Q_x(ŷ,d) ← Q_x(ŷ,d) + α[ q_ŷ + T_xŷ + T_ŷ(d) − Q_x(ŷ,d) ]
6. Go to 2 (a runnable sketch follows)
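As a concrete, deliberately tiny illustration, here is a Q-routing sketch of ours in Python on a hypothetical four-node ring network; the queuing time q_y, per-hop transmission time T_xy, and learning rate α are made-up constants, not values from the slides:

import random

alpha = 0.5
nodes = range(4)
neighbors = {x: ((x - 1) % 4, (x + 1) % 4) for x in nodes}
q_time = {y: 1.0 for y in nodes}    # q_y: queuing time at node y
t_time = 1.0                        # T_xy: per-hop transmission time
# Step 1: initial Q-values; Q[x][(y, d)] estimates delivery time to d via y
Q = {x: {(y, d): 0.0 for y in neighbors[x] for d in nodes} for x in nodes}

def forward(x, d):
    # Steps 2-6 for one packet traveling from x to d
    while x != d:
        y = min(neighbors[x], key=lambda n: Q[x][(n, d)])   # step 3: pick ŷ
        T = 0.0 if y == d else min(Q[y][(z, d)] for z in neighbors[y])  # step 4
        # Step 5: Q_x(ŷ,d) ← Q_x(ŷ,d) + α[ q_ŷ + T_xŷ + T_ŷ(d) − Q_x(ŷ,d) ]
        Q[x][(y, d)] += alpha * (q_time[y] + t_time + T - Q[x][(y, d)])
        x = y

for _ in range(2000):               # step 6: keep processing packets
    forward(random.choice(nodes), random.choice(nodes))
print(Q[0])    # entries converge toward (hop count) * (q_y + T_xy)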
Initialization/Termination Procedures • Initialization • Initialize and/or register global variables • Initialize routing table • Termination • Destroy routing table • Release memory
Arrival Procedure • Data packet arrival • Update routing table • Route the packet onward with control information, or destroy it if it has reached its destination • Control information packet arrival • Update routing table • Destroy the packet
Departure Procedure • Set all fields of the packet • Get the shortest route • Send the packet along that route (a schematic skeleton of these procedures follows)
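Finally, a schematic Python skeleton of ours (every class and method name here is hypothetical; the slides give no code) showing how the initialization, arrival, departure, and termination procedures might fit together on a single node:

class Packet:
    def __init__(self, kind, dest):
        self.kind, self.dest = kind, dest    # "data" or "control", destination

class RouterNode:
    def __init__(self, node_id, neighbors, n_nodes=4):
        # Initialization procedure: set globals, create the routing (Q) table
        self.node_id, self.neighbors = node_id, neighbors
        self.table = {(y, d): 0.0 for y in neighbors for d in range(n_nodes)}

    def on_arrival(self, packet):
        self.update_table(packet)            # both packet kinds update the table
        if packet.kind == "control" or packet.dest == self.node_id:
            return None                      # destroy the packet
        return self.on_departure(packet)     # otherwise route it onward

    def update_table(self, packet):
        pass  # the Q-routing update from the previous slides would go here

    def on_departure(self, packet):
        # Departure procedure: pick the neighbor with the smallest estimate
        return min(self.neighbors, key=lambda y: self.table[(y, packet.dest)])

    def on_termination(self):
        self.table.clear()                   # destroy routing table, free memory

node = RouterNode(0, neighbors=(1, 3))
print(node.on_arrival(Packet("data", dest=2)))   # forwards via a neighbor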