Learning Approach to Link Adaptation in WirelessMAN
Avinash Prasad
Supervisor: Prof. Saran
Outline of Presentation
• Introduction
• Problem Definition
• Proposed Solution & Learning Automaton
• Requirements
• About Implementation
• Results
• Conclusions
• References
Introduction (Link Adaptation)
Definition: Link adaptation refers to a set of techniques in which modulation, coding rate, and/or other signal transmission parameters are changed on the fly to adapt to changing channel conditions.
Introduction (WirelessMAN)
• WirelessMAN requires high data rates over channel conditions that
  • vary across different links
  • vary over time
• Link adaptation on a per-link basis is the most fundamental step that this broadband wireless access (BWA) system uses to respond to these link-to-link variations and to variations over time. There is an elaborate message-passing mechanism for exchanging channel information at the MAC layer.
Problem Definition
• Link adaptation requires us to know which changes in channel conditions call for a change in transmission parameters.
• The most commonly identified problem of link adaptation:
  • How do we calculate the threshold values for the various channel estimation parameters that signal the need for a change in transmission parameters?
Problem Definition (Current Approaches)
• Current methods for threshold estimation
  • Model based
    • Requires analytical modeling.
    • How reliable is the model?
    • Is an appropriate model available for the wireless scenario?
  • Statistical methods
    • Statistics are hard to obtain for a given set of channel conditions.
    • The resulting thresholds are fixed and do not change with time.
    • Even a change in season may affect the most appropriate values.
  • Heuristics based
    • Scope limited to very few scenarios.
Proposed Solution (Aim)
• Devise a machine-learning-based method that
  • learns the optimal threshold values as we operate over the network
  • needs no analytical modeling in its operation
  • can handle noisy feedback from the environment
  • is generic enough to learn different parameters without many changes to the core
Proposed Solution (Idea)
• Use a stochastic learning automaton.
• Informally: it essentially simulates animal learning. Repeatedly make decisions based on your current knowledge, then refine those decisions according to the response from the environment.
• Mathematically: it modifies the probability of selecting each action based on how much reward we get from the environment.
Proposed Solution (Experimental Setup)
• Experimental setup used to study the stochastic learning methods.
• We learn the optimal SNR threshold values for switching among coding profiles, such that the throughput is maximized.
• Threshold $T_i$ decides when to switch from profile $i$ to profile $i+1$.
• The possible values for $T_i$ are restricted to a limited set, which speeds up learning by reducing the number of options.
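The threshold rule above can be sketched as a simple lookup. This is only an illustration: the function name `select_profile`, the 0-based profile indexing, the choice that an SNR exactly equal to a threshold switches up, and the threshold values are all assumptions, not taken from the presentation.

```python
from bisect import bisect_right


def select_profile(snr_db, thresholds):
    """Return the 0-based coding profile index for an SNR estimate.

    thresholds[i] is the SNR (in dB) at which we switch from profile i to
    profile i+1, so N profiles need N-1 thresholds in ascending order.
    Assumption: an SNR exactly equal to a threshold selects the higher profile.
    """
    return bisect_right(thresholds, snr_db)


# Hypothetical thresholds for four profiles (values are placeholders).
example_thresholds = [6.0, 9.0, 12.0]
print(select_profile(5.0, example_thresholds))   # 0: below every threshold
print(select_profile(10.0, example_thresholds))  # 2: above the first two
print(select_profile(20.0, example_thresholds))  # 3: above all thresholds
```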
Proposed Solution (Experimental Setup) (Mathematical Formulation)
• For $N$ different profiles in use, $N-1$ thresholds need to be determined (learned).
• At any instant these $N-1$ thresholds $T_i$, $i \in \{1, \dots, N-1\}$, form the input to the environment.
• In return the environment gives the reward $\beta(\text{SNR estimate}, \langle T_1, \dots, T_{N-1} \rangle) = (1 - \mathrm{SER}) \cdot (K / K_{\max})$, where SER is the symbol error rate.
• $K$ is the information block size fed to the Reed-Solomon (RS) encoder in the selected profile.
• $K_{\max}$ is the maximum possible value of $K$ over all profiles. This makes the reward lie in the range $[0, 1]$.
• Clearly $\beta$ is a measure of normalized throughput.
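As a minimal sketch, the reward formula above translates directly into a function. The name `reward`, the argument checks, and the block sizes in the example are illustrative assumptions.

```python
def reward(ser, k, k_max):
    """Normalized throughput: beta = (1 - SER) * (K / K_max).

    ser   : symbol error rate of the selected profile, in [0, 1]
    k     : information block size fed to the RS encoder for that profile
    k_max : largest block size over all profiles
    """
    if not 0.0 <= ser <= 1.0:
        raise ValueError("SER must lie in [0, 1]")
    if not 0 < k <= k_max:
        raise ValueError("K must satisfy 0 < K <= K_max")
    return (1.0 - ser) * (k / k_max)


# Hypothetical block sizes: a profile carrying half the maximum block size
# with a 10% symbol error rate earns a reward of about 0.45.
print(reward(0.1, 54, 108))
```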
Proposed Solution (Learning Automaton) (Formal Definition)
A learning automaton is completely specified by $(A, B, LA, P(k))$:
• Action set $A = \{\alpha_1, \alpha_2, \dots, \alpha_r\}$; we always assume this set to be finite in our discussion.
• Set of rewards $B = [0, 1]$
• The learning algorithm $LA$
• State information $P(k) = [p_1(k), p_2(k), \dots, p_r(k)]$, the action probabilities at instant $k$
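One possible way to hold this tuple in code is sketched below. The class name `LearningAutomaton`, the uniform initial probabilities, and the example action values are assumptions made for illustration; the learning algorithm is deliberately left out here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LearningAutomaton:
    """State of one automaton: finite action set A and probability vector P(k)."""
    actions: list                               # A = {alpha_1, ..., alpha_r}
    probs: list = field(default_factory=list)   # P(k) = [p_1(k), ..., p_r(k)]

    def __post_init__(self):
        # Assumption: start with no preference, i.e. uniform probabilities.
        if not self.probs:
            r = len(self.actions)
            self.probs = [1.0 / r] * r

    def choose(self, rng=random):
        """Sample an action index according to the current probabilities."""
        return rng.choices(range(len(self.actions)), weights=self.probs, k=1)[0]


# Hypothetical automaton that picks one SNR threshold (dB) out of three.
automaton = LearningAutomaton(actions=[4.0, 6.0, 8.0])
index = automaton.choose(random.Random(0))
print(automaton.actions[index])
```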
Proposed Solution (Learning Automaton) (Why?)
Advantages:
• Complete generality of the action set
• We can have an entire set of automata, each working on a different variable of a multivariable problem, and they still arrive at a Nash equilibrium in which the overall function is maximized, much faster than a single automaton would.
• It can handle noisy reward values from the environment
  • It performs long-time averaging as it learns
  • But this requires the environment to be stationary
Proposed Solution (Learning Automaton) (Solution to Experimental Setup)
• Each threshold is learned by an independent automaton in the group (game) of automata that solves the problem.
• For each automaton, we choose the smallest action set that covers all possible variations in channel conditions in the setup, i.e., we decide the possible range of threshold values.
• We decide which learning algorithm to use.
Proposed Solution (Learning Automaton) (Solution to Experimental Setup)
• At each play $k$ of the game, we do the following:
  • Each automaton selects an action (a threshold) based on its state $P(k)$, the probability vector.
  • Based on these threshold values, we select a profile for channel transmission.
  • We get feedback from the channel in the form of the normalized throughput defined earlier.
  • We use the learning algorithm to calculate the new state, the set of probabilities $P(k+1)$.
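The four steps of one play can be sketched as follows. Everything here is a hedged illustration: `play_round`, the `environment` and `update` callables, and the rule "profile = number of thresholds the SNR has reached" are assumptions, chosen so that the sketch still works when independent automata pick thresholds that are not in ascending order.

```python
import random


def play_round(probs, action_sets, snr_db, environment, update, rng):
    """Run one play of the game and return (new probability vectors, reward).

    probs       : one probability vector per automaton
    action_sets : candidate threshold values (dB) per automaton
    environment : callable (snr_db, profile) -> reward in [0, 1]
    update      : callable (probs, chosen_index, reward) -> new probs
    """
    # 1. Each automaton samples a threshold from its own probability vector.
    chosen = [rng.choices(range(len(p)), weights=p, k=1)[0] for p in probs]
    thresholds = [acts[i] for acts, i in zip(action_sets, chosen)]
    # 2. Select the transmission profile implied by the thresholds.
    profile = sum(snr_db >= t for t in thresholds)
    # 3. Get the normalized-throughput feedback from the channel.
    beta = environment(snr_db, profile)
    # 4. Every automaton updates its state with the same common reward.
    new_probs = [update(p, i, beta) for p, i in zip(probs, chosen)]
    return new_probs, beta


# Toy usage with a placeholder environment and a do-nothing update rule.
state = [[0.5, 0.5], [0.5, 0.5]]
sets = [[4.0, 6.0], [10.0, 12.0]]
state, beta = play_round(state, sets, 8.0,
                         lambda snr, prof: prof / 2,
                         lambda p, i, b: p,
                         random.Random(0))
print(beta)
```

All automata receive the same reward, which is what makes the setup a common-payoff game.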
Proposed Solution (Learning Automaton) (Learning Algorithms)
• We have explored two different algorithms
  • LRI (linear reward-inaction)
    • Markovian: $P(k+1)$ is computed from $P(k)$ and the last action/reward pair only.
    • If $\alpha(k) = \alpha_i$, then $p_i(k+1) = p_i(k) + \Delta \, \beta(k) \, (1 - p_i(k))$, and for every other action $j \neq i$, $p_j(k+1) = p_j(k) - \Delta \, \beta(k) \, p_j(k)$.
    • $\Delta$ is a rate constant.
  • Pursuit algorithm
    • Uses the entire history of selections and rewards to calculate average reward estimates for all actions.
    • Moves aggressively towards the vertex of the probability simplex that assigns probability 1 to the action with the highest reward estimate, say action $\alpha_M$.
    • $P(k+1) = P(k) + \Delta \, (e_M(k) - P(k))$, where $e_M(k)$ is the unit vector with a 1 in position $M$.
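A minimal sketch of the LRI update is given below, assuming the probabilities are held in a plain list. The function name `lri_update` and the example numbers are illustrative only.

```python
def lri_update(probs, chosen, beta, rate):
    """Linear reward-inaction update for one automaton.

    probs  : current probabilities p_1(k), ..., p_r(k)
    chosen : index i of the action played at instant k
    beta   : reward in [0, 1] returned by the environment
    rate   : the rate constant (Delta), with 0 < rate < 1
    """
    return [
        p + rate * beta * (1.0 - p) if j == chosen   # reinforce the played action
        else p - rate * beta * p                     # shrink every other action
        for j, p in enumerate(probs)
    ]


# With zero reward nothing changes ("inaction"); with reward 1 the played
# action gains exactly the probability that the other actions lose.
print(lri_update([0.25, 0.25, 0.25, 0.25], chosen=1, beta=1.0, rate=0.1))
```

Because the gain of the played action equals the total loss of the others, the vector still sums to 1 after every update.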
Proposed Solution (Learning Automaton) (Learning Algorithms, cont.)
• The two algorithms differ in
  • the speed of convergence to the optimal solution
  • the amount of storage each requires
  • how decentralized the learning setup (game) can be
  • the way they approach their convergence point
    • Being a greedy method, the pursuit algorithm shows large deviations during the evolution phase.
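To make the storage difference concrete, here is a hedged sketch of a pursuit automaton. Unlike LRI, it keeps a play count and a reward total per action in addition to the probability vector. The class name, the zero estimate for untried actions, and tie-breaking by lowest index are assumptions.

```python
class PursuitAutomaton:
    """Pursuit algorithm: move P(k) towards the unit vector of the best estimate."""

    def __init__(self, n_actions, rate):
        self.rate = rate                              # rate constant (Delta)
        self.probs = [1.0 / n_actions] * n_actions    # P(k), uniform start
        self.counts = [0] * n_actions                 # times each action was played
        self.totals = [0.0] * n_actions               # summed reward per action

    def estimates(self):
        """Average reward per action (assumption: 0.0 for untried actions)."""
        return [t / c if c else 0.0 for t, c in zip(self.totals, self.counts)]

    def update(self, chosen, beta):
        # Fold the new observation into the running reward history.
        self.counts[chosen] += 1
        self.totals[chosen] += beta
        est = self.estimates()
        best = max(range(len(est)), key=est.__getitem__)   # index M
        # P(k+1) = P(k) + rate * (e_M - P(k))
        self.probs = [
            p + self.rate * ((1.0 if j == best else 0.0) - p)
            for j, p in enumerate(self.probs)
        ]


automaton = PursuitAutomaton(n_actions=3, rate=0.5)
automaton.update(chosen=2, beta=0.8)
print(automaton.probs)
```

Note that the probabilities move towards the action with the best estimate, which is not necessarily the action just played; this is the greedy behavior mentioned above.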
Requirements
• 802.16 OFDM physical layer
• Channel model (SUI model used)
• Learning setup
About Implementation (802.16 OFDM Physical Layer)
• Implements the OFDM physical layer from 802.16d.
• Coded in MATLAB.
• Complies fully with the standard; operations were tested against the example pipeline given in the standard.
• No antenna diversity is used, and perfect channel impulse response estimation is assumed.
About Implementation (Channel Model)
• We have implemented the complete set of SUI models for the omnidirectional antenna case.
• The complete channel model consists of one of the SUI models plus an AWGN model for noise.
• Coded in MATLAB, thus completing the entire channel + coding pipeline.
• Results from this data transmission pipeline are presented later.
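The AWGN stage of such a channel can be sketched as below. This is not the authors' MATLAB code: it is a Python illustration that covers only the additive noise (the SUI multipath part is omitted), and the function name `add_awgn` and the convention of measuring SNR against the average power of the input samples are assumptions.

```python
import math
import random


def add_awgn(samples, snr_db, rng):
    """Add complex white Gaussian noise to baseband samples at a target SNR.

    samples : list of complex baseband samples
    snr_db  : desired signal-to-noise ratio in dB
    rng     : random.Random instance, passed in for reproducibility
    """
    signal_power = sum(abs(s) ** 2 for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power equally between real and imaginary parts.
    sigma = math.sqrt(noise_power / 2.0)
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in samples]


# Usage: corrupt a constant unit-power signal at 10 dB SNR.
noisy = add_awgn([complex(1.0, 0.0)] * 8, 10.0, random.Random(1))
print(len(noisy))
```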
About Implementation (Learning Setup)
• We implemented both algorithms for comparison.
• Coded in C/C++.
• A network model was constructed using the symbol error rate (SER) plots obtained from the PHY layer simulations to estimate the reward values.
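One plausible way to turn a simulated SER curve into a lookup for the reward computation is linear interpolation between the simulated SNR points. The sketch below is an assumption about how such a network model could work, not the authors' C/C++ code; the function name `interp_ser` and the two-point curve are placeholders, not results from the presentation.

```python
from bisect import bisect_right


def interp_ser(curve, snr_db):
    """Estimate SER at snr_db from a simulated curve.

    curve : list of (snr_db, ser) points sorted by increasing SNR
    Values outside the simulated range are clamped to the end points.
    """
    xs = [x for x, _ in curve]
    if snr_db <= xs[0]:
        return curve[0][1]
    if snr_db >= xs[-1]:
        return curve[-1][1]
    j = bisect_right(xs, snr_db)          # first point strictly above snr_db
    (x0, y0), (x1, y1) = curve[j - 1], curve[j]
    return y0 + (y1 - y0) * (snr_db - x0) / (x1 - x0)


# Placeholder curve: SER falls linearly from 1.0 at 0 dB to 0.0 at 10 dB.
toy_curve = [(0.0, 1.0), (10.0, 0.0)]
print(interp_ser(toy_curve, 5.0))  # 0.5
```

Interpolating on a logarithmic SER scale may track real curves more closely; the linear form is used here only for brevity.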
Results (PHY layer) (BER plots for different profiles at SUI-2)
Results (PHY layer) (SER plots for different profiles at SUI-2)
Results (PHY layer) (Reward metric for the learning automaton)
Results (Learning) (Convergence curve; LRI, rate = 0.0015)
Results (Learning) (Convergence curve; Pursuit, rate = 0.0017)
Results (Learning) (Convergence curve; LRI, rate = 0.0032)
Results (Learning: 4 actions per threshold) (Convergence curve; LRI, rate = 0.0015)
Conclusions
• Our plots suggest the following:
  • Learning methods are indeed capable of arriving at the optimal parameter values under the type of channel conditions faced in WirelessMAN.
  • The rate of convergence depends on
    • the rate factor ($\Delta$)
    • the size of the action set
    • how much the actions differ in the reward they get from the environment
    • the learning algorithm
  • Although we worked with a relatively simple setup, assuming that a perfect SNR estimate is available, the complete generality of the action set ensures that the method can work with other channel estimation parameters as well.
References
• V. Erceg and K. V. Hari, Channel models for fixed wireless applications, IEEE 802.16 Broadband Wireless Access Working Group, 2001.
• Daniel S. Baum, Simulating the SUI models, IEEE 802.16 Broadband Wireless Access Working Group, 2000.
• M. A. L. Thathachar and P. S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization, Kluwer Academic Publishers, 2003.
Thanks