
UAV Route Planning in Delay Tolerant Networks


Presentation Transcript


  1. UAV Route Planning in Delay Tolerant Networks. Daniel Henkel, Timothy X Brown, University of Colorado, Boulder. Infotech @ Aerospace ‘07, May 8, 2007.

  2. Familiar: Dial-A-Ride. Dial-A-Ride is a curb-to-curb, shared-ride transportation service:
  • Receive calls
  • Pick up and drop off passengers
  • Minimize overall transit time
  Even for a single bus, the optimal route is not trivial!

  3. In context: Dial-A-UAV. Complication: effectively infinite data at the sensors and potentially two-way traffic. This is delay-tolerant traffic! (Related talk tomorrow, 8am: Sensor Data Collection.)
  Figure: a UAV ferrying data between Sensor-1 through Sensor-6 and a Monitoring Station.
  • Sparsely distributed sensors with limited radios
  • The TSP solution is not optimal
  • Our approach: queueing and MDP theory

  4. TSP’s Problem. The Traveling Salesman solution:
  • One cycle visits every node
  • Problem: far-away nodes with little data to send
  • Better: visit them less often
  Figure: a UAV hub serving nodes A and B, with visit probabilities pA, pB, distances dA, dB, and flow rates fA, fB.
  New: the cycle is defined by visit frequencies pi.

  5. Queueing Approach. Goal: minimize the average delay. Idea: express the delay in terms of the pi, then minimize over the set {pi}.
  • The pi form a probability distribution over nodes
  • Expected service time of any packet
  • Inter-service time: exponential distribution with mean Ti/pi
  • Weighted delay: the flow-weighted average of the per-node delays (see the sketch below)
  Figure: a UAV hub serving nodes A, B, C, D with visit probabilities pA, pB, pC, pD; distances dA, dB, dC, dD; and flow rates fA, fB, fC, fD.
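A minimal sketch of this objective, assuming (as the definitions above suggest) that a packet at node i waits Ti/pi on average and that per-node delays are weighted by the flow rates fi; the function and variable names are illustrative, and the closed form is a reconstruction from those definitions rather than a formula quoted from the slides.

```python
def weighted_delay(p, T, f):
    """Flow-weighted average delay.  Assumes exponential inter-service
    times with mean T[i]/p[i] at node i (so the expected wait equals that
    mean) and weights f[i]/sum(f).  Illustrative reconstruction."""
    F = sum(f)
    return sum((f[i] / F) * (T[i] / p[i]) for i in range(len(p)))

# Example: two nodes with equal flows but different round-trip times.
p = [0.5, 0.5]     # visit probabilities (must sum to 1)
T = [10.0, 30.0]   # round-trip flight time to each node
f = [1.0, 1.0]     # packet flow rate from each node
print(weighted_delay(p, T, f))   # 40.0
```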

  6. Solution and Algorithm. The probability of choosing node i for the next visit, pi, follows from minimizing the weighted delay. Implementation: a deterministic credit-counter algorithm (sketched below).
  1. Set ci = 0 for all i.
  2. While max{ci} < 1: ci = ci + pi.
  3. k = argmax{ci}.
  4. Visit node k; ck = ck - 1.
  5. Go to 2.
  Performance improvement over TSP!
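A runnable sketch of the deterministic credit-counter schedule from this slide; the visit probabilities fed to it are illustrative placeholders (the slide's closed-form expression for pi is not reproduced here).

```python
def visit_sequence(p, num_visits):
    """Deterministic schedule from slide 6: accumulate credit pi per round,
    serve the node with the largest credit, then deduct one unit."""
    c = [0.0] * len(p)                                # 1. set ci = 0
    order = []
    for _ in range(num_visits):
        while max(c) < 1.0:                           # 2. ci = ci + pi while max{ci} < 1
            c = [ci + pi for ci, pi in zip(c, p)]
        k = max(range(len(c)), key=lambda i: c[i])    # 3. k = argmax{ci}
        order.append(k)                               # 4. visit node k ...
        c[k] -= 1.0                                   #    ... and set ck = ck - 1
    return order                                      # 5. go to 2 (the loop)

# Illustrative probabilities: node 0 is visited roughly 65% of the time.
print(visit_sequence([0.65, 0.35], num_visits=8))     # [0, 1, 0, 0, 1, 0, 0, 1]
```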

  7. Unknown Environment. What is reinforcement learning (RL)?
  • Learning what to do without prior training: the agent is given a high-level goal, not how to reach it, and improves its actions as it goes
  • Distinguishing features: interaction with the environment, trial-and-error search, and the concept of rewards and punishments
  • Example: training a dog
  The agent learns a model of its environment.

  8. The Framework. The Agent performs actions. The Environment gives rise to rewards and puts the agent in situations called states. (A minimal interface is sketched below.)
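A minimal sketch of that interaction loop; the class and method names are illustrative, not taken from the slides.

```python
class Environment:
    def step(self, action):
        """Apply the agent's action; return (next_state, reward)."""
        raise NotImplementedError

class Agent:
    def act(self, state):
        """Choose an action given the current state."""
        raise NotImplementedError

    def observe(self, state, action, reward, next_state):
        """Update internal estimates from the experienced transition."""

def run(agent, env, state, steps):
    """The agent performs actions; the environment returns rewards and
    puts the agent into new states."""
    total_reward = 0.0
    for _ in range(steps):
        action = agent.act(state)
        next_state, reward = env.step(action)
        agent.observe(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward
```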

  9. Elements of RL: policy, reward, value, and a model of the environment.
  • Policy: what to do (depending on the state)
  • Reward: what is good
  • Value: what is good because it predicts reward
  • Model: what follows what
  Source: Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

  10. UA Path Planning - Simple. Goal: minimize the average delay, i.e., find pA and pB.
  • Service traffic from nodes A and B to hub H
  • State: the traffic waiting at the nodes, (tA, tB)
  • Actions: fly to A; fly to B
  • Reward: number of packets delivered
  • Optimal policy: how often to visit A and B; this depends on the flow rates and distances
  Figure: UAV, hub, and nodes A and B with probabilities pA, pB, distances dA, dB, and flow rates fA, fB. (A toy version of this MDP is sketched below.)
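A toy encoding of this problem as a finite MDP, under assumptions not stated on the slide: waiting traffic is capped at a few packets per node to keep the state space finite, one time step is one flight leg, and the visited node's backlog is delivered in full.

```python
import itertools

CAP = 3                       # assumed cap on waiting packets per node
ARRIVALS = {'A': 1, 'B': 2}   # illustrative packets arriving per step

# State: traffic waiting at the nodes, (tA, tB).  Actions: fly to A or to B.
states = list(itertools.product(range(CAP + 1), repeat=2))
actions = ['A', 'B']

def step(state, action):
    """Deliver (and count as reward) everything waiting at the visited node,
    then let both nodes accumulate new traffic up to the cap."""
    tA, tB = state
    reward = tA if action == 'A' else tB      # packets delivered this step
    tA = 0 if action == 'A' else tA
    tB = 0 if action == 'B' else tB
    next_state = (min(tA + ARRIVALS['A'], CAP), min(tB + ARRIVALS['B'], CAP))
    return next_state, reward
```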

  11. MDP
  • If a reinforcement learning task has the Markov property, it is a Markov Decision Process (MDP).
  • If the state and action sets are finite, it is a finite MDP.
  • To define a finite MDP, you need to give:
  • the state and action sets,
  • the one-step dynamics, defined by the transition probabilities P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a },
  • and the reward expectations R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }.

  12. RL approach to solving MDPs
  • Policy: a mapping from the set of states to the set of actions, π : S → A
  • Return: the sum of discounted rewards from this time onwards, Rt = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + …
  • Value function (of a state): the expected return when starting in s and following policy π. For an MDP, V^π(s) = E_π{ Rt | s_t = s } = E_π{ Σ_{k≥0} γ^k r_{t+k+1} | s_t = s }
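One intermediate step the slides leave implicit: the return decomposes recursively, which is exactly what turns the definition of V^π into the Bellman equation on the next slide.

```latex
R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = r_{t+1} + \gamma R_{t+1}
\;\;\Longrightarrow\;\;
V^{\pi}(s) = E_{\pi}\{\, r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \,\}
```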

  13. Bellman Equation for Policy π. Evaluating the expectation E_π{.} and assuming a deterministic policy, a = π(s), gives the recursive solution
  V^π(s) = Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V^π(s') ], with a = π(s).
  • Action-value function: the value of taking action a in state s. For an MDP,
  Q^π(s,a) = Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V^π(s') ]
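A short sketch of iterative policy evaluation, which simply applies the Bellman equation above as an update rule until V stops changing; the dictionaries P, R, and policy are placeholders keyed the same way as P^a_{ss'}, R^a_{ss'}, and π(s).

```python
def policy_evaluation(states, P, R, policy, gamma=0.9, tol=1e-6):
    """Evaluate a deterministic policy by sweeping
    V(s) <- sum_{s'} P[s][a][s'] * (R[s][a][s'] + gamma * V(s')), a = policy[s],
    until the largest change in a sweep falls below tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v = sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]) for s2 in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V
```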

  14. Optimality
  • V and Q are real-valued, so they carry a partial ordering; policies are ordered too: π ≥ π' iff V^π(s) ≥ V^π'(s) for all states s.
  • Optimal value functions: V*(s) = max_π V^π(s) and Q*(s,a) = max_π Q^π(s,a).
  • Optimal policy π*: the policy π that maximizes Q^π(s,a) for all states s.

  15. Reinforcement Learning - Methods. To find π*, all methods try to evaluate the V/Q value functions. Different approaches:
  • Dynamic programming: policy evaluation, improvement, and iteration
  • Monte Carlo methods: decisions are based on averaging sample returns
  • Temporal difference methods (!!); a minimal example is sketched below
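As one concrete member of the temporal-difference family, here is a minimal tabular Q-learning sketch; the step-function interface, exploration scheme, and parameter values are illustrative assumptions rather than anything specified in the slides.

```python
import random
from collections import defaultdict

def q_learning(step_fn, states, actions, episodes=500, steps=100,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: after each transition, nudge Q(s,a) toward the
    TD target r + gamma * max_a' Q(s',a')."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(steps):
            if random.random() < epsilon:                  # explore
                a = random.choice(actions)
            else:                                          # exploit current estimates
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r = step_fn(s, a)                          # assumed interface: (next_state, reward)
            target = r + gamma * max(Q[(s2, act)] for act in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])      # temporal-difference update
            s = s2
    return Q
```

With the two-node step function, states, and actions sketched after slide 10, q_learning(step, states, actions) would learn how often to fly to A versus B in that toy model.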
