90 likes | 172 Views
Section 8: Markov Decision Process. The Clinic’s Problem. You are a doctor and run a clinic which opens 2 hours a day Each hour you will have one patient: 80% of the time (s)he has flu, 20% of the time (s)he has Ebola You can either send them home or to the hospital
E N D
The Clinic’s Problem • You are a doctor and run a clinic which opens 2 hours a day • Each hour you will have one patient: 80% of the time (s)he has flu, 20% of the time (s)he has Ebola • You can either send them home or to the hospital • It’s better when patients survive at home than if they survive at the hospital. But we definitely don’t want them to die! • A patient with the flu has 90% chance to survive at home and will survive at the hospital for sure • A patient with Ebola has 50% chance to survive at home and will survive at the hospital for sure.
Home or Hospital? • For each patient who visits your clinic in a day, should you send him/her home or to the hospital?
Modeling in MDP • What are the states? - S = {s1, s2,…,sn} • What are the actions? - A = {a1, a2, …, am} • What is the transition model? - T: SAS • What is the reward model? - R: SA
Model (1) • States: a four-element tuple [N, C, PA, PS] - N: number of patients visited so far - C: condition of the current patient - PA: the action on the previous patient - PS: whether the previous patient survives • Actions: - send home, send to hospital, null action • Reward Model: - If s(PA)=home and s(PS)=1, R(s, a)=2 - If s(PA)=home and s(PS)=0, R(s, a)=0 - If s(PA)=hospital and s(PS)=1, R(s, a)=1
Model (2) [0, F, n/a, n/a] [0, E, n/a, n/a] home [1, F, home, 1] [1, E, home, 1] … [1, F, home, 0] [1, E, home, 0] … [2, n/a, hospital, 1] [2, n/a, home, 1] [2, n/a, hospital , 0][2, n/a, home, 0] Reward=0! 0.8*0.9=0.72 0.2*0.9=0.18 0.8*0.1=0.08 0.2*0.1=0.02 hospital home Reward=2! Reward=2! 0.9 1 null 0.1 0 Reward=1! Finish
Eat ice-cream or take vitamins??? • States: Healthy (H) or Sick (S) • Actions: Ice-cream (IC) or Vitamins (V) • Transitions: • Ice cream Vitamins • Rewards:
Example (=0.9) k Qk(H,IC) Qk(H,V) Qk(S,IC) Qk(S,V) *k(H) *k(S) Vk(H) Vk(S) 0 0 0 1 10 0 5 0 IC IC 10 5 2 9 9.5 8.55 IC IC 17.65 9.5 17.65 3 15.88 13.55 15.15 IC V 23.68 15.15 23.68