100 likes | 254 Views
Probability. CSE 473 – Autumn 2003 Henry Kautz. ExpectiMax. Hungry Monkey: 2-Ply Game Tree. shake. jump. 1/6. 5/6. 1/3. 2/3. shake. shake. jump. shake. jump. shake. jump. jump. 2/3. 5/6. 1/3. 5/6. 1/6. 1/3. 5/6. 1/6. 1/6. 2/3. 1/3. 2/3. 1/3. 1/3. 2/3. 2/3. 0. 0.
E N D
Probability CSE 473 – Autumn 2003 Henry Kautz
Hungry Monkey: 2-Ply Game Tree shake jump 1/6 5/6 1/3 2/3 shake shake jump shake jump shake jump jump 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1
ExpectiMax 1 – Chance Nodes shake jump 1/6 5/6 1/3 2/3 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1
ExpectiMax 2 – Max Nodes shake jump 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1
ExpectiMax 3 – Chance Nodes shake jump 1/2 1/3 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1
ExpectiMax 4 – Max Node 1/2 shake jump 1/2 1/3 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1
Policies • The result of the ExpectiMax analysis is a conditional plan (also called a policy): • Optimal plan for 2 steps: jump; shake • Optimal plan for 3 steps:jump; if (ontable) {shake; shake} else {jump; shake} • Probabilistic planning can be generalized in many ways, including: • Action costs • Hidden state • The general problem is that of solving a Markov Decision Process (MDP)
Backgammon • Branching factor: • Chance node: 21 • Max node: about 20 on average • Size of tree: O(ckmk) • In practice: can search 3 plies • Neurogammon & TD-Gammon (Tesauro 1995) • Learned weights on static evaluation function by playing against itself • Use results of games to optimize weights: • “Punish” features that were on in losing games • “Reward” features that were on in winning games • A kind of reinforcement learning • Became world’s best backgammon player!