POMDP Simple Example
Andrey Andreyevich Markov
POMDP – Instance general description
Problem description: A robot is situated on a circular course with three positions, S = { S1, S2, S3 }. However, it cannot tell in which state it starts.
Instance parameters – The actions set
The actions set:
Move clockwise => am
Stay put => as
Activate colour sensor => ac
A = { am, as, ac }
Instance parameters – The score function
The score function: R(s, a) is the (immediate) reward for doing a ∈ A in state s.
R(S1, as) = +1
R(S2, as) = R(S3, as) = -1
R(S1, ac) = R(S2, ac) = R(S3, ac) = -0.1
R(S1, am) = R(S2, am) = R(S3, am) = 0
* We will use the discount factor γ = 0.5
Instance parameters – The transition function
The transition function: Tr(s, a, s') is the probability of reaching s' from s using a.
Moving clockwise (am) succeeds with probability 0.9 and leaves the robot in place with probability 0.1:
Tr(S1, am, S2) = 0.9, Tr(S1, am, S1) = 0.1, Tr(S1, am, S3) = 0
Tr(S2, am, S3) = 0.9, Tr(S2, am, S2) = 0.1, Tr(S2, am, S1) = 0
Tr(S3, am, S1) = 0.9, Tr(S3, am, S3) = 0.1, Tr(S3, am, S2) = 0
Staying put (as) and sensing (ac) never change the state:
Tr(S1, as, S1) = Tr(S2, as, S2) = Tr(S3, as, S3) = 1, all other as-transitions are 0
Tr(S1, ac, S1) = Tr(S2, ac, S2) = Tr(S3, ac, S3) = 1, all other ac-transitions are 0
Instance parameters – The observation set & function
The observations set:
Black observation => ob
Red observation => or
Ω = { ob, or }
The observation function: O(s, a, o) is the probability of observing o ∈ Ω after doing a and reaching s.
After am or as the observation is uninformative:
O(s, am, ob) = O(s, am, or) = 0.5 and O(s, as, ob) = O(s, as, or) = 0.5 for every state s
After activating the colour sensor (ac) the observation is informative:
O(S1, ac, or) = 0.8, O(S1, ac, ob) = 0.2
O(S2, ac, ob) = 0.8, O(S2, ac, or) = 0.2
O(S3, ac, ob) = 0.8, O(S3, ac, or) = 0.2
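The instance is small enough to write down explicitly. Below is a minimal Python sketch of the model as plain dictionaries; the numbers come straight from the slides above, while the variable names (S, A, OBS, R, Tr, O, GAMMA) are my own.

# POMDP instance from the slides, encoded as plain Python dictionaries.
S = ["S1", "S2", "S3"]          # states
A = ["am", "as", "ac"]          # move clockwise, stay put, activate colour sensor
OBS = ["ob", "or"]              # black / red observation
GAMMA = 0.5                     # discount factor

# Immediate reward R(s, a).
R = {("S1", "as"): +1.0, ("S2", "as"): -1.0, ("S3", "as"): -1.0,
     ("S1", "ac"): -0.1, ("S2", "ac"): -0.1, ("S3", "ac"): -0.1,
     ("S1", "am"): 0.0,  ("S2", "am"): 0.0,  ("S3", "am"): 0.0}

# Transition probabilities Tr(s, a, s'); omitted triples are 0.
Tr = {("S1", "am", "S2"): 0.9, ("S1", "am", "S1"): 0.1,
      ("S2", "am", "S3"): 0.9, ("S2", "am", "S2"): 0.1,
      ("S3", "am", "S1"): 0.9, ("S3", "am", "S3"): 0.1}
for s in S:                      # as and ac leave the state unchanged
    Tr[(s, "as", s)] = 1.0
    Tr[(s, "ac", s)] = 1.0

# Observation probabilities O(s', a, o); am and as give a 50/50 signal.
O = {}
for s in S:
    for a in ("am", "as"):
        O[(s, a, "ob")] = O[(s, a, "or")] = 0.5
O.update({("S1", "ac", "or"): 0.8, ("S1", "ac", "ob"): 0.2,
          ("S2", "ac", "ob"): 0.8, ("S2", "ac", "or"): 0.2,
          ("S3", "ac", "ob"): 0.8, ("S3", "ac", "or"): 0.2})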
Calculations – notes
Notation – the belief state probabilities: We will use the following notations:
P(s1) = p, P(s2) = q, P(s3) = 1 - p - q
b = { P(s1) = p, P(s2) = q } (short for (p, q, 1-p-q)).
Notation – the next belief state: We will write b_o^a for the resulting belief state (i.e. the next belief state) after applying action a at the belief state b and observing o.
Start state: We start at the belief state bi = { P(s1) = ⅓, P(s2) = ⅓ }, i.e. the uniform belief (⅓, ⅓, ⅓).
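A minimal Python sketch of the standard POMDP state estimator that computes b_o^a together with P(o | b, a); the function name belief_update is my own, and it assumes the S, Tr and O dictionaries defined above.

def belief_update(b, a, o):
    """Standard POMDP state estimator: returns (b_o^a, P(o | b, a)).

    b maps each state to its probability; a is the action taken and
    o the observation received.
    """
    # Unnormalised new belief: b'(s') ∝ O(s', a, o) * sum_s Tr(s, a, s') * b(s)
    unnorm = {s2: O.get((s2, a, o), 0.0) *
                  sum(Tr.get((s, a, s2), 0.0) * b[s] for s in S)
              for s2 in S}
    p_o = sum(unnorm.values())       # P(o | b, a), the normalising constant
    return {s2: unnorm[s2] / p_o for s2 in S}, p_o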
1 step policy
Schema: a 1-step policy is a single action a ∈ A applied at the current belief state.
Concrete policies: there are 3 of them, one per action: am, as, ac.
The belief state value function: V_a(b) = Σs b(s)·R(s, a).
The initial belief state value: V_a(bi), worked out below.
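Evaluating V_a(bi) at the uniform start belief with the rewards above (a worked computation I derive from the given R and bi; the slide's own figures are not reproduced here):
V_as(bi) = ⅓·(+1) + ⅓·(-1) + ⅓·(-1) = -⅓
V_ac(bi) = ⅓·(-0.1) + ⅓·(-0.1) + ⅓·(-0.1) = -0.1
V_am(bi) = 0
So among 1-step policies, moving (am) is the best choice at bi.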
2 steps policy – 1st round
Schema: a 2-step policy picks a first action a and then, for each possible observation o, a 1-step policy to follow.
Concrete policy (1 out of 27): with 3 choices of first action and 3 choices of follow-up action for each of the 2 observations, there are 3·3² = 27 concrete 2-step policies.
The belief state value function and the initial belief state value: see the sketch below.
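A standard way to write the 2-step belief value, consistent with the notation above (a reconstruction, not necessarily the slide's exact formula):
V2(b) = Σs b(s)·R(s, a) + γ·Σo P(o | b, a)·V1(b_o^a)
that is, the expected immediate reward of the first action a, plus the discounted, observation-weighted values of the 1-step policies chosen for each observation o.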
Calculations – The new belief state & observations probabilities
We will calculate:
a) the next belief state: b_o^a(s') = O(s', a, o) · Σs Tr(s, a, s') · b(s) / P(o | b, a)
b) the observation probability: P(o | b, a) = Σs' O(s', a, o) · Σs Tr(s, a, s') · b(s)
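As a worked instance with the parameters above (the action/observation pair is chosen here for illustration): sensing at the uniform start belief, a = ac, leaves the state unchanged, so
b) P(or | bi, ac) = ⅓·0.8 + ⅓·0.2 + ⅓·0.2 = 0.4
a) b_or^ac(S1) = (0.8·⅓) / 0.4 = ⅔, and b_or^ac(S2) = b_or^ac(S3) = (0.2·⅓) / 0.4 = ⅙
A red reading therefore shifts the belief strongly towards S1; the same numbers fall out of belief_update(bi, "ac", "or") from the sketch above.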