Understand utility theory, decision-making using Bayesian networks, MEU principle, inference with Bayes' rule and rational preferences. Learn about encoding preferences over lotteries and the underlying principles guiding rational behavior.
Utilities and Decision Theory • Lirong Xia • Spring 2017
Checking conditional independence from the BN graph • Given random variables Z1, …, Zp, we are asked whether X ⊥ Y | Z1, …, Zp • X and Y are (possibly) dependent if there exists a path on which all triples are active • they are independent if every path contains at least one inactive triple • (a path-checking sketch follows)
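A minimal sketch of the path-checking rule above, written against a small hypothetical DAG (the graph, encoding, and helper names are mine, not from the lecture):

```python
# Toy DAG A -> C <- B, C -> D, encoded as a dict of parent lists (hypothetical example)
parents = {
    "A": [],
    "B": [],
    "C": ["A", "B"],
    "D": ["C"],
}

children = {v: [] for v in parents}
for v, ps in parents.items():
    for p in ps:
        children[p].append(v)

def descendants(v):
    """All strict descendants of v."""
    out, stack = set(), list(children[v])
    while stack:
        u = stack.pop()
        if u not in out:
            out.add(u)
            stack.extend(children[u])
    return out

def undirected_paths(x, y):
    """All simple paths between x and y, ignoring edge directions."""
    neighbors = {v: set(parents[v]) | set(children[v]) for v in parents}
    paths, stack = [], [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            paths.append(path)
            continue
        for n in neighbors[path[-1]]:
            if n not in path:
                stack.append(path + [n])
    return paths

def triple_active(a, b, c, observed):
    """Is the consecutive triple a - b - c active given the observed set?"""
    if a in parents[b] and c in parents[b]:            # common effect a -> b <- c
        return b in observed or bool(descendants(b) & observed)
    return b not in observed                           # causal chain or common cause

def independent(x, y, observed):
    """X independent of Y given observed iff every path contains an inactive triple."""
    for path in undirected_paths(x, y):
        if all(triple_active(a, b, c, observed) for a, b, c in zip(path, path[1:], path[2:])):
            return False                               # an active path: possibly dependent
    return True

print(independent("A", "B", set()))     # True: the collider at C blocks the only path
print(independent("A", "B", {"D"}))     # False: observing C's descendant D activates it
```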
General method for variable elimination • Goal: compute a marginal probability p(x1, …, xp) in a Bayesian network • Let Y1, …, Yk denote the remaining variables • Step 1: fix an order over the Y's (w.l.o.g. Y1 > … > Yk) • Step 2: rewrite the marginal as the nested summation Σy1 Σy2 … Σyk−1 Σyk (product of all CPTs) • Step 3: eliminate variables from right to left: summing out Yk leaves a factor involving only Y1, …, Yk−1 and the X's; summing out Yk−1 leaves a factor involving only Y1, …, Yk−2 and the X's; and so on (… only Y1, Y2 and the X's; only Y1 and the X's), until summing out Y1 leaves something involving only the X's • (a small numeric check follows)
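A small numeric sanity check of the right-to-left elimination above, on a hypothetical chain X1 → X2 → X3 with made-up tables (an illustrative sketch, not the lecture's example):

```python
import numpy as np

p_x1 = np.array([0.6, 0.4])               # p(X1)
p_x2_given_x1 = np.array([[0.7, 0.3],     # rows: X1, columns: X2
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],     # rows: X2, columns: X3
                          [0.5, 0.5]])

# Brute force: p(x3) = sum_{x1} sum_{x2} p(x1) p(x2|x1) p(x3|x2)
brute = np.zeros(2)
for x1 in range(2):
    for x2 in range(2):
        for x3 in range(2):
            brute[x3] += p_x1[x1] * p_x2_given_x1[x1, x2] * p_x3_given_x2[x2, x3]

# Variable elimination: eliminate the innermost (rightmost) sum first
#   p(x3) = sum_{x2} p(x3|x2) [ sum_{x1} p(x1) p(x2|x1) ]
f_x2 = p_x1 @ p_x2_given_x1               # eliminate X1: a factor over X2 only
ve   = f_x2 @ p_x3_given_x2               # eliminate X2: the marginal over X3

assert np.allclose(brute, ve)
print(ve)                                 # the same marginal, computed with fewer terms
```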
Today • Utility theory • expected utility: preferences over lotteries • maximum expected utility (MEU) principle
Expectimax Search Trees • Expectimax search • Max nodes (our choices), as in minimax search • Chance nodes: need to compute their values as expected utilities • Next class we will formalize the underlying problem as a Markov decision process (MDP)
Expectimax Pseudocode
• def value(s): if s is a max node, return maxValue(s); if s is a chance node, return expValue(s); if s is a terminal node, return evaluation(s)
• def maxValue(s): values = [value(s') for s' in successors(s)]; return max(values)
• def expValue(s): values = [value(s') for s' in successors(s)]; weights = [probability(s, s') for s' in successors(s)]; return expectation(values, weights)
• (a runnable version follows)
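A runnable Python version of the pseudocode above; the tree encoding and the tiny example tree are mine, chosen only to make the sketch self-contained:

```python
# Node encoding (hypothetical): ("leaf", utility), ("max", [children]),
# or ("chance", [(probability, child), ...])

def value(s):
    kind = s[0]
    if kind == "max":
        return max_value(s)
    if kind == "chance":
        return exp_value(s)
    return s[1]                      # terminal node: its evaluation

def max_value(s):
    return max(value(child) for child in s[1])

def exp_value(s):
    return sum(p * value(child) for p, child in s[1])   # expected utility

# Example: choose between a sure utility of 3 and a 50/50 gamble between 0 and 10
tree = ("max", [("leaf", 3),
                ("chance", [(0.5, ("leaf", 0)), (0.5, ("leaf", 10))])])
print(value(tree))                   # 5.0: the gamble's expected utility beats the sure 3
```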
Maximum expected utility • Principle of maximum expected utility: a rational agent should choose the action that maximizes its expected utility, given its (probabilistic) knowledge • Questions: • Where do utilities come from? • How do we know such utilities even exist? • What if our behavior can't be described by utilities?
Inference with Bayes' Rule • Example: computing a diagnostic probability from causal probabilities • F is fire, {f, ¬f}; A is the alarm, {a, ¬a} • Given p(f) = 0.001, p(a) = 0.1, p(a|f) = 0.9 • Bayes' rule: p(f|a) = p(a|f) p(f) / p(a) = 0.9 × 0.001 / 0.1 = 0.009 • Note: the posterior probability of fire is still very small • Note: you should still run when hearing an alarm! Why?
(Decision diagram: if you stay, fire occurs with probability 0.009 and no fire with probability 0.991; if you run out, the outcome is certain. A quick numeric check of the posterior follows.)
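A quick check of the posterior computed above (variable names are mine):

```python
p_f = 0.001          # prior probability of fire
p_a = 0.1            # marginal probability of the alarm sounding
p_a_given_f = 0.9    # probability of the alarm given a fire

# Bayes' rule: p(f | a) = p(a | f) p(f) / p(a)
p_f_given_a = p_a_given_f * p_f / p_a
print(p_f_given_a)   # 0.009: the posterior probability of fire is still small
```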
Utilities • Utilities are functions from outcomes (states of the world, the sample space) to real numbers that represent an agent's preferences • Where do utilities come from? • In a game, they may be simple (+1/−1) • Utilities summarize the agent's goals • (Figure: a small decision tree with numeric leaf utilities)
Preferences over lotteries • An agent chooses among: • Prizes: A, B, etc. • Lotteries: situations with uncertain prizes, written L = [p, A; (1−p), B] (prize A with probability p, prize B with probability 1−p) • Notation: • A ≻ B: A is strictly preferred to B • A ∼ B: indifference between A and B • A ⪰ B: A is strictly preferred to or indifferent with B
Utility theory in Economics • State of the world: money you earn • Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt) • Which would you prefer? • A lottery ticket that pays out $10 with probability .5 and $0 otherwise, or • A lottery ticket that pays out $1 with probability 1 • How about: • A lottery ticket that pays out $100,000,000 with probability .5 and $0 otherwise, or • A lottery ticket that pays out $10,000,000 with probability 1 • Usually, people do not simply go by expected value
Which one would you prefer? • Lottery A: $1M@100% • Lottery B: $1M@89% + $5M@10% + $0@1% • How about: • Lottery A: $1M@11% + $0@89% • Lottery B: $5M@10% + $0@90% • (a short derivation of why the common answers are inconsistent with MEU follows)
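These two questions are the classic Allais example: most people pick A in the first pair and B in the second, and no expected-utility maximizer can do both. A sketch of the standard argument (my own working, writing u for the utility of each prize):

```latex
\text{Pair 1, } A \succ B:\;
  u(\$1M) > 0.89\,u(\$1M) + 0.10\,u(\$5M) + 0.01\,u(\$0)
  \;\Longleftrightarrow\; 0.11\,u(\$1M) > 0.10\,u(\$5M) + 0.01\,u(\$0)

\text{Pair 2, } B \succ A:\;
  0.10\,u(\$5M) + 0.90\,u(\$0) > 0.11\,u(\$1M) + 0.89\,u(\$0)
  \;\Longleftrightarrow\; 0.10\,u(\$5M) + 0.01\,u(\$0) > 0.11\,u(\$1M)
```

The two conclusions contradict each other, so this common pattern of preferences cannot be represented by maximizing any expected utility.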
Encoding preferences over lotteries • How many lotteries are there? • Infinitely many! • We need a compact representation • Maximum expected utility (MEU) principle • Which types of preferences (rankings over lotteries) can be represented by MEU?
Rational Preferences • We want some constraints on preferences before we call them rational • For example: an agent with intransitive preferences can be induced to give away all of its money • If B≻C, then an agent with C would pay (say) 1 cent to get B • If A≻B, then an agent with B would pay (say) 1 cent to get A • If C≻A, then an agent with A would pay (say) 1 cent to get C
Rational Preferences • The preferences of a rational agent must obey constraints • The axioms of rationality, for all lotteries A, B, C: Orderability, Transitivity, Continuity, Substitutability, Monotonicity (formal statements below) • Theorem: rational preferences imply behavior describable as maximization of expected utility
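For reference, the standard von Neumann–Morgenstern statements of these axioms, written in the ≻ / ∼ / ⪰ and [p, A; 1−p, B] notation from the earlier slide (this is the textbook formulation, not copied from the slide itself):

```latex
\textbf{Orderability:}\quad (A \succ B) \lor (B \succ A) \lor (A \sim B)\\
\textbf{Transitivity:}\quad (A \succ B) \land (B \succ C) \Rightarrow (A \succ C)\\
\textbf{Continuity:}\quad A \succ B \succ C \Rightarrow \exists p\; [p, A;\ 1-p, C] \sim B\\
\textbf{Substitutability:}\quad A \sim B \Rightarrow [p, A;\ 1-p, C] \sim [p, B;\ 1-p, C]\\
\textbf{Monotonicity:}\quad A \succ B \Rightarrow \big(p \ge q \Leftrightarrow [p, A;\ 1-p, B] \succeq [q, A;\ 1-q, B]\big)
```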
MEU Principle • Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: given any preferences satisfying these axioms, there exists a real-valued function U such that U(A) ≥ U(B) if and only if A ⪰ B, and U([p1, S1; … ; pn, Sn]) = Σi pi U(Si) • Maximum expected utility (MEU) principle: choose the action that maximizes expected utility • Utilities are just a representation! • an agent can be entirely rational (consistent with MEU) without ever explicitly representing or manipulating utilities and probabilities • Utilities are NOT money
What would you do? • (Decision diagram: if you stay, fire occurs with probability 0.009 with utility −10^100, and no fire with probability 0.991 with utility 100; if you run out, you get utility −100 with certainty. The expected utility of staying is astronomically negative, so MEU says run out.)
Risk attitudes • An agent is risk-neutral if she only cares about the expected value of the lottery ticket • An agent is risk-averse if she always prefers the expected value of the lottery ticket to the lottery ticket • Most people are like this • An agent is risk-seeking if she always prefers the lottery ticket to the expected value of the lottery ticket
Decreasing marginal utility • Typically, at some point, having an extra dollar does not make people much happier (decreasing marginal utility) • (Figure: a concave utility-of-money curve: utility 1 at $200 = buy a bike, utility 2 at $1500 = buy a car, utility 3 at $5000 = buy a nicer car)
Maximizing expected utility • Lottery 1: get $1500 with probability 1 • gives expected utility 2 • Lottery 2: get $5000 with probability .4, $200 otherwise • gives expected utility .4 × 3 + .6 × 1 = 1.8 • (expected amount of money = .4 × $5000 + .6 × $200 = $2120 > $1500) • So: maximizing expected utility is consistent with risk aversion • (Figure: the same utility-of-money curve as on the previous slide)
Different possible risk attitudes under expected utility maximization • Green has decreasing marginal utility → risk-averse • Blue has constant marginal utility → risk-neutral • Red has increasing marginal utility → risk-seeking • Grey's marginal utility is sometimes increasing, sometimes decreasing → neither risk-averse (everywhere) nor risk-seeking (everywhere) • (Figure: four utility-of-money curves, one per color; a small numeric illustration follows)
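A small Python sketch of these attitudes on the $200 / $5000 lottery from the previous slides; the three utility functions are made-up illustrations of concave, linear, and convex shapes:

```python
import math

outcomes = [200.0, 5000.0]
probs    = [0.6, 0.4]
ev = sum(p * x for p, x in zip(probs, outcomes))        # expected money: $2120

utility_functions = {
    "concave (log)  -> risk-averse":  math.log,
    "linear         -> risk-neutral": lambda x: x,
    "convex (x^2)   -> risk-seeking": lambda x: x * x,
}

for name, u in utility_functions.items():
    eu_lottery = sum(p * u(x) for p, x in zip(probs, outcomes))   # E[u(X)]
    u_sure     = u(ev)                                            # u(E[X])
    if math.isclose(eu_lottery, u_sure):
        verdict = "indifferent"
    elif eu_lottery > u_sure:
        verdict = "prefers the lottery"
    else:
        verdict = "prefers the sure $2120"
    print(f"{name}: {verdict}")
```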
Example: Insurance • Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties' expected utility • You own a car. Your lottery: LY = [0.8, $0; 0.2, −$200], i.e., a 20% chance of crashing • You do not want −$200! • Insurance costs $50 • With UY($0) = 0 and UY(−$200) = −1000: UY(LY) = 0.8 × 0 + 0.2 × (−1000) = −200 • UY(−$50) = −150 > −200, so you prefer buying the insurance to facing the lottery
Example: Insurance (continued) • The insurance company buys the risk from you: LI = [0.8, $50; 0.2, −$150], i.e., the $50 premium plus your lottery LY • The insurer is risk-neutral: U(L) = U(EMV(L)), the utility of the lottery's expected monetary value • UI(LI) = U(0.8 × 50 + 0.2 × (−150)) = U($10) > U($0) • So the agreement increases both parties' expected utility • (a small numeric sketch follows)
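A numeric sketch of both sides of the deal; the owner's utility values for $0 and −$200 are my reading of the example (chosen so the arithmetic above works out), not stated explicitly on the slide:

```python
# Car owner: risk-averse utilities for the relevant money outcomes (assumed values)
U_owner = {0: 0.0, -50: -150.0, -200: -1000.0}

eu_uninsured = 0.8 * U_owner[0] + 0.2 * U_owner[-200]   # -200: face the crash lottery
eu_insured   = U_owner[-50]                             # -150: pay the $50 premium
print(eu_insured > eu_uninsured)                        # True: the owner gains utility

# Risk-neutral insurer: utility of a lottery = its expected monetary value
emv_insurer = 0.8 * 50 + 0.2 * (-150)                   # +$10 expected profit
print(emv_insurer > 0)                                  # True: the insurer gains too
```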
Acting optimally over time • Finite number of periods: • overall utility = sum of rewards in the individual periods • Infinite number of periods: • … are we just going to add up the rewards over infinitely many periods? We would always get infinity! • (Limit of) average payoff: lim_{n→∞} (1/n) Σ_{t=1}^{n} r(t) • the limit may not exist… • Discounted payoff: Σ_t γ^t r(t) for some discount factor γ < 1 • Interpretations of discounting: • interest rate r: γ = 1/(1+r) • the world ends with probability 1−γ after each period • discounting is mathematically convenient • We will see more in the next class • (a quick numeric sketch of discounting follows)
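A quick numeric sketch of the discounted payoff for a constant reward r(t) = 1 (the discount factor and horizon are arbitrary illustration values):

```python
gamma = 0.9
horizon = 1000                       # a long finite horizon approximates the infinite sum

discounted  = sum(gamma ** t for t in range(horizon))   # sum_t gamma^t * r(t) with r(t) = 1
closed_form = 1.0 / (1.0 - gamma)                       # geometric-series limit

print(discounted, closed_form)       # both approximately 10: the sum stays finite
```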