Utilities and decision theory Lirong Xia Friday, Feb 28, 2014
Reminder • Midterm Mar 7 • in-class • open book and lecture notes • simple calculators are allowed • cannot use smartphones/laptops/wifi • practice exams and solutions (check Piazza) • Project 2 deadline: Mar 18, midnight
Checking conditional independence from BN graph • Given random variables Z1,…,Zp, we are asked whether X⊥Y|Z1,…,Zp • X and Y are dependent if there exists a path between them on which every triple is active • X and Y are (conditionally) independent if every path contains at least one inactive triple • a minimal sketch of the triple-activity test is given below
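The following is a minimal sketch of the triple-activity test, not the course's reference implementation; the DAG representation (a dict mapping each node to the set of its parents) and the helper names descendants and triple_active are illustrative assumptions.

# Sketch of the triple-activity test behind d-separation.
# Assumptions (not from the lecture): the DAG is a dict mapping each node
# to the set of its parents; helper names are illustrative.

def descendants(node, parents):
    """All descendants of `node` in the DAG."""
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)
    found, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found

def triple_active(a, b, c, evidence, parents):
    """Is the path segment a - b - c active given the evidence set?"""
    if a in parents[b] and b in parents[c]:      # causal chain a -> b -> c
        return b not in evidence
    if c in parents[b] and b in parents[a]:      # causal chain c -> b -> a
        return b not in evidence
    if b in parents[a] and b in parents[c]:      # common cause a <- b -> c
        return b not in evidence
    if a in parents[b] and c in parents[b]:      # common effect a -> b <- c
        return b in evidence or bool(descendants(b, parents) & evidence)
    return False

# Chain X -> Z -> Y: the single path is active without evidence, inactive given Z.
parents = {"X": set(), "Z": {"X"}, "Y": {"Z"}}
print(triple_active("X", "Z", "Y", set(), parents))     # True  (X and Y dependent)
print(triple_active("X", "Z", "Y", {"Z"}, parents))     # False (X ⊥ Y | Z)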
General method for variable elimination • Goal: compute a marginal probability p(x1,…,xp) in a Bayesian network • Let Y1,…,Yk denote the remaining variables • Step 1: fix an order over the Y's (w.l.o.g. Y1>…>Yk) • Step 2: rewrite the marginal as nested sums Σy1 Σy2 … Σyk-1 Σyk (product of all CPTs) • Step 3: eliminate variables from right to left: summing out Yk leaves a factor involving only Y1,…,Yk-1 and the X's; …; summing out Y2 leaves a factor involving only Y1 and the X's; summing out Y1 leaves something involving only the X's • a worked example follows below
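A small worked example (illustrative, not on the slide): for a chain A → B → C with query p(c), the remaining variables are B and A, and eliminating from right to left gives

\[
p(c) \;=\; \sum_{b}\sum_{a} p(a)\,p(b\mid a)\,p(c\mid b)
      \;=\; \sum_{b} p(c\mid b)\,\underbrace{\Big(\sum_{a} p(a)\,p(b\mid a)\Big)}_{f_1(b)\text{, after eliminating }A}
      \;=\; \sum_{b} p(c\mid b)\, f_1(b),
\]

and summing out B in turn leaves a quantity that involves only the query value c.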
Today • Utility theory • expected utility: preferences over lotteries • maximum expected utility (MEU) principle
Expectimax Search Trees • Expectimax search • Max nodes (our moves), as in minimax search • Chance nodes: need to compute chance-node values as expected utilities • Next class we will formalize the underlying problem as a Markov decision process
Expectimax Pseudocode

def value(s):
    # dispatch on node type; isMaxNode / isChanceNode, successors,
    # probability, and evaluation are assumed to be provided
    if isMaxNode(s):
        return maxValue(s)
    if isChanceNode(s):
        return expValue(s)
    return evaluation(s)          # terminal node

def maxValue(s):
    return max(value(sp) for sp in successors(s))

def expValue(s):
    # expected value of the successors, weighted by transition probabilities
    return sum(probability(s, sp) * value(sp) for sp in successors(s))
Maximum expected utility • Why should we average utilities? Why not minimax? • Principle of maximum expected utility: • A rational agent should choose the action that maximizes its expected utility, given its (probabilistic) knowledge • Questions: • Where do utilities come from? • How do we know such utilities even exist? • Why are we taking expectations of utilities (not, e.g., minimax)? • What if our behavior can't be described by utilities?
Inference with Bayes' Rule • Example: diagnostic probability from causal probability: p(f|a) = p(a|f)p(f)/p(a) • Example: • F is fire, {f, ¬f} • A is the alarm, {a, ¬a} • with p(f)=0.01, p(a)=0.1, p(a|f)=0.9 • Note: the posterior probability of fire is still very small • Note: you should still run when hearing an alarm! Why?
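Taking the numbers listed above at face value, the plug-in (a quick check, not shown on the slide) is

\[
p(f \mid a) \;=\; \frac{p(a\mid f)\,p(f)}{p(a)} \;=\; \frac{0.9 \times 0.01}{0.1} \;=\; 0.09 .
\]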
[Decision diagram: action "stay" branches with probabilities 0.009 and 0.991; action "run out" with probability 100%]
Utilities • Utilities are functions from outcomes (states of the world, sample space) to real numbers that represent an agent's preferences • Where do utilities come from? • In a game, may be simple (+1/-1) • Utilities summarize the agent's goals
Preferences over lotteries • An agent chooses among: • Prizes: A, B, etc. • Lotteries: situations with uncertain prizes, e.g. L = [p, A; (1-p), B] • Notation: • A ≻ B: A is strictly preferred to B • A ~ B: indifference between A and B • A ≽ B: A is strictly preferred to or indifferent with B
Encoding preferences over lotteries • How many lotteries are there? • infinitely many! • Need to find a compact representation • Maximum expected utility (MEU) principle • which types of preferences (rankings over lotteries) can be represented by MEU?
Rational Preferences • We want some constraints on preferences before we call them rational • For example: an agent with intransitive preferences can be induced to give away all of its money • If B≻C, then an agent holding C would pay (say) 1 cent to get B • If A≻B, then an agent holding B would pay (say) 1 cent to get A • If C≻A, then an agent holding A would pay (say) 1 cent to get C • Going around the cycle, the agent ends up holding the same prize it started with but is 3 cents poorer each time, so its money can be drained away
Rational Preferences • Preferences of a rational agent must obey constraints • The axioms of rationality, for all lotteries A, B, C: • Orderability • Transitivity • Continuity • Substitutability • Monotonicity • (standard statements of these axioms are given below) • Theorem: rational preferences imply behavior describable as maximization of expected utility
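The axiom statements appeared as images on the original slide; the following are the standard von Neumann–Morgenstern formulations, included here for reference:

% Standard VNM axiom statements (filled in for reference).
\begin{align*}
&\textbf{Orderability:} && (A \succ B) \lor (B \succ A) \lor (A \sim B)\\
&\textbf{Transitivity:} && (A \succ B) \land (B \succ C) \Rightarrow (A \succ C)\\
&\textbf{Continuity:} && A \succ B \succ C \Rightarrow \exists p\; [p, A;\; 1-p, C] \sim B\\
&\textbf{Substitutability:} && A \sim B \Rightarrow [p, A;\; 1-p, C] \sim [p, B;\; 1-p, C]\\
&\textbf{Monotonicity:} && A \succ B \Rightarrow \big(p > q \Leftrightarrow [p, A;\; 1-p, B] \succ [q, A;\; 1-q, B]\big)
\end{align*}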
MEU Principle • Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: • Given any preferences satisfying these axioms, there exists a real-valued function U such that: • U(A) ≥ U(B) if and only if A ≽ B, and U([p1, S1; … ; pn, Sn]) = Σi pi·U(Si) • Maximum expected utility (MEU) principle: • Choose the action that maximizes expected utility • Utilities are just a representation! • an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities • Utilities are NOT money
What would you do? • [Decision diagram: action "stay" branches with probabilities 0.009 and 0.991 to utilities -10100 and 100; action "run out" with probability 100% to utility -100]
Utility theory in Economics • State of the world: money you earn • Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt) • Which would you prefer? • A lottery ticket that pays out $10 with probability .5 and $0 otherwise, or • A lottery ticket that pays out $3 with probability 1 • How about: • A lottery ticket that pays out $100,000,000 with probability .5 and $0 otherwise, or • A lottery ticket that pays out $30,000,000 with probability 1 • Usually, people do not simply go by expected value
Risk attitudes • An agent is risk-neutral if she only cares about the expected value of the lottery ticket • An agent is risk-averse if she always prefers the expected value of the lottery ticket to the lottery ticket • Most people are like this • An agent is risk-seeking if she always prefers the lottery ticket to the expected value of the lottery ticket
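In terms of a utility function U over money, these attitudes correspond to the following standard comparisons between the utility of a lottery L and the utility of its expected monetary value (a reference restatement, not from the slide):

% Risk attitudes restated via U; EMV(L) is the lottery's expected monetary value.
\begin{align*}
\text{risk-neutral:}\; & U\big(\mathrm{EMV}(L)\big) = \mathbb{E}[U(L)] \quad\text{(linear } U\text{)}\\
\text{risk-averse:}\;  & U\big(\mathrm{EMV}(L)\big) \ge \mathbb{E}[U(L)] \quad\text{(concave } U\text{)}\\
\text{risk-seeking:}\; & U\big(\mathrm{EMV}(L)\big) \le \mathbb{E}[U(L)] \quad\text{(convex } U\text{)}
\end{align*}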
Decreasing marginal utility • Typically, at some point, having an extra dollar does not make people much happier (decreasing marginal utility) • [Plot of utility vs. money: utility 1 at $200 (buy a bike), utility 2 at $1500 (buy a car), utility 3 at $5000 (buy a nicer car)]
Maximizing expected utility (using the utility curve above) • Lottery 1: get $1500 with probability 1 • gives expected utility 2 • Lottery 2: get $5000 with probability .4, $200 otherwise • gives expected utility .4*3 + .6*1 = 1.8 • (expected amount of money = .4*$5000 + .6*$200 = $2120 > $1500) • So: maximizing expected utility is consistent with risk aversion
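A small sketch of this comparison (illustrative; the utility table below covers only the three dollar amounts from the plot above, which are the only outcomes that occur in the two lotteries):

# Sketch: comparing the two lotteries from the slide under the plotted utility values.
utility = {200: 1, 1500: 2, 5000: 3}   # from the utility-vs-money plot

def expected_utility(lottery):
    """lottery: list of (probability, dollar_amount) pairs."""
    return sum(p * utility[x] for p, x in lottery)

def expected_money(lottery):
    return sum(p * x for p, x in lottery)

lottery1 = [(1.0, 1500)]
lottery2 = [(0.4, 5000), (0.6, 200)]

print(expected_utility(lottery1))  # 2.0
print(expected_utility(lottery2))  # 1.8    -> lottery 1 preferred
print(expected_money(lottery2))    # 2120.0 -> higher expected money, lower expected utility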
Different possible risk attitudes under expected utility maximization • [Plot of four utility-vs-money curves] • Green has decreasing marginal utility → risk-averse • Blue has constant marginal utility → risk-neutral • Red has increasing marginal utility → risk-seeking • Grey's marginal utility is sometimes increasing, sometimes decreasing → neither risk-averse (everywhere) nor risk-seeking (everywhere)
Example: Insurance • Consider the lottery [0.5, $1000; 0.5, $0] • What is its expected monetary value (EMV)? ($500) • What is its certainty equivalent? • Monetary value acceptable in lieu of lottery • $400 for most people • Difference of $100 is the insurance premium • There is an insurance industry because people will pay to reduce their risk • If everyone were risk-neutral, no insurance needed!
Example: Insurance • Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties' expected utility • You own a car. Your lottery: • LY = [0.8, $0; 0.2, -$200] • i.e., 20% chance of crashing • You do not want -$200! • The insurance costs $50 • UY(LY) = 0.8*UY($0) + 0.2*UY(-$200) = -200 (taking UY($0)=0 and UY(-$200)=-1000) • UY(-$50) = -150, so you prefer buying the insurance • Insurance company buys the risk: • LI = [0.8, $50; 0.2, -$150] • i.e., $50 revenue plus your LY • The insurer is risk-neutral: • UI(L) = UI(EMV(L)) • UI(LI) = UI(0.8*$50 + 0.2*(-$150)) = UI($10) > UI($0)
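A quick sketch of the computation (the owner's utility values UY($0)=0, UY(-$200)=-1000, UY(-$50)=-150 are those implied above; the risk-neutral insurer simply uses money itself as utility):

# Sketch of the insurance example: does each party gain in expected utility?
U_owner = {0: 0, -200: -1000, -50: -150}    # car owner's utility over money (implied by the slide)

def expected(lottery, u=lambda x: x):
    """lottery: list of (probability, outcome); u maps outcomes to utilities."""
    return sum(p * u(x) for p, x in lottery)

L_owner = [(0.8, 0), (0.2, -200)]           # no insurance: 20% chance of a $200 loss
L_insurer = [(0.8, 50), (0.2, -150)]        # insurer: $50 premium, pays out on a crash

print(expected(L_owner, U_owner.get))        # -200.0  (owner's EU without insurance)
print(U_owner[-50])                          # -150    (owner's utility of just paying $50)
print(expected(L_insurer))                   # 10.0    (risk-neutral insurer's EU, > 0)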
Acting optimally over time • Finite number of periods: • overall utility = sum of rewards in individual periods • Infinite number of periods: • … are we just going to add up the rewards over infinitely many periods? • the sum typically diverges to infinity! • (Limit of) average payoff: limn→∞ Σ1≤t≤n r(t)/n • the limit may not exist… • Discounted payoff: Σt γ^t·r(t) for some γ < 1 • Interpretations of discounting: • interest rate r: γ = 1/(1+r) • the world ends with some probability 1-γ each period • discounting is mathematically convenient • We will see more in the next class
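A tiny sketch (illustrative) comparing the truncated average and the discounted sum for a constant reward stream; for constant reward 1 per period, the discounted sum converges to 1/(1-γ):

# Sketch: average vs. discounted payoff for a constant reward stream r(t) = 1.
gamma = 0.9
horizon = 1000                      # truncation for illustration; the true quantities are limits/series
rewards = [1.0] * horizon

average = sum(rewards) / horizon                                   # truncated average payoff
discounted = sum(gamma**t * r for t, r in enumerate(rewards))      # truncated discounted payoff

print(average)                      # 1.0
print(discounted)                   # ~10.0, close to 1/(1-gamma) = 10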