Bayesian Decision Theory – Continuous Features
Team teaching
Introduction
• The sea bass/salmon example
• State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable
• P(ω1) = P(ω2) (uniform priors)
• P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)
Pattern Classification, Chapter 2 (Part 1)
Decision rule with only the prior information
• Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
Use of the class-conditional information
• P(x | ω1) and P(x | ω2) describe the difference in lightness between the sea bass and salmon populations
Posterior, likelihood, evidence
• P(ωj | x) = P(x | ωj) P(ωj) / P(x)
• where, in the case of two categories, the evidence is P(x) = Σj P(x | ωj) P(ωj), j = 1, 2
• Posterior = (Likelihood × Prior) / Evidence
Decision given the posterior probabilities
x is an observation for which:
• if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
• if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2
Therefore, whenever we observe a particular x, the probability of error is:
• P(error | x) = P(ω1 | x) if we decide ω2
• P(error | x) = P(ω2 | x) if we decide ω1
Minimizing the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
• Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)] (Bayes decision)
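As a concrete illustration, here is a minimal Python sketch of this two-class minimum-error rule. The Gaussian lightness densities and their parameters are illustrative assumptions, not values from the text.

```python
import numpy as np
from scipy.stats import norm

# Illustrative class-conditional densities for "lightness" x
# (the Gaussian parameters are assumptions, not from the text).
priors = {"sea bass": 0.5, "salmon": 0.5}        # uniform priors
likelihood = {
    "sea bass": norm(loc=4.0, scale=1.0).pdf,    # p(x | sea bass)
    "salmon":   norm(loc=6.0, scale=1.0).pdf,    # p(x | salmon)
}

def posterior(x):
    """Bayes rule: P(w | x) = p(x | w) P(w) / p(x)."""
    joint = {w: likelihood[w](x) * priors[w] for w in priors}
    evidence = sum(joint.values())               # p(x) = sum_j p(x|w_j) P(w_j)
    return {w: v / evidence for w, v in joint.items()}

def decide(x):
    """Minimum-error rule: pick the class with the larger posterior."""
    post = posterior(x)
    return max(post, key=post.get), min(post.values())  # (decision, P(error|x))

print(decide(4.5))   # lightness near the sea-bass mode -> ('sea bass', ...)
```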
Bayesian Decision Theory – Continuous Features
Generalization of the preceding ideas:
• Use of more than one feature
• Use of more than two states of nature
• Allowing actions other than merely deciding on the state of nature
• Introducing a loss function that is more general than the probability of error
Allowing actions other than classification primarily allows the possibility of rejection:
• refusing to make a decision in close or doubtful cases.
The loss function states how costly each action taken is.
Let {ω1, ω2, …, ωc} be the set of c states of nature (or "categories")
Let {α1, α2, …, αa} be the set of a possible actions
Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj
Conditional risk: R(αi | x) = Σj λ(αi | ωj) P(ωj | x), j = 1, …, c
Overall risk: R = ∫ R(α(x) | x) p(x) dx, the expected loss associated with decision rule α(x)
Minimizing R is achieved by minimizing R(αi | x) for every x, i = 1, …, a
Select the action αi for which R(αi | x) is minimum.
Then R is minimum, and R in this case is called the Bayes risk: the best performance that can be achieved!
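A minimal sketch of this selection step, assuming an illustrative 2×2 loss table λ(αi | ωj) and an already computed posterior vector (both are made-up numbers):

```python
import numpy as np

# Illustrative loss matrix lambda(a_i | w_j): rows = actions, columns = states.
loss = np.array([
    [0.0, 2.0],   # action a1: free if truly w1, costs 2 if truly w2
    [1.0, 0.0],   # action a2: costs 1 if truly w1, free if truly w2
])

def bayes_action(posteriors):
    """R(a_i | x) = sum_j lambda(a_i | w_j) P(w_j | x); pick the minimum."""
    risks = loss @ posteriors
    return int(np.argmin(risks)), risks

# Example with posterior P(w | x) = [0.7, 0.3]:
action, risks = bayes_action(np.array([0.7, 0.3]))
print(action, risks)   # action 0, risks = [0.6, 0.7]
```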
Diagram of pattern classification
Procedure of pattern recognition and decision making:
subjects → observables X → features x → inner belief ω → action α
• X --- all the observables, using existing sensors and instruments
• x --- a set of features selected from components of X, or linear/non-linear functions of X
• ω --- our inner belief/perception about the subject class
• α --- the action that we take for x
We denote the three spaces by Ωx (features), Ωc (classes), and Ωa (actions).
Examples
Ex 1: Fish classification
• X = I is the image of the fish; x = (brightness, length, fin #, …)
• ω is our belief about the fish type: Ωc = {"sea bass", "salmon", "trout", …}
• α is a decision about the fish type; in this case Ωa = Ωc = {"sea bass", "salmon", "trout", …}
Ex 2: Medical diagnosis
• X = all the available medical tests and imaging scans that a doctor can order for a patient; x = (blood pressure, glucose level, cough, x-ray, …)
• ω is an illness type: Ωc = {"flu", "cold", "TB", "pneumonia", "lung cancer", …}
• α is a decision for treatment: Ωa = {"Tylenol", "hospitalize", …}
Tasks
subjects → (sensors) → observables X → (selecting informative features) → features x → (statistical inference) → inner belief ω → (risk/cost minimization) → decision α
In Bayesian decision theory we are concerned with the last three steps, assuming that the observables are given and the features have been selected.
Bayes Decision
• Decision making when all the underlying probability distributions are known; it is optimal given that the distributions are known.
• For two classes ω1 and ω2, the prior probabilities for an unknown new observation are:
P(ω1): the new observation belongs to class 1
P(ω2): the new observation belongs to class 2
P(ω1) + P(ω2) = 1
• The prior reflects our prior knowledge. It gives our decision rule when no feature of the new object is available: classify as class 1 if P(ω1) > P(ω2).
Bayesian Decision Theory
features x → (statistical inference) → inner belief p(ω | x) → (risk/cost minimization) → decision α(x)
Two probability tables: (a) prior p(ω); (b) likelihood p(x | ω). A risk/cost function (a two-way table): λ(α | ω)
The belief on the class ω is computed by the Bayes rule:
p(ω | x) = p(x | ω) p(ω) / p(x)
The risk is computed by:
R(α | x) = Σω λ(α | ω) p(ω | x)
Bayes Decision
We observe features x on each object.
P(x | ω1) and P(x | ω2): class-specific densities (likelihoods)
The Bayes rule: P(ωj | x) = P(x | ωj) P(ωj) / P(x), with evidence P(x) = Σj P(x | ωj) P(ωj)
Decision Rule
A decision rule is a mapping function from the feature space to the set of actions, α(x): Ωx → Ωa. We will show that randomized decisions won't be optimal.
A decision rule is chosen to minimize the average cost/risk
R = ∫ R(α(x) | x) p(x) dx,
which is minimized when the decision for each instance x minimizes the conditional risk R(α(x) | x), as the sketch below illustrates.
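The following sketch checks this numerically under 0/1 loss: the rule that minimizes the conditional risk at every x (the Bayes rule) attains a lower overall risk than a deliberately mis-placed threshold rule. The Gaussian densities are the same illustrative assumptions as before, not values from the text.

```python
import numpy as np
from scipy.stats import norm

# Assumed densities: w1 ~ N(4,1), w2 ~ N(6,1), equal priors (illustrative).
p1, p2 = 0.5, 0.5
f1, f2 = norm(4.0, 1.0).pdf, norm(6.0, 1.0).pdf

xs = np.linspace(-2.0, 12.0, 10001)
dx = xs[1] - xs[0]
j1, j2 = p1 * f1(xs), p2 * f2(xs)     # joint densities p(x, w_j)

# Under 0/1 loss, the per-x risk of deciding w1 is p(x, w2), and vice versa.
bayes_risk = np.sum(np.minimum(j1, j2)) * dx            # pick min at every x
shifted    = np.sum(np.where(xs < 4.5, j2, j1)) * dx    # decide w1 iff x < 4.5
print(bayes_risk, shifted)   # ~0.159 vs ~0.188: per-x minimization wins
```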
Bayesian error
In a special case such as fish classification, the action is the classification itself, and we assume a 0/1 loss: λ(αi | ωj) = 0 if i = j, and 1 otherwise.
The risk for classifying x into class ωi is then
R(αi | x) = Σj≠i P(ωj | x) = 1 − P(ωi | x).
The optimal decision is therefore to choose the class that has maximum posterior probability.
The total risk of this decision rule is, in this case, called the Bayesian error:
p(error) = ∫ p(error | x) p(x) dx = ∫ min[P(ω1 | x), P(ω2 | x)] p(x) dx (two-class case).
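A quick Monte-Carlo check of the Bayesian error, again under the assumed equal-prior Gaussians (for which the max-posterior rule reduces to the midpoint threshold x = 5, and the analytic error is Φ(−1) ≈ 0.1587):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample (class, feature) pairs from the assumed model:
# w1 ~ N(4,1), w2 ~ N(6,1), equal priors.
n = 200_000
is_w1 = rng.random(n) < 0.5
x = np.where(is_w1, rng.normal(4.0, 1.0, n), rng.normal(6.0, 1.0, n))

# Max-posterior rule for equal priors and variances: decide w1 iff x < 5.
decide_w1 = x < 5.0
print(f"Monte-Carlo Bayesian error ~ {np.mean(decide_w1 != is_w1):.4f}")
```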
Example
It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease in a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.
a. What is the probability that a random person has a positive blood test?
b. If a blood test is positive, what's the probability that the person has the disease?
c. If a blood test is negative, what's the probability that the person does not have the disease?
• S is a Boolean RV indicating whether a person has the disease: P(S) = 0.01, P(S′) = 0.99.
• T is a Boolean RV indicating the test result (T = true indicates that the test is positive):
P(T | S) = 0.97, P(T′ | S) = 0.03
P(T | S′) = 0.06, P(T′ | S′) = 0.94
• (a) P(T) = P(S) P(T | S) + P(S′) P(T | S′) = 0.01 × 0.97 + 0.99 × 0.06 = 0.0691
• (b) P(S | T) = P(T | S) P(S) / P(T) = 0.97 × 0.01 / 0.0691 ≈ 0.1403
• (c) P(S′ | T′) = P(T′ | S′) P(S′) / P(T′) = P(T′ | S′) P(S′) / (1 − P(T)) = 0.94 × 0.99 / (1 − 0.0691) ≈ 0.9997
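The arithmetic is easy to verify; here is a small Python check (the variable names are ours):

```python
# Quick check of the blood-test example with the Bayes rule.
p_s = 0.01               # P(S): prior probability of disease
p_t_s = 0.97             # P(T | S): sensitivity
p_t_not_s = 0.06         # P(T | S'): false-positive rate

p_t = p_s * p_t_s + (1 - p_s) * p_t_not_s                 # (a) total probability
p_s_t = p_t_s * p_s / p_t                                 # (b) Bayes rule
p_not_s_not_t = (1 - p_t_not_s) * (1 - p_s) / (1 - p_t)   # (c) Bayes rule

print(f"P(T)     = {p_t:.4f}")            # 0.0691
print(f"P(S|T)   = {p_s_t:.4f}")          # ~0.1404
print(f"P(S'|T') = {p_not_s_not_t:.4f}")  # ~0.9997
```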
A physician can take two possible actions after seeing the patient's test results:
• A1: decide the patient is sick
• A2: decide the patient is healthy
The costs of those actions are:
• If the patient is healthy but the doctor decides he/she is sick: $20,000
• If the patient is sick but the doctor decides he/she is healthy: $100,000
• A correct decision costs nothing, so R(A1 | S) = R(A2 | S′) = 0.
When the test is positive:
• R(A1 | T) = R(A1 | S) P(S | T) + R(A1 | S′) P(S′ | T) = R(A1 | S′) P(S′ | T) = 20,000 × 0.8597 = $17,194
• R(A2 | T) = R(A2 | S) P(S | T) + R(A2 | S′) P(S′ | T) = R(A2 | S) P(S | T) = 100,000 × 0.1403 = $14,030
So with only these two actions, the minimum-risk decision on a positive test is A2.
A physician can take three possible actions after seeing the patient's test results:
• A1: decide the patient is sick
• A2: decide the patient is healthy
• A3: send the patient for another test
The costs of those actions are:
• If the patient is healthy but the doctor decides he/she is sick: $20,000
• If the patient is sick but the doctor decides he/she is healthy: $100,000
• Sending the patient for another test: $15,000
When the test is positive:
• R(A1 | T) = R(A1 | S) P(S | T) + R(A1 | S′) P(S′ | T) = R(A1 | S′) P(S′ | T) = 20,000 × 0.8597 = $17,194
• R(A2 | T) = R(A2 | S) P(S | T) + R(A2 | S′) P(S′ | T) = R(A2 | S) P(S | T) = 100,000 × 0.1403 = $14,030
• R(A3 | T) = $15,000
When the test is negative:
• R(A1 | T′) = R(A1 | S) P(S | T′) + R(A1 | S′) P(S′ | T′) = R(A1 | S′) P(S′ | T′) = 20,000 × 0.9997 = $19,994
• R(A2 | T′) = R(A2 | S) P(S | T′) + R(A2 | S′) P(S′ | T′) = R(A2 | S) P(S | T′) = 100,000 × 0.0003 = $30
• R(A3 | T′) = $15,000
In both cases A2 (decide the patient is healthy) has the lowest conditional risk, so it is the Bayes-optimal action.
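The same risk comparison as a short Python sketch, reusing the posteriors computed above (P(S|T) ≈ 0.1403, P(S|T′) ≈ 0.0003):

```python
# Expected-cost comparison for the three actions; correct decisions cost 0.
MISS_COST = 100_000         # declare healthy when actually sick
FALSE_ALARM_COST = 20_000   # declare sick when actually healthy
RETEST_COST = 15_000        # A3: order another test

def risks(p_sick):
    return {
        "A1 (sick)":    FALSE_ALARM_COST * (1 - p_sick),
        "A2 (healthy)": MISS_COST * p_sick,
        "A3 (retest)":  RETEST_COST,
    }

for label, p_sick in [("positive", 0.1403), ("negative", 0.0003)]:
    r = risks(p_sick)
    print(f"test {label}: {r} -> best: {min(r, key=r.get)}")
```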
Exercise
• Consider the example of the sea bass/salmon classifier with two possible actions: A1: decide the input is sea bass; A2: decide the input is salmon. The priors for sea bass and salmon are 2/3 and 1/3, respectively.
• The cost of classifying a fish as salmon when it truly is sea bass is $2, and the cost of classifying a fish as sea bass when it is truly salmon is $1.
• Find the decision for input x = 13, where the likelihoods are P(x | ω1) = 0.28 and P(x | ω2) = 0.17.
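For checking your answer, here is a sketch of the computation following the conditional-risk recipe above; the dictionary encoding of the loss table is just one convenient choice:

```python
# Work the exercise: priors, likelihoods at x = 13, and the loss table.
priors = {"sea bass": 2 / 3, "salmon": 1 / 3}
likelihood = {"sea bass": 0.28, "salmon": 0.17}     # P(x | w) at x = 13

# loss[(decision, true class)]; correct decisions cost 0.
loss = {("salmon", "sea bass"): 2.0,   # call it salmon, truly sea bass: $2
        ("sea bass", "salmon"): 1.0}   # call it sea bass, truly salmon: $1

evidence = sum(likelihood[w] * priors[w] for w in priors)
posterior = {w: likelihood[w] * priors[w] / evidence for w in priors}

risk = {a: sum(loss.get((a, w), 0.0) * posterior[w] for w in priors)
        for a in ("sea bass", "salmon")}
print(posterior)                              # sea bass ~0.767, salmon ~0.233
print(risk)                                   # ~0.233 vs ~1.534
print("decision:", min(risk, key=risk.get))   # sea bass (action A1)
```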