Learn the fundamentals of data mining with Bayesian networks, including unconditional and conditional probability, joint probability, conditional independence, and creating a Bayesian network. Explore examples and applications in various domains.
Data Mining with Bayesian Networks (I)
Instructor: Qiang Yang, Hong Kong University of Science and Technology (Qyang@cs.ust.hk)
Thanks: Dan Weld, Eibe Frank
Basics
• Unconditional or prior probability
• Pr(Play=yes) + Pr(Play=no) = 1
• Pr(Play=yes) is sometimes written as Pr(Play)
• The table has 9 yes and 5 no examples
• Pr(Play=yes) = 9/(9+5) = 9/14
• Thus, Pr(Play=no) = 5/14
• Joint probability of Play and Windy: Pr(Play=x, Windy=y), summed over all values x and y, must equal 1

              Windy=True   Windy=False
  Play=yes       3/14          6/14
  Play=no        3/14            ?
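The prior and joint probabilities above can be checked with a few lines of code. This is a sketch assuming the 14-day weather counts from the table, with the missing cell inferred as 2 so that the four joint probabilities sum to 1:

```python
# Sketch: prior and joint probabilities from the weather counts above.
from fractions import Fraction

# counts[(play, windy)] from the 2x2 table; the "?" cell must be 2
# so that the joint distribution sums to 1 (and 9 + 5 = 14 days).
counts = {
    ("yes", True): 3, ("yes", False): 6,
    ("no",  True): 3, ("no",  False): 2,
}
n = sum(counts.values())                      # 14 days in total

pr_play_yes = Fraction(counts[("yes", True)] + counts[("yes", False)], n)
pr_play_no  = Fraction(counts[("no", True)] + counts[("no", False)], n)
assert pr_play_yes + pr_play_no == 1          # priors sum to 1

joint = {k: Fraction(v, n) for k, v in counts.items()}
assert sum(joint.values()) == 1               # joint sums to 1
print(pr_play_yes)                            # → 9/14
```

Using exact fractions avoids any floating-point noise in the sanity checks.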
Probability Basics
• Conditional probability: Pr(A|B)
• #(Windy=False) = 8; within those 8, #(Play=yes) = 6
• Pr(Play=yes | Windy=False) = 6/8
• Pr(Windy=False) = 8/14
• Pr(Play=yes) = 9/14
• Applying Bayes' rule: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)
• Pr(Windy=False | Play=yes) = (6/8 × 8/14) / (9/14) = 6/9
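The Bayes' rule step above can be reproduced exactly with rational arithmetic; a minimal sketch using the numbers from this slide:

```python
# Sketch: applying Bayes' rule to the weather numbers on this slide.
from fractions import Fraction

pr_windy_false = Fraction(8, 14)
pr_play_yes = Fraction(9, 14)
pr_play_yes_given_windy_false = Fraction(6, 8)

# Bayes' rule: Pr(Windy=False | Play=yes)
#   = Pr(Play=yes | Windy=False) * Pr(Windy=False) / Pr(Play=yes)
pr_windy_false_given_play_yes = (
    pr_play_yes_given_windy_false * pr_windy_false / pr_play_yes
)
print(pr_windy_false_given_play_yes)   # → 2/3, i.e. 6/9
```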
Conditional Independence
• "A and P are independent given C": Pr(A | P, C) = Pr(A | C)
  (A = Ache, C = Cavity, P = Probe Catches)

  C  A  P   Probability
  F  F  F   0.534
  F  F  T   0.356
  F  T  F   0.006
  F  T  T   0.004
  T  F  F   0.012
  T  F  T   0.048
  T  T  F   0.008
  T  T  T   0.032
Conditional Independence
• "A and P are independent given C": Pr(A | P, C) = Pr(A | C), and also Pr(P | A, C) = Pr(P | C)
• Suppose C = True:
  Pr(A | P, C) = 0.032 / (0.032 + 0.048) = 0.032 / 0.080 = 0.4
  Pr(A | C) = (0.032 + 0.008) / (0.048 + 0.012 + 0.032 + 0.008) = 0.04 / 0.1 = 0.4

  C  A  P   Probability
  F  F  F   0.534
  F  F  T   0.356
  F  T  F   0.006
  F  T  T   0.004
  T  F  F   0.012
  T  F  T   0.048
  T  T  F   0.008
  T  T  T   0.032
Conditional Independence
• A conditional probability table (CPT) at each node can encode the joint probability distribution in compact form
  (A = Ache, C = Cavity, P = Probe Catches)

  Pr(C) = 0.1

  C   Pr(A|C)        C   Pr(P|C)
  T   0.4            T   0.8
  F   0.02           F   0.4

• Compare with the full joint table:

  C  A  P   Probability
  F  F  F   0.534
  F  F  T   0.356
  F  T  F   0.006
  F  T  T   0.004
  T  F  F   0.012
  T  F  T   0.048
  T  T  F   0.008
  T  T  T   0.032
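To see how the CPTs compactly encode the joint, a small sketch can rebuild the joint via the factorization Pr(C, A, P) = Pr(C) Pr(A|C) Pr(P|C) and re-check the conditional independence. Note Pr(C) = 0.1 is used here because the C=True rows of the joint table sum to 0.1:

```python
# Sketch: rebuild the joint from the CPTs and verify Pr(A|P,C) = Pr(A|C).
from itertools import product

pr_c = 0.1                                 # C=True rows of the joint sum to 0.1
pr_a_given_c = {True: 0.4, False: 0.02}
pr_p_given_c = {True: 0.8, False: 0.4}

joint = {}
for c, a, p in product([False, True], repeat=3):
    pc = pr_c if c else 1 - pr_c
    pa = pr_a_given_c[c] if a else 1 - pr_a_given_c[c]
    pp = pr_p_given_c[c] if p else 1 - pr_p_given_c[c]
    joint[(c, a, p)] = pc * pa * pp        # Pr(C,A,P) = Pr(C) Pr(A|C) Pr(P|C)

# Pr(A=T | P=T, C=T) should equal Pr(A=T | C=T) = 0.4
pr_a_given_pc = joint[(True, True, True)] / (
    joint[(True, True, True)] + joint[(True, False, True)])
pr_a_given_c_only = sum(joint[(True, True, p)] for p in [False, True]) / pr_c
assert abs(pr_a_given_pc - 0.4) < 1e-9
assert abs(pr_a_given_c_only - 0.4) < 1e-9
```

The C=True rows come out exactly as in the joint table (e.g. Pr(T,T,T) = 0.1 × 0.4 × 0.8 = 0.032).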
Creating a Network
• View 1: a Bayes net is a representation of a joint probability distribution (JPD)
• View 2: a Bayes net is a set of conditional independence statements
• If we create a structure that correctly represents causality, we get a good network, i.e. one that is small (easy to compute with) and easy to fill in with numbers
Example
• My house alarm system just sounded (A)
• Both an earthquake (E) and a burglary (B) could set it off
• John will probably hear the alarm; if so he'll call (J). But sometimes John calls even when the alarm is silent
• Mary might hear the alarm and call too (M), but not as reliably
• We could be assured a complete and consistent model by fully specifying the joint distribution over all 2^5 = 32 assignments:
  • Pr(A, E, B, J, M)
  • Pr(A, E, B, J, ~M)
  • etc.
Structural Models (HK book 7.4.3)
• Instead of starting with numbers, we start with structural relationships among the variables:
  • There is a direct causal relationship from Earthquake to Alarm
  • There is a direct causal relationship from Burglar to Alarm
  • There is a direct causal relationship from Alarm to JohnCall
  • Earthquake and Burglar tend to occur independently
  • etc.
Possible Bayesian Network
• Nodes: Earthquake, Burglary, Alarm, JohnCalls, MaryCalls
• Edges: Earthquake → Alarm, Burglary → Alarm, Alarm → JohnCalls, Alarm → MaryCalls
Complete Bayesian Network
• Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

  Pr(B) = .001      Pr(E) = .002

  B  E   Pr(A|B,E)
  T  T   .95
  T  F   .94
  F  T   .29
  F  F   .001

  A   Pr(J|A)        A   Pr(M|A)
  T   .90            T   .70
  F   .05            F   .01
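With the network complete, the probability of any full assignment follows from the chain rule over the CPTs. A sketch for one query; Pr(A|~B,~E) = 0.001 is the standard value for this textbook alarm example:

```python
# Sketch: Pr(J, M, A, ~B, ~E) by the chain rule over the alarm-network CPTs:
#   Pr(J|A) * Pr(M|A) * Pr(A|~B,~E) * Pr(~B) * Pr(~E)
p_b, p_e = 0.001, 0.002
p_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
p_j_given_a = {True: 0.90, False: 0.05}
p_m_given_a = {True: 0.70, False: 0.01}

p = (p_j_given_a[True] * p_m_given_a[True]
     * p_a_given[(False, False)] * (1 - p_b) * (1 - p_e))
print(p)   # ≈ 0.000628
```

This is why the network is "easy to compute with": 10 CPT entries replace 31 independent joint-table entries.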
Microsoft Bayesian Belief Net
• http://research.microsoft.com/adapt/MSBNx/
• Can be used to construct and reason with Bayesian networks
• Consider the alarm example above
Mining for Structural Models
• Learning a Bayesian network from data is a difficult problem; several methods have been proposed, and it often requires a domain expert's knowledge
• Once set up, a Bayesian network can be used to answer probabilistic queries (e.g., in the Microsoft Bayesian Network software)
• Four learning problems:
  • Known structure, fully observable: the CPTs are to be learned
  • Unknown structure, fully observable: search over structures
  • Known structure, hidden variables: parameter learning, e.g. by gradient (hill-climbing) methods
  • Unknown structure, hidden variables: no good results
Hidden Variable (Han and Kamber's Data Mining book, pages 301-302)
• Assume that the Bayesian network structure is given, but some variables are hidden
• Our objective: find the CPT for all nodes
• Idea: use a gradient method
  • Let S be the set of training examples {X1, X2, ..., Xs}
  • Consider a variable Yi with parents Ui = {Parent1, Parent2, ...}
  • Question: what is Pr(Yi = yij | Ui = uik)?
  • Answer: learn this value from the data, iteratively
• Example: the tennis domain on the next slide
Learn CPT for a Hidden Variable
• Suppose we are in a tennis domain
• We wish to introduce a new variable not in our data set, called Field Temp, representing the temperature of the field
• Assume that we don't have a good way to measure it, but have to include it in our network
• Network: Windy → Field Temp ← Outlook
Learn the CPT
• Let wijk be the value of Pr(Yi = yij | Ui = uik), where Ui = {Parent1, Parent2, ...} are the parents of Yi
• Gradient step: compute a new wijk from the old one by climbing the gradient of the log-likelihood:
  wijk ← wijk + η × Σd Pr(Yi = yij, Ui = uik | Xd) / wijk
  where η is a learning rate and the sum runs over the training examples Xd
Example: Learn the CPT
• w = Pr(Field Temp=Hot | Windy=True, Outlook=Sunny)
• Let the old w be 0.5; compute a new w from the training data
• Normalize (so that each row of the CPT sums to 1), then iterate until the values are stable
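The update-and-normalize loop above can be sketched schematically. In this sketch, `posterior` stands in for real Bayesian-network inference of Pr(Yi = yij, Ui = uik | Xd); here it is a stub returning assumed toy values, and the variable names (FieldTemp, Windy, Outlook) are only illustrative:

```python
# Schematic sketch of one gradient step for CPT learning with hidden
# variables: each weight w[(u, y)] = Pr(FieldTemp=y | (Windy,Outlook)=u)
# is pushed by sum_d posterior(y, u, X_d) / w, then each CPT row (fixed u)
# is renormalized so its entries sum to 1.
LEARNING_RATE = 0.1

def posterior(y, u, example):
    # Stub: in a real system this is Pr(FieldTemp=y, (Windy,Outlook)=u | X_d)
    # computed by inference over the whole network; toy values here.
    return 0.3 if example.get(("Windy", "Outlook")) == u else 0.05

def gradient_step(w, data):
    new_w = {}
    for (u, y), wijk in w.items():
        grad = sum(posterior(y, u, d) / wijk for d in data)
        new_w[(u, y)] = wijk + LEARNING_RATE * grad
    # renormalize each parent configuration u
    for u in {u for (u, _) in new_w}:
        z = sum(v for (u2, _), v in new_w.items() if u2 == u)
        for (u2, y), v in list(new_w.items()):
            if u2 == u:
                new_w[(u2, y)] = v / z
    return new_w

# One step from the slide's starting point w = 0.5:
w0 = {(("T", "sunny"), "hot"): 0.5, (("T", "sunny"), "cold"): 0.5}
data = [{("Windy", "Outlook"): ("T", "sunny")}]
w1 = gradient_step(w0, data)
```

In practice the step would be repeated, recomputing the posteriors with the updated CPTs each time, until the weights stop changing.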