Explore the concepts of Bayesian Networks including conditional independence, probability computation, training/parameter estimation, and inference/testing. Applications include medical diagnosis and computer problem diagnosis.
PatReco: Bayesian Networks
Alexandros Potamianos
Dept of ECE, Tech. Univ. of Crete
Fall 2009-2010
Definitions
• Bayesian networks consist of nodes and (usually directed) arcs
• Nodes, or states, represent a classification class or, in general, an event, and are described by a pdf
• Arcs represent relations between nodes, e.g., cause and effect or time sequence
• Two nodes that are connected only via a third node are conditionally independent given that node (the converging topology shown two slides below is the exception)
When to use Bayesian nets
• Bayesian networks (also known as inference or belief networks) are statistical models used for classification (or, in general, pattern recognition) problems in which there are dependencies among classes, e.g., time dependencies or cause-and-effect dependencies
Conditional Independence
• Full independence of A and B:
P(A|B) = P(A) or P(A,B) = P(A) P(B)
• Conditional independence of A and B given C:
P(A|B,C) = P(A|C) or P(A,B|C) = P(A|C) P(B|C)
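A quick numeric check of the second definition (a minimal Python sketch; the CPT values below are made up for illustration and are not from the slides). The joint is built as P(A|C) P(B|C) P(C), so A and B should come out conditionally independent given C:

```python
import itertools

# Made-up CPTs for binary A, B, C (illustration only)
pC = {0: 0.6, 1: 0.4}                                        # P(C=c)
pA_C = {(0, 0): 0.3, (1, 0): 0.7, (0, 1): 0.8, (1, 1): 0.2}  # P(A=a | C=c)
pB_C = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}  # P(B=b | C=c)

# Joint built so that A and B are conditionally independent given C
joint = {(a, b, c): pA_C[(a, c)] * pB_C[(b, c)] * pC[c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

# P(A=1 | B=1, C=0) should equal P(A=1 | C=0)
p_a_given_bc = joint[(1, 1, 0)] / sum(joint[(a, 1, 0)] for a in (0, 1))
p_a_given_c = (sum(joint[(1, b, 0)] for b in (0, 1)) /
               sum(joint[(a, b, 0)] for a in (0, 1) for b in (0, 1)))
print(p_a_given_bc, p_a_given_c)  # both ≈ 0.7
```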
Conditional Independence
[Figure: three three-node topologies — B between A and C; A a common parent of B and C; B a common child of A and C]
• A, C independent given B: P(C|B,A) = P(C|B)
• B, C independent given A: P(B,C|A) = P(B|A) P(C|A)
• A, C dependent given B: P(A,C|B) cannot be reduced!
Three problems
• Probability computation (use independence)
• Training/parameter estimation
  • Maximum likelihood (ML) if everything is observable
  • Expectation maximization (EM) if data is missing
• Inference (testing)
  • Diagnosis: P(cause|effect), bottom-up
  • Prediction: P(effect|cause), top-down
Probability Computation
For a Bayesian network that consists of N nodes:
• Compute P(n1, n2, …, nN) using the chain rule, starting from the "last/bottom" node and working your way up:
P(n1, n2, …, nN) = P(nN | n1, …, nN-1) P(nN-1 | n1, …, nN-2) … P(n2 | n1) P(n1)
• Identify conditional independence conditions from the Bayesian network topology
• Simplify the conditional probabilities using the independence conditions
Probability Computation
[Figure: four-node network — C is the parent of S and R, which are both parents of W]
Topology: P(C,S,R,W) = P(W|C,S,R) P(S|C,R) P(R|C) P(C)
Independent: (W,C) given S,R; (S,R) given C
Dependent: (S,R) given W
P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C)
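This factored computation is easy to sketch in code. Only the factorization comes from the slide; the CPT values below are assumptions chosen for illustration:

```python
# Factored joint P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R); CPT values assumed
pC = {1: 0.5, 0: 0.5}                                        # P(C=c)
pS_C = {(1, 1): 0.1, (0, 1): 0.9, (1, 0): 0.5, (0, 0): 0.5}  # P(S=s | C=c)
pR_C = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.2, (0, 0): 0.8}  # P(R=r | C=c)
pW_SR = {(1, 1, 1): 0.99, (1, 1, 0): 0.90,                   # P(W=w | S=s, R=r)
         (1, 0, 1): 0.90, (1, 0, 0): 0.00}
pW_SR.update({(0, s, r): 1.0 - pW_SR[(1, s, r)]
              for s in (0, 1) for r in (0, 1)})

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) via the simplified factorization."""
    return pC[c] * pS_C[(s, c)] * pR_C[(r, c)] * pW_SR[(w, s, r)]

print(joint(1, 0, 1, 1))  # P(C=1, S=0, R=1, W=1) = 0.324
```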
Probability Computation
• There are general algorithms for identifying cliques in a Bayesian net
• Cliques are islands of conditional dependence, i.e., terms in the probability computation that cannot be further reduced
• For the network above, the cliques are SC, RC, and WSR
Training/Parameter Estimation
• Instead of estimating the joint pdf of the whole network, the joint pdf of each of the cliques is estimated
• For example, if the network joint pdf is P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C), then instead of computing P(C,S,R,W) we compute each of P(W|S,R), P(S|C), P(R|C), P(C) for all possible values of W, S, R, C (much simpler)
Training/Parameter Estimation
• For fully observable data and discrete probabilities, compute maximum likelihood estimates of the parameters, e.g., for discrete probabilities:
P(W=1|S=1,R=0)_ML = counts(W=1,S=1,R=0) / counts(W=*,S=1,R=0)
Training/Parameter Estimation
• Example: the following observation tuples are given for (W,C,S,R):
(1,0,1,0), (0,0,1,0), (1,1,1,0), (0,1,1,0), (1,0,1,0), (0,1,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)
• Using maximum likelihood estimation:
P(W=1|S=1,R=0)_ML = #(1,*,1,0) / #(*,*,1,0) = 4/6 ≈ 0.67
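The same count-based estimate as a short Python check on the data listed above:

```python
# Observations as (W, C, S, R) tuples, copied from the example above
data = [(1, 0, 1, 0), (0, 0, 1, 0), (1, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 0),
        (0, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0)]

# ML estimate of P(W=1 | S=1, R=0): matching counts over marginal counts
num = sum(1 for w, c, s, r in data if (w, s, r) == (1, 1, 0))
den = sum(1 for w, c, s, r in data if (s, r) == (1, 0))
print(num, den, num / den)  # 4 6 0.666...
```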
Training/Parameter Estimation
• When data is hidden (non-observable) or missing, the EM algorithm is employed
• There are efficient implementations of the EM algorithm for Bayesian nets that operate on the clique network
• When the topology of the Bayesian network is not known, structural EM can be used
Inference
• There are two types of inference (testing):
  • Diagnosis: P(cause|effect), bottom-up
  • Prediction: P(effect|cause), top-down
• Once the parameters of the network are estimated, the joint network pdf can be evaluated for ALL possible network values
• Inference is then simply probability computation using the network pdf
Inference
• For example: P(W=1|C=1) = P(W=1,C=1) / P(C=1), where
P(W=1,C=1) = Σ_R Σ_S P(W=1, C=1, R, S)
P(C=1) = Σ_W Σ_R Σ_S P(W, C=1, R, S)
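Continuing the sprinkler sketch from the probability-computation slides (same assumed CPTs and joint() function), these marginals are plain sums over the unobserved variables:

```python
import itertools

# P(W=1, C=1) and P(C=1) by marginalizing the joint() defined earlier
p_w1_c1 = sum(joint(1, s, r, 1)
              for s, r in itertools.product((0, 1), repeat=2))
p_c1 = sum(joint(1, s, r, w)
           for s, r, w in itertools.product((0, 1), repeat=3))
print(p_w1_c1 / p_c1)  # P(W=1 | C=1)
```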
Inference
• Efficient algorithms exist for performing inference in large networks; they operate on the clique network
• Inference is often posed as a probability maximization problem, e.g., what is the most probable cause or effect? argmax_W P(W|C=1) (see the sketch below)
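The maximization view, using the same sketch (since P(C=1) does not depend on W, the unnormalized joint is enough):

```python
# Most probable value of W given C=1: argmax_W P(W | C=1)
w_star = max((0, 1), key=lambda w: sum(joint(1, s, r, w)
             for s, r in itertools.product((0, 1), repeat=2)))
print(w_star)
```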
Continuous Case
• In our examples the network nodes represented discrete events (states or classes)
• Network nodes often hold continuous variables (observations), e.g., length, energy
• For the continuous case, parametric pdfs are introduced and their parameters are estimated using ML (observed data) or EM (hidden data)
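For instance, a node with a Gaussian pdf has closed-form ML parameter estimates; a minimal sketch with made-up observations:

```python
# Made-up continuous observations attached to one network node
x = [2.1, 1.9, 2.4, 2.0, 2.6]

# ML estimates for a Gaussian node: sample mean and (biased) sample variance
mu = sum(x) / len(x)
var = sum((xi - mu) ** 2 for xi in x) / len(x)  # ML uses 1/N, not 1/(N-1)
print(mu, var)
```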
Some Applications
• Medical diagnosis
• Computer problem diagnosis (MS)
• Markov chains
• Hidden Markov models (HMMs)
Conclusions
• Bayesian networks are used to represent dependencies between classes
• The network topology defines conditional independence conditions that simplify the modeling and computation of the network pdf
• Three problems: probability computation, estimation/training, inference/testing