Cognitive Computer Vision Kingsley Sage khs20@sussex.ac.uk and Hilary Buxton hilaryb@sussex.ac.uk Prepared under ECVision Specific Action 8-3 http://www.ecvision.org
Lecture 13 • Learning Bayesian Belief Networks • Taxonomy of methods • Learning BBNs for the fully observable data and known structure case
So why are BBNs relevant to Cognitive CV? • They provide a well-founded methodology for reasoning with uncertainty • These methods are the basis for our model of perception guided by expectation • We can develop well-founded methods of learning rather than being stuck with hand-coded models
Reminder: What is a BBN? [Figure: an example Directed Acyclic Graph over nodes B, A, O, C and N] • A compact representation of the joint probability distribution • Each variable is represented as a node • Conditional independence assumptions are encoded using a set of arcs • Different types of graph exist. The one shown is a Directed Acyclic Graph (DAG)
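To make the "compact representation" point concrete (this equation is my addition, not from the slide): a BBN over variables X_1,…,X_n encodes the joint as a product of per-node CPDs, each conditioned only on that node's parents Pa(X_i):

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$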
Why is learning important in the context of BBNs? • Knowledge acquisition can be an expensive process • Experts may not be readily available (scarce knowledge) or simply not exist • But you might have a lot of data from (say) case studies • Learning allows us to construct BBN models from the data and in the process gain insight into the nature of the problem domain
The process of learning [Diagram: training data (which may be full or partial) and the model structure (if known) are fed into the learning process]
What do we mean by “partial” data? • Training data where there are missing values, e.g. for a discrete-valued BBN with 3 nodes (A, B and O), some records may leave one or more of the three values unobserved
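A minimal illustration of what such a partially observed data set might look like for the three-node example (my sketch, not from the slide; None marks a missing value):

```python
# Hypothetical records for a 3-node discrete BBN with nodes A, B and O.
# None marks a value that was not observed, which is what makes the data "partial".
training_data = [
    {"A": True,  "B": False, "O": True},   # fully observed record
    {"A": True,  "B": None,  "O": False},  # B missing
    {"A": None,  "B": True,  "O": True},   # A missing
    {"A": False, "B": False, "O": None},   # O missing
]
# Fully observable data would be the same list with no None entries.
```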
What do we mean by “known” and “unknown” structure? [Figure: known structure shows the 3-node graph A, O, B with its arcs given; unknown structure shows the same nodes with no arcs specified]
Taxonomy of learning methods • Methods are organised along two axes: observability of the data (full or partial) and whether the model structure is known or unknown, giving four cases • In this lecture we will look at the full observability and known model structure case in detail • In the next lecture we will take an overview of the other three cases
Full observability & known structure: getting the notation right • The model parameters (CPDs) are represented as θ (example later) • Training data set D • We want to find the parameters θ that maximise P(θ|D) • The likelihood function L(θ:D) is P(D|θ)
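Written out for M independent training records x[1],…,x[M] (my addition, following the standard definition used in the Koller and Friedman tutorial cited at the end of this lecture):

$$L(\theta : D) = P(D \mid \theta) = \prod_{m=1}^{M} P\big(x[m] \mid \theta\big)$$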
Full observability & known structure: getting the notation right [Figure: an example training data set D, shown as a table of observed values for the nodes A, B and O]
Factorising the likelihood expression [Figure: the example 3-node network A, B, O with the likelihood written in factorised form]
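A sketch of the factorisation being referred to (my reconstruction; it assumes the example arcs are A→O and B→O, which the surrounding slides suggest but do not state explicitly): the likelihood splits first across independent records and then across nodes,

$$L(\theta : D) = \prod_{m=1}^{M} P\big(a[m], b[m], o[m] \mid \theta\big)
= \prod_{m=1}^{M} P\big(a[m] \mid \theta_A\big)\, P\big(b[m] \mid \theta_B\big)\, P\big(o[m] \mid a[m], b[m], \theta_O\big)$$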
Decomposition in general • Because the likelihood decomposes into one term per node, all the parameters for each node can be estimated separately
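A minimal sketch of what "estimated separately" means in the fully observed, known-structure case (my code, not from the slides): for each node we only need counts of its values alongside its parents' values.

```python
from collections import Counter

def estimate_cpds(data, parents):
    """Maximum likelihood CPD estimation for a fully observed discrete BBN.

    data    : list of dicts mapping node name -> observed value (no missing values)
    parents : dict mapping node name -> tuple of parent node names (the known structure)
    Returns node -> {(parent_values, node_value): probability}.
    """
    cpds = {}
    for node, pa in parents.items():
        joint = Counter()     # counts of (parent values, node value)
        marginal = Counter()  # counts of parent values alone
        for record in data:
            pa_vals = tuple(record[p] for p in pa)
            joint[(pa_vals, record[node])] += 1
            marginal[pa_vals] += 1
        cpds[node] = {key: n / marginal[key[0]] for key, n in joint.items()}
    return cpds

# Example using the assumed A -> O <- B structure from these notes:
data = [{"A": True, "B": False, "O": True},
        {"A": True, "B": False, "O": False},
        {"A": False, "B": True, "O": True}]
print(estimate_cpds(data, {"A": (), "B": (), "O": ("A", "B")}))
```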
Example: estimating the parameter for a root node • Let’s say our training data D contains these values for A: {T,F,T,T,F,T,T,T} • We represent our single parameter θ as the probability that a=T • The likelihood for the sequence is L(θ:D) = θ·(1−θ)·θ·θ·(1−θ)·θ·θ·θ = θ⁶(1−θ)²
So what about the prior on θ? • We have an expression for P(a[1],…,a[M] | θ); all we need to do now is say something about P(θ) • If all values of θ were equally likely at the outset, then the θ that maximises P(θ|a[1],…,a[M]) is the MAXIMUM LIKELIHOOD ESTIMATE (MLE), which for our example is θ = 0.75, i.e. P(a=T) = 0.75
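The value 0.75 follows from maximising the likelihood above (my working, not on the slide): setting the derivative of the log-likelihood to zero,

$$\frac{d}{d\theta}\big[\,6 \ln \theta + 2 \ln (1-\theta)\,\big] = \frac{6}{\theta} - \frac{2}{1-\theta} = 0 \;\Rightarrow\; \theta = \frac{6}{8} = 0.75$$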
So what about the prior on θ? • If P(θ) is not uniform, we need to take it into account when computing our estimate for a model parameter • In that case the θ that maximises P(θ|x[1],…,x[M]) is a MAXIMUM A POSTERIORI (MAP) estimate • There are many different forms of prior; one of the more common ones in this application is the DIRICHLET prior …
The Dirichlet prior [Figure: plot of the prior density p(θ) for a Dirichlet(α_T, α_F) prior over the parameter θ]
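For a two-valued node the Dirichlet prior reduces to a Beta density. A sketch of the standard result (my addition; α_T and α_F are the prior hyperparameters, N_T and N_F the observed counts of T and F):

$$p(\theta) \propto \theta^{\alpha_T - 1}(1-\theta)^{\alpha_F - 1}, \qquad
\hat{\theta}_{\mathrm{MAP}} = \frac{N_T + \alpha_T - 1}{N_T + N_F + \alpha_T + \alpha_F - 2}$$

With α_T = α_F = 1 (a uniform prior) this reduces to the maximum likelihood estimate.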
Semantic priors • If the training data D is sorted into known classes, the priors can be estimated beforehand. These are called “semantic priors” • This involves an element of hand coding and loses the advantage of gaining some insight into the problem domain • It does give the advantage of mapping onto expert knowledge of the classes in the problem
Summary • Estimation relies on sufficient statistics • For the ML estimate for discrete-valued nodes, the sufficient statistics are the counts #(value) of how often each value occurs in the data • For the MAP estimate, we also have to account for the prior
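To make the summary concrete, a minimal sketch (my own code, assuming a Beta/Dirichlet prior expressed as pseudo-counts alpha_t, alpha_f) of the two estimates for the single-parameter root-node example:

```python
def ml_estimate(values):
    """Maximum likelihood estimate of P(a=T) from a list of booleans: count and normalise."""
    return sum(values) / len(values)

def map_estimate(values, alpha_t=2, alpha_f=2):
    """MAP estimate of P(a=T) under a Beta(alpha_t, alpha_f) prior.

    alpha_t and alpha_f are illustrative hyperparameters (pseudo-counts);
    a uniform prior (alpha_t = alpha_f = 1) recovers the ML estimate.
    """
    n_true = sum(values)
    return (n_true + alpha_t - 1) / (len(values) + alpha_t + alpha_f - 2)

# The running example from the slides: {T,F,T,T,F,T,T,T}
data = [True, False, True, True, False, True, True, True]
print(ml_estimate(data))   # 0.75
print(map_estimate(data))  # 0.7 (pulled towards 0.5 by the prior)
```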
Next time … • Overview of methods for learning BBNs: • Full data and unknown structure • Partial data and known structure • Partial data and unknown structure • Excellent tutorial by Koller and Friedman: www.cs.huji.ac.il/~nir/Nips01-Tutorial/ • Some of today’s slides were adapted from that tutorial