Understand how statistical models describe data, likelihoods, priors, and probability propagation in this analytical study. Learn about Markov Chain Monte Carlo, hidden variables, Markov Random Fields, and generalizations in the context of machine learning and computer vision.
Statistical Models • Statistical models describe observed 'DATA' via an assumed likelihood, $L(\text{DATA} \mid \theta)$, • with $\theta$ denoting the 'parameters' needed to describe the data. • Likelihoods measure how likely the observed data are under given parameter values. They implicitly assume an error mechanism (in the translation between what was observed and what was 'supposed' to be observed). • Parameters may describe model features or even specify different models.
An Example of a Statistical Model • A burglar alarm can be set off by both earthquakes and burglaries. It has a mechanism to notify the homeowner when activated. One day it went off at Judah Pearl's house. Should he: • a) immediately call the police, suspecting that a burglary took place, or • b) go home and immediately move his valuables elsewhere?
A Statistical Analysis • Observation: The burglar alarm went off (i.e., a=1); • Parameter 1: The presence or absence of an earthquake (i.e., e=1,0); • Parameter 2: The presence or absence of a burglary at Judah's house (i.e., b=1,0).
LIKELIHOODS/PRIORS IN THIS CASE • The likelihood associated with the observation is $P(a=1 \mid b, e)$, • with $b, e \in \{0, 1\}$ (depending on whether a burglary or an earthquake has taken place). • The priors specify the probabilities of a burglary or an earthquake happening: $P(b)$ and $P(e)$.
Example Probabilities • Here are some example values for the likelihood and the priors:
LIKELIHOOD/PRIOR INTERPRETATION • Burglaries are as likely (a priori) as earthquakes. • It is unlikely that the alarm goes off by itself. • The alarm goes off more often when a burglary happens but an earthquake does not than in the reverse case, i.e., when an earthquake happens but a burglary does not. • If both a burglary and an earthquake happen, then it is (virtually) twice as likely that the alarm will go off.
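To make the interpretation concrete, here is a minimal Python sketch with hypothetical numbers (these are NOT the slide's values) chosen to respect the first three statements above: equal priors, a rare false alarm, and a burglary triggering the alarm more readily than an earthquake. The names `prior_b`, `prior_e` and `lik_alarm` are illustrative.

```python
# Hypothetical probabilities, chosen only to mirror the qualitative
# interpretation; they are not taken from the slides.
prior_b = {1: 0.01, 0: 0.99}  # P(b): burglary / no burglary
prior_e = {1: 0.01, 0: 0.99}  # P(e): earthquake / no earthquake

# P(a=1 | b, e): probability that the alarm goes off
lik_alarm = {
    (0, 0): 0.001,  # neither cause: the alarm rarely goes off by itself
    (1, 0): 0.90,   # burglary only
    (0, 1): 0.30,   # earthquake only
    (1, 1): 0.95,   # both
}

def check_interpretation():
    """Return the qualitative statements as booleans."""
    return {
        "equal_priors": prior_b[1] == prior_e[1],
        "rare_false_alarm": lik_alarm[(0, 0)] < 0.01,
        "burglary_beats_quake": lik_alarm[(1, 0)] > lik_alarm[(0, 1)],
    }

print(check_interpretation())
```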
Probability Propagation Graph [Figure: the parameter nodes B and E (values b, e) both feed into the alarm node A (value a).]
PROBABILITY PROPAGATION • There are two kinds of probability propagation (see Frey 1998): a) marginalization, e.g., $P(a) = \sum_{b,e} P(a \mid b, e)\, P(b)\, P(e)$, • and b) multiplication, e.g., $P(a, b, e) = P(a \mid b, e)\, P(b)\, P(e)$. • Marginalization sums over the terms leading into the node; • multiplication multiplies the terms leading into the node.
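The two propagation operations can be sketched in Python for the burglar network, again with hypothetical probabilities (not the slide's values). `joint` multiplies the terms leading into A, and `marginal_alarm` sums them over b and e; both function names are illustrative.

```python
from itertools import product

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def joint(a, b, e):
    """Multiplication: P(a, b, e) = P(b) P(e) P(a | b, e)."""
    p_a1 = lik_alarm[(b, e)]
    return prior_b[b] * prior_e[e] * (p_a1 if a == 1 else 1.0 - p_a1)

def marginal_alarm(a):
    """Marginalization: P(a) = sum over b and e of P(a, b, e)."""
    return sum(joint(a, b, e) for b, e in product((0, 1), repeat=2))

print(marginal_alarm(1))
```

With these numbers P(a=1) comes to about 0.013; the point is the structure of the computation, not the value.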
CAUSAL ANALYSIS • To analyze the causes of the alarm going off, we calculate the probability that it was a burglary, $P(b=1 \mid a=1) \propto \sum_{e} P(a=1 \mid b=1, e)\, P(b=1)\, P(e)$, and compare it with the probability that it was an earthquake.
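CAUSAL ANALYSIS II • So, after normalization: $P(b=1 \mid a=1) = \frac{\sum_{e} P(a=1 \mid b=1, e)\, P(b=1)\, P(e)}{\sum_{b', e} P(a=1 \mid b', e)\, P(b')\, P(e)}$ • Similarly, $P(e=1 \mid a=1) = \frac{\sum_{b} P(a=1 \mid b, e=1)\, P(b)\, P(e=1)}{\sum_{b, e'} P(a=1 \mid b, e')\, P(b)\, P(e')}$ • So, if we had to choose between burglary and earthquake as the cause of the alarm going off, we should choose burglary (the larger of the two posterior probabilities).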
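A minimal sketch of the causal comparison by exact enumeration, using hypothetical probabilities (not the slide's values); `cause_posteriors` is an illustrative name.

```python
from itertools import product

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def cause_posteriors():
    """Return (P(b=1 | a=1), P(e=1 | a=1)) by exact enumeration."""
    weights = {(b, e): prior_b[b] * prior_e[e] * lik_alarm[(b, e)]
               for b, e in product((0, 1), repeat=2)}  # P(a=1, b, e)
    p_alarm = sum(weights.values())                     # normalizer P(a=1)
    p_burglary = (weights[(1, 0)] + weights[(1, 1)]) / p_alarm
    p_quake = (weights[(0, 1)] + weights[(1, 1)]) / p_alarm
    return p_burglary, p_quake

p_b, p_e = cause_posteriors()
print(f"P(b=1|a=1) = {p_b:.3f}, P(e=1|a=1) = {p_e:.3f}")
```

With these made-up numbers the burglary posterior (about 0.70) exceeds the earthquake posterior (about 0.24), matching the qualitative conclusion above.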
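Markov Chain Monte Carlo for the Burglar Problem • For the current value e = e*, calculate $P(b \mid a=1, e^*) \propto P(a=1 \mid b, e^*)\, P(b)$ • or, after normalization, $P(b \mid a=1, e^*) = \frac{P(a=1 \mid b, e^*)\, P(b)}{\sum_{b'} P(a=1 \mid b', e^*)\, P(b')}$ • Simulate b from this distribution. Call the result b*. Now calculate: $P(e \mid a=1, b^*) \propto P(a=1 \mid b^*, e)\, P(e)$ • or $P(e \mid a=1, b^*) = \frac{P(a=1 \mid b^*, e)\, P(e)}{\sum_{e'} P(a=1 \mid b^*, e')\, P(e')}$ • Simulate e from this distribution, call the result e*, and repeat.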
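The procedure above is a Gibbs sampler. A minimal Python sketch, assuming hypothetical probabilities (not the slide's values), alternates the two conditional draws and estimates P(b=1 | a=1) and P(e=1 | a=1) from sample frequencies. The burn-in length, number of iterations and seed are arbitrary choices.

```python
import random

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def cond_b(e_star):
    """P(b=1 | a=1, e=e*): normalize P(a=1 | b, e*) P(b) over b."""
    w1 = lik_alarm[(1, e_star)] * prior_b[1]
    w0 = lik_alarm[(0, e_star)] * prior_b[0]
    return w1 / (w0 + w1)

def cond_e(b_star):
    """P(e=1 | a=1, b=b*): normalize P(a=1 | b*, e) P(e) over e."""
    w1 = lik_alarm[(b_star, 1)] * prior_e[1]
    w0 = lik_alarm[(b_star, 0)] * prior_e[0]
    return w1 / (w0 + w1)

def gibbs(n_iter=50_000, burn_in=1_000, seed=0):
    """Return Monte Carlo estimates of (P(b=1 | a=1), P(e=1 | a=1))."""
    rng = random.Random(seed)
    b, e = 0, 0  # arbitrary starting state
    count_b = count_e = 0
    for t in range(burn_in + n_iter):
        b = 1 if rng.random() < cond_b(e) else 0  # simulate b given e*
        e = 1 if rng.random() < cond_e(b) else 0  # simulate e given b*
        if t >= burn_in:
            count_b += b
            count_e += e
    return count_b / n_iter, count_e / n_iter

pb, pe = gibbs()
print(f"estimated P(b=1|a=1) = {pb:.3f}, P(e=1|a=1) = {pe:.3f}")
```

With enough iterations the frequencies should approach the exact enumeration values; the chain moves between the 'burglary only' and 'earthquake only' states, so a reasonably long run is used.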
Independent Hidden Variables: A Factorial Model • In statistical modeling it is often advantageous to treat variables that are not observed as 'hidden'. This means that they themselves have distributions. In our case, suppose b and e are independent hidden variables, $Q(b, e) = Q_B(b)\, Q_E(e)$: • Then optimally:
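One standard way to make the factorial model concrete is mean-field variational inference: choose $Q(b, e) = Q_B(b)\, Q_E(e)$ to minimize $KL(Q \,\|\, P(b, e \mid a=1))$ by coordinate ascent. We assume that is the sense of 'optimally' here; the sketch below also uses hypothetical probabilities (not the slide's values), and `mean_field` is an illustrative name.

```python
import math

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def _prob_one(s1, s0):
    """Turn two unnormalized log-probabilities into P(state=1)."""
    return 1.0 / (1.0 + math.exp(s0 - s1))

def mean_field(n_sweeps=100):
    """Coordinate ascent for Q_B(b=1) and Q_E(e=1) given a=1."""
    qb, qe = 0.5, 0.5  # arbitrary starting point
    for _ in range(n_sweeps):
        # log Q_B(b) = log P(b) + E_{Q_E}[log P(a=1 | b, e)] + const
        s = {b: math.log(prior_b[b])
                + qe * math.log(lik_alarm[(b, 1)])
                + (1 - qe) * math.log(lik_alarm[(b, 0)])
             for b in (0, 1)}
        qb = _prob_one(s[1], s[0])
        # log Q_E(e) = log P(e) + E_{Q_B}[log P(a=1 | b, e)] + const
        s = {e: math.log(prior_e[e])
                + qb * math.log(lik_alarm[(1, e)])
                + (1 - qb) * math.log(lik_alarm[(0, e)])
             for e in (0, 1)}
        qe = _prob_one(s[1], s[0])
    return qb, qe

qb, qe = mean_field()
print(f"Q_B(b=1) = {qb:.3f}, Q_E(e=1) = {qe:.3f}")
```

From this starting point and with these numbers, the factorial solution settles near the 'burglary only' explanation (Q_B(b=1) close to 0.9, Q_E(e=1) close to 0.02). That is more extreme than the exact posteriors, because a product distribution cannot represent two competing explanations at once.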
Nonfactorial Hidden Variable Models • Suppose b and e are dependent hidden variables, with a joint distribution $Q(b, e)$ that does not factor into $Q_B(b)\, Q_E(e)$: • Then a similar analysis yields a related result.
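Without the factorial restriction, the best distribution over (b, e) is the exact joint posterior P(b, e | a=1). A minimal sketch with hypothetical probabilities (not the slide's values) computes it and checks that it does not factor; `joint_posterior` is an illustrative name.

```python
from itertools import product

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def joint_posterior():
    """Exact P(b, e | a=1) as a dict keyed by (b, e)."""
    w = {(b, e): prior_b[b] * prior_e[e] * lik_alarm[(b, e)]
         for b, e in product((0, 1), repeat=2)}
    z = sum(w.values())  # P(a=1)
    return {k: v / z for k, v in w.items()}

post = joint_posterior()
pb1 = post[(1, 0)] + post[(1, 1)]  # P(b=1 | a=1)
pe1 = post[(0, 1)] + post[(1, 1)]  # P(e=1 | a=1)
# Dependence check: a factorial posterior would give post[(1, 1)] == pb1 * pe1
print(post[(1, 1)], pb1 * pe1)
```

With these numbers P(b=1, e=1 | a=1) is far smaller than the product of the marginals: once one cause explains the alarm, the other becomes less probable ('explaining away').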
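INFORMATION • The difference between the information about the parameters available after observing the alarm and that available before observing it is: $D\big(P(b, e \mid a=1) \,\|\, P(b, e)\big) = \sum_{b,e} P(b, e \mid a=1) \log \frac{P(b, e \mid a=1)}{P(b, e)}$ • This is the Kullback-Leibler 'distance' between the prior and posterior distributions (it is not symmetric, hence the quotes). • Parameters are chosen to optimize this distance.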
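INFORMATION IN THIS EXAMPLE • The information available in this example is calculated using the Kullback-Leibler 'distance' from the previous slide, evaluated with the example likelihood and prior probabilities.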
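The information gain can be computed directly for the burglar network. This sketch uses hypothetical probabilities (not the slide's values) and evaluates KL(posterior ‖ prior) over the four (b, e) states in nats (natural logarithm; divide by ln 2 for bits). `information_gain` is an illustrative name.

```python
import math
from itertools import product

# Hypothetical probabilities (not the slide's values).
prior_b = {1: 0.01, 0: 0.99}
prior_e = {1: 0.01, 0: 0.99}
lik_alarm = {(0, 0): 0.001, (1, 0): 0.90, (0, 1): 0.30, (1, 1): 0.95}  # P(a=1 | b, e)

def information_gain():
    """KL(P(b, e | a=1) || P(b, e)) in nats."""
    states = list(product((0, 1), repeat=2))
    prior = {s: prior_b[s[0]] * prior_e[s[1]] for s in states}
    joint = {s: prior[s] * lik_alarm[s] for s in states}  # P(a=1, b, e)
    p_alarm = sum(joint.values())
    post = {s: joint[s] / p_alarm for s in states}
    return sum(post[s] * math.log(post[s] / prior[s]) for s in states)

kl = information_gain()
print(f"{kl:.3f} nats")
```

With these made-up numbers it comes to roughly 3.5 nats.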
Markov Random Fields • Markov Random Fields are graphical models defined over a 2-dimensional (or higher-dimensional) field. Their fundamental property is that the distribution of a point x conditional on all the remaining points (i.e., −x) is identical to its distribution given only a neighborhood 'N' of x (i.e., $P(x \mid -x) = P(x \mid N_x)$).
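A minimal illustration of the Markov property, assuming an Ising-type binary field (values ±1) on a 2-D grid with 4-nearest-neighbor interactions and a hypothetical coupling strength; the Ising choice and the name `local_conditional` are ours, not the slides'. The conditional probability of a site depends only on its neighbors, so changing a distant site leaves it unchanged.

```python
import math

def local_conditional(grid, i, j, coupling=0.8):
    """P(x[i][j] = +1 | all other sites) for an Ising-type MRF.

    Only the 4-neighborhood N of (i, j) enters the computation.
    """
    rows, cols = len(grid), len(grid[0])
    s = 0
    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= ni < rows and 0 <= nj < cols:
            s += grid[ni][nj]
    # p(x) is proportional to exp(coupling * x * s) for x in {-1, +1}
    return 1.0 / (1.0 + math.exp(-2.0 * coupling * s))

g = [[1, 1, 1],
     [1, -1, 1],
     [1, 1, 1]]
p_before = local_conditional(g, 1, 1)
g[0][0] = -1  # change a site outside the 4-neighborhood of (1, 1)
p_after = local_conditional(g, 1, 1)
print(p_before == p_after)
```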
EXAMPLE OF A RANDOM FIELD • A video frame is typically modeled as a random field. Parameters encode our expectations of what the frame looks like. • We can 'clean up' video frames or related media using a methodology that distinguishes between what we expect and what was observed.
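One simple way to 'clean up' a binary frame is iterated conditional modes (ICM), a technique we swap in for illustration: each pixel is set to the value that best balances agreement with the observed pixel against agreement with its neighbors. The binary ±1 frame, the weights `coupling` and `fidelity`, and the name `icm_denoise` are all hypothetical.

```python
def icm_denoise(noisy, coupling=1.0, fidelity=1.5, n_sweeps=5):
    """Iterated conditional modes for a binary (+1/-1) frame.

    Each site takes the value x maximizing
        fidelity * x * y + coupling * x * (sum of its 4 neighbors),
    i.e. agreement with the observation y plus agreement with neighbors.
    """
    rows, cols = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]  # start from the observed frame
    for _ in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                s = sum(x[ni][nj]
                        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= ni < rows and 0 <= nj < cols)
                score = fidelity * noisy[i][j] + coupling * s
                x[i][j] = 1 if score >= 0 else -1
    return x

clean = [[1] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][2] = -1  # isolated corrupted pixels
noisy[0][0] = -1
print(icm_denoise(noisy) == clean)
```

Isolated flipped pixels are restored, while large coherent regions (for example, a frame split into a dark half and a light half) are left intact, because their pixels agree with most of their neighbors.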
GENERALIZATION • This can be generalized to continuous likelihoods with continuous parameters. • More generally (without specific data), assume that a movie (consisting of many frames, each of which consists of grey-level pixel values over a lattice) is observed. We would like to 'detect' 'unnatural' events.
GENERALIZATION II • Assume a model for frame $F_i$ given frame $F_{i-1}$, with parameters $\theta$, of the form $P(F_i \mid F_{i-1}, \theta)$. • The parameters typically denote invariant features for pictures of cars, houses, etc. • The presence or absence of unnatural events can be described by hidden variables. • The (frame) likelihood describes the natural evolution of the movie over time.
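A toy sketch of the hidden-variable idea, under loudly stated assumptions: each frame is a short list of grey levels; natural evolution means $F_i = F_{i-1}$ plus small Gaussian noise; an 'unnatural' event (hidden variable $h_i = 1$) means much larger noise; and events have a small prior probability. None of these choices come from the slides; the names, noise levels and frames are hypothetical.

```python
import math

def log_gauss(residuals, sigma):
    """Sum of log N(r; 0, sigma^2) over a list of residuals."""
    n = len(residuals)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum(r * r for r in residuals) / (2 * sigma * sigma))

def _prob_from_logs(l1, l0):
    """P(h=1) from unnormalized log-probabilities, avoiding overflow."""
    d = l1 - l0
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)

def event_posterior(frames, sigma_nat=1.0, sigma_unn=5.0, p_event=0.05):
    """P(h_i = 1 | F_i, F_{i-1}) for each consecutive pair of frames."""
    posts = []
    for prev, cur in zip(frames, frames[1:]):
        r = [c - p for c, p in zip(cur, prev)]  # frame-to-frame change
        l1 = math.log(p_event) + log_gauss(r, sigma_unn)      # unnatural
        l0 = math.log(1 - p_event) + log_gauss(r, sigma_nat)  # natural
        posts.append(_prob_from_logs(l1, l0))
    return posts

frames = [[0.0, 0.0, 0.0, 0.0],
          [0.1, -0.2, 0.1, 0.0],
          [0.2, -0.1, 0.0, 0.1],
          [8.0, -6.0, 7.0, 9.0]]  # abrupt jump in the last frame
posts = event_posterior(frames)
print([round(p, 3) for p in posts])
```

Small frame-to-frame changes get a near-zero event probability, while the abrupt jump is flagged; summing over $h_i$ in the same way would give the marginal frame likelihood used to estimate the parameters.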
GENERALIZATION III • Parameters are estimated by optimizing the information they provide. This is accomplished by ‘summing or integrating over’ the hidden variables.