Homework – BN for 3 Coin Problem

Presentation Transcript


1. Homework – BN for 3 Coin Problem
Toss coin 0. If Heads – toss coin 1 four times; if Tails – toss coin 2 four times.
Generating the sequence H HHHT, T HTHT, H HHHT, H HTTH, … produced by Coin 0, Coin 1 and Coin 2.
Erase the result of flipping coin 0 (the red flips).
Observing the sequence HHHT, HTHT, HHHT, HTTH, … produced by Coin 1 and/or Coin 2.
Estimate the most likely values for p, q and α.
There is no known analytical solution for computing the parameters that maximize the likelihood of the data.
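
These parameters are typically estimated with EM (expectation–maximization). The sketch below is not from the slides: it simulates the observed head counts and runs the standard EM updates, assuming α = Pr(coin 0 = H), p = Pr(H | coin 1), q = Pr(H | coin 2); the function names and starting values are illustrative.

```python
import random

def simulate(alpha, p, q, n, flips=4, seed=0):
    """Generate n observed head counts; the coin-0 outcome is hidden (erased)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        bias = p if rng.random() < alpha else q          # hidden coin-0 flip
        data.append(sum(rng.random() < bias for _ in range(flips)))
    return data

def em(heads, flips=4, iters=200, alpha=0.6, p=0.7, q=0.4):
    """EM for the three-coin model; returns estimates of (alpha, p, q)."""
    for _ in range(iters):
        # E-step: posterior probability that coin 1 produced each sequence
        w = []
        for h in heads:
            a = alpha * p**h * (1 - p)**(flips - h)
            b = (1 - alpha) * q**h * (1 - q)**(flips - h)
            w.append(a / (a + b))
        # M-step: re-estimate the parameters from the expected counts
        alpha = sum(w) / len(w)
        p = sum(wi * h for wi, h in zip(w, heads)) / (flips * sum(w))
        q = sum((1 - wi) * h for wi, h in zip(w, heads)) / (flips * sum(1 - wi for wi in w))
    return alpha, p, q

print(em(simulate(alpha=0.7, p=0.8, q=0.3, n=5000)))   # ~ (0.7, 0.8, 0.3)
```

EM converges only to a local maximum of the likelihood, which is consistent with the slide's remark that no analytical solution is known.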

2. Best Bayesian Network Structure
[Figure: two-node network A → B]
• Three Bernoulli distributions (binomial less "n")
• One parameter each
• Parameters: p, q and α
• Whether we draw from p or q depends on the (unobserved) outcome A
• Best formalized as two random variables (not three, not one, not five…)
  • A: first coin used (the unobserved first flip)
  • B: second coin used (the next four flips)
• Need one distribution for A – one parameter: Pr(A), or α
• Need two distributions for B – two parameters: Pr(B|A), or p, and Pr(B|~A), or q
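
As a complement, a minimal sketch (not from the slides) of the likelihood this two-node structure assigns to one observed four-flip sequence, marginalizing over the hidden first flip A; the parameter values are invented for illustration.

```python
def seq_likelihood(seq, alpha, p, q):
    """Pr(seq) for one observed flip string, summing out the hidden first flip A:
    Pr(A) = alpha, Pr(H | A) = p, Pr(H | ~A) = q."""
    h = seq.count("H")
    t = len(seq) - h
    return alpha * p**h * (1 - p)**t + (1 - alpha) * q**h * (1 - q)**t

print(seq_likelihood("HHHT", alpha=0.7, p=0.8, q=0.3))
```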

3. In general (in classes, in life, …)
• Technical mastery is important (in fact, essential)
  • Easily taught
  • Easily learned (well… can be tedious)
• Learning how / when to apply techniques is harder
  • Less tedious & more interesting
• Coin problem
  • Given three distributions
  • NOT (necessarily) given three random variables
  • Decide how to model the world – what are the random variables
  • More natural / unnatural than right / wrong
  • Better model = better fit of the world phenomenon to our techniques
• ALWAYS look for the deeper / more interesting understanding
• Master the techniques. But do not be content with that!

4. Markov Networks (Markov Random Fields)
• Graphical Models
  • Graphical structure
  • Parameters
  • Graphical semantics dictates/limits evidence propagation
• Bayesian Networks
  • Conditional independence via directed arcs
  • Parameters: Conditional Probability Tables (CPTs)
• Markov Networks
  • Undirected arcs
  • Potential functions over maximal cliques
  • Somewhat simplifies evidence propagation (at least makes it more intuitive)

5. Bayesian Networks
[Figure: nodes X, E, Y]
Recall evidence propagation by d-separation.
X and Y are conditionally independent given E if all paths from X to Y are d-separated given E.

6. Bayesian Networks: d-separation
[Figure: the burglary network with nodes B, E, A, J, M]
B – a burglary is in progress
E – an earthquake is in progress
A – the alarm is sounding
J – John calls
M – Mary calls
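
One standard way to test d-separation mechanically is the moral-ancestral-graph reduction; the sketch below is my own illustration, using only the network structure named on the slide.

```python
from collections import deque

# Burglary network structure from the slide: B -> A <- E, A -> J, A -> M
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

def d_separated(x, y, given, parents):
    """True if x and y are d-separated by the evidence set `given`.
    Method: keep only ancestors of {x, y} | given, moralize (marry parents,
    drop directions), delete the evidence nodes, then test reachability."""
    # 1. ancestral subgraph
    keep, stack = set(), [x, y, *given]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])
    # 2. moralize: undirected child-parent and parent-parent edges
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents[child] if p in keep]
        for p in ps:
            adj[child].add(p); adj[p].add(child)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. remove evidence nodes and 4. search from x
    seen, queue = {x}, deque([x])
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m not in seen and m not in given:
                seen.add(m); queue.append(m)
    return y not in seen

print(d_separated("J", "M", {"A"}, parents))   # True: the calls are independent given the alarm
print(d_separated("B", "E", {"A"}, parents))   # False: explaining away
```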

7. Markov Network: Markov Blanket
[Figure: node X surrounded by neighbors, each labeled E, which form its Markov blanket]
X is conditionally independent of all other nodes given its Markov blanket.
Also true of BNs, but the "Markov blanket" definition is more complex for BNs.
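
A small sketch (mine, not from the slides) contrasting the two definitions: in a Markov network the blanket of X is just its neighbors, while in a BN it is X's parents, children, and the children's other parents.

```python
def mn_markov_blanket(adj, x):
    """Markov network: the blanket is simply X's neighbors."""
    return set(adj[x])

def bn_markov_blanket(parents, x):
    """Bayesian network: parents, children, and the children's other parents."""
    children = {c for c, ps in parents.items() if x in ps}
    co_parents = {p for c in children for p in parents[c]} - {x}
    return set(parents[x]) | children | co_parents

# Example: the burglary network from the previous slide
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(bn_markov_blanket(parents, "B"))   # {'A', 'E'}: child A and co-parent E
```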

8. Markov Network: d-separation
[Figure: node sets A and B separated by the evidence set E]
• Sets A and B are conditionally independent given evidence E
• d-separation is easier / more intuitive
  • "Remove" the evidence nodes
  • No remaining path means conditional independence
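
That test is easy to code: drop the evidence nodes and check reachability. The sketch below and its tiny example graph are illustrative.

```python
from collections import deque

def separated(adj, A, B, E):
    """True if every path from set A to set B passes through E:
    remove the evidence nodes, then check that B is unreachable from A."""
    start = set(A) - set(E)
    seen, queue = set(start), deque(start)
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m not in seen and m not in E:
                seen.add(m); queue.append(m)
    return not (seen & set(B))

# Illustrative chain a - e - b: a and b are separated by {e}, but not by {}
adj = {"a": {"e"}, "e": {"a", "b"}, "b": {"e"}}
print(separated(adj, {"a"}, {"b"}, {"e"}))   # True
print(separated(adj, {"a"}, {"b"}, set()))   # False
```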

9. Markov Network Parameters
[Figure: a graph with labeled cliques A, B, C, D]
• Maximal cliques
  • Clique: a subset of nodes with every pair connected
  • Maximal clique: no node can be added and remain a clique
• In the figure, A, B and C are maximal cliques; D is a non-maximal clique
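
Maximal cliques can be enumerated with the Bron–Kerbosch recursion; a minimal version (no pivoting) on an invented example graph:

```python
def maximal_cliques(adj):
    """Bron-Kerbosch (no pivoting): return every maximal clique of the graph."""
    out = []

    def expand(R, P, X):
        if not P and not X:
            out.append(R)              # nothing can extend R, so it is maximal
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return out

# Illustrative graph: triangle 1-2-3 plus the edge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(maximal_cliques(adj))   # [{1, 2, 3}, {3, 4}]
```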

10. Markov Network Parameters
• Potential functions
  • One potential ψ_k for each maximal clique k, k = 1, …, #max cliques
  • Assigns a real number to each configuration or assignment of its variables
  • Denotes a measure of relative preference over assignments
  • Not normalized – these are not probabilities
• The distribution is factorable into potential functions
  • The probability distribution can be recovered: P(x) = (1/Z) ∏_k ψ_k(x_k)
• Potential functions
  • Have the requisite algebraic structure
  • But need to be normalized: Z = Σ_x ∏_k ψ_k(x_k)
• Inference may not require normalization (think conditioning)
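
A minimal sketch of recovering the joint distribution from clique potentials; the two-clique chain A – B – C and its potential tables are invented for illustration.

```python
from itertools import product

# Chain A - B - C: maximal cliques {A, B} and {B, C}, binary variables.
# Potential values are arbitrary positive "preferences", not probabilities.
psi_ab = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
psi_bc = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def unnormalized(a, b, c):
    """Product of the clique potentials for one full assignment."""
    return psi_ab[(a, b)] * psi_bc[(b, c)]

# Normalizer Z sums the product of potentials over every assignment
Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))

def joint(a, b, c):
    return unnormalized(a, b, c) / Z

print(Z, joint(0, 0, 0))   # 40.0 0.3
```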

11. Markov Network Inference
[Figure: overlapping maximal cliques A, B, C]
• Given values for some evidence random variables
• Compute probabilities for the query random variables
• In general, maximal cliques will overlap
• Among all maximal cliques
  • Balance competing preferences given the evidence
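
A sketch of a conditional query by brute-force enumeration over the same kind of illustrative potentials; note that the global normalizer Z cancels when conditioning, which is the point of the last bullet on the previous slide.

```python
from itertools import product

# Same illustrative chain A - B - C with cliques {A, B} and {B, C}
psi_ab = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
psi_bc = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def score(a, b, c):
    """Unnormalized preference for a full assignment."""
    return psi_ab[(a, b)] * psi_bc[(b, c)]

def query_a_given_c(c_obs):
    """P(A | C = c_obs) by enumeration; the global Z cancels in the ratio."""
    totals = {0: 0.0, 1: 0.0}
    for a, b in product([0, 1], repeat=2):
        totals[a] += score(a, b, c_obs)
    norm = sum(totals.values())
    return {a: t / norm for a, t in totals.items()}

print(query_a_given_c(1))   # {0: 0.35, 1: 0.65}
```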

12. A Little Information Theory
• Manipulating / inferring with probability distributions
• If the world / phenomenon is a probability distribution
  • How accurately should we capture it with our representations?
  • How can we measure accuracy?
  • How much information is needed to achieve the desired accuracy?
• Entropy, joint entropy
• Mutual information
• KL (Kullback–Leibler) divergence (distance?)

13. Entropy
The entropy H(M) of a discrete random variable M is a measure of the amount of uncertainty one has about the value of M: H(M) = −Σ_m Pr(M=m) log Pr(M=m).
Increasing our information about M decreases our uncertainty, which is why entropy can be used to guide the growing of decision trees.
The joint entropy of two discrete (not necessarily independent) random variables X and Y is just the entropy of their pairing (X, Y): H(X,Y) = −Σ_{x,y} Pr(x,y) log Pr(x,y).
If X and Y are independent, the joint entropy is just the sum of the individual entropies: H(X,Y) = H(X) + H(Y).
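
A small sketch of these definitions in code (the distributions are invented examples):

```python
from math import log2

def entropy(dist):
    """H = -sum p log2 p over a dict {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def joint_entropy(joint):
    """Entropy of the pairing (X, Y), given a dict {(x, y): probability}."""
    return entropy(joint)

# Illustrative joint distribution over two independent binary variables
pxy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(entropy({0: 0.5, 1: 0.5}))   # 1.0 bit
print(joint_entropy(pxy))          # 2.0 bits: independent here, so H(X) + H(Y)
```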

14. Conditional Entropy
• The conditional entropy of X given Y is the entropy remaining in X, averaged over knowing Y
  • Different from knowing that Y is a particular value – it is averaged over all Y: H(X|Y) = Σ_y Pr(y) H(X | Y = y)
• It follows that H(X|Y) = H(X,Y) − H(Y)
• This should not surprise us
  • Conditional probabilities: Pr(x|y) = Pr(x,y) / Pr(y)
  • The log in entropy turns that ratio into a difference
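
The identity H(X|Y) = H(X,Y) − H(Y) can be checked numerically; the joint distribution below is an invented example.

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Illustrative correlated joint distribution over (X, Y)
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
py = {0: 0.5, 1: 0.5}   # marginal of Y

# Direct definition: average the entropy of X given each value of Y
h_x_given_y = sum(
    py[y] * entropy({x: pxy[(x, y)] / py[y] for x in (0, 1)}) for y in (0, 1)
)
# Chain rule: H(X|Y) = H(X,Y) - H(Y)
print(h_x_given_y, entropy(pxy) - entropy(py))   # both ~0.722: the two agree
```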

15. Mutual Information
• Mutual information measures how much information can be obtained about one random variable by observing another:
  I(X;Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y) = Σ_{x,y} Pr(x,y) log [ Pr(x,y) / (Pr(x) Pr(y)) ]
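
In code, using the identity I(X;Y) = H(X) + H(Y) − H(X,Y) on invented example joints:

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px) + entropy(py) - entropy(pxy)

print(mutual_information({(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}))   # ~0.278: correlated
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}))  # 0.0: independent
```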

16. Relative Entropy or Kullback–Leibler Divergence
• The KL divergence is D(P‖Q) = Σ_i P(i) log [ P(i) / Q(i) ]
  • P and Q are distributions over the same domain X; we sum over all i ∈ X
• KL divergence measures the relative inefficiency of modeling a distribution as Q when the true distribution is P
• It is not a true distance (the distance there is different from the distance back)
• This can be made intuitive by remembering that it is an inefficiency measure
  • Consider the same discrepancy d
    • At i = j, where P(j) is small and Q(j) is too large
    • At i = k, where P(k) is large and Q(k) is too small
  • The penalty contribution is less where P is small, even though the discrepancy is the same
    • Since P is the true distribution, we are less likely to exercise the discrepancy
    • Also, the penalty is less since we divide by Q
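
A small sketch of the definition and its asymmetry (the distributions are invented):

```python
from math import log2

def kl_divergence(p, q):
    """D(P || Q) = sum_i P(i) log2(P(i)/Q(i)); assumes Q(i) > 0 wherever P(i) > 0."""
    return sum(pi * log2(pi / q[i]) for i, pi in p.items() if pi > 0)

# Illustrative distributions over the same three-element domain
P = {"a": 0.7, "b": 0.2, "c": 0.1}
Q = {"a": 0.5, "b": 0.3, "c": 0.2}
print(kl_divergence(P, Q))   # ~0.123
print(kl_divergence(Q, P))   # ~0.133: the "distance there" differs from the "distance back"
```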
