Bayesian Nets and Applications
Naïve Bayes • What happens if we have more than one piece of evidence? • If we can assume conditional independence, the evidence terms factor • Overslept and trafficjam are independent, given late • A and B are conditionally independent given C just in case B doesn't tell us anything about A if we already know C: P(A | B ∧ C) = P(A | C) • P(late | overslept ∧ trafficjam) = α P(overslept ∧ trafficjam | late) P(late) = α P(overslept | late) P(trafficjam | late) P(late), where α is the normalizing constant • Naïve Bayes: a single cause directly influences a number of effects, all conditionally independent given the cause • Independence is often assumed even when it does not hold
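The factored formula on this slide can be sketched in a few lines of Python. The probabilities below are hypothetical numbers chosen only for illustration, and the function name `naive_bayes_posterior` is an assumption, not something from the slides.

```python
def naive_bayes_posterior(prior, likelihoods, evidence):
    """Return P(cause | evidence), assuming the effects are conditionally
    independent given the cause.

    prior:       {cause_value: P(cause_value)}
    likelihoods: {effect: {cause_value: P(effect = true | cause_value)}}
    evidence:    {effect: observed truth value}
    """
    unnormalized = {}
    for cause_value, p_cause in prior.items():
        score = p_cause
        for effect, observed in evidence.items():
            p_true = likelihoods[effect][cause_value]
            score *= p_true if observed else 1.0 - p_true
        unnormalized[cause_value] = score
    alpha = 1.0 / sum(unnormalized.values())  # the normalizing constant
    return {value: alpha * s for value, s in unnormalized.items()}


# Hypothetical numbers, not taken from the slides.
prior = {True: 0.2, False: 0.8}                 # P(late)
likelihoods = {
    "overslept": {True: 0.6, False: 0.1},       # P(overslept | late)
    "trafficjam": {True: 0.7, False: 0.3},      # P(trafficjam | late)
}
posterior = naive_bayes_posterior(
    prior, likelihoods, {"overslept": True, "trafficjam": True}
)
print(posterior[True])  # 0.084 / 0.108, about 0.78
```

Each extra piece of evidence only adds one more factor to the product, which is why the naïve assumption scales so easily.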
Bayesian Networks • A directed acyclic graph in which each node is annotated with quantitative probability information • A set of random variables makes up the network nodes • A set of directed links connects pairs of nodes. If there is an arrow from node X to node Y, X is a parent of Y • Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
Example • Topology of network encodes conditional independence assumptions
[Figure: student network with nodes Hard working, Smart, Good test taker, Understands material, Exam Grade, Homework Grade]
Compactness • A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values • Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p) • If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers • Grows linearly with n vs. O(2^n) for the full joint distribution • Student net, assuming all six variables are Boolean, with the parents from the factorization on the Global Semantics slide and Homework Grade depending only on Understands material: 1+1+2+4+4+2 = 14 numbers (vs. 2^6 − 1 = 63)
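The counting argument can be checked with a short Python sketch. The four-node chain used here is a hypothetical network invented for the example, and the helper names are assumptions.

```python
def network_numbers(parents):
    """Numbers needed by a Boolean Bayesian network: each node with k
    Boolean parents needs one number per CPT row, i.e. 2**k numbers."""
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_numbers(num_vars):
    """Independent numbers in the full joint over Boolean variables."""
    return 2 ** num_vars - 1


# Hypothetical chain A -> B -> C -> D (not a network from the slides).
chain = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}
print(network_numbers(chain), full_joint_numbers(len(chain)))  # 7 15

# Upper bound n * 2**k for n = 30 variables with at most k = 5 parents each.
n, k = 30, 5
print(n * 2 ** k, full_joint_numbers(n))  # 960 1073741823
```

The gap widens quickly: the network bound grows linearly in n, while the full joint doubles with every added variable.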
Global Semantics • Global semantics defines the full joint distribution as the product of the local conditional distributions: P(x1, …, xn) = ∏_{i=1}^{n} P(xi | Parents(Xi)) • e.g., observations: S, HW, not UM; will I get an A? • P(EG=A ∧ GT ∧ ¬UM ∧ S ∧ HW) = P(EG=A | GT ∧ ¬UM) · P(GT | S) · P(¬UM | HW ∧ S) · P(S) · P(HW)
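As a rough illustration of the product formula, the sketch below evaluates the joint probability from this slide and then answers the "will I get an A?" question by summing out the unobserved variable GT. All CPT values are hypothetical, EG is treated as the Boolean event "EG = A", and the names `PARENTS`, `CPT`, `joint`, and `query` are assumptions made for the example.

```python
from itertools import product

# Parents taken from the factorization on the slide.
PARENTS = {
    "S": (), "HW": (), "GT": ("S",),
    "UM": ("HW", "S"), "EG": ("GT", "UM"),
}

# CPT[var][parent values] = P(var = true | parents). Hypothetical numbers.
CPT = {
    "S": {(): 0.3},
    "HW": {(): 0.6},
    "GT": {(True,): 0.8, (False,): 0.4},
    "UM": {(True, True): 0.95, (True, False): 0.7,
           (False, True): 0.6, (False, False): 0.1},
    "EG": {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.7, (False, False): 0.1},
}


def joint(assignment):
    """P(x1, ..., xn) as the product of P(xi | Parents(Xi))."""
    p = 1.0
    for var, ps in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in ps)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = true | evidence) by enumerating the hidden variables."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for t in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[target] = t
            assignment.update(zip(hidden, values))
            totals[t] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])


p_joint = joint({"EG": True, "GT": True, "UM": False, "S": True, "HW": True})
p_a = query("EG", {"S": True, "HW": True, "UM": False})
print(p_joint, p_a)
```

Enumeration like this is exponential in the number of hidden variables, so it is only practical for small networks such as this one.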
Conditional Independence and Network Structure • The graphical structure of a Bayesian network forces certain conditional independences to hold regardless of the CPTs • This can be determined by the d-separation criterion
[Figure: the three connection patterns among nodes a, b, and c: linear, diverging, and converging]
D-separation (opposite of d-connecting) • A path from q to r is d-connecting with respect to the evidence nodes E if every interior node n in the path has the property that either • It is linear or diverging and is not a member of E, or • It is converging and either n or one of its descendants is in E • If a path is not d-connecting (is d-separated), the nodes are conditionally independent given E
[Figure: student network, repeated for the d-separation examples]
S and EG are not independent given GT: the path S → UM → EG is still open because UM is not observed • S and HG are independent given UM
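The d-connecting test defined on the D-separation slide can be sketched as a path enumeration. This is a minimal version meant for small networks, not an efficient algorithm. The edges for S, HW, GT, UM, and EG follow the factorization on the Global Semantics slide; the edge UM → HG is an assumption, since the figure itself is not recoverable. All function names are hypothetical.

```python
def children_of(parents):
    """Invert a {node: parents} map into {node: set of children}."""
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)
    return children


def descendants(node, children):
    """All nodes reachable from node by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def simple_paths(goal, neighbors, path):
    """Yield every loop-free path in the undirected skeleton."""
    if path[-1] == goal:
        yield path
        return
    for nxt in neighbors[path[-1]]:
        if nxt not in path:
            yield from simple_paths(goal, neighbors, path + [nxt])


def is_d_connecting(path, parents, children, evidence):
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        converging = prev in parents[node] and nxt in parents[node]
        if converging:
            # Open only if the node or one of its descendants is observed.
            if node not in evidence and not (descendants(node, children) & evidence):
                return False
        elif node in evidence:
            # Linear or diverging nodes block the path when observed.
            return False
    return True


def d_separated(x, y, parents, evidence=()):
    evidence = set(evidence)
    children = children_of(parents)
    neighbors = {v: set(parents[v]) | children[v] for v in parents}
    return not any(
        is_d_connecting(p, parents, children, evidence)
        for p in simple_paths(y, neighbors, [x])
    )


STUDENT = {
    "S": (), "HW": (), "GT": ("S",), "UM": ("HW", "S"),
    "EG": ("GT", "UM"),
    "HG": ("UM",),  # assumed edge, not confirmed by the slides
}
print(d_separated("S", "EG", STUDENT, {"GT"}))  # False
print(d_separated("S", "HG", STUDENT, {"UM"}))  # True
```

Under the assumed structure, both results agree with the two statements on this slide.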
Pathfinder • Domain: hematopathology diagnosis • Microscopic interpretation of lymph-node biopsies • Given: 100s of histologic features appearing in lymph node sections • Goal: identify disease type (malignant or benign) • Difficult for physicians
Pathfinder System • Bayesian Net implementation • Reasons about 60 malignant and benign diseases of the lymph node • Considers evidence about status of up to 100 morphological features presenting in lymph node tissue • Contains 105,000 subjectively-derived probabilities
Commercialization • Intellipath • Integrates with videodisc libraries of histopathology slides • Pathologists working with the system make significantly more correct diagnoses than those working without • Several hundred commercial systems in place worldwide
Features • Each feature is structured into a set of 2-10 mutually exclusive values • Example: pseudofollicularity, with values absent, slight, moderate, prominent • Represent evidence provided by the features as F1, F2, …, Fn
Value of information • User enters findings from microscopic analysis of tissue • Probabilistic reasoner assigns level of belief to different diagnoses • Value of information determines which tests to perform next • Full disease utility model that accounts for life-and-death decision making, including • Cost of tests • Cost of misdiagnoses
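One common way to formalize the value of information is to compare the best expected utility with and without a test result, then subtract the cost of the test. The sketch below follows that generic decision-theoretic recipe. It is not Pathfinder's actual utility model, and every number, action name, and function name in it is hypothetical.

```python
def expected_utility(belief, utility, action):
    """Expected utility of one action under a belief over diseases."""
    return sum(belief[d] * utility[action][d] for d in belief)


def best_expected_utility(belief, utility):
    return max(expected_utility(belief, utility, a) for a in utility)


def value_of_information(prior, likelihood, utility, cost=0.0):
    """Expected gain from observing a test before acting.

    likelihood[outcome][disease] = P(outcome | disease)
    """
    baseline = best_expected_utility(prior, utility)
    with_test = 0.0
    for p_given_disease in likelihood.values():
        joint = {d: prior[d] * p_given_disease[d] for d in prior}
        p_outcome = sum(joint.values())
        if p_outcome == 0.0:
            continue  # this outcome can never be observed
        posterior = {d: j / p_outcome for d, j in joint.items()}
        with_test += p_outcome * best_expected_utility(posterior, utility)
    return with_test - baseline - cost


# Hypothetical numbers, for illustration only.
prior = {"malignant": 0.3, "benign": 0.7}
utility = {
    "treat": {"malignant": 90, "benign": 70},
    "no_treat": {"malignant": 0, "benign": 100},
}
likelihood = {
    "positive": {"malignant": 0.9, "benign": 0.2},
    "negative": {"malignant": 0.1, "benign": 0.8},
}
voi = value_of_information(prior, likelihood, utility)
print(voi)  # about 14.1 with these numbers
```

A test is worth performing only when this value stays positive after its cost is subtracted, and the next test to ask for is the one with the largest value.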
Group Discrimination Strategy • Select questions based on their ability to discriminate between disease classes • For a given differential diagnosis, select the most specific level of the hierarchy and select questions that discriminate among groups • Less efficient • Larger number of questions asked
Other Bayesian Net Applications • Lumiere – Who knows what it is?
Other Bayesian Net Applications • Lumiere • Single most widely distributed application of BN • Microsoft Office Assistant • Infer a user’s goals and needs using evidence about user background, actions and queries • VISTA • Help NASA engineers in round-the-clock monitoring of each of the Space Shuttle orbiter's subsystems • Time critical, high impact • Interpret telemetry and provide advice about likely failures • Direct engineers to the best information • In use for several years • Microsoft Pregnancy and Child Care • What questions to ask next to diagnose illness of a child
Other Bayesian Net Applications • Speech Recognition • Text Summarization • Language processing tasks in general