A Brief Introduction to Bayesian networks CIS 391 – Introduction to Artificial Intelligence
Bayesian networks • A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions • Syntax: • A set of nodes, one per random variable • A set of directed edges (link ≈ "directly influences"), yielding a directed acyclic graph • A conditional distribution for each node given its parents: P(Xi | Parents(Xi)) • In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
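To make the syntax concrete, here is a minimal sketch of how one node's CPT could be stored: one distribution over the child for each combination of parent values. The WetGrass/Rain/Sprinkler variables and the probabilities are invented purely for illustration; they do not appear in these slides.

```python
# Sketch of a single CPT: a distribution over the child for every
# combination of parent values.  Variables and numbers are illustrative.
wet_grass_cpt = {
    # (rain, sprinkler) -> P(WetGrass = true | Rain = rain, Sprinkler = sprinkler)
    (True,  True):  0.99,
    (True,  False): 0.90,
    (False, True):  0.85,
    (False, False): 0.05,
}

def p_wet_grass(wet, rain, sprinkler):
    """Look up P(WetGrass = wet | Rain = rain, Sprinkler = sprinkler)."""
    p_true = wet_grass_cpt[(rain, sprinkler)]
    return p_true if wet else 1.0 - p_true

print(p_wet_grass(True, rain=False, sprinkler=True))   # 0.85
```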
The Required-By-Law Burglar Example I'm at work, and my neighbor John calls to say my burglar alarm is ringing, but my neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects "causal" knowledge: • A burglar can set the alarm off • An earthquake can set the alarm off • The alarm can cause Mary to call • The alarm can cause John to call
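The slide gives only the topology; as a sketch, the whole network can be encoded as parent lists plus one CPT per node. The numerical CPT entries are not shown on these slides; the values below are the standard ones from Russell & Norvig's version of this example.

```python
# Burglary network: structure (parent lists) plus CPTs.
# Each CPT maps a tuple of parent values to P(variable = True | parents);
# the numbers are the usual textbook values (not given on these slides).
parents = {
    'Burglary':   [],
    'Earthquake': [],
    'Alarm':      ['Burglary', 'Earthquake'],
    'JohnCalls':  ['Alarm'],
    'MaryCalls':  ['Alarm'],
}

cpt = {
    'Burglary':   {(): 0.001},
    'Earthquake': {(): 0.002},
    'Alarm': {
        (True,  True):  0.95,
        (True,  False): 0.94,
        (False, True):  0.29,
        (False, False): 0.001,
    },
    'JohnCalls': {(True,): 0.90, (False,): 0.05},
    'MaryCalls': {(True,): 0.70, (False,): 0.01},
}

def local_prob(var, value, assignment):
    """P(var = value | parents(var)), given parent values in `assignment`."""
    key = tuple(assignment[p] for p in parents[var])
    p_true = cpt[var][key]
    return p_true if value else 1.0 - p_true

print(local_prob('Alarm', True, {'Burglary': True, 'Earthquake': False}))  # 0.94
```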
Semantics • Local semantics give rise to global semantics • Local semantics: each node is conditionally independent of its non-descendants, given its parents • Global semantics: the full joint distribution is the product of the local conditional distributions, P(X1, … , Xn) = ∏i P(Xi | Parents(Xi)) (why? apply the chain rule in a topological order and use the local independences)
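As a quick check of the global semantics, the probability of one complete assignment (John calls, Mary calls, the alarm rings, no burglary, no earthquake) is just the product of five local conditionals. The sketch below assumes the same textbook CPT values used above, which are not printed on these slides.

```python
# Global semantics: P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e).
# CPT entries are the usual textbook values for the burglary network.
p_j_given_a     = 0.90          # P(JohnCalls=true | Alarm=true)
p_m_given_a     = 0.70          # P(MaryCalls=true | Alarm=true)
p_a_given_nb_ne = 0.001         # P(Alarm=true | Burglary=false, Earthquake=false)
p_not_b         = 1 - 0.001     # P(Burglary=false)
p_not_e         = 1 - 0.002     # P(Earthquake=false)

joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(joint)   # ≈ 0.000628
```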
Belief Network Construction Algorithm • Choose an ordering of variables X1, … , Xn • For i = 1 to n • add Xi to the network • select parents from X1, … , Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, … , Xi-1)
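A runnable sketch of this construction loop is shown below. The `ci_test` callback is hypothetical: in practice the "such that" condition is judged from domain knowledge (or estimated from data), so the function only frames the search for a minimal parent set.

```python
from itertools import combinations

def build_network(ordering, ci_test):
    """Construction-algorithm sketch: for each variable Xi, taken in the
    given ordering, pick a minimal set of predecessors such that
    P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi-1).

    `ci_test(x, candidate_parents, predecessors)` is a hypothetical callback
    (domain knowledge or a statistical test) returning True iff the
    candidate set already screens off the remaining predecessors.
    """
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        # Try candidate parent sets from smallest to largest; the first one
        # that passes the test is a minimal parent set for x.
        for size in range(len(predecessors) + 1):
            passing = [set(c) for c in combinations(predecessors, size)
                       if ci_test(x, set(c), set(predecessors))]
            if passing:
                parents[x] = passing[0]
                break
        else:
            parents[x] = set(predecessors)  # fall back to all predecessors
    return parents
```

With the causal ordering B, E, A, J, M and correct independence judgments, this recovers the burglary topology shown earlier; with the ordering M, J, A, B, E it produces the less compact network worked through on the next slide.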
Construction Example Suppose we choose the ordering M, J, A, B, E • P(J|M) = P(J)? No • P(A|J,M) = P(A)? No; P(A|J,M) = P(A|J)? No • P(B|A,J,M) = P(B)? No; P(B|A,J,M) = P(B|A)? Yes • P(E|B,A,J,M) = P(E|A)? No; P(E|B,A,J,M) = P(E|A,B)? Yes
Lessons from the Example • The resulting network is less compact: 13 numbers (compared to 10 for the causal ordering) • Ordering of variables can make a big difference! • Intuitions about causality are useful
Compact Conditional Probability Tables • CPT size grows exponentially in the number of parents; continuous variables have infinitely large CPTs! • Solution: compactly defined canonical distributions • e.g. boolean functions • Gaussian and other standard distributions
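As a sketch of the two families the slide mentions, a deterministic boolean-function node and a linear-Gaussian node (one standard way to use a Gaussian for a continuous child) can each be specified by a rule and a few parameters instead of a full table. The particular functions and parameter values below are illustrative assumptions, not taken from the slides.

```python
import math

# A canonical distribution need not be stored as an explicit table: it can
# be defined by a rule plus a few parameters.

# 1. Boolean function: a deterministic OR node.  The "table" over any number
#    of parents is generated from the rule, so storage is O(1).
def or_cpt(child_value, parent_values):
    """P(child = child_value | parents) for a deterministic OR node."""
    result = any(parent_values)
    return 1.0 if child_value == result else 0.0

# 2. Standard continuous distribution: a Gaussian whose mean is a linear
#    function of a continuous parent (a linear-Gaussian node), specified by
#    three numbers (a, b, sigma) instead of an infinite table.
def linear_gaussian_density(child_value, parent_value, a=1.0, b=0.0, sigma=1.0):
    mean = a * parent_value + b
    return math.exp(-0.5 * ((child_value - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(or_cpt(True, [False, True, False]))   # 1.0
print(linear_gaussian_density(0.5, 0.0))    # density of N(0, 1) at 0.5, ≈ 0.352
```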
For Inference in Bayesian Networks… • CIS 520 Machine Learning • CIS 520 provides a fundamental introduction to the mathematics, algorithms and practice of machine learning. Topics covered include: • Supervised learning: least squares regression, logistic regression, perceptron, generalized linear models, discriminant analysis, naive Bayes, support vector machines. Model and feature selection, ensemble methods, bagging, boosting. Learning theory: Bias/variance tradeoff. Union and Chernoff/Hoeffding bounds. VC dimension. Online learning. • Unsupervised learning: Clustering. K-means. EM. Mixture of Gaussians. Factor analysis. PCA. MDS. pPCA. Independent components analysis (ICA). • Graphical models: HMMs, Bayesian and Markov networks. Inference. Variable elimination. • Reinforcement learning: MDPs. Bellman equations. Value iteration. Policy iteration. Q-learning. Value function approximation. • (But check prerequisites: CIS 320, statistics, linear algebra)