Kenneth Baclawski Northeastern University. Uncertainty Development. Outline. Background on Bayesian networks Bayesian network development phases Requirements and Analysis Design and Implementation Testing and Validation Maintenance Open problems in Bayesian network development
Kenneth Baclawski Northeastern University Uncertainty Development
Outline
• Background on Bayesian networks
• Bayesian network development phases
  • Requirements and Analysis
  • Design and Implementation
  • Testing and Validation
  • Maintenance
• Open problems in Bayesian network development
  • Methodology
  • Evaluation
  • Relationship to logic
• Conclusion
Probability Theory
• Probability measures uncertainty by assigning a number between 0 and 1 to events.
  Example: Pr(A person P is female) = 0.51
• Conditional probability is the probability of an event given that another event has occurred.
  Example: Pr(P is female | P is a Northeastern University faculty member) = 0.30
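The two numbers on this slide can be reproduced from the definition Pr(A | B) = Pr(A and B) / Pr(B). The sketch below is only an illustration: the head counts are hypothetical, chosen so that the ratios match the slide, and the helper names `pr` and `pr_given` are not from the original.

```python
from fractions import Fraction

# Hypothetical head counts, chosen only so that the ratios match the slide.
counts = {
    ("female", "faculty"): 30,
    ("male", "faculty"): 70,
    ("female", "other"): 5070,
    ("male", "other"): 4830,
}

def pr(event):
    """Probability of the set of outcomes for which event(outcome) is true."""
    total = sum(counts.values())
    hits = sum(n for outcome, n in counts.items() if event(outcome))
    return Fraction(hits, total)

def pr_given(event, condition):
    """Pr(event | condition) = Pr(event and condition) / Pr(condition)."""
    return pr(lambda o: event(o) and condition(o)) / pr(condition)

def is_female(outcome):
    return outcome[0] == "female"

def is_faculty(outcome):
    return outcome[1] == "faculty"

p_female = pr(is_female)
p_female_given_faculty = pr_given(is_female, is_faculty)
print(float(p_female), float(p_female_given_faculty))
```

Conditioning restricts attention to the outcomes where the condition holds, which is why the second probability can differ so much from the first.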
Sources of Uncertainty
• Measurement (sensor) error
• Nondeterministic processes
• Unmodeled variables (ontological commitment)
• Subjective probabilities (judgement, belief, trust, etc.)
Stochastic Models
• A random variable (RV) is a variable characterized by random behavior in assuming its different possible values.
• A stochastic model (or theory) is a set of RVs.
• The model is completely specified by the joint probability distribution (JPD) of the RVs.
• As the number of RVs increases, the complexity of the JPD increases rapidly: for discrete RVs, the number of entries in the JPD grows exponentially with the number of RVs.
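A rough way to see how quickly the JPD grows is to count the entries of the full joint table for discrete RVs. This is a minimal sketch; the function name `jpd_entries` and the choice of two states per RV are assumptions made for illustration.

```python
def jpd_entries(num_rvs, states_per_rv=2):
    """Number of entries in the full joint table of num_rvs discrete RVs."""
    return states_per_rv ** num_rvs

# Each added binary RV doubles the size of the table.
for n in (5, 10, 20, 30):
    print(n, jpd_entries(n))
```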
Bayesian Networks
• An efficient graphical mechanism for representing stochastic models.
• A Bayesian Network (BN) is a directed graph in which:
  • A node corresponds to an RV.
  • An edge represents a stochastic dependency.
  • Each node carries a conditional probability distribution (CPD) for its RV, conditioned on all the RVs with edges coming into it.
• It is commonly assumed that the RVs are discrete and that the graph is acyclic.
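The graph structure described here can be sketched as a mapping from each node to the RVs on its incoming edges, with the acyclicity assumption checked by a topological sort. The node names are borrowed from the Flu/Cold example on the specification slides (the continuous Temperature node is left out), and the helper name `topological_order` is hypothetical. The sketch relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to its parents (the RVs on its incoming edges).
parents = {
    "Flu": [],
    "Cold": [],
    "PerceivesFever": ["Flu", "Cold"],
}

def topological_order(parents):
    """Return the nodes with parents before children.

    Raises ValueError if the graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"graph is not acyclic: {exc.args[1]}") from exc

print(topological_order(parents))
```

An ordering with parents first is also the order in which the CPDs can be evaluated or sampled.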
Bayesian Network Specification
[Figure: BN with nodes Flu, Cold, Perceives Fever, and Temperature, annotated with Pr(Flu) = 0.0001 and Pr(Cold) = 0.01. The legend distinguishes discrete RVs from continuous RVs.]
Required CPDs:
1. Perceives Fever given Flu and/or Cold.
2. Temperature given Flu and/or Cold.
3. Probability of Flu (unconditional).
4. Probability of Cold (unconditional).
Bayesian Network Specification
[Figure: the same network, with nodes Flu, Cold, Perceives Fever, and Temperature, annotated with Pr(Flu) = 0.0001 and Pr(Cold) = 0.01. The legend distinguishes discrete RVs from continuous RVs.]
• The joint probability distribution is the product of all the CPDs.
• The probability distribution of any RV (or set of RVs) is obtained by computing the marginal distribution.
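The product-of-CPDs rule and marginalization can be sketched on the discrete part of this network. Pr(Flu) and Pr(Cold) are taken from the slide. The table `P_FEVER` for Perceives Fever is hypothetical, since the slides do not give it, and the continuous Temperature node is omitted to keep the sketch discrete.

```python
from itertools import product

P_FLU = 0.0001   # from the slide
P_COLD = 0.01    # from the slide
# Hypothetical CPD: Pr(PerceivesFever = True | Flu, Cold).
P_FEVER = {
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.20,
    (False, False): 0.001,
}

def bernoulli(p_true, value):
    """Probability that a Boolean RV with Pr(True) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(flu, cold, fever):
    """Joint probability as the product of the three CPDs."""
    return (bernoulli(P_FLU, flu)
            * bernoulli(P_COLD, cold)
            * bernoulli(P_FEVER[(flu, cold)], fever))

def marginal_fever(fever=True):
    """Marginal of PerceivesFever: sum the joint over Flu and Cold."""
    return sum(joint(f, c, fever)
               for f, c in product([True, False], repeat=2))

total = sum(joint(*values) for values in product([True, False], repeat=3))
print(total, marginal_fever(True))
```

The joint sums to 1 over all eight assignments, which is a quick sanity check that each CPD is itself a proper distribution.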
Bayesian Network Inference
[Figure: the Flu/Cold network with nodes labeled as evidence (observed RV) or query (inferred RV).]
• Inference is performed by observing some RVs (evidence) and computing some others (query).
• The evidence can be a value or a probability distribution.
• The answer to the query is the marginal distribution of the specified RVs.
Bayesian Network Inference
[Figure: evidence and query nodes illustrating causal, diagnostic, and mixed inference.]
• Inference in the same direction as the edges is called causal.
• Inference against the direction of the edges is called diagnostic.
• Inference in both directions is called mixed inference.
• The answer to the query is the marginal distribution of the specified RVs.
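Causal and diagnostic queries can both be answered by brute-force enumeration on a network this small. The sketch below is one simple way to do it, not a method prescribed by the slides. Pr(Flu) and Pr(Cold) come from the specification slide, while the `P_FEVER` table and the function names are hypothetical.

```python
from itertools import product

NAMES = ("flu", "cold", "fever")
P_FLU, P_COLD = 0.0001, 0.01          # from the specification slide
# Hypothetical CPD: Pr(fever = True | flu, cold).
P_FEVER = {
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.20,
    (False, False): 0.001,
}

def joint(a):
    """Joint probability of a full assignment, as a product of CPDs."""
    p = P_FLU if a["flu"] else 1.0 - P_FLU
    p *= P_COLD if a["cold"] else 1.0 - P_COLD
    p_fever = P_FEVER[(a["flu"], a["cold"])]
    return p * (p_fever if a["fever"] else 1.0 - p_fever)

def query(target, evidence):
    """Pr(target = True | evidence), summing the joint over all
    assignments that agree with the evidence."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        a = dict(zip(NAMES, values))
        if any(a[name] != value for name, value in evidence.items()):
            continue
        p = joint(a)
        denominator += p
        if a[target]:
            numerator += p
    return numerator / denominator

causal = query("fever", {"flu": True})       # along the edges
diagnostic = query("flu", {"fever": True})   # against the edges
print(causal, diagnostic)
```

Enumeration visits every assignment, so its cost doubles with each added Boolean RV. It is useful as a reference answer for checking faster methods.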
Types of Bayesian Network
• BNs can be discrete, continuous or hybrid.
• Discrete is the most commonly supported.
• Connectionist (neural) networks are examples of continuous BNs.
• Hybrid BNs:
  • From discrete to continuous: mixed Gaussian
  • From continuous to discrete: connectionist classifiers
• BNs can have cycles, but these are much harder to compute with.
BN Inference Techniques
• Inference becomes computationally expensive as the size of the BN increases.
• Exact inference
  • Clique
  • OOBN
• Approximate
  • Propagation
  • Monte Carlo (e.g., Gibbs sampling)
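As an illustration of the Monte Carlo entry, here is a minimal Gibbs sampler for the diagnostic query Pr(Flu | Perceives Fever = true) on the Flu/Cold network. Pr(Flu) and Pr(Cold) come from the specification slide. The `P_FEVER` table, the iteration counts, and the function names are assumptions made for this sketch.

```python
import random

P_FLU, P_COLD = 0.0001, 0.01          # from the specification slide
# Hypothetical CPD: Pr(fever = True | flu, cold).
P_FEVER = {
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.20,
    (False, False): 0.001,
}

def conditional(prior, likelihood_if_true, likelihood_if_false):
    """Pr(X = True | other RVs) for a root X whose only child is observed."""
    a = prior * likelihood_if_true
    b = (1.0 - prior) * likelihood_if_false
    return a / (a + b)

def gibbs_flu_given_fever(iterations=50_000, burn_in=1_000, seed=0):
    """Estimate Pr(flu = True | fever = True) by Gibbs sampling."""
    rng = random.Random(seed)
    flu, cold = False, False          # arbitrary starting state
    hits = 0
    for i in range(burn_in + iterations):
        # Resample flu from Pr(flu | cold, fever = True).
        p = conditional(P_FLU, P_FEVER[(True, cold)], P_FEVER[(False, cold)])
        flu = rng.random() < p
        # Resample cold from Pr(cold | flu, fever = True).
        p = conditional(P_COLD, P_FEVER[(flu, True)], P_FEVER[(flu, False)])
        cold = rng.random() < p
        if i >= burn_in and flu:
            hits += 1
    return hits / iterations

estimate = gibbs_flu_given_fever()
print(estimate)
```

With these hypothetical numbers the exact answer is about 0.029, so the estimate can be checked against enumeration. On larger networks only the sampler remains practical.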
BN Software Tools
• Many software tools are available, both commercial and free.
• Commercial: Netica, Hugin, Analytica
• Free: SMILE, GeNIe, JavaBayes, MSBN
• See www.ai.mit.edu/~murphyk/Bayes/bnsoft.html
• These tools often assume that the RVs are discrete.
Ontologies and BNs
• An active research area.
• Classes correspond to Boolean RV nodes.
• Relationships between classes (e.g., subclass) correspond to dependency edges.
• Attributes are modeled using class constructors.
• The BNs that are constructed this way are very limited.
Object-Oriented BNs
• Application of OO techniques to BN specification has many advantages:
  • Reuse of specifications
  • Enormous improvements in performance
  • Modular development and computation
• There is not yet a formal connection between OO models and OOBNs.
Connection with Logic There have been many attempts to add uncertainty to logic. None of these have been very successful.
BN Development
1. Select the important variables.
2. Specify the dependencies.
3. Specify the CPDs.
4. Evaluate.
5. Iterate over the steps above.
Current BN Development
• BNs are generally quite small. Large-scale BN development is rare.
• There is a proposed standard for a BN format for interchange (XBN). However, BN reuse is uncommon.
• Visualization is rudimentary and does not scale well to large BNs.
• Development methodologies are informal and simplistic.
Information Fusion Combining stochastic models from different sources is called information fusion. The process is well known and standardized, but not systematically applied to BNs.
Dynamic BNs
• BNs can be dynamic in two ways:
  • Dynamic systems: the structure of the BN does not vary, but nodes represent states at different times.
  • The structure of the BN itself varies in time.
• One possible connection between logic and BNs is to use a rule engine to determine the structure of the BN.
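The first kind of dynamic BN, with a fixed structure and nodes for the state at successive times, can be sketched as a filtering loop over one hidden Boolean state and one Boolean observation per time step. Every number and name below is hypothetical. The slides do not specify a model, so this is only meant to show the predict-then-update pattern.

```python
# Hypothetical two-slice model: one hidden Boolean state per time step,
# observed through one noisy Boolean sensor.
TRANSITION = {True: 0.7, False: 0.3}   # Pr(state_t = True | state_{t-1})
SENSOR = {True: 0.9, False: 0.2}       # Pr(obs_t = True | state_t)

def filter_step(belief, observation):
    """One predict/update step.

    `belief` is Pr(state_{t-1} = True | observations so far); the result
    is Pr(state_t = True | observations including the new one).
    """
    # Predict: push the belief through the transition CPD.
    predicted = belief * TRANSITION[True] + (1.0 - belief) * TRANSITION[False]
    # Update: weight by the likelihood of the observation, then normalize.
    like_true = SENSOR[True] if observation else 1.0 - SENSOR[True]
    like_false = SENSOR[False] if observation else 1.0 - SENSOR[False]
    numerator = predicted * like_true
    return numerator / (numerator + (1.0 - predicted) * like_false)

def filter_sequence(prior, observations):
    """Beliefs after each observation, starting from `prior`."""
    beliefs = []
    belief = prior
    for observation in observations:
        belief = filter_step(belief, observation)
        beliefs.append(belief)
    return beliefs

beliefs = filter_sequence(0.5, [True, True, False])
print(beliefs)
```

Because the same two CPDs are reused at every step, the network never has to be unrolled explicitly. Only the current belief is carried forward.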
Natural Language Processing
• Science is based on stochastic modeling.
• The purpose of an experiment is to construct or to test a model.
• The scientific research literature is primarily concerned with discussing stochastic models.
• While BNs are often used for NLP, there have been no significant efforts to use NLP techniques to extract the BNs described in the scientific research literature.
Open Research Problems
• Methodologies
• Evaluation measures and methods
• Closer connections with logic
• Integration with dynamic systems
• Development of standard representations for interoperability
• Information fusion
• Natural language extraction
Conclusion There are methodologies and processes for BN development, but they could benefit from software engineering methodologies and processes. There are many open research problems in BNs that can be addressed using software engineering methodologies.