Statistical Methods in AI/ML

Statistical Methods in AI/ML Bucket elimination VibhavGogate

Bucket Elimination: Initialization (A,C) (C,E) A E D F B C A C E (C,D) (E,F) (A,B) B D F (B,D) (D,F) • You put each function in exactly one bucket • How? • Along the order, find the first bucket such that one of the variable’s in the function’s scope is the bucket variable

Bucket elimination: Processing Buckets A E D F B C ψ(B,C) A C • Process in order • Multiply all the functions in the bucket • Sum-out the bucket variable • Put the new function in one of the buckets obeying the initialization constraint E ψ(C,F) (E,F) (A,B) (A,C) (C,E) B D F (D,F) ψ(B,C,F) (C,D) (B,D) ψ2(B,C) ψ(C) Z

Bucket elimination: Why it works? A E D F B C A C E (E,F) (A,B) (A,C) (C,E) B D F (D,F) (C,D) (B,D) ψ(B,C,F) ψ(C,F) ψ2(B,C) ψ(B,C) Z ψ(C)

Bucket elimination: Why it works? A E D F B C (E,F) (A,B) (A,C) (C,E) (D,F) (C,D) (B,D) ψ(B,C,F) ψ(C,F) ψ2(B,C) ψ(B,C) Z ψ(C)

Bucket elimination: Why it works? A E D F B C (E,F) (A,B) (A,C) (C,E) (D,F) (C,D) (B,D) ψ(B,C,F) ψ(C,F) and so on. ψ2(B,C) ψ(B,C) Z ψ(C)

Bucket elimination: Complexity A E D F B C exp(3) exp(3) exp(4) exp(3) exp(2) exp(1) ≈6exp(3) Complexity: O(nexp(w)) w: scope of the largest function generated n:#variables (E,F) (A,B) (A,C) (C,E) (D,F) (C,D) (B,D) ψ(B,C,F) ψ(C,F) ψ2(B,C) ψ(B,C) Z ψ(C)

Bucket elimination: Determining complexity graphically A • Schematic operation on a graph • Process nodes in order • Connect all children of a node to each other E A C E D B D F F B C

Bucket elimination: Complexity A • Complexity of processing a bucket “i” • exp(childreni) • Complexity of bucket elimination • nexp(max(childreni)) E D F B C

Treewidth and Tree Decompositions • Running schematic bucket elimination yields a chordal graph • Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle) • Every chordal graph can be represented using a tree decomposition

Tree Decomposition of Chordal graphs A ABC E EFC BC FC D DBCF FBC F FBC BC B BC C C C

Tree Decomposition and Treewidth: Definition • Given a network and its interaction graph • Tree Decomposition is a set of subset of variables connected by a tree such that: • Each variable is present in at least one subset • Each edge is present in at least one subset • The set of subsets containing a variable “X” form a connected sub-tree • Running intersection property • Width of a tree decomposition: Cardinality of the maximum subset minus 1 • Treewidth: minimum width out of all possible tree decompositions

Bucket elimination: Complexity • Best possible complexity: O(nexp(w+1)) where w is the treewidth of the graph • Thus, we have a graph-based algorithm for determining the complexity of bucket elimination. • If w is small, we can solve the problem efficiently!

Generating Tree Decompositions • Computing treewidth is NP-hard • Branch and Bound algorithm (Gogate&Dechter, 2004) • Best-first search algorithm • (Dow and Korf, 2009) • Heuristics in practice • min-fill heuristic • min-degree heuristic

Min-degree and min-fill • min-degree • At each point, select a variable with minimum degree (ties broken arbitrarily) • Connect the children of the variable to each other • min-fill • At each point, select a variable that adds the minimum number of edges to the current graph • Connect the children of the selected variable to each other

Computing all Marginals • Bucket elimination computes • P(e) or Z • P(Xi|e) where “Xi” is the last variable eliminated • To compute all marginals P(Xi|e) for all variables Xi • Run bucket elimination “n” times • Efficient algorithm • Junction tree algorithm or bucket tree propagation • Requires only two passes to compute all marginals

Junction tree algorithm:An exact message passing algorithm • Construct a tree decomposition T • Initialize the tree decomposition as in bucket elimination • Select an arbitrary node of T as root • Pass messages from leaves to root (upward pass) • Pass messages from root to leaves (downward pass)

Message passing Equations • Multiply all received messages except from R • Multiply all functions • Sum-out all variables except the separator S R

Computing all marginals S P(S)

Message passing Equations (A,B) (A,C) ABC • Select “EFC” as root • Pass messages from leaves to root • Pass messages from root to leaves (E,F) (C,E) EFC (C,D) (D,F) FC DBCF (B,D) FBC FBC BC BC C C

Architectures • Shenoy-Shafer architecture • Hugin architecture • Associate one function with each cluster • Requires multiplication • Smaller time complexity • Higher space complexity

Statistical Methods in AI/ML

Statistical Methods in AI/ML

Presentation Transcript

Chapter 6: Birth Control

GENETIC DIAGNOSTIC METHODS

Chapter 7. Cluster Analysis

Chapter 8 Statistical inference: Significance Tests About Hypotheses

Statistical Machine Translation System

Evolution of statistical models of non-conservative particle interactions

Using Randomization Methods to Build Conceptual Understanding in Statistical Inference: Day 1

Machine Translation

Naïve Bayes

Statistical Process Control

Clustering Methods

Practical Applications of Statistical Methods in the Clinical Laboratory

Statistical Methods for Mining Big Text Data

Mesh Parameterizations

Chapter 11 Supervised Learning: STATISTICAL METHODS

Introduction to R for Applied Statistical Methods

Statistical Relational Learning

AAEC 4302 STATISTICAL METHODS IN AGRICULTURAL RESEARCH

Statistical Studies: Statistical Investigations

Descriptive Statistics

Systematic Reviews: Methods and Procedures

Statistical Methods (Lectures 2.1, 2.2)