This Master's thesis defense provides an overview of using genetic algorithms to learn Bayesian network adjacency matrices from data. Topics covered include Bayesian belief networks, graphical models of probability distributions, genetic algorithms, structure-learning background, and the shortcomings of existing algorithms. The presentation covers the survival-of-the-fittest process that genetic algorithms emulate, the K2 algorithm, and the Sparse Candidate approach to structure learning in Bayesian networks. A genetic algorithm over adjacency matrices (SLAM GA) is proposed to address the limitations of existing methods, with a focus on improving the efficiency and accuracy of network structure learning.
Ben Perry – M.S. Thesis Defense
A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data
Benjamin B. Perry
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org | http://www.cis.ksu.edu/~bbp9857
Overview • Bayesian Network • Definitions and examples • Inference and learning • Genetic Algorithms • Structure Learning Background • Problem • K2 algorithm • Sparse Candidate • Improving K2: Permutation Genetic Algorithm (GASLEAK) • Shortcoming: greedy, sensitive to ordering • Permutation GA • Master’s thesis: Adjacency Matrix GA (SLAM GA) • Rationale • Evaluation with Known Bayesian Networks • Summary
Bayesian Belief Networks (BBNs): Definition • Bayesian Network • Directed acyclic graph • Vertices (nodes): denote events, or states of affairs (each a random variable) • Edges (arcs, links): denote conditional dependencies, causalities • Model of conditional dependence assertions (or CI assumptions) • Example (“Ben’s Presentation” BBN) – [figure: five-node network X1–X5 with Sleep (Narcoleptic, Well, Bad, All-nighter), Appearance (Good, Bad), Memory (Elephant, Good, Bad, None), Ben is nervous (Extremely, Yes, No), and Ben’s presentation (Good, Not so good, Failed miserably)] • General product (chain) rule for BBNs: P(X1, …, Xn) = ∏i P(Xi | Parents(Xi)) • For the example: P(Well, Good, Good, No, Good) = P(Well) · P(Good | Well) · P(Good | Well) · P(No | Good, Good) · P(Good | No)
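The chain-rule factorization is easy to mechanize; the following is a minimal Python sketch (not taken from the thesis) that evaluates one joint assignment for the example network. The parent structure is read off the chain-rule expansion above, every probability value is invented purely for illustration, and only the CPT rows needed for this one assignment are included.

```python
# Minimal worked example (not the thesis code) of the BBN chain rule
#   P(x1, ..., xn) = prod_i P(xi | Parents(xi))
# applied to the "Ben's Presentation" network. All numbers are made up.

parents = {
    "Sleep": [],
    "Appearance": ["Sleep"],
    "Memory": ["Sleep"],
    "Nervous": ["Appearance", "Memory"],
    "Presentation": ["Nervous"],
}

# CPTs: node -> {tuple of parent values: {value: probability}}
cpt = {
    "Sleep":        {(): {"Well": 0.4, "Bad": 0.3, "All-nighter": 0.2, "Narcoleptic": 0.1}},
    "Appearance":   {("Well",): {"Good": 0.8, "Bad": 0.2}},
    "Memory":       {("Well",): {"Elephant": 0.2, "Good": 0.5, "Bad": 0.2, "None": 0.1}},
    "Nervous":      {("Good", "Good"): {"Extremely": 0.1, "Yes": 0.3, "No": 0.6}},
    "Presentation": {("No",): {"Good": 0.7, "Not so good": 0.2, "Failed miserably": 0.1}},
}

def joint_probability(assignment):
    """Multiply P(x_i | parents(x_i)) over every node in the network."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][parent_values][value]
    return p

print(joint_probability({"Sleep": "Well", "Appearance": "Good", "Memory": "Good",
                         "Nervous": "No", "Presentation": "Good"}))
# 0.4 * 0.8 * 0.5 * 0.6 * 0.7 = 0.0672
```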
Graphical Modelsof Probability Distributions • Idea • Want: model that can be used to perform inference • Desired properties • Correlations among variables • Ability to represent functional, logical, stochastic relationships • Probability of certain events • Inference: Decision Support Problems • Diagnosis (medical, equipment) • Pattern recognition (image, speech) • Prediction • Want to Learn: Most Likely Model that Generates Observed Data • Under certain assumptions (Causal Markovity), it has been shown that we can do it • Given: data D (tuples or vectors containing observed values of variables) • Return: directed graph (V, E) expressing target CPTs • NEXT: Genetic algorithms
Genetic Algorithms • Idea • Emulate natural process of survival of the fittest (Example: Roaches adapt) • Each generation has many diverse individuals • Each individual competes for the chance to survive • Most common approach: best individuals live to the next generation and mate • Produce children with traits from both parents • If parents are strong, children might be stronger • Major components (operators) • Fitness function • Chromosome manipulation • Cross-over (Not the “John Edward” type!), mutation • From (Educated?) Guess to Gold • Initial population typically random or not much better than random – bad scores • Performs well with a non-deceptive search space and good genetic operators • Ability to escape local optima with mutations. • Not guaranteed to get the best answer, but usually gets close
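To make those components concrete, here is a generic, minimal GA skeleton in Python. It is purely illustrative and not the thesis implementation: `fitness`, `crossover`, and `mutate` are placeholders that, later in the talk, become a Bayesian network score and the structure-specific operators.

```python
# Generic elitist GA loop (illustrative sketch, not the thesis code).
# Assumes higher fitness is better and a population of at least four individuals.
import random

def genetic_algorithm(init_population, fitness, crossover, mutate,
                      generations=100, elite=2, mutation_rate=0.1):
    population = init_population()
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]                       # elitism: best individuals survive as-is
        pool = ranked[:max(2, len(ranked) // 2)]        # mating pool drawn from the fitter half
        while len(next_gen) < len(population):
            p1, p2 = random.sample(pool, 2)             # select two distinct parents
            child = crossover(p1, p2)                   # child inherits traits from both parents
            if random.random() < mutation_rate:
                child = mutate(child)                   # occasional mutation helps escape local optima
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```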
Learning Structure: The K2 Algorithm
• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
    FOR i ← 1 TO n DO                                            // arbitrary ordering of variables {x1, x2, …, xn}
        WHILE (Parents[xi].Size < Max-Parents) DO                // find best candidate parent
            Best ← argmax over candidate parents xj allowed by the ordering of Score(Parents[xi] ∪ {xj})   // max Dirichlet score
            IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
            ELSE BREAK                                           // stop when no candidate improves the score
    RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
• ALARM (A Logical Alarm Reduction Mechanism) [Beinlich et al., 1989] – [figure: the 37-node ALARM network] • BBN model for patient monitoring in surgical anesthesia • Vertices (37): findings (e.g., esophageal intubation), intermediates, observables • K2 found a BBN differing in only 1 edge from the gold-standard network elicited from an expert
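The same greedy search can be written compactly in Python. This is an illustrative sketch, not the thesis code: `score(node, parent_set, data)` stands in for the Bayesian Dirichlet (K2) score of a node given a candidate parent set, which is not implemented here.

```python
# Sketch of K2's greedy parent selection under a fixed node ordering (illustrative only).
def k2(order, data, max_parents, score):
    parents = {x: set() for x in order}
    for i, x in enumerate(order):
        current = score(x, parents[x], data)
        while len(parents[x]) < max_parents:
            # Candidate parents are restricted by the ordering, which keeps the graph acyclic.
            candidates = [z for z in order[:i] if z not in parents[x]]
            if not candidates:
                break
            best = max(candidates, key=lambda z: score(x, parents[x] | {z}, data))
            improved = score(x, parents[x] | {best}, data)
            if improved > current:                 # keep the new parent only if the score improves
                parents[x].add(best)
                current = improved
            else:
                break                              # greedy: stop at the first non-improving step
    return parents
```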
Learning Structure: K2 Downfalls • Greedy (may fall into local maxima) • Highly dependent upon node ordering • The optimal node ordering must be given • If the optimal order is already known, an expert could probably construct the network directly • The number of candidate node orderings is n! (super-exponential in the number of variables)
Learning Structure: Sparse Candidate • General idea: inspect the k best parent candidates for each node at a time (K2 inspects only one) • k is typically very small, roughly 5 ≤ k ≤ 15 • Complexity is exponential in k • Algorithm – loop until no improvement or the iteration limit is exceeded: • [Restrict phase] For each node, select the top k parent candidates (by mutual information or m_disc) • [Maximize phase] Build a network by manipulating parents (add, remove, or reverse edges drawn from each node's candidate set), accepting only changes that improve the network score (Minimum Description Length) • Must handle cycles – expensive; K2 gives acyclicity for free • Next: improving K2
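As one illustration of the Restrict phase, the sketch below picks the k candidate parents with the highest empirical mutual information from a table of discrete data. It is my own sketch, not the thesis code; the function names and the use of a pandas DataFrame are assumptions, and the alternative m_disc measure is not implemented.

```python
# Sketch of the Sparse Candidate "Restrict" step via mutual information (illustrative only).
import numpy as np
import pandas as pd

def mutual_information(df, a, b):
    """Empirical MI between two discrete columns of a DataFrame."""
    joint = pd.crosstab(df[a], df[b], normalize=True)   # empirical joint distribution
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)        # marginals
    mi = 0.0
    for x in joint.index:
        for y in joint.columns:
            pxy = joint.loc[x, y]
            if pxy > 0:
                mi += pxy * np.log(pxy / (pa[x] * pb[y]))
    return mi

def restrict(df, k):
    """Return the k most informative candidate parents for every variable."""
    candidates = {}
    for node in df.columns:
        others = sorted((v for v in df.columns if v != node),
                        key=lambda v: mutual_information(df, node, v), reverse=True)
        candidates[node] = others[:k]
    return candidates
```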
GASLEAK: A Permutation GA for Variable Ordering • [Figure: system diagram for the genetic algorithm for structure learning from evidence – the training data D are split into Dtrain (structure learning) and Dval (inference, with an evidence specification); a permutation genetic algorithm [1] proposes a candidate ordering α; a representation evaluator for Bayesian network structure-learning problems [2], built on AIS and K2, scores the ordering and returns its fitness f(α); the GA outputs an optimized ordering]
Properties of the Genetic Algorithm • Elitist • Chromosome representation • Integer permutation ordering • A sample chromosome for a 5-node BBN might look like: 3 1 2 0 4 • Seeding • Random shuffle • Operators • Order crossover • Swap mutation • Fitness • RMSE (root mean squared error) • Job farm • Java-based; utilizes many machines regardless of OS
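For concreteness, the two operators named above might look like the following sketch (illustrative only, not the thesis code): an order-crossover variant and a swap mutation on an integer-permutation chromosome such as 3 1 2 0 4.

```python
# Order crossover and swap mutation on permutation chromosomes (illustrative sketch).
import random

def order_crossover(p1, p2):
    """Copy a random slice from p1, then fill the remaining positions in p2's relative order."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [g for g in p2 if g not in child[a:b]]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(perm):
    """Swap two randomly chosen positions of the ordering."""
    perm = perm[:]
    i, j = random.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]
    return perm
```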
GASLEAK Results • Not encouraging • Bad fitness function or bad evidence b.v. • Many graph errors • [Figure: histogram of estimated fitness for all 8! = 40,320 permutations of the Asia variables]
Master’s Thesis: SLAM GA • SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm • Initial population – tried several approaches: • Completely random Bayesian networks (Box-Muller, max parents) • Many illegal structures; wrote a fixCycles algorithm (one possible repair policy is sketched after this slide) • Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate • Performed better than random • Aggregate of k networks learned by K2 from random orderings (cycles eliminated) – the best approach
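The fixCycles routine itself is not reproduced in these slides; the sketch below shows one simple repair policy under my own assumptions: repeatedly find an edge that closes a cycle with a depth-first search and delete it until the graph is acyclic.

```python
# One possible cycle-repair policy (illustrative sketch; not necessarily the thesis' fixCycles).
def find_back_edge(edges, nodes):
    """Return one edge that closes a cycle, or None if the directed graph is acyclic (DFS)."""
    adj = {u: [] for u in nodes}
    for p, c in edges:
        adj.setdefault(p, []).append(c)
        adj.setdefault(c, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}

    def dfs(u):
        color[u] = GREY
        for v in adj[u]:
            if color[v] == GREY:            # back edge: v is an ancestor still on the DFS stack
                return (u, v)
            if color[v] == WHITE:
                hit = dfs(v)
                if hit:
                    return hit
        color[u] = BLACK
        return None

    for u in list(adj):
        if color[u] == WHITE:
            hit = dfs(u)
            if hit:
                return hit
    return None

def fix_cycles(edges, nodes):
    """Delete cycle-closing edges until the graph is a DAG."""
    edges = set(edges)
    while True:
        back = find_back_edge(edges, nodes)
        if back is None:
            return edges
        edges.discard(back)
```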
[Figure: aggregation pipeline – an instantiater feeds the training data D to a K2 manager, which runs K2 with k different random orderings to produce BBN 1, BBN 2, …, BBN k; an aggregator combines them into a single aggregate BBN] • For small networks, k = 1 is best; for larger networks, k = 2 is best.
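Put together, this seeding step can be sketched as below (illustrative only): run K2 with k random orderings, take the union of the learned edges, and repair any cycles. The sketch assumes the hypothetical `k2` and `fix_cycles` helpers sketched earlier in this write-up.

```python
# Sketch of the K2-aggregation seeding (relies on the k2 and fix_cycles sketches above).
import random

def aggregate_seed(variables, data, k, max_parents, score):
    """Union the edges of k K2 runs over random orderings, then break any cycles."""
    edges = set()
    for _ in range(k):
        order = random.sample(variables, len(variables))   # a fresh random node ordering
        learned = k2(order, data, max_parents, score)       # parents per node from one K2 run
        edges |= {(p, child) for child, ps in learned.items() for p in ps}
    return fix_cycles(edges, variables)
```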
SLAM GA • Chromosome representation • Edge (adjacency) matrix – n² bits • Each bit represents a parent edge into a node: 1 = parent, 0 = not a parent • Operators • Crossover: swap parents between the two networks, then fix cycles • Mutation: reverse, delete, or add a random number of edges, then fix cycles • Fitness • Total Bayesian Dirichlet equivalence score, summed over all nodes
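A sketch of this chromosome and its operators appears below. It is illustrative only: the matrix orientation (m[i][j] = 1 meaning node i is a parent of node j, so column j holds node j's parents), the repair step, and the reuse of the `fix_cycles` sketch from earlier are my assumptions, not the thesis implementation.

```python
# Adjacency-matrix chromosome with parent-swap crossover and edge mutation (illustrative sketch).
import random
import numpy as np

def parent_swap_crossover(m1, m2):
    """For every node, the child takes that node's whole parent set (one column) from one parent network."""
    child = np.empty_like(m1)
    for j in range(m1.shape[0]):
        child[:, j] = (m1 if random.random() < 0.5 else m2)[:, j]
    return repair(child)

def edge_mutation(m, n_changes=1):
    """Add, delete, or reverse a few random edges, then repair any cycles."""
    m = m.copy()
    n = m.shape[0]
    for _ in range(n_changes):
        i, j = random.sample(range(n), 2)
        op = random.choice(("add", "delete", "reverse"))
        if op == "add":
            m[i, j] = 1
        elif op == "delete":
            m[i, j] = 0
        else:
            m[i, j], m[j, i] = m[j, i], m[i, j]
    return repair(m)

def repair(m):
    """Convert the matrix to an edge set, break cycles (fix_cycles sketch above), convert back."""
    n = m.shape[0]
    edges = {(i, j) for i in range(n) for j in range(n) if m[i, j]}
    out = np.zeros_like(m)
    for i, j in fix_cycles(edges, list(range(n))):
        out[i, j] = 1
    return out
```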
Results – Asia • Best of first generation: 15 graph errors • Final learned network: 1 graph error (compared with the actual network)
Results – Poker • Best of first generation: 11 graph errors • Final learned network: 2 graph errors (compared with the actual network)
Results – Golf • Best of first generation: 11 graph errors • Final learned network: 4 graph errors (compared with the actual network)
Results – Boerlage92 • [Figure: initial, learned, and actual networks shown side by side]
K2 vs. SLAM GA • K2: • Very good if ordering is known • Ordering is often not known • Greedy, very dependent on ordering. • SLAM GA • Stochastic; falls out of local optima trap • Can improve on bad structures learned by K2 • Takes much longer than K2
GASLEAK vs. SLAM GA • GASLEAK: • Gold network never recovered • Much more computationally-expensive • K2 is run on each [new] individual each generation • Each chromosome must be scored • Final network has many graph errors • SLAM GA • For small networks, gold standard network often recovered. • Relatively few graph errors for final network. • Less computationally intensive • Initial population most expensive • Each chromosome must be scored
SLAM GA: Ramifications • Effective structure-learning algorithm • Ideal for small networks • Improvement over GASLEAK • SLAM GA is faster despite using the same GA parameters • SLAM GA is more accurate • Improvement over K2 • The aggregate algorithm produces a better initial population • The parent-swapping crossover technique is effective • Diversifies the search space while retaining past information
SLAM GA: Future Work • Parameter tweaking • Better fitness function • Several ‘bad’ structures score better than gold standard • GA works fine • ‘Intelligent’ mutation operator • Add edges from pre-qualified set of candidate parents • New instantiation methods • Use GASLEAK • Other structure-learning algorithms • Scalability • Job farm
Summary • Bayesian Network • Genetic Algorithms • Learning Structure: K2, Sparse Candidate • GASLEAK • SLAM GA