This paper introduces PAGE (Programming with Annotated Grammar Estimation), an estimation of distribution algorithm for genetic programming whose probabilistic model is PCFG-LA (Probabilistic Context-Free Grammar with Latent Annotations), combining a probabilistic grammar with latent annotations for estimating distributions in evolutionary algorithms. The algorithm is applied to the Royal Tree Problem and the DMAX Problem to evaluate its performance.
Estimation of Distribution Algorithm based on Probabilistic Grammar with Latent Annotations Written by Yoshihiko Hasegawa and Hitoshi Iba Summarized by Minhyeok Kim
Contents • Introduction • Two groups in GP-EDA • PCFG-LA • PCFG and PCFG-LA • Probability of the annotated tree • Probability of an observed tree • Log-likelihood and update formula • Assumptions • Forward-backward probability • P(T;Θ) by forward and backward • Parameter update formula • Initial parameters • PAGE (Programming with Annotated Grammar Estimation) • Experiment • Royal Tree Problem • DMAX Problem • Conclusion
Two groups in GP-EDA • Prototype-tree based method • It translates variable-length tree structures into fixed-length structures • PCFG based method • It is considered well suited for expressing functions in GP • Its production rules do not depend on the ancestor nodes or sibling nodes • It cannot take into account the interactions among nodes
PCFG-LA (1/10) - PCFG and PCFG-LA • PCFG • 0.7 VP → V NP • 0.3 VP → V NP NP • PCFG-LA • PCFG + latent annotations: each non-terminal is split into annotated copies (e.g. VP[1], VP[2]), each with its own rule probabilities
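As a concrete illustration of the split, here is a hedged sketch in Python: the rule probabilities below are invented for illustration (not from the paper), and each annotated copy of VP carries its own distribution over right-hand sides.

```python
# A plain PCFG assigns one probability per production rule.
pcfg = {
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "NP")): 0.3,
}

# PCFG-LA splits each non-terminal into annotated copies (here two:
# VP[0], VP[1]); every copy gets its own rule distribution, which lets
# the model capture context that a plain PCFG ignores.
pcfg_la = {
    ("VP[0]", ("V", "NP[0]")): 0.9,
    ("VP[0]", ("V", "NP[0]", "NP[0]")): 0.1,
    ("VP[1]", ("V", "NP[1]")): 0.2,
    ("VP[1]", ("V", "NP[1]", "NP[1]")): 0.8,
}
```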
PCFG-LA (2/10) - Probability of the annotated tree • The probability of the annotated tree: P(T[X];Θ) = π(S[x1]) ∏_{r∈D_{T[X]}} β(r) • T : derivation tree • xi : annotation of the ith non-terminal (all the non-terminals are numbered from the root) • X = {x1, x2, ...} • π(S[x]) : probability of S[x] at the root position • β(r) : probability of annotated production rule r • D_{T[X]} : multiset of the annotated rules used in tree T • Θ : set of parameters, Θ = {π, β}
PCFG-LA (3/10) - Probability of an observed tree • The probability of an observed tree • It is calculated by summing the annotated-tree probability over all annotations: P(T;Θ) = Σ_X P(T[X];Θ) • The parameters (π and β) have to be estimated by the EM algorithm
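A minimal numerical sketch of this sum, assuming a single non-terminal S in GNF with two annotations; the grammar and all probabilities below are invented for illustration, and the children of a node share one annotation, per the paper's assumption:

```python
import itertools

# Toy GNF grammar: S[x] -> + S[y] S[y]  or  S[x] -> x0 (terminal only).
H = 2
pi = [0.5, 0.5]                         # root annotation probabilities
beta = {                                # beta[(x, g, y)]; y is None for leaves
    (0, "+", 0): 0.3, (0, "+", 1): 0.2, (0, "x0", None): 0.5,
    (1, "+", 0): 0.1, (1, "+", 1): 0.6, (1, "x0", None): 0.3,
}

# Observed tree: "+" applied to two "x0" leaves.
# Latent choices: x = root annotation, y = shared child annotation.
total = 0.0
for x, y in itertools.product(range(H), repeat=2):
    p = pi[x]                           # pi(S[x]) at the root
    p *= beta[(x, "+", y)]              # annotated rule at the root
    p *= beta[(y, "x0", None)] ** 2     # annotated rules at the two leaves
    total += p                          # total accumulates P(T; Theta)
```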
PCFG-LA (4/10) - Log-likelihood and update formula • The difference of log-likelihood between parameters Θ' and Θ • The update formula can be obtained by optimizing Q(Θ'|Θ)
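The derivation follows the standard EM lower bound; a reconstruction of its shape in the paper's notation (a sketch, with X ranging over annotation assignments):

```latex
\log P(T;\Theta') - \log P(T;\Theta) \;\ge\; Q(\Theta'\mid\Theta) - Q(\Theta\mid\Theta),
\qquad
Q(\Theta'\mid\Theta) = \sum_{X} P(X \mid T;\Theta)\,\log P(T,X;\Theta')
```

The inequality follows from Jensen's inequality, so any Θ' that increases Q cannot decrease the log-likelihood.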
PCFG-LA (5/10) - Assumptions • Using GNF (Greibach Normal Form) rather than CNF (Chomsky Normal Form) • To reduce the number of parameters, assume that all right-hand-side non-terminal symbols have the same annotation
PCFG-LA (6/10) - Forward-backward probability (1/2) • Backward probability b_i^T(x) • The probability that the tree beneath the ith non-terminal S[x] is generated • Forward probability f_i^T(y) • The probability that the tree above the ith non-terminal S[y] is generated
PCFG-LA (7/10) - Forward-backward probability (2/2) • Backward probability: b_i^T(x) = Σ_y β(S[x] → g_i^T S[y]...S[y]) ∏_{j∈ch(i,T)} b_j^T(y) • Forward probability: f_1^T(x) = π(x) at the root; otherwise f_i^T(y) = Σ_x f_{pa(i,T)}^T(x) β(S[x] → g_{pa(i,T)}^T S[y]...S[y]) ∏_{j∈ch(pa(i,T),T), j≠i} b_j^T(y) • ch(i,T) : function which returns the set of non-terminal children indices of the ith non-terminal in T • pa(i,T) : returns the parent index of the ith non-terminal in T • g_i^T : terminal symbol in the CFG connected to the ith non-terminal symbol in T
PCFG-LA (8/10) - P(T;Θ) by forward and backward • P(T;Θ) = Σ_x f_i^T(x) b_i^T(x) for any non-terminal index i (at the root this reduces to Σ_x π(x) b_1^T(x)) • cover(g,T_i) : function which returns the set of non-terminal indices at which the production rule generating g (without annotations) is rooted in T_i
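These quantities can be checked numerically. The toy grammar below (one non-terminal, two annotations, invented probabilities, binary rules for brevity) verifies that Σ_x f_i(x)·b_i(x) gives the same P(T;Θ) at the root and at an internal node:

```python
# Toy GNF grammar: S[x] -> + S[y] S[y]  or  S[x] -> x0; H = 2 annotations.
H = 2
pi = (0.5, 0.5)
beta = {
    (0, "+", 0): 0.3, (0, "+", 1): 0.2, (0, "x0", None): 0.5,
    (1, "+", 0): 0.1, (1, "+", 1): 0.6, (1, "x0", None): 0.3,
}

def backward(tree, x):
    # b_i(x): probability of the subtree beneath S[x]
    g, children = tree
    if not children:
        return beta[(x, g, None)]
    return sum(
        beta[(x, g, y)] * backward(children[0], y) * backward(children[1], y)
        for y in range(H)
    )

tree = ("+", (("x0", ()), ("x0", ())))
p_root = sum(pi[x] * backward(tree, x) for x in range(H))  # P(T) at the root

# f_i(y) for the first child of the root: sum over the parent annotation x of
# (forward prob of the parent) * (rule prob) * (backward probs of siblings).
child, sibling = tree[1]
def forward_child(y):
    return sum(pi[x] * beta[(x, "+", y)] * backward(sibling, y) for x in range(H))

# The slide's identity: P(T; Theta) = sum_x f_i(x) * b_i(x) at any node i.
p_inner = sum(forward_child(y) * backward(child, y) for y in range(H))
```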
PCFG-LA (9/10) - Parameter update formula • The parameter update formula is obtained by optimizing Q(Θ'|Θ), using the forward-backward probabilities
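The resulting update has the usual inside-outside/EM shape; a hedged reconstruction consistent with the forward-backward quantities defined on the previous slides (T ranges over the selected trees; normalization is over x for π, and over the rules with left-hand side S[x] for β):

```latex
\hat{\pi}(x) \;\propto\; \sum_{T} \frac{\pi(x)\, b_1^{T}(x)}{P(T;\Theta)},
\qquad
\hat{\beta}\bigl(S[x] \to g\, S[y]\cdots S[y]\bigr) \;\propto\;
\sum_{T} \frac{1}{P(T;\Theta)}
\sum_{i \in \mathrm{cover}(g,T)} f_i^{T}(x)\,
\beta\bigl(S[x] \to g\, S[y]\cdots S[y]\bigr)
\prod_{j \in \mathrm{ch}(i,T)} b_j^{T}(y)
```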
PCFG-LA (10/10) - Initial parameters • The EM algorithm increases the log-likelihood monotonically starting from the initial parameters • Initial parameters • κ : random value uniformly distributed over [−log 3, log 3] • γ(S → g S...S) : probability of the observed production rule (without annotations) • β(S[x] → g S[y]...S[y]) ∝ γ(S → g S...S) · exp(κ)
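A hedged sketch of this initialization in Python: the observed-rule probabilities γ below are invented, and the scheme shown (perturbing γ by exp(κ) and renormalizing per annotated non-terminal) is a reconstruction of the slide's description, not the authors' code.

```python
import math
import random

H = 2                          # number of latent annotations
gamma = {"+": 0.4, "x0": 0.6}  # illustrative observed-rule probabilities

beta = {}
for x in range(H):
    raw = {}
    for g, p in gamma.items():
        # function symbols carry a child annotation y; terminals do not
        ys = range(H) if g == "+" else [None]
        for y in ys:
            # kappa is drawn uniformly from [-log 3, log 3], as on the slide
            kappa = random.uniform(-math.log(3), math.log(3))
            raw[(g, y)] = p * math.exp(kappa)   # beta proportional to gamma * e^kappa
    z = sum(raw.values())
    for (g, y), v in raw.items():
        beta[(x, g, y)] = v / z                 # normalize per S[x]
```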
PAGE (Programming with Annotated Grammar Estimation) • Flowchart: Initialization of individuals → Evaluation of individuals → Selection of individuals → Estimation of parameters → Generation of new individuals → (back to evaluation)
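The flow above can be sketched as a standard EDA loop. In this hedged sketch the estimation step is stubbed with simple rule-frequency counting so the example stays self-contained and runnable; real PAGE instead runs EM over the annotated parameters (π, β), and every name and setting here is illustrative:

```python
import random

FUNCS = {"+": 2}          # function symbols with arities (toy grammar)
TERMS = ["x0", "x1"]

def sample_tree(p_func, depth=0, max_depth=3):
    # grow a random tree; force a terminal at the depth limit
    if depth < max_depth and random.random() < p_func:
        return ("+", [sample_tree(p_func, depth + 1, max_depth)
                      for _ in range(FUNCS["+"])])
    return (random.choice(TERMS), [])

def size(tree):
    g, children = tree
    return 1 + sum(size(c) for c in children)

def estimate(trees):
    # stand-in for the EM step: fraction of nodes that used a function rule
    nodes = funcs = 0
    stack = list(trees)
    while stack:
        g, children = stack.pop()
        nodes += 1
        funcs += bool(children)
        stack.extend(children)
    return min(max(funcs / nodes, 0.05), 0.95)

def page(fitness, n_pop=40, n_gen=10):
    p_func = 0.5
    population = [sample_tree(p_func) for _ in range(n_pop)]   # initialization
    for _ in range(n_gen):
        population.sort(key=fitness, reverse=True)             # evaluation
        elite = population[: n_pop // 2]                       # truncation selection
        p_func = estimate(elite)                               # parameter estimation
        population = [sample_tree(p_func) for _ in range(n_pop)]  # new individuals
    return max(population, key=fitness)

best = page(size)   # with fitness = tree size, the model drifts toward big trees
```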
Experiment - Royal Tree Problem • Royal Tree Problem • Each function has increasing arity: a has arity 1, b has arity 2, … • A perfect tree of a given level has a root whose children are all perfect trees of the level one below, e.g. a perfect tree of level c is composed of perfect trees of level b • The level-d royal tree problem is used in these experiments • Purpose: to compare the performance of PAGE and PCFG-GP
Experiment - DMAX Problem • DMAX problem = MAX problem + deceptiveness • Function set {+_m, ×_m}, terminal set {λ, 0.95}, where λ^r = 1 • Settings: depth 4, arity m = 5, power r = 3 • Purpose: to show the superiority of PAGE over simple GP
Conclusion • On the royal tree problem, the number of annotations greatly affects search performance, and a larger annotation size gave better performance • The DMAX results showed that PAGE is highly effective on problems with strong deceptiveness • Because PAGE uses the EM algorithm, it is computationally more expensive • PAGE performs far better than the annotation-free algorithm • Optimizing the trade-off between these two conflicting factors (search performance vs. computational cost) is left for future work