
Bayesian Learning


Presentation Transcript


  1. Bayesian Learning

  2. Bayesian Reasoning • Basic assumption • The quantities of interest are governed by probability distributions • These probabilities + observed data ==> reasoning ==> optimal decisions • Significance • The foundation of algorithms that manipulate probabilities directly • e.g., the naïve Bayes classifier • A framework for analyzing algorithms that do not manipulate probabilities explicitly • e.g., cross entropy, the inductive bias of decision trees, the MDL principle

  3. Feature & Limitation • Feature of Bayesian Learning • 관측된 데이터들은 추정된 확률을 점진적으로 증감 • Prior Knowledge : P(h) , P(D|h) • Probabilistic Prediction에 응용 • multiple hypothesis의 결합에 의한 prediction • 문제점 • initial knowledge 요구 • significant computational cost

  4. Bayes Theorem • Terms • P(h) : prior probability of h • P(D) : prior probability that D will be observed • P(D|h) : probability of observing D given h (the likelihood) • P(h|D) : posterior probability of h, given D • Theorem • P(h|D) = P(D|h)P(h) / P(D) • Machine learning: the process of finding the most probable hypothesis given the observed data

  5. Example • Medical diagnosis • P(cancer) = 0.008, P(~cancer) = 0.992 • P(+|cancer) = 0.98, P(-|cancer) = 0.02 • P(+|~cancer) = 0.03, P(-|~cancer) = 0.97 • P(cancer|+) ∝ P(+|cancer)P(cancer) = 0.0078 • P(~cancer|+) ∝ P(+|~cancer)P(~cancer) = 0.0298 • hMAP = ~cancer (see the sketch below)
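
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the variable names (prior, likelihood_pos) are mine, not from the slides.

```python
# Unnormalized posteriors P(h|+) ∝ P(+|h) P(h) for the medical-diagnosis example.
prior = {"cancer": 0.008, "no_cancer": 0.992}
likelihood_pos = {"cancer": 0.98, "no_cancer": 0.03}   # P(+ | h)

scores = {h: likelihood_pos[h] * prior[h] for h in prior}
h_map = max(scores, key=scores.get)

print(scores)   # {'cancer': 0.00784, 'no_cancer': 0.02976}
print(h_map)    # 'no_cancer' -- the MAP hypothesis
```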

  6. MAP hypothesis • MAP (maximum a posteriori) hypothesis • hMAP = argmax h∈H P(h|D) = argmax h∈H P(D|h)P(h)

  7. ML hypothesis • Maximum likelihood (ML) hypothesis: hML = argmax h∈H P(D|h) • Basic assumption: every hypothesis is equally probable a priori • Basic formula • P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A)

  8. Bayes Theorem and Concept Learning • Brute-force MAP learning • for each h in H, calculate P(h|D) • output the hypothesis hMAP with the highest P(h|D) • Assumptions • noise-free data D • the target concept c is contained in the hypothesis space H • every hypothesis is equally probable a priori • Result • P(h|D) = 1/|VS H,D| if h is consistent with D, and P(h|D) = 0 otherwise (VS H,D is the version space) • hence every consistent hypothesis is a MAP hypothesis (see the sketch below)
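
A minimal sketch of brute-force MAP learning over an explicitly enumerated hypothesis space; the threshold hypotheses and the four training examples below are illustrative choices of mine, not from the slides.

```python
# Brute-force MAP learning: score every hypothesis by P(D|h) P(h), normalize,
# and return the maximum.  Hypotheses are thresholds t; h predicts 1 iff x >= t.
def brute_force_map(hypotheses, prior, likelihood, data):
    scores = {h: likelihood(data, h) * prior[h] for h in hypotheses}
    total = sum(scores.values())
    posterior = {h: s / total for h, s in scores.items()}   # divide by P(D)
    return max(posterior, key=posterior.get), posterior

hypotheses = [1, 2, 3, 4]
prior = {h: 1 / len(hypotheses) for h in hypotheses}         # uniform P(h)
data = [(1, 0), (2, 0), (3, 1), (4, 1)]                      # (x, label) pairs

def likelihood(data, t):
    # Noise-free setting: P(D|h) = 1 if h is consistent with D, else 0.
    return float(all((x >= t) == bool(y) for x, y in data))

h_map, posterior = brute_force_map(hypotheses, prior, likelihood, data)
print(h_map, posterior)   # threshold 3 is the only consistent (hence MAP) hypothesis
```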

  9. Consistent learner • Definition: a learning algorithm that outputs a hypothesis committing zero errors over the training examples • Result • every consistent learner outputs a MAP hypothesis • if we assume a uniform prior probability distribution over H • and deterministic, noise-free training data

  10. ML and LSE hypothesis • Least-squared-error hypothesis • NN training, curve fitting, linear regression • continuous-valued target function • task: learn f, where the observed targets are di = f(xi) + ei and ei is normally distributed noise • preliminaries: probability densities, the Normal distribution, independence of the target values • result: the ML hypothesis minimizes the sum of squared errors, hML = argmin h∈H Σi (di - h(xi))² (see the sketch below) • limitation: assumes noise only in the target value, not in the attributes
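
A small sketch of the equivalence for a linear hypothesis class: with Gaussian noise on the targets, the ML hypothesis is the least-squared-error fit. The synthetic data and NumPy usage are my illustration, not from the slides.

```python
import numpy as np

# d_i = f(x_i) + e_i with Gaussian noise e_i; the ML hypothesis minimizes the
# sum of squared errors, which for h(x) = w0 + w1*x is ordinary least squares.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
d = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)    # true f(x) = 2 + 3x

X = np.column_stack([np.ones_like(x), x])                  # design matrix
w, *_ = np.linalg.lstsq(X, d, rcond=None)                  # minimizes sum (d - Xw)^2
print(w)   # approximately [2.0, 3.0] -- the ML / least-squared-error hypothesis
```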

  11. ML hypothesis for predicting probabilities • Task: learn a nondeterministic target, i.e. find g where g(x) = P(f(x)=1) • Question: what criterion should we optimize in order to find an ML hypothesis for g? • Result: maximize Σi [ di ln h(xi) + (1 - di) ln(1 - h(xi)) ], equivalently minimize its negation, the cross entropy (see the sketch below)
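
A minimal sketch of the cross-entropy criterion; the toy target values and predicted probabilities are illustrative.

```python
import numpy as np

# Cross entropy of a probabilistic hypothesis h(x) ~ P(f(x)=1) against the
# observed boolean targets d_i; the ML hypothesis minimizes this quantity.
def cross_entropy(d, h):
    d, h = np.asarray(d, float), np.asarray(h, float)
    return -np.sum(d * np.log(h) + (1 - d) * np.log(1 - h))

d = [1, 0, 1, 1]                    # observed targets
h_good = [0.9, 0.1, 0.8, 0.7]       # confident, mostly correct probabilities
h_bad  = [0.5, 0.5, 0.5, 0.5]       # uninformative probabilities
print(cross_entropy(d, h_good) < cross_entropy(d, h_bad))   # True
```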

  12. (BP) Gradient search to ML in NN • Let G(h, D) be the cross-entropy objective above • Maximize G(h, D) by gradient ascent; for a single sigmoid output unit the weight update is wjk ← wjk + η Σi (di - h(xi)) xijk (see the sketch below)
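
A sketch of that gradient-ascent rule for a single sigmoid unit; the synthetic data, learning rate, and iteration count are my choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                               # inputs x_i
d = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)      # boolean targets d_i

w = np.zeros(3)
eta = 0.1
for _ in range(500):
    h = sigmoid(X @ w)
    w += eta * X.T @ (d - h)   # w_jk <- w_jk + eta * sum_i (d_i - h(x_i)) x_ijk
print(w)                        # weights of a (locally) ML hypothesis
```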

  13. MDL principle • Purpose: interpret the MDL (minimum description length) principle, and the inductive bias it induces, in Bayesian terms • Shannon and Weaver: the optimal code length for a message of probability p is -log2 p bits • hMDL = argmin h∈H [ LC1(h) + LC2(D|h) ], the hypothesis minimizing the combined description length of the hypothesis and of the data given the hypothesis

  14. Bayes optimal classifier • Motivation: the classification of a new instance can be improved by combining the predictions of all hypotheses, weighted by their posterior probabilities • Task: find the most probable classification of the new instance given the training data • Answer: combine the predictions of all hypotheses • Bayes optimal classification: argmax vj∈V Σ hi∈H P(vj|hi) P(hi|D) • Limitation: significant computational cost ==> Gibbs algorithm

  15. Bayes optimal classifier example
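
The worked example on this slide did not survive the transcript; the sketch below is an illustrative stand-in with posterior values of my own choosing, showing how the predictions are combined.

```python
# Bayes optimal classification: weight each hypothesis' prediction by its
# posterior P(h_i|D) and pick the label with the largest total (here each
# hypothesis predicts deterministically, so P(v_j|h_i) is 0 or 1).
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(h_i | D), illustrative
predicts  = {"h1": "+", "h2": "-", "h3": "-"}    # each h_i's classification of x

labels = set(predicts.values())
score = {v: sum(p for h, p in posterior.items() if predicts[h] == v) for v in labels}
print(score)                        # {'+': 0.4, '-': 0.6}
print(max(score, key=score.get))    # '-' : the Bayes optimal classification
```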

  16. Gibbs algorithm • Algorithm • 1. Choose a hypothesis h from H at random, according to the posterior probability distribution over H • 2. Use h to predict the classification of the next instance x • Usefulness of the Gibbs algorithm • Haussler, 1994: under certain conditions, E[error(Gibbs)] ≤ 2 · E[error(Bayes optimal classifier)] (see the sketch below)
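
A minimal sketch of the two-step Gibbs procedure, reusing the illustrative posterior from the previous sketch.

```python
import random

# Gibbs algorithm: sample one hypothesis according to its posterior, then
# classify with that single hypothesis (values are illustrative).
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predicts  = {"h1": "+", "h2": "-", "h3": "-"}

h = random.choices(list(posterior), weights=posterior.values(), k=1)[0]
print(h, predicts[h])   # the sampled hypothesis and its prediction
```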

  17. Naïve Bayes classifier • vNB = argmax vj∈V P(vj) Πi P(ai|vj) • Differences from the previous methods • no explicit search through H • probabilities are estimated simply by counting frequencies over the training examples • m-estimate of probability = (nc + m·p) / (n + m) • n : examples with value vj, nc : those that also have attribute value ai, m : equivalent sample size, p : prior estimate of the probability

  18. Example • New instance: (outlook=sunny, temperature=cool, humidity=high, wind=strong) • P(wind=strong|PlayTennis=yes) = 3/9 = .33 • P(wind=strong|PlayTennis=no) = 3/5 = .60 • P(yes)P(sunny|yes)P(cool|yes)P(high|yes)P(strong|yes) = .0053 • P(no)P(sunny|no)P(cool|no)P(high|no)P(strong|no) = .0206 • vNB = no (see the sketch below)
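
A sketch reproducing the slide's numbers. Only two of the conditional probabilities appear on the slide; the remaining frequency estimates below are assumed from the standard 14-example PlayTennis table.

```python
# Naive Bayes scores for PlayTennis = yes / no on the new instance.
p_yes, p_no = 9/14, 5/14
cond_yes = {"sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9}   # P(a_i | yes)
cond_no  = {"sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5}   # P(a_i | no)

score_yes, score_no = p_yes, p_no
for a in ("sunny", "cool", "high", "strong"):
    score_yes *= cond_yes[a]
    score_no  *= cond_no[a]

print(round(score_yes, 4), round(score_no, 4))   # 0.0053 0.0206
print("yes" if score_yes > score_no else "no")   # vNB = no
```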

  19. Bayes Belief Networks • Definition • describe the joint probability distribution for a set of variables • do not require that all the variables be conditionally independent • express partial dependence relationships among the variables probabilistically • Representation: P(y1, ..., yn) = Πi P(yi | Parents(Yi)) (see the sketch below)
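
A minimal sketch of that factorization for a tiny two-node network A → B; the structure and conditional probability tables are illustrative.

```python
# Joint distribution of a two-node belief network A -> B, built from the
# factorization P(A, B) = P(A) P(B | A).  Numbers are illustrative.
p_a = {True: 0.3, False: 0.7}                          # P(A)
p_b_given_a = {True:  {True: 0.9, False: 0.1},         # P(B | A=True)
               False: {True: 0.2, False: 0.8}}         # P(B | A=False)

def joint(a, b):
    return p_a[a] * p_b_given_a[a][b]

print(joint(True, True))                                # 0.27
print(sum(joint(a, b) for a in (True, False) for b in (True, False)))  # 1.0
```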

  20. Bayesian Belief Networks

  21. Inference • Task: infer the probability distribution over the target variables, given the values of the observed variables • Methods • exact inference: NP-hard • approximate inference • theoretically still NP-hard • practically useful • e.g., Monte Carlo methods (see the sketch below)
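
A small sketch of approximate inference by Monte Carlo (forward) sampling on the tiny A → B network above; the exact answer it approximates is P(B=True) = 0.3·0.9 + 0.7·0.2 = 0.41.

```python
import random

# Monte Carlo estimate of P(B = True) for the illustrative A -> B network.
def sample_b():
    a = random.random() < 0.3                       # sample A ~ P(A)
    return random.random() < (0.9 if a else 0.2)    # sample B ~ P(B | A)

n = 100_000
print(sum(sample_b() for _ in range(n)) / n)        # close to 0.41
```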

  22. Learning • Settings • structure known + fully observable data • easy: estimate the conditional probability tables as in the naïve Bayes classifier • structure known + partially observable data • gradient ascent procedure (Russell et al., 1995) • analogous to searching for an ML hypothesis: maximize P(D|h) • structure unknown (next slide)

  23. Learning (2) • Structure unknown • Bayesian scoring metric (Cooper & Herskovits, 1992) • K2 algorithm • Cooper & Herskovits, 1992 • heuristic greedy search • assumes fully observed data • constraint-based approach • Spirtes et al., 1993 • infer dependence and independence relationships from the data • construct the structure from these relationships

  24. EM algorithm • EM: Expectation-Maximization • Setting • learning in the presence of unobserved variables • the form of the probability distribution is known • Applications • training Bayesian belief networks • training radial basis function networks • basis for many unsupervised clustering algorithms • basis for the Baum-Welch forward-backward algorithm

  25. K-means algorithm • Setting: data are generated at random from a mixture of k Normal distributions • Task: find the mean value of each distribution • Each full instance is < xi, zi1, zi2 > (for k = 2), where zij indicates whether the j-th distribution generated xi; the zij are unobserved • If z were known: estimate each mean as the sample average of the points assigned to it • Otherwise: use the EM algorithm (see the sketch after the next slide)

  26. K-means algorithm • Initialize h = < μ1, ..., μk > arbitrarily • Step 1 (E): calculate the expected values E[zij], assuming the current hypothesis h holds • Step 2 (M): calculate a new ML hypothesis (new means), weighting each xi by E[zij] • Repeat ==> converges to a local ML hypothesis
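
A sketch of this E/M loop for a mixture of two Normal distributions with known, equal variance and equal mixing weights (the setting described on slide 25); the synthetic data, true means, and iteration count are my choices.

```python
import math, random

random.seed(0)
sigma = 1.0
data = ([random.gauss(0.0, sigma) for _ in range(200)] +     # component 1
        [random.gauss(4.0, sigma) for _ in range(200)])      # component 2

def density(x, mu):
    # Unnormalized Normal density; the constant cancels in the E-step ratio.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

mu = [random.choice(data), random.choice(data)]   # initialize h = <mu1, mu2>
for _ in range(50):
    # E-step: E[z_ij] = P(x_i was generated by component j | current means)
    e = [[density(x, m) for m in mu] for x in data]
    e = [[p / sum(row) for p in row] for row in e]
    # M-step: new means are the E[z_ij]-weighted averages of the data
    mu = [sum(e[i][j] * data[i] for i in range(len(data))) /
          sum(e[i][j] for i in range(len(data))) for j in range(2)]

print([round(m, 2) for m in mu])   # approaches the true means 0.0 and 4.0
```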

  27. General statement of the EM algorithm • Terms • θ : the parameters of the underlying probability distribution • X : the observed data • Z : the unobserved data • Y = X ∪ Z : the full data • h : the current hypothesis of θ • h' : the revised hypothesis • Task: estimate θ from X

  28. Guideline • Search for the h' that maximizes the expected log-likelihood of the full data • Q(h'|h) = E[ ln P(Y|h') | h, X ] • the expectation is taken using the current hypothesis h in place of the unknown θ to estimate the distribution of the unobserved Z

  29. EM algorithm • Estimation (E) step: calculate Q(h'|h) using the current hypothesis h and the observed data X • Maximization (M) step: replace h by the h' that maximizes Q, h ← argmax h' Q(h'|h) • Repeating the two steps converges to a local maximum of the likelihood
