Presentation Transcript


  1. Comp. Genomics Recitation 6 14/11/06 ML and EM

  2. Outline • Maximum likelihood estimation • HMM Example • EM • Baum-Welch algorithm

  3. Maximum likelihood • One of the methods for parameter estimation • Likelihood: L = P(Data | Parameters) • Simple example: • Simple coin with P(head) = p • 10 coin tosses • 6 heads, 4 tails • L = P(Data | Params) = C(10,6) · p^6 · (1-p)^4

  4. Maximum likelihood • We want to find the p that maximizes L = C(10,6) · p^6 · (1-p)^4 • Infi 1, remember? • Since log is a monotonically increasing function, we can instead optimize log L = log[C(10,6) · p^6 · (1-p)^4] = log C(10,6) + 6·log p + 4·log(1-p) • Differentiating with respect to p and equating to zero: 6/p - 4/(1-p) = 0 • Estimate for p: 0.6 (makes sense? a quick numerical check follows below)
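
A minimal numerical check of the coin example (not part of the original slides): it evaluates the log-likelihood above on a grid of candidate p values and confirms that it peaks at p = 0.6, the closed-form estimate heads/n.

    import math

    # Data from the slide: 10 tosses, 6 heads, 4 tails.
    n, heads, tails = 10, 6, 4

    def log_likelihood(p):
        # log L(p) = log C(10,6) + 6*log(p) + 4*log(1-p)
        return math.log(math.comb(n, heads)) + heads * math.log(p) + tails * math.log(1 - p)

    # Scan candidate values of p and pick the maximizer.
    grid = [i / 1000 for i in range(1, 1000)]
    best_p = max(grid, key=log_likelihood)
    print(best_p)  # ~0.6, i.e. heads / n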

  5. ML in Profile HMMs • Emission probabilities • Mi → a • Ii → a • Transition probabilities • Mi → Mi+1 • Mi → Di+1 • Mi → Ii • Ii → Mi+1 • Ii → Ii • Di → Di+1 • Di → Mi+1 • Di → Ii http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt

  6. Parameter Estimation for HMMs Input: X1,…,Xn independent training sequences Goal: estimation of θ = (A,E) (model parameters) Note: P(X1,…,Xn | θ) = Πi=1…n P(Xi | θ) (independence) l(X1,…,Xn | θ) = log P(X1,…,Xn | θ) = Σi=1…n log P(Xi | θ) Case 1 - Estimation when the state sequence is known: Akl = #(k→l transitions that occurred) Ek(b) = #(emissions of symbol b that occurred in state k) Max. likelihood estimators: • akl = Akl / Σl’ Akl’ • ek(b) = Ek(b) / Σb’ Ek(b’) Small-sample or prior-knowledge correction (pseudocounts): A’kl = Akl + rkl E’k(b) = Ek(b) + rk(b)
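
A small sketch (not from the slides) of the Case 1 estimators: given transition counts A[k][l] and emission counts E[k][b] collected from known state paths, it normalizes them into akl and ek(b), optionally adding pseudocounts for small samples.

    def ml_estimates(A, E, r_trans=0.0, r_emit=0.0):
        # Normalize transition counts A[k][l] and emission counts E[k][b]
        # into probabilities a_kl and e_k(b); r_* are pseudocounts.
        a = {}
        for k, row in A.items():
            total = sum(row[l] + r_trans for l in row)
            a[k] = {l: (row[l] + r_trans) / total for l in row}
        e = {}
        for k, row in E.items():
            total = sum(row[b] + r_emit for b in row)
            e[k] = {b: (row[b] + r_emit) / total for b in row}
        return a, e

    # Toy counts for a two-state HMM (hypothetical numbers, not from the slides).
    A = {'k': {'k': 3, 'l': 1}, 'l': {'k': 2, 'l': 4}}
    E = {'k': {'A': 5, 'C': 1}, 'l': {'A': 2, 'C': 4}}
    a, e = ml_estimates(A, E, r_trans=1.0, r_emit=1.0)
    print(a['k'], e['k'])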

  7. Example • Suppose we are given the aligned sequences:
  **---*
  AG---C
  A-AT-C
  AG-AA-
  --AAAC
  AG---C
  • Suppose also that the “match” positions are marked (the * columns)... http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt

  8. Calculating A, E for the alignment above: count transitions and emissions. (The transition and emission count tables are shown in the original slides and are not reproduced in this transcript.) http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt

  9. Calculating A, E, continued: transition and emission counts for the same alignment. (Count tables in the original slides.) http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt

  10. Estimating maximum likelihood probabilities using fractions: emissions. (Emission probability table in the original slides.) http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt

  11. Estimating ML probabilities (cont'd): transitions. (Transition probability table in the original slides; a counting sketch follows below.) http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
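
Since the count and probability tables from slides 8-11 are not reproduced in this transcript, the following sketch shows how they would be derived from the marked alignment, using the usual profile-HMM convention (match column j maps to Mj or Dj, non-match columns map to inserts after the last match column passed); the state names and container layout are my own.

    from collections import Counter

    # Marked alignment from slide 7: '*' columns are match columns.
    mask = "**---*"
    seqs = ["AG---C", "A-AT-C", "AG-AA-", "--AAAC", "AG---C"]

    transitions = Counter()   # counts of (from_state, to_state), e.g. ('M1', 'M2')
    emissions = Counter()     # counts of (state, symbol), e.g. ('M1', 'A')

    for seq in seqs:
        prev = "Begin"            # Begin is treated like match state M0
        match_idx = 0             # number of match columns passed so far
        for m, c in zip(mask, seq):
            if m == "*":                       # match column
                match_idx += 1
                if c == "-":
                    state = f"D{match_idx}"    # deletion: no emission
                else:
                    state = f"M{match_idx}"
                    emissions[(state, c)] += 1
            else:                              # insert column
                if c == "-":
                    continue                   # nothing happens in this column
                state = f"I{match_idx}"        # insert after match column match_idx
                emissions[(state, c)] += 1
            transitions[(prev, state)] += 1
            prev = state
        transitions[(prev, "End")] += 1

    print(transitions)
    print(emissions)

Dividing each transition count by the total number of transitions leaving the same state (and each emission count by the total emissions of that state) gives the ML fractions of slides 10-11.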

  12. EM - Mixture example • Assume we are given the heights of 100 individuals (men/women): y1…y100 • We know that: • The men’s heights are normally distributed with (μm,σm) • The women’s heights are normally distributed with (μw,σw) • If we knew the genders, estimation is “easy” (how?) • But we don’t know the genders in our data! • X1,…,X100 are unknown • P(w), P(m) are unknown

  13. Mixture example • Our goal: estimate the parameters (μm,σm), (μw,σw), p(m) • A classic “estimation with missing data” problem • (In an HMM: we know the emissions, but not the states!) • Expectation-Maximization (EM): • Compute the “expected” gender for every sample height • Estimate the parameters using ML • Iterate

  14. EM • Widely used in machine learning • Using ML for parameter estimation at every iteration guarantees that the likelihood never decreases • Eventually we’ll reach a local maximum • A good starting point is important

  15. Mixture example • If we have a mixture of M Gaussians, each with a mixing probability αi and density parameters θi = (μi,σi) • Likelihood of the observations (X): p(x | θ) = Σi=1…M αi pi(x | θi) • The “incomplete-data” log-likelihood of the sample x1,…,xN: log L(θ | X) = Σn=1…N log [Σi=1…M αi pi(xn | θi)] • Difficult to maximize directly… (a log of a sum)
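
A small sketch (the function name and numbers are hypothetical, not from the slides) evaluating this incomplete-data log-likelihood for a 1-D two-component Gaussian mixture:

    from math import log
    from statistics import NormalDist

    def incomplete_log_likelihood(xs, alphas, mus, sigmas):
        # log L(theta | X) = sum_n log sum_i alpha_i * N(x_n | mu_i, sigma_i)
        return sum(
            log(sum(a * NormalDist(m, s).pdf(x) for a, m, s in zip(alphas, mus, sigmas)))
            for x in xs
        )

    # Toy heights and a two-component guess (hypothetical values).
    xs = [160.0, 172.0, 181.0, 158.0, 190.0]
    print(incomplete_log_likelihood(xs, [0.5, 0.5], [165.0, 180.0], [6.0, 7.0]))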

  16. Mixture example • Now we introduce y1,…,yN: hidden variables telling us which Gaussian every sample came from • If we knew y, the (complete-data) log-likelihood would be: log L(θ | X, Y) = Σn=1…N log(αyn pyn(xn | θyn)) • Of course, we do not know the ys… • We’ll do EM, starting from a guess θg = (α1g,…,αMg, μ1g,…,μMg, σ1g,…,σMg)

  17. Estimation • Given θg, we can estimate the ys! • We want to find: Q(θ, θg) = E[log L(θ | X, Y) | X, θg] = Σy log L(θ | X, y) · P(y | X, θg) • The expectation is over the states of y • Bayes rule, P(Y|X) = P(X|Y)P(Y)/P(X), gives the posterior of each yn: p(yn = i | xn, θg) = αig pi(xn | θig) / Σk=1…M αkg pk(xn | θkg)
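
A sketch of this E-step posterior (the “responsibility” of component i for one sample), with hypothetical current-guess values:

    from statistics import NormalDist

    def responsibilities(x, alphas, mus, sigmas):
        # p(y = i | x, theta_g) = alpha_i N(x|mu_i,sigma_i) / sum_k alpha_k N(x|mu_k,sigma_k)
        weighted = [a * NormalDist(m, s).pdf(x) for a, m, s in zip(alphas, mus, sigmas)]
        total = sum(weighted)
        return [w / total for w in weighted]

    # Current guess theta_g for two components (hypothetical values).
    print(responsibilities(175.0, [0.5, 0.5], [165.0, 180.0], [6.0, 7.0]))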

  18. Estimation • We write down the Q: Q(θ, θg) = Σy log L(θ | X, y) · P(y | X, θg) = Σy1=1…M … ΣyN=1…M [Σn=1…N log(αyn pyn(xn | θyn))] · Πm=1…N p(ym | xm, θg) • Daunting?

  19. Estimation • Simplifying (each term depends on only one of the ym, so the nested sums collapse): • Now the Q becomes: Q(θ, θg) = Σi=1…M Σn=1…N log(αi) · p(i | xn, θg) + Σi=1…M Σn=1…N log pi(xn | θi) · p(i | xn, θg)

  20. Maximization • Now we want to find parameter estimates θg+1 such that: θg+1 = argmaxθ Q(θ, θg) • Infi 2, remember? • To impose the constraint Σi αi = 1, we introduce a Lagrange multiplier λ and set ∂/∂αi [Q(θ, θg) + λ(Σi αi − 1)] = 0, which gives Σn p(i | xn, θg)/αi + λ = 0 • After summing both sides over i (so that λ = −N): αig+1 = (1/N) Σn=1…N p(i | xn, θg)

  21. Maximization • Estimating μig+1, σig+1 is more difficult • The derivation is out of scope here • What comes out is actually quite straightforward: μig+1 = Σn xn · p(i | xn, θg) / Σn p(i | xn, θg), (σig+1)2 = Σn p(i | xn, θg)·(xn − μig+1)2 / Σn p(i | xn, θg) • A full EM sketch follows below
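
Putting the E-step and M-step together: a minimal sketch of EM for a 1-D Gaussian mixture, using the update formulas above; the height data and starting point θg are hypothetical.

    from statistics import NormalDist

    def em_gaussian_mixture(xs, alphas, mus, sigmas, iterations=25):
        # Repeat: E-step (responsibilities), M-step (ML re-estimates of alpha, mu, sigma).
        N, M = len(xs), len(alphas)
        for _ in range(iterations):
            # E-step: r[n][i] = p(i | x_n, theta_g)
            r = []
            for x in xs:
                w = [alphas[i] * NormalDist(mus[i], sigmas[i]).pdf(x) for i in range(M)]
                total = sum(w)
                r.append([wi / total for wi in w])
            # M-step: closed-form updates from slides 20-21
            for i in range(M):
                resp = sum(r[n][i] for n in range(N))
                alphas[i] = resp / N
                mus[i] = sum(r[n][i] * xs[n] for n in range(N)) / resp
                sigmas[i] = (sum(r[n][i] * (xs[n] - mus[i]) ** 2 for n in range(N)) / resp) ** 0.5
        return alphas, mus, sigmas

    # Hypothetical heights and a rough starting guess theta_g.
    heights = [158, 160, 163, 165, 168, 172, 175, 178, 181, 185, 190]
    print(em_gaussian_mixture(heights, [0.5, 0.5], [160.0, 185.0], [5.0, 5.0]))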

  22. What you need to know about EM: • When: when we want to estimate model parameters and some of the data is “missing” • Why: maximizing the likelihood directly is very difficult • How: • Make an initial guess of the parameters • Write down a proper expression for Q(θ, θg) • Differentiate and find the ML estimators • Iterate

  23. EM estimation in HMMs Input: X1,…,Xn independent training sequences Baum-Welch alg. (1972): • Expectation: • compute the expected # of k→l state transitions (f and b are the forward and backward variables): P(πi = k, πi+1 = l | X, θ) = [1/P(X)] · fk(i) · akl · el(xi+1) · bl(i+1) Akl = Σj [1/P(Xj)] · Σi fkj(i) · akl · el(xji+1) · blj(i+1) • compute the expected # of appearances of symbol b in state k: Ek(b) = Σj [1/P(Xj)] · Σ{i | xji = b} fkj(i) · bkj(i) (ex.) • Maximization: • re-compute new parameters from A, E using max. likelihood (a counting sketch follows below) • Repeat (1)+(2) until the improvement falls below a threshold
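
A sketch of the expectation-step counting formulas above for a single training sequence, assuming the forward matrix f[k][i], backward matrix b[k][i], and P(x) = px have already been computed (the function name and container layout are my own):

    def expected_counts(x, states, symbols, a, e, f, b, px):
        # Expected transition counts A_kl and emission counts E_k(b) for one
        # sequence x; positions i run over 0..len(x)-1.
        L = len(x)
        A = {k: {l: 0.0 for l in states} for k in states}
        E = {k: {s: 0.0 for s in symbols} for k in states}
        for k in states:
            for l in states:
                # A_kl = (1/P(x)) * sum_i f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1)
                A[k][l] = sum(
                    f[k][i] * a[k][l] * e[l][x[i + 1]] * b[l][i + 1]
                    for i in range(L - 1)
                ) / px
            for i in range(L):
                # E_k(b) = (1/P(x)) * sum over positions i with x_i = b of f_k(i) * b_k(i)
                E[k][x[i]] += f[k][i] * b[k][i] / px
        return A, E

Summing these A and E over all training sequences and normalizing them as in slide 6 gives the maximization step; iterating the two steps is the Baum-Welch algorithm.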
