CISC 841 Bioinformatics (Fall 2008)
Hidden Markov Models: Model Structure
CISC841, F08, Liao
Hidden Markov models (a quick review)

[Figure: two-state HMM with states + and −. Transitions: + to + 0.65, + to − 0.35, − to − 0.70, − to + 0.30. Emissions in state +: A .170, C .368, G .274, T .188. Emissions in state −: A .372, C .198, G .112, T .338.]

• A Markov chain of states
• At each state, there is a set of possible observables (symbols), and
• The states are not directly observable; that is, they are hidden.
• E.g., CpG island
• Three major problems
  • Most probable state path (Viterbi algorithm)
  • The likelihood (Forward algorithm)
  • Parameter estimation for HMMs (Viterbi training, Baum-Welch)

CISC636, S08, Lec9, Liao
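The first two problems listed above can be sketched with dynamic programming on the two-state model from the figure. This is a minimal illustration, not the course's implementation. The slides give no start distribution, so the uniform `START` below is an assumption, and the function names `viterbi` and `forward` are chosen here for illustration. Probabilities are multiplied directly, which is fine for short sequences; longer ones would need log-space or scaling to avoid underflow.

```python
# Two-state model read off the figure: '+' and '-' states over the DNA alphabet.
STATES = ("+", "-")
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    ("+", "+"): 0.65, ("+", "-"): 0.35,
    ("-", "-"): 0.70, ("-", "+"): 0.30,
}
# ASSUMPTION: the slides do not give start probabilities; uniform is used here.
START = {"+": 0.5, "-": 0.5}


def viterbi(x):
    """Return the most probable state path for x and its joint probability."""
    v = {s: START[s] * EMIT[s][x[0]] for s in STATES}
    backpointers = []
    for symbol in x[1:]:
        prev, v, ptr = v, {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            best = max(STATES, key=lambda r: prev[r] * TRANS[(r, s)])
            ptr[s] = best
            v[s] = prev[best] * TRANS[(best, s)] * EMIT[s][symbol]
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]


def forward(x):
    """Return the likelihood of x, summed over all state paths."""
    f = {s: START[s] * EMIT[s][x[0]] for s in STATES}
    for symbol in x[1:]:
        f = {
            s: EMIT[s][symbol] * sum(f[r] * TRANS[(r, s)] for r in STATES)
            for s in STATES
        }
    return sum(f.values())


x_demo = "TGCGCGTAC"
best_path, best_p = viterbi(x_demo)
total = forward(x_demo)
print(best_path, best_p, total)
```

The two recursions differ only in replacing the maximum over predecessor states (Viterbi) with a sum (forward), which is why the best-path probability can never exceed the likelihood.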
[Figure: the same two-state model (+ and −) as on the previous slide.]

• The probability that sequence x is emitted by a state path π is:

  $$P(x, \pi) = \prod_{i=1}^{L} e_{\pi_i}(x_i) \, a_{\pi_i \pi_{i+1}}$$

  (At the last position, i = L, there is no following state, so the transition factor is taken as 1, as in the example below.)

  i: 1 2 3 4 5 6 7 8 9
  x: T G C G C G T A C
  π: − − + + + + − − −

  P(x, π) = 0.338 × 0.70 × 0.112 × 0.30 × 0.368 × 0.65 × 0.274 × 0.65 × 0.368 × 0.65 × 0.274 × 0.35 × 0.338 × 0.70 × 0.372 × 0.70 × 0.198.

• The probability to observe sequence x in the model is

  $$P(x) = \sum_{\pi} P(x, \pi),$$

  which is also called the likelihood of the model.
• Decoding: Given an observed sequence x, what is the most probable state path, i.e.,

  $$\pi^* = \arg\max_{\pi} P(x, \pi)$$
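The worked product above can be checked numerically. The sketch below evaluates P(x, π) for the slide's example and then, by brute force over all 2^9 state paths, the sum and the argmax from the definitions. As in the worked example, no start or end factor is included; the function name `joint_probability` is chosen here for illustration, and enumeration is only practical for short sequences.

```python
from itertools import product
from math import isclose, prod

# Parameters of the two-state model, as used in the worked example.
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    ("+", "+"): 0.65, ("+", "-"): 0.35,
    ("-", "-"): 0.70, ("-", "+"): 0.30,
}


def joint_probability(x, path):
    """P(x, path): each emission times each transition between neighbours.

    No start or end term is included, matching the worked example.
    """
    p = 1.0
    for i, (symbol, state) in enumerate(zip(x, path)):
        p *= EMIT[state][symbol]
        if i + 1 < len(path):
            p *= TRANS[(state, path[i + 1])]
    return p


x = "TGCGCGTAC"
pi = "--++++---"
p_joint = joint_probability(x, pi)

# Brute force over every state path of length L (2^L paths).
all_paths = ["".join(states) for states in product("+-", repeat=len(x))]
likelihood = sum(joint_probability(x, path) for path in all_paths)
best_path = max(all_paths, key=lambda path: joint_probability(x, path))

print(p_joint, likelihood, best_path)
```

The forward and Viterbi algorithms compute the same sum and argmax in time linear in the sequence length rather than exponential.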
How to construct a model (the structure)?
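Hidden Markov Model

[Figure: profile HMM with a Start state, three match states (M), four insert states (I), three delete states (D), and an End state; node positions are numbered 0 1 2 3.]

Alignment (X marks a match column, . an insert column)

          X  X  .  .  .  X
  bat     A  G  -  -  -  C
  rat     A  -  A  G  -  C
  cat     A  G  -  A  A  -
  gnat    -  -  A  A  A  C
  goat    A  G  -  -  -  C
          1  2  .  .  .  3

Observed emission/transition counts

                      node position
                      0   1   2   3
  ---------------------------------
  match emissions
                A     –   4   0   0
                C     –   0   0   4
                G     –   0   3   0
                T     –   0   0   0
  ---------------------------------
  insert emissions
                A     0   0   6   0
                C     0   0   0   0
                G     0   0   1   0
                T     0   0   0   0
  ---------------------------------
  state transitions
                MM    4   3   2   4
                MD    1   1   0   0
                MI    0   0   1   0
                IM    0   0   2   0
                ID    0   0   1   0
                II    0   0   4   0
                DM    –   0   0   1
                DD    –   1   0   0
                DI    –   0   2   0

CISC841, F07, Liao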
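The counts in the table can be reproduced from the alignment by walking each sequence through the profile HMM. The sketch below is one way to do that, not necessarily how the slide's figure was produced. It assumes the conventions the table appears to use: Start is treated as the match state of node 0, End as a match state after node 3, and each transition is filed under the node of its source state. The function name `count_events` and the dictionary layout are illustrative choices.

```python
from collections import Counter

ALIGNMENT = {
    "bat":  "AG---C",
    "rat":  "A-AG-C",
    "cat":  "AG-AA-",
    "gnat": "--AAAC",
    "goat": "AG---C",
}
MATCH_COLUMNS = "XX...X"  # 'X' marks a match column, '.' an insert column


def count_events(alignment, match_columns):
    """Count match emissions, insert emissions, and transitions per node."""
    match_emit = Counter()   # (node, residue) -> count
    insert_emit = Counter()  # (node, residue) -> count
    transitions = Counter()  # (source node, "MM"/"MD"/...) -> count
    for row in alignment.values():
        node, state = 0, "M"  # Start is treated as the match state of node 0
        for residue, mark in zip(row, match_columns):
            if mark == "X":
                # Match column: a residue means M, a gap means D.
                nxt = "D" if residue == "-" else "M"
                transitions[(node, state + nxt)] += 1
                node += 1
                if nxt == "M":
                    match_emit[(node, residue)] += 1
                state = nxt
            elif residue != "-":
                # Insert column with a residue: enter or stay in I at this node.
                transitions[(node, state + "I")] += 1
                insert_emit[(node, residue)] += 1
                state = "I"
        # The final transition into End is counted as a move to a match state.
        transitions[(node, state + "M")] += 1
    return match_emit, insert_emit, transitions


match_emit, insert_emit, transitions = count_events(ALIGNMENT, MATCH_COLUMNS)
for kind in ("MM", "MD", "MI", "IM", "ID", "II", "DM", "DD", "DI"):
    print(kind, [transitions[(node, kind)] for node in range(4)])
```

Turning these counts into emission and transition probabilities would typically involve normalizing each row, usually with pseudocounts so that unobserved events do not get probability zero; the slide shows only the raw counts.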
TMMOD: An improved hidden Markov model for predicting transmembrane topology (Bioinformatics 2005)
Fitness value:
L. Petersen et al, (2003), J. Mol. Biol. 326:1361-1372.
BMC Bioinformatics, 2007.