Ab Initio Profile HMM Generation

Sam Gross

Profile HMMs
[Figure: protein profile H drawn as a profile HMM with BEGIN and END states, match states M1…Mm, insert states I0…Im, and delete states D1…Dm; figure taken from a Batzoglou lecture]
• Each M state has a position-specific pre-computed substitution table
• Each I and D state has position-specific gap penalties
• Profile is a generative model:
  • The sequence X that is aligned to H is thought of as "generated by" H
  • Therefore, H parametrizes a conditional distribution P(X | H)
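To make "generated by H" concrete, here is a minimal sampler for a toy profile HMM. The layout is an assumption chosen for illustration, not taken from the slides: node 0 is BEGIN, node j holds M_j, I_j and D_j, each state's outgoing probabilities are a triple (next match, insert, next delete), and leaving node m goes to END. Match and insert states emit residues; delete states are silent. The function name and parameter names are hypothetical.

```python
import random


def sample_profile_hmm(m, match_emit, insert_emit, tM, tI, tD, rng=random):
    """Draw one sequence X ~ P(X | H) from a toy profile HMM H.

    Hypothetical layout: match_emit[j] (j = 1..m) and insert_emit[j] (j = 0..m)
    map residues to probabilities. tM[j], tI[j], tD[j] are triples
    (next match, insert I_j, next delete) for leaving node j's M, I or D state.
    Node 0 is BEGIN; at j == m "next match" means END, so tM[m], tI[m] and
    tD[m] must give zero probability to "next delete". tD[0] is never used.
    """
    def draw(dist):
        residues, probs = zip(*dist.items())
        return rng.choices(residues, weights=probs)[0]

    seq, state, j = [], "M", 0            # start in BEGIN (treated as M_0)
    while True:
        table = {"M": tM, "I": tI, "D": tD}[state]
        move = rng.choices(["M", "I", "D"], weights=table[j])[0]
        if move == "I":                    # insert: stay at node j, emit from I_j
            state = "I"
            seq.append(draw(insert_emit[j]))
            continue
        j += 1                             # match or delete advances to node j+1
        if j > m:                          # moved past the last node: END
            break
        state = move
        if move == "M":                    # match states emit; delete states are silent
            seq.append(draw(match_emit[j]))
    return "".join(seq)


# Hypothetical 3-node profile loosely shaped like the "ADA..." toy sequences.
toy_m = 3
toy_match = {1: {"A": 0.9, "G": 0.1}, 2: {"D": 0.8, "E": 0.2}, 3: {"A": 1.0}}
toy_insert = [{"C": 0.5, "Q": 0.5}] * (toy_m + 1)
toy_tM = [(0.9, 0.05, 0.05)] * toy_m + [(0.95, 0.05, 0.0)]
toy_tI = [(0.7, 0.3, 0.0)] * (toy_m + 1)
toy_tD = [(0.9, 0.0, 0.1)] * toy_m + [(1.0, 0.0, 0.0)]
toy_rng = random.Random(0)
print([sample_profile_hmm(toy_m, toy_match, toy_insert, toy_tM, toy_tI, toy_tD, toy_rng)
       for _ in range(5)])
```

Each call walks one state path from BEGIN to END, so different calls yield sequences of different lengths; P(X | H) is the total probability of all paths that emit X.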

Ab Initio Profile Generation
• Given N related protein sequences $x_1, \dots, x_N$
• Construct a profile HMM H such that $\prod_{i=1}^{N} P(x_i \mid H)$ is maximized
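Evaluating this objective needs P(x_i | H) for each sequence, which the forward algorithm computes by summing over every state path through the match, insert and delete states. The sketch below assumes a hypothetical node-by-node layout (transition triples `tM`, `tI`, `tD`; emission dictionaries `match_emit`, `insert_emit`; node 0 is BEGIN and leaving node m goes to END); real profile HMM software uses its own parameterization. It works in plain probabilities for readability, so long sequences would need log-space or scaled arithmetic.

```python
import math


def profile_forward(x, m, match_emit, insert_emit, tM, tI, tD):
    """P(x | H) for a toy profile HMM, summed over all state paths (forward algorithm).

    Hypothetical layout: match_emit[j] (j = 1..m) and insert_emit[j] (j = 0..m) map
    residues to probabilities; tM[j], tI[j], tD[j] are (next match, insert, next delete)
    triples for leaving node j; node 0 is BEGIN and "next match" from node m is END.
    """
    L = len(x)
    # fM/fI/fD[j][i]: probability of emitting x[:i] and being in that state of node j
    fM = [[0.0] * (L + 1) for _ in range(m + 1)]
    fI = [[0.0] * (L + 1) for _ in range(m + 1)]
    fD = [[0.0] * (L + 1) for _ in range(m + 1)]
    fM[0][0] = 1.0                        # BEGIN, before any residue is emitted
    for j in range(m + 1):
        for i in range(L + 1):
            if j > 0:
                # D_j is silent: reached from node j-1 without consuming a residue
                fD[j][i] = (fM[j-1][i] * tM[j-1][2] + fI[j-1][i] * tI[j-1][2]
                            + fD[j-1][i] * tD[j-1][2])
                if i > 0:                 # M_j emits x[i-1], entered from node j-1
                    fM[j][i] = match_emit[j].get(x[i-1], 0.0) * (
                        fM[j-1][i-1] * tM[j-1][0] + fI[j-1][i-1] * tI[j-1][0]
                        + fD[j-1][i-1] * tD[j-1][0])
            if i > 0:                     # I_j emits x[i-1] and stays at node j
                fI[j][i] = insert_emit[j].get(x[i-1], 0.0) * (
                    fM[j][i-1] * tM[j][1] + fI[j][i-1] * tI[j][1]
                    + fD[j][i-1] * tD[j][1])
    # END is entered from node m through the "next match" slot
    return fM[m][L] * tM[m][0] + fI[m][L] * tI[m][0] + fD[m][L] * tD[m][0]


def log_likelihood(seqs, m, match_emit, insert_emit, tM, tI, tD):
    """log prod_i P(x_i | H) = sum_i log P(x_i | H); summing logs avoids underflow of the product."""
    return sum(math.log(profile_forward(x, m, match_emit, insert_emit, tM, tI, tD))
               for x in seqs)
```

Because the logarithm is monotonic, maximizing the product over sequences is the same as maximizing `log_likelihood`, which is the form training code usually works with.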

Easier Said Than Done
• Profile HMM length is unknown
  • Use average sequence length
• Alignment is unknown
• HMM parameters are unknown

Not A New Problem
• Instance of the general problem of HMM parameter estimation using unlabelled outputs
• Instance of the even more general problem of MLE with partially missing data
• We want $\arg\max_{\theta} P(D_{obs} \mid \theta)$
• We know $P(D_{obs}, D_{hid} \mid \theta)$
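The link between the two bullets is marginalization: the likelihood we want is the likelihood we know, summed over every value of the hidden data. For a profile HMM, the hidden data is the state path (the alignment) of each sequence. The forward algorithm can evaluate this sum, but the logarithm of a sum has no closed-form maximizer, which is what motivates EM.

```latex
P(D_{obs} \mid \theta) = \sum_{D_{hid}} P(D_{obs}, D_{hid} \mid \theta)
```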

The Expectation Maximization (EM) Algorithm
• Start with an initial guess for the parameters
• Iterate until convergence:
  • E-step: Calculate expectations for the missing data, given the current parameters
  • M-step: Treating expectations as observations, calculate the MLE for the parameters
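In symbols, writing $\theta^{(r)}$ for the parameters after round r (the notation is ours, not the slides'), one EM round maximizes the expected complete-data log-likelihood. Each round never decreases $P(D_{obs} \mid \theta)$, but EM can converge to a local maximum, which is why the initial guess matters.

```latex
\begin{aligned}
\text{E-step:}\quad & Q(\theta \mid \theta^{(r)}) = \mathbb{E}_{D_{hid} \sim P(\cdot \mid D_{obs}, \theta^{(r)})}\left[\log P(D_{obs}, D_{hid} \mid \theta)\right] \\
\text{M-step:}\quad & \theta^{(r+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(r)})
\end{aligned}
```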

Baum-Welch: EM For HMMs
• Start with an initial guess of the HMM parameters
• Iterate until convergence:
  • E-step: Forward-backward algorithm
  • M-step: MLE using forward-backward posterior probabilities
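For a discrete HMM with transition probabilities $a_{kl}$, emission probabilities $e_k(\sigma)$, forward values $f_k(t)$ and backward values $b_k(t)$ (standard notation, chosen here for illustration), the E-step computes state and transition posteriors and the M-step renormalizes them as expected counts. With several training sequences, numerators and denominators are also summed over sequences.

```latex
\begin{aligned}
\gamma_t(k) &= \frac{f_k(t)\, b_k(t)}{P(x \mid \theta)}, &
\xi_t(k,l) &= \frac{f_k(t)\, a_{kl}\, e_l(x_{t+1})\, b_l(t+1)}{P(x \mid \theta)}, \\
a_{kl} &\leftarrow \frac{\sum_{t=1}^{T-1} \xi_t(k,l)}{\sum_{t=1}^{T-1} \gamma_t(k)}, &
e_k(\sigma) &\leftarrow \frac{\sum_{t : x_t = \sigma} \gamma_t(k)}{\sum_{t=1}^{T} \gamma_t(k)}.
\end{aligned}
```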

Incorporating Prior Knowledge
• We know in advance that certain types of residues tend to align together
• Use a Dirichlet mixture prior over the emission probabilities of match states
  • Each distribution in the mixture corresponds to a different "alignment environment"
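A common way to use such a prior is to replace a match state's raw residue frequencies with their posterior mean: each mixture component j (weight q_j, Dirichlet parameters α_j) is weighted by how well it explains the observed residue counts n, and each contributes (n_a + α_ja) / (|n| + |α_j|). The sketch below assumes that estimator; the function names, the 3-letter alphabet and all parameter values are hypothetical (real priors cover the 20 amino acids).

```python
import math


def log_dirichlet_multinomial(counts, alpha):
    """log P(counts | alpha) without the multinomial coefficient, which cancels across components."""
    n, a = sum(counts), sum(alpha)
    return (math.lgamma(a) - math.lgamma(n + a)
            + sum(math.lgamma(c + al) - math.lgamma(al) for c, al in zip(counts, alpha)))


def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean emission probabilities for one match state under a Dirichlet mixture prior.

    Returns (estimate, component_posteriors).
    """
    logs = [math.log(q) + log_dirichlet_multinomial(counts, al)
            for q, al in zip(weights, alphas)]
    top = max(logs)                                 # subtract the max for numerical stability
    post = [math.exp(l - top) for l in logs]
    z = sum(post)
    post = [p / z for p in post]                    # P(component j | counts)
    n = sum(counts)
    est = [0.0] * len(counts)
    for pj, al in zip(post, alphas):
        a = sum(al)
        for k, (c, ak) in enumerate(zip(counts, al)):
            est[k] += pj * (c + ak) / (n + a)       # component j's posterior mean, weighted
    return est, post


# Hypothetical 3-component prior over a 3-letter alphabet.
weights = [0.4, 0.4, 0.2]
alphas = [[4.0, 0.5, 0.5], [0.5, 0.5, 4.0], [1.0, 1.0, 1.0]]
est, post = dirichlet_mixture_estimate([3, 1, 0], weights, alphas)
print([round(p, 3) for p in est], [round(p, 3) for p in post])
```

With no observed residues the estimate falls back to the mixture's prior mean, and with many observations it approaches the raw column frequencies; columns with few sequences are where the prior matters most.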

Coin Flips Example
• Two trick coins are used to generate a sequence of heads and tails
• You see only the sequence, and must determine the probability of heads for each coin
[Figure: Coin A and Coin B]

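10,000 Coin Flips
• Real coins
  • PA(heads) = 0.4
  • PB(heads) = 0.8
• Initial guess
  • PA(heads) = 0.51
  • PB(heads) = 0.49
• Learned model
  • PA(heads) = 0.801
  • PB(heads) = 0.413
  • The learned labels are swapped relative to the real coins: EM recovers both biases but cannot tell which hidden coin is "A"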
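The slides do not say how the coins are chosen. Given the Baum-Welch slide, one natural reading is a two-state HMM whose hidden state is the coin currently in use. Below is a minimal numpy sketch under that assumption: `simulate_coins`, the switching probability `p_switch` and the scaled forward-backward layout are hypothetical choices, not the presenter's code.

```python
import numpy as np


def simulate_coins(n, p_heads=(0.4, 0.8), p_switch=0.1, seed=0):
    """Hypothetical generator: a hidden coin (0 = A, 1 = B) that switches with
    probability p_switch before each flip. Returns 0/1 flips (1 = heads)."""
    rng = np.random.default_rng(seed)
    coin = int(rng.integers(2))
    flips = np.empty(n, dtype=int)
    for t in range(n):
        if rng.random() < p_switch:
            coin = 1 - coin
        flips[t] = rng.random() < p_heads[coin]
    return flips


def baum_welch(obs, A, B, pi, n_iter=100, tol=1e-8):
    """Baum-Welch (EM) for a discrete HMM using scaled forward-backward.

    obs: symbols 0..S-1; A: (K, K) transitions; B: (K, S) emissions; pi: (K,) start.
    Returns re-estimated (A, B, pi) and the log-likelihood computed in each E-step.
    """
    obs = np.asarray(obs)
    A, B, pi = (np.array(p, dtype=float) for p in (A, B, pi))
    T, K = len(obs), len(pi)
    history = []
    for _ in range(n_iter):
        Bo = B[:, obs].T                         # Bo[t, k] = P(obs[t] | state k)
        # E-step, forward pass; c[t] rescales alpha[t] so it sums to 1
        alpha, c = np.zeros((T, K)), np.zeros(T)
        alpha[0] = pi * Bo[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * Bo[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Backward pass, rescaled with the same constants
        beta = np.ones((T, K))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (Bo[t + 1] * beta[t + 1]) / c[t + 1]
        gamma = alpha * beta                     # gamma[t, k] = P(s_t = k | obs)
        # xi[t, k, l] = P(s_t = k, s_{t+1} = l | obs)
        xi = alpha[:-1, :, None] * A[None] * (Bo[1:] * beta[1:])[:, None, :] / c[1:, None, None]
        history.append(np.log(c).sum())          # log P(obs) under the current parameters
        # M-step: normalized expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == s].sum(axis=0) for s in range(B.shape[1])], axis=1)
        B /= gamma.sum(axis=0)[:, None]
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return A, B, pi, history
```

To mirror the slide, one could call `baum_welch(simulate_coins(10_000), [[0.9, 0.1], [0.1, 0.9]], [[0.49, 0.51], [0.51, 0.49]], [0.5, 0.5])`, where each row of B is (tails, heads) and the heads column starts at 0.51 and 0.49. Because the two hidden states are interchangeable, learned state 0 may end up matching either real coin, which is the swap visible in the slide's learned model.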

Toy Profile Example
• Create a profile for the following sequences:
  • ADACGIH
  • ADAGIH
  • ADACGH
  • AACQH
  • ADAYGIH
• Use the profile to align the sequences

Results
Alignment:
ADACGIH
ADA-GIH
ADACG-H
A-ACQ-H
ADAYGIH

Match state emission probabilities:
Match1: A 100%
Match2: D 100%
Match3: A 100%
Match4: C 75%, Y 25%
Match5: G 80%, Q 20%
Match6: I 62%, H 38%
Match7: H 100%
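A quick way to sanity-check a profile against its alignment is to count residues in each column, ignoring gaps. The sketch below does only that (the helper name `column_profile` is hypothetical); it is not how the profile was trained.

```python
from collections import Counter

ALIGNMENT = ["ADACGIH", "ADA-GIH", "ADACG-H", "A-ACQ-H", "ADAYGIH"]


def column_profile(alignment, gap="-"):
    """Residue frequencies for each alignment column, ignoring gap characters."""
    profile = []
    for column in zip(*alignment):               # iterate over columns, not rows
        counts = Counter(r for r in column if r != gap)
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.most_common()})
    return profile


for j, freqs in enumerate(column_profile(ALIGNMENT), start=1):
    print(f"Match{j}:", ", ".join(f"{r} {p:.0%}" for r, p in freqs.items()))
```

Raw counts reproduce Match4 (C 75%, Y 25%) and Match5 (G 80%, Q 20%), but column 6 gives I 100%, so the slide's Match6 entry (I 62%, H 38%) is presumably the trained model's emission probabilities rather than plain column frequencies.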

Clustering With A Mixture Of Profiles
• Given N protein sequences $x_1, \dots, x_N$
• Construct M profile HMMs $H_1, \dots, H_M$ and a mapping $F: \{x_1, \dots, x_N\} \to \{H_1, \dots, H_M\}$ such that $\prod_{i=1}^{N} P(x_i \mid F(x_i))$ is maximized
• F is a natural clustering of the protein sequences into M groups
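The slide does not say how F and the profiles are found together. One natural reading is hard-assignment EM, much like k-means: set F(x) to the profile that gives x the highest likelihood, refit each profile on the sequences assigned to it, and repeat until F stops changing. To stay short, this sketch swaps the profile HMMs for gapless position-specific models on equal-length sequences; `fit_pssm`, `cluster`, the 4-letter alphabet and the seeding scheme are hypothetical stand-ins.

```python
import math

ALPHABET = "ACGT"  # hypothetical alphabet for the toy example


def fit_pssm(seqs, length, pseudocount=1.0):
    """Gapless stand-in for a profile: per-position residue probabilities with pseudocounts."""
    model = []
    for j in range(length):
        counts = {a: pseudocount for a in ALPHABET}
        for s in seqs:
            counts[s[j]] += 1
        total = sum(counts.values())
        model.append({a: c / total for a, c in counts.items()})
    return model


def log_prob(seq, model):
    """log P(seq | model) for the gapless position-specific model."""
    return sum(math.log(col[r]) for col, r in zip(model, seq))


def cluster(seqs, seeds, max_iter=20):
    """Alternate F(x) = argmax_k P(x | H_k) with refitting each H_k on its members.

    seeds: indices of sequences used to initialize one model each.
    """
    length = len(seqs[0])
    models = [fit_pssm([seqs[i]], length) for i in seeds]
    assign = None
    for _ in range(max_iter):
        new = [max(range(len(models)), key=lambda k: log_prob(x, models[k])) for x in seqs]
        if new == assign:                        # F unchanged: converged
            break
        assign = new
        models = [fit_pssm([x for x, a in zip(seqs, assign) if a == k], length)
                  for k in range(len(models))]
    return assign, models


toy_seqs = ["AAAA", "AAAT", "AATA", "CCCC", "CCCG", "CCGC"]
print(cluster(toy_seqs, seeds=[0, 3])[0])
```

A soft variant would weight each sequence by its posterior membership in every profile instead of committing to one, which is the usual EM treatment of a mixture model.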
