1. Bioinformatic Applications of Hidden Markov Models
Craig A. Struble, Ph.D.
Department of Mathematics, Statistics, and Computer Science
Marquette University
2. Overview (MSCS230: Bioinformatics I)
The Dishonest Casino Solution
Model Training
Pairwise Alignment
Profile HMMs for sequence families, MSA
Gene Prediction
3. The Dishonest Casino
The dynamic programming algorithm that finds the most probable state path is called the Viterbi algorithm.
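As a rough illustration of the Viterbi recursion on the dishonest casino, the sketch below decodes a string of die rolls into fair (F) and loaded (L) states. The start, transition, and emission probabilities are assumptions based on commonly used textbook values for this example, not parameters given on the slides, and the function name `viterbi` is hypothetical.

```python
import math

STATES = ("F", "L")  # F = fair die, L = loaded die

# Assumed parameters (common textbook values, not taken from the slides).
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {face: 1 / 6 for face in "123456"},
    "L": {**{face: 1 / 10 for face in "12345"}, "6": 1 / 2},
}


def viterbi(rolls):
    """Return the most probable state path for a non-empty string of rolls."""
    # v[s] holds the log-probability of the best path that ends in state s.
    v = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    back = []  # one back-pointer table per roll after the first
    for roll in rolls[1:]:
        prev_v, v, ptr = v, {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            best = max(STATES, key=lambda p: prev_v[p] + math.log(TRANS[p][s]))
            ptr[s] = best
            v[s] = prev_v[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][roll])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


decoded = viterbi("1234512345" + "6666666666")
```

Working in log space avoids numerical underflow on long roll sequences; with the assumed parameters, a long run of sixes is decoded as loaded.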
4. Model Training
Estimation of model parameters from data
State sequence is known
  Count the number of times each transition/emission is used
  Estimate each probability as the ratio of that count to the total number of transitions/emissions from the same state
State sequence is unknown
  Baum-Welch algorithm [1972]
  Iterative procedure: make an initial estimate, consider probable paths, update parameters, repeat
  An instance of expectation maximization (EM)
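For the case where the state sequence is known, the counting procedure can be sketched as follows. The function name `estimate_parameters` and the toy rolls and labels are hypothetical, chosen only to illustrate the ratio estimate; no pseudocounts are applied, so unseen transitions or emissions simply do not appear in the result.

```python
from collections import Counter, defaultdict


def estimate_parameters(symbols, states):
    """Estimate transition and emission probabilities from a labeled sequence."""
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    # Count each emission: which symbol was produced in which state.
    for symbol, state in zip(symbols, states):
        emit_counts[state][symbol] += 1
    # Count each transition between consecutive states.
    for prev, nxt in zip(states, states[1:]):
        trans_counts[prev][nxt] += 1

    def normalize(counts):
        # Ratio of each specific count to the total count out of that state.
        return {
            src: {key: n / sum(row.values()) for key, n in row.items()}
            for src, row in counts.items()
        }

    return normalize(trans_counts), normalize(emit_counts)


# Toy labeled data (hypothetical): F = fair die, L = loaded die.
rolls = "316664"
labels = "FFLLLF"
trans, emit = estimate_parameters(rolls, labels)
```

Baum-Welch follows the same ratio idea, but replaces the observed counts with expected counts computed over all probable paths under the current parameters.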
5. Pairwise Alignment
Global alignment
6. Pairwise Alignment
Local alignment
7. Profile HMMs
HMM for consensus sequences
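One way to see how a profile HMM captures a consensus is to estimate the match-state emission probabilities from the columns of a multiple alignment. The sketch below is a simplification under stated assumptions: every column is treated as a match column, insert and delete states are ignored, and a Laplace pseudocount of 1 is used. The function names and the toy DNA alignment are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def match_emissions(alignment, pseudocount=1.0):
    """Return one emission distribution per alignment column."""
    profile = []
    for column in zip(*alignment):
        # Gaps ('-') carry no emission information, so they are skipped.
        counts = Counter(c for c in column if c != "-")
        total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def consensus(profile):
    """Pick the most probable symbol from each match state."""
    return "".join(max(col, key=col.get) for col in profile)


# Toy alignment (hypothetical), all rows of equal length.
alignment = ["ACGT", "ACGA", "AC-T", "TCGT"]
profile = match_emissions(alignment)
consensus_seq = consensus(profile)
```

The pseudocount keeps symbols that never appear in a column from receiving probability zero, which matters when scoring new sequences against the profile.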
8. Profile HMMs
9. Profile HMM Applications
Recognize structural elements
Multiple sequence alignments
Database searching
10. Gene Prediction
GenScan
C. Burge and S. Karlin, Prediction of complete gene structures in human genomic DNA, J Mol Biol 1997 Apr 25;268(1):78-94
Hidden Markov Model of gene structure
http://genes.mit.edu/GENSCAN.html
11. GenScan
The 3' to 5' side is the same, but all arrows are inverted.
Legend
F - 5' UTR
T - 3' UTR
N - intergenic region
E - Exon
I - Intron (these are split up based on the codon position of intron termination)
12. GenScan Training
Trained on a set of annotated human genes
Labeled state sequence
What if you wanted to use GenScan for mouse? rat? fish? yeast?