Multiple Sequence Alignment
Urmila Kulkarni-Kale
Bioinformatics Centre, University of Pune
urmila@bioinfo.ernet.in
Approaches to MSA
• Dynamic programming
• Progressive alignment: ClustalW
• Genetic algorithms: SAGA
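Exact dynamic programming extends pairwise alignment to an N-dimensional table, which is why it is rarely practical beyond a handful of sequences. The sketch below only counts the work, assuming N sequences of equal length L: the table has (L+1)^N cells, and each cell considers 2^N - 1 possible moves (every non-empty subset of sequences advancing by one residue). `dp_msa_cost` is a hypothetical helper written for illustration.

```python
def dp_msa_cost(num_seqs, length):
    """Return (cells, predecessor_checks) for exact N-dimensional DP alignment.

    Simplifying assumption: all sequences have the same length L.
    cells:              (L + 1) ** N entries in the DP hypercube
    predecessor_checks: each cell examines 2 ** N - 1 possible moves
    """
    cells = (length + 1) ** num_seqs
    moves_per_cell = 2 ** num_seqs - 1
    return cells, cells * moves_per_cell


# Growth of the table for sequences of length 150 as N increases.
for n in (2, 3, 4, 6):
    cells, checks = dp_msa_cost(n, 150)
    print(f"N={n}: {cells:.3e} cells, {checks:.3e} predecessor checks")
```

The table size grows exponentially with N, which motivates heuristic approaches such as progressive alignment and genetic algorithms.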
Progressive alignment approach
• Align the most closely related sequences first
• Then add progressively less related sequences to this initial alignment
Steps:
• Perform pairwise alignments of all sequences
• Use the alignment scores to produce a phylogenetic (guide) tree
• Align the sequences sequentially, in the order given by the tree
• In progressive methods, gaps are added to an existing profile (the alignment built so far)
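The "most related first" ordering can be illustrated with a small sketch. It repeatedly joins the two closest groups using average linkage (a UPGMA-style rule), which is a simpler stand-in for the neighbor-joining guide tree that ClustalW actually builds. The function name `merge_order` and the toy distances are hypothetical.

```python
from itertools import combinations


def merge_order(names, dist):
    """Return the order in which groups would be aligned, closest first.

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> distance between sequences a and b
    Average linkage (UPGMA-style) is used here as a stand-in for a guide tree.
    """
    groups = [(name,) for name in names]  # each group is a tuple of labels

    def group_dist(g1, g2):
        # mean of all pairwise distances between members of the two groups
        pairs = [dist[frozenset((a, b))] for a in g1 for b in g2]
        return sum(pairs) / len(pairs)

    steps = []
    while len(groups) > 1:
        g1, g2 = min(combinations(groups, 2), key=lambda p: group_dist(*p))
        groups.remove(g1)
        groups.remove(g2)
        groups.append(g1 + g2)  # the merged group is aligned as one profile
        steps.append((g1, g2))
    return steps


# Hypothetical distances: A and B are close relatives, C and D less so.
toy_dist = {
    frozenset(("A", "B")): 0.1, frozenset(("A", "C")): 0.4,
    frozenset(("A", "D")): 0.5, frozenset(("B", "C")): 0.4,
    frozenset(("B", "D")): 0.5, frozenset(("C", "D")): 0.3,
}
for left, right in merge_order(["A", "B", "C", "D"], toy_dist):
    print("align", left, "with", right)
```

A and B are aligned first, then C and D, and finally the two profiles are aligned to each other.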
Steps in the ClustalW algorithm
• Pairwise alignment: calculate the distance matrix
• Unrooted neighbor-joining (NJ) tree
• Rooted NJ tree
• Sequence weights
• Progressive alignment using the guide tree
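The first step, pairwise alignment followed by a distance matrix, might look like this in simplified form. The sketch assumes a plain Needleman-Wunsch global alignment with a linear gap penalty and toy scores (match +1, mismatch -1, gap -2) rather than ClustalW's substitution matrices and affine gaps. It takes the distance as 1 minus the fraction of identical residues over gap-free columns. All function names are hypothetical, and the neighbor-joining step that follows is not shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns the aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_distance(x, y):
    """1 - fraction of identical residues over columns where neither row has a gap."""
    cols = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not cols:
        return 1.0
    same = sum(p == q for p, q in cols)
    return 1.0 - same / len(cols)


def distance_matrix(seqs):
    """Align every pair in `seqs` (name -> sequence) and return pairwise distances."""
    names = list(seqs)
    dist = {}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            x, y = needleman_wunsch(seqs[u], seqs[v])
            dist[frozenset((u, v))] = identity_distance(x, y)
    return dist


toy_seqs = {"s1": "ACGTTGCA", "s2": "ACGTGCA", "s3": "TCGATGCA"}  # hypothetical
for pair, d in distance_matrix(toy_seqs).items():
    print("-".join(sorted(pair)), round(d, 3))
```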
ClustalW: sequence weights
• Groups of closely related sequences receive lower weights
• Highly divergent sequences without any close relatives receive higher weights
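The weighting idea can be sketched with the tree-based scheme commonly described for ClustalW: each branch length of the rooted guide tree is shared equally among the sequences below it, and a sequence's weight is the sum of those shares along the path from the root to its leaf. The `Node` class, the toy tree, and the omission of any final normalization are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    length: float                        # length of the branch above this node
    name: Optional[str] = None           # leaf label; None for internal nodes
    children: List["Node"] = field(default_factory=list)


def leaf_count(node):
    """Number of sequences (leaves) below this node."""
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children)


def sequence_weights(root):
    """Sum, from root to leaf, of each branch length divided by the leaves sharing it."""
    weights = {}

    def walk(node, acc):
        acc += node.length / leaf_count(node)  # share this branch equally
        if not node.children:
            weights[node.name] = acc
        for child in node.children:
            walk(child, acc)

    walk(root, 0.0)
    return weights


# Hypothetical tree: A and B are close relatives, C is a divergent outlier.
toy_tree = Node(0.0, children=[
    Node(0.1, children=[Node(0.05, "A"), Node(0.05, "B")]),
    Node(0.4, "C"),
])
print(sequence_weights(toy_tree))
```

Because A and B share the branch above them, each receives only half of it, whereas C keeps its long branch to itself and so ends up with the largest weight.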
ClustalW: affine gap penalty
• GOP: gap opening penalty
• GEP: gap extension penalty
Heuristics for calculating position-specific gap penalties:
• Is there already a gap at this position?
  • Yes: lower the GOP and GEP
  • No, but there is a gap within 8 residues: increase the GOP
• Is the position in a stretch of hydrophilic residues?
  • Yes: lower the GOP
  • No: use residue-specific gap propensities
Once a gap, always a gap: gaps introduced at earlier alignment steps are kept in all later steps.
[Figure: variation in local GOP relative to the initial GOP; highest GOP in 'gapped regions', lowest GOP in hydrophilic regions]
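The gap-penalty heuristics above can be read as a simple chain of checks, and the sketch below follows that reading. Only the 8-residue window comes from the slide. The reduction and increase factors, the hydrophilic residue set, the minimum stretch length of 5, and the propensity table are placeholder assumptions, not ClustalW's exact values, and the real program may combine these rules differently. The affine convention (GOP for the first gap position, GEP for each additional one) is also an assumption.

```python
# Illustrative constants: only WINDOW (8 residues) comes from the slide.
LOWER = 0.3                      # hypothetical reduction factor
RAISE = 2.0                      # hypothetical increase factor
WINDOW = 8                       # "gap within 8 residues"
STRETCH = 5                      # assumed minimum hydrophilic run length
HYDROPHILIC = set("GPSNDQEKR")   # assumed hydrophilic residue set


def affine_gap_cost(length, gop, gep):
    """Cost of one gap of `length` residues: GOP once, then GEP per extra residue."""
    if length <= 0:
        return 0.0
    return gop + gep * (length - 1)


def in_hydrophilic_stretch(pos, residues):
    """True if residues[pos] lies in a run of at least STRETCH hydrophilic residues."""
    if residues[pos] not in HYDROPHILIC:
        return False
    start = end = pos
    while start > 0 and residues[start - 1] in HYDROPHILIC:
        start -= 1
    while end + 1 < len(residues) and residues[end + 1] in HYDROPHILIC:
        end += 1
    return end - start + 1 >= STRETCH


def local_penalties(pos, residues, gap_positions, gop, gep, propensity):
    """Return the (GOP, GEP) pair to use at column `pos`.

    residues:      one residue per column (e.g. a consensus), without gaps
    gap_positions: set of columns that already contain a gap in the profile
    propensity:    residue -> GOP multiplier (hypothetical values)
    """
    if pos in gap_positions:
        # gap already here: cheaper to place further gaps in the same column
        return gop * LOWER, gep * LOWER
    if any(abs(pos - g) <= WINDOW for g in gap_positions):
        # near an existing gap: discourage opening a new gap close by
        return gop * RAISE, gep
    if in_hydrophilic_stretch(pos, residues):
        # hydrophilic stretch: opening a gap is made cheaper
        return gop * LOWER, gep
    # otherwise scale by a residue-specific propensity (default 1.0)
    return gop * propensity.get(residues[pos], 1.0), gep


print(affine_gap_cost(3, gop=10.0, gep=0.2))
print(local_penalties(22, "L" * 20 + "GSNDQ" + "L" * 5, {0}, 10.0, 0.2, {}))
```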
MSA helps detect similarity
Example: alignment of hemoglobin from human, chimpanzee, goat, pig, horse and mouse
Applications of MSA
• Detecting diagnostic patterns
• Phylogenetic analysis
• Primer design
• Prediction of protein secondary structure
• Finding novel relationships between genes
  • Similar genes are conserved across organisms
  • and have the same or similar function
• Simultaneous alignment of similar genes reveals:
  • regions subject to mutation
  • regions of conservation
  • mutations or rearrangements that cause changes in conformation or function
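Finding conserved and variable regions amounts to examining the aligned columns. A minimal sketch, assuming the alignment is a list of equal-length strings with '-' for gaps: the most common residue in each column and its frequency give a simple conservation measure and a consensus string. The function names and the toy alignment are hypothetical; real tools usually score columns with substitution-aware measures rather than raw identity.

```python
from collections import Counter


def column_profile(alignment):
    """For each column, return (most common character, fraction of rows carrying it)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment)
        residue, n = counts.most_common(1)[0]
        profile.append((residue, n / len(alignment)))
    return profile


def conserved_columns(alignment, threshold=1.0):
    """Indices of columns whose most common residue (not a gap) reaches `threshold`."""
    return [i for i, (res, frac) in enumerate(column_profile(alignment))
            if res != "-" and frac >= threshold]


def consensus(alignment, threshold=0.5):
    """Consensus string; '.' marks variable columns below `threshold`."""
    return "".join(res if frac >= threshold else "."
                   for res, frac in column_profile(alignment))


toy_alignment = ["ACDEFG",   # hypothetical aligned sequences
                 "ACDQFG",
                 "ACNEF-",
                 "ACDEYG"]
print(conserved_columns(toy_alignment))
print(consensus(toy_alignment, threshold=0.8))
```

Fully conserved columns point to regions of conservation, while columns that fall below the threshold mark positions subject to mutation.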
Limitations of the progressive alignment approach
• Greedy nature
• Any errors in the initial alignments are carried through to the final alignment
• Works better for closely related sequences than for divergent sequences