
Bioinformatics


rrafferty

Presentation Transcript


  1. Bioinformatics Multiple Alignment

  2. Overview
     • Introduction to multiple alignments
     • Global multiple alignment
        • Introduction
        • Scoring
        • Algorithms

  3. Algorithms
     [Figure: overview of algorithms: Multiple Alignment, HMM, Dynamic Programming, Heuristic Searches, Pattern recognition, Motif Searches, Database searches; Chapter 2]

  4. Introduction
     • Global multiple alignment (ClustalW)
        • Proteins, nucleotides
        • Long stretches of conservation essential
        • Identification of protein family profiles
        • Score gaps
     • Local multiple alignments (motif detection, profile construction)
        • Proteins, nucleotides
        • Short stretches of conservation (12 NT, 6 AA)
        • Identification of regulatory motifs (DNA, protein)
        • No explicit gap scoring
        • Explicit use of a profile

  5. Introduction
     • Evolution
        • duplication
        • speciation
     [Figure: primary sequence → homologs in related organisms → families of proteins → multiple sequence alignment → features characteristic for the whole family]

  6. Introduction
     Multiple sequence alignment: features characteristic for the protein family
     • Profile (HMM): detect remote members of the family
     • Phylogeny: reconstruct phylogenetic relationships

  7. Scoring a multiple alignment
     Assumptions:
     • Independence between columns
     • Residues within a column are independent (i.e. representative members of a sequence family should be chosen, and all evolutionary subfamilies should be represented)
     • Alignment score: sum of the scores for all the columns and gaps

  8. Scoring
     • S(a,b) is taken from a scoring matrix (PAM or BLOSUM)
     • The sum-of-pairs (SP) score is an approximation
     • For a three-way alignment of residues a, b, c it sums the pairwise scores S(a,b) + S(b,c) + S(a,c) instead of using a single joint score for a, b, c
     • SP problem: compare two columns
        • N sequences with L (score L-L is 5)
        • N-1 sequences with L and one with G (score G-L is -4)
     • [Figure: example alignment, residues "RALRTLCALRAG"]
     • The relative difference in score between the correct and the incorrect alignment decreases with the number of sequences in the alignment. Counterintuitive!
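
The SP problem on this slide can be checked numerically. The sketch below is only an illustration: it uses the two scores quoted on the slide (5 for L-L, -4 for G-L) in a toy lookup table, and the function names are hypothetical. A real scorer would read a full PAM or BLOSUM matrix.

```python
from itertools import combinations

# Toy substitution scores taken from the slide: s(L, L) = 5 and s(G, L) = -4.
# Assumption: only these two pairs are needed for the example column.
TOY_SCORES = {("L", "L"): 5, ("G", "L"): -4}


def pair_score(a, b, scores=TOY_SCORES):
    """Look up a symmetric substitution score for residues a and b."""
    return scores[tuple(sorted((a, b)))]


def sp_column_score(column, scores=TOY_SCORES):
    """Sum-of-pairs score of one column: sum over all pairs of residues."""
    return sum(pair_score(a, b, scores) for a, b in combinations(column, 2))


def relative_difference(n):
    """Relative SP score drop when one of n leucines is replaced by a glycine."""
    correct = sp_column_score("L" * n)
    incorrect = sp_column_score("L" * (n - 1) + "G")
    return (correct - incorrect) / correct


if __name__ == "__main__":
    for n in (3, 5, 10, 50):
        print(n, round(relative_difference(n), 3))
```

With these scores the correct column scores 5·N(N-1)/2 and the single G costs 9(N-1), so the relative difference is 18/(5N): it shrinks as sequences are added, even though more evidence for L should make the G look worse.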

  9. Algorithm
     Multidimensional dynamic programming
     • Tedious formalism (optimal alignment)
     • Computation of the whole dynamic programming matrix: L1 × L2 × … × LN entries
     • Maximize over all 2^N - 1 combinations of gaps in a column
     • Time complexity O(2^N L^N)
     Clever algorithm: Carrillo & Lipman (MSA)
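
To make the 2^N - 1 term concrete, the following sketch enumerates the possible column "moves" for N sequences and gives a rough work estimate. It assumes all sequences have the same length, and the function names are hypothetical. It does not implement the alignment itself.

```python
from itertools import product


def column_moves(n):
    """All ways to extend an alignment of n sequences by one column.

    Each move is a tuple of 0/1 flags: 1 consumes a residue from that
    sequence, 0 places a gap. The all-gap column is excluded, which
    leaves 2**n - 1 moves.
    """
    return [m for m in product((0, 1), repeat=n) if any(m)]


def dp_work_estimate(n, length):
    """Rough operation count: (cells in the matrix) x (moves per cell).

    Assumption: n sequences of equal length; each dimension has
    length + 1 entries because of the boundary row.
    """
    cells = (length + 1) ** n
    return cells * len(column_moves(n))


if __name__ == "__main__":
    print(len(column_moves(3)))          # number of moves for three sequences
    for n in (2, 3, 5, 10):
        print(n, dp_work_estimate(n, 100))
```

The estimate grows exponentially in the number of sequences, which is why exact multidimensional dynamic programming is only practical for a handful of sequences.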

  10. Algorithm
      Progressive alignment (“once a gap always a gap”)
      [Figure: pairwise sequence alignments → similarity matrix → progressive clustering into a guide tree (leaves A, B, C, D) → multiple sequence alignment]

  11. Algorithm

  12. Algorithm
      Progressive alignment methods
      • Hierarchical (heuristic): succession of pairwise alignments
         • Two sequences are aligned by standard pairwise alignment
         • This alignment is fixed
         • Align next sequence
      • Different algorithms differ in:
         • Order of the alignment
         • Progression:
            • Alignment of a new sequence to a growing alignment
            • Subfamilies are built up on a tree structure and alignments are aligned to alignments
         • Process used to align and score sequences to alignments
      • Heuristic approach:
         • Align most similar pairs of sequences first
         • Most similar is based on a guide tree (quick and dirty, and unsuitable for phylogenetic inference)
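
The "standard pairwise alignment" that each progressive step builds on can be sketched as a global (Needleman-Wunsch) alignment. The scoring values here (match 1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, not values from the slides; a real tool would use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment of x and y with a linear gap penalty.

    Returns (aligned_x, aligned_y, score) for one optimal alignment.
    """
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in y
                              score[i][j - 1] + gap)       # gap in x
    # Trace back from the bottom-right corner to recover one optimal path.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

In a progressive scheme the gaps introduced by this step are kept in all later steps, which is the "once a gap always a gap" rule.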

  13. Algorithm
      Disadvantage: plain pairwise alignment ignores position-specific information
      But it is advantageous to use position-specific information from an existing alignment
      • e.g. mismatches at highly conserved positions should be penalized more than mismatches at variable positions
      • e.g. gap penalties might be higher in regions that do not contain gaps than in regions that contain gaps
      PROFILE ALIGNMENT (hidden Markov models, frequency matrices)
      [Figure: C T T G T C A T G T C A C T T C A T T G]
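
A frequency matrix of the kind mentioned on this slide can be sketched as one residue-frequency table per alignment column. This is a minimal illustration with hypothetical function names and a made-up toy alignment; it treats the gap character as just another symbol and applies no pseudocounts or sequence weighting.

```python
from collections import Counter


def build_profile(alignment):
    """Return one dict of symbol frequencies per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({sym: c / len(column) for sym, c in counts.items()})
    return profile


def column_conservation(profile):
    """Frequency of the most common symbol per column (1.0 = fully conserved)."""
    return [max(col.values()) for col in profile]


if __name__ == "__main__":
    toy_alignment = ["CTTG", "CATG", "CTTG", "CT-G"]   # hypothetical example data
    print(column_conservation(build_profile(toy_alignment)))
```

A position-specific scheme could then scale mismatch or gap penalties by the conservation value, so that a mismatch in a fully conserved column costs more than one in a variable column.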

  14. PROFILE-based progressive multiple alignment: CLUSTALW
      Algorithm
      • Construct distance matrix by pairwise dynamic programming
      • Convert similarity scores to evolutionary distances
      • Construct a guide tree (clustering: neighbour joining)
      • Progressively align in order of decreasing similarity
         • Sequence-sequence
         • Sequence-profile
         • Profile-profile
      • Weighting to compensate for defects in SP
      • Closely related sequences: hard matrices (BLOSUM80); distantly related sequences: soft matrices (BLOSUM50)
      • Gap penalties adapted
         • To hydrophobicity of the residue
         • Gap-open and gap-extend penalties increased if there are no gaps in a column
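
The guide-tree step fixes the order in which sequences and profiles are aligned. The sketch below derives a merge order from a precomputed distance matrix using average-linkage (UPGMA-style) clustering. This is a simpler stand-in for the neighbour-joining method named on the slide, and the function names and the distance values in the usage example are hypothetical.

```python
from itertools import combinations


def average_distance(c1, c2, dist):
    """Mean pairwise distance between the members of clusters c1 and c2."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def guide_tree_merges(names, dist):
    """Return the pairs of clusters in the order they are merged.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    The closest pair is merged first, so the result is also the order
    in which a progressive aligner would align them.
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(pair[0], pair[1], dist))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


if __name__ == "__main__":
    d = {frozenset(k): v for k, v in {
        ("A", "B"): 0.10, ("C", "D"): 0.20, ("A", "C"): 0.60,
        ("A", "D"): 0.70, ("B", "C"): 0.65, ("B", "D"): 0.75}.items()}
    for left, right in guide_tree_merges(["A", "B", "C", "D"], d):
        print(left, "+", right)
```

In this toy example A and B are aligned first (sequence-sequence), then C and D, and finally the two resulting profiles are aligned to each other (profile-profile).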

  15. Algorithm
      • Further improvement: iterative refinement
      • Problem: in progressive alignment, subalignments are frozen
      • Solution: iterative alignment
         • Remove a sequence from the alignment and realign it
         • Repeat realignment until the alignment score converges
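
The remove-and-realign loop can be sketched as a generic driver. Both `realign` and `score` are hypothetical callables supplied by the caller (for example a sequence-to-profile aligner and an SP scorer); nothing here is taken from a specific tool, and the stopping rule is simply "no sequence improved the score in a full round".

```python
def iterative_refinement(alignment, realign, score, max_rounds=10):
    """Leave-one-out refinement of a multiple alignment.

    alignment: list of aligned strings (gaps written as '-').
    realign(rest, seq): hypothetical callable; takes the remaining rows and
        the ungapped sequence, returns (new_rest, new_row).
    score(alignment): hypothetical callable; higher is better.
    """
    best = list(alignment)
    best_score = score(best)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            seq = best[i].replace("-", "")        # remove one sequence ...
            rest = best[:i] + best[i + 1:]
            new_rest, new_row = realign(rest, seq)  # ... and realign it
            candidate = new_rest[:i] + [new_row] + new_rest[i:]
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score, improved = candidate, candidate_score, True
        if not improved:                          # score has converged
            break
    return best, best_score
```

Because a candidate is only accepted when it raises the score, the score never decreases, and with a bounded score the loop must terminate; it can still stop at a local optimum rather than the best possible alignment.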
