Genome Alignment
Alignment • Take a set of sequences. Find where they match. • Arrange sequences in a matrix where columns contain homologous (corresponding?) characters from each sequence
Types of Alignments • Global – include the entire length of all sequences in the alignment • Local – identify and align subsets of longer sequences
Alignment Methods • Needleman-Wunsch (global) and Smith-Waterman (local) use dynamic programming • Guaranteed to find an optimal alignment given a particular scoring function • Too computationally intensive for genome alignment, especially multiple genomes
Dynamic Programming • One possible simple scoring scheme: • S_{i,j} = 1 if the residue at position i of sequence #1 is the same as the residue at position j of sequence #2 (match score); otherwise • S_{i,j} = 0 (mismatch score) • w = 0 (gap penalty)
Dynamic Programming Three steps: 1) Initialize 2) Fill Matrix M_{i,j} = MAXIMUM[ M_{i-1,j-1} + S_{i,j} (match/mismatch in the diagonal), M_{i,j-1} + w (gap in sequence #1), M_{i-1,j} + w (gap in sequence #2) ]
Dynamic Programming 3) Traceback
G A A T T C A G T T A
G G A - T C - G - - A
Score = 1+0+1+0+1+1+0+1+0+0+1 = 6
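The three steps above (initialize, fill, traceback) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `needleman_wunsch` and its parameter names are made up for this example, and the initialization of the first row and column as cumulative gap penalties is an assumption, since the slides do not spell it out (with w = 0 they are simply zeros).

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=0):
    """Global alignment sketch using the slide's scoring scheme by default."""
    n, m = len(seq1), len(seq2)

    # 1) Initialize: row 0 and column 0 hold cumulative gap penalties
    #    (an assumption; all zeros when gap == 0).
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] + gap
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] + gap

    # 2) Fill: each cell is the best of diagonal, left, and up.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,   # match/mismatch
                          M[i][j - 1] + gap,     # gap in sequence #1
                          M[i - 1][j] + gap)     # gap in sequence #2

    # 3) Traceback: walk from the bottom-right corner to the top-left,
    #    preferring the diagonal move when several moves are optimal.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if M[i][j] == M[i - 1][j - 1] + s:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and M[i][j] == M[i - 1][j] + gap:
            out1.append(seq1[i - 1])   # residue of sequence #1 against a gap
            out2.append("-")
            i -= 1
        else:
            out1.append("-")           # residue of sequence #2 against a gap
            out2.append(seq2[j - 1])
            j -= 1

    return M[n][m], "".join(reversed(out1)), "".join(reversed(out2))


score, aligned1, aligned2 = needleman_wunsch("GAATTCAGTTA", "GGATCGA")
print(score)
print(aligned1)
print(aligned2)
```

For the two sequences in the traceback slide, the score is 6. The printed alignment may differ from the one shown, because several alignments are equally optimal under this scoring scheme and the tie-breaking order here is arbitrary.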
Genome Alignment • Depending on level of similarity, genome alignments may need to contend with rearrangements and large-scale duplications and deletions • Draft or partial genomes can both benefit from and confound alignment • Need to visualize results in summary form
Genome Alignment • Pair-wise • Align two genomes • Example: MUMmer • Multiple or complex samples and a reference genome • All of one genome plus whatever parts match from the other genome(s) • Example: PIPs • Multiple alignment • All of all the genomes • Example: Mauve
MUMmer (Maximal Unique Match) http://mummer.sourceforge.net/ • Fast pair-wise comparison of draft or complete genomes using nucleotide or 6-frame translated sequences • MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer
Suffix Tree Delcher et al. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002 Jun 1;30(11):2478-83.
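To make the idea of a maximal unique match concrete, here is a brute-force Python sketch. It is not the suffix-tree algorithm from Delcher et al. and is far too slow for real genomes. It assumes a MUM is an exact match that occurs exactly once in each sequence and cannot be extended in either direction, and it looks at the forward strand only. The names `occurs_once`, `maximal_unique_matches`, and `min_len` are hypothetical.

```python
def occurs_once(text, s):
    """True if s occurs exactly once in text (overlapping hits counted)."""
    first = text.find(s)
    return first != -1 and text.find(s, first + 1) == -1


def maximal_unique_matches(a, b, min_len=20):
    """Return (start_in_a, start_in_b, length) for each MUM.

    Quadratic brute force for illustration only; MUMmer itself uses a
    suffix tree to do this efficiently.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            s = a[i:i + length]
            if length >= min_len and occurs_once(a, s) and occurs_once(b, s):
                mums.append((i, j, length))
    return mums


# Toy example with a short minimum length.
print(maximal_unique_matches("GATTACAGG", "CCATTACATT", min_len=4))
```

Each reported triple corresponds to one diagonal segment in a dot plot of the two sequences.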
MUMmer plot (figure; axes: Genome 1 and Genome 2)
5 Campylobacter PROmer analysis • Fouts et al. Major structural differences and novel potential virulence mechanisms from the genomes of multiple campylobacter species. PLoS Biol. 2005 Jan;3(1):e15. • One genome is used as the x-axis for all four pair-wise comparisons • X-shape characteristic of collinearity interrupted by inversions around the origin or terminus of replication • Loss of collinearity in more distant comparisons
Human Gut metagenome • Percent Identity Plot (PIP) of random shotgun reads to a complete Bifidobacterium genome and a good quality draft Methanobrevibacter genome • Gill et al. Metagenomic analysis of the human distal gut microbiome. Science. 2006 Jun 2;312(5778):1355-9.
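The quantity plotted in a PIP can be sketched as follows. This is a minimal illustration under stated assumptions: percent identity is taken here as identical columns divided by all alignment columns (other conventions exclude gap columns), and the function names `percent_identity` and `pip_points`, as well as the input tuple layout, are hypothetical rather than taken from any particular tool.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Both inputs are gapped strings of equal length, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / columns


def pip_points(hits):
    """Turn read alignments into (reference position, percent identity) points.

    hits: iterable of (ref_start, aligned_read, aligned_ref) tuples,
    a made-up layout for this sketch.
    """
    return [(ref_start, percent_identity(read, ref))
            for ref_start, read, ref in hits]


hits = [(100, "ACGTACGT", "ACGTACGT"),
        (250, "ACG-ACGT", "ACGTACCT")]
print(pip_points(hits))
```

Plotting these points with reference position on the x-axis and percent identity on the y-axis gives the summary view described in the caption above.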
Mauve Multiple Genome Aligner • Able to identify and align collinear regions of multiple genomes even in the presence of rearrangements • Find and extend seed matches • Group into locally collinear blocks • Align intervening regions • Darling et al. Genome Res. 2004 Jul;14(7):1394-403.
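The "group into locally collinear blocks" step can be illustrated for the two-genome case with a short Python sketch. This is a simplification and not Mauve's actual procedure, which handles multiple genomes and filters low-weight blocks. It assumes each seed match is already reduced to a hypothetical `(pos1, pos2, strand)` tuple, with strand +1 or -1, and that the tuples are distinct.

```python
def collinear_blocks(anchors):
    """Group anchors into runs that keep the same order in both genomes.

    anchors: list of distinct (pos1, pos2, strand) tuples, strand is +1 or -1.
    Two neighbouring anchors (by genome 1 order) stay in the same block when
    they share a strand and are also neighbours in genome 2, in increasing
    order for +1 and decreasing order for -1.
    """
    by_genome1 = sorted(anchors, key=lambda a: a[0])
    rank2 = {a: r for r, a in enumerate(sorted(anchors, key=lambda a: a[1]))}

    blocks = []
    for anchor in by_genome1:
        if blocks:
            prev = blocks[-1][-1]
            same_strand = anchor[2] == prev[2]
            if same_strand and rank2[anchor] - rank2[prev] == anchor[2]:
                blocks[-1].append(anchor)
                continue
        blocks.append([anchor])   # a rearrangement breakpoint starts a new block
    return blocks


anchors = [(10, 100, 1), (20, 110, 1),      # collinear, forward
           (30, 300, -1), (40, 290, -1),    # inverted segment
           (50, 200, 1)]                    # back to forward
for block in collinear_blocks(anchors):
    print(block)
```

In this toy input the inverted segment forms its own block, which is the kind of rearrangement a purely collinear aligner would miss.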
Progressive Mauve alignment of 12 E. coli genomes • Aaron Darling, 2006 Ph.D. thesis, http://gel.ahabs.wisc.edu/~darling/darling_thesis.pdf