Genome Alignment

Presentation Transcript


  1. Genome Alignment

  2. Alignment • Take a set of sequences. Find where they match. • Arrange sequences in a matrix where columns contain homologous (corresponding?) characters from each sequence

  3. Types of Alignments • Global – include the entire length of all sequences in the alignment • Local – identify and align subsets of longer sequences

  4. Alignment Methods • Needleman-Wunsch (global) and Smith-Waterman (local) use dynamic programming • Guaranteed to find an optimal alignment given a particular scoring function • Too computationally intensive for genome alignment, especially multiple genomes
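A rough back-of-the-envelope calculation shows why full dynamic programming does not scale to genomes. The sketch below counts the cells of the full scoring matrix for two 5-megabase sequences (the genome size quoted in the MUMmer timing later in these slides). The function name and the 4 bytes per cell are assumptions made for illustration only.

```python
def dp_matrix_cost(len_a: int, len_b: int, bytes_per_cell: int = 4):
    """Return (cell count, memory in gigabytes) for a full
    (len_a + 1) x (len_b + 1) dynamic programming matrix.

    bytes_per_cell is an illustrative assumption, not a property
    of any particular alignment program.
    """
    cells = (len_a + 1) * (len_b + 1)
    gigabytes = cells * bytes_per_cell / 1e9
    return cells, gigabytes


# Two 5-megabase genomes: roughly 2.5e13 cells to fill.
cells, gb = dp_matrix_cost(5_000_000, 5_000_000)
print(f"{cells:.2e} cells, about {gb:,.0f} GB at {4} bytes per cell")
```

Both time and (for the full-matrix version) memory grow with the product of the two sequence lengths, which is why genome aligners rely on faster exact-match seeding instead.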

  5. Dynamic Programming • One possible simple scoring scheme: • $S_{i,j} = 1$ if the residue at position $i$ of sequence #1 is the same as the residue at position $j$ of sequence #2 (match score); otherwise $S_{i,j} = 0$ (mismatch score) • $w = 0$ (gap penalty)

  6. Dynamic Programming • Three steps: 1) Initialize 2) Fill matrix: $M_{i,j} = \max\{\, M_{i-1,j-1} + S_{i,j},\; M_{i,j-1} + w,\; M_{i-1,j} + w \,\}$, where the three terms correspond to a match/mismatch on the diagonal, a gap in sequence #1, and a gap in sequence #2, respectively

  7. Dynamic Programming • 3) Traceback
     G A A T T C A G T T A
     G G A - T C - G - - A
     Score = 1+0+1+0+1+1+0+1+0+0+1 = 6
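The three steps (initialize, fill, traceback) can be sketched in a few lines of Python. This is a minimal illustration of global alignment with the simple scoring scheme from the slides (match = 1, mismatch = 0, gap = 0), not the code of any particular tool; the function name and the tie-breaking order in the traceback are assumptions. Because several alignments can share the optimal score, the aligned strings it returns may differ from the one shown on the slide, although the score is the same.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=0):
    """Global alignment by dynamic programming.

    Returns (score, aligned_seq1, aligned_seq2).
    """
    n, m = len(seq1), len(seq2)

    # 1) Initialize: first row and column hold cumulative gap penalties.
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * gap
    for j in range(1, m + 1):
        M[0][j] = j * gap

    def s(i, j):
        # Score for pairing residue i of seq1 with residue j of seq2 (1-based).
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # 2) Fill: each cell is the best of diagonal, left, and up moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + s(i, j),  # match/mismatch (diagonal)
                M[i][j - 1] + gap,          # gap in sequence #1
                M[i - 1][j] + gap,          # gap in sequence #2
            )

    # 3) Traceback: walk from the bottom-right corner back to the origin.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s(i, j):
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and M[i][j] == M[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-")   # gap in sequence #2
            i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1])   # gap in sequence #1
            j -= 1
    return M[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("GAATTCAGTTA", "GGATCGA")
print(score)   # 6, the same score as the slide's alignment
print(top)
print(bottom)
```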

  8. Genome Alignment • Depending on level of similarity, genome alignments may need to contend with rearrangements and large-scale duplications and deletions • Draft or partial genomes can both benefit from and confound alignment • Need to visualize results in summary form

  9. Genome Alignment • Pair-wise: align two genomes (example: MUMmer) • Multiple or complex samples and a reference genome: all of one genome plus whatever parts match from the other genome(s) (example: PIPs) • Multiple alignment: all of all the genomes (example: Mauve)

  10. MUMmer (Maximal Unique Match) http://mummer.sourceforge.net/ • Fast pair-wise comparison of draft or complete genomes using nucleotide or 6-frame translated sequences • MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer

  11. Suffix Tree • Delcher et al. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002 Jun 1;30(11):2478-83.
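To make the idea of a maximal unique match concrete, here is a brute-force sketch that finds exact matches which cannot be extended in either direction and which occur exactly once in each sequence. It deliberately does not use a suffix tree, so it is far slower than MUMmer and is only suitable for toy inputs; it also looks at the forward strand only. The function names and the `min_len` parameter are hypothetical.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=4):
    """Return a list of (start_in_a, start_in_b, length) for maximal
    exact matches of at least min_len that are unique in both a and b.

    Brute force for illustration; not the suffix-tree algorithm.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if count_overlapping(a, match) == 1 and count_overlapping(b, match) == 1:
                mums.append((i, j, length))
    return mums


print(find_mums("GATTACAGG", "CCGATTACATT"))  # [(0, 2, 7)]
print(find_mums("ACGTACGT", "ACGT"))          # [] because ACGT repeats in the first sequence
```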

  12. MUMmer plot [Figure: axes labeled Genome 1 and Genome 2]

  13. PROmer analysis of 5 Campylobacter genomes • Fouts et al. Major structural differences and novel potential virulence mechanisms from the genomes of multiple Campylobacter species. PLoS Biol. 2005 Jan;3(1):e15. • One genome is used as the x-axis for all four pair-wise comparisons • X-shape characteristic of collinearity interrupted by inversions around the origin or terminus of replication • Loss of collinearity in more distant comparisons

  14. Human Gut Metagenome • Percent Identity Plot (PIP) of random shotgun reads to a complete Bifidobacterium genome and a good-quality draft Methanobrevibacter genome • Gill et al. Metagenomic analysis of the human distal gut microbiome. Science. 2006 Jun 2;312(5778):1355-9.
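A percent identity plot places each aligned read at its position on the reference and uses its percent identity as the height. The value itself can be computed from a pairwise alignment as sketched below. The function name is hypothetical, and the choice of denominator (all alignment columns, gaps included) is an assumption; tools differ on this convention.

```python
def percent_identity(aligned_read, aligned_ref):
    """Percent of alignment columns in which both sequences carry the
    same residue. Gap columns count toward the denominator (an
    assumed convention) but never count as matches.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_read)
    if columns == 0:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_read, aligned_ref) if x == y and x != "-"
    )
    return 100.0 * matches / columns


# The alignment from the traceback slide: 6 matches over 11 columns.
print(round(percent_identity("GAATTCAGTTA", "GGA-TC-G--A"), 1))  # 54.5
```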

  15. Mauve Multiple Genome Aligner • Able to identify and align collinear regions of multiple genomes even in the presence of rearrangements • Find and extend seed matches • Group into locally collinear blocks • Align intervening regions • Darling et al. Genome Res. 2004 Jul;14(7):1394-403.
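The step "group into locally collinear blocks" can be illustrated with a much-simplified sketch for two genomes. It is not Mauve's procedure: it only chains seed matches that are neighbors in both genomes and share an orientation, so that each rearrangement breakpoint starts a new block. The input format, a list of distinct `(pos1, pos2, strand)` tuples with strand `+1` or `-1`, and the function name are assumptions made for illustration.

```python
def group_collinear(matches):
    """Group seed matches into runs that are collinear in both genomes.

    matches: distinct (pos_in_genome1, pos_in_genome2, strand) tuples,
    with strand +1 (forward) or -1 (reverse). Hypothetical format.
    """
    by_g1 = sorted(matches, key=lambda m: m[0])
    # Rank of every match along genome 2.
    rank2 = {m: r for r, m in enumerate(sorted(matches, key=lambda m: m[1]))}

    blocks = []
    for m in by_g1:
        if blocks:
            prev = blocks[-1][-1]
            # Same orientation, and adjacent in genome 2 in the direction
            # that orientation implies (+1 forward, -1 reverse).
            if m[2] == prev[2] and rank2[m] - rank2[prev] == m[2]:
                blocks[-1].append(m)
                continue
        blocks.append([m])  # a breakpoint: start a new block
    return blocks


# Genome 2 carries an inversion of the middle segment.
seeds = [
    (0, 0, 1), (100, 100, 1),
    (200, 400, -1), (300, 300, -1), (400, 200, -1),
    (500, 500, 1),
]
for block in group_collinear(seeds):
    print(block)
```

On this toy input the inversion splits the seeds into three blocks: the two forward matches before it, the three reverse-strand matches inside it, and the single forward match after it.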

  16. Progressive Mauve alignment of 12 E. coli genomes • Aaron Darling, 2006 Ph.D. thesis, http://gel.ahabs.wisc.edu/~darling/darling_thesis.pdf