Genome Rearrangements

Presentation Transcript


  1. Genome Rearrangements

  2. Basic Biology: DNA • Genetic information is stored in deoxyribonucleic acid (DNA) molecules. • A single DNA molecule is a sequence of nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). • Each nucleotide consists of a phosphate, a pentose sugar, and a nitrogenous base. [Figure: a nucleotide and a DNA molecule]

  3. Basic Biology: DNA • Paired DNA strands are in reverse complementary orientation. • One runs in the forward, 5’ to 3’ direction. • The other runs in the reverse, 3’ to 5’ direction. • The two strands are complementary: • A pairs with T • G pairs with C [Figure: forward strand (5’ to 3’) paired with reverse strand (3’ to 5’)] Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.

  4. Basic Biology: Genome • The genome is the entire hereditary information of an organism. • Genomes are partitioned into chromosomes. • A chromosome can be linear (eukaryotes), or circular (prokaryotes). Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.

  5. The Human Karyogram Karyotype of a human male. Courtesy: National Human Genome Research Institute

  6. Changes in Genomic Sequences • Genomes of different species (even of closely related individuals) differ from one another. • These differences are caused by • point mutations, in which only one nucleotide is changed, and • genome rearrangements, where multiple nucleotides are modified.

  7. Point Mutations • Insertion: …ATGGCG… → …ATGTGCG… • Deletion: …ATGTGCG… → …ATGGCG… • Substitution: …ATGTGCG… → …ATGCGCG… • DNA sequence alignment showing matches, mismatches, and insertions/deletions: …ATG-GCATGTGCGATGTGCG… aligned with …ATGTGCATG-GCGATGCGCG…

  8. Genome Rearrangements • Reversal: (1 2 3 4 5 6 7 8 9) → (1 2 3 6 5 4 7 8 9) • Translocation: (1 2 3 4 5 6 7 8 9) (10 11 12 13 14 15) → (1 2 3 4 13 14 15) (10 11 12 5 6 7 8 9) • Fission: (1 2 3 4 5 6 7 8 9) → (1 2 3 4) (5 6 7 8 9) • Fusion: (1 2 3 4) (5 6 7 8 9) → (1 2 3 4 5 6 7 8 9)
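
The four operations on this slide can be sketched on plain Python lists. This is a minimal illustration, assuming chromosomes are lists of gene numbers and cut positions are 0-based list indices; the function names are hypothetical and not taken from the slides.

```python
def reversal(chrom, i, j):
    """Reverse the segment chrom[i..j] (0-based, inclusive)."""
    return chrom[:i] + chrom[i:j + 1][::-1] + chrom[j + 1:]

def translocation(chrom_a, chrom_b, cut_a, cut_b):
    """Exchange the tails of two chromosomes behind the given cut indices."""
    return chrom_a[:cut_a] + chrom_b[cut_b:], chrom_b[:cut_b] + chrom_a[cut_a:]

def fission(chrom, cut):
    """Split one chromosome into two at the given cut index."""
    return chrom[:cut], chrom[cut:]

def fusion(chrom_a, chrom_b):
    """Join two chromosomes end to end."""
    return chrom_a + chrom_b

first = list(range(1, 10))     # 1 .. 9
second = list(range(10, 16))   # 10 .. 15

print(reversal(first, 3, 5))              # [1, 2, 3, 6, 5, 4, 7, 8, 9]
print(translocation(first, second, 4, 3)) # ([1, 2, 3, 4, 13, 14, 15], [10, 11, 12, 5, 6, 7, 8, 9])
print(fission(first, 4))                  # ([1, 2, 3, 4], [5, 6, 7, 8, 9])
print(fusion([1, 2, 3, 4], [5, 6, 7, 8, 9]))
```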

  9. Levenshtein’s Edit Distance • Let A and B be two sequences (genomes). The minimum number of edit operations that transform A into B defines the edit distance, d_edit, between A and B. • Possible edit operations: • point mutations • genome rearrangements

  10. A Word Puzzle • To transform a start word into a target word, change, add, or delete characters until the target is reached. • Example: start “spices”, target “lice”: • spices → slices → slice → lice • spices → spice → slice → lice • How many steps do you need to transform • a republican into a democrat? • Google into Yahoo?

  11. Edit Distance Using Point Mutations • S1 = AGCTT, S2 = AGCCTG, S3 = ACAG • S1 to S2: AGCTT → AGCTG (T→G) → AGCCTG (insert C), so d_edit(S1,S2) = 2 • S1 to S3: AGCTT → AGCTG (T→G) → AGCAG (T→A) → ACAG (delete G), so d_edit(S1,S3) = 3 • S2 to S3: AGCCTG → AGCTG (delete C) → AGCAG (T→A) → ACAG (delete G), so d_edit(S2,S3) = 3
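
The point-mutation edit distance can be computed with the standard dynamic-programming recurrence. The sketch below assumes unit cost for each insertion, deletion, and substitution; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

s1, s2, s3 = "AGCTT", "AGCCTG", "ACAG"
for x, y in [(s1, s2), (s1, s3), (s2, s3)]:
    print(x, y, edit_distance(x, y))
```

Only two rows of the table are kept at a time, so memory use is linear in the length of the second sequence.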

  12. Edit Distance and Evolution • The edit distance is often used to infer evolutionary relationships. • Parsimony assumption: the minimum number of changes reflects the true evolutionary distance. [Figure: parsimonious phylogeny inferred from edit distances]

  13. Levenshtein’s Edit Distance • Let A and B be two sequences (genomes). The minimum number of edit operations that transform A into B defines the edit distance, d_edit, between A and B. • Possible edit operations: • point mutations • genome rearrangements

  14. Rearrangements and Anagrams • An anagram is a rearrangement of a word or phrase into another word or phrase. • eleven plus two → twelve plus one • forty five → over fifty Please visit the Internet Anagram web server at http://wordsmith.org/anagram/.

  15. Rearrangements and Anagrams • Dot plot: mouse genome vs. human genome • Dot plot: “spendit” vs. “stipend”

  16. Genome Comparison: Human - Mouse • Humans and mice have similar genomes, but their genes are in a different order. • How many edits (rearrangements) are needed to transform human into mouse? Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  17. Transforming Mice into Humans a) Mouse and human share a common ancestor b) They share the same genes, but in a different order c) A series of rearrangements transforms one genome into the other

  18. History of Chromosome X Rat Consortium, Nature, 2004

  19. Dobzhansky’s Experiment • Giant polytene chromosomes (modified from T.S. Painter, J. Hered. 25:465–476, 1934) • Harvesting polytene chromosomes (taken from BioPix4U) • Drosophila melanogaster life cycle (taken from FlyMove)

  20. Dobzhansky’s Experiment • Chromosome 3 of Drosophila pseudoobscura • The Standard and Arrowhead arrangements differ by an inversion spanning segments 70 to 76. Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

  21. Dobzhansky’s Experiment Configurations observed in various inversion heterozygotes Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

  22. Dobzhansky’s Experiment • Phylogeny for the 3rd chromosome of D. pseudoobscura • Single and double inversions Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

  23. Unsigned Reversals • Gene order before the reversal: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 [Figure: numbered blocks 1 to 10] Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  24. Unsigned Reversals • Gene order after reversing the block 4 to 8: 1, 2, 3, 8, 7, 6, 5, 4, 9, 10 [Figure: numbered blocks 1 to 10] Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  25. Unsigned Reversals and Gene Orders • π1 = 5 1 4 3 2 6 7 8 9 10 • apply r(1,2): π2 = 1 5 4 3 2 6 7 8 9 10 • apply r(2,5): π3 = 1 2 3 4 5 6 7 8 9 10
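
The two reversals on this slide can be replayed in a few lines. This is a small sketch, assuming r(i, j) reverses the elements at positions i through j, counted from 1; the helper name `apply_reversal` is hypothetical.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]

p1 = [5, 1, 4, 3, 2, 6, 7, 8, 9, 10]
p2 = apply_reversal(p1, 1, 2)
p3 = apply_reversal(p2, 2, 5)
print(p2)  # [1, 5, 4, 3, 2, 6, 7, 8, 9, 10]
print(p3)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```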

  26. Reversal Edit Distance • Goal: Given two permutations, find the shortest series of reversals that transforms one into the other. • Input: Permutations π and σ • Output: A series of reversals r1, …, rt transforming π into σ, such that t is minimum • d_rev(π, σ): the reversal distance between π and σ, i.e. the smallest possible value of t Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  27. Sorting by Reversals Problem • Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n ) • Input: Permutation π • Output: A series of reversals r1,…, rt transforming π into the identity permutation such that t is minimum • Reversal Distance Problem and Sorting by Reversals Problem are equivalent. Why? Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
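
One way to see the equivalence: reversals act on positions, so renaming every gene by its position in the target permutation turns the target into the identity without changing which reversals are needed. The sketch below illustrates this with a made-up pair of permutations; `relabel` and `apply_reversal` are hypothetical helper names, and the example values are not from the slides.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]

def relabel(pi, sigma):
    """Rename each element of pi by its 1-based position in sigma."""
    rank = {gene: pos for pos, gene in enumerate(sigma, start=1)}
    return [rank[gene] for gene in pi]

sigma = [2, 4, 1, 3]          # target permutation (illustrative)
pi = [2, 1, 4, 3]             # start permutation (illustrative)

renamed = relabel(pi, sigma)
print(renamed)                         # [1, 3, 2, 4]
print(apply_reversal(renamed, 2, 3))   # [1, 2, 3, 4]  (identity)
print(apply_reversal(pi, 2, 3))        # [2, 4, 1, 3]  (equals sigma)
```

The same reversal r(2,3) sorts the renamed permutation and transforms the original one into the target.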

  28. Algorithm 1: GreedyReversalSort(π)
      for i ← 1 to n – 1
          j ← position of element i in π (i.e. π[j] = i)
          if j ≠ i
              π ← π • r(i, j)
              output π
          if π is the identity permutation
              return
      Taken from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
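
A runnable version of the greedy procedure might look as follows. It is a sketch that assumes the permutation is a Python list of the numbers 1 to n and that r(i, j) reverses positions i through j (1-based); the returned step list is an addition for illustration.

```python
def greedy_reversal_sort(perm):
    """Sort perm by moving element i into position i, for i = 1 .. n-1.

    Returns a list of ((i, j), permutation_after_reversal) tuples.
    """
    pi = list(perm)
    n = len(pi)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):
        j = pi.index(i) + 1                   # 1-based position of element i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]   # apply r(i, j)
            steps.append(((i, j), list(pi)))
        if pi == identity:
            break
    return steps

for (i, j), state in greedy_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(f"r({i},{j}):", state)
```

On the input 6 1 2 3 4 5 this performs five reversals, one per misplaced prefix element.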

  29. GreedyReversalSort is Not Optimal • For π = 6 1 2 3 4 5 the algorithm needs 5 steps: • Step 0: 6 1 2 3 4 5 • Step 1: 1 6 2 3 4 5 (i=1; j=2; r(1,2)) • Step 2: 1 2 6 3 4 5 (i=2; j=3; r(2,3)) • Step 3: 1 2 3 6 4 5 (i=3; j=4; r(3,4)) • Step 4: 1 2 3 4 6 5 (i=4; j=5; r(4,5)) • Step 5: 1 2 3 4 5 6 (i=5; j=6; r(5,6)) • However, two reversals are enough: • Step 0: 6 1 2 3 4 5 • Step 1: 6 5 4 3 2 1 (r(2,6)) • Step 2: 1 2 3 4 5 6 (r(1,6))
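
For small permutations the optimal number of reversals can be checked by exhaustive search. The sketch below uses breadth-first search over all permutations reachable by reversals; it is a brute-force check chosen here for illustration, not an algorithm from the slides, and it is only practical for short permutations.

```python
from collections import deque

def reversal_distance(perm):
    """Exact unsigned reversal distance to the sorted order, by breadth-first search."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    if start == target:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        # Try every reversal of a segment with at least two elements.
        for i in range(n - 1):
            for j in range(i + 1, n):
                nxt = state[:i] + state[i:j + 1][::-1] + state[j + 1:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))

print(reversal_distance([6, 1, 2, 3, 4, 5]))  # 2
```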

  30. Adjacencies & Breakpoints • An adjacency is a pair of adjacent elements that are consecutive. • A breakpoint is a pair of adjacent elements that are not consecutive. • b(π) is the number of breakpoints in π. • Example: π = 5 6 2 1 3 4. Extend π with π0 = 0 and π7 = 7: 0 5 6 2 1 3 4 7 • Adjacencies: (5 6), (2 1), (3 4) • Breakpoints: (0 5), (6 2), (1 3), (4 7), so b(π) = 4 Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
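
Counting breakpoints is a one-pass scan over the extended permutation. A minimal sketch, assuming an unsigned permutation of 1 to n given as a list; the function name is illustrative.

```python
def breakpoints(perm):
    """Number of breakpoints of an unsigned permutation of 1..n."""
    ext = [0] + list(perm) + [len(perm) + 1]   # extend with 0 and n+1
    # A neighboring pair is a breakpoint unless its values differ by exactly 1.
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)

print(breakpoints([5, 6, 2, 1, 3, 4]))  # 4
```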

  31. Reversal Distance and Breakpoints • One reversal eliminates at most 2 breakpoints. • π = 0 2 3 1 4 6 5 7, b(π) = 5 • π1 = 0 1 3 2 4 6 5 7, b(π1) = 4 • π2 = 0 1 2 3 4 6 5 7, b(π2) = 2 • π3 = 0 1 2 3 4 5 6 7, b(π3) = 0 • This implies: reversal distance ≥ b(π) / 2 Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  32. Strips • An interval between two consecutive breakpoints in a permutation is called a strip. • A strip is increasing if its elements increase. • Otherwise, the strip is decreasing. • Example: 0 1 5 6 7 4 3 2 8 9 10 has the strips [0 1], [5 6 7], [4 3 2], and [8 9 10]. • A single-element strip is considered decreasing, with the exception of the strips [0] and [n+1]. Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
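
The strip decomposition can be computed by cutting the extended permutation at every breakpoint. The sketch below assumes an unsigned permutation of 1 to n; `strips` and `is_decreasing` are hypothetical helper names.

```python
def strips(perm):
    """Split the extended permutation 0, perm, n+1 into strips (lists of elements)."""
    ext = [0] + list(perm) + [len(perm) + 1]
    result, current = [], [ext[0]]
    for a, b in zip(ext, ext[1:]):
        if abs(a - b) == 1:
            current.append(b)       # adjacency: the strip continues
        else:
            result.append(current)  # breakpoint: the strip ends here
            current = [b]
    result.append(current)
    return result

def is_decreasing(strip, n):
    """Single-element strips count as decreasing, except [0] and [n+1]."""
    if len(strip) == 1:
        return strip[0] not in (0, n + 1)
    return strip[0] > strip[1]

perm = [1, 5, 6, 7, 4, 3, 2, 8, 9]
for s in strips(perm):
    print(s, "decreasing" if is_decreasing(s, len(perm)) else "increasing")
```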

  33. Strips and Breakpoints • Observation 1: If a permutation contains a decreasing strip, then there exists a reversal that will decrease the number of breakpoints. Example, r(3,8): 0 1 5 6 7 4 3 2 8 9 10 → 0 1 2 3 4 7 6 5 8 9 10 • Observation 2: Otherwise, create a decreasing strip by reversing an increasing strip. The number of breakpoints can then be reduced in the next step. Example, r(6,8): 0 1 5 6 7 2 3 4 8 9 10 → 0 1 5 6 7 4 3 2 8 9 10 • In both examples, positions are counted in the extended permutation, with the leading 0 at position 1. Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  34. Algorithm 2: BreakpointReversalSort(π)
      while b(π) > 0
          if π has a decreasing strip
              choose a reversal r that minimizes b(π • r)
          else
              choose a reversal r that flips an increasing strip in π
          π ← π • r
          output π
      return
      Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
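
The following is one possible Python rendering of this procedure. It is a sketch under stated assumptions: the best reversal is found by trying every candidate (simple, but not efficient), ties are broken by the first candidate found, and in the else-branch the first inner increasing strip is flipped. All helper names are hypothetical.

```python
def count_breakpoints(ext):
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)

def strip_ranges(ext):
    """Inclusive index ranges (start, end) of the strips of the extended permutation."""
    ranges, start = [], 0
    for k in range(1, len(ext)):
        if abs(ext[k] - ext[k - 1]) != 1:
            ranges.append((start, k - 1))
            start = k
    ranges.append((start, len(ext) - 1))
    return ranges

def is_decreasing(ext, start, end):
    if start == end:                         # single element: decreasing unless 0 or n+1
        return 0 < start < len(ext) - 1
    return ext[start] > ext[end]

def reverse_segment(ext, i, j):
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]

def breakpoint_reversal_sort(perm):
    """Return the list of permutations produced after each reversal."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    steps = []
    while count_breakpoints(ext) > 0:
        ranges = strip_ranges(ext)
        if any(is_decreasing(ext, s, e) for s, e in ranges):
            # Try every reversal of positions i..j and keep one with fewest breakpoints.
            candidates = [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]
            i, j = min(candidates,
                       key=lambda ij: count_breakpoints(reverse_segment(ext, *ij)))
        else:
            # All strips are increasing: flip the first one not containing 0 or n+1.
            i, j = next((s, e) for s, e in ranges if s >= 1 and e <= n)
        ext = reverse_segment(ext, i, j)
        steps.append(ext[1:-1])
    return steps

for state in breakpoint_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(state)
```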

  35. Performance Guarantee • BreakpointReversalSort (BRS) is an approximation algorithm that uses at most four times the minimum number of reversals. • BRS eliminates at least one breakpoint every two steps: d_BRS ≤ 2·b(π) steps • An optimal algorithm eliminates at most two breakpoints every step: d_OPT ≥ b(π) / 2 steps • Performance guarantee: d_BRS / d_OPT ≤ 2·b(π) / (b(π) / 2) = 4 Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  36. Gene Orientation & Genome Representation modified from http://acim.uqam.ca/~anne/INF4500/Rearrangements.ppt

  37. Genome Rearrangements

  38. Signed Reversals • Before: 5’ ATGCCTGTACTA 3’ paired with 3’ TACGGACATGAT 5’ • Break and invert the segment CCTGTA • After: 5’ ATGTACAGGCTA 3’ paired with 3’ TACATGTCCGAT 5’ Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
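
Inverting a segment of double-stranded DNA replaces it by its reverse complement on the forward strand. A short sketch that reproduces the sequences on this slide; the function names and the 0-based, end-exclusive indices are choices made here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then read the result backwards."""
    return seq.translate(COMPLEMENT)[::-1]

def invert_segment(forward, start, end):
    """Invert forward[start:end] (0-based, end exclusive) on the forward strand."""
    return forward[:start] + reverse_complement(forward[start:end]) + forward[end:]

forward = "ATGCCTGTACTA"
inverted = invert_segment(forward, 3, 9)   # inverts the segment CCTGTA
print(inverted)                            # ATGTACAGGCTA
print(inverted.translate(COMPLEMENT))      # TACATGTCCGAT (paired strand, read 3' to 5')
```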

  39. Signed Reversals • Gene order before the reversal: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 [Figure: numbered blocks 1 to 10] Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  40. Signed Reversals • Gene order after reversing the block 4 to 8, where each reversed gene changes sign: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 [Figure: numbered blocks 1 to 10] Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

  41. Signed Reversals and Breakpoints • Gene order: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 • The reversal introduced two breakpoints: between 3 and -8, and between -4 and 9. Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
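
A signed reversal reverses the order of a block and flips the sign of every gene in it. The sketch below assumes the usual convention for signed permutations, which the slides do not state explicitly: after extending with 0 and n+1, a neighboring pair (a, b) is an adjacency only if b - a = 1. The function names are hypothetical.

```python
def signed_reversal(perm, i, j):
    """Reverse positions i..j (1-based, inclusive) and negate each element in the block."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]

def signed_breakpoints(perm):
    """Breakpoints of a signed permutation (assumed rule: adjacency iff b - a == 1)."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)

identity = list(range(1, 11))
after = signed_reversal(identity, 4, 8)
print(after)                      # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(signed_breakpoints(after))  # 2
```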

  42. Summary: Complexity Results • Sorting by unsigned reversals: • NP-hard • can be approximated within a constant factor • Sorting by signed reversals: • can be solved in polynomial time

  43. Web Tools • GRIMM Web Server • computes signed and unsigned reversal distances between permutations. http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM • Cinteny • a web server for synteny identification and the analysis of genome rearrangement http://cinteny.cchmc.org/

  44. DCJ Genome Rearrangements • The DCJ model uses Double-Cut-and-Join genome rearrangement operations. • DCJ operations break and rejoin one or two intergenic regions (possibly on different chromosomes).

  45. Genome Representation Example. linear c1=(o 1 -2 3 4 o) circular c2=(5 6 7) • In the DCJ model, a genome is grouped into chromosomes (linear/circular). • A gene g on the forward strand is represented by [-g,+g] • A gene g on the reverse strand is represented by [+g,-g] • Telomeres are represented by the special symbol ‘o’. • An adjacency (intergenic region) is encoded by the unordered pair of neighboring gene/telomere ends.
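
The encoding of a chromosome as a list of adjacencies can be sketched as follows. Gene ends are written as strings such as "+1" and "-1", the telomere as "o", and each adjacency as a pair in reading order (the pairs themselves are unordered). These representation choices and function names are assumptions made for illustration.

```python
def gene_ends(gene):
    """Ends of a signed gene in reading order: [-g, +g] if forward, [+g, -g] if reverse."""
    g = abs(gene)
    return (f"-{g}", f"+{g}") if gene > 0 else (f"+{g}", f"-{g}")

def adjacencies(genes, circular=False):
    """List the adjacencies of one chromosome given as a list of signed genes."""
    ends = [gene_ends(g) for g in genes]
    adj = []
    if not circular:
        adj.append(("o", ends[0][0]))                 # left telomere
    for (_, left_exit), (right_entry, _) in zip(ends, ends[1:]):
        adj.append((left_exit, right_entry))          # intergenic region
    if circular:
        adj.append((ends[-1][1], ends[0][0]))         # close the circle
    else:
        adj.append((ends[-1][1], "o"))                # right telomere
    return adj

print(adjacencies([1, -2, 3, 4]))
# [('o', '-1'), ('+1', '+2'), ('-2', '-3'), ('+3', '-4'), ('+4', 'o')]
print(adjacencies([5, 6, 7], circular=True))
# [('+5', '-6'), ('+6', '-7'), ('+7', '-5')]
```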

  46. DCJ Operations • The double-cut-and-join operation “breaks” two adjacencies and rejoins the fragments: {a, b} {c, d} → {a,d} {c,b}, or {a,c} {b,d}. • a, b, c, and d represent different (signed) gene ends or telomeres (with ‘+o’ = ‘-o’). • A special case occurs for c=d=o: {a,b} {o,o} ↔ {a,o} {b,o}.
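
A single DCJ operation can be sketched on a list of adjacencies. The example cuts two adjacencies of the linear chromosome (o 1 -2 3 4 o) and rejoins them so that the block -2 3 is inverted. The tuple encoding, the `cross` flag, and the helper `read_linear` (which only walks one linear chromosome and ignores any circular ones) are assumptions made for illustration.

```python
def dcj(adjs, first, second, cross=True):
    """Cut first={a,b} and second={c,d}; rejoin as {a,d},{c,b} (cross) or {a,c},{b,d}."""
    a, b = first
    c, d = second
    remaining = list(adjs)
    remaining.remove(first)      # both adjacencies must be present as given
    remaining.remove(second)
    new = [(a, d), (c, b)] if cross else [(a, c), (b, d)]
    return remaining + new

def other_end(end):
    """The opposite end of the same gene: '+3' <-> '-3'."""
    return ("+" if end[0] == "-" else "-") + end[1:]

def read_linear(adjs):
    """Walk a linear chromosome from its first telomere and return its signed genes."""
    partner = {}
    for x, y in adjs:
        partner[x], partner[y] = y, x
    end = next(y if x == "o" else x for x, y in adjs if "o" in (x, y))
    genes = []
    while end != "o":
        g = int(end[1:])
        genes.append(g if end[0] == "-" else -g)   # entering at -g reads the gene forward
        end = partner[other_end(end)]
    return genes

c1 = [("o", "-1"), ("+1", "+2"), ("-2", "-3"), ("+3", "-4"), ("+4", "o")]
result = dcj(c1, ("+1", "+2"), ("+3", "-4"), cross=False)
print(read_linear(c1))      # [1, -2, 3, 4]
print(read_linear(result))  # [1, -3, 2, 4]
```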

  47. Signed reversal of genes 2 and 3

  48. Chromosome Linearization

  49. Weird genme transformation

  50. Using Graphs to Sort Genomes • Adjacency graph AG(A,B) = (V,E) is a bipartite graph. • V contains one vertex for each adjacency of genome A and of genome B. • Each gene g defines two edges: • e1 connecting the adjacencies of A and B that contain +g • e2 connecting the adjacencies of A and B that contain -g • Example: genome A: (o 1 -2 3 4 o) (5 6 7), genome B: (o 1 2 3 4 o) (o 5 6 7 o)
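
The edges of the adjacency graph for the two example genomes can be listed with a small script. The adjacency lists are written out by hand using the [-g, +g] gene-end encoding from the DCJ slides, with gene ends as strings and "o" for telomeres; this encoding and the function name are assumptions made for illustration.

```python
# Adjacencies of genome A: (o 1 -2 3 4 o) (5 6 7)
ADJ_A = [("o", "-1"), ("+1", "+2"), ("-2", "-3"), ("+3", "-4"), ("+4", "o"),
         ("+5", "-6"), ("+6", "-7"), ("+7", "-5")]
# Adjacencies of genome B: (o 1 2 3 4 o) (o 5 6 7 o)
ADJ_B = [("o", "-1"), ("+1", "-2"), ("+2", "-3"), ("+3", "-4"), ("+4", "o"),
         ("o", "-5"), ("+5", "-6"), ("+6", "-7"), ("+7", "o")]

def adjacency_graph_edges(adjs_a, adjs_b):
    """One edge per gene end, linking the A-adjacency and B-adjacency that contain it.

    Each edge is returned as (index in adjs_a, index in adjs_b, gene end).
    """
    where_b = {}
    for idx_b, adj in enumerate(adjs_b):
        for end in adj:
            if end != "o":               # telomeres do not define edges
                where_b[end] = idx_b
    edges = []
    for idx_a, adj in enumerate(adjs_a):
        for end in adj:
            if end != "o":
                edges.append((idx_a, where_b[end], end))
    return edges

edges = adjacency_graph_edges(ADJ_A, ADJ_B)
for idx_a, idx_b, end in edges:
    print(ADJ_A[idx_a], "--", end, "--", ADJ_B[idx_b])
```

With seven genes there are fourteen gene ends, so the graph has fourteen edges.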
