Of Mice and Men: Learning from genome reversal findings. Based on two papers by Pavel Pevzner and Glenn Tesler: Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse Genomes, and Transforming Men into Mice: the Nadeau-Taylor Chromosomal Breakage Model Revisited.
Reversal Distance – the minimum number of reversals needed to transform one genome into another. Synteny Block – a region in which the same gene order is observed between species. Ortholog – the corresponding gene in two different species. Basic, Basic Terms
Theory of reversal distance calculation. A new model for presenting reversal information (primary topic of Genome Rearrangements in Mammalian Evolution). Evidence of “fragile” genome regions (primary topic of Transforming Men into Mice). Overview
Find d_b(a), the reversal distance from permutation a to permutation b, where a is L 4 5 2 1 3 6 R and b is L 1 2 3 4 5 6 R. What are we solving?
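A reversal operation r = [i, j] is defined as follows: ar(k) = a(i + j - k) if i ≤ k ≤ j, and ar(k) = a(k) otherwise. In other words, r reverses the segment of a from position i through position j and leaves everything else in place. Reversals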
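The definition above can be tried out directly. The following is a minimal sketch, assuming 1-based positions and an inclusive interval [i, j]; the function name `apply_reversal` is made up for illustration. It treats elements as unoriented, so it does not model the orientation flips used with oriented permutations.

```python
def apply_reversal(a, i, j):
    """Return a new list with positions i..j (1-based, inclusive) reversed.

    Implements ar(k) = a(i + j - k) for i <= k <= j and ar(k) = a(k) otherwise.
    Orientation (sign) of the elements is NOT modeled here: that is an
    assumption of this sketch.
    """
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    # Slice out the segment, reverse it, and glue the three pieces together.
    return a[:i - 1] + a[i - 1:j][::-1] + a[j:]


a = [4, 5, 2, 1, 3, 6]
step1 = apply_reversal(a, 1, 4)      # reverse 4 5 2 1
step2 = apply_reversal(step1, 3, 5)  # reverse 5 4 3
print(step1)
print(step2)
```

For the unoriented example permutation 4 5 2 1 3 6, these two reversals reach 1 2 3 4 5 6.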
Example: L 1 3 2 4 5 6 R. A breakpoint of a with respect to b is a pair x, y of elements of Lº such that xy appears in the extended version of a, but neither xy nor the reverse pair yx appears in the extended version of b. Breakpoints
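d(a) ≥ b(a) / 2, where b(a) is the number of breakpoints of a. A single reversal can remove at most two breakpoints. Reversal Distance: Guess #1 ...we can do better than that!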
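Counting breakpoints and the resulting lower bound is mechanical. The sketch below assumes unoriented elements and uses the strings "L" and "R" as the terminals of the extended permutations; the function names are hypothetical and chosen only for this example.

```python
def count_breakpoints(a, b):
    """Count pairs xy adjacent in extended a such that neither xy nor yx
    is adjacent in extended b (elements treated as unoriented)."""
    ext_a = ["L", *a, "R"]
    ext_b = ["L", *b, "R"]
    adjacent_in_b = set(zip(ext_b, ext_b[1:]))
    count = 0
    for x, y in zip(ext_a, ext_a[1:]):
        if (x, y) not in adjacent_in_b and (y, x) not in adjacent_in_b:
            count += 1
    return count


def breakpoint_lower_bound(a, b):
    """Lower bound on reversal distance: ceil(b(a) / 2)."""
    return -(-count_breakpoints(a, b) // 2)


target = [1, 2, 3, 4, 5, 6]
print(count_breakpoints([1, 3, 2, 4, 5, 6], target))
print(breakpoint_lower_bound([4, 5, 2, 1, 3, 6], target))
```

Under these assumptions, 1 3 2 4 5 6 has two breakpoints (1 3 and 2 4), and 4 5 2 1 3 6 has four, which gives a lower bound of two reversals.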
[Figure: construction of the diagram, with panels labeled Extended a, Terminals, Reality Edges, and Reality and Desire Edges.] Reality and Desire Construction
RD(a) denotes the reality and desire diagram of a. c(a) is the number of cycles in RD(a). Reality and Desire Diagram
d(a) ≥ n + 1 - c(a). Reversal Distance: Guess #2 Try taking a closer look...
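The cycle count c(a) can be computed without drawing the diagram. The sketch below uses the standard construction for oriented (signed) permutations, which the slides show only as a figure: each element x becomes two endpoints, reality edges join neighbors in the permutation, and desire edges join neighbors in the identity. The list-of-signed-integers encoding and the function names are assumptions of this example.

```python
def count_cycles(signed_perm):
    """Count alternating reality/desire cycles for a signed permutation of 1..n."""
    n = len(signed_perm)
    # Replace +x by (2x-1, 2x) and -x by (2x, 2x-1); frame with 0 and 2n+1.
    seq = [0]
    for x in signed_perm:
        tail, head = 2 * abs(x) - 1, 2 * abs(x)
        seq.extend((tail, head) if x > 0 else (head, tail))
    seq.append(2 * n + 1)

    # Reality edges join positions (0,1), (2,3), ... of the endpoint sequence.
    reality = {}
    for k in range(0, len(seq), 2):
        u, v = seq[k], seq[k + 1]
        reality[u], reality[v] = v, u

    def desire(v):
        # Desire edges join endpoint values 2i and 2i+1.
        return v + 1 if v % 2 == 0 else v - 1

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # alternate reality edge, then desire edge
            seen.add(v)
            w = reality[v]
            seen.add(w)
            v = desire(w)
    return cycles


def cycle_lower_bound(signed_perm):
    """Lower bound n + 1 - c(a) on the reversal distance to the identity."""
    return len(signed_perm) + 1 - count_cycles(signed_perm)


print(count_cycles([1, 2]), cycle_lower_bound([1, 2]))
print(count_cycles([2, 1]), cycle_lower_bound([2, 1]))
```

The identity has n + 1 cycles and a bound of zero. The hand-picked permutation +2 +1 has a single cycle, so the bound is 2.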
Component – set of interleaving cycles (cycles which cross in a reality and desire diagram) This reality and desire diagram has six components. Components
Converging and Diverging
• Edges A, C, and E converge
• Edges D and F diverge
• Edges B and D diverge
• Edges F and B converge
Let r = [e, f] act on RD(a). If edges e and f belong to different cycles, then c(ar) = c(a) – 1. If edges e and f belong to the same cycle and converge, then c(ar) = c(a). If edges e and f belong to the same cycle and diverge, then c(ar) = c(a) + 1. Converge? Diverge? So what?
Good Cycles contain at least one pair of diverging edges. Bad Cycles contain only converging edges. Good Components contain at least one Good Cycle. Bad Components contain only Bad Cycles. The Good and the Bad
This reality and desire diagram has five bad components and only one good component (bottom). The good component has one good cycle and one bad cycle. Some “Bad” Examples
Hurdle – a bad component that does not separate any other two bad components. Nonhurdle – a bad component that does separate at least two other bad components. Hurdles
In this example, A, F, C, and D are hurdles; E and B are nonhurdles. The number of hurdles is h(a) = 4. Example Hurdles
In this example, hurdle F protects nonhurdle E; F is a super hurdle. A, C, and D are simple hurdles. h(a) = 4. Super/Simple Hurdles
Fortress – a permutation whose reality and desire diagram contains an odd number of hurdles, all of which are super hurdles. f(a) = 1 if a is a fortress, and 0 otherwise. The Fortress
[Figure: the smallest possible fortress.] Example Fortress
d(a) = n + 1 – c(a) + h(a) + f(a) Reversal Distance: Guess #3 Finally!!!
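Once the diagram has been analyzed, the formula is plain arithmetic. A minimal sketch, assuming the counts n, c(a), h(a) and the fortress flag have already been determined by other means; the function name and the worked case are illustrative and not taken from the slides.

```python
def reversal_distance(n, cycles, hurdles, is_fortress):
    """Evaluate d(a) = n + 1 - c(a) + h(a) + f(a) from precomputed quantities."""
    fortress_term = 1 if is_fortress else 0  # f(a) is an indicator
    return n + 1 - cycles + hurdles + fortress_term


# Hand-worked case (an assumption of this example): the oriented permutation
# +2 +1 has n = 2, one cycle, one hurdle, and is not a fortress.
print(reversal_distance(n=2, cycles=1, hurdles=1, is_fortress=False))
```

For +2 +1 this gives 3, which agrees with checking all sequences of two oriented reversals by hand: none of them reaches +1 +2.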
The preceding material was taken from Introduction to Computational Molecular Biology by Setubal and Meidanis, based on the following papers.
References
• V. Bafna and P. A. Pevzner – Genome rearrangements and sorting by reversals.
• S. Hannenhalli and P. A. Pevzner – Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). Transforming Men into Mice cites this paper for the definitions of hurdles and fortresses.
• J. D. Kececioglu and D. Sankoff – Exact and approximate algorithms for sorting by reversals with application to genome rearrangement.
Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse Genomes Pavel Pevzner and Glenn Tesler First Paper
This paper presents a new kind of graph that brings the usefulness of reality and desire diagrams to simple genome comparison graphs. First Paper Overview
The Process GRIMM-Synteny Algorithm
Useful features:
• Same cycle count as the reality and desire diagram!
• Cycles of more than four edges indicate reused breakpoints!
Whole Genome Results. Synteny Blocks: 281. Reversal Distance: 245.
Transforming Men into Mice: the Nadeau-Taylor Chromosomal Breakage Model Revisited Pavel Pevzner and Glenn Tesler Second Paper
Are breakpoints random or are some sections of the genome more “fragile” than others? Second Paper Overview
“Since the [random breakage] model was first introduced in [paper cited]..., it has been analyzed by Nadeau and others [more papers cited]... and has become widely accepted” To test, simply plot the lengths of known conserved segments and compare to an exponential distribution... Conventional Wisdom
Too many short segments! Do we have a match?
There is evidence of at least 3,170 micro-rearrangements (reversals) within the synteny blocks (though many may be artifacts of incorrect assemblies). 41 out of 281 synteny blocks do not show any evidence of micro-rearrangements, while 10 synteny blocks are extremely rearranged (40 or more rearrangements within a block). Micro-rearrangement Evidence
Theorem 1: “If all reversals are delimited by pairs of breakpoints, the number of breakpoint re-uses in any parsimonious reversal scenario is 2d - br. This is the lower bound for non-optimal reversal scenarios.” Here d is the distance and br is the number of breakpoints. 2 × 245 (distance) – 300 (breakpoints) = 190 breakpoint reuses. 281 synteny blocks – 23 chromosomes + 190 breakpoint reuses = 448 breakages. Calculating Breakpoint Reuse
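The two calculations on this slide can be reproduced in a few lines. This is only a restatement of the slide's arithmetic; the function and variable names are made up for the example.

```python
def breakpoint_reuses(distance, breakpoints):
    """Theorem 1: reuses in a parsimonious scenario = 2d - br."""
    return 2 * distance - breakpoints


def total_breakages(synteny_blocks, chromosomes, reuses):
    """Breakages = synteny blocks - chromosomes + breakpoint reuses."""
    return synteny_blocks - chromosomes + reuses


reuses = breakpoint_reuses(distance=245, breakpoints=300)
breakages = total_breakages(synteny_blocks=281, chromosomes=23, reuses=reuses)
print(reuses, breakages)
```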
The expected number of “clumps” (pairs of points within a space w, which is a fraction of genome length) is (n – 1)(1 – (1 – w)^n), where n is the number of breakages. For w = 0.668 Mb / 2,983 Mb and n = 448, the expected number of “clumps” is about 43, far fewer than the 190 reused breakpoints! Statistical Evidence
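The expected clump count can be checked numerically. The sketch below reads the trailing n in the formula as an exponent, i.e. (n - 1)(1 - (1 - w)^n), and takes n = 448 from the breakage count on the previous slide; that reading is an assumption, though it does reproduce the stated value of about 43. The function name is hypothetical.

```python
def expected_clumps(n, w):
    """Expected number of clumps among n breakages for window fraction w."""
    return (n - 1) * (1 - (1 - w) ** n)


w = 0.668 / 2983  # window size as a fraction of genome length (Mb / Mb)
clumps = expected_clumps(448, w)
print(round(clumps))
```

The result rounds to 43, well below the 190 breakpoint reuses derived from the rearrangement scenario.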