Explore the genomics of cancer, including rearrangements and fusion genes, through paired-end sequencing and comparative genomic hybridization.
CSCI2950-C Lecture 9: Cancer Genomics
October 16, 2008
http://cs.brown.edu/courses/csci2950-c/
Outline • Cancer Genomes • Paired-end Sequencing • Rearrangements • Comparative Genomic Hybridization
Cell Division and Mutation
• Single nucleotide change
• Copy number
• Structural
Rearrangements in Cancer
1) Change gene structure, create novel fusion genes. Example: Gleevec targets the BCR-ABL fusion.
2) Alter gene regulation. Example: Burkitt's lymphoma.
IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD
Cancer Genomes
• Fusion gene in >50% of prostate cancer patients (Tomlins et al. Science 2005)
Shotgun Sequencing
• Genomic segment is cut many times at random (shotgun).
• Get one or two reads (~500 bp each) from each segment.
Sequencing of Cancer Genomes What to sequence from each tumor? • Whole genome: all alterations • Specific genes: point mutations • Hybrid approach: structural rearrangements etc.
End Sequence Profiling (ESP), C. Collins and S. Volik (2003)
• Pieces of cancer genome: clones (100-250kb).
• Sequence ends of clones (500bp).
• Map end sequences to human genome.
• Each clone corresponds to a pair of end sequences (ES pair) (x,y).
• Retain clones that correspond to a unique ES pair.
[Figure: a clone from cancer DNA whose end sequences map to positions x and y on human DNA]
Valid ES pairs
• Lmin ≤ y - x ≤ Lmax, where Lmin (Lmax) is the minimum (maximum) clone size.
• Convergent orientation.
Invalid ES pairs
• Putative rearrangement in cancer.
• ES directions point toward breakpoints (a,b): Lmin ≤ |x-a| + |y-b| ≤ Lmax
ESP of Normal Cell
• All ES pairs valid: Lmin ≤ y - x ≤ Lmax
• 2D representation: each point (x,y) is an ES pair; both axes are genome coordinates.
ESP of Tumor Cell
• Valid ES pairs satisfy length/direction constraints: Lmin ≤ y - x ≤ Lmax
• Invalid ES pairs indicate rearrangements or experimental errors.
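The valid/invalid distinction above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the `ESPair` record, its strand encoding (`'+'` and `'-'`), and the rule that "convergent" means the left end on `'+'` and the right end on `'-'` are assumptions made for the example, as are the clone-size bounds in the usage note.

```python
from dataclasses import dataclass


@dataclass
class ESPair:
    """Hypothetical record for one mapped end-sequence pair, with x <= y."""
    x: int                   # genome coordinate of the left end
    y: int                   # genome coordinate of the right end
    x_strand: str            # '+' or '-' (assumed encoding)
    y_strand: str            # '+' or '-'
    same_chrom: bool = True  # both ends map to the same chromosome


def is_valid(pair: ESPair, l_min: int, l_max: int) -> bool:
    """True if the pair meets the length and direction constraints."""
    if not pair.same_chrom:
        return False
    # Assumed convention: ends point at each other (convergent).
    convergent = pair.x_strand == "+" and pair.y_strand == "-"
    return convergent and l_min <= pair.y - pair.x <= l_max


def split_pairs(pairs, l_min, l_max):
    """Partition ES pairs into (valid, invalid) lists."""
    valid = [p for p in pairs if is_valid(p, l_min, l_max)]
    invalid = [p for p in pairs if not is_valid(p, l_min, l_max)]
    return valid, invalid
```

With illustrative bounds of 100kb and 250kb, a pair mapped 150kb apart in convergent orientation is valid, while a pair mapped 4Mb apart or with both ends on the same strand is not.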
Clusters and Coverage
• Cluster of invalid pairs: rearrangement.
• Isolated invalid pair: chimeric clone.
[Figure: clones from tumor DNA with end sequences mapped to positions x, y on human DNA]
Clusters
• Clone size: (a - x1) + (b - y1), which must lie between Lmin and Lmax.
[Figure: ES pairs (x1,y1), (x2,y2) and candidate breakpoint (a,b) plotted in genome coordinates, with the feasible region bounded by Lmin and Lmax]
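One way to read the cluster picture: invalid pairs belong to the same cluster when a single breakpoint (a,b) can explain all of them. The sketch below tests that for one orientation case only, assuming both ends point toward increasing coordinates so that clone size is (a - x) + (b - y) with a ≥ x and b ≥ y. The function name and the numbers in the usage note are hypothetical.

```python
def can_share_breakpoint(pairs, l_min, l_max):
    """Check whether one breakpoint (a, b) is consistent with every pair.

    Each pair (x, y) requires l_min <= (a - x) + (b - y) <= l_max,
    i.e. a + b must fall in [x + y + l_min, x + y + l_max].
    Assumption: a >= x and b >= y for every pair in the cluster.
    """
    lo = max(x + y + l_min for x, y in pairs)
    hi = min(x + y + l_max for x, y in pairs)
    # The breakpoint lies beyond every end, so a + b is at least this sum.
    beyond_all_ends = max(x for x, _ in pairs) + max(y for _, y in pairs)
    return max(lo, beyond_all_ends) <= hi
```

For instance, with coordinates in kb and bounds 100 and 250, the pairs (100, 500) and (120, 510) admit a common breakpoint, whereas (100, 500) and (400, 900) do not.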
Fusion Genes
[Figure: ES pair (x,y) mapped to Gene 1 and Gene 2 on the human genome; breakpoint (a,b) joins the two genes in the tumor]
Fusion Genes
• Intersection → probability of fusion gene.
• Respect direction of transcription.
Bashir, et al. (2008) PLOS Comp Biol.
[Figure: breakpoint region for the cluster (x1,y1), (x2,y2), bounded by Lmin and Lmax, overlaid on the Gene1 × Gene2 rectangle]
Results: Fusion Gene in Breast Cancer
• BCAS3-BCAS4: Probability of Fusion = 1
• Note: More precise sizing information available for some clones.
Bashir, et al. (2008) In Press.
ESP Data
• Coverage of human genome: ≈ 0.34 for MCF7, BT474
• Samples: breast cancer cell lines, tumors
Raphael, et al. (2008)
Candidate Fusion Genes
• PTPRG and ASTN2: confirmed by clone sequencing.
[Figure: 97kb sequenced clone joining PTPRG (chromosome 3) and ASTN2 (chromosome 9)]
Breakpoint Detection
• Detect a rearrangement breakpoint when a clone includes the breakpoint.
[Figure: breakpoint ζ in the cancer genome, spanned by a clone whose ends map to xC and yC on the normal genome]
Lander-Waterman Statistics
• Given: $N$ clones of length $L$ from a genome of size $G$
• $P(\zeta \text{ covered by clone}) = 1 - (1 - L/G)^{N} \approx 1 - e^{-c}$, where $c = NL/G$ is coverage
• $P(\text{breakpoint } \zeta \text{ detected}) \approx 1 - e^{-c}$
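As a quick numeric check of the Lander-Waterman estimate, the sketch below evaluates both the exact expression and the exponential approximation. The function names are illustrative, and the clone count, clone length, and genome size in the usage note are hypothetical; only the coverage value 0.34 comes from the ESP data slide.

```python
import math


def coverage(n_clones: int, clone_len: float, genome_size: float) -> float:
    """Coverage c = N * L / G."""
    return n_clones * clone_len / genome_size


def p_covered_exact(n_clones: int, clone_len: float, genome_size: float) -> float:
    """P(point covered by at least one clone) = 1 - (1 - L/G)^N."""
    return 1.0 - (1.0 - clone_len / genome_size) ** n_clones


def p_covered_approx(c: float) -> float:
    """Approximation 1 - e^(-c)."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    # Coverage 0.34 is the figure quoted for MCF7 and BT474.
    print(round(p_covered_approx(0.34), 3))
```

At coverage 0.34 the approximation gives a detection probability of roughly 0.29, so most breakpoints are expected to be missed at this depth. For a hypothetical library of 1000 clones of 150kb on a 3Gb genome, the exact and approximate values agree to several decimal places.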
Cancer Genome Organization
• What is the detailed organization of cancer genomes?
• What sequence of rearrangements produces these architectures?
ESP Genome Reconstruction Problem
• Human genome (known): A B C D E
• Unknown sequence of rearrangements
• Tumor genome (unknown): A -C E -D B
• Map ES pairs to human genome.
• Location of ES pairs (x1,y1), …, (x5,y5) in human genome (known).
• Reconstruct tumor genome.
ESP Plot
• 2D representation of ESP data.
• Each point is an ES pair.
• Can we reconstruct the tumor genome from the positions of the ES pairs?
[Figure: ES pairs (x1,y1), …, (x4,y4) plotted on Human × Human axes divided into blocks A, B, C, D, E]
ESP Plot → Tumor Genome
• Reconstructed tumor genome: A -C E -D B
[Figure: ESP plot on Human × Human axes (blocks A to E) with the path through the ES pairs that yields the tumor genome]
Real data noisy and incomplete!
• Valid ES pairs satisfy length/direction constraints: Lmin ≤ y - x ≤ Lmax
• Invalid ES pairs indicate rearrangements or experimental errors.
Computational Approach
• Use known genome rearrangement mechanisms.
• Find simplest explanation for ESP data, given these mechanisms.
• Motivation: genome rearrangement studies in evolution/phylogeny.
[Figure: human vs. tumor genome under an inversion (A B C → A -B C) and under a translocation, each with breakpoints s and t]
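ESP Sorting Problem
• $G = [0, M]$, unichromosomal genome.
• Inversion (Reversal) $\rho_{s,t}$:
$$\rho_{s,t}(x) = \begin{cases} x, & \text{if } x < s \text{ or } x > t, \\ t - (x - s), & \text{otherwise.} \end{cases}$$
• Example: $G$ = A B C with ES pairs $(x_1, y_1)$, $(x_2, y_2)$; $G' = \rho_{s,t}\, G$ = A -B C.
• Given: ES pairs $(x_1, y_1), \dots, (x_n, y_n)$
• Find: Minimum number of reversals $\rho_{s_1,t_1}, \dots, \rho_{s_n,t_n}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_n,t_n}$, then $(\rho x_1, \rho y_1), \dots, (\rho x_n, \rho y_n)$ are valid ES pairs.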
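The reversal map in the sorting problem (x stays fixed outside [s, t] and goes to t - (x - s) inside) is easy to try out on coordinates. The sketch below is illustrative only: the helper names are hypothetical, and the validity test checks length alone, ignoring orientation.

```python
def make_reversal(s: int, t: int):
    """Return the coordinate map for a reversal of the interval [s, t]."""
    def rho(x: int) -> int:
        if x < s or x > t:
            return x            # outside the inverted interval: unchanged
        return t - (x - s)      # inside: mirrored within [s, t]
    return rho


def apply_to_pairs(rho, pairs):
    """Map both ends of every ES pair through the reversal."""
    return [(rho(x), rho(y)) for x, y in pairs]


def length_valid(pair, l_min: int, l_max: int) -> bool:
    """Length-only validity check (orientation is ignored in this sketch)."""
    x, y = pair
    return l_min <= abs(y - x) <= l_max
```

With s = 300 and t = 700 (coordinates in kb), the pair (250, 650) spans 400kb and fails a 100 to 250kb length constraint. After the reversal it maps to (250, 350), which satisfies it. Applying the same reversal twice returns every coordinate to its original position.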
Sparse Data Assumptions
1. Each cluster results from a single inversion or translocation.
2. Each clone contains at most one breakpoint.
[Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) shown on the tumor genome and mapped to the human genome]
ESP Genome Reconstruction: Discrete Approximation
• Remove isolated invalid pairs (x,y).
• Define segments from clusters.
• ES orientations define links between segment ends.
[Figure: Human × Human ESP plot with clustered pairs (x1,y1), (x2,y2), (x3,y3) and breakpoints s, t]
ESP Graph
• Edges: human genome segments and ES pairs.
• Paths in graph are tumor genome architectures.
• Example: Human Genome (1 2 3 4 5), Tumor Genome (1 -3 -4 2 5).
• Minimal sequence* of translocations and inversions.
*Hannenhalli-Pevzner theory
Sorting Permutations by Reversals
• $\pi = \pi_1 \pi_2 \dots \pi_n$: signed permutation (Sankoff et al. 1990)
• Reversal $\rho(i,j)$ [inversion]: $\pi \cdot \rho(i,j) = \pi_1 \dots \pi_{i-1}\; {-\pi_j} \dots {-\pi_i}\; \pi_{j+1} \dots \pi_n$
• Problem: Given $\pi$, find a sequence of reversals $\rho_1, \dots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \dots, n)$ and $t$ is minimal.
• Solution: Analysis of breakpoint graph ← ESP graph
• Polynomial time algorithms:
  • $O(n^4)$: Hannenhalli and Pevzner, 1995.
  • $O(n^2)$: Kaplan, Shamir, Tarjan, 1997.
  • $O(n)$ [distance $t$]: Bader, Moret, and Yan, 2001.
  • $O(n^3)$: Bergeron, 2001.
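A small sketch of the reversal operation on signed permutations, together with a brute-force search for the minimum number of reversals. The breadth-first search is an assumption made for illustration: it is exponential in n and is not one of the polynomial-time algorithms cited above. Positions i and j are 1-based and inclusive.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Reverse the block at positions i..j (1-based, inclusive) and flip its signs."""
    block = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + block + perm[j:]


def reversal_distance(perm):
    """Minimum number of reversals to reach (1, 2, ..., n), by breadth-first search.

    Brute force for illustration only; feasible for small n.
    """
    n = len(perm)
    start, target = tuple(perm), tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = tuple(apply_reversal(list(current), i, j))
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    raise ValueError("input is not a signed permutation of 1..n")
```

On the lecture's example, reversing positions 3..4 of 1 -3 -4 2 5 gives 1 -3 -2 4 5, and reversing positions 2..3 of that gives 1 2 3 4 5, so two reversals suffice; the search confirms that no single reversal does.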
Sorting Permutations
1 -3 -4 2 5 → 1 -3 -2 4 5 → 1 2 3 4 5
Breakpoint Graph
• Black edges: adjacent elements of $\pi$ = 1 -3 -4 2 5 (between start and end).
• Gray edges: adjacent elements of the identity permutation $i$ = 1 2 3 4 5.
• Key parameter: black-gray cycles.
• ESP Graph → Tumor Permutation and Breakpoint Graph
• Lower bound: the minimum number of reversals $d(\pi)$ needed to transform $\pi$ to the identity permutation $i$ satisfies $d(\pi) \ge n + 1 - c(\pi)$, where $c(\pi)$ = number of gray-black cycles.
• Theorem: $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$, where $h(\pi)$ is the number of hurdles and $f(\pi)$ is 1 if $\pi$ is a fortress and 0 otherwise (Hannenhalli-Pevzner theory).
[Figure: breakpoint graphs of 1 -3 -4 2 5 and 1 -3 -2 4 5]
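The cycle count in the breakpoint graph can be computed directly for small examples. The sketch below uses a common textbook encoding that is assumed here rather than taken from the slides: each signed element x becomes the vertex pair (2|x|-1, 2|x|), swapped when x is negative, with a start vertex 0 and an end vertex 2n+1. It evaluates only the lower bound n + 1 - c; hurdles and fortresses are not computed.

```python
def breakpoint_graph_cycles(perm):
    """Count alternating black-gray cycles for a signed permutation of 1..n."""
    n = len(perm)
    # Expand each signed element into two vertices (tail, head).
    seq = [0]
    for x in perm:
        a, b = 2 * abs(x) - 1, 2 * abs(x)
        seq.extend((a, b) if x > 0 else (b, a))
    seq.append(2 * n + 1)

    # Black edges join neighbours in the permutation being sorted.
    black = {}
    for k in range(0, len(seq), 2):
        u, v = seq[k], seq[k + 1]
        black[u], black[v] = v, u

    # Gray edges join neighbours in the identity permutation.
    gray = {}
    for v in range(0, 2 * n + 2, 2):
        gray[v], gray[v + 1] = v + 1, v

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:      # follow black, then gray, until the cycle closes
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles


def reversal_lower_bound(perm):
    """Lower bound n + 1 - c on the reversal distance (no hurdle or fortress terms)."""
    return len(perm) + 1 - breakpoint_graph_cycles(perm)
```

For 1 -3 -4 2 5 this gives c = 4 and a bound of 5 + 1 - 4 = 2, which matches the two-reversal sequence shown in the lecture; for 1 -3 -2 4 5 the bound is 1, and for the identity it is 0.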
Multichromosomal Sorting
• Concatenate chromosomes.
• Translocations modeled by reversals in concatenate.
• Minimal sequence in polynomial time (Hannenhalli & Pevzner 1996, Tesler 2003, Ozery-Flato and Shamir, 2003).
• Translocation: (A1 A2), (B1 B2) → (A1 B2), (B1 A2)
• Concatenation: A1 A2 -B2 -B1; reversal gives A1 B2 -A2 -B1, the concatenation of (A1 B2) and (B1 A2).
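The concatenation trick can be checked on the slide's own example. In this sketch, blocks are strings with a leading `-` marking reverse orientation, and the second chromosome is appended in flipped form; both conventions, and all function names, are assumptions made for the illustration.

```python
def flip(block: str) -> str:
    """Toggle the orientation sign of one block, e.g. 'A2' <-> '-A2'."""
    return block[1:] if block.startswith("-") else "-" + block


def flipped(chrom):
    """Read a chromosome from the other end: reverse order and flip every block."""
    return [flip(b) for b in reversed(chrom)]


def concatenate(chrom_a, chrom_b):
    """Join two chromosomes, with the second one in flipped orientation."""
    return chrom_a + flipped(chrom_b)


def reverse_segment(seq, i, j):
    """Apply a reversal to the half-open slice [i, j) of a concatenate."""
    return seq[:i] + flipped(seq[i:j]) + seq[j:]


def split(seq, k):
    """Cut a concatenate after k blocks and un-flip the second part."""
    return seq[:k], flipped(seq[k:])
```

Starting from (A1 A2) and (B1 B2), the concatenate is A1 A2 -B2 -B1. Reversing the middle two blocks yields A1 B2 -A2 -B1, and splitting after two blocks recovers (A1 B2) and (B1 A2), which is the translocation.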
MCF7 Breast Cancer Cell Line
• Sequence of 5 inversions and 15 translocations from human chromosomes to MCF7 chromosomes.
Raphael, et al. (2003) Bioinformatics
What about duplications? • 11240 ES pairs • 10453 valid (black) • 737 invalid • 489 isolated (red) • 248 form 70 clusters (blue) 33/70 clusters Total length: 31Mb