
A strategy for DNA sequence analysis of genome rearrangements



Presentation Transcript


  1. Piloting an approach to cloning and sequencing genome rearrangements using follicular lymphoma as a case study. A strategy for DNA sequence analysis of genome rearrangements

  2. Background “Low resolution, genome-wide studies are beginning to catalog additional changes, such as small deletions or amplifications, and directed studies of individual genes implicated in tumor biology are steadily increasing our awareness of the variety of mutations underlying cancer. These data make it evident that only a modest fraction of the molecular targets involved in tumorigenesis has been identified and that cancer is a very heterogeneous disease due to many different mutations, environmental factors, and the interaction between the two. To develop targeted interventions, it will be important to identify all or most of these events.” “These technological improvements make obtaining sequence information increasingly rapid and relatively inexpensive. It is now possible to contemplate something that was previously incomprehensible: obtaining comprehensive sequence information from multiple tumor types, at different stages—in a process that would be unbiased by our currently selective knowledge of the biology of cancer—to catalog all the genomic changes associated with cancer, and to render them accessible to study and intervention.” Exploring Cancer through Genomic Sequence Comparisons. A NCI – NHGRI Workshop, April 14-15, 2004.

  3. Charge to the workshop • Which sequencing technologies would be the most appropriate for this application? • Which tumor(s) should receive initial focus? • What other data should be collected on selected tumor types? • How could such an effort be piloted? Sequencing genome rearrangements could provide a perspective on what could be learned from whole genome sequencing of cancers. How can these be found (and sequenced)? What cancers? What stages of cancer?

  4. RFI: Large-scale identification of somatic mutations in cancer

  5. Genomic Instability and Disease • genomic rearrangements play a major role in pathogenesis of human genetic disease • architectural features of the genome are associated with susceptibility to rearrangements • segmental duplications are implicated in facilitating recombination events that lead to rearrangements • rearrangements are not strictly random events but reflect higher order genome architectural features • alteration of gene dosage or creation of novel fusion genes as a result of recombination • red-green colour blindness is caused by frequent deletions/duplications between highly similar red and green opsin loci (Xq28) Stankiewicz, P. and Lupski, J.R. 2002. Molecular-evolutionary mechanisms for genomic disorders. Curr Opin Genet Dev 12: 312-319.

  6. Lymphoma and Leukemia-associated Rearrangements Janz, S. et al. 2003. Genes Chromosomes Cancer 36: 211-223.

  7. Follicular lymphoma as a test case • The incidence of Follicular Lymphoma in Canada is significant and increasing. New approaches for disease control are required, and these will most likely derive from enhanced understanding of disease biology. • There is a strong lymphoma research group at BCCA and UBC. • FL samples of > 90% “purity” can be obtained in quantities sufficient for BAC library construction. • Published evidence supports the correlation of genome rearrangements to transformation from indolent widespread disease to more aggressive DLBCL.

  8. Rearrangement Detection Methods

  9. Bacterial artificial chromosome (BAC) array CGH: 2400 elements Albertson and Pinkel, HMG 2003 12 #2 145 – 152.

  10. Bacterial artificial chromosome (BAC) array CGH: 32,000 elements De Leeuw et al., HMG 2004 13 (17); 1827 - 37

  11. Affymetrix 100 k SNP arrays: Genotyping and Copy number analysis. Affymetrix 100 K Users Manual

  12. Affymetrix 100 k SNP arrays: Genotyping and Copy number analysis. Chr 9: 92,630,957 - 93,121,772 (490 kb) 12 SNPs with average Copy Number (CN)=4.5 Min CN=4 Max CN=5

  13. Oligonucleotide approaches to detection of genome copy number imbalances (Affymetrix, ROMA, Nimblegen)

  14. Array methods can detect genome amplifications and deletions. Rearrangements that do not produce a “genome imbalance” are cryptic. • Array methods do not yield reagents (clones) that can be sequenced. • Therefore, experiment with clone-based approaches: • End Sequence Profiling (ESP) • Fingerprint Profiling (FPP)

  15. Clone fingerprinting applications • Construction of genome maps • Assessment of clone sequence assemblies • Redeye, with G. Rubin, R. Hoskins and S. Celniker • Selection and validation of tiling sets • BAC array CGH (Human, Mouse….) • Cloning (for sequencing) genome rearrangements

  16. Overview of Genome Mapping • Total maps: 27 • Total organisms: 20 • Total fingerprints: 1.8 M • Total bases mapped: 21 Gb • Capacity: 40 human genome equivalents per annum

  17. Clone Fingerprinting • Library construction: genomic DNA (chromosomes) • restriction enzyme partial digest or shearing • DNA fragments of various sizes • size separation on agarose gels (with marker) • isolate DNA from appropriate gel size fraction • ligate into BAC vector (ligase) • transform clones into E. coli • array in 384-well plates (library) • Fingerprinting: overnight culture • purify BAC DNA • restriction enzyme digest • agarose gel electrophoresis • restriction fragment identification • Map construction: compare BAC fingerprint patterns of each clone • identify clones containing highly related DNA and reconstruct contiguous regions of the genome (FPC)

  18. Fingerprint Data Generation and Restriction Fragment Identification • Automated analysis of gel images is performed by BandLeader*, customized Matlab software for accurate identification and sizing of restriction fragments • BandLeader performance was assessed on fingerprints of 322 fully sequenced BAC clones (human and mouse) • 96% of real fragments were detected (sensitivity) • 96% of fragments called were real (specificity) • 96% accuracy in detection of co-migrating fragments, including a cluster of 11 fragments • Fingerprinting gel: 121 lanes (96 samples + 25 marker lanes); HindIII digest; 1.2% agarose; run for 7 hours at 3.5 volts/cm in 1xTAE; gels stained post electrophoresis with SYBR Green I; images collected on MD Fluorimager 595 [Gel image labels: M; 29,950; 540] *D. Fuhrmann et al. - Genome Research, 2003
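
The two performance figures on this slide can be illustrated with a small sketch. Note that "fragments called that were real" is what is usually named positive predictive value (precision). The matching rule below (greedy pairing of sorted sizes within a relative tolerance `tol`) and the function names are assumptions made for illustration; the slide does not describe how BandLeader calls were scored against the sequenced clones.

```python
def match_fragments(called, truth, tol=0.02):
    """Count called/true fragment pairs whose sizes agree within tol (relative to truth).

    Both lists are sorted and walked in parallel; each fragment is used at most once.
    """
    called = sorted(called)
    truth = sorted(truth)
    i = j = matched = 0
    while i < len(called) and j < len(truth):
        c, t = called[i], truth[j]
        if abs(c - t) <= tol * t:
            matched += 1
            i += 1
            j += 1
        elif c < t:
            i += 1  # called fragment has no partner: a false call
        else:
            j += 1  # true fragment has no partner: a missed fragment
    return matched


def sensitivity_and_ppv(called, truth, tol=0.02):
    """Return (fraction of real fragments detected, fraction of calls that are real)."""
    tp = match_fragments(called, truth, tol)
    return tp / len(truth), tp / len(called)


if __name__ == "__main__":
    truth = [1000, 2000, 3000, 4000]   # hypothetical in-silico fragment sizes (bp)
    called = [1010, 2500, 3990, 4010]  # hypothetical gel calls (bp)
    print(sensitivity_and_ppv(called, truth))
```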

  19. Redeye identifies BACs with fingerprints inconsistent with their in-silico restriction maps • BAC A: fingerprints consistent, confirm BAC sequence • BAC B: localized inconsistency • sequence restriction fragments are coloured by the relative size distance to the nearest matching fingerprint fragment
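
The colouring rule on this slide (relative size distance from each sequence-derived fragment to the nearest fingerprint fragment) can be sketched as follows. This is not Redeye's code; the function names and the 5% flagging threshold are hypothetical choices for illustration.

```python
def nearest_relative_distance(in_silico, fingerprint):
    """For each in-silico fragment size, return |nearest fingerprint size - size| / size."""
    distances = []
    for size in in_silico:
        nearest = min(fingerprint, key=lambda f: abs(f - size))
        distances.append(abs(nearest - size) / size)
    return distances


def inconsistent_fragments(in_silico, fingerprint, threshold=0.05):
    """Return in-silico fragments with no fingerprint fragment within the threshold."""
    distances = nearest_relative_distance(in_silico, fingerprint)
    return [s for s, d in zip(in_silico, distances) if d > threshold]


if __name__ == "__main__":
    in_silico = [1000, 5000, 8000]      # hypothetical sizes from the clone sequence (bp)
    fingerprint = [1010, 5100, 12000]   # hypothetical sizes measured on the gel (bp)
    print(nearest_relative_distance(in_silico, fingerprint))
    print(inconsistent_fragments(in_silico, fingerprint))
```

A cluster of flagged fragments that are adjacent in the sequence would correspond to the "localized inconsistency" case.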

  20. Experimental plan

  21. Lymphoma genomic DNA • Choice informed by: • arrays • genomics • literature • cytogenetics • relevance to progression • plan incorporates multiple levels of discovery, validation and comparison

  22. Array methods and ESP / fingerprinting can identify rearrangement-bearing BACs for sequencing

  23. End Sequence Profiling [Figure label: TUMOR] Volik, S. et al. 2003. End-sequence profiling: sequence-based analysis of aberrant genomes. Proc Natl Acad Sci 100: 7696-7701.

  24. Alignment of Fingerprints to “Electronic Digests” of the reference Human Genome Sequence • the genome sequence assembly is digested in silico to give an electronic fingerprint • experimental fingerprints (TUMOR, NORMAL) are aligned to the electronic fingerprint by Needleman-Wunsch global alignment [Figure: excerpt of reference sequence text omitted]
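
A minimal sketch of the two steps named on this slide: an electronic digest of a sequence, followed by a Needleman-Wunsch global alignment of two fragment-size lists. The HindIII site (A^AGCTT) is standard; everything else here is an assumption for illustration, including the function names, the scoring values, the 3% size tolerance, and the choice to align size-sorted lists. The slide does not specify the parameters used by the actual FPP method.

```python
def electronic_digest(sequence, site="AAGCTT", cut_offset=1):
    """Return fragment lengths from cutting `sequence` at every occurrence of `site`.

    cut_offset is the cut position within the site (1 for HindIII, A^AGCTT).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def needleman_wunsch(experimental, electronic, tol=0.03,
                     match=1.0, mismatch=-1.0, gap=-0.5):
    """Global alignment score of two fragment-size lists.

    Two sizes count as a match when they agree within tol relative to the
    electronic size. Both lists are assumed to be in the same order
    (for example, sorted by size).
    """
    n, m = len(experimental), len(electronic)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = experimental[i - 1], electronic[j - 1]
            s = match if abs(a - b) <= tol * b else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # pair the two fragments
                              score[i - 1][j] + gap,     # experimental fragment unpaired
                              score[i][j - 1] + gap)     # electronic fragment unpaired
    return score[n][m]


if __name__ == "__main__":
    toy = "GGAAGCTTCCCCAAGCTTG"  # toy sequence with two HindIII sites
    print(electronic_digest(toy))
    print(needleman_wunsch([1000, 3000], [1000, 2000, 3000]))
```

In a genome-wide search, the electronic fingerprint would be computed for successive windows of the reference and the best-scoring window reported; a clone whose fingerprint splits into two well-aligned pieces at different loci is a rearrangement candidate.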

  25. Potentially confounding issues • Repeated sequences • Sensitivity and specificity of mapping BAC fingerprints to genome • Multiple enzyme digests optimal • Clone artifacts • Redundant representation of regions in independently derived clones

  26. Repeats, Sensitivity and Specificity Approximately 47% of genome sequence is composed of repetitive DNA sequences. • 3% of repeat content is in blocks > 7.4 kb (2X the average size of a fingerprint fragment) • 27% of repeat content is in blocks >1,800 bp (2X the average size of an end sequence read) • ESP is substantially affected by repeats: • 27% of end reads will have ambiguous alignments to the genome sequence • 47% of BACs will have one or neither of their end sequences unambiguously aligned, and thus will not be useful. Is FPP affected by repeat content? • The performance of FPP alignments was assessed using a set of 43,000 simulated 130 kb BAC clone fingerprints derived from the sequence assembly • FPP alignment sensitivity • 99.8% of all BACs had associated alignments in the correct location • FPP alignment specificity • 78% of all alignments do not extend past the actual edges of the BAC • 87% of all alignments do not extend by more than 1kb (e.g. 500bp on both ends) • 99% of all alignments do not extend by more than 10kb (e.g. 5kb on both ends) FPP: less sensitive to repeats (reduced attrition due to repeats) and samples more of clone insert
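
The step from "27% of end reads ambiguous" to "47% of BACs not useful" follows if the two end reads of a clone are treated as independent, which is an assumption not stated on the slide. A short check of that arithmetic:

```python
def fraction_bacs_lost(p_ambiguous_end):
    """Probability that at least one of a clone's two end reads aligns ambiguously,
    assuming the two ends are independent."""
    return 1 - (1 - p_ambiguous_end) ** 2


if __name__ == "__main__":
    # 27% ambiguous end reads, as quoted on the slide
    print(round(fraction_bacs_lost(0.27), 2))
```

With p = 0.27 this gives 1 - 0.73^2, or about 0.47.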

  27. Double digest fingerprints yield specific patterns • small regions of the genome produce unique patterns • larger domains yield more specific patterns [Figure: EcoRI/NcoI double digest; axis: number of genome fragments matching fragment F]

  28. 5-fold fingerprinting provides coverage by at least 2 BACs for 93% of the genome • simulated 50X MboI library, 500 iterations • simulation shows 96% coverage at 2X+ • Clone redundancy captures rearrangements in more than one BAC • internal validation
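
As a rough cross-check of the coverage figures, an idealized Poisson model (clones placed uniformly at random, in the spirit of Lander-Waterman) gives the expected fraction of the genome covered by at least k clones at a given fold coverage. This is a simplification and not the MboI-library simulation described on the slide; the function name is hypothetical.

```python
import math


def fraction_covered_at_least(k, coverage):
    """Fraction of the genome covered by at least k clones under a Poisson model
    with mean clone depth `coverage`."""
    below = sum(math.exp(-coverage) * coverage ** i / math.factorial(i)
                for i in range(k))
    return 1 - below


if __name__ == "__main__":
    # 5-fold clone coverage, at least 2 clones per position
    print(round(fraction_covered_at_least(2, 5.0), 3))
```

At 5-fold coverage the model gives about 0.96 for k = 2, close to the simulated value quoted on the slide. Real libraries deviate from the model because restriction sites and cloning biases make clone placement non-uniform.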

  29. Redundancy and sensitivity • At ~5X, 94% of all breakpoints >20 kb from edge of nearest BAC • what is the smallest mappable fragment? [Figure: pos A / pos B segment pairs of 100 kb/100 kb, 50 kb/150 kb, 25 kb/175 kb and 10 kb/190 kb]

  30. FPP Alignment Method (v 0.1)

  31. MCF7 Proof of concept • 607 clones from the MCF7 breast cancer cell line (C. Collins and S. Volik, UCSF) were fingerprinted with 5 enzymes: HindIII, EcoRI, BglII, NcoI, PvuII • enzymes were selected to optimize sampling resolution, coverage by sizeable fragments, band spacing and robustness in the laboratory • all clones were also subjected to ESP analysis • fingerprints were compared to the reference human genome sequence to map the clones onto the genome and identify BACs containing putative genome rearrangements • 206/607 BACs were identified by FPP as containing candidate rearrangements • 245/607 BACs were identified by ESP • 148/206 were also identified by ESP as containing rearrangements • complex rearrangements not detected by ESP were found
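
The counts on this slide imply how much the two methods agree. The sketch below works through that arithmetic, reading "245/607 BACs identified by ESP" as clones flagged as rearranged by ESP (an interpretation, since the slide does not spell it out). The function name is hypothetical.

```python
def method_overlap(total, fpp, esp, both):
    """Partition clones by which method flagged them as carrying a rearrangement."""
    either = fpp + esp - both
    return {
        "fpp_only": fpp - both,
        "esp_only": esp - both,
        "either": either,
        "neither": total - either,
        "fraction_of_fpp_confirmed_by_esp": both / fpp,
        "fraction_of_esp_confirmed_by_fpp": both / esp,
    }


if __name__ == "__main__":
    for key, value in method_overlap(total=607, fpp=206, esp=245, both=148).items():
        print(key, value)
```

Under this reading, 58 clones are flagged only by FPP and 97 only by ESP, which is consistent with the later statement that each method detects rearrangements the other misses.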

  32. Complex (potential) rearrangements • M0012O05 localized to chrs 1p13.3, 17q23.2 and 20q13.2 • alignments using BglII, EcoRI, HindIII, NcoI, PvuII [Figure: per-enzyme FPP alignment tracks for chr 20 (53200000), chr 1 (107.21) and chr 17 (59.28); character-art tracks omitted]

  33. Visual comparison of results obtained from fingerprint and ESP methods. Both methods are capable of detecting rearrangements not found by the other method. Not all MCF-7 clones harboured rearrangements. Clones in pilot project were enriched for those with specific rearrangements on chrs 1, 3, 17 and 20. Rearrangement profile of MCF-7 genome is known to be extremely complex. Davidson, J.M. et al. 2000. Molecular cytogenetic analysis of breast cancer cell lines. Br J Cancer 83: 1309-1317.

  34. Progress: FPP analysis of FL • Library #1: patient with cytogenetic profile showing only t(14;18); est. average insert size 135 kb • Library #2: patient showing complex cytogenetic profile in addition to t(14;18); est. average insert size 130 kb • 90% of clones are in the range 75-225 kb • Empty wells < 1% • Failures: 3.6% vs. 7% (All)

  35. FPP alignments currently cover 77% of the genome average

  36. Redundant coverage • Coverage >=1: 2,216,990,483 bp • Coverage >=2: 1,344,663,365 bp • Coverage >=3: 654,423,007 bp • Coverage >=4: 264,307,507 bp • Coverage >=5: 90,817,363 bp • Coverage >=6: 27,173,488 bp
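
The cumulative counts on this slide can be converted into the number of bases covered at exactly each depth, and into the share of covered sequence that is seen by more than one clone. The figures are copied from the slide; the function and variable names are hypothetical.

```python
# Bases covered by at least k clone alignments, as listed on the slide.
CUMULATIVE_BP = {
    1: 2_216_990_483,
    2: 1_344_663_365,
    3: 654_423_007,
    4: 264_307_507,
    5: 90_817_363,
    6: 27_173_488,
}


def exact_depth_bins(cumulative):
    """Convert '>= k' counts into 'exactly k' counts; the deepest bin stays '>= k'."""
    depths = sorted(cumulative)
    bins = {}
    for k, nxt in zip(depths, depths[1:]):
        bins[k] = cumulative[k] - cumulative[nxt]
    bins[depths[-1]] = cumulative[depths[-1]]  # open-ended top bin
    return bins


def redundant_fraction(cumulative):
    """Share of covered bases that are covered by two or more clones."""
    return cumulative[2] / cumulative[1]


if __name__ == "__main__":
    print(exact_depth_bins(CUMULATIVE_BP))
    print(round(redundant_fraction(CUMULATIVE_BP), 3))
```

About 61% of the covered bases are covered at least twice, which is the portion of the genome where a candidate rearrangement can be checked against a second, independent clone.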

  37. Candidate FPP rearrangements • intra-chromosomal: n=189 • inter-chromosomal: n=967

  38. Clone size distribution: AEX HT0001

  39. Clones associated with candidate rearrangements tend to be long

  40. Proximity of BAC alignments to segmental duplications

  41. Shorter clones are more likely to be associated with a segmental duplication

  42. Redundancy of sampling resolves inconsistent alignments due to chimeric clones • T0049G09 aligns to chrs 5 and 14 • alignment on chr14 is embedded within a contig formed by clones with single alignments • breakpoint suggested by T0049G09 is inconsistent with neighbouring alignments

  43. BCL-2 translocation seen in FL: t(14;18)(q32.33;q21.33) [Figure label: IGH]

  44. FPP alignments of clone T0099C19 • chr 18: cloneregion T0099C19, 18:58898198-58941792, size 43595, score 113.6433 • chr 14: cloneregion T0099C19, 14:105464595-105493586, size 28992, score 96.0589; 14:105454680-105464594, size 9913 • enzymes: EcoRI (e), NcoI (n) [Figure: character-art alignment tracks omitted]

  45. chr 18 BES @ 58,892,771 5.4 kb away 3’ BCL-2

  46. chr 14 BES @ 105,494,703 1.1 kb away IgH
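
One possible reading of "5.4 kb away" and "1.1 kb away" on slides 45 and 46 is the distance from each BAC end sequence (BES) to the nearest edge of the FPP alignment of T0099C19 reported on slide 44. That reading is an assumption, as is the assumption that these BES belong to T0099C19; the sketch below only shows that the numbers are consistent with it.

```python
def distance_to_interval(position, start, end):
    """Distance in bp from a coordinate to the nearest edge of [start, end];
    0 if the coordinate lies inside the interval."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


if __name__ == "__main__":
    # chr 18: BES coordinate vs. FPP alignment 58898198-58941792
    print(distance_to_interval(58_892_771, 58_898_198, 58_941_792))
    # chr 14: BES coordinate vs. FPP alignment 105464595-105493586
    print(distance_to_interval(105_494_703, 105_464_595, 105_493_586))
```

The two distances come out at 5,427 bp and 1,117 bp, matching the 5.4 kb and 1.1 kb quoted on the slides under this interpretation.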

  47. Double digests: fragments >0.43 kb / >1.00 kb • sensitivity: 91.1% / 98.1% (c.f. 96.5% single) • specificity: 88.2% / 93.8% (c.f. 96.2% single) • EcoRI+NcoI double digest, 0.9% Trevi gel, 5.3 V/cm, 4 hours [Gel image labels: 12 kb; 0.43 kb]

  48. Conclusions • A BAC fingerprinting – based approach can identify rearranged clones (MCF7 and lymphoma). • BAC libraries can be made from primary tumor material (2 lymphoma libraries). • Such libraries can be subjected to high throughput BAC fingerprinting. • The fingerprints can be scanned for BACs bearing candidate genome rearrangements. • FISH has confirmed ~50% of candidate rearrangements. • Redundancy is an asset!

  49. Acknowledgments BC Cancer Agency • Joseph Connors • Randy Gascoyne • Doug Horsman BCCA Genome Sciences Centre Genome Mapping Group • Martin Krzywinski • Jacquie Schein DNA Sequencing Group • Rob Holt • George Yang UCSF Comprehensive Cancer Centre • Colin Collins • Stas Volik • Joe Gray BC Cancer Foundation Michael Smith Foundation for Health Research National Human Genome Research Institute (USA) Genome Canada / Genome BC
