Genomic Duplications, Structural Variation and Disease

Evan Eichler, Howard Hughes Medical Institute, University of Washington. April 3rd, 2006, Frontiers in Genomics.

Presentation Transcript


  1. Genomic Duplications, Structural Variation and Disease. Evan Eichler, Howard Hughes Medical Institute, University of Washington. April 3rd, 2006, Frontiers in Genomics

  2. Genomic Variation: mutational mechanisms underlying genetic variation, on a scale from sequence to cytogenetics • Single base-pair changes: point mutations • Small insertions/deletions: frameshift, microsatellite, minisatellite • Mobile elements: retroelement insertions (300 bp to 10 kb in size) • Large-scale genomic variation (>10 kb) • Large-scale deletions • Segmental duplications • Chromosomal variation: translocations, inversions, fusions

  3. Global Analysis of Segmental Duplications (intrachromosomal and interchromosomal). Question: What is the organization, mechanism and impact of recent human segmental duplications (>90% identity and >1 kb in length)? Approaches: • Computational: a) whole-genome assembly comparison; b) whole-genome shotgun sequence detection strategies • Experimental: comparative sequence analysis, array comparative genomic hybridization, comparative FISH

  4. Recent Duplication Architecture of the Human Genome [Figure: duplications (build34, >90%, >1 kb) mapped on chromosomes 1-22, X and Y, with alpha satellite and “heterochromatic” regions marked; labeled loci: 2p22, 2p11 (700 kb), 10q26, 11p15, 11q14, 7q36, 22q12, 21q21, 4p16.1, 12q24, Xq28, 4p16.3, 12p11, 4q24] • Total: 5.26% (150.8 Mb) • Inter: 2.36% (67.6 Mb) • Intra: 3.87% (111.1 Mb) • Non-random distribution • 5.3-fold bias to pericentromere • 389 regions > 100 kb nexi

  5. Human Genome Segmental Duplication Pattern [Figure: duplication pattern for chr1 to chr22, chrX and chrY] • ~4% duplication • >20 kb, >95% • ~4 average # duplicates • 59.5% pairwise (> 1 Mb). She, X et al. (2004), Nature. http://humanparalogy.gs.washington.edu

  6. Mouse Segmental Duplication Pattern • 1-2% duplication • >20 kb, >95% • 2-3 average # duplicates • July 2004, mmu5. She, X, in press

  7. Percent Similarity of Human Segmental Duplications [Figure: sum of aligned bases (kb) versus percent identity (90% to 100%), plotted separately for interchromosomal and intrachromosomal alignments; annotations: 25 My, 12 My, 5 My, 49 Mb] Whole-Genome Analysis (2,865 Mb), Build 34, July 2003, 25.8 K alignments

  8. Summary: Segmental Duplication Asymmetry [Figure: human versus chimpanzee duplication content; labels in order: Polymorphism 15-20%; 21.7 Mb+ new; 7.2 Mb+ shared; 24.8 Mb+ new; 6.6 Mb+ shared; 16.0 Mb+ shared; chimp hyperexpansion] • 76.3 Mb of Differentially Duplicated Euchromatic Material

  9. Hyperexpansion of a Chimpanzee Segmental Duplication: from 4 to 400 copies. Cheng, Z et al. (2005), Nature

  10. Human Segmental Duplications Properties • Large (>10 kb) • Recent (>95% identity) • Interspersed (60% are separated by more than 1 Mb) • Modular (duplicon architecture) ~389 acceptor regions • 2.7% Genetic Difference, human vs. chimpanzee What impact in terms of human variation?

  11. Models of Disease • Rare Duplication-Mediated Structural Variation • Common Fine-Scale Structural Variation

  12. Genomic Disorders [Figure: duplicated segments A, B, C shown relative to the telomere (TEL); aberrant recombination in gametes leads to human disease through triplosensitive, haploinsufficient and imprinted genes] • Hypothesis: Mechanism underlying Uncharacterized Mental Retardation?

  13. Duplication-Mediated Disease

  14. Duplication Map of Human Genome • 130 candidate regions (298 Mb) • 23 associated with genetic disease • Target patients using array CGH. Bailey et al. (2002), Science 297:1003-1007

  15. Array Comparative Genomic Hybridization [Figure labels: Normal Human DNA Sample, Cy3 Channel, Array of Human BAC Clones, Hybridization, Cy5 Channel, Disease individual DNA Sample, Merge; scale: 12 mm] • High-throughput detection of large-scale variation (>50 kb) • LCV or CNP = deletions and duplications (Iafrate et al., 2004; Sebat et al., 2004)

  16. Duplication Microarray: Experimental Design [Figure: BACs placed relative to the telomere (TEL); dist: >50 kb, <5 Mb; prop: 95% identity, 10 kb] • 130 regions of the human genome • 2178 BACs or on average ~10-12 BACs per region • Perform array CGH with reciprocal dye-swap experiments • Strategy: Identify normal variation and then search for variation only observed in disease patients
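
The reciprocal dye-swap design on this slide can be illustrated with a minimal Python sketch. It assumes the common convention that each BAC is scored as a log2 ratio of test to reference intensity, and that the forward and swapped hybridizations are combined by halving their difference so that a multiplicative dye bias cancels. The function names, the intensity values and the 0.3 calling threshold are hypothetical and are not taken from the presentation.

```python
import math


def log2_ratio(cy5, cy3):
    """log2(Cy5 / Cy3) for one BAC spot."""
    return math.log2(cy5 / cy3)


def dye_swap_log2(forward_cy5, forward_cy3, swap_cy5, swap_cy3):
    """Combine a forward and a dye-swapped hybridization.

    Forward: test DNA in Cy5, reference DNA in Cy3.
    Swap:    test DNA in Cy3, reference DNA in Cy5.
    Halving the difference cancels a constant dye bias.
    """
    m_forward = log2_ratio(forward_cy5, forward_cy3)  # approx. log2(test/ref)
    m_swap = log2_ratio(swap_cy5, swap_cy3)           # approx. log2(ref/test)
    return (m_forward - m_swap) / 2


def call_bac(value, threshold=0.3):
    """Label a BAC as gain, loss or no change (threshold is hypothetical)."""
    if value >= threshold:
        return "gain"
    if value <= -threshold:
        return "loss"
    return "no change"


# Hypothetical heterozygous deletion: test signal is half the reference,
# with a 1.2x Cy5 dye bias present in both hybridizations.
combined = dye_swap_log2(600.0, 1000.0, 1200.0, 500.0)
print(round(combined, 3), call_bac(combined))
```

In this made-up example the 1.2x Cy5 bias appears in both hybridizations and drops out of the combined value, which is the usual motivation for running the swap.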

  17. Hybridization [Figure: log2 hybridization relative intensity (y-axis) across BAC probes (x-axis, 0 to 20) for samples R921, D3767 and R1080; probe groups labeled 1-3, 4-5, 6, 7-14, 15, 16-20]

  18. Study Populations • Normal unaffected (diversity panel and HapMap samples): target = 800 samples; completed: 75 + 269 samples = 344 total; identified additional 257 CNPs • Idiopathic Mental Retardation: target = 900 samples (400 Flint samples, 500 CWRU samples); 291 complete

  19. Normal Large-Scale Genomic Structural Variation • Based on our analysis of ~568 chromosomes (~40/130 hotspots show no variation)—NAHR resistant or selection?

  20. Validation using Nimblegen Arrays Deletion Duplication Locke et al., unpublished

  21. Deletion Variants Appear Less Common

  22. Study Populations • Normal unaffected (diversity panel and HapMap samples): target = 800 samples; completed: 75 + 269 samples = 344 total; identified additional 257 CNPs • Idiopathic Mental Retardation: target = 900 samples (400 Flint samples, 500 CWRU samples); 291 complete

  23. VCF Deletion Detected in IMR26: ~3.0 Mb deletion observed in IMR26 (= common VCF 22q11 deletion)

  24. Novel LCV/CNP Detected in IMR43 [Figure labels: CNP detected by Seg Dup array and Iafrate et al.; novel ~2.5 Mb deletion only observed in IMR; CNPs detected by Seg Dup array in HapMap samples] Sharp et al., unpublished

  25. Novel 2.5Mb Chr1 deletion in IMR43

  26. Variation in IMR • 291 IMR samples (Oxford Cohort) screened to date • 23 (n=31 patients) novel sites of variation defined by >2 BACs • 5 are seen in more than one unrelated patient • 7/9 events are de novo • New Genomic Disorder Candidates

  27. Problems: 1. Array CGH has a lower limit to detect deletions (~30 kb); oligo-based approaches effectively sample a small fraction of the genome and extrapolate size indirectly; precise location of the rearrangement is unknown. 2. Neither can identify subtle (5-30 kb) variation. 3. Neither approach can detect inversions. 4. Location and structure of the change unknown.

  28. Models of Disease • Rare Duplication-mediated Structural Variation • Common Fine-Scale Structural Variation

  29. Intermediate-Size Structural Variation (ISV) and Inversions (adapted from Buckland, Ann Med)
      Gene | Type | Freq. | Locus | Size | Phenotype | Dup
      GSTT1 | Deletion | 20% -/- | 22q11.2 | 54.3 kb | halothane/epoxide sensitivity | 17kb/94%
      DEF3A-OR | Inversion | 26% -/+ | 8p23 | 5 Mb | heart defect susceptibility | 400kb/98.9%
      EMD/FLN | Inversion | 33% -/+ | Xq28 | 219 kb | none | 48kb/99%
      IGVH26 | Deletion/Dup | 4-15% +/- | 14q32.3 | Variable | immune response | 91-97%
      GSTM1 | Deletion | 50% -/- | 1p13.3 | 18 kb | toxin resistance, cancer susceptibility | 24kb/95.6%
      CYP2D6 | Duplication | 1-29% +++ | 22q13.1 | 5 kb | antidepressant resistance | 5.4kb/91-97%
      CYP21A2 | Duplication | 1.6% +/- | 6p21.3 | 35 kb | congenital adrenal hyperplasia | 0
      CYP2A6 | Duplication | 1.3% +/- | 19q13.2 | 7 kb | nicotine metabolism | 24kb/96.2%
      SMN2 | Duplication | 50% +++/- | 5q13 | >100 kb | SMA susceptibility | 88.7/99.8%

  30. Comparing Human Genomes by Paired-End Sequence [Figure: fosmid end pairs placed against Build35, illustrating concordant, insertion, deletion and inversion signatures] • ~1.1 million fosmid paired-ends were sequenced by MIT to facilitate gap closure during final phases of HGP • Derived from a single female donor PDR cell line • Fosmid insert size tightly distributed around mean (40 +/- 2.6 kb); low copy = stability; capillary sequencing = low mispairing rate • Approach: optimal placement of fosmid ends against human genome could theoretically detect rearrangements • Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 fosmid pairs BEST pairs (8.8X genome coverage)
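
The coverage figures quoted for the fosmid dataset appear consistent with a simple physical-coverage estimate: number of pairs times mean insert size, divided by genome size. The Python sketch below works through that arithmetic. The 2.9 Gb genome size is an assumption chosen for illustration (the slide does not state the denominator it used), and `physical_coverage` is a hypothetical helper name.

```python
def physical_coverage(n_pairs, mean_insert_bp, genome_size_bp):
    """Fold coverage of the genome by the cloned inserts themselves."""
    return n_pairs * mean_insert_bp / genome_size_bp


MEAN_INSERT_BP = 40_000          # mean fosmid insert size from the slide (40 kb)
GENOME_SIZE_BP = 2_900_000_000   # assumed genome size, not stated on the slide

all_pairs = physical_coverage(1_122_408, MEAN_INSERT_BP, GENOME_SIZE_BP)
best_pairs = physical_coverage(639_204, MEAN_INSERT_BP, GENOME_SIZE_BP)

print(f"preprocessed pairs: {all_pairs:.1f}X")
print(f"best pairs:         {best_pairs:.1f}X")
```

Under these assumptions the two values round to 15.5X and 8.8X, matching the slide. A different assumed genome size would shift both numbers slightly.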

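  31. Genome-wide Detection of Structural Variation (>8 kb) [Figure, panels a) to c): fosmid pairs spanning >48 kb mark a putative deletion; pairs spanning <32 kb mark a putative insertion; tracks show insertion, deletion and inversion calls, with pairs discordant by orientation in yellow/gold and pairs discordant by size in red, alongside the duplication track] Structural polymorphisms?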
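
The size and orientation rules on this slide can be sketched in Python as a small classifier. It uses the 32 kb and 48 kb thresholds from the slide and assumes that both ends map to the same chromosome and that a concordant pair has its left end on the "+" strand and its right end on the "-" strand. The `FosmidPair` type and `classify_pair` function are hypothetical names, and treating every other orientation as a putative inversion is a simplification made for this sketch.

```python
from dataclasses import dataclass

MIN_SPAN_BP = 32_000  # below this span: putative insertion (from the slide)
MAX_SPAN_BP = 48_000  # above this span: putative deletion (from the slide)


@dataclass
class FosmidPair:
    """Placement of the two end sequences of one fosmid on one chromosome."""
    left_pos: int
    left_strand: str   # "+" or "-"
    right_pos: int
    right_strand: str  # "+" or "-"


def classify_pair(pair):
    """Label a mapped end pair by orientation first, then by span."""
    # Expected orientation: ends face each other ("+" on the left, "-" on the right).
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        return "inversion"  # discordant by orientation (simplified)
    span = pair.right_pos - pair.left_pos
    if span > MAX_SPAN_BP:
        # Ends lie farther apart on the reference than a fosmid can hold,
        # so the donor genome lacks sequence present in the reference.
        return "deletion"
    if span < MIN_SPAN_BP:
        # Ends lie too close together: the donor carries extra sequence.
        return "insertion"
    return "concordant"


print(classify_pair(FosmidPair(1_000, "+", 41_000, "-")))  # span of 40 kb
print(classify_pair(FosmidPair(1_000, "+", 61_000, "-")))  # span of 60 kb
```

The thresholds sit roughly three standard deviations either side of the 40 kb mean insert size (40 +/- 2.6 kb) quoted on the previous slide.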

  32. Validated Structural Polymorphisms • GSTM1: ~20 kb deletion • Minspread 28 kb (9 fosmids) • 50% of Caucasians/Saudis are -/- for 18 kb gene (predisposition to cancer) • +++ ultrarapid GSTM1 activity • CYP2D6: ~5-10 kb insertion • Minspread 17 kb (7 fosmids) • Alternate haplotype support • 1-29% Caucasians/Japanese have multiple copies (entire gene ~5 kb) • Associated with resistance to antipsychotic tricyclic antidepressants

  33. Summary: 6/16 of common polymorphisms detected. Tuzun et al. (2005), Nat. Genet.

  34. ……Sequence the Structural Variation

  35. Putative Insertion (8,384 bp) [Figure: build34 versus fosmid sequence]

  36. Putative Deletion (14,055 bp) [Figure: build34 versus fosmid sequence]

  37. Sequencing Genic Structural Variation [Figure, panels a) to f): build35 (b35) versus fosmid sequence comparisons at genic loci; labeled genes: SIGLEC5A, MEGF11, KCNJ16, LSP1, TNNT3, KCNJ2, GSST2, DDT]

  38. Gene Families and Structural Variants: Environmental Interaction Genes • Drug detoxification: glutathione-S-transferase, cytochrome P450, carboxylesterases • Immune response and inflammation: leukocyte immunoglobulin-like receptor, defensin, phorbolin • Surface integrity genes: mucin, enamelin, late epidermal cornified envelope genes, galectin • Surface antigens: melanoma antigen gene family, rhesus antigen

  39. Fine-Scale Structural Variation Map (build35 vs. fosmids) • 1.3% discordant fosmids • Identify 295 clusters (2 or more) • 246 supported by second haplotype • 147 inserts, 93 deletions, 57 inverts • 18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion) • 89 locate within gene regions • 138 unique regions of the genome • 159 duplicated regions of the genome [Figure legend: Insertion (Fosmid), Deletion, Inversions, “Heterochromatic” regions, “Duplicated” regions]
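
The requirement that a site be supported by a cluster of two or more discordant fosmids can be sketched as an interval-merging step in Python. The slide does not give the exact clustering criteria, so this sketch assumes that discordant fosmids of the same type on the same chromosome belong to one cluster when their mapped spans overlap. The function name, tuple layout and coordinates are hypothetical.

```python
from collections import defaultdict


def cluster_discordant(fosmids, min_support=2):
    """Merge overlapping discordant fosmids into candidate variant sites.

    fosmids: iterable of (chrom, start, end, kind) tuples.
    Returns (chrom, kind, start, end, n_fosmids) for each cluster that has
    at least min_support members.
    """
    by_group = defaultdict(list)
    for chrom, start, end, kind in fosmids:
        by_group[(chrom, kind)].append((start, end))

    clusters = []
    for (chrom, kind), intervals in sorted(by_group.items()):
        intervals.sort()
        c_start, c_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= c_end:
                # Overlaps the running cluster: extend it.
                c_end = max(c_end, end)
                count += 1
            else:
                # Gap: close the running cluster and start a new one.
                if count >= min_support:
                    clusters.append((chrom, kind, c_start, c_end, count))
                c_start, c_end, count = start, end, 1
        if count >= min_support:
            clusters.append((chrom, kind, c_start, c_end, count))
    return clusters


# Hypothetical discordant fosmids; only the two overlapping chr1 deletions cluster.
example = [
    ("chr1", 100_000, 160_000, "deletion"),
    ("chr1", 130_000, 190_000, "deletion"),
    ("chr1", 900_000, 960_000, "deletion"),
    ("chr2", 100_000, 120_000, "insertion"),
]
print(cluster_discordant(example))
```

Requiring two or more independent clones is a simple guard against chimeric or mis-placed single fosmids. A real pipeline would likely add further filters not shown here.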

  40. PCR Breakpoint Genotyping Assays for Structural Variation • Tested 11 structural variants (5 insertions, 4 deletions, and 2 inversions) • 7 successful assays (6 >20% minor allele frequency)

  41. Illumina Golden-Gate Genotyping Assays for Structural Variation

  42. Human Genome Structural Variation Project • 2 scientific meetings (2005) • 2 working groups (AHG, MSWG) (12/05) • Coordinating Committee (1/06) • NIH Council (2/06) • Press Release (3/15/06) • Goal: Complete characterization of structural variation in 48 HapMap samples (Japanese and Chinese, Yoruba, CEPH)

  43. Detected Variants from Two Individuals.

  44. Complementary Approaches • 1503 variants, 115 Mb, 800 genes structurally variant. Eichler (2006), Nat. Genet.

  45. Summary • Humans relatively unique in size, proportion and architecture of interspersed segmental duplications • Large-Scale Variation • Normals: Identified 257 CNPs using a targeted microarray to duplicated regions • IMR: Identified 23 sites (>2 BACs) unique to patients (n=291 probands); 5 are recurrent and 7 are confirmed de novo • Novel Genomic Disorders • Fine-Scale Variation: Developed an approach to map and sequence common fine-scale variation within the human population; estimate ~200-300 differences > 8 kb between 2 individuals.

  46. Models of Human “Genetic” Disease 1) Simple Mendelian: one gene-one disease, familial, highly penetrant, small fraction of population, e.g. cystic fibrosis. 2) Chromosome Disease: large chromosomal regions, non-familial, sporadic, relatively high frequency, e.g. Turner Syndrome. 3) Genomic Disease: familial and/or recurrent, deletion or duplication of large # of genes, dosage effects, e.g. Prader-Willi Syndrome. 4) Complex Traits: multiple genes plus environment, familial, variably penetrant, large fraction of population, susceptibility genes, e.g. hypertension.

  47. Acknowledgements • Eichler Lab: Eray Tuzun, Andy Sharp, Devin Locke, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Sean McGrath, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague • UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen • UCSF: Dan Pinkel, Donna Albertson • CWRU/UChicago: Stuart Schwartz, Laurie Christ • Agencourt: Doug Smith • Oxford: Jonathan Flint, Samantha Knight • NHGRI: Jim Mullikin • UW: Debbie Nickerson, Mark Rieder, Chris Carlson, Josh Smith

  48. ……Finding Novel Human Sequence

  49. Sequence of Traversing Fosmid Fills Gaps. Kaul et al., unpublished
