Genomic Duplications, Structural Variation and Disease
Evan Eichler, Howard Hughes Medical Institute, University of Washington
April 3rd, 2006, Frontiers in Genomics
Genomic Variation
Mutational mechanisms underlying genetic variation? (scale runs from Sequence to Cytogenetics)
• Single base-pair changes: point mutations
• Small insertions/deletions: frameshift, microsatellite, minisatellite
• Mobile elements: retroelement insertions (300 bp to 10 kb in size)
• Large-scale genomic variation (>10 kb)
• Large-scale deletions
• Segmental duplications
• Chromosomal variation: translocations, inversions, fusions
Global Analysis of Segmental Duplications
Question: What is the organization, mechanism and impact of recent human segmental duplications?
Segmental duplications: >90% identity and >1 kb in length; intrachromosomal or interchromosomal
Approaches:
• Computational: a) whole-genome assembly comparison; b) whole-genome shotgun sequence detection strategies
• Experimental: comparative sequence analysis, array comparative genomic hybridization, comparative FISH
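The slide's working definition (>90% identity, >1 kb, intra- vs. interchromosomal) can be sketched as a simple filter over pairwise alignments. This is only an illustration: the `Alignment` record and its field names are hypothetical, not the format used in the original analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the slide: >90% identity and >1 kb in length.
MIN_IDENTITY = 0.90
MIN_LENGTH_BP = 1_000


@dataclass
class Alignment:
    """Hypothetical record for one pairwise genomic alignment."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of matching bases, 0.0 to 1.0


def classify_duplication(aln: Alignment) -> Optional[str]:
    """Return the duplication class, or None if the thresholds are not met."""
    if aln.identity <= MIN_IDENTITY or aln.length_bp <= MIN_LENGTH_BP:
        return None
    if aln.chrom_a == aln.chrom_b:
        return "intrachromosomal"
    return "interchromosomal"
```

For example, `classify_duplication(Alignment("chr2", "chr22", 25_000, 0.97))` is classed as interchromosomal, while a 500 bp alignment is rejected regardless of identity.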
Recent Duplication Architecture of the Human Genome
[Figure: genome-wide map of duplications (build34, >90%, >1 kb) across chromosomes 1-22, X and Y, with alpha satellite and "heterochromatic" regions marked; labeled loci: 2p22, 2p11 (700 kb), 4p16.1, 4p16.3, 4q24, 7q36, 10q26, 11p15, 11q14, 12p11, 12q24, 21q21, 22q12, Xq28]
• Total: 5.26% (150.8 Mb)
• Inter: 2.36% (67.6 Mb)
• Intra: 3.87% (111.1 Mb)
• Non-random distribution
• 5.3 fold bias to pericentromere
• 389 regions > 100 kb nexi
Human Genome Segmental Duplication Pattern
[Figure: segmental duplication pattern for chr1-chr22, chrX and chrY]
• ~4% duplication
• >20 kb, >95%
• ~4 average # duplicates
• 59.5% pairwise (>1 Mb)
She, X. et al. (2004), Nature
http://humanparalogy.gs.washington.edu
Mouse Segmental Duplication Pattern
• 1-2% duplication
• >20 kb, >95%
• 2-3 average # duplicates
• July 2004, mmu5
She, X., in press
Percent Similarity of Human Segmental Duplications
[Figure: histograms of sum of aligned bases (kb) against percent identity (90-100%), plotted separately for interchromosomal and intrachromosomal duplications; annotations: 25 My, 12 My, 5 My, 49 Mb]
Whole-Genome Analysis (2,865 Mb), Build 34, July 2003, 25.8 K alignments
Summary: Segmental Duplication Asymmetry
[Figure: human vs. chimpanzee comparison; labels: Polymorphism 15-20%; 21.7 Mb+ new; 7.2 Mb+ shared; 24.8 Mb+ new; 6.6 Mb+ shared; 16.0 Mb+ shared; chimp hyperexpansion]
• 76.3 Mb of Differentially Duplicated Euchromatic Material
Hyperexpansion of a Chimpanzee Segmental Duplication
4 → 400 copies
Cheng, Z. et al. (2005), Nature
Human Segmental Duplications Properties
• Large (>10 kb)
• Recent (>95% identity)
• Interspersed (60% are separated by more than 1 Mb)
• Modular (duplicon architecture), ~389 acceptor regions
• 2.7% genetic difference, human vs. chimpanzee
What impact in terms of human variation?
Models of Disease
• Rare Duplication-Mediated Structural Variation
• Common Fine-Scale Structural Variation
Genomic Disorders
[Figure: duplicated blocks A, B, C near the telomere (TEL); aberrant recombination in gametes leads to human disease]
Triplosensitive, Haploinsufficient and Imprinted Genes
• Hypothesis: Mechanism underlying uncharacterized mental retardation?
Duplication Map of Human Genome
• 130 candidate regions (298 Mb)
• 23 associated with genetic disease
• Target patients by array CGH
Bailey et al. (2002), Science 297:1003-1007
Array Comparative Genomic Hybridization
[Figure: normal human DNA sample (Cy3 channel) and disease individual DNA sample (Cy5 channel) hybridized to an array of human BAC clones (12 mm); channels merged]
• High-throughput detection of large-scale variation (>50 kb)
• LCV or CNP = deletions and duplications (Iafrate et al., 2004; Sebat et al., 2004)
Duplication Microarray: Experimental Design
[Figure: BACs placed between duplications relative to the telomere (TEL); distance between duplications: >50 kb and <5 Mb; duplication properties: 95% identity, 10 kb]
• 130 regions of the human genome
• 2178 BACs, or on average ~10-12 BACs per region
• Perform array CGH: reciprocal dye-swap experiments
• Strategy: Identify normal variation and then search for variation only observed in disease patients
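A reciprocal dye swap lets a real copy-number change be separated from dye bias: the log2 ratio of a true gain or loss flips sign when the dyes are exchanged, whereas a dye artifact keeps its sign. A minimal sketch of that logic follows, assuming both experiments report log2(Cy5/Cy3) per BAC. The function names and the 0.3 threshold are illustrative assumptions, not values from the talk.

```python
import math


def log2_ratio(cy5_intensity: float, cy3_intensity: float) -> float:
    """log2(Cy5/Cy3) for one BAC spot."""
    return math.log2(cy5_intensity / cy3_intensity)


def combine_dye_swap(forward_log2: float, swapped_log2: float) -> float:
    """Average the two experiments after undoing the sign flip of the swap."""
    return (forward_log2 - swapped_log2) / 2.0


def call_bac(forward_log2: float, swapped_log2: float,
             threshold: float = 0.3) -> str:
    """Call a BAC only when both experiments agree after the sign flip.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if forward_log2 > threshold and swapped_log2 < -threshold:
        return "gain"
    if forward_log2 < -threshold and swapped_log2 > threshold:
        return "loss"
    return "no call"
```

A BAC with log2 ratios of 0.5 and 0.4 in the two experiments is left uncalled, because a signal that does not reverse with the dyes is more likely a dye effect than a copy-number difference.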
Hybridization
[Figure: array CGH profiles for samples R921, D3767 and R1080; x-axis: BAC probes (0-20); y-axis: log2 hybridization relative intensity (about -2 to 2); BAC groups labeled 1-3, 4-5, 6, 7-14, 15, 16-20]
Study Populations
• Normal unaffected (diversity panel and HapMap samples): target = 800 samples; completed: 75 + 269 samples = 344 total. Identified additional 257 CNPs.
• Idiopathic mental retardation: target = 900 samples (400 Flint samples, 500 CWRU samples); 291 complete
Normal Large-Scale Genomic Structural Variation
• Based on our analysis of ~568 chromosomes (~40/130 hotspots show no variation): NAHR resistant or selection?
Validation using Nimblegen Arrays
[Figure: example deletion and duplication]
Locke et al., unpublished
VCF Deletion Detected in IMR26
~3.0 Mb deletion observed in IMR26 (= common velocardiofacial (VCF) syndrome 22q11 deletion)
Novel LCV/CNP Detected in IMR43
[Figure tracks: CNP detected by Seg Dup array and Iafrate et al.; novel ~2.5 Mb deletion only observed in IMR; CNPs detected by Seg Dup array in HapMap samples]
Sharp et al., unpublished
Variation in IMR
• 291 IMR samples (Oxford Cohort) screened to date
• 23 (n=31 patients) novel sites of variation defined by >2 BACs
• 5 are seen in more than one unrelated patient
• 7/9 events are de novo
• New Genomic Disorder Candidates
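Requiring support from several BACs before declaring a site can be sketched as a search for runs of adjacent BACs with the same call. This is an assumed reading of the ">2 BACs" criterion: the sketch takes the minimum run length as a parameter (`min_run`, defaulting to 3, i.e. strictly more than 2), and the input format is hypothetical.

```python
from typing import List, Optional, Tuple


def find_sites(calls: List[Optional[str]],
               min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Find runs of adjacent BACs that share the same gain/loss call.

    `calls` holds one entry per BAC in chromosomal order: "gain", "loss",
    or None for no call. Returns (first_index, last_index, call) tuples for
    runs of at least `min_run` BACs.
    """
    sites = []
    start = None
    # A trailing None acts as a sentinel that closes any run still open.
    for i, call in enumerate(calls + [None]):
        if start is not None and call != calls[start]:
            if i - start >= min_run:
                sites.append((start, i - 1, calls[start]))
            start = None
        if start is None and call is not None:
            start = i
    return sites
```

With `calls = [None, "loss", "loss", "loss", None, "gain", "gain"]`, only the three-BAC loss is reported; the two-BAC gain is kept only if `min_run` is lowered to 2.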
Problems:
• Array CGH has a lower limit to detect deletions (~30 kb)
• Oligo-based approaches effectively sample a small fraction of the genome and extrapolate size indirectly
• Precise location of the rearrangement is unknown
• Neither can identify subtle (5-30 kb) variation
• Neither approach can detect inversions
• Location and structure of the change unknown
Models of Disease • Rare Duplication-mediated Structural Variation • Common Fine-Scale Structural Variation
Intermediate-Size Structural Variation (ISV) and Inversions
Gene | Type | Freq. | Locus | Size | Phenotype | Dup
GSTT1 | Deletion | 20% -/- | 22q11.2 | 54.3 kb | halothane/epoxide sensitivity | 17kb/94%
DEF3A-OR | Inversion | 26% -/+ | 8p23 | 5 Mb | heart defect susceptibility | 400kb/98.9%
EMD/FLN | Inversion | 33% -/+ | Xq28 | 219 kb | none | 48kb/99%
IGVH26 | Deletion/Dup | 4-15% +/- | 14q32.3 | Variable | immune response | 91-97%
GSTM1 | Deletion | 50% -/- | 1p13.3 | 18 kb | toxin resistance, cancer susceptibility | 24kb/95.6%
CYP2D6 | Duplication | 1-29% +++ | 22q13.1 | 5 kb | antidepressant resistance | 5.4kb/91-97%
CYP21A2 | Duplication | 1.6% +/- | 6p21.3 | 35 kb | congenital adrenal hyperplasia | 0
CYP2A6 | Duplication | 1.3% +/- | 19q13.2 | 7 kb | nicotine metabolism | 24kb/96.2%
SMN2 | Duplication | 50% +++/- | 5q13 | >100 kb | SMA susceptibility | 88.7/99.8%
Adapted from Buckland, Ann Med
Comparing Human Genomes by Paired-End Sequence
[Figure: fosmid end pairs placed against Build35, showing concordant, deletion, insertion and inversion signatures]
• ~1.1 million fosmid paired-ends were sequenced by MIT to facilitate gap closure during final phases of HGP
• Derived from a single female donor PDR cell line
• Fosmid insert size tightly distributed around mean (40 +/- 2.6 kb); low copy = stability; capillary sequencing = low mispairing rate
• Approach: optimal placement of fosmid ends against human genome could theoretically detect rearrangements
Dataset:
• 1,122,408 fosmid pairs preprocessed (15.5X genome coverage)
• 639,204 fosmid pairs BEST pairs (8.8X genome coverage)
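The coverage figures on this slide are consistent with physical (clone) coverage, that is, number of fosmid pairs times mean insert size divided by genome size. The sketch below reproduces the arithmetic under one assumption that the slide does not state: a genome size of about 2.9 Gb.

```python
MEAN_INSERT_BP = 40_000          # mean fosmid insert size from the slide
GENOME_SIZE_BP = 2_900_000_000   # assumed genome size; not given on the slide


def physical_coverage(n_pairs: int,
                      insert_bp: int = MEAN_INSERT_BP,
                      genome_bp: int = GENOME_SIZE_BP) -> float:
    """Clone coverage: total bases spanned by inserts / genome size."""
    return n_pairs * insert_bp / genome_bp


all_pairs = physical_coverage(1_122_408)   # preprocessed pairs
best_pairs = physical_coverage(639_204)    # BEST pairs
print(f"{all_pairs:.1f}X, {best_pairs:.1f}X")
```

Under that assumption the two datasets come out at roughly 15.5X and 8.8X, matching the values quoted above; a slightly smaller genome size would give slightly higher numbers.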
Genome-wide Detection of Structural Variation (>8 kb)
[Figure panels a-c: insertion, deletion and inversion signatures; track colors: discordant by orientation (yellow/gold), discordant by size (red), duplication track]
• End pairs spanning >48 kb: putative deletion
• End pairs spanning <32 kb: putative insertion
Structural polymorphisms?
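The size and orientation rules can be written as a small classifier for one fosmid end pair placed on the reference. The 32 kb and 48 kb cutoffs come from the slide and sit close to three standard deviations around the 40 +/- 2.6 kb insert size. The function name and its inputs are hypothetical.

```python
MIN_SPAN_BP = 32_000  # below this: putative insertion (cutoff from the slide)
MAX_SPAN_BP = 48_000  # above this: putative deletion (cutoff from the slide)


def classify_end_pair(span_bp: int, ends_face_each_other: bool) -> str:
    """Classify one fosmid end pair by its placement on the reference.

    span_bp: distance between the two mapped ends on the reference assembly.
    ends_face_each_other: True when the end reads map in the expected
    inward-facing orientation.
    """
    if not ends_face_each_other:
        return "putative inversion"
    if span_bp > MAX_SPAN_BP:
        # Ends map too far apart: the reference carries sequence that the
        # fosmid source genome lacks.
        return "putative deletion"
    if span_bp < MIN_SPAN_BP:
        # Ends map too close together: the fosmid carries sequence that
        # the reference lacks.
        return "putative insertion"
    return "concordant"
```

Here "deletion" and "insertion" describe the fosmid source genome relative to the reference assembly, so a pair spanning 55 kb on the reference is called a putative deletion.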
Validated Structural Polymorphisms
GSTM1
• ~20 kb deletion
• Minspread 28 kb (9 fosmids)
• 50% of Caucasians/Saudis are -/- for 18 kb gene (predisposition to cancer)
• +++ ultrarapid GSTM1 activity
CYP2D6
• ~5-10 kb insertion
• Minspread 17 kb (7 fosmids)
• Alternate haplotype support
• 1-29% of Caucasians/Japanese have multiple copies (entire gene ~5 kb)
• Associated with resistance to antipsychotic tricyclic antidepressants
Summary: 6/16 of common polymorphisms detected
Tuzun et al. (2005), Nat. Genet.
Putative Insertion (8,384 bp) build34 fosmid
Putative Deletion (14,055 bp) build34 fosmid
Sequencing Genic Structural Variation
[Figure panels a-f: build35 (b35) vs. fosmid sequence comparisons at genic loci; labeled genes: SIGLEC5A, MEGF11, KCNJ16, KCNJ2, LSP1, TNNT3, GSTT2, DDT]
Gene Families and Structural Variants
• Drug detoxification: glutathione-S-transferase, cytochrome P450, carboxylesterases
• Immune response and inflammation: leukocyte immunoglobulin-like receptor, defensin, phorbolin
• Surface integrity genes: mucin, enamelin, late epidermal cornified envelope genes, galectin
• Surface antigens: melanoma antigen gene family, rhesus antigen
Environmental Interaction Genes
Fine-Scale Structural Variation Map (build35 vs. Fosmids)
[Figure legend: insertion (fosmid), deletion, inversions, "heterochromatic" regions, "duplicated" regions]
• 1.3% discordant fosmids
• Identify 295 clusters (2 or more)
• 246 supported by second haplotype
• 147 inserts, 93 deletions, 57 inverts
• 18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion)
• 89 locate within gene regions
• 138 unique regions of the genome
• 159 duplicated regions of the genome
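Requiring "2 or more" discordant fosmids per site amounts to clustering overlapping discordant placements of the same type and discarding singletons. A minimal sketch of one way to do that follows, assuming each discordant fosmid is given as a hypothetical `(chrom, start, end, kind)` tuple. It is not the pipeline used for the map above.

```python
from collections import defaultdict
from typing import List, Tuple

Placement = Tuple[str, int, int, str]      # (chrom, start, end, kind)
Cluster = Tuple[str, int, int, str, int]   # (chrom, start, end, kind, support)


def cluster_discordant(placements: List[Placement],
                       min_support: int = 2) -> List[Cluster]:
    """Merge overlapping discordant fosmids of the same kind into clusters."""
    by_key = defaultdict(list)
    for chrom, start, end, kind in placements:
        by_key[(chrom, kind)].append((start, end))

    clusters = []
    for (chrom, kind), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        support = 1
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current cluster: extend it.
                cur_end = max(cur_end, end)
                support += 1
            else:
                if support >= min_support:
                    clusters.append((chrom, cur_start, cur_end, kind, support))
                cur_start, cur_end, support = start, end, 1
        if support >= min_support:
            clusters.append((chrom, cur_start, cur_end, kind, support))
    return clusters
```

Two overlapping deletion-type fosmids on the same chromosome form one cluster with support 2, while an isolated discordant fosmid is dropped as a likely artifact.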
PCR Breakpoint Genotyping Assays for Structural Variation • Tested 11 structural variants (5 insertions, 4 deletions, and 2 inversions) • 7 successful assays (6 >20% minor allele frequency)
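Once a breakpoint assay yields genotypes, the minor allele frequency quoted above follows directly from genotype counts. The sketch below shows the arithmetic with made-up counts; the function name is hypothetical and the numbers are not data from the talk.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency from diploid genotype counts.

    n_hom_ref: individuals homozygous for the reference structure
    n_het:     heterozygous individuals
    n_hom_alt: individuals homozygous for the variant structure
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (2 * n_hom_alt + n_het) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


# Illustrative counts only: 50 reference homozygotes, 40 heterozygotes,
# 10 variant homozygotes.
maf = minor_allele_frequency(50, 40, 10)
```

With these illustrative counts the variant allele makes up 60 of 200 alleles, so the assay would count as a ">20% minor allele frequency" variant.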
Illumina Golden-Gate Genotyping Assays for Structural Variation
Human Genome Structural Variation Project
• 2 scientific meetings (2005)
• 2 working groups (AHG, MSWG; 12/05)
• Coordinating Committee (1/06)
• NIH Council (2/06)
• Press Release (3/15/06)
• Goal: Complete characterization of structural variation in 48 HapMap samples (Japanese and Chinese, Yoruba, CEPH)
Complementary Approaches
• 1503 variants, 115 Mb, 800 genes structurally variant
Eichler (2006), Nat. Genet.
Summary
• Humans relatively unique in size, proportion and architecture of interspersed segmental duplications
• Large-Scale Variation
• Normals: Identified 257 CNPs using a microarray targeted to duplicated regions
• IMR: Identified 23 sites (>2 BACs) unique to patients (n=291 probands); 5 are recurrent and 7 are confirmed de novo. Novel genomic disorders
• Fine-Scale Variation: Developed an approach to map and sequence common fine-scale variation within the human population; estimate ~200-300 differences >8 kb between 2 individuals
Models of Human "Genetic" Disease
1) Simple Mendelian: one gene-one disease, familial, highly penetrant, small fraction of population. E.g. cystic fibrosis.
2) Chromosome Disease: large chromosomal regions, non-familial, sporadic, relatively high frequency. E.g. Turner syndrome.
3) Genomic Disease: familial and/or recurrent, deletion or duplication of large # of genes, dosage effects. E.g. Prader-Willi syndrome.
4) Complex Traits: multiple genes plus environment, familial, variably penetrant, large fraction of population, susceptibility genes. E.g. hypertension.
Acknowledgements
Eichler Lab: Eray Tuzun, Andy Sharp, Devin Locke, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Sean McGrath, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague
UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen
UCSF: Dan Pinkel, Donna Albertson
CWRU/UChicago: Stuart Schwartz, Laurie Christ
Agencourt: Doug Smith
Oxford: Jonathan Flint, Samantha Knight
NHGRI: Jim Mullikin
UW: Debbie Nickerson, Mark Rieder, Chris Carlson, Josh Smith
Sequence of Traversing Fosmid Fills Gaps
Kaul et al., unpublished