CZ5225: Modeling and Simulation in Biology
Lecture 10: Copy Number Variations
Prof. Chen Yu Zong
Tel: 6516-6877
Email: phacyz@nus.edu.sg
http://bidd.nus.edu.sg
Room 08-14, level 8, S16, NUS
Copy number variation (CNV): What is it?
• A form of human genetic variation: instead of 2 copies of each region of each chromosome (diploid), some people have amplifications or losses (> 1 kb) in different regions
• This doesn’t include translocations or inversions
• We all have such regions
• The publicly available genome NA15510 has between 5 and 240 such regions by various estimates
• They are only rarely harmful (but rare things do happen)
Copy-number probes are used to quantify the amount of DNA at known loci
CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
PM probe: ATCGGTAGCCATTCATGAGTTACTA
Signal by copy number: CN=1: PM = c; CN=2: PM = 2c; CN=3: PM = 3c
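A small sketch of the two ideas on this slide: the PM probe is the base-by-base complement of a 25-base window of the locus, and the PM signal is taken to scale linearly with copy number. The function names and the per-copy signal `c = 500.0` are illustrative assumptions, not values from the lecture.

```python
# Watson-Crick complements of the four bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of seq, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def expected_pm(copy_number: int, c: float) -> float:
    """Idealized PM signal: c units of signal per copy of the locus."""
    return copy_number * c


# Locus fragment as printed on the slide; the probe targets its central 25 bases.
locus = "CGTAGCCATCGGTAAGTACTCAATGATAG"
target = locus[2:27]
pm_probe = complement(target)

c = 500.0  # hypothetical per-copy signal
signals = {cn: expected_pm(cn, c) for cn in (1, 2, 3)}
```

Real signals also carry background and probe-affinity effects, which is why the low-level analysis steps later in the lecture are needed before this proportionality can be used.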
Copy number variation: Population genomics
The genomes of two humans differ more in a structural sense than at the nucleotide level; a recent paper estimates that on average two of us differ by
• ~4-24 Mb of genetic material due to copy number variation
• ~2.5 Mb due to single nucleotide polymorphisms
Abundance of CNVs in the human population? Still an open question, but probably thousands, at low allelic frequency (< 20%)
Abundance of deletion CNVs in the human population Comparison of overlapping CNVs identified by Conrad et al. (2006) and McCarroll et al. (2006). Freeman et al. Genome Res 2006
Non-allelic homologous recombination events between low-copy repeats (LCR-NAHR) Lupski & Inoue, TIG 2002
Duplications and deletions of LCRs mediated by NAHR (figure panels: LCRs in direct orientation; LCRs in inverted orientation; inversions)
Intrachromatid recombination between LCRs (figure panels: LCRs in direct orientation; LCRs in inverted orientation; inversion; deletion)
Copy number variation: Relations to human disease
• Responsible for a number of rare genetic conditions, for example Down syndrome (trisomy 21) and Cri du chat syndrome (a partial deletion of 5p).
• Implicated in complex diseases, for example CCL3L1 copy number and HIV/AIDS susceptibility; also, some sporadic (non-inherited) CN variants are strongly associated with autism.
• Tumors typically have a lot of chromosomal abnormalities, including recurrent CN changes.
Evolutionary and medical implications of CNVs: CCL3L1 as an example
When CCL3L1 occupies the CCR5 receptor on CD4 cells, it blocks HIV's entry. Gonzalez et al., Science, 2005
Copy-number variation of CCL3L1 within and among human and chimp populations. Gonzalez et al., Science, 2005
CCL3L1 and HIV infection
Individuals with a high CCL3L1 gene copy number relative to their population average are more resistant to HIV infection than those with a low copy number, presumably because there is more ligand to compete with HIV during binding to CCR5. Gonzalez et al., Science, 2005
A cytogeneticist’s story
“The story is about diagnosis of a 3-month-old baby with macrocephaly and some heart problems. The doctors questioned a couple of syndromes which we tested for and found negative. Rather than continue this ‘shot in the dark’ approach, we put the case on an array and found a 2 Mb deletion which notably deletes the gene NSD1 on chr 5, mutations in which are known to cause Sotos syndrome. This is an overgrowth syndrome and fits with the macrocephaly. The bottom line is that we are able to diagnose quicker by this approach and delineate exactly the underlying genetic change.”
A cytogeneticist’s story (figure: chromosome 5, 2 Mb deletion)
Many tumors have gross CN changes
A lung cancer cell line vs matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:6071-6079
Research into gonad dysfunction: Human sex reversal
• 20% of 46,XY females have mutations in SRY
• 80% of 46,XY females unexplained!
• 90% of 46,XX males due to translocation of SRY
• 10% of 46,XX males unexplained!
Suggests loss of function and gain of function mutations in other genes may cause sex reversal. We’re looking at shared deletions.
Affymetrix SNP chip terminology
Genomic DNA: TAGCCATCGGTA [A/G] GTACTCAATGAT (SNP with alleles A and G)
Perfect Match probe for Allele A: ATCGGTAGCCATTCATGAGTTACTA
Perfect Match probe for Allele B: ATCGGTAGCCATCCATGAGTTACTA
Genotyping: answering the question about the two copies of the chromosome on which the SNP is located: is a sample AA (AA), AB (AG) or BB (GG) at this SNP?
Affymetrix GeneChip
• Chip: 1.28 cm × 1.28 cm, 6.4 million features/chip
• Feature: 5 µ × 5 µ, > 1 million identical 25 bp probes/feature
GeneChip Mapping Assay Overview
Steps shown in the figure: 250 ng genomic DNA; RE digestion (Xba); adaptor ligation; PCR: one-primer amplification (complexity reduction); fragmentation and labeling; hyb & wash; genotype calls (AA, BB, AB)
Principal low-level analysis steps
• Background adjustment and normalization at probe level. These steps are to remove lab/operator/reagent effects
• Combining probe level summaries to probe set level summary: best done robustly, on many chips at once. This is to remove probe affinity effects and discordant observations (gross errors, non-responding probes, etc.)
• Possibly further rounds of normalization (probe set level), as lab/cohort/batch/other effects are frequently still visible
• Derive the relevant copy-number quantities
Finally, quality assessment is an important low-level task.
Preprocessing for total CN using SNP probe pairs (250K chip)
Modification by H Bengtsson of a method due to A Wirapati developed some years ago for microsatellite genotyping; similar to the approach used by Illumina. (Figure: genotype groups TT, AT and AA)
Background adjustment and normalization Outcome similar to that achieved by quantile normalization
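For readers unfamiliar with the comparison point, here is a minimal sketch of plain quantile normalization. It is not the method on the slide, only the technique the slide says gives a similar outcome; the function name and the toy data are assumptions.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Target distribution: mean of the k-th smallest value across all arrays.
    target = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by signal within this array (ties keep input order).
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each (made-up signals).
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

After normalization each array contains the same set of values, so array-wide intensity differences can no longer be mistaken for copy-number differences.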
Low-level analysis problems remain unsolved; why?
• The feature size keeps shrinking, and so the number of features per chip keeps growing;
• Fewer and fewer features are used for a given measurement, allowing more measurements to be made using a single chip
These considerations all place more and more demands on the low-level analysis: to maintain the quality of existing measurements, and to obtain good new ones.
SNP probes can be used to estimate total copy numbers
AA: PM = PMA + PMB = 2c
AB: PM = PMA + PMB = 2c
BB: PM = PMA + PMB = 2c
AAB: PM = PMA + PMB = 3c
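The relation PM = PMA + PMB can be sketched as follows. The per-copy signal `c` and the allele signals are idealized, made-up numbers (no background, no crosstalk), chosen only to reproduce the four cases on the slide.

```python
def total_copy_number(pm_a: float, pm_b: float, c: float) -> float:
    """Estimate total copy number as (PMA + PMB) / c, where c is the signal of one copy."""
    return (pm_a + pm_b) / c


c = 100.0  # hypothetical per-copy signal

# Idealized (PMA, PMB) signals for each genotype on the slide.
examples = {
    "AA": (200.0, 0.0),
    "AB": (100.0, 100.0),
    "BB": (0.0, 200.0),
    "AAB": (200.0, 100.0),
}
estimates = {g: total_copy_number(a, b, c) for g, (a, b) in examples.items()}
```

The three diploid genotypes all give a total of 2 even though their allele signals differ, which is why the sum rather than either allele alone is used for total copy number.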
SNP probe tiling strategy: central probe quartet (SNP at position 0, alleles A / G)
Genomic DNA: TAGCCATCGGTA N GTACTCAATGAT
PM 0 Allele A: ATCGGTAGCCAT T CATGAGTTACTA
MM 0 Allele A: ATCGGTAGCCAT A CATGAGTTACTA
PM 0 Allele B: ATCGGTAGCCAT C CATGAGTTACTA
MM 0 Allele B: ATCGGTAGCCAT G CATGAGTTACTA
SNP probe tiling strategy: +4 offset probe quartet (SNP at position +4, alleles A / G)
Genomic DNA: TAGCCATCGGTA N GTACTCAATGATCAGCT
PM +4 Allele A: GTAGCCAT T CAT G AGTTACTAGTCG
MM +4 Allele A: GTAGCCAT T CAT C AGTTACTAGTCG
PM +4 Allele B: GTAGCCAT C CAT G AGTTACTAGTCG
MM +4 Allele B: GTAGCCAT C CAT C AGTTACTAGTCG
SNP for Identifying Copy Number Variations
• Using SNP chips to identify change in total copy number (i.e. CN ≠ 2)
• Outline a new method (CRMA)
• Evaluate and compare it with other methods
• Make some closing remarks on further issues
Copy-number estimation using Robust Multichip Analysis (CRMA) A few details are passed over. Ask me later if you care about them.
Crosstalk between alleles adds significant artifacts to signals
Signals by genotype: AA: PMA >> PMB; AB: PMA ≈ PMB; BB: PMA << PMB
Cross-hybridization:
Allele A: TCGGTAAGTACTC
Allele B: TCGGTATGTACTC
There are six possible allele pairs
• Nucleotides: {A, C, G, T}
• Pairs, listed in alphabetical order: (A,C), (A,G), (A,T), (C,G), (C,T), (G,T)
• Because different nucleotides bind differently, the crosstalk from A to C might be very different from A to T.
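The count of six follows from choosing 2 of the 4 nucleotides without regard to order. A short check using the standard library:

```python
from itertools import combinations

NUCLEOTIDES = ["A", "C", "G", "T"]

# All unordered pairs of distinct nucleotides, in alphabetical order.
allele_pairs = list(combinations(NUCLEOTIDES, 2))
```

Since crosstalk may differ from pair to pair, a calibration would be estimated separately for each of these six groups of SNPs.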
Crosstalk between alleles is easy to spot
Example: data from one array; probe pairs (PMA, PMB) for nucleotide pair (A,T)
(Figure: PMB plotted against PMA, with the offset and the AA, AB and BB groups labelled)
Crosstalk between alleles can be estimated and corrected for
What is done: the offset is removed from SNPs and CN units; crosstalk is removed from SNPs.
(Figure: PMB plotted against PMA after correction, with no offset and the AA, AB and BB groups labelled)
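The slides do not spell out how the correction is computed. One simple way to picture it, assuming a linear model in which each observed signal is an offset plus its own allele's signal plus a fraction leaked from the other allele, is to invert that 2×2 system. The model, the function name and all parameter values below are assumptions for illustration; estimating the parameters from data is not shown.

```python
def correct_crosstalk(pm_a, pm_b, offset_a, offset_b, leak_a_to_b, leak_b_to_a):
    """Invert the assumed model
        pm_a = offset_a + s_a + leak_b_to_a * s_b
        pm_b = offset_b + leak_a_to_b * s_a + s_b
    and return the allele-specific signals (s_a, s_b)."""
    # Step 1: remove the offset.
    y_a = pm_a - offset_a
    y_b = pm_b - offset_b
    # Step 2: remove the crosstalk by solving the 2x2 linear system.
    det = 1.0 - leak_a_to_b * leak_b_to_a
    s_a = (y_a - leak_b_to_a * y_b) / det
    s_b = (y_b - leak_a_to_b * y_a) / det
    return s_a, s_b


# Hypothetical AA sample: true signals (1000, 0), offset 50 on both channels,
# 25% of the A signal leaking into the B probe.
s_a, s_b = correct_crosstalk(1050.0, 300.0, 50.0, 50.0, 0.25, 0.125)
```

Before correction this AA sample shows an apparent B signal of 300; after correction the B signal is zero, which is the behaviour described on the slide.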
Copy-number estimation using Robust Multichip Analysis (CRMA) Already briefly described.
Copy-number estimation using Robust Multichip Analysis (CRMA) That’s it!
Copy-number estimation using Robust Multichip Analysis (CRMA)
$\log_2(\mathrm{PM}_{ijk}) = \log_2 \theta_{ij} + \log_2 \phi_{jk} + \varepsilon_{ijk}$
Fit using rlm
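To make the log-additive model concrete, the sketch below fits it for a single SNP, assuming i indexes samples (chip effects) and k indexes probes (probe affinities). The slide fits the model with rlm; this sketch swaps in median polish, a simpler robust fit, so it illustrates the idea rather than reproducing the lecture's procedure. The data are made up.

```python
from statistics import median


def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i][k] ~ chip[i] + probe[k] by alternating median sweeps."""
    n_samples, n_probes = len(log_pm), len(log_pm[0])
    resid = [row[:] for row in log_pm]
    chip = [0.0] * n_samples
    probe = [0.0] * n_probes
    for _ in range(n_iter):
        # Sweep row medians into the chip (sample) effects.
        for i in range(n_samples):
            m = median(resid[i])
            chip[i] += m
            resid[i] = [r - m for r in resid[i]]
        # Sweep column medians into the probe-affinity effects.
        for k in range(n_probes):
            m = median(resid[i][k] for i in range(n_samples))
            probe[k] += m
            for i in range(n_samples):
                resid[i][k] -= m
        # Identifiability: centre probe effects at zero, move the shift to the chips.
        shift = median(probe)
        probe = [p - shift for p in probe]
        chip = [c + shift for c in chip]
    return chip, probe, resid


# Three samples x three probes on the log2 scale; the last value is a gross outlier
# (it would be 13.0 under a perfectly additive model).
log_pm = [
    [9.0, 10.0, 11.0],
    [10.0, 11.0, 12.0],
    [11.0, 12.0, 20.0],
]
chip, probe, resid = median_polish(log_pm)
```

The outlier ends up in the residuals instead of distorting the chip effect of the third sample, which is the point of fitting the model robustly across many chips at once.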
Copy-number estimation using Robust Multichip Analysis (CRMA) Longer fragments get less well amplified by PCR and so give weaker SNP signals 100K
Copy-number estimation using Robust Multichip Analysis (CRMA) Longer fragments get less well amplified by PCR and so give weaker SNP signals 500K
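One way to act on this observation is to estimate how the log signal depends on fragment length and subtract that trend. The sketch below uses an ordinary least-squares line as a stand-in; the actual shape of the dependence and the smoother used in CRMA are not given on the slides, and the fragment lengths and signals are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def remove_length_trend(frag_len, log_signal):
    """Subtract the fitted fragment-length trend, keeping the overall mean level."""
    intercept, slope = fit_line(frag_len, log_signal)
    mean_signal = sum(log_signal) / len(log_signal)
    return [y - (intercept + slope * x) + mean_signal
            for x, y in zip(frag_len, log_signal)]


# Hypothetical SNPs: longer fragments (in bp) give weaker log2 signals.
frag_len = [200.0, 400.0, 600.0, 800.0]
log_signal = [12.0, 11.0, 10.0, 9.0]
intercept, slope = fit_line(frag_len, log_signal)
corrected = remove_length_trend(frag_len, log_signal)
```

After the correction, SNPs on long and short fragments sit at the same level, so a weak signal is no longer confounded with a long fragment.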
Copy-number estimation using Robust Multichip Analysis (CRMA) Care required with the number and nature of Reference samples used
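A sketch of why the reference set matters, assuming copy number is estimated from the log2 ratio of a sample's signal to the median signal of reference samples that are presumed to carry two copies. The function name and numbers are illustrative; one reference is deliberately given an amplified signal.

```python
from math import log2
from statistics import mean, median


def relative_copy_number(sample_signal, reference_signals):
    """Copy number relative to a reference assumed to have CN = 2 at this locus."""
    ref = median(reference_signals)
    log_ratio = log2(sample_signal / ref)
    return 2.0 * 2.0 ** log_ratio


# Hypothetical reference set; the last sample itself carries a gain at this locus.
references = [950.0, 1000.0, 1050.0, 1000.0, 3000.0]

cn_gain = relative_copy_number(1500.0, references)
cn_loss = relative_copy_number(500.0, references)

# A non-robust reference (the mean) is pulled upward by the aberrant sample.
biased_reference = mean(references)
```

With few references, or with references that share the same variant, even the median is shifted and every estimate at that locus inherits the bias.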
Further bioinformatic issues
• Estimating copy number: needs calibration data
• Segmentation (of chromosomes into constant copy number regions): an HMM-like algorithm
• Analyzing family CN data: a different HMM
• Incorporating non-polymorphic probes: independent HMM observations to be weighted and combined
• Dealing with mixed normal-abnormal samples
• Utilizing poor quality DNA samples
• Estimating allele-specific copy number
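The segmentation item can be pictured with a generic hidden Markov model decoded by the Viterbi algorithm. This is a textbook sketch, not the algorithm referred to in the lecture: the states, the Gaussian emission model, `stay_prob`, `sd` and the signals are all assumptions.

```python
from math import log


def viterbi_copy_number(signals, states=(1, 2, 3), stay_prob=0.9, sd=0.3):
    """Most likely sequence of copy-number states along a chromosome."""
    n_states = len(states)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emission(x, cn):
        # Gaussian log-density centred on cn, up to a constant shared by all states.
        return -0.5 * ((x - cn) / sd) ** 2

    def log_transition(s, t):
        return log(stay_prob if s == t else switch_prob)

    # score[s]: best log-score of any path ending in state s at the current locus.
    score = [log(1.0 / n_states) + log_emission(signals[0], cn) for cn in states]
    backptr = []
    for x in signals[1:]:
        new_score, ptr = [], []
        for t, cn in enumerate(states):
            best_prev = max(range(n_states),
                            key=lambda s: score[s] + log_transition(s, t))
            new_score.append(score[best_prev] + log_transition(best_prev, t)
                             + log_emission(x, cn))
            ptr.append(best_prev)
        backptr.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


# Made-up per-locus copy-number estimates: one noisy point (2.6), then a 3-locus gain.
signals = [2.1, 1.9, 2.6, 2.0, 2.9, 3.1, 3.0, 2.0, 2.1]
segments = viterbi_copy_number(signals)
```

The transition penalty is what turns noisy per-locus estimates into constant copy-number regions: the isolated value 2.6 is not enough to pay for two state changes, while the run of three loci near 3 is.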