
Genetic analysis of human disorders

Tom Scerri. Quantitative association analysis and genotyping technologies.


Presentation Transcript


  1. Genetic analysis of human disorders. Tom Scerri. Quantitative association analysis and genotyping technologies

  2. Exercise 1b from Thursday: NPL linkage analysis

  3. Exercise 2c: Results • Opened result files in Excel and sorted by p-value • Only possible because we used a relatively small data set • Identified the top SNPs and checked them in the UCSC Genome Browser • The top two SNPs for the analyses were: • Case-control (using --model): • rs8083963 (p-value = 0.001353 from the genotype test) • location = within an intron of the gene NEDD4L • rs79252026 (p-value = 0.003456 from the allelic test) • location = within an exon of the gene RNMT • TDT: • rs79252026 (p-value = 6.02 × 10^-5) • location = within an exon of the gene RNMT • rs7507114 (p-value = 0.001745) • location = within an intron of the gene C18orf1 • Note: rs79252026 is the same SNP in both analyses
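
For larger result files, the Excel sorting step could be scripted instead. The following is a minimal sketch, assuming the result file is whitespace-delimited with a header row that contains a column named `P` and that missing p-values are written as `NA`. The two-column demo table is a reduced illustration: it reuses the two TDT p-values quoted on the slide, and the third row is hypothetical.

```python
import math


def top_hits(text, n=2, p_column="P"):
    """Return the n rows with the smallest p-values from whitespace-delimited text."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    p_idx = header.index(p_column)
    scored = []
    for fields in rows:
        try:
            p = float(fields[p_idx])
        except (ValueError, IndexError):
            continue  # skip rows whose p-value is "NA" or missing
        if math.isnan(p):
            continue
        scored.append((p, dict(zip(header, fields))))
    scored.sort(key=lambda item: item[0])
    return [row for _, row in scored[:n]]


# Reduced illustration: SNP and P columns only; the last row is hypothetical.
DEMO = """
SNP          P
rs7507114    0.001745
rs79252026   6.02e-05
rs_example   NA
"""

for hit in top_hits(DEMO):
    print(hit["SNP"], hit["P"])
```

For a real file, the text would come from `open(path).read()`, where `path` is whatever name was given to `--out` plus the extension PLINK adds.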

  4. Exercise 2c: Results • Looking at rs79252026 in more detail with the UCSC Genome Browser (www.genome.ucsc.edu)

  5. Exercise 3a: i) Hardy-Weinberg Equilibrium • Results in the folder “answers” • Exercise 3a: • i. Checking the chr18 markers for Hardy-Weinberg equilibrium: • Example command lines: • plink --ped Lecture2/chr18_CC.ped --map Lecture2/chr18_CC.map --hardy --out chr18_CC • plink --ped Lecture2/chr18_TDT.ped --map Lecture2/chr18_TDT.map --hardy --out chr18_TDT • Using rs79252026 as an example from the chr18 data…

  6. Exercise 3a: i) Hardy-Weinberg Equilibrium • Case-control sample: • plink --ped Lecture2/chr18_CC.ped --map Lecture2/chr18_CC.map --hardy • Trios sample: • plink --ped Lecture2/chr18_TDT.ped --map Lecture2/chr18_TDT.map --hardy • plink --ped Lecture2/chr18_TDT.ped --map Lecture2/chr18_TDT.map --hardy --nonfounders • Why is there “NA”? Because by default, PLINK uses only founders to calculate HWE, as they are independent. Here, the affecteds are “nonfounders”, and there are no unaffecteds in the .ped file. • With --nonfounders, this is no longer a reliable estimate of HWE because it now includes the affecteds. • It appears that rs79252026 is in HWE, so we can assume that the genotyping is good and that we may have found an association with the disease, but it still needs replicating in independent samples.
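
To make the idea of an HWE check concrete, here is a minimal sketch of the textbook chi-square goodness-of-fit test computed from genotype counts. It is an illustration only: the function name and the example counts are hypothetical, and the p-values PLINK reports for `--hardy` may come from a different (exact) test, so the numbers need not match PLINK's output.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first set is exactly in HWE, the second has no heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

Only independent individuals should be counted, which is the reason the slide gives for PLINK restricting the calculation to founders by default.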

  7. Exercise 3a: ii) Linkage disequilibrium • Exercise 3a: • ii. Estimating LD with PLINK from data presented in the lecture: • Example commands to use: • plink --ped LD.ped --map LD.map --ld Marker1 Marker2 --out LD_marker1_marker2 • plink --ped LD.ped --map LD.map --ld Marker1 Marker3 --out LD_marker1_marker3 • plink --ped LD.ped --map LD.map --ld Marker1 Marker4 --out LD_marker1_marker4

  8. Exercise 3a: ii) Linkage disequilibrium • Calculated during lecture yesterday, from 22 haplotypes (alleles in the order SNP1, SNP2, SNP3, SNP4): CACG, CATA, CACA, CACG, CACG, CACA, CACG, CACG, CACG, CATG, CACG, GTCG, CACG, CACA, GTCG, CACG, CACG, GTTA, CACA, CACG, CATG, CACA
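
The hand calculation from the lecture can be sketched in a few lines when the haplotypes are taken as given. This is a minimal sketch, assuming the 22 four-letter strings transcribed from the slide are phased haplotypes with alleles in the order SNP1 to SNP4, and that every SNP is biallelic and polymorphic; the function name is hypothetical. It uses the standard definitions D = p(AB) - p(A)p(B), r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))) and D' = |D| / Dmax.

```python
# Haplotypes as transcribed from the slide, alleles in the order SNP1..SNP4.
HAPLOTYPES = (
    "CACG CATA CACA CACG CACG CACA CACG CACG CACG CATG CACG "
    "GTCG CACG CACA GTCG CACG CACG GTTA CACA CACG CATG CACA"
).split()


def pairwise_ld(haplotypes, i, j):
    """Return (r2, d_prime) between the SNPs at string positions i and j."""
    n = len(haplotypes)
    a = haplotypes[0][i]  # reference allele at SNP i (arbitrary choice)
    b = haplotypes[0][j]  # reference allele at SNP j (arbitrary choice)
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


for j in (1, 2, 3):
    r2, d_prime = pairwise_ld(HAPLOTYPES, 0, j)
    print(f"SNP1 vs SNP{j + 1}: r2 = {r2:.4f}, D' = {d_prime:.4f}")
```

A program that is given only unphased genotypes cannot count haplotypes directly like this and has to estimate their frequencies first.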

  9. Exercise 3a: ii) Linkage disequilibrium • Comparison of the LD values calculated during the lecture yesterday with those calculated with PLINK yesterday: one result is the same, two are different • Why are the results different?

  10. Exercise 3a: ii) Linkage disequilibrium • Comparison of the LD values calculated during the lecture yesterday with those calculated with PLINK yesterday: one result is the same, two are different • Why are the results different?

  11. Exercise 3a: ii) Linkage disequilibrium • Answer: phase • The 22 haplotypes used in the lecture (alleles in the order SNP1, SNP2, SNP3, SNP4): CACG, CATA, CACA, CACG, CACG, CACA, CACG, CACG, CACG, CATG, CACG, GTCG, CACG, CACA, GTCG, CACG, CACG, GTTA, CACA, CACG, CATG, CACA

  12. Exercise 3a: ii) Linkage disequilibrium • The 22 haplotypes shown in a different arrangement (alleles in the order SNP1, SNP2, SNP3, SNP4): CACA, CATG, CACG, CACA, GTTA, CACG, CACG, GTCG, CACA, CACG, GTCG, CACG, CATG, CACG, CACG, CACG, CACA, CACG, CACG, CACA, CATA, CACG

  13. Exercise 3a: ii) Linkage disequilibrium • Exercise 3a: • ii. Estimating LD with PLINK from data presented in the lecture: • Example commands to use: • plink --ped LD_with_parents.ped --map LD.map --ld Marker1 Marker2 --out LD_marker1_marker2 • plink --ped LD_with_parents.ped --map LD.map --ld Marker1 Marker3 --out LD_marker1_marker3 • plink --ped LD_with_parents.ped --map LD.map --ld Marker1 Marker4 --out LD_marker1_marker4 Now with parental data added to the .ped file

  14. Exercise 3a: ii) Linkage disequilibrium • Comparison of the LD values calculated during the lecture yesterday with those calculated with PLINK yesterday: all three results are the same

  15. Exercise 3b: Using HapMap • Place to download genotypes • SNP allele frequencies in different populations • The gene RNMT

  16. Exercise 3b: Using Haploview - LD Plot • Block of high LD • Different colours & numbers correspond to different levels of pair-wise LD (currently displaying D’)

  17. Exercise 3b: Using Haploview - Haplotypes • Haplotypes and their frequencies within a given block of high LD • Correlation between haplotypes of different blocks

  18. Exercise 3b: Using Haploview - Check Markers • Minor allele frequencies • SNP231 = rs948331 • Hardy-Weinberg p-values

  19. Exercise 3b: Using Haploview - SNP rs948331 • Block 27, 41 kb • Perfect LD (r² = 1) with four other SNPs • Grey-scale & numbers now corresponding to r²

  20. Exercise 3b: Using Haploview - Haplotype-tagging SNPs • Haplotype tagging SNPs • Switch on tags

  21. Selecting SNPs for genotyping • HapMap data: • establish correlation between SNPs • efficiently select SNPs for genotyping. • Possible methods: • Select haplotype-tagging SNPs • block specific • SNPs selected to tag specific haplotypes • can be used in combination • problem arises in spaces between blocks • Could fill these blanks using “Tagger” (below). • Tagger • generally disregards blocks and haplotypes • selects tagging SNPs based purely on correlations
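
Selecting tagging SNPs "based purely on correlations" can be illustrated with a simple greedy procedure. This is a sketch of the general idea only and is not claimed to be the algorithm Tagger itself runs: the function name, the SNP labels A to E and the r² values are all hypothetical, and candidate tags are restricted to SNPs that are not yet captured, which is a simplification.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.9):
    """Greedily pick tag SNPs so that every SNP has r2 >= threshold with some tag.

    r2 maps frozenset({snp_a, snp_b}) to the pairwise r-squared; missing pairs count as 0.
    Returns a dict mapping each chosen tag to the set of SNPs it captures.
    """
    def captured_by(tag, pool):
        return {
            s for s in pool
            if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold
        }

    untagged = set(snps)
    tags = {}
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs
        # (ties broken alphabetically because the candidates are sorted).
        best = max(sorted(untagged), key=lambda s: len(captured_by(s, untagged)))
        covered = captured_by(best, untagged)
        tags[best] = covered
        untagged -= covered
    return tags


# Hypothetical pairwise r-squared values for five SNPs.
R2 = {
    frozenset(("A", "B")): 0.95,
    frozenset(("B", "C")): 0.92,
    frozenset(("A", "C")): 0.50,
    frozenset(("D", "E")): 1.00,
}

for tag, covered in greedy_pairwise_tags("ABCDE", R2).items():
    print(tag, "captures", sorted(covered))
```

Raising the threshold means more tags are needed to cover the same region, which is the trade-off the r² setting in Tagger controls.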

  22. Selecting SNPs for genotyping: Tagger • Choose between pairwise tagging or 2- and/or 3-marker tagging • Go! • Select r² threshold

  23. Modern Genotyping Technologies • [Figure: number of SNPs against number of samples] • SNPs and samples, trade-offs due to: • Strategy • discovery, replication, epidemiological, case-control, families • Cost • Chemistry • some SNP combinations are impossible to genotype • Physical constraints • *don’t quote me on this (it is only a rough estimation at today’s prices)

  24. Sequenom: iPLEX assay • Genotype ~30 SNPs simultaneously • Samples are processed in batches of 384 • 1. Multiplex PCR • Complex mix of ~60 primers to amplify the regions containing your SNPs. • 2. Allele specific primer extension and/or termination • Add extension primer ----TACGG + special mix of single nucleotides A, C, G and T* • T*: this nucleotide will terminate the extension process • SNP allele A: template ----ATGCCATAAATC---, extension product ----TACGGT* • SNP allele G: template ----ATGCCGTAAATC---, extension product ----TACGGCAT* • 3. Time-of-flight mass spectrometry • Background noise → genotype errors

  25. Sequenom: iPLEX assay • [Figure: the chip, ~1.5 cm by ~2.5 cm] • Don’t forget to show them the chip • Question: How many spots can you see on the chip?

  26. Illumina GoldenGate Assay • 1536 different SNPs simultaneously genotyped per sample • customisable • Samples processed in batches of 96 • 3 primers required per SNP: • 2× allele specific (P1 and P2) • 1× locus specific (P3) • PCR amplification with fluorescently labelled universal primers • Products annealed to microarray and read with a scanner. Adapted from Illumina.com

  27. Illumina Infinium Assay • >1,000,000 SNPs genotyped simultaneously • Samples processed in batches of 2 • PCR-free whole-genome amplification • Unique 50mer oligonucleotide per SNP • Single base extension with labelled nucleotide. Adapted from Illumina.com

  28. Illumina GoldenGate or Infinium Output • [Figure: intensity plots for SNP 1, SNP 2, SNP 3 and SNP 4, with genotype clusters labelled Homo AA, Hetero AB and Homo BB]

  29. [Figure: family pedigree showing maternal grand-parents, mother, paternal grand-parents, father and children; individuals are marked “+” or “?” and labelled “Dyslexic” or “Not dyslexic”] • 2 unknowns • Location of duplicate regions?

  30. Detailed view of chromosomal re-arrangement • [Figure: chromosome 18, 18p11.22 and 18q11.2, with tracks for fosmids, SNPs* and genes] • Region spans four largely uncharacterised genes: • FHOD3 • C18orf10 • KIAA1328 • Brunol4 • *SNPs colour code: Green = normal, Red = extra copy

  31. FISH Experiments on Duplicated Region within Family • Interphase and fibre FISH performed by the Molecular Cytogenetics and Microscopy Group

  32. [Figure: family pedigree showing maternal grand-parents, mother, paternal grand-parents, father and children; individuals are marked “+” or “?” and labelled “Dyslexic” or “Not dyslexic”]

  33. Association study designs • Qualitative: • Case-control • population based • Transmission-Disequilibrium Test (TDT) • family based • Quantitative: • Individuals (or singletons) • “population” based • QTDT • family based

  34. Quantitative association analysis: singletons • Collect 100s or 1000s of unrelated samples: • Measure these for a quantitative trait. • Genotype samples. • Plot results: y = a + bx, where b = effect size • [Figure: plots for Marker 1 and Marker 2]

  35. Quantitative association analysis: singletons • How to calculate the association? • Consider Marker 1: • $b = -1.56 / 6.9 = -0.2261$ • $r = -1.56 / (6.9 \times 0.424)^{0.5} = -0.91205$ • $t_{n-2} = -0.91205 \times ((10 - 2) / (1 - 0.91205^2))^{0.5} = -6.29066$ • $p = 0.000235$
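
The Marker 1 arithmetic can be reproduced with a short script. This is a minimal sketch under stated assumptions: the three numbers on the slide are read as the cross-product term (-1.56), the genotype term (6.9) and the trait term (0.424) of a simple linear regression with n = 10, and the names `s_xy`, `s_xx`, `s_yy` are labels introduced here, not taken from the slide. The two-sided p-value is obtained by numerically integrating the Student t density, so that no external library is needed.

```python
import math


def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = 2 * total * h / 3  # P(|T| <= |t|), using symmetry about zero
    return 1 - central


def quantitative_association(s_xy, s_xx, s_yy, n):
    """Return slope b, correlation r, t statistic (n - 2 df) and two-sided p-value."""
    b = s_xy / s_xx
    r = s_xy / math.sqrt(s_xx * s_yy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return b, r, t, t_two_sided_p(t, n - 2)


b, r, t, p = quantitative_association(-1.56, 6.9, 0.424, 10)
print(f"b = {b:.4f}, r = {r:.5f}, t = {t:.4f}, p = {p:.6f}")
```

Because the script carries r at full precision rather than rounding it to five decimals first, the t statistic it prints can differ from the slide in the fourth decimal place.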

  36. Quantitative association analysis: singletons • How to calculate the association? • Consider Marker 2 (values left blank on the slide): • $b = $ • $r = $ • $t_{n-2} = $ • $p = $

  37. Exercise 4a: Using PLINK to do quantitative population-based association analysis • Data and 4th lecture available here: • www.well.ox.ac.uk/~clicker/Bologna/Lecture4/ • PLINK website available here: • http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml • Or simply Google “PLINK association” • Scroll down (or search for) and then click on “Association” on the left menu bar. • Scroll down to “Quantitative trait association”. • Conduct the association analysis using the “--assoc” option with these files: • CC_quant.ped • CC_quant.map • Compare the results to what we calculated in the lecture. • Use also the “--qt-means” option. Does the output make sense?

  38. Exercise 4b: Using HaploView to select tagging SNPs • Using HaploView, open the dataset you should have downloaded yesterday: • RNMT.hmp • Use Tagger to select SNPs for genotyping with these criteria: • from base-pair position 13,500,000 to 13,980,000 • HWE p-value > 0.002 • minimum genotype > 80% • minimum minor allele frequency > 0.05 • r² threshold > 0.9 • Pairwise tagging • Save your output: “File” → “Export current tab to text”. The two “dump” buttons don’t appear to be working properly, so use this method instead. • How many SNPs do you need to genotype to cover the region? • How many SNPs will they capture in total? • Compare your results with your neighbour. • Email your results to the clicker genotyping facility: clicker@well.ox.ac.uk • Wait for a confirmation reply before going home.

  39. Exercise 4c: Using in-silico PCR • Using the UCSC Genome Browser to perform in-silico PCR • www.genome.ucsc.edu • Select “PCR” from the top menu. • Search for these primer pairs: • First: • ATAATTAAAAGGCTAATCAAGTGTGCAT • TTGCCATAGGTCTCATAATAGCCTAAC • Second: • TGCCCGGCTACTCATTTTTTAAAATGTG • GTAATACCTTTAAAACATTTTTGCATTTTTT • Third: • GGTTGGTCTTTCAAAATGATCAGTAGA • ATTATAAAGAATTATAAATGAATTATTAAA • It will not work first time… you need to “check” one of the boxes. Which one? • Which SNPs would be amplified using these primers? • Are they good primers for PCR?
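
As a complement to the in-silico PCR search, some simple properties of the primers can be tabulated before judging whether they are good primers. This is a minimal sketch: the helper names are hypothetical, and the choice of length, GC content and longest single-base run as indicators is an assumption made here, not a criterion stated on the slide.

```python
from itertools import groupby

# Primer pairs as listed on the slide.
PRIMER_PAIRS = {
    "First": ("ATAATTAAAAGGCTAATCAAGTGTGCAT", "TTGCCATAGGTCTCATAATAGCCTAAC"),
    "Second": ("TGCCCGGCTACTCATTTTTTAAAATGTG", "GTAATACCTTTAAAACATTTTTGCATTTTTT"),
    "Third": ("GGTTGGTCTTTCAAAATGATCAGTAGA", "ATTATAAAGAATTATAAATGAATTATTAAA"),
}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))


for name, pair in PRIMER_PAIRS.items():
    for primer in pair:
        print(
            f"{name:6s} {primer:32s} length={len(primer):2d} "
            f"GC={gc_content(primer):.2f} longest run={longest_homopolymer(primer)}"
        )
```

Very low GC content or long runs of one base are commonly regarded as warning signs, but the figures only support the judgement asked for in the exercise; they do not replace it.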
