GWAS vs NGS
James McKay, Genetic Susceptibility Group
[Diagram: Genetics (heritable) and Environment (non-heritable) act on the Individual to give the Predicted Phenotype]
What we expect in terms of effects of genetic variants in cancer susceptibility
• The population frequency of a variant seems to relate to its impact on disease
• So does the severity of the consequence on the gene's function
Genome-wide association studies (GWAS)
• Cases are compared with controls
• Agnostic approach: no knowledge about the gene is needed
• Test all common genetic variation across the genome
• 770,000 common variants, each tested for differences between cases and controls
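Testing 770,000 variants one by one means the per-variant significance threshold has to be far stricter than the usual 0.05. A minimal sketch of a Bonferroni correction follows. The slides do not name a correction method, so Bonferroni and the family-wise alpha of 0.05 are assumptions; only the variant count comes from the slide.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 770,000 is the number of variants quoted on the slide.
# alpha = 0.05 is an assumed family-wise error rate, not stated in the talk.
N_VARIANTS = 770_000
threshold = bonferroni_threshold(0.05, N_VARIANTS)
print(f"per-variant p-value threshold: {threshold:.2e}")
```

Under these assumptions the threshold falls between 10^-8 and 10^-7, which is why GWAS hits are reported with such small p-values.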
Assays to measure all common genetic variation in the human genome
Genome-wide association studies: association in case-control groups
• Each variant is tested for differences between cases and controls
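As an illustration of what "tested for differences between cases and controls" can mean for one variant, here is a sketch of an allelic chi-square test on a 2x2 table of allele counts. The slides do not say which test was used, so the choice of test, the function name and the counts are all hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allelic chi-square test (1 df) and odds ratio for one variant.

    Arguments are allele counts (two alleles per person), not people.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    if min(a, b, c, d) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

In a real GWAS the same calculation would be repeated for every variant, usually with a regression model that adjusts for covariates instead of a raw 2x2 table.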
Cancer types with successful GWAS: prostate cancer, breast cancer, colorectal cancer, lung cancer, esophageal cancer, ovarian cancer, head and neck, testicular cancer, bladder cancer, thyroid cancer, pancreatic cancer, melanoma, basal cell carcinoma, glioma, neuroblastoma, kidney, chronic lymphocytic leukemia, acute lymphoblastic leukemia, follicular lymphoma, myeloproliferative disorders, Hodgkin’s lymphoma. (On the original slide, blue = carried out at IARC.)
GWAS results, classical HL: 4 European studies, 1,200 cases and 6,713 generic controls (K Urayama)
[Plot: -log10(p-value) by chromosome; labelled signals at 6p21 (MHC region) and 5q31 (IL13/IL4)]
MHC region associations
[Plot: -log10(P value) by position in the MHC region (Mb), spanning extended class I, class I, class III and class II, for all classical HL, EBV-positive HL and EBV-negative HL. Labelled SNPs: HLA-DRA rs6903608, HLA-DRA rs2395185, HLA-A rs2734986, HLA-A rs6904029, MICB rs2248462]
Results for IL13: P = 1.8 × 10^-9 and P = 1.1 × 10^-8 [plot of odds ratios (OR)]
Next steps in GWAS
• Very large sample sizes through meta-analysis: lung cancer, 14K cases and 18K controls
• Are all SNPs equal? Bayesian approach: weight SNPs based on different sources of evidence (eQTL, medical literature)
• Many cancer loci are relevant to more than one cancer subtype: start with known loci to decrease the multiple-testing burden
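To make the meta-analysis idea concrete, here is a sketch that pools per-study log odds ratios for a single SNP with fixed-effect inverse-variance weighting. This is one common way of combining GWAS results; the slides do not state which method the lung cancer meta-analysis used, and the study values below are invented for illustration.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs by inverse-variance weighting."""
    if not estimates:
        raise ValueError("need at least one study")
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p_value


# Hypothetical per-study results for one SNP: (log OR, standard error).
studies = [(0.20, 0.10), (0.15, 0.08), (0.25, 0.12)]
pooled, pooled_se, p_value = fixed_effect_meta(studies)
print(f"pooled OR = {math.exp(pooled):.2f}, p = {p_value:.1e}")
```

The pooled standard error is smaller than that of any single study, which is how combining studies gains power to detect small relative risks.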
Limitations of GWAS
• Small relative risks (RR) and many variants tested
• Sample sizes in the thousands are needed
• 2nd cancers in Hodgkin’s: Best et al, Nat Med 2011
• Only considers common genetic variants (and only ~80% of them)
• Rare variants are not assessed
Next generation sequencing
• Massively parallel sequencing
• Now able to assay the entire sequence of an individual
• The first genome sequenced: $3 billion, 14+ labs
• Now: a single machine, $3,000
• Many applications other than DNA resequencing
• Review issue on exomes: Genome Biology 2011
GWAS assays focus on common genetic variants; NGS gives an individual's sequence, hence information on common and rare variants
An example of an NGS workflow (Sboner et al, Genome Biology 2011)
• Families, trios, case-control, tumour vs normal, pooled/individual
• Whole genome, target capture (exome, specific regions?)
• Illumina, SOLiD, PGM, 454…
• Sequence reads, ACGTACGTACGAGCT……ACGTACGTACGTACGT, of 75, 150 or 250 bp
• Mapping
• Variant calling
• Variant consequence
NGS data: many, many short sequence reads
• Variant calling, heterozygote calls: 50% of reads should carry the wild-type allele, C (i.e. the reference), and 50% of reads should carry the variant, i.e. T
• 30 reads per base seems to be the solution in terms of accuracy and cost-effectiveness
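The link between read depth and heterozygote accuracy can be sketched with a binomial model: at a true heterozygous site each read carries the variant allele with probability 0.5, so the number of variant reads is Binomial(depth, 0.5). The code below asks how often a true heterozygote would show 20% or fewer variant reads. The 20% cutoff and the comparison depth of 10 are hypothetical choices for illustration, and the model ignores sequencing and mapping errors.

```python
import math


def prob_at_most(k: int, depth: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(depth, p): chance of seeing at most k variant reads."""
    if not 0 <= k <= depth:
        raise ValueError("k must be between 0 and depth")
    return sum(
        math.comb(depth, i) * p**i * (1.0 - p) ** (depth - i) for i in range(k + 1)
    )


# Chance that a true heterozygote shows <= 20% variant reads (hypothetical cutoff).
miss_at_10x = prob_at_most(2, 10)   # 2 of 10 reads
miss_at_30x = prob_at_most(6, 30)   # 6 of 30 reads
print(f"depth 10: {miss_at_10x:.4f}")
print(f"depth 30: {miss_at_30x:.6f}")
```

Under this simplified model the chance of a heterozygote looking like a homozygous reference site drops sharply between depth 10 and depth 30, in line with the slide's point about 30 reads per base.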
Variant filtering
• Start: ~3 million SNPs
• Target exomes: 15,000 to 20,000 coding SNPs
• Remove silent (synonymous) SNPs: 5,000 to 7,000 coding SNPs
• Remove previously identified SNPs: 200 to 500 nonsynonymous + truncating SNPs
• Keep functional SNPs (truncating, in silico predictions): 50 to 100 functional SNPs
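The filtering funnel above can be sketched as a chain of list filters that records how many variants survive each step. The `Variant` record, its field names and the consequence labels are hypothetical; real pipelines take these annotations from an annotation tool and a database of known variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    coding: bool
    consequence: str            # e.g. "synonymous", "missense", "nonsense"
    previously_identified: bool
    predicted_damaging: bool    # result of an in silico prediction


TRUNCATING = {"nonsense", "frameshift"}


def filter_variants(variants):
    """Apply the funnel step by step; return survivors and per-stage counts."""
    stages = [("all", list(variants))]
    stages.append(("coding", [v for v in stages[-1][1] if v.coding]))
    stages.append(
        ("non-silent", [v for v in stages[-1][1] if v.consequence != "synonymous"])
    )
    stages.append(
        ("novel", [v for v in stages[-1][1] if not v.previously_identified])
    )
    stages.append(
        (
            "functional",
            [
                v
                for v in stages[-1][1]
                if v.consequence in TRUNCATING or v.predicted_damaging
            ],
        )
    )
    counts = [(name, len(kept)) for name, kept in stages]
    return stages[-1][1], counts


# Tiny made-up example; gene names are placeholders.
example = [
    Variant("GENE1", False, "intergenic", False, False),
    Variant("GENE2", True, "synonymous", False, False),
    Variant("GENE3", True, "missense", True, True),
    Variant("GENE4", True, "missense", False, False),
    Variant("GENE5", True, "nonsense", False, False),
    Variant("GENE6", True, "missense", False, True),
]
survivors, counts = filter_variants(example)
print(counts)
print([v.gene for v in survivors])
```

Recording the count at each stage makes it easy to compare a sample against the rough numbers on the slide and to spot a filter that removes far too much or too little.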
An example of an NGS workflow, revisited (Sboner et al, Genome Biology 2011)
• Families, trios, case-control, tumour vs normal, pooled/individual
• Whole genome, target capture (exome, specific regions?)
• Illumina, SOLiD, PGM, 454…
• Sequence reads, ACGTACGTACGAGCT……ACGTACGTACGTACGT, of 75, 150 or 250 bp
• Mapping
• Variant calling
• Variant consequence: "Ahhh, yes, tricky, we might have to form a working group and get back to you on that one"
After QC filtering
• 50 to 100 variants per individual that are in genes and appear functional
• How do we differentiate true from false?
• Bin variants across genes?
• Test for association? (needs at least 3K cases and 3K controls)
NPC pedigree, Sarawak, Malaysia
• 11 cases for which we have genomic DNA
• Exome sequencing underway
• Triage variants in the pedigree: an interesting variant should segregate in cases
• Validation in the remaining individuals + additional pedigrees (Allan Hildesheim, US NCI)
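The triage step, keeping variants that segregate in cases, can be sketched as a set intersection over the sequenced affected relatives. The sample identifiers and variant labels below are invented. Optionally excluding variants seen in unaffected relatives assumes high penetrance, which the slides do not claim, so that argument is off by default.

```python
def segregating_variants(carriers_by_person, affected, unaffected=()):
    """Variants carried by every affected person and by none of the unaffected."""
    affected = list(affected)
    if not affected:
        raise ValueError("need at least one affected individual")
    shared = set.intersection(*(set(carriers_by_person[p]) for p in affected))
    for person in unaffected:
        shared -= set(carriers_by_person[person])
    return shared


# Hypothetical pedigree: variant labels are placeholders, not real calls.
calls = {
    "case_1": {"varA", "varB", "varC"},
    "case_2": {"varA", "varC", "varD"},
    "case_3": {"varA", "varC"},
    "relative_1": {"varC", "varE"},
}
cases = ["case_1", "case_2", "case_3"]
print(sorted(segregating_variants(calls, cases)))
print(sorted(segregating_variants(calls, cases, unaffected=["relative_1"])))
```

With many cases in one pedigree, requiring strict sharing across all of them shrinks the candidate list quickly, although it would also discard a true variant if one case were a phenocopy.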
Genes following two-hit models (Knudson’s hypothesis)
• NGS has been quite successful in recessive diseases (two mutations, a rare event)
• Many inherited tumours have no normal alleles: one mutation is inherited, and the second (wild-type) allele is then deleted somatically
• Examples: RB, TP53, VHL, BRCA1/2, APC, PTEN
[Diagram: BRCA1 on chrA and BRCA1 on chrB]
Catalogue mutation events in constitutional DNA and somatic events
• Constitutional (genomic) DNA: exome sequencing
• Somatic tissue events: exome sequencing, CNV
• Identify genes for which there is co-occurrence of events, consistent with the two-hit hypothesis
• chrA (inherited events): 50 by chance?
• chrB (somatic events): 500 by chance?
• Chance co-occurrence: 1.3 times per genome
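One possible reading of the "1.3 times per genome" figure is the number of genes expected to carry both an inherited and a somatic event purely by chance. If about 50 genes carry an inherited event and about 500 carry a somatic event, and events fall on genes independently and uniformly, the expected overlap is 50 × 500 / (number of genes). The gene count of roughly 19,000 to 20,000 and the uniformity assumption are mine, not the slide's, and the gene names in the example are placeholders.

```python
def expected_chance_overlap(n_inherited: int, n_somatic: int, n_genes: int) -> float:
    """Expected number of genes hit by both event types under independence."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return n_inherited * n_somatic / n_genes


def two_hit_candidates(inherited_genes, somatic_genes):
    """Genes with both an inherited and a somatic event in the same individual."""
    return sorted(set(inherited_genes) & set(somatic_genes))


# 50 and 500 are the slide's figures; the gene counts are assumptions.
for n_genes in (19_000, 20_000):
    print(n_genes, round(expected_chance_overlap(50, 500, n_genes), 2))

# Placeholder gene lists for one hypothetical normal/tumour pair.
print(two_hit_candidates(["GENE_A", "GENE_B"], ["GENE_B", "GENE_C", "GENE_D"]))
```

Because roughly one overlapping gene per individual is expected by chance under these assumptions, a single co-occurrence is weak evidence; the same gene recurring across many individuals would be more convincing.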
IARC biorepository: lung cancers with blood and frozen tumour
• The IARC biorepository has close to 500 lung cancer cases with a blood sample and snap-frozen tumour
• 30 lung cancer cases have a first-degree relative with lung cancer
• Two-stage design: exome sequencing of normal/tumour pairs in the 30 family-history-positive (fh+) cases, with 470 for replication
Next generation sequencing: massively parallel sequencing
• Assay a single cell and a single position
• Say chr 3, positions 1 - 50 (from a single cell)
• Diploid:
  chr1 ACGTACGTACGAGACGTACGTACGTACGT
  chr2 ACGTACGTACGAAACGTACGTACGTACGT
• Not a single cell (although it's being worked on), but a sample from an individual
• In parallel: massive numbers (billions) of reads
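The diploid example can be checked directly by comparing the two copies base by base; positions where they differ are heterozygous sites. The two sequences are copied from the slide, while the function name and the 1-based position convention are illustrative choices.

```python
def heterozygous_sites(copy_1: str, copy_2: str):
    """Return (1-based position, base on copy 1, base on copy 2) where the copies differ."""
    if len(copy_1) != len(copy_2):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, base_1, base_2)
        for index, (base_1, base_2) in enumerate(zip(copy_1, copy_2))
        if base_1 != base_2
    ]


# The two chromosome copies shown on the slide.
chr_copy_1 = "ACGTACGTACGAGACGTACGTACGTACGT"
chr_copy_2 = "ACGTACGTACGAAACGTACGTACGTACGT"
print(heterozygous_sites(chr_copy_1, chr_copy_2))
```

In real data the two copies are never observed as separate full-length strings; the difference has to be inferred from the mix of short reads covering each position.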