Evolution as a Confounding Factor in Genetic Association Studies
14 December 2011
Richard H. Scheuermann, Ph.D.
Department of Pathology, U.T. Southwestern Medical Center
Outline
• Hypothesis that evolution should be considered a confounding factor in genetic association studies
• HLA-mediated autoimmune disease predisposition analysis using SFVT
• Identifying genetic determinants of influenza species jump events based on convergent evolution
• Novel general strategies for formally controlling for evolution as a confounding factor in genetic association studies
Evolution as a confounding factor in genetic association studies
Population-based genetic association
• Many diseases exhibit evidence of genetic predispositions
• Genotype-phenotype association studies
  • Diagnostic biomarkers
  • Molecular underpinnings of disease pathology
• GWAS and linkage disequilibrium
  • Co-inheritance of "linked" genetic markers
  • Advantage: SNPs linked to a causal variant can detect it indirectly
• NGS could obviate the need for using linked SNPs
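To make linkage disequilibrium concrete, the sketch below computes the two usual LD measures for a pair of biallelic loci: D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The function name and the haplotype frequencies are hypothetical. A high r² is what lets a genotyped SNP stand in for an untyped causal variant.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: marginal frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # deviation from independent co-inheritance
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # squared allelic correlation
    return d, r2


# Hypothetical example: marker allele A usually travels with causal allele B.
d, r2 = linkage_disequilibrium(p_ab=0.28, p_a=0.3, p_b=0.3)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

With these frequencies the marker and the causal allele are strongly correlated (r² around 0.8), so an association detected at the marker largely reflects the causal variant.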
Statistical assumptions
• Independence (violations cause confounding)
• Random sampling (violations cause bias)
• Population has reached equilibrium
• Test sample represents a random sampling of the equilibrium population
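One way the independence assumption fails is population structure. If subpopulations with different evolutionary histories differ in both allele frequency and disease rate, pooling them creates an association that is absent within each group. The sketch below uses made-up counts to contrast the crude (pooled) odds ratio with the Mantel-Haenszel odds ratio stratified by subpopulation. It is a standard textbook illustration of confounding, not the formal strategy proposed in these slides.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    """2x2 counts for one subpopulation (stratum)."""
    carrier_cases: int        # a
    carrier_controls: int     # b
    noncarrier_cases: int     # c
    noncarrier_controls: int  # d


def crude_odds_ratio(strata: list[Stratum]) -> float:
    """Odds ratio after pooling all strata into one 2x2 table (ignores structure)."""
    a = sum(s.carrier_cases for s in strata)
    b = sum(s.carrier_controls for s in strata)
    c = sum(s.noncarrier_cases for s in strata)
    d = sum(s.noncarrier_controls for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio, adjusting for the stratifying variable."""
    num = den = 0.0
    for s in strata:
        n = sum(s)  # total subjects in this stratum
        num += s.carrier_cases * s.noncarrier_controls / n
        den += s.carrier_controls * s.noncarrier_cases / n
    return num / den


# Hypothetical counts: within each subpopulation the allele has no effect (OR = 1),
# but subpopulation 1 has both a higher allele frequency and a higher disease rate.
strata = [
    Stratum(carrier_cases=80, carrier_controls=20, noncarrier_cases=40, noncarrier_controls=10),
    Stratum(carrier_cases=10, carrier_controls=40, noncarrier_cases=20, noncarrier_controls=80),
]
print(crude_odds_ratio(strata))            # 2.25: spurious association
print(mantel_haenszel_odds_ratio(strata))  # 1.0: association disappears after stratification
```

Stratification only works when the confounding groups are known in advance; the talk's point is that shared evolutionary history is a confounder that is usually not modeled this explicitly.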
HLA-mediated autoimmune disease predisposition analysis using SFVT
HLA and autoimmune disease
Source: Robbins Pathologic Basis of Disease, 6th Edition (1999)
HLA and infectious disease
• Correlation between HLA genotype and HIV viral burden and progression to AIDS
• Source: M. Dean, M. Carrington and S.J. O'Brien, Annual Review of Genomics and Human Genetics, Vol. 3: 263-292 (2002)
HLA and drug sensitivity
HLA allele | Drug (indication) | Association with drug sensitivity | Prevalence
B*1502 | carbamazepine (epilepsy) | p = 3 x 10^-27 | high in Chinese, absent in Caucasians
B*5701 | abacavir (HIV) | p = 5 x 10^-20 | high in Caucasians, absent in Africans and Hispanics
B*5801 | allopurinol (gout) | p = 5 x 10^-24 | high in Chinese
Source: P. Parham
Number of HLA alleles (IMGT/HLA, October 2008)
Locus | Alleles (serological antigens)
HLA-A | 697 (24)
HLA-B | 1109 (49)
HLA-C | 381 (9)
HLA-DRB | 690 (20)
HLA-DQA1 | 34
HLA-DQB1 | 95 (7)
HLA-DPA1 | 27
HLA-DPB1 | 131
MICA | 65
MICB | 30
TAP | 11
Figures in parentheses indicate the number of serologically defined antigens at each locus. 500 new submissions each year.
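HLA allele nomenclature
Examples: HLA-A*24020101 and HLA-A*24020102L
• HLA-A: locus
• *: asterisk (separates the locus from the allele designation)
• 24: allele family (serological antigen where possible)
• 02: alleles differing by amino acid substitutions
• 01: alleles differing by synonymous (silent) coding polymorphisms
• 01 / 02: alleles differing by intron, 3' or 5' polymorphisms
• L: expression suffix (N = null, L = low, S = secreted, A = aberrant, Q = questionable)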
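The field structure above can be made explicit with a small parser. The slide uses the pre-2010 concatenated notation (HLA-A*24020102L); since 2010 the WHO nomenclature separates fields with colons (HLA-A*24:02:01:02L), and this sketch assumes the colon form because it is unambiguous. The class, regex and function names are hypothetical, and only the expression suffixes listed on the slide are recognized.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Expression suffixes listed on the slide.
SUFFIXES = {"N": "null", "L": "low", "S": "secreted", "A": "aberrant", "Q": "questionable"}

# Colon-delimited form, e.g. "HLA-A*24:02:01:02L": up to four fields plus optional suffix.
ALLELE_RE = re.compile(
    r"^HLA-(?P<locus>[A-Z0-9]+)\*(?P<fields>\d{2,3}(?::\d{2,3}){0,3})(?P<suffix>[NLSAQ])?$"
)


@dataclass
class HlaAllele:
    locus: str
    allele_family: str          # field 1: serological antigen where possible
    protein: Optional[str]      # field 2: amino acid differences
    synonymous: Optional[str]   # field 3: silent coding differences
    noncoding: Optional[str]    # field 4: intron / 5' / 3' differences
    suffix: Optional[str]       # expression status, if any


def parse_allele(name: str) -> HlaAllele:
    m = ALLELE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognized HLA allele name: {name!r}")
    fields = m.group("fields").split(":")
    fields += [None] * (4 - len(fields))  # pad fields missing at lower resolution
    return HlaAllele(m.group("locus"), *fields, suffix=m.group("suffix"))


allele = parse_allele("HLA-A*24:02:01:02L")
print(allele, SUFFIXES[allele.suffix])
```

Separating the fields this way makes it easy to collapse typing data to a given resolution (for example, two fields for "4-digit" typing).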
DRB1 phylogeny
[Figure: phylogenetic tree of DRB1 alleles with groups labeled DRB1*15, *16, *04, *10, *09 and *07; a second view repeats the label DRB1*13 on six branches]
DRB1 alignment
[Figure: pairwise sequence comparisons of DRB1*07 vs. *15, *07 vs. *09, and *09 vs. *15]
Limitations of traditional HLA allele-based association studies
• Treats the entire allele as a single unit and therefore includes both causative and passenger variations
• Doesn't take into account structural relationships between alleles
• The syntax of the HLA nomenclature was designed to capture some of the structural relationships between alleles, but there are several exceptions
HLA-mediated disease predisposition
• Hypothesis: while the allelic/haplotypic structures reflect the evolutionary history of the locus, it is focused regions in the HLA genes/proteins, those that affect gene expression, protein structure and/or protein function, that are responsible for enhanced disease risk
An alternative approach
• DAIT Data Interoperability Steering Committee / HLA Working Group members
• HLA nomenclature: WHO / IMGT-HLA / Anthony Nolan Research Institute
• NCBI dbMHC
• Biomedical ontology experts
Summary of SFVT approach
• Define individual sequence features (SF) in HLA proteins (genes)
• Determine the extent of polymorphism for each sequence feature by defining the observed variant types (VT)
• Re-annotate HLA typing information with the complete list of VTs for each SF
• Examine the association between every sequence feature variant type and disease or other phenotype
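The four SFVT steps can be sketched on toy data: an SF is a set of alignment positions, each distinct residue string observed at those positions becomes a VT, and each typed allele is re-annotated as a list of SF_VT labels that can then be tabulated in cases and controls. The sequences, SF definitions and function names below are hypothetical (they are not real HLA sequences or the ImmPort implementation), and counting per allele copy rather than per subject is an assumption.

```python
from collections import Counter

# Toy aligned protein fragments (hypothetical; NOT real HLA sequences).
ALLELE_SEQS = {
    "allele_1": "FRYDEQLKV",
    "allele_2": "FRYDEHLKV",
    "allele_3": "LRHDEQLKV",
    "allele_4": "FRYDEQIKA",
}

# Hypothetical sequence features: SF name -> 0-based alignment positions it covers.
SEQUENCE_FEATURES = {"SF1": [0, 2], "SF2": [5], "SF3": [6, 8]}


def motif(allele, positions, seqs):
    """Residues of one allele at the positions of one SF."""
    return "".join(seqs[allele][p] for p in positions)


def define_variant_types(seqs, features):
    """Label each distinct residue string at an SF as VT1, VT2, ...
    (numbered by first appearance among alleles sorted by name, an arbitrary choice)."""
    vt_table = {}
    for sf, positions in features.items():
        vts = {}
        for allele in sorted(seqs):
            vts.setdefault(motif(allele, positions, seqs), f"VT{len(vts) + 1}")
        vt_table[sf] = vts
    return vt_table


def reannotate(allele, seqs, features, vt_table):
    """Replace an allele name by its variant type at every SF."""
    return {sf: vt_table[sf][motif(allele, pos, seqs)] for sf, pos in features.items()}


def sf_contingency(subjects, sf, seqs, features, vt_table):
    """2 x n counts of one SF's VTs, per allele copy, in cases vs. controls."""
    table = {"case": Counter(), "control": Counter()}
    for status, alleles in subjects:
        for allele in alleles:
            table[status][reannotate(allele, seqs, features, vt_table)[sf]] += 1
    return table


# Hypothetical subjects: (status, two typed alleles).
SUBJECTS = [
    ("case", ["allele_2", "allele_4"]),
    ("case", ["allele_2", "allele_1"]),
    ("control", ["allele_1", "allele_3"]),
    ("control", ["allele_3", "allele_4"]),
]

vt_table = define_variant_types(ALLELE_SEQS, SEQUENCE_FEATURES)
labels = reannotate("allele_2", ALLELE_SEQS, SEQUENCE_FEATURES, vt_table)
print("; ".join(f"{sf}_{vt}" for sf, vt in labels.items()))  # SF1_VT1; SF2_VT2; SF3_VT1
print(sf_contingency(SUBJECTS, "SF2", ALLELE_SEQS, SEQUENCE_FEATURES, vt_table))
```

Because different alleles can share a VT at a given SF, the resulting tables pool alleles by the residues that matter rather than by allele name.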
[Figure: HLA-A*0201 with the 'CD8 binding' and 'TCR binding' SFs highlighted]
Summary of SFs defined (1775 total)
Variant Types for Hsa_HLA-DRB1_beta-strand 2_peptide antigen binding
HLA SFVT association with systemic sclerosis
• Summary of data set
  • Systemic sclerosis (SSc, scleroderma) is a chronic condition characterized by altered immune reactivity, thickened skin, endothelial dysfunction, interstitial fibrosis, gangrene, pulmonary hypertension, gastrointestinal tract dysmotility, and renal arteriolar dysfunction.
  • A large cohort of ~1300 SSc patients and ~1000 healthy controls has been assembled by Drs. Frank C. Arnett, John Reveille and colleagues at the University of Texas Health Science Center at Houston.
  • Information on autoantibody reactivity for over 15 nuclear antigens is available.
  • 4-digit typing has been done for DRB1, DQA1 and DQB1 in all individuals.
• Initial re-annotation of 4-digit DRB1 typing data
  • DRB1*1104 => SF1_VT43; SF2_VT4; SF3_VT12 …
• Statistical analysis
  • Split the data set into two pseudo-replicates
  • 2 x n contingency table for every SF (286), where n = number of VTs
  • Chi-squared or Fisher's exact test analysis
  • Select SFs with adjusted p-value < 0.01 (83/286)
  • 2 x 2 contingency table (type vs. non-type) for every VT (418 total)
  • Merge results of pseudo-replicates
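The SF-level screening step can be sketched with the standard library: a Pearson chi-square test on each 2 x n table, a multiple-testing adjustment, and selection at adjusted p < 0.01. The slides do not say which adjustment was used; Benjamini-Hochberg here is an assumption, Fisher's exact test (preferable for sparse tables) is not shown, and the counts and function names are hypothetical. The chi-square tail uses the closed forms available for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term = total = math.exp(-x / 2)
        for i in range(1, dof // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd dof: 1-dof tail plus terms sqrt(2/pi) e^(-x/2) x^((2i-1)/2) / (1*3*...*(2i-1)).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (dof + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi_square_2xn(cases: dict[str, int], controls: dict[str, int]) -> tuple[float, int]:
    """Pearson statistic and dof for a 2 x n table (rows: cases/controls, columns: VTs)."""
    vts = sorted(set(cases) | set(controls))
    rows = [[cases.get(v, 0) for v in vts], [controls.get(v, 0) for v in vts]]
    grand_total = sum(map(sum, rows))
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(vts))]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat, len(vts) - 1


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_sfs(tables: dict, alpha: float = 0.01) -> list[str]:
    """Names of SFs whose 2 x n test survives adjustment at the given threshold."""
    names = list(tables)
    raw = [chi2_sf(*chi_square_2xn(*tables[name])) for name in names]
    adjusted = benjamini_hochberg(raw)
    return [name for name, q in zip(names, adjusted) if q < alpha]


# Hypothetical VT counts (cases, controls) for two SFs.
tables = {
    "SF_a": ({"VT1": 120, "VT2": 30}, {"VT1": 60, "VT2": 90}),
    "SF_b": ({"VT1": 75, "VT2": 75}, {"VT1": 74, "VT2": 76}),
}
print(select_sfs(tables))  # ['SF_a']
```

Running the same screen separately on each pseudo-replicate and keeping SFs selected in both is one simple way to merge the results, though the slides do not specify the merge rule.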
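DRB1*0101 visualization (protective vs. risk)
[Figure: DRB1*0101-based view with protective and risk residues labeled; residue pairs shown: 26 F/F, 28 D/E, 30 Y/L, 37 Y/F, 67 F/I, 70 D/D, 71 R/R, 86 V/G]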
ImmPort HLA SFVT workflow
• Table of subject vs. HLA 4-digit typing data
• Table of subject vs. SFVT feature vector
• Table of p-values, adjusted p-values, odds ratios and confidence intervals
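The final workflow table reports an odds ratio and confidence interval for each VT's 2 x 2 (type vs. non-type) table. A common way to compute these is the Woolf log-scale interval, sketched below; the choice of Woolf's method and of the Haldane-Anscombe 0.5 correction for zero cells are assumptions, not necessarily what ImmPort uses, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959963984540054):
    """Odds ratio and Woolf (log-scale) 95% CI for a 2x2 table.

    a: cases carrying the VT      b: controls carrying the VT
    c: cases without the VT       d: controls without the VT
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical VT counts: 120 of 150 case alleles vs. 60 of 150 control alleles carry the VT.
or_, lo, hi = odds_ratio_ci(120, 60, 30, 90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval entirely above 1 marks a risk VT, entirely below 1 a protective VT.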
Summary
• SFVT approach
  • Proposed a novel approach for HLA disease associations based on sequence feature variant type (SFVT) analysis
  • Defined structural and functional protein sequence features (SF) for all classical human MHC class I and II proteins
  • Determined variant types (VT) for all SFs in known alleles
  • Available in ImmPort (www.immport.org), IMGT-HLA and dbMHC
• Systemic sclerosis analysis
  • Based on the SFVT approach, identified a region of the HLA-DRB1 protein centered around peptide-binding pocket 7 that appears to be associated with disease risk
  • Sequences found in HLA-DRB1*1104 at positions 28, 30, 37, 67 and 86, especially those with aromatic amino acids, were associated with increased disease risk
  • Sequences found in this region of HLA-DRB1*0302 appear to be protective
  • Different alleles are associated with altered risk in different racial/ethnic populations, but they share common SFVTs
  • SFVTs associated with risk of developing SSc differ between patients with anti-topoisomerase (anti-topo) and anti-centromere (anti-cent) antibodies, supporting the idea that these are distinct diseases
  • However, the risk-associated SFVTs come from the same SFs, suggesting a common mechanism of disease pathogenesis
IRD (Influenza Research Database) overview: www.fludb.org
Influenza A sequence features as of 18 JUL 2011: 4128 SFs total
Genetic determinants of influenza species jump events based on convergent evolution
Flu pandemics of the 20th and 21st centuries initiated by species jump events
• 1918 flu pandemic (Spanish flu)
  • subtype H1N1 (avian origin)
  • estimated to have claimed between 2.5% and 5.0% of the world's population (20 to 100 million deaths)
• Asian flu (1957-1958)
  • subtype H2N2 (avian origin)
  • 1-1.5 million deaths
• Hong Kong flu (1968-1969)
  • subtype H3N2 (avian origin)
  • between 750,000 and 1 million deaths
• 2009 H1N1
  • subtype H1N1 (swine origin)
  • ~16,000 deaths as of March 2010
Pandemic stages
[Figure: pandemic stages and adaptive drivers]
Basic reproductive number (R0)
• Average number of secondary cases caused by each case
• Reasonable surrogate of fitness
• Characteristics of pandemic viruses:
  • R0H > 1, and
  • in the genetic neighborhood of viruses with R0R > 1 and R0H < 1
• Adaptive drivers (A1, A2) and virus classes:
  • Reservoir viruses (R0R > 1 and R0H << 1)
  • Stuttering viruses (R0R > 1 and R0H < 1)
  • Pandemic viruses (R0H > 1)
(R0R: R0 in the reservoir host; R0H: R0 in humans)
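The difference between stuttering and pandemic viruses can be illustrated with a simple branching process: each human case infects a random number of others with mean R0H. Below 1, every spillover chain dies out after a short "stutter" (expected total size 1/(1 - R0H)); above 1, a fraction of chains grow without bound. Poisson-distributed offspring, the cap used to stand for a sustained epidemic, and all function names are assumptions of this sketch.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def chain_size(r0: float, rng: random.Random, cap: int = 10_000) -> int:
    """Total human cases in one transmission chain started by a single spillover case.

    Each case infects Poisson(r0) others (a Galton-Watson branching process).
    Chains reaching `cap` are reported as `cap` (treated as a sustained epidemic).
    """
    total, active = 1, 1
    while active and total < cap:
        new = sum(poisson(rng, r0) for _ in range(active))
        total += new
        active = new
    return min(total, cap)


rng = random.Random(2011)

# Stuttering virus (R0H < 1): chains stay small, mean size near 1 / (1 - 0.5) = 2.
sizes = [chain_size(0.5, rng) for _ in range(20_000)]
print(sum(sizes) / len(sizes))

# Pandemic-capable virus (R0H > 1): a sizeable fraction of spillovers take off.
took_off = sum(chain_size(1.5, rng, cap=200) >= 200 for _ in range(500))
print(took_off / 500)
```

For R0H = 1.5, branching-process theory puts the take-off probability at roughly 0.58, so the second printed fraction should land near that value.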
Adaptive drivers
Source: Pepin KM et al. (2010) "Identifying genetic markers of adaptation for surveillance of viral host jumps." Nature Reviews Microbiology 8: 802-814.
Stuttering transmission and adaptive drivers
• Stuttering transmission can reveal adaptive drivers through evidence of convergent evolution
• The odds of finding the same neutral mutation by chance in multiple species jumps are low
• Therefore, finding the same mutation in multiple independent species jump events is strong evidence for an adaptive driver
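The "low odds" argument can be quantified under a deliberately simple model: if each independent species jump acquires one particular neutral mutation with probability p, the number of jumps carrying it is binomial. The per-jump probability, the independence assumption and the function name are hypothetical. Shared ancestry or human-to-human transmission would violate independence, which is exactly why evolution has to be treated as a potential confounder.

```python
from math import comb


def prob_at_least_k(n_jumps: int, k: int, p: float) -> float:
    """P(the same specific mutation appears in >= k of n independent jumps).

    Assumes each jump independently acquires that particular neutral mutation
    with probability p (a simplifying, hypothetical model).
    """
    return sum(
        comb(n_jumps, j) * p**j * (1 - p) ** (n_jumps - j)
        for j in range(k, n_jumps + 1)
    )


# Hypothetical: 10 independent jumps, 1% chance per jump of this exact neutral change.
print(prob_at_least_k(10, 2, 0.01))  # about 0.0043
```

Observing the same change in two or more of ten independent jumps would therefore be unusual by chance alone under these assumptions.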
Genetic convergence during species jump
• Virus isolate groups from IRD
  • Avian H5N1 (PB2) from Southeast Asia* up to 2003 (260 records): reservoirs of source viruses
  • Human H5N1 (PB2) from Southeast Asia 2003-present (165 records): many examples of independent species jumps
• Align amino acid sequences and calculate a conservation score
• Identify highly conserved positions in avian records (≤1/260 variants; 557 of 759 positions): functionally restricted in the reservoir
• Select the subset in which two or more human isolates contained the same sequence variant: either due to human-human transmission or convergent evolution
*China, Hong Kong, Indonesia, Thailand, Viet Nam
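The filtering steps above can be sketched on pre-aligned sequences. Reading "≤1/260 variants" as "at most one avian record differs from the avian consensus residue" is an interpretation, and the function name and the toy 8-residue alignments are hypothetical (they are not real PB2 sequences). A full analysis would start from the 260 avian and 165 human IRD records.

```python
from collections import Counter


def convergent_positions(avian: list[str], human: list[str],
                         max_avian_variants: int = 1, min_human: int = 2) -> dict[int, list[str]]:
    """Return {1-based position: shared non-consensus residues} following the slide's steps.

    1. Keep positions highly conserved in the avian (reservoir) alignment:
       at most `max_avian_variants` records differ from the avian consensus.
    2. Among those, report non-consensus residues shared by >= `min_human` human isolates.
    Sequences are assumed to be aligned and of equal length; gaps ('-') are ignored.
    """
    hits = {}
    for pos in range(len(avian[0])):
        consensus, n_consensus = Counter(seq[pos] for seq in avian).most_common(1)[0]
        if len(avian) - n_consensus > max_avian_variants:
            continue  # not functionally restricted in the reservoir
        human_counts = Counter(seq[pos] for seq in human if seq[pos] not in (consensus, "-"))
        shared = [aa for aa, n in human_counts.items() if n >= min_human]
        if shared:
            hits[pos + 1] = shared
    return hits


# Toy alignments (hypothetical).
avian = ["MKTAYIAK", "MKTAYIAK", "MKTAYIAK", "MKTVYIAK"]
human = ["MKTAYKAK", "MKTAYKAK", "MRTAYIAK", "MKTAYIAE"]
print(convergent_positions(avian, human))  # {6: ['K']}
```

As the slide notes, a variant shared by several human isolates may reflect human-to-human transmission rather than convergence, so candidates still need to be checked against the isolates' phylogeny.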