Weikuan Gu, PhD Professor Director of Translational Research

Genome-Wide Association Studies
Weikuan Gu, PhD
Professor, Director of Translational Research, Department of Orthopaedic Surgery & Biomedical Engineering
Director, Gene Discovery Core, Center of Genomic and Bioinformatics & Center of Disease of Connective Tissues
Special Assistance to the Dean for International Collaboration, College of Medicine
Email: wgu@uthsc.edu
Webpage: http://www.genediscovery.org/

The DNA Discovery Core
• Dr. Weikuan Gu, Director (http://www.genediscovery.org)
• (Dept Orthopaedic Surgery and Biomedical Engineering, Pathology)
• This Core has three elements:
• Service: The core will provide microarray analysis, genome screening and polymorphism detection for research and education at UT and other institutions in Tennessee. Meanwhile, the core will continue to develop and modify protocols for faster processing and lower costs during service.
• Research: The core serves as a resource for the development of research projects that involve gene profiling, genetic mapping, fine mapping, genome screening, DNA sequencing, and positional cloning.
• Education: The core serves as an education and training base for genetic and genomic analyses. Technique training currently includes genomic analysis and genome comparison, genome screening, simple sequence repeat length polymorphism (SSLP) analysis, single-nucleotide polymorphism (SNP) detection, and DNA sequencing.

[Figure: Published GWAS Reports, 2005 – 6/2011. Total number of publications by calendar quarter; 951 publications through 9/30/10 postings.]

What is GWAS? A genome-wide association study (GWAS), also known as a whole-genome association study (WGA study, or WGAS), is an examination of a large number of common genetic variants in different individuals to see whether any variant is associated with a trait. GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits such as major diseases.
Key terms: genetic variants (common); single-nucleotide polymorphisms; traits; associations; large number of common variants; different individuals.

1. Genetic variants: (1) a single-nucleotide polymorphism (SNP); (2) a mutation, in case it is a rare genetic variant (single- or multiple-nucleotide mutation, deletion, insertion, inversion, etc.); (3) a copy-number variation. The common genetic variants used in GWAS are single-nucleotide polymorphisms.
2. Single-nucleotide polymorphism: a DNA sequence variation occurring when a single nucleotide (A, T, C or G) in the genome (or other shared sequence) differs between members of a biological species or between paired chromosomes in an individual.
3. Traits: A trait is a distinct variant of a phenotypic character of an organism that may be inherited, be environmentally determined, or be a combination of the two. There are phenotypic and genetic traits. A phenotypic trait is an obvious and observable trait; it is the expression of genes in an observable way.
4. Associations: the co-occurrence or linkage of two or more traits, of which at least one is known to be genetic, where the co-occurrence or linkage cannot be explained by chance. An association does not explain the mechanism underlying a cause-and-effect relationship.

Wisdom of history: it all starts with the gene
Positional cloning from a mouse disease model: giant axonopathy (gaxp)
Single-gene mutation, single locus, discrete phenotype

The trouble caused by multiple genes

Single-gene mutation: when changes in a genome are restricted to a single gene, a single-gene defect ensues. Usually this involves a point mutation and leads to an altered amino acid sequence in the protein encoded by that section of the DNA.
-Autosomal dominant inheritance
-Autosomal recessive inheritance
-X chromosomal inheritance
Complex traits are those that are influenced by more than one factor. The factors can be genetic or environmental. This is in contrast to simple genetic traits, whose variations are controlled by variations in single genes.
-Each factor contributes a small portion of a trait,
-Genetic factors are influenced by the environment,
-Multiple genes are involved, each contributing a small portion of a trait,
-Therefore the causal genes are usually not defective; rather, small changes occur in either their expression or their function.
QTL mapping has been used to locate genetic factors for complex or quantitative traits.

Strategy for identification of genetic factors for complex traits:
Quantitative trait loci (QTL) mapping: determines the chromosome regions that contain genetic factors for quantitative traits. Mapping regions of the genome that contain genes involved in specifying a quantitative trait is done using molecular tags such as AFLP or microsatellite markers, or more commonly SNPs.
Candidate gene evaluation: A candidate gene is a gene, located in a chromosome region suspected of being involved in the expression of a trait such as a disease, whose protein product suggests that it could be the gene in question. Usually an integrative approach is used, drawing on animal models, genomics, high-throughput technologies, and bioinformatics tools.
GWAS: An approach aimed at detecting variants at genomic loci that are associated with complex traits in the population and, in particular, at detecting associations between common single-nucleotide polymorphisms (SNPs) and common diseases.
What is the difference between the QTL and GWAS approaches? (homework)

• Sample collection: where, what, possibilities, protocols
• DNA (RNA) extraction: quality and quantity
• Sample characterization: age, gender, history, ...
• Whole-genome SNP genotyping: microarray chip platforms
• Phenotypic data: case, control, detailed clinical features
• Association analysis: software
• Candidate verification: larger or other populations, functional tests, pathways

Example calculation illustrating the methodology of a case-control GWA study. The allele counts of each measured SNP are evaluated, in this case with a chi-squared test, in order to identify variants associated with the trait in question. The numbers in this example are taken from a 2007 study of coronary artery disease (CAD), which showed that individuals with the G allele of SNP1 (rs1333049) were overrepresented among CAD patients (Wellcome Trust Case Control Consortium, June 2007).
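The chi-squared evaluation described on this slide can be sketched for a single SNP as follows. This is a minimal illustration, not the study's analysis: the allele counts are hypothetical placeholders (the slide's actual numbers are not reproduced in this transcript), the function name is invented for the example, and the test is a plain Pearson chi-squared with 1 degree of freedom and no continuity correction.

```python
import math

def allelic_chi_squared(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are the two alleles of one SNP.
    Returns (chi-squared statistic, p-value, allelic odds ratio).
    """
    observed = [[case_a, case_b], [ctrl_a, ctrl_b]]
    row_totals = [case_a + case_b, ctrl_a + ctrl_b]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts, NOT taken from the 2007 CAD study.
chi2, p_value, odds_ratio = allelic_chi_squared(
    case_a=600, case_b=400, ctrl_a=500, ctrl_b=500
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, OR = {odds_ratio:.2f}")
```

In a real scan the same test is repeated for every genotyped SNP, which is why the resulting p-values need a multiple-testing correction before any SNP is called significant.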

Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J (April 2005). "Complement Factor H Polymorphism in Age-Related Macular Degeneration". Science 308 (5720).
Age-related macular degeneration (AMD) is a major cause of blindness in the elderly. We report a genome-wide screen of 96 cases and 50 controls for polymorphisms associated with AMD. Among 116,204 single-nucleotide polymorphisms genotyped, an intronic and common variant in the complement factor H gene (CFH) is strongly associated with AMD (nominal P value < 10^-7). In individuals homozygous for the risk allele, the likelihood of AMD is increased by a factor of 7.4 (95% confidence interval 2.9 to 19). Resequencing revealed a polymorphism in linkage disequilibrium with the risk allele representing a tyrosine-histidine change at amino acid 402. This polymorphism is in a region of CFH that binds heparin and C-reactive protein. The CFH gene is located on chromosome 1 in a region repeatedly linked to AMD in family-based studies.

(A) P values of genome-wide association scan for genes that affect the risk of developing AMD. -log10(p) is plotted for each SNP in chromosomal order. The spacing between SNPs on the plot is uniform and does not reflect distances between SNPs on the chromosomes. The dotted horizontal line shows the cutoff for P = 0.05 after Bonferroni correction. The vertical dotted lines show chromosomal boundaries. The arrow indicates the peak for SNP rs380390, the most significant association, which was studied further. (B) Variation in genotype frequencies between cases and controls.
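The Bonferroni cutoff drawn on this plot follows from simple arithmetic, sketched below. The SNP count (116,204) and the 0.05 level come from the abstract and caption; the variable and function names are illustrative only.

```python
import math

n_snps = 116_204  # SNPs genotyped in the AMD scan (from the abstract)
alpha = 0.05      # overall significance level named in the caption

# Bonferroni: divide the overall level by the number of tests performed.
bonferroni_threshold = alpha / n_snps
# Height of the dotted horizontal line on a -log10(p) plot.
cutoff_on_plot = -math.log10(bonferroni_threshold)

def passes_bonferroni(p_value, n_tests=n_snps, level=alpha):
    """True if a nominal p-value survives Bonferroni correction."""
    return p_value < level / n_tests

print(f"per-SNP threshold: {bonferroni_threshold:.2e}")
print(f"-log10 cutoff:     {cutoff_on_plot:.2f}")
print("nominal P = 1e-7 passes:", passes_bonferroni(1e-7))
```

The per-SNP threshold works out to roughly 4.3 × 10^-7, so a nominal P value below 10^-7, as reported for the CFH variant, clears the corrected cutoff.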

Table 1 Odds ratios and population attributable risks (PARs) for AMD. The dominant odds ratio and PAR compare the likelihood of AMD in individuals with at least one copy of the risk allele versus individuals with no copy of the risk allele. The recessive odds ratio and PAR compare the likelihood of AMD in individuals with two copies of the risk allele versus individuals with no more than one copy of the risk allele. The population frequencies for the risk genotypes are taken from the CEU HapMap population (CEPH collection of Utah residents of northern and western European ancestry).
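The dominant and recessive comparisons in this caption can be sketched from genotype counts as follows. Everything numeric here is hypothetical: the genotype counts are placeholders rather than the study's data, and control genotype frequencies stand in for the HapMap CEU population frequencies that the table actually used. The PAR is computed with Levin's formula, with the odds ratio approximating relative risk; the slides do not state which formula the paper applied, so treat that as an assumption.

```python
def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed):
    """Odds ratio from a 2x2 table of exposed/unexposed cases and controls."""
    return (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)

def levin_par(risk_genotype_freq, ratio):
    """Population attributable risk by Levin's formula.

    `ratio` is a relative risk; an odds ratio is used here as an
    approximation (an assumption of this sketch).
    """
    excess = risk_genotype_freq * (ratio - 1)
    return excess / (1 + excess)

# Hypothetical genotype counts as (two copies, one copy, no copy) of the risk allele.
cases = (40, 40, 20)
controls = (10, 40, 50)
n_controls = sum(controls)

# Dominant model: at least one copy versus no copy.
dominant_or = odds_ratio(cases[0] + cases[1], cases[2],
                         controls[0] + controls[1], controls[2])
# Recessive model: two copies versus no more than one copy.
recessive_or = odds_ratio(cases[0], cases[1] + cases[2],
                          controls[0], controls[1] + controls[2])

# Control frequencies used as stand-ins for population genotype frequencies.
dominant_par = levin_par((controls[0] + controls[1]) / n_controls, dominant_or)
recessive_par = levin_par(controls[0] / n_controls, recessive_or)

print(f"dominant:  OR = {dominant_or:.2f}, PAR = {dominant_par:.2f}")
print(f"recessive: OR = {recessive_or:.2f}, PAR = {recessive_par:.2f}")
```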

(A) Linkage disequilibrium across the CFH region, plotted as pairwise D′ values. The red/orange box in the center of the plot is the region in strong linkage disequilibrium with the two associated SNPs in our data. (B) Schematic of the region in strong linkage disequilibrium with the two associated SNPs in our data. The vertical bars represent the approximate location of the SNPs available in our data set. The shaded region is the haplotype block found in the HapMap data. (C) Haplotype blocks in the HapMap CEU data cross the region. Darker shades of red indicate higher values of D′. Light blue indicates high D′ with a low logarithm of the odds ratio for linkage (lod score). The dark lines show the boundaries of haplotype blocks. (D) Maximum-parsimony cladogram derived from haplotypes across the 6-SNP region. The number near each line indicates which of the six SNPs changes along that branch. The two red numbers are the two SNPs initially identified as being associated with AMD. SNP 4 is rs380390 and SNP 6 is rs1329428.
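For readers unfamiliar with the pairwise D′ values plotted in panel (A), the sketch below computes D′ and the related r^2 measure for two biallelic SNPs from phased haplotype frequencies. The haplotype frequencies are hypothetical, and the function name is invented for this example; real analyses usually estimate haplotype frequencies from genotype data first.

```python
def ld_measures(f_ab, f_aB, f_Ab, f_AB):
    """Return (D, D', r^2) for two biallelic SNPs.

    Arguments are the frequencies of the four haplotypes formed by
    alleles A/a at the first SNP and B/b at the second; they must sum to 1.
    """
    total = f_ab + f_aB + f_Ab + f_AB
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_a = f_AB + f_Ab  # frequency of allele A
    p_b = f_AB + f_aB  # frequency of allele B
    if min(p_a, 1 - p_a, p_b, 1 - p_b) <= 0:
        raise ValueError("both SNPs must be polymorphic")

    d = f_AB - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared

# Hypothetical haplotype frequencies.
d, d_prime, r_squared = ld_measures(f_ab=0.4, f_aB=0.1, f_Ab=0.1, f_AB=0.4)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

D′ near 1 means little evidence of historical recombination between the two SNPs, while r^2 near 1 means one SNP is a good proxy for the other.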

Immunofluorescence localization of CFH protein in human retina. Neighboring human retina sections are stained with (A) antibody to CFH or (B) antibody to CFH preabsorbed with CFH as negative control. (C) High-magnification view of the boxed area in (A). For (A), (B), and (C), left panels are the fluorescence images, with CFH labeling in green and DAPI (4′,6′-diamidino-2-phenylindole)–stained nuclei in blue; right panels are differential interference contrast (DIC) images showing the tissue morphology. In (C), the CFH signal is superimposed onto the DIC image. Labeling of CFH is intense in choroid, including blood vessels and areas bordering RPE [(A) and (C)]; this CFH signal is competed away by purified CFH protein (B), which demonstrates the labeling specificity. The fluorescence signal from RPE arises from lipofuscin autofluorescence, which cannot be competed away with CFH protein [(A) and (B)]. The black spots in DIC images correspond to melanin granules in RPE and choroids. The cell layers are indicated: GC, ganglion cells; INL, inner nuclear layer; ONL, outer nuclear layer; RPE, retinal pigment epithelium. Scale bars: 40 μm in (A) and (B), 20 μm in (C).

Lin Z, Bei JX, Shen M, Li Q, Liao Z, Zhang Y, Lv Q, Wei Q, Low HQ, Guo YM, Cao S, Yang M, Hu Z, Xu M, Wang X, Wei Y, Li L, Li C, Li T, Huang J, Pan Y, Jin O, Wu Y, Wu J, Guo Z, He P, Hu S, Wu H, Song H, Zhan F, Liu S, Gao G, Liu Z, Li Y, Xiao C, Li J, Ye Z, He W, Liu D, Shen L, Huang A, Wu H, Tao Y, Pan X, Yu B, Tai ES, Zeng YX, Ren EC, Shen Y, Liu J, Gu J. A genome-wide association study in Han Chinese identifies new susceptibility loci for ankylosing spondylitis. Nat Genet. 2011 Dec 4;44(1):73-7. doi: 10.1038/ng.1005.
To identify susceptibility loci for ankylosing spondylitis, we performed a two-stage genome-wide association study in Han Chinese. In the discovery stage, we analyzed 1,356,350 autosomal SNPs in 1,837 individuals with ankylosing spondylitis and 4,231 controls; in the validation stage, we analyzed 30 suggestive SNPs in an additional 2,100 affected individuals and 3,496 controls. We identified two new susceptibility loci between EDIL3 and HAPLN1 at 5q14.3 (rs4552569; P = 8.77 × 10^-10) and within ANO6 at 12q12 (rs17095830; P = 1.63 × 10^-8). We also confirmed previously reported associations in Europeans within the major histocompatibility complex (MHC) region (top SNP, rs13202464; P < 5 × 10^-324) and at 2p15 (rs10865331; P = 1.98 × 10^-8). We show that rs13202464 within the MHC region mainly represents the risk effect of HLA-B*27 variants (including HLA-B*2704, HLA-B*2705 and HLA-B*2715) in Chinese. The two newly discovered loci implicate genes related to bone formation and cartilage development, suggesting their potential involvement in the etiology of ankylosing spondylitis.

Broad Institute of MIT and Harvard Wellcome Trust Case-Control Consortium

Ten Basic Questions to Ask About a Genome-wide Association Study Report
1. Are the cases defined clearly and reliably so that they can be compared with patients typically seen in clinical practice?
2. Are case and control participants demonstrated to be comparable to each other on important characteristics that might also be related to genetic variation and to the disease?
3. Was the study of sufficient size to detect modest odds ratios or relative risks (1.3-1.5)?
4. Was the genotyping platform of sufficient density to capture a large proportion of the variation in the population studied?
5. Were appropriate quality control measures applied to genotyping assays, including visual inspection of cluster plots and replication on an independent genotyping platform?
6. Did the study reliably detect associations with previously reported and replicated variants (known positives)?
7. Were stringent corrections applied for the many thousands of statistical tests performed in defining the P value for significant associations?
8. Were the results replicated in independent population samples?
9. Were the replication samples comparable in geographic origin and phenotype definition, and if not, did the differences extend the applicability of the findings?
10. Was evidence provided for a functional role for the gene polymorphism identified?
JAMA. 2008;299(11):1335-1344.
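Question 3 above, on whether a study is large enough to detect odds ratios of 1.3-1.5, can be explored with a rough power calculation. The sketch below is an approximation under stated assumptions, not a method from the JAMA article: it uses a normal approximation to a two-sided comparison of allele frequencies, treats each participant as contributing two independent alleles, takes 5 × 10^-8 as the significance level, and uses a hypothetical control allele frequency of 0.3. Function names are illustrative.

```python
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of an allelic test (normal approximation).

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each person contributes two alleles to the comparison.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p0) / se
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical scenario: control allele frequency 0.3, odds ratio 1.3.
power_small = approx_power(1_000, 1_000, control_freq=0.3, odds_ratio=1.3)
power_large = approx_power(5_000, 5_000, control_freq=0.3, odds_ratio=1.3)
print(f"1,000 cases / 1,000 controls: power ~ {power_small:.2f}")
print(f"5,000 cases / 5,000 controls: power ~ {power_large:.2f}")
```

Under these assumptions the smaller design has little chance of reaching genome-wide significance for a modest effect, while the larger one almost always does, which is the practical point behind question 3.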

Five years of GWAS discovery Visscher PM, Brown MA, McCarthy MI, Yang J. Am J Hum Genet. 2012 Jan 13;90(1):7-24. The past five years have seen many scientific and biological discoveries made through the experimental design of genome-wide association studies (GWASs). These studies were aimed at detecting variants at genomic loci that are associated with complex traits in the population and, in particular, at detecting associations between common single-nucleotide polymorphisms (SNPs) and common diseases such as heart disease, diabetes, auto-immune diseases, and psychiatric disorders. We start by giving a number of quotes from scientists and journalists about perceived problems with GWASs. We will then briefly give the history of GWASs and focus on the discoveries made through this experimental design, what those discoveries tell us and do not tell us about the genetics and biology of complex traits, and what immediate utility has come out of these studies. Rather than giving an exhaustive review of all reported findings for all diseases and other complex traits, we focus on the results for auto-immune diseases and metabolic diseases. We return to the perceived failure or disappointment about GWASs in the concluding section.

GWAS Discoveries over Time
Data obtained from the Published GWAS Catalog (see Web Resources). Only the top SNPs representing loci with association p values < 5 × 10^-8 are included, and so that multiple counting is avoided, SNPs identified for the same traits with LD r^2 > 0.8 estimated from the entire HapMap samples are excluded.

Increase in Number of Loci Identified as a Function of Experimental Sample Size
(A) Selected quantitative traits. (B) Selected diseases. The coordinates are on the log scale. The complex traits were selected with the criteria that there were at least three GWAS papers published on each in journals with a 2010–2011 journal impact factor >9 (e.g., Nature, Nature Genetics, the American Journal of Human Genetics, and PLoS Genetics) and that at least one paper contained more than ten genome-wide significant loci. These traits are a representative selection among all complex traits that fulfilled these criteria.

Is the GWAS approach founded on a flawed assumption that genetics plays an important role in the risk for common diseases? Have GWASs been disappointing in not explaining more genetic variation in the population? Have GWASs delivered meaningful biologically relevant knowledge or results of clinical or any other utility? Are GWAS results spurious?
If we assume that the GWAS results from Figure 1 represent a total of 500,000 SNP chips and that on average a chip costs $500, then this is a total investment of $250 million. If there are a total of ∼2,000 loci detected across all traits, then this implies an investment of $125,000 per discovered locus. Is that a good investment? We think so: The total amount of money spent on candidate-gene studies and linkage analyses in the 1990s and 2000s probably exceeds $250M, and they in total have had little to show for it. Also, it is worthwhile to put these amounts in context. $250M is of the order of the cost of one or two stealth fighter jets and much less than the cost of a single navy submarine. It is a fraction of the ∼$9 billion cost of the Large Hadron Collider. It would also pay for about 100 R01 grants. Would those 100 non-funded R01 grants have made breakthrough discoveries in biology and medicine? We simply can't answer this question, but we can conclude that a tremendous number of genuinely new discoveries have been made in a period of only five years.
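The cost-per-locus estimate quoted above can be reproduced in a few lines. All three inputs are the assumptions stated in the passage itself; only the variable names are added here.

```python
snp_chips = 500_000    # assumed total number of SNP chips
cost_per_chip = 500    # assumed average cost per chip, in US dollars
loci_detected = 2_000  # approximate number of loci detected across all traits

total_investment = snp_chips * cost_per_chip
cost_per_locus = total_investment / loci_detected

print(f"total investment: ${total_investment:,}")
print(f"cost per locus:   ${cost_per_locus:,.0f}")
```

Changing any one input shows how sensitive the per-locus figure is: halving the chip cost, for example, halves the cost per discovered locus.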

GWAS:
• SNP markers
• Large number of markers
• Requires a statistician or software
• Multiple loci
• Narrows down to several SNPs or a haplotype
QTL:
• Microsatellite markers or SNPs
• Relatively small number of markers
• Easy to conduct
• Multiple loci
• Large genomic region
Genome-wide association studies (GWAS) are becoming increasingly popular in genetic research, and they are an excellent complement to QTL mapping. Whereas QTL contain many linked genes, which are then challenging to separate, GWAS produce many unlinked individual genes or even nucleotides, but these studies are riddled with large expected numbers of false positives.

Home/class work:
1. Please list at least two similarities and two differences between QTL studies and GWAS.
2. Please identify the candidate genes for the genomic region flanked by two SNPs, rs2019727 (located on human chromosome 1 at 48163350 bp) and rs187391907 (chromosome 1 at 50019040 bp), which are the significant SNPs from a GWAS on the blood-brain barrier.
3. Please conduct a simulation study using the GSAA program with the data provided to get data on: 1) GSAA, 2) GSAA-SNP, and 3) GSEA.
• Please email me if you have problems converting the Excel file into a gmt file.
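For the Excel-to-gmt conversion mentioned in the homework, one possible route is to save the spreadsheet as CSV and rewrite each row as a tab-separated GMT line (set name, description, then gene symbols). The sketch below assumes that row layout, which may not match the file actually provided for the class; the set names and gene symbols are placeholders, and the function name is invented for this example.

```python
import csv
import io

def rows_to_gmt(rows):
    """Turn rows of [set name, description, gene, gene, ...] into GMT text."""
    lines = []
    for row in rows:
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or incomplete rows
        name = row[0].strip()
        description = row[1].strip() or "na"  # GMT expects a second column
        genes = [g.strip() for g in row[2:] if g.strip()]
        if genes:
            lines.append("\t".join([name, description] + genes))
    return "\n".join(lines) + "\n"

# Hypothetical CSV export of the spreadsheet (placeholder set and gene names).
csv_text = "SET_A,example set,GENE1,GENE2,GENE3\nSET_B,,GENE4,GENE5,\n"
gmt_text = rows_to_gmt(csv.reader(io.StringIO(csv_text)))
print(gmt_text, end="")
```

To work from a real file, replace the `io.StringIO` object with an open file handle for the CSV export and write `gmt_text` to a file ending in `.gmt`.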

Acknowledgements
UT collaborators: Robert Williams, Dan Goldowitz, Mark Ledoux, John Stuart, Arnold Postlethwaite, Karen Hasty, Lawrence Pfeffer, Christopher Nosrat, Kristin Hamre
UM collaborators: Eugene C. Eckstein, Waldemar G. de Rijk, Charles D Blaha, Guy Mittleman
Support: NIH, UTHSC
• Yan Jiao • Griffin Gibson • Yue Huang • XiaoYun Liu • YanHong Cao • Hank Chiu • Qing Xiong • Feng Jiao • Stephanie Troemel • HongBin Yang • Peilan Zhang • Dejeaune Smith • Feng Lan • Jian Yan • Zhiping Jia • Caili Han • Chi Zhang • Shan Sun
Non-UT collaborators:
• Bruce Roe, Univ. Oklahoma
• Xinmin Li, UCLA
• Harry Jerett, UT San Antonio
• David Mount, Harvard Med. Sch.
• Beth Bennett, U Colo
• Wesley G. Beamer
• Leah Rae Donahue
• Cliff Rosen
• The Jackson Laboratory
• Yun Jiao, St Jude
• YungJun Wang, Tiantan Hospital, Beijing, PRC
