MESA SHARe/Genetics Update

MESA SHARe/Genetics Update September 14, 2012 Jerome I. Rotter

Genetics in MESA • MESA Candidate Genes/MESA Family • MESA CARe • MESA SHARe • MESA CNV • MESA HeartGO/ESP • MESA Epigenetics • MESA Exome Chip • MESA Metabochip • ?MESA Microbiome?

MESA SHARe Agenda • September 12, 2012, 5 - 8:30pm • Double Tree by Hilton/Silver Spring Hotel, 8727 Colesville Road, Silver Spring, MD 20910 • Maryland Ballroom

MESA SNP Health Association Resource (SHARe) Progress Report Study Timeline • Generation 3 imputation uses IMPUTE v2.2.2 and the latest 1000G reference panel. • dbGaP Update Schedule: • Period of exclusivity may be reduced to 6 months in 2013.

MESA SHARe Phenotype Working Groups • 21 MESA SHARe Phenotype Working Groups actively meet. • Since April 2010, 120 publication proposals and 36 pen drafts have been submitted from 19 different Phenotype Working Groups and SHARe Committees. • 26 proposals use the standard analysis plan developed by the MESA SHARe Analysis Committee, 68 follow analyses outlined by consortia, and 21 use non-standard analyses as defined by the Working Group. • 18 published papers • Includes collaborations with CHARGE, ICBP, MACAD, PRIMA, GENEVA, CARe, NOMAS, Type 2 DM Consortium, CARDIA, SPIROMETA, HealthABC, SUNLIGHT, CKDGen, FIND, WHI, Family Heart Study, Genestar, Diabetic Heart Study, Framingham, CHS, ARIC, AGES, Rotterdam, Jackson Heart

MESA SHARe Phenotype Working Groups

MESA Genetics P&P Committee Members • Wendy Post, MD, MS (chair) • Xiuqing Guo, PhD • Spencer Huang, PhD • Yongmei Liu, MD, PhD • Jim Pankow, PhD, MPH • Ken Rice, PhD • Steve Rich, PhD • Nancy Jenny, PhD • Christina Wassel, PhD

MESA Genetics Publications

MESA SHARe Published Papers

P&P Approved MESA SHARe Pen Drafts

THE NHLBI EXOME SEQUENCING PROJECT: A View from MESA • Stephen S. Rich, PhD • Presented by Jerome I Rotter, MD • September 14, 2012

Exome Sequencing in Mendelian Disorders from Ng SB et al., Hum Mol Genet 2010;19(R2):R119-124

Mendelian is “Easy” But What of Complex Phenotypes? • Many genes, rather than one, each effect likely much less than in Mendelian • Process for identifying candidate gene(s) more involved than simple exclusion (variant not in dbSNP) • Phenotypes of interest are not only “disease” (MI, T2D, stroke) but “risk factors” or “biomarkers” (LDL, BP) • Exome Sequencing Project Design Questions • How large a sample • What analysis methods for rare/infrequent variants • Heterogeneity of samples; cohorts, laboratories, calling • How to replicate results of infrequent/rare variants • Move from gene/variants to biology

NHLBI Exome Sequencing Project (ESP) • Three cohort-based groups • Heart disease (HeartGO, S Rich) • Lung disease (LungGO, M Bamshad) • Women’s Health Initiative (WHISP, R Jackson) • Two sequencing centers • Broad Institute (BroadGO, S Gabriel/D Altshuler) • University of Washington (SeattleGO, D Nickerson) • Two additional GO components • CHARGE-S (E Boerwinkle, targeted & exome sequencing) • WashUGO (T Graubert, cancer focus, whole genome seq)

HeartGO Coordination: University of Virginia (S Rich, PI) • CHS: B Psaty, R Tracy • ARIC: E Boerwinkle, A Morrison • CARDIA: M Gross, A Reiner • FHS: L Atwood*, C O’Donnell • JHS: H Taylor, J Wilson • MESA: J Rotter, W Post • Requirements: IRB approval, 5 µg DNA, GWAS guidelines • HeartGO supports Labs, Coordinating Centers, and Genetic Counseling • Data Glue/Guru: Leslie Lange, UNC-CH

NHLBI HeartGO • ~55,000 phenotype and exposure variables in 6 cohorts • “Harmonized” phenotype and exposure data set of ~140 variables (e.g., BMI_baseline, current_smoker_baseline, former_smoker_baseline) coordinated across cohort Coordinating Centers (some had been studied in CARe) • All DNA samples sent to HeartGO Central DNA Lab (Russ Tracy, UVM) for testing prior to shipment to Sequencing Centers

MESA Contribution to ESP (Sample & Data) • 395 samples received • 100% passed Sequencing Center Q/C • 383 (97%) passed sequencing

Exome Sequence Project Design Questions • How large a sample, what design to use, for what phenotypes • [Figure labels: low extreme, high extreme] • “Mendel-ize” Traits • Compare extremes of distribution • Rare, higher penetrant variants • Enhance for Large Effect Size • Early-onset disease • High risk, no disease • Extremes of quantitative traits

Exome Sequence Project Design Questions • How large a sample, what design to use, for what phenotypes • [Figure labels: low extreme, high extreme] • EOMI (N = 1400) • CF/Pseudomonas, COPD, PAH (N = 300) • BMI/T2D (N = 500) • BP (N = 900) • LDL (N = 400) • Ischemic Stroke (N = 600) • DPR (N = 1000) • Family Studies (N = 100) • TOTAL OF 6800 Exomes

Exome Sequence Project Design Questions • What analysis methods for rare/infrequent variants - Does one method work on all phenotypes? - Does one method work on one phenotype? - What of different classes of variants - MAF - type of variant - Do variants have different effect sizes - Do variants have uni- or bi-directional effects • Do rare variants in genes affect multiple phenotypes and, if so, the same variants in the same direction?

The NHLBI ESP LDL-C Example • LDL work part of the larger Lipids Writing Group (led by Cristen Willer, Leslie Lange; Russ Tracy and Stephen Rich) • Total of ~2,000 samples • 412 participants from HeartGO (ARIC, CHS, FHS, JHS) with 50:50 European:African ancestry • 412 were selected from the 2nd and 98th percentiles from ~30,000 participants with LDL-C • 1,593 additional samples ascertained for other phenotypes (including deeply phenotyped random sample) with LDL-C • Exome sequencing at Broad (1,234) and U.W. (771) • Single variant and burden tests performed
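The extreme-sampling step on this slide (participants drawn from the 2nd and 98th percentiles of LDL-C) can be illustrated with a minimal sketch. This is not the ESP selection procedure: the function names, the nearest-rank percentile rule, and the simulated LDL-C values are assumptions made for illustration, and the real selection may have involved covariate adjustment or ancestry balancing that the slide does not describe.

```python
import math
import random

def percentile(sorted_values, pct):
    """Nearest-rank percentile (pct in 0-100) of an ascending list."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]

def select_extremes(ldl_by_id, low_pct=2, high_pct=98):
    """Return (low_ids, high_ids): participants at or below the low
    percentile cutoff and at or above the high percentile cutoff."""
    ordered = sorted(ldl_by_id.values())
    low_cut = percentile(ordered, low_pct)
    high_cut = percentile(ordered, high_pct)
    low_ids = [pid for pid, v in ldl_by_id.items() if v <= low_cut]
    high_ids = [pid for pid, v in ldl_by_id.items() if v >= high_cut]
    return low_ids, high_ids

# Hypothetical data: 30,000 simulated LDL-C values (mg/dL), NOT real cohort data.
rng = random.Random(2012)
ldl = {f"P{i:05d}": rng.gauss(120, 35) for i in range(30000)}

low_ids, high_ids = select_extremes(ldl)
print(len(low_ids), "low-extreme and", len(high_ids), "high-extreme candidates")
```

In practice the candidate pools from each tail would then be subsampled (for example by ancestry and cohort) to reach the target number of exomes.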

Lessons Learned from LDL-C • Difficult to achieve significance under any test in the extreme sample (n=412) • Identify LDL-associated genes by burden testing of low frequency/rare variants (n=2000) • Known genes (LDLR, PCSK9, APOB, ABCG5, NPC1L1) with novel and recognized variants • Novel genes with suggestive evidence • No one test identified all genes • Genetic architecture underlying association at each gene differed • Variant frequency and putative function likely important • Increased sample sizes are needed • For LDL-C, exonic variation in GWAS regions did not identify likely candidate genes
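As a rough illustration of what "burden testing of low frequency/rare variants" means, the sketch below collapses rare variants within one gene into a per-person count and regresses the phenotype on that count. The slides do not say which burden tests ESP used, so this simple count-based collapsing test is an assumption, as are the function names, the MAF threshold, and the toy genotypes and LDL-C values.

```python
import math

def minor_allele_freq(genotypes):
    """genotypes: per-person minor-allele counts (0, 1 or 2) for one variant."""
    return sum(genotypes) / (2 * len(genotypes))

def burden_scores(variants, maf_max):
    """Per-person count of minor alleles across variants with 0 < MAF <= maf_max."""
    n_people = len(variants[0])
    rare = [g for g in variants if 0 < minor_allele_freq(g) <= maf_max]
    return [sum(g[i] for g in rare) for i in range(n_people)]

def regression_slope_and_t(x, y):
    """Ordinary least-squares slope of y on x and its t statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - mean_y - slope * (a - mean_x)) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err

# Toy data for one hypothetical gene in 20 people (illustrative only).
variant_a = [1, 1] + [0] * 18        # MAF 0.05, kept
variant_b = [0, 0, 1] + [0] * 17     # MAF 0.025, kept
variant_c = [0] * 10 + [1] * 10      # MAF 0.25, too common, dropped
ldl = [70, 65, 80, 130, 125, 140, 118, 122, 135, 128,
       131, 119, 127, 133, 121, 138, 124, 129, 126, 132]

# The threshold is loose here only because the toy sample is tiny.
scores = burden_scores([variant_a, variant_b, variant_c], maf_max=0.05)
slope, t_stat = regression_slope_and_t(scores, ldl)
print("burden scores:", scores)
print("slope per rare allele:", round(slope, 1), "t:", round(t_stat, 2))
```

A count-based test like this assumes all collapsed variants act in the same direction, which is one reason no single test would be expected to identify every gene when variants have bi-directional effects.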

How to Replicate Results • How many genes to replicate? • Based upon what criteria? • Sources (ESP, other cohorts/collections)

The Exome Chip Rationale • Exome sequencing studies are likely well-powered to discover shared (MAF > 0.1%) variants that contribute to diseases of interest • Exome sequencing studies of (non-Mendelian) traits may be underpowered to demonstrate association to those variants • Array-based genotyping is less expensive per sample than exome sequencing
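The distinction between discovering a variant and demonstrating association can be made concrete with a back-of-the-envelope calculation. The sketch below gives the probability that a variant of a given MAF is observed at least once among N sequenced people. It assumes independently sampled chromosomes and perfect variant calling; the function name and the example sample sizes are illustrative choices, not figures from the slides.

```python
def prob_variant_observed(maf, n_samples):
    """Probability that at least one minor-allele copy appears among
    n_samples diploid individuals (2 * n_samples chromosomes),
    assuming independent chromosomes and error-free calling."""
    return 1.0 - (1.0 - maf) ** (2 * n_samples)

# Illustrative sample sizes at the MAF = 0.1% boundary mentioned above.
for n in (400, 2000, 6800):
    p = prob_variant_observed(0.001, n)
    print(f"N = {n:>5}: P(variant seen at least once) = {p:.3f}")
```

Seeing a variant once is enough to put it on an array, but testing it for association needs many carriers, which is where genotyping a much larger sample at lower cost per sample helps.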

Samples Contributed to Exome Chip Total: 12,028 samples

Exome Chip Design Team • Goncalo Abecasis • Karen Mohlke • David Altshuler • Benjamin Neale • Mike Boehnke • Debbie Nickerson • Candia Brown • Shaun Purcell • Peter Chines • Steve Rich • Mark Daly • Manny Rivas • Kyle Gaulton • Carlo Sidore • Goo Jun • Jen Stone • Hyun Min Kang • Joshua Smith • Mark McCarthy • Benjamin Voight • Sean McGee

CHARGE Exome Chip Genotyping Project MESA Genetics Meeting Megan L. Grove September 12, 2012

Illumina Exome Chip Design • SNP Selection: Focused on coverage in exonic regions

Cohorts • 10 population-based, 1 case-control* • Age, Gene/Environment, Susceptibility--Reykjavik (AGES) • Atherosclerosis Risk in Communities Study (ARIC) • Cardiac Arrest Blood Study (CABS)* • Cardiovascular Health Study (CHS) • Coronary Artery Risk Development in Young Adults (CARDIA) • Erasmus Rotterdam Gezondheid Onderzoek (ERGO): the Rotterdam Study • Family Heart Study (FamHS) • Framingham Heart Study (FHS) • Health, Aging, and Body Composition Study (HABC) • Jackson Heart Study (JHS) • Multi-Ethnic Study of Atherosclerosis (MESA)

Genotyping • 7 Genotyping Centers • Broad Institute (JHS) • Cedars-Sinai (CHS, FamHS, MESA) • Erasmus Medical Center (ERGO) • Illumina Fast Track Services (FHS) • University of Texas HSC at Houston (AGES, ARIC, CARDIA) • University of Washington (CABS) • Wake Forest (HABC) • 96 HapMap Controls • 48 Europeans (24 M, 24 F) • 48 Yorubans (24 M, 24 F) • Trios optional

Joint Calling in Houston • Power in sample numbers • Exome chip content has low MAF, so the Illumina GenomeStudio clustering algorithm has limited ability to accurately detect and assign genotype calls • Increase the number of samples, and cluster using project data to automatically call rare variants • [Figure panels: N < 10,000 and N > 55,000]
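One way to see why sample number matters for calling rare variants: a clustering algorithm needs enough heterozygous samples to define a heterozygote cluster. The sketch below computes expected genotype counts under Hardy-Weinberg equilibrium for the two sample sizes shown on the slide. The HWE assumption, the function name, and the example MAF of 0.05% are illustrative and do not come from the slides.

```python
def expected_genotype_counts(maf, n_samples):
    """Expected counts of AA, AB and BB genotypes under Hardy-Weinberg
    equilibrium, where B is the minor allele with frequency maf."""
    q = maf
    p = 1.0 - q
    return {
        "AA": n_samples * p * p,
        "AB": n_samples * 2 * p * q,
        "BB": n_samples * q * q,
    }

# Hypothetical rare variant (MAF = 0.05%) at the two sample sizes on the slide.
small = expected_genotype_counts(0.0005, 10_000)
large = expected_genotype_counts(0.0005, 55_000)
print("expected heterozygotes at N = 10,000:", round(small["AB"], 1))
print("expected heterozygotes at N = 55,000:", round(large["AB"], 1))
```

With several times more expected heterozygotes in the joint data set, cluster positions for rare genotypes can be estimated from the project data itself rather than from a generic cluster file.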

Exome Chip Cohorts

Best Practices (V1.0 Chips) • CHARGE Wiki (http://www.chargeconsortium.com/main/Consortium-Documents/) • List of HapMap controls (CHARGEExomeChip_QC_Controls.xlsx) • CHARGE Exome Chip Best Practices Calling Protocol (CHARGE_ExomeChip_Best_Practices_Calling_Protocol.pdf) • Cluster file (.egt) – all races combined (N/A) (Do not use for V1.1) • Annotation files – (N/A) • Illumina manifest • dbNSFP (Liu X, et al. 2011) • Updated rs numbers (if available) • Data returned to respective cohorts only via SFTP (PLINK files)

Timeline • Phase1b - released • AGES • ARIC • CABS • CHS* • CARDIA • FHS • HABC • Phase2 – end Sept • AGES • ARIC • CABS • CARDIA • ERGO • FamHS • FHS • HABC • JHS • MESA

CHARGE Analysis Workshop • Boston, September 27 and 28, 2012 • Hands-on workshop for biostatisticians, bioinformaticians and data managers • Data cleaning (quality control and quality assurance) • Genome annotation • Focus on rare variant analyses • Meta-analysis of data derived from whole exome sequencing and exome chip genotyping • Register soon • Contact Ann Walsh (email: walshac@nhlbi.nih.gov) • Seating is limited

MAF Distribution Phase1b (sample size ~44,000)

CHARGE Investigators Exome Chip Genotyping Committee • Josh Bis • Eric Boerwinkle (Camille Breaux) • Barbara Cochran • Ingrid Borecki • Angela Cook • Adrianne Cupples • Myriam Fornage • Megan Grove • Vilmundur Gudnason • Mayetri Gupta • Talin Haritunians • Tamara Harris • Sekar Kathiresan • Robert Kraaij • Yongmei Liu • Dan Levy • Chris O’Donnell (co-chair) (Marcia Lobos) • Ken Rice • Steve Rich • Fernando Rivadeneira • Jerry Rotter (chair) (Melissa Juico) • Bruce Psaty • Albert Smith • Nona Sotoodehnia • Kent Taylor • Andre Uitterlinden • Cornelia van Duijn • Ann Walsh • Jim Wilson • E.M. Zervos

Exome Chip Data - Plans and Availability • Data will be returned to Cedars in three weeks, where it will undergo internal clustering and cleaning by Cedars, UVA and CC. • CC will distribute to MESA investigators with an approved MESA Genetics P&P Proposal. • Like SHARe, Exome analyses will be led by working groups. • Data will be posted to dbGaP by end of 2012 to link with GWAS data. • CHARGE Rare Variants Analysis Workshop Sept 27-28 in Boston

17 MESA SHARe Approved Analytic Sites • Cedars-Sinai Medical Center (PI: Jerome Rotter, MD) • University of Virginia (PI: Stephen Rich, PhD) • University of Washington (PI: Richard Kronmal, PhD) • Wake Forest University (PI: Gregory Burke, MD, MSc) • Johns Hopkins University (PI: Wendy Post, MD, MS) • University of California, San Diego (PI: Christina Wassel, PhD) • Emory University (PI: Yan Sun, PhD, MS) • University of Texas (PI: Jennifer Nettleton, PhD) • Loyola University (PI: Holly Kramer, MD, MPH) • University of North Carolina (PI: Leslie Lange) • University of Vermont (PI: Russell Tracy, PhD) • University of Pennsylvania (PI: Steven Kawut, MD) • Northwestern University (PI: Laura J. Rasmussen-Torvik, PhD, MD) • University of Michigan at Ann Arbor (PI: Sharon Kardia, PhD) • University of Alabama (PI: Donna Arnett, PhD) • University of Minnesota (PI: Mike Tsai, MD, PhD) • University of California, San Francisco (PI: Gregory Marcus, MD) • 97 additional groups outside of MESA have applied for and gained access to MESA SHARe data.

MESA SHARe Published Papers

MESA SHARe Pen Drafts

Genome-Wide Association Study for Circulating Levels of Plasminogen Activator Inhibitor-1 (PAI-1) Provides Novel Insights into the Regulation of PAI-1 • Jie Huang*; Maria Sabater-Lleal*; …Josyf C. Mychaleckyj, …Bruce M. Psaty, …Kent Taylor, …Mary Cushman, …Aaron R. Folsom, …Russell P. Tracy, …Yongmei Liu, …Christopher J. O’Donnell*, Anders Hamsten* • Blood (2012) in press • 19,599 subjects, followed by replication analysis of genome-wide significant (P < 5×10^-8) single nucleotide polymorphisms (SNPs) in 10,796 independent samples. (Two 7q22.1 loci; MUC3 may be a candidate for the second.)
