GAW: what is it? • GAW14 data • GAW14 analysis groups • Our analyses for GAW14 • GAW14 discussions

WHAT IS IT?
• GAW (Genetic Analysis Workshop) is organised every 2 years
• A real and a simulated data set are available to every GAW participant for analysis
• The data sets are meant to be relevant to current analytical problems in genetic epidemiology
• During GAW meetings, results of the analyses are discussed and compared
• http://www.gaworkshop.org/welcome.html

GAW14 DATA (simulated)
• DATA
  • multiple replicates available
  • “simulation answers” available upon request
• PHENOTYPE
  • behavioural disorder: categorical and quantitative traits
  • genetic background: gene interaction, genetic heterogeneity
• RELATIONSHIP STRUCTURE
  • nuclear families
  • extended pedigrees
• MARKERS
  • genome scan with microsatellites (MSs) every 7 cM
  • genome scan with SNPs every 3 cM
  • additional SNPs available for “purchase”

GAW14 DATA (real)
• DATA
  • Collaborative Study on the Genetics of Alcoholism
  • 1614 individuals
  • missing genotype and phenotype data
• PHENOTYPE
  • binary, categorical and quantitative traits related to alcoholism, e.g. maxdrinks, ALDX (1=pure unaffected … 5=affected), “gave up activities to drink” (0=yes, 1=no)
  • covariates, e.g. number of cigarette packs, ethnicity (1=American Indian … 7=White Hispanic), smoking status (0=yes, 1=no)
• RELATIONSHIP STRUCTURE
  • pedigrees (3 generations)
• MARKERS
  • genome scan with MSs
  • genome scan with SNPs: 11 560 (Affymetrix), 4 600 (Illumina)

GAW14 ANALYSIS GROUPS
• SNPs vs. microsatellites: simulated and real data (25)
• Integrating SNPs and microsatellites (9)
• Linkage mapping methods: simulated and real data (20)
• Quantitative trait mapping (12)
• Fine mapping (11)
• Haplotypes and tag SNPs (11)
• Detection and implications of linkage disequilibrium between markers (7)
• Association mapping (13)
• Case-control analyses (15)
• Multivariate analyses (8)
• Analyses of alcoholism, smoking and related traits (12)
• Data mining (12)
• Heterogeneity (7)
• Gene-gene interaction (7)
• Genotyping errors, pedigree errors and missing data (7)
• Parent of origin, imprinting, mitochondrial and X-linked effects (7)

OUR ANALYSIS
Joanna Szyda, Przemysław Biecek, Florian Frommlet, Jayanta K. Ghosh, Małgorzata Bogdan
Analysis of the genetic background of quantitative traits related to alcoholism by mixed inheritance and oligogenic models
• PHENOTYPE CHOICE
• GENETIC VARIABILITY
• GENOME SCAN WITH SNPs
  • regression
  • “nonparametric” test
• TESTING SNP EFFECTS

OUR ANALYSIS (phenotype choice)
• several phenotypic measurements available
• maximal number of drinks per day (MAXDRINKS): a direct measure of alcoholism
  • subjective: the patient's own estimate
  • not normally distributed
  • available for most patients
• 12 different EEG measurements
  • objective: “closer to genes”
  • normally distributed
  • available for approx. 60 % of patients
• choose the EEG phenotype with the highest genetic correlation to ln(MAXDRINKS)

OUR ANALYSIS (phenotype choice)
MODEL:
• p phenotypes
• fixed covariates
• random polygenic effects
• residual effects
• incidence matrices
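The model equation itself is not preserved in this transcript. A standard multi-trait mixed linear model that matches the terms listed on the slide could be written as below. The symbols are assumptions chosen for illustration and are not taken from the slide.

```latex
\[
  \mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta}_i + \mathbf{Z}_i \mathbf{a}_i + \mathbf{e}_i,
  \qquad i = 1, \dots, p
\]
% y_i    : observations for phenotype i (p phenotypes in total)
% beta_i : fixed covariate effects
% a_i    : random polygenic effects
% e_i    : residual effects
% X_i, Z_i : incidence matrices linking observations to effects
```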

OUR ANALYSIS (phenotype choice)
VARIANCE STRUCTURE:
• Gi: relationship matrix, represents genome-average similarity among individuals
• highest genetic correlation to ln(MAXDRINKS) → ttth3

OUR ANALYSIS (genetic variability)
MODEL:
VARIANCE STRUCTURE:
HERITABILITY:
• h²(lnMAXDRINKS) = 0.148
• h²(ttth3) = 0.356
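The heritability formula is not preserved in the transcript. Under the usual definition, heritability is the share of phenotypic variance due to the polygenic component. The sketch below assumes a model with only polygenic and residual variance; the function name and the variance values are hypothetical, scaled so that the total variance is 1 and the result matches the h² reported for ln(MAXDRINKS).

```python
def heritability(var_polygenic: float, var_residual: float) -> float:
    """Narrow-sense heritability, assuming the phenotypic variance is the
    sum of a polygenic and a residual component (an assumption here)."""
    total = var_polygenic + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_polygenic / total


# Hypothetical variance components (not from the slides), scaled to sum to 1
h2_maxdrinks = heritability(0.148, 0.852)
print(round(h2_maxdrinks, 3))
```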

OUR ANALYSIS (genetic variability)
MODEL:
VARIANCE STRUCTURE:
TEST:
• P(lnMAXDRINKS) = 2 × 10^-5
• P(ttth3) = 7 × 10^-12

OUR ANALYSIS (genome scan → regression)
MODEL:
MODEL SELECTION:
• n: number of individuals
• RSS: residual sum of squares
• p: number of additive/dominance effects
• q: number of interaction effects
• L: penalty for an additive/dominance effect
• U: penalty for an interaction
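The selection criterion itself is missing from the transcript. The results slide refers to mBIC (modified BIC), so the sketch below assumes the criterion has the form n·ln(RSS) + (p + q)·ln(n) + 2p·ln(L) + 2q·ln(U), built from the quantities listed above. The exact form used by the authors, and how L and U were set, are not given here, so treat both the formula and the example values as assumptions.

```python
import math


def mbic(n: int, rss: float, p: int, q: int, L: float, U: float) -> float:
    """Modified BIC in the assumed form
    n*ln(RSS) + (p + q)*ln(n) + 2*p*ln(L) + 2*q*ln(U).
    Smaller values indicate a preferred model."""
    if n <= 0 or rss <= 0 or L <= 0 or U <= 0:
        raise ValueError("n, rss, L and U must be positive")
    return (n * math.log(rss)
            + (p + q) * math.log(n)
            + 2 * p * math.log(L)
            + 2 * q * math.log(U))


# Hypothetical comparison: null model vs. a model with one additive effect
null_model = mbic(n=1000, rss=500.0, p=0, q=0, L=2000.0, U=2.0e6)
one_effect = mbic(n=1000, rss=470.0, p=1, q=0, L=2000.0, U=2.0e6)
print(one_effect < null_model)  # True when the RSS drop outweighs the penalty
```

The extra 2p·ln(L) and 2q·ln(U) terms are what distinguish this from the ordinary BIC: they make the penalty grow with the number of candidate effects, which matters in a genome scan with thousands of markers.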

OUR ANALYSIS (genome scan → “nonparametric”)
TEST:
• n: number of individuals
• xi: observed number of allele “1”
• E(xi): expected number of allele “1” based on parents
• yi: trait value
• E(yi): trait value predicted by the mixed model
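The test statistic is not preserved in the transcript. One plausible form, given the quantities listed above, multiplies each individual's genotype deviation from the parental expectation by the trait deviation from the mixed-model prediction, then standardises the sum. The sketch below is an illustration of that idea only; the function names and the standardisation are assumptions, not the authors' definition.

```python
import math


def expected_allele_count(father: int, mother: int) -> float:
    """Expected count of allele "1" in a child, given the parents' counts
    (0, 1 or 2 each): each parent transmits one of its two alleles."""
    return (father + mother) / 2.0


def pdt_like_statistic(x, ex, y, ey) -> float:
    """Sum of (x_i - E(x_i)) * (y_i - E(y_i)) divided by the square root of
    the sum of squared products (an assumed standardisation)."""
    d = [(xi - exi) * (yi - eyi) for xi, exi, yi, eyi in zip(x, ex, y, ey)]
    denom = math.sqrt(sum(di * di for di in d))
    return sum(d) / denom if denom > 0 else 0.0


# Hypothetical data for two offspring of heterozygous parents
ex = [expected_allele_count(1, 1), expected_allele_count(1, 1)]
stat = pdt_like_statistic(x=[2, 0], ex=ex, y=[3.0, 1.0], ey=[2.0, 2.0])
print(round(stat, 4))
```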

OUR ANALYSIS (genome scan → results)
• no good correspondence between mBIC and PDT
• no good correspondence between ln(MAXDRINKS) and ttth3
• P values quite high (4600 comparisons, 1614 individuals)
• PDTl
  • ln(MAXDRINKS) chr.3 0.0033 0.086
  • ttth3 chr.6 0.0001 0.00004
• polygenic component dominates the genetic background of alcoholism
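To see why these P values count as “quite high” with 4600 comparisons, one can compare them with a Bonferroni-corrected threshold. This is a rough sketch: the slides do not say which correction, if any, was applied, and the family-wise level of 0.05 is an assumption.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 4600 comparisons as stated on the slide; alpha = 0.05 is an assumption
threshold = bonferroni_threshold(0.05, 4600)

# P values as listed on the results slide
reported = {
    "ln(MAXDRINKS) chr.3": (0.0033, 0.086),
    "ttth3 chr.6": (0.0001, 0.00004),
}

for label, pvalues in reported.items():
    passes = [pv < threshold for pv in pvalues]
    print(label, pvalues, "below threshold:", passes)
```

The threshold is about 1.09 × 10^-5, so even the smallest listed value (0.00004) does not pass it.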

OTHER ANALYSES OF QUANTITATIVE TRAITS
• PHENOTYPE:
  • constructing a “cumulated” phenotype from the several available measurements using PCA
  • recombination rate as a new phenotype
  • usefulness of trait transformation (normalisation)
  • incorporation of other alcoholism phenotypes as covariates
• REGRESSION:
  • markers on individual phenotype
  • IBD proportion on sib-pair phenotypic difference (Haseman-Elston)
  • offspring phenotype on mid-parent phenotype
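The Haseman-Elston regression mentioned above can be sketched as follows: in its classic form, the squared phenotypic difference of each sib pair is regressed on the proportion of alleles the pair shares identical by descent (IBD) at a marker. A clearly negative slope suggests linkage. The data below are hypothetical and the function name is chosen for illustration.

```python
def haseman_elston(ibd, y1, y2):
    """Least-squares regression of squared sib-pair trait differences
    on IBD sharing proportions. Returns (intercept, slope)."""
    sq_diff = [(a - b) ** 2 for a, b in zip(y1, y2)]
    n = len(ibd)
    mean_x = sum(ibd) / n
    mean_y = sum(sq_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd)
    if sxx == 0:
        raise ValueError("IBD proportions must vary across sib pairs")
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(ibd, sq_diff))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sib pairs: pairs sharing more alleles IBD are more alike
intercept, slope = haseman_elston(
    ibd=[0.0, 0.5, 1.0],
    y1=[4.0, 3.0, 2.0],
    y2=[0.0, 1.0, 1.0],
)
print(intercept, slope)
```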

OTHER ANALYSES OF QUANTITATIVE TRAITS
• OTHER:
  • using weights in regression that reflect marker informativeness
  • examining genetic heterogeneity among ethnic groups
  • accounting for age of alcoholism onset
  • empirical estimation of type I error
  • genotype × smoking interaction

GAW14 DISCUSSIONS
• PHENOTYPE:
  • transformation to normality
• SNP GENOME SCANS:
  • will microsatellite markers still be needed?
  • SNP genotyping is easier to automate, thus:
    • cheaper
    • more efficient
    • more accurate
  • multiple testing problem
• METHODS:
  • account for linkage disequilibrium, which is stronger for SNPs
  • account for uncertainty in haplotype estimates
