Associating Genomic Variations with Phenotypes: Model comparison, rare variants, and analysis pipeline
Qunyuan Zhang
Division of Statistical Genomics & Genome Institute, Washington University School of Medicine
Data & Question
Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …
Phenotypes (Y): quantitative or categorical
Question: what is the relationship between X and Y?
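Linkage & Association
Association: models (Y, X), the phenotype Y against the observed genotypes X.
Linkage: models (Y, Q), where Q is a putative QTL. Q is unobservable and lies between flanking markers, at recombination fractions r1 and r2 from them.
[Figure: flanking markers with the putative QTL Q between them; recombination fractions r1 and r2]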
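A Fixed-effect Mixture Model for Linkage
Commonly used in plant genetics. Cross design: P1 × P2 → F1 → F2.
The putative QTL Q lies between the flanking markers SNP A and SNP B, at recombination fractions r1 and r2. The QTL genotype of each F2 individual is unobserved, so the phenotype is modeled as a mixture over the possible QTL genotypes. Each genotype has a fixed effect and is weighted by its probability given the flanking-marker genotypes.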
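The following minimal Python sketch shows the mixture likelihood. It rests on several assumptions:

- The probabilities of the three F2 QTL genotypes, given each individual's flanking markers, have already been computed.
- Residuals are normal with a common variance.
- The data are made up.

The names `normal_pdf` and `mixture_loglik` are illustrative and do not come from any package.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(y, geno_probs, means, sd):
    """Log-likelihood of a fixed-effect mixture model at one putative QTL position.

    y          -- phenotype of each F2 individual
    geno_probs -- per individual: P(QTL genotype = QQ, Qq, qq | flanking markers)
    means      -- fixed mean phenotype for each QTL genotype (QQ, Qq, qq)
    sd         -- common residual standard deviation
    """
    total = 0.0
    for yi, probs in zip(y, geno_probs):
        # The QTL genotype is unobserved, so each density is a weighted mixture.
        dens = sum(p * normal_pdf(yi, mu, sd) for p, mu in zip(probs, means))
        total += math.log(dens)
    return total


# Hypothetical example: QTL means versus a no-QTL model (all means equal).
y = [10.2, 12.1, 13.9]
probs = [[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]
ll_qtl = mixture_loglik(y, probs, means=[10.0, 12.0, 14.0], sd=1.0)
ll_null = mixture_loglik(y, probs, means=[12.0, 12.0, 12.0], sd=1.0)
lod = (ll_qtl - ll_null) / math.log(10.0)
```

In practice the means and standard deviation are not fixed as they are here. They are estimated by maximizing this likelihood (for example with the EM algorithm) at each position along the interval.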
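A Variance-component Model for Linkage
Commonly used in human genetics. The phenotypic covariance among relatives is decomposed as
$\Omega = \hat{\Pi}\,\sigma_q^2 + 2\Phi\,\sigma_g^2 + I\,\sigma_e^2$
where:
• $\hat{\Pi}$ is the QTL IBD matrix, estimated at the putative QTL Q from the flanking markers SNP A and SNP B (recombination fractions r1 and r2)
• $2\Phi$ is the background IBD matrix (twice the kinship matrix)
• $I$ is the diagonal unit matrix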
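To make the covariance structure concrete, this minimal Python sketch builds Ω as the sum of three terms:

- the QTL IBD matrix times σ_q²
- the background IBD matrix times σ_g²
- the identity matrix times σ_e²

It then evaluates the multivariate-normal log-likelihood through a Cholesky factor. The sib-pair IBD values and the variance components are hypothetical. Real analyses (e.g. in SOLAR) estimate the components by maximum likelihood rather than fixing them.

```python
import math


def vc_covariance(pi_hat, phi2, s2_q, s2_g, s2_e):
    """Omega = s2_q * Pi_hat + s2_g * (2 Phi) + s2_e * I for n subjects."""
    n = len(pi_hat)
    return [[s2_q * pi_hat[i][j] + s2_g * phi2[i][j] + (s2_e if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_loglik(y, mean, omega):
    """log N(y; mean, Omega), using Omega = L L^T and solving L z = y - mean."""
    n = len(y)
    L = cholesky(omega)
    r = [yi - mi for yi, mi in zip(y, mean)]
    z = []
    for i in range(n):  # forward substitution
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)  # equals r' Omega^{-1} r
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


pi_hat = [[1.0, 0.5], [0.5, 1.0]]  # QTL IBD matrix at the test position (hypothetical)
phi2 = [[1.0, 0.5], [0.5, 1.0]]    # 2 * kinship matrix for a pair of full sibs
omega = vc_covariance(pi_hat, phi2, s2_q=0.3, s2_g=0.2, s2_e=0.5)
ll = mvn_loglik([0.4, 0.9], [0.0, 0.0], omega)
```

A linkage LOD score compares this likelihood with the one obtained when σ_q² is constrained to 0.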
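Variance-component Model = Random-effect Linear Model
$y = \mu + q + g + e$, with random effects:
• $q \sim N(0, \hat{\Pi}\sigma_q^2)$: the QTL effect ($\hat{\Pi}$: QTL IBD matrix)
• $g \sim N(0, 2\Phi\sigma_g^2)$: the genetic background effect ($2\Phi$: background IBD matrix)
• $e \sim N(0, I\sigma_e^2)$: the residual
so that $\mathrm{Var}(y) = \hat{\Pi}\sigma_q^2 + 2\Phi\sigma_g^2 + I\sigma_e^2$.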
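From Linkage to Association
Linkage model: the QTL effect(s) enter as random effect(s), $y = \mu + q + g + e$.
Family-based association model: the QTL effect(s) are replaced by marker effect(s), entered as fixed effect(s): $y = \mu + X\beta + g + e$. The genetic background $g$ is still modeled as a random effect.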
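Here is a minimal Python sketch of testing a fixed marker effect in the family-based association model. It assumes the background covariance Ω = σ_g²·2Φ + σ_e²·I has already been estimated and treats it as known; tools typically estimate it by ML or REML. The sketch computes the generalized least squares estimate of β. It whitens y, the intercept column, and the marker with the Cholesky factor of Ω, then solves the 2×2 normal equations. The function names and data are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gls_marker_test(y, x, omega):
    """GLS fit of y = mu + x*beta + (g + e), with Var(g + e) = omega treated as known.

    Returns (mu, beta, se_beta, z), where z = beta / se_beta is a Wald statistic.
    """
    L = cholesky(omega)
    # Whitening: premultiplying by L^{-1} turns GLS into ordinary least squares.
    yw = forward_solve(L, y)
    ow = forward_solve(L, [1.0] * len(y))
    xw = forward_solve(L, x)
    # Normal equations (X' Omega^-1 X) b = X' Omega^-1 y for b = (mu, beta)
    a11 = sum(u * u for u in ow)
    a12 = sum(u * v for u, v in zip(ow, xw))
    a22 = sum(v * v for v in xw)
    c1 = sum(u * w for u, w in zip(ow, yw))
    c2 = sum(v * w for v, w in zip(xw, yw))
    det = a11 * a22 - a12 * a12
    mu = (a22 * c1 - a12 * c2) / det
    beta = (a11 * c2 - a12 * c1) / det
    se_beta = math.sqrt(a11 / det)  # sqrt of the beta entry of (X' Omega^-1 X)^-1
    return mu, beta, se_beta, beta / se_beta


# Hypothetical sib pair plus one unrelated subject; genotypes coded 0/1/2.
omega = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu, beta, se, z = gls_marker_test([1.2, 2.9, 5.1], [0, 1, 2], omega)
```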
Covariate(s): Adjusting for Confounder(s)
Observed confounders: age, sex, etc.
Hidden confounders: population structure. Population structure can be estimated by:
• PCA
• Clustering
• Admixture/ancestry estimation
The estimates are then included in the model as covariates.
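To illustrate the PCA option, this Python sketch computes each individual's first principal-component score, up to scale, from a small 0/1/2 genotype matrix. It uses power iteration on the individual-by-individual covariance matrix. It is a toy version under stated assumptions: complete genotypes, no per-marker standardization, and a fixed number of iterations. It is not the algorithm of any specific tool. The scores would then enter the association model as covariates.

```python
import math


def top_pc_scores(genotypes, iters=200):
    """First principal-component score (up to scale) of each individual.

    genotypes -- rows = individuals, columns = markers coded 0/1/2, no missing data
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    c = [[row[j] - means[j] for j in range(m)] for row in genotypes]  # center markers
    # Individual-by-individual covariance K = C C^T / m
    k = [[sum(c[i][t] * c[j][t] for t in range(m)) / m for j in range(n)]
         for i in range(n)]
    # Non-constant start: the all-ones vector lies in the null space of K.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Unit length; approximates the leading eigenvector of K; overall sign is arbitrary.
    return v


# Hypothetical sample: individuals 0-2 and 3-5 come from two populations.
G = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
     [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
pc1 = top_pc_scores(G)
```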
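Modeling Hidden Genetic Correlation Between Subjects
$y = \mu + X\beta + C\gamma + g + e$, with:
• marker fixed effect(s) $\beta$
• covariate fixed effect(s) $\gamma$
• genetic background random effects $g \sim N(0, K\sigma_g^2)$
The relationship matrix $K$ comes from:
• Family data: pedigree => IBD matrix
• Population data (hidden relatedness): marker data => IBS matrix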
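One simple way to build an IBS matrix from marker data is to average, over markers, the number of alleles two individuals share identical by state. This Python sketch assumes complete genotypes coded 0/1/2. Programs such as EMMA/EMMAX have their own kinship estimators, and this sketch is not meant to reproduce them.

```python
def ibs_matrix(genotypes):
    """Pairwise identity-by-state similarity between individuals.

    genotypes -- rows = individuals, columns = markers coded 0/1/2 (allele counts).
    At one marker, two individuals share 2 - |g_i - g_j| alleles IBS; each entry
    is that count summed over markers and divided by 2m, so the diagonal is 1.
    """
    n, m = len(genotypes), len(genotypes[0])
    return [[sum(2 - abs(genotypes[i][t] - genotypes[j][t]) for t in range(m)) / (2.0 * m)
             for j in range(n)]
            for i in range(n)]


# Hypothetical genotypes for three individuals at three markers
K = ibs_matrix([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
```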
Modeling Rare Variants
Common variants: tested individually, $H_0: \beta_1 = 0$. One p-value per variant.
Rare variants: tested as an entire group (burden test), usually by gene, $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$. One p-value per group of variants.
• Can be combined with variable selection, using loose criteria
• The β's can be treated as random effects and tested with a variance-component test, optionally weighted by prior information
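The difference between the two group-level tests can be sketched in a few lines of Python. The sketch makes these assumptions:

- The trait is quantitative, with no covariates.
- Per-variant score terms are S_j = Σ_i g_ij (y_i − ȳ).
- The weights w_j are supplied by the user.
- P-values come from permutation rather than from the analytical distributions that packages such as SKAT use.

The burden statistic squares a weighted sum of the S_j. The variance-component (SKAT-like) statistic sums weighted squares, so effects in opposite directions do not cancel. The data are hypothetical.

```python
import random


def score_terms(G, y):
    """S_j = sum_i g_ij * (y_i - ybar) for each variant j (rows of G = subjects)."""
    ybar = sum(y) / len(y)
    r = [yi - ybar for yi in y]
    return [sum(row[j] * ri for row, ri in zip(G, r)) for j in range(len(G[0]))]


def burden_stat(G, y, w):
    """Burden statistic (sum_j w_j S_j)^2: assumes effects share one direction."""
    s = score_terms(G, y)
    return sum(wj * sj for wj, sj in zip(w, s)) ** 2


def vc_stat(G, y, w):
    """Variance-component (SKAT-like) statistic Q = sum_j w_j S_j^2."""
    s = score_terms(G, y)
    return sum(wj * sj * sj for wj, sj in zip(w, s))


def perm_pvalue(stat, G, y, w, n_perm=999, seed=1):
    """Permutation p-value: shuffling y breaks any genotype-phenotype link."""
    rng = random.Random(seed)
    observed = stat(G, y, w)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat(G, y_perm, w) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: variant 0 raises the trait, variant 1 lowers it.
G = [[1, 0], [0, 1], [0, 0], [0, 0]]
y = [2.0, -2.0, 0.0, 0.0]
w = [1.0, 1.0]
```

With these data the burden statistic is 0, because the two effects cancel. The variance-component statistic is 8.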
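Collapsing Model
Collapse multiple variant variables into one. For example, set $C = 1$ if a subject carries at least one rare variant in the group and $C = 0$ otherwise, then fit $y = \mu + \beta C + e$ and test the single hypothesis $H_0: \beta = 0$.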
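Here is a minimal Python sketch of collapsing. It assumes genotypes are coded as minor-allele counts and uses the simplest rule, the indicator used in CAST-type collapsing: a subject is coded 1 if it carries at least one rare allele in the group. The default MAF threshold of 0.01 is only illustrative. The resulting single variable replaces the k variant columns in the model.

```python
def minor_allele_freq(G, j):
    """Frequency of variant j, assuming G holds minor-allele counts (rows = subjects)."""
    return sum(row[j] for row in G) / (2.0 * len(G))


def collapse_rare(G, threshold=0.01):
    """Collapse all variants with MAF < threshold into one 0/1 carrier indicator."""
    rare = [j for j in range(len(G[0])) if minor_allele_freq(G, j) < threshold]
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in G]


# Hypothetical data: 4 subjects, 3 variants; with threshold 0.2 only variants 0 and 1 are rare.
C = collapse_rare([[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0]], threshold=0.2)
```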
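Weighted Sum Model
The variants are combined into a weighted sum score $S = \sum_{i=1}^{k} w_i x_i$, where $x_i$ is the genotype at variant $i$ and $w_i$ is its weight. $S$ then enters the model as a single variable: $y = \mu + \beta S + e$.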
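Weights can be chosen in many ways. One published example, used here as an assumption and not necessarily the weighting in this talk, is Madsen and Browning (2009). They set $w_i = \sqrt{n_i q_i (1 - q_i)}$, where:

- $q_i = (m_i^U + 1)/(2n_i^U + 2)$ is a smoothed allele frequency;
- $m_i^U$ is the number of minor alleles observed among the $n_i^U$ unaffected subjects genotyped;
- $n_i$ is the total number of subjects genotyped.

Each subject's score is $\sum_i x_i / w_i$, i.e. the weight in the sum is $1/w_i$. Because the weights use case status, significance has to come from permutation. The Python sketch below assumes every subject is genotyped at every variant, so $n_i = n$, with genotypes coded as minor-allele counts.

```python
import math


def madsen_browning_scores(G, affected):
    """Weighted-sum score per subject with Madsen-Browning style weights.

    G        -- rows = subjects, columns = variants, minor-allele counts 0/1/2
    affected -- 1 for cases, 0 for unaffected subjects
    """
    n = len(G)
    unaffected = [row for row, a in zip(G, affected) if a == 0]
    n_u = len(unaffected)
    weights = []
    for j in range(len(G[0])):
        m_u = sum(row[j] for row in unaffected)  # minor alleles among unaffected
        q = (m_u + 1.0) / (2.0 * n_u + 2.0)      # smoothed allele frequency
        weights.append(math.sqrt(n * q * (1.0 - q)))
    # For q < 0.5 a rarer variant has a smaller w_j, so dividing by w_j up-weights it.
    return [sum(g / w for g, w in zip(row, weights)) for row in G]


# Hypothetical data: 2 cases followed by 2 unaffected subjects, 2 variants
scores = madsen_browning_scores([[1, 0], [0, 0], [0, 1], [0, 0]], [1, 1, 0, 0])
```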
Weighting Variants
• Based on allele frequency: continuous or binary (0/1) weights, variable threshold;
• Based on functional annotation/prediction;
• Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
• Data-driven: learning weights (including effect directions) from both genotype and phenotype data, which requires a permutation test;
• Any combination …
Grouping Variants
• By gene, by transcript, by exon
• By gene set/pathway, by protein domain
• ……
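Grouping and binary (0/1) weighting can be combined in one pass. This Python sketch assumes each variant comes as a record with hypothetical fields `id`, `gene`, `maf` and `function`; the gene names and annotations are made up. It does three things:

- keeps variants below a MAF threshold;
- optionally restricts them to chosen functional classes;
- groups the survivors by gene.

```python
from collections import defaultdict


def group_variants(variants, maf_threshold=0.01, keep_functions=None):
    """Group rare variant IDs by gene; excluded variants effectively get weight 0."""
    groups = defaultdict(list)
    for v in variants:
        if v["maf"] >= maf_threshold:
            continue  # not rare enough
        if keep_functions is not None and v["function"] not in keep_functions:
            continue  # filtered out by functional annotation
        groups[v["gene"]].append(v["id"])
    return dict(groups)


variants = [
    {"id": "v1", "gene": "GENE1", "maf": 0.002, "function": "missense"},
    {"id": "v2", "gene": "GENE1", "maf": 0.200, "function": "missense"},
    {"id": "v3", "gene": "GENE2", "maf": 0.005, "function": "synonymous"},
    {"id": "v4", "gene": "GENE2", "maf": 0.001, "function": "nonsense"},
]
by_gene = group_variants(variants)
by_gene_functional = group_variants(variants, keep_functions={"missense", "nonsense"})
```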
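Modeling More Data Types: Generalized Linear (Mixed) Model
A link function $f$ connects the mean of Y to the linear predictor: $f(\mathrm{E}[Y]) = X\beta$. The mixed model adds random effects to the linear predictor.
For binary Y, the logistic model uses the logit link: $\log\frac{p}{1-p} = X\beta$, where $p = P(Y = 1)$.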
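For binary Y, the logistic model can be fitted by Newton-Raphson, which for the logit link is the same as iteratively reweighted least squares, the method R's glm() uses. This Python sketch assumes:

- a single covariate;
- no random effects, so it is a GLM rather than a GLMM;
- no complete separation;
- a fixed number of iterations.

The data are made up.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + H^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: carriers (x = 1) are affected 3 times out of 4, non-carriers once.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
```

Here the fitted b1 equals 2·ln 3, so exp(b1) = 9 is the odds ratio of carriers versus non-carriers.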
Longitudinal Data (quantitative)
[Figure: repeated measurements plotted against time]
• Fixed effect: time as a covariate
• Repeated measures: random subject effect, modeling the correlation within subjects
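Under the simplest repeated-measures assumption, each subject gets a random intercept. Every pair of measurements on the same subject then shares the subject variance, which gives the compound-symmetry covariance built in this illustrative Python sketch. The variance values are hypothetical. Other structures, such as autoregressive ones, are available in PROC MIXED or nlme.

```python
def random_intercept_cov(n_visits, s2_subject, s2_error):
    """Covariance of one subject's repeated measures under the random-intercept model
    y_it = b0 + b1 * t_it + u_i + e_it, u_i ~ N(0, s2_subject), e_it ~ N(0, s2_error)."""
    return [[s2_subject + (s2_error if s == t else 0.0) for t in range(n_visits)]
            for s in range(n_visits)]


def within_subject_correlation(s2_subject, s2_error):
    """Correlation between any two visits of the same subject (intraclass correlation)."""
    return s2_subject / (s2_subject + s2_error)


V = random_intercept_cov(3, s2_subject=2.0, s2_error=1.0)
rho = within_subject_correlation(2.0, 1.0)
```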
Longitudinal Data (binary)
[Figure: outcome status plotted against time]
• (Generalized) linear model, with time as a covariate
• Survival analysis: Cox proportional hazards (CoxPH) model, etc.
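The Cox proportional hazards model is fitted by maximizing the partial likelihood. This minimal Python sketch computes the partial log-likelihood for one covariate (e.g. a genotype) and assumes no tied event times. survival::coxph and PROC PHREG also handle ties and search for the maximizing β. The data are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox PH partial log-likelihood for one covariate (assumes no tied event times).

    times  -- follow-up time per subject
    events -- 1 if the event (e.g. disease onset) was observed, 0 if censored
    x      -- covariate value (e.g. genotype) per subject
    """
    ll = 0.0
    for i in range(len(times)):
        if events[i] == 1:
            # Risk set: subjects still under observation at the event time t_i
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll


# Hypothetical data: the two carriers (x = 1) have the earliest events.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1]
x = [1, 1, 0, 0]
ll_0 = cox_partial_loglik(0.0, times, events, x)
ll_half = cox_partial_loglik(0.5, times, events, x)
```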
Tools
• SAS procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST
• R functions/packages: lm(), glm(); gee, nlme, kinship2/coxme, lme4, survival
• Other programs: SOLAR, MMAP, EMMA, EMMAX, SKAT
Pipeline
Input (data + options) → job generating/submitting module → job number controlling module → job 1, job 2, …, job N, submitted through LSF (bsub)
Each job i runs with its own options (Options.jobi) through either self-programmed modules (SAS, R, …) or external program modules (MMAP, SKAT, …). The jobs produce Result 1, Result 2, …, Result N.
Job status monitoring module: are all jobs done? No → wait and check again. Yes → result summarizing module.
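The control flow of the pipeline is: generate jobs, cap how many run at once, wait until all are done, then summarize. It can be sketched locally in Python, though the sketch rests on several substitutions:

- A thread pool replaces LSF `bsub` submission.
- `max_jobs` plays the role of the job number controlling module.
- `run_job` is a hypothetical placeholder for a SAS, R or external-program module.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(job_id, options):
    """Hypothetical placeholder for one analysis job (e.g. one phenotype)."""
    time.sleep(0.01)  # stands in for the real SAS/R/external-program run
    return {"job": job_id, "pheno": options["pheno"], "status": "done"}


def run_pipeline(job_options, max_jobs=4):
    """Submit one job per options dict, with at most max_jobs running at a time."""
    results = []
    # Job number control: the pool never runs more than max_jobs jobs at once.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(run_job, i, opts)
                   for i, opts in enumerate(job_options, start=1)]
        # Status monitoring: collect each job as soon as it finishes.
        for fut in as_completed(futures):
            results.append(fut.result())
    # Result summarizing: every job is done here; order results by job number.
    return sorted(results, key=lambda r: r["job"])


summary = run_pipeline([{"pheno": "bmi"}, {"pheno": "obes"}, {"pheno": "HD"}], max_jobs=2)
```

On a real cluster, `run_job` would instead submit a script to the scheduler. The monitoring step would poll job status until all N jobs finish.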
Example run: gwas.sh options.gwa
Options file (options.gwa):
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
markerinfo_file=mapall
marker_selection=MAF>0.01
pedigree_file=pediall
subjectID=subject
pedgreeID=famid
markername=snp
…
[ANALYSIS]
phenolist_file=
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[OUTPUT]
output_dir=/dsguser/qunyuan/fhs/bmi
output_file=
output_replace=no
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
…
Shell script:
#!/bin/sh
OPFILE=$1
...
…
Phenotype list:
Phenotype | Type | Covar | Program | Analysis | Run
Bmi | qt | age,sex | SASGLM | mixed | YES
Obes | ql | NA | SASGLM | gee | YES
HD | ql | age | SASGLM | gee | NO
Age …
Sex …
…
Program list:
Program | Language | Location | Maintainer
SASGLM | SAS | /dsg1/code/sas/glm.sas | Q. Zhang
GSTAT | R | /dsg1/code/R/gstat.R | Q. Zhang
MMAPC | | /dsg1/code/sas/mmap.sh | J. Czajkowski
…
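The options file uses INI-style [SECTION] headers with key=value lines. One possible approach, assumed here and not how gwas.sh itself works (its body is elided above), is to read the file with Python's configparser and let the phenotype list decide which jobs to generate. The snippet uses a subset of the example values above. The function names are hypothetical.

```python
import configparser

# Subset of the example options file; elided entries ("…") are omitted.
OPTIONS_TEXT = """
[DATA]
database=SAS
phenotype_file=fhs100
marker_selection=MAF>0.01
subjectID=subject
[ANALYSIS]
pheno_list=bmi/qt
program=SASGLM
analysis=mixed
[RUN]
maxjobn=300
"""

# Rows of the phenotype list: phenotype, type, covar, program, analysis, run
PHENO_TABLE = [
    ("Bmi", "qt", "age,sex", "SASGLM", "mixed", "YES"),
    ("Obes", "ql", "NA", "SASGLM", "gee", "YES"),
    ("HD", "ql", "age", "SASGLM", "gee", "NO"),
]


def load_options(text):
    """Parse INI-style options; interpolation is off so a literal '%' stays as-is."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (e.g. subjectID) instead of lowercasing
    parser.read_string(text)
    return parser


def jobs_to_run(table):
    """One job description per phenotype flagged run=YES; covar 'NA' means none."""
    jobs = []
    for pheno, ptype, covar, program, analysis, run in table:
        if run.upper() != "YES":
            continue
        covariates = [] if covar == "NA" else covar.split(",")
        jobs.append({"pheno": pheno, "type": ptype, "covariates": covariates,
                     "program": program, "analysis": analysis})
    return jobs


options = load_options(OPTIONS_TEXT)
jobs = jobs_to_run(PHENO_TABLE)
```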