200 likes | 297 Views
Disease Models and Association Statistics. Nicolas Widman CS 224- Computational Genetics. Introduction. Certain SNPs within genes may be associated with a disease phenotype Statistical model used in class only considers inheritance of a single copy of an SNP location: Single Chromosome Model
E N D
Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics
Introduction • Certain SNPs within genes may be associated with a disease phenotype • Statistical model used in class only considers inheritance of a single copy of an SNP location: Single Chromosome Model • Expand the statistic to a diploid model and take into account different expression patterns of a SNP
Basic Statistic- Haploid Model • : Relative Risk • pA: Probability of disease-associated allele • F: Disease prevalence • For this project, F is assumed to be very small • +/-: Disease State Derivation of case (p+) and control (p-) frequencies: P(A)=pA p+A=P(A|+) p-A=P(A|-) F=P(+) P(A|+)=P(+|A)P(A)/P(+) P(+|A)= P(+|¬A)
Derivation- Continued P(+)=F=pAP(+|A)+(1-pA)P(+|¬A) P(+)=F= pAP(+|A)+(1-pA)P(+|A)/ P(+)=F=P(+|A)(pA+(1-pA)/)=P(+|A)(pA(-1)+1)/ P(+|A)= F/(pA(-1)+1) P(A|+)=P(+|A)P(A)/P(+)=P(+|A)pA/F=pA/(pA(-1)+1) P(-|A)=1-P(+|A)=1- F/(pA(-1)+1) P(A|-)=P(-|A)P(A)/P(-) If F is small, then 1-F ≈ 1 and P(-|A) ≈ 1 then, P(A|-) ≈ P(A) = pA
Haploid Model • The relative risk formula: • Association Power:
Assumptions • Low disease prevalence • F ≈ 0: Allows p-A ≈pA • Uses Hardy-Weinberg Principle • A-Major Allele a-Minor Allele P(AA)=P(A)^2 P(Aa)=2*P(A)*(1-P(A)) P(aa)=(1-P(A))^2 • Uses a balanced case-control study
Diploid Disease Models • When inheriting two copies of a SNP site, there are three common relationships between major and minor SNPs • Dominant • Particular phenotype requires one major allele • Recessive • Particular phenotype requires both minor alleles • Additive • Particular phenotype varies based whether there are one or two major alleles
Diploid Disease Models • AA- Homozygous major • Aa, aA- Heterozygous • aa- Homozygous minor
Modifying the Calculation for Relative Risk • Previous relative risk formula only considered the haploid case of having a SNP or not having a SNP. • Approach: Create a virtual SNP which replaces pA in the formula.
Virtual SNPs • Use Hardy-Weinberg Principle to calculate a new pA - the virtual SNP using the characteristics of diploid disease models. • Recessive pA=pd*pd • Dominant pA=pd*pd+2*pd*(1-pd) • Additive pA=pd*pd+c*pd*(1-pd) • Pd: Probability of disease-associated allele. In the calculations used to determine the association power, c was set to sqrt(2).
Results • Achieving significant association power with low relative risk SNPs (=1.5) • Minimum of 200 cases and 200 controls required to reach 80% power within strongest pd intervals for each type of SNP • At a sample size of 1000 cases and 1000 controls, dominant and additive SNPs show very significant power for almost all SNP probabilities below 50% • Difficult to obtain significant association for low probability recessive SNPs regardless of sample size
Results • SNP probability ranges for greatest association power • Dominant: .10 - .30 • Recessive: .45 - .70 • Additive: .15 - .40 • Higher relative risk SNPs require fewer cases and controls to achieve the same power. • As approaches 1, the association power to detect a recessive allele with probability p is the same as the power to detect dominant allele with probability 1-p.
Results • Diseases with higher relative risk have their range of highest association power skewed toward lower probability SNPs. • Challenges in obtaining high association power: • Low probability recessive SNPs • Low relative risk diseases, especially with small sample sizes • High probability dominant SNPs, however these are unlikely due natural selection and that the majority of the population would be affected by such diseases.