Hugh Morgan Statistical analysis methods
Introduction
• Role of statistics
• Current Methods
  • EuroPhenome
    • Numerical Parameters
    • Categorical Parameters
  • MGP
• Problems with these methods and alternatives
• Worked Example
• Tasks
Role of statistics
• Determine the effect of the genomic alteration on the phenotype of the animal
• Distinguish the effect from substantial multi-factorial noise
• Provide an estimate of the confidence in the veracity of the effect
Current Methods
• EuroPhenome
  • Numerical Parameters: Wilcoxon rank-sum test
  • Categorical Parameters: Fisher's Exact or Chi-Squared
  • p-value threshold: 0.0001 (equivalent to a 4% chance of a false positive in 400 measured parameters)
• Sanger Mouse Portal / MGP
  • Numerical Parameters: Reference Range
  • Categorical Parameters: Fisher's Exact with absolute change threshold
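The 4% figure quoted for the EuroPhenome threshold can be checked with a line of arithmetic. The sketch below assumes the 400 tests are independent, which is a simplification (phenotyping parameters are often correlated); the variable names are illustrative only.

```r
# Chance of at least one false positive across many tests at a fixed threshold.
alpha    <- 0.0001  # per-parameter p-value threshold (from the slide)
n_params <- 400     # number of measured parameters (from the slide)

# Assumption: tests are independent, so P(no false positive) = (1 - alpha)^n
fwer <- 1 - (1 - alpha)^n_params

# Bonferroni-style upper bound, valid without the independence assumption
bonferroni_bound <- alpha * n_params

print(fwer)              # a little under 0.04
print(bonferroni_bound)  # the 4% quoted on the slide
```

Both figures come out at roughly 4%, so the threshold can be read as a Bonferroni-style correction of a 4% family-wise error rate.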
Do them yourself
• All commands are at:
  • http://mrcmousenetwork.har.mrc.ac.uk/r-commands-mrc-mouse-network-training
• Get data:
  • Akt2, Fat mass, View Data, Get as CSV, Save Page
• Install R (if required, google R)
• akt2Fat=read.csv("akt2Fat.csv")
• summary(akt2Fat)
• Wilcoxon rank-sum test
  • wilcox.test(Value~Genotype, data = akt2Fat)
  • W = 1, p-value = 6.252e-06
• t-test
  • t.test(Value~Genotype, data = akt2Fat)
  • t = -9.5627, df = 23.909, p-value = 1.212e-09
Do them yourself
• Get data:
  • Abcd4, Touch escape
• R
  • abcd4Touch=matrix(c(122,9,2,8),2)
• Fisher's Exact Test
  • fisher.test(abcd4Touch)

        Fisher's Exact Test for Count Data

data:  abcd4Touch
p-value = 3.052e-07
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   8.491575 550.552750
sample estimates:
odds ratio
  50.40908
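A short sketch of how the 2x2 table above is laid out may help when entering your own counts. The row and column labels here are placeholders, not taken from the slides, which do not say which margin is genotype and which is the touch-escape category.

```r
# matrix() fills column by column, so c(122, 9, 2, 8) gives
# column 1 = (122, 9) and column 2 = (2, 8).
abcd4Touch <- matrix(c(122, 9, 2, 8), nrow = 2)

# Placeholder labels (assumption, for illustration only)
dimnames(abcd4Touch) <- list(rowGroup = c("row1", "row2"),
                             colGroup = c("col1", "col2"))
print(abcd4Touch)

ft <- fisher.test(abcd4Touch)
print(ft$p.value)
print(ft$estimate)

# Simple cross-product odds ratio from the raw counts. fisher.test() reports
# a conditional maximum likelihood estimate instead, so the two differ.
sample_or <- (122 * 8) / (9 * 2)
print(sample_or)
```

The test is symmetric in rows and columns, so swapping the two margins does not change the p-value.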
Sanger Mouse Portal / MGP
• Numerical Parameters: Reference Range
  • Calculate the range of values that encompasses 95% of the baseline dataset
  • Call a line phenodeviant in a parameter if 60% or more of the animals fall outside of that range
• Categorical Parameters: Fisher's Exact with absolute change threshold
  • Fisher's Exact test gives p-value < 5% AND
  • Absolute change of proportion > 60%
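The two MGP rules can be sketched in a few lines of R. This is an illustration under stated assumptions, not the portal's actual implementation: the reference range is taken as the 2.5th to 97.5th percentiles of the baseline, the numerical data are made up, and the categorical table reuses the Abcd4 counts with an assumed layout (rows are categories, columns are groups).

```r
# Numerical rule: reference range
# Made-up baseline values (evenly spaced normal quantiles) and mutant values
baseline <- qnorm(ppoints(500), mean = 20, sd = 2)
mutant   <- c(26.1, 27.4, 25.0, 28.3, 23.0, 26.7, 29.2, 24.9)

# Assumption: "range encompassing 95% of baseline" = central 95% interval
ref_range <- quantile(baseline, probs = c(0.025, 0.975))

outside      <- mutant < ref_range[1] | mutant > ref_range[2]
frac_outside <- mean(outside)
phenodeviant <- frac_outside >= 0.60
print(ref_range)
print(frac_outside)
print(phenodeviant)

# Categorical rule: Fisher's Exact test plus absolute change threshold
# Assumed layout: rows = category, columns = group (labels are hypothetical)
counts <- matrix(c(122, 9, 2, 8), nrow = 2,
                 dimnames = list(category = c("normal", "abnormal"),
                                 group    = c("baseline", "mutant")))
p_value       <- fisher.test(counts)$p.value
prop_abnormal <- counts["abnormal", ] / colSums(counts)
abs_change    <- unname(abs(prop_abnormal["mutant"] - prop_abnormal["baseline"]))
categorical_hit <- p_value < 0.05 && abs_change > 0.60
print(abs_change)
print(categorical_hit)
```

Requiring both a small p-value and a large absolute change keeps statistically significant but tiny shifts from being called as hits.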
Problems with these methods and alternatives
• Local structure / Lack of independence
  • EuroPhenome
    • Numerical Parameters: Wilcoxon rank-sum test
    • Categorical Parameters: Fisher's Exact or Chi-Squared
  • MGP
    • Numerical Parameters: Reference Range
    • Categorical Parameters: Fisher's Exact with absolute change threshold
Problems with these methods and alternatives
• Local structure / Lack of independence
  • Inter-day variance is greater than intra-day variance
  • Two measurements taken on the same day are likely to be more similar than two measurements taken on different days
• Cause
  • ?
• Solution
  • Model the structure
  • Linear Mixed Model
Mixed Model
• Model the data as the sum of two normally distributed random effects, plus a number of fixed effects
  • Normally distributed (random)
    • Inter-animal difference
    • Inter-day difference
  • Fixed
    • Gender
    • Other parameters such as Weight
    • Genomic alteration (Genotype)
    • Gender / Genotype effect (interaction)
• Calculate the p-value under the null hypothesis that the Genotype effect is zero
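To make the model structure concrete, here is a minimal simulation sketch. All data are simulated and the effect sizes, group sizes and column names are arbitrary assumptions. It generates values with a shared day-to-day shift, then fits an ordinary linear model and a mixed model with a random intercept per day, using `lme` from the `nlme` package.

```r
library(nlme)  # recommended package, normally installed with R

set.seed(42)
n_days  <- 12
per_day <- 8
n       <- n_days * per_day

# Balanced, made-up design: every day contains both genotypes and both genders
day      <- factor(rep(seq_len(n_days), each = per_day))
genotype <- factor(rep(c("baseline", "mutant"), length.out = n))
gender   <- factor(rep(c("Female", "Female", "Male", "Male"), length.out = n))

# Random effects: inter-day shift (sd 5) and inter-animal noise (sd 2)
day_shift <- rnorm(n_days, mean = 0, sd = 5)
value <- 60 + 10 * (gender == "Male") - 5 * (genotype == "mutant") +
  day_shift[as.integer(day)] + rnorm(n, mean = 0, sd = 2)

sim <- data.frame(value, genotype, gender, day)

# Linear model: ignores the day structure
fit_lm <- lm(value ~ genotype * gender, data = sim)

# Mixed model: random intercept for each day
fit_mm <- lme(value ~ genotype * gender, random = ~ 1 | day, data = sim)

print(summary(fit_lm)$coefficients)
print(summary(fit_mm)$tTable)
print(VarCorr(fit_mm))  # estimated inter-day and residual variances
```

In this design the day-to-day variation is absorbed by the random intercept instead of being lumped into the residual, which is what the p-value for the Genotype effect is then judged against.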
Do them yourself
• Get data:
  • Ptk7, Grip-Strength, Forelimb grip strength measurement mean, View Data, Get as CSV, Save File
• R
  • ptk7GS=read.csv("ptk7GS.csv")
  • summary(ptk7GS)

Output as shown on the slide (the values correspond to the Akt2 fat mass data rather than Ptk7):

  Centre      Strain         Genotype      Zygosity   Gender      Parameter
 WTSI:29   129/SvEv:29   Akt2    :14       :15     Male:29   Fat mass:29
                         baseline:15    Hom:14
Do them yourself
• Linear Model (no batch effect modeled)
  • ptk7GSLM=lm(Value~Genotype + Gender + Genotype*Gender, ptk7GS, na.action="na.omit")
  • summary(ptk7GSLM)

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               68.777      2.475  27.794  < 2e-16 ***
GenotypePtk7             -14.134      5.891  -2.399  0.01777 *
GenderMale                11.454      4.011   2.855  0.00497 **
GenotypePtk7:GenderMale    1.987      8.966   0.222  0.82496
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.7 on 136 degrees of freedom
Multiple R-squared:  0.1222, Adjusted R-squared:  0.1028
F-statistic: 6.311 on 3 and 136 DF,  p-value: 0.0004862
Do them yourself
• Look at Fit
  • ptk7GSLMRes<-residuals(ptk7GSLM)
  • qqnorm(scale(ptk7GSLMRes))
Do them yourself
• Mixed Model
• Excel
  • load ptk7GS.csv
  • Add a Litter column: =LEFT(H2,(SEARCH("_",H2)-2))
  • Save ptk7GSLitter.csv
• R
  • library(nlme)
  • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
  • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
  • summary(ptk7GSMM)
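The Excel step can also be done in R. The sketch below mirrors the `LEFT`/`SEARCH` formula on a few made-up identifiers; the slides do not show what column H contains, so the example values are purely hypothetical.

```r
# Hypothetical identifiers standing in for the contents of Excel column H
animal_id <- c("L101a_1", "L101a_2", "L102b_1")

# Excel: =LEFT(H2, SEARCH("_", H2) - 2)
# i.e. keep everything up to two characters before the first underscore
underscore_pos <- as.integer(regexpr("_", animal_id, fixed = TRUE))
litter <- substr(animal_id, 1, underscore_pos - 2)
print(litter)

# With the real data this would become a new column, e.g.
#   ptk7GS$Litter <- substr(id_column, 1, regexpr("_", id_column, fixed = TRUE) - 2)
# where id_column is whichever column corresponds to Excel column H.
```

Doing the step in R avoids the round trip through Excel and keeps the whole analysis in one script.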
Do them yourself
• Mixed Model
• R
  • library(nlme)
  • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
  • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
  • summary(ptk7GSMM)

Linear mixed-effects model fit by REML
Fixed effects: Value ~ Genotype + Gender + Genotype * Gender
                            Value Std.Error DF   t-value p-value
(Intercept)              67.02067  3.377184 85 19.845137  0.0000
GenotypePtk7            -12.05973  7.461470 85 -1.616267  0.1097
GenderMale               12.59607  4.403984 85  2.860154  0.0053
GenotypePtk7:GenderMale   1.42342  8.819061 85  0.161403  0.8722
Do them yourself
• Mixed Model
  • ptk7GSMMRes<-residuals(ptk7GSMM)
  • qqnorm(scale(ptk7GSMMRes))