Laboratory & Professional skills for Bioscientists Term 2: Data Analysis in R

Laboratory & Professional skills for BioscientistsTerm 2: Data Analysis in R Two sample tests: two-sample t-test, two-sample Wilcoxon (also one-sample Wilcoxon) Practical: one and two sample tests examples, different data formats ggplot.

Overview of topics Also in previous lecture Foundation

Summary • Tests for one-, two-, and paired-samples (t-tests and non-parametric equivalents) • Today • two-sample t-test for independent samples • Wilcoxon for one- and two-samples when t-test assumptions are not met • Choosing appropriate tests: type of question, type of data • Second of two lectures; single workshop

Learning objectives Also in previous lecture By actively following the lecture and practical and carrying out the independent study the successful student will be able to: • Explain dependent and independent samples (MLO 2) • Select, appropriately, t-tests and their non-parametric equivalents (MLO 2) • Apply, interpret and evaluate the legitimacy of the tests in R (MLO 3 and 4) • Summarise and illustrate with appropriate R figures test results scientifically (MLO 3 and 4)

Types of t-test Also in previous lecture • One-sample • Compares the mean of sample to a particular value (compares the response to a reference) • Includes paired-sample test for dependent samples (i.e., two linked measures) • Two-sample • Compares two (independent) means to each other

Two-sample t-tests NEW! • Is there a difference between two independent means • Independent – values in one group not related to values in the other group • Example: is there a significant difference between the masses of male and female chaffinches? NOT LINKED Fringilla coelebs

t-tests Also in previous lecture • Standard formula for all t-tests To aid understanding, not for remembering • d.f.= n1 + n2 - 2 NEW!

t-tests in generalAssumptions Also in previous lecture • All t-tests assume the ‘data’ are normally distributed • 0ne-sample – the data themselves • Paired sample – the differences between the pairs • Two-sample – each variable should be n.d • Additionally the two-sample t-test assumes equal variances

t-tests in general: assumptionsChecking normality Also in previous lecture • Common sense • Data should be continuous • There should not be lots of values the same • figure • Difficult to tell for a small sample • Using a test in R • shapiro.test()

Two-sample t-tests: assumptionsChecking equality of variance NEW! • Common sense • Compare the two values • figures • Difficult to tell for small samples • Using a test in R • var.test()

t-tests in general: assumptionsWhen data are not normally distributed • Transform (normality) • E.g. Log to remove skew, arcsinsquareroot on proportions • Welch’s t if variances non-equal • Use a non-parametric test • Fewer assumptions • Generally less powerful Extension of previous lect

t-testsTwo-sample t-test example NEW! • Example: is there a significant difference between the masses of male and female chaffinches? Fringilla coelebs

t-testsTwo-sample t-test example NEW! Note: these data are ‘tidy’ All the responses in one column with other variables indicating the group chaff <- read.table("../data/chaff.txt", header = T) Organise your data this way

Tidy data Extension of previous lect • Each variable you measure should be in one column. • Each different observation of that variable should be in a different row. • There should be one table for each "kind" of variable. • If you have multiple tables, they should include a column in the table that allows them to be linked. Independent study: Wickham, H (2013). Tidy Data. Journal of Statistical Software. https://www.jstatsoft.org/article/view/v059i10

t-testsTwo-sample t-test example Extension of previous lect • Checking the normality assumption - histograms tapply(chaff$mass, chaff$sex, hist)

t-testsTwo-sample t-test example Extension of previous lect • Checking the normality assumption - testing • tapply(chaff$mass, chaff$sex, shapiro.test) • $females • Shapiro-Wilk normality test • data: X[[i]] • W = 0.96783, p-value = 0.7085 • $males • Shapiro-Wilk normality test • data: X[[i]] • W = 0.96261, p-value = 0.5973

t-testsTwo-sample t-test example NEW! • Checking the equal variance assumption - testing • var.test(chaff$mass ~ chaff$sex) • F test to compare two variances • data: chaff$mass by chaff$sex • F = 0.98788, numdf = 19, denomdf = 19, p-value = 0.9791 • alternative hypothesis: true ratio of variances is not equal to 1 • 95 percent confidence interval: • 0.3910141 2.4958251 • sample estimates: • ratio of variances • 0.9878779

t-testsTwo-sample t-test example Extension of previous lect Run the t-test Manual page: t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...) • t.test(chaff$mass ~ chaff$sex, paired = FALSE, var.equal = TRUE) • Two Sample t-test • data: chaff$mass by chaff$sex • t = -2.6471, df = 38, p-value = 0.01175 • alternative hypothesis: true difference in means is not equal to 0 • 95 percent confidence interval: • -3.167734 -0.422266 • sample estimates: • mean in group females mean in group males • 20.480 22.275

t-testsTwo-sample t-test example Extension of previous lect Reporting the result: “significance of effect, direction of effect, magnitude of effect” Male chaffinches () are significantly heavier than females () (t = 2.65; d.f. = 38; p = 0.012). See figure 1.

t-testsTwo-sample t-test: figure NEW! Parametric tests: use mean and S.E (C.I. or s.d.)

t-testsTwo-sample t-test: figure NEW! What is in the figure? You do not say Graph to show..

t-testsTwo-sample t-test: figure NEW! Error bars of some kind Say what kind!

t-testsTwo-sample t-test: figure NEW! Label with units Always refer to figure in the text

NEW!

ggplot • ‘grammar of graphics’ makes it easy to do many routine tasks (like drawing legends, adding errorbars, setting axis limits etc). • On your own pc you need to install ggplot2 using: install.packages("ggplot2") • You only need to install a package once but you need to load it in to your library each R session. This is done with: library(ggplot2)

ggplot ggplot(data = dataframe, aes(x = yourxvar, y = youryvar) ) + geom_point() • ggplot(data = chaff, aes(x = sex, y = mass)) + • geom_point() • chaffsummary <- summarySE(data = chaff, measurevar = "mass", groupvars = "sex") • ggplot(data = chaffsummary, aes(x = sex, y = mass)) + • geom_bar(stat = "identity")

NEW! When the t-test assumptions are not met: non- parametric tests • Non-parametric tests make fewer assumptions • Based on the ranks rather than the actual data • Null hypotheses are about the mean rank (not the mean)

NEW! Non-parametric testst-test equivalents i,.e., the type of question is the same but the response variable is not normally distributed • one – sample t-test and paired-sample t-test: the one-sample Wilcoxon • Two-sample t-test (next lecture): two-sample Wilcoxon aka Mann-Whitney

NEW! Non-parametric testsone/paired-sample Wilcoxon Let’s suppose that we decided to do a non-parametric test on the marks data > wilcox.test(data = marks, mark ~ subject, paired = TRUE) Wilcoxon signed rank test with continuity correction data: mark by subject V = 48.5, p-value = 0.03641 alternative hypothesis: true location shift is not equal to 0 Warning message: In wilcox.test.default(x = c(97L, 58L, 65L, 65L, 80L, 48L, 85L, : cannot compute exact p-value with ties No need to worry!

Non-parametric testsone/paired-sample Wilcoxon Extension of previous lect Individual students score significantly higher in maths than in statistics (Wilcoxon: V = 48.5; n = 10; p = 0.036). Reporting the result: “significance of effect, direction of effect, magnitude of effect”

NEW! Non-parametric teststwo-sample Wilcoxon (Mann-Whitney) Example: comparing the number of leaves on 8 mutant and 7 wild type plants (small samples, counts)

NEW! Non-parametric teststwo-sample Wilcoxon (M-W): example • Carrying out the test two-sample Wilcoxon • wilcox.test(data = plants, leaves ~ type, paired = FALSE) • Wilcoxon rank sum test with continuity correction • data: leaves by type • W = 5, p-value = 0.008664 • alternative hypothesis: true location shift is not equal to 0 • Warning message: • In wilcox.test.default(x = c(3, 5, 6, 7, 3, 4, 5, 8), y = c(8, 9, : • cannot compute exact p-value with ties No need to worry!

Non-parametric teststwo-sample Wilcoxon (M-W): example There are significantly more leaves on wild-type (median = 8) than mutant (median = 5) plants (Mann-Whitney: W=5, n1=7, n2=8, p = 0.009) Reporting the result: “significance of effect, direction of effect, magnitude of effect” 34

Non-parametric teststwo-sample Wilcoxon (M-W): example Label with units Non-parametric tests: use median+IQR Measure of dispersion - IQR What is the figure? Always refer to figure in the text

Learning objectives for the week By attending the lectures and practical the successful student will be able to • Explain dependent and independent samples (MLO 2) • Select, appropriately, t-tests and their non-parametric equivalents (MLO 2) • Apply, interpret and evaluate the legitimacy of the tests in R (MLO 3 and 4) • Summarise and illustrate with appropriate R figures test results scientifically (MLO 3 and 4)

Laboratory & Professional skills for Bioscientists Term 2: Data Analysis in R