R for psychologists

R for psychologists Professor Thom Baguley, Psychology, Nottingham Trent University Thomas.Baguley@ntu.ac.uk

0. Overview 1. What is R? 2. Why use R? 3. R basics 4. R graphics 5. Linear models in R 6. ANOVA and ANCOVA 6. Writing your own R functions 8. Simulation and bootstrapping

1. What is R? R is a software environment for statistical analysis “the lingua franca of computational statistics” De Leeuw & Mair (2007)

2. Why use R? It is: • free • open source • cross-platform (Mac, Linux, PC) It has: • excellent basic statistics functions • powerful and versatile graphics • hundreds of user-contributed packages • a large community of users

3. R basics Some common R objects: • characters (text strings e.g., 'a' or ’1’) • numbers (numbers like 2 or 1e+3) • vectors (1D set of numbers or other elements) • data frames (like vectors organized in columns) • matrices (r by c array of numbers) • lists (objects that contain other objects)

Assignment Input: > vector1 <- 6 > vector1 Output: [1] 6

Arithmetic > 6 + 6 * 2 [1] 18 > vector1^2 [1] 36

Calling functions > c(1, 9, 25) > log(vector1) [1] 1 9 25 [1] 1.791759 > rnorm(1, 100, 15) ?rnorm [1] 123.5336 help(rnorm)

Loading data > dat1 <- read.csv('pride.csv') > library(foreign) > h05 <- read.spss('Hayden_2005.sav', to.data.frame = TRUE)

Addressing the contents of an object > vector1[1] [1] 6 > days <- h05$days > days[1:9] [1] 2864 1460 2921 2921 2921 1460 2921 1460 31

Addressing the contents of an object > vector1[1] [1] 6 > days <- h05$days > days[1:9] [1] 2864 1460 2921 2921 2921 1460 2921 1460 31 Or combine the steps withh05$days[1:9]

Data frames > is.data.frame(h05) [1] TRUE > dim(h05) > names(h05) [1] 43 2 [1] "names" "days” • this data frame has 43 rows and 2 named columns • it can be helpful to think of a data frame as a set of named variables (vectors) bound into columns

Matrix objects (matrices) > cells <- c(3677,56,3924,31) > cat.names <- list(c("Before", "After"), c("Alive", "Dead")) > checklist <- matrix(cells, 2, 2, byrow=TRUE, dimnames= cat.names) > checklist Alive Dead Before 3677 56 After 3924 31

The power of objects > chisq.test(checklist) Pearson's Chi-squared test with Yates' continuity correction data: checklist X-squared = 8.1786, df = 1, p-value = 0.004239 > chisq.test(c(2, 18)) Chi-squared test for given probabilities data: c(2, 18) X-squared = 12.8, df = 1, p-value = 0.0003466

Calling functions: defaults and named arguments > chisq.test(checklist, correct=FALSE) Pearson's Chi-squared test data: checklist X-squared = 8.8072, df = 1, p-value = 0.003000 • the default argument (for matrix input is)correct=TRUE • setting correct=FALSE over-rides the default • because unnamed arguments to functions are matched in order, this argument must be named (otherwise naming the arguments is optional)

Exercise 1 (R Basics) • entering data • working with objects • simple statistical functions

4. R graphics > boxplot(h05$days) n = 43

> hist(h05$days) > plot(density(h05$days))

Distribution functions e.g., Normal distribution dnorm(x, mean = 0, sd = 1) pnorm(q, mean = 0, sd = 1, lower.tail = TRUE) qnorm(p, mean = 0, sd = 1, lower.tail = TRUE) rnorm(n, mean = 0, sd = 1)> rdat <- rnorm(30, 100, 15)

> curve(dnorm(x), xlim=c(-4,4), col='purple') > curve(dt(x, 1), col='red', add=TRUE, lty=3)

Exercise 2 (R Graphics) • exploratory plots • plot parameters • plotting distribution functions • plotting a serial position curve (optional)

5. Linear models in R > expenses <- read.csv('expenses.csv') > t.test(majority ~ problem, data = expenses) Welch Two Sample t-test data: majority by problem t = -3.7477, df = 505.923, p-value = 0.000199 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2044.457 -638.160 sample estimates: mean in group 0 mean in group 1 7092.712 8434.021 Data courtesy of Mark Thompson http://markreckons.blogspot.com/

> expenses <- read.csv('expenses.csv') > lm(majority ~ problem, data = expenses) Call: lm(formula = majority ~ problem, data = expenses) Coefficients: (Intercept) problem 7093 1341

> lm.out <- lm(majority ~ problem, data = expenses) > max(rstandard(lm.out)) [1] 2.629733 > AIC(lm.out) [1] 12674.85 > lm.out$coefficients (Intercept) problem 7092.712 1341.308> ?lm

> summary(lm.out) Call: lm(formula = majority ~ problem, data = expenses) Residuals: Min 1Q Median 3Q Max -8397.0 -3403.5 -260.9 3118.8 11543.3 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 7092.7 218.9 32.397 < 2e-16 *** problem 1341.3 357.0 3.758 0.000187 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 4395 on 644 degrees of freedom Multiple R-squared: 0.02145, Adjusted R-squared: 0.01993 F-statistic: 14.12 on 1 and 644 DF, p-value: 0.0001871

> glm(problem~I(majority/10000), family='binomial', data = expenses) Call: glm(formula = problem ~ I(majority/10000), family = "binomial", data = expenses) Coefficients: (Intercept) I(majority/10000) -1.0394 0.6878 Degrees of Freedom: 645 Total (i.e. Null); 644 Residual Null Deviance: 855.5 Residual Deviance: 841.6 AIC: 845.6

> lrm.out <- glm(problem ~ I(majority/10000), family='binomial', data = expenses) > plot(election$majority, lrm.out$fitted, col='dark green')

Exercise 3 (Linear models) - using a formula in a call to a model function • linear models • plotting a regression line • generalized linear models (optional) • plotting confidence bands (optional)

6. ANOVA > factor1 <- gl(10,4)> factor 1[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4Levels: 1 2 3 4> library(foreign)> diag.data <- read.spss("diagram.sav", to.data.frame = T)> diag.fit <- lm(descript ~ group, data=diag.data)

Regression model (dummy coding) > diag.fit Call: lm(formula = descript ~ group, data = diag.data) Coefficients: (Intercept) groupPicture groupFull Diagram groupSegmented 17.8 0.5 4.6 9.5

aov() > aov(diag.fit) Call: aov(formula = diag.fit) Terms: group Residuals Sum of Squares 583.7 2440.2 Deg. of Freedom 3 36

anova() > summary(aov(diag.fit)) > anova(diag.fit) Analysis of Variance Table Response: descript Df Sum Sq Mean Sq F value Pr(>F) group 3 583.7 194.567 2.8704 0.04977 * Residuals 36 2440.2 67.783 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Factorial ANOVA and ANCOVA > lm(DV ~ factor1 + factor2 + factor1:factor2, data=data2) > lm(DV ~ factor1 * factor2, data=data2) > lm(descript ~ group + time, diag.data) > lm(descript ~ 0 + group + scale(time, scale=F), diag.data) > diag.ancov <- lm(descript ~ group + scale(time, scale=F), diag.data)

Type I and Type II Sums of squares > drop1(diag.ancov, test = 'F') Single term deletions Model: descript ~ group + scale(time, scale = F) Df Sum of Sq RSS AIC F value Pr(F) <none> 2154.4 169.46 group 3 821.45 2975.9 176.38 4.4484 0.009478 ** scale(time, scale = F) 1 285.78 2440.2 172.44 4.6427 0.038149 * - sequential sums of squares (Type I) tests are given by default • hierarchical sums of squares (Type II) tests are given by the drop1() function [Software such as SAS and SPSS defaults to unique (Type III) SS]

Repeated measures (within-subjects) factors • a weakness in the base R installation, but easily done using packages such as ez, nlme or lme4 (the latter two able to fit a wider range of repeated measures models) > pride.long <- read.csv("pride_long.csv") > pride.mixed <- aov(accuracy ~ emotion*condition + Error(factor(participant)/emotion), pride.long) > summary(pride.mixed) [Software such as SAS and SPSS defaults to unique (Type III) SS]

Multiple comparisons and contrasts • you can run contrasts by changing default contrasts in R • Fisher LSD, Tukey’s HSD and p adjustment (e.g., Hochberg, Holm, Bonferroni, FDR) in R base installation • powerful modified Bonferroni (e.g., Shaffer, Westfall) and general linear contrasts available in multcomp package • flexible contrast and estimable functions in gmodels package

Exercise 4 (ANOVA) - factors • aov() and anova() • ANCOVA • repeated measures (within-subjects) • contrasts and multiple comparisons (optional)

7. Writing your own functions sd.pool <- function(sd.1, n.1, sd.2, n.2 = n.1){ num <- (n.1-1)*sd.1^2+(n.2-1)*sd.2^2 denom <- n.1+n.2-2 output <- sqrt(num/denom) output } > sd.pool(6.1, 20, 5.3, 25) [1] 5.66743

Exercise 5 (Write a function) • pick a simple statistical procedure • write a function to automate it e.g., pooled standard deviation

8. Simulation and bootstrapping • distribution functions > rnorm(100, 10, 1) • R boot package • sample() and replicate()

Bootstrap CIs for a median or timmed mean

The bootstrap (and other Monte Carlo methods) Bootstrapping involves repeatedly resampling with replacement from a data set e.g., Data set = {0,1} 7 simulated samples: {1,0}, {0,1}, {0,0}, {0,0}, {0,1}, {0,0}, {1,1} Bootstrapping effectively treats the sample distribution as the population distribution > replicate(7, sample(x,2,replace=TRUE))

Bootstrapping using R > library(boot) > x1.boot <- boot(samp,function(x,i) median(x[i]),R=10000) > boot.ci(x1.boot) BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 10000 bootstrap replicates CALL : boot.ci(boot.out = x1.boot) Intervals : Level Normal Basic 95% (-0.1747, 0.6948 ) (-0.1648, 0.6614 ) Level Percentile BCa 95% (-0.1538, 0.6724 ) (-0.2464, 0.6620 ) Calculations and Intervals on Original Scale Normal 95% CI for mean (using t) [-2.00, 1.33]

Exercise 6 (Bootstrapping) • resample data • construct a simple percentile bootstrap • run a BCa bootstrap using the bootpackage (optional)

9. Advanced statistical modelling in R Multilevel modeling nlme package lme4 package MCMCglmm package Meta-analysis meta package metafor package … and many, many more other packages

metaforpackage > m.e <- c(10.7, 16.24, 10.03, 3.65, 5.73) > n.e <- c(31, 57, 9, 17, 10) > m.c <- c(2.87, 6.83, 7.18, 4.65, 7.47) > n.c <- c(14, 52, 9, 18, 10) > sd.pooled <- c(7.88, 9.84, 8.67, 6.34, 5.74) > diff <- m.c - m.e > se.diffs <- sd.pooled * sqrt(1/n.e + 1/n.c)

> install.packages('metafor') > library(metafor) > meta.out <- rma(yi=diff, sei=se.diffs, method = 'FE') > meta.out Fixed-Effects Model (k = 5) Test for Heterogeneity: Q(df = 4) = 21.0062, p-val = 0.0003 Model Results: estimate se zval pval ci.lb ci.ub -4.1003 1.0750 -3.8141 0.0001 -6.2073 -1.9933 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> funnel(trimfill(meta.out))

R for psychologists

R for psychologists

Presentation Transcript

Great Psychologists

School Law for School Psychologists

School Law for School Psychologists

Women Psychologists

Psychologists Pathway for Development

Other Psychologists

Influential Psychologists

Psychologists

Behavioural Psychologists

Humanist Psychologists

Clinical Psychologists

Counselling Psychologists

Parenting Coordination: Practice Opportunities for Psychologists

Ethics and Law for school Psychologists

Psychologists Review

SCHOOL PSYCHOLOGISTS

Famous Psychologists

SCHOOL PSYCHOLOGISTS

Famous Psychologists

Supervision for Counsellors and Psychologists

Telehealth Psychologists

Website for Psychologists | LifeHetu