Lecture 1 Introduction

Advanced Research Skills. Lecture 1 Introduction. Olivier MISSA, om502@york.ac.uk. Aims. Introduce the use of R for advanced statistical analyses beyond "Statistics for Ecologists". Demonstrate these analyses on a broad range of questions and situations.

Presentation Transcript


  1. Advanced Research Skills Lecture 1 Introduction Olivier MISSA, om502@york.ac.uk

  2. Aims • Introduce the use of R for advanced statistical analyses beyond "Statistics for Ecologists". • Demonstrate these analyses on a broad range of questions and situations. • Develop your understanding of statistical programming. • Empower you to tackle future analytical challenges on your own.

  3. Aims • Other skills will be developed too. • Produce posters using CorelDraw (graphics package). • Learn how to write a grant proposal.

  4. Learning Outcomes • At the end of the module, you should be able to: • Determine which test to use for significance testing. • Explore the inherent structure of your data through a wide range of multivariate techniques. • Work out which model "best explains" the variable you are interested in. • Produce high-quality graphs (ready for publication) making full use of R's graphical capabilities.

  5. Organisation • Staff • Olivier Missa (OM), module organiser, R sessions, om502@york.ac.uk • Emma Rand (ER), R sessions, er13@york.ac.uk • Phil Roberts (PTR), CorelDraw session, ptr2@york.ac.uk • Peter Mayhew (PJM), Grant writing session, pjm19@york.ac.uk

  6. Organisation • Structure • 9 theoretical lectures (OM) on advanced stats. • 9 practical sessions (OM & ER) on using R. • 1 practical session (PTR) on CorelDraw. • 1 tutorial session (PJM) on Grant writing.

  7. Organisation • Content • L1 Introduction • L2 – L4 Linear Models • L5 – L6 GLMs & Mixed-effects models • L7 Non-Linear Models • L8 – L9 Multivariate Analyses • Each lecture is accompanied by a practical session

  8. Organisation • Assessment • Open Data Analysis exercise. • Written report with Introduction, Material & Methods, Results, Discussion. • Particular emphasis on justifying the analyses and interpreting the results properly.

  9. What is R ? • "R is a language and environment for statistical computing and graphics" (R website) • A programming language, actually a dialect of S, which was developed from the mid-1970s onwards by John Chambers and colleagues at Bell Labs. • Bell Labs then sold S to MathSoft (now Insightful Co.), which developed it further into S-Plus, a commercial statistical package. • In the 90s, S was reimplemented from scratch, as R, by two statisticians working in New Zealand, Ross Ihaka & Rob Gentleman. • Since then R has continued to grow in scale and scope and is currently maintained by about 20 people across the globe.

  10. Why use R ? • The Key Benefits: • It's Free: it won't cost you a penny, ever • Open: how things are calculated is not hidden • Fully customisable: the user is in full control • Cutting Edge: stats pros use it to create new techniques • Very Widespread (increasingly so): thousands of contributors (packages), millions of users • Supported by an international user community, happy to provide help and assistance

  11. Why use R ? • The Drawback : • Steep Learning Curve • You need to learn the language • You need to know what you are doing (stats)

  12. What is R Good for ? • Absolutely everything (to do with data) • Statistics • Modelling • Programming / Simulations • Graphics (from very simple to complex, 2D, 3D, ...) • Database (simple relational functions) • Bioinformatics (Bioconductor project) • Platform interacting with other software (e.g. GGobi, WinBUGS, MySQL, GRASS GIS)

  13. Example of a session
      > data(volcano)
      > dim(volcano)
      [1] 87 61
      > volcano
            [,1] [,2] [,3] [,4] [,5] [,6] [,7] . . . [,61]
       [1,]  100  100  101  101  101  101  101 . . .   103
       [2,]  101  101  102  102  102  102  102 . . .   104
       . . .
      [87,]   97   97   97   98   98   99   99 . . .    94
      > volcano[1:3,1:3]
           [,1] [,2] [,3]
      [1,]  100  100  101
      [2,]  101  101  102
      [3,]  102  102  103
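The indexing used in this session can be sketched a little further. The following is a minimal illustration, assuming only the built-in `volcano` dataset; the variable names (`first_row`, `corner`, and so on) are made up for the example.

```r
# volcano ships with base R (package 'datasets'): an 87 x 61 matrix of elevations
data(volcano)

first_row  <- volcano[1, ]                # row 1, all columns: a vector of length 61
first_col  <- volcano[, 1]                # column 1, all rows: a vector of length 87
corner     <- volcano[1:3, 1:3]           # top-left 3 x 3 block, still a matrix
one_column <- volcano[, 1, drop = FALSE]  # drop = FALSE keeps the matrix shape

length(first_row)
length(first_col)
dim(corner)
dim(one_column)
```

Leaving an index empty selects everything along that dimension; by default a single row or column is simplified to a plain vector.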

  14. > range(volcano)
      [1]  94 195
      > mean(volcano)
      [1] 130.1879
      > sd(volcano)
      [1]  6.902227  7.565538  8.203669  8.735686 . . .
      [8] 11.165554 11.735217 12.733854 13.668694 . . .
      . . .
      > ?sd  ## help('sd') does the same
      > sd
      function (x, na.rm = FALSE)
      {
          if (is.matrix(x))
              apply(x, 2, sd, na.rm = na.rm)
          else if (is.vector(x))
              sqrt(var(x, na.rm = na.rm))
          else if (is.data.frame(x))
              sapply(x, sd, na.rm = na.rm)
          else sqrt(var(as.vector(x), na.rm = na.rm))
      }
      . . .
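The per-column result of `sd(volcano)` follows from the function body printed on this slide. Current R releases define `sd()` differently and, for a matrix, return a single value for all elements, so the output may not match on a recent installation. A minimal sketch that asks for each result explicitly and should not depend on the R version (the variable names are illustrative):

```r
data(volcano)

# One standard deviation per column (61 values), whatever sd() does with a matrix
sd_by_column <- apply(volcano, 2, sd)

# One standard deviation for all 5307 elevations together
sd_overall <- sd(as.vector(volcano))

length(sd_by_column)
sd_overall
```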

  15. > sd(as.vector(volcano))
      [1] 25.83233
      > summary(as.vector(volcano))
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         94.0   108.0   124.0   130.2   150.0   195.0
      > volcano.v <- as.vector(volcano)
      > dim(volcano.v)
      NULL
      > length(volcano.v)
      [1] 5307
      > 61*87
      [1] 5307
      > volcano.v[1:87] == volcano[,1]
       [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE . . .
      . . .
      [87] TRUE
      > volcano.v[1:61] == volcano[1,]
      . . . only three values (out of 61) show "TRUE"
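The two comparisons above behave differently because `as.vector()` unrolls a matrix column by column. A short sketch of that ordering, with illustrative variable names:

```r
data(volcano)

by_column <- as.vector(volcano)     # columns stacked one after another
by_row    <- as.vector(t(volcano))  # transpose first to stack the rows instead

all(by_column[1:87] == volcano[, 1])  # first 87 values are column 1
all(by_row[1:61] == volcano[1, ])     # after transposing, first 61 values are row 1

# Position of element [i, j] in the column-major vector: i + (j - 1) * nrow
i <- 2
j <- 3
by_column[i + (j - 1) * nrow(volcano)] == volcano[i, j]
```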

  16. > plot(volcano)  ## not useful: it only shows that the elevations in columns 1 and 2 tend to be correlated
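When `plot()` is given a matrix, it uses the first column as x and the second as y, which is why only columns 1 and 2 appear. A minimal sketch that makes this explicit and puts a number on the association (the variable names are illustrative):

```r
data(volcano)

col1 <- volcano[, 1]
col2 <- volcano[, 2]

# Pearson correlation between the two columns that plot(volcano) displays
r <- cor(col1, col2)
r

# plot(col1, col2) would draw the same scatterplot as plot(volcano)
```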

  17. > plot(volcano)
      > plot(volcano.v, pch=20)
      > hist(volcano, prob=TRUE,
      +   xlab="volcano elevation (m)")
      > x <- seq(90,200,1)
      > curve(dnorm(x, mean=mean(volcano.v),
      +   sd=sd(volcano.v)), add=TRUE)
      > shapiro.test(volcano.v)
      Error in shapiro.test(volcano.v) :
        sample size must be between 3 and 5000
      > smpl <- sample(volcano.v, 5000)
      > shapiro.test(smpl)
              Shapiro-Wilk normality test
      data:  smpl
      W = 0.9358, p-value < 2.2e-16
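Because `sample()` draws at random, the W statistic will differ slightly from run to run. A minimal sketch of a reproducible version, assuming an arbitrary seed (the value 42 is chosen only for illustration):

```r
data(volcano)
volcano.v <- as.vector(volcano)

set.seed(42)                     # arbitrary seed, fixes the random draw
smpl <- sample(volcano.v, 5000)  # 5000 values, without replacement (the default)
res  <- shapiro.test(smpl)       # accepts between 3 and 5000 values

res$statistic  # the W statistic
res$p.value    # the p-value
```

The test result is a list of class "htest", so its components can be extracted and reused rather than copied from the printed output.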

  18. > library(nortest)  ## Package of Normality tests
      > ad.test(volcano)  ## Anderson-Darling
              Anderson-Darling normality test
      data:  volcano
      A = 106.2715, p-value < 2.2e-16
      > cvm.test(volcano)  ## Cramer-von Mises
      > lillie.test(volcano)  ## Lilliefors
      > pearson.test(volcano)  ## Pearson (Chi2)
      > sf.test(smpl)  ## Shapiro-Francia
      > qqnorm(volcano.v)
      > qqline(volcano.v, col="red")

  19. > x <- 10*(1:nrow(volcano))  ## 10, 20, ..., 870
      > y <- 10*(1:ncol(volcano))  ## 10, 20, ..., 610
      > image(x, y, volcano)

  20. > x <- 10*(1:nrow(volcano))
      > y <- 10*(1:ncol(volcano))
      > image(x, y, volcano)
      > image(x, y, volcano, asp=1)

  21. > x <- 10*(1:nrow(volcano))
      > y <- 10*(1:ncol(volcano))
      > image(x, y, volcano)
      > image(x, y, volcano, asp=1)
      > image(x, y, volcano, asp=1,
      +   col = terrain.colors(100),
      +   axes = FALSE)

  22. > x <- 10*(1:nrow(volcano))
      > y <- 10*(1:ncol(volcano))
      > image(x, y, volcano)
      > image(x, y, volcano, asp=1)
      > image(x, y, volcano, asp=1,
      +   col = terrain.colors(100),
      +   axes = FALSE)
      > contour(x, y, volcano,
      +   levels = seq(90, 200, by=5),
      +   add = TRUE, col = "peru")

  23. > x <- 10*(1:nrow(volcano))
      > y <- 10*(1:ncol(volcano))
      > image(x, y, volcano)
      > image(x, y, volcano, asp=1)
      > image(x, y, volcano, asp=1,
      +   col = terrain.colors(100),
      +   axes = FALSE)
      > contour(x, y, volcano,
      +   levels = seq(90, 200, by=5),
      +   add = TRUE, col = "peru")
      > image(x, y, volcano, asp=1,
      +   col = terrain.colors(100),
      +   axes = FALSE)
      > contour(x, y, volcano,
      +   levels = seq(90, 200, by=10),
      +   add = TRUE, col = "peru")
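The final map from these slides can be collected into a single script that writes to a file instead of the screen. This is a sketch only: the file name, the PDF device and the page size are assumptions made for the example, not part of the lecture.

```r
data(volcano)

x <- 10 * (1:nrow(volcano))  # 87 grid positions along the rows
y <- 10 * (1:ncol(volcano))  # 61 grid positions along the columns

# image() needs one x value per row and one y value per column of the matrix
stopifnot(length(x) == nrow(volcano), length(y) == ncol(volcano))

out_file <- file.path(tempdir(), "volcano_map.pdf")  # hypothetical file name
pdf(out_file, width = 8.7, height = 6.1)             # size in inches, arbitrary
image(x, y, volcano, asp = 1,
      col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano,
        levels = seq(90, 200, by = 10),
        add = TRUE, col = "peru")
dev.off()  # close the device so the file is written
```

Each argument, including `asp`, may be supplied only once per call; repeating one stops the call with an error about a formal argument matched by multiple actual arguments.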

  24. Gallery of other Volcano Graphs • image + contour • persp with shading • persp • surface3d
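The code behind the gallery is not given in the transcript. As a rough sketch of how a shaded perspective plot could be produced with the base `persp()` function, assuming viewing angles, colour and an output file chosen only for illustration:

```r
data(volcano)

x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))

out_file <- file.path(tempdir(), "volcano_persp.pdf")  # hypothetical file name
pdf(out_file)
persp(x, y, volcano,
      theta = 135, phi = 30,  # horizontal and vertical viewing angles
      scale = FALSE,          # keep the true x:y:z proportions
      col = "green3",         # surface colour
      shade = 0.75,           # shading from a virtual light source
      border = NA,            # no grid lines on the facets
      box = FALSE)            # no bounding box
dev.off()
```

Setting `shade` to a value above zero is what gives the lit-surface look; leaving it out produces the plain wireframe-style `persp` plot.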

  25. More Classical Graphs • Histogram + Theoretical curve • Boxplot • Stripchart • Pie chart • Barplot • 3D models
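The slide shows these graph types without code. The following is a sketch of how four of them might be drawn from the volcano data with base graphics; the 10 m elevation bands, the choice of columns for the stripchart, and the output file are all assumptions for illustration.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Group elevations into 10 m bands for the barplot (band width is arbitrary)
bands  <- cut(volcano.v, breaks = seq(90, 200, by = 10))
counts <- table(bands)

out_file <- file.path(tempdir(), "volcano_classical.pdf")  # hypothetical file name
pdf(out_file, width = 8, height = 8)
par(mfrow = c(2, 2))  # 2 x 2 grid of panels

# Histogram with the matching normal density drawn on top
hist(volcano.v, prob = TRUE, xlab = "volcano elevation (m)", main = "Histogram")
curve(dnorm(x, mean = mean(volcano.v), sd = sd(volcano.v)), add = TRUE)

boxplot(volcano.v, horizontal = TRUE, main = "Boxplot")

# Stripchart of three columns picked only as examples
stripchart(list(col1 = volcano[, 1], col31 = volcano[, 31], col61 = volcano[, 61]),
           method = "jitter", main = "Stripchart")

barplot(counts, las = 2, main = "Barplot")
dev.off()
```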
