Introduction to R Piotr Wolski
Topics
• What is R?
• Sample session
• How to install R?
• Minimum you have to know to work in R
• Data objects in R and how to manipulate them
• Exercises
What Is R?
• A programming "environment", in fact a programming language
• Operated through the command line, not by point and click
• A rather relaxed approach to the term GUI: the R GUI is in fact an interface to the command line
• Object-oriented
• Free and open-source software, distributed under the GNU General Public License (not merely freeware)
• Cross-platform (Windows, Linux, Mac)
• Scriptable, and thus well suited to analysing large datasets
• Good with matrices and multidimensional arrays
• Excellent graphics capabilities
• Supported by a large user network (you can always ask for help online, or search through mailing list archives)
• Contributed packages provide a multitude of procedures
Where does R come from?
• R started in the early 1990s as a project by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, intended to provide a statistical environment for their teaching lab. The lab had Macintosh computers, for which no suitable commercial environment was available.
• It is based on an earlier statistical programming language called S
Installing R
• Download from CRAN (Comprehensive R Archive Network): http://cran.r-project.org/
• …follow the instructions
• This will install the R engine, the GUI and the base packages
• Extra packages/libraries can be downloaded and installed from within R (easy), or from the CRAN website (not so easy)
R GUI and environment
• The R GUI offers some administrative options, but all analyses are done through the command line or scripts
• The working directory is where R reads and writes files by default
• The working directory depends on where you invoke R from, but can be changed during a session
• An R session begins when you start R
• Data generated during the session are held in a workspace, which can be saved to a file
• Only one workspace per session
• You can import data from other workspace files into the current workspace
• You do not see data (objects) unless you issue a command to display them
R command line
• You have to type
• Basic syntax:
  > command [enter]
• Two "types" of commands:
  > function() [enter]
  runs a function
  > object [enter]
  returns the object (prints the object's contents to the screen)
• Since a function in R is also an object:
  > function [enter]
  will display the function, but won't execute it!
• The up and down arrows recall the previous/next command
• There is also a history, a list of all issued commands, accessed from the menu
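The difference between running a function and printing it can be tried directly. This is a minimal sketch; the object name `total` is made up for the illustration.

```r
# Calling a function: the parentheses make R execute it
total <- sum(1, 2, 3)

# Typing an object's name prints its contents
total

# A function name without parentheses is just another object:
# R displays the function instead of running it
sum

# Functions are objects like any other
is.function(sum)
```

Forgetting the parentheses is a common reason for getting a screen full of code instead of a result.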
Creation of objects
• By assignment
• "<-" is used to indicate assignment
  > x <- c(1,2,3,4,5,6,7)
  > x <- c(1:7)
  > x <- b
  > x <- "b"
  > x <- -2
  > x <- read.table("data.txt")
• Special case: empty vector
  > x <- c()
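A short sketch of what these assignments actually produce. The object `b` and the other names below are hypothetical and exist only for this example.

```r
b <- c(10, 20, 30)     # hypothetical object, created only for this example

from_object <- b       # copies the contents of the object b
from_string <- "b"     # a character vector holding the letter b

neg <- -2              # assigns minus two

# Spacing matters around "<-"
x <- 5
is_less <- x < -2      # a comparison: is x less than -2?
x<-2                   # an assignment: x is now 2

# c() with no arguments returns NULL, which has length 0
empty <- c()
is.null(empty)
length(empty)
```

Writing `x <- -2` with spaces around the arrow avoids confusing an assignment with a comparison.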
Naming Convention
• Names of objects must start with a letter (A-Z or a-z), or with a period that is not followed by a digit
• They can contain letters, digits (0-9), periods "." and underscores "_"
• Names are case-sensitive: mydata is different from MyData
• Names of objects are written without quotation marks
  "myData" is a one-element vector, and that element is a string
  myData is an object, and it can be a vector, array etc.
• Balance between length and meaning:
  X or tmin
  B or tmin.clim
  Climatological.mean.of.monthly.minimum.temperature or tmin.clim
Managing workspace
• During an R session, all objects are stored in a temporary working memory, or workspace
• List objects in the workspace
  > ls()
• Remove objects from the workspace
  > rm(object)
  > rm(list=c("object1","object2"))
  > rm(list=ls())
• Objects that you want to access later must be saved in a workspace file
  • from the menu bar
  • from the command line:
    > save(x,file="MyData.Rdata")
• To save all the objects:
  > save.image("MyData.Rdata")
• A previously saved workspace can be loaded with:
  > load("MyData.Rdata")
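The save, remove and load cycle can be sketched as follows. The objects `tmin` and `station` are invented for the example, and `tempdir()` stands in for a real folder so that no files are left behind.

```r
# Two hypothetical objects to work with
tmin <- c(2.1, 3.4, 5.0)
station <- "Cape Town"

ls()                                   # names of the objects in the workspace

# tempdir() is used here only so the example cleans up after itself
ws_file <- file.path(tempdir(), "MyData.Rdata")
save(tmin, station, file = ws_file)    # save selected objects to a file

rm(tmin, station)                      # remove them from the workspace
exists("tmin", inherits = FALSE)       # FALSE: the object is gone

# load() restores the objects and returns their names
loaded <- load(ws_file)
loaded
tmin
```

Note that `load()` silently overwrites any existing objects with the same names as those stored in the file.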
Managing working directory
• All interaction with permanent data storage (reading files and workspaces, saving them) takes place within the working directory
• Unless you specify the path explicitly
  > load("/data/projects/MyData.Rdata")
  > load("c:/data/projects/MyData.Rdata")
• On Windows, write paths with forward slashes or doubled backslashes ("c:\\data\\projects\\MyData.Rdata"); a single backslash inside a string starts an escape sequence
• The working directory can be checked with:
  > getwd()
• It can be changed with:
  > setwd("/new/working/directory")
  > setwd("c:/new/working/directory")
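A minimal sketch of checking and changing the working directory, and of building paths safely. Here `tempdir()` stands in for a project folder, and the path components are illustrative only.

```r
old_dir <- getwd()            # remember where we started

setwd(tempdir())              # tempdir() stands in for a project folder
getwd()                       # now reports the new location

# file.path() joins path components with forward slashes,
# which R accepts on Windows as well
data_file <- file.path("data", "projects", "MyData.Rdata")
data_file

# A Windows-style path needs doubled backslashes inside an R string
win_path <- "c:\\data\\projects\\MyData.Rdata"
cat(win_path, "\n")           # printed with single backslashes

setwd(old_dir)                # go back to the original directory
```

Storing the old directory and restoring it at the end keeps a script from leaving the session somewhere unexpected.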
How to get help?
• Within R
  > help.start()
  starts the manual/help/tutorial in a web browser
• To display help on a given function use:
  > help(function) or > ?function
  e.g. help on the function mean():
  > help(mean) or > ?mean
• To search the help database for a string and list all help pages that match it:
  > ??string
Other sources:
• CRAN website (http://cran.r-project.org/)
  • Manuals
  • FAQ
  • Contributed documents: a mine!
• Rseek it: http://rseek.org/
R object types
• Vector
• Array (with special case: matrix)
• Data frame
• List
• Factor
• Function
Vector
• A sequence of values (one-dimensional)
• Only one mode (numeric, character, complex, or logical) allowed
• Can be created using the concatenate function: c()
  > x <- c(1,2,5,2,1)
  > y <- c("may","june","july","august","september")
• A vector has a length:
  > length(x)
• Logical vectors:
  > b <- c(TRUE,TRUE,FALSE,FALSE,TRUE)
  > b <- x>4
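The "only one mode" rule is easiest to see by mixing types. This sketch reuses `x` from the slide; the vector `mixed` is made up for the illustration.

```r
x <- c(1, 2, 5, 2, 1)
mode(x)                      # "numeric"
length(x)                    # 5

# Mixing types does not fail: everything is converted to one mode
mixed <- c(1, "may", TRUE)
mixed                        # "1" "may" "TRUE"
mode(mixed)                  # "character"

# A comparison returns a logical vector of the same length
b <- x > 4
b                            # FALSE FALSE TRUE FALSE FALSE
sum(b)                       # 1, because TRUE counts as 1
```

Silent conversion to character is a frequent cause of numbers that refuse to be added up after a data import.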
Working with vectors
• Select only one element
  > x[2]
• Select a range of elements
  > x[1:3]
• Select all but one element
  > x[-3]
• Slicing: including only part of the object using an index vector
  > x[c(1,2,5)]
• Select elements based on a logical condition
  > x[x>3]
  > x[y=="july"]
• Reversing a vector
  > x[10:1]   (only if x has exactly 10 elements)
  > x[length(x):1]
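The selections above, run on the `x` and `y` vectors defined on the Vector slide. The use of `rev()` at the end is an addition here, not part of the original slide.

```r
x <- c(1, 2, 5, 2, 1)
y <- c("may", "june", "july", "august", "september")

x[2]                 # 2
x[1:3]               # 1 2 5
x[-3]                # 1 2 2 1
x[c(1, 2, 5)]        # 1 2 1
x[x > 3]             # 5
x[y == "july"]       # 5, the element of x in the position where y is "july"

# Reversing: index from the last element down to the first
y[length(y):1]
rev(y)               # base R shortcut that gives the same result

# Asking for positions that do not exist gives NA rather than an error
x[10:1]
```

Because out-of-range indices return `NA` silently, `length(x):1` is safer than a hard-coded range.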
Working with vectors
• Create a sequence of numbers
  > seq(10)
  > seq(1, 10)
  > seq(1, 100, 5)
• Repeating a whole vector
  > rep(seq(3), 10)
• Repeating each element of a vector in turn
  > rep(seq(3), each=10)
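What these calls return, as a short sketch. The names `s1`, `s2`, `r1` and `r2` are made up for the example.

```r
s1 <- seq(10)                  # 1 2 3 ... 10
s2 <- seq(1, 100, 5)           # from 1 to at most 100 in steps of 5
s2                             # 1 6 11 ... 96
length(s2)                     # 20

r1 <- rep(seq(3), 10)          # 1 2 3 1 2 3 ... (whole vector, 10 times)
r2 <- rep(seq(3), each = 10)   # ten 1s, then ten 2s, then ten 3s

length(r1)                     # 30
length(r2)                     # 30
```

Both `rep()` calls give 30 values; only the order differs.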
R magic - vector arithmetic
• Arithmetic operations are performed on EACH value of a vector
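A small illustration of an operation applied to every value at once. The temperatures in `tmin_c` are invented numbers used only for this sketch.

```r
# Hypothetical minimum temperatures in degrees Celsius
tmin_c <- c(12.5, 14.0, 9.5, 11.0)

# One expression converts every value to Fahrenheit, no loop needed
tmin_f <- tmin_c * 9 / 5 + 32
tmin_f                     # 54.5 57.2 49.1 51.8

# Functions such as sqrt() also work on each value
sqrt(c(1, 4, 9))           # 1 2 3
```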
R magic - vector arithmetic
• Vector operations are performed element by element
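With two vectors of equal length, the first element is combined with the first, the second with the second, and so on. A minimal sketch with made-up vectors `a` and `b`:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

a + b        # 11 22 33 44
a * b        # 10 40 90 160
b / a        # 10 10 10 10
a ^ 2        # 1 4 9 16

# Element-wise products can then be summarised in the usual way
sum(a * b)   # 300
```

Note that `*` is element-wise multiplication, not a matrix product.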
R magic - vector arithmetic
• R recycles vector elements
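Recycling means that when two vectors differ in length, the shorter one is reused from its start until it matches the longer one. A sketch with made-up vectors; `tryCatch()` is used here only to capture the warning text as a value.

```r
long <- 1:6
short <- c(10, 20)

# short is reused three times: 10 20 10 20 10 20
recycled <- long + short
recycled                   # 11 22 13 24 15 26

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
msg <- tryCatch(long + c(10, 20, 30, 40),
                warning = function(w) conditionMessage(w))
msg
```

Adding a single number to a vector is the same mechanism: the one-element vector is recycled across all positions.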
Vector functions
• Basic statistics
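As an illustration, the following sketch applies some commonly used base R summary functions to a small made-up vector `v`. The selection of functions is an assumption about what "basic statistics" covers here.

```r
v <- c(4, 8, 15, 16, 23, 42)   # made-up values

length(v)     # 6
sum(v)        # 108
mean(v)       # 18
median(v)     # 15.5
min(v)        # 4
max(v)        # 42
range(v)      # 4 42

v_var <- var(v)    # sample variance: 182
v_sd  <- sd(v)     # standard deviation, the square root of the variance

# Quantiles take a probability between 0 and 1
q25 <- quantile(v, 0.25)
q25
```

Each function takes the whole vector and returns a single summary value, except `range()`, which returns the minimum and maximum together.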
Exercise 1:
• Create a sequence from 1 to 100.
• Create the following sequence: 99, 96, 93, …, 0
• Create a sequence of values 1, 2, …, 12, repeated 10 times
• Create a vector of 30 numbers (just use the "random number generator" in your head ;-)
• Calculate its mean, standard deviation, variance, minimum, maximum and sum of all values
• Calculate the median and the 5th percentile
• Calculate the minimum value of the first half of the vector (i.e. of the first 15 values), and of the second half (i.e. of the last 15 values)
• Select every second value from that vector, and calculate their mean value