Introduction to Exploratory Descriptive Data Analysis in S-Plus
Jagdish S. Gangolly
State University of New York at Albany
S-Plus in Unix & MS-Windows
• To start S-Plus in Solaris/CDE:
  • Create a directory, say, s: mkdir s
  • Go to that directory: cd s
  • Initialise it as a new S-Plus chapter: splus CHAPTER
  • Start S-Plus: splus
S-Plus in Unix & MS-Windows
• To invoke a graphics window: motif()
• To invoke the help system (Java-based): help.start()
• To quit the S-Plus shell: q() or Ctrl-D
The S-Plus prompt is >.
Simple Structures I: Arithmetic Operators
• Arithmetic operators: *, /, +, and -.
• Avoid ambiguity by using parentheses, e.g., (7+2)*3 = 27, whereas 7+2*3 = 13.
• Multiplication and division are evaluated before addition and subtraction.
• Raising to a power (^ or **) takes precedence over all of these.
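A short sketch of the precedence rules above, in S syntax (we assume it runs unchanged in R, the open-source S dialect). The variable names p1 to p6 are only illustrative.

```r
# Precedence: ^ (or **) first, then * and /, then + and -
p1 <- 7 + 2 * 3     # 7 + 6 = 13: multiplication before addition
p2 <- (7 + 2) * 3   # 9 * 3 = 27: parentheses override the default order
p3 <- 2 * 3^2       # 2 * 9 = 18: the power is taken first
p4 <- (2 * 3)^2     # 6^2 = 36
p5 <- 2**3          # ** is an alternative spelling of ^
p6 <- -2^2          # -(2^2) = -4: ^ binds tighter than unary minus
print(c(p1, p2, p3, p4, p5, p6))
```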
Simple Structures II: Assignments
• Assignments: x <- 3, 3 -> x, x_3, or x = 3. Using the underscore or the equals sign for assignment is not a good idea.
• To see the value of a variable x: x or print(x)
• To remove a variable x: rm(x)
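The assignment, printing, and removal commands can be tried in one short session. This is a sketch in S syntax that we assume also runs in R. It deliberately avoids the underscore form, which R does not accept.

```r
x <- 3          # leftward assignment (preferred)
4 -> y          # rightward assignment
x               # typing a name at the prompt prints its value
print(y)        # explicit printing, needed to display values inside functions and loops
rm(x)           # remove x from the working data
exists("x")     # FALSE now that x is gone
```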
Simple Structures III: Concatenation
• Concatenation with c() is used to create vectors of any length:
> x <- c(1.5, 2, 2.5)
> x
1.5 2.0 2.5
> x^2
2.25 4.00 6.25
• c() can be used with any type of data.
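To illustrate "any type of data", the sketch below builds numeric, character, and logical vectors with c(). It is written in S syntax and assumed to run in R. The last line shows that mixing types coerces every element to character.

```r
x <- c(1.5, 2, 2.5)          # numeric vector
sq <- x^2                    # arithmetic applies element by element: 2.25 4.00 6.25
w <- c("gdp", "pop")         # character vector
b <- c(TRUE, FALSE, TRUE)    # logical vector
mixed <- c(0, "ab")          # mixed types are coerced to character: "0" "ab"
print(c(mode(x), mode(w), mode(b), mode(mixed)))
```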
Simple Structures IV: Sequence
• Sequence command: seq(from, to, by)
Some examples:
seq(1, 35, 5): 1 6 11 16 21 26 31
seq(5, 15, 1.5): 5.0 6.5 8.0 9.5 11.0 12.5 14.0
seq(50, 25, -5): 50 45 40 35 30 25
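The seq() examples above can be checked directly. This sketch is in S syntax and assumed to run in R. It also shows the colon operator, which is the unit-step shorthand the slides use elsewhere (1:5).

```r
s1 <- seq(1, 35, 5)     # stops at 31 because the next value, 36, would exceed 35
s2 <- seq(5, 15, 1.5)   # non-integer steps are allowed; 15.5 would overshoot
s3 <- seq(50, 25, -5)   # a negative step counts down
s4 <- 1:5               # colon operator: shorthand for seq(1, 5, 1)
print(s1); print(s2); print(s3); print(s4)
```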
Simple Structures V: Replicate
• Replicate command rep(): generates data that follow a regular pattern.
Some examples:
rep(8, 5): 8 8 8 8 8
rep("8", 5): "8" "8" "8" "8" "8"
rep(c(0, "ab"), 2): "0" "ab" "0" "ab"
rep(1:4, 1:4): 1 2 2 3 3 3 4 4 4 4
rep(1:3, rep(2, 3)): 1 1 2 2 3 3
rep(c(1, 8, 7), length = 5): 1 8 7 1 8
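The rep() examples can be reproduced as follows. This is a sketch in S syntax that we assume also runs in R. In R the full argument name is length.out, and `length = 5` works there through partial argument matching.

```r
r1 <- rep(8, 5)                    # 8 8 8 8 8
r2 <- rep("8", 5)                  # character "8" repeated five times
r3 <- rep(c(0, "ab"), 2)           # c() has already coerced 0 to "0"
r4 <- rep(1:4, 1:4)                # element i is repeated i times
r5 <- rep(1:3, rep(2, 3))          # each element repeated twice
r6 <- rep(c(1, 8, 7), length = 5)  # recycle the pattern until 5 values exist
print(r4); print(r6)
```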
Simple Structures VI: Expressions
> x <- seq(2, 10, 2)
> y <- 1:5
> z <- ((3*x^2 + 2*y)/((x + y)*(x - y)))^(0.5)
> x
2 4 6 8 10
> y
1 2 3 4 5
> z
2.160247 2.081666 2.054805 2.041241 2.033060
The expression is evaluated element by element: z[i] uses only x[i] and y[i].
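As a check on the element-by-element evaluation, this sketch (S syntax, assumed to run in R) recomputes the first element of z by hand with x = 2 and y = 1, giving sqrt(14/3). The name z1 is only illustrative.

```r
x <- seq(2, 10, 2)
y <- 1:5
z <- ((3 * x^2 + 2 * y) / ((x + y) * (x - y)))^0.5
# first element by hand: (3*4 + 2) / (3 * 1) = 14/3
z1 <- sqrt((3 * 2^2 + 2 * 1) / ((2 + 1) * (2 - 1)))
print(round(z, 6))
```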
Simple Structures VI: Logical Operators
• <   Less than
• >   Greater than
• <=  Less than or equal to
• >=  Greater than or equal to
• ==  Equal to
• !=  Not equal to
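Applied to a vector, each comparison operator returns a logical vector with one TRUE or FALSE per element. The sketch below is in S syntax and assumed to run in R. The vector values are arbitrary.

```r
x <- c(3, 7, 10)
lt <- x < 7      # TRUE FALSE FALSE
gt <- x > 7      # FALSE FALSE TRUE
le <- x <= 7     # TRUE TRUE FALSE
ge <- x >= 7     # FALSE TRUE TRUE
eq <- x == 7     # FALSE TRUE FALSE
ne <- x != 7     # TRUE FALSE TRUE
print(rbind(lt, gt, le, ge, eq, ne))
```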
Simple Structures VII: Index Brackets
• Square brackets are used to index vectors and matrices.
> x <- seq(0, 20, 10)
> x[2]
10
> x[5]
NA
> x[c(1, 3)]
0 20
> x[-1]
10 20
• x[5] is NA because x has only three elements; a negative index drops that element.
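The indexing rules can be collected in one sketch (S syntax, assumed to run in R). The last line extends the negative-index rule to dropping several elements at once.

```r
x <- seq(0, 20, 10)   # 0 10 20
a <- x[2]             # 10: indices start at 1
b <- x[5]             # NA: index beyond the length of x
d <- x[c(1, 3)]       # 0 20: a vector of indices selects several elements
e <- x[-1]            # 10 20: a negative index drops that element
f <- x[-c(1, 3)]      # 10: drop several elements at once
print(list(a, b, d, e, f))
```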
Data Manipulation I: Frames & Matrices I
• Matrices: two-dimensional vectors (they have row and column indices)
• Arrays: the general data structure in S-Plus
  • Zero-dimensional: scalar
  • One-dimensional: vector
  • Two-dimensional: matrix
  • Three- to eight-dimensional: arrays
• The data in a matrix must all be of the same datatype (usually numeric).
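A sketch of matrices and higher-dimensional arrays, in S syntax and assumed to run in R. It also shows the single-datatype rule: when character data goes into a matrix, the whole matrix is character.

```r
m <- matrix(1:6, nrow = 2)            # 2 x 3, filled column by column
a <- array(1:24, dim = c(2, 3, 4))    # a three-dimensional array
v <- a[2, 3, 4]                       # one index per dimension: 24 (the last cell)
# c() coerces everything to character before matrix() sees it
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
print(dim(m)); print(dim(a)); print(mode(mixed))
```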
Data Manipulation I: Frames & Matrices II
• Data frames: the columns can be of different datatypes.
• Lists: the most general datatype in S-Plus; their components can be any objects, including other lists.
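The sketch below (S syntax, assumed to run in R) builds a data frame whose columns have different types, using the country figures from the slides plus a derived logical column. It then places the data frame inside a list. The names high.inflation and info are hypothetical.

```r
countries <- c("austria", "france", "germany")
gdp <- c(227, 1534, 2365)
inflation <- c(1.3, 1.2, 1.8)
# three columns of three different types: character (or factor), numeric, logical
df <- data.frame(country = countries, gdp = gdp,
                 high.inflation = inflation > 1.5)
# a list can hold objects of any kind, including a whole data frame
info <- list(title = "country data", dims = dim(df), table = df)
print(df); print(names(info))
```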
Data Manipulation I: Matrices I
• Reading data
  • S-Plus is very finicky about the format of input data.
  • To read a table: read.table("filename")
    • The first column must contain the row names.
    • The first row must contain the column names.
    • The top-left cell must be empty.
    • Spaces and tabs are the default column delimiters.
  • See the example in /db4/teach/acc522/fasb103.txt and play around with it.
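A minimal sketch of a file in the layout described above, written in S syntax and assumed to run in R. It writes the country figures from the slides to a temporary file whose header row has no top-left entry, then reads that file back. In R, that missing field makes read.table() treat the first row as column names and the first column as row names.

```r
f <- tempfile()
# header row has one fewer field than the data rows (empty top-left cell)
writeLines(c("gdp pop inflation",
             "austria 227 8 1.3",
             "france 1534 58 1.2",
             "germany 2365 82 1.8"), f)
x <- read.table(f)   # whitespace-delimited by default
print(x)
```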
Data Manipulation I: Matrices II
• read.table() and as.matrix(): read.table() returns a data frame, and as.matrix() converts it to a matrix:
  x <- read.table("filename")
  x <- as.matrix(x)
• Enter data directly: matrix(data, nrow, ncol, byrow=F)
  Example: x <- matrix(1:6, nrow=2, byrow=T)
  • dim(x): 2 3 (2 rows, 3 columns)
  • dimnames(x): NULL
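The effect of byrow can be seen by comparing rows of the same data filled both ways. This sketch is in S syntax and assumed to run in R. It also shows that as.matrix() returns a new object, so the result has to be assigned. The small data frame df is hypothetical.

```r
x <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows filled first: 1 2 3 / 4 5 6
y <- matrix(1:6, nrow = 2)                 # default byrow = FALSE: 1 3 5 / 2 4 6
print(dim(x)); print(dimnames(x))          # 2 3, then NULL until names are assigned
df <- data.frame(a = 1:2, b = 3:4)         # hypothetical data frame
m <- as.matrix(df)                         # df itself is left unchanged
print(x[2, ]); print(y[2, ])
```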
Data Manipulation I: Matrices III
• Elements of matrices are accessed by specifying the row and column indices. Example:
  data <- c(227, 8, 1.3, 1534, 58, 1.2, 2365, 82, 1.8)
  countries <- c("austria", "france", "germany")
  variables <- c("gdp", "pop", "inflation")
  country.data <- matrix(data, nrow=3, byrow=T)
  dimnames(country.data) <- list(countries, variables)
  country.data[1:2, 2:3]: pop and inflation of austria and france
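Once dimnames are set, row and column names can be used as indices in place of numbers. This sketch is in S syntax and assumed to run in R. It builds the same matrix in one call through matrix()'s dimnames argument.

```r
data <- c(227, 8, 1.3, 1534, 58, 1.2, 2365, 82, 1.8)
country.data <- matrix(data, nrow = 3, byrow = TRUE,
                       dimnames = list(c("austria", "france", "germany"),
                                       c("gdp", "pop", "inflation")))
sub <- country.data[1:2, 2:3]            # pop and inflation of austria and france
g <- country.data["germany", "gdp"]      # names work as indices too
infl <- country.data[, "inflation"]      # an empty row index selects every row
print(sub); print(g); print(infl)
```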
S-Plus Graphics I
• To open a graphics window: motif()
• You can adjust the color scheme and print options through the drop-down menus of the motif window.
• To plot two variables x and y: plot(x, y)
  Example (sine curve): plot(1:100, sin(1:100/10))
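The sine-curve example can also be sent to a file rather than a window. This sketch assumes R, where motif() is not available and pdf() serves as a file graphics device. The line type and axis labels are additions for readability.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                          # file device, standing in for the motif() window
i <- 1:100
plot(i, sin(i / 10), type = "l",
     xlab = "index", ylab = "sin(index / 10)")
invisible(dev.off())            # close the device so the file is written
```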