R for Data Analysis and Data Mining
Jianping Liu
Mar 19, 2014
Outline
• R and RStudio installation
• Basics of R: data types and operators
• R for statistical analysis and data mining
What is R?
• “a language and environment for statistical computing and graphics”; a combination of a statistical package (interactive statistical analysis) and a programming language
• A dialect of the S language, which was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks starting in the 1970s; R itself dates from the 1990s
• Runs on multiple platforms and various devices: MacOS, Windows, Linux, PC, iPhone …
• Frequent releases and bug fixes; active development
• Free
Installation of R and Resources online
• http://www.r-project.org/ # R download & installation
• http://cran.r-project.org/doc/manuals/R-intro.html
• http://www.rstudio.com/ # RStudio installation
• http://www.rseek.org/ # web-based R search
• http://www.ats.ucla.edu/stat/r/ # statistical analysis examples
• http://www.rdatamining.com/ # data mining examples
• http://www.coursera.org # R Programming course, starts 4/7/2014
The uses of R
• R may be used as a calculator
• R provides numerical and graphical summaries of data
• R has extensive graphical abilities
• R handles a variety of specific analyses
• R is an interactive programming language
Books:
• Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
• S Programming (Statistics and Computing) by Brian D. Ripley and William N. Venables (Springer)
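The first two uses can be sketched in a few lines. This is a minimal illustration, assuming the built-in mtcars data set that ships with R; the variable names are chosen here for illustration only.

```r
# R as a calculator
calc <- 2 + 3 * 4      # usual operator precedence: multiplication first
root <- sqrt(16)       # built-in math functions

# Numerical summaries of a built-in data set (mtcars ships with R)
mpg_summary <- summary(mtcars$mpg)   # min, quartiles, mean, max
mpg_mean    <- mean(mtcars$mpg)
mpg_sd      <- sd(mtcars$mpg)

print(calc)
print(root)
print(mpg_summary)
```

In an interactive session, hist(mtcars$mpg) would give a graphical summary of the same column.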
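Packages
• install.packages("name of the package") # install a package from CRAN
• library(pkg) # load an installed package
• detach("package:pkg") # remove a loaded package from the search path
• update.packages() # update installed packages
Example:
install.packages("sos")
library(sos)
Alert: R is case sensitive (install.packages, not Install.packages)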
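Getting help and info
• help(package = "sos") # documentation on a package
• ?'&&' # help on an operator (quote the operator)
• ??audit # keyword search, shorthand for help.search("audit")
• help.search("time series")
• library(sos)
• findFn("time series") # search help pages of contributed packages (sos)
• example(data.frame) # run the examples from a help page
• demo(lm.glm, package = "stats", ask = TRUE)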
Data Types and Basic Operations
• R has five “atomic” classes of objects:
  • Character
  • Numeric (real numbers)
  • Integer
  • Complex
  • Logical (TRUE/FALSE)
• The most basic object is a vector
• A vector contains objects of the same class: c()
• A list can contain objects of various classes: list()
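The atomic classes, vectors and lists above can be tried out as follows. This is a small sketch; all variable names are illustrative and not part of the original slides.

```r
# One vector per atomic class, built with c()
x_chr <- c("a", "b", "c")      # character
x_num <- c(1.5, 2, 3.25)       # numeric (real numbers)
x_int <- c(1L, 2L, 3L)         # integer (note the L suffix)
x_cpx <- c(1+2i, 3-1i)         # complex
x_lgl <- c(TRUE, FALSE, TRUE)  # logical

# A vector holds a single class, so mixed input is coerced
mixed <- c(1, "a", TRUE)
print(class(mixed))            # everything becomes character

# A list keeps each element's own class
lst <- list(1L, "a", TRUE, 2.5)
print(sapply(lst, class))
```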
Data Types and Basic Operations
• Matrices are vectors with a dimension attribute.
• The dimension attribute is itself an integer vector of length 2 (nrow, ncol)
• Matrices are filled column-wise by default; use byrow = TRUE to fill them row-wise
• Factors are used to represent categorical data.
• Factors can be unordered or ordered.
• One can think of a factor as an integer vector where each integer has a label.
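A short sketch of the matrix and factor points above, using only base R. The example values and names are made up for illustration.

```r
# Matrices: column-wise fill by default, row-wise with byrow = TRUE
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m_col)
print(m_row)
print(dim(m_col))              # the dimension attribute: nrow, ncol

# A matrix is a vector plus a dim attribute
v <- 1:6
dim(v) <- c(2, 3)              # v now has the same layout as m_col

# Factors: integer codes with labels (levels)
f <- factor(c("yes", "no", "yes"), levels = c("no", "yes"))
print(as.integer(f))           # underlying integer codes
print(levels(f))               # the labels

# Ordered factor: the levels have a defined order
sizes <- factor(c("low", "high", "medium"),
                levels = c("low", "medium", "high"), ordered = TRUE)
print(sizes[1] < sizes[2])     # comparisons work for ordered factors
```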
Data Types and Basic Operations
• Data frames are used to store tabular data
• They are fundamental to the use of the R modelling and graphics functions
• They are represented as a special type of list where every element of the list has to have the same length
• Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class
• Data frames are usually created by calling read.table() or read.csv()
• Can be converted to a matrix by calling data.matrix()
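The data frame points above, as a minimal sketch. The column names and values are invented for illustration, and read.csv() is given inline text here instead of a file so the example is self-contained.

```r
# A data frame: columns of different classes, all of the same length
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(9.5, 7, 8.25),
  stringsAsFactors = FALSE
)
str(df)
print(sapply(df, class))       # one class per column
print(is.list(df))             # a data frame is a special kind of list

# Convert the numeric columns to a matrix
m <- data.matrix(df[, c("id", "score")])
print(m)

# read.csv() normally takes a file name; text = is used here for a self-contained demo
csv_text <- "id,score\n1,9.5\n2,7.0\n"
df2 <- read.csv(text = csv_text)
print(df2)
```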
R for Regression Analysis
• Regression analysis is the analysis of the relationship between a response or outcome variable and another set of variables
• The relationship is expressed through a statistical model equation that predicts a response variable (also called a dependent variable or criterion) from a function of explanatory variables (also called independent variables, predictors, factors, or carriers) and parameters
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
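As a small illustration of the response and explanatory variable idea, the sketch below fits a simple linear model with lm(). It assumes the built-in cars data set (stopping distance dist versus speed), which is not mentioned in the original slides.

```r
# Response: dist (stopping distance); explanatory variable: speed
fit <- lm(dist ~ speed, data = cars)

# Estimated parameters (intercept and slope)
print(coef(fit))

# Standard errors, t-tests, R-squared
print(summary(fit))

# Predict the response for a new value of the explanatory variable
new_speed <- data.frame(speed = 15)
pred <- predict(fit, newdata = new_speed)
print(pred)
```

The formula dist ~ speed reads "dist modelled as a function of speed"; more explanatory variables can be added on the right-hand side with +.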
R for Time Series Analysis
• Introductory Time Series with R
  http://elena.aut.ac.nz/~pcowpert/ts/#RScripts
• Time Series Analysis and Its Applications: With R Examples (3rd ed.) by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2011 (package: astsa)
  http://www.stat.pitt.edu/stoffer/tsa3/
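Before turning to the books, a first look at time series objects is possible with base R alone. The sketch below is only an assumption about a reasonable starting point: it uses the built-in AirPassengers series and the stats functions decompose() and acf(), not the astsa package or the scripts from the books.

```r
# AirPassengers is a built-in monthly series of class "ts"
print(class(AirPassengers))
print(frequency(AirPassengers))   # observations per year
print(start(AirPassengers))       # first year and month

# Split the series into trend, seasonal and random components
parts <- decompose(AirPassengers, type = "multiplicative")
print(round(parts$figure, 3))     # one seasonal factor per month

# Autocorrelation function, computed without drawing a plot
ac <- acf(AirPassengers, plot = FALSE)
print(head(ac$acf))
```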
Data Mining with Rattle
# to install package rattle and load the GUI
install.packages("rattle", dependencies = c("Depends", "Suggests"))
library(rattle)
rattle()
• Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) by Graham Williams
• http://www.r-project.org/doc/bib/R-books.html
Drawbacks of R
• Limited support for dynamic or interactive graphics
• Objects must generally be stored in physical memory
• Functionality is based on consumer demand and user contributions
• Not ideal for all situations