
R for Data Analysis and Data Mining



Presentation Transcript


  1. R for Data Analysis and Data Mining, Jianping Liu, Mar 19, 2014

  2. Outline
     • R and RStudio installation
     • Basics of R: data types and operators
     • R for statistical analysis and data mining

  3. What is R?
     • “A language and environment for statistical computing and graphics”: a combination of statistical packages (interactive statistical analysis) and a programming language
     • A dialect of the S language, which was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks; R itself appeared in the 1990s
     • Runs on multiple platforms, including macOS, Windows and Linux
     • Frequent releases and bug fixes; active development
     • Free

  4. Installation of R and Online Resources
     • http://www.r-project.org/ (R download and installation)
     • http://cran.r-project.org/doc/manuals/R-intro.html (An Introduction to R)
     • http://www.rstudio.com/ (RStudio installation)
     • http://www.rseek.org/ (web-based R search)
     • http://www.ats.ucla.edu/stat/r/ (statistical analysis examples)
     • http://www.rdatamining.com/ (data mining examples)
     • http://www.coursera.org (R Programming course, starting 4/7/2014)

  5. RStudio: an integrated development environment for R

  6. The uses of R
     • R may be used as a calculator
     • R provides numerical or graphical summaries of data
     • R has extensive graphical abilities
     • R can handle a wide variety of specific analyses
     • R is an interactive programming language
     Further reading:
     • Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
     • S Programming (Statistics and Computing) by William N. Venables and Brian D. Ripley (Springer)
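A short sketch of the first two uses above: R as a calculator and numerical summaries of data. The vector `x` is made-up example data; graphical summaries such as `hist(x)` are one call away in an interactive session.

```r
# R as a calculator: ordinary arithmetic with the usual operator precedence
result <- 2 + 3 * 4      # multiplication first, so 14
print(result)
sqrt(16)                 # 4
log(100, base = 10)      # 2

# Numerical summaries of a small made-up data vector
x <- c(2, 4, 6, 8, 10)
mean(x)                  # 6
median(x)                # 6
summary(x)               # minimum, quartiles, mean and maximum
# hist(x) would draw a histogram of the same data
```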

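  7. Packages
     • install.packages("name of the package")   # download and install from CRAN
     • library(pkg)                               # load an installed package
     • detach("package:pkg")                      # remove it from the search path
     • update.packages()                          # update installed packages
     Example:
       install.packages("sos")
       library(sos)
     Alert: R is case sensitive (Install.packages() is not the same as install.packages())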
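The commands above are often combined so that a script installs a package only when it is missing. A minimal sketch, assuming a hypothetical helper named `load_or_install()`; it uses `requireNamespace()` to test for the package and `library(..., character.only = TRUE)` to load a package whose name is held in a string. The demo uses `splines`, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # needs a CRAN mirror / network access
  }
  library(pkg, character.only = TRUE)     # pkg is a string, not a bare name
  invisible(pkg)
}

load_or_install("splines")                # base package: no download required
"package:splines" %in% search()           # TRUE while the package is attached
detach("package:splines")                 # unload it again

# Case sensitivity: a capitalised name is simply a different (undefined) function
exists("Install.packages")                # FALSE
```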

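  8. Getting help and info
     • help(package = "sos")                 # documentation for a package
     • ?'&&'                                 # help on an operator
     • ??audit                               # search the help pages for "audit"
     • help.search("time series")
     • library(sos); findFn("time series")   # search help pages across CRAN packages
     • example(data.frame)                   # run the examples from a help page
     • demo(lm.glm, package = "stats", ask = TRUE)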

  9. Data Types and Basic Operations
     • R has five basic “atomic” classes of objects (a sixth, raw, is rarely used):
       • character
       • numeric (real numbers)
       • integer
       • complex
       • logical (TRUE/FALSE)
     • The most basic object is a vector
     • A vector can only contain objects of the same class: c()
     • A list can contain objects of different classes: list()
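A quick sketch of the five atomic classes and of the vector/list distinction above. The values are arbitrary; the point is that `c()` coerces mixed inputs to a single common class, while `list()` keeps each element's own class.

```r
# One value of each basic atomic class (the L suffix makes an integer literal)
chr <- "a"; num <- 3.14; int <- 5L; cpx <- 1 + 2i; lgl <- TRUE
sapply(list(chr, num, int, cpx, lgl), class)
# "character" "numeric" "integer" "complex" "logical"

# c() builds a vector: mixed inputs are coerced to one common class
v <- c(1, "a", TRUE)
v          # "1" "a" "TRUE"
class(v)   # "character"

# list() keeps every element's own class
l <- list(1, "a", TRUE)
sapply(l, class)   # "numeric" "character" "logical"
```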

  10. Data Types and Basic Operations
      • Matrices are vectors with a dimension attribute
      • The dimension attribute is itself an integer vector of length 2: (nrow, ncol)
      • Matrices are filled column-wise by default; use byrow = TRUE to fill them row-wise
      • Factors are used to represent categorical data
      • Factors can be unordered or ordered
      • One can think of a factor as an integer vector where each integer has a label
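The points above can be checked directly in R. This sketch builds the same six numbers column-wise and row-wise, then shows a factor's integer codes and labels; the "low/medium/high" values are made up for illustration.

```r
# A matrix is a vector with a dim attribute; filling is column-wise by default
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)            # $dim: 2 3
m[1, ]                   # first row: 1 3 5
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row-wise instead
m_row[1, ]               # first row: 1 2 3

# A factor stores integer codes plus a set of labels (levels)
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))
typeof(f)                # "integer"
as.integer(f)            # 1 3 2 1
levels(f)                # "low" "medium" "high"

# An ordered factor supports comparisons between levels
o <- factor(c("low", "high"), levels = c("low", "medium", "high"), ordered = TRUE)
o[1] < o[2]              # TRUE
```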

  11. Data Types and Basic Operations
      • Data frames are used to store tabular data
      • They are fundamental to the use of R's modelling and graphics functions
      • They are represented as a special type of list where every element of the list must have the same length
      • Unlike matrices, data frames can store different classes of objects in each column (just like lists); every element of a matrix must be the same class
      • Data frames are usually created by calling read.table() or read.csv(), or directly with data.frame()
      • They can be converted to a matrix by calling data.matrix()
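A small sketch of the data frame properties listed above, using made-up values. Instead of a file on disk it passes CSV text to `read.csv()` through the `text` argument; `stringsAsFactors = FALSE` is set explicitly because older R versions defaulted to converting strings to factors.

```r
# Build a data frame directly; each column may have its own class
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 score = c(90.5, 82, 77.25),
                 stringsAsFactors = FALSE)
sapply(df, class)   # id "integer", name "character", score "numeric"
is.list(df)         # TRUE: a data frame is a special kind of list
lengths(df)         # every column (list element) has length 3

# read.csv() can parse inline text as well as files (made-up CSV data)
csv <- "id,height\n1,1.62\n2,1.75\n3,1.80"
heights <- read.csv(text = csv)
str(heights)

# data.matrix() gives a numeric matrix; the character column becomes integer codes
m <- data.matrix(df)
m
```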

  12. R for Regression Analysis
      • Regression analysis is the analysis of the relationship between a response or outcome variable and another set of variables
      • The relationship is expressed through a statistical model equation that predicts a response variable (also called a dependent variable or criterion) from a function of explanatory variables (also called independent variables, predictors, factors, or carriers) and parameters
      • http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf (Faraway, Practical Regression and Anova using R)
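A minimal sketch of a regression fit in R, using the `cars` data set that ships with R: stopping distance (`dist`, the response) is modelled as a linear function of `speed` (the explanatory variable). The formula `dist ~ speed` is R's notation for the model equation, and `lm()` estimates its intercept and slope parameters.

```r
# Simple linear regression: dist = b0 + b1 * speed + error
fit <- lm(dist ~ speed, data = cars)
summary(fit)        # coefficients, standard errors, R-squared
coef(fit)           # estimated intercept (about -17.6) and slope (about 3.93)

# Predicted stopping distances for two new speeds
predict(fit, newdata = data.frame(speed = c(10, 20)))
```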

  13. R for Time Series Analysis
      • Introductory Time Series with R by P. S. P. Cowpertwait and A. V. Metcalfe (Springer); R scripts: http://elena.aut.ac.nz/~pcowpert/ts/#RScripts
      • Time Series Analysis and Its Applications: With R Examples (3rd ed.) by R. H. Shumway and D. S. Stoffer, Springer Texts in Statistics, 2011 (package: astsa); http://www.stat.pitt.edu/stoffer/tsa3/
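Both books build on R's `ts` class. A short base-R sketch (no `astsa` needed): create a monthly series from a plain vector of made-up values, extract a sub-period with `window()`, and split the built-in `AirPassengers` series into trend, seasonal and random components with `decompose()`.

```r
# A monthly series built from a plain vector (made-up values)
sales <- ts(c(5, 7, 9, 6, 8, 10, 7, 9, 11, 8, 10, 12),
            start = c(2013, 1), frequency = 12)
print(sales)
window(sales, start = c(2013, 6), end = c(2013, 8))   # June to August: 10 7 9

# AirPassengers ships with R: monthly airline passenger totals, 1949-1960
ap <- AirPassengers
frequency(ap)                                   # 12 observations per year
dec <- decompose(ap, type = "multiplicative")   # trend, seasonal, random parts
round(dec$figure, 3)                            # the 12 estimated seasonal factors
# plot(dec) draws all components in an interactive session
```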

  14. R Reference Card

  15. Data Mining with Rattle
      # to install package rattle and load the GUI
      install.packages("rattle", dependencies = c("Depends", "Suggests"))
      library(rattle)
      rattle()
      • Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) by Graham Williams
      • http://www.r-project.org/doc/bib/R-books.html
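Rattle is a point-and-click front end; the same kinds of models can also be fitted with plain R code. As a sketch (not Rattle's own generated code), here is k-means clustering, one of the clustering methods Rattle offers, run with base R's `kmeans()` on the built-in `iris` measurements. The seed and `nstart = 20` are arbitrary choices for reproducibility and stability.

```r
# k-means clustering of the iris measurements; the species labels are not used
features <- scale(iris[, 1:4])   # standardise the four numeric columns
set.seed(42)                     # k-means starts from random centres
km <- kmeans(features, centers = 3, nstart = 20)
km$size                          # number of flowers in each cluster

# Compare the discovered clusters with the known species
table(cluster = km$cluster, species = iris$Species)
```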

  16. Drawbacks of R
      • Little support for dynamic or interactive graphics
      • Objects must generally be stored in physical memory
      • Functionality is based on consumer demand and user contributions
      • Not ideal for all situations
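To make the memory point concrete: a numeric (double) vector needs 8 bytes per element, all held in RAM, so one million values take about 8 MB before any copies are made. A quick check with base R's `object.size()`:

```r
# Every R object lives in memory; object.size() reports its approximate footprint
x <- numeric(1e6)                    # one million doubles, 8 bytes each
object.size(x)                       # a little over 8,000,000 bytes (small header)
print(object.size(x), units = "Mb")  # about 7.6 Mb
```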

  17. Thank you!
