R: Packages & Data
Presented here are a number of ways to accomplish a task. Some are redundant, and some may not be the best way to accomplish it. However, some “quick & dirty” commands are useful to know for when all the “better” options aren’t working.
R Packages • What is an R package? • A series of programs bundled together • Once installed, a copy of the package lives on the computer and doesn’t need to be reinstalled • Updating R • Must reinstall packages • May lose packages that aren’t kept updated
Loading Package/Contents • To load a package • library(PackageName) • Contents of package • library(help = PackageName) • For additional documentation • http://cran.r-project.org/ • Packages > Package Name > Downloads: Reference Manual • Note: Some packages may overwrite (mask) the contents or functions of another package. When this happens, R reports it in the console as the package loads.
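As a minimal sketch of the loading commands above, the example below uses `tools`, chosen here only because it ships with R; any installed package name could be substituted.

```r
# Attach a package; "tools" is an illustrative choice that ships with R.
library(tools)

# Names of the objects the attached package makes available.
exported <- ls("package:tools")
head(exported)

# Package index (description and list of help topics).
# Assigning the result avoids opening a viewer; print(info) displays it.
info <- library(help = "tools")
```
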
Advanced: Loading Packages • To find out what packages are already installed on a computer • installed.packages() • To check if a given package is installed • is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1]) • To install a package without clicking through windows • install.packages("PackageName") • These last two commands are particularly helpful when writing functions for other users
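One way to combine the last two commands is sketched below. `ensure_package()` is a hypothetical helper name introduced for illustration, not something from the slides; it installs a package only when the check fails.

```r
# Check from the slide: is the package name among the installed packages?
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Hypothetical helper: install a package only when it is missing.
ensure_package <- function(pkg) {
  if (!is.installed(pkg)) {
    install.packages(pkg)
  }
  invisible(is.installed(pkg))
}

is.installed("stats")     # "stats" ships with R, so this is TRUE
ensure_package("stats")   # already installed, so nothing is downloaded
```

Base R also offers `requireNamespace(pkg, quietly = TRUE)`, which returns TRUE or FALSE and can serve the same checking purpose.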
Functions within a Package • To get help • ?FunctionName • ??"topic of interest" • To see the source code • FunctionName (type the name without parentheses) • To see an example • example(FunctionName)
Getting Started: Loading Files • help(topic) • ?topic • help.search("topic") • ??topic • str() • ls() • dir() • history() • library() • library(help=) • rm() • rm(list=ls()) • example() • setwd() • source() • function
Data Manipulation: Data Entry • Types of Data • Numerical, categorical, logical, factors • mode(variable) • Formats of Data • Scalar, vector/array, matrix, data frame, list • Ways to enter data • Manually • read.csv, read.table, scan • library(foreign) • library(Hmisc)
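The data types and formats listed above can be illustrated with a few manually entered objects. This is a small sketch; the variable names and values are made up for the example.

```r
# Manually entered vectors of different types (illustrative values).
num <- c(1.5, 2, 3.25)
chr <- c("low", "high", "low")
lgl <- c(TRUE, FALSE, TRUE)
fct <- factor(chr)

mode(num)   # "numeric"
mode(chr)   # "character"
mode(lgl)   # "logical"
mode(fct)   # "numeric": factors are stored as integer codes
class(fct)  # "factor"

# The same data in other formats.
m  <- matrix(1:6, nrow = 2)       # matrix: one type, rows and columns
df <- data.frame(num, chr, lgl)   # data frame: columns may differ in type
l  <- list(num, m, df)            # list: elements may be any object
```
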
Importing from SAS • Option One: • In SAS • proc export DATA=file DBMS=CSV OUTFILE="destination\name.csv"; run; • In R • read.csv()
read.csv() • Syntax • read.csv(file, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...) • file: the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). file can also be a complete URL. • header: a logical value indicating whether the file contains the names of the variables as its first line. read.csv defaults to header = TRUE. In read.table, if header is missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns. • sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns. • dec: the character used in the file for decimal points. • fill: logical. If TRUE, then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’ in ?read.table. • Additional options are available, see the documentation • Note: If you’re desperate to read in an unusual data type, see scan()
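A short sketch of the arguments described above. To stay self-contained it first writes a small temporary CSV file; the file contents and column names are invented for the example.

```r
# Create a small CSV file to read (illustrative data; last score is blank).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,2.5",
             "2,b,3.75",
             "3,a,"),
           csv_path)

# Read it back with the arguments from the syntax line spelled out.
dat <- read.csv(csv_path, header = TRUE, sep = ",", dec = ".", fill = TRUE)

str(dat)     # structure: 3 observations of 3 variables
dat$score    # the blank field is read as NA
```
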
.RData • Files with the extension .RData store objects created in R. • Store using the command save(object1, object2, file = "Storage.RData") • Access later using load("Storage.RData")
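A minimal round trip with save() and load(), assuming two made-up objects and a file placed in the session's temporary directory rather than the working directory:

```r
# Two illustrative objects to store.
object1 <- c(1, 2, 3)
object2 <- data.frame(x = 1:2, y = c("a", "b"))

# Objects are passed to save() by name, followed by the file argument.
storage <- file.path(tempdir(), "Storage.RData")
save(object1, object2, file = storage)

# Remove them, then restore them from the file.
rm(object1, object2)
loaded <- load(storage)   # load() returns the names of the restored objects
loaded
```
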
Advanced: Reading Data directly from SAS or STATA • SAS Option Two: • In SAS • libname library xport "destination\name.xpt"; • data library.data; • set data; • run; • In R (write paths with / or \\, since a single \ starts an escape sequence in R strings) • library(Hmisc) • data <- sasxport.get("destination/name.xpt") • STATA • library(foreign) • Note: the package foreign can handle multiple file types, including SAS • data.stata <- read.dta("file.dta")
Data Entry • c(…) • seq(from, to) • rep(x, times) • data.frame() • list() • matrix() • read.dta() • sasxport.get() • read.csv() • data() • data(R DataSet) • help(R DataSet) • load()
Data Information • mode() • is.character() • is.numeric() • is.logical() • is.factor() • class() • is.matrix() • is.data.frame() • names() • head() • tail() • length() • dim() • nrow() • ncol() • is.na() • dimnames() • rownames() • colnames() • unique() • describe() • levels()
Data Manipulation • It is possible to access subsets of a data item using bracketed commands (e.g. x[n]) • Options include everything-but selection (x[-n]) and multiple selections (x[1:n] or x[c(1,2,3)]) • Logical arguments can also be used (x[x > 3 & x < 5]) • Lists use a double bracketing structure (x[[n]]) • Data frame items can be called using two formats • x[["name"]] • x$name • Anything with row and column data uses a double structure to index (x[i, j])
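The bracketed forms listed above, sketched on small made-up objects:

```r
x <- c(10, 2, 4, 7, 3.5)

x[2]               # second element: 2
x[-2]              # everything but the second element
x[1:3]             # first three elements
x[c(1, 2, 3)]      # same selection, written as a vector of positions
x[x > 3 & x < 5]   # logical selection: 4 and 3.5

# Lists use double brackets to extract one element.
lst <- list(a = 1:3, b = "text")
lst[[2]]           # "text"

# Data frames: two ways to call a column, plus [row, column] indexing.
df <- data.frame(name = c("a", "b", "c"), score = c(1.5, 2.5, 3.5))
df[["score"]]
df$score
df[2, 2]           # row 2, column 2: 2.5
```
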
Data Manipulation • as.numeric() • as.logical() • as.character() • as.array() • as.data.frame() • as.matrix() • factor() • ordered() • t() • reshape() • cat() • rbind() • cbind() • merge() • sort() • order() • library(reshape) • rownames()<-c() • colnames()<-c() • na.omit() • cut()
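A brief sketch of how a few of these functions fit together (merge(), order(), cut(), as.character()). The two data frames and the age bands are invented for the example.

```r
# Two illustrative tables sharing an "id" column.
left  <- data.frame(id = c(3, 1, 2), age = c(41, 25, 33))
right <- data.frame(id = c(1, 2, 3), group = c("a", "b", "a"))

# Join on the shared key.
merged <- merge(left, right, by = "id")

# Reorder rows by age, oldest first.
merged <- merged[order(merged$age, decreasing = TRUE), ]

# Bin a numeric column into categories (a factor).
merged$age_band <- cut(merged$age, breaks = c(0, 30, 40, Inf))

# Convert the factor back to plain character labels.
as.character(merged$age_band)
merged
```
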
Character & Time Based Data • nchar() • substr() • tolower() • toupper() • chartr() • grep() • match() • %in% • pmatch() • charmatch() • sub() • strsplit() • paste() • Sys.time() • Sys.Date() • date() • as.Date() • as.POSIXct()
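Several of the character and date functions above in use. The strings and dates are made up for illustration.

```r
s <- "R Packages & Data"

nchar(s)                   # number of characters: 17
substr(s, 3, 10)           # characters 3 to 10: "Packages"
toupper(s)
sub("Data", "Files", s)    # replace the first match
strsplit(s, " ")[[1]]      # split on spaces into a character vector
paste("a", "b", sep = "-") # "a-b"

fruits <- c("apple", "kiwi", "banana")
grep("an", fruits)         # positions of elements containing "an": 3
match("kiwi", fruits)      # position of an exact match: 2
"kiwi" %in% fruits         # TRUE

# Dates: parse, reformat, and subtract.
d <- as.Date("2024-03-15")
format(d, "%d/%m/%Y")                    # "15/03/2024"
as.numeric(as.Date("2024-03-20") - d)    # difference in days: 5
```
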
Data Export • ftable() • format() • paste() • xtable() • write.table(data, "clipboard", sep="\t", col.names=NA) • write.csv() • write.foreign() • write.dta() • sink() • save() • print() • save.image()
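A minimal export sketch using write.csv(), assuming a made-up data frame and a temporary output file. Reading the file back is only there to confirm what was written.

```r
# Illustrative results to export.
results <- data.frame(id = 1:3, score = c(2.5, 3.75, 1))

# Write to a CSV file without the row-name column.
out_path <- tempfile(fileext = ".csv")
write.csv(results, out_path, row.names = FALSE)

# Inspect the raw file, then read it back as a data frame.
readLines(out_path)
round_trip <- read.csv(out_path)
```
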
format() • Syntax • format(x, trim = FALSE, digits = NULL, nsmall = 0L, justify = c("left", "right", "centre", "none"), width = NULL, na.encode = TRUE, scientific = NA, big.mark = "", big.interval = 3L, small.mark = "", small.interval = 5L, decimal.mark = ".", zero.print = NULL, drop0trailing = FALSE, ...) • x: any R object • trim: logical. If FALSE, numbers are right-justified to a common width; if TRUE, the leading blanks for justification are suppressed. • digits: how many significant digits should be used. • justify: whether a character vector should be left-justified (the default), right-justified, centred, or left alone. • See also • format.Date (methods for dates) • format.POSIXct (date-times)
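A few calls that exercise the arguments described above, with illustrative input values. format() always returns character strings.

```r
format(3.14159, digits = 3)          # "3.14"
format(2, nsmall = 2)                # at least two decimals: "2.00"

format(c(1, 10, 100))                # padded to a common width
format(c(1, 10, 100), trim = TRUE)   # padding suppressed

format(1234567, big.mark = ",")      # "1,234,567"

format(c("a", "bbb"), justify = "right")   # "  a" "bbb"
format("a", width = 5)                     # padded on the right to width 5
```
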
Advanced Packages to try • gtools • reshape • Journal of Statistical Computing • http://stat-computing.org/ • Journal of Statistical Software • http://www.jstatsoft.org/