R Lecture 5 Naomi Altman Department of Statistics
Example: Regression
The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat (the code below assumes the file has been saved locally as body.txt).
?read.table
body=read.table("body.txt",header=TRUE)
plot(body$hips,body$weight)
plot(body$waist,body$weight)
?formula
lm.out=lm(weight~hips+waist,data=body)
attributes(lm.out)
Formulas
lm fits the regression of Y on a set of X variables. The response Y and the predictors are specified by a formula of the form Y ~ X1 + X2 + ... + Xk, as in weight~hips+waist above. You can also use formulas in other contexts, e.g. plot(weight~waist, data=body).
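A short sketch of formulas as objects. Because body.txt may not be at hand, it uses R's built-in mtcars data as a stand-in, with mpg, wt and hp playing the roles of weight, hips and waist; that choice of variables is mine, not the lecture's.

```r
# A formula is an ordinary R object of class "formula"; it is not evaluated
f <- mpg ~ wt + hp
class(f)       # "formula"
all.vars(f)    # "mpg" "wt" "hp"

# The same formula object can be handed to different functions
fit <- lm(f, data = mtcars)
coef(fit)

# update() edits a formula; here hp is dropped from the right-hand side
update(f, . ~ . - hp)   # mpg ~ wt

# plot() also accepts a formula: y variable on the left, x variable on the right
plot(mpg ~ wt, data = mtcars)
```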
Object Oriented Programming in R
or how a bunch of smart programming types made R easier to use and harder to program, at least in the eyes of a statistician
In the bad old days
If I wanted to write a function similar to something already in R, I would edit the R code:
myFun=edit(Rfun)    # Rfun stands for any existing R function
myDensity=edit(density)
Sometimes the R code would call a C or C++ program, but the code for that is also available.
But now ...
plot
boxplot
rnorm
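Printing these functions shows what changed. The sketch below assumes a current version of R, where plot and boxplot are generic functions whose bodies are only a call to UseMethod, and rnorm hands its work to compiled code through .Call. getS3method() (utils package) retrieves a method even when its package does not export it.

```r
# Printing a generic shows only its dispatcher, not the code that does the work
body(boxplot)     # a call to UseMethod("boxplot")

# The working code lives in a method; getS3method() retrieves it
bp <- getS3method("boxplot", "default")
head(deparse(bp))

# rnorm is not generic: it passes the work to C code via .Call
body(rnorm)
```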
Classes and Generic Functions
I have already mentioned that one of the attributes an R object can have is a class. A generic function is a function that looks up the class of its argument and then calls another function, called a method, to do the actual work. If the generic function is called fun and the class is called cls, the method that does the work is (almost always) called fun.cls. If there is no suitable fun.cls, then fun.default is used.
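A minimal sketch of the fun.cls naming rule, using a made-up generic describe and a made-up class "circle" (neither exists in R):

```r
# Hypothetical generic: its only job is to dispatch on the class of object
describe <- function(object, ...) UseMethod("describe")

# Method for class "circle", named fun.cls
describe.circle <- function(object, ...) {
  paste("circle of radius", object$r)
}

# Fallback used when no describe.<class> method matches
describe.default <- function(object, ...) {
  paste("object of class", class(object)[1])
}

circ <- structure(list(r = 2), class = "circle")
describe(circ)   # "circle of radius 2"
describe(1:3)    # "object of class integer"
```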
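e.g.
plot(body$hips,body$weight)   # numeric x: uses plot.default
plot(lm.out)                  # class "lm": uses plot.lm
plot.default
plot.lm                       # not exported in current R: use stats:::plot.lm or getS3method("plot","lm")
methods(plot)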
Classes
Actually, a class can be a character vector such as c("first","second"), in which "first" "inherits from", i.e. is a special case of, "second". In practice, this means that an object of class "first" has all the components of a class "second" object, but possibly some additional ones. If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.
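The search order can be checked with a made-up generic show_me and made-up classes "first" and "second"; the return values only report which method ran.

```r
# Hypothetical generic and methods for classes "second" and the default
show_me <- function(object, ...) UseMethod("show_me")
show_me.second  <- function(object, ...) "used show_me.second"
show_me.default <- function(object, ...) "used show_me.default"

# "first" inherits from "second"; with no show_me.first, dispatch
# moves along the class vector to show_me.second
a <- structure(list(), class = c("first", "second"))
show_me(a)    # "used show_me.second"

# Once show_me.first exists it is found first
show_me.first <- function(object, ...) "used show_me.first"
show_me(a)    # "used show_me.first"

# An object with neither class falls back to the default
show_me(42)   # "used show_me.default"
```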
e.g. plot uses plot.lm on an object with class "lm" and also on an object with class c("glm","lm").
'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument.
glm.out=glm(weight~hips+waist,data=body)
class(glm.out)            # "glm" "lm"
inherits(lm.out,"lm")     # TRUE
inherits(glm.out,"lm")    # TRUE
inherits(lm.out,"glm")    # FALSE
inherits(glm.out,"glm")   # TRUE
plot.lm                   # in current R: stats:::plot.lm
plot.glm                  # does not exist ...
plot(glm.out)             # ... so plot.lm is used
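A runnable version of these checks, assuming mtcars as a stand-in for the body data. The which = TRUE argument of inherits() (not used in the lecture) reports where each class sits in the class vector, and getS3method(..., optional = TRUE) returns NULL when no method exists.

```r
lm.out  <- lm(mpg ~ wt + hp, data = mtcars)
glm.out <- glm(mpg ~ wt + hp, data = mtcars)   # default gaussian family

class(glm.out)               # "glm" "lm"
inherits(glm.out, "lm")      # TRUE
inherits(lm.out, "glm")      # FALSE

# Position of each class in class(glm.out); 0 means absent
inherits(glm.out, c("lm", "glm", "nls"), which = TRUE)   # 2 1 0

# There is no plot.glm method, so plot(glm.out) dispatches to plot.lm
is.null(getS3method("plot", "glm", optional = TRUE))     # TRUE
```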
unclass
If you remove the class, most objects are just lists.
lm.out
unclass(lm.out)
For example, "lm" objects are lists with the following components:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
Some of these components are obvious. Some of them, such as "qr", are matrix computations that can be used to compute, e.g., the leverages and Cook's distance (notice that these have not been stored). Some of them, such as "xlevels", are often empty; they are used primarily when a predictor variable is a factor (ANOVA).
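As a sketch of the remark about leverages and Cook's distance, the code below recomputes both from stored components (qr, residuals, rank, df.residual) and compares them with R's hatvalues() and cooks.distance(). It assumes mtcars as a stand-in for the body data and an unweighted fit, for which leverage h_i is the sum of squares of row i of Q and D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2).

```r
lm.out <- lm(mpg ~ wt + hp, data = mtcars)
names(unclass(lm.out))   # the stored components

# Leverages: row sums of squares of the thin Q factor from the stored QR
Q   <- qr.Q(lm.out$qr)
lev <- rowSums(Q^2)

# Cook's distance from residuals e, leverages h, rank p and s^2 = RSS / df
e    <- lm.out$residuals
p    <- lm.out$rank
s2   <- sum(e^2) / lm.out$df.residual
cook <- e^2 * lev / (p * s2 * (1 - lev)^2)

all.equal(unname(lev),  unname(hatvalues(lm.out)))
all.equal(unname(cook), unname(cooks.distance(lm.out)))
```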
Why use classes
For the user: less to think about, e.g. you can try generic functions like plot and summary on any output.
For the programmer: classes provide a framework, e.g. you might write plot.myfun and summary.myfun methods for the function you are writing. Also, you can use inheritance so that you do not need to write every method yourself.
Generic Functions
Functions that act on many different types of objects are termed "generic functions". Examples include plot, print, summary, coefficients, anova and residuals.
Generic Functions
We have already seen that generic functions behave differently for different classes. The idea is that the user should not have to remember a lot of different function names. Generic functions are a "good thing" when you want R to do what someone else thinks it should do, and a "bad thing" when you are trying to do something else with your data.
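For instance, summary() gives quite different output depending on the class of its argument (a quick illustration on small made-up vectors and the built-in mtcars data):

```r
summary(c(1, 2, 3, 10))             # numeric: quartiles, mean, range
summary(factor(c("a", "b", "a")))   # factor: counts per level
s <- summary(lm(mpg ~ wt, data = mtcars))
class(s)                            # "summary.lm", printed by print.summary.lm
```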
Generic Functions
The form of the generic function "genfun" is
genfun=function(object, ...) {
  UseMethod("genfun")
}
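One consequence of this form is worth a quick check: UseMethod does not return to the generic, so code placed after it never runs. The genfun.default method below is hypothetical.

```r
genfun <- function(object, ...) {
  UseMethod("genfun")
  print("never reached")   # UseMethod does not return, so this never runs
}
genfun.default <- function(object, ...) "dispatched to genfun.default"

genfun(1)   # "dispatched to genfun.default"
```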
Generic Functions
We can use UseMethod to give aliases to the same function; all three below dispatch to the same genfun.cls methods.
genfun=function(object, ...) { UseMethod("genfun") }
gen=function(object, ...) { UseMethod("genfun") }
gfun=function(object, ...) { UseMethod("genfun") }
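A small sketch showing that the aliases reach the same methods; genfun.myclass, genfun.default and the class "myclass" are made up for illustration.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias
gfun   <- function(object, ...) UseMethod("genfun")   # alias

genfun.myclass <- function(object, ...) "genfun.myclass ran"
genfun.default <- function(object, ...) "genfun.default ran"

x <- structure(1, class = "myclass")
genfun(x)   # "genfun.myclass ran"
gen(x)      # same method
gfun(2)     # no class attribute on 2: "genfun.default ran"
```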
Generic Functions
If you want an argument other than the first to be the one whose class controls the generic function, then that argument itself must be passed to UseMethod as its second argument:
genfun=function(x,y,z,...){
  UseMethod("genfun",z)
}
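A sketch with hypothetical methods: the object z is passed to UseMethod, dispatch follows its class, and the chosen method still receives x, y and z. The class "swap" is made up.

```r
genfun <- function(x, y, z, ...) UseMethod("genfun", z)

genfun.default <- function(x, y, z, ...) x + y
genfun.swap    <- function(x, y, z, ...) y - x

flag <- structure(TRUE, class = "swap")
genfun(1, 10, flag)   # 9: class of z is "swap"
genfun(1, 10, TRUE)   # 11: no genfun.logical, so the default is used
```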
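Generic Functions
If UseMethod finds that the object has a class, it searches for a function "genfun.class", where class is the first element of the class vector. If no function matches, it works through the remaining classes in the inheritance list. If there is still no match, the function "genfun.default" is used. Objects without a class attribute are dispatched in the same way on their implicit class (e.g. "matrix", "integer", "numeric").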
Generic Functions
There is a lot more on this in "S Poetry"; it looks very complete to me. I have been writing programs in S/R since 1981 and have not needed to create classes or methods, but ...
Generic Functions
I have often used an existing function to create new functions, and I have been confused when I did not understand generic functions (especially "summary" and "print"). One way to become well known is to distribute your methodology as an R package. To be distributed from CRAN or other project repositories, your package must adhere to R programming standards.
Generic Functions
Some of the newer packages (particularly packages for bioinformatics) rely heavily on generic functions, and you cannot understand what they are doing without understanding at least the basics of this material.
Slots
I was not able to find an intuitive definition for "slot", so this is my own heuristic. An object is a list with a class. A slot is a function that extracts data from an object. It may be one of the elements stored in the object, or a derived data element.
Slots
For example, an lm object is a list with the components:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
We might build a new class, "Elm" (extended "lm").
Slots
Suppose we wanted to write a method that draws a histogram of any of: the dependent variable, the residuals, the studentized residuals or the fitted values. Since the generic is hist(x, ...), we could have a method of the form:
hist.Elm=function(x,slot,...)
Our slots would be: "dependent", "residuals", "student", "fitted"
Slots
If we set class(lm.out)=c("Elm","lm"), then hist(lm.out,"residuals") would extract the residuals from the list and draw the histogram, and hist(lm.out,"student") would compute the studentized residuals (which are not stored) and draw the histogram.
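A minimal sketch of the hypothetical hist.Elm method, assuming mtcars as a stand-in for the body data. Slot names are passed as quoted strings; "student" uses rstudent(), i.e. externally studentized residuals (rstandard() would give the internally studentized version). The drawing itself is done by hist.default, because the extracted values are a plain numeric vector.

```r
# Hypothetical method for the made-up class "Elm" (extended "lm")
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # taken from the stored model frame
    residuals = residuals(x),                    # stored component
    student   = rstudent(x),                     # computed on demand, not stored
    fitted    = fitted(x))                       # stored component
  # values is a plain numeric vector, so this call dispatches to hist.default
  hist(values, main = paste("Histogram of", slot), xlab = slot, ...)
}

lm.out <- lm(mpg ~ wt + hp, data = mtcars)
class(lm.out) <- c("Elm", "lm")   # "Elm" inherits everything else from "lm"
hist(lm.out, "residuals")
hist(lm.out, "student")
```

Functions such as rstudent() and fitted() still work on lm.out because dispatch falls through from "Elm" to the existing "lm" methods.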
Slots
By convention, the slots of an object can be extracted either by:
objectname@slotname
or
slotname(objectname)
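The objectname@slotname form belongs to R's formal (S4) classes, where slots are declared when the class is defined. The sketch below uses a made-up class "Fit" and a made-up accessor coefs() to show both conventions; setClass(), new(), slot(), slotNames(), setGeneric() and setMethod() come from the methods package.

```r
library(methods)

# Hypothetical S4 class with two declared slots
setClass("Fit", slots = c(coefs = "numeric", formula = "character"))
fit <- new("Fit", coefs = c(a = 1.5, b = -0.2), formula = "y ~ x")

fit@coefs              # direct access: objectname@slotname
slot(fit, "formula")   # the same idea, by name
slotNames("Fit")       # "coefs" "formula"

# Accessor convention: a generic with the slot's name, slotname(objectname)
setGeneric("coefs", function(object) standardGeneric("coefs"))
setMethod("coefs", "Fit", function(object) object@coefs)
coefs(fit)
```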
Slots
Again, I have used S/R for many years without writing or even encountering slots. But some of the recent packages use this programming concept, so it is important to understand it. My understanding is that slots are used primarily in areas like data mining and microarrays, where the data storage requirements are large.
Learning to Use Objects and other Extensions
Calling C or C++ from R: Writing R Extensions
Object oriented programming in R (S3 protocol)
R Language Definition (S4 protocol)
R Internals