150 likes | 274 Views
Combining the Power of R and Excel: RExcel A LISA Short Course February 2012. Matthew Lanham Ph.D. Student, Business Information Technology M.S. Student, Statistics. As you come in, please get materials here: https ://filebox.vt.edu/users/lanham/LISA/. Motivation for this course :.
E N D
Combining the Power of R and Excel: RExcelA LISA Short CourseFebruary 2012 Matthew Lanham Ph.D. Student, Business Information Technology M.S. Student, Statistics As you come in, please get materials here: https://filebox.vt.edu/users/lanham/LISA/
Motivation for this course: Two Facts • Excel is the most prevalent software used for data storage and analysis. There are a lot of built in statistical functions in Excel along in addition to the “Analysis ToolPak.” • R is a free and open source program, and one of the most powerful and the fastest-growing statistics programs. Why not use them both together!! This is you with Excel This is you with Excel + R Outcome from this course: I hope to have provided you some examples that you might incorporate in your own work that might prove beneficial.
Lets get started: 1) Double-click the RExcel2010 with Rcommander Icon This will open Excel and Rcommander. R commander is like using the standard R GUI, but looks a bit different. You will find R in the Excel Ribbon as well.
Part 1 Transferring data between R and Excel • RExcel Drop-down • Close R – Will close the open instance of R and Rcommander as well • Run Code – Will run R code • Get R Value (Array or Dataframe) – Gets data • Put R Value (Array or Dataframe) – Defines a cell or range for R • Get R Output – Retrieves code output from R to Excel • Set R working dir – Define the folder location you want to work from on your PC. • Load R file – Used to load a data set or .R file • Copy code – copies code in Excel • Debug R – If checked, this will open a debugger if an error occurs • Error log – This will show you all the R errors • Options – Offers a few basic options • Set R sever – allows to select the server type, server name (for remote servers), and R process name (for servers from a serverpool). • RExcel Help – Takes you here: file:///C:/Program%20Files%20%28x86%29/RExcel/doc/RExcel.html • Rhelp – Takes you here: http://127.0.0.1:18357/doc/html/index.html • Rcommander – Opens Rcommander with menus in the Excel Ribbon or in Rcommander. • Demo worksheets – There are five demos for learning how to use the software • Mark calc cells – If activated, this will mark all cells containing calculated results with a special marker in the upper left corner • About RExcel • Data from Excel to R • Data from R to Excel
Part 1 Functions, Arrays, and Dataframes • Advantages • Use Excel as a container for dependencies • Use R code functions without lengthy “IF” statements • Allows automatic recalculations via Excel’s computation engine(R will not do this by itself) See RExcelExamples workbook, Part1 tab
Regression: Excel and R Part 2 Excel • Excel Functions TREND(Y-range, X-range, X-value for prediction) function LINEST(Y-range, X-range, Const, Stats) array function • Excel’s Analysis TookPak Data -> Data Analysis -> Regression -> Then fill in the dialog box (see example sheet) R • Use Rcommander "Statistics" -> "Fit models" -> "Linear regression.." • Use R code via RExcel • Benefits of each: Use what you like and is more advantageous to your problem • The Excel functions automatically update • Analysis TookPak outputs the statistics in a nice readable table • Rcommander has nice drop-down menus • R provides plots that are not easily available via Excel alone • R is more extensible and allows more advanced modeling
Part 2 Regression: Assumption Review Gauss-Markov Theorem Tells us that our OLS estimators (our intercept and slope) are unbiased and have minimum variance among all linear unbiased estimators IF… Two assumptions: (1) Independence => = 0 (2) Equal variance (aka. Homoscedasticity, same finite variance) => To make tests inferences, do statistical tests, and create confidence intervals, we need to assume a third condition: (3) Error is normally distributed => )
Part 3 Regression: Assumption investigation (a.k.a. Diagnostics) • Linear relationship among Sales and Advertising looks fine. • What do you think about our independence assumption? • What do you think about the constant finite variance assumption? • What about normality? • Anything else stand out?
Part 3 Regression: Fit without influential points Here we see our new fitted line, in addition to how well our model performed at estimating sales.
Part 3 Regression: More diagnostics • Linear relationship among Sales and Advertising is fine. • What do you think about our independence assumption? • What do you think about the constant finite variance assumption? • What about normality? • Anything else stand out?
Part 3 Regression: Interpretation The F and Significant F tell us that our slope is statistically significant. Meaning, it is highly unlikely it is 0.
Part 4 Using R commander • Obtain data sets from R libraries or load in your own • Nice drop-down for basic statistics and plots (code prints to R commander window) • Common distributions are available via drop-down See RExcelExamples workbook, Part4 tab
Part 5 Using built-in R commander plug-ins These plots are useful, but somewhat dull. The code that generates these will show up in the R commander window (very useful for newbies). Like plotting in Excel, you can get what you need by default, but you’ll probably have to modify the graph a bit. Lets look at RmcdrPlugin.HH XY conditioning plot (HH) Side-by-side Boxplot
Additional References • http://rcom.univie.ac.at/RExcelDemo/