
Hands-on Introduction to R


austin-york

Presentation Transcript


  1. Hands-on Introduction to R

  2. Why Learn Programming? • We live in oceans of data. Computers are essential for recording it and helping to analyse it. • Competent scientists speak C/C++, Java, MATLAB, Python, Perl, R and/or Mathematica • Data collection and analysis have been very important in forensic science since the 2009 NAS report • With the above languages, code can easily be made available for review/discovery

  3. Getting a computer to do anything useful • All machines understand is on/off! • High/low voltage • High/low current • High/low charge • 1/0 binary digits (bits) • To make a computer do anything, you have to speak machine language to it: 000000 00001 00010 00110 00000 100000 • Meaning: add the values in registers 1 and 2 and store the result in register 6 (example from Wikipedia)

  4. Getting a computer to do anything useful • Machine language is not intuitive and can vary a great deal across processor designs • The basic operations, however, are the same, e.g.: • Move data here • Combine these values • Store this data • Etc. • “Human readable” language for basic machine operations: assembly language

  5. Getting a computer to do anything useful • Assembly is still cumbersome for (most) humans • Machine encoding: 10110000 01100001 • Assembly: MOV AL, 61h • Meaning: move the number 97 (61 in hexadecimal) over to “storage area” AL
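The link between the bit pattern, the hexadecimal value and the number 97 on this slide can be checked directly in R. This is only a small sketch using base R's `strtoi`; the variable names are made up for illustration.

```r
# Convert the slide's bit patterns and hex value to ordinary decimal numbers.
opcode  <- strtoi("10110000", base = 2)  # first byte of the machine encoding
operand <- strtoi("01100001", base = 2)  # second byte: the value being moved
hex_61  <- strtoi("61", base = 16)       # the "61h" written in the assembly

print(c(opcode = opcode, operand = operand, hex_61 = hex_61))

# The second byte and 61h are the same number, 97.
print(operand == hex_61)
```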

  6. Getting a computer to do anything useful • Better yet is a more “Englishy”, “high-level” language • Enter: C, C++, Fortran, Java, … • Higher-level languages like these are translated (“compiled”) to machine language • Not exactly true for Java (it is compiled to bytecode that a virtual machine runs), but it’s something analogous…

  7. Getting a computer to do anything useful • Even more “Englishy” and “high-level” are interpreted languages • Enter: R, MATLAB, Perl, Python, Mathematica, Maple, … • The “code” of these languages is “interpreted” as commands by a program that is already running • They make many assumptions behind the scenes • Much easier to program with • Typically much slower than compiled languages

  8. Why R? • R is not a black box! • Code is available for review; totally transparent! • R is maintained by a professional group of statisticians and computational scientists • From very simple to state-of-the-art procedures available • Very good graphics for exhibits and papers • R is extensible (it is a full scripting language) • Coding/syntax similar to Python and MATLAB • Easy to link to C/C++ routines

  9. Why R? • Where to get information on R: • R: http://www.r-project.org/ • You just need the base distribution • RStudio: http://rstudio.org/ • A great IDE for R • Works on all platforms • Sometimes slows down performance… • CRAN: http://cran.r-project.org/ • Library repository for R • Click on Search on the left of the website to search for packages/info on packages

  10. Finding our way around R/RStudio • [Screenshot: RStudio, with the Script Window and the Command Line labeled]

  11. Handy Commands: • Basic Input and Output • Numeric input: x <- 4 • x is a variable (variables store information) and <- is the assignment operator • Text (character) input: x <- "text goes in quotes"
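A minimal sketch of basic input and output at the command line follows. The second variable name `y` is introduced here only so both values can be kept side by side; it is not from the slides.

```r
x <- 4                        # numeric input: store the number 4 in x
y <- "text goes in quotes"    # character input: store a string in y

print(x)                      # typing x alone at the command line also prints it
print(y)
print(class(x))               # "numeric"
print(class(y))               # "character"

x <- x + 1                    # a variable can be reused and overwritten
print(x)
```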

  12. Handy Commands: • Get help on an R command: • If you know the name: ?command_name • ?plot brings up the HTML help page for the plot command • If you don’t know the name: • Use Google (my favorite) • ??keyword
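The help commands are meant to be typed interactively, so they appear as comments in the sketch below. The two calls that actually run, `apropos` and `args`, are additional base R helpers not mentioned on the slide; they are included as one possible way to look things up from a script.

```r
# Interactive help (type these at the R command line):
#   ?plot            opens the help page for plot
#   help("plot")     the same thing, written as a function call
#   ??histogram      searches the installed help pages for a keyword

# Helpers that return values you can inspect in a script:
csv_functions <- apropos("csv")   # names of visible objects containing "csv"
print(csv_functions)

print(args(read.csv))             # show the argument list of read.csv
```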

  13. Handy Commands: • R is driven by functions: func(argument1, argument2) • func is the function name • Input to the function goes in parentheses • The function returns something, which gets dumped into x: x <- func(arg1, arg2)
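The pattern `x <- func(arg1, arg2)` can be tried with a few built-in functions. The particular functions chosen here (`sqrt`, `round`, `seq`) are just convenient illustrations.

```r
x <- sqrt(16)                          # one argument; the result is stored in x
print(x)

y <- round(3.14159, digits = 2)        # two arguments; the second passed by name
print(y)

z <- seq(from = 0, to = 1, by = 0.25)  # several named arguments
print(z)
```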

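  14. Handy Commands: • Input from Excel • Save the spreadsheet as a CSV file • Use the read.csv function • It needs the path to the file • Mac e.g.: "/Users/npetraco/latex/papers/data.csv" • Windows e.g.: "C:/Users/npetraco/latex/papers/data.csv" (inside R strings use forward slashes or doubled backslashes, because a single backslash starts an escape sequence) • *Exercise: basicIO.R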
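A rough sketch of the CSV workflow is shown below. So that it can run anywhere, it first writes a tiny CSV file to a temporary location; the column names and numbers are made up for illustration and stand in for a spreadsheet exported from Excel.

```r
# Write a small toy CSV file to a temporary path (stand-in for an Excel export).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample,width,height",
             "a,1.5,10",
             "b,2.5,20",
             "c,3.5,30"),
           csv_path)

dat <- read.csv(csv_path)   # the first line is treated as column names by default
print(dat)
str(dat)                    # check the column types that were read in

# With a real file, pass its path instead, for example:
#   dat <- read.csv("/Users/npetraco/latex/papers/data.csv")
#   dat <- read.csv("C:/Users/npetraco/latex/papers/data.csv")
```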

  15. Handy Commands: • Matrices: X • X[,1] returns column 1 of matrix X • X[3,] returns row 3 of matrix X • Handy functions for data frames and matrices: • dim, nrow, ncol, rbind, cbind • User-defined function syntax: func.name <- function(arguments) { do something; return(output) } • To use it: func.name(values)
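The indexing rules and the function template above can be sketched as follows. The matrix contents and the function `col.range` are hypothetical examples, not taken from the course scripts.

```r
X <- matrix(1:12, nrow = 4, ncol = 3)   # filled column by column
print(X)
print(dim(X))      # 4 rows, 3 columns
print(X[, 1])      # column 1
print(X[3, ])      # row 3

X2 <- cbind(X, 13:16)        # bind on an extra column
X3 <- rbind(X, c(0, 0, 0))   # bind on an extra row
print(dim(X2))
print(dim(X3))

# User-defined function: range (max minus min) of column j of a matrix.
col.range <- function(mat, j) {
  rng <- max(mat[, j]) - min(mat[, j])
  return(rng)
}

print(col.range(X, 2))
```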

  16. Handy Commands: • User-defined function example: • Compute the intensities of the Planck distribution • Let the user input a temperature • Let the user input the wavelength endpoint; assume it is in nm • Careful here. Make sure wavelength units are consistent with the other constants. • What is the “easiest” thing to do??
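One possible version of such a function is sketched below; it is not the course's own script. It assumes the wavelength form of Planck's law, $B(\lambda, T) = \dfrac{2hc^2}{\lambda^5}\,\dfrac{1}{e^{hc/(\lambda k_B T)} - 1}$, with SI constants, and it handles the unit question by converting nm to m inside the function. The function name, argument names, and the 1 nm default grid are all assumptions.

```r
planck.intensity <- function(temp.K, end.nm, start.nm = 1, step.nm = 1) {
  h  <- 6.62607015e-34   # Planck constant, J s
  cc <- 2.99792458e8     # speed of light, m/s
  kB <- 1.380649e-23     # Boltzmann constant, J/K

  lambda.nm <- seq(from = start.nm, to = end.nm, by = step.nm)
  lambda.m  <- lambda.nm * 1e-9   # convert nm to m to match the SI constants

  # expm1(x) computes exp(x) - 1
  intensity <- (2 * h * cc^2 / lambda.m^5) /
    expm1(h * cc / (lambda.m * kB * temp.K))

  return(data.frame(lambda.nm = lambda.nm, intensity = intensity))
}

spec <- planck.intensity(temp.K = 5800, end.nm = 3000)

# Wavelength (nm) at which the computed intensity is largest
peak.nm <- spec$lambda.nm[which.max(spec$intensity)]
print(peak.nm)

plot(spec$lambda.nm, spec$intensity, type = "l",
     xlab = "Wavelength (nm)", ylab = "Intensity")
```

A quick sanity check is Wien's displacement law, which puts the peak for 5800 K near 500 nm.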

  17. First Thing: Look at your Data • Explore the Glass dataset of the mlbench package • Source (load) all_data_source.R • *visualize_with_plots.r • Scatter plots: plot any two variables against each other

  18. First Thing: Look at your Data • Pairs plots: do many scatter plots at once

  19. First Thing: Look at your Data • Histograms: “bin” a variable and plot frequencies
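The three plot types so far (scatter plot, pairs plot, histogram) can be sketched with base R graphics. The block loads the Glass data if the mlbench package is installed; otherwise it falls back to simulated stand-in numbers, which are an assumption made only so the example runs and are not real measurements. The choice of the RI, Na and Ca columns is likewise just for illustration.

```r
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  glass <- Glass
} else {
  set.seed(1)   # simulated stand-in, NOT real glass measurements
  glass <- data.frame(RI   = rnorm(90, mean = 1.518, sd = 0.003),
                      Na   = rnorm(90, mean = 13.4,  sd = 0.8),
                      Ca   = rnorm(90, mean = 8.9,   sd = 1.4),
                      Type = factor(rep(c("1", "2", "7"), each = 30)))
}

# Scatter plot: any two variables against each other
plot(glass$RI, glass$Ca, xlab = "RI", ylab = "Ca")

# Pairs plot: many scatter plots at once
pairs(glass[, c("RI", "Na", "Ca")])

# Histogram: bin a variable and plot the frequencies
hist(glass$RI, breaks = 20, xlab = "RI", main = "Histogram of RI")
```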

  20. First Thing: Look at your Data • Histograms conditioned on other variables: use the lattice package • [Figure: histograms of refractive index (RI) conditioned on glass group membership]

  21. First Thing: Look at your Data • Probability density plots: also needs lattice
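Conditioned histograms and density plots with lattice might look like the sketch below. As before, the simulated fallback data are a labeled assumption used only when mlbench is not installed. Inside a script, lattice plots have to be printed explicitly to be drawn.

```r
library(lattice)

if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  glass <- Glass
} else {
  set.seed(1)   # simulated stand-in, NOT real glass measurements
  glass <- data.frame(RI   = rnorm(90, mean = 1.518, sd = 0.003),
                      Type = factor(rep(c("1", "2", "7"), each = 30)))
}

# Histogram of RI, one panel per glass group (Type)
h <- histogram(~ RI | Type, data = glass, xlab = "RI")
print(h)

# Probability density estimate of RI, one panel per glass group
d <- densityplot(~ RI | Type, data = glass, plot.points = FALSE, xlab = "RI")
print(d)
```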

  22. First Thing: Look at your Data • Empirical probability distribution plots: also called empirical cumulative distribution function (ECDF) plots
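Base R's `ecdf` function gives one way to draw this kind of plot. In this sketch the RI values come from the Glass data when mlbench is available and from simulated stand-in numbers (an assumption, not real data) otherwise.

```r
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  ri <- Glass$RI
} else {
  set.seed(1)   # simulated stand-in, NOT real glass measurements
  ri <- rnorm(90, mean = 1.518, sd = 0.003)
}

Fn <- ecdf(ri)   # Fn(v) is the fraction of observations less than or equal to v

plot(Fn, xlab = "RI", ylab = "Fraction of observations <= RI",
     main = "Empirical cumulative distribution of RI")

print(Fn(median(ri)))   # at least 0.5 by construction
print(Fn(max(ri)))      # all observations are <= the maximum
```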

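  23. First Thing: Look at your Data • Box and Whiskers plots: • [Figure: annotated box-and-whiskers plot of RI marking the range (whiskers), possible outliers beyond each whisker, the 1st quartile (25th percentile), the median (50th percentile), and the 3rd quartile (75th percentile)]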

  24. Visualizing Data • Note the relationship:

  25. First Thing: Look at your Data • Box and Whiskers plots: • [Figure: Box-Whiskers plots for actual variable values] • [Figure: Box-Whiskers plots for scaled variable values]
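The box-and-whiskers plots for actual and scaled values can be approximated with `boxplot` and `scale`. This is a sketch under the same assumptions as earlier: Glass data if mlbench is installed, simulated stand-in numbers otherwise, and an arbitrary choice of three columns.

```r
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  vars <- Glass[, c("RI", "Na", "Ca")]
} else {
  set.seed(1)   # simulated stand-in, NOT real glass measurements
  vars <- data.frame(RI = rnorm(90, mean = 1.518, sd = 0.003),
                     Na = rnorm(90, mean = 13.4,  sd = 0.8),
                     Ca = rnorm(90, mean = 8.9,   sd = 1.4))
}

# The numbers behind one box: quartiles, median, and flagged possible outliers
print(quantile(vars$RI, probs = c(0.25, 0.50, 0.75)))
print(boxplot.stats(vars$RI)$out)

# Actual values: variables on very different scales share one axis
boxplot(vars, main = "Actual variable values")

# Scaled values: each column centered to mean 0 and scaled to sd 1
scaled <- as.data.frame(scale(vars))
boxplot(scaled, main = "Scaled variable values")
```

Scaling puts the variables on a common axis, so their spreads and outliers can be compared side by side.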

  26. Confidence Intervals • A confidence interval (CI) gives a range in which a true population parameter may be found. • Specifically, $(1-\alpha)\times 100\%$ CIs for a parameter, each constructed from a random sample (of a given sample size), will contain the true value of the parameter approximately $(1-\alpha)\times 100\%$ of the time. • Different from tolerance and prediction intervals

  27. Confidence Intervals • Caution: IT IS NOT CORRECT to say that there is a $(1-\alpha)\times 100\%$ probability that the true value of a parameter is between the bounds of any given CI. • Take a sample. Compute a CI. • [Figure: graphical representation of 90% CIs for a parameter, with the true value of the parameter marked; here 90% of the CIs contain the true value]
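The "take a sample, compute a CI" idea can be simulated. The sketch below draws many samples from a normal population with a known mean, builds a 90% CI from each using `t.test`, and counts how often the interval contains the true mean. The population parameters, sample size, and number of repetitions are arbitrary assumptions chosen for illustration.

```r
set.seed(42)

true.mean  <- 10     # known only because this is a simulation
true.sd    <- 2
n          <- 15     # size of each sample
conf.level <- 0.90
n.rep      <- 2000   # number of repeated samples

covered <- replicate(n.rep, {
  x  <- rnorm(n, mean = true.mean, sd = true.sd)
  ci <- t.test(x, conf.level = conf.level)$conf.int
  ci[1] <= true.mean && true.mean <= ci[2]   # TRUE if this CI contains the truth
})

coverage <- mean(covered)   # fraction of CIs that contain the true mean
print(coverage)
```

The printed fraction should land close to 0.90. That describes the long-run behaviour of the procedure, not a probability statement about any single interval.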

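  28. Confidence Intervals • Construction of a CI for a mean depends on: • Sample size $n$ • Standard error for means, $s/\sqrt{n}$ (with $s$ the sample standard deviation) • Level of confidence $1-\alpha$ • $\alpha$ is the significance level • Use $\alpha$ to compute the $t_c$-value • The $(1-\alpha)\times 100\%$ CI for the population mean, using the sample average $\bar{x}$ and the standard error, is: $\left[\bar{x} - t_c\, s/\sqrt{n},\ \bar{x} + t_c\, s/\sqrt{n}\right]$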

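  29. Confidence Intervals • Compute a 99% confidence interval for the mean using this sample set: [sample data not preserved in transcript] • $\alpha/2 = 0.005$, $t_c = 3.17$ • Putting this together: [1.52005 - (3.17)(0.00001), 1.52005 + (3.17)(0.00001)] • 99% CI for the population mean = [1.52002, 1.52009] (the mean and standard error shown are presumably rounded; recomputing from the rounded figures gives an upper bound nearer 1.52008) • *Try out confidence_intervals.R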
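The calculation can be sketched in R as below; this is not the contents of confidence_intervals.R. The function `ci.mean` and the 11 sample values are hypothetical: the original sample set is not available, and 11 values were chosen only because a critical value of about 3.17 matches a t-distribution with 10 degrees of freedom at $\alpha/2 = 0.005$.

```r
# CI for a mean: average plus or minus t_c times the standard error
ci.mean <- function(x, conf.level = 0.99) {
  n     <- length(x)
  xbar  <- mean(x)
  se    <- sd(x) / sqrt(n)             # standard error of the mean
  alpha <- 1 - conf.level
  tc    <- qt(1 - alpha / 2, df = n - 1)
  c(lower = xbar - tc * se, upper = xbar + tc * se)
}

# Critical value for alpha/2 = 0.005 with 10 degrees of freedom (about 3.17)
tc10 <- qt(0.995, df = 10)
print(tc10)

# Hypothetical refractive index readings, made up for illustration only
ri <- c(1.52001, 1.52008, 1.52003, 1.52010, 1.52005, 1.52002,
        1.52007, 1.52004, 1.52009, 1.52006, 1.52000)

ci <- ci.mean(ri, conf.level = 0.99)
print(ci)

# Cross-check against R's built-in one-sample t interval
print(t.test(ri, conf.level = 0.99)$conf.int)
```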
