Introduction to R for Biological Computing
Jeff Krause (Shodor)
What is R?
• The R Project for Statistical Computing
• Free, high-level interpreted language for statistical computing and visualization
• Open-source implementation of the S language (S-PLUS is the commercial implementation)
• Robust & flexible language that facilitates rapid development of computational science tools
• Extensible, with a large, dedicated user community of prominent researchers
• More than 2,000 user-contributed packages containing functions and data for specific applications in statistics, data analysis, data mining, visualization & graphing, numerical simulation, optimization, sequence analysis, parallel computing, …
• Command-line or console interface for base installation
• The R Commander: A Basic-Statistics GUI for R
  • CRAN Rcmdr package page
  • R-Forge Rcmdr package page
Why R for biology?
• It’s free!
• Statistical data analysis
• Clinical data
• Statistical genetics
• Bioinformatics
• Easily extensible
  • Committed users have extended its capabilities
Introductory R resources
• Web sites
  • The R Project homepage
  • The R wiki – many great resources, including the “Getting Started” site
  • Contributed documentation page (cran.r-project.org/other-docs.html)
  • An Introduction to R
  • Statistics Using R with Biological Examples
  • Applied Statistics for Bioinformatics Using R – The author's stated goal is to bridge the gap between
  • Using R for scientific computing
  • Ecology and epidemiology in the R programming environment
  • R and Octave
  • Matlab/R Reference
• Articles
• Books
• Courses
First steps with R
• Downloading and installing
• Starting R
• Getting help – “?foo”, help(foo)
  • Help menu – “html help” (“R help” on Mac)
  • Rseek – Google-powered search site for all things R
• The basics
  • Assignment
  • Arithmetic
  • Vectors, random numbers, sort
  • Time series as vector
  • Plotting
  • Matrices and arrays
  • Loops, scripts
  • Add-in package installation
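The basics listed above can be tried in a short session like the following. This is only a minimal sketch, not material from the original slides: every variable name (x, v, walk, m, total, plot_file) is made up for illustration, and the plot is written to a temporary PDF file on the assumption that the script may be run without a display.

```r
# Assignment: "<-" binds a value to a name
x <- 5

# Arithmetic
y <- x * 2 + 1                 # 11

# Vectors: c() combines values; arithmetic is element-wise
v <- c(3, 1, 2)
v_doubled <- v * 2             # 6 2 4
v_sorted <- sort(v)            # 1 2 3

# Random numbers: set.seed() makes the draws reproducible
set.seed(42)
r <- rnorm(100)                # 100 draws from a standard normal

# A time series stored as a plain vector: a random walk built from r
walk <- cumsum(r)

# Plotting: draw the series as a line into a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(walk, type = "l", xlab = "Step", ylab = "Position")
invisible(dev.off())

# Matrices: 2 rows x 3 columns, filled column by column
m <- matrix(1:6, nrow = 2)
m_t <- t(m)                    # transpose: 3 rows x 2 columns

# Loops: sum the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}

# Getting help and installing an add-in package are left commented out,
# because they open a help viewer or need a network connection:
# ?sort
# help(sort)
# install.packages("seqinr")
```

Saving these lines in a file and running them with source() covers the "scripts" item; at the interactive console, plot() can be called directly without the pdf() and dev.off() lines.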
Topic-specific resources
• Introductory & basic statistics
  • Books
    • Introductory Statistics with R
      • R package ISwR contains data and functions from the book
    • Introduction to Probability with R
    • A First Course in Statistical Programming with R – Goes through an introduction to the language, programming and graphics, then works through MCMC simulation, computational linear algebra and numerical optimization
      • Solutions to selected exercises
    • Modern Applied Statistics with S
      • R package MASS contains data and functions from the book
• Bioinformatics & genomics
  • Books
    • Applied Statistics for Bioinformatics Using R – Free 272-page mini-text PDF
    • Bioconductor Case Studies
    • Bioinformatics and Computational Biology Solutions Using R and Bioconductor
    • Computational Genome Analysis
  • Packages
    • Bioconductor (www.bioconductor.org) – set of packages for analysis of genomic data
    • seqinr
Numerical simulation
• Books
  • An Introduction to Scientific Programming and Simulation, Using R
    • spuRs – R package containing functions and datasets from the book
  • Computer Simulation and Data Analysis in Molecular Biology and Biophysics
    • Describes the use of functions from a variety of R packages including:
  • Dynamic Models in Biology – Along with their supplemental Lab Manual for working in R
    • Their supplemental materials page includes resources for building simulations described in the text in both R and MATLAB
• Epidemiology, ecology & population biology
  • Books
    • A Practical Guide to Ecological Modelling – Along with their supplement: “Using R for scientific computing”
  • Packages
High-performance and parallel computing with R
• CRAN task view
• Rmpi can be used with the LAM/MPI, MPICH/MPICH2, Open MPI, and Deino MPI implementations
• GridR package by Wegener et al. can be used in a grid computing environment via a web service, via ssh, or via Condor or Globus
• rsprng package by Li: random-number generator for parallel computing