
Overview of R & Bioconductor

Learn about the history of the R programming language, its functionalities, and the significance of Bioconductor in genomic data analysis. Discover resources, tools, and packages to enhance your data analysis skills using R. Includes details on Bioconductor, RStudio, documentation, and vignettes.

tcamacho

Presentation Transcript


  1. Aedín Culhane aedin@jimmy.harvard.edu Overview of R & Bioconductor http://www.hsph.harvard.edu/research/aedin-culhane/

  2. R • Why is it called R? • The name is partly based on the (first) names of the first two R authors and partly a play on the name of the Bell Labs language 'S' • Initially written by Robert Gentleman and Ross Ihaka, Dept of Statistics, University of Auckland, New Zealand (1996)

  3. Open source development: flexible, extensible • Large number of statistical and numerical methods • High-quality visualization and graphical tools • Extended by a very large collection of rapidly developing packages

  4. Short R History • 1991: Ross Ihaka and Robert Gentleman begin work on a project that will become R • 1993: The first announcement of R • 1995: R available by ftp • 1996: A mailing list is started and maintained by Martin Maechler at ETH • 1997: The R core group is formed • 2000: R 1.0.0 is released

  5. Short R History Continued • 2001: Bioconductor, for the analysis and comprehension of genomic data using R • 2008: The Omegahat project, to enable connectivity between R and other languages • 2010: A former co-founder and employees of SPSS found Revolution Analytics, a company which offers a commercial package around R • 2011: The RStudio project provides a free, open source integrated development environment (IDE) for R

  6. Jan 2009, "Data Analysts Captivated by R's Power": "R is really important to the point that it's hard to overvalue it," said Daryl Pregibon, a research scientist at Google, which uses the software widely. "It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems." • Nov 10 2010, "Names You Need to Know in 2011: R Data Analysis Software": "R is rapidly augmenting or replacing other statistical analysis packages at universities"

  7. R • R project (2.13 to be released April 2011) • Biannual release (normally April, October) • Download core and contributed packages from CRAN • Link: R Task Views

  8. Bioconductor • Biannual release (normally April, October) to coincide with the R release • Current: Bioconductor 2.8 (release coincides with R 2.13) • To install, use the script on the Bioconductor website: source("http://www.bioconductor.org/biocLite.R"); biocLite()
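
The biocLite() script on this slide reflects the 2011 release. In current Bioconductor releases it has been superseded by the BiocManager package from CRAN. The following is a minimal sketch of the newer route, assuming a recent R installation; `install_bioc` is a hypothetical helper name introduced here for illustration, and nothing is downloaded until it is called.

```r
# Hypothetical helper (not part of the slides): install Bioconductor
# packages through BiocManager, the successor to biocLite().
install_bioc <- function(pkgs = character()) {
  # BiocManager itself is distributed on CRAN
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install the requested packages from Bioconductor
  BiocManager::install(pkgs)
}

# Usage (needs network access), for example:
# install_bioc(c("Biobase", "GOstats"))
```

Wrapping the calls in a function keeps the example free of side effects when it is sourced; the package names in the usage line are the ones loaded later in this deck.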

  9. Packages Overview • Bioconductor web site • Bioconductor BiocViews (task view): Software, Annotation Data, Experimental Data

  10. R Interface • Default R interface • RStudio • www.rstudio.org • Cross-platform: Windows/Mac/Linux • Others • Tinn-R, Notepad++, R Commander (Rcmdr), etc.

  11. RStudio • 4 panes: Editor, Console, History, Files/Plots • Code completion • Easy access to help (F1) • One-step Sweave PDF generation • Searchable history • Keyboard shortcuts • http://www.rstudio.org/docs/using/keyboard_shortcuts

  12. R basics: Getting help • To get help • ?mean • help(mean) • help.search("mean") • apropos("mean") • example(mean) • http://www.bioconductor.org/help/
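
The help commands on this slide can be tried in any R session. The sketch below uses only base R; the pattern "^mean" and the variable name `mean_like` are choices made for this example, and the calls that open a pager or browser are guarded so the script also runs non-interactively.

```r
# List objects on the search path whose names start with "mean"
mean_like <- apropos("^mean")
print(mean_like)

# Inspect a function's arguments without opening a help page
print(args(mean))

# Help pages and examples open a pager or browser,
# so only call them in an interactive session
if (interactive()) {
  help("mean")          # same as ?mean
  help.search("mean")   # search the installed documentation
  example("mean")       # run the examples from the help page
}
```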

  13. Bioconductor resources • Lots of help available for each software package • Each package MUST contain a vignette (how-to) • Also use documentation and workshop/course material online • Slides from talks, PDFs of tutorials, R code • Feature of Bioconductor: metadata

  14. Vignettes • A tutorial that frequently provides a worked example of package use • A Bioconductor documentation requirement • A vignette = executable document consisting of • a collection of documentation text • and code chunks • Vignettes form dynamic, integrated, and reproducible statistical documents that can be automatically updated if either data or analyses are changed • Vignettes can be generated using the Sweave function from the R utils package • The vignette source is a LaTeX file with embedded R code chunks (.Rnw file)
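
To make the idea of text plus code chunks concrete, here is a minimal sketch that writes a tiny .Rnw file and weaves it with Sweave() from the utils package. The file name `demo.Rnw`, the chunk label, and the document text are all made up for this example; the result is a .tex file, and turning that into a PDF would additionally need a LaTeX installation.

```r
# Write a tiny Sweave source file: LaTeX text plus one R code chunk
rnw_path <- file.path(tempdir(), "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the integers 1 to 10 is computed below.",
  "<<mean-chunk, echo=TRUE>>=",
  "mean(1:10)",
  "@",
  "\\end{document}"
), rnw_path)

# Sweave writes its output into the working directory,
# so switch to the temporary directory while weaving
old_wd <- setwd(tempdir())
tex_name <- Sweave(rnw_path)   # runs the chunk and returns the .tex file name
setwd(old_wd)

# The generated LaTeX now contains both the code and its output
tex_path <- file.path(tempdir(), tex_name)
tex_lines <- readLines(tex_path)
print(tex_lines)
```

Re-running Sweave() after changing the chunk regenerates the output, which is the sense in which such documents update automatically.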

  15. Vignette • Written in Sweave (Leisch, 2002) • Produces dynamic reports in which R code is embedded and executable • LaTeX • All R code in a vignette is checked (and executed) by R CMD check • http://www.bioconductor.org/docs/vignettes.html • In R, load the package of interest and then open its vignettes: library("Biobase"); library("GOstats"); openVignette()
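
Besides openVignette() from Biobase, base R can list and open vignettes with vignette() and browseVignettes(). This is a small sketch that assumes nothing beyond a standard R installation; the utils package is used only because it is always available, and any installed package name can be substituted.

```r
# List the vignettes shipped with an installed package
v <- vignette(package = "utils")
print(v$results[, c("Item", "Title"), drop = FALSE])

# Opening a vignette launches a viewer, so guard it for scripts
if (interactive()) {
  vignette("Sweave", package = "utils")   # open one vignette by name
  browseVignettes(package = "utils")      # browse vignettes in a web browser
}
```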

  16. Annotation • Provides software for associating data with biological metadata from web databases (e.g. the annotate package) • GenBank, LocusLink and PubMed • Software tools for processing genomic annotation data from databases (e.g. GenBank, Gene Ontology, LocusLink, UniGene; the AnnBuilder package) • Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs) • biomaRt: software that uses BioMart to search Ensembl genomes and other marts
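
The mapping between probe identifiers and gene names that annotation data packages provide can be pictured as a lookup table. The sketch below is a toy illustration in base R only: the identifiers, symbols, and expression values are invented for this example and do not come from any annotation package or database.

```r
# Toy probe-to-gene table; all identifiers are made up for illustration
probe_map <- data.frame(
  probe_id = c("probe_001", "probe_002", "probe_003"),
  symbol   = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical expression values keyed by probe identifier
expr <- data.frame(
  probe_id = c("probe_003", "probe_001", "probe_999"),
  value    = c(7.2, 5.1, 6.4),
  stringsAsFactors = FALSE
)

# Look up the symbol for each probe; probes without a mapping get NA
expr$symbol <- probe_map$symbol[match(expr$probe_id, probe_map$probe_id)]
print(expr)
```

Using match() keeps the rows of `expr` in their original order and makes unmapped probes visible as NA instead of silently dropping them.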

  17. What packages do I need? Specific to your data and analysis pipeline, but for examples see: • Bioconductor Workshops • Bioconductor Workflows
