
Getting Data into R & Bioconductor

Aedín Culhane, aedin@jimmy.harvard.edu, http://www.hsph.harvard.edu/research/aedin-culhane/

Presentation Transcript


  1. Aedín Culhane aedin@jimmy.harvard.edu Getting Data into R & Bioconductor http://www.hsph.harvard.edu/research/aedin-culhane/

  2. Simple Excel spreadsheet data
     • Already described: read.table(), read.csv(), scan()
     • Other formats exist (e.g. netCDF); however, genomic data formats are more specialized by data type.
     • For these, look under Technology in BiocViews: http://www.bioconductor.org/packages/release/BiocViews.html
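A minimal sketch of the three readers, assuming the Excel sheet has first been saved as a CSV file. The file, gene names, and values below are made up for illustration.

```r
# Write a small made-up spreadsheet export to a temporary CSV file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("gene,control,treated",
             "TP53,5.2,7.9",
             "BRCA1,6.1,6.3",
             "MYC,8.4,4.0"), csv_file)

# read.csv(): comma separator and header = TRUE are the defaults;
# row.names = 1 uses the first column (gene) as row names.
expr_df <- read.csv(csv_file, row.names = 1)

# read.table(): the general reader; separator and header are given explicitly.
expr_df2 <- read.table(csv_file, sep = ",", header = TRUE, row.names = 1)

# scan(): a lower-level reader; 'what' gives the type of each field and
# the result is a list with one vector per column (header line skipped).
fields <- scan(csv_file, what = list("", 0, 0), sep = ",", skip = 1, quiet = TRUE)

str(expr_df)
all.equal(expr_df, expr_df2)  # TRUE: both calls build the same data frame
fields[[2]]                   # the control column as a numeric vector
```

read.csv() is read.table() with spreadsheet-friendly defaults; scan() is mainly useful when the file is not a tidy table.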

  3. Some common data types: microarray, SNP and, increasingly, NGS (next-generation sequencing)

  4. A Microarray Overview

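  5. Reading Affymetrix Data
     library(affy)   # require(affy) is an alternative: it returns FALSE (with a warning) instead of stopping if affy is missing
     affybatch <- ReadAffy(celfile.path = "[Location of your data]")   # probe-level data (AffyBatch)
     eSet <- rma(affybatch)                                             # RMA-normalised ExpressionSet
     eSet <- justRMA(celfile.path = "[Location of your data]")          # alternative: read and run RMA in one step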

  6. Sample R code

  7. ExpressionSet Class in R
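An ExpressionSet (from the Biobase package) bundles the expression matrix (exprs) with the sample annotation (pData), so the two stay aligned when you subset. The sketch below builds one by hand from a tiny matrix; the probe names, sample names, group labels, and values are all hypothetical.

```r
library(Biobase)

# Made-up log2 expression values: 4 probes (rows) x 3 samples (columns).
expr <- matrix(c(7.1,  6.8,  9.2,
                 5.4,  5.9,  5.1,
                 10.2, 10.0, 8.7,
                 3.3,  3.1,  3.6),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("probe", 1:4),
                               c("sampleA", "sampleB", "sampleC")))

# Sample annotation: one row per sample, row names matching colnames(expr).
pheno <- data.frame(group = c("control", "control", "disease"),
                    row.names = colnames(expr))

eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno))

eset          # summary of the object
exprs(eset)   # the expression matrix
pData(eset)   # the sample annotation (phenoData)
eset$group    # shortcut for pData(eset)$group

# Subsetting by sample keeps expression values and annotation aligned.
controls <- eset[, eset$group == "control"]
sampleNames(controls)   # "sampleA" "sampleB"
```

rma() and justRMA() in the affy package return objects of this class, so the same accessors apply to real data.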

  8. Assessing Data Quality
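Before running a full report, a couple of quick checks can be done by hand. The sketch below uses simulated log2 intensities with one deliberately shifted array (all data hypothetical) and computes two checks similar in spirit to plots found in an arrayQualityMetrics report: per-array boxplots and between-array distances. It is an illustration under those assumptions, not the package's own implementation.

```r
# Simulated, already log2-scale intensities: 1000 probes x 4 arrays.
set.seed(7)
expr <- matrix(rnorm(4000, mean = 8, sd = 1.5), ncol = 4,
               dimnames = list(NULL, paste0("array", 1:4)))
expr[, "array4"] <- expr[, "array4"] + 2   # hypothetical problem array, shifted upwards

# Check 1: per-array distributions. After normalisation the boxes should line up.
boxplot(expr, main = "Per-array intensity distribution", ylab = "log2 intensity")
array_medians <- apply(expr, 2, median)
round(array_medians, 2)

# Check 2: between-array distances, taken here as the mean absolute difference
# over all probes (Manhattan distance divided by the number of probes).
dist_mat <- as.matrix(dist(t(expr), method = "manhattan")) / nrow(expr)

# The diagonal is 0, so this averages each array's distance to the others.
avg_dist <- rowSums(dist_mat) / (ncol(expr) - 1)
round(avg_dist, 2)
names(which.max(avg_dist))   # "array4"
```

An array that sits far from all the others on both checks is a candidate outlier worth a closer look.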

  9. Public Microarray Data (statistics as of May 2011)
     • ArrayExpress: 21,997 studies (622,617 profiles)
     • GEO: 22,735 studies (558,074 profiles)

  10. >500,000 arrays × $500 per array = >$250,000,000. Cancer studies account for >14% of all studies in the databases…

  11. R Code

  12. More on GEOquery
     require(GEOquery)
     Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity:
     GDS810 <- getGEO("GDS810")
     getGEO() returns an object of class GDS, a subclass of GEOData. You can get a description of this class like this:
     help("GEOData-class")
     Meta(GDS810)          # metadata (title, description, platform, ...)
     Columns(GDS810)       # sample annotation
     head(Table(GDS810))   # first rows of the data table
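GDS data are easier to analyse once converted to an ExpressionSet. A minimal sketch, assuming GEOquery is installed and GEO is reachable over the internet; the wrapper name gds_to_eset() and its arguments are hypothetical, while getGEO() and GDS2eSet() are GEOquery functions. Whether a log2 transform is needed depends on how the submitter processed the data, so it is left as a switch.

```r
# Hypothetical helper: download a GEO DataSet (GDS) and convert it to an ExpressionSet.
# Needs the Bioconductor package GEOquery and an internet connection when called.
gds_to_eset <- function(gds_id = "GDS810", log2_transform = FALSE) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("Please install GEOquery from Bioconductor first")
  }
  gds <- GEOquery::getGEO(gds_id)
  # GDS2eSet() builds an ExpressionSet from the GDS; do.log2 = TRUE applies log2
  # and should only be used if the values are not already on a log scale.
  GEOquery::GDS2eSet(gds, do.log2 = log2_transform)
}

# Usage (downloads data):
# eset <- gds_to_eset("GDS810")
# dim(eset)
# head(Biobase::pData(eset))   # sample annotation
```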

  13. Affy SNP Arrays

  14. Processing Affy SNP arrays (oligo package)

  15. Other arrays
     • Illumina: lumi package
     • Two-color spotted arrays: limma package
     • Other arrays: http://www.bioconductor.org/help/workflows/oligo-arrays/

  16. Next Generation Sequencing Data

  17. R Code

  18. Exercise: bring down a GSE from GEO
     • Download the dataset GSE1297 using getGEO().
     • The data are returned as a list of ExpressionSets (one per platform); to see the expression data and the phenoData, use exprs() and pData().
     • Use arrayQualityMetrics to assess the quality of these data.
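One possible solution sketch for the exercise, assuming GEOquery, Biobase and arrayQualityMetrics are installed and GEO is reachable. The wrapper gse_qc_report(), its arguments, and the output folder name are hypothetical; getGEO(), exprs(), pData() and arrayQualityMetrics() are the packages' own functions.

```r
# Hypothetical helper: download a GSE series matrix, inspect it, write a QC report.
gse_qc_report <- function(gse_id = "GSE1297", outdir = "GSE1297_QC",
                          log_transform = FALSE) {
  for (pkg in c("GEOquery", "Biobase", "arrayQualityMetrics")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Please install ", pkg, " from Bioconductor first")
    }
  }
  # For a GSE accession, getGEO() returns a list of ExpressionSets (one per platform).
  gse  <- GEOquery::getGEO(gse_id, GSEMatrix = TRUE)
  eset <- gse[[1]]

  print(dim(Biobase::exprs(eset)))          # probes x samples
  print(head(Biobase::pData(eset)[, 1:3]))  # first few sample-annotation columns

  # Writes an HTML report into outdir; set do.logtransform = TRUE
  # if the values are not already on a log scale.
  arrayQualityMetrics::arrayQualityMetrics(expressionset   = eset,
                                           outdir          = outdir,
                                           force           = TRUE,
                                           do.logtransform = log_transform)
  invisible(eset)
}

# Usage (downloads data and writes the report):
# eset <- gse_qc_report()
```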

  19. With thanks to www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf

  20. Quick aside: interpreting hierarchical clustering trees
     • Hierarchical clustering results are viewed using a dendrogram (tree).
     • The distance between nodes is read from the scale (height).
     • The ordering of nodes is not important: branches can swivel like a baby's mobile.
     • [Figure: trees A and B] Trees A and B are equivalent.
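The "baby mobile" point can be checked numerically. In this sketch (made-up data, average linkage chosen arbitrarily), tree B is tree A drawn as its mirror image. The leaf order changes, but the cophenetic distance, the height at which two samples first join, is identical for every pair, so the two trees are equivalent.

```r
# Made-up data: 6 samples (rows) measured on 3 genes (columns).
set.seed(42)
x <- matrix(rnorm(18), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("gene", 1:3)))

# Tree A: average-linkage clustering on Euclidean distances.
tree_a <- hclust(dist(x), method = "average")

# Tree B: same merges and heights, leaves drawn in mirror-image order.
tree_b <- tree_a
tree_b$order <- rev(tree_a$order)

tree_a$labels[tree_a$order]   # left-to-right leaf order of tree A
tree_b$labels[tree_b$order]   # the reverse order for tree B

# Cophenetic distance = height at which two samples first join the same branch.
# It depends only on the merges and heights, not on the drawing order.
coph_a <- as.matrix(cophenetic(tree_a))
coph_b <- as.matrix(cophenetic(tree_b))
all.equal(coph_a, coph_b)   # TRUE

# To compare visually:
# op <- par(mfrow = c(1, 2)); plot(tree_a, main = "Tree A"); plot(tree_b, main = "Tree B"); par(op)
```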
