An overview of Microarray Technology and Data Analysis

Presentation Transcript


  1. An overview of Microarray Technology and Data Analysis: Basic Data Analysis

  2. The Illumina BeadArray Technology • Highly redundant (~50 copies of each bead) • 50-mer oligo probes • Each array is decoded using a colour-coded tag system • Human, Mouse, Rat, Custom

  3. Affymetrix Technology • Highly redundant (~25 short oligos per gene) • The perfect-match/mismatch (PM-MM) probe design is valuable for detecting cross-hybridisation • Human, Mouse, E. coli, Yeast, … • Affymetrix and Illumina arrays have been systematically compared

  4. Spotted Arrays • Low redundancy • cDNA or oligo probes • Two-colour Cy5/Cy3 labelling • Low cost and easy to customise

  5. Worked Example: Illumina data • The data contain 36 experiments by 47,294 genes. Raw data were extracted using BeadStudio. • Quality controlled in R: unexpressed genes were removed using the BeadStudio detection P-value, leaving ~28,000 genes. • The data were quantile normalised, and the normalisation was quality controlled with the maCorrPlot R package. • Clustered using hierarchical methods
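
The hierarchical clustering step can be sketched in base R. This is a minimal illustration, not the author's script: the toy matrix `expr` (probes in rows, arrays in columns), the 1 - correlation distance and average linkage are all assumptions chosen for the example.

```r
set.seed(1)
# Hypothetical toy data: 200 probes x 6 arrays, two groups of three arrays
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("array", 1:6)))
expr[1:50, 1:3] <- expr[1:50, 1:3] + 3     # group A: probes 1-50 raised
expr[51:100, 4:6] <- expr[51:100, 4:6] + 3 # group B: probes 51-100 raised

# Distance between arrays: 1 - Pearson correlation across all probes
d <- as.dist(1 - cor(expr))
# Average-linkage agglomerative clustering (one of several linkage choices)
hc <- hclust(d, method = "average")
plot(hc, main = "Arrays clustered on 1 - correlation")
# Cut the tree into two clusters of arrays
groups <- cutree(hc, k = 2)
print(groups)
```

Correlation-based distance groups arrays by the shape of their expression profiles rather than their overall brightness, which is usually what is wanted after normalisation.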

  6. BeadArray Quality Control: Primarily look at the hybridisation controls (internal spike-ins) and the housekeeping genes. Stringency should be greater than 3-fold.
  [Figure panels: Hybridisation Controls; Stringency]
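
A minimal sketch of the 3-fold stringency check, assuming stringency is summarised as the ratio of mean perfect-match (PM) to mean mismatch (MM) control-probe signal on each array. The helper `stringency_fold()` and the control values are hypothetical; BeadStudio's own control summary may compute this differently.

```r
# Hypothetical helper (assumption): fold difference between perfect-match
# and mismatch control probe signal, one value per array.
stringency_fold <- function(pm, mm) {
  # pm, mm: matrices of raw signal, control probes x arrays
  colMeans(pm) / colMeans(mm)
}

# Invented control-probe signals for two arrays
pm <- matrix(c(1200, 1100, 1300,  900, 950, 1000), nrow = 3,
             dimnames = list(NULL, c("arrayA", "arrayB")))
mm <- matrix(c( 200,  250,  220,  500, 400, 450), nrow = 3,
             dimnames = list(NULL, c("arrayA", "arrayB")))

fold <- stringency_fold(pm, mm)
passes <- fold > 3   # the slide's "greater than 3-fold" rule
print(round(fold, 2))
print(passes)
```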

  7. The free R statistics package: a massively powerful program with hundreds of add-on packages, BUT it requires a LARGE investment to learn. Good web resources include Bioconductor, which gives you access to good, free Affymetrix analysis tools.

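  8. Raw Data from BeadStudio: Use the detection P-value QC tool in BeadStudio 2, or use the R code below. In the exported table each array has a Signal column followed by a Detection P-value column, so c(FALSE, TRUE) selects the P-value columns and c(TRUE, FALSE) the Signal columns:
      # keep probes with a detection value >= 0.99 on at least one array
      inds <- apply(dat[, c(FALSE, TRUE)], 1, function(x) any(x >= 0.99))
      # keep only the Signal columns for those probes
      dat.present <- dat[inds, c(TRUE, FALSE)]
  Normalisation in BeadStudio is also an option.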
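
To show what the two-line filter does, here it is on a tiny hypothetical table laid out as the code expects (Signal and Detection columns alternating per array). The probe names and values are invented for illustration.

```r
# Hypothetical BeadStudio-style export: Signal/Detection pairs per array
dat <- data.frame(
  A.Signal = c(850, 12, 430, 30),   A.Detection = c(1.00, 0.40, 0.995, 0.90),
  B.Signal = c(910, 15, 20, 25),    B.Detection = c(0.999, 0.35, 0.60, 0.992),
  row.names = c("probe1", "probe2", "probe3", "probe4")
)

# Logical column indices are recycled: c(FALSE, TRUE) = every second column
inds <- apply(dat[, c(FALSE, TRUE)], 1, function(x) any(x >= 0.99))
dat.present <- dat[inds, c(TRUE, FALSE)]
print(dat.present)
```

probe2 is dropped because it is not detected on either array; probe3 and probe4 survive because each is detected on one array, which is what `any()` asks for. Replacing `any()` with `all()` would demand detection on every array instead.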

  9. Normalisation • Why? To remove chip-to-chip variation • Many different methods: • A) Normalisation to the mean (old school) • B) Intensity-dependent normalisation, either to rank-invariant genes (housekeeping) or by quantile normalisation
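
Method A can be sketched as a simple per-array scaling. This assumes "normalisation to the mean" means rescaling every array to a common mean intensity; `mean_normalise()` is a hypothetical helper. It only removes global brightness differences, whereas intensity-dependent methods also correct the shape of each array's distribution.

```r
# Hypothetical helper: scale each array (column) to the average array mean
mean_normalise <- function(mat) {
  array_means <- colMeans(mat)
  target <- mean(array_means)
  # multiply each column by target / (that column's mean)
  sweep(mat, 2, target / array_means, "*")
}

# Toy data: array 2 is uniformly twice as bright as array 1
raw <- matrix(c(100, 200, 300,
                200, 400, 600), ncol = 2)
norm_mat <- mean_normalise(raw)
print(norm_mat)
```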

  10. Boxplots showing the raw data for 36 chips: are 3 of them bad?
      boxplot(log(dat.present))
  [Figure labels: outliers, 75th percentile (upper quartile), median, 25th percentile (lower quartile)]
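
Spotting suspect chips by eye can be backed up with a simple numeric check. The sketch below simulates data (chip6 is deliberately dimmed) and flags chips whose median log intensity sits far from the others; the 1 log-unit cut-off is an illustrative assumption, not a published threshold.

```r
set.seed(42)
# Hypothetical toy data: 1000 probes x 6 chips; chip6 is simulated as dim
raw <- matrix(rlnorm(1000 * 6, meanlog = 6, sdlog = 1), ncol = 6,
              dimnames = list(NULL, paste0("chip", 1:6)))
raw[, "chip6"] <- raw[, "chip6"] / 8
logged <- log(raw)
boxplot(logged, main = "Raw data (log scale)")

# Compare each chip's median with the median of all chip medians
chip_medians <- apply(logged, 2, median)
shift <- abs(chip_medians - median(chip_medians))
# Illustrative cut-off (assumption): flag chips shifted by more than 1 log unit
flagged <- names(chip_medians)[shift > 1]
print(flagged)
```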

  11. After removing low-confidence genes (detection P < 0.99). Note: ~50 replicate beads per probe on each array.
  [Figure labels: outliers, 75th percentile (upper quartile), median, 25th percentile (lower quartile)]

  12. The effect of quantile normalisation on the 36 filtered data sets:
      library(preprocessCore)  # normalize.quantiles() lives here; older affy releases also provided it
      Qdata <- normalize.quantiles(as.matrix(Rawdata))
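
The idea behind quantile normalisation is to give every array the same intensity distribution: sort each column, average the sorted columns row by row to get a reference distribution, then put each value back in its original position using the reference value of the same rank. A minimal base-R sketch of that idea follows; `quantile_normalise()` is a hypothetical helper, and because it breaks ties by order of appearance its output may differ slightly from `normalize.quantiles()` when a column contains tied values.

```r
quantile_normalise <- function(mat) {
  # rank of each value within its own column (ties broken by first occurrence)
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # reference distribution: mean of the sorted columns, row by row
  reference <- rowMeans(apply(mat, 2, sort))
  # replace every value with the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(mat)
  out
}

# Toy data: 4 probes x 3 arrays with different distributions
raw <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8), ncol = 3)
qn <- quantile_normalise(raw)
print(qn)
```

After normalisation every column holds the same set of values (here 2, 3, 14/3 and 17/3), so boxplots of the arrays line up exactly.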

  13. Judging the success of normalisation: maCorrPlot
      library(maCorrPlot)
      corrA.raw <- CorrSample(mat.present_raw, np = 1000, seed = 1234)
      plot(corrA.raw, main = "6-8 Quantiles")
      dev.print(device = pdf, file = "6-8 Quantiles.pdf")
  One round of quantile normalisation works well.
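
maCorrPlot's check rests on a simple idea: two probes picked at random should be roughly uncorrelated across arrays, so a strong positive correlation points to array effects that normalisation has not removed. The sketch below reproduces that idea in base R rather than calling the package; `corr_sample()` is a hypothetical stand-in for `CorrSample()`, and the column centring used as the "normalisation" is only a placeholder for quantile normalisation.

```r
# Hypothetical stand-in for CorrSample(): correlations of random probe pairs
corr_sample <- function(mat, np = 1000, seed = 1234) {
  set.seed(seed)
  i <- sample(nrow(mat), np, replace = TRUE)
  j <- sample(nrow(mat), np, replace = TRUE)
  keep <- i != j                        # drop pairs of a probe with itself
  i <- i[keep]
  j <- j[keep]
  r <- vapply(seq_along(i), function(k) cor(mat[i[k], ], mat[j[k], ]), numeric(1))
  probe_means <- rowMeans(mat)
  data.frame(avg_expr = (probe_means[i] + probe_means[j]) / 2, corr = r)
}

set.seed(7)
# Hypothetical data: 2000 probes x 8 arrays, each array with its own offset
probe_level <- rnorm(2000, mean = 8, sd = 2)
offset <- c(-1, -0.5, 0, 0.5, 1, 1.5, -1.5, 0.2)
raw <- outer(probe_level, offset, `+`) + matrix(rnorm(2000 * 8, sd = 0.3), 2000, 8)
# Placeholder normalisation: centre every array on the same mean
centred <- sweep(raw, 2, colMeans(raw))

before <- corr_sample(raw)
after <- corr_sample(centred)
plot(corr ~ avg_expr, data = after, pch = ".", main = "After centring")
print(c(before = median(before$corr), after = median(after$corr)))
```

Before normalisation the shared array offsets make almost every random pair strongly correlated; afterwards the correlations scatter around zero, which is the pattern a successful normalisation should produce.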

  14. Looking for patterns in the data using correlation coefficients: blocks along the diagonal mark groups of similar samples.
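
A correlation matrix like the one on this slide can be drawn with base R. In this sketch the data are simulated (two groups of four arrays), and the arrays are reordered by hierarchical clustering so that similar samples form blocks along the diagonal; the ordering method is an assumption, not necessarily what the original figure used.

```r
set.seed(3)
# Hypothetical data: 300 probes x 8 arrays; s1-s4 and s5-s8 form two groups
expr <- matrix(rnorm(300 * 8), nrow = 300,
               dimnames = list(NULL, paste0("s", 1:8)))
shared <- rnorm(300, sd = 2)
expr[, 1:4] <- expr[, 1:4] + shared   # group 1 shares this profile
expr[, 5:8] <- expr[, 5:8] - shared   # group 2 shows the opposite profile

cc <- cor(expr)                         # Pearson correlation between arrays
ord <- hclust(as.dist(1 - cc))$order    # place similar arrays side by side
image(cc[ord, ord], col = heat.colors(50), axes = FALSE,
      main = "Array-array correlation")
axis(1, at = seq(0, 1, length.out = 8), labels = colnames(cc)[ord], las = 2)
```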

  15. Non-Negative Matrix Factorisation: maths for the real world (image analysis, text analysis). Works very well with array data; it compares samples through small, local patterns of change.
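
NMF approximates a non-negative genes x samples matrix V by the product W H, where the columns of W are "metagenes" and H gives each sample's weight on each metagene. Below is a minimal sketch using the Lee-Seung multiplicative updates for squared error; one common convention for classification assigns each sample to the metagene with the largest weight in H. The toy data, rank k = 2 and iteration count are assumptions, and a real analysis would use a dedicated package with several random starts.

```r
# Sketch: NMF by Lee-Seung multiplicative updates (squared-error objective)
nmf_mu <- function(V, k, n_iter = 500, seed = 1, eps = 1e-9) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # genes x metagenes
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # metagenes x samples
  for (it in seq_len(n_iter)) {
    # multiplicative updates keep every entry non-negative
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

set.seed(10)
# Hypothetical non-negative data: 100 genes x 6 samples, two sample groups
V <- matrix(runif(100 * 6), nrow = 100)
V[1:30, 1:3] <- V[1:30, 1:3] + 5     # genes 1-30 high in samples 1-3
V[31:60, 4:6] <- V[31:60, 4:6] + 5   # genes 31-60 high in samples 4-6

fit <- nmf_mu(V, k = 2)
# Assign each sample to the metagene with the largest weight in H
classes <- apply(fit$H, 2, which.max)
rel_error <- norm(V - fit$W %*% fit$H, "F") / norm(V, "F")
print(classes)
print(round(rel_error, 3))
```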

  16. NMF: cancer classification and other applications. A good way to visualise large data sets.

  17. GeneSpring • Shared Resources has a copy, available via Remote Desktop • High-quality, very carefully designed software; respected, tried and tested • Good, user-friendly statistics

  18. Core GeneSpring functions • Load data tables by drag and drop • Remove low-expressing genes • Define replicates and groups • ANOVA • Expression across pathways

  19. KEY FUNCTION: Experiments > Experiment Parameters. You must define the replicates in Experiment Parameters.

  20. Experiments > Experiment Interpretation

  21. Filtering > Filter on Volcano Plot: plots P-value against fold change to pick out the most robustly changed genes
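
A volcano plot puts each gene's fold change on the x-axis and its significance (-log10 P-value) on the y-axis, so robustly changed genes sit in the upper corners. Here is a base-R sketch on simulated data (3 control vs 3 treated arrays, values already on a log2 scale); the per-gene Welch t-test and the 2-fold and P < 0.05 cut-offs are illustrative assumptions, not GeneSpring's defaults.

```r
set.seed(5)
# Hypothetical log2 data: 500 probes x 6 arrays (3 control, 3 treated)
group <- factor(rep(c("control", "treated"), each = 3))
expr <- matrix(rnorm(500 * 6, mean = 8, sd = 0.3), nrow = 500)
expr[1:20, group == "treated"] <- expr[1:20, group == "treated"] + 2  # 20 probes up 4-fold

treated <- group == "treated"
log2_fc <- rowMeans(expr[, treated]) - rowMeans(expr[, !treated])
p_val <- apply(expr, 1, function(x) t.test(x[treated], x[!treated])$p.value)

plot(log2_fc, -log10(p_val), pch = 20,
     xlab = "log2 fold change", ylab = "-log10 P-value", main = "Volcano plot")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)
# Illustrative filter: at least 2-fold change in either direction and P < 0.05
hits <- which(abs(log2_fc) >= 1 & p_val < 0.05)
print(hits)
```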

  22. Multiple 1-way ANOVA
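
GeneSpring runs these tests through its menus; an approximate base-R equivalent fits one one-way ANOVA per gene and then adjusts the P-values for multiple testing. The simulated data, the three groups and the Benjamini-Hochberg correction are assumptions made for this sketch.

```r
set.seed(8)
# Hypothetical data: 200 probes x 9 arrays in three groups of three
group <- factor(rep(c("A", "B", "C"), each = 3))
expr <- matrix(rnorm(200 * 9, mean = 8, sd = 0.2), nrow = 200)
expr[1:10, group == "C"] <- expr[1:10, group == "C"] + 2  # 10 probes shifted in group C

# One one-way ANOVA per probe; keep the P-value of the group effect
anova_p <- apply(expr, 1, function(y) summary(aov(y ~ group))[[1]][["Pr(>F)"]][1])
# Correct for running 200 tests (Benjamini-Hochberg false discovery rate)
adj_p <- p.adjust(anova_p, method = "BH")
significant <- which(adj_p < 0.05)
print(significant)
```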

  23. Pathways in GeneSpring: view all data in parallel across a pathway; clicking takes you to the NCBI website

  24. The free Gene Set Enrichment Analysis (GSEA) program • Where single-gene analysis finds little similarity between two independent studies, GSEA reveals many biological pathways in common • GSEA's gene-set database (MSigDB) held 1,325 biologically defined gene sets at the time
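
At the core of GSEA is a running-sum enrichment score: walk down the ranked gene list, step up when a gene belongs to the set (weighted by its ranking statistic) and step down otherwise; the score is the maximum deviation from zero. Below is a minimal sketch of that score with weight exponent p = 1. It omits the permutation test GSEA uses to judge significance, and the ranked list and the helper `enrichment_score()` are hypothetical.

```r
# Sketch of the weighted running-sum enrichment score (exponent p = 1)
enrichment_score <- function(ranked_stats, gene_set, p = 1) {
  # ranked_stats: named numeric vector sorted from most positive to most negative
  in_set <- names(ranked_stats) %in% gene_set
  hit_weight <- ifelse(in_set, abs(ranked_stats)^p, 0)
  p_hit <- cumsum(hit_weight) / sum(hit_weight)   # weighted fraction of set genes seen
  p_miss <- cumsum(!in_set) / sum(!in_set)        # fraction of other genes seen
  running_sum <- p_hit - p_miss
  unname(running_sum[which.max(abs(running_sum))])  # maximum deviation from zero
}

# Hypothetical ranked list of 20 genes (e.g. signal-to-noise scores)
stats <- setNames(seq(2, -2, length.out = 20), paste0("g", 1:20))
es_top <- enrichment_score(stats, c("g1", "g2", "g3"))
es_bottom <- enrichment_score(stats, c("g18", "g19", "g20"))
print(c(top = es_top, bottom = es_bottom))
```

A set concentrated at the top of the list scores close to +1 and one concentrated at the bottom close to -1; this reliance on a ranking by phenotype is why GSEA counts as a supervised method.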

  25. GSEA is supervised

  26. Make the input files: *.gct (expression data) and *.cls (sample classes)
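
Both input files are plain text. The following sketch writes them, assuming the GCT version 1.2 layout (a "#1.2" line, a line with the row and column counts, a header starting NAME and Description, then one tab-separated row per gene) and the three-line CLS layout (counts, class names, one label per sample). The helpers `write_gct()` and `write_cls()` are hypothetical, and the description column simply repeats the gene name.

```r
# Hypothetical helper: write a genes x samples matrix as a GCT 1.2 file
write_gct <- function(expr, file, description = rownames(expr)) {
  lines <- c("#1.2",
             paste(nrow(expr), ncol(expr), sep = "\t"),
             paste(c("NAME", "Description", colnames(expr)), collapse = "\t"),
             paste(rownames(expr), description,
                   apply(expr, 1, paste, collapse = "\t"), sep = "\t"))
  writeLines(lines, file)
}

# Hypothetical helper: write one class label per sample as a CLS file
write_cls <- function(labels, file) {
  labels <- as.character(labels)
  classes <- unique(labels)            # class names in order of first appearance
  writeLines(c(paste(length(labels), length(classes), 1),
               paste("#", paste(classes, collapse = " ")),
               paste(labels, collapse = " ")), file)
}

expr <- matrix(c(1.5, 2, 3, 4.25), nrow = 2,
               dimnames = list(c("GENE1", "GENE2"), c("s1", "s2")))
gct_file <- tempfile(fileext = ".gct")
cls_file <- tempfile(fileext = ".cls")
write_gct(expr, gct_file)
write_cls(c("control", "treated"), cls_file)
cat(readLines(gct_file), sep = "\n")
cat(readLines(cls_file), sep = "\n")
```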

  27. Monitoring transcription factor regulons across cell types: network analysis

  28. NextBio: comparing your data to all available data. Your uploaded data, or a query Biogroup (gene set), is queried against the NextBio data (~30,000 arrays).

  29. Results of the query against all Biogroups: drill down to lists > individual genes > NCBI

  30. Dividing Biological Space
