320 likes | 338 Views
An overview of Microarray Technology and Data Analysis. Basic Data Analysis. The Illumina Beadarray Technology. Highly redundant (~50 copies of a bead) 60mer oligos Each array is deconvoluted using a colour coding tag system Human, Mouse, Rat, Custom. Affymetrix Technology.
E N D
An overview of Microarray Technology and Data Analysis Basic Data Analysis
The Illumina Beadarray Technology • Highly redundant (~50 copies of a bead) • 60mer oligos • Each array is deconvoluted using a colour coding tag system • Human, Mouse, Rat, Custom
Affymetrix Technology • Highly redundant (~25 short oligos per gene) • PM-MM oligo system valuable for cross hybe detection • Human, Mouse, E. coli, Yeast…….. • Affy and illumina arrays have been systematically compared
Spotted Arrays • Low redundancy • cDNA and oligo • Cy5/Cy3 dye • Cost and custom
Worked Example: illumina data • Data contains 36 experiments by 47294 genes. Raw data extracted using Beadstudio. • Quality controlled in “R” package. Removed unexpressed genes using the Beadstudio Detection P-value. Leaves ~28,000 genes. • Quantile Normalised data, and quality controlled the normalisation in maCorrPlot “R” package. • Clustered using Hierarchical methods
BeadArray Quality Control Primarily look at hybe controls (internal spikes) and the housekeeping genes. Stringency should be greater than 3-fold. Hybridisation Controls == Stringency ==
The free R-stats package A massively powerful program with hundreds of plugins BUT requires a LARGE investment to learn. Some good web resources: Bioconductor Gives you access to good free Affy analysis tools
Raw Data from Beadstudio Use the P-detection QC tool in Beadstudio2 or use the R code: >inds = apply(dat[,c(F,T)],1,function(x) any(x>=0.99)) >dat.present <- dat[inds,c(T,F)] Signal P-value column Normalisation in BeadStudio is also an option
Normalisation • Why? • Remove chip to chip variation • Many different methods • A) Normalisation to the mean (old school) • B) Intensity-dependent normalisation • -to rank invariant genes (housekeeping) • -Quantile normalisation
Boxplots showing raw data for 36 chips: 3 bad? >boxplot(log(dat.present)) Outliers 75% quartile Median 25% quartile
After QC for low confidence genes (P<0.99) Note: ~50 replicate beads per array Outliers 75% quartile Median 25% quartile
The effect of quantiles Normalisation on the filtered 36 data sets >library(affy) >Qdata <- normalize.quantiles(Rawdata)
Judging the success of normalisation: maCorrPlot >library(maCorrPlot) >corrA.raw = CorrSample(mat.present_raw, np = 1000, seed = 1234) >plot(corrA.raw, main = "6-8 Quantiles") >dev.print (device=pdf, file = "6-8 Quantiles.pdf") One round of quantiles normalisation works well
Looking for patterns in the data using correlation coefficients Diagonal Block of similar Samples
Non Negative Matrix Factorisation Maths for the real world -image analysis -text analysis Works very well with array data Compares using small areas of change
NMF: cancer classification etc Good way to visualize large data sets
GeneSpring • Shared Resources has a copy which is available via Remote Desktop • High quality software; very carefully put together. Respected, tried and tested. • Good user friendly statistics
Core GeneSpring functions • Drag and drop data table • Remove low expressing genes • Define replicates and groups • ANOVA • Expression across Pathways
KEY FUNCTION: Experiments > Experiment parameters You must define the replicates in experiment parameters
Filtering>Filter on Volcano Plot Plots most robustly changed genes P-value Fold Change
Pathways in GeneSpring View all data in parallel across pathway Clicking takes you to the NCBI
The Free GeneSet Enrichment Analysis (GSEA) Program • where single-gene analysis finds little similarity between two independent studies, GSEA reveals many biological pathways in common • GSEA has a database of 1,325 biologically defined gene sets
Monitoring Transcription Factor Regulons across cell types Network analysis
NextBio: Comparing to all available data Query Biogroup (geneset) Your Data Uploaded NextBio Data 30,000 arrays Query Against
Results of Query against all Biogroups Drill down to lists>individual genes>NCBI