Explore sources of error in microarray experiments, including chip quality, reproducibility, and hybridization quality. Learn methods to evaluate and enhance data quality through self-self hybridization and replicate experiments. Understand spot intensity distributions and variance in spot intensity to improve data quality.
Quality control issues • Overview • Chip quality • Hybridisation quality • Reproducibility NERC/Manchester Array Course
Chip Quality • Sources of error: • Print tips • PCR reactions • Humidity • Contamination • …..
Error types • Basic problems • Slide background • Batch to batch variation
Chip quality • How to get information on chip quality? • Monitor number of flagged spots • Eyeball the TIF files • Self-self hybridisation • Replicates
Microarray data quality
Flagged data is usually poor quality • Figure: flagged data not included (left) and included (right)
Microarray data quality • Sources of data quality information • Self-self hybridisation microarray chips • In a self-self hybridisation chip, the gene expression levels of two identical samples are measured on one chip. It follows that: • The absolute gene expression levels measured from the two samples should be the same. • The difference in expression level for any gene between the two samples should be zero. • If the measurements from the two channels are not equal, then measurement error exists in the experiment.
Microarray data quality • Sources of data quality information • Self-self hybridisation microarray chips • Let the measurements from the two channels of a slide be $y_{jR}$, $y_{jG}$, $j = 1, 2, \ldots, N$ ($N$ = number of genes) • If $y_{jR} = y_{jG}$ for all $j$: there is no measurement error between the two channels • If $y_{jR} - y_{jG}$ varies around zero: random error exists between the two channels • If $y_{jR} - y_{jG}$ varies around a non-zero value: both random and systematic errors exist between the channels • Self-self chips are the most useful source of information on the data quality of a microarray experiment
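The three cases above can be told apart by summarising the per-gene channel difference. The following is a minimal sketch, assuming the comparison is made on log2 intensities; the function name and the intensity values are hypothetical and not taken from the course material.

```python
import math
from statistics import mean, variance

def channel_difference_profile(red, green):
    """Return (mean, variance) of log2(R) - log2(G) across spots.

    A mean near zero with non-zero variance suggests random error only;
    a mean away from zero suggests an additional systematic error.
    """
    diffs = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    return mean(diffs), variance(diffs)

# Hypothetical intensities for five spots on a self-self chip.
red = [120.0, 850.0, 430.0, 2200.0, 95.0]
green = [110.0, 800.0, 450.0, 2000.0, 100.0]
m, v = channel_difference_profile(red, green)
print(f"mean log-difference = {m:.3f}, variance = {v:.3f}")
```

In practice the mean would be judged against the spread (for example with a one-sample t-test) rather than read off directly.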
Microarray data quality • Data quality information from replicate experiments • Replicate experiments are the main source of information about the reproducibility of an experiment. Take two replicates as an example, with measurements $x_{1,j}, x_{2,j}, y_{1,j}, y_{2,j}$, where $j = 1, 2, \ldots, N$; $x$, $y$ denote the different samples; 1, 2 denote the first and second experiments • Different profiles are available for selection: • $x_{1,j} - x_{2,j}$: channel intensity (difference of the same X on different slides) • $y_{1,j} - y_{2,j}$: channel intensity (difference of the same Y on different slides) • $(x_{1,j} y_{1,j}) - (x_{2,j} y_{2,j})$: point intensity • $x_{1,j}/y_{1,j} - x_{2,j}/y_{2,j}$: ratio • A log scale is usually employed
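The four between-slide profiles listed above can be computed per gene as follows. This is a sketch under the assumption that every quantity is taken on a log2 scale, so the product becomes a sum of logs and the ratio a difference of logs; the function and key names are illustrative.

```python
import math

def replicate_profiles(x1, y1, x2, y2):
    """Per-gene differences between two replicate slides, on a log2 scale.

    x1, y1: samples X and Y on slide 1; x2, y2: the same samples on slide 2.
    """
    lx1 = [math.log2(v) for v in x1]
    ly1 = [math.log2(v) for v in y1]
    lx2 = [math.log2(v) for v in x2]
    ly2 = [math.log2(v) for v in y2]
    rows = list(zip(lx1, ly1, lx2, ly2))
    return {
        # same sample X on different slides
        "channel_x": [a - c for a, b, c, d in rows],
        # same sample Y on different slides
        "channel_y": [b - d for a, b, c, d in rows],
        # log(x1 * y1) - log(x2 * y2)
        "point_intensity": [(a + b) - (c + d) for a, b, c, d in rows],
        # log(x1 / y1) - log(x2 / y2)
        "log_ratio": [(a - b) - (c - d) for a, b, c, d in rows],
    }

# Hypothetical intensities for two genes.
profiles = replicate_profiles([4, 8], [2, 2], [2, 8], [2, 4])
for name, values in profiles.items():
    print(name, values)
```

The mean and variance of each list then give a summary comparable to the between-slide figures quoted on the following slides.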
Microarray data quality: between channels • Examples of data quality profile extracted from a self-self hybridisation chip
Microarray data quality: between slides • Examples of data quality profile extracted from two self-self hybridisation chips • X = log(Ra/Rb), mean(X) = 0.068, Var(X) = 0.600
Microarray data quality: between slides • Examples of data quality profile extracted from two self-self hybridisation chips • X = Aa - Ab, mean(X) = 0.076, Var(X) = 0.326
Microarray data quality: between slides • Examples of data quality profile extracted from two self-self hybridisation chips • X = log(Ra/Rb), mean(X) = 0.017, Var(X) = 0.308
Microarray data quality • Examples of data quality profile extracted from two self-self hybridisation chips
Microarray data quality • Examples of data quality profile extracted from two replicated Ref.-Treatment hybridisation chips
Microarray data quality • Conclusions • Both systematic error and random noise are observed between the channels of a microarray chip. • Both systematic error and random error are observed between slides. • Log(ratio) is the least noisy data
Hybridisation quality • Is there enough cDNA? • Has the labelling worked? • Has it worked as expected for that species?
Spot intensity distributions • Is there a generic form? • Can an understanding of the generic form help in QC?
Spot Intensity Distribution - 1 • Asymmetric, Heavy Tail. • Most spots have small intensity. Few have high intensity.
Spot Intensity Distribution - 2 • Logged data distribution is symmetric. • Logged data approximated by a Normal in central region.
Example Data Sets
Spot Intensity Distribution - 3 • Characterize the width of the distribution by the variance $\sigma^2$ • $\sigma^2$ is unaffected by simple normalization schemes, e.g. mean or median centering of log values. • Study the variation of $\sigma^2$ between samples and between species
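That the width is unchanged by centring can be checked numerically: subtracting a constant from every log value shifts the distribution without altering its variance. The sketch below assumes natural logs and the population variance; the function name, the `centre` option names and the intensity values are all hypothetical.

```python
import math
from statistics import mean, median, pvariance

def log_variance(intensities, centre=None):
    """Variance of log spot intensities, optionally after centring.

    centre: None, "mean" or "median" (illustrative option names).
    """
    logs = [math.log(v) for v in intensities]
    if centre == "mean":
        shift = mean(logs)
    elif centre == "median":
        shift = median(logs)
    else:
        shift = 0.0
    return pvariance([x - shift for x in logs])

# Hypothetical spot intensities, for illustration only.
spots = [55.0, 80.0, 120.0, 310.0, 950.0, 4200.0]
print(log_variance(spots))
print(log_variance(spots, "mean"))
print(log_variance(spots, "median"))
```

The three printed values agree up to floating-point rounding, which is why the variance can be compared across slides that were centred differently.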
Var(log spot intensity) • $\sigma^2$ increases with genome size (no. of genes) • Is this trend truly biological?
Characterising the distribution • Microarray data obeys Benford's law • $P(D) = \log_{10}(1 + D^{-1})$, where $D$ is the leading digit (1 to 9)
Statistics • Calculate fit to Benford's law • Monitor the distribution width • (see Practical)
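One way to calculate a fit to Benford's law is to compare the observed leading-digit counts with the expected proportions $\log_{10}(1 + 1/D)$. The sketch below uses a chi-square statistic as the measure of fit; that choice and the function names are assumptions for illustration, and the course practical may use a different statistic.

```python
import math
from collections import Counter

def benford_expected():
    """Expected proportion of each leading digit D = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(value):
    """First significant digit of a positive number."""
    # Scientific notation always starts with the first significant digit.
    return int(f"{abs(value):.15e}"[0])

def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's law."""
    positive = [v for v in values if v > 0]  # zero has no leading digit
    counts = Counter(leading_digit(v) for v in positive)
    n = len(positive)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )

# Powers of two are a standard Benford-like sequence; a plain range is not.
print(benford_chi_square([2 ** k for k in range(200)]))
print(benford_chi_square(range(100, 1000)))
```

A small statistic indicates leading digits close to Benford's proportions; a slide whose value is much larger than its neighbours would be a candidate for closer inspection.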
Experimental design • Controlling variation • Has the experiment worked? • Optimising the design
Experimental design: microarray experiment and its aims • Microarray experiments have multiple sources of variation, including the variation of interest (biological) and other (non-biological) variation • Microarray experiments aim to identify and measure the biological variation: • Normal vs. abnormal • General condition vs. extreme condition • Controls vs. treatments • Treatment vs. a different treatment • Or time series
Variation in microarray experiments • Microarray data is variable • The variation may arise at any stage of the experimental process: • Extraction of samples; • Chip printing; • Dye; • Hybridisation; • Image processing; • Background handling.
Variation in microarray experiments: factors related to accuracy and reproducibility of microarray data • Each step of a microarray experiment usually involves a number of factors that affect the accuracy and reproducibility of the experiment. These can be classified into five categories: • Human; • Equipment; • Samples and chips; • Method, procedure, specification; • Environment
Task • To maximise the appropriate biologically relevant data in a cost-effective way • Issues • How many repeats? • Dye flips • To pool or not to pool? • What is the optimal design of the hybridisations?
Experiment (1) • Response to control variables
The experiment must be stable • Set up experiment • Collect data • Assess data • If stable, continue • If not, why not? Correct and continue
Experiment (2) • Controlling for noise
Controlling for noise • Look at noise as a function of expression level • Self-hybridisations • Reverse labelling • Same sample, different preps • Different samples, different preps
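Looking at noise as a function of expression level can be done by sorting spots by average log intensity, splitting them into bins, and computing the variance of the log ratio within each bin. The sketch below assumes log2 values and equal-sized bins; the function name, bin count and intensities are hypothetical.

```python
import math
from statistics import pvariance

def noise_by_intensity(red, green, n_bins=3):
    """Return a list of (mean A, variance of M) per intensity bin.

    A = average log2 intensity of a spot, M = log2 ratio of the two channels.
    """
    spots = sorted(
        (0.5 * (math.log2(r) + math.log2(g)), math.log2(r) - math.log2(g))
        for r, g in zip(red, green)
    )
    size = math.ceil(len(spots) / n_bins)
    result = []
    for start in range(0, len(spots), size):
        chunk = spots[start:start + size]
        a_mean = sum(a for a, _ in chunk) / len(chunk)
        m_var = pvariance([m for _, m in chunk])
        result.append((a_mean, m_var))
    return result

# Hypothetical self-hybridisation intensities for four spots.
for a_mean, m_var in noise_by_intensity([2, 4, 8, 16], [2, 4, 4, 32], n_bins=2):
    print(f"A = {a_mean:.2f}, Var(M) = {m_var:.2f}")
```

Applied to a self-hybridisation, where the true log ratio is zero for every spot, the per-bin variance is a direct estimate of the noise at that expression level.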
How many repeats? • $t = (m_1 - m_2)/\sqrt{2s^2/N}$, therefore $(m_1 - m_2) = t\sqrt{2s^2/N}$ • Plug in values to get significant fold changes • $t = 3$, $N = 2$, $s^2 = 0.3$: a 5-fold change is significant • $t = 3$, $N = 4$, $s^2 = 0.3$: a 3.2-fold change is significant • $t = 3$, $N = 6$, $s^2 = 0.3$: a 2.5-fold change is significant
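The calculation above can be sketched in a few lines. The slide does not state the base of the logarithm; the code assumes that $m_1$, $m_2$ and $s^2$ are on a natural-log scale, which approximately reproduces the quoted fold changes. The function name is hypothetical.

```python
import math

def min_significant_fold_change(t, n, s2):
    """Smallest fold change that reaches statistic t with n repeats per group.

    Assumes means and variance s2 are on a natural-log scale (an assumption,
    not stated in the source), so the log difference is exponentiated.
    """
    log_difference = t * math.sqrt(2 * s2 / n)
    return math.exp(log_difference)

for n in (2, 4, 6):
    fold = min_significant_fold_change(t=3, n=n, s2=0.3)
    print(f"N = {n}: fold change of about {fold:.1f}")
```

Because the detectable log difference shrinks only as $1/\sqrt{N}$, going from two to four repeats helps far more than going from four to six.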
Pooling • An issue of cost • Analysis modified • Average log does not equal log average
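The last point matters because a pooled sample physically averages expression before measurement (log of the average), whereas separate arrays are usually averaged after the log transform (average of the logs). A small sketch with hypothetical expression values for one gene in three individuals, assuming log2:

```python
import math
from statistics import mean

def mean_of_logs(values):
    """Average of per-sample log2 values (separate arrays, then average)."""
    return mean([math.log2(v) for v in values])

def log_of_mean(values):
    """log2 of the averaged signal (what a pooled sample approximates)."""
    return math.log2(mean(values))

# Hypothetical expression of one gene in three individuals.
individuals = [100.0, 400.0, 1600.0]
print(mean_of_logs(individuals))
print(log_of_mean(individuals))
```

The log of the mean is never smaller than the mean of the logs, and the gap grows with the variability between individuals, so pooled and unpooled results are not directly interchangeable.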
Experimental design • $Y$ follows • $Z = (y_1 + y_2 + \ldots + y_n)/n$ follows • $\mu_z$ and $\sigma_z$ specified by:
Mixed tissue or single cell?
Has the experiment worked? • Use principal component analysis to cluster experiments
Experimental design: basics of experimental design • Examples: • (a) Reference – non-reference • (b) Loop • (c) Reference – non-reference plus ref-ref • (d) Modified loop
Basic principles • Make sure the system is behaving • Biological repeats are best • The most important comparisons should be performed on the same chip • Dye-flips are very useful