Microarray Analysis Jesse Mecham CS 601R
Microarray Analysis • It all comes down to • Experimental Design • Preprocessing • Data Analysis
Experimental Design • Elimination of confounding factors • Same cell line, minimal exposure • Timing of sampling • Technological considerations • Hybridization considerations • Chip/tag selection
Slide to Data
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707
Preprocessing • Data import • Background adjustment • Normalization • Summarization of multiple probes per transcript • Quality control
Data Import • Incorporate various file formats into desired data formats • Different vendors have different representations • Sometimes desired data is not provided
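The import step can be illustrated with a minimal Python sketch. It assumes a hypothetical two-column, whitespace-delimited export (probe ID, then value) shaped like the Gene/Value listing on the "Slide to Data" slide; real vendor formats differ and usually need a dedicated reader. The function name `parse_expression_table` is made up for this example.

```python
def parse_expression_table(text):
    """Parse 'probe_id value' lines into a dict, skipping blanks and header rows."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        probe_id, raw = fields
        try:
            values[probe_id] = float(raw)
        except ValueError:
            continue  # non-numeric second column, e.g. the "Gene Value" header
    return values


# Three rows copied from the listing above, plus its header.
sample = """Gene Value
D26528_at 193
D26561_cds1_at -70
D26579_at 318
"""
table = parse_expression_table(sample)
print(table["D26561_cds1_at"])  # -70.0
```

Keeping the parser separate from the analysis means a different vendor format only requires swapping this one function.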
Background Adjustment • It all comes down to one word…noise • Optical distortion • Non-specific hybridization • Equipment damage
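One of the simplest background adjustments is to subtract a per-spot background estimate from the foreground intensity. The sketch below assumes that both are already available as lists of numbers; the `floor` parameter is an illustrative choice, not a standard value. Plain subtraction can produce zero or negative intensities, which cannot be log-transformed, so this sketch clamps them; more careful methods model the noise instead.

```python
def background_correct(foreground, background, floor=1.0):
    """Subtract background from foreground per spot, clamping results at `floor`."""
    if len(foreground) != len(background):
        raise ValueError("foreground and background must have the same length")
    return [max(fg - bg, floor) for fg, bg in zip(foreground, background)]


# Made-up intensities; the second spot is dimmer than its local background.
fg = [520.0, 130.0, 2400.0]
bg = [100.0, 150.0, 110.0]
corrected = background_correct(fg, bg)
print(corrected)  # [420.0, 1.0, 2290.0]
```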
M vs. A • M represents the differential expression as a log ratio: M = log R - log G • A represents the average log fluorescence intensity: A = (log R + log G)/2 • A desirable transformation would leave M spread evenly around zero across the whole range of intensities A
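The two formulas translate directly into code. The sketch below assumes base-2 logarithms (the slide does not state a base) and strictly positive red and green intensities; the input values are made up.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists: M = log2 R - log2 G, A = (log2 R + log2 G) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
print(m)  # [2.0, 0.0, -2.0]
print(a)  # [9.0, 8.0, 7.0]
```

With base-2 logs, M = 2 means the red channel is four times as bright as the green one, and M = 0 means no difference.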
Normalization • Normalization between samples needs to be established for a variety of reasons • Different reverse transcription efficiency levels • We are using PCR to amplify in separate plates • Hybridization inequalities • Variations in solution used in hybridization reaction • Spatial abnormalities between plates • Particularly apparent for in-house plates
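As a sketch of the simplest kind of normalization, the example below shifts the log ratios of one array so that their median is zero. This assumes that most genes are not differentially expressed. It corrects only an overall dye or sample bias, not intensity-dependent or spatial effects, and is shown as one option rather than the method used in the course.

```python
from statistics import median


def median_center(m_values):
    """Shift log ratios (M values) so that their median becomes zero."""
    center = median(m_values)
    return [m - center for m in m_values]


# Made-up M values for one array with an overall bias of +0.5.
array_m = [0.5, 0.75, 0.25, 2.5, 0.5]
normalized = median_center(array_m)
print(normalized)  # [0.0, 0.25, -0.25, 2.0, 0.0]
```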
Summarizing Data • Process of reducing the various samples into an analysis • The crux of microarray analysis • Can apply a linear or a nonlinear model using any of the following techniques • Support Vector Machines (SVM) • Neural Networks • Empirical Bayes
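At the probe level (the "summarization of multiple probes per transcript" step listed under Preprocessing), the most basic reduction is a robust average per transcript. The sketch below uses the median and does not implement any of the model-based techniques named above; the transcript IDs and intensities are invented.

```python
from collections import defaultdict
from statistics import median


def summarize_probes(probe_rows):
    """Collapse (transcript_id, intensity) rows to one median value per transcript."""
    grouped = defaultdict(list)
    for transcript_id, intensity in probe_rows:
        grouped[transcript_id].append(intensity)
    return {tid: median(vals) for tid, vals in grouped.items()}


rows = [
    ("geneA", 100.0), ("geneA", 120.0), ("geneA", 900.0),  # one outlying probe
    ("geneB", 40.0), ("geneB", 60.0),
]
summary = summarize_probes(rows)
print(summary)  # {'geneA': 120.0, 'geneB': 50.0}
```

The median keeps the single outlying geneA probe (900) from dominating the summary, which a plain mean would not.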
Quality Control • Concerned with accuracy and reproducibility • Dr. Piatetsky-Shapiro (last week’s colloquium) was primarily concerned with this area of microarray analysis • Detection of errors (cross-validation) • Isolation and validation of significant results • Corrective behavior
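Cross-validation, mentioned above as an error-detection tool, rests on repeatedly holding out part of the samples. A minimal sketch of the index bookkeeping for k-fold splitting follows; the helper name `k_fold_splits` is hypothetical, and the sample count of 16 is only an example. The folds here are contiguous, so samples should be shuffled (or stratified by group) beforehand if they are stored in group order.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_splits(16, 4))
print(folds[0][1])  # [0, 1, 2, 3]
```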
Time for Fun • Dataset • ApoAI.RData • The apolipoprotein AI (ApoAI) gene is known to play a pivotal role in high density lipoprotein (HDL) metabolism. Mice which have the ApoAI gene knocked out (KO) have very low HDL cholesterol levels. • Purpose is to determine how ApoAI deficiency affects the action of other genes in the liver • Help determine what molecular pathways ApoAI operates on
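A question like this is typically approached gene by gene, comparing log ratios between the knockout and wild-type groups. The sketch below computes a plain Welch two-sample t-statistic for a single gene. The numbers are invented and are not taken from ApoAI.RData, and this is not the empirical Bayes approach mentioned on the Summarizing Data slide.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t-statistic for one gene's log ratios."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / std_err


# Made-up log ratios for one gene in four KO and four WT mice.
ko = [-3.0, -3.5, -2.5, -3.0]
wt = [0.0, 0.5, -0.5, 0.0]
t = welch_t(ko, wt)
print(round(t, 2))  # -10.39
```

A large negative value suggests the gene is expressed at a lower level in the knockout group; with thousands of genes tested, the resulting statistics still need a multiple-testing adjustment before any are called significant.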
Markers • Pooled reference mRNA, used on every plate, was marked GREEN • mRNA from each individual KO or WT mouse was marked RED • Oftentimes, both populations are instead run on the same plate, with one being marked RED and the other marked GREEN
R (www.r-project.org) • “S”-like GNU project language and environment for statistical computing • Great free package for linear and non-linear statistical modeling • Also includes: • an effective data handling and storage facility, • a suite of operators for calculations on arrays, in particular matrices, • a large, coherent, integrated collection of intermediate tools for data analysis, • graphical facilities for data analysis and display either on-screen or on hardcopy, and • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
Bioconductor (http://bioconductor.org) • Open source package for statistical analysis of genomic data • Includes both statistical and graphical tools • Active project with a constant influx of new packages • Does not include more complex analysis tools at this time (SVMs, etc.)