A Microarray-Based Screening Procedure for Detecting Differentially Represented Yeast Mutants

Rafael A. Irizarry
Department of Biostatistics, JHU
rafa@jhu.edu
http://biostat.jhsph.edu/~ririzarr

[Figure: experimental design. Diagram labels: UPTAG, kanR, DOWNTAG; CEN/ARS, URA3, MCS; aatt, ttaa; circular pRS416; EcoRI linearized pRS416; NHEJ Defective. Workflow: transformation into deletion pool, select for Ura+ transformants, genomic DNA preparation, PCR, Cy5 labeled and Cy3 labeled PCR products, oligonucleotide array hybridization.]

Which mutants are NHEJ defective?
• Find mutants defective for transformation with linear DNA
• Dead in linear transformation (green)
• Alive in circular transformation (red)
• Look for spots with large log(R/G)
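The last bullet can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk: the function name `rank_by_log_ratio`, the base-2 logarithm, the averaging over replicates, and the intensity values are all assumptions made for the example.

```python
import numpy as np

def rank_by_log_ratio(red, green, top=3):
    """Rank spots by mean log2(R/G) across replicates, largest first."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    # log2(R/G) per spot and replicate, then average over replicates (axis 1).
    log_ratio = np.log2(red / green).mean(axis=1)
    # Indices of the spots with the largest average log ratio.
    order = np.argsort(-log_ratio)[:top]
    return order, log_ratio[order]

# Hypothetical intensities: 4 mutants x 3 replicate spots.
red = [[800, 820, 790], [400, 410, 395], [1600, 1500, 1700], [200, 210, 190]]
green = [[100, 110, 95], [400, 420, 380], [1600, 1550, 1650], [190, 200, 210]]
order, scores = rank_by_log_ratio(red, green, top=2)
print(order, scores)
```

In this toy input only the first mutant has red much larger than green, so it ranks first.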

Data
• 5718 mutants
• 3 replicates on each slide
• 5 haploid slides, 4 diploid slides
• Arrays are divided into 2 downtag and 3 uptag (2 of which are replicate uptags)

Average Red and Green Scatter Plot

Average Red and Green MVA plot
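For readers unfamiliar with the plot, an MVA (M versus A) plot is usually built from the transformation below. This is a sketch assuming the common base-2 definitions, M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2; the function name `ma_transform` and the input values are hypothetical.

```python
import numpy as np

def ma_transform(red, green):
    """Return (M, A): log ratio and average log intensity per spot."""
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    m = log_r - log_g          # M: log2(R/G)
    a = 0.5 * (log_r + log_g)  # A: average log2 intensity
    return m, a

# Two hypothetical spots: one with R four times G, one with R equal to G.
m, a = ma_transform([1024.0, 256.0], [256.0, 256.0])
print(m, a)
```

Plotting M against A puts overall intensity on the horizontal axis, which makes intensity-dependent bias in the log ratio easier to see than in a red versus green scatter plot.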

Improvement to usual approach
• Take into account that some mutants are dead and some alive
• Use a statistical model to represent this
• Mixture model?
• With ratios we lose information about R and G separately
• Look at them separately (absolute analysis)

Histograms

Using the model we can attach uncertainty to tests. For example, a posterior z-test: a weighted average of z-tests, with weights given by the posterior probabilities (obtained from EM). The statistic is Normal(0,1).
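One way to read "weighted average of z-tests" is sketched below. The slide does not give the formula, so everything here is an assumption made for illustration: each class has its own standard error, a z-score is computed per class, and the scores are averaged with the posterior class probabilities as weights. The names `posterior_weighted_z`, `diff`, `class_se` and `posterior` are hypothetical.

```python
import numpy as np

def posterior_weighted_z(diff, class_se, posterior):
    """Average class-specific z-scores using posterior class probabilities.

    diff:      observed difference per mutant, shape (n,)
    class_se:  assumed standard error under each class, shape (k,)
    posterior: posterior class probabilities, shape (n, k), rows sum to 1
    """
    diff = np.asarray(diff, dtype=float)
    class_se = np.asarray(class_se, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    # z-score each mutant would get under each class.
    z_by_class = diff[:, None] / class_se[None, :]
    # Posterior-weighted average over classes.
    return (posterior * z_by_class).sum(axis=1)

# Hypothetical inputs: two mutants, two classes.
z = posterior_weighted_z(
    diff=[2.0, 1.0],
    class_se=[0.5, 1.0],
    posterior=[[1.0, 0.0], [0.5, 0.5]],
)
print(z)
```

The first mutant is assigned entirely to the first class and gets that class's z-score; the second is split evenly and gets the average of the two.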

QQ-Plot

Uptag/Downtag Z-Scores

Average Red and Green MVA Plot

Average Red and Green Scatter Plot

Results Table
 1  YMR106C   9.5  47    69.2  a  a  100
 2  YOR005C  19.7  35    44.9  a  d  100
 3  YLR265C   6.1  32    35.8  a  m  100
 4  YDL041W  10.4  32    35.6  a  m  100
 5  YIL012W  12.2  31    21.7  a  a  100
 6  YIL093C   4.8  29    30.8  a  a  100
 7  YIL009W   5.6  29   -23.5  a  a  100
 8  YDL042C  12.9  29    32.1  a  d  100
 9  YIL154C   1.8  28    91.3  m  m   82
10  YNL149C   1.7  27    93.4  m  d   71
11  YBR085W   2.5  26   -15.8  a  a   84
12  YBR234C   1.7  26    87.5  m  d   75
13  YLR442C   6.1  26  -100.0  a  a  100

Acknowledgements
Siew Loon Ooi, Jef Boeke, Forrest Spencer, Jean Yang

END

Summary
• Simple data exploration is a useful tool for quality assessment
• Statistical thinking is helpful for interpretation
• Statistical models may help find signals in noise

Acknowledgements
Biostatistics: Karl Broman, Leslie Cope, Carlo Coulantoni, Giovanni Parmigiani, Scott Zeger
MBG (SOM): Jef Boeke, Siew-Loon Ooi, Marina Lee, Forrest Spencer
PGA: Tom Cappola, Skip Garcia, Joshua Hare
UC Berkeley Stat: Ben Bolstad, Sandrine Dudoit, Terry Speed, Jean Yang
Gene Logic: Francois Colin, Uwe Scherf’s Group
WEHI: Bridget Hobbs, Natalie Thorne

Warning
• Absolute analyses can be dangerous for competitive hybridization slides
• We must be careful about the “spot effect”
• A big R or G may only mean that the spot they were on had large amounts of cDNA
• Look at some facts that make us feel safer

Correlation between replicates
      R1    R2    R3    G1    G2    G3
R1  1.00  0.95  0.95  0.94  0.90  0.90
R2  0.95  1.00  0.96  0.90  0.95  0.91
R3  0.95  0.96  1.00  0.91  0.92  0.95
G1  0.94  0.90  0.91  1.00  0.96  0.96
G2  0.90  0.95  0.92  0.96  1.00  0.97
G3  0.90  0.91  0.95  0.96  0.97  1.00
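A correlation matrix of this kind can be computed directly from a mutants-by-replicates array. The sketch below uses simulated values, not the talk's data: a shared per-mutant signal plus independent noise, with the column labels R1 to G3 borrowed from the slide for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mutants = 500

# Simulated log intensities (assumption): shared signal plus replicate noise.
signal = rng.normal(10.0, 1.0, size=n_mutants)
columns = [signal + rng.normal(0.0, 0.2, size=n_mutants) for _ in range(6)]
data = np.column_stack(columns)  # shape (n_mutants, 6)

# rowvar=False treats each column (replicate) as one variable.
corr = np.corrcoef(data, rowvar=False)

labels = ["R1", "R2", "R3", "G1", "G2", "G3"]
for label, row in zip(labels, corr):
    print(label, np.round(row, 2))
```

In this simulation all six columns share one signal, so red and green correlate as strongly with each other as replicates do; in the slide's table, high red-green correlation on the same spot is the kind of pattern the spot-effect warning is concerned with.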

Correlation between red, green, haploid, diploid, uptag, downtag
      RHD   RHU   RDD   RDU   GHD   GHU   GDD   GDU
RHD  1.00  0.59  0.56  0.32  0.95  0.58  0.54  0.37
RHU  0.59  1.00  0.38  0.56  0.58  0.95  0.40  0.58
RDD  0.56  0.38  1.00  0.58  0.54  0.39  0.92  0.64
RDU  0.32  0.56  0.58  1.00  0.33  0.53  0.58  0.89
GHD  0.95  0.58  0.54  0.33  1.00  0.62  0.56  0.39
GHU  0.58  0.95  0.39  0.53  0.62  1.00  0.41  0.58
GDD  0.54  0.40  0.92  0.58  0.56  0.41  1.00  0.73
GDU  0.37  0.58  0.64  0.89  0.39  0.58  0.73  1.00

BTW
The mean squared error across slides is about 3 times larger than the mean squared error within slides.
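The slide does not say how the two mean squared errors were computed. One plausible reading, sketched here under that assumption, takes the within-slide MSE as the variance among replicate spots on the same slide and the across-slide MSE as the variance of per-slide means for the same mutant. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def within_and_across_mse(values):
    """values has shape (mutants, slides, replicates)."""
    values = np.asarray(values, dtype=float)
    # Within slides: variance among replicate spots, averaged over
    # all mutants and slides.
    within = values.var(axis=2, ddof=1).mean()
    # Across slides: variance of the per-slide means, averaged over mutants.
    slide_means = values.mean(axis=2)
    across = slide_means.var(axis=1, ddof=1).mean()
    return within, across

# Toy input: 1 mutant, 2 slides, 2 replicate spots per slide.
toy = [[[1.0, 3.0], [5.0, 7.0]]]
within, across = within_and_across_mse(toy)
print(within, across, across / within)
```

A ratio well above 1, as reported on the slide, suggests that slide-to-slide variation dominates spot-to-spot variation.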

Mixture Model
We use a mixture model that assumes:
• There are three classes: Dead, Marginal, Alive
• Each class is normally distributed, with the same correlation structure from gene to gene
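To make the idea concrete, here is a minimal EM fit of a three-class normal mixture. It is a deliberate simplification of the model on the slide: it is univariate, with a separate mean and standard deviation per class, and it ignores the correlation structure across arrays. The function name, the starting values, and the simulated data are assumptions for illustration only.

```python
import numpy as np

def em_gaussian_mixture(x, n_classes=3, n_iter=100):
    """Fit a univariate normal mixture by EM; returns weights, means, sds, posteriors."""
    x = np.asarray(x, dtype=float)
    # Start means at evenly spaced quantiles, with a common spread
    # and equal class weights.
    probs = (np.arange(n_classes) + 0.5) / n_classes
    mean = np.quantile(x, probs)
    sd = np.full(n_classes, x.std() / n_classes)
    weight = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each observation
        # (computed on the log scale; constants common to all classes cancel).
        log_dens = (-0.5 * ((x[:, None] - mean) / sd) ** 2
                    - np.log(sd) + np.log(weight))
        log_dens -= log_dens.max(axis=1, keepdims=True)
        post = np.exp(log_dens)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted updates of weights, means and sds.
        total = post.sum(axis=0)
        weight = total / x.size
        mean = (post * x[:, None]).sum(axis=0) / total
        sd = np.sqrt((post * (x[:, None] - mean) ** 2).sum(axis=0) / total)
    return weight, mean, sd, post

# Simulated log intensities for "dead", "marginal" and "alive" (assumption).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(6.0, 0.5, 300),
                    rng.normal(9.0, 0.5, 300),
                    rng.normal(12.0, 0.5, 300)])
weight, mean, sd, post = em_gaussian_mixture(x)
print(np.round(mean, 2), np.round(weight, 2))
```

The rows of `post` are the posterior class probabilities for each observation, the quantity that the posterior z-test uses as weights.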

Random effect justification
Each x = (r1,…,r5,g1,…,g5) will have the following effects:
• Individual effect: same mutant, same expression (replicates are alike)
• Genetic effect: same genetics, same expression
• PCR effect: expect a difference between uptag and downtag

Does it fit?

What can we do now that we couldn’t do before?
• Define a t-test that takes into account whether mutants are dead or not when computing the variance
• For each gene, compute likelihood ratios comparing two hypotheses: alive/dead vs. dead/dead or alive/alive
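The likelihood-ratio comparison in the second bullet might look like the sketch below. It assumes independent normal observations with known class parameters, which is simpler than the correlated mixture model described in the talk, and it takes the alternative to be "alive in red, dead in green". The function names and the class parameters are hypothetical.

```python
import numpy as np

def normal_loglik(values, mean, sd):
    """Log-likelihood of independent normal observations."""
    values = np.asarray(values, dtype=float)
    return np.sum(-0.5 * np.log(2 * np.pi * sd ** 2)
                  - 0.5 * ((values - mean) / sd) ** 2)

def log_lr_alive_dead(red, green, dead, alive):
    """Log likelihood ratio: alive/dead vs. the better of dead/dead and alive/alive.

    dead and alive are (mean, sd) tuples for the two classes.
    """
    alive_dead = normal_loglik(red, *alive) + normal_loglik(green, *dead)
    dead_dead = normal_loglik(red, *dead) + normal_loglik(green, *dead)
    alive_alive = normal_loglik(red, *alive) + normal_loglik(green, *alive)
    return alive_dead - max(dead_dead, alive_alive)

# Hypothetical class parameters on the log-intensity scale.
dead, alive = (6.0, 1.0), (12.0, 1.0)
lr_hit = log_lr_alive_dead([12.0, 12.0], [6.0, 6.0], dead, alive)
lr_null = log_lr_alive_dead([12.0, 12.0], [12.0, 12.0], dead, alive)
print(lr_hit, lr_null)
```

A large positive value favors the alive/dead pattern of an NHEJ-defective mutant; a negative value favors the mutant being in the same state in both channels.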

QQ-plot for new t-test

Better looking than others

 1  YMR106C   9.5  47    69.2  a  a  100
 2  YOR005C  19.7  35    44.9  a  d  100
 3  YLR265C   6.1  32    35.8  a  m  100
 4  YDL041W  10.4  32    35.6  a  m  100
 5  YIL012W  12.2  31    21.7  a  a  100
 6  YIL093C   4.8  29    30.8  a  a  100
 7  YIL009W   5.6  29   -23.5  a  a  100
 8  YDL042C  12.9  29    32.1  a  d  100
 9  YIL154C   1.8  28    91.3  m  m   82
10  YNL149C   1.7  27    93.4  m  d   71
11  YBR085W   2.5  26   -15.8  a  a   84
12  YBR234C   1.7  26    87.5  m  d   75
13  YLR442C   6.1  26  -100.0  a  a  100
