AffyDEComp: towards a benchmark for differential expression methods Richard Pearson School of Computer Science University of Manchester
Overview • Why benchmark DE methods? • The Golden Spike data set • AffyDEComp • Conclusions • Recommendations
The need for benchmarks • Microarray analysis has many stages • Competing methods exist at each stage • Methodologists are good at showing the superiority of their own methods • Published results can appear contradictory • Confused end users’ choices are driven by… • What they are familiar with • What colleagues use • What was used in their favourite paper • …and not by scientific comparison
Benchmarking requirements • Methods: a set we wish to compare • Benchmark data: where truth is known • Metrics: by which to compare methods • Affycomp • Methods: summarisation methods • Benchmark data: various spike-in studies • Metrics: various, e.g. area under the ROC curve for a fold-change classifier (illustrated below) • Affycomp doesn’t compare DE methods
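A minimal sketch of the Affycomp-style metric on simulated data: the AUC for a classifier that ranks probesets by absolute log fold change. The arrays log_fc and is_spiked_de are hypothetical stand-ins for a summarised spike-in data set; none of the names come from Affycomp itself.

    # Hypothetical data: which probesets were spiked in as DE, and their
    # observed log fold changes after summarisation.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 1000
    is_spiked_de = rng.random(n) < 0.1       # truth: spiked-in DE probesets
    log_fc = rng.normal(0, 0.2, n)           # null probesets scatter around zero
    log_fc[is_spiked_de] += 1.0              # spiked-in probesets get a real fold change

    # Rank probesets by |log FC|; the AUC measures how well this ranking
    # separates the true DE probesets from the rest.
    auc = roc_auc_score(is_spiked_de, np.abs(log_fc))
    print(f"AUC for the fold-change classifier: {auc:.3f}")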
A benchmark for DE methods • Methods: • DE methods depend on summarisation • Compare summarisation/DE combinations • Benchmark data: • Affycomp spike-ins have few DE genes • Golden Spike data has many DE genes, but also a few “issues”! • Metrics: • Based around areas under ROC curves
The Golden Spike data • 3 “sample” and 3 “control” arrays • Many RNAs “spiked in” at known levels • “DE”, “Equal” and “Empty” probesets • A controversial data set: • Non-uniform null p-value distributions (hence ROC curves rather than p-value cut-offs) • Spike-in concentrations are high - unrepresentative • “DE” spike-ins are all up-regulated - unrepresentative • Concentrations and FC are confounded (addressed by loess normalisation) • Different FC between “Equal” and “Empty” probesets
“Empty” probesets show greater FC than “Equal” • Most analyses have treated both Empty and Equal as true negatives - to what effect?
“Empty” probesets show greater FC than “Equal” • To illustrate how analysis choices affect results, I’ll treat Empty and Equal probesets as true negatives (TN) and DE <= 1.2 spike-ins as true positives (TP) - see the sketch below
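A sketch of this labelling choice, assuming a hypothetical annotation table with status and fold_change columns (the Golden Spike distribution's actual field names may differ):

    # Empty and Equal probesets as true negatives; spike-ins with
    # FC <= 1.2 as true positives.
    import pandas as pd

    probesets = pd.DataFrame({
        "status":      ["empty", "equal", "de", "de", "de"],
        "fold_change": [1.0,      1.0,    1.2,  2.0,  4.0],
    })

    is_tn = probesets["status"].isin(["empty", "equal"])
    is_tp = (probesets["status"] == "de") & (probesets["fold_change"] <= 1.2)

    # Probesets that are neither TP nor TN (here, DE with FC > 1.2) are
    # simply excluded from the ROC analysis in this illustration.
    eval_set = probesets[is_tn | is_tp]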
2-sided test • Large apparent difference between methods • Can you guess which paper used this chart?
2-sided test • Large apparent difference between methods • Are TP correctly identified as up-regulated?
1-sided test of up-regulation • Probesets identified as up-regulated are generally not TP
1-sided test of down-regulation • DE probesets are mostly identified as down-regulated, despite being in truth up-regulated • We appear to be identifying TP as down-regulated (the tests are sketched below)
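A sketch of the three tests being compared, using plain t-tests as a stand-in for whichever DE method is under evaluation; the expression values are invented for illustration:

    # Hypothetical summarised log2 expression for one probeset across the
    # 3 "sample" and 3 "control" arrays.
    import numpy as np
    from scipy import stats

    sample  = np.array([8.1, 8.3, 8.2])
    control = np.array([7.1, 7.0, 7.2])

    # Two-sided: is this probeset DE in either direction?
    t2, p_two_sided = stats.ttest_ind(sample, control)

    # One-sided test of up-regulation in the sample condition.
    t_up, p_up = stats.ttest_ind(sample, control, alternative="greater")

    # One-sided test of down-regulation.
    t_dn, p_down = stats.ttest_ind(sample, control, alternative="less")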
DE <= 1.2 probesets have lower FC than Empty • TP are identified as down-regulated because most TN are “Empty” probesets, which show higher FC than the DE <= 1.2 spike-ins
Remove “empty” probesets • We can remedy this by using just Equal probesets as our TN… • …bearing in mind that this makes the data somewhat atypical
Up-regulation - Empty in TN • Probesets identified as up-regulated are generally not TP when Empty is included in TN
Up-regulation - TN Equal • Probesets identified as up-regulated are more likely to be TP when only Equal is used as TN
Down-regulation - Empty in TN • DE probesets are mostly identified as down-regulated, despite being in truth up-regulated • We appear to be identifying TP as down-regulated when Empty is included in TN
Down-regulation - TN Equal • We generally don’t identify TP as down-regulated when Empty is excluded from TN
“Recommended” test • We recommend using just Equal as TN, and all DE as TP
Recommended Up-reg • Using our recommendations, tests of up-regulation generally find TP, as expected
Recommended Down-reg • Using our recommendations, tests of down-regulation generally don’t find TP, as expected
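A self-contained sketch of the recommended evaluation - Equal probesets only as TN, all DE spike-ins as TP, scored by a one-sided test of up-regulation. The scores here are simulated, not real Golden Spike results:

    import numpy as np
    import pandas as pd
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    n_equal, n_de = 500, 100
    probesets = pd.DataFrame({
        "status": ["equal"] * n_equal + ["de"] * n_de,
        # Hypothetical per-probeset scores from a one-sided test of
        # up-regulation (e.g. -log10 p-values); DE probesets score higher.
        "up_score": np.concatenate([rng.normal(0, 1, n_equal),
                                    rng.normal(2, 1, n_de)]),
    })

    labels = (probesets["status"] == "de").astype(int)   # all DE are TP
    auc = roc_auc_score(labels, probesets["up_score"])
    print(f"AUC under the recommended test: {auc:.3f}")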
Analysis decisions to make • Summarisation method • DE method • Direction of DE (we recommend up-regulation) • Choice of true negatives (Equal only) • Choice of true positives (all DE) • Post-summarisation normalisation (loess fitted on Equal only; sketched below) • Type of ROC chart (standard ROC) • Proportion of x-axis to display (all)
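A sketch of the post-summarisation loess step, assuming an MA-plot parameterisation (M = log ratio, A = mean log intensity) and fitting the curve on Equal probesets only; all data and names are illustrative:

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(2)
    n = 2000
    A = rng.uniform(4, 14, n)                      # mean log2 intensity
    is_equal = rng.random(n) < 0.5                 # hypothetical Equal flags
    M = rng.normal(0, 0.3, n) + 0.05 * (A - 9)     # intensity-dependent bias

    # Fit the loess curve on Equal probesets only (their true log ratio is
    # zero), then subtract the fitted trend from every probeset.
    fit = lowess(M[is_equal], A[is_equal], frac=0.4)   # sorted (A, trend) pairs
    trend = np.interp(A, fit[:, 0], fit[:, 1])
    M_normalised = M - trend

Fitting the curve on Equal probesets only means the correction is not distorted by the (all up-regulated) DE spike-ins, which is the point of restricting the loess to known non-DE probesets.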
Conclusions • A first step towards a reliable benchmark for DE methods • The Golden Spike data has some value if the use of Empty probesets is revisited • Certain summarisation/DE method combinations seem poor • Keep it open (Bioconductor) - because science should be reproducible!
Recommendations • Create a new spike-in data set where • Spike-in concentrations are realistic • DE spike-ins are both up- and down-regulated • Concentrations and FC are not confounded • There is a larger number of arrays • Benchmarks using regulatory information • Benchmarks for Illumina data • Benchmarks for SNP chips (GWA studies) • manchester.ac.uk/bioinformatics/affydecomp