Statistical Tools

Statistical Tools Balasubramanian Narasimhan, Data Coordinating Center Joint with Division of Biostatistics Stanford University

Microarray Data Analysis • Number of features greater than number of samples (p >> n) • Traditional statistical algorithms cannot be used • Simple, easily understood approaches are preferable • Goal is to extract a small number of meaningful features • Need a way to quantify errors in deciding which features are significant

Signficance Analysis of Microarrays • Relies on a permutation distribution as a reference distribution • Compared observed t-like statistic with expected statistic values from the permutation distribution to decide which genes are significantly differentially expressed • Unified theory handles all kinds of response: two-class, multi-class, censored data, paired data • Provides False Discovery Rates (FDR) to quantify errors in genes called significant

False Discovery Rate • Traditional approach to multiple hypothesis testing is a Bonferroni-type adjustment • FDR tells you what proportion of your genes called significant might actually be false positives. An FDR is 5% means out of 100 genes called significant, 5 of them could be false positives. • Just as p-values measure false positive rates, q-values (J. Storey) measure FDRs.

SAM and FDR • A p-value of 5 % means that among all null features, 5% will will meet the rejection criterion • A q-value of 5% means that among all features called significant, 5% will be false positives • SAM provides q-values

Prediction Analysis of Microarrays • Predict phenotype based on expression values • Algorithm is more general than the name implies • Utilizes shrunken centroids to select genes that characterize each class • Best illustrated with an example

Peak Probability Contrasts • Data arising from Time-of-Flight Mass spectrometry for measuring relative abundance of different sized proteins in a blood sample, a useful technological complement to microarrays • Several popular systems including MALDI (matrix assisted laser desorption/ionization) and SELDI (Surface enhanced laser desorption/ionization) • Existing algorithms: svm, trees ,boosting, genetic algorithms • Focuses on peaks in the spectra at least for initial analysis • Accounts for variation in the horizontal position and peak heights for the same biological peak • Gives a measure of importance for each peak • Filters out less significant peaks in a simple way • Provides FDR estimates

Software • We firmly believe that easily available and usable software is important • We therefore develop tools that are embedded in very common applications like Excel as many people seem to like the idea • Many of our tools are freely available on the web for download. (PPC will soon be on the list).

References • J. Storey & R. Tibshirani. Statistical significance for genome-wide studies, PNAS 100: 9440-9445 • R. Tibshirani, B. Narasimhan, T. Hastie, G. Chu. Diagnosis of multiple cancer types by Shrunken Centroids of Gene Expression, PNAS 2002 99:6567-6572 (May 14) • PPC paper to appear in Bioinformatics.

Statistical Tools

Statistical Tools

Presentation Transcript

Statistical Tools for Process Improvement - Applications

Statistical Tools for Multivariate Six Sigma

Statistical tools.

Lecture 1: Basic Statistical Tools

Statistical Tools for Health Economics

Overview of Major Statistical Tools

Overview of Major Statistical Tools

Useful Statistical Tools

BASIC STATISTICAL TOOLS

Application of available statistical tools

Chapter 2G Statistical Tools in Evaluation

Chapter 2F Statistical Tools in Evaluation

Statistical Tools for Multivariate Six Sigma

Statistical Tools, Performance Verification

Statistical Process Control (SPC) Tools

Statistical Tools for Process Improvement – Applications

Seven Quality Tools [Statistical Process Control]

Statistical tools: The genutil Package

Statistical Tools in Evaluation

Statistical Tools A Few Comments

Statistical Tools in Evaluation

Statistical Tools A Few Comments