Biostatistics 760

Biostatistics 760 Random Thoughts

Upcoming Classes • Bios 761: Advanced Probability and Statistical Inference • Bios 767: Longitudinal Data Analysis • Bios 780: Theory and Methods for Survival Analysis • Bios 841: Statistical Consulting

Bios 761 • Frequentist and Bayesian decision theory • Hypothesis testing: UMP tests, etc. • Bootstrap and other methods of inference • Stochastic processes: • Poisson processes • Markov chains • Martingales • Brownian motion
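
As a rough illustration of the bootstrap idea listed above, here is a minimal Python sketch of the nonparametric bootstrap for a standard error. The function name `bootstrap_se`, the toy `sample`, and the choice of 2000 resamples are assumptions made for this example and are not taken from the course materials.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the replicated statistics estimates the standard error.
    return statistics.stdev(replicates)

# Hypothetical toy data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
se_boot = bootstrap_se(sample)
# Textbook formula for the standard error of the mean, for comparison.
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
```

For the sample mean the two numbers should be close. The bootstrap earns its keep for statistics with no convenient variance formula, such as a median.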

Bios 780 • Time-to-event data • Right censoring • Counting processes; martingales • Semiparametric approaches • Kaplan-Meier estimator • Log-rank statistic • Cox model • Data analysis
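
To make the Kaplan-Meier estimator concrete, the following is a small Python sketch of the product-limit calculation under right censoring. The function name `kaplan_meier` and the input convention (event indicator 1 for an observed event, 0 for right-censored) are assumptions for this example. It is a sketch, not a substitute for a tested survival analysis package.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by 1 - d / (number at risk).
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Hypothetical toy data: five subjects, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In the toy data the subject censored at time 2 still counts as at risk at time 2, which is the usual convention when an event and a censoring are tied.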

Bios 841 • Consulting versus collaboration • Bringing it all together to solve problems • Communicating about statistics • Three real problems • Three journal-style reports • One final oral presentation • Real-time problem solving • What is the role of statistical theory?

A Few War Stories • As a student: thesis on surrogates • As a postdoc: infectious diseases • As a new professor: cystic fibrosis (CF)* • Working on tenure: empirical processes • Empirical processes and cancer* • Chair of the DSMC for NICHD • Artificial intelligence and NSCLC

CF Neonatal Screening • 1992: Joined Phil Farrell’s CF study team • 1997: Farrell, Kosorok, Laxova, et al., published in NEJM • 2004 (Oct. 15): CDC recommended CF newborn screening: the 1997 article was judged the only valid randomized trial • States offering CF newborn screening: 3 in 1997, 12 in 2004, 45 today

What Role Did “Theory” Play? • Used state-of-the-art statistical methods that were robust (GEE) • In other CF research we have used: • Current status methods (parametric, robust) • Constrained regression estimation • Semiparametric bootstrap inference • Martingale based survival analysis • New work using artificial intelligence

Empirical Processes and Cancer • Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM) • Cox proportional hazards model employed to ascertain risks of 5 prognostic factors: Age, performance Status, serum lactate dehydrogenase Level, number of extranodal disease Sites, tumor Stage • Diagnostics show the model fits poorly
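
For reference, the Cox proportional hazards model mentioned above is usually written as follows. The notation (baseline hazard $\lambda_0$, covariate vector $Z$, coefficient vector $\beta$) is the standard textbook one and is assumed here rather than taken from the slides. With the five prognostic factors as covariates, $Z$ would have five components.

```latex
\lambda(t \mid Z) = \lambda_0(t)\, \exp\!\left(\beta^{\top} Z\right),
\qquad
\frac{\lambda(t \mid Z_1)}{\lambda(t \mid Z_2)}
  = \exp\!\left(\beta^{\top} (Z_1 - Z_2)\right).
```

The second expression is the proportional hazards property: the hazard ratio between two covariate values does not depend on $t$. A poor fit typically means this time-constant ratio is not supported by the data.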

What is the Problem? • Poor survival function prediction • Possibly incorrect interpretation of risk factor effects • A model that adds a single parameter to the Cox model was developed and fit • This new model fits well (Kosorok, Lee and Fine, 2004) • Inference for the new model is complicated

What Does Theory Tell Us? • We can derive valid inferential tools for the new model: estimation and bootstrap • Robustness was also studied: we learn theoretically that the Cox model is robust to this kind of model misspecification: • The direction of the regression coefficients is preserved • Should use robust variance for Cox model

Theory Versus Applications • The title implies there is conflict between theory and applications • This isn’t true! • Theory provides a basis for correct thinking and problem solving for applications • Applications drive new theoretical development

Theory Can Be Impractical • Law of the iterated logarithm: needs sample size of 10^8 (“asymptopia”). • Sometimes higher order approximations are needed before it becomes useful. • Sometimes computational properties of asymptotically optimal estimators are poor. • Some hard problems take years to solve.
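
The slide does not state the law of the iterated logarithm, so the standard version is given here as background. It assumes i.i.d. random variables $X_1, X_2, \ldots$ with mean zero and finite variance $\sigma^2$, and partial sums $S_n = X_1 + \cdots + X_n$.

```latex
\limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log\log n}} = \sigma
\quad \text{almost surely.}
```

The normalizing factor grows extremely slowly: at $n = 10^8$, $\log\log n$ is still only about 2.9. This is one way to see why the limit says little about samples of realistic size.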

Why Theory is Needed • Often it does work for practical sample sizes. • Can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated. • Theory can lead toward methodological solutions (Cook and Kosorok, 2004 JASA). • Theory can drive scientific discovery. • Some results are beautiful.

Data Mining Versus Inference • Data mining is summarizing and representing data no matter how complicated • Inference is determining valid measures of uncertainty • Patterns obtained from data mining can be misleading • Inference without data mining may miss important structure

The Core of Statistics • Statistics is the science of science • How do we learn from our world and draw meaningful and valid conclusions from it? • Need both data mining and valid inference • Requires a unique kind of intuition • Needs many different intellectual perspectives • One of the most challenging of all fields

Everyone Needs Core Literacy • All statisticians need to know enough theory to have core literacy about statistics and to be able to problem solve • All statisticians need to know enough about applications to know what is important • All biostatisticians need to know enough statistical methods to be useful in practice • The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology

Semiparametric Inference • The study of statistical models with parametric and/or nonparametric parts • Can achieve trade-off between scientific meaning and model “robustness” • Estimation and inference are often hard • There exists an efficiency bound for parametric and some nonparametric parts • NPMLE, testing and estimating equations

Empirical Processes • Tools for complex model inference and high dimensional data • Can determine universal properties of semiparametric methods: • Consistency • Rate of convergence • Limiting distributions • Valid inference (empirical process bootstrap) • Empirical processes are everywhere
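
As background for the terms above, the basic objects can be written as follows. The notation is the standard one and is assumed here rather than taken from the slides: $X_1, \ldots, X_n$ are i.i.d. with distribution $P$, and $f$ ranges over a class of functions $\mathcal{F}$.

```latex
\mathbb{P}_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i),
\qquad
\mathbb{G}_n f = \sqrt{n}\,\bigl(\mathbb{P}_n f - P f\bigr),
\qquad
P f = \int f \, dP .
```

Consistency results come from uniform laws of large numbers, $\sup_{f \in \mathcal{F}} |\mathbb{P}_n f - P f| \to 0$ (Glivenko-Cantelli classes). Limiting distributions come from weak convergence of $\mathbb{G}_n$ to a Gaussian process indexed by $\mathcal{F}$ (Donsker classes).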

The Road Ahead • Whatever you choose to do, the core statistical theory classes will help you. • Be patient as you learn. • Be willing to work hard (struggle is good). • It takes many different kinds of thinkers with different learning styles. • There are important discoveries to be made in both applications and theory.
