Some key developments in data analysis Michael Babyak, PhD
Areas of development • Discarding flawed techniques • New types of models • Treatment of missing data • Simulation and empirical tests • Validation
Techniques largely discredited or highly suspect • Categorization of continuous variables without good reason • Automated variable selection without validation • Overfitted or “cherry-picked” models
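A small illustrative simulation (my own sketch, not from the slides; the effect size of 0.4 and sample sizes are arbitrary) of why categorizing a continuous variable without good reason is costly: a median split throws away information and attenuates the observed association.

```python
# Illustrative sketch: median-splitting a continuous predictor weakens
# the estimated association with the outcome. Assumes only numpy.
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 200, 2000
r_continuous, r_split = [], []

for _ in range(n_sims):
    x = rng.normal(size=n)
    y = 0.4 * x + rng.normal(size=n)            # true linear relationship
    x_split = (x > np.median(x)).astype(float)  # median split: "high" vs "low"
    r_continuous.append(np.corrcoef(x, y)[0, 1])
    r_split.append(np.corrcoef(x_split, y)[0, 1])

# The dichotomized predictor typically recovers a noticeably weaker correlation.
print("mean r, continuous predictor:  ", round(float(np.mean(r_continuous)), 3))
print("mean r, median-split predictor:", round(float(np.mean(r_split)), 3))
```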
New types of models • Regression family • Clustered data • Factor analysis family
Generalized Linear Model • Normal outcome: General Linear Model / Linear Regression (ANOVA/t-test, ANCOVA) • Binary/Binomial outcome: Logistic Regression (in place of transformed outcomes or chi-square) • Count, heavy skew, lots of zeros: Poisson, ZIP, Negbin, gamma • Can be applied to clustered (e.g., repeated measures) data
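A minimal sketch of the GLM idea using statsmodels (the slides do not name any particular software; the simulated data and coefficients are assumptions). The same fitting machinery handles normal, binary, and count outcomes by swapping the distributional family.

```python
# One model family, three outcome types: swap the GLM family, keep the machinery.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

y_normal = 1.0 + 0.5 * x + rng.normal(size=n)              # continuous outcome
y_binary = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))      # binary outcome
y_count = rng.poisson(np.exp(0.2 + 0.5 * x))                # count outcome

fits = {
    "linear regression (Gaussian)":   sm.GLM(y_normal, X, family=sm.families.Gaussian()).fit(),
    "logistic regression (Binomial)": sm.GLM(y_binary, X, family=sm.families.Binomial()).fit(),
    "Poisson regression":             sm.GLM(y_count, X, family=sm.families.Poisson()).fit(),
}
for name, fit in fits.items():
    print(name, "slope =", round(fit.params[1], 3))
```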
Factor Analytic Family • Principal Components • Latent Variables (Common Factor Analysis) • Partial Least Squares • Multiple Regression • Structural Equation Models
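A hedged sketch (not from the slides; scikit-learn and the loading values are my assumptions) contrasting two members of this family: principal components summarize total variance, while common factor analysis models only the variance the indicators share.

```python
# Principal components vs. common factor analysis on indicators of one construct.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(2)
n = 300
latent = rng.normal(size=(n, 1))                       # one underlying construct
loadings = np.array([[0.8, 0.7, 0.6, 0.5]])            # each indicator loads on it
indicators = latent @ loadings + 0.5 * rng.normal(size=(n, 4))  # plus unique error

pca = PCA(n_components=1).fit(indicators)
fa = FactorAnalysis(n_components=1).fit(indicators)

# PCA components mix shared and unique variance; FA targets only the shared part.
print("PCA loadings:", np.round(pca.components_, 2))
print("FA loadings: ", np.round(fa.components_, 2))
```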
You Use Latent Variables Every Day • A single measurement is an indicator of an underlying phenomenon, e.g., mercury rising in a sphygmomanometer measures the underlying construct of “blood pressure.” • How do you improve the reliability of blood pressure measurement? Measure more than once, perhaps even in different settings (e.g., ambulatory monitoring). • A psychometric scale is also a collection of indicators of an underlying process, triangulating on an underlying construct through multiple items (indicators). • A latent variable is a collection of indicators with the unshared/unreliable part of the indicators removed. So what’s the problem?
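A toy illustration of the repeated-measurement point (my own numbers, not from the talk): averaging several noisy readings tracks the underlying "true" value more reliably than any single reading.

```python
# Measurement error shrinks roughly by sqrt(number of readings) when averaging.
import numpy as np

rng = np.random.default_rng(3)
true_bp = 120.0
readings = true_bp + rng.normal(scale=8.0, size=(10_000, 5))  # 5 readings per person

single_error = np.std(readings[:, 0] - true_bp)
mean_error = np.std(readings.mean(axis=1) - true_bp)
print("SD of error, one reading:       ", round(float(single_error), 2))
print("SD of error, mean of 5 readings:", round(float(mean_error), 2))  # ~ 8 / sqrt(5)
```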
Missing Data • Imputation or related approaches are almost ALWAYS better than deleting incomplete cases • Multiple Imputation • Full Information Maximum Likelihood
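A minimal multiple-imputation sketch using scikit-learn's IterativeImputer (one reasonable tool; the slides do not prescribe a package, and the data, missingness rate, and analysis are illustrative assumptions). Several completed datasets are drawn; in practice each would be analyzed and the results pooled.

```python
# Multiple imputation sketch: draw several completed datasets rather than
# deleting incomplete cases.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
data = np.column_stack([x1, x2])
data[rng.random(n) < 0.3, 1] = np.nan        # make ~30% of x2 missing

estimates = []
for m in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = imputer.fit_transform(data)
    estimates.append(np.corrcoef(completed[:, 0], completed[:, 1])[0, 1])

print("correlation estimate from each imputed dataset:", np.round(estimates, 3))
```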
Growing out of missing-data work • Propensity Scoring • “Matches” individuals on multiple dimensions to improve “baseline balance” • Complier Average Causal Effect (CACE) • Estimates the effect of a treatment among all potential compliers, including those in the control arm
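A rough propensity-score sketch (illustrative only; the covariates, the weighting approach, and scikit-learn are my assumptions, since the slide only describes the idea of balancing groups on multiple dimensions).

```python
# Estimate a propensity score, then weight to improve "baseline balance".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 1000
age = rng.normal(50, 10, size=n)
severity = rng.normal(size=n)
X = np.column_stack([age, severity])

# Treatment assignment depends on the covariates, so the groups are imbalanced.
p_treat = 1 / (1 + np.exp(-(-0.05 * (age - 50) + 0.8 * severity)))
treated = rng.binomial(1, p_treat)

ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))   # inverse-probability weights

raw_diff = severity[treated == 1].mean() - severity[treated == 0].mean()
wtd_diff = (np.average(severity[treated == 1], weights=w[treated == 1])
            - np.average(severity[treated == 0], weights=w[treated == 0]))
print("severity imbalance before weighting:", round(float(raw_diff), 3))
print("severity imbalance after weighting: ", round(float(wtd_diff), 3))
```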
Simulation Example • Generate data from Y = 0.4X + error • Fit the model in repeated samples to obtain estimates bs1, bs2, bs3, bs4, …, bsk-1, bsk • Evaluate how the estimates behave
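A minimal sketch of this simulation (sample size, number of replications, and the fitting routine are my assumptions): generate data from Y = 0.4X + error, fit the model repeatedly, and check how well the estimates recover the true coefficient.

```python
# Repeated-sampling simulation for Y = 0.4*X + error.
import numpy as np

rng = np.random.default_rng(6)
true_b, n, k = 0.4, 100, 1000
estimates = []

for _ in range(k):
    x = rng.normal(size=n)
    y = true_b * x + rng.normal(size=n)
    b = np.polyfit(x, y, 1)[0]          # slope from a simple linear fit
    estimates.append(b)

estimates = np.array(estimates)
print("mean of estimates:", round(float(estimates.mean()), 3))  # near 0.4 if unbiased
print("SD of estimates:  ", round(float(estimates.std()), 3))   # sampling variability
```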
Validation • Split-half better than nothing, but often too conservative • Bootstrap • Repeated splitting
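A short sketch of one common bootstrap-validation approach, the optimism correction (my own simplification under assumed data and model; the slide only names "bootstrap"): estimate how much the apparent fit flatters the model and subtract it.

```python
# Optimism-corrected R^2 via the bootstrap.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n, p = 150, 10
X = rng.normal(size=(n, p))
y = 0.4 * X[:, 0] + rng.normal(size=n)      # only the first predictor matters

model = LinearRegression().fit(X, y)
apparent_r2 = model.score(X, y)

optimism = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)                    # bootstrap resample
    boot = LinearRegression().fit(X[idx], y[idx])
    # apparent fit in the resample minus fit when applied back to the original data
    optimism.append(boot.score(X[idx], y[idx]) - boot.score(X, y))

corrected_r2 = apparent_r2 - np.mean(optimism)
print("apparent R^2:           ", round(apparent_r2, 3))
print("optimism-corrected R^2: ", round(float(corrected_r2), 3))
```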
Some Premises • “Statistics” is a cumulative, evolving field • Newer is not necessarily better, but should be entertained as regards the scientific question at hand • Keeping up is hard to do • There’s no substitute for thinking about the problem
http://www.duke.edu/~mababyak • michael.babyak @ duke.edu • http://symptomresearch.nih.gov/chapter_8/