On the presence of missing values JENA GRADUATE ACADEMY Dr. Friedrich Funke
Learning objectives • What are missing values • How do I basically treat missing data • Why are data missing • How do I detect (the systematics of) missingness • How do I treat missing data - revisited
Basic Types of Missing Values • Unit-nonresponse (drop-out, attrition etc.) • Item-nonresponse • Missing Values by design
Something is Missing - Why worry? • Missing values are almost everywhere • Inefficiency (lack of power) • Bias of estimation (!!!) • Missing value analysis can support our understanding of the data!
Missing value management (examples) • Deletion: listwise deletion (complete cases analysis); pairwise deletion (available data analysis); both are unwise deletion • Imputation: mean imputation; conditional mean (regression); hot deck/cold deck • Maximum likelihood (EM, FIML) • Multiple imputation
Deletion • Listwise deletion: most common way of dealing with missing data (implicitly in SPSS); conservative (»At least I do nothing wrong«); can result in zero cases • Pairwise deletion: estimates each moment with all available non-missing cases; appears to use all information in the data; covariance matrices can become non-positive-definite
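The difference between the two deletion strategies can be sketched on a toy data set. This is only an illustration: the records and the helper names `listwise` and `pairwise_cov` are made up for this example and are not from the slides.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": None},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": None, "y": 4.0, "z": 3.0},
    {"x": 4.0, "y": 5.0, "z": None},
    {"x": 5.0, "y": 7.0, "z": 6.0},
]

def listwise(rows):
    """Listwise deletion: keep only rows without any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_cov(rows, a, b):
    """Pairwise deletion: covariance of a and b from all rows where both are observed."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    ma = mean(p[0] for p in pairs)
    mb = mean(p[1] for p in pairs)
    cov = sum((pa - ma) * (pb - mb) for pa, pb in pairs) / (len(pairs) - 1)
    return cov, len(pairs)

n_listwise = len(listwise(rows))
cov_xy, n_xy = pairwise_cov(rows, "x", "y")
cov_xz, n_xz = pairwise_cov(rows, "x", "z")

print("complete cases:", n_listwise)
print("cov(x, y) based on", n_xy, "cases:", round(cov_xy, 3))
print("cov(x, z) based on", n_xz, "cases:", round(cov_xz, 3))
```

Listwise deletion shrinks the sample to the few complete rows, while pairwise deletion estimates each covariance from a different subset of cases. Because the entries of the resulting covariance matrix do not come from one common sample, the matrix is not guaranteed to be positive-definite.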
Mean imputation • [Text to be redone]
Regression imputation • Actually a form of conditional mean imputation • Very elegant if you add residuals (stochastic regression imputation: residuals with mean = 0 and variance equal to the residual variance)
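A minimal sketch of regression imputation with one predictor, assuming a simple linear model of Y on a fully observed X. The function names `fit_line` and `regression_impute` and the data are hypothetical. With `stochastic=True` a normally distributed residual with mean 0 and the estimated residual variance is added to each imputed value.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus residual variance."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_var = sum(e ** 2 for e in resid) / (len(xs) - 2)
    return intercept, slope, resid_var

def regression_impute(xs, ys, stochastic=True, rng=None):
    """Replace None in ys by the regression prediction, optionally plus a random residual."""
    if rng is None:
        rng = random.Random(0)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, s2 = fit_line([x for x, _ in obs], [y for _, y in obs])
    sd = s2 ** 0.5
    out = []
    for x, y in zip(xs, ys):
        if y is None:
            noise = rng.gauss(0.0, sd) if stochastic else 0.0
            y = a + b * x + noise
        out.append(y)
    return out

# Hypothetical example data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 11.8]

deterministic = regression_impute(xs, ys, stochastic=False)
stochastic = regression_impute(xs, ys, stochastic=True, rng=random.Random(1))
print(deterministic)
print(stochastic)
```

The deterministic variant places every imputed value exactly on the regression line, which understates the variability of Y. The stochastic variant restores that variability.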
Hot deck imputation • Fills in missing values on incomplete records using values from similar but complete records of the same dataset (hot deck of punch cards)
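One possible variant of this idea is a nearest-neighbour hot deck, sketched below. The similarity measure (squared Euclidean distance on the matching variables), the function name `hot_deck_impute` and the records are assumptions for illustration. Other hot deck variants draw a donor at random from a cell of similar records.

```python
def hot_deck_impute(records, target, match_on):
    """Fill missing `target` values with the value of the most similar complete record."""
    donors = [r for r in records if r[target] is not None]
    filled = []
    for r in records:
        if r[target] is None:
            # The donor is the complete record closest on the matching variables.
            donor = min(donors, key=lambda d: sum((d[k] - r[k]) ** 2 for k in match_on))
            r = {**r, target: donor[target]}
        filled.append(r)
    return filled

# Hypothetical records from one and the same data set.
records = [
    {"age": 25, "edu": 12, "income": 30},
    {"age": 47, "edu": 16, "income": 62},
    {"age": 26, "edu": 13, "income": None},
    {"age": 50, "edu": 17, "income": None},
]

filled = hot_deck_impute(records, target="income", match_on=["age", "edu"])
for row in filled:
    print(row)
```

In practice the matching variables would usually be standardised first, so that no single variable dominates the distance.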
Cold deck imputation • Fills in missing values on incomplete records using values from similar but complete records of an external dataset • e.g. historical imputation
Maximum Likelihood Approaches • Simple idea, but computationally complex • Loosely speaking, for a fixed set of data and underlying probability model, maximum likelihood picks the values of the model parameters that make the data "more likely" than any other values of the parameters would make them.
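In symbols, the idea can be written as follows. The notation is an assumption for illustration and not taken from the slides: θ denotes the model parameters, f the assumed density, and y_obs and y_mis the observed and missing parts of the data. The second line sketches the observed-data likelihood that approaches such as EM and FIML maximise, assuming independent cases and ignorable missingness.

```latex
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} L(\theta \mid y),
\qquad
L(\theta \mid y) = \prod_{i=1}^{n} f(y_i \mid \theta)

L(\theta \mid y_{\mathrm{obs}}) = \prod_{i=1}^{n} \int f(y_{i,\mathrm{obs}},\, y_{i,\mathrm{mis}} \mid \theta)\; dy_{i,\mathrm{mis}}
```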
Multiple Imputation • Combination of several random imputations and integration of the results • Workflow: incomplete data → imputed data sets (e.g. m = 10) → separate analyses → data integration
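The integration step is commonly done with Rubin's rules, which the slide does not name explicitly. The sketch below assumes each of the m separate analyses returns a point estimate and its squared standard error. The numbers and the function name `pool_rubin` are made up for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results of m = 5 separate analyses.
estimates = [10.2, 9.8, 10.5, 10.1, 9.9]
variances = [0.40, 0.38, 0.42, 0.41, 0.39]

pooled, total_var = pool_rubin(estimates, variances)
print("pooled estimate:", round(pooled, 3))
print("pooled standard error:", round(total_var ** 0.5, 3))
```

The between-imputation component is what separates multiple from single imputation: it carries the uncertainty about the missing values into the final standard error.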
Missingness is a probabilistic phenomenon • Dataset (data matrix) • MV »mechanism« (indicator matrix)
Typology of missingness distributions • MCAR: missing completely at random • MAR: missing at random • MNAR: missing not at random, non-ignorable (eq. 1 and 2 are violated; missingness depends on the missing values themselves)
Typology of missingness distributions • X completely observed • Y variable with some missings • R missingness • missingness »mechanism« • MCAR: missingness is independent of the empirical data [diagram: X, Y, R]
Typology of missingness distributions • X completely observed • Y variable with some missings • R missingness • missingness »mechanism« • MAR: missingness is related to observed data [diagram: X, Y, R]
Typology of missingness distributions • X completely observed • Y variable with some missings • R missingness • missingness »mechanism« • MNAR: missingness is related to the missing data as well [diagram: X, Y, R]
Typology of missingness distributions • MCAR: missing completely at random • MAR: missing at random • MNAR: non-ignorable [diagrams: X, Y, R for each mechanism]
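The three mechanisms can be made concrete with a small simulation. Everything in it is an assumption chosen for illustration: the population (standard normal X and Y with correlation 0.7 and true mean of Y equal to 0), the missingness probabilities and the helper names. R = 1 marks a missing Y.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 20000

# Hypothetical population: X is fully observed, Y correlates with X (r = 0.7).
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.7 * xi + 0.51 ** 0.5 * rng.gauss(0, 1) for xi in x]

def indicator(values, p_high=0.6, p_low=0.1):
    """R = 1 (missing) with p_high if the driving value is > 0, else with p_low."""
    return [1 if rng.random() < (p_high if v > 0 else p_low) else 0 for v in values]

r_mcar = [1 if rng.random() < 0.35 else 0 for _ in range(n)]  # ignores X and Y
r_mar = indicator(x)    # driven by the observed X
r_mnar = indicator(y)   # driven by the partly unobserved Y itself

def naive_mean(r):
    """Mean of Y over the observed cases only."""
    return mean(yi for yi, ri in zip(y, r) if ri == 0)

def regression_mean(r):
    """Fit Y on X in the observed cases, then average the predictions over all X."""
    obs = [(xi, yi) for xi, yi, ri in zip(x, y, r) if ri == 0]
    mx = mean(p[0] for p in obs)
    my = mean(p[1] for p in obs)
    slope = sum((a - mx) * (b - my) for a, b in obs) / sum((a - mx) ** 2 for a, _ in obs)
    intercept = my - slope * mx
    return intercept + slope * mean(x)

results = {name: (naive_mean(r), regression_mean(r))
           for name, r in [("MCAR", r_mcar), ("MAR", r_mar), ("MNAR", r_mnar)]}
for name, (naive, adjusted) in results.items():
    print(f"{name}: naive mean {naive:+.3f}, regression-adjusted {adjusted:+.3f}")
```

Under MCAR the observed cases are a random subsample, so the naive mean stays close to 0. Under MAR the naive mean is biased, but a model that uses X removes the bias. Under MNAR the bias remains even after adjusting for X.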
Examples for MNAR • We are interested in income, but managers refuse to answer • We are interested in prejudice, but the racists skip that scale • We are interested in depression scores, but the depressed are too tired to complete the questionnaire [diagram: X, Y, R]
Now you can answer the question: • Does this rule of thumb make sense? • »If up to 5% of my data are missing, I don't have a problem. If 50% are missing, I am lost.« • NO! The amount of missingness is much less important than the reason for missingness!
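A small simulation illustrates why the rule of thumb fails. It assumes a standard normal variable and two made-up scenarios: 90% of the values missing completely at random versus only 10% missing, where the missing values are exactly the highest ones (MNAR).

```python
import random
from statistics import mean

rng = random.Random(7)
n = 20000
y = [rng.gauss(0, 1) for _ in range(n)]
true_mean = mean(y)

# Scenario 1: 90% missing, completely at random.
mcar_obs = [yi for yi in y if rng.random() >= 0.9]

# Scenario 2: only 10% missing, but the top 10% of the values are the missing ones.
cutoff = sorted(y)[int(0.9 * n)]
mnar_obs = [yi for yi in y if yi < cutoff]

bias_mcar = mean(mcar_obs) - true_mean
bias_mnar = mean(mnar_obs) - true_mean

print(f"90% missing, MCAR: n = {len(mcar_obs)}, bias of the mean = {bias_mcar:+.3f}")
print(f"10% missing, MNAR: n = {len(mnar_obs)}, bias of the mean = {bias_mnar:+.3f}")
```

The first scenario loses most of the cases, and with them power, but the mean stays nearly unbiased. The second keeps nine out of ten cases and still yields a clearly biased mean.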
Amount of missingness • [Figure: two panels, 10% missing and 90% missing, each showing missing vs. present values]
Mechanism of missingness • MAR: missingness depends mainly on X; solvable • MNAR/NI: missingness depends mainly on Y; big trouble ahead
Mechanism of missingness • [Figure: biased median]
Imputation with MCAR • Although 90% are missing, model-based imputation can reproduce the data. • Even under MCAR, mean imputation is evil!
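Why mean imputation does harm even under MCAR can be sketched in a few lines. The example assumes a made-up normal variable (mean 50, standard deviation 10) with half of the values missing completely at random.

```python
import random
from statistics import mean, pvariance

rng = random.Random(3)
n = 5000
y = [rng.gauss(50, 10) for _ in range(n)]

# MCAR: every value has the same 50% chance of being missing.
missing = [rng.random() < 0.5 for _ in range(n)]
observed = [yi for yi, m in zip(y, missing) if not m]

# Mean imputation: every missing value is replaced by the observed mean.
fill = mean(observed)
imputed = [fill if m else yi for yi, m in zip(y, missing)]

var_true = pvariance(y)
var_imputed = pvariance(imputed)
print(f"mean: complete data {mean(y):.2f}, after mean imputation {mean(imputed):.2f}")
print(f"variance: complete data {var_true:.1f}, after mean imputation {var_imputed:.1f}")
```

The mean survives, but the variance shrinks roughly in proportion to the share of imputed values. Standard errors computed from the filled-in data are therefore too small, and correlations with other variables are attenuated.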
Imputation with MAR • Under MAR, model-based imputation can reproduce the data. • Mean imputation is evil!
Take home message • Missing values do not only decrease efficiency/power • They can (severely) bias the parameter estimates • Listwise and especially pairwise deletion is unwise deletion • Naïve unconditional imputation is evil • Understand the missingness »mechanism« • Under MCAR: relax • Under MAR, model-based imputation is no alchemy • Best practice: PREVENTION