Replacing Missing Values

Replacing Missing Values. Jukka Parviainen, Tik-61.181 Special Course in Information Technology, 27.10.1999. Agenda: Motivation, Objectives, Meaning for the conclusions, Origin of missing values (MV), Detection of missing values, Replacing missing values, Examples. References.

Presentation Transcript


  1. Replacing Missing Values Jukka Parviainen Tik-61.181 Special Course in Information Technology 27.10.1999

  2. Agenda • Motivation • Objectives • Meaning for the conclusions • Origin of missing values (MV) • Detection of missing values • Replacing missing values • Examples

  3. References • Pyle: Data Preparation for Data Mining, chapter 8 • Hair, Anderson, Tatham, Black: Multivariate Data Analysis • Bishop: Neural Networks for Pattern Recognition

  4. …missing values?

  5. Motivation • There are always MVs in a real data set • MVs may have an impact on modeling; in fact, they can destroy it! • MVs also contain information! • Hint for the modeler: Avoid-Detect-Replace-Understand

  6. “Definitions” • Missing value - a value that exists but was not captured in the data set: errors in data entry, transmission, ... • Empty value - no value exists in the population • Outlier, out-of-range value

  7. Objectives • Replacement is controlled and understood by the modeler • “Least harm”: no “new” information is added to the data set • Statistical estimation of MVs is not the primary issue; data mining (DM) is • KISS - speed and simplicity • PIE-I/O - training+testing+execution

  8. Origin and Detection • Missing data process • Degree of randomness: nonrandom, missing at random, missing completely at random • Detecting missing value patterns (MVP): number of MVs in each variable/case; compare MVP to complete sets
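
The detection step on slide 8 (number of MVs in each variable and case, plus the missing value patterns themselves) can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the use of `None` as the missing-value marker, and the function names are hypothetical and do not come from the presentation.

```python
from collections import Counter

# Hypothetical toy data: each dict is a case, each key a variable.
# None marks a missing value (an assumption made for this sketch).
rows = [
    {"temp": 21.5, "pressure": 1.2, "grade": "A"},
    {"temp": None, "pressure": 1.1, "grade": "B"},
    {"temp": 19.0, "pressure": None, "grade": None},
    {"temp": 20.3, "pressure": 1.3, "grade": "A"},
]

def missing_per_variable(rows):
    """Count missing values in each variable (column)."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts

def missing_per_case(rows):
    """Count missing values in each case (row)."""
    return [sum(value is None for value in row.values()) for row in rows]

def missing_patterns(rows):
    """Count how often each combination of missing variables occurs.

    The empty tuple () stands for complete cases.
    """
    return Counter(
        tuple(sorted(name for name, value in row.items() if value is None))
        for row in rows
    )

print(missing_per_variable(rows))
print(missing_per_case(rows))
print(missing_patterns(rows))
```

Comparing the cases that share a pattern against the complete cases (the `()` pattern) is one way to get a first impression of whether the values are missing at random.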

  9. Replacing missing values • Randomness of MVs? • Methods: use the complete data; delete variable(s)/case(s); imputation methods...; model based (ML, Bayes); use robust models
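
Two of the simplest methods on slide 9, using only the complete cases and deleting variables, might look like the sketch below. The data, the function names, and the 50 % threshold are assumptions chosen for illustration; the presentation does not prescribe any of them.

```python
# Hypothetical toy data; None marks a missing value (assumption).
rows = [
    {"a": 1.0, "b": 2.0, "c": None},
    {"a": None, "b": 3.0, "c": None},
    {"a": 4.0, "b": 5.0, "c": 6.0},
    {"a": 7.0, "b": 8.0, "c": None},
]

def drop_incomplete_cases(rows):
    """Keep only the cases that have no missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def drop_sparse_variables(rows, max_missing_fraction=0.5):
    """Delete variables whose share of missing values exceeds the threshold.

    Assumes every case has the same set of variables.
    The default threshold is an arbitrary illustrative choice.
    """
    n_cases = len(rows)
    keep = [
        name
        for name in rows[0]
        if sum(row[name] is None for row in rows) / n_cases <= max_missing_fraction
    ]
    return [{name: row[name] for name in keep} for row in rows]

complete = drop_incomplete_cases(rows)   # only the third case survives
reduced = drop_sparse_variables(rows)    # variable "c" (75 % missing) is dropped
print(complete)
print(reduced)
```

Deleting cases discards a lot of data when MVs are spread across many rows, which is one reason the slides move on to imputation.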

  10. Imputation methods • Process of estimating MVs based on valid values of other variables / cases • Techniques: use distribution characteristics from all available valid values; replace by case substitution, mean substitution, cold deck imputation, or regression imputation
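
Mean substitution and regression imputation from slide 10 can be sketched with the standard library alone. The function names, the toy data, and the restriction to a single predictor fitted by ordinary least squares are assumptions for illustration, not the author's implementation.

```python
from statistics import mean

def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def regression_impute(x, y):
    """Fill missing y values from x with a one-predictor least-squares line.

    The line is fitted on cases where both x and y are present.
    A y value stays None if its x value is missing too.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    mean_x = mean([xi for xi, _ in pairs])
    mean_y = mean([yi for _, yi in pairs])
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [
        intercept + slope * xi if yi is None and xi is not None else yi
        for xi, yi in zip(x, y)
    ]

# Hypothetical example: y is roughly 2 * x, with one missing y value.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
print(mean_substitute(y))
print(regression_impute(x, y))
```

Mean substitution leaves the variable's mean unchanged but shrinks its variance, while regression imputation uses the relationship to another variable and so strengthens that relationship in the filled data. Both effects bear on the "least harm" objective from slide 7.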

  11. Examples • Polls, questionnaires: planning more than essential; human factors!; small amounts of data • Data from steel plant: information system; errors, default values; lots of data

  12. Questions • Do software applications help, or do they hide the effect of missing values? (SPSS Clementine) • Execution/prediction phase of DM process? • What to do with alpha variables?
