
WINsorizing

WINsorizing: What is it and why could it be inappropriate? Kyle Allen & Matthew Whitledge, May 7, 2013.



Presentation Transcript


  1. WINsorizing: What is it and why could it be inappropriate? Kyle Allen & Matthew Whitledge, May 7, 2013

  2. What is winsorizing? • What it isn’t… • Trimming • Truncating • Any other method that completely removes observations from the data • Term first used in 1960 • John W. Tukey; W. J. Dixon • “Numerical value of a wild observation is untrustworthy” • However, its direction of deviation is important • Winsorizing decreases the magnitude of the deviation while retaining its direction

  3. Winsorizing: an example • Order the observations by value • Xi1, Xi2, …, Xi100, where i denotes the ith regressor • If winsorizing at 1% and 99%, then • The value for Xi1 will be replaced by the value for Xi2 • The value for Xi100 will be replaced by the value for Xi99 • Another example: • Xi1, Xi2, …, Xi100 • Winsorize at 10% (5% from the bottom and 5% from the top) • Beginning sample: • Xi1, Xi2, Xi3, Xi4, Xi5, Xi6, …, Xi95, Xi96, Xi97, Xi98, Xi99, Xi100 • Winsorized sample (the five lowest values take the value of Xi6 and the five highest take the value of Xi95, consistent with the 1% and 99% case): • Xi6, Xi6, Xi6, Xi6, Xi6, Xi6, …, Xi95, Xi95, Xi95, Xi95, Xi95, Xi95
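
The replacement rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the count-based convention of the 1% and 99% case, in which the k smallest values take the value of the (k+1)-th smallest and the k largest take the value of the (k+1)-th largest. The function names `winsorize` and `trim` and the sample data are hypothetical.

```python
def winsorize(values, k):
    """Replace the k smallest values with the (k+1)-th smallest and the
    k largest with the (k+1)-th largest; the sample size is unchanged."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    # Clamp every observation into [low, high], preserving the input order.
    return [min(max(v, low), high) for v in values]


def trim(values, k):
    """For contrast: trimming drops the k smallest and k largest values."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


# Hypothetical data standing in for X_1, ..., X_100.
sample = list(range(1, 101))

one_each_tail = winsorize(sample, 1)   # 1 of 100 observations per tail
print(one_each_tail[:3], one_each_tail[-3:])

# Winsorizing keeps all 100 observations; trimming removes 2*k of them.
print(len(winsorize(sample, 5)), len(trim(sample, 5)))
```

The contrast with `trim` is the point made on the "What is winsorizing?" slide: the extreme observations stay in the sample and stay on the same side of the distribution, only their magnitude is reduced.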

  4. Winsorizing: alternatives • Are the observations really outliers? • Look at Cook’s D measure • Transform the variables • Take the log or square root of the variable • This shouldn’t be done only to increase significance • Median-based estimation • Quantile regression • Median absolute deviation • Nonparametric methods
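
As a rough illustration of the median-based alternative, the sketch below compares the mean and standard deviation with the median and the median absolute deviation (MAD) on a small made-up sample. The data and the helper name `median_absolute_deviation` are assumptions for illustration only; they are not the lift index data.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Hypothetical samples that differ only in the largest observation.
typical = [0, 1, 2, 2, 3, 4, 5]
with_outlier = [0, 1, 2, 2, 3, 4, 60]

for name, data in (("typical", typical), ("with outlier", with_outlier)):
    print(
        name,
        "mean:", round(statistics.mean(data), 2),
        "stdev:", round(statistics.stdev(data), 2),
        "median:", statistics.median(data),
        "MAD:", median_absolute_deviation(data),
    )
```

In this sample the median and MAD are the same with and without the extreme value, while the mean and standard deviation move substantially, and no observation has been altered to get there.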

  5. Winsorizing: a SAS example • Lift Index Data • Workers perform lifting tasks • Each lift has an amount of stress associated with it • Measuring the number of days an employee missed based on the lift they were performing • 206 observations

  6. Winsorizing: SAS code • proc sgplot data=isqsdata.lilesmerge; scatter y=dayslost x=alr; scatter y=dayslost1 x=alr; run; • data isqsdata.lileswin; set isqsdata.lileswin; if subject = 6 then dayslost = 27; if subject = 35 then dayslost = 27; run; • proc qlim data=isqsdata.liles; model dayslost = alr; endogenous dayslost ~ censored(lb=0); run; • proc qlim data=isqsdata.lileswin; model dayslost1 = alr; endogenous dayslost1 ~ censored(lb=0); run;

  7. Winsorizing: look at your data

  8. Proc qlim (non-winsorized)

  9. Proc qlim (winsorized)

  10. Winsorizing: implications • May impact significance • The standard errors will generally decrease • Depending on how symmetrical the data are, the mean may increase or decrease • For example, winsorizing an extremely positive outlier will decrease the mean • The significance will be determined by the proportionate change in the estimated coefficient relative to the change in the standard error
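
A small numeric sketch of these implications, under simplifying assumptions: it uses the standard error of a sample mean as a stand-in for a regression standard error, winsorizes one observation in each tail, and works on made-up data with a single extreme positive value. The helper names are hypothetical.

```python
import math
import statistics


def winsorize(values, k):
    """Replace the k smallest and k largest values with their nearest
    retained neighbours (count-based convention, assumed here)."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return [min(max(v, low), high) for v in values]


def standard_error(values):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical sample with one extremely positive observation.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
win = winsorize(raw, 1)

mean_raw, mean_win = statistics.mean(raw), statistics.mean(win)
se_raw, se_win = standard_error(raw), standard_error(win)

print("mean:", mean_raw, "->", mean_win)
print("standard error:", round(se_raw, 3), "->", round(se_win, 3))
# The ratio of estimate to standard error is what drives significance.
print("estimate / SE:", round(mean_raw / se_raw, 2), "->", round(mean_win / se_win, 2))
```

Both the estimate and its standard error fall here, so whether significance rises or falls depends on which of the two shrinks proportionately more.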

  11. Winsorizing: why could it be inappropriate? • May be appropriate for • Ratios • Book to Market • Other measures in which the denominator can be extremely small • Never winsorize valid observations • Investment Returns • R&D expenditures • Truly exceptional observations • Large number of biological elements • Extremely low stress tolerances for mechanical implements • Model should produce data we could actually see

  12. Winsorizing: bibliography • Brillinger, David R. “John W. Tukey: His Life and Professional Contributions.” The Annals of Statistics. 30(2002): 1535-75. • Dixon, W. J. “Simplified Estimation from Censored Normal Samples.” The Annals of Mathematical Statistics. 31(1960): 385-91. • Kafadar, Karen. “John Tukey and Robustness.” Proceedings of the Annual Meeting of the American Statistical Association. 2001. • Kruskal, William, Thomas Ferguson, John W. Tukey, E. J. Gumbel, and F. J. Anscombe. “Discussion of the Papers of Messrs. Anscombe and Daniel.” Technometrics. 2(1960): 157-66. • Tukey, John W. and Donald H. McLaughlin. “Less Vulnerable Confidence and Significance Procedures for Location Based on a Single Sample: Trimming/Winsorization 1.” Sankhyā: The Indian Journal of Statistics, Series A. 25(1963): 331-52. • Westfall, Peter H. and Kevin S. S. Henning. Understanding Advanced Statistical Methods. Boca Raton, FL: CRC Press, 2013.
