
Selecting the Right Distribution Using MKSE



  1. Selecting the Right Distribution Using MKSE. Jerzy Wieczorek, Portland State University

  2. My statistics master’s project • I’m reading… • Weber, Leemis, and Kincaid, 2006, “Minimum Kolmogorov-Smirnov test statistic parameter estimates.” Journal of Statistical Computation and Simulation, vol. 76, no. 3, 195–206. • …and extending it to right-censored data.

  3. “Parameter estimates”? • Statisticians will tell you that your data sample is from some larger underlying population • μ, σ²: parameters of the population • x̄, s²: summary measures on the sample which estimate the parameters • Examples of distributions and their parameters: • Normal: mean, variance • Exponential: failure rate • Uniform: lower bound, upper bound

  4. “Parameter estimates”? • Most common way to estimate parameters: Maximum Likelihood Estimators (MLEs). • For a given distribution, find values of the distribution parameters that maximize the likelihood of seeing your data sample. • My project: computational tool for finding Minimum Kolmogorov-Smirnov Estimates (MKSEs) of parameters.

  5. “Kolmogorov-Smirnov”? [Photographs: Kolmogorov, Smirnov]

  6. “Kolmogorov-Smirnov”? • The Kolmogorov-Smirnov test compares the empirical CDF to a proposed population CDF, finds the maximum difference in heights, and tells you whether the data are consistent with the proposed CDF.

  7. “Kolmogorov-Smirnov”? • MKSE: choose the parameter values that give the lowest K-S test statistic (i.e., the smallest maximum vertical difference between the CDFs).

  8. “Right-censored”? • Monitoring failure times, but for some observations you only know the “censoring time”: • Patients drop out of a medical study before having a relapse • Equipment test run ends before all components have failed • (For MKSE on right-censored data, we can use the “Kaplan-Meier” estimator of the CDF.)

  9. [Figure: MP 302.5, All Incident Free, 6 PM]

  10. Current MKSE software output

  11. Contributions of MKSFitter • Sanity check: Is proposed parametric model reasonable, or do others have a far better fit? • Easy “black box” parameter estimation tool when other estimators have no closed form or are cumbersome to calculate

  12. Strengths and weaknesses • Most useful when fit of CDF is the most important consideration, i.e. for simulation • Performance comparable to MLE… • …but then again, performs no better than MLE • Does not provide standard error estimates • Requires optimization algorithm parameters to be prescribed

  13. Next steps • Embed C code directly within R in order to… • Make it easier to use! • Test performance on censored data • Estimate standard error via bootstrapping • Make use of other CDF estimates available in R • Publish MKSFitter R package

  14. Resources • Weber, Leemis, and Kincaid, 2006, “Minimum Kolmogorov-Smirnov test statistic parameter estimates.” Journal of Statistical Computation and Simulation, vol. 76, no. 3, 195–206. • Meead’s rainy-day speed data from PORTAL • Photographs: • http://en.wikipedia.org/wiki/Andrey_Kolmogorov • http://en.wikipedia.org/wiki/Vladimir_Ivanovich_Smirnov_(mathematician)

  15. Review of the K-S statistic • Let X1, …, Xn be an i.i.d. random sample from a continuous distribution, and let F(x) be the CDF of some continuous distribution. • We wish to test the following: • H0: the Xi are from the distribution F(x) • HA: the Xi come from some other distribution • Construct the empirical distribution function • Fn(x) = (number of sample X’s ≤ x) / n

  16. Review of the K-S statistic • The one-sample K-S statistic is • Dn = sup over all x of |Fn(x) – F(x)| • If Dn is “large enough,” reject H0 in favor of HA; otherwise do not reject H0. • In other words, small Dn → proposed distribution is a good fit.
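
The statistic reviewed on the two slides above can be computed directly from the sorted sample, since the empirical CDF only changes at the observations. The following is a minimal sketch, not code from the presentation; the names `ks_statistic` and `exponential_cdf` are illustrative, and the exponential is assumed to be parameterized by its mean.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample K-S statistic: the largest vertical gap between the
    empirical CDF of `sample` and the proposed CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so the gap is checked just above and just below the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(mean):
    """CDF of an exponential distribution with the given mean (assumed parameterization)."""
    return lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0


if __name__ == "__main__":
    toy_sample = [0.2, 0.9, 1.4, 3.1]  # made-up numbers, for illustration only
    print(ks_statistic(toy_sample, exponential_cdf(1.0)))
```

A small value means the proposed CDF tracks the empirical CDF closely; deciding whether it is "large enough" to reject H0 still requires K-S critical values, which this sketch does not compute.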

  17. Minimum K-S Estimation (MKSE) • Calculate Fn(x) from your observations. • For a given set of parameter values for F(x), calculate and save the K-S statistic. • Repeat for many different sets of parameter values. The MKSE is the set of parameter values for which the K-S statistic is lowest.
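
Written as a formula, the procedure on this slide amounts to the following. The notation θ for the parameter vector and x_(i) for the i-th smallest observation is introduced here for illustration and does not appear on the slides.

```latex
\hat{\theta}_{\mathrm{MKSE}} = \arg\min_{\theta} D_n(\theta),
\qquad
D_n(\theta) = \sup_x \bigl| F_n(x) - F(x;\theta) \bigr|
            = \max_{1 \le i \le n} \max\Bigl\{ \tfrac{i}{n} - F(x_{(i)};\theta),\;
                                               F(x_{(i)};\theta) - \tfrac{i-1}{n} \Bigr\}
```

The second equality holds for a continuous F, because the empirical CDF is a step function and the largest gap must occur at one of the observations.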

  18. One-parameter example • Using Lieblein and Zelen (1956) dataset, 23 observations of ball-bearing failure time (in millions of revolutions): • 17.88 28.92 33.00 41.52 42.12 45.60 48.48 51.84 • 51.96 54.12 55.56 67.80 68.64 68.64 68.88 84.12 • 93.12 98.64 105.12 105.84 127.92 128.04 173.40 • Fit this to exponential distribution, which has a single parameter.

  19. One-parameter example • Estimated exponential mean (millions of revolutions): • MLE: 72.22 • MKSE: 96.10

  20. One-parameter example • Estimated exponential mean (millions of revolutions): • MLE: 72.22 • MKSE: 96.10
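
The one-parameter example can be sketched in a few lines. This is not the MKSFitter code: it assumes the exponential is parameterized by its mean, and it uses a plain grid search (step 0.01 over an arbitrarily chosen range) as a stand-in for whatever optimizer the original software uses. The name `ks_exponential` is illustrative.

```python
import math

# Lieblein and Zelen (1956) ball-bearing failure times, millions of revolutions.
data = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
        51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
        93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40]


def ks_exponential(sample, mean):
    """K-S statistic of `sample` against an exponential CDF with the given mean."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-x / mean)
        # Check the gap on both sides of the empirical CDF's jump at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# For the exponential, the MLE of the mean is the sample mean.
mle_mean = sum(data) / len(data)

# Assumed search range: means from 1.00 to 300.00 in steps of 0.01.
grid = [k / 100 for k in range(100, 30001)]
mkse_mean = min(grid, key=lambda m: ks_exponential(data, m))

print(f"MLE  mean = {mle_mean:.2f}, K-S = {ks_exponential(data, mle_mean):.4f}")
print(f"MKSE mean = {mkse_mean:.2f}, K-S = {ks_exponential(data, mkse_mean):.4f}")
```

Under these assumptions the sample mean reproduces the 72.22 shown for the MLE, and the grid minimizer should land close to the 96.10 shown for the MKSE. A grid is adequate here only because there is a single parameter; it does not scale to the multiple-parameter case.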

  21. Multiple-parameter case • Use “Bell-Curve Based” (BCB) evolutionary algorithm to minimize K-S statistic over a range of parameter values in two (or more) dimensions. • BCB has been shown to perform well at finding global optimum in multidimensional problems with many local optima.

  22. BCB algorithm • Optimize over k-dimensional parameter space. • For each of 100 generations: • Set 25 best points as “parents.” • Create 25 “children” at normally-distributed distances from weighted means of pairs of parents. • [Diagram: parents P1 and P2, their weighted mean M, and child C1,2, with z ~ N(0, 1) and r ~ N(0, 4)]
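
The generation loop described on this slide can be sketched as follows. This is a loose, BCB-style illustration rather than the published algorithm or the MKSFitter code. Only the counts of 25 parents, 25 children and 100 generations come from the slide. Everything else is an assumption: how the weighted mean M is drawn, how the child's offset is scaled, the `weight_sd` and `radius_sd` defaults (which are not the slide's N(0, 4)), the uniform initial population, the search bounds, and the choice of a two-parameter Weibull fitted to the ball-bearing data as the test problem.

```python
import math
import random

data = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
        51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
        93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40]


def ks_weibull(params):
    """K-S statistic of `data` against a Weibull CDF with (shape, scale)."""
    shape, scale = params
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-((x / scale) ** shape))
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def bcb_style_minimize(objective, bounds, n_parents=25, n_children=25,
                       n_generations=100, weight_sd=0.5, radius_sd=0.5, seed=0):
    """Simplified BCB-style search; returns (best value, best point)."""
    rng = random.Random(seed)
    k = len(bounds)

    def clip(point):
        # Keep every coordinate inside its search interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(point, bounds)]

    # Assumed initialization: uniform random points inside the bounds.
    start = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_parents + n_children)]
    scored = sorted((objective(p), p) for p in start)

    for _generation in range(n_generations):
        parents = scored[:n_parents]  # best points become parents
        children = []
        for _child in range(n_children):
            (_f1, p1), (_f2, p2) = rng.sample(parents, 2)
            # M: a normally distributed weighted mean on the line through the parents.
            t = 0.5 + weight_sd * rng.gauss(0.0, 1.0)
            m = [a + t * (b - a) for a, b in zip(p1, p2)]
            # Child: offset from M by a normally distributed distance,
            # scaled by the parents' separation, in a random direction.
            r = abs(rng.gauss(0.0, radius_sd)) * math.dist(p1, p2)
            direction = [rng.gauss(0.0, 1.0) for _ in range(k)]
            norm = math.sqrt(sum(c * c for c in direction)) or 1.0
            child = clip([mi + r * c / norm for mi, c in zip(m, direction)])
            children.append((objective(child), child))
        # Parents survive alongside children, so the best value never gets worse.
        scored = sorted(parents + children)
    return scored[0]


# Hypothetical search box: shape in [0.1, 10], scale in [1, 300].
best_d, (best_shape, best_scale) = bcb_style_minimize(
    ks_weibull, bounds=[(0.1, 10.0), (1.0, 300.0)])
print(f"shape = {best_shape:.3f}, scale = {best_scale:.2f}, K-S = {best_d:.4f}")
```

Because the search is random, results depend on the seed and on the assumed tuning values, which is the weakness the earlier slide notes when it says the optimization algorithm parameters must be prescribed.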

  23. MKSFitter output example • Software covers wide range of distributions:

  24. MKSE performance • Tested on simulated data from several different known distributions. • As sample size ↑: • frequency of selecting correct distribution ↑, mean K-S value ↓. • At large sample sizes, mean distance from estimated to true parameters is similar for MKSE and MLE.

  25. MKSE performance • 100 random samples at n = 10 • 100 random samples at n = 100

  26. Modification for censored data • Replace the usual empirical CDF with one minus the Kaplan-Meier estimate of the survival function (or other estimate, as appropriate for censoring type). • Continue as before.
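
The modification above can be sketched by computing the Kaplan-Meier step function and measuring the largest gap to the proposed CDF at the observed event times. This is an illustrative sketch, not the presentation's code. The function names are hypothetical, and comparing the curves only up to the last observed event is an assumption made here, since the Kaplan-Meier estimate is not defined beyond the largest observation when that observation is censored.

```python
def kaplan_meier_cdf(times, observed):
    """Return [(t, F_hat(t))] at each distinct event time,
    where F_hat = 1 - Kaplan-Meier survival estimate.

    `observed[i]` is True for a failure and False for a censoring time.
    """
    pairs = sorted(zip(times, observed))
    at_risk = len(pairs)
    survival = 1.0
    steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all observations tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            steps.append((t, 1.0 - survival))
        at_risk -= leaving  # failures and censored units both leave the risk set
    return steps


def ks_distance_censored(times, observed, cdf):
    """Largest gap between the Kaplan-Meier CDF estimate and `cdf`,
    checked just below and just above each jump."""
    d = 0.0
    previous = 0.0
    for t, f_hat in kaplan_meier_cdf(times, observed):
        f = cdf(t)
        d = max(d, abs(f_hat - f), abs(f - previous))
        previous = f_hat
    return d


if __name__ == "__main__":
    # Made-up example: the second unit is censored at time 2.
    print(kaplan_meier_cdf([1, 2, 3, 4], [True, False, True, True]))
```

With no censored observations the Kaplan-Meier estimate reduces to the usual empirical CDF, so the distance matches the ordinary K-S statistic and the minimization can continue as before.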

  27. Censored MKSE performance • Mini-test: 10 random samples of size n=23, with event times and censoring times both Exp(1). • Mean of estimates of λ: 0.8985 • Standard deviation of estimates: 0.2747 • Each time, at least 4 other distributions had lower minimum K-S value than exponential did. • Exp. Power had lowest K-S value 4 of 10 times.

  28. Censored MKSE performance • For proper analysis I must learn to embed this code in R, so I can run and save results hundreds of times. I will evaluate: • frequency of selecting correct distribution • accuracy of parameter estimates vs. MLE • (Cannot compare mean K-S value for different numbers of noncensored observations…)

  29. Next steps • Embed C code directly within R in order to… • Test performance for censored data • Bootstrap for standard error estimates • Make use of other CDF estimates available in R • Publish MKSFitter R package • Enable alternate statistics (Cramér-von Mises is consistent for distributions where K-S is not)
