
Toward a unified approach to fitting loss models

Jacques Rioux and Stuart Klugman, for presentation at the IAC, Feb. 9, 2004.



  1. Toward a unified approach to fitting loss models Jacques Rioux and Stuart Klugman, for presentation at the IAC, Feb. 9, 2004

  2. Handout/slides • E-mail me • Stuart.klugman@drake.edu

  3. Overview • What problem is being addressed? • The general idea • The specific ideas • Models to consider • Recording the data • Representing the data • Testing a model • Selecting a model

  4. The problem • Too many models • Two books – 26 distributions! • Can mix or splice to get even more • Data can be confusing • Deductibles, limits • Too many tests and plots • Chi-square, K-S, A-D, p-p, q-q, D

  5. The general idea • Limited number of distributions • Standard way to present data • Retain flexibility on testing and selection

  6. Distributions • Should be • Familiar • Few • Flexible

  7. A few familiar distributions • Exponential • Only one parameter • Gamma • Two parameters, a mode if a>1. • Lognormal • Two parameters, a mode • Pareto • Two parameters, a heavy right tail

  8. Flexible • Add flexibility by allowing mixtures • That is, $F(x) = w_1 F_1(x) + \cdots + w_k F_k(x)$, where $w_1 + \cdots + w_k = 1$ and all $w_j > 0$ • Some restrictions: • Only the exponential can be used more than once. • Cannot use both the gamma and lognormal.
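A mixture cdf is a weighted sum of component cdfs, with positive weights that sum to one. The sketch below is only an illustration: the function names, the Pareto parameterization, and all parameter values are assumptions, not taken from the slides.

```python
import math

def exponential_cdf(x, theta):
    """cdf of an exponential distribution with mean theta."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def pareto_cdf(x, alpha, theta):
    """cdf of a two-parameter Pareto, assumed here as 1 - (theta / (x + theta))**alpha."""
    return 1.0 - (theta / (x + theta)) ** alpha if x > 0 else 0.0

def mixture_cdf(x, weights, cdfs):
    """Weighted sum of component cdfs; weights must be positive and sum to 1."""
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return sum(w * cdf(x) for w, cdf in zip(weights, cdfs))

# Made-up example: 70% exponential (mean 500), 30% Pareto (alpha 2, theta 1000).
components = [
    lambda x: exponential_cdf(x, 500.0),
    lambda x: pareto_cdf(x, 2.0, 1000.0),
]
weights = [0.7, 0.3]
value = mixture_cdf(1000.0, weights, components)
```

The exponential component shapes the body of the distribution while the Pareto component contributes the heavy right tail.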

  9. Why mixtures? • Allows different shape at beginning and end (e.g. mode from lognormal, tail from Pareto). • By using several exponentials can have most any tail weight (see Keatinge).

  10. Estimating parameters • Use only maximum likelihood • Asymptotically optimal • Can be applied in all settings, regardless of the nature of the data • Likelihood value can be used to compare different models
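To see how maximum likelihood copes with deductibles and limits, here is a minimal sketch for the exponential model only. It assumes each record holds the deductible, the ground-up amount (or the censoring point when the maximum payment was hit), and a censoring flag; the record layout, function names, and data are all made up for illustration.

```python
import math

# Each record: (deductible d, amount x, censored).
# x is the ground-up loss, or the censoring point when censored is True.
# All numbers are invented for illustration.
records = [
    (100.0, 350.0, False),
    (100.0, 1100.0, True),
    (250.0, 600.0, False),
    (500.0, 900.0, False),
]

def exponential_loglik(theta, records):
    """Loglikelihood of an exponential (mean theta) with left truncation at d
    and right censoring: ln f(x) - ln S(d) if observed, ln S(x) - ln S(d) if censored."""
    total = 0.0
    for d, x, censored in records:
        if censored:
            total += -(x - d) / theta
        else:
            total += -math.log(theta) - (x - d) / theta
    return total

def exponential_mle(records):
    """Closed-form maximizer: total excess over deductibles / number of uncensored records."""
    exposure = sum(x - d for d, x, _ in records)
    uncensored = sum(1 for _, _, censored in records if not censored)
    return exposure / uncensored

theta_hat = exponential_mle(records)
best_loglik = exponential_loglik(theta_hat, records)
```

For the other distributions and for mixtures there is no closed form, so the same loglikelihood would be handed to a numerical optimizer; the maximized value is what gets compared across models.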

  11. Representing the data • Why do we care? • Graphical tests require a graph of the empirical density or distribution function. • Hypothesis tests require the functions themselves.

  12. What is the issue? • None if • All observations are discrete or grouped • No truncation or censoring • But if there is truncation or censoring, • For discrete data the Kaplan-Meier product-limit estimator provides the empirical distribution function (and is the nonparametric mle as well).
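A minimal sketch of the product-limit estimator with left truncation (deductibles) and right censoring (limits) follows. It assumes the same hypothetical record layout of (deductible, amount, censored flag) with every amount above its deductible; the function name and data are illustrative only.

```python
def kaplan_meier(records):
    """Return [(y, S(y))] at each distinct uncensored amount y.

    records: iterable of (deductible d, amount x, censored) with d < x.
    The risk set at y holds records with d < y <= x, so a policy only
    counts once y exceeds its deductible.
    """
    event_points = sorted({x for _, x, censored in records if not censored})
    survival = 1.0
    curve = []
    for y in event_points:
        at_risk = sum(1 for d, x, _ in records if d < y <= x)
        events = sum(1 for _, x, censored in records if x == y and not censored)
        survival *= 1.0 - events / at_risk
        curve.append((y, survival))
    return curve

# Made-up data: three policies with deductible 100, one with deductible 250.
sample = [
    (100.0, 200.0, False),
    (100.0, 300.0, True),
    (250.0, 400.0, False),
    (100.0, 400.0, False),
]
km_curve = kaplan_meier(sample)
```

The empirical cdf is one minus the survival value, and it is conditional on exceeding the smallest deductible in the data.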

  13. Issue – grouped data • For grouped data, • If completely grouped, the histogram represents the pdf, the ogive the cdf. • If some grouped, some not, or multiple deductibles, limits, our suggestion is to replace the observations in the interval with that many equally spaced points.
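The slide does not say exactly where the equally spaced points go. One plausible reading, assumed here, is to split the interval into as many equal subintervals as there are observations and place one point at each midpoint; the function name is hypothetical.

```python
def spread_points(lower, upper, count):
    """Replace `count` grouped observations in (lower, upper) with the
    midpoints of `count` equal-width subintervals (an assumed placement)."""
    if count <= 0 or upper <= lower:
        raise ValueError("need count > 0 and upper > lower")
    width = (upper - lower) / count
    return [lower + (i + 0.5) * width for i in range(count)]

# Four grouped observations between 0 and 100 become four individual values.
points = spread_points(0.0, 100.0, 4)
```

The resulting values can then be pooled with the ungrouped observations and passed to the same estimator.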

  14. Review • Given a data set, we have the following: • A way to represent the data. • A limited set of models to consider. • Parameter estimates for each model. • The remaining tasks are: • Decide which models are acceptable. • Decide which model to use.

  15. Example • The paper has two examples; we will look only at the second one. • Data are individual payments, but the policies that produced them had different deductibles (100, 250, 500) and different maximum payments (1,000, 3,000, 5,000). • There are 100 observations.

  16. Empirical cdf

  17. Distribution function plot • Plot the empirical and model cdfs together. Note, because in this example the smallest deductible is 100, the empirical cdf begins there. • To be comparable, the model cdf is calculated as $F^*(x) = \dfrac{F(x) - F(100)}{1 - F(100)}$ for $x \ge 100$.
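A minimal sketch of conditioning a model cdf on losses that exceed the smallest deductible. The exponential stand-in model and its mean of 1000 are assumptions chosen only to make the example runnable.

```python
import math

def conditional_cdf(model_cdf, x, d):
    """cdf of the model given that the loss exceeds d; zero at or below d."""
    if x <= d:
        return 0.0
    return (model_cdf(x) - model_cdf(d)) / (1.0 - model_cdf(d))

def stand_in_cdf(x):
    """Hypothetical model: exponential with mean 1000."""
    return 1.0 - math.exp(-x / 1000.0)

adjusted = conditional_cdf(stand_in_cdf, 600.0, 100.0)
```

Both curves then start at zero at the same point, so they can be plotted on one axis.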

  18. Example model • All plots and tests that follow are for a mixture of a lognormal and exponential distribution. The parameters are

  19. Distribution function plot

  20. Confidence bands • It is possible to create 95% confidence bands. That is, we are 95% confident that the true distribution is completely within these bands. • Formulas adapted from Klein and Moeschberger with a modification for multiple truncation points (their formula allows only multiple censoring points).

  21. CDF plot with bounds

  22. Other CDF pictures • Any function of the cdf, such as the limited expected value, could be plotted. • The only one shown here is the difference plot – magnify the previous plot by plotting the difference of the two distribution functions.

  23. CDF difference plot

  24. Histogram plot • Plot a histogram of the data against the density function of the model. • For data that were not grouped, can use the empirical cdf to get cell probabilities.

  25. Histogram plot

  26. Hypothesis tests • Null: the model fits • Alternative: it doesn’t • Three tests • Kolmogorov-Smirnov • Anderson-Darling • Chi-square

  27. Kolmogorov-Smirnov • Test statistic is the maximum difference between the empirical and model cdfs. Each difference is multiplied by a scaling factor related to the sample size at that point. • Critical values are way off when parameters are estimated from the data.
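For reference, here is a sketch of the classical K-S statistic for complete individual data. It deliberately omits the sample-size scaling factor and the truncation handling mentioned on the slide, so it is not the statistic from the paper; the function name is hypothetical.

```python
def ks_statistic(data, model_cdf):
    """Largest absolute gap between the empirical and model cdfs,
    checked just before and at each jump of the empirical cdf."""
    xs = sorted(data)
    n = len(xs)
    largest = 0.0
    for i, x in enumerate(xs):
        fitted = model_cdf(x)
        before_jump = abs(i / n - fitted)
        after_jump = abs((i + 1) / n - fitted)
        largest = max(largest, before_jump, after_jump)
    return largest

# Made-up check against a uniform(0, 1) model.
d_stat = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
```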

  28. Anderson-Darling • Test statistic looks complex: • where e is empirical and m is model. • The paper shows how to turn this into a sum. • More emphasis on fit in tails than for K-S test.
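The sum form is easiest to see in the classical complete-data case, sketched below. This is the textbook computing formula, not necessarily the one in the paper, which must also cope with truncation and censoring; the function name is hypothetical.

```python
import math

def anderson_darling(data, model_cdf):
    """Classical A-D statistic for complete data:
    A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))].
    Requires 0 < F(x) < 1 at every data point."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = model_cdf(xs[i - 1])   # F at the i-th smallest value
        upper = model_cdf(xs[n - i])   # F at the i-th largest value
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n

# Made-up check against a uniform(0, 1) model.
a_squared = anderson_darling([0.1, 0.4, 0.7], lambda x: x)
```

The logarithms blow up as the model cdf approaches 0 or 1, which is where the extra weight on the tails comes from.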

  29. Chi-square test • You have seen this one before. • It is the only one with an adjustment for estimating parameters.
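As a reminder of the adjustment, the sketch below computes the chi-square statistic and reduces the degrees of freedom by the number of estimated parameters. The function name, cell counts, and probabilities are made up for illustration.

```python
def chi_square(observed, cell_probs, n_params):
    """Return (statistic, degrees of freedom) for a chi-square goodness-of-fit test.
    Degrees of freedom = cells - 1 - number of estimated parameters."""
    n = sum(observed)
    statistic = 0.0
    for count, prob in zip(observed, cell_probs):
        expected = n * prob
        statistic += (count - expected) ** 2 / expected
    dof = len(observed) - 1 - n_params
    return statistic, dof

# Made-up example: three cells, one estimated parameter.
stat, dof = chi_square([30, 40, 30], [0.25, 0.5, 0.25], n_params=1)
```

The p-value is the upper tail of a chi-square distribution with that many degrees of freedom, evaluated at the statistic.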

  30. Results • K-S: 0.5829 • A-D: 0.2570 • Chi-square p-value of 0.5608 • The model is clearly acceptable. A simulation study is needed to get p-values for these tests. Simulation indicates that the p-values are over 0.9.

  31. Comparing models • Good picture • Better test numbers • Likelihood criterion such as Schwarz Bayesian. The SBC is the loglikelihood minus (r/2)ln(n) where r is the number of parameters and n is the sample size.
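The SBC as defined on the slide is a one-line calculation. In the sketch below the function name and the two loglikelihood values are hypothetical, chosen only to show how the penalty can reverse a ranking.

```python
import math

def sbc(loglik, n_params, n_obs):
    """Schwarz Bayesian criterion: loglikelihood minus (r/2) ln(n). Larger is better."""
    return loglik - (n_params / 2.0) * math.log(n_obs)

# Made-up comparison on 100 observations.
simple_model = sbc(-750.0, n_params=1, n_obs=100)    # one-parameter model
complex_model = sbc(-747.0, n_params=5, n_obs=100)   # five-parameter mixture
preferred = "simple" if simple_model > complex_model else "complex"
```

Here the five-parameter model has the higher loglikelihood, yet its penalty of (5/2) ln(100) outweighs the three-point gain, so the simpler model scores better.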

  32. Several models

  33. Which is the winner? • Referee A – loglikelihood rules – pick gamma/exp/exp mixture • This is a world of one big model and the best is the best; simplicity is never an issue. • Referee B – SBC rules – pick exponential • Parsimony is most important; pay a penalty for extra parameters. • Me – lognormal/exp. Great pictures, better numbers than the exponential, but simpler than the three-component mixture.

  34. Can this be automated? • We are working on software • Test version can be downloaded at www.cbpa.drake.edu/mixfit. • MLEs are good. Pictures and test statistics are not quite right. • May crash. • Here is a quick demo.
