Explore the challenges of handling continuous variables in Generalized Additive Models (GAMs) and learn about categorization, polynomial treatments, and non-parametric functional smoothers. Dive into balancing degrees of freedom, data volume, and functional form for better modeling outcomes. Discover the keys to GAMs, backfitting for dimension reduction, and the use of cross-validation to optimize smoothing parameters. The text delves into implementing various smoothers such as LOESS, regression splines, and decision trees, illustrating their applications in multi-dimensional problems. Stay informed about parameter estimation, error criteria, and cross-validation techniques for effective model evaluation.
Generalized Additive Models Keith D. Holler September 19, 2005
GLMs – The Challenge
• What to do with continuous variables?
• E.g., age, credit score, amount of insurance
• Options
• Categorize – but how?
• Equal volume, tree, judgment
• Appendix H, “A Practitioner’s Guide to GLMs” by Duncan et al.
• Treat as polynomial
• The Weierstrass Approximation Theorem
• E.g., mileage: (2 miles)^4 = 16, (25 miles)^4 = 390,625
• Look at categorical estimates, transform, rerun
• Newage variable = age^3 if age < 20 + age^2 if age < 80 + minimum(age, 80)
• All forms must be decided BEFORE the model is run
• Obviously, no clear winner!
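The "equal volume" option for categorizing can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure from Appendix H: the function names `equal_volume_cutpoints` and `assign_bin` and the age values are hypothetical.

```python
def equal_volume_cutpoints(values, n_bins):
    """Return n_bins - 1 cut points that split values into roughly equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, cutpoints):
    """Return the index of the bin holding value (0 .. len(cutpoints))."""
    return sum(value >= c for c in cutpoints)


# Hypothetical driver ages, for illustration only.
ages = [17, 18, 19, 21, 22, 25, 29, 33, 38, 41, 47, 52, 58, 63, 71, 84]
cuts = equal_volume_cutpoints(ages, 4)

counts = [0] * 4
for a in ages:
    counts[assign_bin(a, cuts)] += 1

print(cuts, counts)
```

With these sixteen ages the cut points fall at 22, 38 and 58, and each of the four bins holds four records. The number of bins still has to be chosen before the model is run, which is the limitation the slide points to.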
Generalized Additive Models – GAMs
• GLMs are a special case of GAMs
• E.g., LN(E[PP]) = Intercept + f1(age) + f2(gender) + f3(symbol) + f4(marital)
• The functions f1, f2, f3, f4 can be anything
• GLM: categorical, polynomial, transforms
• Non-parametric functional smoothers
• Decision trees
• Balance degrees of freedom, amount of data, and functional form better
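To make the model form concrete, the sketch below evaluates E[PP] for a two-term version of the equation. Every number and the shape of `f_age` are hypothetical; the point is only that one term can be a smooth function while another is a GLM-style lookup table, and that the terms add on the log scale.

```python
import math

# All values below are hypothetical, chosen only to show the structure.
INTERCEPT = 5.0

# Categorical term, exactly as it would appear in a GLM.
F_GENDER = {"F": 0.0, "M": 0.1}


def f_age(age):
    """Smooth term: any function of age is allowed; this curve is made up."""
    return 0.5 * math.exp(-(age - 16) / 10.0)


def expected_pp(age, gender):
    """Invert the log link: E[PP] = exp(Intercept + f1(age) + f2(gender))."""
    eta = INTERCEPT + f_age(age) + F_GENDER[gender]
    return math.exp(eta)


# Additive on the log scale means multiplicative on the original scale:
# the M/F ratio is exp(0.1) at every age.
ratio = expected_pp(25, "M") / expected_pp(25, "F")
print(ratio)
```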
Smoothers – Partial List
• Locally weighted running line smoother (LOESS)
• Regression splines
• Cubic smoothing splines
• Monotonic splines
• B-splines
• Kernel smoothers
• Running medians, means, lines
• GLM – categories or polynomials
• Decision Trees
• Many can be extended to multiple dimensions
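As one example from the list, a kernel smoother can be written as a weighted average in which nearby points count most. The sketch below assumes a Gaussian kernel (the Nadaraya-Watson form); the function name `kernel_smooth`, the bandwidth values and the synthetic sine data are all assumptions made for illustration.

```python
import math


def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of y."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, y)) / total


# Synthetic data for illustration: a noiseless sine curve on a grid.
x = [i / 10.0 for i in range(63)]
y = [math.sin(xi) for xi in x]

fit_small = kernel_smooth(x, y, 1.5, bandwidth=0.2)  # uses mostly nearby points
fit_large = kernel_smooth(x, y, 1.5, bandwidth=5.0)  # close to a global average
print(fit_small, fit_large, math.sin(1.5))
```

A small bandwidth tracks the curve closely, while a large one flattens it toward the overall mean. The bandwidth plays the same role here that the smoothing parameter plays for a spline.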
GAM – Keys
• Backfitting allows reduction of dimension
• Residual Z = LN(E[PP]) – intercept – f1(age) – f2(gender) – f4(marital)
• Fit Z = f3(symbol)
• Now a 2-dimensional problem, “Y vs X”
• Data drives the shape
• Not determined a priori
• Use of cross validation to find the smoothing parameter
• “Local” – many of the smoothers use only data points close to the point being predicted, instead of all of the data.
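The backfitting loop described above can be sketched as follows. This is a simplified illustration, not the algorithm inside SAS: it assumes an identity link with Gaussian errors rather than the log link in the slides, uses a plain running-mean smoother over the 15 nearest points, and runs on synthetic data. The names `running_mean` and `backfit` are hypothetical.

```python
import math
import random


def running_mean(x, r, k=15):
    """Smooth r against x: each fit is the mean of r over the k nearest x values."""
    fitted = []
    for x0 in x:
        nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
        fitted.append(sum(r[i] for i in nearest) / k)
    return fitted


def backfit(y, columns, n_iter=10):
    """Fit y = alpha + sum_j f_j(x_j) by cycling one-dimensional smooths."""
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Partial residual Z: remove the intercept and every other term.
            z = [y[i] - alpha - sum(f[m][i] for m in range(len(columns)) if m != j)
                 for i in range(n)]
            # One-dimensional "Z vs X" smooth, then centre for identifiability.
            fj = running_mean(xj, z)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]
    return alpha, f


# Synthetic data, for illustration only.
random.seed(0)
n = 200
x1 = [random.uniform(0, 1) for _ in range(n)]
x2 = [random.uniform(0, 1) for _ in range(n)]
y = [math.sin(2 * math.pi * a) + 4 * (b - 0.5) ** 2 + random.gauss(0, 0.1)
     for a, b in zip(x1, x2)]

alpha, (f1, f2) = backfit(y, [x1, x2])
resid = [y[i] - alpha - f1[i] - f2[i] for i in range(n)]
rss = sum(r * r for r in resid)
tss = sum((v - alpha) ** 2 for v in y)
print(rss / tss)
```

Each pass only ever fits one response against one predictor, which is the dimension reduction the slide refers to. No functional form for `f1` or `f2` is supplied in advance.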
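Example – SAS Code
proc gam data=all;
  class gender marital2;                      /* categorical predictors */
  model clclmonz = param(gender marital2)     /* parametric terms, as in a GLM */
                   spline(age2, df=4)         /* smoothing spline, 4 degrees of freedom */
                   spline(symbol, df=3)       /* smoothing spline, 3 degrees of freedom */
                   / dist=Poisson;
  output out=estall p;                        /* write predicted values to estall */
run;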
Smoothing Spline
• Error criterion: $\sum_i \{Y_i - g(t_i)\}^2 + \lambda \int \{g''(t)\}^2 \, dt$
• λ is the smoothing parameter
• Reference: Nonparametric Regression and Generalized Linear Models, Green and Silverman
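The trade-off in the error criterion can be seen by evaluating a discrete version of it for two candidate curves. The sketch below is an assumption-laden approximation: it replaces g''(t) with second differences on an equally spaced grid, and the function name `penalized_criterion` and the data are hypothetical. It evaluates the criterion only; it does not fit a spline.

```python
def penalized_criterion(t, y, g, lam):
    """Approximate sum (y_i - g_i)^2 + lam * integral of g''(t)^2 dt
    on an equally spaced grid t, with g given by its values on that grid."""
    h = t[1] - t[0]
    rss = sum((yi - gi) ** 2 for yi, gi in zip(y, g))
    second_diffs = [(g[i - 1] - 2 * g[i] + g[i + 1]) / h ** 2
                    for i in range(1, len(g) - 1)]
    roughness = sum(d * d for d in second_diffs) * h
    return rss + lam * roughness


# Hypothetical zig-zag observations, for illustration only.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]

interpolant = list(y)                    # passes through every point, very rough
line = [0.5 + 0.5 * ti for ti in t]      # perfectly smooth, misses the points

for lam in (0.0, 1.0):
    print(lam,
          penalized_criterion(t, y, interpolant, lam),
          penalized_criterion(t, y, line, lam))
```

At λ = 0 the interpolating curve scores 0 and wins. At λ = 1 its roughness penalty of 45 exceeds the straight line's residual sum of squares of 4, so the line wins. Larger λ therefore favors smoother fits.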
Example – Cross Validation
proc gam data=all;
  class gender marital2;
  model clclmonz = param(gender marital2)
                   spline(age2)               /* no df given: chosen by GCV */
                   spline(symbol)
                   / method=GCV dist=Poisson;
  output out=estGCV p;
run;
Result: degrees of freedom of 17 and 14.
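The idea of letting cross validation pick the amount of smoothing can be sketched outside SAS. Note the substitutions: the SAS run above uses generalized cross validation (GCV) on smoothing splines, whereas this sketch uses ordinary leave-one-out cross validation on a Gaussian kernel smoother. The candidate bandwidths, function names and data are all hypothetical.

```python
import math
import random


def loo_kernel_fit(x, y, i, bandwidth):
    """Gaussian-kernel weighted mean at x[i], computed with point i left out."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        if j == i:
            continue
        w = math.exp(-0.5 * ((xj - x[i]) / bandwidth) ** 2)
        num += w * yj
        den += w
    return num / den


def loo_cv_score(x, y, bandwidth):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    n = len(x)
    return sum((y[i] - loo_kernel_fit(x, y, i, bandwidth)) ** 2 for i in range(n)) / n


# Synthetic data, for illustration only: a noisy sine curve on a grid.
random.seed(1)
x = [i / 99.0 for i in range(100)]
y = [math.sin(2 * math.pi * xi) + random.gauss(0, 0.3) for xi in x]

candidates = [0.01, 0.02, 0.05, 0.1, 0.3, 1.0]
scores = {h: loo_cv_score(x, y, h) for h in candidates}
best = min(scores, key=scores.get)
print(best, scores)
```

Very large bandwidths score badly because they flatten the curve, and very small ones chase the noise. The selected value sits between the two, chosen by the data and not by the analyst.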
Miscellaneous
• Parameter Estimates – 1 for each value
• S-PLUS
• References
• SAS Proc Gam
• Generalized Additive Models, Hastie and Tibshirani
Q & A
Keith D. Holler, PhD, FCAS, ASA, ARM
Personal Lines Research Department, St. Paul Travelers
kdholler@travelers.com
(860) 277-4808
Research paper in progress for Ratemaking call