Principal Components Analysis

Eric Vaagen, FCAS, Assistant Actuary. September 5, 2008.

Presentation Transcript


  1. Principal Components Analysis Eric Vaagen, FCAS Assistant Actuary September 5, 2008

  2. Agenda • Motivation • What is PCA? • Background • Simple example • Is PCA right for you?

  3. Motivation • Forecast average premium by coverage • Explanatory variables: vehicle use, territory, driving record • Breakdown of change in average premium • Multicollinearity exists

  4. Average Premium 2002-2006

  5. Modeling Procedure (diagram) • Explanatory Variables → Variable Selection → Chosen Variables → Model ← Response Variable

  6. Modeling Procedure (diagram) • Vehicle Use, Territory, Driving Record → Variable Selection → Chosen Variables → Multiple Regression ← Average Premium

  7. Variable Selection Methods • Stepwise regression (forward, backward) • PCA (unsupervised) • Partial least squares (supervised) • GLM

  8. Background • First described in 1901 by Karl Pearson • Find the best lines and planes to fit a set of points • What else did he discover? • Pearson’s χ² • Linear regression • Classification of distributions (exponential family)

  9. PCA Example: Explanatory Variables • Vehicle use: Pleasure, Commute, Business • Territory: Rural, Suburban, Urban

  10. Vehicle Use 2002-2006

  11. Territory 2002-2006

  12. Example – Average Premium Response Variable

  13. Modeling Procedure (diagram) • Vehicle Use, Territory → PCA → Chosen PCs → Multiple Regression ← Average Premium

  14. PCA Procedure • PCs: no multicollinearity; the 1st PC has the most variance • Output: weights to create the PCs; variability of each PC
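The two properties on this slide can be checked on a small case. The following is a minimal sketch, not the presenter's method: it runs PCA on just two variables using the closed-form eigendecomposition of their 2x2 covariance matrix. The function names (`pca_2d`, `sample_cov`) and the five yearly shares are hypothetical and are not taken from the presentation.

```python
import math


def sample_cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def pca_2d(xs, ys):
    """PCA for two variables via the eigendecomposition of their
    2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]  # centred data
    cy = [y - mean_y for y in ys]
    sxx, syy, sxy = sample_cov(xs, xs), sample_cov(ys, ys), sample_cov(xs, ys)

    # Eigenvalues (variance of each PC), largest first.
    centre = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    variances = [centre + spread, centre - spread]

    # Direction of maximum variance, and the direction orthogonal to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    weights = [(math.cos(theta), math.sin(theta)),
               (-math.sin(theta), math.cos(theta))]

    # PC scores: weighted sums of the centred variables.
    scores = [[w[0] * a + w[1] * b for a, b in zip(cx, cy)] for w in weights]
    return weights, variances, scores


# Hypothetical shares for five years (made-up numbers, for illustration only).
pleasure = [0.30, 0.32, 0.35, 0.37, 0.40]
commute = [0.50, 0.49, 0.47, 0.46, 0.44]

weights, variances, scores = pca_2d(pleasure, commute)
print("weights:", weights)
print("variance of each PC:", variances)
print("covariance between PCs:", sample_cov(scores[0], scores[1]))
```

The two inputs are strongly correlated, yet the covariance between the resulting PCs is zero up to rounding, and the first entry of `variances` is the larger one. With six variables, as in the example on the later slides, the same idea applies but the eigendecomposition would normally come from a numerical library.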

  15. Modeling Procedure (diagram) • Input: Vehicle Use and Territory (5 years x 6 variables) • PCA output: weights, PCs (5 years x 6 variables), variability • Result: chosen PCs

  16. Example – Scree Plot
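A scree plot shows the variability of each PC in decreasing order, and one common way to read it is to keep the first few PCs that together account for most of the total variance. The sketch below illustrates that reading under stated assumptions: the variances and the 90% threshold are hypothetical, the function name `components_to_keep` is invented for this example, and none of the numbers come from the presentation's plot.

```python
def components_to_keep(variances, threshold):
    """Return (shares, cumulative, k): each PC's share of total variance,
    the running cumulative share, and the smallest number of leading PCs
    whose cumulative share reaches `threshold`."""
    total = sum(variances)
    shares = [v / total for v in variances]

    cumulative = []
    running = 0.0
    for v in variances:
        running += v
        cumulative.append(running / total)

    k = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
    return shares, cumulative, k


# Hypothetical PC variances for six variables, sorted largest first.
example_variances = [3.0, 2.0, 0.7, 0.3, 0.0, 0.0]

shares, cumulative, n_keep = components_to_keep(example_variances, 0.90)
print("share of variance:", [round(s, 3) for s in shares])
print("cumulative share:", [round(c, 3) for c in cumulative])
print("PCs to keep:", n_keep)
```

The threshold is a judgment call rather than a rule; the point is only that the decision is made from the variability output of the PCA, not from the response variable.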

  17. PC Calculation • Weights on the chosen variables:

              Pleasure   Commute  Business     Rural  Suburban     Urban
      PC #1      -0.19      0.54     -0.40      0.56     -0.45     -0.03
      PC #2      -0.54      0.14      0.48     -0.20     -0.31      0.58
      PC #3      -0.55      0.36      0.23     -0.02      0.47     -0.55

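  18. PC Calculation • PC1 = -0.19P + 0.54C - 0.40B + 0.56R - 0.45S - 0.03U, where P, C, B, R, S and U are the Pleasure, Commute, Business, Rural, Suburban and Urban shares • PC1 (2002) = -0.19(30%) + 0.54(50%) - 0.40(20%) + 0.56(20%) - 0.45(30%) - 0.03(50%)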
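The arithmetic on this slide is a weighted sum, which can be sketched as follows. The weights and the 2002 percentages are the ones shown on the slides; the variable and function names are hypothetical, and the percentages are entered as fractions (30% as 0.30). The slide applies the weights to the raw shares, so the sketch does the same and does not centre or standardize them.

```python
# PC #1 weights as shown on the slide.
weights_pc1 = {
    "Pleasure": -0.19,
    "Commute": 0.54,
    "Business": -0.40,
    "Rural": 0.56,
    "Suburban": -0.45,
    "Urban": -0.03,
}

# 2002 shares as shown on the slide, written as fractions.
shares_2002 = {
    "Pleasure": 0.30,
    "Commute": 0.50,
    "Business": 0.20,
    "Rural": 0.20,
    "Suburban": 0.30,
    "Urban": 0.50,
}


def pc_score(weights, values):
    """Weighted sum of the explanatory variables for one observation."""
    return sum(weights[name] * values[name] for name in weights)


pc1_2002 = pc_score(weights_pc1, shares_2002)
print(round(pc1_2002, 3))  # 0.095
```

Repeating the calculation for each year and each chosen PC gives the PC values that replace the original six variables in the regression.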

  19. Example - Modeling Procedure (diagram) • Vehicle Use, Territory → PCA → Chosen PCs → Multiple Regression ← Average Premium
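One practical consequence of regressing on PCs is that, because centred PC scores are uncorrelated, each regression coefficient can be estimated on its own without affecting the others. The sketch below illustrates this under loud assumptions: the PC scores and average premiums are made-up numbers chosen to be centred and mutually orthogonal, the function name `regress_on_pcs` is hypothetical, and nothing here reproduces the presentation's actual regression results.

```python
def regress_on_pcs(pc_scores, y):
    """Least-squares fit of y on centred, mutually orthogonal PC scores.
    Under those assumptions the multiple-regression slopes decouple:
    slope_k = sum(pc_k * (y - mean(y))) / sum(pc_k ** 2)."""
    n = len(y)
    intercept = sum(y) / n
    y_centred = [v - intercept for v in y]
    slopes = [
        sum(p * v for p, v in zip(pc, y_centred)) / sum(p * p for p in pc)
        for pc in pc_scores
    ]
    return intercept, slopes


# Hypothetical scores for two chosen PCs over five years
# (each sums to zero and the two are orthogonal to each other).
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2 = [2.0, -1.0, -2.0, -1.0, 2.0]

# Hypothetical average premiums for the same five years.
avg_premium = [700.0, 720.0, 745.0, 760.0, 790.0]

intercept, slopes = regress_on_pcs([pc1, pc2], avg_premium)
fitted = [intercept + slopes[0] * a + slopes[1] * b for a, b in zip(pc1, pc2)]
residuals = [obs - fit for obs, fit in zip(avg_premium, fitted)]

print("intercept:", intercept)
print("slopes:", slopes)
print("residuals:", [round(r, 3) for r in residuals])
```

The residuals sum to zero and are orthogonal to both PCs, which are the least-squares normal equations, so this shortcut gives the same fit as a full multiple regression on these two predictors. With correlated predictors such as the original shares, the shortcut would not be valid.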

  20. Multiple Regression Example – Results

  21. ICBC Personal TPB

  22. Advantages • Eliminates multicollinearity • Most of the original variance is captured in a few principal components • More refined selection method

  23. Disadvantages • Can be hard to interpret the PCs • PC weights may not be stable from year to year • Difficult to explain

  24. Is PCA Right For You? • Concerned about multicollinearity? • Confident in the set of explanatory variables? • Want to reduce dimensionality, without throwing away variables?

  25. For More Information • 2008 Discussion Paper: PCA and Partial Least Squares: Two Dimension Reduction Techniques for Regression (http://www.casact.org/pubs/dpp/dpp08/08dpp76.pdf) • Predictive modeling seminar, Oct 6-7, 2008 in San Diego, CA: PCA and Partial Least Squares