Principal Components Analysis • Eric Vaagen, FCAS, Assistant Actuary • September 5, 2008
Agenda • Motivation • What is PCA? • Background • Simple example • Is PCA right for you?
Motivation • Forecast average premium by coverage • Explanatory variables: vehicle use, territory, driving record • Breakdown of change in average premium • Multicollinearity exists among the explanatory variables
Modeling Procedure • Explanatory variables → Variable selection → Chosen variables • Chosen variables + Response variable → Model
Modeling Procedure • Vehicle use, Territory, Driving record → Variable selection → Chosen variables • Chosen variables + Average premium → Multiple regression
Variable Selection Methods • Stepwise regression (forward, backward) • PCA (unsupervised) • Partial least squares (supervised) • GLM
Background • First described in 1901 by Karl Pearson • Goal: find the best lines and planes to fit a set of points • What else did he discover? Pearson’s χ², linear regression, classification of distributions (exponential family)
PCA Example – Explanatory Variables • Vehicle use: Pleasure, Commute, Business • Territory: Rural, Suburban, Urban
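One plausible source of the multicollinearity noted under Motivation is that variables like these are category shares: within a group the shares sum to 100%, so each share is determined by the others. The sketch below checks this on made-up shares. Only the 30% / 50% / 20% row matches the deck's 2002 figures; every other number, and the names `vehicle_use_shares` and `pearson_r`, are illustrative assumptions.

```python
import math

# Hypothetical vehicle-use shares by year: (pleasure, commute, business).
# Only the first row (30%, 50%, 20%) comes from the deck; the rest is made up.
vehicle_use_shares = [
    (0.30, 0.50, 0.20),
    (0.32, 0.49, 0.19),
    (0.35, 0.47, 0.18),
    (0.36, 0.45, 0.19),
    (0.40, 0.43, 0.17),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Each year's shares sum to 1, so the three columns are linearly dependent.
row_sums = [sum(row) for row in vehicle_use_shares]

pleasure = [row[0] for row in vehicle_use_shares]
commute = [row[1] for row in vehicle_use_shares]
r_pleasure_commute = pearson_r(pleasure, commute)

print(row_sums)
print(round(r_pleasure_commute, 3))
```

A correlation this close to -1 between two regressors is the kind of situation in which ordinary regression coefficients become unstable, which is what PCA is meant to address here.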
Example – Average Premium Response Variable
Modeling Procedure • Vehicle use, Territory → PCA → Chosen PCs • Chosen PCs + Average premium → Multiple regression
PCA Procedure • PCs: no multicollinearity (the PCs are uncorrelated with one another); the 1st PC has the most variance • Output: weights to create the PCs; variability of each PC
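The procedure above can be sketched in plain Python: center the data, form the covariance matrix, then extract its leading eigenvectors (the weights) and eigenvalues (the variability of each PC). Power iteration with deflation is used only to keep the sketch dependency-free. It is an assumption of this example, not the method used in the source, and a linear algebra library would be the usual choice. The data and all function names are hypothetical.

```python
import math


def center(rows):
    """Subtract each column's mean from its values."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divisor n - 1) of centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(mat, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(mat)
    v = [float(j + 1) for j in range(p)]  # non-constant start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(rows, n_components):
    """Return (weights, variances, scores) for the first n_components PCs."""
    centered = center(rows)
    mat = covariance(centered)
    p = len(mat)
    weights, variances = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(mat)
        weights.append(v)
        variances.append(lam)
        # Deflate: remove the component just found before finding the next.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(w[j] * r[j] for j in range(p)) for w in weights]
              for r in centered]
    return weights, variances, scores


# Hypothetical shares by year (pleasure, commute, business); made-up values.
shares = [
    (0.30, 0.50, 0.20),
    (0.32, 0.49, 0.19),
    (0.35, 0.47, 0.18),
    (0.36, 0.45, 0.19),
    (0.40, 0.43, 0.17),
]
# Three shares that sum to 1 vary in only two dimensions, so ask for two PCs.
weights, variances, scores = pca(shares, n_components=2)
print(weights)
print(variances)
```

Requesting a third component from this data would hit a numerically zero matrix after deflation, so a production version would need a guard for that case.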
Modeling Procedure • Vehicle use, Territory (5 years x 6 variables) → PCA → Weights, Variability • PCs (5 years x 6 variables) → Chosen PCs
PC Calculation • Weights applied to the chosen variables:
         Pleasure  Commute  Business  Rural  Suburban  Urban
PC #1      -0.19     0.54     -0.40    0.56     -0.45  -0.03
PC #2      -0.54     0.14      0.48   -0.20     -0.31   0.58
PC #3      -0.55     0.36      0.23   -0.02      0.47  -0.55
PC Calculation • PC1 = -0.19P + 0.54C - 0.40B + 0.56R - 0.45S - 0.03U • PC1 (2002) = -0.19(30%) + 0.54(50%) - 0.40(20%) + 0.56(20%) - 0.45(30%) - 0.03(50%)
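The 2002 calculation is a weighted sum, which a few lines of Python can evaluate. The weights and shares are the ones on the slide; the dictionary names are illustrative. Following the slide, the weights are applied directly to the percentage shares. The deck does not say whether the inputs were centered or standardized first, so none is applied here.

```python
# PC #1 weights as given on the slide.
weights_pc1 = {
    "Pleasure": -0.19, "Commute": 0.54, "Business": -0.40,
    "Rural": 0.56, "Suburban": -0.45, "Urban": -0.03,
}

# 2002 shares as given on the slide, written as fractions.
shares_2002 = {
    "Pleasure": 0.30, "Commute": 0.50, "Business": 0.20,
    "Rural": 0.20, "Suburban": 0.30, "Urban": 0.50,
}

# Weighted sum: one term per explanatory variable.
pc1_2002 = sum(weights_pc1[name] * shares_2002[name] for name in weights_pc1)

print(round(pc1_2002, 3))  # 0.095
```

Repeating the same sum with each year's shares gives the PC #1 value for every year, and the other rows of weights give the remaining PCs.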
Example – Modeling Procedure • Vehicle use, Territory → PCA → Chosen PCs • Chosen PCs + Average premium → Multiple regression
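Because PC scores computed from centered data are mean-zero and mutually uncorrelated, the multiple regression of average premium on the chosen PCs decouples: each coefficient equals the slope from regressing the response on that PC alone. A minimal sketch with made-up numbers follows. The scores `pc1` and `pc2` and the `premium` values are placeholders chosen to be mean-zero and orthogonal, not results from the deck.

```python
# Hypothetical PC scores for five years: mean zero and mutually orthogonal,
# as PC scores from centered data would be. Not taken from the deck.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2 = [2.0, -1.0, -2.0, -1.0, 2.0]
# Hypothetical average premiums for the same five years.
premium = [500.0, 512.0, 521.0, 535.0, 552.0]


def dot(xs, ys):
    """Sum of elementwise products of two equal-length sequences."""
    return sum(x * y for x, y in zip(xs, ys))


# With mean-zero regressors, the OLS intercept is the mean response.
intercept = sum(premium) / len(premium)
centered_premium = [y - intercept for y in premium]

# With orthogonal regressors, each coefficient is a simple regression slope.
coef_pc1 = dot(pc1, centered_premium) / dot(pc1, pc1)
coef_pc2 = dot(pc2, centered_premium) / dot(pc2, pc2)

fitted = [intercept + coef_pc1 * a + coef_pc2 * b for a, b in zip(pc1, pc2)]
residuals = [y - f for y, f in zip(premium, fitted)]

print(intercept, coef_pc1, coef_pc2)
```

Dropping a PC from the model leaves the other coefficients unchanged, which is not true of correlated regressors.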
Multiple Regression Example – Results
Advantages • Eliminates multicollinearity • Most of the original variance is captured in a few principal components • More refined selection method
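One common way to act on the second point is to keep the smallest number of leading PCs whose cumulative share of variance reaches a chosen threshold. The sketch below assumes the PC variances are already sorted largest first. The variances, the threshold values, and the function name `components_needed` are hypothetical and not taken from the deck.

```python
from itertools import accumulate


def components_needed(variances, threshold):
    """Smallest number of leading PCs whose cumulative variance share
    is at least `threshold` (a fraction between 0 and 1)."""
    total = sum(variances)
    for k, running in enumerate(accumulate(variances), start=1):
        if running / total >= threshold:
            return k
    return len(variances)


# Hypothetical PC variances, sorted largest first.
pc_variances = [4.0, 1.5, 0.4, 0.1]

print(components_needed(pc_variances, 0.90))  # 2
print(components_needed(pc_variances, 0.95))  # 3
```

The threshold is a judgment call. Because PCA is unsupervised, a PC with little variance can still matter for the response, so the choice is worth checking against the regression fit.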
Disadvantages • Can be hard to interpret the PCs • PC weights may not be stable from year to year • Difficult to explain
Is PCA Right For You? • Concerned about multicollinearity? • Confident in the set of explanatory variables? • Want to reduce dimensionality, without throwing away variables?
For More Information • 2008 Discussion Paper: "PCA and Partial Least Squares: Two Dimension Reduction Techniques for Regression", http://www.casact.org/pubs/dpp/dpp08/08dpp76.pdf • Predictive modeling seminar, Oct 6-7, 2008 in San Diego, CA: PCA and Partial Least Squares