Hierarchical Bayesian Modelling of the Spatial Dependence of Insurance Risk
László Márkus and Miklós Arató, Eötvös Loránd University, Budapest, Hungary
The basis for locally dependent premiums
• Companies apply spatially dependent premiums for various types of insurance. Riskier customers should pay more, but how do we determine the dependence of risk on location?
• We analyse third-party liability motor insurance data for a certain company in Hungary. Only claim frequency is considered in this talk; claim size needs different models. So, for the present talk, the risk is the occurrence of claims.
• An insurance company may not want its premium rating to change from locality to locality, but it has to know how much discrepancy results from smoothing, i.e. aggregating over larger regions. Customers are very sensitive to “unjustly set” rates.
Information from the neighbourhood to be used
• Only the capital, Budapest, is large enough for reliable direct risk estimation.
• In a village with 2 contracts, a single claim dramatically increases the estimated risk, but not the true one.
• What should be done with localities that have no contracts?
• The spatial risk component cannot be estimated from local experience alone; the information available in the neighbourhood also has to be taken into account. But what should count as a neighbourhood?
• Being aware of its shortcomings, we define the neighbours of a given locality as all localities within 15 km aerial distance.
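The 15 km rule can be sketched in a few lines. This is a minimal illustration, assuming each locality is given as a (latitude, longitude) pair in degrees and that great-circle distance is an acceptable stand-in for "aerial distance"; the function names and the sample coordinates are hypothetical, not taken from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighbourhood_matrix(coords, radius_km=15.0):
    """Symmetric 0/1 matrix: entry (i, j) is 1 if localities i and j
    are distinct and at most radius_km apart."""
    n = len(coords)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_km(*coords[i], *coords[j]) <= radius_km:
                A[i][j] = A[j][i] = 1
    return A

# Made-up points: the first two are a few km apart, the third is far away.
coords = [(47.50, 19.04), (47.46, 18.95), (46.25, 20.14)]
A = neighbourhood_matrix(coords)
```

A locality is not counted as its own neighbour here, so the diagonal stays zero; that convention is a choice of the sketch.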
The inhomogeneous spatial Poisson process
• Suppose the claim frequency Zj of the j-th individual contract follows a Poisson law.
• Its Poisson intensity parameter depends on the exposure time τj (the time spent in risk), which is known to us as data.
• Furthermore, the intensity depends on other risk factors characterising the contract (such as car type, age, etc.).
• Finally, the intensity parameter depends on the location to which the contract belongs.
• Suppose in addition that interdependence among claim frequencies is created solely through the intensity parameters, i.e. the Zj are conditionally independent given the values of the intensities.
• Our final assumption is that the effects of the exposure time, risk factors and location on the intensity are multiplicative.
Contract-level model
• So we end up with Zj distributed as Poisson(λ·γj·τj·ei), with the average intensity or common claim frequency λ, the risk factor effect γj, the exposure τj and the spatial risk parameter ei.
• The additional risk factors are:
  - car type (30)
  - gender (3: male, female, company)
  - age group (6)
  - population size (10)
• As a first step, suppose λ and all the ei to be equal to 1. Then the γj are easily estimable by a generalised linear model.
• Introducing now γj·τj as the modified exposure (denoted by τj*), we can build a model for the claim frequencies at locations and estimate the spatial risk parameter ei.
Location-level model
• By virtue of the conditional independence, the claim frequency Yi at the i-th location is distributed as Poisson(λ·∑τj*·ei), where the summation runs over all contracts belonging to location i.
• In this model we treat ∑τj* as given (“observed” data), even though it contains estimated components, and denote it by ti.
• After estimating ei it is possible to return to the contract level, re-estimate the effects of the risk factors, and iterate this procedure.
• Remarkably, stability can be reached within a few steps.
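The step from contracts to locations amounts to summing modified exposures per location. The sketch below assumes each contract is a tuple (location id, exposure, risk-factor effect); the name `gamma` for the risk-factor effect, the function names and the numbers are illustrative only.

```python
from collections import defaultdict

def modified_exposure_by_location(contracts):
    """Sum the modified exposure gamma_j * tau_j over the contracts of
    each location, giving t_i.

    contracts: iterable of (location_id, tau_j, gamma_j) tuples.
    """
    t = defaultdict(float)
    for location_id, tau, gamma in contracts:
        t[location_id] += gamma * tau
    return dict(t)

def expected_claims(t, lam, e):
    """Poisson mean lam * t_i * e_i per location; a location missing
    from e is given spatial parameter 1 (an assumption of the sketch)."""
    return {loc: lam * t_i * e.get(loc, 1.0) for loc, t_i in t.items()}

# Made-up contracts: two at location "A", one at location "B".
contracts = [("A", 1.0, 2.0), ("A", 0.5, 1.0), ("B", 2.0, 1.5)]
t = modified_exposure_by_location(contracts)
mu = expected_claims(t, lam=0.1, e={"A": 1.2})
```

In the iteration described above, `t` would be recomputed each time the risk-factor effects are re-estimated at the contract level.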
The hierarchical Bayesian model
• Let us introduce some further notation:
  - Yi: number of claims; ti: modified exposure time; Θi: risk factor at the i-th location, i = 1, 2, …, N
  - λ: common claim frequency
  - A: neighbourhood matrix
  - ρ: parameter of the covariance
  - p, q, α, β: Bayesian parameters
• The claim frequency follows a non-homogeneous Poisson process. That is, the Yi are independent Poisson(λ·ti·ei) distributed random variables, given λ and Θi.
• On the second level of the model hierarchy, suppose the spatial parameters Θi to be normally distributed with covariance matrix Σ = (I − ρA)⁻¹, depending on the neighbourhood matrix A.
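For a symmetric neighbourhood matrix A, the covariance Σ = (I − ρA)⁻¹ is positive definite exactly when its inverse I − ρA is, so admissible values of ρ can be checked on I − ρA directly. The following is a minimal sketch using a hand-written Cholesky attempt; the function names and the small example matrix are hypothetical.

```python
def precision_matrix(A, rho):
    """Return I - rho * A as a list of lists."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - rho * A[i][j] for j in range(n)]
            for i in range(n)]

def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorisation of the symmetric matrix M;
    it succeeds with positive pivots only if M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = M[i][i] - s
                if pivot <= tol:
                    return False
                L[i][i] = pivot ** 0.5
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return True

# Three localities on a line: 0 - 1 - 2. The eigenvalues of A are
# -sqrt(2), 0, sqrt(2), so I - rho*A is positive definite
# for |rho| < 1/sqrt(2), about 0.707.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ok = is_positive_definite(precision_matrix(A, 0.5))
bad = is_positive_definite(precision_matrix(A, 0.8))
```

For a matrix of the size discussed in the talk, a sparse factorisation from a numerical library would be the practical choice; the dense loop here only illustrates the condition.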
We must keep the covariance matrix Σ positive definite, therefore suppose the following priors on the parameters, including ρ [formulas not preserved]
• p and q are conditional on ρ, prescribing the expectation and variance as p/(p+q) = ρ and ρ²(1−ρ)/(p+ρ) = σ².
• We have to take care with the update of ρ, since this distribution is not symmetric. R’s xbeta functions (dbeta, rbeta, etc.) help to compute the correction for the posterior ratio.
• Under these assumptions the posterior can be computed as [formula not preserved]
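The two moment conditions can be solved for p and q in closed form: p = ρ²(1−ρ)/σ² − ρ and q = p(1−ρ)/ρ. The sketch below assumes these define a Beta(p, q) proposal centred at the current ρ with a fixed variance, and computes the asymmetry correction on the log scale; the function names are hypothetical and the talk's own code was in R, not Python.

```python
import math

def beta_params(mean, var):
    """Solve p/(p+q) = mean and mean^2 (1-mean)/(p+mean) = var."""
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < var < mean*(1-mean)")
    p = mean * mean * (1.0 - mean) / var - mean
    q = p * (1.0 - mean) / mean
    return p, q

def log_beta_pdf(x, p, q):
    """Log density of Beta(p, q) at x in (0, 1)."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return log_norm + (p - 1.0) * math.log(x) + (q - 1.0) * math.log1p(-x)

def log_hastings_correction(rho_old, rho_new, var):
    """log q(rho_old | rho_new) - log q(rho_new | rho_old), the term
    added to the log posterior ratio for an asymmetric proposal."""
    p_fwd, q_fwd = beta_params(rho_old, var)   # proposal made from rho_old
    p_bwd, q_bwd = beta_params(rho_new, var)   # reverse move from rho_new
    return (log_beta_pdf(rho_old, p_bwd, q_bwd)
            - log_beta_pdf(rho_new, p_fwd, q_fwd))

p, q = beta_params(0.3, 0.01)
corr = log_hastings_correction(0.3, 0.35, 0.01)
```

A proposal itself could be drawn with the standard-library call `random.betavariate(p, q)`.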
From here we have the form of the log-posterior [formula not preserved]
• For λ, the maximum likelihood estimator can be computed conditional on ρ and Θ [formula not preserved]
• For ρ and Θ a Metropolis-Hastings update is needed.
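For independent Yi distributed as Poisson(λ·ti·ei), the log-likelihood in λ is ∑(Yi·log λ − λ·ti·ei) plus a constant, which is maximised at λ = ∑Yi / ∑(ti·ei). The sketch below computes this standard Poisson-rate estimator; it is not necessarily the exact expression shown on the slide, and the function name and data are made up.

```python
def lambda_mle(claims, exposures, spatial):
    """Maximum likelihood estimate of the common claim frequency for
    independent Y_i ~ Poisson(lam * t_i * e_i), with t_i, e_i fixed."""
    total_claims = sum(claims)
    total_weighted_exposure = sum(t * e for t, e in zip(exposures, spatial))
    if total_weighted_exposure <= 0.0:
        raise ValueError("total weighted exposure must be positive")
    return total_claims / total_weighted_exposure

# Made-up data for three locations.
lam_hat = lambda_mle(claims=[2, 0, 3],
                     exposures=[10.0, 5.0, 20.0],
                     spatial=[1.0, 2.0, 0.5])
```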
The problem is that Θ is a vector of length 3111
• Updating the posterior requires the computation of a quadratic form with a 3111 × 3111 matrix at each coordinate of the 3111-long vector.
• This is clearly a paralysing step even on a very fast computer, even when trying to factorise the matrix into a full-rank diagonal plus a sparse matrix.
• We used the following updating rule:
  - Propose in all the coordinates, one by one, and compute the increment between the present log-posterior and the one-coordinate update. (By not updating the log-posterior, we can use vector operations instead of loops, which is a lot faster.)
  - Determine on this basis the coordinates where the proposal is accepted.
  - Update the log-posterior.
  - Update the other parameters.
• In these steps the log-posterior is updated sequentially.
• 10 000 updates of Θ (with approx. 80% acceptance) and 250 000 updates of ρ and λ are possible in about 2 hours of running time on a PC.
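The updating rule can be sketched as follows. Several things here are assumptions rather than facts from the talk: the spatial effect is taken as ei = exp(Θi), the normal prior has zero mean and precision Q = I − ρA, and the proposal is a Gaussian random walk. All function names are hypothetical. Each list comprehension stands for what would be a single vector operation in R or NumPy.

```python
import math
import random

def log_posterior(theta, y, t, lam, Q):
    """Assumed log-posterior of theta, up to an additive constant."""
    n = len(theta)
    lik = sum(y[i] * theta[i] - lam * t[i] * math.exp(theta[i])
              for i in range(n))
    quad = sum(theta[i] * Q[i][j] * theta[j]
               for i in range(n) for j in range(n))
    return lik - 0.5 * quad

def coordinate_increments(theta, step, y, t, lam, Q):
    """Change in log-posterior if only coordinate i moves by step[i],
    for every i, all measured against the same current theta."""
    n = len(theta)
    q_theta = [sum(Q[i][j] * theta[j] for j in range(n)) for i in range(n)]
    return [
        y[i] * step[i]
        - lam * t[i] * (math.exp(theta[i] + step[i]) - math.exp(theta[i]))
        - step[i] * q_theta[i]
        - 0.5 * step[i] ** 2 * Q[i][i]
        for i in range(n)
    ]

def batch_update(theta, y, t, lam, Q, scale=0.1, rng=random):
    """Propose in every coordinate, then accept coordinate-wise using
    increments computed from the un-updated log-posterior."""
    step = [rng.gauss(0.0, scale) for _ in theta]
    delta = coordinate_increments(theta, step, y, t, lam, Q)
    accept = [math.log(1.0 - rng.random()) < d for d in delta]
    new_theta = [th + s if a else th
                 for th, s, a in zip(theta, step, accept)]
    return new_theta, accept

# Small made-up example with three locations.
Q = [[1.0, -0.2, 0.0], [-0.2, 1.0, -0.2], [0.0, -0.2, 1.0]]
theta, y, t, lam = [0.1, -0.3, 0.2], [1, 0, 2], [10.0, 5.0, 20.0], 0.05
new_theta, accepted = batch_update(theta, y, t, lam, Q)
```

The increments need only one product Qθ per sweep instead of one quadratic form per coordinate, which is where the saving comes from. Accepting several coordinates against the same frozen log-posterior follows the description in the slide; the sketch does not check how this affects the stationary distribution of the chain.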
Convergence of the parameters
• Acceptance ratios: λ: 34.4%, ρ: 18.6%, Θ: 74.1%
• Means: λ: 0.0000843, ρ: 0.00587, [symbol lost]: 0.344
By estimating λ we can compare the expected number of claims to the observed ones
• There are risk factors other than location that have to be accounted for, but suppose the opposite for a moment.
• When expected < observed, compute the probability of sample exceedance P(Yj ≥ yj), whereas when expected > observed, compute the probability of sample domination P(Yj ≤ yj).
• Plot these probabilities on a map. This is the so-called probability map, measuring the inhomogeneity of the Poisson process.
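The value plotted for one locality can be sketched as a Poisson tail probability. The function names are hypothetical, and the tie case (expected equal to observed), which the talk leaves open, is treated as the lower tail here.

```python
import math

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), by direct summation.
    Suitable for modest mu; exp(-mu) underflows for very large means."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)   # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i     # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)

def map_probability(observed, expected):
    """Exceedance probability P(Y >= y) when expected < observed,
    otherwise domination probability P(Y <= y)."""
    if expected < observed:
        return 1.0 - poisson_cdf(observed - 1, expected)
    return poisson_cdf(observed, expected)

# Made-up localities: more claims than expected, and fewer than expected.
p_high = map_probability(observed=3, expected=1.0)
p_low = map_probability(observed=0, expected=2.0)
```

Small values in either direction mark localities whose claim counts are hard to reconcile with a homogeneous claim frequency.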
Probability map of claims, based on exposure time
Other risk factors
• There are further risk factors, such as the age of the policyholder, car type (ccm), or population size of the locality, etc.
• A simple generalised linear model can be used to adjust for these risk factors, but even then a probability map clearly shows spatial inhomogeneity in the remaining risks.
Comparison of observed and expected
• The expected claim frequency has to be compared to the observed one, and the probability map can be drawn.
• Clearly, the residuals are almost equally likely everywhere.