Insurance pricing is backwards and primitive, harking back to an era before (powerful) computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson (Amazon paid links UK | US). We have been doing some work in this area recently. Needing a robust internal training course and a documented methodology, we have been working our way through the book again, converting the examples and exercises to R, the statistical computing and analysis platform. This is part of a series of posts containing elements of the R code.
We grab the data for Table 1.2 from the book’s web site and store it as an R object with lots of good meta information.
Code
################
### Example 1.2

con <- url("http://www2.math.su.se/~esbj/GLMbook/moppe.sas")
data <- readLines(con, n = 200L, warn = FALSE, encoding = "unknown")
close(con)

## Find the data range
data.start <- grep("^cards;", data) + 1L
data.end <- grep("^;", data[data.start:999L]) + data.start - 2L
table.1.2 <-
  read.table(text = data[data.start:data.end], header = FALSE, sep = "",
             quote = "",
             col.names = c("premiekl", "moptva", "zon", "dur", "medskad",
                           "antskad", "riskpre", "helpre", "cell"),
             na.strings = NULL,
             colClasses = c(rep("factor", 3), "numeric", rep("integer", 4),
                            "NULL"),
             comment.char = "")
rm(con, data, data.start, data.end) # Cleanup

comment(table.1.2) <-
  c("Title: Partial casco moped insurance from Wasa insurance, 1994--1999",
    "Source: http://www2.math.su.se/~esbj/GLMbook/moppe.sas",
    "Copyright: http://www2.math.su.se/~esbj/GLMbook/")

## See the SAS code for this derived field
table.1.2$skadfre <- with(table.1.2, antskad / dur)

## English language column names as comments:
comment(table.1.2$premiekl) <-
  c("Name: Class",
    "Code: 1=Weight over 60kg and more than 2 gears",
    "Code: 2=Other")
comment(table.1.2$moptva) <-
  c("Name: Age",
    "Code: 1=At most 1 year",
    "Code: 2=2 years or more")
comment(table.1.2$zon) <-
  c("Name: Zone",
    "Code: 1=Central and semi-central parts of Sweden's three largest cities",
    "Code: 2=suburbs and middle-sized towns",
    "Code: 3=Lesser towns, except those in 5 or 7",
    "Code: 4=Small towns and countryside, except 5--7",
    "Code: 5=Northern towns",
    "Code: 6=Northern countryside",
    "Code: 7=Gotland (Sweden's largest island)")
comment(table.1.2$dur) <- c("Name: Duration", "Unit: year")
comment(table.1.2$medskad) <- c("Name: Claim severity", "Unit: SEK")
comment(table.1.2$antskad) <- "Name: No. claims"
comment(table.1.2$riskpre) <- c("Name: Pure premium", "Unit: SEK")
comment(table.1.2$helpre) <-
  c("Name: Actual premium",
    "Note: The premium for one year according to the tariff in force 1999",
    "Unit: SEK")
comment(table.1.2$skadfre) <- c("Name: Claim frequency", "Unit: /year")

## Save results for later
save(table.1.2, file = "table.1.2.RData")

## Print the table (not as pretty as the book)
print(table.1.2)
################
   premiekl moptva zon       dur medskad antskad riskpre helpre skadfre
1         1      1   1   62.9000   18256      17    4936   2049  0.2703
2         1      1   2  112.9000   13632       7     845   1230  0.0620
3         1      1   3  133.1000   20877       9    1411    762  0.0676
4         1      1   4  376.6000   13045       7     242    396  0.0186
5         1      1   5    9.4000       0       0       0    990  0.0000
6         1      1   6   70.8000   15000       1     212    594  0.0141
7         1      1   7    4.4000    8018       1    1829    396  0.2273
8         1      2   1  352.1000    8232      52    1216   1229  0.1477
9         1      2   2  840.1000    7418      69     609    738  0.0821
10        1      2   3 1378.3000    7318      75     398    457  0.0544
11        1      2   4 5505.3000    6922     136     171    238  0.0247
12        1      2   5  114.1000   11131       2     195    594  0.0175
13        1      2   6  810.9000    5970      14     103    356  0.0173
14        1      2   7   62.3000    6500       1     104    238  0.0161
15        2      1   1  191.6000    7754      43    1740   1024  0.2244
16        2      1   2  237.3000    6933      34     993    615  0.1433
17        2      1   3  162.4000    4402      11     298    381  0.0677
18        2      1   4  446.5000    8214       8     147    198  0.0179
19        2      1   5   13.2000       0       0       0    495  0.0000
20        2      1   6   82.8000    5830       3     211    297  0.0362
21        2      1   7   14.5000       0       0       0    198  0.0000
22        2      2   1  844.8000    4728      94     526    614  0.1113
23        2      2   2 1296.0000    4252      99     325    369  0.0764
24        2      2   3 1214.9000    4212      37     128    229  0.0305
25        2      2   4 3740.7000    3846      56      58    119  0.0150
26        2      2   5  109.4000    3925       4     144    297  0.0366
27        2      2   6  404.7000    5280       5      65    178  0.0124
28        2      2   7   66.3000    7795       1     118    119  0.0151
That was easy. Now for something a little harder.
Example 1.3
Here we are concerned with replicating Table 1.4. We do it slowly, step-by-step, for pedagogical reasons.
Code
################
### Example 1.3

if (!exists("table.1.2"))
  load("table.1.2.RData")

## We calculate each of the columns individually and slowly here
## to show each step

## First we have simply the labels of the table
rating.factor <-
  with(table.1.2,
       c(rep("Vehicle class", nlevels(premiekl)),
         rep("Vehicle age", nlevels(moptva)),
         rep("Zone", nlevels(zon))))

## The Class column
class.num <- with(table.1.2, c(levels(premiekl), levels(moptva), levels(zon)))

## The Duration is the sum of durations within each class
duration.total <-
  c(with(table.1.2, tapply(dur, premiekl, sum)),
    with(table.1.2, tapply(dur, moptva, sum)),
    with(table.1.2, tapply(dur, zon, sum)))

## Calculate relativities in the tariff
## The denominator of the fraction is the class with the highest exposure
## (i.e. the maximum total duration): we make that explicit with the
## which.max() construct. We also set the contrasts to use this as the base,
## which will be useful for the glm() model later.
class.base <- which.max(duration.total[1:2])
age.base <- which.max(duration.total[3:4])
zone.base <- which.max(duration.total[5:11])
rt.class <- with(table.1.2, tapply(helpre, premiekl, sum))
rt.class <- rt.class / rt.class[class.base]
rt.age <- with(table.1.2, tapply(helpre, moptva, sum))
rt.age <- rt.age / rt.age[age.base]
rt.zone <- with(table.1.2, tapply(helpre, zon, sum))
rt.zone <- rt.zone / rt.zone[zone.base]
contrasts(table.1.2$premiekl) <-
  contr.treatment(nlevels(table.1.2$premiekl))[rank(-duration.total[1:2],
                                                    ties.method = "first"), ]
contrasts(table.1.2$moptva) <-
  contr.treatment(nlevels(table.1.2$moptva))[rank(-duration.total[3:4],
                                                  ties.method = "first"), ]
contrasts(table.1.2$zon) <-
  contr.treatment(nlevels(table.1.2$zon))[rank(-duration.total[5:11],
                                               ties.method = "first"), ]
The contrasts could also have been set with the base= argument, e.g. contrasts(table.1.2$zon) <- contr.treatment(nlevels(table.1.2$zon), base = zone.base), which would be closer in spirit to the SAS code. But I like the idiom presented here where we follow the duration order; it also extends well to other (i.e. not treatment) contrasts. I just wish rank() had a decreasing= argument like order() which I think would be clearer than using rank(-x) to get a decreasing sort order.
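To convince yourself that the two idioms agree, here is a small sketch on made-up durations (three hypothetical levels, not the moped data) comparing the base= form against the rank() reordering:

```r
## Toy total durations for three hypothetical factor levels
dur <- c(5, 10, 2)
base <- which.max(dur)                  # level 2 has the largest exposure

## Idiom 1: name the base level explicitly
C1 <- contr.treatment(3, base = base)

## Idiom 2: reorder the default treatment contrasts by decreasing duration
C2 <- contr.treatment(3)[rank(-dur, ties.method = "first"), ]

## Same contrast values, differing only in column labels
stopifnot(all(C1 == C2))
```

Both give the level with the largest exposure a row of zeros, i.e. make it the base level of the model.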
That was the easy part. At this stage in the book you are not really expected to understand the next step so do not despair! We just show how easy it is to replicate the SAS code in R. An alternative approach using direct optimization is outlined in Exercise 1.3 below.
Code
## Relativities of MMT; we use the glm approach here as per the book's
## SAS code at http://www2.math.su.se/~esbj/GLMbook/moppe.sas
m <- glm(riskpre ~ premiekl + moptva + zon,
         data = table.1.2, family = poisson("log"), weights = dur)

## If the next line is a mystery then you need to
## (1) read up on contrasts or
## (2) remember that the link function is log() which is why we use exp here
rels <- exp(coef(m)[1] + coef(m)[-1]) / exp(coef(m)[1])
rm.class <- c(1, rels[1])           # See rm.zone below for the
rm.age <- c(rels[2], 1)             # general approach
rm.zone <- c(1, rels[3:8])[rank(-duration.total[5:11], ties.method = "first")]

## Create and save the data frame
table.1.4 <-
  data.frame(Rating.factor = rating.factor,
             Class = class.num,
             Duration = duration.total,
             Rel.tariff = c(rt.class, rt.age, rt.zone),
             Rel.MMT = c(rm.class, rm.age, rm.zone))
save(table.1.4, file = "table.1.4.RData")
print(table.1.4, digits = 3)
rm(rating.factor, class.num, duration.total, class.base, age.base, zone.base,
   rt.class, rt.age, rt.zone, rm.class, rm.age, rm.zone, m, rels)
################
The result is something like this:
   Rating.factor Class Duration Rel.tariff Rel.MMT
1  Vehicle class     1  9833.20       1.00    1.00
2  Vehicle class     2  8825.10       0.50    0.43
3    Vehicle age     1  1918.40       1.67    2.73
4    Vehicle age     2 16739.90       1.00    1.00
5           Zone     1  1451.40       5.17    8.97
6           Zone     2  2486.30       3.10    4.19
7           Zone     3  2888.70       1.92    2.52
8           Zone     4 10069.10       1.00    1.00
9           Zone     5   246.10       2.50    1.24
10          Zone     6  1369.20       1.50    0.74
11          Zone     7   147.50       1.00    1.23
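The division by the exponentiated intercept in the rels line earlier is pure algebra on the log link: exp(b0 + bj)/exp(b0) = exp(bj). A minimal sketch on simulated toy data (not the moped tariff) to make that concrete:

```r
## Toy Poisson data: two groups with different rates (not the moped data)
set.seed(1)
d <- data.frame(f = factor(rep(c("a", "b"), each = 50)),
                y = rpois(100, rep(c(2, 4), each = 50)))

m <- glm(y ~ f, data = d, family = poisson("log"))

## Relativity of level "b" versus the base level "a", written two ways:
rels <- exp(coef(m)[1] + coef(m)[-1]) / exp(coef(m)[1])
stopifnot(isTRUE(all.equal(unname(rels), unname(exp(coef(m)[-1])))))
```

The longer form used in the code above generalises more naturally when you want the base level's relativity to appear explicitly as 1.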
Note the rather unusual and apparently inconsistent rounding in the book: 147, 1.66, and 5.16 would be better as 148 (the value is 147.5), 1.67, and 5.17.
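Incidentally (my aside, not the book's), R itself would agree that 147.5 rounds to 148: round() follows the IEC 60559 round-half-to-even rule, and 148 happens to be the even neighbour.

```r
## R rounds exact halves to the nearest even digit ("banker's rounding")
round(147.5)  # 148, because 148 is even
round(146.5)  # 146, because 146 is even
```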
Exercise 1.3
Here it gets interesting, as we get different values from the authors. Possibly a small bug on our part, but at least we provide the code for you to check; if you spot a problem, let us know in the comments.
Code
################
### Exercise 1.3

## The values from the book
g0 <- 0.03305
g12 <- 2.01231
g22 <- 0.74288
dim.names <- list(Milage = c("Low", "High"), Age = c("New", "Old"))
pyears <- matrix(c(47039, 56455, 190513, 28612), nrow = 2,
                 dimnames = dim.names)
claims <- matrix(c(0.033, 0.067, 0.025, 0.049), nrow = 2,
                 dimnames = dim.names)

## Function to calculate the error of the estimate
GvalsError <- function(gvals) {
  ## The current estimates
  g0 <- gvals[1]
  g12 <- gvals[2]
  g22 <- gvals[3]
  ## The current estimates in convenient matrix form
  G <- matrix(c(1, 1, g12, g22), nrow = 2)
  G1 <- matrix(c(1, g12), nrow = 2, ncol = 2)
  G2 <- matrix(c(1, g22), nrow = 2, ncol = 2, byrow = TRUE)
  ## The calculated values
  G0 <- addmargins(claims * pyears)["Sum", "Sum"] / sum(pyears * G1 * G2)
  G12 <- addmargins(claims * pyears)["High", "Sum"] /
    (g0 * addmargins(pyears * G2)["High", "Sum"])
  G22 <- addmargins(claims * pyears)["Sum", "Old"] /
    (g0 * addmargins(pyears * G1)["Sum", "Old"])
  ## The sum of squared errors
  error <- (g0 - G0)^2 + (g12 - G12)^2 + (g22 - G22)^2
  return(error)
}

## Minimize the error function to obtain our estimate
gamma <- optim(c(g0, g12, g22), GvalsError)
stopifnot(gamma$convergence == 0)
gamma <- gamma$par
values <- data.frame(legend = c("Our calculation", "Book value"),
                     g0 = c(gamma[1], g0),
                     g12 = c(gamma[2], g12),
                     g22 = c(gamma[3], g22),
                     row.names = "legend")
print(values, digits = 4)
## Close, but not the same.
rm(g0, g12, g22, dim.names, pyears, claims, gamma, values)
################
The resulting table is something like:
                    g0    g12    g22
Our calculation 0.0334 1.9951 0.7452
Book value      0.0331 2.0123 0.7429
Close, but not the same. Perhaps they used a different error function.
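As a cross-check (my own sketch, not code from the book): since optim() only drives the squared residuals of the marginal-total equations towards zero, we can instead solve the same equations directly by fixed-point iteration. If the equations above are set up correctly, both approaches should land on the same solution.

```r
## Fixed-point iteration on the marginal-total equations (my sketch).
## Data are the exercise values; rows are Milage (Low, High),
## columns are Age (New, Old).
pyears <- matrix(c(47039, 56455, 190513, 28612), nrow = 2)
claims <- matrix(c(0.033, 0.067, 0.025, 0.049), nrow = 2)
S <- claims * pyears                 # observed claim totals per cell

g0 <- 1; g12 <- 1; g22 <- 1          # arbitrary starting values
for (i in 1:200) {
  g0  <- sum(S) / sum(pyears * outer(c(1, g12), c(1, g22)))
  g12 <- sum(S[2, ]) / (g0 * sum(pyears[2, ] * c(1, g22)))
  g22 <- sum(S[, 2]) / (g0 * sum(pyears[, 2] * c(1, g12)))
}
round(c(g0 = g0, g12 = g12, g22 = g22), 4)
```

In a quick run this converges to essentially the "Our calculation" row above, which suggests the discrepancy with the book is not an optim() artefact.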