Insurance pricing is backwards and primitive, harking back to an era before computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson. We have been doing some work in this area recently. Needing a robust internal training course and a documented methodology, we have been working our way through the book again and converting the examples and exercises to R, the statistical computing and analysis platform. This post is part of a series containing elements of that R code.

Let’s get the preliminaries out of the way:

Now we can get started.

## Example 1.2

We grab the data for Table 1.2 from the book’s web site and store it as an R object with lots of good meta information.

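| Row | Class | Age | Zone | Duration | Severity | Claims | Pure premium | Tariff premium | Frequency |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 62.9000 | 18256 | 17 | 4936 | 2049 | 0.2703 |
| 2 | 1 | 1 | 2 | 112.9000 | 13632 | 7 | 845 | 1230 | 0.0620 |
| 3 | 1 | 1 | 3 | 133.1000 | 20877 | 9 | 1411 | 762 | 0.0676 |
| 4 | 1 | 1 | 4 | 376.6000 | 13045 | 7 | 242 | 396 | 0.0186 |
| 5 | 1 | 1 | 5 | 9.4000 | 0 | 0 | 0 | 990 | 0.0000 |
| 6 | 1 | 1 | 6 | 70.8000 | 15000 | 1 | 212 | 594 | 0.0141 |
| 7 | 1 | 1 | 7 | 4.4000 | 8018 | 1 | 1829 | 396 | 0.2273 |
| 8 | 1 | 2 | 1 | 352.1000 | 8232 | 52 | 1216 | 1229 | 0.1477 |
| 9 | 1 | 2 | 2 | 840.1000 | 7418 | 69 | 609 | 738 | 0.0821 |
| 10 | 1 | 2 | 3 | 1378.3000 | 7318 | 75 | 398 | 457 | 0.0544 |
| 11 | 1 | 2 | 4 | 5505.3000 | 6922 | 136 | 171 | 238 | 0.0247 |
| 12 | 1 | 2 | 5 | 114.1000 | 11131 | 2 | 195 | 594 | 0.0175 |
| 13 | 1 | 2 | 6 | 810.9000 | 5970 | 14 | 103 | 356 | 0.0173 |
| 14 | 1 | 2 | 7 | 62.3000 | 6500 | 1 | 104 | 238 | 0.0161 |
| 15 | 2 | 1 | 1 | 191.6000 | 7754 | 43 | 1740 | 1024 | 0.2244 |
| 16 | 2 | 1 | 2 | 237.3000 | 6933 | 34 | 993 | 615 | 0.1433 |
| 17 | 2 | 1 | 3 | 162.4000 | 4402 | 11 | 298 | 381 | 0.0677 |
| 18 | 2 | 1 | 4 | 446.5000 | 8214 | 8 | 147 | 198 | 0.0179 |
| 19 | 2 | 1 | 5 | 13.2000 | 0 | 0 | 0 | 495 | 0.0000 |
| 20 | 2 | 1 | 6 | 82.8000 | 5830 | 3 | 211 | 297 | 0.0362 |
| 21 | 2 | 1 | 7 | 14.5000 | 0 | 0 | 0 | 198 | 0.0000 |
| 22 | 2 | 2 | 1 | 844.8000 | 4728 | 94 | 526 | 614 | 0.1113 |
| 23 | 2 | 2 | 2 | 1296.0000 | 4252 | 99 | 325 | 369 | 0.0764 |
| 24 | 2 | 2 | 3 | 1214.9000 | 4212 | 37 | 128 | 229 | 0.0305 |
| 25 | 2 | 2 | 4 | 3740.7000 | 3846 | 56 | 58 | 119 | 0.0150 |
| 26 | 2 | 2 | 5 | 109.4000 | 3925 | 4 | 144 | 297 | 0.0366 |
| 27 | 2 | 2 | 6 | 404.7000 | 5280 | 5 | 65 | 178 | 0.0124 |
| 28 | 2 | 2 | 7 | 66.3000 | 7795 | 1 | 118 | 119 | 0.0151 |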
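As a rough illustration of what an R object with meta information can look like, the sketch below types in the first three rows of the table by hand and attaches a description with `comment()` and `attr()`. The object name `moped`, the English column names, and the label texts are assumptions made for this example; they are not the book's variable names, and this is not the download code used for the post.

```r
# First three rows of the table above, entered by hand (illustrative only).
# Column names are hypothetical English labels, not the book's originals.
moped <- data.frame(
  class    = factor(c(1, 1, 1)),
  age      = factor(c(1, 1, 1)),
  zone     = factor(c(1, 2, 3)),
  duration = c(62.9, 112.9, 133.1),   # exposure
  severity = c(18256, 13632, 20877),  # average claim cost
  claims   = c(17, 7, 9)              # number of claims
)

# Attach free-text meta information to the object and to one column.
comment(moped) <- "Subset of Table 1.2 (rows 1-3), typed in for illustration"
attr(moped$duration, "label") <- "Exposure (duration)"

# Claim frequency is the number of claims per unit of duration.
moped$frequency <- moped$claims / moped$duration
# Pure premium is claim frequency times claim severity.
moped$pure.premium <- moped$frequency * moped$severity

print(round(moped$frequency, 4))
```

The rounded frequencies agree with the last column of the table (0.2703, 0.0620, 0.0676). The recomputed pure premiums differ from the printed ones by up to about two units, presumably because the printed severities are rounded.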

That was easy. Now for something a little harder.

## Example 1.3

Here we are concerned with replicating Table 1.4. We do it slowly, step-by-step, for pedagogical reasons.

The contrasts could also have been set with the `base=` argument, e.g. `contrasts(table.1.2$zon) <- contr.treatment(nlevels(table.1.2$zon), base = zone.base)`, which would be closer in spirit to the SAS code. But I like the idiom presented here where we follow the duration order; it also extends well to other (i.e. non-treatment) contrasts. I just wish `rank()` had a `decreasing=` argument like `order()`, which I think would be clearer than using `rank(-x)` to get a decreasing sort order.
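To make the two options concrete, here is a minimal sketch that uses the total durations per zone reported in the result table further down. The toy factor `zon` stands in for `table.1.2$zon`, and the names `zone.duration`, `duration.rank`, and `zon.by.duration` are hypothetical; this is one way to express the idea, not necessarily the exact code used for the post.

```r
# Total duration per zone, taken from the Duration column of the result table.
zone.duration <- c("1" = 1451.4, "2" = 2486.3, "3" = 2888.7, "4" = 10069.1,
                   "5" = 246.1,  "6" = 1369.2, "7" = 147.5)

# Toy factor with one observation per zone (stand-in for table.1.2$zon).
zon <- factor(names(zone.duration))

# Option 1: reorder the levels by decreasing duration. The first level is
# the base level under R's default treatment contrasts.
duration.rank <- rank(-zone.duration)           # 1 = largest duration
zon.by.duration <- factor(zon, levels = names(sort(duration.rank)))
print(levels(zon.by.duration))

# Option 2: keep the level order and name the base level explicitly.
zone.base <- unname(which.max(zone.duration))   # position of the largest zone
contrasts(zon) <- contr.treatment(nlevels(zon), base = zone.base)
print(contrasts(zon))
```

Both routes make zone 4, the zone with the largest duration, the reference level.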

That was the easy part. At this stage in the book you are not really expected to understand the next step so do not despair! We just show how easy it is to replicate the SAS code in R. An alternative approach using direct optimization is outlined in Exercise 1.3 below.

The result is something like this:

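| | Rating.factor | Class | Duration | Rel.tariff | Rel.MMT |
|---|---|---|---|---|---|
| 1 | Vehicle class | 1 | 9833.20 | 1.00 | 1.00 |
| 2 | Vehicle class | 2 | 8825.10 | 0.50 | 0.43 |
| 3 | Vehicle age | 1 | 1918.40 | 1.67 | 2.73 |
| 4 | Vehicle age | 2 | 16739.90 | 1.00 | 1.00 |
| 5 | Zone | 1 | 1451.40 | 5.17 | 8.97 |
| 6 | Zone | 2 | 2486.30 | 3.10 | 4.19 |
| 7 | Zone | 3 | 2888.70 | 1.92 | 2.52 |
| 8 | Zone | 4 | 10069.10 | 1.00 | 1.00 |
| 9 | Zone | 5 | 246.10 | 2.50 | 1.24 |
| 10 | Zone | 6 | 1369.20 | 1.50 | 0.74 |
| 11 | Zone | 7 | 147.50 | 1.00 | 1.23 |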

Note the rather unusual and apparently inconsistent rounding in the book: 147, 1.66, and 5.16 would be better as 148 (the value is 147.5), 1.67, and 5.17.
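For readers who want something concrete to run, here is a minimal sketch of one way the fitting step could look. It assumes a log-link Poisson-type GLM on the pure premium with duration as the weight, whose estimating equations coincide with those of the method of marginal totals for a multiplicative model. That choice is our assumption and not code taken from the book. The English column names and the use of `quasipoisson` (to avoid warnings about non-integer responses) are likewise choices made for this example, and the fitted relativities are not guaranteed to match the Rel.MMT column digit for digit.

```r
# Table 1.2 typed in by hand; zone varies fastest, then age, then class.
moped <- expand.grid(zone = factor(1:7), age = factor(1:2), class = factor(1:2))
moped$duration <- c(
  62.9, 112.9, 133.1, 376.6, 9.4, 70.8, 4.4,
  352.1, 840.1, 1378.3, 5505.3, 114.1, 810.9, 62.3,
  191.6, 237.3, 162.4, 446.5, 13.2, 82.8, 14.5,
  844.8, 1296.0, 1214.9, 3740.7, 109.4, 404.7, 66.3
)
moped$pure.premium <- c(
  4936, 845, 1411, 242, 0, 212, 1829,
  1216, 609, 398, 171, 195, 103, 104,
  1740, 993, 298, 147, 0, 211, 0,
  526, 325, 128, 58, 144, 65, 118
)

# Use the level with the largest total duration as the base of each factor.
for (f in c("class", "age", "zone")) {
  totals <- tapply(moped$duration, moped[[f]], sum)
  moped[[f]] <- relevel(moped[[f]], ref = names(which.max(totals)))
}

# Assumed model: multiplicative (log link), duration-weighted pure premium.
fit <- glm(pure.premium ~ class + age + zone, data = moped,
           family = quasipoisson(link = "log"), weights = duration)

# Relativities against the base levels are the exponentiated coefficients
# (the intercept is dropped; each base level has relativity 1 by construction).
relativities <- exp(coef(fit)[-1])
print(round(relativities, 2))
```

The `tapply()` totals inside the loop are the same per-level durations that appear in the Duration column of the table, so they offer a quick check that the data were typed in correctly.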

## Exercise 1.3

Here it gets interesting, as we get different values from the authors. This is possibly a small bug on our part, but at least we provide the code for you to check. So if you spot a problem, let us know in the comments.

The resulting table is something like:

| | g0 | g12 | g22 |
|---|---|---|---|
| Our calculation | 0.0334 | 1.9951 | 0.7452 |
| Book value | 0.0331 | 2.0123 | 0.7429 |

Close, but not the same. Perhaps they used a different error function.