Allan Engelhardt was part of the expert panel of speakers at the Institute and Faculty of Actuaries event titled What is the new GLM? - GI pricing for the world of machine learning, cloud computing, and big data. The discussion was moderated by the wonderful Ed Plowman and the recording is available (only) to members of the IFoA.
We had over 100 attendees for this conversation about the future of general insurance pricing. Is it finally time to move away from GLMs, some 30 years after they became the standard for technical rating models? In that period we have witnessed an explosion of computing power and data, which opens up new approaches that can significantly outperform the classic models.
Our recommendations have not changed in the ten years since our case study on Advanced Analytics in Insurance, nor since our Insurance pricing using R training courses. We said then:
We stress that we do not advocate the wholesale abandonment of classical models for modern techniques. Rather, we propose to make use of both: continuity and understanding tempered with the results from the latest up-to-date methods.
You should use modern models in (almost) all cases, but that does not mean you shouldn't also use GLMs. There is much good process around building and communicating GLM technical pricing that we should not lightly abandon.
Allan gave an overview of the landscape of popular approaches to GI pricing:
Traditional. In the traditional approach we include all linear models: not just GLM but also the extensions GAM (which allows smooth, non-linear effects of continuous rating factors) and GLMM (which allows for longitudinal data where claims experience depends on past claims). These are well-established models with robust actuarial processes behind them.
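To make the starting point concrete, here is a minimal sketch of a classical claims-frequency GLM in R. The `policies` data frame and its columns (n_claims, exposure, driver_age, vehicle_group, region) are hypothetical placeholders, not data from the talk.

```r
# Minimal, illustrative claims-frequency GLM on hypothetical data.
freq_glm <- glm(
  n_claims ~ driver_age + vehicle_group + region,
  family = poisson(link = "log"),
  offset = log(exposure),          # exposure as a standard Poisson offset
  data   = policies
)
summary(freq_glm)
```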
Regularisation. As data sets become wider, defining and selecting good, robust rating factors becomes increasingly hard. In the bad old days we might have used stepwise AIC to select a minimal model; these days we much prefer modern regularisation, in particular as implemented for R in the {glmnet} package. The introduction article provides a useful overview. There is a Python wrapper available as pip install glmnet.
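A minimal sketch of how a regularised fit might look with {glmnet}, reusing the hypothetical `policies` data from the GLM sketch above; in a real exercise the model matrix would hold many more candidate rating factors.

```r
library(glmnet)

# Hypothetical inputs, reusing the `policies` data from the GLM sketch.
x <- model.matrix(~ driver_age + vehicle_group + region, data = policies)[, -1]

# Cross-validated elastic-net Poisson regression; the penalty shrinks weak
# rating factors towards zero and drops some of them entirely.
cv_fit <- cv.glmnet(
  x, policies$n_claims,
  family = "poisson",
  offset = log(policies$exposure),
  alpha  = 0.5                     # mix of ridge (0) and lasso (1) penalties
)

# Coefficients at the "one standard error" lambda; zeros are dropped factors.
coef(cv_fit, s = "lambda.1se")
```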
Machine Learning. For large data sets, that is, where your model matrix is both wide and long, modern machine learning approaches truly shine. Not only do they provide high-performance models, but many of them also have variable selection built in, which makes them robust when selecting rating factors.
There are many machine learning algorithms, but xgboost stands out as perhaps the most popular. In the R language you have the {xgboost} package, and there are bindings for Python, Julia, C++, and many other languages. In R, it has largely supplanted the older (but easier to use) {gbm} package.
For an overview of selected machine learning approaches and algorithms, the Machine Learning & Statistical Learning task view on CRAN provides a useful starting point.
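A minimal {xgboost} sketch on the same hypothetical data, using the Poisson objective and handling exposure through the base margin; the tuning parameters are arbitrary illustrations, not recommendations.

```r
library(xgboost)

# Same hypothetical `policies` data; xgboost wants a numeric matrix.
dtrain <- xgb.DMatrix(
  data  = x,                       # model matrix from the {glmnet} sketch
  label = policies$n_claims
)
# Exposure enters as an offset on the log scale.
setinfo(dtrain, "base_margin", log(policies$exposure))

fit <- xgb.train(
  params = list(
    objective = "count:poisson",
    eta       = 0.05,
    max_depth = 4
  ),
  data    = dtrain,
  nrounds = 500
)

# Gain-based variable importance: a quick view of which rating factors matter.
xgb.importance(model = fit)
```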
Deep Learning. All of the above approaches are for rectangular data. For non-standard data we will want to look at what is labelled ‘Deep Learning’ in the illustration. Specialised algorithms allow us to analyse text, images, sound, video, and much more. (Some of these can also be used effectively on rectangular data.)
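Text and image pipelines are beyond a short sketch, but assuming the {keras} package with a working TensorFlow backend, the basic workflow looks like this; it is shown, purely for illustration, on the same hypothetical rectangular data rather than on non-standard data.

```r
library(keras)

# Minimal, illustrative feed-forward network for claim frequency.
# A real text or image model would use embedding or convolutional layers.
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = ncol(x)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "exponential")   # positive frequencies

model %>% compile(optimizer = "adam", loss = "poisson")

model %>% fit(
  x, policies$n_claims / policies$exposure,
  sample_weight    = policies$exposure,    # exposure-weighted Poisson loss
  epochs           = 20,
  batch_size       = 256,
  validation_split = 0.2
)
```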
So how do we use them?
Improve robustness: Many modern techniques have strong support for managing the number of features (rating factors) used through regularisation (as in {glmnet}) or variable importance (as in {xgboost}). This is essential when you have wide data sets or potentially complex interactions.
Validate the standard model: Everyone should be doing this! Automatically run an {xgboost} model on the same data set you use to build your GLM and alert actuaries when the predictions are materially different (see the sketch after this list). Assuming you have a reasonable model environment and infrastructure, this will probably take you a couple of days to set up, after which it is automatic. Why would you not do this??
Feature engineering for the standard model: Use modern approaches to robustly identify complex features, including interactions, on the same data you use for the GLM, and to create new features from data that is not accessible to the GLM approach. We had a client who used image recognition to provide additional data from images of the buildings they insured.
Replace the GLM model: This is not for everyone, but we do of course see some technical pricing teams abandon the GLM approach in favour of modern techniques.
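As a rough illustration of the validation point above, here is a sketch that reuses the hypothetical GLM and xgboost fits from earlier; the 10% difference threshold and the 5% alert level are arbitrary placeholders you would set to suit your own book.

```r
# Compare GLM and xgboost frequency predictions on the same (hypothetical)
# data and flag policies where the two models materially disagree.
glm_pred <- predict(freq_glm, type = "response") / policies$exposure
xgb_pred <- predict(fit, newdata = dtrain) / policies$exposure

ratio   <- xgb_pred / glm_pred
flagged <- abs(log(ratio)) > log(1.10)   # more than ~10% apart

if (mean(flagged) > 0.05) {
  warning("GLM and xgboost disagree materially on ",
          round(100 * mean(flagged), 1), "% of policies; review needed")
}
```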
There are no valid excuses for not at the very least validating your models using modern approaches. Why wouldn’t you?
Contact us if you want to discuss further.