Background

Premium Risk relates to the claims an insurance company pays for covering the risks from which it earns premium revenue. I implemented a standard frequency-severity model in R, based on simulation from parametric distributions. In each simulation, a non-parametric bootstrap is applied to the claims data before the parametric distributions are fitted.
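As a minimal sketch of what a single frequency-severity simulation looks like (not the model's actual source), the following assumes Poisson claim counts and lognormal claim sizes. The function name `simulate_aggregate_loss` and all parameter values are made up for illustration.

```r
# One simulation of a frequency-severity model.
# Assumptions for illustration only: Poisson frequency, lognormal severity,
# and invented parameter values.
simulate_aggregate_loss <- function(lambda, meanlog, sdlog) {
  n_claims <- rpois(1, lambda)                     # frequency: number of claims
  severities <- rlnorm(n_claims, meanlog, sdlog)   # severity: size of each claim
  sum(severities)                                  # aggregate loss for this simulation
}

set.seed(1)
losses <- replicate(1000, simulate_aggregate_loss(lambda = 100, meanlog = 8, sdlog = 1))

# Median and 99.5th percentile of the simulated aggregate loss
quantile(losses, c(0.5, 0.995))
```

Repeating the simulation many times gives an empirical distribution of the aggregate loss, from which risk measures such as high quantiles can be read off.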

The non-parametric bootstrap is computationally heavy, but it is simple and widely accepted as a way to reflect the uncertainty in the parameters of a parametric model. It is trivial to implement in R.
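To illustrate how little code the bootstrap needs, here is a hedged sketch: resample the claims with replacement, refit, and look at how the fitted parameters vary between replicates. The claims data are synthetic, and the lognormal severity and the helper names `fit_lognormal` and `bootstrap_fit` are assumptions for this example.

```r
# Hypothetical claims data; in practice this would be the observed claim sizes.
set.seed(42)
claims <- rlnorm(500, meanlog = 8, sdlog = 1)

# Maximum-likelihood fit of a lognormal (closed form on the log claims).
fit_lognormal <- function(x) {
  lx <- log(x)
  c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}

# One bootstrap replicate: resample the claims with replacement, then refit.
bootstrap_fit <- function(x) {
  resampled <- sample(x, size = length(x), replace = TRUE)
  fit_lognormal(resampled)
}

# 200 replicates, one row per replicate, one column per parameter.
boot_params <- t(replicate(200, bootstrap_fit(claims)))

# The spread of the refitted parameters reflects parameter uncertainty.
apply(boot_params, 2, sd)
```

The resampling itself is the single call to `sample(..., replace = TRUE)`; the cost comes from refitting the distribution in every simulation.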

Architecture

The code runs either on a cluster provisioned in Azure or on your local machine, because the model is embarrassingly parallel across simulations. This means that model run times can easily be managed and kept low. The Azure cluster can be provisioned at scale from within the R script with just 5 lines of code, plus one additional line to stop the cluster at the end.
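The local case can be sketched with R's built-in `parallel` package. This is an illustration only: it does not show the Azure provisioning, and the function `run_simulation`, the lognormal severity, the synthetic claims data, and the worker count are assumptions.

```r
library(parallel)

# One full simulation: bootstrap the claims, refit, then simulate a loss.
# Each call is independent of the others, which is what makes the model
# embarrassingly parallel.
run_simulation <- function(i, claims, lambda) {
  resampled <- sample(claims, length(claims), replace = TRUE)
  lx <- log(resampled)
  n_claims <- rpois(1, lambda)
  # Lognormal parameters estimated from the resampled claims (sample sd)
  sum(rlnorm(n_claims, meanlog = mean(lx), sdlog = sd(lx)))
}

set.seed(1)
claims <- rlnorm(500, meanlog = 8, sdlog = 1)   # hypothetical data

cl <- makeCluster(2)                    # two local worker processes
clusterSetRNGStream(cl, iseed = 123)    # independent, reproducible random streams
losses <- parSapply(cl, 1:200, run_simulation, claims = claims, lambda = 100)
stopCluster(cl)                         # always release the workers at the end
```

Because the simulations share no state, the same pattern scales by changing only how the cluster object is created.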

Implementation

Below is an automatically generated plot of the model source code on a per-simulation basis:

In terms of variable prefixes:

Variables in use:

The entire model source code is 45 lines long. That count follows the normal convention of placing curly braces on new lines, but excludes comments and blank lines added for readability.

Performance

For a business expecting to generate 10,000 claims per simulation, it took 58 seconds to run 4,000 simulations of this model on 4 cores of my machine, which is powered by a modern Intel mobile processor.

Building more functionality

Due to the extensive capabilities of R, it is easy to rapidly build models or extend them with new features. After building this model, I decided to extend it with an optional pre-fit feature. With this feature, the starting values in the distribution fitting function are set to pre-fit parameter estimates, which are calculated once for all simulations by fitting to the claims data before the data is bootstrapped. The feature was implemented on top of the existing functionality as an optional parameter and took 14 extra lines. The code looked like this at the end:

When I assessed the code, I found that there was no performance gain. I was pleasantly surprised to see that the distribution fitting function in R is already well optimised by default.
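The pre-fit idea can be sketched as follows. This is not the model's code: it uses `optim` from base R as a stand-in for the distribution fitting function, assumes a gamma severity, and uses the hypothetical helpers `gamma_nll` and `fit_gamma` on synthetic claims data.

```r
# Negative log-likelihood of a gamma distribution, parameterised on the log
# scale so the optimiser cannot step to negative shape or rate values.
gamma_nll <- function(log_par, x) {
  -sum(dgamma(x, shape = exp(log_par[1]), rate = exp(log_par[2]), log = TRUE))
}

# Fit by numerical optimisation. `start` is optional: when it is NULL, the
# fit starts from method-of-moments estimates.
fit_gamma <- function(x, start = NULL) {
  if (is.null(start)) {
    start <- c(shape = mean(x)^2 / var(x), rate = mean(x) / var(x))
  }
  opt <- optim(log(start), gamma_nll, x = x)
  list(par = exp(opt$par), evaluations = unname(opt$counts["function"]))
}

set.seed(7)
claims <- rgamma(1000, shape = 2, rate = 0.001)   # hypothetical claims data

# Pre-fit: computed once on the original data, before any bootstrapping.
prefit <- fit_gamma(claims)$par

# Inside a simulation: bootstrap, then fit with or without the pre-fit start.
resampled <- sample(claims, length(claims), replace = TRUE)
default_fit <- fit_gamma(resampled)
prefit_fit  <- fit_gamma(resampled, start = prefit)
```

Both calls should arrive at essentially the same estimates. Comparing `default_fit$evaluations` with `prefit_fit$evaluations` shows how much work the better starting point saves, which may well be little when the default start is already good.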

Conclusion

For Risk Modelling, R is an excellent choice. It is flexible enough to model complex systems rapidly. New features can be built in a handful of lines and tested easily on a per-simulation basis in an interactive R session. The performance is pretty good, and since simulation models are embarrassingly parallel, run times can be managed by provisioning a bigger cluster on Azure from inside the R source code.