R & Azure Cloud Risk Modelling

Background

Premium Risk relates to the claims an insurance company pays for covering the risk associated with earning premium revenue. I implemented a standard frequency-severity model in R, based on simulation from parametric distributions. In each simulation, a non-parametric bootstrap is applied to the claims data before the parametric distributions are fitted.

The non-parametric bootstrap is computationally heavy, but it is both simple and widely accepted as a way to reflect the uncertainty in the parameters of a parametric model. It is trivial to implement in R.
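As a rough illustration of the idea (not the model's actual code), the sketch below resamples a claims vector with replacement and refits a distribution to each resample. The claims data, the lognormal choice and the helper names fit_lognormal and bootstrap_fit are all assumptions made for this example.

```r
# Hypothetical claims data: 200 lognormal severities (illustration only).
set.seed(1)
claims <- rlnorm(200, meanlog = 8, sdlog = 1.2)

# Closed-form maximum-likelihood fit of a lognormal distribution.
fit_lognormal <- function(x) {
  log_x <- log(x)
  c(meanlog = mean(log_x),
    sdlog   = sqrt(mean((log_x - mean(log_x))^2)))
}

# One bootstrap replicate: resample the claims with replacement, then refit.
bootstrap_fit <- function(x) {
  fit_lognormal(sample(x, size = length(x), replace = TRUE))
}

# Each column holds the parameters fitted to one resample.
boot_params <- replicate(1000, bootstrap_fit(claims))

# The spread across replicates is a measure of parameter uncertainty.
apply(boot_params, 1, sd)
```

Feeding a different resample-and-refit into each simulation is what lets the simulated claims carry parameter uncertainty as well as process variability.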

Architecture

The code runs either on a cluster provisioned in Azure or on your local machine, because the model is embarrassingly parallel: each simulation is independent, so it is trivial to parallelise over simulations. Model run times can therefore be managed easily and kept low. The Azure cluster can be provisioned at scale from within the R script with just 5 lines of code, plus one more line to stop the cluster at the end.
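For the local case, parallelising over simulations might look like the sketch below, which uses the parallel package that ships with R. The function run_one_simulation is a hypothetical stand-in for one run of the model, and the Poisson and lognormal parameters are made up. The Azure provisioning calls are not shown, since they depend on the package used.

```r
library(parallel)

# Hypothetical stand-in for one model run: total claims for one simulated year.
run_one_simulation <- function(i) {
  n_claims <- rpois(1, lambda = 100)
  sum(rlnorm(n_claims, meanlog = 8, sdlog = 1.2))
}

n_simulations <- 200
cl <- makeCluster(2)                  # two local worker processes
clusterSetRNGStream(cl, iseed = 42)   # independent, reproducible random streams
results <- parLapply(cl, seq_len(n_simulations), run_one_simulation)
stopCluster(cl)                       # always release the workers at the end

total_claims <- unlist(results)
quantile(total_claims, c(0.5, 0.995))
```

Because no simulation depends on another, the only coordination needed is splitting the simulation indices across workers and collecting the results.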

Implementation

Below is an automatically generated plot of the model source code on a per-simulation basis:

In terms of variable prefixes:

“data” is used for data.
“input” is used for inputs.

Variables in use:

data_exposure is the value of the exposure for the year to be modelled.
data_frequency_per_expo is the number of claims per unit of exposure in past years.
data_severity is ultimate claims data.
simulation_input holds the fitted parameters, which are required to generate the simulated claims.
simulation_values holds the simulated claim values.
simulation_n_claims is the number of simulated claims.
temp_eval_string is just a temporary variable that keeps things clean.
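
To show how these variables could fit together, here is a minimal sketch of a single simulation. It is not the model's source code: the input values, the Poisson frequency and lognormal severity choices, the decision to bootstrap the frequency data as well as the severity data, and the function name simulate_once are all assumptions for illustration.

```r
set.seed(2)

# Hypothetical inputs (illustration only).
data_exposure <- 1000                                        # exposure for the modelled year
data_frequency_per_expo <- c(0.09, 0.11, 0.10, 0.12, 0.08)   # claims per unit of exposure, past years
data_severity <- rlnorm(300, meanlog = 8, sdlog = 1.2)       # ultimate claim amounts

# One simulation: bootstrap the data, refit, then simulate from the fit.
simulate_once <- function(data_exposure, data_frequency_per_expo, data_severity) {
  boot_frequency <- sample(data_frequency_per_expo, replace = TRUE)
  boot_severity  <- sample(data_severity, replace = TRUE)

  # Fitted parameters (assumed Poisson frequency, lognormal severity).
  simulation_input <- list(
    lambda  = mean(boot_frequency) * data_exposure,
    meanlog = mean(log(boot_severity)),
    sdlog   = sd(log(boot_severity))
  )

  simulation_n_claims <- rpois(1, simulation_input$lambda)
  simulation_values <- rlnorm(simulation_n_claims,
                              simulation_input$meanlog,
                              simulation_input$sdlog)
  list(simulation_n_claims = simulation_n_claims,
       simulation_values = simulation_values)
}

sim <- simulate_once(data_exposure, data_frequency_per_expo, data_severity)
sum(sim$simulation_values)   # total simulated claims for this one simulation
```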

The entire model source code is 45 lines long. That count follows the normal convention for placing curly braces on new lines, but excludes comments and blank lines added for aesthetics.

Performance

For a business expecting to generate 10,000 claims per simulation, it took 58 seconds to run 4,000 simulations of this model on 4 cores of my machine, which is powered by a modern Intel mobile processor. That works out at roughly 69 simulations per second.

Building more functionality

Thanks to the extensive capabilities of R, it is easy to build models rapidly or extend them with new features. After building this model I decided to extend it with an optional pre-fit feature: the starting values passed to the distribution fitting function are set to pre-fit parameter estimates, which are calculated once for all simulations by fitting to the claims data before it is bootstrapped. The feature was implemented on top of the existing functionality as an optional parameter and took 14 extra lines. The code looked like this at the end:

When assessing the code I found that there was no performance gain. I was pleasantly surprised to see that the distribution fitting function is already well optimised here by default in R.
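
The pre-fit idea can be sketched as follows. This is an illustration under stated assumptions, not the 14 lines referred to above: the fitting function actually used is not named here, so base R's optim stands in for it, the gamma severity distribution and the claims data are made up, and fit_gamma and boot_fit are hypothetical helpers.

```r
set.seed(3)
claims <- rgamma(500, shape = 2, rate = 0.001)   # hypothetical claims data

# Negative log-likelihood of a gamma distribution, parameterised on the log
# scale so the optimiser cannot step to invalid (non-positive) values.
gamma_nll <- function(log_par, x) {
  -sum(dgamma(x, shape = exp(log_par[1]), rate = exp(log_par[2]), log = TRUE))
}

# Maximum-likelihood fit with optional starting values.
fit_gamma <- function(x, start = NULL) {
  if (is.null(start)) {
    # Default start: method-of-moments estimates.
    start <- c(shape = mean(x)^2 / var(x), rate = mean(x) / var(x))
  }
  fit <- optim(log(start), gamma_nll, x = x)
  exp(fit$par)
}

# Pre-fit once on the original (un-bootstrapped) data...
prefit <- fit_gamma(claims)

# ...then reuse those estimates as starting values in every bootstrap fit.
boot_fit <- function(x, start) fit_gamma(sample(x, replace = TRUE), start = start)
boot_params <- replicate(50, boot_fit(claims, start = prefit))
```

When the default starting values are already close to the optimum, as moment-based starts usually are, a pre-fit saves few optimiser iterations, which is consistent with seeing no measurable gain.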

Conclusion

For risk modelling, R is an excellent choice. It is flexible enough to model complex systems rapidly. New features can be built in a handful of lines and tested easily on a per-simulation basis in an interactive R session. Performance is good (4,000 simulations in under a minute on 4 laptop cores), and since simulation models are embarrassingly parallel, run times can be managed by provisioning a bigger cluster on Azure from inside the R source code.