What data is fed into the model (data layout, format, type, constraints)?
For a typical insurance pricing analysis, you need to find the finest level at which premium and exposure characteristics can be effectively matched with losses. Sometimes this is aggregated at the account level, but sometimes it is finer than this (line of business, location, business classification, etc., within an account). The data we would be looking for would be a table at this level of detail with the following fields:
- Manual Premium
- Actual Premium
- Incurred Loss
- Policy Commencement Date
- Line of Business
- Rating Characteristic 1
- Rating Characteristic 2
- Rating Characteristic 3
where the Rating Characteristics are anything that is available and that is being used for rating the policies. This should include geographical data such as ZIP code or county, business classification data such as SIC code, and so on, as well as any additional information that may be available to differentiate between accounts. A good example is agency number: it may have no bearing on the current premium rating, but it is still worth considering because it may be predictive.
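The layout above can be sketched as a record with a handful of required matching fields plus whatever rating characteristics are available. This is only an illustration: the field names, the sample values, and the `has_required_fields` helper are assumptions for the sketch, not part of the actual product.

```python
# Illustrative layout for one row of the input table. Field names and
# values here are assumptions; the real columns depend on source systems.
REQUIRED_FIELDS = [
    "manual_premium",
    "actual_premium",
    "incurred_loss",
    "policy_commencement_date",
    "line_of_business",
]

sample_record = {
    "manual_premium": 12500.00,
    "actual_premium": 11800.00,
    "incurred_loss": 4300.00,
    "policy_commencement_date": "2023-07-01",
    "line_of_business": "General Liability",
    "zip_code": "30301",        # geographical rating characteristic
    "sic_code": "1731",         # business classification characteristic
    "agency_number": "A-0042",  # extra field that may prove predictive
}

def has_required_fields(record):
    """Check that a record carries every field needed for matching."""
    return all(field in record for field in REQUIRED_FIELDS)
```

The rating-characteristic columns are open-ended: anything used in rating (or plausibly predictive, like agency number) can simply be appended as additional fields.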
What periods of data will be required?
Multiple years of data are required, ideally 10, but only loss amounts at their current value are needed, not the detailed loss development history, which simplifies the data pull somewhat.
What is the installation process?
The installation is very basic. We will provide an installer. After the installer is executed by someone with administrative rights and the software is installed on the target machine, we will provide a registration code. We recommend the following minimum requirements (but others may work as well):
What are the hardware/software requirements?
- Windows 10 or Windows 7 operating system
- 4 GB RAM
- 2 GHz processor
- Microsoft SQL Server (can be installed locally on a PC or on a central server)
Do you have a users’ guide for the software?
The user guide is provided as the set of help screens within the software.
Does the output of the model include the production of SQL Server tables? Reports? Spreadsheets?
It can produce printed reports (printing to Adobe PDF is often useful here), SQL Server tables, and CSV files that can feed downstream processes.
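As a small sketch of the downstream-processing idea, the CSV output can be consumed directly with standard tooling. The file contents and column names below are invented for illustration; the software's actual export layout may differ.

```python
import csv
import io

# Stand-in for a CSV file the model might export (layout is assumed).
csv_output = io.StringIO(
    "account_id,modeled_premium\n"
    "A1,12500.00\n"
    "A2,9800.50\n"
)

# A downstream process can read the export with csv.DictReader and,
# for example, total the modeled premium across accounts.
reader = csv.DictReader(csv_output)
total_modeled_premium = sum(float(row["modeled_premium"]) for row in reader)
```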
How “clean” must the data be for the model to run correctly and produce expected results?
Only data where premium, exposure, and losses can be effectively matched should be included (no nulls in those fields), but rating characteristics can contain null values, and the software is still effective. Often the fact that a characteristic value is null is found to be predictive.
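The cleaning rule above can be sketched in a few lines: rows missing premium or loss values are excluded, while null rating characteristics are kept, and their nullness can even be turned into an indicator field. The field names and the `usable` helper are illustrative assumptions.

```python
# Fields that must be populated for premium/loss matching (assumed names).
KEY_FIELDS = ["manual_premium", "actual_premium", "incurred_loss"]

def usable(record):
    """A record is usable only if every key matching field is populated."""
    return all(record.get(f) is not None for f in KEY_FIELDS)

records = [
    # Kept: all key fields populated, even though the SIC code is null.
    {"manual_premium": 1000, "actual_premium": 950,
     "incurred_loss": 200, "sic_code": None},
    # Dropped: manual premium is null, so the row cannot be matched.
    {"manual_premium": None, "actual_premium": 800,
     "incurred_loss": 100, "sic_code": "1731"},
]

clean = [r for r in records if usable(r)]

# A null characteristic is retained, and its absence can itself be used
# as a predictive flag.
for r in clean:
    r["sic_code_missing"] = r["sic_code"] is None
```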
How long does it typically take to run the model once it is fed with data?
This depends on the size of the data, including number of records and characteristics, but it tends to range from a matter of minutes to a number of hours to run a model. Part of the art is making adjustments to model settings, trying different characteristics, and then comparing model results. This typically takes a few days or longer. For the immediate project, the goal is to decide on these model choices in this initial analysis, and then lock them in for future runs of the model so as to minimize the time and effort for updates.