Channeling the Flood of Predictive Data

By John Baldan and Jeff De Turris

Let's face it: We're awash in data. The amount of digital data on the Internet — thanks to the increased use of social networks, smartphones, and proprietary databases — is currently estimated at one trillion gigabytes. Yet insurance companies are scrambling for more data. Any data element, provided it's not too onerous to collect or proscribed by law, is fair game for use in predictive models. The attitude seems to be: Capture it now, then see if it's predictive.

Many types of data might be fed into the modeling hopper. Data can be internally available or purchased externally. It can be raw data or normalized by statisticians to be more suitable for inclusion in a model. Better yet, it can be value-added third-party data demonstrated to be predictive in many other settings, in many other data sets. Whatever kind of data it is, though, it invites analysis.

For example, auto insurers are now scrutinizing many nontraditional variables for model use. Take a car, any car. What are the characteristics of the places where it is driven? Are they rural, suburban, or urban? Are they in or near traffic "hot spots"? On what kinds of roads is the car driven, at what speed, at what time of day, in what kinds of weather? What are the vehicle's safety rating and rear bumper height? Who is really driving the car? (Commercial auto insurers have the lead in answering the last question because they've been able to institute programs to monitor the drivers of insured fleets. But personal auto insurers may also be able to answer it through future advances in telematics.)

Models can be voracious, accommodating more data than small and medium-sized insurers typically have available. Some insurers are working with their IT units to append additional variables to existing ratemaking data sets, while others are seeking data from third parties. The goal is finding models that differentiate between higher-cost and lower-cost risks. Put another way, given any subset of an insurer's book currently rated similarly, can the model consistently and verifiably identify which risks will be higher-cost and which will be lower-cost? Any insurer that deploys such a model will be armed with insights that its competitors lack and will be able to differentiate itself in the marketplace.
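
One common way to verify that kind of segmentation is a lift analysis on holdout data: rank risks by the model's predicted loss cost, bin them into deciles, and compare actual losses across the bins. The sketch below illustrates the idea in Python; the file and column names are hypothetical, and this is a generic validation technique rather than any particular insurer's procedure.

```python
import pandas as pd

# Hypothetical holdout data set: one row per policy, with the model's
# predicted loss cost and the actual incurred loss. The file and column
# names here are illustrative only.
holdout = pd.read_csv("holdout_policies.csv")

# Rank policies by predicted loss cost and split them into ten equal bins.
ranks = holdout["predicted_loss"].rank(method="first")
holdout["decile"] = pd.qcut(ranks, 10, labels=range(1, 11))

# Compare each decile's average actual loss with the book average.
book_average = holdout["actual_loss"].mean()
lift = holdout.groupby("decile", observed=True)["actual_loss"].mean() / book_average

# A pattern that rises steadily from decile 1 to decile 10 suggests the
# model genuinely separates lower-cost risks from higher-cost ones.
print(lift)
```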

In 2010, the Verisk Analytics management team surveyed and interviewed personal lines insurers of all stripes — small and large, coast to coast — about refinements they would like in our manuals, as well as their general attitude toward predictive modeling. We learned that nearly everyone is interested in modeling and that many — even those without much data — would like to do it themselves.

Insurers can be viewed as belonging to three groups. The first group would be satisfied with an industrywide predictive model, provided it offered significant lift over current rating practices and incorporated some of the latest variables that industry leaders have adopted.

A second group of insurers would be satisfied with an industrywide predictive model but only if tailored to their own book of insureds. While the industrywide model might identify many key predictors of loss, the relative lift generated by individual predictors can vary by insurer, either because the insurer writes an unusual niche or because the insurer has already deviated from industry practice through rating modifications.

Finally, a third group wants something completely "outside the box." A model that differs in key ways from competitors' models lets an insurer distinguish itself in the marketplace, with all the concomitant benefits of marketing differentiation and a profitable niche. This type of rating philosophy requires both unique in-house data and third-party value-added predictor variables.

Each of these approaches to rating refinement is valid — and they all require data. In short, competition stimulates evolution, and in the particular case of insurance, this means a continued trend toward more complex underwriting and pricing systems. Like the scope of the Internet, insurers' databases will undoubtedly continue to grow in size and complexity.

John Baldan is director of modeling at ISO. Jeffrey De Turris is assistant vice president of personal lines at ISO.

Putting Data to Work

We learned from a recent personal lines customer survey that insurers have an appetite for modeling. However, insurer needs can range from off-the-shelf industrywide predictive models to value-added predictive variables for building their own models. Verisk Insurance Solutions has developed a long-term action plan that will equip insurers with a variety of tools for today's predictive modeling environment.

Developing Models
ISO Risk Analyzer® is a suite of predictive models that helps insurers classify, segment, and price insurance risks. It delivers a range of rating information, from modeled loss costs to individual components (predictive variables) that insurers can use to develop their own models.

The ISO Risk Analyzer environmental modules produce expected losses at a more refined level than traditional territories, using detailed information regarding cost differences associated with a risk's location. The personal auto environmental module provides this information by policy coverage, and the homeowners environmental module provides the information for all perils combined as well as by peril (fire, lightning, nonhurricane wind, theft, weather-related water, nonweather-related water, hail, liability, and other).
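
As a simple illustration of how by-peril output might be consumed, the following sketch combines per-peril expected losses for a single location into an all-perils loss cost and a relativity against a statewide average. The numbers and the benchmark are invented for illustration, not actual module output.

```python
# Invented by-peril expected loss costs for one hypothetical location,
# keyed by the perils listed above. These are not actual module values.
by_peril = {
    "fire": 112.00, "lightning": 8.50, "nonhurricane_wind": 64.00,
    "theft": 21.00, "weather_water": 38.00, "nonweather_water": 55.00,
    "hail": 17.50, "liability": 12.00, "other": 9.00,
}

# The all-perils combined loss cost is the sum of the per-peril pieces.
all_perils = sum(by_peril.values())

# Express the location as a relativity to an assumed statewide average.
STATEWIDE_AVERAGE = 290.00  # assumption for illustration only
relativity = all_perils / STATEWIDE_AVERAGE

print(f"all-perils loss cost: {all_perils:.2f}  relativity: {relativity:.2f}")
```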

Through our personal auto and homeowners environmental modeling efforts, we analyzed the interactive effects of hundreds of variables to produce components constructed from variables such as weather, census information, elevation, business location, loss experience, and trend. Those components can be powerful input to insurers' own models.
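
For instance, an insurer might include such a component score alongside its own rating variables when fitting a frequency model. The sketch below fits a Poisson GLM with Python's statsmodels package; the data set, column names, and variable choices are hypothetical, and an insurer's actual components and model forms would differ.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical policy-level data: claim counts, earned exposure, the
# insurer's own rating variables, and a purchased environmental
# component score. Column names are illustrative only.
policies = pd.read_csv("policy_experience.csv")

# Poisson frequency GLM with a log link; earned car-years enter as
# exposure. The environmental component sits beside in-house variables.
model = smf.glm(
    "claim_count ~ env_component + driver_age_group + prior_claims",
    data=policies,
    family=sm.families.Poisson(),
    exposure=policies["earned_car_years"],
).fit()

# A significant, stable coefficient on env_component indicates the
# third-party score adds predictive power beyond existing variables.
print(model.summary())
```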

We have also developed a personal auto physical damage vehicle module that provides expected loss information for comprehensive and collision coverages. Components for collision and comprehensive (separately by peril) are also available, organized around characteristics such as body style, vehicle specifications, performance, and safety features. Delivery of the vehicle module through VINMASTER® gives insurers an additional symbols product.

Also in the works is a homeowners rating factors module, integrating the predictive analytic features of the environmental module with by-peril rating relativities for policy deductible, amount of insurance, and age of construction. On the horizon is a homeowners building characteristics module, which will predict expected losses by peril based on an analysis of building characteristics such as square footage.

Enhancing Rating Manual Information
Using output from our predictive modeling efforts, we will introduce optional rating rules to provide greater refinement within our manuals. Those rules will be filed with regulators — but not on behalf of participating insurers — to provide customers maximum flexibility regarding their use.

We will soon file an alternate personal auto primary class plan with factors that vary by coverage and provide greater age refinement. The information for the class plan comes from an analysis we performed of driver characteristics. Our alternate accident and conviction surcharge plan will also be refined.

Alternate homeowners manual rating rules, to be implemented next year, will allow insurers to rate homeowners risks by peril.

And this is just the beginning. We will continue to develop models and offer alternate manual rating rules that use data and analytics to provide new solutions and options that fit customers' needs.