Big Data for Insurers: Avoiding Black Swans and Trusting Turkeys
By Jim Weiss and Su Wash
How best to make use of big data remains a mystery for many property/casualty insurers. Four out of five insurers reportedly use big data, with the most extensive uses by far in predictive exercises such as pricing, underwriting, and risk selection. Even so, research firm Gartner reports that 60 percent of big data projects are abandoned, suggesting that organizations may not always be looking in the right places to derive value. In some cases, abandoning a project may be a relatively positive outcome compared with the risks of erroneous prediction and suboptimal decision making that often result from ill-conceived big data ventures.
For years, scientists and entrepreneurs have pondered the possibility of an “intelligence explosion,” a scenario in which algorithms run amok and threaten society’s well-being. Naturally, these discussions tend to focus less on predicting a date for this takeover by artificial superintelligence than on minimizing the risk of its occurrence. Insurers deciding their big data strategies would be wise to take heed. When considering the evolution over time of data volume, velocity, and variety (the “three Vs” used to describe big data) in insurance, we see that an outmoded devotion to prediction may be preventing many insurers from realizing big data’s additional potential in the realm of exposure modification. An ideal strategy may be a “barbell” that flexes its muscle in both regards.
Prediction with purpose
Insurers haven’t always been as focused as they are today on predicting the elusively perfect price point for each policyholder. Figure 1 displays various data sources introduced over the years, or that may yet be introduced, along with estimates of their potential value from the perspectives of prediction and exposure modification. Insurers have long turned to rate reviews that use aggregated premium and loss data to project redundancies and shortfalls in revenue and to modify pricing accordingly. While the nature of these analyses is predictive, the motivation is largely exposure modification: higher rates tend to cause policyholder attrition, whereas lower ones tend to drain surplus.
As data increases in volume, velocity, and variety, returns from a prediction perspective are forecast to diminish, yet significant value can remain in exposure modification. For example, many insurers have helped prevent hundreds of thousands of dollars in losses by providing IoT-based notifications. Insurers that purpose big data solely for pricing, underwriting, and risk selection may overlook where the greatest value resides.
Identifying a proper rate level reduces risk on multiple levels: it lowers an insurer’s insolvency risk, for example, and with it the policyholder’s risk of going unindemnified after an unfortunate event. These older insurance analyses are simple, rely on “small data” (at least by modern standards), and are oriented toward risk reduction.
Increased availability of detailed policyholder-level information has moved the needle closer to predictive perfection. For example, an insurer’s rate review may project a revenue shortfall. Based on this, an insurer could increase rates across the board, but the immediate cause of the shortfall is often attributable to policyholders who file claims, not those who don’t. Moreover, individuals with prior claims are demonstrably more likely to experience future ones. So, an insurer may elect simply to surcharge policyholders with prior claims rather than increase rates uniformly to address a projected shortfall. It’s debatable how strongly the knowledge that a claim may increase one insured’s rate incentivizes loss control (as if the idea of a claim weren’t traumatic enough), but there’s fairness in charging a bit more to those more likely to generate losses.
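The arithmetic behind this choice can be sketched with invented numbers. The book size, premium, shortfall, and prior-claim share below are all hypothetical; the point is only that concentrating recovery on a small segment requires a much steeper surcharge than a uniform increase.

```python
# Hypothetical book of business (all figures invented for illustration).
total_policies = 100_000
avg_premium = 1_000.0          # average annual premium per policy
shortfall = 5_000_000.0        # projected revenue shortfall from the rate review

# Option 1: spread the shortfall uniformly across the whole book.
uniform_increase = shortfall / (total_policies * avg_premium)

# Option 2: recover the same shortfall only from the claim-prone segment,
# assuming 20 percent of policyholders have prior claims.
prior_claim_share = 0.20
targeted_surcharge = shortfall / (total_policies * prior_claim_share * avg_premium)

print(f"Uniform increase: {uniform_increase:.1%}")              # 5.0%
print(f"Surcharge on prior-claim segment: {targeted_surcharge:.1%}")  # 25.0%
```

Either option recovers the same revenue; they differ only in who bears the cost, which is precisely the fairness trade-off discussed above.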
External data meets introspection
Insurers that better align price with risk are likely to find a competitive advantage. Conversely, carriers that don’t surcharge policyholders who have had prior claims probably become inundated with applications from the claim-prone population and experience profitability deterioration. These lagging insurers are often forced to refine their predictions in kind, after which many leading insurers seek new and larger sources of external information to retain their advantage, with potential to set in motion a spiral.
Some data, such as Verisk’s Public Protection Classifications and Vehicle Symbols, confer incentives to improve fire safety practices or purchase safer vehicles. Other data, such as credit history, has clear and significant predictive potential but offers little or no obvious loss control incentive. All these additional data sources align price with risk ostensibly well, but they inject new costs and complexity into the analysis while yielding comparatively little insight into exposure modification.
The analyses described above rely heavily on “prior information,” that is, data used to predict “given condition X is present, outcome Y is likely to occur.” Such thinking generally does not work well for catastrophes (cats) that occur every thousand years or so, where looking at the “immediate past” (such as the past 100 years) often provides minimal insight. Insurers famously learned this after Hurricane Andrew in 1992, when many carriers became insolvent. This led to an explosion of catastrophe models that compensate for a lack of historical data by simulating millions of possible events using meteorological, geological, and structural data. Cat models do not predict catastrophes, but they allow insurers to more judiciously manage concentration and surplus in recognition of the possible. The understanding conferred by cat models may also influence building codes and permit grants, reducing risk across the board. Although the data used is large and complex, cat models swing the pendulum back toward exposure modification.
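The simulation idea can be illustrated in miniature. Real cat models are built on detailed meteorological, geological, and structural data; the toy version below merely draws event counts and severities from assumed distributions (all parameters invented) to show how simulated years, rather than historical ones, support tail metrics such as a probable maximum loss.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson-distributed event count (Knuth's algorithm)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_annual_losses(n_years, freq=0.5, sev_mu=16.0, sev_sigma=1.2, seed=0):
    """Simulate total cat losses per year: Poisson frequency, lognormal severity.

    All distribution parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        n_events = poisson(freq, rng)
        losses.append(sum(rng.lognormvariate(sev_mu, sev_sigma)
                          for _ in range(n_events)))
    return losses

losses = sorted(simulate_annual_losses(10_000))
pml_250 = losses[int(0.996 * len(losses))]  # 1-in-250-year probable maximum loss
print(f"1-in-250 PML: ${pml_250:,.0f}")
```

Note that nothing here predicts when a catastrophe will strike; the output is a distribution of possibilities against which concentration and surplus can be managed.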
Connection vs. dejection
The Internet of Things (IoT) has become the technology most frequently associated with big data in insurance. Vehicles, homes, businesses, and machines are increasingly equipped with sensors and devices connected to the Internet, and many insurers have begun to make use of the resulting data over the past two decades. For example, usage-based insurance (UBI) for auto offers discounts to policyholders who provide IoT data evidencing that they operate their vehicles infrequently or “safely” (for example, at appropriate speeds relative to conditions). Before IoT, policyholders’ premiums were generally invariant to the degree or nature of usage of insured assets. In contrast, with pay-per-mile UBI, drivers could reportedly reduce premiums by 50 percent or more by operating vehicles less frequently. An insurer could even conceivably incentivize policyholders to relocate their vehicles and themselves when sensors detect an imminent catastrophe. While price competition is one motivating factor behind IoT usage, greater potential may reside in policyholders reducing their risk.
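A minimal sketch shows how a pay-per-mile structure can produce reductions of that magnitude. The base charge and per-mile rate below are hypothetical, invented purely to illustrate the mechanics.

```python
def pay_per_mile_premium(annual_miles, base=300.0, rate_per_mile=0.05):
    """Hypothetical pay-per-mile premium: fixed base plus a mileage charge.

    The base and per-mile rate are illustrative assumptions, not any
    insurer's actual rating plan.
    """
    return base + rate_per_mile * annual_miles

typical_driver = pay_per_mile_premium(12_000)  # $900 for a typical annual mileage
light_driver = pay_per_mile_premium(3_000)     # $450, half the typical premium
print(typical_driver, light_driver)
```

Because a large share of the premium varies directly with miles driven, the policyholder's behavior, not just the insurer's prediction, determines the price, which is the exposure-modification point made above.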
Ironically, the availability of sensor data may lead to exposure modification hysteria not unlike the prediction addiction for which IoT provides a prospective cure. For each data source described above, Figure 2 displays estimated potential (combining benefits from prediction and modification in Figure 1) compared with estimated model risk. Autonomous vehicles (AVs) in a sense represent UBI in its perfect state, because they crunch sensor data to manage the risk of accidents toward zero, in the process perhaps eliminating underwriting from the equation. AVs may also devastate many insurers by creating the next Andrew-like catastrophe. AVs are Internet-connected and subject to hacking. Their algorithms are self-learning and contain degrees of complexity that would make the financial instruments underlying the 2008 financial crisis seem rudimentary. Even existing UBI programs have been accused of sometimes offering confusing or dangerous advice. Though well-intentioned, an overreliance on exposure modification may lead to a brutal and unforeseen crash for many insurers and AVs alike.
Data with greater volume, velocity, and variety leads to greater complexity in algorithms and approaches. The resulting opacity, overconfidence in prediction, and excessive intervention in reality foment greater model risk. Like Taleb’s Thanksgiving turkey or the turkeys in the German study, much of big data’s value is thereby “eaten up.” Reducing model risk using simpler, more tractable approaches to big data tends to maximize net value.
Nassim Nicholas Taleb (risk analyst and author of The Black Swan) famously presents the example of a Thanksgiving turkey that fails to predict the risk of his own demise because it is overly reliant on analysis of past data suggesting its owner’s devotion to providing a comfortable and well-fed lifestyle. Many insurers may be behaving just like this turkey as they approach big data.
The false confidence bestowed by more precise predictions may lead some insurers to insufficiently perceive and prepare for the risks of the future. Some of these risks may be natural and may be overlooked due to an uninformed belief that cat models reflect the full realm of the possible. Other risks may be man-made, or even of an insurer’s own making, resulting from transacting in the cyber realm or entrusting critical decisions to artificial intelligence (AI). For example, insurers offering cyber cover may not fully grasp their own cyber risk, and carriers replacing claims departments with AI may yearn for human intelligence if and when bad actors outsmart their algorithms.
Scientists in Germany recently discovered a refreshing alternative to turkey-like behavior by attaching sensors to livestock in the remote mountains of central Italy. The scientists could easily have built an animal mortality prediction algorithm with the resulting IoT data and used it to inform livestock insurance pricing. There would be nothing patently wrong with this, but the pricing would likely neglect how easily animals may perish in a disaster. This reality was not lost on the animals. Before an earthquake, turkeys characteristically got eaten, but cows, sheep, and dogs showed surprising acumen in perceiving an impending quake and skipping town.
Scientists analyzing IoT data with loss prevention in mind would likely not be far behind the animals, but those building a rating plan may well have stayed in the lab in a misbegotten quest for competitive advantage. Luckily, many insurers are awakening to the possibilities of loss prevention, using IoT to avert hundreds of thousands of dollars in losses through simple notifications. Prediction will always have its place wherever there’s signal to be found, but many insurers may find that big data’s biggest value lies in simply honing our natural instincts for risk reduction.