Beyond the Box of Data: Next Steps

By Phil Hatfield

Part II

Almost any baker would agree that a finished cake is only as good as its ingredients. The same holds true in deriving actionable insights from data, which is where governance can play a critical role in ensuring that data requirements and quality expectations are met (see part 1 of this article, “Getting Actionable Insights from a Box Full of Data”). However, without properly assembling the data, the result will likely be an unwieldy mess. The process involves the baking equivalent of sifting and mixing the prescribed ingredients in the proper manner, which, in data terms, translates into integration, analysis — and the icing on the cake — reporting.

How the disparate data sources are linked together within the database, and the ability to aggregate data along those dimensions, will either enable or limit the types of information and analysis that can be derived from the database. Once again, that’s an area in which we need some idea of where we’re going before we can determine how best to get there. Keeping our end goal of actionable insights in mind, what types of analysis and information do we expect to get from the data? At what level of granularity do we need to maintain the data?

If it’s important to know whether a property is located in a flood plain but your database records the property’s location no more precisely than its ZIP code, then you’ve set yourself an impossible task. As a rule, it’s best to maintain your data at the most granular level possible and then aggregate to other levels when the most refined detail is not needed. However, keeping very granular data does increase complexity as well as storage and processing costs. While the dimensions used to link your various data sources are limited only by your imagination and the type of information you want to generate, four dimensions are common in the insurance industry.
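The “store granular, aggregate on demand” rule can be sketched in a few lines of Python. The records, field names, and dollar amounts below are invented for illustration, not drawn from any real system:

```python
from collections import defaultdict

# Hypothetical policy-level records: premium is kept at the most
# granular level (one row per policy) and rolled up only when needed.
policies = [
    {"policy_id": "P1", "zip": "30301", "premium": 1200.0},
    {"policy_id": "P2", "zip": "30301", "premium": 950.0},
    {"policy_id": "P3", "zip": "30302", "premium": 1100.0},
]

def aggregate_premium(records, dimension):
    """Roll granular records up along any coarser dimension (e.g. ZIP)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dimension]] += rec["premium"]
    return dict(totals)

by_zip = aggregate_premium(policies, "zip")
# by_zip: {"30301": 2150.0, "30302": 1100.0}
```

Note that the reverse is impossible: records stored only at the ZIP level can never be broken back down to individual policies, which is why the granular level is the safe default.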

  1. Geography. It’s amazing how much of the information we deal with is related to the physical location of the people and property our databases describe. Rating territories, market areas, vehicle telematics, demographics — all of these are based on geography. It’s hard to imagine an information system that doesn’t have geography as one of its dimensions. One of the tricky parts of the geography dimension is that there isn’t just one geography. There are latitude and longitude coordinates to identify points on the globe; political boundaries that describe the areas administered by cities, counties, and states; rating territories developed by insurers that often differ by the type of coverage; U.S. census geography that may coincide with some political boundaries but not others; and postal codes that are often used to describe geography but weren’t really designed for that purpose. Accurately and efficiently linking and rationalizing data that uses various geographic systems is crucial to obtaining a complete picture of the relationships in your data. The process is complex and a data specialty in its own right.
  2. Entity. An entity is an actor in the economic world of our business. It could be a person, a household, a customer, a business, a vendor, a claimant, a provider — the roles are almost endless. Obviously, a single entity might play several different roles as they relate to our business, and it would be helpful to know that. Is the personal auto policyholder John Q. Smith the same as John Smith, the CFO of ABC Inc., to whom we sold a workers' comp policy, and the same as J.Q. Smith, the third-party liability claimant on a different policy? We want to know. We want to be able to understand and analyze all the ways in which each entity acts and interacts with our business. The capability to recognize that records in different data sources refer to the same real-world entity, and to link them accordingly, is called entity resolution. And again, that’s a specialized subdiscipline of data management.
  3. Product. Luckily, compared with the dimensions of geography and entity, products are often reasonably well defined within operating systems and are usually incorporated into reports. So the primary decision to be made in the enterprise data resource context is what counts as a “product.” For example, is it just a written policy? Or does it include the whole range of products, services, and interactions you provide to your customers, including quotes, billing history, claims, loss control, premium audit, and so on? You need to address the issue of granularity in this context as well. Do you need to analyze data at the customer level, the policy level, or the coverage level? Usually, you’ll want to capture the data at the most granular level available and then aggregate as needed.
  4. Time. There are two aspects of time in an enterprise data resource that may be different from the systems that are the source of the data. First, many operational systems don’t need to maintain historical data for long periods of time, and so they don’t. After a certain period, the operational system may summarize the data and delete the detail or archive the data. But certain types of analysis require years and years of data. Often the data resource will end up housing more historical data than the operational system. Second, summarized data may not be enough; we need data at the most granular (there’s that word again) timescale for certain types of analysis. Particularly for predictive modeling, we often need to extract what was known as of a particular date. For a policy in force during calendar year 2008, for example, it may not be enough to know that we paid $5,000 in claims against that policy. What we actually may need to know is that as of June 30, 2008, there was one open claim with no payments and a case reserve of $1,000. Designing the temporal dimension of the data resource requires forethought and planning as well as some idea of the types of analysis you expect to conduct with the data.
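Production entity resolution uses probabilistic matching across many attributes (name, address, date of birth, and so on) and is, as noted above, a specialty in itself. The core idea can still be sketched. The Python below is a deliberately naive illustration, not a real matcher; the names are the article’s own John Q. Smith example:

```python
import re

def name_tokens(name):
    """Lowercase a name and split it into tokens, stripping punctuation."""
    return re.sub(r"[^\w\s]", " ", name.lower()).split()

def compatible(a, b):
    """True when one token could stand for the other: they are equal,
    or one is the other's initial (so 'j' matches 'john')."""
    return a == b or a == b[0] or b == a[0]

def same_person(name1, name2):
    """Very rough match: last names must be identical, and each leading
    token of the shorter name must be compatible with the longer name's."""
    t1, t2 = name_tokens(name1), name_tokens(name2)
    if not t1 or not t2 or t1[-1] != t2[-1]:
        return False
    short, long_ = sorted((t1[:-1], t2[:-1]), key=len)
    return all(compatible(a, b) for a, b in zip(short, long_))

# The article's three records resolve to a single entity:
match = same_person("John Q. Smith", "J.Q. Smith")  # True
```

A real implementation would also weigh addresses, handle nicknames and misspellings, and score match confidence rather than returning a hard yes or no, which is exactly why the discipline warrants specialists.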
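To make the “as of” claim example concrete, here is a minimal Python sketch. It assumes a hypothetical transaction-level claim feed (the transaction types, dates, and amounts are invented to match the article’s scenario) and replays history up to a cutoff date:

```python
from datetime import date

# Hypothetical claim transactions for one 2008 policy. A warehouse that
# keeps transaction-level detail can rebuild what was known on any date.
transactions = [
    {"date": date(2008, 3, 15), "type": "open",    "amount": 0.0},
    {"date": date(2008, 3, 15), "type": "reserve", "amount": 1000.0},
    {"date": date(2008, 9, 1),  "type": "payment", "amount": 5000.0},
    {"date": date(2008, 9, 1),  "type": "close",   "amount": 0.0},
]

def as_of(txns, cutoff):
    """Replay transactions through the cutoff date to reconstruct
    the claim status as it was known at that time."""
    paid, reserve, is_open = 0.0, 0.0, False
    for t in sorted(txns, key=lambda t: t["date"]):
        if t["date"] > cutoff:
            break
        if t["type"] == "open":
            is_open = True
        elif t["type"] == "close":
            is_open = False
        elif t["type"] == "payment":
            paid += t["amount"]
        elif t["type"] == "reserve":
            reserve = t["amount"]
    return {"open": is_open, "paid": paid, "case_reserve": reserve}

# As of June 30, 2008: one open claim, no payments, $1,000 case reserve.
snapshot = as_of(transactions, date(2008, 6, 30))
```

If the source system had summarized the claim to “$5,000 paid” and discarded the detail, this June 30 snapshot could never be recovered, which is the article’s point about housing more history, at finer time grain, than the operational system keeps.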

Data, databases, data warehouses, data management, big data — we talk a lot about data. But there’s an important distinction between data and our ultimate goal of actionable business insights. The process of turning disparate data items into a coherent scenario relevant to our business is the process we commonly call analysis. If that scenario also suggests a decision or action we should take, then we have an actionable business insight.

For example, “John Q. Smith lives at 123 Elm Street” is a single piece of data (or a fact) and not very useful on its own. But if we can add some more data, context, and business knowledge to this fact, we may derive that John Q. Smith, who lives at 123 Elm Street, has had homeowners, auto, and umbrella policies with ABC Insurance Co. for the past seven years and never had a claim. Today, he has just closed on a new home at 234 Maple Street. The customer lifetime value model we’ve developed indicates that Mr. Smith will be worth $15,000 to our company as a customer in the future. We know from other analyses we’ve done in the past that major life events, such as the purchase of a new home, are triggers for insurance shopping behavior. Therefore, we should make sure we contact Mr. Smith so we don’t lose this valuable customer.

The definition of the word “analysis” is the breaking apart of something into its constituent elements. But that’s the easy part. The real art of what is commonly called analysis is actually the opposite — the synthesis of the constituent elements into something new and useful (in our case, actionable business insights). Analysts must draw upon all their intelligence, talent, and business knowledge to derive insights from what had been just a box of data.

We’ve collaborated to design and build a data resource that will provide information useful for decision makers. It’s well documented. We can slice and dice our data by geography, by customer, by coverage, and for any time period in the past ten years. We’ve analyzed the data and discovered new insights we didn’t know were there.

But all that effort will be for nothing if we don’t get the data in the hands of decision makers in a form they can digest and at the time when they require a decision. We’ll call that reporting — the icing on the cake — but it can take many forms, depending on the needs of the users. Some people require the ability to analyze the data on their own so they can look at it from different perspectives. Perhaps they will require a custom data cube and a robust business intelligence tool for their purposes. Some users will be do-it-yourself power users and will need access to the data in as raw a form as possible. Maybe they just need a database client interface to the data. Some people will want only an occasional current snapshot of metrics that are important to them. That’s exactly what a dashboard does. As a general rule, the more occasional the data use, the simpler the tool can be, though more work is required on the information before delivery.

By using these four steps to create an effective enterprise data resource — governance, integration, analysis, and reporting — you’ll be more likely to produce a tempting “cake,” made with the right quantity and quality of data. Further, unlike cake, with data, not only can you have it and eat it too — but, with analytics, you’ll wind up with even more data than you started with. Just think: Had Marie Antoinette said, “Let them have data,” she might well have kept her head — indeed, a valuable lesson for today’s executive.

Phil Hatfield, J.D., CPCU, leads the Modeling Data Services group for ISO Insurance Programs and Analytic Services, a unit of Verisk Analytics.

« Back to part I