Is Data the New Oil?

By Perry Rotella

About a year ago, on a CNBC Squawk Box segment, “The Pulse of Silicon Valley,” host Joe Kernen posed a question to Ann Winblad, the legendary investor and senior partner at Hummer Winblad: “What is the next really big thing?” Her response: “Data is the new oil.”

Winblad talked about predictive analytics as the new hot spot for venture investing and discussed the growth of companies that can derive value from the huge amounts of data being stored.

That’s not the first time we’ve heard the phrase “Data is the new oil.” For example, marketing commentator Michael Palmer blogged back in 2006: “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, et cetera, to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”

Critics of the analogy have pointed out that oil is finite, while data is always renewable, being created in vast quantities every day. However, defining data as the new oil simply helps underline the emergence of data quality, management, and analysis as “the next really big thing.” According to IBM, the digital universe will grow to 8 zettabytes by 2015. The real impetus and value are the potential insights we can derive from this new, vast, and growing “natural” resource. If data is the next big thing, then companies need to think about a new business model that productively exploits this valuable resource.

It also follows that the 21st century chief information officer becomes a fulcrum to leverage the value of data as an indispensable business resource. The CIO’s role is fast evolving. In particular, to achieve C-suite credibility, CIOs must continually demonstrate ideas and abilities to influence the enterprise in material ways. Developing a capability to mine and refine data into greater business insights, more productive decision making, and new product offerings is clearly one key avenue for CIOs to make and sustain a substantial influence on the enterprise. Such responsibility and contribution begin with effective data management.

Under the leadership of the CIO, companies must increasingly treat all forms of data as valuable enterprisewide corporate assets — and share and manage those assets more effectively, not just locally within business units. Coordinated distribution and sharing of data about customers and products often provide opportunities to up-sell, cross-sell, and create customer service and retention plans best aligned with a customer’s true value to the organization. Sharing product data proficiently can open up opportunities to create new and innovative solutions or capabilities across lines of business. Consequently, an enterprisewide understanding of customer behavior, products, and transactions drives and enables advanced analytics. In turn, data-driven predictive analytics platforms translate into insightful decision making and better business outcomes — often made possible by the CIO’s leadership in conjunction with the data assets and talents of the organization.

At the Helm of the Data Refinery
If we accept the concept that data is the new oil, then the “data refinery” is the new strategic operating model for companies with digital exposure (see sidebar). And the CIO is the executive best positioned to lead the enterprise forward in this new model.

New sources of “crude” data are researched and provisioned to enable the enterprise to commercialize this valuable resource. The raw data flows into the enterprise, including customer-contributed proprietary data, purchased data, freely available data, and streaming data. The enterprise refines the raw data, creating valuable intellectual property (IP) through a proprietary process, subject matter expertise, analytics, software, and combinations of data sets. The “refined” data is stored in databases appropriate to the type and scale of the data — just like oil products such as gasoline, heating fuel, and motor oil are stored in various tanks. The refined data products can be distributed to customers over the Internet as analytic insights or used to develop new products. All this must be done with a full understanding of the legal, regulatory, and contractual restrictions surrounding the use of data.
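The stages above — ingest, refine, store, distribute — can be sketched as a simple pipeline. This is only an illustration of the operating model; all names (`ingest`, `refine`, `store`, the sample sources) are hypothetical, not Verisk systems.

```python
# Illustrative sketch of the "data refinery" stages described above.
# Function and source names are hypothetical, not actual Verisk APIs.

def ingest(sources):
    """Pool raw 'crude' records from contributed, purchased, free, and streaming feeds."""
    for source_name, records in sources.items():
        for record in records:
            yield {"source": source_name, **record}

def refine(raw_records, enrich):
    """Apply proprietary logic (the 'refining') to turn raw records into derived IP."""
    for record in raw_records:
        record["derived"] = enrich(record)
        yield record

def store(refined, tanks):
    """Route refined records to a store (a 'tank') keyed by their type."""
    for record in refined:
        tanks.setdefault(record["source"], []).append(record)
    return tanks

# Example run: flag high-value claims as they pass through the refinery.
sources = {
    "contributed": [{"claim_id": 1, "amount": 1200}],
    "purchased":   [{"claim_id": 2, "amount": 300}],
}
tanks = store(refine(ingest(sources), enrich=lambda r: r["amount"] > 1000), {})
```

In practice each stage would also enforce the legal, regulatory, and contractual restrictions noted above before any refined product is distributed.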

Data Excellence = Executive Success
Operational excellence is paramount to any executive’s success. For CIOs, while thinking strategically may occupy most of their time, their ability to drive day-to-day operations effectively is what keeps them at the wheel. In today’s environment where data is the new oil, data is also the fuel for driving an operation to its desired destination.

That said, I’d like to delve a bit deeper into my thoughts about the three key considerations a CIO must address to ensure operational success in managing the data refinery.

The Verisk Data Refinery

The graphic below illustrates the data refinery process we follow at Verisk Analytics. It exemplifies the continuous data loop that leads to the production of the ISO ClaimSearch® product.

Exploration and Production
ISO ClaimSearch, a proprietary product that helps the property/casualty insurance industry with claims processing and fraud detection, is a key vehicle we use to extract insights on business dynamics. This is achieved by combining newly available data sets with traditional data sources. Exploring and acquiring new data sources represent the first step in the data refinery. For example, we recently added event data recorder (automobile black box) information to ISO ClaimSearch to enable better understanding of auto claims.

The foundation of ISO ClaimSearch is a business model in which property/casualty insurers contribute claims data that Verisk uses to help insurers adjust and evaluate claims for fraud. In addition to contributed claims data, Verisk procures other information that insurers find useful in assessing the validity of a claim, such as criminal records, court records, historical weather data, motor vehicle data, OFAC data, and physician data.

In the next step of the refinery process, Verisk transforms the raw data to create unique intellectual property (IP). This includes fraud scoring, identification of suspicious claims activity, discovery of potential fraud rings, creation of a complete profile of the insured, visualization software to illustrate the correlation of data elements, and extraction of vehicle characteristics for auto claims based on vehicle identification number (VIN).
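As a concrete example of that transformation step, consider extracting vehicle characteristics from a VIN. A standard 17-character VIN (ISO 3779) has fixed positional fields; the sketch below splits them out. The field boundaries follow the standard, but real decoding is far richer, and this function is an illustration rather than Verisk’s implementation.

```python
# Hedged sketch: derive structured vehicle fields from a raw VIN string.
# Positional layout per the 17-character VIN standard (ISO 3779).

def decode_vin(vin: str) -> dict:
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("standard VINs are 17 characters")
    return {
        "wmi": vin[0:3],            # positions 1-3: world manufacturer identifier
        "vds": vin[3:8],            # positions 4-8: vehicle descriptor section
        "check_digit": vin[8],      # position 9: check digit
        "model_year_code": vin[9],  # position 10: model-year code
        "plant_code": vin[10],      # position 11: assembly plant
        "serial": vin[11:17],       # positions 12-17: sequential production number
    }

fields = decode_vin("1HGCM82633A004352")  # a commonly cited example VIN
```

Once structured fields like these exist, they can be joined against manufacturer tables to enrich an auto claim record.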

Figure 1. The Verisk Data Factory: Enterprise Data Management Is Vital to Our Growth

1. Integrate data as an enterprise asset.
Its main components are as follows:

  • Metadata management. Not knowing what you’re integrating is worse than not integrating at all. A good insurance industry example is data associated with the term “paid loss.” There are so many questions that relate to one seemingly simple term: Is paid loss the loss net as to reinsurance coverage? Net as to deductibles? Including other expenses? Similar ambiguities surround even the most basic terms: Who is a customer? What is a product? That is the challenge of metadata management. High-quality, comprehensive metadata should include detailed and understandable definitions, code values, data quality metrics, and data profiles — easier said than done, especially in a large enterprise.
  • Taxonomies or classification of categorical structures. Taxonomies help organize and label the data to make it more easily usable. They can be supplemented with or replaced by sophisticated search functionality — but a mix of both is probably the ideal.
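A minimal metadata registry makes the “paid loss” problem above concrete: each term carries an agreed definition, code values, and quality metrics that consumers can inspect before integrating the field. The schema here is a sketch of the idea, not an industry standard.

```python
# Minimal sketch of a metadata-registry entry; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class MetadataEntry:
    term: str
    definition: str                                     # detailed, understandable definition
    code_values: dict = field(default_factory=dict)     # allowed codes and their meanings
    quality_metrics: dict = field(default_factory=dict) # e.g., completeness, validity rates

registry = {}

def register(entry: MetadataEntry):
    registry[entry.term] = entry

# The ambiguous "paid loss" term gets one authoritative definition.
register(MetadataEntry(
    term="paid_loss",
    definition="Loss paid to date, net of reinsurance, gross of deductibles, "
               "excluding allocated loss adjustment expenses.",
    quality_metrics={"completeness": 0.97},
))

meaning = registry["paid_loss"].definition  # consumers resolve ambiguity here
```

A taxonomy can then be layered on top by tagging each entry with its place in the classification structure.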

Before an organization can integrate data, it must first “discover” the data. In the refinery analogy, discovery — a primary job for data managers and product managers — is identifying new data sources. Those new sources can be within the current fields (think fracking in oil), as yet undiscovered fields (the Arctic), or previously known fields where technology now enables access (deep-sea extraction).

2. Develop data management as a core enterprise capability.
Data management and integration tools include entity resolution, identity management, and data standards. Data standards are key because they minimize the need for the other tools: If data is defined in a standardized way at creation, there will be less friction in the data acquisition and reporting process. Regarding unstructured data, the data can be stored as is (for example, a photograph), but more value will be realized if structured data is derived — and stored — from the unstructured data (square footage from an aerial photo). Back to my data refinery analogy: You can store crude oil not knowing what its use ultimately will be or store refined oil in such a manner that it can serve many purposes — heating, gasoline, diesel, or plastics. In the case of unstructured data, you can do both.

3. Recognize the privilege of data stewardship.
From a data management perspective, data governance is the broader discipline that ensures data stewardship is properly executed. The data steward should do his or her best to make certain that data is treated as a corporate asset, not just an operating unit (local) asset. That means, where possible, we need to define and create data in a way that extends beyond its intended use, maximizes its value-creating potential, and does not hinder future utility.

The bottom line: A strong data management capability is crucial to success in the world of Big Data, especially where data is the new oil.

Perry Rotella is senior vice president and chief information officer of Verisk Analytics.