Insurers looking to seize opportunities presented by the growing Internet of Things (IoT) often face one immediate hurdle as they survey the countless products and systems that are going online, including cars, kitchen appliances, and heating and cooling equipment. In the huge amount of raw data these devices collect, the key word is “raw”: The wealth of information in this data isn’t always easy to extract.
The way to usable insight often lies through data science, predictive modeling, machine learning, and artificial intelligence (AI). These disciplines typically use complex methods that take significant investments of time, resources, and intellectual capital to master. A new white paper from Verisk, Inside the IoT Data Refinery: Extracting Insights from Telematics, takes the example of connected cars to explore the process of assessing, cleansing, and validating IoT data for potential use across the insurance life cycle, from lead generation and underwriting to claims.
Driving data collected by plug-in devices, mobile apps, and factory-installed systems provides the first common data elements typically used for insurance pricing. Next in line are electronic data recorders (EDRs, or “black boxes”), connected homes, and even wearables. But before the data can be analyzed and modeled accurately, it often must be prepared through processes that include data assessment, data cleansing, and device harmonization.
Verisk’s new report describes the typical assessment process whereby the Verisk IoT team works with potential data partners of the Verisk Data Exchange™ to assess new data sets and their use. Examples are focused on vehicle data—from original equipment manufacturers (OEMs), telematics service providers, mobile data loggers, and aftermarket devices—because of its accessibility.
Step-by-step data assessment
The first step in assessing any data set is often to examine the data specifications, so that architectural failures surface at the earliest possible phase. A careful reading of the specs should make clear whether the available elements can support a minimum viable model and whether the resolution and frequency of the data are sufficient for modeling.
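As a rough illustration, parts of this spec review can be automated. The sketch below screens a hypothetical telematics spec for minimum modeling viability; the field names, required elements, and thresholds are illustrative assumptions, not Verisk's actual criteria.

```python
# Hypothetical sketch: screen a data spec for minimum modeling viability.
# Element names and thresholds are illustrative only.

REQUIRED_ELEMENTS = {"timestamp", "latitude", "longitude", "speed"}

def assess_spec(spec: dict) -> list:
    """Return a list of problems found in a telematics data spec."""
    problems = []
    available = set(spec.get("elements", []))
    missing = REQUIRED_ELEMENTS - available
    if missing:
        problems.append("missing elements for a minimum viable model: "
                        + ", ".join(sorted(missing)))
    # Frequency check: e.g., require a reading at least every 5 seconds.
    if spec.get("sample_interval_s", float("inf")) > 5:
        problems.append("sampling interval too coarse for trip reconstruction")
    # Resolution check: e.g., GPS needs ~5 decimal places (~1 m precision).
    if spec.get("gps_decimal_places", 0) < 5:
        problems.append("GPS resolution insufficient for modeling")
    return problems

spec = {"elements": ["timestamp", "latitude", "longitude"],
        "sample_interval_s": 10,
        "gps_decimal_places": 4}
print(assess_spec(spec))
```

A spec that fails checks like these signals an architectural problem before any database work begins, which is the point of doing the review first.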
After reading, understanding, and approving the specs, the next step is to structure a database that fits them and populate it with data. When building the database and extracting the data, key factors include compatibility between the expected data format and the data set actually received; preservation of all data within the database structure; the ability to trace and isolate each data element; and storage of the data in both raw and analysis-ready formats.
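A minimal sketch of that storage pattern, using SQLite and invented field names: each row keeps the untouched raw payload alongside the parsed, analysis-ready values, plus a source reference so every element can be traced and isolated.

```python
# Hypothetical sketch: store each record in both raw and analysis-ready
# form, keyed so every element traces back to its source delivery.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        record_id   INTEGER PRIMARY KEY,
        source_file TEXT NOT NULL,   -- traceability to the delivery
        raw_payload TEXT NOT NULL,   -- untouched original record
        speed_kph   REAL,            -- analysis-ready, parsed field
        ts_utc      TEXT             -- analysis-ready, parsed field
    )""")

raw = '{"spd": "63.0", "t": "2024-05-01T12:00:00Z"}'
parsed = json.loads(raw)
conn.execute(
    "INSERT INTO readings (source_file, raw_payload, speed_kph, ts_utc) "
    "VALUES (?, ?, ?, ?)",
    ("feed_2024_05_01.json", raw, float(parsed["spd"]), parsed["t"]),
)

# Any analysis-ready value can be isolated and traced to its raw record.
row = conn.execute(
    "SELECT speed_kph, raw_payload FROM readings WHERE record_id = 1"
).fetchone()
```

Keeping the raw payload means a parsing bug discovered later can be fixed by reprocessing, without going back to the data partner for a fresh delivery.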
Finally, the data should be assessed in terms of general statistical analysis, compatibility with the specifications, completeness, and validation across the multiple sensors that feed the data set. A set of tests and assessments should be created from the provided specs, but it’s important to modify them as the assessment unfolds and new findings about the data emerge.
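Two of those checks can be sketched concretely: a completeness rate per field, and a cross-sensor validation comparing speed as reported by two different sources. The data and field names below are invented for illustration, and real assessments would involve far more checks.

```python
# Hypothetical sketch: completeness and cross-sensor checks on a batch
# of readings. Field names and sample values are illustrative only.
from statistics import mean

readings = [
    {"speed_obd": 50.0, "speed_gps": 49.2},
    {"speed_obd": 62.5, "speed_gps": 63.1},
    {"speed_obd": None, "speed_gps": 55.0},  # missing OBD value
]

def completeness(rows, field):
    """Share of rows in which the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)

def cross_sensor_gap(rows):
    """Mean absolute gap between OBD and GPS speed, where both exist."""
    gaps = [abs(r["speed_obd"] - r["speed_gps"])
            for r in rows
            if r["speed_obd"] is not None and r["speed_gps"] is not None]
    return mean(gaps)

print(round(completeness(readings, "speed_obd"), 2))  # 0.67
print(round(cross_sensor_gap(readings), 2))           # 0.7
```

A large cross-sensor gap or a low completeness rate would prompt exactly the kind of revised test plan the paragraph above describes.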
An experienced hand
Verisk has been assessing telemetry data for more than a decade, examining various technologies and an array of devices, data elements, and data sources. Much of this work is done at the Verisk Telematics Innovation Center based in Tel Aviv, Israel. No matter the delivery channel—connected cars, mobile apps, or dongles—Verisk has developed effective mechanisms to help evaluate the data thoroughly.