Big Data: What’s on the Menu?

By Peter Marotta

If you think Big Data is big now, just wait — and it won’t be very long.

Duke University scientists have built an experimental camera with roughly 30 times the resolution of any camera available today. The device produces one-billion-pixel (one-gigapixel) images, compared with the tens of millions of pixels captured by today's most sophisticated cameras. Duke is also developing a 50-gigapixel device with 250 times the resolution of 20/20 vision.

According to NewAge.com, as of the last quarter of 2011 there were 1.2 billion social media users over the age of 15 worldwide (more than 80 percent of the world's Internet users in that age group), and Internet users spent close to one in every five online minutes on social networking sites.

Other technologies and fields of science are similarly expanding the amount of data available in the world. In addition to social media and photos, new data sources such as satellite imagery, video, voice, telematics, radio frequency identifiers (RFIDs), event data recorders (EDRs), and electronic health records (EHRs) are becoming more prevalent and accessible.

So, if you think data is big now, wait until the next generation of devices and technologies hits the mainstream.

Delicious Data
The insurance industry is using Big Data to analyze and solve problems associated with managing risk — risk-specific pricing, climate change, premium leakage, fraud, loss control, enterprise risk management, and so forth.

The job of the data manager is to make sure the main ingredient needed to support those activities — data — is available, accessible, and of high quality. Data management and analytics have much in common with the culinary arts. Cooking is a blend of art, science, technology, and ingredients — and so are data and analytics.

A good chef must know what’s in the pantry, what recipes can use those ingredients, and what cooking tools are available. Most important, the results must pass the customer’s taste test. And while cooking has a strong base in the traditional, innovation transforms the culinary arts through imaginative cooking techniques, exotic ingredients, and state-of-the-art equipment.

Data managers harvest, inventory, and organize the data ingredients. Data scientists, actuaries, and statisticians are the chefs who develop the recipes and menus. Technologists provide the tools to process the data, and management provides direction and the overall theme, or cuisine.

The task today is not only to gather data but, more important, to harness it in a way that produces value for the enterprise. A data manager’s toolkit must contain new or improved functionalities: master data management (MDM), data dictionaries, data profiles, audits and controls, data and text mining, entity resolution, data visualization, longitudinal functionality, and data lineage.
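To make one of those functionalities concrete, here is a minimal data-profiling sketch in Python; the policy records and field names are hypothetical, invented purely for illustration.

```python
from collections import Counter

def profile_column(records, field):
    """Produce a simple data profile for one field: record count, nulls, distinct values."""
    values = [r.get(field) for r in records]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "field": field,
        "records": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

# Hypothetical policy records, for illustration only.
policies = [
    {"policy_id": "P001", "risk_state": "NY"},
    {"policy_id": "P002", "risk_state": "NJ"},
    {"policy_id": "P003", "risk_state": None},
    {"policy_id": "P004", "risk_state": "NY"},
]

print(profile_column(policies, "risk_state"))
# {'field': 'risk_state', 'records': 4, 'nulls': 1, 'distinct': 2,
#  'top_values': [('NY', 2), ('NJ', 1)]}
```

A profile like this, captured routinely, is one way the pantry gets inventoried before any recipe is attempted.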

For Big Data applications, those tools are essential.

[Figure: COA Dashboard, Settled Date Trend]

Metadata: The Crucial Ingredient
But the most critical element of the toolkit is metadata — the unifying element underlying all other toolkit functions. Simply put, metadata is data that provides information about data, but its form and functions are myriad and complex. Metadata is key to understanding data and defining and assessing its quality and purpose. Metadata keeps the chef’s pantry well stocked, cataloged, organized, and accessible.

Metadata facilitates getting the most value from data by providing the context for the data, thereby allowing data managers, data scientists, and all data users to understand and make proper use of data. And a well-constructed, accessible metadata repository supports knowledge sharing, facilitates research, and improves communication about data.

So, what metadata attributes are needed? While metadata standards do exist for select applications (archiving, librarianship, records management), data sets (arts, biology, geography), and functions (education, government, science), there are no industrywide metadata standards specific to insurance. This does not mean the insurance industry is without metadata standards entirely — quite the opposite is true. Many metadata standards organizations — ACORD, IAIABC, ANSI, WCIO, to name a few — have developed data element attributes that can form a basic metadata standard. However, no singular view exists across these organizations and within the industry.

The Dublin Core Metadata Initiative, an open, not-for-profit organization supporting innovation in metadata design and best practices (http://dublincore.org/about-us/), has defined a simple, generic Metadata Element Set consisting of 15 metadata elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights.
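To make the element set concrete, the following Python sketch shows what a Dublin Core record describing a hypothetical insurance data set might look like; every value shown is invented for illustration.

```python
# A Dublin Core record expressed as a simple Python dictionary.
# All 15 elements of the Dublin Core Metadata Element Set appear;
# the values describe a hypothetical insurance data set.
dublin_core_record = {
    "title": "Personal Auto Claims Extract",
    "creator": "Enterprise Data Management",
    "subject": "personal auto; claims; losses",
    "description": "Closed-claim records for personal auto policies.",
    "publisher": "Example Insurance Co.",      # hypothetical
    "contributor": "Claims Operations",
    "date": "2012-06-30",
    "type": "Dataset",
    "format": "text/csv",
    "identifier": "ds-auto-claims-0042",
    "source": "claims administration system",
    "language": "en",
    "relation": "ds-auto-policies-0007",
    "coverage": "US; 2009-2011",
    "rights": "internal use only",
}
```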

Building upon the Dublin Core list and various data element standards used in the U.S. insurance industry, a base Insurance Metadata Element Set could include the elements listed in Table 1.

Metamorphosis
Once the desired metadata has been identified and archived, the next issue is organizing the metadata so users will know the breadth and depth of the data available to them. When the pantry contains a handful of items, organization is not a priority. However, when the pantry’s contents are numerous, it becomes a necessity. The bigger the data, the greater the need for organization.

One approach familiar to all who use the Internet is a search engine: based on a query, all applicable instances are returned. But when dealing with granular data, many false positives can result. For example, a search for "state" may return address state, jurisdiction state, and risk state, as well as state as a verb or state buried in another word, such as "misstated" (the sketch that follows illustrates the problem). Sophisticated search engines use algorithms to minimize false returns or prioritize results, which leads to the second approach: taxonomies.
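Here is a minimal Python sketch of that false-positive problem, assuming a small catalog of hypothetical field names: a naive substring search for "state" matches "misstated," while a word-boundary search does not.

```python
import re

# Hypothetical catalog entries, for illustration only.
fields = [
    "address state",
    "jurisdiction state",
    "risk state",
    "claimant misstated income",   # false positive for a substring search
]

query = "state"

# Naive substring search: returns all four entries, including "misstated".
substring_hits = [f for f in fields if query in f]

# Word-boundary search: matches "state" only as a whole word.
word_hits = [f for f in fields if re.search(rf"\b{re.escape(query)}\b", f)]

print(substring_hits)  # all 4 entries
print(word_hits)       # only the 3 "state" fields
```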

Taxonomy is the science of classification. When applied to data management, taxonomies are an information content architecture. Taxonomies can be simple lists, hierarchies (trees), or facets (stars). As with metadata, there are no industrywide taxonomy standards specific to insurance. Table 2 shows one possible classification structure for insurance data.
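As a sketch of how a hierarchical taxonomy might be represented in code, the following Python fragment uses invented classification labels (not the structure from Table 2), with a small helper that walks the tree and prints each classification path.

```python
# A hypothetical fragment of a hierarchical insurance taxonomy,
# expressed as nested dictionaries: line of business -> coverage -> data subjects.
taxonomy = {
    "Personal Lines": {
        "Auto": ["policy", "vehicle", "driver", "claim"],
        "Homeowners": ["policy", "dwelling", "claim"],
    },
    "Commercial Lines": {
        "Workers Compensation": ["policy", "employer", "injury", "claim"],
        "General Liability": ["policy", "premises", "claim"],
    },
}

def list_paths(node, path=()):
    """Walk the tree and yield each full classification path."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from list_paths(child, path + (key,))
    else:
        for leaf in node:
            yield path + (leaf,)

for p in list_paths(taxonomy):
    print(" > ".join(p))
# e.g. Personal Lines > Auto > policy
```

The same structure could be rendered as a simple list or a faceted (star) scheme; the tree form is shown only because it is the easiest to read at a glance.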

Big Data is changing metadata. Application-specific subject matter knowledge and expertise must give way to broader, enterprisewide knowledge and understanding — not usurping the role of subject matter experts but rather making their expertise an enterprise resource.

Tools of the Trade
Once the pantry has been stocked and properly organized, the chef needs to assess the available tools and technologies — making use of search capabilities, classification structures, data in context, and shared meanings. The tools can include Big Data appliances, statistical software, rule engines, computing grids and platforms, and geocoders.

The challenge is for the actuary, statistician, data scientist, or product developer to use the data and tools to develop or improve products and processes. These chefs must select the ingredients needed to prepare each dish and organize them into a coherent and appetizing menu.

Some of the work can be accomplished using prior experience, iterative analysis, or experimentation. The fruits of Big Data result from such innovation and experimentation.

The data manager's work, however, does not end with filling the pantry and serving dinner. The data manager must also document the results of the chef's efforts: the menus, the recipes, what ingredients were used, what techniques could be improved, and what new ingredients are needed.

All parties — data managers, data scientists, actuaries, statisticians, product developers, technologists, and managers — must not lose sight of the objective. That objective is not only to have a well-stocked pantry, the latest equipment, or the most innovative menu but to add value to the overall dining experience — thereby guaranteeing a satisfied customer.

Peter Marotta, AIDM, FIDM, is enterprise data administrator of Enterprise Data Management at Verisk Analytics.