Insurers Capitalize on Big Data and Hadoop
Information-Driven Insights with Cloudera Enterprise Data Hub
Version: 103
Table of Contents

Introduction
Data Challenges for Insurers
  From Data Silos to Data Consolidation
Transforming Big Data into a Competitive Advantage
Use Cases for An Enterprise Data Hub
  Claims Fraud
  Underwriting & Risk
  Regulatory Compliance
  Customer Insights (360 Degree View)
  Usage-Based Insurance
  Subrogation
  Cyber Security
Proven Results With Cloudera
  Markerstudy Increases Policy Count by 120%, Reduces Claims Leakage by 5 million
  Major P&C Insurer Reduces Exposure to Risk Across Lines of Business
  European Auto Insurer Offers Pay-As-You-Drive (PAYD) Program and Reduces Number of Claims by 30%
Cloudera Enterprise: The Right Solution for Insurers
Robust Ecosystem of Partners
Introduction

Rapid adoption of digital technologies, Internet connectedness and the proliferation of unstructured data have heralded a new era of opportunities and challenges for insurers of all types and sizes. While these forces of change have opened the doors to untapped markets, they have also exposed insurers to heightened levels of risk.
Data Challenges for Insurers

Insurance is a business built on data. Insurers analyze data to understand, evaluate and assume profitable risks. While data is the most valuable asset for insurers, actuaries, underwriters and other key stakeholders are hard pressed to obtain the right data at the right time. Many are evaluating risks and making strategic decisions based on only a sample set of historical and industry data. These limited data sources provide a myopic view and place a tremendous burden on insurers. They hinder insurers' ability to:

- Accurately assess risk across lines of business
- Reduce claims and underwriting leakage
- Detect and analyze fraud patterns
- Detect and report on regulatory non-compliance
- Proactively monitor security and prevent data breaches
- Gain customer insights across omni-channel touch points
- Develop programs such as Pay-As-You-Drive (PAYD) that require easy access to telemetry data

Insurers have long struggled with data silos. Current storage systems, multiple legacy applications and point solutions have kept insurers in gridlock for decades. Moreover, these systems have created multiple versions of the truth and data inconsistencies. They are costly to maintain and integrate, consuming valuable compute, IT and financial resources. While they serve an important purpose, these systems were not built to cost-effectively store and correlate, at scale, the variety, volume and velocity of data that insurers generate and need today. Nor were they designed to support all types of workloads, such as stream processing and machine learning, both of which are essential for detecting claims fraud and preventing cybercrime.

The ability to rapidly analyze large volumes of both structured and unstructured data from a wide variety of sources demands a technology platform that is specifically built for and dedicated to that purpose.
It requires a high-performance platform that can ingest billions of data records (representing terabytes to petabytes of data), perform advanced analytics and deliver results in real time or within minutes, rather than days or weeks.

From Data Silos to Data Consolidation

In a survey [1] of 242 underwriters and 220 members of the Chartered Institute of Loss Adjusters regarding their collection and exploitation of data to support business functions, 86% indicated that the key to making the best use of big data is being able to analyze data from all sources together rather than separately.

To capitalize on the value of big data and leapfrog the competition, leading insurers are moving towards consolidated data management. An enterprise data hub (EDH) built on open-standard, open-source Apache Hadoop provides a cost-effective way for insurers to aggregate and store all their data, in any format, for all types of workloads, in a highly secure environment. For the first time, business users can access rich data sources, blend and analyze data from any source and in any amount, detect patterns, model risk and gain valuable real-time insights that deliver results.

By consolidating data in one place, an EDH can greatly simplify data management and reduce storage costs. Unlike rip-and-replace solutions, an EDH enables organizations to break down data silos while continuing to leverage and augment the value of their existing data warehousing and other IT investments.

[1] Chartered Institute of Loss Adjusters (CILA): The Big Data Rush: How Data
[Figure: Centralized Access to All Your Data for All Types of Workloads. Labeled elements: Data Sources (core systems for claims, underwriting, policy and billing; social media; enterprise and point solutions; clickstream and web logs; telematics, sensor and mobile data; third-party data such as weather and traffic); Storage and Analytics (database, datamart, EDW); BI Tools (analytics, SAS); Ingest (Sqoop, Flume, Kafka); Transform (MapReduce, Hive, Pig, Spark); Discover (analytic database: Impala; search: Solr); Model (machine learning: SAS, R, Spark, Mahout); Serve (NoSQL database: HBase; streaming: Spark Streaming); Security and Administration (YARN, Cloudera Manager, Cloudera Navigator); Unlimited Storage (HDFS, HBase).]

Transforming Big Data into a Competitive Advantage

The use of an enterprise data hub powered by Hadoop has been steadily gaining momentum across data-intensive industries such as insurance and financial services. By aggregating and correlating terabytes to petabytes of internal and external data in a single platform, insurers can solve some of their toughest problems, such as claims fraud, regulatory non-compliance, and data and cyber security breaches, which cost insurers billions in losses annually. According to Deloitte [2], by 2016, 25% of global insurers will have adopted big data analytics for at least one security and fraud use case. The same survey indicated that 82% of insurance executives cite data as a strategic priority.

Use Cases for An Enterprise Data Hub

Insurers of all types and sizes are using an enterprise data hub (EDH) across key areas of their business to gain valuable insights, solve tough problems, ask big questions and drive results.

[Figure: An Enterprise Data Hub Delivers Value Across Key Areas of Operation: Claims Fraud, Underwriting and Risk, Customer Insights (360 Degree View), Usage-Based Insurance, Subrogation, Cyber Security.]

Claims Fraud

Insurance fraud, both opportunistic and professional, has steadily escalated over the last decade. Rising fraud cuts profits for insurers.
In the United States alone, non-health insurers lose over $40 billion to fraud annually. The numbers are equally dismal in other parts of the world. According to the Insurance Information Institute [3], nearly 10% of an insurer's claim volume is fraudulent. The negative impact of fraud is not only felt by insurance companies but also passed on to consumers. The National Insurance Crime Bureau (NICB) [4] estimates that fraud has contributed to an increase in premiums of $200 to $300 per family on an annual basis.

[2] Deloitte Insurance Risk Management Survey: State of the Industry (2014)
[3] Insurance Information Institute: Insurance Fraud (March 2015)
[4] Fox Business News: The True Cost of Auto Insurance Fraud (Feb 2015)
While insurers have amassed massive amounts of claims data (on average, 30 years' worth), much of it is inaccessible for analysis because it resides across multiple disparate systems and in archives. The sheer complexity of fraud schemes has rendered obsolete the existing tools and methods that detect fraud using subsets of data.

Using Spark (Machine Learning) and other key components of an enterprise data hub (EDH), insurers can aggregate massive amounts of historical claims and policy data and correlate it with other data sources, such as adjusters' claim notes, images, social media content, clickstreams and web log data, for real-time detection of fraud and identification of fraudulent patterns. Insurers can compare experimental results with simulations that leverage actual recorded data. They can deploy large testing sandboxes for research and development and enable real-time fraud detection to catch suspicious cases in flight. This tiered system, combining development, simulated testing and real-time detection, has proven highly effective in reducing false positives and shortening the time needed to identify new cases of fraud.

Underwriting & Risk

Accurate assessment of risk and risk-based pricing requires massive amounts of data from multiple sources for analysis. Yet most insurers have only subsets of data available at any given time. For many, this has led to greater exposure to risk, higher loss ratios, inaccurate reserve estimates, and an inability to create sophisticated catastrophe (CAT) models that can generate multiple scenarios.

Comprehensive risk management requires pooling, correlating and analyzing data at a granular level by leveraging multiple internal and external data sources. The ability to tap into more data sources and access up-to-date information such as geospatial data, weather or traffic patterns provides a real competitive differentiator to insurers.
Moreover, reliable risk categorization enables insurers to create differential pricing that drives profitable books of business. It enables insurers to take on new risks, either by entering new growth markets or by targeting new insurable products.

Regulatory Compliance

The past several years have seen an unprecedented number of regulations aimed at the financial services and insurance industries. Regulations impacting insurers include Dodd-Frank, Know Your Customer (KYC), the Foreign Account Tax Compliance Act (FATCA), International Financial Reporting Standards (IFRS) and Solvency II, among others. These regulations have brought an added set of challenges to the industry. For instance, regulations such as Solvency II require complex risk calculations that necessitate detailed data from a variety of different source systems, with stringent requirements for real-time reporting.
An enterprise data hub (EDH) serves as a flexible repository to land all of an organization's unknown-value data. It speeds up business intelligence reporting and analytics to deliver markedly better throughput on key service-level agreements. It enables insurers to model risk by running scenarios against massive amounts of internal and external data, all accessed centrally with full fidelity in a scalable, governed and unified environment. With built-in data security, lineage and governance, it provides critical features necessary to comply with common regulatory requirements: for instance, the ability to track, understand and protect access to sensitive data, as well as to maintain comprehensive audit trails and track every access attempt, right down to the user ID, IP address and full query text.

"Deploying Cloudera allows us to process orders of magnitude more information through our systems, and that technological capability in combination with Experian's expertise in bringing together data assets is driving new, real insights into tomorrow's marketing environments."
Jeff Hassemer, VP of Product Strategy, Experian

Customer Insights (360 Degree View)

There are many reasons why insurers want customer insights across omni-channel touch points. Insurance products have become largely commoditized. As such, companies are turning to customer experience as a key area of differentiation to drive customer retention and loyalty. With little variation between products and services, it has become increasingly challenging for insurers to retain existing customers and acquire new ones. For instance, the claim is considered the moment of truth for policyholders. Churn and dissatisfaction are directly correlated with how expediently an individual's claim is processed and settled. Being able to identify who is likely to churn enables insurers to proactively reach out to those customers with appropriate retention offers.

An enterprise data hub (EDH) enables insurers to correlate data across multiple data sources, including policy and claims, geo-location, demographics, lifestyle, sentiment and behavioral data, to obtain a 360-degree view of the customer. With deep insights, insurers can deliver contextually relevant experiences and targeted offers that increase conversion rates and help build customer loyalty.

Usage-Based Insurance

Programs such as Pay-As-You-Drive (PAYD) and Pay-How-You-Drive (PHYD) enable insurers to gain accurate and deep insights into individual policyholders' driving patterns (e.g. miles driven, time of day, number of times the driver braked hard, etc.). These insights enable the insurer not only to reward good drivers with better rates, and subsequently lower claim costs, but also to customize plans at the individual level. With an enterprise data hub (EDH), insurers can store and analyze all driving data captured through sensors. Moreover, telematics-based insurance (UBI) offers several upsides to insurers, including reduced claim costs, better risk pricing, mitigation of adverse selection and moral hazard, modification of risk behavior, and improved brand recognition and loyalty.
With an enterprise data hub (EDH), insurers can cost-effectively store and analyze sensor data to incentivize and retain customers. Moreover, they can correlate sensor-based information with other sources of data, such as traffic and weather patterns and historical claims and policy data, to create granular risk and pricing models that go beyond standard market segmentation.

Subrogation

Subrogation is a critical part of the claims management process and a key method for insurers to mitigate claims losses. It is the right of an insurer to pursue a third party that caused an insurance loss to the insured. Research indicates that up to 20% of subrogation/claims recovery opportunities are lost today, an amount estimated at $15 billion annually in the United States alone.

One of the reasons for poor subrogation recovery is the diligent scrutiny of massive amounts of documentation that is required for each claim processed. Much of the data surrounding claim subrogation is unstructured, which makes it difficult to analyze. For instance, the various forms, pictures, video recordings, narratives, police reports, etc. that are collected to process the claim are attached to the claim dossier. The answers to whether claims recovery is possible lie in these documents. For many insurers, the subrogation process is still predominantly manual. The sheer volume of claims handled, and the need to process and settle them expediently, makes the task of subrogation that much harder. With an enterprise data hub (EDH), insurers can aggregate and analyze this data and identify recoverable claims with speed and accuracy.

Cyber Security

In recent years, cyber risk and security have become key areas of focus for insurers. Research indicates that nearly 95% of all enterprise networks have been compromised by external attacks, and only 3% of organizations feel safe against insider threats. The financial and reputational losses to businesses stretch into tens of billions of dollars annually.
For insurers, the ramifications of cyber security extend both to their own organization and to their ability to analyze the risk it represents. For instance, cyber insurance is fast becoming a new area of coverage for insurers that offer commercial lines. This type of risk is complex and difficult to analyze and model. It requires real-time access to new data sources. Today, most insurers lack direct insights into the cyber liabilities surrounding intangible digital assets.

An enterprise data hub (EDH) can create a unified security data platform. Insurers can analyze data across endpoints, networks, cloud and users. It powers a new generation of security analytics products designed to detect advanced persistent threats (APT) across terabytes of data. Using machine learning via Apache Spark, stream processing, search and query capabilities, all key components of an enterprise data hub, organizations can shorten the time for breach mitigation.

Proven Results With Cloudera

Leading insurers of all types and sizes are using Cloudera's enterprise data hub (EDH) to improve performance, obtain actionable insights, reduce costs and drive a profitable book of business. Here are just a few examples:

"We were impressed by the depth of Cloudera's expertise, and full service contribution. Working with our Cloudera EDH, we are now more sensitive to changes in the market and customer behavior, and can adjust in real time. Ultimately we can provide a better service, including pricing and product offering."
Dan Fiehn, Group Head of IT, Markerstudy

Markerstudy Increases Policy Count by 120%, Reduces Claims Leakage by 5 million

Markerstudy is a privately owned UK-based general insurer that serves 1.75 million customers, generating annual revenues of over 1.4 billion. The company specializes in auto insurance across personal and commercial lines of business. It provides competitive insurance policies for both standard and non-standard auto coverage, encompassing young drivers, high-value and high-performance cars, as well as fleets such as taxis, trucks and ambulances.

Business Drivers

Markerstudy is a fast-growing organization. While its rates and policies have always been data driven, the insurer was unable to utilize the breadth of data sources it needed. Only data from quotes was analyzed, leaving valuable data from internal and external sources unavailable for analysis. As Markerstudy's customer volume and channels grew, it could no longer deliver rate changes through its distribution network at an adequate speed and frequency. Attempting to deliver personalized solutions, and thus multiple versions of products through different channels, was too complex, resulting in errors and delays.
The insurer was also concerned about claims leakage and fraud. To address some of the challenges, Markerstudy built a centralized rate portal, known as the Insurer-Hosted Rating Hub (IHR or The Hub). The Hub brought consistency and reduced the errors that came from the complex web of interactions between Markerstudy and its intermediaries. It also led to an increase in data, forcing its analysts to destroy older records and rely on sample data, thus reducing their ability to view trends over time.

The Big Data Insight project was delivering significant results within six months of its original design.

Solution

The Markerstudy team determined that growing its existing legacy-based infrastructure was not a viable way to make use of the growing and diverse data volumes. As such, it gravitated to a Hadoop approach. Markerstudy evaluated five different Hadoop distributions, ultimately selecting Cloudera. Cloudera's unmatched security features, including Cloudera Navigator, were particularly attractive to Markerstudy, as were the management capabilities and ease of deployment that Cloudera Manager brings.

Today the Rating Hub relies on Cloudera Enterprise. The enterprise data hub (EDH) has enabled the insurer to create a single view across more than 12 data sources, including weather, traffic patterns and geo-location data. Cloudera Search, a key component of the enterprise data hub, indexes hundreds of millions of records and provides near real-time access and multi-content exploration.
Benefits

With Cloudera Enterprise, Markerstudy has achieved significant gains including:

- Approximately 5 million reduction in claim costs through better fraud detection and prevention at point-of-quote
- 120% increase in policy count over an 18-month period
- 50% reduction in customer cancellation rates and increased customer retention at renewal

Major P&C Insurer Reduces Exposure to Risk Across Lines of Business

Analyzes 80 years of historical data across 50 states 75X faster

One of the largest P&C insurance companies in the United States consolidated its data with Cloudera Enterprise. The company has been in existence for over 80 years. During this period, the insurer collected a massive amount of data that spans personal and commercial lines of business.

Business Drivers

The insurer had a highly complex IT infrastructure and data management environment. Much of the data resided in silos across disparate systems. Storing, accessing and analyzing this data was both cost-prohibitive and time consuming. Moreover, it was very difficult to correlate external data such as traffic patterns, socioeconomic studies, and weather patterns with historical and other sources of information. A primary example of the challenge faced by business analysts was graph link analysis. They could look at data from a single U.S. state at a time, with each state's analysis requiring about a day to process, but could not run analytics on multiple states, much less all 50 states, at once.

Solution

With a first objective of speeding up processing times and consolidating its disparate data sets to achieve more scalable analytics, this leading insurance company chose Cloudera Enterprise. Initially, the three main technical cases for adopting Hadoop were flexible and active data storage, integrated and efficient ETL, and applied statistics and computation.
The insurer then expanded its use of Cloudera Enterprise to improve customer insights and gain a holistic view of risk across lines of business. It brought together customer, policy and claims data, as well as data from external sources including weather, traffic, crime, credit and telemetric sensor data, in its Hadoop cluster. Some of these data sources had never been brought together before, and much of the historical data, which was newly digitized, could not be analyzed in tandem with external sources prior to landing in Hadoop.

Today, the company's enterprise data hub is integrated with its incumbent mainframes and data warehouse. The enhanced architecture was designed specifically to complement existing infrastructure.

Benefits

Since deploying Cloudera Enterprise, the insurer has benefitted from:

- Reduced storage costs
- Improved performance with a holistic view of risk across lines of business. The carrier is able to analyze and run risk models across 80 years of historical data from all 50 states 75X faster
- Faster time to market. Data scientists, actuaries and other key stakeholders are able to gain quick and easy access to the right data at the right time
European Auto Insurer Offers Pay-As-You-Drive (PAYD) Program and Reduces Number of Claims by 30%

A major European auto insurer uses Cloudera Enterprise to gather, store and analyze telematics-based sensor data. The data is obtained from black box devices installed in its clients' vehicles. The insurer uses this information to adjust rates and deductibles based on each individual's driving patterns, mitigating risk while increasing both profit and market share.

Business Drivers

The auto insurer faced a number of issues in certain geographies. For instance, it found that Italy had the highest frequency of accidents, as well as the highest average cost of damages per claim, compared with some of the other European countries. Total claims paid by the insurer were increasing year over year.

Solution

The insurer subcontracted with three geographically distinct third-party vendors to capture the telematics data. This data is uploaded from black boxes installed in vehicles to the insurer's edge node. Every hour, data files from the edge node are batch-loaded into the Hadoop Distributed File System (HDFS) in raw format. Other data from external sources, such as weather, traffic patterns and accident information, is also loaded into HDFS. Machine learning via Spark, a key component of Cloudera Enterprise, is applied to analyze customer behavior for classification and prediction, followed by the calculation of a risk score.
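To make the final step concrete, here is a minimal sketch of how a per-driver risk score might be derived from telematics aggregates. It is not the insurer's model and does not use Spark: the TripSummary schema, the feature names, the weights, the bias and the band thresholds are all hypothetical, chosen only to illustrate a logistic-style score built from the kinds of signals mentioned earlier (miles driven, time of day, hard braking). A production system would learn its parameters from historical claims data.

```python
import math
from dataclasses import dataclass


@dataclass
class TripSummary:
    """Per-driver aggregates derived from black-box data (hypothetical schema)."""
    miles_driven: float       # total miles in the scoring period
    night_miles: float        # miles driven at night
    hard_brake_events: int    # number of hard-braking events


# Illustrative weights only (assumption); a real model would be trained.
WEIGHTS = {
    "hard_brakes_per_100mi": 0.45,
    "night_share": 1.2,
    "miles_thousands": 0.3,
}
BIAS = -2.0


def risk_score(trip: TripSummary) -> float:
    """Return a score in (0, 1); higher means riskier driving."""
    if trip.miles_driven <= 0:
        raise ValueError("miles_driven must be positive")
    features = {
        "hard_brakes_per_100mi": 100.0 * trip.hard_brake_events / trip.miles_driven,
        "night_share": trip.night_miles / trip.miles_driven,
        "miles_thousands": trip.miles_driven / 1000.0,
    }
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def risk_band(score: float) -> str:
    """Map a score to a pricing band (thresholds are assumptions)."""
    if score < 0.25:
        return "low"
    if score < 0.60:
        return "medium"
    return "high"


if __name__ == "__main__":
    careful = TripSummary(miles_driven=500, night_miles=25, hard_brake_events=1)
    risky = TripSummary(miles_driven=1500, night_miles=900, hard_brake_events=60)
    for name, trip in (("careful", careful), ("risky", risky)):
        print(name, risk_band(risk_score(trip)))
```

Normalizing hard-braking events per 100 miles keeps high-mileage drivers from being penalized simply for driving more; total mileage enters as its own feature so that exposure is still reflected in the score.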
Benefits

Since deploying its PAYD program powered by Cloudera Enterprise, the insurer has seen positive results on a number of fronts:

- Reduction in the number of claims by 30%
- Ability to attract and retain low-risk (safer) drivers with lower-cost incentives
- Increase in policy renewals and customer satisfaction rates

Cloudera Enterprise: The Right Solution for Insurers

Cloudera Enterprise offers a new paradigm for breaking down data silos and for aggregating and working with data of all types, in any volume, from any data source, for all key industry workloads. For the first time, actuaries, underwriters, data scientists and other key business stakeholders can access rich data sources, blend and analyze data from any source and in any amount, detect patterns, model risk and gain valuable real-time insights that deliver results.

Cloudera Enterprise, with Apache Hadoop at the core, is:

- Unified: an integrated platform, bringing diverse users and application workloads to one pool of data on common infrastructure, with no data movement required.
- Secure: compliance-ready perimeter security, authentication, granular authorization and data protection (through encryption and key management).
- Governed: enterprise-grade data auditing, data lineage and data discovery.
- Managed: a best-in-class holistic interface that provides end-to-end system management and key enterprise features, such as zero-downtime rolling upgrades.
- Open: an open platform means both open-source and open-standard, so organizations can be confident that their investment in Hadoop will be sustainable, portable, better integrated with the ecosystem and of the highest quality.
Robust Ecosystem of Partners

Cloudera has the largest ecosystem of partners in the big data market. Our partners are acknowledged leaders and innovators in their respective categories and key technology providers to the insurance industry. They include leading business intelligence, data warehouse and storage systems, enterprise application, hardware, software and analytics vendors, and global systems integrators, among others.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver greater time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry, plus top public sector organizations globally, run Cloudera in production.

For additional information, please visit us at: www.cloudera.com

1-888-789-1488 or 1-650-362-0488
Cloudera, Inc. 1001 Page Mill Road, Palo Alto, CA 94304, USA

© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.