WHITEPAPER
Voice of the Customer: How to Move Beyond Listening to Action
Merging Text Analytics with Data Mining and Predictive Analytics
Successful companies today not only listen to and understand what customers are saying, but also act on that feedback by incorporating the voice of the customer (VOC) into business strategies for sales, marketing and support. Although structured data continues to be the primary source for business intelligence, growing volumes of unstructured data include valuable customer feedback from text-based sources such as call center notes, customer communications, emails, surveys, claims forms and the Web.

Organizations have known for some time that analyzing unstructured data from proprietary internal sources could provide valuable insight for the business. More recently, businesses have also become interested in unstructured data from external Web sources, such as social media posts (including tweets), blogs, news and online forums, as a way to analyze and understand customer feedback.

Listening to customer feedback has been shown to help with brand and reputation management, public relations and customer service. Moreover, taking action based on customer feedback helps to improve business performance across sales, marketing and risk operations. These improvements directly strengthen the bottom line with measurable return on investment.

The ability to combine and transform internal and external sources of structured and unstructured data into actionable information is the key to discovering previously unavailable insights and customer intelligence, and the key to moving beyond listening to action. Business analysts and knowledge workers want to gain and use insights gleaned from customer feedback to address critical business issues such as fraud, customer churn and customer experience management.
Text Analytics + Data Mining + Predictive Analytics

Today, the merging of text analytics with data mining and predictive analytics technologies enables companies to combine and transform VOC data (regardless of data type and source) into actionable information that both predicts outcomes and recommends next best actions and strategies. Moreover, the outcomes of this combined analysis can be made operational within existing business processes to improve business performance and gain competitive advantage.

Data mining and predictive analytics are mainstream technologies used to analyze and model data from different perspectives and summarize it into useful information that can be used to increase revenue, reduce costs or mitigate risk. These technologies are experiencing renewed growth in use and popularity as a result of big data, which requires exceptional technologies to process large quantities of data efficiently within accelerated timeframes.

Text analytics, the process of analyzing unstructured text, extracting relevant information and transforming it into structured information, is becoming a mainstream technology. Factors fueling its adoption include both a maturing of the technology and a better understanding of its business value. The rapid growth of external unstructured data available via the Web and social media is also supporting broader adoption.
The Business Case for Text Analytics

The business case for text analytics is compelling given that an estimated 80% of today's business-relevant data is unstructured or text-based. Gaining insight into this data allows companies to uncover and act on new opportunities to increase sales and profitability through faster, better decisions and improved business processes. Data found in unstructured sources (such as customer correspondence, emails, customer surveys, call center records, blogs and websites) has the potential to reveal information and relationships that were previously invisible to business analysts and knowledge workers.

Other drivers for text analytics include faster time to value and cost reduction. Companies are always looking to get results more quickly, and one obvious way is to use text analytics technology to eliminate the time spent manually culling through text to extract relevant information.

Companies are leveraging text analytics to solve business problems and improve business effectiveness in several ways, including:

- Understanding the voice of the customer
- Improving customer retention
- Predicting and reducing churn
- Identifying and reducing claims fraud
- Developing cross-sell, upsell and next best offer strategies
- Monitoring and analyzing brand reputation

There are numerous approaches available for analyzing unstructured data. These methods draw on statistics, linguistics and other areas of computational science. Specifically, text analytics can be used to extract:

- Entities: things such as people, places, companies and products, for example Barack Obama, New York City, Microsoft or Buick.
- Concepts/Themes: what a document is about in terms of a topic or an idea; generally a noun phrase, for example "wireless promotion".
- Sentiment: a view, feeling or attitude toward a situation or event; an opinion, generally classified as positive, negative or neutral.

This extracted data can then be merged with proprietary structured data (for example, transactional data, revenue, CRM and demographic information) and analyzed using data mining and predictive analytics to determine relationships and trends and to identify predictive patterns.
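To make the extract-and-merge step concrete, the following Python sketch turns free-text notes into structured theme and sentiment fields and joins them onto structured records by customer ID. It is a minimal illustration under stated assumptions, not how any particular product works: the keyword lexicons, the word-count sentiment score and the note text are all hypothetical, and only the customer IDs, minutes and renewal figures come from the churn table later in this paper.

```python
import re

# Hypothetical lexicons; a production system would use trained models or
# curated dictionaries rather than these short illustrative lists.
THEME_LEXICON = {
    "monthly monday madness": "Monthly Monday Madness",
    "competitor y": "Competitor Y",
    "text messaging": "Text Messaging",
}
POSITIVE_WORDS = {"happy", "great", "thanks", "satisfied"}
NEGATIVE_WORDS = {"angry", "unfair", "cancel", "overpaying", "upset"}


def extract(note: str) -> dict:
    """Turn one free-text note into structured fields: themes and sentiment."""
    text = note.lower()
    themes = [label for phrase, label in THEME_LEXICON.items() if phrase in text]
    words = re.findall(r"[a-z']+", text)
    # Naive score: positive word count minus negative word count.
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        sentiment = "Positive"
    elif score < 0:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    return {"themes": themes, "sentiment": sentiment}


def merge(structured: list[dict], notes: dict[str, str]) -> list[dict]:
    """Join the extracted text fields onto structured records by customer ID."""
    merged = []
    for row in structured:
        fields = extract(notes.get(row["customer_id"], ""))
        merged.append({**row, **fields})
    return merged


# Structured fields taken from the churn table; the note text is invented.
customers = [
    {"customer_id": "00001", "monthly_minutes": 1500, "months_to_renewal": 3},
    {"customer_id": "00003", "monthly_minutes": 700, "months_to_renewal": 6},
]
notes = {
    "00001": "Customer is upset: not eligible for Monthly Monday Madness, feels it is unfair.",
    "00003": "Asked how text messaging is billed on the current plan.",
}

for record in merge(customers, notes):
    print(record)
```

A word count like this ignores negation and context (a note saying "not unfair" would still score as negative); it only shows the shape of the output, one structured record per customer that data mining and predictive models can consume.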
Merging unstructured text-based data with structured data lets users include additional predictive variables in their models, which can improve the accuracy and increase the lift (the effectiveness) of those models. This paper explores three use cases that describe business process scenarios and the value of analyzing structured and unstructured data together. The use cases show how text analytics, data mining and predictive analytics can be applied to customer retention, fraud abuse detection and reduction, and segmentation and cross-sell.

Customer Retention Scenario

Business Challenge

A wireless telecommunications company is experiencing high levels of customer churn. Company executives believe that this will hurt their market share and make investors unhappy. Because acquiring new customers costs much more than retaining existing ones, the company needs to understand why customers are leaving and develop a strategy to reduce the churn rate.

In an effort to gain a better understanding, the company has gathered a large amount of structured customer information, including customer ID, number of minutes billed each month, months left on contract, calling plans, promotion response and demographic information. Analysts at the company have tried to find patterns in this data, and while the structured data points to a possible connection between high value customers and disconnect rates, it is not enough to predict which consumers might drop the service.

Solution

The company believes that the information contained in customer call center records might provide further insight into the churn issue. To find answers, analysts at the company extracted certain entities and themes from call center records and combined them with the structured customer data. Suddenly, a pattern emerged.
Structured Data: Customer Information (Customer ID, Monthly Minutes, Months to Renewal)
Unstructured Data: Call Center Records (Themes, Sentiment)

Customer ID | Monthly Minutes | Months to Renewal | Themes                 | Sentiment
00001       | 1500            | 3                 | Monthly Monday Madness | Negative
00002       | 1400            | 2                 | Competitor Y           | Negative
00003       | 700             | 6                 | Text Messaging         | Neutral
00004       | 1600            | 2                 | Monthly Monday Madness | Negative
00005       | 2400            | 3                 | Monthly Monday Madness | Neutral
The table above illustrates this concept. It contains a small sample of a single day's data from one call center representative's records of customers who have recently disconnected their service. The structured data (monthly minutes, months to renewal) suggests that customers with high monthly minutes and little time left on their contracts are disconnecting, but the data alone doesn't provide any clues as to why.

A pattern emerged after the analyst combined this structured data with the unstructured text from the call center records. The unstructured data was obtained by using text analytics across call center and email interaction notes to extract themes and sentiment. Various entities and noun phrases were extracted and grouped under the heading of themes; these included Monthly Monday Madness, Text Messaging and Competitor Y. Text analytics was also used to associate a sentiment (positive, negative or neutral) with each call center record.

From this analysis, it became clear that the customers were interested in a very attractive promotional offer (called Monthly Monday Madness) for which, as existing customers, they were not eligible. The wireless company had been aggressively promoting Monthly Monday Madness to acquire new customers. As a result, existing high value customers were angered and assumed that they were overpaying for their service. Not surprisingly, the sentiment associated with the Monthly Monday Madness inquiries is largely negative.

Note that the Monthly Monday Madness theme appears several times a day in a single representative's notes. An individual representative might not notice a pattern, because each one handles a large number of calls every day. However, when all of the call center records are pooled across all of the representatives, a pattern emerges.
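Pooling records and counting is what turns scattered notes into a visible pattern. The sketch below is a minimal example, assuming each representative's records are already tagged with a theme and a sentiment: it combines per-representative logs and ranks themes by frequency. The five records are the ones from the table above; the helper names `pool`, `top_themes` and `theme_sentiment` are hypothetical.

```python
from collections import Counter
from itertools import chain

# One representative's tagged records, taken from the table above.
rep_a = [
    {"customer_id": "00001", "theme": "Monthly Monday Madness", "sentiment": "Negative"},
    {"customer_id": "00002", "theme": "Competitor Y", "sentiment": "Negative"},
    {"customer_id": "00003", "theme": "Text Messaging", "sentiment": "Neutral"},
    {"customer_id": "00004", "theme": "Monthly Monday Madness", "sentiment": "Negative"},
    {"customer_id": "00005", "theme": "Monthly Monday Madness", "sentiment": "Neutral"},
]


def pool(*rep_logs):
    """Combine the record lists of any number of representatives into one list."""
    return list(chain.from_iterable(rep_logs))


def top_themes(records, n=3):
    """Rank themes by how often they occur across the pooled records."""
    return Counter(r["theme"] for r in records).most_common(n)


def theme_sentiment(records):
    """Count each (theme, sentiment) pair so negative clusters stand out."""
    return Counter((r["theme"], r["sentiment"]) for r in records)


# Other representatives' logs would be passed to pool() alongside rep_a.
all_records = pool(rep_a)
print(top_themes(all_records))
print(theme_sentiment(all_records)[("Monthly Monday Madness", "Negative")])
```

With a single representative's sample the counts are small; the point of `pool` is that running the same count over every representative's records surfaces a theme that no single person would notice.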
Using this historical data together with the text analyzed from the call center notes, the company determined that existing customers with high usage and little time left on their contracts should be offered the Monthly Monday Madness promotion.

Results

By merging the unstructured call center data with existing structured customer information, the wireless telecommunications company was able to build a predictive model that looked something like this:

There is a greater probability that customers using more than 1400 minutes a month, with less than 3 months remaining on their contract, will drop the service (churn) if not allowed to participate in the Monthly Monday Madness promotion.

Moving forward, the company can operationalize this model (that is, make it part of a business process) by developing strategic models with embedded rules that provide a specific treatment to customers who meet these criteria. This would allow the company to reduce the number of customers who leave the service by proactively offering them an enticing promotion.
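Operationalizing the model means encoding it as a rule that a business process can call. The sketch below is one hypothetical way to express the stated model as a Python function; the thresholds mirror the wording above (more than 1400 minutes, less than 3 months remaining), while the `Subscriber` class and `offer_promotion` function are illustrative names, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    customer_id: str
    monthly_minutes: int
    months_to_renewal: int


# Thresholds taken from the model statement: more than 1400 minutes a month
# and less than 3 months remaining on the contract.
MIN_MINUTES = 1400
MAX_MONTHS = 3


def offer_promotion(s: Subscriber, min_minutes: int = MIN_MINUTES,
                    max_months: int = MAX_MONTHS) -> bool:
    """Hypothetical embedded rule: True if the subscriber should get the offer."""
    return s.monthly_minutes > min_minutes and s.months_to_renewal < max_months


# Structured fields from the churn table above.
subscribers = [
    Subscriber("00001", 1500, 3),
    Subscriber("00002", 1400, 2),
    Subscriber("00003", 700, 6),
    Subscriber("00004", 1600, 2),
    Subscriber("00005", 2400, 3),
]

targets = [s.customer_id for s in subscribers if offer_promotion(s)]
print(targets)  # ['00004']
```

Applied to the five sample records, the strict comparisons select only customer 00004. Keeping the thresholds as parameters lets analysts tune them; `max_months=4`, for instance, would also select 00001 and 00005.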
Fraud Abuse Detection and Reduction Scenario

Business Challenge

The property division of an insurance company is concerned about the amount of money it is losing on potentially fraudulent property and casualty claims. Company executives know they could lower customer premiums if the company paid out less on fraudulent claims. The company is concerned about both hard fraud (deliberate fraud) and soft fraud (where a relatively honest person exaggerates a claim). It has collected information on claim types, amounts paid and claimant demographics, but this structured customer information is not enough to determine whether a claim is suspicious.

Solution

The company decided to build a model using unstructured, text-based data extracted from claims notes that could compare the attributes of a given claim against known fraud indicators, such as whether the same contractor was used multiple times by a specific adjustor or whether different variations of a name appear in different claims. The company's business analysts know they can use text analytics to extract entities such as contractor name or claimant name from claims notes and merge them with structured customer information to help determine which claims should be sent to the special investigation unit.

For example, the table below illustrates a sample of both structured data (claimant ID, claim code, claim amount) and unstructured data (contractor name). The contractor name is an entity that was extracted from claims notes using text analytics. The table shows two adjustors (11 and 12) and some of their claims for claim code 124 (in this case, ice damming repair). Claim amount (structured data) is, on average, lower for adjustor 12 than for adjustor 11, which is clear from the structured data alone.
Structured Data: Customer Claim Information (Claimant ID, Adjustor ID, Claim Code, Claim Amount)
Unstructured Data: Claims Notes (Contractor)

Claimant ID | Adjustor ID | Claim Code | Claim Amount | Contractor
12345       | 11          | 124        | $4,000       | Ed Home Repair
12346       | 12          | 124        | $3,000       | Hat Construction
12347       | 11          | 124        | $3,500       | Ed Home Repair
12348       | 12          | 124        | $2,500       | M&M Contractors
12349       | 11          | 124        | $4,000       | Ed Contractors
12350       | 12          | 124        | $1,500       | R&R
12551       | 11          | 124        | $3,000       | Ed & Sons Home Repair

Additionally, using historical data to look at the range, average and median payouts for ice damming repairs in a particular geography, analysts could see that adjustor 11's payouts have been higher, which is enough to flag these payouts.
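Both fraud indicators described above can be computed directly from the table. The Python sketch below assumes the claims are already joined with the extracted contractor entity; it summarizes payouts per adjustor and flags contractor-name roots that an adjustor uses on more than one claim, listing the distinct spellings. Treating the first word of a contractor name as its root is a deliberately crude, hypothetical normalization (a real system would more likely use fuzzy name matching), and because the geography benchmarks are not in the table, the sketch only compares the two adjustors with each other.

```python
import statistics
from collections import defaultdict

# Claim code 124 (ice damming repair) rows from the table above:
# (claimant ID, adjustor ID, claim amount in dollars, extracted contractor name)
claims = [
    ("12345", 11, 4000, "Ed Home Repair"),
    ("12346", 12, 3000, "Hat Construction"),
    ("12347", 11, 3500, "Ed Home Repair"),
    ("12348", 12, 2500, "M&M Contractors"),
    ("12349", 11, 4000, "Ed Contractors"),
    ("12350", 12, 1500, "R&R"),
    ("12551", 11, 3000, "Ed & Sons Home Repair"),
]


def payout_stats(rows):
    """Range, mean and median payout per adjustor (structured data only)."""
    amounts = defaultdict(list)
    for _, adjustor, amount, _ in rows:
        amounts[adjustor].append(amount)
    return {
        adj: {"min": min(a), "max": max(a),
              "mean": statistics.mean(a), "median": statistics.median(a)}
        for adj, a in amounts.items()
    }


def name_root(contractor):
    """Hypothetical, crude normalization: first word of the name, lowercased."""
    return contractor.split()[0].lower()


def contractor_flags(rows):
    """Flag (adjustor, name root) pairs seen on more than one claim, with spellings."""
    uses = defaultdict(list)
    for _, adjustor, _, contractor in rows:
        uses[(adjustor, name_root(contractor))].append(contractor)
    return {
        key: {"claims": len(names), "spellings": sorted(set(names))}
        for key, names in uses.items()
        if len(names) > 1
    }


print(payout_stats(claims))
print(contractor_flags(claims))
```

On the sample, adjustor 11's mean payout is $3,625 against roughly $2,333 for adjustor 12, and the root "ed" is flagged for adjustor 11 on four claims under three spellings, matching the pattern the analysts found.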
However, it was the unstructured data that gave the analysts stronger evidence of potential fraud. The name Ed (an entity) appears multiple times for adjustor 11 in several forms, suggesting suspicious activity: Ed Home Repair, Ed Contractors and Ed & Sons Home Repair.

Results

By merging the unstructured claims notes with the structured customer claim information, the insurance company was able to build a predictive model that looked something like this:

If payouts for claim code 124 in geography A exceed $2,500, flag the claim for the special investigation unit.

This predictive model can be made operational as part of the standard claims processing function. In addition, using the structured data (the higher average claim amounts) together with the unstructured text extracted from the claims notes (the variations on the name Ed) allows the analyst to pinpoint and flag potential problems with particular adjustors.

Segmentation and Cross-sell Scenario

Business Challenge

Banks typically place a high priority on customer satisfaction in order to improve retention and to sell additional financial products and services to existing customers. Regardless of the interaction channel (online, face-to-face, call center and so on), banks have realized that they need deep customer insight across all touch points to improve customer service levels and thus increase the likelihood of selling additional products to existing customers.

A large bank is collecting structured attributes such as types of accounts, dollars in each account, dollar flow through each account and demographic information. Using this structured data, the bank has created basic customer segments according to customer worth (high, mid and low value), but it wants to create more targeted segments in order to provide higher value cross-sell opportunities.
These new segments might target lifestyle changes such as the birth of a child or a child going off to college. The bank recently implemented an online customer support function to better understand and segment its customers and to provide them with information on new products and services that meet their needs. This allows the bank to monitor the voice of the customer via online social channels such as Twitter, blogs and forums. From both the internal unstructured customer feedback and the external voice of the customer text, the bank gained insight into negative sentiment associated with its brand, specifically related to overdraft protection.
The bank decided to use this unstructured voice of the customer information together with internal customer data to define and target new customer segments for cross-sell opportunities, and in doing so to improve the customer experience and its brand reputation.

Solution

The table below shows sample records of customers who have student debit card accounts linked to their customer ID and who have recently left the bank. The appearance of a student debit account linked to an existing mid/high value customer ID represents a lifestyle change: the customer now has older children who are starting to manage their own finances. New student debit accounts are often linked to the parent's customer ID.

From the structured customer information (average balance, online transfers, overdraft fees, types of accounts), it can be seen that high net worth customers who have student debit accounts linked to their customer ID, and who were making a number of online transfers each month into those accounts, had recently left the bank. This pattern alone is enough to create a segment of high value parent customers.

Structured Data: Customer Banking Information (Average Balance, Online Transfers, Overdraft Fees/Month, Types of Accounts, Left Bank?)
Unstructured Data: Online/VoC (Sentiment, Theme)

Customer ID | Average Balance | Online Transfers | Overdraft Fees/Month | Types of Accounts                | Left Bank? | Sentiment | Theme
1234        | $20,000         | 10 per month     | $30                  | Checking, Student Debit          | Yes        | Negative  | Credit card, overdraft protection
1333        | $5,000          | 1 per month      | $0                   | Checking, Student Debit          | Yes        | Neutral   | Debit card, overdraft protection
1444        | $15,000         | 5 per month      | $10                  | Checking, Student Debit          | Yes        | Negative  | Overdraft protection
1333        | $10,000         | 6 per month      | $40                  | Checking, Savings, Student Debit | Yes        | Negative  | Credit card, overdraft protection

The potential cross-sell and upsell opportunities lie in analyzing the unstructured data (theme and sentiment), which the bank has extracted from its online customer support interactions as well as from external voice of the customer data.
The Theme column shows that this segment of high value parent customers who recently left the bank was consistently asking about overdraft protection in online customer support interactions (perhaps in regard to the new student debit account linked to their customer ID). The extracted sentiment makes clear that this segment of the bank's customers was not happy with the available overdraft protection.
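The segment and the targeting rule can be sketched in a few lines. The Python below encodes the four records from the table and selects customers who fall in the high value parent segment and also express negative sentiment about overdraft protection. The balance and transfer cut-offs are hypothetical values chosen for illustration, since the text gives no numbers; because the table lists customer ID 1333 twice, the records are kept in a list rather than keyed by ID.

```python
from dataclasses import dataclass


@dataclass
class BankCustomer:
    customer_id: str
    average_balance: int            # dollars
    transfers_per_month: int        # online transfers
    overdraft_fees_per_month: int   # dollars
    account_types: tuple[str, ...]
    sentiment: str                  # extracted from online/VoC text
    themes: tuple[str, ...]         # extracted from online/VoC text


# Rows from the table above; all four customers left the bank.
customers = [
    BankCustomer("1234", 20000, 10, 30, ("Checking", "Student Debit"),
                 "Negative", ("Credit card", "Overdraft protection")),
    BankCustomer("1333", 5000, 1, 0, ("Checking", "Student Debit"),
                 "Neutral", ("Debit card", "Overdraft protection")),
    BankCustomer("1444", 15000, 5, 10, ("Checking", "Student Debit"),
                 "Negative", ("Overdraft protection",)),
    BankCustomer("1333", 10000, 6, 40, ("Checking", "Savings", "Student Debit"),
                 "Negative", ("Credit card", "Overdraft protection")),
]

# Hypothetical cut-offs for "high value" and "a number of transfers".
MIN_BALANCE = 10000
MIN_TRANSFERS = 5


def in_parent_segment(c: BankCustomer) -> bool:
    """High value parent: linked student debit account plus frequent transfers."""
    return ("Student Debit" in c.account_types
            and c.average_balance >= MIN_BALANCE
            and c.transfers_per_month >= MIN_TRANSFERS)


def unhappy_with_overdraft(c: BankCustomer) -> bool:
    """VoC signal: negative sentiment together with an overdraft protection theme."""
    return c.sentiment == "Negative" and "Overdraft protection" in c.themes


def cross_sell_targets(rows):
    """Business rule: customers who should receive the overdraft protection offer."""
    return [c for c in rows if in_parent_segment(c) and unhappy_with_overdraft(c)]


for c in cross_sell_targets(customers):
    print(c.customer_id, c.average_balance)
```

Splitting the rule into a structured segment test and a text-derived signal mirrors the paper's approach: the structured data defines who the customers are, and the extracted theme and sentiment explain what they want.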
Results

By merging the unstructured online customer support data and external voice of the customer information with structured customer account information, the bank was able to build a predictive model that looked something like this:

There is a greater probability that high value parent customers who have recently linked a student debit account to their customer ID will leave the bank if not offered better overdraft protection via a cross-sell offering.

The bank can then operationalize this model by developing a special cross-sell overdraft protection offer for this high value parent customer segment. The offer provides increased overdraft protection for a nominal annual fee, which can be billed directly to the parent's credit card or checking account. Based on this insight, the bank could put an additional business rule in place that identifies this segment and automatically offers it this overdraft protection.

Conclusions

Analyzing unstructured, text-based data represents a huge opportunity for companies to gain better insight and serve their customers better. Text data extracted from call center notes, emails, social media and a myriad of other customer touch points can be analyzed and combined with structured data to increase the accuracy of predictive models and the effectiveness of data mining and business analytics. Leading companies are already deploying this technology, which is rapidly moving out of the early adopter phase and becoming mainstream. The value proposition is simply too compelling to ignore.

Companies today listen to, understand and respond to customer feedback by incorporating the voice of the customer into business strategies for sales, marketing and support. Business analysts and knowledge workers want to gain and use insights gleaned from customer feedback to address critical business issues such as fraud, customer churn, segmentation and customer experience.
The merging of text analytics with data mining and predictive analytics technologies enables companies to combine and transform VOC data (regardless of data type and source) into actionable information that both predicts and recommends next best actions and strategies. The outcomes of this merged and advanced data analysis can be made operational within existing business processes for business performance improvement and competitive advantage. Taking action based on customer feedback helps to improve business performance across sales, marketing and risk operations, and these improvements directly bolster the bottom line.
About Angoss Software

As a global leader in predictive analytics, Angoss helps businesses increase sales and profitability and reduce risk. Angoss helps businesses discover valuable insight and intelligence from their data while providing clear and detailed recommendations on the best and most profitable opportunities to pursue to improve sales, marketing and risk performance. Our suite of desktop, client-server and big data analytics software products and Cloud solutions make predictive analytics accessible and easy to use for technical and business users. Many of the world's leading organizations use Angoss software products and solutions to grow revenue, increase sales productivity and improve marketing effectiveness while reducing risk and cost.

Corporate Headquarters
111 George Street, Suite 200
Toronto, Ontario M5A 2N4
Canada
Tel: 416-593-1122
Fax: 416-593-5077

European Headquarters
Surrey Technology Centre
40 Occam Road
The Surrey Research Park
Guildford, Surrey GU2 7YG
Tel: +44 (0) 1483-685-770

www.angoss.com

Copyright 2012. Angoss Software Corporation