Mining Unstructured Text Data for Insurance Analytics

Many advances in information management and data warehousing technologies, together with the low cost of storage, have given insurers the luxury of collecting and storing large volumes of data, both structured and unstructured, gathered through multiple channels in a wide variety of forms. Business intelligence, and data mining in particular, has relied heavily on the analysis of organized, labeled data (otherwise called structured data). Modeling and analysis are easy when data is structured. To stay ahead in the current competitive market, insurance decision makers delve into this data for business intelligence.

Unstructured data collected through the different channels is in the form of free text. Though extracting useful information from unstructured data is difficult, going the extra mile will provide organizations with valuable insights for developing business strategies. Because of the huge volume of data, manual processing is not a viable option. Organizations are turning to text mining tools to extract information from unstructured free-form text. The compelling benefits of text mining, such as improved efficiency in insurers' capital management, exploration of new risks, and fraud detection and tracking, can help insurers focus on providing superior service to their customers.

This white paper provides an overview of text mining and the processes involved, with its usage in the insurance industry as the backdrop. The paper also discusses TCS capabilities and offerings in text mining.
About the Authors

Sitikantha Kar
Sitikantha Kar has 12+ years of experience in several facets of IT software development, spanning application development, project management and program management. Sitikantha has extensive experience in the insurance industry's core business functions, encompassing insurance policy administration systems, claims and reinsurance applications. The author can be contacted at kar.sitikantha@tcs.com

Feroz Khan A L
Feroz is a solution architect in the Insurance vertical of TCS. He has more than nine years of experience in several facets of IT software development, spanning application development, project management and program management. He has extensive experience in the insurance industry's core business functions and in large-scale transformation programs. The author can be contacted at feroz.khan@tcs.com
Table of Contents

- Introduction
- Text Mining for Insurance
- Mining of Claims Documents: an example of text mining in insurance analytics
- Text Mining Complements Data Mining
- TCS Capabilities
- Text Mining Experience in Insurance
- Summary
- References
Introduction

Insurance companies have been increasingly investing in analytical tools and techniques to obtain value-added information that can augment their decision-making process. Until recently, Business Intelligence (BI) applications were confined to producing reports on insurance products, claims, distributors and overall operational aspects. These constraints were overcome by the use of data mining. Economic crises, the scale of operations, the lack of skilled resources and the maturity of data analytics techniques have fuelled the growing implementation of data mining techniques. Companies use data mining for key decision-making processes and for trend and pattern analysis. Data mining on structured data has helped companies in areas such as fraud detection, market analysis, claims trends, and product and pricing analysis.

The success of data mining depends mainly on the quality and coverage of structured data. When large volumes of data are analyzed, quality and coverage pull in opposite directions: as coverage increases, the quality of the data mining suffers. It is impractical to predict and capture all the key information in structured data. It is also a published fact that insurance companies capture a large amount of unstructured data, the data volume ratio being 80:20 in favor of unstructured data [1, 2].

Text mining is an important technique that can harvest vital and useful information from unstructured data. Unstructured data is collected through different channels and is usually in the form of free text: adjuster notes, customer service feedback, and so on. More importantly, unstructured data captures critical information such as loss descriptions, events prior to the loss, specific incidents and critical remarks by reviewers. In addition, mining unstructured data makes it possible to discover unknown or unidentified risks. For example, a leading insurer used text mining techniques to investigate a sudden surge in homeowner claims in a particular area and found a new risk due to mold in that area.
Insurance companies have realized the importance of exploring the information in unstructured text and have started using text mining tools to enhance their data analytics capabilities. In the insurance industry, the importance of text mining is increasing because of the large volumes of unstructured data captured. This white paper provides an overview of text mining with examples specific to the insurance industry. The paper also provides information on the text mining capabilities and offerings of TCS.
Text Mining for Insurance

Text mining combined with data mining has a wide range of applications in the insurance domain. This section briefly describes the application areas where text mining can play a critical role.

Figure 1: Application Areas (insurance text mining at the center, surrounded by content management, market research, competition analysis, fraud detection, CRM, lead generation, new product offering and risk management)

One of the most extensively used application areas of text mining is customer service. In today's competitive market, retaining customers is a major challenge, and insurance companies have taken many initiatives to understand and serve customers better. Most of the data related to customer service is unstructured, such as voice and email conversations, chat transcripts and so on. Text mining tools can help analyze large amounts of such data to identify key customer concerns, customer preferences and so on. Customer analysis and customer-specific products are keys to business growth. Many insurance carriers are using sophisticated analytical tools for customer profiling, customer-specific products and campaign management.

In addition, a vast amount of information is available online, such as research documents, competitor analysis reports, patent applications and industry-specific information, that is critical for business decisions. Text mining of all this data can provide very useful information on future market trends, business environments, customer behavioral patterns and so on. This information can accelerate new product development and implementation, enhancing the company's competitive advantage.

Text Mining for Insurance: Claims

Many insurance companies have started using text mining primarily to enhance their fraud tracking and detection capabilities. Other key areas are claims cost management, risk management and loss reserving.

Fraud Detection

A major area of financial loss for insurers is insurance fraud. Data mining techniques are deployed extensively for detecting fraud cases. Nevertheless, fraudsters find new ways to sneak through the system.
Insurers find it expensive to remodel and rebuild their systems to track new frauds. Text mining can help here by mining the unstructured text data for fraud patterns. In many cases, the information contained in adjusters' notes and referrals has proved critical in identifying a fraud case. For example, a claimant may provide an unusual level of detail about the accident, road conditions, weather and so on. This can indicate a rehearsed account, which makes the claim a highly suspicious candidate for fraud.
Claims Cost Management

Claims cost management is highly significant because of its direct impact on the customer relationship. Claims costs are affected by the level of investigation, supply chain cost, amount of payout and speed of service. Although claims costs are calculated across all claims, in reality only a small percentage of all liability and workers' compensation claims drive the majority of the costs. Insurance carriers are increasingly concerned about this and are making efforts to bring it under control. For example, one of the focus areas in workers' compensation is early detection of the right treatment approach for an injury.

Text mining can be of immense help in identifying treatment patterns, as most treatments and medical reviews are recorded in text documents. In handling a workers' compensation claim, a dominant treatment pattern can be found for a particular type of injury. Information such as early intervention by medical specialists, diagnosis and rehabilitation procedures can be effective in reducing delays in handling the claim and in containing the claim cost. The injured employee also receives prompt and quality care. Speeding up the claims process can increase the employee's chance of making a full medical recovery and returning to work.

Risk Management

The current economic downturn and regulatory rules such as Solvency II have compelled insurers to take a fresh look at their risk mitigation strategies. The Solvency II framework provides incentives for a good risk management system. Text mining can help extract trends in events and explore new, unknown risks or the effects of known risks. This information can help in formulating a risk mitigation strategy. Improvements can also be made to risk rule engines and application codes based on frequency analysis of the risks identified by text mining. For example, many claims related to back injuries are reported from the construction industry due to heavy lifting or long-duration work.
Practices that prevent these kinds of injuries, or that improve underwriting for these coverages, can help insurance carriers manage their claims.

Loss Reserve

In loss reserving, there is an opportunity for carriers to use predictive modeling along with text mining to implement more timely and effective reserving practices. By identifying and analyzing loss patterns, models can help risk managers set initial case reserves more quickly and accurately and make informed reserve adjustments as needed. Risk managers can also use predictive analytics to develop pricing structures based on the forecasted risk level of a claim, as well as to reduce the overall financial impact of claim escalation.

Mining of Claims Documents: an example of text mining in insurance analytics

This section explains the text mining process as applied to claims documents. Since text mining is computationally intensive, it is important to consider both the quality and quantity of the documents and data involved. Initial filtering of irrelevant and poor-quality documents can improve both processing time and the quality of the results. The steps involved in the text mining process are depicted in Figure 2.
Figure 2: Text Mining Process (critical processes: collect documents, extract document features, search patterns and trends, query and browse; generic system architecture: document fetching, preprocessing tasks for feature/term extraction, processed documents that are categorized and keyword-labeled, text mining algorithms for identification and trend analysis, and browsing through queries, search and graphics; NLP components: POS tagging, parsing of noun phrases, constituency and dependencies, and information extraction covering entity and relation extraction and co-reference resolution)

Pre-Processing and Concept Extraction

Textual information usually contains descriptions entered by call centre operators, diary notes by adjusters, and comments associated with individual claims and/or cases. In addition, text data may be highly fragmented and filled with numerous abbreviations and acronyms. Such unstructured text data can be handled for mining purposes in the following two ways:

- Use abbreviations or acronyms as text patterns while defining rules for extracting information from the text
- Expand words before the categorization/information extraction

These words can be expanded by using dictionaries augmented manually with acronyms specific to an individual or company.

The first step is tokenization, which involves breaking the text into sentences and words. The main challenge in identifying sentence boundaries in English is distinguishing between a period that signals the end of a sentence and a period that is part of a preceding token such as Mr., Dr., and so on.

The next preprocessing step is feature selection. Most words are irrelevant for information analysis and need to be removed to avoid unnecessary processing. Most systems at least remove the stop words: the function words and, in general, the common words that usually do not contribute to the semantics of the documents and bring no real benefit. This may even improve performance owing to noise reduction.
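The preprocessing steps described above (sentence splitting that tolerates titles such as Mr. and Dr., expansion of abbreviations from a manually maintained dictionary, and stop-word removal) can be illustrated with a minimal Python sketch. The abbreviation dictionary, the title list and the stop-word list below are hypothetical placeholders, not taken from any TCS tool; a production system would use far larger, domain-specific resources.

```python
import re

# Hypothetical abbreviation dictionary; in practice it is augmented
# manually with acronyms specific to an adjuster or company.
ABBREVIATIONS = {"clmt": "claimant", "veh": "vehicle", "inj": "injury"}

# Tokens whose trailing period does not end a sentence (illustrative list).
NON_TERMINAL = {"mr.", "mrs.", "ms.", "dr."}

# Small illustrative stop-word list.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "with",
              "from", "was", "is", "his", "he", "her", "she"}


def split_sentences(text):
    """Split on ., ! or ? unless the token is a known title such as 'Mr.'."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in NON_TERMINAL:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Lower-case the sentence and return its word tokens (hyphens kept)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def preprocess(note):
    """Return one list of expanded, stop-word-free tokens per sentence."""
    result = []
    for sentence in split_sentences(note):
        words = [ABBREVIATIONS.get(word, word) for word in tokenize(sentence)]
        result.append([word for word in words if word not in STOP_WORDS])
    return result


note = "Mr. Bob, the clmt, met with an accident. His veh was hit from behind."
print(preprocess(note))
```

The sketch takes the second of the two options listed above: abbreviations are expanded before any further extraction, so later rules only need to know the full words.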
Generalized versus Specialized Background Knowledge

The decision to use generalized or specialized background knowledge for preprocessing is critical. The most commonly used source of generalized background knowledge is WordNet, which encompasses taxonomies, lexicons and an ontology for the English language. Simpler, informal taxonomies or lexicons based on general-use knowledge sources such as commercial dictionaries, encyclopedias, fact books or thesauri are also used. Specialized background knowledge originates from the particular domain or problem area. Ontologies specific to a particular area are commercially available.

Information related to a claim in an adjuster's note or diary is highly constrained in terms of vocabulary. Most of the entities are generic in nature and can be handled using a commercial dictionary. Taxonomies can be leveraged for specific information such as injuries and related treatments, standard treatments and diagnoses. Additionally, there are terms used by specific adjusters or insurance companies. These entities are small in number and can be handled by augmenting the dictionary or by using a separate manual dictionary.

Natural Language Processing: Statistical versus Rule-Based Techniques

One of the most widely used options for Natural Language Processing (NLP) relies on statistical techniques, which process the words found in texts. These techniques require training materials that provide examples for identifying key entities, their relations and dependencies. Any change to the training documents or the algorithm requires retraining. Another approach to natural language processing is rule-based techniques that leverage knowledge sources such as ontologies and taxonomies, both generalized and domain-specific. The choice of technique often depends on the availability of training materials and external resources, and on the actual text analysis tasks required in the resulting applications. The underlying information in claims documents can fall into generic and specific categories.
For example:

- Generic: types of loss and their causes
- Specific: input from experts, new types of loss or techniques to handle a loss

Rule-based techniques are more suitable for handling generic information. A statistical approach additionally lets users define the concepts they are interested in. Therefore, both rule-based and statistical approaches are suitable for handling claims documents. For defining rules, an existing external or internal framework should be leveraged to speed up the process. Too many rules can complicate rule maintenance and degrade system performance. The rules can be defined declaratively using DIAL. DIAL is presented as a rule-based information extraction language in which the pattern-matching elements are either explicit strings found in the text (such as the word expression), a word class (a specific set of lexical terms), or another rule.

Performance Measures

The most common performance measures are the classic information retrieval measures of recall and precision:

- Recall for a category is the percentage of correctly classified documents among all documents belonging to that category
- Precision is the percentage of correctly classified documents among all documents that are assigned to the category by the classifier
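The two definitions above translate directly into a few lines of Python. The following is a minimal sketch; the labels and the eight example documents are hypothetical and serve only to show how the two measures are computed for one category.

```python
def precision_recall(actual, predicted, category):
    """Return (precision, recall) for one category.

    actual and predicted are parallel lists holding the true and the
    classifier-assigned category of each document.
    """
    correct = sum(1 for a, p in zip(actual, predicted)
                  if a == category and p == category)
    assigned = sum(1 for p in predicted if p == category)   # classifier said so
    belonging = sum(1 for a in actual if a == category)     # truly in category
    precision = correct / assigned if assigned else 0.0
    recall = correct / belonging if belonging else 0.0
    return precision, recall


# Hypothetical labels for eight claim documents.
actual = ["fraud", "fraud", "fraud", "genuine",
          "genuine", "genuine", "genuine", "fraud"]
predicted = ["fraud", "fraud", "genuine", "fraud",
             "genuine", "genuine", "genuine", "genuine"]

precision, recall = precision_recall(actual, predicted, "fraud")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the classifier assigns three documents to the fraud category, two of them correctly, while four documents truly belong to it, so precision is 2/3 and recall is 2/4.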
There is always a tradeoff between these two parameters. Achieving high accuracy for some entities requires a large number of rules or training documents, which affects cost and effort. In claims processing, the main goal is to retrieve information that can aid the decision-making process. Substantial information is already available in the structured data, so very high precision may not be required. It is nevertheless important to define a threshold in order to avoid extracting wrong information.

Mining and Linking Operation

Mining capabilities support a large range of query types, including the full range of distribution, proportion, frequent-set and association queries. Commercial software offers several preconfigured query formats that are used very often during analysis activities. Some of these queries are:

- Frequency distribution of entities
- Distribution of associations between entities
- Trend analysis
- Assignees that appear together on the same patent (indicating a joint venture, joint development, partnership, or other corporate relationship)

The mining operation supports a wide range of constraints on all of its query types, including typical background, syntactical, quality-threshold and redundancy constraints. It also supports time-based constraints on many queries, allowing variations on trend analysis queries and flexibility in comparing distributions over different time-based events or facts.

Linking creates links between entities, based on the outcome of the preprocessing stage, either by using co-occurrence information (within a lexical unit such as a document, paragraph, or sentence) or by using the semantic relationships between the entities as extracted by the information extraction module (such as injuries from certain events, loss patterns from certain locations or industries, and so on).

Algorithms

Categorization, information extraction and pattern matching use various mathematical models, including statistical, neural network, decision tree, rule-based and genetic algorithms.
A comparison among them is nearly impractical, though there are some preferred approaches for specific processes and contexts. The following table details the algorithms.

Process step: Categorization (supervised)
  Algorithms: Support Vector Machine (SVM); CART, C4.5 (decision tree); Ripper (decision rule); Neural Network; kNN (nearest neighbor); Bayesian Logistic Regression; AdaBoost (bagging and boosting)
  Preferred: SVM, AdaBoost, kNN

Process step: Clustering (unsupervised)
  Algorithms: k-means algorithm; Hierarchical Agglomerative Clustering (HAC); EM-based probabilistic clustering
  Preferred: k-means algorithm

Process step: Information extraction
  Algorithms: Hidden Markov Model (HMM); Stochastic Context-Free Grammar; Maximum Entropy Markov Model
  Preferred: HMM

Process step: Linking analysis
  Algorithms: Graph theory concepts
  Preferred: -
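As an illustration of the linking step, in which links are created from co-occurrence within a lexical unit, the following Python sketch counts how often two entities appear in the same unit. It assumes entity extraction has already been done; the entity lists are hypothetical, and the weighted pairs it produces can be read as the edges of the graph used in linking analysis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(units):
    """Count entity pairs that occur together in the same lexical unit.

    units is a list of entity lists, one per document, paragraph or
    sentence. Each pair is stored in alphabetical order so that (a, b)
    and (b, a) are counted as the same link.
    """
    links = Counter()
    for entities in units:
        for pair in combinations(sorted(set(entities)), 2):
            links[pair] += 1
    return links


# Hypothetical entities extracted from three adjuster notes.
units = [
    ["construction", "back injury", "heavy lifting"],
    ["construction", "back injury"],
    ["mining", "back injury"],
]

for (first, second), weight in cooccurrence_links(units).most_common():
    print(f"{first} <-> {second}: {weight}")
```

Links based on semantic relationships, the second option described in the text, would need the output of an information extraction module and are not covered by this sketch.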
Text Mining Complements Data Mining

Data mining is used extensively in the insurance industry for deriving useful information, predicting business outcomes, optimizing operational parameters and so on. One of the deficiencies of structured data is that it does not capture dynamic information. One example is the events leading to a loss. A loss can occur in many ways, and it is nearly impractical to capture all events through structured data. Hence, this information is captured in an unstructured format. Another example is underwriting new lines of business, where the understanding of critical data develops gradually; as a result, much critical data is captured in unstructured form.

Claims Data Mining

In claims, data mining is used extensively to determine loss reserves, forecast future losses, and identify loss patterns, loss trends, fraud and so on. Many predictions, such as the forecast of future losses and fraud detection, depend mostly on models based on statistical analysis of data and on business rules. Therefore, data components play a major role in determining the accuracy of the prediction. A missing data attribute or an error in the data can significantly affect the outcome. For example, a North American insurance company found that its data mining tools were unable to detect the reason for a sudden jump in home claims. After analysis, it was found that one of the data attributes was missing from the structured data but available in the unstructured data. Therefore, it is important to evaluate the unstructured data for critical information. The information extracted through text mining can be used to aid and improve the data mining analysis.

Example 1: Scenario for Auto Accident Fraud

There are many patterns in fraud related to auto claims.
Following is an accident scenario: a person has some prior injuries and does not have the means to treat them, and has collaborated with a practitioner or service provider to obtain money from insurance companies by making false claims. In such cases, the person is likely to submit large medical bills related to lab tests and treatments. To identify the claim as a potential fraud case, a comparison can be made on the following:

- Accident condition versus injury
- Injury versus lab tests and treatments (selected for this example)

Most of the details about lab tests and treatments are available in adjuster notes and scanned medical documents. Text mining can help retrieve the treatment-related information, which can then be combined with structured data such as injury code, age, gender and other information to find potential fraud.

Key Structured Data

1. Claim ID
2. Birth Date
3. Loss Date
4. Loss Description
5. Injury Description
6. Gender
7. Accident Code
8. Body Part Code
9. Injury Code
10. VIN
11. Injury Location
12. Prior Injury
13. Treatment Code
14. Rehabilitation Code
Unstructured Data: Adjuster Notes

Following are a few example notes from one claim.

- 03/01/2010: Mr Bob, the claimant, was going for a long distance drive and met with an accident. His vehicle was hit from behind. The car has a dent and he is having severe pain in the back.
- 04/01/2010: A payment of $10,000 made against medical practitioner fees and a series of tests recommended by the Chiropractor. The tests are X-ray, Scanning, Blood test and so on.
- 04/01/2010: A payment of $5,000 made against treatment related to the Chiropractor. The Chiropractor has recommended continuous Physiotherapy, medication and so on.
- 04/01/2010: Mr John, Lawyer on behalf of the claimant, requested advance payment for the treatment considering the expensive nature of the treatment. He requested an advance payment of $50,000 to cover two months of medical expenses.

Pre-Processing and Concept Extraction

In the initial steps, stop words and commonly used words (not relevant to the target operation) are removed. In this example, concepts related to lab tests and treatments are selected. The extracted concepts can be compared against a table or file containing the valid injury and treatment codes for the insurer (column "Selected"). The concept importance reflects the percentage of claim files that refer to a particular treatment out of the total number of claim files.

Concept Extraction

Extracted concepts: Severe Pain, Chiropractor, X-ray, Physiotherapy, Scanning, Medication, Blood Test. Of these, the lab test and treatment concepts (X-ray, Scanning, Blood Test, Physiotherapy, Medication) appear in the combined table that follows.

Combining with Structured Data for Injury "Back Pain"

Structured data elements: Injury Code, Age, Gender, Payment

Text mining element    Concept importance
X-ray                  100
Scanning               10
Blood Test             50
Physiotherapy          30
Medication             60

These data suggest that scanning is used infrequently for the mentioned injury profile. Its presence in this claim may be due to one of the following reasons:

1. Fraudulent case
2. Person has a prior injury
3. Injury data error
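The concept importance calculation can be sketched in Python as the percentage of claim files that mention a concept. The ten claim files below are hypothetical and were constructed only so that the percentages match the table above; the 20 percent cut-off for treating a concept as unusual is likewise an assumption, not a value from the paper.

```python
from collections import Counter


def concept_importance(claim_concepts):
    """Percentage of claim files that mention each concept.

    claim_concepts maps a claim ID to the set of concepts extracted
    from that claim's notes.
    """
    total = len(claim_concepts)
    counts = Counter(concept
                     for concepts in claim_concepts.values()
                     for concept in set(concepts))
    return {concept: 100.0 * n / total for concept, n in counts.items()}


def unusual_concepts(concepts, importance, threshold=20.0):
    """Concepts in one claim whose importance is below the threshold."""
    return sorted(c for c in concepts if importance.get(c, 0.0) < threshold)


# Hypothetical back pain claims, built to reproduce the table above.
claims = {
    "C01": {"x-ray", "scanning", "blood test", "physiotherapy", "medication"},
    "C02": {"x-ray", "blood test", "physiotherapy", "medication"},
    "C03": {"x-ray", "blood test", "physiotherapy", "medication"},
    "C04": {"x-ray", "blood test", "medication"},
    "C05": {"x-ray", "blood test", "medication"},
    "C06": {"x-ray", "medication"},
    "C07": {"x-ray"},
    "C08": {"x-ray"},
    "C09": {"x-ray"},
    "C10": {"x-ray"},
}

importance = concept_importance(claims)
print(importance)
print(unusual_concepts(claims["C01"], importance))
```

A claim flagged this way is only a candidate for review; the possible explanations listed above still have to be checked.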
A person's prior injury can be verified by checking their medical history. A data error can be re-checked against secured police reports or recorded conversations. If the claim does not fall under reasons 2 and 3, it is most likely a fraudulent case.

Example 2: Pattern Analysis

Back injury is the most common claim in workers' compensation. Most of the injury and treatment data is available in an unstructured format. A pattern analysis helps identify the industries that contribute most and the common treatment methods applied for these injuries. It also helps underwriters and claim adjusters with risk management and with optimizing the treatment plan. Following is an example of trend analysis: back injury for construction industries.

Figure 3: Example of Back Injury Link Analysis (link diagram connecting the industries Construction, Mining, Heavy Industries and Auto to Back Injury, and Back Injury to Treatments 1 to 3 and Hospitals 1 to 3, with counts on the links)

TCS Capabilities

TCS has a dedicated R&D team, which has created an array of toolsets and methodologies for handling different types of applications in the text mining area. TCS has executed many challenging text mining projects for its customers using these in-house tools. The key areas that the TCS tools address are:

- Opinion/sentiment analysis
- Analysis of free text in customer surveys
- Analysis of call centre transcripts
- Analysis of free text in the enterprise
Opinion/Sentiment Analysis

Social networking sites have become an important channel for expressing opinions, exchanging ideas, sharing knowledge and so on. Data from blogs, forums and other social networking sources contains information that can be critical for many companies, for example, opinions on a particular insurance product or on claims services. Many insurers are showing interest in the automatic analysis of these texts and voice records to derive useful information. A TCS tool called Optra offers a wide range of features, including opinion and sentiment analysis along with identification of the causes of the negativity or positivity expressed in the feedback. Following are a few key features:

- Extraction of important entities
- Event identification
- Sentiment analysis
- Opinion tracking
- Trend analysis

Analysis of Free Text in Customer Surveys

Many companies have implemented customer surveys as one of the key methods to better understand their products, employees, partners and so on. Because of the large amount of free-form text collected during surveys, text mining can be used to perform automatic data and pattern analysis. QUEST, a tool developed and offered by TCS, has a wide range of capabilities, including slicing and dicing survey responses to generate reports and charts, grouping textual suggestions and comments into similar classes, discovering common characteristics of employees with high or low satisfaction, and finding hidden dependencies and patterns among responses.

Analysis of Call Centre Transcripts

Analysis of call centre transcripts has become another focus area for many companies seeking to improve help desk productivity and customer satisfaction, identify frequent problem areas and so on.
The TCS tool TAAS offers a rich set of functionalities, including:

- Identifying characteristics of tickets, problems and resources that take time to resolve
- Factors impacting the resolution time
- Top k types of most unsatisfied customers
- Identifying the profile of experts with respect to ticket-resolving capacity
- Assessing the state of the system with key process indicators

Figure 4: TCS Text Mining Tools (Optra, TAAS, Quest and INX). Capabilities shown in the figure: understand customer sentiments and opinions from feedback on forums such as blogs and discussion forums; generate proactive alarms; understand the true voice of the customer; improve productivity in the call center; call center performance; contextual feedback on customer experience of the call center; automation of clustering and segmentation of customers; sentiment analysis and feedback analysis; insightful inferences combined with data mining; auto-summarization of documents and reports for executive-level knowledge sharing; reduced costs and faster knowledge publishing.
Analysis of Free Text in the Enterprise

As mentioned earlier, in the insurance industry critical information is available in unstructured data such as adjuster notes, claims diaries and so on. The INX tool offered by TCS can be used for data extraction from such unstructured data sources. The INX tool has many features, including:

- Support for multiple document formats: Doc, RTF, XML
- Built-in library for pre-processing
- Integration with SharePoint
- User-extensible routines
- Gazette learning functionality
- Cycle management and reporting

Text Mining Experience in Insurance

TCS has considerable experience in deploying text mining for insurers across the globe for various purposes, including:

- Customer survey analytics
- Warranty claims analytics

Customer Survey Analytics

A leading insurance provider in the UK with a huge customer base collected customer feedback to understand the voice of the customer as part of its customer satisfaction improvement initiative. The feedback was captured both as structured attributes, such as serial number and product type, and as free-text attributes, namely customer responses to specific questions. The insurer's business team required quick and deep insight into the customer feedback and wanted to make the knowledge available to operational staff. TCS was assigned the task of analyzing the customer feedback data. The TCS QUEST tool was used to extract insight from the feedback data. TCS followed a phased approach to identify the right tool and subsequently designed the text mining solution.
Following were the key features of the solution:

- Automated clustering that scans all questions and responses, thus unearthing more from the information
- Fewer errors and reduced effort
- Further opportunity to enhance the questionnaire and improve the quality of feedback collected

In addition to achieving a better understanding of the customer, the implementation of the text mining solution provided deeper insight into the following aspects:

- Identification of key product issues mentioned in the text data
- Derivation of a Product Satisfaction Index
- Voice of the customer: key suggestions for improvement of the product
Warranty Claims Analytics

The INX tool has been deployed for customers in the specialty insurance domain. In one case, the customer was concerned about the huge claim payments related to warranty policies and required diagnoses to control the claim cost. As warranty repair records contained rich text data on damages and repairs, the customer wanted to implement text mining to understand the root causes of the most frequent, complex and expensive problems. The INX tool was used to extract the information into a structured repository, and knowledge discovery techniques were then used to find sequences of repair actions and associations. The key benefits of the implementation were:

- Analysis, simplification and optimization (for example, for cost or time) of maintenance procedures
- Representation of best practices for maintenance (for example, as rules or fault trees)
- Training of maintenance personnel towards best practices
- Reducing component replacements
- Improvements in product manufacturing and design

Summary

Text mining is still in its nascent stage but is becoming increasingly popular in many areas of the insurance domain. This is evident from recent trends in which large insurers use text mining for managing claims costs, discovering new risks and understanding customer needs better. The text mining processes discussed in this paper are a window to the world of text mining analytics. Insurers should choose the right analytical techniques and combine them with the power of text mining to achieve optimum results. TCS has the implementation expertise in text mining in the insurance domain that most insurers would look for in software vendors. TCS has implemented text mining products such as INX for many customers in recent years and has strategic alliances with leading analytics vendors such as SAS and IBM. Insurance companies have invested in capturing structured data, which is critical for data mining operations.
Text mining can add analytical value by extracting data from unstructured text, which can augment the core structured data for data mining operations. We believe insurance carriers will increasingly use text mining to augment their data mining capabilities.

References

1. Key Issues for Enterprise Content Management Initiatives, 2009, Gartner, 23 March 2009.
2. Clarabridge Bridgepoints Article, http://clarabridge.com/default.aspx?tabid=137&moduleid=635&articleid=551
About TCS Insurance Solutions Group

TCS Insurance Solutions Group is a strategic team that weaves insurance domain expertise with technology acumen. With decades of experience behind them, the team works on creating customized solutions leveraging TCS capabilities and offerings to achieve customers' business goals. The team delivers business value through the right solutions to a global insurance clientele.

About Tata Consultancy Services (TCS)

Tata Consultancy Services is an IT services, business solutions and outsourcing organization that delivers real results to global businesses, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled services delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has over 160,000 of the world's best trained IT consultants in 42 countries. The Company generated consolidated revenues of over US $6.3 billion for the fiscal year ended 31 March 2010 and is listed on the National Stock Exchange and Bombay Stock Exchange in India. For more information, visit us at www.tcs.com.

Contact: insurance.practice@tcs.com

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright 2010 Tata Consultancy Services Limited