The Business Case for Using Unstructured Text Analytics on IBM Power Systems for Critical Decision Making

Stephen Markham, PhD
Michael Kowolenko, PhD
Poole College of Management
North Carolina State University

"It's the question, not the technology." (Michael Kowolenko)

Big data promises to dramatically alter the business environment; however, technology is only a decision enabler. Many firms apply structured approaches to big data for routine operational decisions. Making unprogrammed, critical and strategic decisions, however, usually involves unstructured data. This paper describes the business value of using big data techniques to gather and analyze unstructured data. This approach uses critical thinking embedded in a process, along with advanced servers and software, to convert big data into business value. The process and results open new opportunities that require adaptive structures, culture and expertise to fulfill the promise of big data.

This article has three purposes: first, to explain the difference in how structured and unstructured data are used in decision making; second, to give examples of how companies realized business value using unstructured data; third, to provide guidance on how companies can realize similar business value using unstructured data, and why picking the right servers and software can make a difference.
1. The difference in how structured and unstructured data are used in decision-making

Unstructured text comprises about 80% of the data available today. Firms that only use structured data miss benefitting from the majority of available information. Unstructured data includes all the text contained in government reports from the SEC, NIH, NSF and DOE, as well as all academic research, business and financial analysts' reports, consultant research results and many other sources. Unstructured text is also found in a myriad of social media outlets and other channels such as Facebook, blogs, customer complaint logs, Twitter, news transcripts, popular press, specialty magazines and many others. Contained within unstructured data are customer needs, competitor actions, emerging trends, and other individual pieces of information necessary to make critical business decisions.

Structured approaches to big data gather and aggregate rows and columns of numbers to inform decision makers. Large sets of numbers are analyzed using advanced statistical techniques to reveal valuable patterns in the data. These techniques allow decision makers to see what has happened or what is happening in real time. Analyzing data that can be added, subtracted, multiplied and divided relies on a structured approach to aggregate the data and is essential for making operational decisions such as pricing, distribution, and inventory.

Unstructured approaches, on the other hand, seek to isolate critical pieces of information. For example, unstructured big data analysis can find announcements that a competitor is building a new facility, or that a customer is expanding operations. This gives decision makers time to react before the structured data reveals a decrease in sales revenue. To reliably find a needle in a haystack, one must use big data to gather vast amounts of unstructured text and use specialized programs to look for specific pieces of information across tens of millions of documents with the unflinching eye of a computer.

Critical Thinking Drives the Use of Big Data
The ability to make data-driven decisions assumes that people know what questions to ask, but our experience has shown that this is not always the case. Many organizations lack processes to apply critical thinking. In each of the industry-sponsored projects conducted by the Center for Innovation Management Studies (CIMS) [1], both startups and Fortune 500 companies struggled with generating strategic inquiries. Thinking must drive the use of big data; big data cannot drive thinking.

[1] CIMS is a graduated IUCRC. Formed in 1984, it is the only NSF-sponsored research center dedicated to investigating the organizational and managerial effects on innovation.

Critical thinking is the basis for finding the data sources and tools to recognize the underlying meaning in unstructured text. For example, the statement "Our company needs to examine social media for sentiment analysis" can be broken down into a series of sub-questions such as:

- Sentiment as it relates to our product
- Sentiment as it relates to the competition
- Customer likes and dislikes of the product class in question
- How our company's new products address customer dislikes
- What the competition is developing that may address customer dislikes

Drawing on the principles of critical thinking, a process was developed to gain business value from unstructured text. In practice, the application of critical thinking to the development of the business decision-making process has been successful in numerous companies.

The Process for Using Critical Thinking to Analyze Unstructured Data

The process (Figure 1) requires the participation of a cross-disciplinary team of affected stakeholders from the start and has proven successful in a number of industries. This process uses critical thinking techniques to:

1) define the problem statement and form specific questions to inquire about,
2) identify sources of information,
3) identify the search terms and define the relationships between the terms (called rules),
4) apply big data technology to the terms and rules to gather, store and analyze massive amounts of data collected from external and internal sources,
5) assess the data for sufficiency, applicability and veracity, and finally
6) once the filters have been applied to the data, evaluate the evidence that either supports or refutes the assumptions or conditions required to make the decision.

It is important to emphasize that this is an iterative process. The act of collecting and filtering the data often results in new insight that requires further investigation. This requires the team to overcome individual and group bias when faced with new evidence.

[Figure 1. Process for unstructured data enabled decision-making. Stages: Issues, Big Data Tools, Decisions. Steps: Define Questions, Identify Sources, Dictionary and Rules, Big Data Work, Assess Data, Score Data (iterative process).]
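As a rough illustration of steps 3 and 4, the sketch below shows how search terms (a dictionary) and rules relating them might be applied to text. It is a minimal Python sketch under stated assumptions, not a description of how any particular product works internally: the dictionary entries, the rule names, the company name "Acme Metals" and the sample documents are all hypothetical, and the rule logic (all required concepts appearing in the same sentence) is a simplification chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical dictionary: concept name -> search terms (step 3).
DICTIONARY = {
    "company": ["acme metals"],
    "expansion": ["new plant", "new facility", "expanding operations"],
    "hiring": ["new jobs", "hiring"],
}

# Hypothetical rules: concepts that must co-occur in a single sentence.
RULES = {
    "competitor_expansion": ("company", "expansion"),
    "expansion_with_hiring": ("expansion", "hiring"),
}


@dataclass
class Hit:
    rule: str
    source: str
    sentence: str


def concepts_in(sentence):
    """Return the dictionary concepts whose terms appear in the sentence."""
    text = sentence.lower()
    return {c for c, terms in DICTIONARY.items() if any(t in text for t in terms)}


def apply_rules(documents):
    """Scan (source, text) pairs; emit one Hit per rule a sentence satisfies (step 4)."""
    hits = []
    for source, text in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found = concepts_in(sentence)
            for rule, required in RULES.items():
                if set(required) <= found:
                    hits.append(Hit(rule, source, sentence.strip()))
    return hits


# Made-up sample documents, for illustration only.
documents = [
    ("news", "Acme Metals announced a new plant in Alabama. The weather was mild."),
    ("permits", "The new facility will create 200 new jobs."),
]
hits = apply_rules(documents)
for h in hits:
    print(h.rule, "|", h.source, "|", h.sentence)
```

The hits are the isolated pieces of evidence that the team would then assess for sufficiency, applicability and veracity (step 5). In keeping with the iterative nature of the process, the dictionary and rules would be revised as new insight emerges.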
2. Examples of how companies realized business value using unstructured data

Just five of the dozens of examples of business questions successfully addressed with this process are outlined in the table below. Each example represents a different type of the most commonly asked questions. Another takeaway from these examples is that the decisions are not based on quantitative analysis of structured data but rather on the interpretation of the facts derived from analyzing unstructured data. The selection of the questions, the use of terms and the creation of dictionaries and rules allow one to configure the software to inform a wide variety of critical decisions.

Temporary workforce: Kelly Services
    Question: Develop new service offerings in healthcare staffing
    Sources:  SEC, URLs, trade journals, professional journals, insurance providers
    Outcome:  Decision to move forward in an unexpected healthcare domain

Industrial gases: Air Products
    Question: Find new customers and market opportunities
    Sources:  SEC, news feeds, industry publications, building permits
    Outcome:  Identification of a new customer planning to build new facilities

University: NC State
    Question: Identify commercial partners for new technologies
    Sources:  SEC, URLs, industry publications
    Outcome:  Potential partners identified for collaborations

Clinical research organization: PRA International
    Question: Provide business intelligence for new clinical trials
    Sources:  Clintrials, PubMed
    Outcome:  Identification of new physicians and hospitals with expertise in areas of clinical trials

Non-governmental organization: Clinton Health Care Access Initiative
    Question: Find the fit between new technologies and market opportunities for disease diagnostics
    Sources:  Clintrials, PubMed, VC firms
    Outcome:  Identification of research labs active in cutting-edge diagnostic research

Table 2. Industry Examples

Kelly Services. Develop new staffing service offerings in healthcare

Kelly Services is a leader in temporary staffing and has expanded its offerings to become a full-service staffing solutions company. They wanted to explore whether offering temporary staffing services to the healthcare industry would be viable.
Analysis of unstructured data about the need for this new service, the competition and the regulations, carried out with internal and external subject matter experts, not only revealed numerous opportunities but also provided guidance about how to proceed.
It was found that there is a large need for nurses to work remotely to provide medical assessments and advice to a variety of clients including hospitals, nursing care facilities, insurance companies and self-insuring organizations. It was also found that there is a nurse shortage and that the need was not being met on a local or regional basis. Combining Kelly's national staffing capabilities with regional needs allows Kelly to uniquely meet demands unknown to its competitors. It was also found that the ability of health care providers to get paid for these types of telemedicine services varied greatly by state. Unstructured text analytics was used to find the exact regulations in every state. This gave Kelly the knowledge to enter the states with favorable regulations with the right services. This new service is now one of two major innovative initiatives at Kelly.

Air Products and Chemicals. Find new customers and market opportunities

Air Products offers industrial gases and specialty chemicals in a dozen business verticals. They wanted to identify potential new customers and provide their sales force with specific information about these customers. In metal processing it is important to identify customers early, even before production starts. Metal processing plants are designed with specific gas feeds, so it is important for gas suppliers to know as soon as possible in order to meet a new customer's needs.

The data gathering process started by reading all of the SEC data for companies with metal processing SIC codes to identify companies that were planning capital expenditures on new plants and equipment. All the local newspapers were also read to find companies mentioned in regard to hiring people. Additionally, foreign newspapers (22 languages) in countries that produce metal processing equipment were read for announcements about new sales of equipment. Finally, all building permits in the US were read to identify metal processing companies that were building new plants.
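The multi-source reading described above can be pictured as collecting mentions per company and flagging companies that turn up in more than one kind of source. The sketch below is a hypothetical Python illustration, not the procedure actually used in the Air Products project: the company names, source labels, signals and the two-source threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical extracted mentions: (company, source type, signal found).
mentions = [
    ("Acme Metals", "sec_filing", "capital expenditure on new plant"),
    ("Acme Metals", "local_news", "hiring announcement"),
    ("Acme Metals", "building_permit", "new plant permit"),
    ("Beta Steel", "local_news", "hiring announcement"),
]


def rank_prospects(mentions, min_sources=2):
    """Keep companies seen in at least min_sources distinct source types."""
    by_company = defaultdict(set)
    for company, source, _signal in mentions:
        by_company[company].add(source)
    prospects = [
        (company, sorted(sources))
        for company, sources in by_company.items()
        if len(sources) >= min_sources
    ]
    # Most-corroborated first; ties broken alphabetically for stable output.
    return sorted(prospects, key=lambda p: (-len(p[1]), p[0]))


prospects = rank_prospects(mentions)
for company, sources in prospects:
    print(company, sources)
```

Requiring corroboration across independent sources is only one possible design choice. A single strong signal, such as a building permit, might well justify follow-up on its own, and that judgment belongs to the subject matter experts.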
The research uncovered a metal processing company building a new plant in Alabama that had ordered a specific type of equipment and announced a specific number of new jobs. Not only did Air Products identify a new prospect, they also learned what type of equipment the company was going to use (thus defining the type of gas needed) and how much metal the company planned on processing (indicating the volume of gas needed), all eighteen months before construction started. This gave Air Products a distinct competitive lead in approaching this new customer. Additional analysis also revealed a number of existing customers expanding their operations, information that was not previously known.

North Carolina State University. Identify potential commercial partners for new technologies

Like many universities and companies, NC State has many patented technologies that are not being commercialized. The university wanted to pinpoint which companies might have a need for these technologies. Using a subset of technologies, unstructured text analysis was used to identify the characteristics of companies for each of the technologies.
Results identified high-potential targets, which university officials approached with the technology capabilities matched to the uses for the target company. This resulted in additional negotiations and licenses for previously unused technologies.

PRA. Provide business intelligence for new clinical trials

PRA International is a clinical research organization that conducts drug trials for pharmaceutical companies. They wanted to gain better insight into what other clinical research organizations were doing, and to better predict what their customers might want in the near future so they could better serve them. Unstructured text analytics was used to read four million data files containing over 75 million web pages. These unstructured sources of information were then linked to structured information contained in www.clinicaltrials.gov and company-specific data files to provide a much more in-depth picture of what was happening in their industry.

In one project, PRA wanted to know what was happening in myeloma research. They wanted to know a variety of things not commonly available, such as the reasons for previous myeloma trial failures and which companies might be working on myeloma without it showing up in government and industry reports. This research uncovered several relevant findings. The majority of trial failures were due to lack of enrollment, but other important reasons emerged. Interestingly, seven companies were found using the same gene targets for a variety of other indications such as asthma, breast and lung cancer, Parkinson's disease, Alzheimer's, sickle cell anemia, and blindness. Also found were the names of the managers of these projects, which enabled PRA to form relationships to better advance their own research.

Clinton Healthcare Access Initiative (CHAI). Find a fit between new technologies and market opportunities for disease diagnostics

CHAI wanted to assess whether their healthcare investments in diagnostics were effective. They were supporting a large number of initiatives and wanted to maximize the effect they were having.
Big data was used to assess not just the effectiveness of the new diagnostics but also what impact the new diagnostic tools might have. By assessing unstructured data from thousands of articles and reports, it was found that developing new diagnostics for new diseases would have much less impact than making existing diagnostics for more common diseases. This caused a reassessment of their strategy and a realignment of their resources to favorably impact more people.

Conclusion. These examples demonstrate the business value of using unstructured text analytics to find and assess new business opportunities, find and qualify new customers, provide vastly improved and more detailed business intelligence, and make strategic resource allocation decisions. Unstructured data can be used for many other applications where decisions require knowing specific information that cannot be added, subtracted, multiplied or divided.
3. How companies can realize business value using unstructured data

The advantage of using unstructured big data is that the tools and techniques have now been refined to allow people with critical thinking skills to answer important business questions without being software programmers or statisticians. The process can seem complicated at first and the software specialized, but it is no longer the exclusive purview of data scientists. An unstructured big data project requires teaming across the IT department, to support the software and help gather and store the data, and the statistical analysts, to run the big data process. But by far the most important ingredient for successful unstructured data analysis is the critical thinking ability of the company's subject matter experts (SMEs). This is an interdisciplinary decision-making process capability; it is not a one-time event. Therefore, the company must commit the SME resources necessary to run the process and make the decisions, and the senior management with the authority to act on them.

This democratization of big data for use by business people is not just a benefit or even an aim of these tools; rather, it is a necessity. Business content must drive the questions, terms, sources and rules necessary to realize business value.

It is critical to choose the right software to conduct unstructured text analytics. There are many commercial and open source programs that can be used. If you do not choose a program that has a graphical user interface and the ability to seamlessly interact with large data sets, you will need data scientists just to run the technical part of the software. IBM Content Analytics Studio (ICA) software uniquely meets all the criteria to allow business content people to engage in big data analytics of unstructured data for themselves. With a few days of training, most business people learn to use ICA well enough to apply it on a continuing basis in their area. Thus, big data can be used routinely by a wide variety of people rather than on a special project basis.
It is also important to use the right server platform for big data. Although we seek to isolate highly specific information to make decisions rather than aggregate large data sets, we can only identify that critical piece of information if we can gather it and process it in a timely manner. Consequently, choosing the right server platform can be a crucial aspect of deploying a successful unstructured big data project.

Some companies just getting started decide to use x86 because they already have the servers installed and the big data software, including ICA, runs on x86. There are profound limitations, however, to this approach. While x86 servers may suffice for small demonstrations, the low reliability of the platform hinders adoption of big data. You may not be able to actually demonstrate the full value of big data on x86 servers. We use both x86 and IBM Power Systems servers for processing big data. In our experience, the Power servers are far superior. The x86 servers crash so often that we chose not to use them with our clients.
Power Systems servers, with POWER8 processor-based technology, were designed for big data: they run more concurrent queries in parallel, faster, across multiple cores with more threads per core. They also have increased memory bandwidth and faster I/O to ingest, move and access data faster. This allows companies to run these data-hungry analytics queries faster.

Gaining a competitive advantage with big data

Unstructured data analysis can be the source of great business value if the appropriate processes, tools, skills and structure are implemented together. Frustration and disappointment await you if you do not implement these elements together. The cost of failure is to fall behind competitors that are successful at implementing big data. The key to a successful implementation is to establish a simple process for business content experts and decision makers to use on a regular basis. This means that both the inputs and outputs of the decision-making process must be appropriately resourced, and roles and responsibilities established. You must establish a reporting relationship with decision makers to ensure accountability. Performance and outcome expectations must be established for what your company wants from big data. The right infrastructure must be used to ensure the required levels of performance and reliability are met. A proper implementation promises business value and competitive advantage by arming decision makers with more targeted information.

POC03173-USEN-01