A Member f OneBeacn Insurance Grup Big Data Making Sense f Structured and Unstructured Data Authr: Edgar Germer, Risk Cntrl Specialist Published: December 2013 Executive Summary Are yu being bmbarded with emails abut big data, weekly if nt daily? D yu hear the term big data being tssed abut as if everyne shuld knw what it is? Then surely, yu must be asking yurself What is big data anyway? This whitepaper will help answer that questin thrugh a high level verview f this cncept addressing: What are big data and big data analytics? What is meant by structured versus unstructured data? What technlgies, applicatins and ecnmic pprtunities are presented by big data and big data analytics? What is Big Data? N ne is quite sure wh cined the term big data and began using it in the cntext we see tday The earliest hints are attributed t Jhn Mashey at Silicn Graphics in the early 1990s 1 In its simplest frm, big data is nthing mre than a large cllectin f data residing in strage databases The key attribute is that the amunt f data stred exceeds the rganizatin s strage and cmputing capacity, verwhelming the rganizatin and inhibiting the ability t analyze the data fr decisin-making purpses The magnitude r extent f big data is dependent n an individual rganizatin s cmputing and strage capacity Tday typical big data magnitudes are in the rder f terabytes (trillin r 1,000,000,000,000 r 10 12 bytes) and petabytes (10 15 ), with exabytes (10 18 ) and zettabytes (10 21 ) nt far away T put it in perspective, 1 terabyte equates t the data frm 2,000 hurs f CD-quality music while 10 terabytes equates t the data fund within the entire US Library f Cngress print cllectin Tw petabytes culd stre all US academic research libraries Fr a deeper and smewhat fun perspective n what these magnitudes represent click here Industry expert IDC (Internatinal Data Crp) describes the grwth in data as being a five dimensinal phenmenn 2 : Vlume f data: Greater vlumes f data generated frm transactins, text, scial media, sensrs and s n Variety f data: Data tday cmes in many frmats traditinal data, text dcuments, email, sensr and meter-cllected data, vide, audi, pht, etc By sme estimates, 80 percent f an rganizatin's data is nt numeric 3 Velcity f data: Hw fast data is being prduced and hw quickly it must be prcessed t meet demand Variability f data: Data generatin varies and is incnsistent Daily, seasnal (Mther s Day), and event-triggered (Arab Spring and Uprising) peaks in data lads can be challenging t manage especially when scial media frums such as Twitter are invlved 1
Cmplexity f data: Large vlumes and varieties f data frm multiple surces make it challenging t link, match, cleanse and transfrm data acrss systems Structured vs Unstructured Data Tday, data is viewed quite differently than years ag Traditinally, data had t be prvided in a structured (numerical) frmat in rder t be mined and analyzed By sme estimates, structured data tday represents abut 10 30% f a cmpany s data The remaining 70 90% f data is cnsidered unstructured, meaning freefrm text, images, audi and vide 4 Unstructured data cmes frm websites, crrespndence, custmer service center recrds, scial media, custmer cmplaints and many ther surces It is cntained in dcument repsitries, emails, spreadsheets, audi/visual files, scial media sites and texting channels The availability f unstructured data and the ability t extract meaningful infrmatin frm it is a significant difference adding t the big data/big analytics phenmenn What is Big Data Analytics? Due t relatively limited cmputing capacity histrically available, rganizatins have been cnfined t analyzing subsets f data fr decisin-making purpses; r they were limited t simplistic analysis because the vlume f data verwhelmed their prcessing platfrms Organizatins culd be verlking imprtant trends due t finite prcessing pwer, strage capacity r tls t effectively analyze the extent f such data Tday, hwever, we have specialized sftware tls and extensive cmputing prcessing pwer t tackle this prblem thanks t the emergence f big data analytics Big data analytics is the expedient prcessing f large vlumes f data using new hardware and sftware technlgies t extract meaningful infrmatin t make better decisins 5 Crrelatins that were never pssible may be develped if there is a large enugh dataset, apprpriate analytical tls and sufficient cmputing pwer Hidden patterns emerge allwing fr better decisin making and assessment in fields such as business, healthcare, epidemilgy and s n Enabling Technlgies A number f new technlgical advancements that are enabling rganizatins t make the mst f big data and big data analytics include 6 : Cheap, abundant strage and server prcessing capacity: The cst f a gigabyte f strage has drpped frm apprximately $16 in 2000 t less than $007 as f Nvember 2011 7 Faster prcessrs: Based n Mre s law, prcessr speeds cntinue t duble at least every 18 mnths, if nt faster New technlgies: Strage and prcessing technlgies designed specifically fr large data vlumes, including unstructured data Three key big data prcessing architectures include: Grid cmputing: A centrally managed grid infrastructure prvides dynamic wrklad balancing, high availability and parallel prcessing fr data management, analytics and reprting In-database prcessing: Reduces the time needed t prepare data and build, deply and update analytical mdels In-memry prcessing: Allws prcessing f data in memry rather than n a disk making data cmputing faster and mre efficient Clud cmputing: Allws big data analytics t be delivered as a service thrugh clud-based strage and high speed cnnectivity 2
Text analytics fr unstructured data: Sftware that identifies, extracts, and interprets relevant data and structures it t reveal patterns, sentiments and relatinships within dcuments 8 Smart Filters: Natural Language Prcessing (NLP) that allws fr the prcessing and interpretatin f nnstructured data fr additinal analytics The benefit f this capability is better business decisins thrugh the analysis f whle datasets instead f smaller subsets in a fractin f the time in minutes r hurs cmpared t days r weeks Analytics & Applicatins f Big Data Big data and big data analytics are applicatin independent meaning that the data can be generated and the analytics can be used by a variety f applicatins 9 The visin fr big data is that rganizatins will harness mre relevant data, apply analytical tls and use the results t make the best decisin Accrding t a survey cnducted by industry expert Ecnmist Intelligence Unit in June 2011, there is a strng link between an rganizatin s effective data management and its financial perfrmance Eric Brynjlfssn, an ecnmist at the Slan Schl f Management at the Massachusetts Institute f Technlgy, fund that cmpanies that adpted data-driven, decisin making achieved prductivity bsts f 5-6% 10 Examples f applicatins f big data analytics include: Analyze millins f SKUs t determine ptimal pricing and maximize prfits Calculate entire risk prtflis in minutes and understand future pssibilities t mitigate risk Mine custmer data fr insights that drive new strategies fr custmer acquisitin, retentin, campaign ptimizatin and develpment f next-best ffers Quickly identify custmers wh matter the mst Send tailred recmmendatins t custmer s mbile devices at the ptimal time - while they are in the right lcatin t take advantage f ffers Analyze data frm scial media t detect new market trends and changes in demand Determine rt causes f failures, issues and defects by investigating user sessins, netwrk lgs and machine sensrs Gvernment intelligence t track terrrists 11 Big data analytics were used t track dwn the Bstn Marathn Bmbers 12 Insurance cmpanies and financial institutins (MasterCard, Visa) use big data analytics t identify fraud 13 Health industry derives meaning frm unstructured data t uncver disease patterns, causes and effects t imprve healthcare 14 Ecnmic Opprtunities Aside frm ptential increases in prfits, efficiency and natinal safety, big data is giving rise t many ecnmic pprtunities such as: New prfessins: There is a shrtage f individuals with skills t manage data effectively As a result, universities are wrking with private industry t address their specific needs 15 Grwth in private industry: New prducts and service pprtunities fr sftware and hardware cmpanies specializing in data management and strage Startups and existing firms develping new prducts t capitalize n the grwing need fr better and faster analytics and data management 3
Grwth f clud service prviders: Increase in data strage needs will fuel the grwth f such prviders Grwth f revenue: Generate significant financial value acrss sectrs 16 $300 billin value per year in US health care $100 billin+ revenue fr service prviders in glbal persnal lcatin data 60+% increase in net margin pssible fr US retail Cnclusin As the ld adage ges knwledge is pwer Big data and big data analytics are tls that facilitate the acquisitin f knwledge s clearly are a pwerful resurce delivering gamechanging insights and cmpetitive advantages As big data changes the rules f the game fr rganizatins f all sizes it has becme an industry unt itself Cmpanies have n chice but t participate in the pprtunity r risk lsing market share and falling behind Lking ahead, this is a trend that s here t stay The relative affrdability and availability f such analytical tls allws rganizatins f all sizes t significantly imprve their decisinmaking and business plan executin Big data means big pprtunities, and the cmpanies willing t leverage these tls will be psitining themselves t best cmpete within their industry segments bth in the near and lng term Cntact Us T learn mre abut hw OneBeacn Technlgy Insurance can help yu manage nline and ther technlgy risks, please cntact Llyd Takata, SVP f OneBeacn Technlgy Insurance at ltakata@nebeacntechcm r 9528526028 References 1 Lhr, Steve (February 1, 2013) The Origins f Big Data : An Etymlgical Detective Stry New Yrk Times Retrieved Octber 14, 2013 http://bitsblgsnytimescm/2013/02/01/the-rigins-f-big-data-an-etymlgicaldetective-stry/?_r=0 2 What is Big Data? SAS Website Retrieved Octber 2013 http://wwwsascm/bigdata/ 3 Ibid 2 4 Ppe, David Frm Big Data t Meaningful Infrmatin SAS Retrieved Octber, 2013 http://wwwsascm/resurces/whitepaper/wp_59833pdf 5 Big Data Analytics Why is it imprtant? SAS Retrieved Octber 2013 http://wwwsascm/big-data/big-data-analyticshtml 6 Ibid 2 7 Ramanathan, Deepak (2012) Big Data Meets Big Analytics SAS Retrieved Octber 2013http://deslidesharenet/deepakramanathan/big-data-meets-big-analytics 8 Ibid 4 9 Ibid 2 10 Big Data Harnessing a game changing asset (September 2011) Ecnmist Intelligence Unit Retrieved Octber 2013 http://wwwsascm/resurces/asset/sas_bigdata_finalpdf 11 Felman, Susan and thers (June 2012) Unlcking the Pwer f Unstructured Data IDC Health Insights Retrieved Octber 2013 http://www- 01ibmcm/sftware/ebusiness/jstart/dwnlads/unlckingUnstructuredDatapdf 12 Rathnam, Lavanya (June 7, 2103) Hw Big Data was used t find the Bstn Bmbers icrunchdata News Retrieved Octber 2013 http://newsicrunchdatacm/pst/2013/06/07/big-data-bstn-bmber 4
13 Building Believers hw t expand the use f predictive analytics in claims SAS Retrieved Octber 2013 http://wwwsascm/resurces/whitepaper/wp_59831pdf 14 Ibid 11 15 Ibid 10 16 Manyika, James (May 2011) Big data: The next frntier fr innvatin, cmpetitin, and prductivity McKinsey Glbal Institute Page 8 retrieved Octber 2013 http://wwwmckinseycm/insights/business_technlgy/big_data_the_next_frntier_fr_i nnvatin 5