1 Industry Perspective: Big Data and Big Data Analytics David Barnes Program Director Emerging Internet Technologies IBM Software Group
2 What is Big Data?
3 The Adjacent Possible
4 Inexpensive disk + Increased processing power + Data Warehouse + The Web + X = Big Data, where X = sensors used to gather climate information, posts to social media sites, digital pictures and videos, transaction records, cell phone GPS signals, and more.
5 161 exabytes of data were created in ... million times the amount of information contained in all the books ever written. In 2010 the number reached 988 exabytes. IDC estimates that 1.8 zettabytes were created and replicated in ... IBM Corporation
6 Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks. Every month people send one billion Tweets and post 30 billion messages on Facebook. 90% (or more) of the world's data is unstructured.
7 The true nature of information
8 Unstructured Data: is noisy; is oftentimes dirty; is often full of valuable information.
9 The Big Data Imperative: Big Data has swept into every industry and business function. Businesses need to put the power of Big Data analytics in the hands of their business employees. The term "Data Scientist" is somewhat misleading. "Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers." (McKinsey Global Institute) Big Data Business Patterns: Computational Journalism; Chief Legal Officer; Retail Business Planner; IT Systems Management; Pharma - Clinical Trials; Business Fraud Detection; Evidence Based Medicine; Web Archiving.
10 Today's Problem: Data is growing at a compound annual rate of 60%. Storage capacity continues to increase dramatically, but storage access speeds have not kept up: at a transfer speed of 500 MB/sec, 1 terabyte of data requires ~30 minutes to read from a single drive. Enter Map/Reduce: it automates the mechanisms of large-scale distributed computation (i.e., work distribution, load balancing, replication, failure/recovery). Divide & Conquer: 1 terabyte split among 100 drives requires ~20 seconds to read. The M/R parallel processing model provides a cost-effective framework for a new generation of analytic applications on unstructured or semi-structured data. 2010 IBM Corporation
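The read-time figures on this slide follow from simple division, and the divide-and-conquer idea behind Map/Reduce can be illustrated on a single machine. The sketch below is a minimal illustration only, assuming decimal units (1 TB = 1,000,000 MB) and drives that read in parallel with no coordination overhead. The function names and the word-count task are hypothetical; they are not taken from the slides or from Hadoop's API.

```python
from collections import Counter
from functools import reduce


def read_time_seconds(data_mb: float, mb_per_sec: float, drives: int = 1) -> float:
    """Time to read data split evenly across drives that read in parallel."""
    return data_mb / drives / mb_per_sec


def map_phase(shard: list[str]) -> Counter:
    """Map step: count words within one shard, independently of other shards."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def reduce_phase(partials: list[Counter]) -> Counter:
    """Reduce step: merge the per-shard counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


ONE_TERABYTE_MB = 1_000_000  # assumption: decimal units

# One drive at 500 MB/sec: 2000 seconds, roughly the ~30 minutes on the slide.
single = read_time_seconds(ONE_TERABYTE_MB, 500)
# The same data split across 100 drives: 20 seconds.
parallel = read_time_seconds(ONE_TERABYTE_MB, 500, drives=100)

# Hypothetical input: one list of lines per drive.
shards = [["big data big"], ["data analytics"]]
word_counts = reduce_phase([map_phase(shard) for shard in shards])
```

A real Map/Reduce framework also handles work distribution, replication, and failure recovery, which this single-process sketch leaves out.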
11 Requirement: A New Class of Big Data Applications. Big Data analytics must be brought to the line-of-business user: leverage easy-to-use manipulation metaphors; use natural language technologies for analytics; provide rich visualizations to quickly identify insights.
12 Buyer Sentiment Analysis Demo
13 Social Media: Chilean Earthquake. The Chilean earthquake was the fifth largest earthquake in recorded history. The affected areas suffered major devastation: buildings, airports, hospitals, prisons, bridges, and roads were severely damaged. Land-based communications systems suffered major outages. The wireless 3G infrastructure remained intact and operational. Sharenomics - Rise of Social Economy
14 Social Media: Chilean Earthquake 2010. Social networking on wireless networks was a major form of communication. Extreme Blue students collected 226 million Tweets, then analyzed and categorized them by incidence type and location. Tweets included questions (Can I get food? Can I get gas? Are the bridges down?) and images. The results were visualized. Completed in ~12 weeks.
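Categorizing tweets by incident type and location can be sketched with a simple keyword lookup. This is only an illustration: the slides do not describe how the students actually classified the tweets, and the keyword lists, function names, and sample tweets below are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists; the slides do not say how tweets were categorized.
INCIDENT_KEYWORDS = {
    "food": ("food", "water"),
    "fuel": ("gas", "fuel"),
    "infrastructure": ("bridge", "road", "airport"),
}


def categorize(text: str) -> list[str]:
    """Return every incident type whose keywords appear in the tweet text."""
    lowered = text.lower()
    return [
        incident
        for incident, words in INCIDENT_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]


def tally(tweets: list[tuple[str, str]]) -> Counter:
    """Count (location, incident type) pairs across (location, text) tweets."""
    counts = Counter()
    for location, text in tweets:
        for incident in categorize(text):
            counts[(location, incident)] += 1
    return counts


# Made-up examples modeled on the questions quoted on the slide.
sample = [
    ("Concepcion", "Can I get food?"),
    ("Concepcion", "Can I get gas?"),
    ("Santiago", "Are the bridges down?"),
]
counts = tally(sample)
```

Substring matching is crude (it would also match "gas" inside unrelated words), so a real pipeline at the scale of 226 million tweets would need a more robust classifier.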
15 Big Data = Volume, Variety and Velocity. Volume: scale from terabytes to zettabytes. Variety: relational and non-relational data types from an ever-expanding variety of sources. Velocity: streaming data and large-volume data movement.
19 The supercomputer is based on over 1,200 high-powered IBM System x servers and can perform 150 trillion calculations per second, equivalent to 30 million calculations per Danish citizen per second. Vestas expects its data sets will grow to 20-plus petabytes over the next four years.
21 Seton Healthcare Family: Reducing CHF readmission to improve care
IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage information in new ways not possible before. We can access an integrated view of relevant clinical and operational information to drive more informed decision making and optimize patient and operational outcomes.
Business Challenge: Seton Healthcare strives to reduce the occurrence of high cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.
What's Smart? The IBM Content and Predictive Analytics for Healthcare solution will help to better target and understand high-risk CHF patients for care management programs by: utilizing natural language processing to extract key elements from unstructured History and Physical, Discharge Summaries, Echocardiogram Reports, and Consult Notes; leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data; providing an interface through which providers can intuitively navigate, interpret and take action.
Smarter Business Outcomes: Seton will be able to proactively target care management and reduce readmission of CHF patients. Teaming unstructured content with predictive analytics, Seton will be able to identify patients likely for readmission and introduce early interventions to reduce cost, mortality
IBM solution: IBM Content and Predictive Analytics for Healthcare; IBM Cognos Business Intelligence; IBM BAO solution services. 2011 IBM Corporation
22 IBM Content and Predictive Analytics for Healthcare: The Seton CHF Readmission Solution
Raw Information: Unstructured Data (Cerner Clinical Documentation: History and Physical, Discharge Summary, Echocardiogram); Structured Data (Avega Cost Data, DSS Admission History, DSS Procedure History, Cerner Clinical Events).
IBM Content and Predictive Analytics:
Content Analytics: Natural Language Processing; Medical Fact and Relationship Extraction (Annotation); Trend, Pattern, Anomaly, Deviation Analysis. Utilizing natural language processing to extract key elements from unstructured History and Physical and Discharge Summary.
Predictive Analytics: Predictive Scoring and Probability Analysis. Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data.
Health Integration Framework: Data Warehouse and Model; Master Data Management; Advanced Case Management.
IBM Watson for Healthcare: Confirm hypotheses or seek alternative ideas with confidence based responses from learned knowledge*
Analyzed and Visualized Information, Dynamic Multimode Interaction: Search and Visually Explore (Mine); Monitor, Dashboard and Report (Cognos BI); Question and Answer*. Providing an interface through which providers can intuitively navigate, interpret and take action.
Custom Solutions; Partners (HLI); Specialized Research; Business Analytics.
23 What Really Causes Readmissions at Seton: Key Findings
The Data We Thought Would Be Useful Wasn't: 113 candidate predictors from structured and unstructured data sources. Structured data was less reliable than unstructured data, which increased the reliance on unstructured data.
New Unexpected Indicators Emerged.
Highly Predictive Model: 18 accurate indicators or predictors (see next slide); 49% at 20th percentile, 97% at 80th percentile.
Predictor Analysis:
Predictor | % Encounters Structured Data | % Encounters Unstructured Data
Ejection Fraction (LVEF) | 2% | 74%
Smoking Indicator | 35% (65% Accurate) | 81% (95% Accurate)
Living Arrangements | <1% | 73% (100% Accurate)
Drug and Alcohol Abuse | 16% | 81%
Assisted Living | 0% | 13%
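To make concrete why predictors such as ejection fraction and smoking status appear far more often in unstructured notes than in structured fields, the sketch below pulls both out of free text. It uses plain regular expressions as a deliberately simple stand-in; it is not the natural language processing used in the IBM solution, and the note wording, patterns, and function names are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: notes mention ejection fraction in forms such as
# "EF 35%", "LVEF of 35%", or "ejection fraction: 35 %".
EF_PATTERN = re.compile(
    r"\b(?:lvef|ef|ejection fraction)\b\D{0,15}?(\d{1,2})\s*%", re.IGNORECASE
)


def extract_lvef(note: str) -> Optional[int]:
    """Return the ejection fraction percentage mentioned in the note, if any."""
    match = EF_PATTERN.search(note)
    return int(match.group(1)) if match else None


def smoking_indicator(note: str) -> Optional[bool]:
    """Return False for explicit non-smokers, True for smokers, None if unknown."""
    lowered = note.lower()
    # Check negations first so that "non-smoker" is not read as "smoker".
    if re.search(r"\b(non-?smoker|never smoked|denies smoking)\b", lowered):
        return False
    if re.search(r"\b(smoker|smokes|tobacco use)\b", lowered):
        return True
    return None


# Hypothetical note text, not taken from any real record.
note = "Echo shows LVEF of 35%. Patient smokes one pack per day and lives alone."
lvef = extract_lvef(note)
smokes = smoking_indicator(note)
```

Pattern matching like this breaks down quickly on real clinical language (abbreviations, negation, dictation errors), which is the gap that dedicated medical text analytics is meant to close.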
24 Visualizing the Results: Readmissions Dashboard. The Cognos dashboard reporting system can help in monitoring the key clinical, operational and financial metrics and, more importantly, in tracking down the top-priority cases for case management.
1. Clinical Statistics: admission count, readmission count and readmission rate
2. Operational Statistic: counts of different length of stay periods
3. Financial Statistic: total direct cost by total admission and by readmission
4. Mortality: mortality rate
5. Average length of stay
6. Average direct cost by total admission and by readmission only
7. PA Model Score: distribution of propensity of readmission
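Several of the dashboard statistics listed on this slide are simple aggregates over encounter records. The sketch below shows how a few of them could be computed, assuming a hypothetical `Encounter` record and assuming that every encounter, including a readmission, counts as one admission. It is not the Cognos implementation, and the sample figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """One hospital stay; all field names here are hypothetical."""
    length_of_stay_days: int
    direct_cost: float
    is_readmission: bool


def dashboard_metrics(encounters: list[Encounter]) -> dict[str, float]:
    """Compute admission counts, readmission rate, and averages."""
    if not encounters:
        raise ValueError("at least one encounter is required")
    admissions = len(encounters)
    readmits = [e for e in encounters if e.is_readmission]
    return {
        "admission_count": admissions,
        "readmission_count": len(readmits),
        "readmission_rate": len(readmits) / admissions,
        "average_length_of_stay": sum(e.length_of_stay_days for e in encounters) / admissions,
        "average_direct_cost": sum(e.direct_cost for e in encounters) / admissions,
        # Average cost over readmissions only; 0.0 when there are none.
        "average_readmission_cost": (
            sum(e.direct_cost for e in readmits) / len(readmits) if readmits else 0.0
        ),
    }


# Made-up sample data for illustration.
sample = [
    Encounter(4, 8000.0, False),
    Encounter(6, 12000.0, True),
    Encounter(2, 4000.0, False),
    Encounter(8, 16000.0, True),
]
metrics = dashboard_metrics(sample)
```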
26 USC Annenberg School of Communications
27 InfoSphere Streams
28 Big Data Platform Vision: Bringing Big Data to the Enterprise
Big Data Solutions: Client and Partner Solutions
Big Data User Environments: Developers, End Users, Administrators
Big Data Enterprise Engines: Streaming Analytics, Internet Scale Analytics
Open Source Foundational Components: Hadoop, MapReduce, HDFS, HBase, Pig, Lucene, Jaql
Agents / Integration: Data Warehouse (InfoSphere Warehouse), Warehouse Appliances (Netezza), Master Data Mgmt (InfoSphere MDM), Database (DB2), Analytics (SPSS), Business Intelligence (Cognos), Marketing (Unica)