How To Get More Data From Your Computer



Transcription:

Industry Perspective: Big Data and Big Data Analytics
David Barnes, Program Director, Emerging Internet Technologies, IBM Software Group

What is Big Data?

The Adjacent Possible

Inexpensive disk + Increased processing power + Data Warehouse + The Web + X = Big Data
X = sensors used to gather climate information, posts to social media sites, digital pictures and videos, transaction records, cell phone GPS signals, and more.

161 exabytes of data were created in 2006, 3 million times the amount of information contained in all the books ever written. In 2010 the number reached 988 exabytes. IDC estimates that 1.8 zettabytes were created and replicated in 2011.
2010 IBM Corporation

Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks. Every month people send one billion Tweets and post 30 billion messages on Facebook. 90% (or more) of the world's data is unstructured.

The true nature of information

Unstructured Data
Is noisy
Is often dirty
Is often full of valuable information

The Big Data Imperative
Big Data has swept into every industry and business function.
Businesses need to put the power of Big Data analytics in the hands of their business employees. "Data Scientist" is somewhat misleading.
"Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers." (McKinsey Global Institute)
Big Data Business Patterns: Computational Journalism, Chief Legal Officer, Retail Business Planner, IT Systems Management, Pharma Clinical Trials, Business Fraud Detection, Evidence-Based Medicine, Web Archiving, ...

Today's Problem
Data is growing at a compound annual rate of 60%.
Storage capacity continues to increase dramatically.
Storage access speeds have not kept up.
At a transfer speed of 500 MB/sec, 1 terabyte of data requires ~30 minutes to read from a single drive.
Enter Map/Reduce
Automates the mechanisms of large-scale distributed computation (i.e., work distribution, load balancing, replication, failure/recovery).
Divide and conquer: 1 terabyte split among 100 drives requires ~20 seconds to read.
The M/R parallel processing model provides a cost-effective framework for a new generation of analytic applications on unstructured or semi-structured data.
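The read-time figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `read_time_seconds` is hypothetical, decimal units are assumed (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and the parallel case assumes the data is split evenly across drives that all sustain the same 500 MB/sec with no coordination overhead.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def read_time_seconds(total_bytes, bytes_per_second, drives=1):
    """Time to read total_bytes when the data is split evenly across drives.

    Idealized: every drive sustains bytes_per_second and there is no
    scheduling, network, or merge overhead.
    """
    return total_bytes / (bytes_per_second * drives)


single_drive = read_time_seconds(1 * TB, 500 * MB)
hundred_drives = read_time_seconds(1 * TB, 500 * MB, drives=100)

print(f"1 drive:    {single_drive:.0f} s ({single_drive / 60:.1f} min)")
print(f"100 drives: {hundred_drives:.0f} s")
```

Under these assumptions a single drive needs 2,000 seconds (about 33 minutes, which the slide rounds to ~30), and 100 drives need 20 seconds. A real cluster adds overhead that this idealized model ignores.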

Requirement: A New Class of Big Data Applications
Big Data analytics must be brought to the line-of-business user.
Leverage easy-to-use manipulation metaphors.
Use natural language technologies for analytics.
Provide rich visualizations to quickly identify insights.

Buyer Sentiment Analysis Demo

Social Media: Chilean Earthquake 2010
The 2010 Chilean earthquake was the fifth largest earthquake in recorded history.
The affected areas suffered major devastation: buildings, airports, hospitals, prisons, bridges, and roads were severely damaged.
Land-based communications systems suffered major outages.
The wireless 3G infrastructure remained intact and operational.
Sharenomics - Rise of Social Economy

Social Media: Chilean Earthquake 2010
Social networking on wireless networks was a major form of communication.
Extreme Blue students collected 226 million Tweets, then analyzed and categorized them by incidence type and location.
Tweets included "Can I get food?", "Can I get gas?", "Are the bridges down?" and images.
The results were visualized.
Completed in ~12 weeks.
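Categorizing Tweets by incidence type and location fits the map/reduce pattern described earlier. The following is a minimal sketch, not the Extreme Blue team's method, which the slides do not describe. The keyword rules, the `map_tweet` and `reduce_counts` helpers, and the four sample records (including their locations) are all made up for illustration.

```python
from collections import Counter

# Hypothetical keyword-to-incident rules; the real classifier is not
# described in the slides.
INCIDENT_KEYWORDS = {
    "food": "food",
    "gas": "fuel",
    "bridge": "infrastructure",
}

# Made-up sample records, not data from the project.
SAMPLE_TWEETS = [
    {"text": "Can I get food?", "location": "Concepcion"},
    {"text": "Can I get gas?", "location": "Concepcion"},
    {"text": "Are the bridges down?", "location": "Santiago"},
    {"text": "Where can I get food near here?", "location": "Concepcion"},
]


def map_tweet(tweet):
    """Map step: emit ((incident_type, location), 1) per matching keyword."""
    text = tweet["text"].lower()
    for keyword, incident in INCIDENT_KEYWORDS.items():
        if keyword in text:
            yield (incident, tweet["location"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (incident, location) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


pairs = [pair for tweet in SAMPLE_TWEETS for pair in map_tweet(tweet)]
incident_counts = reduce_counts(pairs)

for (incident, location), count in sorted(incident_counts.items()):
    print(incident, location, count)
```

In a real deployment the map step would run in parallel over partitions of the Tweet collection and a framework such as Hadoop would group the emitted keys before the reduce step. Here both steps run in a single process to keep the example self-contained.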

Big Data = Volume, Variety and Velocity
Volume: scale from terabytes to zettabytes
Variety: relational and non-relational data types from an ever-expanding variety of sources
Velocity: streaming data and large-volume data movement

The supercomputer is based on over 1,200 high-powered IBM System x servers and can perform 150 trillion calculations per second, equivalent to 30 million calculations per Danish citizen per second. Vestas expects its data sets will grow to 20-plus petabytes over the next four years.

Seton Healthcare Family: Reducing CHF readmission to improve care
IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage information in new ways not possible before. We can access an integrated view of relevant clinical and operational information to drive more informed decision making and optimize patient and operational outcomes.
Business Challenge
Seton Healthcare strives to reduce the occurrence of high-cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.
What's Smart?
The IBM Content and Predictive Analytics for Healthcare solution will help to better target and understand high-risk CHF patients for care management programs by:
Utilizing natural language processing to extract key elements from unstructured History and Physical, Discharge Summaries, Echocardiogram Reports, and Consult Notes
Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data
Providing an interface through which providers can intuitively navigate, interpret and take action
Smarter Business Outcomes
Seton will be able to proactively target care management and reduce readmission of CHF patients.
Teaming unstructured content with predictive analytics, Seton will be able to identify patients likely for readmission and introduce early interventions to reduce cost, mortality
IBM solution
IBM Content and Predictive Analytics for Healthcare
IBM Cognos Business Intelligence
IBM BAO solution services
2011 IBM Corporation

IBM Content and Predictive Analytics for Healthcare: The Seton CHF Readmission Solution
[Architecture diagram; recoverable labels follow.]
Raw Information: Unstructured Data (Cerner Clinical Documentation: History and Physical, Discharge Summary, Echocardiogram); Structured Data (Avega Cost Data, DSS Admission History, DSS Procedure History, Cerner Clinical Events)
IBM Content and Predictive Analytics: utilizing natural language processing to extract key elements from unstructured History and Physical and Discharge Summary
Content Analytics: Natural Language Processing; Medical Fact and Relationship Extraction (Annotation); Trend, Pattern, Anomaly, Deviation Analysis
Analyzed and Visualized Information: leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data
Predictive Analytics: Predictive Scoring and Probability Analysis
Dynamic Multimode Interaction: providing an interface through which providers can intuitively navigate, interpret and take action
Search and Visually Explore (Mine); Monitor, Dashboard and Report (Cognos BI); Question and Answer*
IBM Watson for Healthcare
Confirm hypotheses or seek alternative ideas with confidence-based responses from learned knowledge*
Also labeled: Health Integration Framework; Data Warehouse and Model; Master Data Management; Advanced Case Management; Custom Solutions; Partners (HLI); Specialized Research; Business Analytics

What Really Causes Readmissions at Seton: Key Findings
The Data We Thought Would Be Useful Wasn't
113 candidate predictors from structured and unstructured data sources
Structured data was less reliable than unstructured data, which increased the reliance on unstructured data
New Unexpected Indicators Emerged
Highly Predictive Model
18 accurate indicators or predictors (see next slide)
49% at 20th percentile, 97% at 80th percentile

Predictor Analysis          % Encounters Structured Data    % Encounters Unstructured Data
Ejection Fraction (LVEF)    2%                              74%
Smoking Indicator           35% (65% Accurate)              81% (95% Accurate)
Living Arrangements         <1%                             73% (100% Accurate)
Drug and Alcohol Abuse      16%                             81%
Assisted Living             0%                              13%

Visualizing the Results: Readmissions Dashboard
The Cognos dashboard reporting system can help in monitoring the key clinical, operational and financial metrics and, more importantly, in tracking down the top-priority cases for case management.
1. Clinical Statistics: admission count, readmission count and readmission rate
2. Operational Statistic: counts of different length-of-stay periods
3. Financial Statistic: total direct cost by total admission and by readmission
4. Mortality: mortality rate
5. Average length of stay
6. Average direct cost by total admission and by readmission only
7. PA Model Score: distribution of propensity of readmission

USC Annenberg School of Communications

InfoSphere Streams

Big Data Platform Vision: Bringing Big Data to the Enterprise
[Platform diagram; recoverable labels follow.]
Big Data Solutions: Client and Partner Solutions
Big Data User Environments: Developers, End Users, Administrators
Big Data Enterprise Engines: Streaming Analytics, Internet Scale Analytics
Open Source Foundational Components: Hadoop, MapReduce, HDFS, HBase, Pig, Lucene, Jaql
Integration (agents): Data Warehouse (InfoSphere Warehouse), Warehouse Appliances (Netezza), Master Data Mgmt (InfoSphere MDM), Database (DB2), Analytics (SPSS), Business Intelligence (Cognos), Marketing (Unica)