Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends




Spring 2015
Thomas Hill, Ph.D., VP Analytic Solutions, Dell Statistica

Overview and Agenda
- Dell Software overview
- Dell in healthcare, life sciences
- What is "predictive analytics"; what is "learning"?
- How human experts learn
- Statistical analysis vs. pattern recognition
- Q&A

Dell Leadership in Software
- 16th largest software manufacturer
- 6,000+ team members
- 2,000+ software engineers
- 2,500+ in software sales
- 2M user community members
- EMA Radar Report: Value Leader for Boomi Cloud Integration
- 90% of the Global 1000 are Dell Software customers
- NSS Labs: highest overall protection, Next-Gen Firewall
- 1M+ customers
- Gartner: 9 Magic Quadrants

End-to-End Data-to-Prediction ROI Value Chain (Data → Insight → Action → Automation → ROI)
- Advanced Analytics: predict and optimize the future (Statistica Big Data Analytics, powered by Kitenga)
- Business Intelligence: understand historical events (Statistica)
- Integration: real-time data movement on- and off-premise (Boomi, Toad Data Point, TIC)
- Management: improve performance of the data platforms (Toad, Foglight)
- Infrastructure: put the right data in the right place at the right time (Servers & Storage)
Capabilities: quality control, root cause analysis, forecasting, optimization, monitoring & alerting; validated & auditable; automated & repeatable; data mining, predictive modeling, machine learning, text mining.
BOOMI: flexible data connectors to the cloud; cloud/on-premise integration.
Toad Data Point & Toad Intelligence Central: heterogeneous data sources, complex joins, staging repository.

How Process Experts Learn: A Simple Experiment [1]
[1] See also: Lewicki, P., Hill, T., & Czyzewska, M. (1992). Nonconscious acquisition of information. American Psychologist, 47, 796-801.

Learning Engines for Big High-Velocity Data
According to Scientific American (October 25, 2011):
- iPad 2: capacity 64 billion bytes (64 gigabytes); processing speed 170 megaflops (1 megaflop = 1 million floating-point operations per second); power consumption 2.5 watts
- Your cat: capacity 98 trillion bytes (98 terabytes); processing speed 61 million megaflops
- Human brain: capacity 3.5 quadrillion bytes (3.5 petabytes); processing speed 2.2 billion megaflops; power consumption 20 watts
Human cognition is still unmatched in many ways: capacity, processing speed, number of models.
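The comparison is easier to read as ratios. This minimal sketch derives them from the figures quoted on the slide; the ratios are simple arithmetic on those quoted numbers, not independent measurements, and the cat is left out because no power figure is given for it.

```python
# Figures as quoted on the slide (Scientific American, October 25, 2011).
# The cat is omitted because no power-consumption figure is given for it.
systems = {
    "iPad 2": {"bytes": 64e9, "megaflops": 170, "watts": 2.5},
    "Human brain": {"bytes": 3.5e15, "megaflops": 2.2e9, "watts": 20},
}

ipad, brain = systems["iPad 2"], systems["Human brain"]
capacity_ratio = brain["bytes"] / ipad["bytes"]        # roughly 55 thousand times
speed_ratio = brain["megaflops"] / ipad["megaflops"]   # roughly 13 million times

for name, spec in systems.items():
    efficiency = spec["megaflops"] / spec["watts"]     # megaflops per watt
    print(f"{name}: {efficiency:,.0f} megaflops per watt")
print(f"brain/iPad capacity: {capacity_ratio:,.0f}x, speed: {speed_ratio:,.0f}x")
```

On these numbers the brain is not only faster but also far more energy-efficient per operation, which is the point behind "human cognition is still unmatched."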

Expertise and Foresight
Simple experiment: track a target.
Locations: random; diagnostic location (predictor); deterministic location (predicted).

Conclusions
- After a very short time, response times clearly show that subjects have learned the pattern and correctly anticipate the 4th target
- When the pattern changes, the subjective experience is: "something is wrong"
- Effective expertise is acquired through exposure to exemplars, by extracting repeated patterns
- Complex knowledge and the ability to predict what will happen next are the result of applying simple learners (algorithms) to data
- That is precisely what modern pattern recognition algorithms do, and why they are effective at accumulating expertise
- Conscious analysis of course has a place, but in most cases it is not suitable for achieving accurate predictions
- Human experts are effective because they have automated predictive modeling
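The claim that a simple learner, exposed to exemplars, comes to anticipate the 4th target can be illustrated with a frequency-counting learner. This is a minimal sketch under stated assumptions, not the procedure used by Lewicki, Hill, and Czyzewska: four screen quadrants, a diagnostic (predictor) location drawn at random on each trial, and a hypothetical hidden rule (`old_rule`) that maps it to the deterministic (predicted) location. When the rule is swapped for another hypothetical one (`new_rule`), the learner keeps predicting the old pattern and fails, a machine analogue of "something is wrong."

```python
import random
from collections import Counter, defaultdict


class FrequencyLearner:
    """Anticipates an outcome by counting what followed each cue so far."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, cue):
        seen = self.counts[cue]
        if not seen:
            return None  # no exposure to this cue yet, so no anticipation
        return seen.most_common(1)[0][0]

    def update(self, cue, outcome):
        self.counts[cue][outcome] += 1


def run_trials(learner, rule, n_trials, rng):
    """Run n_trials; return the fraction of 4th-target locations anticipated."""
    hits = 0
    for _ in range(n_trials):
        cue = rng.randrange(4)   # diagnostic location: one of four quadrants
        target = rule(cue)       # deterministic location of the 4th target
        hits += learner.predict(cue) == target
        learner.update(cue, target)  # learn from every exemplar
    return hits / n_trials


def old_rule(q):
    return (q + 2) % 4  # hypothetical hidden pattern


def new_rule(q):
    return (q + 1) % 4  # hypothetical pattern after the change


rng = random.Random(42)
learner = FrequencyLearner()
early = run_trials(learner, old_rule, 4, rng)
late = run_trials(learner, old_rule, 200, rng)
after_change = run_trials(learner, new_rule, 20, rng)
print(f"early: {early:.2f}  late: {late:.2f}  after change: {after_change:.2f}")
```

Counting is about the simplest learner possible; the point is only that no rule is ever stated explicitly, yet anticipation of the predicted location improves with exposure and breaks down when the pattern changes.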

What Is Pattern Recognition? Knowledge Discovery vs. Statistical Analysis
Statistical analysis
- Focuses on hypothesis testing and parameter estimation
- Fits parsimonious statistical models, with the goal of explaining complex relationships with few parameters
- Examples: regression, nonparametric statistics, factor analysis, quality control
Pattern recognition (data mining)
- "The data are your model!"
- Algorithms include: trees, boosted trees, voted trees (forests), SVM, neural nets, numerous clustering methods, Kohonen networks, ...; association and sequence rules, ...
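To make "the data are your model" concrete, this sketch contrasts a two-parameter least-squares line (a parsimonious statistical model) with k-nearest-neighbor prediction, which keeps the stored observations themselves as the model. The data are synthetic and deliberately nonlinear, and k-NN serves only as a simple stand-in for the pattern-recognition family listed above.

```python
# Synthetic, deliberately nonlinear data: y = x^2 on a grid from -2 to 2
xs = [k / 10 for k in range(-20, 21)]
ys = [x * x for x in xs]

# Statistical analysis: ordinary least squares summarizes the data in two parameters
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x


def linear_predict(x):
    return intercept + slope * x


# Pattern recognition: the stored observations are the model (k-nearest neighbors)
def knn_predict(x, k=3):
    nearest = sorted(zip(xs, ys), key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


for x in (-1.5, 0.0, 1.5):
    print(f"x={x:+.1f}  line={linear_predict(x):.3f}  knn={knn_predict(x):.3f}  true={x * x:.3f}")
```

On this data the line reduces everything to a slope near zero and an intercept near the mean, so it misses the curvature that the nearest-neighbor lookup reproduces.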

Some Use Cases and Best Practices

Governance for Analytics at Shire: Validation, Audit Logs, Electronic Signatures
Goals:
- Control and monitor production processes to reduce operational risk and cost
- Identify areas of improvement
Solutions:
- Install Dell Statistica Enterprise Quality Control (QC) with Web Data Entry in three environments (development, validation, and production)
Results:
- Immediate Day 1 results with a real-time data capture and analysis platform, replacing an assortment of tools
- Seamless, unified data presentation, regardless of data source
- Greater predictability of production; fewer defects and lost batches
- Expedited root cause analysis, reducing operational risk and cost
Translate insights into practice with Dell's comprehensive healthcare analytics solutions.

Lessons Learned: Validated Analytics
When the results of analytics affect real people in important ways, it is critical that the results are right. To embrace advanced analytics as an institution, an organization cannot rely on individual champions and experts; instead it needs:
- Documentation of requirements and test plans: verify that results are correct
- Version control of models: verify that the analytics that are deployed are the correct ones, i.e., the ones that were documented and approved
- Approvals of models, data pre-processing steps, etc., with electronic signatures: provide the tools necessary to trace all decisions and assumptions that drove the specific analytic approach
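The requirements for version control and traceable approvals can be sketched as a content-addressed, hash-chained audit log. Everything here is a hypothetical illustration (the `AuditLog` class, the event names, and the model specification are assumptions, not a Statistica API), and a plain approval record like this is not a compliant electronic signature.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(obj) -> str:
    """SHA-256 of a canonical JSON serialization (sorted keys, no spaces)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry embeds the previous entry's hash."""

    entries: list = field(default_factory=list)

    def record(self, event: str, user: str, details: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else ""
        entry = {"event": event, "user": user, "details": details, "prev": prev}
        entry["hash"] = fingerprint(entry)  # hash covers every field except itself
        self.entries.append(entry)
        return entry["hash"]

    def is_intact(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or fingerprint(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def approved_hashes(log: AuditLog) -> set:
    return {e["details"]["model_hash"] for e in log.entries if e["event"] == "model_approved"}


# Hypothetical model specification, including its data pre-processing step
model_spec = {"name": "credit_risk", "version": 3, "inputs": ["age", "income"], "preprocessing": "woe_bins_v2"}

log = AuditLog()
log.record("model_registered", "analyst_a", {"model_hash": fingerprint(model_spec)})
log.record("model_approved", "reviewer_b", {"model_hash": fingerprint(model_spec)})

deployed = dict(model_spec, version=4)  # an undocumented change before deployment
print("log intact:", log.is_intact())
print("approved spec deployable:", fingerprint(model_spec) in approved_hashes(log))
print("changed spec deployable:", fingerprint(deployed) in approved_hashes(log))
```

Hashing the full specification, pre-processing included, means any undocumented change produces a fingerprint that has no matching approval.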

University of Iowa Transforms the Operating Room with Real-Time Analytics
Goals:
- Predict which patients have the biggest risk of surgical site infections
- Reduce the infection rate to improve patient care and decrease costs
Solutions:
- Merge historical EHR data and live patient vital signs to predict infection likelihood
- Provide doctors with real-time predictive decisions, using Dell Statistica, during surgical procedures so they can create a plan to reduce risk
Results:
- 58% reduction in the occurrence of surgical site infections
- Personalized healthcare based on the patient's own characteristics
- Reduced cost of patient care
300K: annual deaths as a result of preventable harm in hospitals
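The "merge historical EHR data and live vital signs" step can be sketched as a logistic risk score over a combined feature set. The feature names, coefficients, and intercept below are hypothetical placeholders chosen for illustration; they are not the University of Iowa model and are not clinically validated.

```python
import math

# Hypothetical coefficients for illustration only (not a validated clinical model).
WEIGHTS = {"bmi": 0.05, "diabetic": 0.8, "heart_rate": 0.02, "temp_above_37": 0.5}
INTERCEPT = -6.0


def merge_features(history: dict, vitals: dict) -> dict:
    """Combine static EHR-derived features with the latest live vital signs."""
    features = dict(history)
    features["heart_rate"] = vitals["heart_rate"]
    features["temp_above_37"] = max(0.0, vitals["temp_c"] - 37.0)
    return features


def infection_risk(features: dict) -> float:
    """Logistic score: 1 / (1 + exp(-(b0 + sum of w_i * x_i)))."""
    z = INTERCEPT + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


history = {"bmi": 31.0, "diabetic": 1}  # hypothetical patient record
for vitals in ({"heart_rate": 80, "temp_c": 37.0}, {"heart_rate": 110, "temp_c": 38.5}):
    print(vitals, round(infection_risk(merge_features(history, vitals)), 3))
```

Keeping the historical features fixed and re-scoring as each new set of vitals arrives is what lets such a score be updated during a procedure.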

Lessons Learned: Every Project Starts at the End
Every project starts at the end, i.e., ask:
- How do I know I am done? How do I know that I won?
- What would ideal results look like, and how would they be translated into actions that generate real ROI?
Everything flows from there:
- Where to get data
- How to pre-process data
- What modeling methods to use
- What user interfaces are critical

Statistica Supports Critical Business Outcomes for Danske Bank
Statistica Decision Platform enabled Danske Bank to upgrade its existing SAS-based risk- and credit-scoring platform and infrastructure to gain the insights needed to act on complex issues:
- Efficient (risk) modeling and model life-cycle management
- Efficient model deployment from development through production environments
- Real-time, web-based credit scoring of customer records
"In an increasingly complex environment of risk models, the StatSoft implementation provides a good basis for keeping track of model versions, performance and content."
"A modern software platform that is not only a top performer but also a good neighbor to existing IT assets."
Jens Chr. Ipsen, First Vice President and Development Manager, Risk Management Systems

Lessons Learned: Prescriptive Analytics, Rules, and Real-Time Analytics
- Real-world predictive analytics systems need to integrate with heterogeneous data
- Rely on standards and stay away from proprietary interfaces and languages
- Results of analytics have to be actionable: the statement "Age is related to Risk" is not actionable; the statement "If Age > 30 and ... then Approve" is actionable
- The analytic strategy must be informed by how results are to be used; for example, Weight-of-Evidence coding of predictors is useful for creating meaningful classes for inputs
- If results must be interpretable, reason scores must be part of the modeling output; plan for that
- Big data make it possible to build many models for small segments of the population; automation and efficient deployment are critical
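Weight-of-Evidence coding replaces each class (bin) of a predictor with the log ratio of its share of "good" outcomes to its share of "bad" outcomes, which puts the bins on a meaningful, comparable scale. A minimal sketch with hypothetical age bands and counts; some texts define the ratio the other way round, which only flips the sign.

```python
import math


def weight_of_evidence(bins):
    """Map {label: (goods, bads)} to {label: WoE}.

    WoE_i = ln((goods_i / total_goods) / (bads_i / total_bads)).
    Bins with zero goods or zero bads need smoothing; not handled here.
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    return {
        label: math.log((g / total_good) / (b / total_bad))
        for label, (g, b) in bins.items()
    }


# Hypothetical age bands with counts of good and bad credit outcomes
age_bins = {"18-30": (200, 50), "31-45": (400, 40), "46+": (300, 10)}
woe = weight_of_evidence(age_bins)
for band, value in woe.items():
    print(band, round(value, 3))
```

Because each band maps to a single number, a model built on WoE-coded inputs can report which bands pushed a score up or down, which is one common way to produce the reason scores mentioned above.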

Analytic Challenges and Maturity: The Future
[Figure: Analytic maturity model. Stages: Automated Modeling, Automated Model Calibration, Automated Actions; time labels ca. 2013, ca. 2015, ca. -?; guiding questions: "What are the alternatives and what are their costs?" and "What should I do and why?"]

Final Thoughts
Disruptive applications and solutions for automated analytics are quickly multiplying:
- IoT (Internet of Things) applications are only one example
- Applications to help doctors make better decisions in emergency rooms
- Automated manufacturing
- Monitoring complex systems, servers, cloud systems, buildings
- Adaptive systems for managing customer models
Dell has end-to-end solutions to cover everything:
- Data storage, cloud/on-prem
- Big data: NoSQL databases, Hadoop, others
- Data access and integration, cloud/on-prem/hybrid
- Statistical analysis, predictive analytics
- Flexible architecture to support integration for real-time and batch applications
- Services and domain expertise to support a wide range of solutions

Some How-To Overviews and References
Overviews and how-tos: online StatSoft Electronic Text Book (www.statsoft.com/textbook/)
