Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends
Spring 2015
Thomas Hill, Ph.D., VP Analytic Solutions, Dell Statistica
Overview and Agenda
  Dell Software overview
  Dell in healthcare, life sciences
  What is predictive analytics; what is learning?
  How human experts learn
  Statistical analysis vs. pattern recognition
  Q&A
Dell Leadership in Software
  16th largest software manufacturer
  6,000+ team members
  2,000+ software engineers
  2,500+ software sales
  2M user community members
  90% of Global 1000 are Dell Software customers
  1M+ customers
  EMA Radar Report: Value Leader for Boomi Cloud Integration
  NSS Labs: highest overall protection, Next-Gen Firewall
  Gartner: 9 Magic Quadrants
End-to-End Data Prediction ROI
Value chain: Data, Insight, Action, Automation, ROI
  Advanced Analytics: Predict and optimize the future (Statistica Big Data Analytics Powered by Kitenga)
  Business Intelligence: Understand historical events (Statistica)
  Integration: Real-time data movement on- and off-premise (Boomi, Toad Data Point, TIC)
  Management: Improve performance of the data platforms (Toad, Foglight)
  Infrastructure: Servers & Storage; put the right data in the right place at the right time
Capabilities: Quality Control, Root Cause, Forecasting, Optimization, Monitoring & Alerting, Validated & Auditable, Automated & Repeatable, Data Mining, Predictive Modeling, Machine Learning, Text Mining
  Boomi: Flexible data connectors to cloud, cloud/on-premise integration
  Toad Data Point & Toad Intelligence Central: Heterogeneous data sources, complex joins, staging repository
How Process Experts Learn: A simple experiment
See also: Lewicki, Paul, Hill, Thomas, & Czyzewska, Maria (1992). Nonconscious acquisition of information. American Psychologist, 47, 796-801.
Learning Engines for Big High-Velocity Data
According to Scientific American (October 25, 2011):
  iPad 2:
    Capacity: 64 billion bytes (64 gigabytes)
    Processing speed: 170 megaflops (1 megaflop = 1 million floating-point operations per second)
    Power consumption: 2.5 watts
  Your cat:
    Capacity: 98 trillion bytes (98 terabytes)
    Processing speed: 61 million megaflops
  Human brain:
    Capacity: 3.5 quadrillion bytes (3.5 petabytes)
    Processing speed: 2.2 billion megaflops
    Power consumption: 20 watts
Human cognition is still unmatched in many ways
  Capacity
  Processing speed
  Numbers of models
Expertise and Foresight
Simple experiment: Track a target
[Figure: target locations across trials. Labels: diagnostic location (predictor), random location, deterministic location (predicted)]
Conclusions
  After a very short time, response times clearly show:
    Subjects have learned the pattern and correctly anticipate the 4th target
    When the pattern changes, the subjective experience is: "something is wrong"
  Effective expertise is acquired through exposure to exemplars, by extracting repeated patterns
  Complex knowledge, and the ability to predict what will happen next, is the result of applying simple learning algorithms to data
    That is precisely what modern pattern recognition algorithms do, and why they are effective at accumulating expertise
  Conscious analysis of course has a place
    But in most cases it is not suitable for achieving accurate predictions
  Human experts are effective because they have automated predictive modeling
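The idea that a simple learner, exposed to repeated exemplars, ends up anticipating the next target can be sketched in a few lines. This is only an illustration, not the procedure used in the experiment: the class name `TransitionLearner`, the location labels, and the `hidden_rule` mapping are all hypothetical.

```python
from collections import Counter, defaultdict


class TransitionLearner:
    """Counts which outcome follows each cue and predicts the most frequent one."""

    def __init__(self):
        # cue -> Counter of outcomes observed after that cue
        self.counts = defaultdict(Counter)

    def observe(self, cue, outcome):
        self.counts[cue][outcome] += 1

    def predict(self, cue):
        seen = self.counts.get(cue)
        if not seen:
            return None  # no exposure to this cue yet
        return seen.most_common(1)[0][0]


# Hypothetical hidden rule: the diagnostic location determines the predicted one.
hidden_rule = {"A": "C", "B": "D", "C": "A", "D": "B"}

learner = TransitionLearner()
for cue in ["A", "B", "C", "D"] * 5:  # repeated exposure to exemplars
    learner.observe(cue, hidden_rule[cue])

print(learner.predict("A"))  # the learner now anticipates the target for cue "A"
```

If the hidden rule were changed after training, the learner's predictions would start to miss, which is the algorithmic counterpart of the "something is wrong" experience.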
What is pattern recognition? Knowledge Discovery vs. Statistical Analysis
  Statistical Analysis
    Focuses on hypothesis testing and parameter estimation
    Fits parsimonious statistical models with the goal of explaining complex relationships with fewer parameters
    Examples: regression, nonparametric statistics, factor analysis, quality control
  Pattern Recognition (Data Mining)
    The data are your model!
    Algorithms include: trees, boosted trees, voted trees (forests), SVM, neural nets, numerous clustering methods, Kohonen networks, ...
    Association and sequence rules, ...
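The contrast between the two approaches can be made concrete with a small sketch. It assumes made-up data and uses k-nearest neighbors as a stand-in for "the data are your model"; k-nearest neighbors is not named on the slide, and the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares line: the whole data set is summarized by two parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_predict(xs, ys, x_new, k=3):
    """k-nearest-neighbor prediction: the stored data themselves act as the model."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up illustrative data with a step that a straight line cannot follow.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

slope, intercept = fit_line(xs, ys)
print(slope * 2 + intercept)        # parametric prediction at x = 2
print(knn_predict(xs, ys, 2, k=3))  # average of the 3 closest stored points
```

The line compresses eight observations into two numbers and smooths over the step; the neighbor-based prediction keeps every observation and follows the step, at the cost of having no compact explanation.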
Some Use Cases and Best Practices
Governance for Analytics at Shire: Validation, Audit Logs, Electronic Signatures
  Goals:
    Control and monitor production processes to reduce operational risk and cost
    Identify areas of improvement
  Solutions:
    Install Dell Statistica Enterprise Quality Control (QC) with Web Data Entry on three environments (development, validation, and production)
  Results:
    Enable immediate Day 1 results with a real-time data capture and analysis platform that replaces an assortment of tools
    Seamless, unified data presentation, regardless of data source
    Greater predictability of production; fewer defects and lost batches
    Expedited root cause analysis reduces operational risk and cost
  Translate insights into practice with Dell's comprehensive healthcare analytics solutions
Lessons Learned: Validated Analytics
  When the results of analytics affect real people in important ways, it is critical that the results are right
  For an institution to embrace advanced analytics, it cannot rely on individual champions and experts; instead it needs:
    Documentation of requirements, test plans: Verify that results are correct
    Version control of models: Verify that the analytics that are deployed are the correct ones that were documented and approved
    Approvals of models and electronic signatures, data pre-processing steps, etc.: Provide the tools necessary to trace all decisions and assumptions that drove the specific analytic approach
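One way to picture the "deployed model equals approved model" check is a content fingerprint recorded at approval time. The sketch below is an assumption-laden illustration and not how Statistica implements validation: the registry layout, the model name `example_model`, and the `approved_by` field are hypothetical.

```python
import hashlib
import json


def fingerprint(model_spec):
    """SHA-256 digest of a canonical JSON encoding of the model specification."""
    canonical = json.dumps(model_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(deployed_spec, approved_registry, model_name, version):
    """True only if the deployed spec matches the digest recorded at approval."""
    record = approved_registry.get((model_name, version))
    return record is not None and record["sha256"] == fingerprint(deployed_spec)


# Hypothetical model specification and registry, for illustration only.
approved_spec = {"type": "logistic", "coefficients": {"x1": 0.03, "x2": 0.11}}
registry = {
    ("example_model", "1.0"): {
        "sha256": fingerprint(approved_spec),
        "approved_by": "reviewer_id",  # placeholder for an electronic signature
    }
}

# Same model with one coefficient altered after approval.
tampered_spec = {"type": "logistic", "coefficients": {"x1": 0.30, "x2": 0.11}}

print(is_approved(approved_spec, registry, "example_model", "1.0"))  # True
print(is_approved(tampered_spec, registry, "example_model", "1.0"))  # False
```

A real validated environment would also log who ran the check and when; the point here is only that any change to the model after approval becomes detectable.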
University of Iowa transforms the operating room with real-time analytics
  Goals:
    Predict patients with the biggest risk of surgical site infections
    Reduce infection rate to improve patient care and decrease costs
  Solutions:
    Merge historical EHR data and live patient vital signs to predict infection likelihood
    Provide doctors with real-time, predictive decisions, using Dell Statistica, during surgical procedures so they can create a plan to reduce risk
  Results:
    58% reduction in occurrence of surgical site infections
    Personalize healthcare based on patient's own characteristics
    Reduce cost of patient care
  300K annual deaths as a result of preventable harm in hospitals
Lessons Learned: Every Project Starts at the End
  Every project starts at the end, i.e., ask:
    How do I know I am done?
    How do I know that I won?
    What would ideal results look like, and how would they be translated into actions to generate real ROI?
  Everything flows from there:
    Where to get data
    How to pre-process data
    What modeling methods to use
    What user interfaces are critical
Statistica supports critical business outcomes for Danske Bank
  Statistica Decision Platform enabled Danske Bank to upgrade their existing SAS-based risk- and credit-scoring platform and infrastructure to gain the insights needed to act on complex issues.
    Efficient (risk) modeling and model life-cycle management
    Efficient model deployment from development through production environments
    Real-time web-based credit scoring of customer records
  "In an increasingly complex environment of risk models, the StatSoft implementation provides a good basis for keeping track of model versions, performance and content."
  "a modern software platform that is not only a top performer but also a good neighbor to existing IT assets"
  Jens Chr. Ipsen, First Vice President and Development Manager, Risk Management Systems
Lessons Learned: Prescriptive Analytics, Rules, and Real-Time Analytics
  Real-world predictive analytics systems need to integrate with heterogeneous data
    Rely on standards and stay away from proprietary interfaces and languages
  Results of analytics have to be actionable
    The statement "Age is related to Risk" is not actionable; the statement "If Age > 30 and ... then Approve" is actionable
  The analytics strategies must be informed by how results are to be used
    For example, Weight-of-Evidence coding of predictors is useful for creating meaningful classes for inputs
    If results must be interpretable, reason scores must be part of the modeling output; plan for that
  Big data make it possible to build many models for small segments of the population; automation and efficient deployment are critical
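Weight-of-Evidence coding replaces each class of a predictor with the log ratio of its share of "good" cases to its share of "bad" cases. The sketch below uses that common definition with made-up data; the sign convention differs between references, and the function name, bin labels, and outcome coding are illustrative assumptions.

```python
import math
from collections import Counter


def weight_of_evidence(bins, outcomes):
    """Return {bin: ln(share of goods in bin / share of bads in bin)}.

    `outcomes` uses 1 for "good" and 0 for "bad". This sketch puts goods
    in the numerator; some references use the opposite convention.
    """
    goods = Counter(b for b, y in zip(bins, outcomes) if y == 1)
    bads = Counter(b for b, y in zip(bins, outcomes) if y == 0)
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for b in set(bins):
        if goods[b] == 0 or bads[b] == 0:
            continue  # WoE undefined here; such bins are usually merged or smoothed
        woe[b] = math.log((goods[b] / total_good) / (bads[b] / total_bad))
    return woe


# Made-up example: an age predictor already cut into two classes.
bins = ["<=30"] * 4 + [">30"] * 8
outcomes = [1, 1, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]

for label, value in sorted(weight_of_evidence(bins, outcomes).items()):
    print(label, round(value, 3))
```

Because each class gets a single signed number, the coded predictor can feed directly into rule-like outputs and reason scores: in this toy data the ">30" class carries positive evidence and the "<=30" class negative evidence.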
Analytic Challenges and Maturity: The Future
[Figure: analytic maturity model, ca. 2013 vs. ca. 2015-?. Stage labels: Automated Modeling, Automated Model Calibration, Automated Actions. Questions: "What should I do and why?"; "What are the alternatives and what are their costs?"]
Final Thoughts
  Disruptive applications and solutions for automated analytics are quickly multiplying
    IoT (Internet of Things) applications are only one example
    Applications to help doctors make better decisions in emergency rooms
    Automated manufacturing
    Monitoring complex systems, servers, cloud systems, buildings
    Adaptive systems for managing customer models
  Dell has end-to-end solutions to cover everything
    Data storage, cloud/on-prem
    Big data: NoSQL databases, Hadoop, others
    Data access and integration, cloud/on-prem/hybrid
    Statistical analysis, predictive analytics
    Flexible architecture to support integration for real-time and batch applications
    Services and domain expertise to support a wide range of solutions
Some How-To Overviews and References
  Overviews and how-to's
    Online StatSoft Electronic Text Book (www.statsoft.com/textbook/)