Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD





Presenter

Proven Performance Since 1995
TDWI helps business and IT professionals gain insight about data warehousing, BI, and analytics:
- High-quality, vendor-neutral educational offerings
- Independent analyst research staff and thought leadership
- Trusted sources of emerging information and trends
- Ability to bring together qualified BI/DW professionals and solution providers
www.tdwi.org
Premium Membership, conferences, seminars, research, publications, topical portals, whitepaper library, and numerous online programs

Agenda
- Introduction to big data and predictive analytics
- Popular predictive analytics methodologies
- Examples
- Guidelines
- Deployment models

Big Data: M2M/IoT, Volume, Mobile/Location, Social, Text, Formats

A Confusion of Words

Big Data Analytics
- Text analytics
- Stream mining
- Predictive analytics
- Link analysis
- Slicing and dicing
- Visual discovery
- Etc.

Predictive Analytics
A statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data to determine outcomes.

A Lot of It Is Used to Predict Behavior
People:
- Churn
- Marketing
- Fraud detection
Machine:
- Operations maintenance
And much, much more!
Good source for use cases

Of Course, It Isn't Just About Modeling
CRISP Lifecycle

A Vast Array of Techniques Source: TDWI BPR on Predictive Analytics, 2014; n=242

Supervised
- Use it when you know the outcomes of interest (e.g., leave vs. stay, revenue prediction)
- Need enough data for training, testing, and validation
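The need for separate training, validation, and testing data can be illustrated with a minimal sketch. The 60/20/20 proportions, the `split_dataset` helper, and the customer records are assumptions made for illustration; they are not taken from the presentation.

```python
import random

def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle labeled rows and split them into train/validation/test lists.

    The fractions are illustrative defaults, not a recommendation.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )

# Hypothetical labeled records: (customer_id, known outcome).
customers = [(i, "leave" if i % 4 == 0 else "stay") for i in range(20)]
train_set, validation_set, test_set = split_dataset(customers)
print(len(train_set), len(validation_set), len(test_set))
```

The model is fit on the training set, tuned against the validation set, and judged once on the held-out test set, which is why enough labeled data is needed for all three.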

Unsupervised
- Does not include target information
- Looks for commonalities/hidden structures in data
- May not produce useful insight
- Is it prediction?

Techniques
Supervised: Classification, Regression, Neural networks
Unsupervised: Clustering, Association
Supervised:
- Deep learning, auto-encoders
- Decision trees, random forests, gradient boosting
- Support vector machines, Bayesian classifiers, principal component, discriminant analysis
Unsupervised:
- Nearest-neighbor mapping, k-means clustering, self-organizing maps
- Factor analysis, link analysis

Decision Trees
Good for classification and prediction with known, discrete outcomes
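As a rough illustration of how a tree chooses a rule for a discrete outcome, the sketch below finds a single best split on one numeric variable using Gini impurity. This is only the first step of tree building; the impurity measure, the `best_split` helper, and the tenure data are assumptions for illustration, not the presenter's method.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' rule."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: months as a customer, and the known outcome.
tenure = [1, 2, 3, 10, 12, 15]
outcome = ["leave", "leave", "leave", "stay", "stay", "stay"]
threshold, impurity = best_split(tenure, outcome)
print(f"if tenure <= {threshold}: leave, else: stay (impurity {impurity})")
```

The result reads as a rule rather than an equation; a full tree would apply the same search recursively to each side of the split.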

Linear Regression
Used to predict a continuous variable from independent variables
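For a single independent variable, the fitted line can be computed directly with ordinary least squares. The sketch below assumes one predictor and uses made-up spend and revenue figures; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on revenue = 2 * spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, revenue)
predicted = slope * 5.0 + intercept  # continuous prediction for a new value
print(slope, intercept, predicted)
```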

Artificial Neural Networks (1)
Biological to Mathematical
Source: agh.edu

Artificial Neural Networks (2)
Can be used on a range of problems; good for classification and estimation
Source: Commonsenseatheism.com

Clustering
Used to group observations by perceived similarity
Source: Babelomics
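One common way to group observations by similarity is k-means, which the deck lists among unsupervised techniques. The sketch below is a bare-bones version that assumes squared Euclidean distance, hand-picked starting centroids, and made-up two-dimensional points.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical observations forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The number of clusters is fixed by the number of starting centroids, and the result depends on where they start.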

Association Rule Mining
Transaction  Items
1            milk, lettuce
2            lettuce, diapers, beer, cookies
3            milk, diapers, beer, plastic bags
4            lettuce, milk, diapers, beer
5            lettuce, milk, diapers, plastic bags
Diapers -> Beer
Used to find relationships
Two concepts: support and confidence
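Support and confidence for the Diapers -> Beer rule can be worked out from the five transactions on this slide. The sketch uses the standard definitions: support is the share of transactions containing every item in the rule, and confidence is the share of transactions containing the antecedent that also contain the consequent. The function names are illustrative.

```python
# The five transactions from the slide.
transactions = [
    {"milk", "lettuce"},
    {"lettuce", "diapers", "beer", "cookies"},
    {"milk", "diapers", "beer", "plastic bags"},
    {"lettuce", "milk", "diapers", "beer"},
    {"lettuce", "milk", "diapers", "plastic bags"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

rule_support = support({"diapers", "beer"})          # 3 of 5 transactions
rule_confidence = confidence({"diapers"}, {"beer"})  # 3 of 4 diaper transactions
print(rule_support, rule_confidence)
```

Diapers and beer appear together in transactions 2, 3, and 4, so support is 3/5; diapers appear in four transactions, so confidence is 3/4.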

Quick Quiz
- How much revenue will this customer bring? Regression
- Who is going to take a certain action? Classification
- What are my customer segments? Clustering
- If a customer buys X, what else might it buy? Association rules

Strengths & Weaknesses: Decision Trees
Strengths:
- Easy to understand (rules vs. equations)
- Easy to explain (not a black box)
- Data doesn't have to follow any distribution
- Can handle interactions between variables
Weaknesses:
- Continuous value predictions
- Can be computationally expensive to train
- Can have problems if many classes and few training examples
- Overfitting

Strengths & Weaknesses: Regression
Strengths:
- Simple to use
- Easy to explain through independent variables
Weaknesses:
- Relationship needs to be linear
- Hard to handle categorical variables or variables that interact
- Outliers are hard to model

Strengths & Weaknesses: Neural Networks
Strengths:
- Good for a specific class of problems
- May be easy to implement
- Non-linear/interaction variables
Weaknesses:
- Hard-to-explain output (black box)
- Output might be unpredictable
- Training can take a long time

Strengths & Weaknesses: K-Means
Strengths:
- Good for large datasets
- Simple
- Efficient
Weaknesses:
- Need to specify K upfront
- Sensitive to outliers, which may result in incorrect cluster boundaries
- Needs a mean (categorical data?)

Strengths & Weaknesses: Association Rules
Strengths:
- Simple
- Text data (categorical)
Weaknesses:
- Can be computationally expensive
- Potential for spurious patterns
- Rules do not mean causality

Ensemble Modeling
Multiple models are combined to solve a problem
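One simple way to combine models is a majority vote over their individual predictions. The sketch below assumes three hand-written rules standing in for trained models; the rules, field names, and thresholds are all hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical stand-ins for trained churn models.
def model_a(customer):
    return "leave" if customer["complaints"] > 2 else "stay"

def model_b(customer):
    return "leave" if customer["tenure_months"] < 6 else "stay"

def model_c(customer):
    return "leave" if customer["monthly_spend"] < 20 else "stay"

customer = {"complaints": 3, "tenure_months": 4, "monthly_spend": 80}
votes = [model(customer) for model in (model_a, model_b, model_c)]
ensemble_prediction = majority_vote(votes)
print(votes, ensemble_prediction)
```

Here two of the three models predict "leave", so the ensemble does too. Random forests and gradient boosting, both listed among the techniques, combine many decision trees in more elaborate ways.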

Vendors Are Offering a Range of Options for Predictive Analytics
- UI easier to use: visual vs. code based
- Automation
- Collaboration/interactivity
- Cloud options
- Operationalizing and embedding advanced analytics

Operationalizing
An example:

TDWI Big Data Maturity Model

QUESTIONS?