Predictive Dynamix Inc




Predictive Modeling Technology

Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished by analyzing and modeling the dynamics of the application-specific data. In its raw form, this data is of limited value and is mainly used for reporting what has happened. However, when the data is compiled into a compact model, it becomes a powerful tool for proactively predicting what will happen.

In the abstract, a predictive model is a computational structure that can accurately forecast an outcome of interest (aka output or dependent variable) when provided with input data (aka independent variables) that have a measurable causal or coincident relationship to the output.

[Figure: Inputs -> Predictive Model -> Output]

In order for predictive modeling to be useful in a given application, two fundamental principles must hold:

- Outcomes must have some level of predictability from known data. That is, similar patterns represented across model inputs should be indicative of similar outputs. There should exist some measurable relationship between the set of known data values that will be used as model inputs and the resulting output value(s) that the model is tasked to approximate.
- Relationships that existed in the past will continue to hold in the future, such that it is reasonable to use past observations to infer future behavior.

Predictive modeling, then, is a body of technologies capable of approximating the relationship between known input data measures and some resulting output.

Application Classes

There are generally two classes of predictive modeling applications that differ by the type of output the model produces:

Forecasting: Forecasting models generate outputs that are continuous-valued. That is, the output should be a value ranging from the minimum to the maximum allowed. These models are used in applications such as forecasting or estimating sales, volumes, costs, yields, rates, temperatures, scores, etc.

Classification: Classification models generate outputs that are 1-of-n discrete possible outcomes. Often there is a single output that represents a boolean (i.e., yes/no) outcome. These models are used in pattern recognition applications such as fraud detection, target recognition, vote forecasting, prospect classification, churn prediction, bankruptcy prediction, etc.

Predictive Modeling Algorithms

There are generally two classes of modeling architectures: mapping and segmentation. There are a multitude of flavors of each type. Some of the most useful are described below.

Functional Mapping Model Types:
- Linear Regression
- Logistic Regression
- Polynomial Regression
- MLP Neural Networks

Segmentation Model Types:
- Decision Tree
- Cluster-based
- Rule-based
- Fuzzy Logic

Mapping Architectures: Functional mapping models generally adjust the coefficients in some continuous equation(s) in order to approximate the relationship between input and output variables. The computational basis can range from simple equations, with a limited number of terms, to complex equation series, with many weights that allow highly non-linear interactions and synergies between variables.

Regression: Regression is generally the simplest mapping form. Regression models use only one coefficient (i.e., weight term) per input variable.

Linear: Linear regression models are widely used. Because there is a single coefficient per input variable, they are easily understood. For the case of a single input variable, the solution is a line ($y = X \beta + c$); for two inputs, the solution is a plane ($y = X_1 \beta_1 + X_2 \beta_2 + c$); and for a higher number of inputs, the solution is an n-dimensional hyperplane ($y = X_1 \beta_1 + X_2 \beta_2 + \dots + X_n \beta_n + c$). Linear regression models are adequate for many applications but cannot describe variable interactions and can be highly susceptible to noisy data.

[Figure: inputs $X_0, X_1, \dots, X_n$ feeding a single summation node, $Y = \sum_i X_i \beta_i + c$]

Logistic: Logistic regression applies the same general equation as linear regression ($y = \sum_i X_i \beta_i + c$). As a post-process, logistic regression applies a logistic function (i.e., an s-curve) to the output value of the summation function. This allows the model to better represent binary outputs (logit) and real-world principles such as 80-20 relationships, diminishing returns, etc.

Polynomial: Polynomial regression techniques generally pre-process one or more inputs into a quadratic/polynomial form ($y = A X^2 + B X + C$), then solve for the coefficients using linear regression. This form allows inputs to interact and can represent more complex relationships than linear or logistic regression.

MLP Neural Networks: Multi-layer perceptron (aka MLP) neural networks have an architecture that allows multiple coefficients per input variable. The processing at each node is functionally equivalent to logistic regression. Multiple nodes (organized into layers) allow the model to represent complex, non-linear interactions between variables. MLP neural networks have been described as universal approximators because, given enough hidden nodes, they are capable of approximating any continuous functional relationship to an arbitrary degree of accuracy.
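The regression forms described above can be illustrated with a minimal Python sketch. The function names, the example coefficient values, and the toy data are assumptions made for illustration; they do not come from the original paper, and the coefficient list called `weights` here plays the role of the per-input coefficients in the equations above.

```python
import math

def linear_predict(x, weights, c):
    """Weighted sum: one coefficient per input variable, plus a constant."""
    return sum(w * xi for w, xi in zip(weights, x)) + c

def logistic_predict(x, weights, c):
    """Same summation as linear regression, post-processed by the s-curve."""
    return 1.0 / (1.0 + math.exp(-linear_predict(x, weights, c)))

def polynomial_features(x, degree=2):
    """Pre-process a single input into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]

def fit_line(xs, ys):
    """Closed-form least squares for the single-input case y = slope*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Illustrative coefficients (hand-picked, not fitted to any real data).
weights, c = [0.5, -1.0], 0.25
x = [2.0, 1.0]
y_lin = linear_predict(x, weights, c)    # 0.5*2 - 1.0*1 + 0.25
y_log = logistic_predict(x, weights, c)  # the same sum squashed into (0, 1)

# Polynomial form y = A*x^2 + B*x + C, evaluated as a linear model
# over the pre-processed features [x, x^2].
A, B, C = 2.0, -3.0, 1.0
y_poly = linear_predict(polynomial_features(3.0), [B, A], C)

# Fitting a line to toy points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Note that the polynomial case reuses `linear_predict` unchanged: only the inputs are transformed, which is why the coefficients can still be solved for with ordinary linear regression.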

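To make the MLP description concrete, the following sketch builds a tiny two-layer network in which every node is a weighted sum followed by a logistic function. The weights are hand-picked for illustration (an assumption, not a trained model and not taken from the original paper) so that the network reproduces an exclusive-or pattern, a variable interaction that a single linear or logistic node cannot represent.

```python
import math

def logistic(t):
    """The s-curve applied at every node."""
    return 1.0 / (1.0 + math.exp(-t))

def node(inputs, weights, bias):
    """One MLP node: a weighted sum followed by the logistic function."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hand-picked weights (hypothetical, for illustration only).
HIDDEN = [
    ([20.0, 20.0], -10.0),   # behaves roughly like "x1 OR x2"
    ([-20.0, -20.0], 30.0),  # behaves roughly like "NOT (x1 AND x2)"
]
OUTPUT = ([20.0, 20.0], -30.0)  # behaves roughly like AND of the hidden nodes

def mlp(inputs):
    """Forward pass: hidden layer first, then the single output node."""
    hidden = [node(inputs, w, b) for w, b in HIDDEN]
    return node(hidden, *OUTPUT)

# Evaluate on the four binary input patterns (0,0), (0,1), (1,0), (1,1).
xor_table = [round(mlp((a, b))) for a in (0, 1) for b in (0, 1)]
```

Each input variable here feeds two hidden nodes, so it carries two coefficients rather than one, which is the architectural difference from plain regression noted above.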
Segmentation Architectures: Segmentation architectures use a variety of methods to segment, or break up, the solution space into discrete groups of cases.

Decision Tree: Decision trees use hierarchical, joint variable conditions to break up a solution space into subspaces. These conditional subspaces can then be used to classify input patterns. They are called trees due to the hierarchical node/link flowchart graph that is often used to depict the various conditional decision paths.

Clustering: Cluster models segment a solution space by distributing some number of prototype vectors, or cluster centroids, across the solution space in order to group similar cases together. An input pattern is classified by mapping it to the cluster centroid that is closest to it. The model's output is inferred from the known cases that were assigned to the cluster during training.

Rules: Like decision trees, rule-based methods can break up a solution space by using If-Then conditional tests. Rule-based methods do not use the decision tree's hierarchical conditional checking, but rather incorporate independent If-Then rules to infer an output result.

Fuzzy Logic: Fuzzy logic segments each variable's range into overlapping membership functions/sets. If-Then rules are then used to test joint variable membership conditions in order to describe model logic. When a fuzzy model is fired, all rules are fired; however, each rule's conclusion is asserted only to the degree that its premise is true. Because membership functions can overlap, fuzzy models can resolve simultaneously conflicting rules.
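The cluster-based approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: the centroids are placed by hand rather than learned, Euclidean distance is assumed as the closeness measure, and the cluster output is taken to be the mean output of the training cases assigned to it. None of these choices are specified in the original text.

```python
import math

def nearest(centroids, x):
    """Index of the centroid closest to x (Euclidean distance assumed)."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], x))

def fit_cluster_outputs(centroids, inputs, outputs):
    """Average the known outputs of the training cases in each cluster."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x, y in zip(inputs, outputs):
        k = nearest(centroids, x)
        sums[k] += y
        counts[k] += 1
    # A cluster with no training cases has no inferred output.
    return [s / n if n else None for s, n in zip(sums, counts)]

def predict(centroids, cluster_outputs, x):
    """Map x to its closest centroid and return that cluster's output."""
    return cluster_outputs[nearest(centroids, x)]

# Toy data (hypothetical): two hand-placed centroids, four training cases.
centroids = [(0.0, 0.0), (10.0, 10.0)]
train_inputs = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0)]
train_outputs = [1.0, 3.0, 10.0, 20.0]

cluster_outputs = fit_cluster_outputs(centroids, train_inputs, train_outputs)
low_case = predict(centroids, cluster_outputs, (2.0, 2.0))
high_case = predict(centroids, cluster_outputs, (8.0, 8.0))
```

Because every input that maps to the same centroid receives the same output, the model's response surface is piecewise constant.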

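The fuzzy logic behavior described above, where every rule fires but contributes only to the degree that its premise is true, can be sketched as below. Several details are assumptions not stated in the original text: triangular membership functions, `min` as the AND operator for joint conditions, a weighted average to combine rule conclusions, and the specific rule set and values.

```python
def triangle(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Two overlapping sets covering each variable's range [0, 10] (hypothetical).
def low(x):
    return triangle(x, -10.0, 0.0, 10.0)

def high(x):
    return triangle(x, 0.0, 10.0, 20.0)

# Each rule: (membership test for x1, membership test for x2, conclusion).
RULES = [
    (low, low, 0.0),      # IF x1 is low  AND x2 is low  THEN output is 0
    (low, high, 50.0),    # IF x1 is low  AND x2 is high THEN output is 50
    (high, high, 100.0),  # IF x1 is high AND x2 is high THEN output is 100
]

def fire(x1, x2):
    """Fire every rule; weight each conclusion by its premise's truth degree."""
    degrees = [min(m1(x1), m2(x2)) for m1, m2, _ in RULES]
    total = sum(degrees)
    if total == 0.0:
        return None  # no rule applies to this input
    weighted = sum(d * out for d, (_, _, out) in zip(degrees, RULES))
    return weighted / total

result = fire(2.5, 7.5)
```

At the input (2.5, 7.5), all three rules are partially true at once, and the overlapping memberships let their conflicting conclusions blend into a single output.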
Comparing Modeling Technologies

The following example shows how different model types learn the dynamics of a dataset on a simple 2-input / 1-output problem. The raw data used to build the models is shown in a 3-D scatter plot. The X and Y variables are model inputs, and the Z variable is the output value that the model is tasked to predict.

[Figure: 3-D scatter plot of the raw data, with inputs X and Y and output Z]

- Linear Regression: The model's fit is limited to a plane through the data.
- Cluster-based Model (Self-Organizing Map): Cluster centroids can be seen as plateaus on the surface.
- Neural Network (MLP): The model is highly accurate due to its ability to represent non-linear, interacting variable relationships.
- Decision Tree (CHAID): Rules (i.e., tree leaf nodes) can be seen as plateaus on the response surface.

It is interesting to note how the two segmentation methods (Decision Tree and Clustering) produce very similar results on this dataset.
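The contrast between a plane and plateaus can be reproduced on a toy problem. The sketch below does not use the paper's dataset, which is not available in the text. It assumes a synthetic surface z = x * y on a small centered grid, a least-squares plane, and a hand-specified two-split tree (not CHAID) whose leaves predict the mean of their cases.

```python
from statistics import mean

# Synthetic data (an assumption): a pure interaction surface z = x * y.
GRID = [-2.0, -1.0, 1.0, 2.0]
points = [(x, y, x * y) for x in GRID for y in GRID]

def fit_plane_centered(points):
    """Least-squares plane z = a*x + b*y + c.

    Shortcut valid only for this data: the inputs are centered and
    uncorrelated, so the normal equations decouple into these ratios.
    """
    a = sum(x * z for x, _, z in points) / sum(x * x for x, _, _ in points)
    b = sum(y * z for _, y, z in points) / sum(y * y for _, y, _ in points)
    c = mean([z for _, _, z in points])
    return a, b, c

def leaf(x, y):
    """Hand-specified tree: split on x < 0, then on y < 0 (four leaves)."""
    return (x < 0, y < 0)

def fit_leaves(points):
    """Each leaf predicts the mean output of the cases that fall in it."""
    groups = {}
    for x, y, z in points:
        groups.setdefault(leaf(x, y), []).append(z)
    return {key: mean(values) for key, values in groups.items()}

def mse(predict, points):
    """Mean squared error of a predictor over the data."""
    return mean([(predict(x, y) - z) ** 2 for x, y, z in points])

a, b, c = fit_plane_centered(points)
leaves = fit_leaves(points)

plane_mse = mse(lambda x, y: a * x + b * y + c, points)
tree_mse = mse(lambda x, y: leaves[leaf(x, y)], points)
```

On this surface the best plane is flat, because a plane has no term for the interaction between the inputs, while the four leaf plateaus capture part of it and give a lower error.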