The Operational Value of Social Media Information

Dennis J. Zhang (Kellogg School of Management)
Ruomeng Cui (Kelley School of Business)
Santiago Gallino (Tuck School of Business)
Antonio Moreno-Garcia (Kellogg School of Management)

Social Media and Customer Interaction

Social media are computer-mediated tools that allow people to create, share, or exchange information in virtual communities and networks.
83% of Forbes 500 companies use at least one social media site to interact with customers.
The top 500 retailers in the U.S. earned $3.3 billion from social shopping in 2014 (26% higher than in 2013).

11/17/15

Facebook

Facebook accounts for 65% of total social revenue.

Research Question

In practice, with limited data, can we improve sales forecast accuracy, and in turn inventory management, by incorporating publicly available social media information? If so, by how much?
How should we select forecasting models when incorporating social media information?
How should we extract features from social media information to improve forecast accuracy?

Roadmap

Data
Features
Algorithms
Framework
Results and Implications

Sales and Social Data (2013/1 to 2013/8)

Online Apparel Company

Sales Operational Data
Daily sales and revenue data (2 years)
The company's promotion and marketing data (1 year)
The company's own daily sales forecast

Social Media Data (from Facebook API)
The company's 1.5k posts, 24k comments, and 22k likes from more than 7k Facebook users (2 years).

[Figure: Normalized Comments, Posts and Sales (Data with Forecast). Series: Sales, Comments, Posts. Horizontal axis: Jan to Jul.]

Forecasting Framework: Features

Basic Features (per day)
Past sales, advertisement, promotion data, seasonality

Social Features (per day)
Non-textual features: number of posts, number of comments from users, number of unique users who visited, etc.
Textual features: number of unique words in comments, average sentiment of comments, etc. (Socher et al. 2013)

For each day i, sales are forecast from the features of the preceding days:

$\text{Sales}_i = f(\text{Features}_{i-1}, \text{Features}_{i-2}, \text{Features}_{i-3}, \dots)$

In our algorithm, we use the past 7 days of data, and each day has 4 features. Therefore, our forecasting function takes 7 * 4 = 28 features in total.
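The lagged-feature construction described on the features slide can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_lagged_dataset`, the toy values, and the column ordering (most recent day first) are all assumptions; only the 7-day window and the 4 features per day come from the slide text.

```python
from typing import List, Sequence, Tuple


def build_lagged_dataset(
    daily_features: Sequence[Sequence[float]],
    daily_sales: Sequence[float],
    n_lags: int = 7,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) so that the row for day i holds the features of days i-1 ... i-n_lags."""
    if len(daily_features) != len(daily_sales):
        raise ValueError("features and sales must cover the same days")
    X: List[List[float]] = []
    y: List[float] = []
    # The first n_lags days have no complete history, so they are skipped.
    for i in range(n_lags, len(daily_sales)):
        row: List[float] = []
        for lag in range(1, n_lags + 1):
            row.extend(daily_features[i - lag])  # most recent day first
        X.append(row)
        y.append(daily_sales[i])
    return X, y


# Hypothetical toy data: 10 days, 4 features per day.
features = [[float(day), day * 0.1, float(day % 2), 1.0] for day in range(10)]
sales = [100.0 + day for day in range(10)]

X, y = build_lagged_dataset(features, sales, n_lags=7)
print(len(X), len(X[0]))  # number of usable days, number of columns per row
```

Each row of `X` can then be passed to any of the regression models the deck compares; the target `y` stays aligned with the day being forecast.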

Forecasting Framework: Forecast Algorithm

Forecasting models (linear and nonlinear, with variable selection):
Linear models: linear regression, Lasso, and forward selection
Support vector machines: linear and radial kernels
Ensemble models: gradient boosting model and random forest

[Figure: random forest illustration. Trees 1 to N split on features such as sales at t-1 and sentiment at t-1, each tree predicts sales at t, and the predictions are combined by voting.]

Leo Breiman (2001), Andy Liaw and Matthew Wiener (2015).

Forecasting Framework: Forecast Framework

Training period: 90 days
Testing period: 45 days

Training period: 10-fold cross-validation with 3 repeats (industry standard) to tune hyper-parameters.

Performance metrics, where F_i is the forecast and S_i the actual sales on day i:

Mean Absolute Percentage Error: $\text{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|F_i - S_i|}{S_i}$

Root Mean Squared Error: $\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (F_i - S_i)^2}$

Testing period: we update our forecast models every 5 days, so each 45-day testing period uses a sequence of refitted models.

Prediction horizon (1 to 7): the number of days ahead we predict (i.e., prediction horizon = 2 means we predict the sales of the day after tomorrow).
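The two performance metrics named on the framework slide can be computed as in the short sketch below. The function names and the toy numbers are illustrative assumptions; MAPE is returned as a fraction (multiply by 100 for the percentages shown in the charts), and days with zero sales are rejected because the ratio is undefined there.

```python
import math
from typing import Sequence


def mape(forecasts: Sequence[float], sales: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction of actual sales."""
    if len(forecasts) != len(sales) or not sales:
        raise ValueError("need two non-empty sequences of equal length")
    if any(s == 0 for s in sales):
        raise ValueError("MAPE is undefined when actual sales are zero")
    return sum(abs(f - s) / abs(s) for f, s in zip(forecasts, sales)) / len(sales)


def rmse(forecasts: Sequence[float], sales: Sequence[float]) -> float:
    """Root mean squared error, in the same units as sales."""
    if len(forecasts) != len(sales) or not sales:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((f - s) ** 2 for f, s in zip(forecasts, sales)) / len(sales))


# Hypothetical three-day example.
toy_forecasts = [110.0, 90.0, 100.0]
toy_sales = [100.0, 100.0, 100.0]
print(mape(toy_forecasts, toy_sales), rmse(toy_forecasts, toy_sales))
```

MAPE is scale-free, which makes it convenient for comparing forecasts across periods with different sales levels, while RMSE penalizes large individual misses more heavily.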

Forecasting Framework: Information Value

Company Forecast: basic features and company proprietary information, unknown model + framework
Baseline Forecast: basic features, ML model + framework
Social Media Forecast: basic features and social features, ML model + framework

Value of machine learning models: Baseline Forecast compared with Company Forecast
Value of social media information: Social Media Forecast compared with Baseline Forecast

Forecasting: Main Results

Model: random forest (with number of trees and tree depth as tunable variables)
Training: 2013/1/1 to 2013/4/1 (90 days)
Testing: 2013/4/2 to 2013/5/6 (45 days)

[Figure: MAPE of Sales Forecasts over Different Prediction Horizons. Vertical axis: MAPE (%). Horizontal axis: Prediction Horizon (days). Series: Base Forecast, Social Forecast, Company Forecast.]
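The evaluation protocol from the slides (a 90-day training window, a 45-day testing window, and a model refit every 5 days) can be sketched as a walk-forward loop. This is only an illustration under stated assumptions: the stand-in model `fit_trailing_mean` is a placeholder, not the random forest used in the deck, and the exact way the authors handle the prediction horizon at refit time is not given in the slides, so the `cutoff` rule below is a guess.

```python
from typing import Callable, List, Sequence, Tuple


def fit_trailing_mean(history: Sequence[float], window: int = 7) -> Callable[[], float]:
    """Placeholder model: always predicts the mean of the last `window` training days."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return lambda: level


def walk_forward(
    sales: Sequence[float],
    n_train: int = 90,
    n_test: int = 45,
    refit_every: int = 5,
    horizon: int = 1,
) -> Tuple[List[float], List[float], int]:
    """Forecast each test day, refitting the model every `refit_every` days."""
    forecasts: List[float] = []
    actuals: List[float] = []
    n_refits = 0
    model = None
    for step in range(n_test):
        target = n_train + step  # index of the day being forecast
        if step % refit_every == 0:
            # Assumption: only data known `horizon` days before the target is used.
            cutoff = target - horizon + 1
            model = fit_trailing_mean(sales[:cutoff])
            n_refits += 1
        forecasts.append(model())
        actuals.append(sales[target])
    return forecasts, actuals, n_refits


# Hypothetical series with a weekly pattern, 135 days long.
sales = [100.0 + (day % 7) for day in range(135)]
forecasts, actuals, n_refits = walk_forward(sales)
print(len(forecasts), n_refits)
```

Running the same loop once with basic features only and once with basic plus social features, and comparing the resulting errors, mirrors the baseline-versus-social comparison the framework slide describes.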

Different Starting Times

[Figure: MAPE of Sales Forecasts over Different Starting Times (Prediction Horizon = 1). Vertical axis: MAPE (%). Horizontal axis: starting time (Jan 1, Jan 15, Feb 1, Feb 15, Mar 1). Series: Base Forecast, Social Forecast, Company Forecast.]

The value of social information is significantly positive throughout different training and testing periods.

Different Machine Learning Models

[Figure: MAPE of Sales Forecasts over Different Models (Prediction Horizon = 1). Vertical axis: MAPE (%). Groups: Linear Models (Linear Regression, Forward Selection, Lasso), SVM (Linear, Radial), Ensemble Models (GBM, Random Forest). Data: base, social.]

Social media information is valuable if the statistical model:
Is nonlinear in features
Has variable selection

Model Inspection

How important is social media information in forecasting models?
Random forest: weighted-average Gini index.

[Figure: Top 20 Features with Highest Gini Importance. Vertical axis: Gini Importance. Horizontal axis: Features. Labeled features include Past Sales, Sentiment, and Past Comments. Feature types: Base, Non-textual, Textual.]

Social media information is important in forecasting.
Textual processing of social media information is helpful.

Take-Away

Social media information helps! In practice, with limited data, we can reduce MAPE by 15% when incorporating social media information.
Forecasting models that are nonlinear and can actively select variables benefit more from incorporating social media information.
Counting social media activities alone is not enough. Textual analysis of comments and posts is equally, if not more, important.

Model Inspection

How important is social media information in forecasting models?
Random forest: weighted-average Gini (impurity) index.

For example, consider a box of balls, half of them red and half of them green. We want to predict the color of a ball based on two binary features, A and B. All balls with A = 1 are red. Half of the balls with B = 1 are red. Which feature is more important?

[Figure: splits on A = 1 and B = 1 with the Gini index (GI) of each resulting node.]

Gini impurity index: $G = \sum_{i=1}^{n_c} p_i (1 - p_i)$

Gini index of a node: $I_n = G_{\text{parent}} - \sum_i G_{\text{split}_i}$

Model Inspection: Forecast Algorithm

How important is social media information in forecasting models?
Lasso: how many variables are left from the basic features and the social features?

$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$

We use a simple and efficient Lasso implementation: Least Angle Regression (Efron et al. 2003).
We can now inspect the model to see which variables are more correlated with our forecasted sales.
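The red and green ball example from the Gini slide can be worked through numerically. The sketch below makes two assumptions the slide does not state: balls with A = 0 are all green and half of the balls with B = 0 are red, and each child node's impurity is weighted by its share of the samples (the usual convention for impurity-based importance). The function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """G = sum over classes of p_i * (1 - p_i)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def impurity_decrease(labels: Sequence[Hashable], feature: Sequence[int]) -> float:
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(labels)
    weighted_children = 0.0
    for value in set(feature):
        subset = [lab for lab, f in zip(labels, feature) if f == value]
        weighted_children += len(subset) / n * gini_impurity(subset)
    return gini_impurity(labels) - weighted_children


# Eight balls: half red, half green.
colors = ["red"] * 4 + ["green"] * 4
feature_a = [1, 1, 1, 1, 0, 0, 0, 0]  # A = 1 exactly for the red balls (assumed)
feature_b = [1, 1, 0, 0, 1, 1, 0, 0]  # B is unrelated to color (assumed)

print(gini_impurity(colors))                 # impurity before any split
print(impurity_decrease(colors, feature_a))  # splitting on A
print(impurity_decrease(colors, feature_b))  # splitting on B
```

Under these assumptions, splitting on A removes all of the impurity while splitting on B removes none, which is why A would rank as the more important feature.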

Model Inspection

How important is social media information in forecasting models?
Lasso: how many variables are left from the basic features and the social features?

[Figure: Number of Features in Lasso Regression. Vertical axis: Number of Features. Horizontal axis: Feature Type (Base, Non-textual, Textual). Fill: InLasso (FALSE, TRUE).]

Social media information is important in forecasting.
Textual processing of social media information is helpful.

Least Angle Regression

$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$

1. Find the predictor x_i that is most correlated with the response.
2. Move \beta_i toward its least-squares coefficient until some x_j becomes as correlated with the residuals as x_i.
3. Move (\beta_i, \beta_j) toward the joint least-squares coefficients of the residuals until some x_k becomes as correlated as x_i and x_j.
4. Repeat steps 2 and 3 until k variables have been selected.
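To make the variable-selection effect of the Lasso objective concrete, here is a small solver sketch. Note the swap: it uses cyclic coordinate descent with soft-thresholding, not the Least Angle Regression procedure named on the slide, because coordinate descent fits in a few lines of plain Python. It minimizes the same objective (half the squared error plus lambda times the L1 norm, no intercept); the function names, toy data, and the choice lambda = 1 are assumptions.

```python
from typing import List, Sequence


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    n_sweeps: int = 100,
) -> List[float]:
    """Minimize 0.5 * sum((y - X beta)^2) + lam * sum(|beta|) one coordinate at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical data: y depends only on the first column; the second is irrelevant.
X_toy = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y_toy = [2.0, 4.0, 6.0, 8.0]

beta_hat = lasso_coordinate_descent(X_toy, y_toy, lam=1.0)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(beta_hat, selected)
```

Counting the entries of `selected` by feature type (base, non-textual, textual) is the kind of inspection the slide describes: features whose coefficients are driven exactly to zero have been dropped by the model.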


More information

Applied Multivariate Analysis - Big data analytics

Applied Multivariate Analysis - Big data analytics Applied Multivariate Analysis - Big data analytics Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of

More information

Performance Measures in Data Mining

Performance Measures in Data Mining Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -

Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 - Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create

More information

Predicting Flight Delays

Predicting Flight Delays Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

More information

Grid Operations and Planning

Grid Operations and Planning Grid Operations and Planning Report Kent Saathoff ERCOT Board of Directors April 20, 2010 Kent Content Summary March 2010 Operations Peak Demand: Actual vs. Forecast On-line Resources: Total at Peak and

More information

Comprehensive Forecasting System for Variable Renewable Energy

Comprehensive Forecasting System for Variable Renewable Energy Branko Kosović Sue Ellen Haupt, Gerry Wiener, Luca Delle Monache, Yubao Liu, Marcia Politovich, Jenny Sun, John Williams*, Daniel Adriaansen, Stefano Alessandrini, Susan Dettling, and Seth Linden (NCAR,

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner

The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Paper 3361-2015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

Case 2:08-cv-02463-ABC-E Document 1-4 Filed 04/15/2008 Page 1 of 138. Exhibit 8

Case 2:08-cv-02463-ABC-E Document 1-4 Filed 04/15/2008 Page 1 of 138. Exhibit 8 Case 2:08-cv-02463-ABC-E Document 1-4 Filed 04/15/2008 Page 1 of 138 Exhibit 8 Case 2:08-cv-02463-ABC-E Document 1-4 Filed 04/15/2008 Page 2 of 138 Domain Name: CELLULARVERISON.COM Updated Date: 12-dec-2007

More information

Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering

Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and

More information

Machine Learning over Big Data

Machine Learning over Big Data Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed

More information

SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND

SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND K. Adjenughwure, Delft University of Technology, Transport Institute, Ph.D. candidate V. Balopoulos, Democritus

More information

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION N PROBLEM DEFINITION Opportunity New Booking - Time of Arrival Shortest Route (Distance/Time) Taxi-Passenger Demand Distribution Value Accurate

More information

Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems

Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Tree Ensembles: The Power of Post- Processing December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Course Outline Salford Systems quick overview Treenet an ensemble of boosted trees GPS modern

More information

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas Regression and Time Series Analysis of Petroleum Product Sales in Masters Energy oil and Gas 1 Ezeliora Chukwuemeka Daniel 1 Department of Industrial and Production Engineering, Nnamdi Azikiwe University

More information

Towards Effective Recommendation of Social Data across Social Networking Sites

Towards Effective Recommendation of Social Data across Social Networking Sites Towards Effective Recommendation of Social Data across Social Networking Sites Yuan Wang 1,JieZhang 2, and Julita Vassileva 1 1 Department of Computer Science, University of Saskatchewan, Canada {yuw193,jiv}@cs.usask.ca

More information

Decompose Error Rate into components, some of which can be measured on unlabeled data

Decompose Error Rate into components, some of which can be measured on unlabeled data Bias-Variance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data Bias-Variance Decomposition for Regression Bias-Variance Decomposition for Classification Bias-Variance

More information

An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study

An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study Philippe Mack, Pepite SA Joanna Huddleston, Pepite SA Bernard

More information

Indian School of Business Forecasting Sales for Dairy Products

Indian School of Business Forecasting Sales for Dairy Products Indian School of Business Forecasting Sales for Dairy Products Contents EXECUTIVE SUMMARY... 3 Data Analysis... 3 Forecast Horizon:... 4 Forecasting Models:... 4 Fresh milk - AmulTaaza (500 ml)... 4 Dahi/

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

Text Analytics The three-minute guide

Text Analytics The three-minute guide Text Analytics The three-minute guide Text Analytics The three-minute guide 1 What is text analytics? Detecting hidden signals There s a good chance that your organization is awash in unstructured, text-rich

More information