Drugs store sales forecast using Machine Learning




Hongyu Xiong (hxiong2), Xi Wu (wuxi), Jingying Yue (jingying)

1 Introduction

Medical-related sales prediction is currently of great interest: with reliable sales forecasts, medical companies can allocate their resources more wisely and earn better profits. We joined a Kaggle competition to predict the daily drug sales of each store from store, promotion, and competitor data. We aim to apply different machine learning techniques to tune models and predict drug sales using time series analysis.

2 Data

The Kaggle website provides training data covering the daily sales of 1,115 Rossmann stores dating back to 2013, with 1,017,209 entries in total. The training data include features describing promotions and competitors. Since we cannot access the real sales figures for the test period during the Kaggle competition, we use 70% of the contest's training data as our training set and the remaining 30% as a test set for cross-validation. For now, the cross-validation error serves as our estimate of the test error.

3 Method and Results

Since the problem involves time series data, we start with a time series analysis model. We build an auto-regression (AR) model, train it on our data to obtain the parameters, and compare AR models of different orders by their test errors. In the time series analysis part, each store's sales are predicted from that store's past data alone. We then examine how stores differ according to their features. First, we build a time-independent model by averaging the daily sales per store, collapsing the time dimension; in this frame we use a random forest to select features and then use support vector regression to fit features such as assortment type and competitors to each store's mean sales.
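One detail of the 70/30 split above matters for time series: the partition should be chronological rather than random, so that the test period lies strictly after the training period. A minimal sketch (the record layout here is illustrative, not the actual Kaggle schema):

```python
# Chronological 70/30 split: sort by date, then cut by position.
# A random shuffle would leak future observations into training.

def chronological_split(records, train_frac=0.7):
    """Split date-sorted records into a training and a test portion."""
    records = sorted(records, key=lambda r: r["date"])
    cut = int(len(records) * train_frac)
    return records[:cut], records[cut:]

# Tiny illustrative data set: ten (date, sales) records for one store.
data = [{"date": d, "sales": 100.0 + d} for d in range(10)]
train, test = chronological_split(data)
print(len(train), len(test))  # 7 3
```

Every training date precedes every test date, which is what makes the held-out error an honest proxy for forecasting error.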
Finally, we generalize the time series model to predict several stores at a time, instead of one by one as in the first section.

3.1 Time Series Analysis

To get a big picture, we first plot the daily sales of several stores over time. The plots show that, for a given store, daily sales evolve periodically with a period of one year. For example, Figure 2 shows sales peaks at certain dates around January and May, while sales otherwise stay at a roughly constant level.

3.1.1 Auto-Regression (AR) Model

Since predicting each store's daily sales from its past data is a time series problem, we approach it with a time series analysis model. The basic method is the auto-regression (AR) model:

    X_t = Σ_{i=1}^{p} φ_i X_{t−i} + ε_t

We found a package called "forecast" in R that performs auto-regression, and revised it so the program does what we need. To use this model, we pre-process the data to use weekly sales as the time unit. The first reason is that the zero sales on Sundays make daily sales awkward as a time unit, while monthly sales would wash out all the detailed patterns; the second reason is that when we plot the sales on each day of the week over time, they basically follow the same pattern (Figure 1). Since we condense daily sales into weekly sales, after predicting a weekly sales figure we still have to regenerate the sales on each day of that week. We assume a normal distribution N(µ, σ²), with µ determined by the weekly sales and the variance σ² to be trained. We compare AR models of order p = 1, 2, 5, 10, 20 to see which order fits best.

Figure 1: Sales of store 1 on different days of the week

3.1.2 Cross-validation

For each order p, we train the parameters φ, ε, µ, and σ on 70% of the data and cross-validate on the remaining 30% to obtain the test error. The cost function is the sum of squared differences between the predicted and real daily sales of store 1. The table below compares the test errors for the different orders.

    Order p               1        2         5        10       20
    Test Error (10^10)    1.4447   0.47354   2.4489   2.1463   1.9401

Order p = 2 yields the lowest test error. We plot the predicted sales of store 1 (Figure 3). Comparing Figures 2 and 3, the model captures the general pattern of store 1's daily sales.
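The AR fit itself was done with the R package; as a language-neutral illustration, fitting an AR(p) model reduces to ordinary least squares on lagged values. A sketch on a synthetic series (not the Rossmann data):

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of X_t = c + sum_{i=1..p} phi_i * X_{t-i}.

    Returns (phi, c), with phi[0] the coefficient on the lag-1 term.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # The k-th column holds the lag-k values, aligned with the targets x[p:].
    lags = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    design = np.column_stack([lags, np.ones(n - p)])  # intercept column
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[:-1], coef[-1]

# Synthetic AR(2) series with known coefficients (illustration only):
# X_t = 1 + 0.6 X_{t-1} + 0.3 X_{t-2} + noise.
rng = np.random.default_rng(0)
x = [10.0, 10.0]
for _ in range(500):
    x.append(1.0 + 0.6 * x[-1] + 0.3 * x[-2] + rng.normal(0.0, 0.5))

phi, c = fit_ar(x, 2)
print(np.round(phi, 2))  # close to [0.6, 0.3]
```

Comparing orders p = 1, 2, 5, 10, 20 then amounts to refitting with different p and scoring each fit on the held-out 30%.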

Specifically, for sales on Tuesday, Wednesday, and Thursday, our predictions are within 10% of the real daily sales. However, the model gives conservative predictions for Mondays and Fridays, when sales are noticeably higher than on the other days of the week.

Figure 2: Daily sales of store 1
Figure 3: Predicted daily sales of store 1

3.2 Random Forest

Since many factors influence the sales, such as store type, promotions, competitors, and even holidays, we try to identify the most important features, using a random forest. To reduce the amount of data needed to test the model and the program, we average the daily sales of each store, so the data shrink from about a million entries to about a thousand. In the raw data provided, five parameters have missing data points. We first combine CompetitionSinceMonth and CompetitionSinceYear into a new variable, CompetitionExistMonth. Similarly, we create a new variable, PromoExistMonth, from four promotion variables. We also scale the variable CompetitionDistance by 100.

Figure 4: Feature selection through random forest

We built the model using the "randomForest" package in R, with 500 trees, the 8 variables in store.csv as predictors, and the mean sales of each store as the response variable. We used na.omit for the missing values and generated the variable importance plot for feature selection. The out-of-bag errors were traced for the model.
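The model above is built with R's randomForest package; the same feature-selection mechanics can be sketched with scikit-learn's RandomForestRegressor. The data here are synthetic (the variable names mirror the paper's, the values and effect sizes are invented), so only the procedure carries over:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 300

# Invented store-level features: Assortment drives mean sales strongly,
# CompetitionDistance weakly, CompetitionExistence not at all.
assortment = rng.integers(0, 3, n)      # assortment types a/b/c encoded 0/1/2
comp_dist = rng.uniform(0.0, 50.0, n)   # distance, already scaled down by 100
comp_exist = rng.integers(0, 2, n)      # pure-noise indicator
mean_sales = 5000 + 2000 * assortment + 10 * comp_dist + rng.normal(0, 100, n)

X = np.column_stack([assortment, comp_dist, comp_exist])
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, mean_sales)

names = ["Assortment", "CompetitionDistance", "CompetitionExistence"]
for name, imp in zip(names, forest.feature_importances_):
    print(f"{name:22s} {imp:.3f}")      # Assortment should dominate
print(f"OOB R^2: {forest.oob_score_:.3f}")
```

feature_importances_ measures each variable's average reduction of the MSE across splits, which is the same quantity the paper's variable importance plot shows; oob_score_ is the out-of-bag fit quality.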

The out-of-bag errors appear quite small, but we will have to dig further into the data and the model fit to determine whether the model is overfitting. As the variable importance plot shows, Assortment is the most important variable, since including it yields the biggest reduction in MSE. Note also that CompetitionExistence is the least important of all the store-characteristic variables.

3.3 Support Vector Regression (SVR)

We use support vector regression (SVR) to relate each store's mean sales to its various features. SVR is a variant of the support vector machine (SVM): the essence of SVM is to find a classification boundary with maximum margin to the feature points; by the same token, SVR finds a regression curve such that the feature points lie within a margin ε of it. We used the standard liblinear 2.1 package in MATLAB to implement SVR:

    min_{ω,b}  (1/2) ‖ω‖²
    s.t.  y⁽ⁱ⁾ − ωᵀx⁽ⁱ⁾ − b ≤ ε
          ωᵀx⁽ⁱ⁾ + b − y⁽ⁱ⁾ ≤ ε,    for i = 1, 2, ..., m

According to the data, there are four store types (a, b, c, d) and three assortment types (a, b, c), giving 8 observed combinations (aa, ac, ba, bb, ca, cc, da, dc). We ran the SVR algorithm with a linear kernel on each combination.

    Test Error          (a,a)   (a,c)   (b,a)       (b,b)     (c,a)   (c,c)   (d,a)   (d,c)
    Linear              2.63    1.60    2.61        0.00412   0.110   0.378   0.263   1.09
    Linear + Parabola   2.62    1.57    933         0.00497   0.107   0.385   0.305   1.09
    Sqrt + Parabola     2.49    1.57    1.15*10^7   0.00938   0.106   0.378   0.228   1.09

3.4 Extension of the Time Series Model

In this model we use a method similar to seasonalized time series regression, in which the sales prediction is Y = T·S·I. We adjust this equation to Y = T·DI, where Y is the daily sales prediction, T is the sales trend, and DI is the daily sales index. To obtain T, we linearly fit all the historical sales data of a single store with the equation y = a + b·t, where a and b are the least-squares estimates

    b = Σ_t (t − t̄)(y_t − ȳ) / Σ_t (t − t̄)²,    a = ȳ − b·t̄
To predict the sales at time t₁, we then take T_{t₁} = a + b·t₁. DI is obtained from the historical data, and each of the 365 days of the year has its own DI. For example, to get the DI for January 2nd, we take the historical sales on January 2nd, say s₃, s₄, s₅ at time points t₃, t₄, t₅, together with the linearly fitted sales y₃, y₄, y₅ at those points, and set

    DI = (1/3) (s₃/y₃ + s₄/y₄ + s₅/y₅)

DI reflects the sales fluctuations at different periods within a year. We take zero-sales cases into account: if some sᵢ = 0, we simply ignore it when calculating DI. Instead, in the final prediction we assign zero sales to Sundays, based on the historical data, and holiday dates on which all historical sales are zero, such as January 1st, are also predicted to be zero. We use 76% of the historical data for training and 24% for testing; the cost-function values for 10 stores are listed below.
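Under the assumptions above (a least-squares linear trend, and DI taken as the average ratio of actual to trend-fitted sales for each day of the year), the Y = T·DI scheme can be sketched on a synthetic series; the zero-sales handling is omitted for brevity:

```python
import numpy as np

def fit_trend(t, y):
    """Ordinary least squares for the trend y = a + b*t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.sum((t - t.mean()) * (y - y.mean())) / np.sum((t - t.mean()) ** 2)
    a = y.mean() - b * t.mean()
    return a, b

def daily_index(t, y, a, b, period=365):
    """DI for each day of the year: mean ratio of actual to fitted sales."""
    t = np.asarray(t)
    y = np.asarray(y, dtype=float)
    fitted = a + b * t
    di = np.ones(period)
    for d in range(period):
        mask = (t % period) == d
        if mask.any():
            di[d] = np.mean(y[mask] / fitted[mask])
    return di

# Synthetic three-year history: rising trend times a yearly seasonal pattern.
t = np.arange(3 * 365)
season = 1.0 + 0.3 * np.cos(2 * np.pi * t / 365)
y = (1000.0 + 0.5 * t) * season

a, b = fit_trend(t, y)
di = daily_index(t, y, a, b)

# Prediction for a future day t1: Y = T * DI = (a + b*t1) * DI[t1 mod 365].
t1 = 3 * 365 + 10
pred = (a + b * t1) * di[t1 % 365]
```

The recovered trend slope stays near the true 0.5 because the seasonal factor averages to one over whole years, and DI then absorbs the within-year fluctuation.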

    Test Error
    Store 1    Store 2    Store 3    Store 4    Store 5
    1.3154     4.0130     6.8025     5.9619     5.3891
    Store 6    Store 7    Store 8    Store 9    Store 10
    2.6773     1.0367     7.1013     5.0954     2.6067

Figure 5: Comparison between raw and predicted sales of stores 1 and 2

4 Conclusion and Prospects

We use an AR model to predict the sales with small discrepancy from the test data, and we use random forest and SVR to relate the store mean sales to the other features. There is certainly room for improvement. Although we found the relations between the features and made fairly good predictions of the average sales of each store, we can still make further predictions of daily sales using SVR. Next, we plan to use SVR to study how the parameters of the AR model change with store features; doing so would automate daily sales prediction for all the stores within the time series model.