The Operational Value of Social Media Information
|
|
|
- Esther Walters
- 10 years ago
- Views:
Transcription
1 The Operational Value of Social Media Information Ruomeng Cui Kelley School of Business, Indiana University, Bloomington, IN 47405, Santiago Gallino Tuck School of Business, Dartmouth College, Antonio Moreno Kellogg School of Management, Northwestern University, Evanston, IL 60208, Dennis Zhang Kellogg School of Management, Northwestern University, Evanston, IL 60208, While the value of using social media information has been established in multiple business contexts, the field of operations and supply chain management has not yet explored the possibilities it offers in improving firms operational decisions. This paper attempts to do that by empirically studying whether using publicly available social media information can improve the accuracy of daily sales forecasts. We collaborated with an online apparel retailer to assemble a data set that combines (1) detailed internal operational information, including data on sales, advertising, and promotions, and (2) publicly available social media information obtained from Facebook. We implement a variety of machine learning methods to create daily sales forecasts. For each method, we compare the accuracy of the forecasts we obtain with and without the social media information. We find that using social media information results in statistically significant improvements in the out-of-sample accuracy of the forecasts, with relative improvements ranging from percent to percent over different forecast horizons. Nonlinear models with feature selection, such as random forests, perform significantly better than linear models, such as linear regression. Our preferred method (random forest) yields an out-of-sample MAPE of 7.21 percent when not using social media information and 5.73 percent when using social media information. In both cases, this significantly improves the accuracy of the company s internal forecasts (a MAPE of percent). We decompose the improvement into the value of utilizing state-of-the-art machine-learning methods and the value of incorporating social media information, and we explore the contribution of each of the social media features to the improvement of forecast accuracy, finding that the comments encoded using natural language processing techniques have the highest predictive power. Key words : social media, Facebook, sales forecast, machine learning, random forest 1. Introduction Social media is creating new channels for customers to interact and communicate with companies, with more than 83 percent of the Inc. 500 companies using at least one of the main social media sites (Hameed 2011). Customers routinely use social media to engage in discussions about compa- 1 Electronic copy available at:
2 2 Author: The Operational Value of Social Media Information nies products and services. As a consequence, customers preferences, opinions, and emotions are embedded in their activities on social media. With social networks such as Facebook and Twitter, more and more customers are expressing their preferences online and their purchasing decisions increasingly are influenced by friends referrals. Recognizing this, many companies have started to use social media information to improve their marketing and advertising decisions. For example, Kasper Rorsted, CEO of German consumer goods company Henkel AG, told an interviewer that digital and social media have an enormous impact on how people are viewing fast-moving consumer goods... We are very much focused on engaging online with consumers (NG 2013). At the same time, a number of academic disciplines have started to look at the impact of social media information from different angles. Multiple studies in marketing, finance, and information systems have demonstrated the value of social media information when making managerial decisions (e.g., see Bollen et al. 2011, Aral 2011, and Goh et al. 2013). Despite the attention that practitioners and academics from different fields have devoted recently to social media, the field of operations management has not yet studied the opportunities that social media information offers to improve operational decisions. In this paper, we illustrate the benefits this information can provide from an operations management perspective by studying whether it is possible to improve sales forecasts by incorporating publicly available social media information. One potential reason why the operational value of social media information has been underexplored so far is that researchers need to combine two types of data: internal operational data available to the firm and external social media information available on social media sites. To close this gap, we collaborated with an online apparel retailer to assemble a data set that allows us to study the value of social media information to improve sales forecasting. We begin by obtaining advertising and promotional data from the company, internal operational data e.g., daily sales for new customers who have never transacted with the company before and for repeat customers who previously purchased from the company and the company s internal sales forecast. Then we collect social media information publicly available from Facebook, such as company posts, user comments, and shares and likes data. We obtain this information from the company s official Facebook page using the Facebook public application programming interface (API). 1 We encode the user comments using natural language processing (NLP) techniques. Finally, we combine operational and social media information to evaluate the forecast accuracy improvement that can be obtained when considering publicly available social media information. To 1 Facebook is the social medium where the focal company has the biggest presence. In fact, over 90 percent of the company s customers who visit its official website from social media websites come from Facebook. Electronic copy available at:
3 Author: The Operational Value of Social Media Information 3 the best of our knowledge, ours is the first study to document the value of social media information to improve sales forecasting. We develop a machine-learning-based framework to evaluate the impact of social media information on sales forecasting accuracy. It is important to consider that constructing sales forecasts with social media information is a high-dimensional problem (i.e., the number of variables is large relative to the number of observations). The complex nature of social interaction among users of any social network leaves us with hundreds of observed variables that can be relevant from a social media information perspective. Since the underlying demand process can change rapidly over time (which is particularly true in the fashion industry), sales forecasting methods usually are trained using relatively new data that can include only a few months of the company s recent history. For this reason, it is not uncommon for companies to have only a few hundred observations of past sales used in their analysis, whereas the number of available variables coming from the social media stream usually is much larger. Machine learning models are well equipped to handle this type of high-dimensional problem. Consequently, in our analysis we implement a variety of machine learning models with different characteristics and degrees of sophistication. 2 The models we use include simple linear regression, linear regression with forward selection, lasso, support vector machines with linear kernel, support vector machines with radial kernel, gradient boosting method, and random forests. We apply each of the machine learning models to construct two different sales forecasts: (1) a baseline forecast that uses only the internal operations data and (2) a social-media-enhanced forecast that incorporates publicly available social media information in addition to the internal operations data. In each case, we use cross validation to select the model s parameters and the subset of features (when applicable). The comparison between the out-of-sample forecast accuracy (measured in a test set) achieved using these two types of forecasts allows us to quantify the value of social media information in improving sales forecasts. In addition, the apparel company maintains its own internal sales forecasts that use the operational data we have access to in addition to other information that is unknown to us. Comparing the company s internal forecasts and our baseline forecasts is informative of the potential value of using advanced statistical learning models to build forecasts. Our main contribution is to demonstrate that considering social media information improves daily sales forecast accuracy, measured by the out-of-sample mean absolute percentage error 2 The statistics and computer science communities use slightly different terminology. In statistics, these methods usually are referred to as statistical learning methods; in computer science, they usually are referred to as machine learning. We use both terms interchangeably.
4 4 Author: The Operational Value of Social Media Information (MAPE). This is true for all the methods we implement except for linear regression with no variable selection. The accuracy improvement of our social-media-enhanced forecasts is large, ranging from percent to percent across different forecast horizons, and even larger relative to the company s internal forecasts, ranging from percent to percent. The positive and significant improvement in sales forecasts is robust with respect to a number of forecasting models, prediction horizons, and testing periods and for sales forecasts focusing on sales coming from new or repeat customers. We study the contribution of each of the features included in our models to the predictive accuracy of the forecast. This allows us to observe which aspects of the social media information are most helpful in improving forecast accuracy. We find that the most important social media features when improving sales forecasting are the sentiment and informativeness of past comments, which we encode using state-of-the-art natural language processing techniques. This result illustrates the potential of using automatically encoded textual information in operational decision making. Another feature of our analysis is that since we implement multiple machine learning techniques, we can examine what type of model benefits most from including social media information in the analysis. Interestingly, the value of social media information increases with the machine learning technique s level of sophistication. In particular, we show that nonlinear models with feature selection present benefit most from incorporating social media information. Naive models, such as linear regression without variable selection, can in fact present a lower out-of-sample forecast accuracy when social media information is incorporated. We interpret this in light of the high-dimensional nature of the sales forecasting problem with social media information using the well-established bias variance trade-off framework. In order to fully unlock the potential of social media information in improving forecast accuracy, companies first have to adopt advanced forecasting techniques. Our results show companies that invest in more sophisticated forecasting techniques and incorporate social media information in their forecasting analysis can obtain significant benefits in terms of their forecast accuracy. Our results also indicate that even when companies make demand predictions based exclusively on operational information, such as past sales and advertisement plans, choosing the appropriate forecasting model can have a large impact in terms of that forecast s accuracy. Specifically, in our study, the out-of-sample MAPE of the baseline forecast improves from percent (for forward linear regression) to 7.21 percent (for random forests). The relevance of improving sales forecast accuracy cannot be overstated. This is especially true for fashion apparel retailers that on a regular basis need to deal with the tension between supply chain frictions and demand uncertainty. We show that by incorporating social media information, companies can improve their sales forecasts and apply this greater accuracy to a number of tangible
5 Author: The Operational Value of Social Media Information 5 operational benefits. For example, previous work has shown that under conventional inventory policies, improvement in sales forecasting translates to a safety stock reduction. This suggests that enriched forecasting models with social media information can result in substantial inventory savings. The rest of this paper is structured as follows: Section 2 provides a review of relevant literature. Section 3 introduces the empirical setting and describes the data. Section 4 presents our forecasting framework and describes the various statistical learning models we implemented. Section 5 demonstrates the empirical value of social media information in improving the accuracy of sales forecasts using the random forest model and discusses the relative importance of the social media features in improving forecast accuracy. Section 6 analyzes the value of social media information across different statistical learning models. Section 7 concludes the paper and points at some of the managerial implications of our analysis. 2. Literature Review Social media has become part of our lives during the last decade. As social media data has become accessible to researchers, various studies have attempted to demonstrate the value of this information in finance, marketing, and information systems. For example, Bollen et al. (2011) propose a predictive model for the stock market using microblog data and demonstrate that social media information improves forecast returns of major financial indexes, Chen and Xie (2008) study how sellers should adjust their marketing strategies dynamically to maximize their revenues based on online reviews and comments, and Luca (2011) quantifies the reviews of local restaurants posted on social media sites and points out the necessity to manage social media information. Despite this growing literature on social-media-related work from different academic areas, to our knowledge there has been no research in the operations management community that attempts to empirically assess the value of social media information. Practitioners often have expressed interest in unlocking the potential of social media information to improve their daily operations (see, for example, Supply Chain Nation 3 ), but there is no rigorous evidence regarding the impact of incorporating social media information within the forecasting process. Our work addresses these shortcomings by quantifying the operational value of social media arising from the improved ability to forecast sales when social media information is considered. While our specific focus on the operational value of social media is a novel topic in the operations management literature, there are three streams of literature in operations management that are relevant to our work: the literature on sales forecasting, the theoretical literature that studies 3
6 6 Author: The Operational Value of Social Media Information operations in the presence of social networks, and the emerging literature that uses machine learning techniques to study operational issues. Sales forecasting has a long tradition of research in operations management. The operations management literature historically has been concerned with the issue of sales forecasting and improving sales forecasts. Accurate sales forecasts have been shown to convey information of future earnings and returns to investors (Nichols and Tsay 1979, Penman 1980) and also can facilitate online and in-store inventory management for companies. For example, Gaur et al. (2007) use dispersion among experts forecasts as a measure of demand uncertainty for new products, Bassamboo et al. (2015) study how crowds can obtain better forecasts than can individuals using prediction markets, Kesavan et al. (2010) apply historical inventory and gross margin data to improve the forecasting of firm-level annual sales and show an improvement over the consensus forecasts from equity analysts, Osadchiy et al. (2013) propose a model that combines analysts forecasts and financial market returns to show that this combined measure improves forecast accuracy, and Kremer et al. (2011) study the behavioral bias of forecasting and propose an intervention to improve the forecast accuracy. And downstream information, such as point-of-sales, has been shown to add value to upstream order forecasting (see Gaur et al and Cui et al. 2014). These papers use financial market index data, accounting variables, or external operations information to improve sales forecasts. As the above studies discuss, even a small improvement in sales accuracy can lead to large impacts on stock prices and inventory cost reductions. Our work is closely related to this stream of research but focuses on the role of a novel indicator: social media information. Our results complement this stream of literature by pointing at the value of social media information, which is available publicly and can be obtained easily to help predict sales. A recent literature stream has studied various issues in the operations management space when considering the presence of social network/learning effects. Most of the work in this space is based on theoretical models. Candogan et al. (2012) study a dynamic revenue management problem when customers consumption depends on their friends consumption in a social network. Allon and Zhang (2015) study how to manage services in the presence of social networks. Jing (2011) and Papanastasiou and Savva (2014) propose dynamic pricing strategies for new products when customers strategically delay their purchases to learn more about products. Ifrach et al. (2011) and Yu et al. (2013) provide insights into dynamic pricing strategies when the strategies affect not only the revenue but also the information flow to subsequent customers and their friends. While the aforementioned literature focuses on the theoretical value of social network information in pricing decisions, we study the empirical value of publicly available social media information in forecasting. In terms of methodology, our work uses machine learning methods to study a question of operational interest. Recent papers in operations have used machine learning tools to analyze such
7 Author: The Operational Value of Social Media Information 7 problems as demand estimation and price optimization in online retail (Johnson et al. 2015), predicting emergency department waiting times (Ang et al. 2015), or developing a feature-based algorithm to solve nurse staffing problems (Rudin and Vahn 2015). We contribute to this emerging stream of work by implementing machine learning methods to construct sales forecasts that incorporate social media information. 3. Empirical Setting and Data We partnered with an online retailer that sells men s clothing. The company sells exclusively through its online website, and is listed among the Top 500 largest e-retailers in America in To study the value of social media information, we collect two data sets: internal operational data from the apparel company and publicly available social media data from Facebook Operational Data The internal operational data we obtain from the apparel company includes sales, the advertising and promotions schedule, and the company s internal sales forecasts. The sales data includes daily total sales, daily sales by new customers (who never bought from the company before), and daily sales from repeat customers (who previously purchased from the company). We obtain this data for the period between October 2011 and July The company frequently advertises its new product launches and announces promotional activities to customers. These actions are likely to affect sales and potentially correlate to social media activity since advertising can attract comments and likes on the company s Facebook page. For this reason it is crucial to collect this internal information and use it in our baseline forecasts. If we did not do this, the improvements in forecast accuracy that we obtain when incorporating social media information could in fact arise from its ability to capture information present in other internal variables that usually are available to firms through other, more immediate channels. In order to capture the value of social media information on top of the information related to marketing efforts, which is internally available to the firm, we collect detailed information about the company s marketing activities. In particular, we obtain the complete history of the company s campaigns because is the company s primary marketing and communication tool (since its main presence is online). The number of advertising and promotional activities using other channels is small and highly correlated with the campaigns. Consequently, including information extracted from the campaigns in our baseline forecasts allows us to consider the effect of marketing interventions independently from social media information. We construct three important daily dummy variables related to the company s marketing effort, specifically, whether 4
8 8 Author: The Operational Value of Social Media Information Table 1 Summary Statistics of Operations Data Category Variable Mean Median Std. Dev Max Min Total Sales Sales Data New Sales Repeated Sales Ad Marketing Data Promotion In Promotion Forecast Data Internal Sales Forecasts the company (1) sends out advertisement s, (2) sends out an announcing a promotion, 5 and (3) is running a promotion on that day. 6 The information of the firms marketing activities extracted from its ing campaigns is available to us from January 2012 to December Note that there can be other variables available to the firm but not to us that have important predictive power when constructing forecasts. It is not feasible for us to access all of the company s internal information that could be incorporated into its forecasts. However, we are able to obtain the company s internal sales forecasts, which provides a very useful benchmark. The company internally predicts daily sales to facilitate its operational decisions. We do not have details about the specific algorithm used, but based on our conversations with company officials we know the firm uses a variety of available structured and unstructured information; we learn this does not include social media information. Comparing the accuracy of the company s internal sales forecasts with the accuracy of our baseline forecasts (without the social media information), we can quantify the value of using state-of-the-art machine-learning techniques when constructing sales forecasts. Adding social media information to our baseline models allows us to quantify the value of combining social media information with machine learning techniques. The company s internal daily sales forecasts are available from January 2012 to December Table 1 presents summary statistics for the operations data (a full set of the operational features can be found in Table 8 of the Appendix). We focus on the period from January 2013 to July 2013 for our analysis since our sales data covers up to July As we will show in Section 3.2, social media usage was not stable until the end of On average, there are total transactions per day. Sales from repeat customers account for around 65 percent of the total sales. The coefficient of variation of the total sales is approximately 40 percent. The company sends out marketing s 69 percent of the days in our sample, and 20 percent of these s are related to promotions. The company is running promotions 27 percent of the days in our sample. Note that our period of analysis covers several major holidays, such as New Year s Day and Memorial 5 These s notify customers of promotions the company is running or going to run in the near future. 6 We focus on whether there was a promotion on that day because promotion depth is relatively homogenous over time (between 15 and 20 percent).
9 Author: The Operational Value of Social Media Information 9 Day, which usually have high volatility in sales. Therefore, we also include holiday dummies to capture this information Social Media Information In addition to the company s operational data, we collect detailed social media information for the company, specifically focusing on Facebook. Facebook is the main social network used by the retailer, accounting for more than 90 percent of visits to the company s website that come from social media sites. An additional advantage of using Facebook data is that unlike other social media sites, such as Twitter or Pinterest, Facebook offers a public application programming interface (API) to access the complete data set of social media activities on any company s Facebook web page. Since we want to document the value of using publicly available social media information to improve sales forecasts, it is appropriate to focus on a social media company that provides this type of information. Our partner company has an official Facebook web page that more than 300,000 users follow. Figure 1 shows an example of what a standard post by a similar company looks like. In this particular example we use a post from Gap to protect the anonymity of the company we partner with. In general, users can interact with a company s post in three ways. First, they can write remarks using the comment option under each post. Second, a like button allows users to endorse or approve the post. Third, a share button allows users to post he company s post on their own Facebook page. If a user shares, likes, or comments on a post, her friends may see the same post in their news feed. Moreover, a user s comments and likes can be seen by her friends if they visit her main Facebook page. As shown in Figure 1, there are 320 Facebook users who liked this post, 5 comments, and 14 people who shared it. Data Collection. We collect the social media information using Facebook Graph API. The Facebook API provides a public programming interface for developers to extract social activities on Facebook. Our goal is to record all social activities that users made on the company s Facebook web page. We develop a Python program to extract all posts on the company s Facebook web page through the API starting on January Since each post is identified by a universal post ID in the Facebook database, we follow each post to find all the identifiers of each comment, like, and share made by users who responded to that particular post. Finally, following the unique identifiers, we extract information related to each comment, like, and share, such as the unique users, each user s number of comments, the comment s content, and the timestamps of these events when available. In total, we have 171, 279 unique users who interacted with at least 1 of the 1, 943 company s posts. Among these interactions, there are 25, 730 comments and 266, 534 likes. Note that Facebook does not allow developers to get the timestamps of likes and shares. We focus on the comment and
10 10 Author: The Operational Value of Social Media Information Figure 1 A Facebook Post Example. post information, for which we know the timestamps, to map the social media information with the company s operations information on a specific day. Feature Extraction. Social media information can create a word-of-month (WOM) effect among people. The importance of WOM has been well established in the literature (see Arndt 1967 and Chevalier and Mayzlin 2006). The main two attributes of WOM are volume (i.e., the amount of WOM information) and valence (i.e., positive or negative sentiment). Following this stream of literature, we extract two sets of features volume and valence from our social media information. The volume features consist of the daily number of posts (representing social media activity by the company), the daily number of comments (representing the users activity), the daily number of unique users who commented, the number of comments by the company, etc. In our analysis, we also include the variance of the above measures. In addition, we extract valence features from comments. To be specific, we construct two measures of valence for a comment: its informativeness and its sentiment. Following the literature in computer science (Resnik et al. 1999), we measure the informativeness of each comment with the number of words, sentences, and unique words. The informativeness is a widely used measure that has been proven to be useful in many other prediction settings, such as forecasting stock market and brand equity (Chen et al. 2014).
11 Author: The Operational Value of Social Media Information 11 Table 2 Summary Statistics of Four Social Media Features. Category Variable Mean Median Std. Dev Max Min Volume Data No. of Comments No. of Posts Valence Data Avg Length of Comments Avg Sentiment of Comments We use state-of-the-art natural language processing techniques to obtain the sentiment of comments. 7 More precisely, we follow a seminal paper in natural language processing (Socher et al. 2012) and we use a recursive neural tensor network (RNTN) on top of the Stanford Sentiment Treebank corpus to extract a compositional vector representation of each phrase in the comment. We then use these compositional vector representations as features to classify the comments. Each comment in this case is classified into positive, neutral, or negative. We measure the daily sentiment by aggregating the sentiments across all comments in the same day. We train the RNTN classifier with 215, 154 phrases in the Stanford Sentiment Treebank dataset. 8 Table 2 summarizes relevant features in the social media data we collected (a full set of social features can be found in Table 9 in the Appendix). On average, the company has 1.11 posts per day during our study period, and each post has on average words. Each posts generates an average of 27 comments made by users responding to the posts. An average Facebook user in the US has about 350 friends, 9 indicating that these comments can affect thousands of users per day. Figure 2 presents a plot of the time series of sales, number of posts, and number of comments, where the data has been normalized for sales and comments. Visual inspection of this figure indicates there appears to be a correlation between social media information and lagged sales. Interaction between the company s Facebook page and its users grew substantially during the early stage in our data set and became stable by the end of For this reason, we focus our analysis on the period of January 2013 to July Forecasting Framework and Machine Learning Models For each model, we implement a variety of machine learning techniques and we compare the accuracy of these models with and without social media information. Section 4.1 discusses the forecasting setting and the overall framework that we use to construct, train, and test the forecasting 7 Note that simpler approaches to sentiment analysis such as bag-of-words classifiers, widely adopted in finance, accounting, and computer science (Pang and Lee 2008 and Devitt and Ahmad 2007) cannot classify our social media information to a satisfactory precision. Bag-of-words classifiers work well in longer documents by relying on a few words with strong sentiment like agony. However, unlike financial documents with at least hundreds of words and written by a relatively homogeneous set of users, comments in social media are very short and written by a heterogeneous population. In this case, the classic method does not work. In fact, past literature has shown that bag-of-words classifiers with three levels of sentiment (i.e. positive, negative and neutral) often have accuracy below 60 percent (Wang et al. 2012)
12 12 Author: The Operational Value of Social Media Information Figure 2 Daily Time series of Number of Posts, Standardized Sales and Number of Comments Jan 15-Jan 29-Jan 12-Feb 26-Feb 12-Mar 26-Mar 9-Apr 23-Apr 7-May 21-May 4-Jun 18-Jun 2-Jul 16-Jul Comments Sales Posts models. In addition, we describe the measures used to evaluate the accuracy of the sales forecasts. Section 4.2 describes the various machine learning methods that we implement Forecasting Framework Our ultimate goal is to construct daily forecasts of the retailer s sales and quantify the improvement in accuracy that can be achieved by using social media information. To do this, we develop two forecasting models: a baseline model that includes only operations features as input variables, and a richer model that includes not only operations information but also social media information. We call the former baseline forecast and the latter social media forecast. We fit the same machine learning models for the two forecasts so we have pairs of models where the only difference between them is whether or not social media information is included. Thus, the improvement of social media forecasts over the baseline forecasts captures the value of social media information. Besides these two daily forecasts, we obtain the company s internal daily forecasts. In our conversations with the company, we learn they make these forecasts by processing the operational information we have considered as well as proprietary information, both structured and unstructured. At the time they shared this data with us, however, the company did not consider social media information for their forecasts, as is the case with our baseline forecasts, and they were not using state-of-the-art machine-learning techniques to generate their forecasting models. Therefore, the company s internal forecast differs from our baseline forecast in that it (1) has access to more information and (2) does not use advanced machine learning techniques. An improvement in the daily forecast accuracy of our baseline model over company s benchmark would point to the value of utilizing advanced statistical tools in forecasting. This can be thought of as a lower bound, given
13 Author: The Operational Value of Social Media Information 13 that the company has access to more data. Figure 3 summarizes the differences in the information and tools used in the company s internal model, our baseline model, and our social media model. Next, we illustrate how we construct, train, and test the forecasting models and describe the measures we use to evaluate the sales forecasts accuracy. Figure 3 Forecasting Framework Operations Features + Social Media Features Operations Features Operations Information + Proprietary Information Machine Learning Models Company s Internal Model Social Media Forecast Baseline Forecast Company Forecast Value of Social Media Information Value of Machine Learning Models Baseline Forecast Model. In the baseline model without social media information, we assume that sales on day t are a function of the sales, promotions, and advertisements in the past week (i.e., past seven days) and the characteristics associated with that t day, S t = f baseline (S t 1,..., S t 7, A t, A t 1,..., A t 7, X t ) + ɛ t, (1) where X t are the characteristics specific to day t, such as day of the week; A t represents advertising variables, such as whether the company runs advertisement or promotions on day t; and ɛ t is the idiosyncratic demand shock. We include advertising and promotional variables because they are correlated with social media information, and these controls help us rule out potential confounding effects. Note that different machine learning models specify different functional forms, so function f( ) can take many forms, such as additive linear, nonlinear, continuous, or discrete. Social Media Forecast Model. In the models that incorporate social media information, we assume the same structure as the baseline model, with the addition of social media variables. That is, current sales are a function of past sales, promotions, advertisements, and social media information over the past seven days, and the characteristics associated with that day X t, S t = f socialmedia (S t 1,..., S t 7, A t, A t 1,..., A t 7, M t 1,..., M t 7, X t ) + ɛ t, (2)
14 14 Author: The Operational Value of Social Media Information where M t represents the social media features in day t, such as the number of comments, number of words in comments, and sentiment of average comments. Again, we use A t to control for advertisements and promotions and X t to control for characteristics specific to day t. Depending on the specification, S t can refer to total sales, sales from new customers, or sales from repeat customers. Training and Cross-Validation. To evaluate out-of-sample prediction accuracy, we divide the data into two parts: in-sample training set and out-of-sample testing set. The training set is used to train the model and the testing set is used to evaluate it. We denote the training period as {1, 2,..., T } and the out-of-sample testing period as {T + 1, T + 2,..., T + N}, where N is the number of days in the out-of-sample period. In our study, the in-sample training set has 90 days, T = 90, and the out-of-sample testing set has 45 days, N = 45. In the next paragraph, we describe what we do with our 90 days of in-sample training set. Then, in the subsection Out-of-Sample Evaluation below we discuss what we do in the 45 days of our out-of-sample testing set in order to evaluate the out-of-sample forecast accuracy. We use a cross-validation approach to choose the hyperparameters of our models (such as the number of features to include in the random forest model or the magnitude of λ in the lasso model). For each potential value of the hyperparameter, we evaluate the performance of the method using 10-fold cross-validation with three repeats (Kohavi et al and Stone 1977). The training set is randomly partitioned into 10 equal size subsets. Of the 10 subsets, a single subset is retained as the validation data for testing the model s performance and the remaining 9 subsets are used as training data. This process is repeated 10 times (the folds), with each of the 10 subsets used exactly once as the validation data. We then average the performance of the model in each of 10 subsets. We repeat the process three times to further average out errors. After doing this, we have a measure of the performance for each potential value of the hyperparameter. We choose the value of the hyperparameter that gives the best performance using grid search. Once we set the best value for the hyperparameter, we use the entire training set to estimate the parameters of the model and the features we retain. Out-of-Sample Evaluation. Let ˆf baseline [1,T ] and ˆf socialmedia [1,T ] denote the baseline and social media trained models using the data of the training period (periods 1 to T ), for which we use the information set {S 1,T, A 1,T, X 1,T, M 1,T }. When we construct forecasts for the out-of-sample testing period, we retrain (re-estimate) our model every five periods. 10 Using the trained model, we make predictions using the most recent information up to the time of the forecast. More specifically, when we forecast sales for day T + 1 and come to a point where we need to retrain the model, we use all 10 The model is retrained every five periods because higher training frequency means a higher computing burden. This rolling updating mechanism in the out-of-sample approach is widely adopted in practice as well as in past studies (see Ang et al. 2015).
15 Author: The Operational Value of Social Media Information 15 past information up to day T, plus the advertising and promotional information on day T + 1, 11 to retrain the model. Based on the updated model, we use the past seven days information and future advertisement plans as input to our model to obtain the forecast. Therefore, to predict the sale at day T + 1, we use the model that is estimated using data from day 1 to day T, i.e., and ˆf baseline [1,T ] ˆf socialmedia [1,T ]. When we predict the sale at day T + 2, we use the model estimated at day T + 1 and fit the model with updated variables by incorporating the data on day T + 1. We measure the forecast accuracy using the mean absolute percentage error (MAPE) (see Gaur et al and Kesavan et al for examples of papers that use these measurements). We use this measure to test whether the model that includes social media information has a smaller out-ofsample error. In addition, we test whether the differences in forecasting accuracy are statistically significant by performing a t-test of the differences in MAPE, MAPE = 1 N N S t+i Ŝt+i 1,t+i /S t+i. i=1 Note that while using more information would necessarily result in a smaller or equal error for an in-sample measure of error, this is not the case when we use an out-of-sample measure of error. If the additional data contains irrelevant information, including more information would result in a worse out-of-sample performance due to over-fitting. L-Day-Ahead Forecasts. In the discussion above, we assume we are constructing sales forecasts for the following day. In practice, companies need lead time to adjust their operational decisions using the forecasts and often want to forecast farther ahead. Consequently, it is informative to estimate sales forecasts beyond the next day. Let Ŝbaseline t,t+l (Ŝsocialmedia t,t+l ) denote the forecast without (with) social media information made in day t for the sales in day t + L. We refer to Ŝbaseline t,t+l and Ŝsocialmedia t,t+l as L-day-ahead predictions. We obtain the lead-time prediction following the same procedure illustrated above with a different estimation relation. That is, we directly estimate how all historical information up to t, together with advertisement and promotion plans, impacts the sale on day t + L, where Ŝbaseline t,t+l S t+l = f baseline L (S t 1,..., S t 7, A t+l,..., A t 7, X t+l ) + ɛ t, S t+l = f socialmedia L (S t 1,..., S t 7, A t+l,..., A t 7, M t 1,..., M t 7, X t+l ) + ɛ t, is generated using the estimated model baseline ˆf L,[1,T ], and Ŝsocialmedia t,t+l is generated using socialmedia the estimated model ˆf L,[1,T ]. We verify whether improvements in the accuracy of next-day forecasts that we achieve when we incorporate social media are robust to using longer lead times, ranging from one to seven days. 11 Since adverting schedules and promotional events are planned in advance, the future information is available on day T.
16 16 Author: The Operational Value of Social Media Information Linear Models Table 3 Machine Learning Models. Without Variable Selection With Variable Selection Linear regression Lasso regression Forward selection Nonlinear Models SVM with linear kernel SVM with radial kernel Gradient boosting model (GBM) Random forests 4.2. Machine Learning Models The approach we describe in Section 4.1 is generic and can be used with multiple forecasting models. In this section, we present the machine learning models considered in our study. We use machine learning models because of the high dimensionality of the problem. Our training data set has 90 days, whereas our social media model needs to estimate more than 280 parameters (40 features per day times the past seven days plus today s characteristics). The number of independent variables is much larger than the sample size, leading to a high-dimensional challenge. Simple linear models, such as moving-average and autoregressive models, cannot produce an identifiable estimate. Advanced variable selection and classification methodologies, such as machine learning models, can solve such problems. Another critical reason to apply machine learning tools is that social media information, advertising, promotions, or even past sales may not affect current sales in a linear fashion. Advanced machine learning tools, like random forests, offer a much more flexible structure for the analysis. We implement seven widely adopted statistical models. We classify them along two dimensions: whether the model is linear or nonlinear and whether or not there is a variable selection procedure to drop insignificant variables. Table 3 summarizes the machine learning models that we considered (for details see Friedman et al. 2001). Next, we briefly describe each model. Linear Regression. The simple linear regression model is the simplest model we use. It imposes a strict linear structure and does not incorporate variable selection. It uses all variables, even the nonsignificant ones, to predict the future. The simple linear regression model captures the autoregressive nature of the sales process. The forecast with social media incorporates the lagged comments and posts information in a linear fashion. We expect the performance of this model to be the worst among all the models we study. Thus, we treat it as a benchmark model. Linear Regression with Forward Selection. Forward selection regression is built on linear regressions, with an additional step to select only important variables in the prediction process. baseline When estimating models ˆf and ˆf socialmedia, we obtain the best estimator by adding the variables that most improve the model, as measured by the Bayesian information criterion (BIC).
17 Author: The Operational Value of Social Media Information 17 This maximizes the likelihood function while penalizing the number of parameters included in the model. Lasso Regression. The least absolute shrinkage and selection operator (lasso) model, proposed by Tibshirani (1996), is another type of variable selection regression. It selects variables by penalizing the absolute size of the regression coefficients so that some of the weak parameter estimates are shrunk towards zero, effectively dropping them. To be specific, for a linear regression y = β 0 + β j jx j, the objective is to minimize ( i y i β 0 + ) 2 β j jx i j + λ β j j, where λ is the hyper-parameter that determines the magnitude the estimates are penalized. The value of λ is estimated in the cross-validation step. Support Vector Machines (SVM): Linear Kernel and Nonlinear Radial Kernel. This model is used for a variety of classification and regression analyses, such as solving pattern-recognition problems. It maps data into a higher dimensional input space and constructs an optimal separating hyperplane (see Cortes and Vapnik 1995). The kernel is a similar measure of the data points. In our study, we use both a linear kernel and a nonlinear radial kernel. Support vector machines are popular because of their effectiveness in high-dimensional spaces. The downside is that they do not provide probability estimates and can be painful to train when the data scale is large. Gradient Boosting Model (GBM). Friedman (2001) developed this model. The idea of boosting models is to start with several simple models and combine them in an adaptive way that leads to a stronger predictor. The method starts with a set of classifiers h 1,..., h k like possible decision trees (called boosting with trees) or regression models (called statistical boosting based on additive logistic regression). Then it creates a classifier that combines those functions f(x) = sgn( i a ih i (x)) that can minimize the error. The advantage of a boosting algorithm is a theoretically guaranteed good performance for any reasonable weak learner; plus, it is fast to run. The disadvantage is that the best choice of weak learner is not obvious. In our study, we choose the most widely used boosting model: boosting with trees. Random Forest (RF). Before discussing random forest, we first introduce the basic component regression trees. Regression trees use a tree-like structure to map observations of an item to conclusions about the items target value. Figure 4 (a) provides a specific example. In this example, the root starts with sales on day t 1. Depending on whether it is less or more than 500, we move on to different subgroups of features. If the last days sale is on the left branch, the tree reaches one end leaf with the current days sale being 500. Otherwise, we have to further split the subgroup based on the node of sentiment on day t 1. An actual tree can be deeper: the subgroups are split until the groups are sufficiently homogeneous or pure i.e., a single class of features predominates a group. The purity is measured using a Gini index, which we will apply to inspect the importance of parameters in Section 5.2.
18 18 Author: The Operational Value of Social Media Information Figure 4 Illustration of Regression Tree and Random Forecast. (a) Regression tree (b) Random forest Tree 1 Tree 2 Tree N Sales at t-1 < Sales at t = 500 Sentiment at t-1 < Sales at t = 400 Sales at t = 600 Voting The random forest model first uses cross-validation to determine the best number of variables, m, to include in a tree. It then draws bootstrap subsamples from the training data set and grows a regression tree on each of those bootstraps. This is done by bootstrapping variables at each split randomly selecting m variables at each node and picking the best one as a split-point, which generates a diverse set of trees. In this way, we grow a vast number of trees and average those trees in order to yield a prediction. Figure 4 (b) gives an example of how it works. Each tree is based on a random bootstrap sample. Given a new observation, we run this observation through tree 1 and get a particular prediction outcome. The same observation is run through tree 2, which directs a different path, leading to a different prediction. We continue this process until tree N, where N=500 in our study. We then average those predictions together as the final forecast. Random forest s biggest advantage is that it is capable of generating highly accurate results, making it one of the most popular statistical learning methods in practice. For details regarding random forests, see Breiman (2001). 5. The Value of Social Media Information We implemented seven machine learning models, described above, to predict sales with and without social media information. Now we can compare the performance between pairs of forecasts with and without social media information to evaluate the value of social media information. Table 4 presents the out-of-sample MAPE for each of the machine learning models. Among the studied statistical learning models, random forest performs the best in terms of its forecast accuracy for both the baseline and social media models. In this section, we focus on that particular model to discuss the value of social media information and study the robustness of the results with respect to the forecast horizon, the training time, the testing periods, and the forecasting variable (i.e., new and repeat sales).
19 Author: The Operational Value of Social Media Information 19 Table 4 Comparison of Out-of-Sample Error for Statistical Learning Models RF GBM SVM SVM radial linear Lasso Forward Linear Social Media Forecast Baseline Forecast Forecasting Result of the Random Forest Model As discussed in Section 4, we conduct the analysis by fitting a random forest algorithm to both our baseline model and the social media model to generate the baseline and social media forecasts respectively for total sales. We first present our results when we implement the analysis to forecast total sales one day ahead to seven days ahead. The daily-level out-of-sample forecast accuracy and accuracy improvements for total sales are presented in both Figure 5 and Figure 6. Figure 5 portrays how the MAPE of the two forecasts we develop changes with respect to the forecast lead time. Ideally, we would like to compare our L-day-ahead forecast with the baseline and social media models to the company s corresponding L-day-ahead internal forecast. But such variety of forecasts does not exist. We do not have variation in the number of days ahead with which the company produces its internal forecast. We include the MAPE of the company s forecasts simply as a reference benchmark. 12 Figure 5 MAPE (%) Over Prediction Horizon MAPE (%) Social Media Baseline Company Prediction Horizon (Days) In Figure 5 we can observe that the forecast error increases as the forecast lead time increases for both the baseline and social media forecasts. This is what we would expect, since future sales are 12 The company generates daily forecasts by making a weekly forecast in the week before and then deconstructing the weekly forecast into daily forecasts. Thus, we can think of the company s forecasts as having been produced with an average leadtime of about L=4.
20 20 Author: The Operational Value of Social Media Information Figure 6 Relative Forecast Improvement Over Prediction Horizon 60 Relative Improvement (%) Social Media Value 20.52** * * 16.38** 23.23*** 16.87** 15.98** Stat. Value 39.77** 33.67** 30.41** 31.16** 24.48** 32.66** 28.91** Total Value 52.13*** 44.86** 39.35** 42.44** 42.02** 44.03** 40.27** Prediction Horizon (Days) Note: * p < 0.1; **p < 0.05; ***p < less likely to depend on historical observations and it becomes harder to resolve future uncertainty the farther ahead we make forecasts. When the lead time is five days, the forecast with social media information generates a MAPE of 6.94 percent, whereas the baseline forecast gives a higher MAPE of 9.04 percent, with a relative forecast accuracy improvement of percent calculated as: Baseline MAP E Social Media MAP E Baseline MAP E The relative improvements of using statistical learning tools and incorporating social media information are summarized in Figure 6. These improvements are statistically significant for MAPE and MSE metrics. This result is consistent across different forecast lead times; the models that use social media information have better out-of-sample accuracy across all different lead times. Comparing the difference between the baseline and company forecasts, we obtain an estimate of the value of applying statistical learning tools. The company may use proprietary information we do not observe to aid prediction, so our statistical value evaluated here is likely to be underestimated. The baseline forecasts are all more accurate than company s internal forecasts, with a relative MAPE improvement ranging from percent to percent, calculated as: Company s MAP E Baseline MAP E Company s MAP E If we compare the performance of the social media and company forecasts, we can evaluate the combined benefit of using more advanced methodology and social media information to predict sales. The relative MAPE improvement ranges from percent to percent. This result
21 Author: The Operational Value of Social Media Information 21 further strengthens and completes our key message: social media information, which is publicly available and easy to obtain, adds value in predicting sales. As we discuss in the next section, the value is greater if proper statistical forecasting methods are utilized to capture the correlation between past operations and social media features to current sales. Robustness with Respect to New and Repeat Sales. We conduct the same analysis for sales coming from repeat and new customers and present our results in Table 5. We observe a result similar to the one presented above; there is significant value of social media information across all the different lead times, with a MAPE improvement ranging from to percent for repeat customers and from to for new customers. Table 5 Forecast Accuracy Improvements for New and Repeat Customers. Repeat Customers New Customers Forecast Lead Time Social Relative Social Relative Baseline Baseline Media Improvement (%) Media Improvement (%) L = *** *** L = ** *** L = ** * L = ** * L = ** ** L = *** ** L = ** ** Note. * p < 0.1; **p < 0.05; ***p < Robustness across Different Time Periods. The analysis obtained so far is based on the training set from January 1 to March 31 of 2013 and the testing set from April 1 to May 16 of Since the company runs promotions more frequently in June, demand volatility is higher. To validate the robustness of our findings across different times during the selling seasons, we vary the starting time of the training and testing and repeat our analysis. We display the one-day, three-day, five-day, and seven-day-ahead forecasts in Table 6, where each row represents a different starting point. Social media information consistently leads to a forecast improvement when there are frequent promotional activities. This is because promotions drive more spikes and slumps in demand, resulting in a larger sales uncertainty and more room for potential forecast improvements. Table 6 indicates the forecast accuracy improvement of social media forecasts over the baseline forecast, representing the value of social media. On average, the improvements are positive and significant, proving the robustness of our results regardless of the volatility in sales. To summarize, we have shown that social media information brings statistically significant improvements to sales forecasting. Our result is robust across forecast horizons, sales types, and time periods.
22 22 Author: The Operational Value of Social Media Information Table 6 Sales Forecast Accuracy of Models with Different Starting Date (all dates are in 2013). Forecast Lead Time L = 1 L = 3 Social Relative Social Baseline Media Improvement (%) Media Baseline Relative Improvement (%) Training End Testing End *** * * *** *** ** *** ** *** Forecast Lead Time L = 5 L = 7 Social Media Relative Improvement (%) Social Media Relative Improvement (%) Training End Testing End Baseline Baseline *** ** * ** * *** *** *** *** Note. * p < 0.1; **p < 0.05; ***p < Feature Inspection We next explore which features are important in the random forest model. The random forest model naturally selects important features from the unimportant ones by placing the important features higher in the regression tree. For each tree, the importance of a feature is defined based on a metric called the Gini impurity index. Based on the value of a feature in a tree, the tree will split into different subsets of other features or the outcome. The Gini impurity of a feature split in the regression tree measures how often an element from the set before the split would be misclassified if it is classified randomly according to the distribution of labels in the whole set. The mathematical definition of a feature s Gini index is I = G parent i G split i with G = n j p i(1 p i ), where n is the number of items in the data, and the impurity of a node adds up to the impurity of its parent and splits. Therefore, a natural way to measure the importance of a feature in random forests, denoted as Gini importance, is the weighted average Gini index of that feature across all regression trees. 13 In Figure 7, we rank the Gini importance of basic operational features, volume (i.e., non-textual) social media features and valence (i.e., textual) social media features for the one-day-ahead random forest model (the detailed index values are in Table 10 in the Appendix). The top five relevant variables are last day s sale, number of comments six days ago, whether there is a promotion tomorrow, number of unique users commenting six days ago, and sales seven days ago. The result is intuitive; historical sales predict the trend of future sales, promotional activities capture the potential spike, and, last but not least, social interactions on the company s Facebook page are 13 This metric of Gini importance is widely used in the random forest literature in assessing feature importance (Breiman 2001).
23 Author: The Operational Value of Social Media Information 23 Figure 7 Top 20 Features in Random Forest with the Highest Gini Index 900, , ,000 Feature Type Base Non Textual Textual 600,000 Gini Index 500, , , , , an important indicator of the firm s future sales. We can observe that among the top 20 features, 12 of them are social media features, further confirming the importance of their impact on sales. Moreover, there are more valence than volume features among the top 20 features. This argues for the importance of using natural language processing techniques to extract valence features from the observed social media information. We also check the most relevant variables for both new and repeat sales. They are quite similar, with only one difference: promotion does not have a high Gini index when predicting sales from new customers, whereas it is important in forecasting sales from repeat customers. This is because the major channel the company uses for promotions is , sent to customers who subscribe to it. New customers do not receive promotional notifications from the company and therefore their purchases do not depend on promotions information. This side finding further confirms the validity of our statistical learning model the selected variables are indeed relevant. 6. Comparison across Statistical Learning Models In Section 5, we explicitly showed the forecasting outcomes for the random forest model. We now compare across all seven statistical learning models, and provide guidance on which are best for the problem we are studying Bias-Variance Trade-off In order to better compare the performance across different models, we refer to the well-established bias-variance trade-off framework in machine learning and statistics literature (Friedman et al. 2001). Let us assume the data generation process of future sales is Y = f(x) + ɛ, where X is the set of observed features and ɛ is a random variable with mean 0 and variance σɛ 2. A forecasting Feature
24 24 Author: The Operational Value of Social Media Information model based on a series of past observations q Q can be expressed as ˆf q ( ) : X R. Given any input x 0 X, the expected mean squared error for estimating f(x 0 ) is MSE(x 0 ) = E[(Y ˆf q (x 0 ) 2 X = x 0 )] = ɛ 2 σ + E[ ˆf q (x 0 ) f(x 0 )] 2 + E[( ˆf q (x 0 ) E Q [ ˆf q (x 9 )]) 2 ] = Irreducible Error + Bias 2 + Variance. The above equation tells us that in order to minimize a model s prediction error, we need to minimize the model s bias and variance. Bias represents the error introduced by approximating an unknown data generation process f( ) with the best fit ˆf q ( ) based on a model and a series of training data. For example, if the underlying data generation process is nonlinear, a model that uses linear approximation would have very large biases. In general, less flexible models have higher biases. On the other hand, variance refers to the amount by which ˆf q ( ) changes in regard to a different training set q. It represents the error introduced by estimating a model based on a realized finite training set. If a model varies greatly with the training data, it has high variance and in turn will not perform well in prediction. In general, more flexible models have higher variances. A lower variance in a model usually leads to a higher bias, and vice versa. For instance, random guessing the future sales equal to 500 is a model that has 0 variance since it does not depend on the past training data; however, it has high bias (i.e., the bias is equal to the mean squared distance of the realized sales toward 500). Similarly, if a model is degenerate in training data (i.e., ˆf q (x 0 ) = y 0 (y 0, x 0 ) q), the model has 0 bias for any set of training data. However, the model has high variance since a small perturbation in the training data can result in a large variation in the estimated model. To analyze the model performance empirically, researchers often use in-sample and out-of-sample errors as proxies for biases and variances (Friedman et al and James et al. 2013). Models, based on their errors, are classified into three categories: (1) under-fitting: in-sample and out-ofsample errors are both high, representing the model has a large bias; (2) over-fitting: in-sample error is low while out-of-sample error is large, representing the model has a large variance; and (3) good fitting: out-of-sample error is only slightly higher than in-sample error and both of them are low. Table 7 presents the in-sample and out-of-sample error comparison across different machine learning models. Random forest is the model that best fits both in-sample and out-of-sample 14 and linear regression has the most serious over-fitting problem. 14 Note that for some models, the in-sample error is larger than the out-of-sample error. This is because the loss function used to train the models is RMSE instead of MAPE.
25 Author: The Operational Value of Social Media Information 25 Table 7 Comparison of In-Sample and Out-of-Sample Error for Statistical Learning Models. Out-of-Sample MAPE (%) In-Sample MAPE (%) Model Social Media Baseline Social Media Baseline In-sample Out-of-Sample Error Out-of-Sample Error In-sample Error Error RF GBM SVM radial SVM linear Lasso Forward Linear Forecasting Result of Other Statistical Learning Models Figure 8 plots the out-of-sample MAPE performance for the baseline and social media forecasts across the seven machine learning models discussed in Section 4. For each model, the lighter line represents the base forecast and darker line represents the social media forecast. The detailed statistics on the forecast improvement are presented in Figure 9. Figure 8 MAPE (%) for Different Statistical Learning Models Prediction Horizon= MAPE (%) RF GBM SVM Radial SVM Linear Lasso Forward Linear Social Media Baseline Company Statistical Learning Models Figure 8 shows that the social media forecast outperforms the baseline forecast for all statistical models except linear regression, which further validates our main finding. That is, social media information is shown to be valuable as long as it is incorporated in a forecasting framework that is appropriate to handle this information. The random forest model beats all other statistical learning tools in forecasting sales, which is consistent with its reputation of high accuracy. This is the model we would recommend companies use in their forecasts, particularly if they consider incorporating social media information. Although its estimation time might be on the longer side, most companies
26 26 Author: The Operational Value of Social Media Information Figure 9 Relative Forecast Improvement for Different Statistical Learning Models Prediction Horizon=1 Relative Improvement (%) RF GBM SVM Radial SVM Linear Lasso Forward Linear Social Media Value 20.52** * * ** * *** Stat. Value 39.77** * * *** *** *** Total Value 52.13*** ** * ** *** *** Note: * p < 0.1; **p < 0.05; ***p < Statistical Learning Models do not need to update (retrain) the model every period. Instead, they could retrain the model every several periods, which also provides additional robustness to abnormal input data. Finally, we have two main observations about the model comparison that speak to some regularities of the best-performing models: Linear vs. Nonlinear Models. All the nonlinear models i.e., support vector machine models, gradient boosting models, and random forest outperform the linear models i.e., linear regressions, forward selection, and lasso model for both the baseline forecast without social media and the social media forecast. The linear regression model, without variable selection, performs worse when social media information is included. The main reason for the poor performance of the linear models is that, as in our case, forecasting problems with social media information are high-dimensional problems. It is well known that linear models do not approximate the underlying data generation process very well in high-dimensional problems since linear models tend to over-fit the data. Indeed, Table 7 shows that the in-sample error of the linear regression model is half of its out-of-sample error, while the in-sample and outof-sample errors of the random forest model are similar. This evidence of over-fitting indicates that social media information and some operations information, such us promotions, may influence current sales in a nonlinear fashion. Nonlinear models allow for a more flexible structure, are better able to capture the relationship between those variables, and generate more accurate out-of-sample sales predictions. Models With and Without Feature Selection. Figure 8 shows that linear regression models have a negative value for social media information, while other linear models with feature selection
27 Author: The Operational Value of Social Media Information 27 present positive values. This is not surprising since many social media features included in our models are weakly correlated, if not uncorrelated, with future sales. However, as a model designer, it is difficult to know a priori which features are more important in predicting future sales. Therefore, a model should actively select the important social media features that make the model a better approximation of the underlying data-generating process and thus less likely to over-fit. Table 7 shows that linear models with feature selection, e.g., lasso and forward selection models, have in-sample and out-of-sample error in the same range, while the linear regression model without feature selection is over-fitted. We conclude that the most effective models to use when incorporating social media information in sales forecasting are methods based on nonlinear models with feature selection. 7. Discussion and Conclusions We study the value of using social media information to improve a firms sales forecasts. We show how incorporating social media information into sales forecasting results in statistically significant improvements. These results hold in an out-of-sample forecasts test for both the full data set and the different subsamples of the data, such as new customers and repeat customers, and over non- promotional periods and promotional periods. Our results are robust to using different statistical models, including random forest, gradient boosting, support vector machine, lasso, and forward selection. We offer several concluding remarks regarding the managerial implications of these results Operational Value of Social Media Information A good forecast can be translated into a reduced safety inventory cost, smoother scheduling of staff, and more consistent delivery and production planning. For example, using the generalized demand model Martingale Model of Forecast Evolution (MMFE) and the general order-up-to inventory policy (proposed by Graves et al. 1986, Graves 1986 and Heath and Jackson 1994, and developed by Chen and Lee 2009), it is possible to show that, because the demand structure and the inventory policy are linear in the standard deviation of forecast error, a reduction in the forecast error can translate linearly into a reduction in safety stock cost. While these results are based on very specific modeling assumptions, they suggest that social media information can bring tangible benefits to a retailer in terms of reducing inventory costs. Note that it is not just the retailer who can benefit from social media. Social media can also reduce the demand uncertainty faced by the manufacturer and the different parties involved in the supply chain. Our focal company faces end customers directly, and thus the improved demand forecast directly leads to an inventory improvement at the company s central warehouse. In a decentralized supply chain, where the supplier sells products to retailers and the retailer faces end
28 28 Author: The Operational Value of Social Media Information customers, the improved downstream demand can also help the upstream supplier manage their inventory system Short-term Forecasting as a Managing Tool Some business situations will not benefit from a more accurate forecast with such a short horizon. Retailers frequently need to place their orders several months before customers can see the products on the shelf for the first time. However, there are a number of situations where a more accurate, short-term forecast will be highly beneficial. Accurate short-term forecasts can be extremely valuable for retail fast fashion companies, such as H&M, Zara, or Peacocks. These companies place heavy emphasis on quick reaction to demand changes and have developed a supply chain that is ready to deliver products to their stores twice a week. The value of a better sales forecast to these companies can be substantial. In addition, most of these types of retailers offer the option to buy online. The retailer can start collecting information, in terms of demand and social media content before the products actually arrive at the stores, making the short-horizon forecast even more meaningful. 15 Not every retailer has a supply chain with a lead time of less than two months from the manufacturing facility to the stores. However, understanding the demand trends can be valuable even when the company cannot influence its orders, since it can always work with the demand side of the business. A more accurate short-term forecast can help the company react to customers information and adjust, for example, its promotional or communications efforts to better match its inventory position with actual demand. Finally, improving accuracy in short-term forecasting can be extremely valuable in a dynamic pricing environment like today s online retailing. One of the key components of a successful pricing strategy is understanding customer demand. While it might be true that a company cannot do much about its inventory once it has learned about demand, it can always put in place a more effective pricing strategy. When thinking about dynamic pricing in online retailing, a two-week horizon is actually large, especially when we consider that there are reports presenting evidence of online retailers changing prices multiple times a day. The fact that our partner company actually had and used a daily sales forecast further emphasizes its relevance for their business context Conclusion Dealing with social media data comes with challenges. An important challenge for both practitioners and academics is that a significant amount of social media information has an unstructured, textual nature, which is not the type of data that academics and practitioners traditionally have 15 For related settings, see Moe and Fader (2002), Huang and Van Mieghem (2013), and Huang and Van Mieghem (2014).
29 Author: The Operational Value of Social Media Information 29 dealt with. For example, in our case, we used natural language processing techniques to encode the sentiment of comments users make in social media. However, once these initial challenges are overcome, there is a huge potential to use this data in novel ways. This new data can make firms more efficient while providing a better experience to their customers. For example, in a novel application of social media information, Nordstrom has started to feature in its brick-and-mortar stores the products that are receiving the most pins in Pinterest by users from nearby locations. 16 It is only a matter of time until more firms start to use this type of information to routinely improve their assortment decisions and enhance their customers experience. With the development of these novel applications, we expect the operations management community to pay increasing attention the value of social media information. We hope our project inspires others to conduct further studies using structured and unstructured data from social media information. References Allon, Gad, Dennis J Zhang Managing service systems in the presence of social networks. Available at SSRN Ang, Erjie, Sara Kwasnick, Mohsen Bayati, Erica L Plambeck, Michael Aratow Accurate emergency department wait time prediction. Manufacturing & Service Operations Management. Aral, Sinan Commentary-identifying social influence: A comment on opinion leadership and social contagion in new product diffusion. Marketing Science 30(2) Arndt, Johan Role of product-related conversations in the diffusion of a new product. Journal of marketing Research Bassamboo, Achal, Ruomeng Cui, Antonio Moreno The wisdom of crowds in operations: Forecasting using prediction markets. Available at SSRN Bollen, Johan, Huina Mao, Xiaojun Zeng Twitter mood predicts the stock market. Journal of Computational Science 2(1) 1 8. Breiman, Leo Random forests. Machine learning 45(1) Candogan, Ozan, Kostas Bimpikis, Asuman Ozdaglar Optimal pricing in networks with externalities. Operations Research 60(4) Chen, Hailiang, Prabuddha De, Yu Jeffrey Hu, Byoung-Hyoun Hwang Wisdom of crowds: The value of stock opinions transmitted through social media. Review of Financial Studies 27(5)
30 30 Author: The Operational Value of Social Media Information Chen, Li, Hau L Lee Information sharing and order variability control under a generalized demand model. Management Science 55(5) Chen, Yubo, Jinhong Xie Online consumer review: Word-of-mouth as a new element of marketing communication mix. Management Science 54(3) Chevalier, Judith A, Dina Mayzlin The effect of word of mouth on sales: Online book reviews. Journal of marketing research 43(3) Cortes, Corinna, Vladimir Vapnik Support-vector networks. Machine learning 20(3) Cui, Ruomeng, Gad Allon, Achal Bassamboo, Jan A Van Mieghem Information sharing in supply chains: An empirical and theoretical valuation Working paper, Northwestern University. Devitt, Ann, Khurshid Ahmad Sentiment polarity identification in financial news: A cohesion-based approach. ACL. Friedman, Jerome, Trevor Hastie, Robert Tibshirani The elements of statistical learning, vol. 1. Springer series in statistics Springer, Berlin. Friedman, Jerome H Greedy function approximation: a gradient boosting machine. Annals of statistics Gaur, Vishal, Avi Giloni, Sridhar Seshadri Information sharing in a supply chain under arma demand. Management Science 51(6) Gaur, Vishal, Saravanan Kesavan, Ananth Raman, Marshall L Fisher Estimating demand uncertainty using judgmental forecasts. Manufacturing & Service Operations Management 9(4) Goh, Khim-Yong, Cheng-Suang Heng, Zhijie Lin Social media brand community and consumer behavior: Quantifying the relative impact of user-and marketer-generated content. Information Systems Research 24(1) Graves, Stephen C A tactical planning model for a job shop. Operations Research 34(4) Graves, Stephen C, Harlan C Meal, Sriram Dasu, Y Qiu, S Axsäter, C Schneeweiss, E Silver Two-stage production planning in a dynamic environment. Multi-stage production planning and inventory control Hameed, B Social media usage exploding amongst fortune 500 companies. (as of 11/24/2013). Heath, David C, Peter L Jackson Modeling the evolution of demand forecasts ith application to safety stock analysis in production/distribution systems. IIE transactions 26(3) Huang, Tingliang, Jan A Van Mieghem The promise of strategic customer behavior: On the value of click tracking. Production and Operations Management 22(3)
31 Author: The Operational Value of Social Media Information 31 Huang, Tingliang, Jan A Van Mieghem Clickstream data and inventory management: Model and empirical analysis. Production and Operations Management 23(3) Ifrach, Bar, Costis Maglaras, Marco Scarsini Monopoly pricing in the presence of social learning. James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani An introduction to statistical learning. Springer. Jing, Bing Social learning and dynamic pricing of durable goods. Marketing Science 30(5) Johnson, K., A. B. H. Lee, D. Simchi-Levi Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing and Service Operations Management forthcoming. Kesavan, Saravanan, Vishal Gaur, Ananth Raman Do inventory and gross margin data improve sales forecasts for us public retailers? Management Science 56(9) Kohavi, Ron, et al A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai, vol Kremer, Mirko, Brent Moritz, Enno Siemsen Demand forecasting behavior: System neglect and change detection. Management Science 57(10) Luca, Michael Reviews, reputation, and revenue: The case of yelp. com. Tech. rep., Harvard Business School. Moe, Wendy W, Peter S Fader Fast-track: Article using advance purchase orders to forecast new product sales. Marketing Science 21(3) NG, S Henkel ceo: exploiting online media to sell soap. Boss Talk (as of 11/24/2013). Nichols, Donald R, Jeffrey J Tsay Security price reactions to long-range executive earnings forecasts. Journal of Accounting Research 17(1) Osadchiy, Nikolay, Vishal Gaur, Sridhar Seshadri Sales forecasting with financial indicators and experts input. Production and Operations Management. Pang, Bo, Lillian Lee Opinion mining and sentiment analysis. Foundations and trends in information retrieval 2(1-2) Papanastasiou, Yiangos, Nicos Savva Dynamic pricing in the presence of social learning and strategic consumers. History. Penman, Stephen H An empirical investigation of the voluntary disclosure of corporate earnings forecasts. Journal of Accounting Research 18(1) Resnik, Philip, et al Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res.(JAIR)
32 32 Author: The Operational Value of Social Media Information Rudin, Cynthia, Gah-Yi Vahn The big data newsvendor: Practical insights from machine learning. Available at SSRN Socher, Richard, Brody Huval, Christopher D Manning, Andrew Y Ng Semantic compositionality through recursive matrix-vector spaces. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, Stone, Mervyn An asymptotic equivalence of choice of model by cross-validation and akaike s criterion. Journal of the Royal Statistical Society. Series B (Methodological) Tibshirani, Robert Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) Wang, Hao, Dogan Can, Abe Kazemzadeh, François Bar, Shrikanth Narayanan A system for real-time twitter sentiment analysis of 2012 us presidential election cycle. Proceedings of the ACL 2012 System Demonstrations. Association for Computational Linguistics, Yu, Man, Laurens Debo, Roman Kapuscinski Strategic waiting for consumer-generated quality information: Dynamic pricing of new experience goods. Chicago Booth Research Paper (13-78). Appendices A. Feature Set Tables 8 and 9 demonstrate all basic and social features respectively used in our model. We have 40 features per day in total, of which 28 are social features. Feature Name Sales Ad Promotion PromotionLast WeekdayMonday WeekdayTuesday WeekdayWednesday WeekdayThursday WeekdayFriday WeekdaySaturday WeekdaySunday Holiday Table 8 All Daily Basic Features Explanation Daily number of sales A dummy variable equal to 1 if there is an advertisement A dummy variable equal to 1 if there is a promotion A dummy variable equal to 1 if there is a promotion A dummy variable equal to 1 if it s Monday A dummy variable equal to 1 if it s Tuesday A dummy variable equal to 1 if it s Wednesday A dummy variable equal to 1 if it s Thursday A dummy variable equal to 1 if it s Friday A dummy variable equal to 1 if it s Saturday A dummy variable equal to 1 if it s Sunday A dummy variable equal to 1 if it s a national holiday
33 Author: The Operational Value of Social Media Information 33 Table 9 All Daily Social Features Category Feature Name Explanation c comments count Number of comments from the company c unique comments posts Number of unique posts that the company comments on u comments count Number of comments from users u unique comments posts Number of unique posts that users comment on unique comments users Number of unique users who commented posts counts Number of unique posts Volumne photo posts Number of unique posts with photos status posts Number of unique status posts offer posts Number of unique posts with offers note posts Number of unique posts that are notes event posts Number of unique posts with events link posts Number of unique posts with links video posts Number of unique posts with videos c avg num words Average number of words of the company s comments c avg comments size Average byte size of the company s comments c std num words Std. deviation of words per comments from the company c std comments size Std. deviation of byte size of the company s comments u avg num words Average number of words per comments for users u avg comments size Average byte size of users comments Valence u std num words Std. deviation of words per comments from users u std comments size Std. deviation of byte size of users comments avg post num words Average number of words in the post avg post size Average byte size of posts std post num words Std. deviation of byte size of posts c avg sentiment Average sentiment of the company s comments c std sentiment Std. deviation of sentiments of the company s comments u avg sentiment Average sentiment of users comments u std sentiment Std. deviation of sentiments of users comments B. Important Features Table 10 shows the top 20 features in the one-day-ahead random forest model with largest Gini importance. As expected, past sales (i.e., saleslags) are very important. Moreover, when the promotion is sent (i.e., promotion) and whether there is an promotion (i.e., promotionlast) are important. Last, 13 of the top 20 features are social-media-related features. Table 10 Top 20 Features in Random Forest with the Highest Gini Index Rank Feature Name Gini Index Rank Feature Name Gini Index 1 saleslag1 889, saleslag6 163,499 2 comments countlag6 375, unique commentslag7 157,133 3 promotion 364, post wordslag2 123,300 4 unique commentslag6 354, num wordslag1 103,422 5 saleslag7 310, comments countlag7 98,226 6 post wordslag3 307, avg sentimentlag1 86,421 7 promotionlast 235, b num wordslag2 83,326 8 post wordslag5 234, saleslag5 83,021 9 saleslag2 232, b num wordslag5 82, saleslag3 172, comments countlag4 73,427
The Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
Neural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Model Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Recognizing Informed Option Trading
Recognizing Informed Option Trading Alex Bain, Prabal Tiwaree, Kari Okamoto 1 Abstract While equity (stock) markets are generally efficient in discounting public information into stock prices, we believe
Stock Market Forecasting Using Machine Learning Algorithms
Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Hong Kong Stock Index Forecasting
Hong Kong Stock Index Forecasting Tong Fu Shuo Chen Chuanqi Wei [email protected] [email protected] [email protected] Abstract Prediction of the movement of stock market is a long-time attractive topic
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort [email protected] Session Number: TBR14 Insurance has always been a data business The industry has successfully
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION
MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION Matthew A. Lanham & Ralph D. Badinelli Virginia Polytechnic Institute and State University Department of Business
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
ALGORITHMIC TRADING USING MACHINE LEARNING TECH-
ALGORITHMIC TRADING USING MACHINE LEARNING TECH- NIQUES: FINAL REPORT Chenxu Shao, Zheming Zheng Department of Management Science and Engineering December 12, 2013 ABSTRACT In this report, we present an
Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering
IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons
ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis
ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Sales Forecast for Pickup Truck Parts:
Sales Forecast for Pickup Truck Parts: A Case Study on Brake Rubber Mojtaba Kamranfard University of Semnan Semnan, Iran [email protected] Kourosh Kiani Amirkabir University of Technology Tehran,
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
JetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016
Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency
Bootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
Multiple Kernel Learning on the Limit Order Book
JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John Shawe-Taylor
Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish
Demand forecasting & Aggregate planning in a Supply chain Session Speaker Prof.P.S.Satish 1 Introduction PEMP-EMM2506 Forecasting provides an estimate of future demand Factors that influence demand and
Studying Auto Insurance Data
Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.
Cross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is
Machine Learning Methods for Causal Effects. Susan Athey, Stanford University Guido Imbens, Stanford University
Machine Learning Methods for Causal Effects Susan Athey, Stanford University Guido Imbens, Stanford University Introduction Supervised Machine Learning v. Econometrics/Statistics Lit. on Causality Supervised
Regularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
Multiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
Leveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
Competition-Based Dynamic Pricing in Online Retailing
Competition-Based Dynamic Pricing in Online Retailing Marshall Fisher The Wharton School, University of Pennsylvania, [email protected] Santiago Gallino Tuck School of Business, Dartmouth College,
Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams
2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Employer Health Insurance Premium Prediction Elliott Lui
Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Causal Leading Indicators Detection for Demand Forecasting
Causal Leading Indicators Detection for Demand Forecasting Yves R. Sagaert, El-Houssaine Aghezzaf, Nikolaos Kourentzes, Bram Desmet Department of Industrial Management, Ghent University 13/07/2015 EURO
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Question 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
THE THREE "Rs" OF PREDICTIVE ANALYTICS
THE THREE "Rs" OF PREDICTIVE As companies commit to big data and data-driven decision making, the demand for predictive analytics has never been greater. While each day seems to bring another story of
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
II. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
Model Validation Techniques
Model Validation Techniques Kevin Mahoney, FCAS kmahoney@ travelers.com CAS RPM Seminar March 17, 2010 Uses of Statistical Models in P/C Insurance Examples of Applications Determine expected loss cost
Beating the NCAA Football Point Spread
Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over
Machine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution
Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution Rie Johnson Tong Zhang 1 Introduction This document describes our entry nominated for the second prize of the Heritage
How can we discover stocks that will
Algorithmic Trading Strategy Based On Massive Data Mining Haoming Li, Zhijun Yang and Tianlun Li Stanford University Abstract We believe that there is useful information hiding behind the noisy and massive
Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems
Tree Ensembles: The Power of Post- Processing December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Course Outline Salford Systems quick overview Treenet an ensemble of boosted trees GPS modern
Smart ESG Integration: Factoring in Sustainability
Smart ESG Integration: Factoring in Sustainability Abstract Smart ESG integration is an advanced ESG integration method developed by RobecoSAM s Quantitative Research team. In a first step, an improved
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Applying Data Science to Sales Pipelines for Fun and Profit
Applying Data Science to Sales Pipelines for Fun and Profit Andy Twigg, CTO, C9 @lambdatwigg Abstract Machine learning is now routinely applied to many areas of industry. At C9, we apply machine learning
Crowdfunding Support Tools: Predicting Success & Failure
Crowdfunding Support Tools: Predicting Success & Failure Michael D. Greenberg Bryan Pardo [email protected] [email protected] Karthic Hariharan [email protected] tern.edu Elizabeth
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM
Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Machine Learning Methods for Demand Estimation
Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
Machine Learning in Statistical Arbitrage
Machine Learning in Statistical Arbitrage Xing Fu, Avinash Patra December 11, 2009 Abstract We apply machine learning methods to obtain an index arbitrage strategy. In particular, we employ linear regression
Why do statisticians "hate" us?
Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
A Decision Theoretic Approach to Targeted Advertising
82 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 A Decision Theoretic Approach to Targeted Advertising David Maxwell Chickering and David Heckerman Microsoft Research Redmond WA, 98052-6399 [email protected]
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Tree based ensemble models regularization by convex optimization
Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000
MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal
MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
Change-Point Analysis: A Powerful New Tool For Detecting Changes
Change-Point Analysis: A Powerful New Tool For Detecting Changes WAYNE A. TAYLOR Baxter Healthcare Corporation, Round Lake, IL 60073 Change-point analysis is a powerful new tool for determining whether
