A Point of View

Buyer versus Browser: A Predictive Model to Increase Online Purchases

Abstract

Global online retail sales, which amounted to $1.67 trillion in 2015, are expected to grow to $3.58 trillion in 2019 [1]. Yet online retailers are still trying to master one of the most important aspects of online commerce: predicting customer behavior. Increasing uncertainty and a dynamic business environment make it challenging to understand customers' behavior and manage their expectations. However, recent developments in analytics, in particular the rapid growth of machine learning techniques, make it possible to identify and predict buyer behavior with high accuracy based on certain critical factors, allowing organizations to convert more customers with attractive offers and discounts. Several machine learning algorithms can be used to understand the relationship between online behavior and customer demographics, product and promotional information, and visit-related attributes.

This paper discusses a methodology that uses classification modeling to compare behaviors and separate the buyers from the browsers. It also explains how managers can use such a predictive modeling methodology to make better decisions, increase opportunities for customer engagement, and generate more revenue.

Understanding Visitor Conversions

In the first quarter of 2015, 2.87 percent of visits to e-commerce websites resulted in purchases. This figure rose to 3.56 percent in the last quarter of 2015 but dipped again to 3.01 percent in the first quarter of 2016 [2]. Online retailers have long tried to understand the secret sauce of visitor conversions, with only partial success. By running a predictive model on visit and purchase data, e-commerce vendors can estimate the likelihood that visitors to their portals will turn into buyers.
Online retailers can gain deep, actionable insights into an online shopper's behavior by tracking:

- What consumers do online
- Where they spend most of their time

Typically, consumers spend most of their time researching products or seeking expert opinions and user reviews. They spend the rest of the time using price comparison sites and searching for offers and discount coupons. These patterns point to complex consumer behavior ahead of a purchase decision. The diversity of products available in the online marketplace adds to the complexity. For example, the Amazon website features a vast number of products in each business line, and the customer has to choose among many competing alternatives. The consumer evaluates several factors, such as brand, price, quality, and functional specifications, before being ready to make a purchase decision.

Advanced machine learning algorithms can be applied to classify the online behavior of retail customers. The hypothesis used in this paper is that a buyer can be differentiated from a browser based on several product characteristics and online behaviors. High classification accuracy helps target the right customers with the right products, while also minimizing losses from pursuing non-profitable customers. Pareto's principle can also be applied to online retail to surmise that '20 percent of online purchases will lead to 80 percent of profits'. The model discussed here can help identify the online customers who are actually likely to make a 'buy' decision.

[1] E-Marketer, Worldwide Retail e-commerce Sales: e-marketer's Updated Estimates and Forecast Through 2019 (December 2015), accessed August 16, 2016, http://www.emarketer.com/public_media/docs/emarketer_etailwest2016_worldwide_ecommerce_report.pdf
[2] Monetate, Monetate e-commerce Quarterly Report, Q1 2016 (April 2016), accessed August 16, 2016, http://info.monetate.com/rs/092-tqn-434/images/EQ1_2016.pdf
Achieving Classification Accuracy

The classification and prediction model extracts, from among all potential features, the top customer, product, and online visit features that significantly impact the purchase decision. The 'predictor variables' used in the model include:

- Customer ID and demographics
- Normalized product and promotion attributes
- Visit-related attributes

The aim of the predictive model, based on the problem defined, is to classify each visit by a binary target variable: '1' when the consumer makes a purchase and '0' when the consumer only visits the online store without buying anything. The predictive model uses logistic regression to identify the predictor variables that are significant, and random forests to improve the accuracy of prediction. In random forest, the following parameters are tuned to improve the prediction accuracy:

- Mtry: The number of predictor variables randomly sampled as candidates at each split. The default value for classification is the square root of the total number of predictor variables.
- Nodesize: The minimum size of the terminal nodes in each decision tree. A larger value causes smaller trees to be grown and, hence, takes less time to process the classification.
- Ntree: The number of decision trees to be grown in the ensemble. Ntree is set to a sufficiently large number to ensure that every data point is considered for prediction at least a few times.

We also used ratios of certain visit-related variables, as we found that these ratios have a significant impact on online purchase decisions. Because they affect the performance of the algorithms, they can be added as new features in the modeling approach to improve the accuracy further. As a classifier, random forest performs feature selection implicitly, using a small subset of predictor variables at each split. This helps it perform well on feature-rich data.
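The ratio features and the default Mtry value described above can be illustrated with a minimal Python sketch. The paper does not name the visit-related variables or the ratios it used, so the field names here (pages_viewed, product_pages_viewed, session_seconds) and the two ratios are hypothetical and chosen only for illustration.

```python
import math


def add_visit_ratios(visit):
    """Return a copy of one visit record with ratio features appended.

    The field names are hypothetical; the paper does not list the
    visit-related variables it combined into ratios.
    """
    enriched = dict(visit)
    pages = visit["pages_viewed"]
    # Guard against division by zero for visits with no page views.
    enriched["seconds_per_page"] = visit["session_seconds"] / pages if pages else 0.0
    enriched["product_page_share"] = visit["product_pages_viewed"] / pages if pages else 0.0
    return enriched


def default_mtry(n_predictors):
    """Default candidate predictors per split for classification: floor(sqrt(p))."""
    return max(1, math.isqrt(n_predictors))


visit = {"pages_viewed": 8, "product_pages_viewed": 6, "session_seconds": 240}
features = add_visit_ratios(visit)
mtry = default_mtry(len(features))
```

With the three raw attributes plus two ratios, the sketch has five predictors, so the default Mtry comes out as 2. In practice, Mtry would be tuned around this default rather than fixed.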
The outcome of this implicit feature selection can be visualized using the 'Gini importance measure', which is used to identify the high-impact predictors. The Gini importance measure indicates how often a particular predictor is selected for a split and how much it contributes to discriminating between the classes.

Minimizing Prediction Errors

Any predictive algorithm needs to be assessed for its error margin. An error measure, like the one below, can be used to calculate the error of prediction for each classification:

$$E = \sum_{\text{test data}} \left( 0.8\, I_{(\text{true value}=1)} + 0.2\, I_{(\text{true value}=0)} \right) \left| \text{scaled predicted value} - \text{true value} \right|$$

Here I is an indicator that equals 1 when its condition holds and 0 otherwise, so each test visit contributes its deviation weighted by 0.8 for actual purchases and by 0.2 for visits without a purchase. The evaluation error for each instance of prediction is calculated. The penalty applied for misclassifying a purchase as a visit is greater than the penalty for misclassifying a visit as a purchase; the penalties are in the ratio of 4:1. The error is then summed over all visits, and the final error value is used to evaluate the performance of the classifying algorithm.

To minimize the prediction error, this assessment needs to be repeated with several advanced classification models to evaluate their accuracy. The run time of these algorithms is not a matter of concern, as all of them complete within a reasonable time frame. The model can also be coded without any significant complexity. This paper thus focuses on the classification performance of the algorithms when implemented in a data mining tool such as R or WEKA. The accuracy and prediction error of the algorithms tested are shown in Figure 1.
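The weighted error measure from the discussion on minimizing prediction errors can be sketched in a few lines of Python. This is an illustrative reading, not the author's implementation: it assumes that predicted values are scaled to the range 0 to 1 and that the deviation is taken as an absolute difference. The function and parameter names are hypothetical.

```python
def weighted_prediction_error(true_values, predicted_scores,
                              purchase_weight=0.8, visit_weight=0.2):
    """Sum the class-weighted absolute error over all test visits.

    true_values: 1 for a purchase, 0 for a visit without a purchase.
    predicted_scores: predictions assumed to be scaled to [0, 1].
    The default weights give the 4:1 penalty ratio described in the text.
    """
    if len(true_values) != len(predicted_scores):
        raise ValueError("true_values and predicted_scores must have the same length")
    total = 0.0
    for truth, score in zip(true_values, predicted_scores):
        # Missing an actual purchase is penalized more than a false alarm.
        weight = purchase_weight if truth == 1 else visit_weight
        total += weight * abs(score - truth)
    return total


# Toy example: one missed purchase and one visit wrongly scored as a purchase.
true_values = [1, 1, 0, 0]
predicted_scores = [1.0, 0.0, 1.0, 0.0]
error = weighted_prediction_error(true_values, predicted_scores)
```

In the toy example, the missed purchase contributes 0.8 and the false alarm contributes 0.2, which reflects the 4:1 penalty ratio. A lower total indicates a better classifier on the same test set.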
Method                Classification Accuracy   Prediction Error
Logistic regression   81%                       814.6
LMT                   82.6%                     676.8
IBk                   84.6%                     377
J48                   85.2%                     343
Random forest         88.2%                     154.8

Figure 1: Comparison of classifiers for the buyer versus browser model

In a test run, the classification and prediction model classified buyers and browsers correctly with more than 88 percent accuracy and a prediction error score of 154.8.

Mapping Prediction Variables to Business Benefits

The proposed classification and prediction methodology can help managers focus on the top features that characterize high-value customers and prioritize their advertising and marketing spend on the customer profiles identified by the classifying algorithm. Focused, direct targeting of such customers can enable businesses to increase the efficiency of their digital marketing initiatives.

In essence, this model helps increase average revenue per user by predicting which visits are likely to result in a purchase. It helps cut costs, such as those of online advertising and external consultancy, to arrive at an optimal cost structure. Moreover, by targeting customers who are more likely to purchase their products and services, retailers can enhance customer engagement. They can streamline customer service by ensuring always-available, on-time delivery of products and services. The model also increases cross-selling and up-selling opportunities for related products and services through a better understanding of buyer behavior.

Moving Towards Predictable Customer Behavior

Advanced machine learning algorithms can predict customer behavior with a high degree of certainty and successfully differentiate between types of online behavior. A powerful combination of analytics and social media can help personalize online purchases.
Research-led initiatives based on predictive and prescriptive techniques are already helping organizations attract customers by investing in offline and digital marketing strategies that are aligned with each other. Once organizations can anticipate purchase decisions using the predictive model, they can target sales growth using owned, paid, and earned media.

The future holds more promise. Further research can help apply deep-learning techniques that also use audio, image, and video data to better understand the patterns in online customer behavior and, thereby, increase the prediction accuracy.
About the Author

Sriram Sampathraman

Sriram Sampathraman is a subject matter expert with the Analytics and Insights division of Tata Consultancy Services' Business Process Services unit. He has extensive experience with analytics in academia and in industry. His research focuses on edge analytics and machine learning, especially deep learning, for solving business problems in the connected digital world. He holds a bachelor's degree in Mechanical Engineering from IIT Roorkee and master's degrees in Computer Science and in Mathematics, both from the State University of New York at Albany, New York.
About TCS' Business Process Services Unit

Enterprises seek to drive business growth and agility through innovation in an increasingly regulated, competitive, and global market. TCS helps clients achieve these goals by managing and executing their business operations effectively and efficiently.

TCS' Business Process Services (BPS) include core industry-specific processes, analytics and insights, and enterprise services such as finance and accounting, HR, and supply chain management. TCS creates value through its FORE™ simplification and transformation methodology, backed by its deep domain expertise, extensive technology experience, and TRAPEZE™ suite of solution accelerators and governance enablers. TCS complements its experience and expertise with innovative delivery models such as using robotic automation and providing Business Processes as a Service (BPaaS).

TCS' BPS unit has been positioned in the leaders' quadrant for various service lines by many leading analyst firms. With over four decades of global experience and a delivery footprint spanning six continents, TCS is one of the largest BPS providers today.

Contact

For more information about TCS' Business Process Services Unit, visit: www.tcs.com/bps
Email: bps.connect@tcs.com

About Tata Consultancy Services Ltd (TCS)

Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and the Bombay Stock Exchange in India.
For more information, visit us at www.tcs.com

IT Services, Business Solutions, Consulting

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright 2016 Tata Consultancy Services Limited