Factors affecting online sales

Table of contents

Summary
Research questions
The dataset
Descriptive statistics: The exploratory stage
Confidence intervals
Hypothesis tests
Statistical modelling: Linear regression
Conclusions

Summary

Recent anecdotal evidence suggests changes in sales patterns and in the level of investment in human resources dedicated to multichannel retailing [1]. This study focuses on two aspects of multichannel retailing: level of online sales and level of investment. This research project aims to establish the levels of online sales achieved depending on retail sector and the number of specialised online marketing staff employed. Reasons behind the change in online sales levels between retail sectors and the drivers for this change are also an important part of the wider study, though this report only aims at establishing empirical associations between measured outcomes and their potential explanatory factors.

Research questions

1. What levels of online sales are observed for each retail sector and how variable are they?
2. Is there a relationship between the use of front-end developer contractors and the retail sector?
3. To what extent does the number of specialised online marketing staff employed increase the levels of online sales?

[1] http://www.oxfordeconomics.com/publication/open/224369

Epigeum Ltd, 2014

The dataset

The data consists of a sample of 36 firms from four locations across the United Kingdom. Information collected includes location of the firm, firm ID number, number of years in business, number of specialised staff currently employed (including part-time staff, hence not all figures are whole numbers), retail sector, proportion of sales generated online (as a percentage of total sales volume) and whether the firm uses external front-end developers (contractors) to supplement the number of internal programmers. The data has been stored in list format where each row contains data from an individual firm, and is ready for analysis.

Figure 1 The dataset
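To make the list format concrete, one row of the dataset could be represented as a small record type. This is only a sketch: the class name, field names and the example values are assumptions chosen for illustration and are not taken from the actual data file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmRecord:
    """One row of the dataset: all measurements for a single firm.

    Field names are hypothetical; the report only lists what was collected.
    """
    firm_id: int
    location: str
    years_in_business: float
    specialised_staff: float   # part-time staff count as fractions
    retail_sector: str         # "DIY/hardware", "Electrical" or "Fashion"
    online_sales_pct: float    # percentage of total sales volume
    uses_contractor: bool      # external front-end developers used?


# Made-up example row, for illustration only (not a firm from the sample).
example = FirmRecord(
    firm_id=1,
    location="Location A",
    years_in_business=4.0,
    specialised_staff=1.5,
    retail_sector="Electrical",
    online_sales_pct=30.0,
    uses_contractor=False,
)
```

A list of such records, one per firm, corresponds to the "one row per firm" layout described above.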

Descriptive statistics: The exploratory stage

The exploratory analysis checks that the data as computerised is of sufficient quality to be used for the analysis. There are a total of 36 firms, with a different number of firms from each retail sector. Table 1 shows summary statistics for the percentage of online sales and years in business. There are no missing values and no obvious errors such as negative sales figures or implausible numbers of years in business. There appear to be no oddities in the dataset and so we continue with the analysis.

Table 1 Summary statistics for online sales and experience

Measure             Count  Minimum  Median  Maximum  Mean   Standard deviation
Online sales (%)    36     19.1     40.5    62.1     40.6   11.9
Years in business   36     1.5      4       20       4.49   3.15

Figure 2 shows box plots of the online sales level for each retail sector. The fashion sector achieves the highest proportion of online sales, with a median of around 60%, which is about 15 percentage points higher than the DIY/hardware firms, and about 30 percentage points higher than the electrical firms. The lowest recorded online sales figure for the fashion sector was about 57%, which is higher than the highest recorded online sales figure for the electrical sector of about 43%.

Figure 2 Box plots of online sales for each retail sector

Figure 3 shows a scatter plot of online sales levels against the number of specialised staff employed, together with a fitted straight line regression. It suggests that the percentage of online sales increases linearly with increasing numbers of specialised staff. The scatter plot also confirms that there are no obvious errors in the dataset.

Figure 3 Scatter plot of online sales against specialised staff

Confidence intervals

The sample mean percentage of online sales for DIY/hardware firms is 45.4% and a 95% confidence interval for their true mean percentage of online sales is (41.7%, 49.1%).

The sample mean percentage of online sales for the electrical firms is 30% and a 95% confidence interval for their true mean percentage of online sales is (26.4%, 33.6%).

The sample mean percentage of online sales for the fashion firms is 59.6% and a 95% confidence interval for their true mean percentage of online sales is (55.5%, 63.6%).

Hypothesis tests

Comparing means

A table with summary statistics of the online sales variable is shown below for each retail sector:

Retail sector   n    Mean   Standard deviation  Minimum  Median  Maximum
DIY/hardware    17   45.4   7.13                31.8     44.6    59.3
Electrical      15   30     6.52                19.1     27.6    42.7
Fashion         4    59.6   2.55                56.8     59.7    62.1

We test the null hypothesis that the true mean percentage of online sales for DIY/hardware firms is the same as that for electrical firms, against the alternative hypothesis that the true mean percentage of online sales is different for the two retail sectors, i.e. we test:

$H_0: \mu_{\text{DIY/hardware}} - \mu_{\text{Electrical}} = 0$ against $H_1: \mu_{\text{DIY/hardware}} - \mu_{\text{Electrical}} \neq 0$

where $\mu$ denotes the true mean percentage of online sales for each retail sector respectively.

A two-sample t-test for testing the null hypothesis stated above gives p-value <0.001. So we reject the null hypothesis in favour of the alternative. This suggests that the mean percentage of online sales differs between these two retail sectors.

The observed difference between the sample mean percentage of online sales by DIY/hardware firms and electrical firms is 15.44 percentage points, with a standard error of the difference of 2.42. A 95% confidence interval for the true difference between the two means is (10.48, 20.39). Note that the confidence interval for the true difference between means lies entirely above zero, suggesting that the true mean percentage of online sales for DIY/hardware firms is higher than that for electrical firms.

Analysis of variance

Analysis of variance was used to compare the mean online sales percentages of all three retail sectors. The aim is to determine if there is any difference between the mean percentages of online sales for each retail sector. So the null hypothesis is that there is no difference between the true mean percentage of online sales for the three retail sectors, and the alternative hypothesis is that at least two of the true means are different, i.e. we test:

$H_0: \mu_{\text{DIY/hardware}} = \mu_{\text{Electrical}} = \mu_{\text{Fashion}}$ against $H_1$: at least two of the true mean percentages of online sales are not the same.

The p-value for testing the null hypothesis stated above is 0.0134. So we reject the null hypothesis in favour of the alternative and conclude that the mean percentage of sales generated online is related to retail sector.
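The per-sector confidence intervals and the DIY/hardware versus electrical comparison reported earlier can be reproduced from the summary statistics alone. The sketch below is not the report's own computation. It assumes the equal-variance (pooled) form of the two-sample t-test, which the report does not state but which is consistent with the quoted standard error of 2.42. The critical values are hard-coded from standard t tables, and the function names are illustrative.

```python
from math import sqrt

# Summary statistics copied from the per-sector table: (n, mean, sd).
SECTORS = {
    "DIY/hardware": (17, 45.4, 7.13),
    "Electrical": (15, 30.0, 6.52),
    "Fashion": (4, 59.6, 2.55),
}

# Two-sided 95% critical values of Student's t from standard tables,
# keyed by degrees of freedom (hard-coded to stay within the stdlib).
T_CRIT_95 = {3: 3.182, 14: 2.145, 16: 2.120, 30: 2.042}


def mean_ci(n, mean, sd):
    """95% confidence interval for a single mean: mean +/- t * sd / sqrt(n)."""
    margin = T_CRIT_95[n - 1] * sd / sqrt(n)
    return mean - margin, mean + margin


def pooled_difference(a, b):
    """Difference of two means, its pooled standard error and 95% CI.

    Assumes equal variances in the two groups (an assumption of this sketch).
    """
    (n1, m1, s1), (n2, m2, s2) = a, b
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    margin = T_CRIT_95[df] * se
    return diff, se, (diff - margin, diff + margin)


for name, stats in SECTORS.items():
    low, high = mean_ci(*stats)
    print(f"{name}: 95% CI ({low:.1f}, {high:.1f})")

diff, se, ci = pooled_difference(SECTORS["DIY/hardware"], SECTORS["Electrical"])
print(f"difference {diff:.2f}, SE {se:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the inputs are the rounded table values, the difference comes out as 15.4 rather than 15.44, and the interval bounds can differ from the reported ones in the last digit or two.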

Comparing proportions

We investigate if the proportion of firms that use contractors differs between the electrical sector and the non-electrical sectors (DIY/hardware and fashion). Tabulating the answer to the question "Do you use external front-end developers to improve your online store's user interface?" against type of sector gives the following frequency table, also presented as percentages within each retail sector:

Uses contractor   Non-electrical   Electrical   Total
No                9                12           21
Yes               12               3            15
Total             21               15           36

Uses contractor   Non-electrical   Electrical   Total
No                42.9%            80.0%        58.3%
Yes               57.1%            20.0%        41.7%
Total             100.0%           100.0%       100.0%

The observed proportion of non-electrical firms that use contractors is 12/21 = 0.571, or 57.1%, while for electrical firms it is 3/15 = 0.2, or 20%.

We assess if there is a statistical difference between the two retail sectors in the proportion of firms that use a contractor to improve their user interface. The null hypothesis we are testing is:

$H_0: \pi_{\text{Non-electrical}} = \pi_{\text{Electrical}}$ against $H_1: \pi_{\text{Non-electrical}} \neq \pi_{\text{Electrical}}$

where $\pi$ denotes the true proportion of firms that employ a contractor.

A chi-squared test for testing the null hypothesis stated above gives p-value = 0.026. So we reject the null hypothesis in favour of the alternative, and conclude that the true proportions are different for the two retail sectors. This suggests that the proportion of firms that employ a contractor to improve their user interface is associated with their retail sector.

The observed difference between the two proportions is 0.571 - 0.2 = 0.371, with a standard error of the difference of 0.149. A 95% confidence interval for the true difference between the two proportions is (0.078, 0.664).
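The figures in this comparison can be checked directly from the frequency table. The sketch below assumes a Pearson chi-squared test without continuity correction and a normal-approximation interval with an unpooled standard error. The report names neither choice, but both are consistent with the quoted p-value and standard error. Function and variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts from the frequency table: (firms using a contractor, total).
NON_ELECTRICAL = (12, 21)
ELECTRICAL = (3, 15)


def chi_squared_2x2(yes1, n1, yes2, n2):
    """Pearson chi-squared statistic (no continuity correction) and p-value."""
    a, b = yes1, n1 - yes1
    c, d = yes2, n2 - yes2
    n = n1 + n2
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def difference_ci(yes1, n1, yes2, n2, z=1.96):
    """Difference of two proportions, unpooled standard error and 95% CI."""
    p1, p2 = yes1 / n1, yes2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)


stat, p_value = chi_squared_2x2(*NON_ELECTRICAL, *ELECTRICAL)
diff, se, ci = difference_ci(*NON_ELECTRICAL, *ELECTRICAL)
print(f"chi-squared {stat:.2f}, p-value {p_value:.3f}")
print(f"difference {diff:.3f}, SE {se:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the p-value, the difference and the standard error round to the values quoted in the text; the interval bounds may differ in the third decimal place depending on when rounding is applied.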

Note that the confidence interval for the true difference does not include zero, suggesting that the true proportion is higher for the non-electrical firms than for the electrical firms.

Statistical modelling: Linear regression

We use linear regression to investigate the relationship between online sales (the response variable) and the number of specialised online marketing staff employed (the explanatory variable).

Straight line regression model

A straight line regression model was fitted to the data. The resulting table of regression coefficients is shown in Table 2.

Table 2 Regression coefficients for a straight line regression model

Parameter           Estimate  S.E.  t      p-value  95% CI
Intercept           27.65     2.19  12.6   <0.001   (23.19, 32.11)
Specialised staff   8.95      1.24  7.2    <0.001   (6.42, 11.47)

The p-value for testing that the true value of the slope is zero is <0.001, so we reject the null hypothesis that the percentage of sales generated online is not related to the number of specialised staff employed. The two variables are statistically significantly related: as the number of specialised staff increases, so does the percentage of sales generated online.

$R^2$ for the straight line regression model is 0.605. This means that just over 60% of the total variability in online sales is explained by the straight line regression model.

Quadratic regression model

A quadratic regression model was fitted to the data, giving the table of regression coefficients shown in Table 3.

Table 3 Regression coefficients for a quadratic regression model

Parameter              Estimate  S.E.  t      p-value  95% CI
Intercept              27.38     2.46  11.12  <0.001   (22.37, 32.39)
Specialised staff      9.98      4.22  2.36   0.024    (1.39, 18.56)
Specialised staff sq   -0.39     1.52  -0.26  0.799    (-3.49, 2.71)

The p-value for testing the null hypothesis that a straight line model is adequate (i.e. that the true coefficient of the squared number of specialised staff is zero) is 0.799, so we do not reject the null hypothesis. The addition of a quadratic term does not contribute statistically significantly to the regression model. Therefore, we adopt a straight line regression model as an adequate summary model of the observed relationship between online sales and number of specialised staff employed.

The selected regression model

Table 2 shows parameter estimates obtained from a straight line regression model, from which we can derive the straight line regression equation shown in Figure 3 as:

Online sales = 27.65 + 8.94 × Number of specialised staff

Note that this equation is valid only within the observed range of specialised staff employed, i.e. between 0 and 3.

Interpretation of parameter estimates

Table 2 shows that the estimated increase in online sales for one more specialised staff member employed is 8.94 percentage points. A 95% confidence interval for the true rate of change is (6.42, 11.47). Therefore, the estimated change in online sales for an additional half a member (i.e. a part-time member) of specialised staff employed is 4.47 percentage points, and a 95% confidence interval is (3.21, 5.73).

The estimated intercept is 27.65: the predicted percentage of online sales for a firm with no specialised staff is 27.65%. As the observed range of specialised staff employed is 0 to 3, this prediction is meaningful. A 95% confidence interval for the true value of the intercept is (23.19, 32.11). So we are 95% confident that this interval contains the true mean percentage of online sales for firms that employ no specialised staff.

Predictions

Using the above equation, the predicted mean percentage of sales generated online by a firm with two members of specialised staff is:

27.65 + 8.94 × 2 = 45.53

Conclusions

There was evidence of an association between mean percentage of online sales and retail sector.
First, a p-value of <0.001 from a two-sample t-test suggested that the true mean percentage of online sales is different between DIY/hardware firms and electrical firms. The mean percentage of online sales for DIY/hardware firms (45.4%) was 15.4 percentage points higher than that for electrical firms (30%). The margin of error on this estimated difference is ±5 percentage points.

Further, a p-value of 0.0134 from an analysis of variance suggested that the true mean percentage of online sales differs between the three retail sectors.

There was evidence of an association between the proportion of firms that employ a contractor to improve their user interface and retail sector when comparing non-electrical (DIY/hardware and fashion) firms and electrical firms. A p-value of 0.026 from a chi-squared test suggested that the true proportion is different for the two groups. The percentage of non-electrical firms that use a contractor (57.1%) was 37.1 percentage points higher than that of electrical firms (20%). The margin of error on this estimated difference is ±29.3 percentage points.

There was evidence of an association between the number of specialised staff employed and online sales figures. A p-value of <0.001 suggested that as the number of specialised staff increased, so did the percentage of online sales. The rate of increase in online sales was constant, i.e. followed a straight line. A straight line regression was found to be an adequate summary model, giving the following predictive equation:

Online sales = 27.65 + 8.94 × Number of specialised staff

Each additional member of specialised staff is associated with an estimated increase in online sales of 8.94 percentage points. The margin of error on this estimated increase is ±2.53 percentage points. This equation is valid for predictions of between 0 and 3 members of specialised staff. So the predicted percentage of online sales for firms with no specialised staff is 27.65%. A quadratic regression did not significantly improve the summary model (p-value 0.799) over and above a straight line regression.
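As an illustration of how the predictive equation might be applied in practice, the sketch below wraps it in a function that refuses to extrapolate outside the range of 0 to 3 specialised staff for which the equation is stated to be valid. The function and constant names are hypothetical; only the coefficients and the range come from this report.

```python
# Coefficients of the straight line regression equation quoted above.
INTERCEPT = 27.65   # predicted online sales (%) with no specialised staff
SLOPE = 8.94        # percentage points per additional specialised staff member

# Range of specialised staff over which the equation is stated to be valid.
VALID_STAFF_RANGE = (0.0, 3.0)


def predict_online_sales(staff):
    """Predicted mean percentage of online sales for a given staff count.

    Raises ValueError rather than extrapolating beyond the valid range.
    """
    low, high = VALID_STAFF_RANGE
    if not low <= staff <= high:
        raise ValueError(
            f"staff={staff} is outside the valid range {low} to {high}"
        )
    return INTERCEPT + SLOPE * staff


# Part-time staff count as fractions, so non-integer inputs are allowed.
for staff in (0, 0.5, 2):
    print(f"{staff} staff: {predict_online_sales(staff):.2f}% online sales")
```

Raising an error for out-of-range inputs is a design choice of this sketch: it keeps predictions inside the range covered by the data rather than silently returning an extrapolated value.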