Factors affecting online sales


Table of contents

Summary
Research questions
The dataset
Descriptive statistics: The exploratory stage
Confidence intervals
Hypothesis tests
Statistical modelling: Linear regression
Conclusions

Summary

Recent anecdotal evidence suggests changes in sales patterns and in the level of investment in human resources dedicated to multichannel retailing [1]. This study focuses on two aspects of multichannel retailing: level of online sales and level of investment. This research project aims to establish the levels of online sales achieved depending on retail sector and the number of specialised online marketing staff employed. The reasons behind the change in online sales levels between retail sectors, and the drivers for this change, are also an important part of the wider study, though this report aims only to establish empirical associations between measured outcomes and their potential explanatory factors.

Research questions

1. What levels of online sales are observed for each retail sector and how variable are they?
2. Is there a relationship between the use of front-end developer contractors and the retail sector?
3. To what extent does the number of specialised online marketing staff employed increase the levels of online sales?

[1] http://www.oxfordeconomics.com/publication/open/224369

Epigeum Ltd, 2014

The dataset

The data consists of a sample of 36 firms from four locations across the United Kingdom. Information collected includes location of the firm, firm ID number, number of years in business, number of specialised staff currently employed (including part-time staff, hence not all figures are whole numbers), retail sector, proportion of sales generated online (as a percentage of total sales volume) and whether the firm uses external front-end developers (contractors) to supplement the number of internal programmers. The data has been stored in list format, where each row contains data from an individual firm, and is ready for analysis.

Figure 1 The dataset

Descriptive statistics: The exploratory stage

The exploratory analysis checks that the data as computerised is of sufficient quality to be used for the analysis. There are a total of 36 firms, with a different number of firms from each retail sector. Table 1 shows summary statistics for the percentage of online sales and years in business. There are no missing values and no obvious errors such as negative sales figures or implausible numbers of years in business. There appear to be no oddities in the dataset and so we continue with the analysis.

Table 1 Summary statistics for online sales and experience

Measure             Count  Minimum  Median  Maximum  Mean   Standard deviation
Online sales (%)    36     19.1     40.5    62.1     40.6   11.9
Years in business   36     1.5      4       20       4.49   3.15

Figure 2 shows box plots of the online sales level for each retail sector. The fashion sector achieves the highest proportion of online sales, with a median of around 60%, which is about 15 percentage points higher than the DIY/hardware firms, and about 30 percentage points higher than the electrical firms. The lowest recorded online sales percentage for the fashion sector was about 56%, which is higher than the highest recorded online sales percentage for the electrical sector of about 43%.

Figure 2 Box plots of online sales for each retail sector

Figure 3 shows a scatter plot of online sales levels against the number of specialised staff employed, together with a straight line regression. It suggests that the percentage of online sales increases linearly with increasing numbers of specialised staff. The scatter plot also confirms that there are no obvious errors in the dataset.

Figure 3 Scatter plot of online sales against specialised staff

Confidence intervals

The sample mean percentage of online sales for DIY/hardware firms is 45.4% and a 95% confidence interval for their true mean percentage of online sales is (41.7%, 49.1%).

The sample mean percentage of online sales for the electrical firms is 30.0% and a 95% confidence interval for their true mean percentage of online sales is (26.4%, 33.6%).

The sample mean percentage of online sales for the fashion firms is 59.6% and a 95% confidence interval for their true mean percentage of online sales is (55.5%, 63.6%).

Hypothesis tests

Comparing means

A table with summary statistics of the online sales variable is shown below for each retail sector:

Retail sector   n    Mean   Standard deviation  Minimum  Median  Maximum
DIY/hardware    17   45.4   7.13                31.8     44.6    59.3
Electrical      15   30.0   6.52                19.1     27.6    42.7
Fashion         4    59.6   2.55                56.8     59.7    62.1

We test the null hypothesis that the true mean percentage of online sales for DIY/hardware firms is the same as that for electrical firms, against the alternative hypothesis that the true mean percentage of online sales is different for the two retail sectors, i.e. we test:

$$H_0: \mu_{\text{DIY/hardware}} - \mu_{\text{Electrical}} = 0 \quad \text{against} \quad H_1: \mu_{\text{DIY/hardware}} - \mu_{\text{Electrical}} \neq 0$$

where $\mu$ denotes the true mean percentage of online sales for the respective retail sector.

A two-sample t-test of this null hypothesis gives a p-value <0.001, so we reject the null hypothesis in favour of the alternative. This suggests that the mean percentage of online sales differs between these two retail sectors.

The observed difference between the sample mean percentage of online sales for DIY/hardware firms and for electrical firms is 15.44 percentage points, with a standard error of the difference of 2.42. A 95% confidence interval for the true difference between the two means is (10.48, 20.39). Note that the confidence interval for the true difference between means does not include zero, suggesting that the true mean percentage of online sales for DIY/hardware firms is higher than that for electrical firms.

Analysis of variance

Analysis of variance was used to compare the mean online sales percentages of all three retail sectors. The aim is to determine whether there is any difference between the mean percentages of online sales for each retail sector. So the null hypothesis is that there is no difference between the true mean percentage of online sales for the three retail sectors, and the alternative hypothesis is that at least two of the true means are different, i.e. we test:

$$H_0: \mu_{\text{DIY/hardware}} = \mu_{\text{Electrical}} = \mu_{\text{Fashion}}$$

against $H_1$: at least two of the true mean percentages of online sales are not the same.

The p-value for testing this null hypothesis is 0.0134. So we reject the null hypothesis in favour of the alternative and conclude that the mean percentage of sales generated online is related to retail sector.
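The sector confidence intervals and the DIY/hardware versus electrical comparison above can be approximately reproduced from the summary table alone. The following is a minimal sketch, not the report's own computation: it assumes a pooled-variance (equal variances) two-sample procedure, which the report does not state, and it hard-codes 95% critical values from standard t-tables. The names `SECTORS`, `T_CRIT_95`, `mean_ci` and `pooled_difference` are illustrative.

```python
import math

# Summary statistics per retail sector, copied from the table above:
# (n, mean, standard deviation) of the percentage of online sales.
SECTORS = {
    "DIY/hardware": (17, 45.4, 7.13),
    "Electrical": (15, 30.0, 6.52),
    "Fashion": (4, 59.6, 2.55),
}

# Two-sided 95% critical values of Student's t (0.975 quantile) from standard
# t-tables, keyed by degrees of freedom. Hard-coded so that the sketch needs
# no third-party library.
T_CRIT_95 = {3: 3.182, 14: 2.145, 16: 2.120, 30: 2.042}


def mean_ci(n, mean, sd):
    """95% t-interval for a single mean, from summary statistics."""
    half_width = T_CRIT_95[n - 1] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def pooled_difference(a, b):
    """Difference in means, pooled standard error and 95% CI.

    Assumption: equal variances in the two groups (pooled-variance t procedure).
    """
    (n1, m1, s1), (n2, m2, s2) = a, b
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    half_width = T_CRIT_95[df] * se
    return diff, se, (diff - half_width, diff + half_width)


sector_cis = {name: mean_ci(*stats) for name, stats in SECTORS.items()}
diff, se, diff_ci = pooled_difference(SECTORS["DIY/hardware"], SECTORS["Electrical"])

for name, (low, high) in sector_cis.items():
    print(f"{name}: ({low:.1f}, {high:.1f})")
print(f"difference = {diff:.2f}, SE = {se:.2f}, "
      f"95% CI = ({diff_ci[0]:.2f}, {diff_ci[1]:.2f})")
```

Because the inputs are the rounded summary statistics and not the raw data, the results agree with the reported figures only to within rounding.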

Comparing proportions

We investigate whether the proportion of firms who use contractors differs between the electrical sector and the non-electrical sector. Tabulating the answer to the question "Do you use external front-end developers to improve your online store's user interface?" against type of sector gives the following frequency table, also presented as percentages within each retail sector:

Uses contractor  Non-electrical  Electrical  Total
No               9               12          21
Yes              12              3           15
Total            21              15          36

Uses contractor  Non-electrical  Electrical  Total
No               42.9%           80.0%       58.3%
Yes              57.1%           20.0%       41.7%
Total            100.0%          100.0%      100.0%

The observed proportion who use contractors for non-electrical firms is 12/21 = 0.571, or 57.1%, while for electrical firms it is 3/15 = 0.2, or 20%.

We assess whether there is a statistically significant difference between the two retail sectors in the proportion of firms who use a contractor to improve their user interface. The null hypothesis we are testing is:

$$H_0: \pi_{\text{Non-electrical}} = \pi_{\text{Electrical}} \quad \text{against} \quad H_1: \pi_{\text{Non-electrical}} \neq \pi_{\text{Electrical}}$$

where $\pi$ denotes the true proportion of firms who employ a contractor.

A chi-squared test of this null hypothesis gives a p-value of 0.026. So we reject the null hypothesis in favour of the alternative, and conclude that the true proportions are different for the two retail sectors. This suggests that the proportion of firms who employ a contractor to improve their user interface is associated with their retail sector.

The observed difference between the two proportions is 0.571 - 0.2 = 0.371, with a standard error of the difference of 0.149. A 95% confidence interval for the true difference between the two proportions is (0.078, 0.664).
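The figures for the comparison of proportions can be checked directly from the frequency table. The sketch below is illustrative only: it assumes an unpooled (Wald) standard error with a normal critical value of 1.96, and a Pearson chi-squared test without continuity correction, since the report does not say which variants were used. All variable names are hypothetical.

```python
import math

# Counts from the frequency table above: rows are No / Yes,
# columns are Non-electrical / Electrical.
observed = [[9, 12], [12, 3]]

n_non_electrical = observed[0][0] + observed[1][0]  # 21 firms
n_electrical = observed[0][1] + observed[1][1]      # 15 firms

# Proportion of firms answering "Yes" in each group.
p_non_electrical = observed[1][0] / n_non_electrical
p_electrical = observed[1][1] / n_electrical
diff = p_non_electrical - p_electrical

# Assumption: unpooled (Wald) standard error and a normal-based 95% interval.
se = math.sqrt(
    p_non_electrical * (1 - p_non_electrical) / n_non_electrical
    + p_electrical * (1 - p_electrical) / n_electrical
)
ci = (diff - 1.96 * se, diff + 1.96 * se)

# Assumption: Pearson chi-squared statistic without continuity correction.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(2)
    for j in range(2)
)

# For 1 degree of freedom, the upper tail of the chi-squared distribution
# equals erfc(sqrt(x / 2)), so no statistics library is needed.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"difference = {diff:.3f}, SE = {se:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.3f}")
```

Under these assumptions the sketch gives values matching those quoted in the text.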

Note that the confidence interval for the true difference between the two proportions does not include zero, suggesting that the true proportion is higher for the non-electrical firms than for the electrical firms.

Statistical modelling: Linear regression

We use linear regression to investigate the relationship between online sales (the response variable) and the number of specialised online marketing staff employed (the explanatory variable).

Straight line regression model

A straight line regression model was fitted to the data. The resulting table of regression coefficients is shown in Table 2.

Table 2 Regression coefficients for a straight line regression model

Parameter          Estimate  S.E.  t     p-value  95% CI
Intercept          27.65     2.19  12.6  <0.001   (23.19, 32.11)
Specialised staff  8.95      1.24  7.2   <0.001   (6.42, 11.47)

The p-value for testing that the true value of the slope is zero is <0.001, so we reject the null hypothesis that the percentage of sales generated online is not related to the number of specialised staff employed. The two variables are statistically significantly related: as the number of specialised staff increases, so does the percentage of sales generated online.

R² for the straight line regression model is 0.605. This means that just over 60% of the total variability in online sales is explained by the straight line regression model.

Quadratic regression model

A quadratic regression model was fitted to the data, giving the table of regression coefficients shown in Table 3.

Table 3 Regression coefficients for a quadratic regression model

Parameter             Estimate  S.E.  t      p-value  95% CI
Intercept             27.38     2.46  11.12  <0.001   (22.37, 32.39)
Specialised staff     9.98      4.22  2.36   0.024    (1.39, 18.56)
Specialised staff sq  -0.39     1.52  -0.26  0.799    (-3.49, 2.71)

The p-value for testing the null hypothesis that a straight line model is adequate (i.e. that the true effect of the number of specialised staff squared is zero) is 0.799, so we do not reject the null hypothesis. The addition of a quadratic term does not contribute statistically significantly to the regression model. Therefore, we adopt a straight line regression model as an adequate summary model of the observed relationship between online sales and number of specialised staff employed.

The selected regression model

Table 2 shows parameter estimates obtained from a straight line regression model, from which we can derive the straight line regression equation shown in Figure 3 as:

Online sales = 27.65 + 8.94 × Number of specialised staff

Note that this equation is valid for a number of specialised staff employed between 0 and 3.

Interpretation of parameter estimates

Table 2 shows that the estimated increase in online sales for one more specialised staff member employed is 8.94 percentage points. A 95% confidence interval for the true rate of change is (6.42, 11.47). Therefore, the estimated change in online sales for an additional half a member (i.e. a part-time member) of specialised staff employed is 4.47 percentage points, with a 95% confidence interval of (3.21, 5.73).

The estimated intercept is 27.65: the predicted percentage of online sales for a firm with no specialised staff is 27.65%. As the observed range of specialised staff employed is 0 to 3, this prediction is meaningful. A 95% confidence interval for the true value of the intercept is (23.19, 32.11). So we are 95% confident that this interval contains the true mean percentage of online sales for firms that employ no specialised staff.

Predictions

Using the above equation, the predicted mean percentage of sales generated online by a firm with two members of specialised staff is:

27.65 + 8.94 × 2 = 45.53

Conclusions

There was evidence of an association between mean percentage of online sales and retail sector.

First, a p-value of <0.001 from a two-sample t-test suggested that the true mean percentage of online sales is different between DIY/hardware firms and electrical firms. The mean percentage of online sales for DIY/hardware firms (45.4%) was 15.4 percentage points higher than that for electrical firms (30%). The margin of error on this estimated difference is ±5 percentage points.

Further, a p-value of 0.0134 from an analysis of variance suggested that the true mean percentage of online sales differs between at least two of the three retail sectors.

There was evidence of an association between the proportion of firms who employ a contractor to improve their user interface and retail sector when comparing non-electrical (DIY/hardware and fashion) firms and electrical firms. A p-value of 0.026 from a chi-squared test suggested that the true proportion is different for the two groups. The percentage of non-electrical firms who use a contractor (57.1%) was 37.1 percentage points higher than that of electrical firms (20%). The margin of error on this estimated difference is ±29.3 percentage points.

There was evidence of an association between the number of specialised staff employed and the percentage of online sales. A p-value of <0.001 suggested that as the number of specialised staff increased, so did the percentage of online sales. The rate of increase in online sales was constant, i.e. followed a straight line. A straight line regression was found to be an adequate summary model, giving the following predictive equation:

Online sales = 27.65 + 8.94 × Number of specialised staff

Each additional member of specialised staff is associated with an estimated increase in online sales of 8.94 percentage points. The margin of error on this estimated increase is ±2.53 percentage points. This equation is valid for predictions of between 0 and 3 members of specialised staff. So the predicted percentage of online sales for firms with no specialised staff is 27.65%.

A quadratic regression did not significantly improve the summary model (p-value 0.799) over and above a straight line regression.
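The predictive equation and its stated range of validity can be expressed as a small helper. This is a sketch under stated assumptions, not code from the report: the function names `predict_online_sales` and `change_for` are hypothetical, and the coefficients and interval are simply copied from the text above.

```python
# Coefficients of the straight line model, as quoted in the predictive equation.
INTERCEPT = 27.65          # predicted % of online sales with no specialised staff
SLOPE = 8.94               # percentage points per additional member of staff
SLOPE_CI = (6.42, 11.47)   # 95% confidence interval for the slope
STAFF_RANGE = (0.0, 3.0)   # observed range; the equation is only valid here


def predict_online_sales(staff):
    """Predicted mean percentage of online sales for a given staff number."""
    low, high = STAFF_RANGE
    if not low <= staff <= high:
        raise ValueError(
            f"staff={staff} is outside the observed range {low} to {high}"
        )
    return INTERCEPT + SLOPE * staff


def change_for(delta_staff):
    """Estimated change in online sales, with 95% CI, for a positive change
    in staff numbers (e.g. 0.5 for one part-time member)."""
    estimate = SLOPE * delta_staff
    interval = (SLOPE_CI[0] * delta_staff, SLOPE_CI[1] * delta_staff)
    return estimate, interval


# Half the width of the slope's confidence interval gives its margin of error.
margin_of_error = (SLOPE_CI[1] - SLOPE_CI[0]) / 2

print(f"{predict_online_sales(2):.2f}")   # two members of specialised staff
print(change_for(0.5))                    # one part-time member
print(f"{margin_of_error:.3f}")
```

Raising an error outside the 0 to 3 range is a design choice made for this sketch: it keeps the helper from silently extrapolating beyond the data the model was fitted to.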