Teaching Business Statistics through Problem Solving
David M. Levine, Baruch College, CUNY
with David F. Stephan, Two Bridges Instructional Technology
CONTACT: davidlevine@davidlevinestatistics.com
Typical student perception of the introductory business statistics course
- "It's a math course."
- "I'll never use anything from this course in my other courses or after I graduate."
- "This is a required course that somehow, some way, I will have to get through and complete."
Combating this misperception leads to these course goals:
- Show the relevance of statistics by providing examples drawn from the functional areas of business that students study
- Emphasize interpretation of statistical results over mathematical computation
- Give students plenty of practice in learning how to apply statistics to business
- Illustrate for students how to use statistical software to assist business decision making
- Link course content to current trends in business
Show the relevance of statistics by providing examples drawn from the functional areas of business
- Just as computers are used in courses beyond the computer course, statistics is used in courses beyond the statistics course
- Each statistics topic needs to be presented in an applied context related to at least one functional area of business
- Functional areas of business include accounting, finance, information systems, management, and marketing
- When teaching a topic, focus on its application in business
- Emphasize interpretation of results
Emphasize interpretation of statistical results over mathematical computation
- Introductory business statistics courses should recognize the growing need to interpret the statistical results that computerized processes create
- This makes interpreting results more important than knowing how to execute the tedious hand calculations required to produce them
- Interpretation includes evaluating the assumptions and discussing what should be done if the assumptions are violated
Give students plenty of practice in learning how to apply statistics to business
- Both classroom examples and homework exercises should involve actual or realistic data as much as possible
- Students should work with both small and large data sets
- Students should be encouraged to look beyond the statistical analysis of data to the interpretation of results in a managerial context
- Clear and reusable instructions should be provided for using statistical software
Illustrate for students how to use statistical software to assist business decision making
- Introductory business statistics courses should recognize that the computers used in business typically contain programs with statistical functions
- Integrating statistical software into all aspects of an introductory statistics course allows the course to focus on interpreting results instead of on computations
Clear and reusable instructions should be provided for using statistical software
- Instructions should explain clearly how to use a program such as Microsoft Excel in the study of statistics
- Instructions should provide sufficient step-by-step detail, including program elements such as dialog boxes, to enable students to apply the instructions to other problems and examples
- Using templates, project files, and/or macros adds reusability and lessens the burden of learning the software
Special issues
- What to do during the first day of class
- Dealing with students' negative affect
- Taking note of current trends that require knowledge of statistics
First Day of Class
- First impressions are critically important in everything you do in life
- The first day is the most important class of the semester
- You need to set the tone and create a new impression: that the course will be important to students' business education
Deming's Eighth Point: Drive Out Fear
Statistics Is Not Sadistics
- Make the point that this course is not a math course
- State that the class will be learning analytical skills for making business decisions
- Explain that the focus will be on how statistics can be used in the functional areas of business
Reading, Writing, and Arithmetic Statistics
"I keep saying that the sexy job in the next ten years will be statistician."
Hal Varian, Chief Economist, Google, as quoted in The New York Times, August 6, 2009
Current trend example: Analytics
Analytics can help answer these questions:
- What happened in the past, and how and why did it happen?
- What is happening now, and what is the best action to take?
- What will happen, and how can you obtain good predictions of what will happen?
Analytics should be part of the competitive strategy of any organization (Davenport and Harris, references 1 and 2).
How to proceed with the rest of the course
- Provide a roadmap that helps guide students to use statistics for problem solving in business
- State example problems as stories about making decisions in a functional area of business
- Fictional or real businesses?
- Illustrate that statistics provides a problem-solving approach for business decision making
Part Two: Implementing course goals through the DCOVA problem-solving framework
DCOVA: five steps that serve as a blueprint for all statistical problem solving
- Define the data that you want to study in order to solve a problem or meet an objective
- Collect the data from appropriate sources
- Organize the data by developing tables
- Visualize the data by developing charts
- Analyze the data to reach conclusions and present those results
Define Step
- Present every problem from the perspective of the business objective for collecting data (compare to "Here is some data; let's analyze it.")
- Use operational definitions to identify the variables that need to be analyzed
- Determine the type (categorical or numerical) of each variable
Collect Step
- Determine the source of the data
  - Primary source
  - Secondary source
  - Survey
  - Designed experiment
- Prepare the data
  - Data cleaning
  - Recoding
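The Prepare the data tasks can be sketched in Python. This is a minimal illustration only, assuming a hypothetical set of raw survey records (the record layout, the values, and the `RECODE` map are invented for the example) in which one response is missing and gender is recorded inconsistently.

```python
# Hypothetical raw survey responses: (respondent_id, gender, minutes_to_get_ready)
raw = [
    (1, "F", "32"),
    (2, "female", "45"),
    (3, "M", ""),        # missing numerical value
    (4, "m", "28"),
    (5, "Male", "51"),
]

# Recoding map (hypothetical): collapse inconsistent labels into one code per category
RECODE = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}


def clean(records):
    """Return cleaned records plus the ids of records excluded for missing data."""
    cleaned, excluded = [], []
    for rid, gender, minutes in records:
        if minutes.strip() == "":
            excluded.append(rid)  # data cleaning: set aside rows with a missing value
            continue
        # recoding: normalize the gender label, convert minutes to a number
        cleaned.append((rid, RECODE[gender.strip().lower()], float(minutes)))
    return cleaned, excluded


cleaned, excluded = clean(raw)
for row in cleaned:
    print(row)
print("excluded:", excluded)
```

Excluding incomplete records is only one choice; whether to exclude, follow up on, or impute a missing value is itself a decision students should be able to justify.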
Organize Step
- Determine the format for data entry
- Choose the software to be used for data analysis (this could involve several different types of software)
- The Organize step can be done in conjunction with the Visualize and Analyze steps
Visualize Step
- Construct charts and special displays
- Explore the charts to discover patterns and relationships
- Evaluate the charts to determine the validity of the methods used in the Analyze step
Analyze Step
- Determine which method(s) should be used to analyze the data
- Using a roadmap can help make this determination
- Summarize the results
- Present the results in a report
Example: Teaching Simple Linear Regression
- Introduce the topic with a story-based business problem
- Execute the DCOVA framework
- Reflect, state the solution to the business problem, and propose further action
The Story: Knowing Customers at Sunflowers Apparel
Having survived recent economic slowdowns that have diminished its competitors, Sunflowers Apparel, a chain of upscale fashion stores for women, is in the midst of a companywide review that includes researching the factors that make its stores successful. Until recently, Sunflowers managers had no data analyses to support store location decisions, relying instead on subjective factors, such as the availability of an inexpensive lease or the perception that a particular location seemed ideal for one of their stores.
As the new director of planning, you have already consulted with marketing data firms that specialize in using business analytics to identify and classify groups of consumers. Based on such preliminary analyses, you have tentatively discovered that Sunflowers shoppers may include not only the upper middle class long suspected of being the chain's clientele, but also younger, aspirational families with young children and, most surprisingly, urban hipsters who set trends and are mostly single.
You seek to develop a systematic approach that will lead to better decisions during the site-selection process. As a starting point, you have asked one marketing data firm to collect and organize data on the number of people in the identified categories who live within a fixed radius of each Sunflowers store. You believe that greater numbers of profiled customers contribute to store sales, and you want to explore the possible use of this relationship in the decision-making process. How can you use statistics to forecast the annual sales of a proposed store based on the number of profiled customers who reside within a fixed radius of a Sunflowers store?
Key Points from the Sunflowers Apparel Story
- Until recently, Sunflowers managers relied on subjective factors to support store location decisions
- You have tentatively discovered that Sunflowers shoppers may include not only the upper middle class long suspected of being the chain's clientele, but also younger, aspirational families and urban hipsters
- You believe that greater numbers of profiled customers living near a store contribute to store sales, and you want to explore this relationship
- How can you use statistics to forecast the annual sales of a proposed store based on the number of profiled customers who reside within a fixed radius of a Sunflowers store?
Define and Collect steps
- Operational definitions are needed for:
  - Profiled customers (in millions)
  - Annual store sales (in $millions)
- Collect data from a sample of 14 stores (sampling issues already discussed in the course)
Organize Step (worksheet entry)
Store   Profiled Customers (millions)   Annual Sales ($millions)
1       3.7                             5.7
2       3.6                             5.9
3       2.8                             6.7
4       5.6                             9.5
5       3.3                             5.4
6       2.2                             3.5
7       3.3                             6.2
8       3.1                             4.7
9       3.2                             6.1
10      3.5                             4.9
11      5.2                             10.7
12      4.6                             7.6
13      5.8                             11.8
14      3.0                             4.1
Visualize Step
Analyze Step (worksheet results)
Analyze Step
- Interpret the regression coefficients
- Use the regression model for prediction
- Interpret the standard error of the estimate
- Interpret the coefficient of determination
- Explain the regression sum of squares, error sum of squares, and total sum of squares
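The quantities listed above can be computed directly from the 14-store worksheet data. The following Python sketch uses only the standard library and the least-squares formulas; it is meant to show what the software is doing behind the worksheet results, not to replace the output students interpret.

```python
from math import sqrt

# Sunflowers Apparel sample: profiled customers (millions) and annual sales ($millions)
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
ssx = sum((xi - x_bar) ** 2 for xi in x)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ssxy / ssx          # slope: change in mean sales per 1 million more profiled customers
b0 = y_bar - b1 * x_bar  # Y intercept

sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
ssr = sst - sse                                                # regression sum of squares

r2 = ssr / sst              # coefficient of determination
s_yx = sqrt(sse / (n - 2))  # standard error of the estimate

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"r^2 = {r2:.4f}, S_YX = {s_yx:.4f}")
print(f"predicted annual sales for X = 4: {b0 + b1 * 4:.4f}")
```

Run as is, it gives a slope of about 2.0742 and an r² of about 0.848, the figures quoted in the reflection on the Sunflowers example; X = 4 (4 million profiled customers) is only an illustrative value for the prediction.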
Analyze Step (residual analysis)
- Explain the assumptions of regression
- Show residual plots in which each assumption has been violated
- Show residual plots in which each assumption has not been violated
- Show the residual plot for these data
- Note the integration of the Visualize and Analyze steps
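A residual is the observed sales minus the sales predicted by the fitted line. The sketch below (Python, standard library only) computes the residuals for the Sunflowers data and lists them in order of X, the horizontal axis usually used for a residual plot; drawing the plot itself is left to whatever charting tool is used in the Visualize step.

```python
# Sunflowers Apparel data from the worksheet entry
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar

# Residual e_i = Y_i - Yhat_i for each store
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# List residuals in order of X, as they would appear left to right on a residual plot
for xi, ei in sorted(zip(x, residuals)):
    print(f"X = {xi:.1f}  residual = {ei:+.4f}")

# Least-squares residuals always sum to zero (up to rounding error)
print("sum of residuals:", round(sum(residuals), 10))
```

Students then look for patterns in these values (curvature, a fan shape, runs of same-signed residuals), which is where the assumptions of linearity, equal variance, and independence are evaluated.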
Analyze Step (residual plot)
Analyze Step (inferences)
- t test for the slope
- Confidence interval for a mean value
- Prediction interval for an individual value
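The three inferences can be sketched in Python for the Sunflowers data. The critical value `T_CRIT = 2.1788` is an assumption taken from a t table (upper 2.5% point with n - 2 = 12 degrees of freedom), and the X value of 4 million profiled customers is chosen only for illustration.

```python
from math import sqrt

# Sunflowers Apparel data from the worksheet entry
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar
s_yx = sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))

# Assumption: two-tail 95% critical value of t with n - 2 = 12 degrees of freedom (t table)
T_CRIT = 2.1788

# t test for the slope, H0: beta1 = 0
s_b1 = s_yx / sqrt(ssx)
t_stat = b1 / s_b1

# Intervals at an illustrative X value
x_value = 4.0
y_hat = b0 + b1 * x_value
h = 1 / n + (x_value - x_bar) ** 2 / ssx
ci_half = T_CRIT * s_yx * sqrt(h)      # confidence interval for the mean of Y at x_value
pi_half = T_CRIT * s_yx * sqrt(1 + h)  # prediction interval for an individual Y at x_value

print(f"t = {t_stat:.4f} (reject H0 if |t| > {T_CRIT})")
print(f"95% CI for mean annual sales:    {y_hat - ci_half:.4f} to {y_hat + ci_half:.4f}")
print(f"95% PI for an individual store: {y_hat - pi_half:.4f} to {y_hat + pi_half:.4f}")
```

The prediction interval is always wider than the confidence interval at the same X, because it must also account for the scatter of an individual store around the mean.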
Reflection and solution statement
To make more objective decisions, you used the DCOVA approach to identify and classify groups of consumers and to develop a regression model of the relationship between the number of profiled customers who live within a fixed radius of a Sunflowers store and the annual sales of that store. The model indicated that about 84.8% of the variation in annual sales is explained by the number of profiled customers who live within a fixed radius of the store. Furthermore, for each increase of one million profiled customers, mean annual sales are estimated to increase by $2.0742 million. You can now use the model to help make better decisions when selecting sites for new stores, as well as to forecast sales for existing stores.
Additional thoughts about the Introductory Business Statistics Course
Additional thoughts
- Course structure issue
- Course variations
- Typical content: Introduction; Tables & Charts; Descriptive Statistics; Probability; Discrete Probability Distributions; Normal Distribution; Sampling Distributions; Confidence Intervals; Hypothesis Testing; p-values; Regression; Quality Management
- Use of templates
Course structure issue
- One semester vs. two semesters
- Undergraduate vs. graduate (MBA)
Course variations
- A one-semester undergraduate course can cover only a limited number of topics
- A two-semester undergraduate course can cover more tests, including some ANOVA and a good deal of multiple regression
- An introductory MBA course can cover more regression than a one-semester undergraduate course
- Specialized MBA courses can focus on multiple regression and time series
Typical content
- Overview/orientation
- Tables and charts / descriptive statistics
- Probability and probability distributions
- Confidence intervals and hypothesis testing
- Regression
Introduction
- Explain that by using software such as Excel or Minitab, the focus is on analyzing the results, not on doing the computations
- Ask the class to tell you whether certain variables are categorical or numerical
- Collect data from students that requires them to measure something, such as the time it takes them to get ready in the morning
Tables & Charts
- Use the student-generated data for the classroom example
- Focus on the differences between alternative graphs and the circumstances in which each is better
- Mention misuse of graphs
Descriptive Statistics
- Take a small sample of the student-generated data and use it for the classroom example
- Teach the mean, median, and mode without showing equations first
- When you get to variation, build up to the variance and standard deviation slowly by explaining that you need a measure of variation that is 0 when there is no variation, small when there is some variation, and large when there is a great deal of variation
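Python's standard `statistics` module can illustrate the build-up described above. The get-ready times are hypothetical values standing in for student-generated data, and the three small data sets are made up to show a variation measure that is 0, small, and large.

```python
import statistics

# Hypothetical minutes-to-get-ready data for a small sample of students
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

print("mean:", statistics.mean(times))
print("median:", statistics.median(times))
print("mode(s):", statistics.multimode(times))  # this sample happens to have two modes

# Build up to variation: sample variance and standard deviation for three made-up data sets
no_spread = [40, 40, 40, 40, 40]
some_spread = [38, 39, 40, 41, 42]
large_spread = [10, 25, 40, 55, 70]
for label, data in [("none", no_spread), ("some", some_spread), ("large", large_spread)]:
    print(f"{label:>5}: variance = {statistics.variance(data)}, "
          f"standard deviation = {statistics.stdev(data):.2f}")
```

All three small data sets have the same mean (40), which helps make the point that the center alone says nothing about spread.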
Probability
- Don't use Venn diagrams, which students find confusing; use contingency tables instead
- Minimize coverage of probability, especially in a one-semester course. This is a statistics course, not a math course
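Here is a small Python sketch of reading probabilities off a contingency table rather than a Venn diagram. The counts are hypothetical: 1,000 households cross-classified by whether they planned to purchase an item and whether they actually purchased it.

```python
from math import isclose

# Hypothetical contingency table: (plan, outcome) -> number of households
table = {
    ("planned", "purchased"): 200,
    ("planned", "did not purchase"): 50,
    ("not planned", "purchased"): 100,
    ("not planned", "did not purchase"): 650,
}
total = sum(table.values())


def row_total(row):
    """Marginal total for one row of the table."""
    return sum(count for (r, _), count in table.items() if r == row)


def col_total(col):
    """Marginal total for one column of the table."""
    return sum(count for (_, c), count in table.items() if c == col)


p_planned = row_total("planned") / total                           # simple (marginal) probability
p_purchased = col_total("purchased") / total                       # simple (marginal) probability
p_planned_and_purchased = table[("planned", "purchased")] / total  # joint probability
p_purchased_given_planned = table[("planned", "purchased")] / row_total("planned")  # conditional

print(f"P(planned) = {p_planned:.2f}")
print(f"P(purchased) = {p_purchased:.2f}")
print(f"P(planned and purchased) = {p_planned_and_purchased:.2f}")
print(f"P(purchased | planned) = {p_purchased_given_planned:.2f}")
```

Every probability is just a cell or margin divided by the right total, which is exactly what students see when they point to the table.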
Discrete Probability Distributions
- Do you really need to cover the binomial, Poisson, and/or hypergeometric distributions explicitly, especially in a one-semester course?
- Can you teach confidence intervals and hypothesis testing without covering these? Yes!
Normal Distribution
- Don't show the equation for the normal distribution. It will only intimidate some students and make them think that somehow they need to know it
- Work through a classroom example in which you show all the possible variations of finding areas under the curve
- Expect that the most difficult example will be finding the unknown X given an area
- Use a picture of the normal table to show that you are doing the inverse of what you did previously
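Python's `statistics.NormalDist` can check the classroom calculations, including the inverse problem of finding X from an area. The mean of 7 and standard deviation of 2 are hypothetical values chosen for the example.

```python
from statistics import NormalDist

# Hypothetical variable: normally distributed with mean 7 and standard deviation 2
dist = NormalDist(mu=7, sigma=2)

# Areas under the curve
p_less_than_9 = dist.cdf(9)                      # P(X < 9)
p_more_than_9 = 1 - dist.cdf(9)                  # P(X > 9)
p_between_5_and_9 = dist.cdf(9) - dist.cdf(5)    # P(5 < X < 9)

# The inverse problem: find the X value that has 10% of the area below it
x_10 = dist.inv_cdf(0.10)

# Same answer via the standardized normal: X = mu + Z * sigma
z_10 = NormalDist().inv_cdf(0.10)
x_10_via_z = 7 + z_10 * 2

print(f"P(X < 9) = {p_less_than_9:.4f}")
print(f"P(X > 9) = {p_more_than_9:.4f}")
print(f"P(5 < X < 9) = {p_between_5_and_9:.4f}")
print(f"X with 10% below it = {x_10:.4f} (Z = {z_10:.4f})")
```

The last two lines of computation mirror reading the normal table backwards: find Z from the area, then convert Z back to X.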
Sampling Distributions
- Probably the most difficult concept for students to learn
- Try using a small population and then select all the possible samples from that population so that students can see that the distribution of the sample mean differs from the distribution of the population
- Then present the central limit theorem and show what happens as the sample size is increased for different populations
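The small-population exercise can be sketched in Python. The four population values and the sample size of 2 are hypothetical, and samples are drawn with replacement so that all 4² = 16 ordered samples are equally likely.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

# Hypothetical small population: number of errors made by each of four employees
population = [3, 2, 1, 4]
n = 2

# Every possible ordered sample of size n drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)
sigma2 = pvariance(population)  # population variance (divides by N)

print("number of samples:", len(samples))
print("population mean:", mu, " population variance:", sigma2)
print("mean of sample means:", mean(sample_means))
print("variance of sample means:", pvariance(sample_means), " sigma^2 / n:", sigma2 / n)

# Text histogram of the sample means: they pile up near the population mean
for m, count in sorted(Counter(sample_means).items()):
    print(f"{float(m):4.1f} {'*' * count}")
```

The population values are spread evenly from 1 to 4, but the 16 sample means cluster around 2.5; their mean equals the population mean and their variance equals σ²/n.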
Confidence Intervals
- The most important points to get across are that you can never be certain that your confidence interval is correct (that is, that it contains the population parameter) and that a different sample would give a different confidence interval
- Review the difference between categorical and numerical variables and point out that different types of variables use different equations. This sets the stage for using roadmaps in hypothesis testing
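A short simulation can make the point that each sample produces a different interval and that only a proportion of the intervals contain the population mean. This Python sketch assumes a hypothetical normal population with a known standard deviation (so a Z interval is used) and fixes the random seed so the run is reproducible.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed for a reproducible classroom demonstration

# Hypothetical population and sampling plan
MU, SIGMA, N, REPS = 100.0, 15.0, 25, 1000

z = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence
half_width = z * SIGMA / N ** 0.5     # sigma assumed known, so a Z interval is used

covered = 0
for rep in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    lower, upper = x_bar - half_width, x_bar + half_width
    if rep < 5:
        # each sample gives a different interval
        print(f"sample {rep + 1}: {lower:.2f} to {upper:.2f}")
    if lower <= MU <= upper:
        covered += 1

print(f"{covered} of {REPS} intervals contain the population mean")
```

Roughly 95% of the intervals should contain the population mean; for any single interval, though, there is no way to tell whether it is one of them.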
Hypothesis Testing
- Focus on the fact that the alternative hypothesis, H1, never has an equal sign; it is always <, >, or ≠
- Give a practical example, such as whether to market a product or whether to take a drug, to show the difference between Type I and Type II errors
- Beware of trying to cover too many different hypothesis tests; students won't see the forest for the trees
- Use a roadmap that presents a series of questions leading to the correct test procedure
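A roadmap of questions can be written as a decision function. The Python sketch below is a hypothetical, simplified roadmap: the function name, its arguments, and the set of tests covered are assumptions for illustration, not the roadmap from any particular text.

```python
def choose_test(variable_type, groups, sigma_known=False, paired=False):
    """Hypothetical simplified roadmap: answer a few questions, get a test procedure."""
    if variable_type == "categorical":
        if groups == 1:
            return "Z test for the proportion"
        if groups == 2:
            return "Z test for the difference between two proportions"
        return "chi-square test for differences among proportions"
    if variable_type == "numerical":
        if groups == 1:
            return "Z test for the mean" if sigma_known else "t test for the mean"
        if groups == 2:
            return "paired t test" if paired else "t test for the difference between two means"
        return "one-way ANOVA"
    raise ValueError("variable_type must be 'categorical' or 'numerical'")


# Example questions a student might answer
print(choose_test("numerical", 1))               # sigma unknown
print(choose_test("categorical", 2))
print(choose_test("numerical", 2, paired=True))
print(choose_test("numerical", 4))
```

Keeping the questions few (type of variable, number of groups, a couple of follow-ups) is the point: the roadmap narrows many procedures down to one without students memorizing a list.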
p-values
- Students have a more difficult time with this concept than we expect
- Use a hypothesis test that involves the normal distribution (such as a Z test for a mean or a proportion) to demonstrate the p-value
- Use the mantra "If the p-value is low, H0 must go" to help students remember that a low p-value, not a high one, is significant
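The p-value for a two-tail Z test for the mean can be computed with `statistics.NormalDist`. The hypothesized mean, known σ, sample size, and sample mean below are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test: H0: mu = 368 vs. H1: mu != 368, sigma known to be 15,
# sample of n = 25 with sample mean 372.5
mu0, sigma, n, x_bar = 368, 15, 25, 372.5
alpha = 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # area in both tails beyond |Z|

print(f"Z = {z_stat:.2f}, p-value = {p_value:.4f}")
print("p-value is low: reject H0" if p_value < alpha else "p-value is not low: do not reject H0")
```

Here Z = 1.50 and the p-value is about 0.134, so the p-value is not low and H0 stays.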
Regression
- Begin with a business problem of trying to predict the value of a variable of interest. Then ask what other variables might be useful in helping to predict that value
- Do this before going through any computations
- Review the meaning of the Y intercept and the slope
- Don't do the proof of the least-squares method
- Focus on interpreting the results of software, not on doing computations
- Make sure to mention the assumptions and what happens if the assumptions are violated
- Discuss residual analysis if time permits
Quality Management
- Integrate control charts with management philosophy
- Do the Red Bead experiment if time permits; it conveys the notion that most of the variation is due to the system, not the individual
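The Red Bead experiment is commonly charted with a p chart (the proportion of red beads in each paddle draw). This Python sketch assumes hypothetical counts from twelve draws of a 50-bead paddle and uses the usual three-sigma limits for a proportion.

```python
from math import sqrt

# Hypothetical Red Bead results: red beads in each draw of a 50-bead paddle
red_counts = [9, 12, 13, 7, 8, 11, 10, 6, 12, 9, 14, 8]
n = 50  # beads per draw

proportions = [c / n for c in red_counts]
p_bar = sum(red_counts) / (n * len(red_counts))  # center line

# Three-sigma control limits for a proportion
spread = 3 * sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + spread
lcl = max(0.0, p_bar - spread)  # a proportion cannot be negative

for i, p in enumerate(proportions, start=1):
    flag = "" if lcl <= p <= ucl else "  <-- outside limits"
    print(f"draw {i:2d}: p = {p:.2f}{flag}")
print(f"center line = {p_bar:.3f}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

With these made-up counts every point falls inside the limits, which is the pattern the experiment is designed to produce: the draw-to-draw differences come from the system, not from the individual workers.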
Use of templates (stored in a library or generated by an add-in)
In this example, the complexity is hidden from the student, who is focused on interpreting the results to solve a problem, yet it remains fully accessible later.
Even simple linear regression can be a template!
Time does not permit discussion of other topics!
Thanks for your interest and attention!
David Levine, with David Stephan
References
1. Davenport, T. H. and J. G. Harris. Competing on Analytics: The New Science of Winning. Boston, MA: Harvard Business School Press, 2006.
2. Davenport, T. H., J. G. Harris, and R. Morison. Analytics at Work: Smarter Decisions, Better Results. Boston, MA: Harvard Business Press, 2010.
3. Davenport, T. H. and D. J. Patil. "Data Scientist: The Sexiest Job of the 21st Century." Harvard Business Review, October 2012: 70-76.
4. Levine, D. M. and D. F. Stephan. "Teaching Introductory Business Statistics Using the DCOVA Framework." Decision Sciences Journal of Innovative Education, Vol. 9, September 2011: 393-397.
5. Levine, D. M., D. F. Stephan, and K. A. Szabat. Statistics for Managers Using Microsoft Excel, 7th ed. Upper Saddle River, NJ: Pearson Education, 2013.