TEACHING STATISTICS THROUGH DATA ANALYSIS



Similar documents
Chapter 1 Introduction. 1.1 Introduction

Getting Correct Results from PROC REG

RUSRR048 COURSE CATALOG DETAIL REPORT Page 1 of 6 11/11/ :33:48. QMS 102 Course ID

Why is Internal Audit so Hard?

USING EXCEL IN AN INTRODUCTORY STATISTICS COURSE: A COMPARISON OF INSTRUCTOR AND STUDENT PERSPECTIVES

PGR Computing Programming Skills

Master of Science in Healthcare Informatics and Analytics Program Overview

Master of Science (MS) in Biostatistics Program Director and Academic Advisor:

Teaching Multivariate Analysis to Business-Major Students

Introduction. Chapter 1

Pre-Calculus Semester 1 Course Syllabus

3. Current Auditing Computerized Tools

BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites

COMPUTING AND VISUALIZING LOG-LINEAR ANALYSIS INTERACTIVELY

3 Guidance for Successful Evaluations

Teaching Business Statistics through Problem Solving

Credibility: Theory Meets Regulatory Practice Matt Hassett, Arizona State University Brian Januzik, Advance PCS

How to Write a Successful PhD Dissertation Proposal

THINK DEVISE CALCULATE ENVISION INVENT INSPIRE THINK DEVISE CALCULATE ENVISION INV

Introduction to time series analysis

7 Conclusions and suggestions for further research

AP Physics 1 and 2 Lab Investigations

Dr Christine Brown University of Melbourne

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Implied Volatility for FixedResets

Archetypal Analysis in Marketing Research: A New Way of Understanding Consumer Heterogeneity

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

LOGNORMAL MODEL FOR STOCK PRICES

4. Simple regression. QBUS6840 Predictive Analytics.

Syllabus MAC2312 Calculus and Analytic Geometry II

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods

DISCRIMINANT FUNCTION ANALYSIS (DA)

How To Create A Visual Analytics Tool

INTEGRATING FINANCE AND ACCOUNTING CONCEPTS: AN EXAMPLE USING TROUBLED DEBT RESTRUCTURING WITH MODIFICATION OF TERMS ABSTRACT

Toothpick Squares: An Introduction to Formulas

Integrated Resource Plan

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

MONEY MANAGEMENT. Guy Bower delves into a topic every trader should endeavour to master - money management.

Designing and Evaluating a Web-Based Collaboration Application: A Case Study

MBA Jump Start Program

MATH 2 Course Syllabus Spring Semester 2007 Instructor: Brian Rodas

SAN DIEGO COMMUNITY COLLEGE DISTRICT CITY COLLEGE ASSOCIATE DEGREE COURSE OUTLINE

Asymmetry and the Cost of Capital

BINOMIAL OPTIONS PRICING MODEL. Mark Ioffe. Abstract

Reciprocal Cost Allocations for Many Support Departments Using Spreadsheet Matrix Functions

Teaching Biostatistics to Postgraduate Students in Public Health

AMS 5 CHANCE VARIABILITY

THE ROLE OF THE COMPUTER IN THE TEACHING OF STATISTICS. Tarmo Pukkila and Simo Puntanen University of Tampere, Finland

American Journal Of Business Education September/October 2012 Volume 5, Number 5

Lecture Notes Human Resource Management MBA- IInd Semester. Lecture 7 Training

Al-Jo anee Company: support department cost allocations with matrices to improve decision making

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

Standard Deviation Estimator

Mathematics Placement Test (MPT) Alignment with Washington State College Readiness Mathematics Standards (CRMS)

FTS Real Time System Project: Portfolio Diversification Note: this project requires use of Excel s Solver

An Activity-Based Costing Assessment Task: Using an Excel Spreadsheet

White Paper 6 Steps to Enhance Performance of Critical Systems

MAC 2233, STA 2023, and junior standing

COURSE REQUIREMENTS AND EXPECTATIONS FOR ALL STUDENTS ENROLLED IN COLLEGE ALGEBRA ROWAN UNIVERSITY CAMDEN CAMPUS SPRING 2011

COURSE SYLLABUS Pre-Calculus A/B Last Modified: April 2015

Brillig Systems Making Projects Successful

Mathematics within the Psychology Curriculum

The Bass Model: Marketing Engineering Technical Note 1

3 Hands-on Techniques for Managing Operational Change in HRIS Projects

DRAFT. Further mathematics. GCE AS and A level subject content

MICRO-COMPUTER BASED REAL ESTATE DECISION MAKING AND INFORMATION MANAGEMENT - AN INTEGRATED APPROACH

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Probability Models of Credit Risk

Managing Customer Retention

A Primer for Investment Trustees (a summary)

Fairfield Public Schools

Spreadsheets have become the principal software application for teaching decision models in most business

SEEM3490 Information Systems Management Lecture 01 Introduction to ISM

Demonstrating Understanding Rubrics and Scoring Guides

Investment Decision Analysis

LOGISTIC REGRESSION ANALYSIS

Math 3E - Linear Algebra (3 units)

January 26, 2009 The Faculty Center for Teaching and Learning

ELECTRONIC CIRCUIT SIMULA TION IN ENGINEERING EDUCATION: THEORETICAL CONSIDERATIONS

Handling attrition and non-response in longitudinal data

AS-D2 THE ROLE OF SIMULATION IN CALL CENTER MANAGEMENT. Dr. Roger Klungle Manager, Business Operations Analysis

Where s the Beef? : Statistical Demand Estimation Using Supermarket Scanner Data

POL 204b: Research and Methodology

QMB 3302 Business Analytics CRN Spring 2015 T R -- 11:00am - 12:15pm -- Lutgert Hall 2209

How To Calculate Bond Price And Yield To Maturity

How To Manage A Call Center

CURRICULUM VITA. Michael J. Tammaro. Department of Physics University of Rhode Island Kingston, RI (401)

Categorical Data Analysis

The Trip Scheduling Problem

MATH. ALGEBRA I HONORS 9 th Grade ALGEBRA I HONORS

Lesson 1. Key Financial Concepts INTRODUCTION

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

SAS R IML (Introduction at the Master s Level)

Marketing Research Core Body Knowledge (MRCBOK ) Learning Objectives

Bond Price Arithmetic

Transcription:

TEACHING STATISTICS THROUGH DATA ANALYSIS Thomas Piazza Survey Research Center and Department of Sociology University of California Berkeley, California The question of how best to teach statistics is sometimes phrased in terms of learning by theory versus learning by doing. Unfortunately each approach, by itself, has serious drawbacks. We all know that statistical theory and proofs tend to be forgotten soon unless put to some use. On the other hand, to plunge a student into data analysis does not guarantee that he or she will ~inderstand the rationale for, and the limits of, the procedures that can now be so easily carried out by using statistical software packages. The real problem, as 1 see it, is how to guide the use of data analysis as a learning tool so that it leads students to an understanding of the statistical principles on which the various analytic techniques are based. In what follows I will summarize the pros and cons of three approaches to the use of data analysis in teaching statistics. Then I will try to draw a few conclusions about the matter. Note that the students we will be concerned with here are those who are at a beginning level and who are interested primarily in application. At a more advanced level, especially for those wishing to become professional statisticians, the balance can shift more in the direction of theoretical exposition. 1. Large Batch-Oriented Statistical Packages The most widely diffused data analysis tools are the large statistical packages such as SAS, BMDP, SPSS-X, and other sets of programs that offer a wide variety of data management and analysis procedures. These packages can be used to carry out analyses in conjunction with a statistics course A data file can be set up for students to use, and they can carry out assignments or work on projects using an available statistical package. A partial step in this direction is the textbook by Mosteller, Fienberg, and Rourke (1988), which supplements the discussion of multiple regression by displaying and interpreting computer output; a few problems are even provided specifically for students with access to a computer with a multiple regression program. The use of large packages for instruction has both advantages and disadvantages. The principal advantage of using statistical packages is their comprehensive nature. Although the multiplicity of available procedures can cause confusion at first, students profit by being exposed to the variety of possible methods that can be applied to the analysis of data. They also learn a great deal in the process of figuring out the meaning of the computer output. Furthermore, since students will most likely be using statistical packages in their work later on, the use of these programs can make the courses seem more realistic and practical.

It is also worth noting that these packages are readily available in most universities. If adequate computer time can be obtained for instruction, a set of exercises can be developed and quickly incorporated into a course. A disadvantage of using statistical packages for instruction is the false sense of precision that students can acquire. The computer programs rarely complain when basic assumptions are being violated or when the form of the data is inappropriate for the procedure used. The computer printout can look good but still be nonsense. Another problem is that students can get bogged down with the mechanics of producing the runs. The correct specification of control language and options can be so distracting that students implicitly come to think that the successful production of computer printout is the same as the solution to a statistical problem. It is important for students to remain aware that the purpose of the course is to teach statistical principles and sound analytic technique - not just to develop facility with a particular statistical package. A related problem is that turn-around time can still be slow at many institutions. Ideally the computer should give results immediately for the problems we submit. A delay of a few hours may be tolerable; however, if the results are not available for days, it is easy for students to lose track of what the whole point of the analysis was. Once again, it is important that students not get bogged down with mechanical details. On balance, consequently, the use of large statistical packages for teaching statistics can be helpful, but they are certainly not the ideal solution. It is worth considering other approaches as well. 2. l nteractive Analysis Programs Computer programs for minicomputers and microcomputers have begun to appear which are designed for interactive data analysis. These programs are somewhat more promising for teaching statistics than are the large batch-oriented packages. Examples are the S package from Bell Laboratories, and ISP from Berkeley; the microcomputer versions of SPSS and SAS can also be considered semi-interactive. The distinguishing characteristic of interactive data analysis programs is that the results of one stage of analysis can immediately be used to specify the next stage. At the simplest level this means that the results can be examined on the screen of a computer terminal and used as the basis for an additional analysis specification. At a more sophisticated level this means that intermediate results are available for examination and can guide further analytic steps without requiring the user to issue a whole new set of commands. The main advantage of interactive programs is this immediate feedback. This approach encourages exploration of the data, and it facilitates the development of good problem-solving strategies. Some of the disadvantages

of batch-oriented programs can thus be overcome.. The text by McNeil (1 977) illustrates some of these possibilities. On the other hand, interactive statistical programs still have some. of the same problems as batch programs for purposes of teaching statistics. There are still the problems of having to spend a lot of time learning to use the programs and of getting distracted by the mechanics of producing results. Furthermore, interactive programs produce results in the same uncritical manner as any other statistical computer program, and those results need not make much sense. All in all, however, an interactive statistical package is likely to be more appropriate for teaching statistics than a batch-oriented package. If such a set of programs is available for student use, it is well worth investigating. 3. Simulations Based on Spreadsheets There is one other computer application that I wish to mention in this context, and that is the use of spreadsheet programs that are widely available for microcomputers. These programs set up a large matrix in which some cells can be defined as a function of other cells. When the latter cells are modified, then the cells defined as functions also change. These functions can be as simple as the sum of two cells or as complex as a chi-square statistic based on a contingency table. The instructional value of these spreadsheet programs is due to their ability to make clear and transparent how a set of observations generates a particular statistic. A student can see immediately how a change in observed values leads to a change in the statistic (Kelman et al., 1983). If set up appropriately, a spreadsheet application can allow the student to simulate easily the effects of different distributions or patterns and to examine the behavior of a statistic under many conditions. Let me give a relatively simple example of a spreadsheet that I have used in my own courses. In a course on log-linear analytic methods I wanted the students to develop a good intuitive sense of the behavior of the oddsratio in a 2x2 contingency table. So I set up a spreadsheet with a contingency table and the computed odds-ratio (plus Yule's Q and the logarithm of the odds-ratio). The students could change the cell counts of the contingency table and see immediately how those changes affected the oddsratio (and the other associated statistics). In a very short period of time a student could explore quite thoroughly the behavior of this statistic under various conditions. A more interesting and complex example from the same course was a spreadsheet set up to generate the model of independence and the associated chi-square statistic in a bivariate contingency table. A starting table was also included so that some of the cells could be defined as structural zeros. The student could manipulate both the cell frequencies and the starting table in order to see how well the models of independence and

quasi-independence fit the cell frequencies. 1 found this spreadsheet application to be particularly helpful to the students. These spreadsheet applications can focus the students' attention on certain important principles or on the behavior of a specific statistic. Once set up, these spreadsheets are very easy for students to use, and they can provide real insights into the subject matter of the course. The limitations of these spreadsheets, of course, are very clear. Not every calculation can be set up coriveniently in such a format, and the functions available (logs, square roots, trigonometric functions, and so forth) vary from one program to the other. Another problem is that the data set for calculations must be relatively small, if students are to modify it easily and intelligibly. All in all, however, I believe that this approach to using computers holds great promise for teaching statistical principles. At the present time it is a little cumbersome for the professor to have to set up spreadsheets for a course, but I suspect that in due time some appropriately designed software will facilitate the task or provide some equivalent methodology. 4. Conclusion \ The basic question we need to address is how people learn - more specifically, how they learn statistical concepts and techniques. Most of us have probably come to the conclusion that it is not enough to teach statistical theorems and derivations, with a few exercises thrown in. We need more emphasis on the analysis of data, on putting statistics to work. Nevertheless, it is not always easy to develop an approach that integrates theory, data analysis, and the requisite computer skills. As discussed above, statistical packages and programs can be helpful, but they are not a substitute for understanding statistical theory. As we also noted, students can spend so much time getting the computer to do something that they lose sight of the statistical principles that are supposedly being exerci sed. One use of computers that I consider especially promising is the use of spreadsheet programs, which are available for practically any microcomputer. Once set up appropriately, these programs allow students to change a set of data and to see immediately the impact of those changes on a statistic. This method can help students to understand the meaning of a certain statistic or statistical principle and to acquire an intuitive grasp of how robust a statistic is under different conditions. This approach, although relatively undeveloped, is a good example of how computers and data analysis can further a theoretical understanding of statistics. References Kelman, P. et al. (1983). Computers in teachinq mathematics. Reading, Mass. : Addison-Wesley.

McNeil, D. R. (1977). York: John Wiley. Interactive data analysis: A practical primer. New Mosteller, F., Fienberg, S.E., & Rourke, R.E. K. (1983). Beqinninq statistics with data analysis. Reading, Mass. : Addison-Wesley.