Probability Distributions



Similar documents
Summary of important mathematical operations and formulas (from first tutorial):

Creating Charts in Microsoft Excel A supplement to Chapter 5 of Quantitative Approaches in Business Studies

Appendix 2.1 Tabular and Graphical Methods Using Excel

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

Excel Guide for Finite Mathematics and Applied Calculus

Using Excel 2003 with Basic Business Statistics

SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS

Drawing a histogram using Excel

Using Excel for descriptive statistics

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Formulas, Functions and Charts

MBA 611 STATISTICS AND QUANTITATIVE METHODS

If there is not a Data Analysis option under the DATA menu, you will need to install the Data Analysis ToolPak as an add-in for Microsoft Excel.

This activity will show you how to draw graphs of algebraic functions in Excel.

Using Microsoft Excel to Plot and Analyze Kinetic Data

Microsoft Excel 2010 Part 3: Advanced Excel

What is the difference between simple and compound interest and does it really matter?

To launch the Microsoft Excel program, locate the Microsoft Excel icon, and double click.

Advanced Microsoft Excel 2010

6.4 Normal Distribution

Data Analysis Tools. Tools for Summarizing Data

Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

Excel Basics By Tom Peters & Laura Spielman

Excel 2003 A Beginners Guide

Intro to Excel spreadsheets

Using SPSS, Chapter 2: Descriptive Statistics

ITS Training Class Charts and PivotTables Using Excel 2007

Table of Contents TASK 1: DATA ANALYSIS TOOLPAK... 2 TASK 2: HISTOGRAMS... 5 TASK 3: ENTER MIDPOINT FORMULAS... 11

Graphing Parabolas With Microsoft Excel

Updates to Graphing with Excel

4. Continuous Random Variables, the Pareto and Normal Distributions

INTRODUCTION TO EXCEL

EXCEL SPREADSHEET MANUAL

Lab 1: The metric system measurement of length and weight

SOLUTIONS: 4.1 Probability Distributions and 4.2 Binomial Distributions

Excel 2007 A Beginners Guide

Preface of Excel Guide

Q&As: Microsoft Excel 2013: Chapter 2

Charting LibQUAL+(TM) Data. Jeff Stark Training & Development Services Texas A&M University Libraries Texas A&M University

Microsoft Excel Tips & Tricks

Computer Skills Microsoft Excel Creating Pie & Column Charts

Create Charts in Excel

Excel Tutorial. Bio 150B Excel Tutorial 1

Tutorial 2: Using Excel in Data Analysis

MICROSOFT EXCEL STEP BY STEP GUIDE

Sample Table. Columns. Column 1 Column 2 Column 3 Row 1 Cell 1 Cell 2 Cell 3 Row 2 Cell 4 Cell 5 Cell 6 Row 3 Cell 7 Cell 8 Cell 9.

How to make a line graph using Excel 2007

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

Introduction to Microsoft Excel 2007/2010

The Center for Teaching, Learning, & Technology

Data exploration with Microsoft Excel: analysing more than one variable

Describing, Exploring, and Comparing Data

Excel -- Creating Charts

Creating a Gradebook in Excel

Excel 2003 Tutorial I

Years after US Student to Teacher Ratio

Excel Pivot Tables. Blue Pecan Computer Training Ltd - Onsite Training Provider :: :: info@bluepecan.co.

Microsoft Excel 2010 Tutorial

Chapter 2: Frequency Distributions and Graphs

6 3 The Standard Normal Distribution

Using Excel for Handling, Graphing, and Analyzing Scientific Data:

Characteristics of Binomial Distributions

seven Statistical Analysis with Excel chapter OVERVIEW CHAPTER

Step 3: Go to Column C. Use the function AVERAGE to calculate the mean values of n = 5. Column C is the column of the means.

2. Creating Bar Graphs with Excel 2007

Formulas & Functions in Microsoft Excel

Data exploration with Microsoft Excel: univariate analysis

ECDL. European Computer Driving Licence. Spreadsheet Software BCS ITQ Level 2. Syllabus Version 5.0

Basic Formulas in Excel. Why use cell names in formulas instead of actual numbers?

Microsoft Excel Basics

CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Chapter 4 Displaying and Describing Categorical Data

To create a histogram, you must organize the data in two columns on the worksheet. These columns must contain the following data:

Getting Started with Excel Table of Contents

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)

TIPS FOR DOING STATISTICS IN EXCEL

Produced by Flinders University Centre for Educational ICT. PivotTables Excel 2010

MARS STUDENT IMAGING PROJECT

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Getting started in Excel

MICROSOFT EXCEL Formulas, Functions, & Macros. Documented by Vincent J. Yanusauskas Computer Training Coordinator

Scientific Graphing in Excel 2010

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe

Introduction to MS EXCEL 2007 Data entry & formatting Using formulas & functions Presenting data with charts Database features in Excel

Step Sheet: Creating a Data Table and Charts

Excel Using Pivot Tables

Computer Training Centre University College Cork. Excel 2013 Pivot Tables

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Probability Distributions

Using Microsoft Excel to Manage and Analyze Data: Some Tips

PERFORMING REGRESSION ANALYSIS USING MICROSOFT EXCEL

Key Concept. Density Curve

Introduction to Data Tables. Data Table Exercises

Formulas & Functions in Microsoft Excel

Presentations and PowerPoint

Creating Basic Excel Formulas

Basic Pivot Tables. To begin your pivot table, choose Data, Pivot Table and Pivot Chart Report. 1 of 18

Transcription:

CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution 5.1 PROBABILITY DISTRIBUTION OF A DISCRETE RANDOM VARIABLE As we explained in Chapter 4 (Section 4.1), Excel can be used to convert a frequency distribution table into a probability distribution table. Then, a graphical presentation of this distribution can easily be generated, as illustrated below. Example 5-1 The table below gives the frequency distribution of the number of vehicles owned by all 2000 families living in a small town. Number of Vehicles Owned Frequency 0 30 1 470 2 850 3 490 4 160 Enter this into Excel as a probability distribution table (having Excel calculate relative frequencies in place of frequencies) and generate a graphical presentation of this distribution. 87

88 Chapter 5 Probability Distributions Solution: Type the column headings Number of Vehicles Owned, x and Probability, P(x) into cells A1 and B1. (Widen the columns as necessary.) Enter the values 0-4 into cells A2-A6. Click in cell B2, and type =30/SUM(30,470,850,490,160) and then copy this to cells B3-B6. Double-click on cell B3 so that you can edit it. Replace the 30 after the = sign with 470, so that it contains the formula =470/SUM(30,470,850,490,160). Hit ENTER. Edit cells B4-B6 similarly, so that they also contain the rest of the relative frequencies. Highlight the table and click on the chart wizard icon:. In Step 1, select the Column chart type and click on Next. In Step 2, you need to click on the Series tab and remove the Number of Vehicles Owned series from the list, leaving only the Probability series. Then click in the Category (X) axis labels text box and highlight cells A2-A6. Click on Next. In Step 3, edit the chart title to read Probability Distribution of the Number of Vehicles Owned by Families, enter Number of Vehicles for the x-axis label, and Probability for the y-axis label. Click on the Legend tab and remove the legend, and then click on Finish. Double-click on the bars, click on the Options tab, and lower the Gap width to 50 (or less). Resize the chart (by clicking on it and then dragging a corner) or move it (by clicking and dragging the whole thing) as necessary. Figure 5.1 Using Excel to generate a probability distribution table and graph. Cumulative Frequency Distributions In order to change a frequency distribution to a cumulative frequency distribution, you need to change all of the lower class limits to the same first lower class limit, and then calculate the cumulative frequencies.

Excel Manual 89 Figure 5.2 The Format Data Series dialog box enables you to format various aspects of the bar chart, including the width of the bars comprising the graph. Example 5-2 Construct a cumulative frequency distribution for the 50 states average travel times given in Example 2-4. Solution: First, construct the frequency table as above, but instead of editing the classes to show varying lower class limits with the upper class limits, use 15.0 as the lower class limit for all of the classes. (These will be cumulative classes.) In the column next to the frequencies, type in the heading "Cumulative Frequency." For the first one, just copy the first class' frequency. For the second one, type "=" and then arrow left, and type "+" and then arrow up to the previous cumulative frequency. Then copy this cell down to the cells below it (click on it and drag the square in the bottom right-hand corner). Finally, hide the column with the frequencies in it. (You can't delete it since doing so would also delete the actual references.) Right-click on the column letter and select Hide.

90 Chapter 5 Probability Distributions Figure 5.3 A cumulative frequency distribution. The cumulative frequencies are calculated by using cell references to frequencies in a hidden column. In order to generate an Ogive, which is really a cumulative frequency polygon, you need to enter upper class boundaries into a column. The upper class boundary of a class is always halfway between the upper class limit of that class and the next class' lower class limit. So, you can have Excel average these values for you, if you wish. Then copy the cumulative frequencies into the column next to them. (Go to Edit>Paste Special and select Values in order to avoid changing cell references.) Finally, insert a row above the first upper class boundary, enter the number that is exactly one class width less than the first upper class boundary (in the cell just above it), and enter a 0 in the cell just above the first cumulative frequency. Figure 5.4 Calculate upper class boundaries and position cumulative frequencies next to them in order to generate an Ogive.

Excel Manual 91 Highlight the columns and go to the Chart Wizard. Select the Line chart option. In step 2, you will probably need to fix the data references. Click on the Series tab. Under Series, you need only to have the Cumulative Frequency series; remove the other one. Click in the text box next to Category (X) axis labels and highlight the column of upper class boundaries. Then, you should see a good preview of the ogive. Remove the legend and edit titles in step 3, and click on Finish. Example 5-3 Construct a cumulative frequency distribution for the data in Example 2-5. Solution: Following the same procedure as above, only this time the cumulative classes are all scores up to and including a given score. Precisely, we the following is what the input should be for Example 2-5 (in comparison to Figure 5.4): Figure 5.5 The input used to construct a cumulative frequency distribution for Example 2-5. Again, in order to generate the cumulative frequencies in this case, begin by entering 2 into cell D2, then =B3+D2 into cell D3, and then =B4+D3 into cell D4. The remaining cumulative frequencies can now be easily generated by highlighting cells D3 D12, followed by Edit>Fill Down. The result is pictured above. Now, to form the ogive, use the first column as the upper class boundaries. Now, follow the same instructions as in Example 5-2 to obtain an ogive that resembles the one below:

92 Chapter 5 Probability Distributions Cumulative Frequency Distribution of Quiz Scores 40 35 30 25 20 15 10 5 0 1 2 3 4 5 6 7 8 9 10 11 Scores Figure 5.6 Ogive for the data in Example 2-5. Here, the labels for the scores have been shifted up by one. If this is not desirable, you can always change the scale/labels along the x-axis. 5.2 MEAN AND STANDARD DEVIATION OF A PROBABILITY DISTRIBUTION We will use Excel to calculate the mean and standard deviation of a discrete random variable similarly to how we had it calculate the mean and standard deviation of grouped data in Sections 2.5 and 2.6. The method we will use involves generating columns of values and using the following mathematical formulas found in your textbook: 2 2 2 µ = xp( x) σ = x P( x) µ (Unfortunately, Excel does not have one-step built-in functions for calculating these measures.) Example 5-4 The following table lists the probability distribution of the number of breakdowns per week for a machine based on past data. Breakdowns per week x Probability P(x) 0.15 1.20 2.35 3.30

Excel Manual 93 Use Excel to find the mean number of breakdowns per week for this machine. Also, use it to calculate the standard deviation of the number of breakdowns. Solution: Enter the probability distribution into a blank Excel worksheet. Include column headings of x and P(x), using cells A1-B5. In cell C1, type xp(x) for the column heading for the products that will be used for calculating the mean. In cell C2, type = and click on cell A2, then type * and click on cell B2, and hit ENTER. Then copy this down to cells C3-C5. In cell C7, insert the SUM function (or type =sum( without the quotes), highlight cells C2-C5 and hit ENTER. You should see the mean value of 1.8 appear. Figure 5.7 Using Excel to find the mean of a probability distribution. Now, type a column label of xxp(x) or x 2 P(x) in cell D1 for the products that will be used for calculating the standard deviation. Click on cell D2, type = and click on cell A2, then type ^2* and click on cell B2. Then hit ENTER. Copy this down to cells D3-D5. Then copy cell C6 over to cell D6 to show the sum of that column, 4.3. Now, click on an empty cell nearby, say cell E6. Here we will finish calculating the standard deviation. Type =SQRT( and click on cell D6, then type - and click on cell C6, then type ^2) and hit ENTER. The standard deviation of 1.029563 appear. Figure 5.8 Using Excel to find the standard deviation of a probability distribution.

94 Chapter 5 Probability Distributions 5.3 THE BINOMIAL DISTRIBUTION Recall that the binomial random variable X is used to generate a probabilistic model of a situation in which an experiment involving only two outcomes (success and failure) is performed n independent, identical times. The binomial probability distribution and cumulative distribution, along with its parameters are as follows: Let p = probability of a success and q= 1 p. Then, 1) n a n a P(a successes) = PX ( = a) = pq, for any a = 0, 1, 2,, n. a 2) Cumulative probability distribution is PX ( a) = n x n x pq x a x 3) Mean µ = np Standard Deviation σ = npq Excel has the binomial probability distribution built-in as the function BINOMDIST. This function requires four inputs: 1) the number of successes that you are interested in, x = Number_s 2) the total number of trials, n = Trials 3) the probability of success, p, as a decimal = Probability_s 4) 1 (TRUE) if you want the cumulative probability of at most ( ) x successes, or 0 (FALSE) if you want the probability of exactly (=) x successes = Cumulative. We remark that Excel s built-in probability distributions, such as the Binomial, will allow you to be slightly more accurate than using values from tables that have been rounded off. (Your answers may differ slightly from those in your textbook as a result.) Example 5-5 At the Express House Delivery Service, providing high-quality service to customers is the top priority of the management. The company guarantees a refund of all charges if a package it is delivering does not arrive at its destination by the specified time. It is known from past data that despite all efforts, 2% of the packages mailed through this company do not arrive at their destinations within the specified time. Suppose a corporation mails 10 packages through Express House Delivery Service on a certain day. Use Excel to find a) the probability that exactly 1 of these 10 packages will not arrive at its destination within the specified time, and b) the probability that at most 1 of these 10 packages will not arrive at its destination within the specified time.

Excel Manual 95 Solution: Click on a blank cell in an Excel worksheet. For part a), the first input is 1, for 1 of the 10 packages, the second input is 10, the total number of packages, the third input is.02, the probability that a package does not arrive by the specified time, and the fourth is 0 (for FALSE), since we want the probability for exactly 1. Click on OK and you should see the probability of 0.16675 appear. See Figure 5.9 below. Figure 5.9 Go to Insert>Function in order to insert Excel s BINOMDIST function, and fill in the four inputs in order to calculate a binomial probability. For part b), the inputs are all the same except the last one, which needs to be a 1 (for TRUE), since we want the probability of at most (less than or equal to) 1. If you want to, you can just double-click on the cell and edit this input instead of reinserting the function and starting over. When you hit ENTER, you should see the probability of 0.983822 appear. Figure 5.10 Excel s BINOMDIST function can be typed in and/or edited instead of inserted. Excel s BINOMDIST function and chart wizard are also wonderful tools for viewing an entire binomial distribution. Example 5-6 According to an Allstate Survey, 56% of Baby Boomers have car loans and are making payments on these loans (USA TODAY, October 28, 2002). Assume that this result holds true for the current population of all Baby Boomers. Let x denote the

96 Chapter 5 Probability Distributions number in a random sample of three Baby Boomers who are making payments on their car loans. Have Excel generate the probability distribution of x and graph it. Solution: Enter headings of x and P(x) into cells A1 and B1. Enter the possible values of x: 0, 1, 2, and 3, into cells A2-A5. Click on cell B2 and insert the BINOMDIST function. For the first input, Number_s, move the window out of the way, if necessary, and click on cell A2. For the second input, Trials, type a 3. For the third input, Probability_s, enter.56, and for the fourth input, Cumulative, type 0 (for FALSE). Hit ENTER and you should see the value of 0.085184 appear. Now copy this cell down to cells B3-B5, and you ll have the probabilities for the other success values as well! For the graph, click on a blank cell and then click on the chart wizard icon:. In Step 1, select the Column chart type and click on Next. In Step 2, highlight cells B2- B5 (the probabilities only) for the Data range. Then click on the Series tab, click in the Category (X) axis labels text box, and highlight cells A2-A5 (the values of x). Click on Next. In Step 3, edit the titles and click on the Legend tab and remove the legend, and then click on Finish. Double-click on the bars, click on the Options tab, and lower the Gap width to 50 (or less). Finally, resize the chart (by clicking on it and then dragging a corner) or move it (by clicking and dragging the whole thing) as necessary. Figure 5.11 Using Excel s BINOMDIST function (with the FALSE option) and chart wizard to represent a binomial distribution. Variations of the BINOMDIST function are sometimes required to answer specific probability questions. Remember that at most is the same as equal or less and at least is the same as equal or more. Example 5-7 According to a 2001 study of college students by Harvard University s School of Public Health, 19.3% of those included in the study abstained from drinking

Excel Manual 97 (USA TODAY, April 3, 2002). Suppose that of all current college students in the United States, 20% abstain from drinking. A random sample of six college students is selected. Use Excel to answer the following. a) Find the probability that exactly three college students in this sample abstain from drinking. b) Find the probability that at most two college students in this sample abstain from drinking. c) Find the probability that at least three college students in this sample abstain from drinking. d) Find the probability that one to three college students in this sample abstain from drinking. Solution: Using the BINOMDIST function for the various inequalities involved, with 0 (FALSE) for = and 1 (TRUE) for, the inputs and answers are as follows: a) P(x = 3) is calculated by: =BINOMDIST(3, 6,.20, 0) and the result is: 0.08192. b) P(x 2) is calculated by: =BINOMDIST(2, 6,.20, 1) and the result is: 0.90112. c) P(x 3) = 1 P(x < 3) = 1 P(x 2), so use: =1-BINOMDIST(2, 6,.20, 1) and the result is: 0.09888. d) P(1 x 3) = P(x 3) P(x <1) = P(x 3) P(x 0), so use: = BINOMDIST(3, 6,.20, 1)-BINOMDIST(0, 6,.20, 1), which gives 0.720896. P-charts for Trend: Much like a time-series plot (discussed in Section 2.7) was formed to investigate the change over time in certain numerical measurements for purposes of statistical control, the same type of chart can be formed in the same way to graph a series of sample proportions. Indeed, assuming that p 0 is the long-term expected proportion, the upper and lower bounds for the control chart are given by: 0(1 0) 0 3 p p ± p n The fact that 3 standard deviations are used in contrast to 2 for the sample mean in Section 2.6 of your textbook is simply a matter of preference; any number can be used here depending on the context of the study. Note: Actually, these bounds correspond to the endpoints of the confidence interval about the sample proportion, used for inferential purposes. This will be discussed in a subsequent chapter in more detail. Example 5-8 Suppose that the number of applications requiring full review out of the 20 in the sample were recorded for 19 weeks, resulting in the following measurements:

98 Chapter 5 Probability Distributions 11 7 8 4 9 10 4 8 8 7 10 6 9 10 7 7 6 9 10 Compute the sample proportions corresponding to each measure and plot them in a p- chart. Solution: Enter into column A the label Sample number (in cell A1) and then 1 19 in cells A2 A20, respectively. Then, enter into column B the label Number (in cell B1) and then the above data in cells B2 B20. Then, in column C, type the label Proportion (in cell C1) and then, in cell C2, type =B2/20. Then, highlight cells C2 C20 and go to Edit>Fill down to fill in the remaining proportions. Now, highlight columns A and C, go to the Chart Wizard, and create a scatter diagram just as described earlier. The output should look as follows: Proportion of Applications Needing Full Review: 19 Week Study Proportion 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 Sample Number Figure 5.12 p-chart for Example 5-8. Now, to calculate the upper and lower bounds (corresponding to 3 standard deviations from the value of p 0, which in this case is taken to be SUM(B2:B20)/380 = 0.4), we simply allow Excel to perform the arithmetic for us. Indeed, we have: Lower Bound: 0.4 3*SQRT(.4 * (1 -.4) / 20 ) = 0.07 Upper Bound: 0.4 + 3*SQRT(.4 * (1 -.4) / 20 ) = 0.73 Notice that (1) none of the sample proportions escape these two bounds, and (2) the sample proportions do indeed seem to fluctuate just around 0.4. These two observations suggest the situation is indeed stable.

Excel Manual 99 Exercises 5.1 One of the most profitable items at A1 s Auto Security Shop is the remote starting system. Let x be the number of such systems installed on a given day at this shop. The following table lists the frequency distribution of x for the past 80 days. x 1 2 3 4 5 f 8 20 24 16 12 Use Excel to convert this frequency distribution table to a probability distribution table for the number of remote starting systems installed on a given day. Then have Excel generate a graph of the probability distribution. 5.2 Do the following for the data sets in Exercises 2.2 and 2.3: a) Prepare a cumulative frequency distribution. b) Have Excel generate an ogive for the cumulative frequency distribution. 5.3 The following table lists the probability distribution of the number of exercise machines sold per day at Elmo s Sporting Goods store. Machines sold per day 4 5 6 7 8 9 10 Probability.08.11.14.19.20.16.12 Use Excel to calculate the mean and standard deviation for this probability distribution. 5.4 In a poll of 12- to 18-year-old females conducted by Harris Interactive for the Gillette Company, 40% of the young females said that they expected the United States to have a female president within 10 years (USA TODAY, October 1, 2002). Assume that this result is true for the current population of all 12- to 18-year-old females. Suppose a random sample of 16 females from this age group is selected. Using Excel s BINOMDIST function, find the probability that the number of young females in this sample who expect a female president within 10 years is a) at least 9 b) at most 5 c) 6 to 9. 5.5 An instructor has been tracking the performance of high school seniors on his final exam in psychology for 25 years. Specifically, he is interested in the proportion of students receiving a C or better on this exam. These proportions are tabulated below. Form a p-chart to discuss the consistency in these percentages over the past 25 years: 80 81 88 65 77 72 74 70 59 74 71 73 79 50 63 68 71 65 64 49 66 52 60 59 62