
Learning Objectives

Definition: Statistics is the science of collecting and analyzing data and of making inferences about a population using the information contained in a sample.

Population: A finite or infinite collection of measurements or individuals that comprises the totality of all possible measurements within the context of a particular statistical study.

Sample: A subset of measurements selected from the population of interest.

An example of population and sample: A nationwide survey was conducted to determine which issues were of greatest concern among Americans. Each respondent in the survey was randomly selected according to a sampling plan reflecting the proportion of individuals in categories defined by several demographic variables, such as age, sex, income and geographic region. Participants were asked to specify the national problem that caused them the most concern. Typical responses included poverty, drug abuse, unemployment and the federal budget deficit.
(a) What is the response that will be measured in this survey?
(b) Define the population of interest to the experimenter.
(c) Describe the sampling procedure used by the experimenter.
(d) What demographic groupings might the experimenter consider as subpopulations within the main population to be studied concerning their response to the survey?

3.1 Describing Variation

Some variation in any process is unavoidable, because no two units of product produced by the same manufacturing process are identical. Statistics is the science of analyzing data and drawing inferences while taking the variation in the data into account.

3.1.1 The Stem-and-Leaf Plot (Stem Plot)

Suppose we have a set of data denoted by $x_1, x_2, \ldots, x_n$, where each number $x_i$ consists of at least two digits. To construct a stem plot, we divide each number $x_i$ into two parts: a stem, consisting of one or more of the leading digits, and a leaf, consisting of the remaining digits.

Example 3.1, page 64: The cycle times (in days) to process and pay employee health insurance claims for a sample of claims in a large company are given in Table 3.1. The data and stem plot are presented below.
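Building a stem plot by hand is mechanical, so a short program can do it. The following is a minimal Python sketch, assuming non-negative integer data and a one-digit leaf. The function `stem_and_leaf` is a hypothetical helper, and because the Table 3.1 values are not available here, the cycle times below are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(data, leaf_digits=1):
    """Return a list of (stem, leaves) rows for non-negative integer data.

    The last `leaf_digits` digits of each value form the leaf; the
    remaining leading digits form the stem.
    """
    divisor = 10 ** leaf_digits
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, divisor)
        rows[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible in the plot.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]


# Hypothetical cycle times in days (not the Table 3.1 data).
cycle_times = [24, 31, 27, 45, 18, 33, 29, 36, 22, 41, 30, 26]
for stem, leaves in stem_and_leaf(cycle_times):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Running it prints one row per stem with the sorted leaves to the right of the bar. Empty stems are kept deliberately, since gaps are part of what a stem plot is meant to show.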

Figure 3.2 is also called a run chart.

3.1.2 The Histogram

Bar charts that depict data on a single measured characteristic are called histograms. The bars are formed by dividing the horizontal scale into a collection of classes and then counting the class frequencies, that is, the number of measurements that fall into each class. A histogram is a visual display of the data that is very useful for describing the shape of the data distribution. The shape of a histogram may be symmetric or skewed (left-skewed or right-skewed).

Example 3.2, page 67: The thicknesses of a metal layer on 100 silicon wafers resulting from a chemical vapor deposition (CVD) process in a semiconductor plant are presented in Table 3.2. Construct a histogram for these data.

Construction of a histogram:
1. Group the values of the variable into bins (also called classes or groups), then count the number of observations that fall into each bin.
2. Plot the frequency (or relative frequency) against the values of the variable.

Shape of the layer thickness data: reasonably symmetric, or bell-shaped.
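The two construction steps above amount to counting observations per bin. The sketch below assumes equal-width bins supplied as a list of edges, with each bin closed on the left and open on the right (the last bin is also closed on the right). The helper `histogram_counts`, the bin width of 10 and the thickness values are all hypothetical; they are not the Table 3.2 data.

```python
def histogram_counts(data, edges):
    """Frequencies for the bins [edges[k], edges[k+1]).

    The last bin is also closed on the right, so the largest edge is counted.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        for k in range(len(counts)):
            in_bin = edges[k] <= x < edges[k + 1]
            on_last_edge = k == len(counts) - 1 and x == edges[-1]
            if in_bin or on_last_edge:
                counts[k] += 1
                break
    return counts


# Hypothetical layer thicknesses (not the Table 3.2 data).
thickness = [452, 447, 461, 439, 455, 448, 470, 443, 458, 451,
             436, 449, 462, 454, 445, 457, 441, 466, 450, 453]
edges = [430, 440, 450, 460, 470, 480]
counts = histogram_counts(thickness, edges)
n = len(thickness)
for lo, hi, c in zip(edges, edges[1:], counts):
    # Bin, frequency, relative frequency, and a crude text bar.
    print(f"[{lo}, {hi})  {c:2d}  {c / n:.2f}  {'*' * c}")
```

Drawing the printed frequencies (or relative frequencies) as adjacent bars gives the histogram. The bin width is a choice, and different widths can change the apparent shape, so it is worth trying more than one.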

3.1.3 Numerical Summary of Data

Statistic: Any number or summary measure calculated from a set of sample data is called a statistic; that is, a statistic is a function of the sample observations.

Sample average: Suppose $x_1, x_2, \ldots, x_n$ are the observations in a sample. The most important measure of central tendency in the sample is the sample average (or sample mean),

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (3.1)$$

Sample variance (or dispersion): The variability in the sample data is measured by the sample variance, defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

A short-cut formula for the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \qquad (3.2)$$

The square root of the sample variance is called the sample standard deviation (SD) and is denoted by $s$:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad (3.3)$$

The main advantage of the sample standard deviation is that it is expressed in the original units of measurement; that is, the mean and the SD have the same units. The sample variance and standard deviation of the metal thickness data are 180.2928 and 13.43, respectively.

3.1.4 The Box Plot

Stem plots and histograms are excellent graphical displays for focusing attention on key aspects of the shape of a distribution of data. However, they are not good tools for making comparisons among data sets. To construct a box plot, we need the following five-number summary: minimum, first quartile, median, third quartile and maximum.

Minimum: the smallest value in the data set.
Maximum: the largest value in the data set.
Median: the middle value of the data set. That is, the median of a set of measurements is the value of $x$ such that at most half of the measurements are less than $x$ and at most half of the measurements are greater than $x$.
First quartile (lower quartile), denoted $Q_1$: the median of the data points below the median.
Third quartile (upper quartile), denoted $Q_3$: the median of the data points above the median.
Interquartile range: $\mathrm{IQR} = Q_3 - Q_1$.

Example 3.4, page 71: The data in Table 3.4 are the diameters (in mm) of holes in a group of 12 wing leading-edge ribs for a commercial transport airplane. Construct and interpret the box plot of these data.

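From the box plot we find: minimum = 120.1, $Q_1$ = 120.35, median ($Q_2$) = 120.6, $Q_3$ = 120.9 and maximum = 121.3. Because the upper part of the box ($Q_3 - Q_2 = 0.3$) and the upper whisker (maximum $- Q_3 = 0.4$) are longer than the lower part of the box ($Q_2 - Q_1 = 0.25$) and the lower whisker ($Q_1 -$ minimum $= 0.25$), the data appear to be slightly right-skewed.

Comparative box plots: Figure 3.8 shows comparative box plots for a manufacturing quality index on products at three manufacturing plants. Plant 2 shows higher variability, and both plants 2 and 3 need to raise their quality index performance.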
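The summaries of Sections 3.1.3 and 3.1.4 can be computed directly. This minimal Python sketch takes $Q_1$ and $Q_3$ as the medians of the lower and upper halves, following the quartile definitions given above, and computes the sample mean and SD from (3.1) to (3.3). The function names are hypothetical; the 12 diameters are invented to reproduce the five-number summary quoted above (they are not the Table 3.4 values), and the small skewed sample is also hypothetical, chosen to preview the comparison of mean, median, SD and IQR that follows.

```python
import math
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max) with Q1 and Q3 taken as the medians of the
    values below and above the overall median (median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def sample_mean_sd(data):
    """Sample mean, equation (3.1), and sample SD, equation (3.3)."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
    return xbar, math.sqrt(s2)


def sample_variance_shortcut(data):
    """Short-cut form (3.2): (sum of x_i^2 - n * xbar^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)


# Hypothetical diameters (mm), chosen to reproduce the summary above;
# they are not the Table 3.4 values.
diameters = [120.6, 120.1, 120.95, 120.4, 121.3, 120.25,
             120.75, 120.45, 121.1, 120.3, 120.85, 120.6]
mn, q1, med, q3, mx = five_number_summary(diameters)
print(f"min={mn} Q1={q1:.2f} median={med:.2f} Q3={q3:.2f} max={mx} IQR={q3 - q1:.2f}")

# Hypothetical right-skewed sample: one large value pulls the mean and SD up,
# while the median and IQR barely move.
skewed = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]
xbar, s = sample_mean_sd(skewed)
_, q1_s, med_s, q3_s, _ = five_number_summary(skewed)
print(f"mean={xbar:.2f} median={med_s} SD={s:.2f} IQR={q3_s - q1_s}")
```

In floating-point arithmetic the short-cut form (3.2) subtracts two large, nearly equal numbers when the data are large relative to their spread (as with diameters near 120 mm), so it can lose accuracy; software usually prefers the definitional form. Quartile conventions also differ between textbooks and packages (for example, `statistics.quantiles` interpolates), so other tools may report slightly different $Q_1$ and $Q_3$.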

Comments on the mean, median, SD and IQR: The mean provides a better description of the center of a data set when the distribution of the data is symmetric, while the median provides a better description of the center of skewed (right or left) data. Likewise, the standard deviation (SD) provides a better description of the variability of symmetric data, while the IQR provides a better description of the variability of a skewed data set.

3.1.5 Probability Distributions

Discrete and continuous probability distributions

For a discrete (integer-valued) random variable $X$:

$$P(X = a) = P(X \le a) - P(X \le a - 1)$$
$$P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$$

For a continuous random variable $X$:

$$P(X = a) = 0$$
$$P(a \le X \le b) = P(X \le b) - P(X \le a)$$
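The difference between the discrete and continuous rules shows up in a small example. This Python sketch assumes two hypothetical distributions, a fair six-sided die (discrete) and a random variable uniform on [0, 1] (continuous), and evaluates interval probabilities from their cumulative distribution functions; the helper names are hypothetical.

```python
from fractions import Fraction


def die_cdf(k):
    """P(X <= k) for a fair six-sided die, k an integer."""
    return Fraction(min(max(k, 0), 6), 6)


def uniform_cdf(x):
    """P(X <= x) for X uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Discrete: point probabilities are positive, so the lower limit becomes a - 1.
p_eq_3 = die_cdf(3) - die_cdf(2)          # P(X = 3)
p_2_to_4 = die_cdf(4) - die_cdf(1)        # P(2 <= X <= 4)

# Continuous: P(X = a) = 0, so including or excluding endpoints changes nothing.
p_02_to_05 = uniform_cdf(0.5) - uniform_cdf(0.2)   # P(0.2 <= X <= 0.5)
print(p_eq_3, p_2_to_4, round(p_02_to_05, 10))     # 1/6 1/2 0.3
```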

The population mean and population standard deviation
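For a discrete random variable with probability function $p(x)$, the population mean and variance are $\mu = \sum_x x\,p(x)$ and $\sigma^2 = \sum_x (x - \mu)^2 p(x)$, with integrals replacing the sums for a continuous random variable, and $\sigma$ is the square root of $\sigma^2$. A minimal Python sketch of the discrete case, using a hypothetical helper `population_mean_sd` and a fair die as the example:

```python
import math
from fractions import Fraction


def population_mean_sd(pmf):
    """Mean, variance and SD of a discrete distribution {value: probability}."""
    if abs(sum(pmf.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var, math.sqrt(var)


# Example: a fair six-sided die, p(x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu, var, sigma = population_mean_sd(die)
print(mu, var, round(sigma, 4))  # 7/2 35/12 1.7078
```

Exact fractions keep the mean and variance exact; ordinary floats also work, which is why the sum check uses a small tolerance.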

The mean is not necessarily the 50th percentile of the distribution (that is the median), and it is not necessarily the most likely value of the random variable (that is the mode). However, for a mound-shaped (symmetric) distribution, the mean, median and mode are the same.

3.2 Important Discrete Distributions

3.2.1 The Hypergeometric Distribution

Suppose there are $N$ items in a lot and $D$ of these items are defective. A random sample of $n$ items is selected from these $N$ items without replacement. If $x$ denotes the number of defective items in the sample of size $n$, then $x$ follows a hypergeometric distribution, defined as follows:

$$p(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \qquad x = 0, 1, 2, \ldots, \min(n, D) \qquad (3.8)$$
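Hypergeometric probabilities can be evaluated with exact binomial coefficients. This minimal Python sketch uses `math.comb` (Python 3.8 or later); the helper `hypergeom_pmf` and the lot of $N = 20$ items with $D = 4$ defectives and a sample of $n = 5$ are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, N, D, n):
    """P(x defectives in a sample of n drawn without replacement from a lot
    of N items containing D defectives):
    C(D, x) * C(N - D, n - x) / C(N, n)."""
    if x < 0 or x > min(n, D) or n - x > N - D:
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


# Hypothetical lot: N = 20 items, D = 4 defective, sample n = 5.
N, D, n = 20, 4, 5
probs = [hypergeom_pmf(x, N, D, n) for x in range(n + 1)]
for x, p in enumerate(probs):
    print(x, round(p, 4))
```

The probabilities sum to 1, and their mean equals $nD/N$ (here 1.0), the standard mean of the hypergeometric distribution.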

Example: pages 76-77.

3.2.2 The Binomial Distribution

Consider a process that consists of a sequence of $n$ independent trials. When the outcome of each trial is either a success or a failure, the trials are called Bernoulli trials. If the probability of success on any trial, say $p$, is constant, then the number of successes $x$ in $n$ Bernoulli trials has the binomial distribution with parameters $n$ and $p$, defined as follows:

$$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n$$

Extra Example 1: Suppose ten items will be tested from a lot. Each item passes the test with probability 0.90 and fails with probability 0.10. Calculate the probability that (a) exactly 3 items will fail, (b) fewer than 3 items will fail, (c) between 2 and 4 items (inclusive) will fail.
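Extra Example 1 can be checked numerically. Here $x$ counts failures, so a failure plays the role of a "success" with $p = 0.10$. This Python sketch assumes the ten items fail independently of one another; `binom_pmf` is a hypothetical helper implementing the binomial probability function.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Count failures: n = 10 items, failure probability 0.10 per item
# (items are assumed to fail independently of one another).
n, p_fail = 10, 0.10
a = binom_pmf(3, n, p_fail)                              # exactly 3 fail
b = sum(binom_pmf(x, n, p_fail) for x in range(0, 3))    # x = 0, 1, 2
c = sum(binom_pmf(x, n, p_fail) for x in range(2, 5))    # x = 2, 3, 4
print(f"(a) {a:.4f}  (b) {b:.4f}  (c) {c:.4f}")  # (a) 0.0574  (b) 0.9298  (c) 0.2623
```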

3.2.3 The Poisson Distribution

The Poisson distribution is widely used in statistical quality control and improvement, frequently as the underlying probability model for count data.

Extra Example 2: In a certain manufacturing industry, the number of accidents averages 2 per week. (a) Find the probability that at least 2 accidents will occur in a given week. (b) Find the probability that no accident will occur in 2 weeks. (c) What is the expected number of accidents in a given 28 days?

3.2.4 The Pascal Distribution (Negative Binomial Distribution)

The Pascal distribution, like the binomial distribution, has its basis in Bernoulli trials. Consider a sequence of independent trials, each with probability of success $p$, and let $x$ denote the trial on which the $r$th success occurs. Then $x$ is a Pascal random variable with the following probability distribution:

$$p(x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}, \qquad x = r, r + 1, r + 2, \ldots$$

When $r = 1$, the Pascal distribution is known as the geometric distribution, which has many useful applications in statistical quality control (SQC).

Extra Example 3: Suppose 10% of the engines manufactured on a certain assembly line are defective. If engines are randomly selected one at a time and tested, find the probability that the third non-defective engine is found on the fifth trial. Find the mean and variance of the number of the trial on which the third non-defective engine is found.
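Extra Examples 2 and 3 can be checked the same way. This Python sketch uses the Poisson probability function $p(x) = e^{-\lambda}\lambda^x/x!$ and the Pascal probability function written in the docstring below, together with the standard Pascal mean $r/p$ and variance $r(1-p)/p^2$. For Example 2 it assumes accidents occur at a constant rate, so the Poisson mean for $t$ weeks is $2t$; for Example 3 a "success" is a non-defective engine. The helper names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """Poisson probability p(x) = exp(-lam) * lam**x / x!."""
    return exp(-lam) * lam ** x / factorial(x)


def pascal_pmf(x, r, p):
    """P(r-th success on trial x) = C(x-1, r-1) p^r (1-p)^(x-r), x = r, r+1, ..."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)


# Extra Example 2: mean of 2 accidents per week. Assumes a constant rate,
# so the Poisson mean scales with the length of the period.
rate_per_week = 2
p_a = 1 - poisson_pmf(0, rate_per_week) - poisson_pmf(1, rate_per_week)
p_b = poisson_pmf(0, 2 * rate_per_week)        # 2 weeks: lambda = 4
e_c = rate_per_week * 28 / 7                   # 28 days = 4 weeks

# Extra Example 3: "success" = non-defective engine, p = 0.9, r = 3.
r, p = 3, 0.9
p_fifth = pascal_pmf(5, r, p)
mean_x, var_x = r / p, r * (1 - p) / p ** 2    # Pascal mean and variance

print(f"Ex 2: (a) {p_a:.4f}  (b) {p_b:.4f}  (c) {e_c:.1f}")
print(f"Ex 3: P(x=5) = {p_fifth:.5f}  mean = {mean_x:.4f}  variance = {var_x:.4f}")
```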

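3.3 Some Important Continuous Distributions

3.3.2 The Normal Distribution

The normal distribution is the most useful distribution in both the theory and the application of statistics. If $x$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then the probability distribution of $x$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \qquad -\infty < x < \infty$$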
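Normal probabilities are usually found by standardizing, $z = (x - \mu)/\sigma$, and reading the standard normal cumulative distribution function $\Phi(z)$ from a table. This Python sketch instead evaluates $\Phi$ through the error function, $\Phi(z) = \frac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$; the helper names and the $N(40, 2^2)$ example are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), using z = (x - mu) / sigma and erf."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical example: x ~ N(40, 2^2); find P(38 <= x <= 43).
prob = normal_cdf(43, 40, 2) - normal_cdf(38, 40, 2)
print(round(prob, 4))  # 0.7745
```

On Python 3.8 or later, `statistics.NormalDist(mu, sigma).cdf(x)` returns the same probabilities.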

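Standard Normal Distribution

If $x \sim N(\mu, \sigma^2)$, then $z = (x - \mu)/\sigma$ follows the standard normal distribution, $N(0, 1)$.

Example 3.7, page 83
Example 3.8, page 84
Example 3.9, page 85

Linear Combinations of Normal Distributions

If $x_1, x_2, \ldots, x_p$ are independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$, and $y = a_1x_1 + a_2x_2 + \cdots + a_px_p$ for constants $a_i$, then $y$ is normally distributed with

$$\mu_y = \sum_{i=1}^{p} a_i\mu_i, \qquad \sigma_y^2 = \sum_{i=1}^{p} a_i^2\sigma_i^2$$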

That means $y$ is normally distributed with mean $\mu_y$ and variance $\sigma_y^2$; in short, $y \sim N(\mu_y, \sigma_y^2)$.

Central Limit Theorem (CLT)

Practical interpretation: the sum (or average) of $n$ independent random variables is approximately normally distributed, regardless of the distribution of each individual random variable in the sum, and the approximation improves as $n$ increases.

3.3.3 The Exponential Distribution

Exercise 3.29, page 101.

The cumulative distribution function (cdf) of the exponential distribution with parameter $\lambda$ is

$$F(a) = P(x \le a) = 1 - e^{-\lambda a}, \qquad a \ge 0$$

This cdf is very useful for solving problems involving the exponential distribution.

3.3.4 The Gamma Distribution

Result: If $x_1, x_2, \ldots, x_r$ are independent exponential random variables with parameter $\lambda$, then $y = x_1 + x_2 + \cdots + x_r$ has a gamma distribution with parameters $\lambda$ and $r$.

Example 3.11, page 91.

3.4 Probability Plots

Probability plots are used to determine whether a sample of data might reasonably be assumed to come from a specific distribution. They are available for various distributions and are easy to construct with computer software (e.g., MINITAB), but their interpretation is subjective.

3.4.1 Normal Probability Plots
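A normal probability plot pairs each ordered observation with a standard normal quantile. This Python sketch uses the plotting position $(j - 0.5)/n$ for the $j$th smallest of $n$ observations, which is one common choice (MINITAB and other packages may use a slightly different one), and `statistics.NormalDist.inv_cdf` (Python 3.8 or later). The helper `normal_plot_points` and the sample values are hypothetical; the printed pairs can be plotted with any graphing tool.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Pairs (x_(j), z_j) for a normal probability plot.

    x_(j) is the j-th smallest observation and z_j is the standard normal
    quantile of the plotting position (j - 0.5) / n.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n)) for j, x in enumerate(xs, start=1)]


# Hypothetical sample; if it comes from a normal distribution, the points
# should fall roughly along a straight line.
sample = [197, 188, 205, 210, 193, 201, 199, 184, 215, 203]
for x, z in normal_plot_points(sample):
    print(f"{x:>4}  {z:+.3f}")
```

If the points lie close to a straight line, a normal model is reasonable; judging "close" remains the subjective step noted above.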

3.4.2 Other Probability Plots (page 95)

3.5 Some Useful Approximations

3.5.1 The Binomial Approximation to the Hypergeometric

Consider the hypergeometric distribution in equation (3.8). If $n/N \le 0.10$, then the binomial distribution with parameters $p = D/N$ and $n$ is a good approximation to the hypergeometric distribution. The approximation is better for small values of $n/N$, which is also called the sampling fraction. See the example on page 96.

3.5.2 The Poisson Approximation to the Binomial

When $n$ is large and $p$ is small ($p < 0.1$), the Poisson distribution with $\lambda = np$ provides a good approximation to binomial probabilities.

Extra Example 4: When the circuit boards used in the manufacture of compact disc players are tested, the percentage of defectives is found to be 5%. Let $X$ denote the number of defective boards in a random sample of size 100. Then $X$ has a binomial distribution. What is the probability that none of the 100 boards is defective?

3.5.3 The Normal Approximation to the Binomial Distribution

If $x$ has a binomial distribution with parameters $n$ and $p$, then the binomial probability distribution can be approximated by a normal curve with $\mu = np$ and $\sigma = \sqrt{npq}$, where $n$ is the number of trials, $p$ is the probability of success and $q = 1 - p$. The binomial probability $P(a \le x \le b)$ can be approximated by the normal probability $P[(a - 0.5) \le x \le (b + 0.5)]$ as long as $n$ is large and the interval $np \pm 2\sqrt{npq}$ falls between 0 and $n$. The half-unit adjustment is called the correction for continuity. That is,

$$P(a \le x \le b) \approx P[(a - 0.5) \le x \le (b + 0.5)]$$

Extra Example 5: Suppose that 25% of the fire alarms in a large city are false alarms. Let $x$ denote the number of false alarms in a random sample of 100 alarms. Find the approximate probability that (a) there will be at least 30 false alarms, (b) there will be no more than 35 false alarms.
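Extra Examples 4 and 5 can be checked by comparing each approximation with the exact binomial probability. This Python sketch uses `math.comb` and `statistics.NormalDist` (Python 3.8 or later); the helper names `binom_pmf` and `binom_cdf` are hypothetical.

```python
import math
from math import comb
from statistics import NormalDist


def binom_pmf(x, n, p):
    """Exact binomial probability of x successes."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Exact P(X <= x) by summing binomial probabilities."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# Extra Example 4: n = 100 boards, p = 0.05, so lambda = np = 5.
exact_none = binom_pmf(0, 100, 0.05)      # 0.95 ** 100
poisson_none = math.exp(-100 * 0.05)      # Poisson approximation e^(-5)

# Extra Example 5: n = 100 alarms, p = 0.25, with continuity correction.
n, p = 100, 0.25
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)
at_least_30 = 1 - approx.cdf(29.5)        # P(x >= 30) ~ P(x >= 29.5)
at_most_35 = approx.cdf(35.5)             # P(x <= 35) ~ P(x <= 35.5)

print(f"Ex 4: exact {exact_none:.4f}, Poisson {poisson_none:.4f}")
print(f"Ex 5: (a) {at_least_30:.4f} vs exact {1 - binom_cdf(29, n, p):.4f}")
print(f"      (b) {at_most_35:.4f} vs exact {binom_cdf(35, n, p):.4f}")
```

In Example 5, $np \pm 2\sqrt{npq} = 25 \pm 8.66$ lies inside $[0, 100]$, so the stated condition for the normal approximation holds, and the approximate answers agree with the exact binomial sums to within 0.01.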