An optical illusion. A statistical illusion. What is Statistics? An Engineer, A Physicist And A Statistician.



Transcription:

An optical illusion
Yalçın Akçay, CASE 7 56, yakcay@ku.edu.tr

A statistical illusion
Real estate agent selling a house to a snob customer: typical monthly income in the neighborhood = $5,7
Local politician arguing for larger government support: typical monthly income in the neighborhood = $3,

An Engineer, A Physicist And A Statistician
Three people answered an ad for an open job: an engineer, a physicist and a statistician.
When the engineer went in, he was asked: Q: "What is two plus two?" A: "Four."
When the physicist went in, he was asked the same question: Q: "What is two plus two?" A: "Four."
The statistician went in next. When the question was posed to him, he looked around secretively, shut the door and drew the blinds closed. His response: "What do you want it to be?"

What is Statistics?
There is a widespread mistrust of statistics in the world today. Everyone knows about lying with statistics, while good statistical analysis is nearly impossible to find in daily life.

"There are three kinds of lies: lies, damned lies, and statistics." Benjamin Disraeli, Prime Minister, England
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." H. G. Wells, English Author
"It ain't so much the things we don't know that get us in trouble. It's the things that we know that ain't so." Artemus Ward, U.S. Author

What is Statistics?
Statistics is the art of never having to say you're wrong.
Statistics means never having to say you're certain.
97.3% of all statistics are made up.
4.3% of all statistics are worthless.

We muddle through life making choices based on incomplete information.
Most of us live comfortably with some level of uncertainty.
What makes statistics unique is its ability to quantify uncertainty, to make it precise. This allows statisticians to make categorical statements, with complete assurance about their level of uncertainty.

So what is statistics? What Next?
Statistics is a branch of mathematics dealing with the analysis and interpretation of masses of data.

Basic Definitions
A population is the group of all items of interest (everything you wish to study) to a statistics practitioner.
A sample is a subset of the population, often randomly chosen and preferably representative of the population as a whole.
A variable is an attribute, or measurement, on members of a population.
An observation is a list of all variable values for a single member of a population (data).

A variable is numerical if meaningful arithmetic can be performed on it. Otherwise, the variable is categorical.
A numerical variable is discrete if its possible values can be counted. A continuous variable is the result of an essentially continuous measurement.
A categorical variable is ordinal if there is a natural ordering of its possible values. If there is no natural ordering, it is nominal.
Cross-sectional data are data on a population at a distinct point in time. Time series data are collected across time.

Examples
Number of LCD TVs shipped from various warehouses: 8 3 7 4 87 6
Marital status: single, married, married, married, divorced, widowed, single
Quality ratings of products: poor, fair, good, fair, very good, excellent, good, poor, good
Interest rates at Turkish banks: .7% 4.8% 3.% 6.3%...

Describing Data
Example: Annual salary figures and related data for 5 employees of Beta Technologies Inc.
o gender
o age
o number of years of relevant work experience prior to employment at Beta
o number of years of professional experience at Beta Technologies
o number of years of post-secondary education
o annual salary

Frequency Tables and Histograms
A frequency table is a table containing each category or value that a variable might have and the number of times that each one occurs in the data. A histogram is a bar chart of these frequencies.

Frequency Tables and Histograms
[Frequency table and histogram of Prior Exp. against Freq., first by individual value and then grouped into ranges; the cell values did not survive extraction.]

Analyzing the Histogram
In statistics, the features of interest for a set of numerical data can be classified as:
center: describes where, numerically, the data are centered or concentrated
shape: describes how the data are spread out around the center with respect to the symmetry or skewness of the data
variability: describes how the data are spread out around the center with respect to the smoothness and magnitude of the variation

A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram (the peak), the two sides are identical in shape and size.

A skewed histogram is one with a long tail extending to either right or left.
Right skewed if the tail extends to the right (positive skewness).
Left skewed if the tail extends to the left (negative skewness).

Mode is the observation with the greatest frequency. A unimodal histogram is one with a single peak. A bimodal histogram is one with two peaks, not necessarily equal in height.

Analyzing the Histogram
When data are not very variable, the frequency of observations decreases steadily as you move away from the center.

Describing the Relationship Between Two Variables
Suppose we are interested in the relationship between the number of years of professional experience at Beta Technologies and the annual salary of an employee. The graphical technique used to describe the relationship between two variables is the scatter plot.

Analyzing Scatter Plots
[Scatter plot: no relationship between the two variables.]
[Scatter plots: linear relationship between the two variables, one positive and one negative.]
[Scatter plots: non-linear relationship between the two variables.]
Warning: Just because there is a relationship between two variables, it does not mean that there is a cause-and-effect relationship between the two variables.

Time Series Plots
A scatter plot with the time series variable on the vertical axis and time itself on the horizontal axis. Useful for forecasting future values of a time series: observe trend and seasonal patterns.
[Two time series plots: a variable against time, and a variable against year.]

Art and Science of Graphical Presentations
Graphical excellence:
1) is well-designed presentation of interesting data, a matter of substance, of statistics and of design
2) gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
3) is nearly always multivariate
4) requires telling the truth about the data

Examples
If growth is predicted in fish and seafood products, why are all the lines pointing downward?
[Further example charts.]

Examples
"Should we scare the opposition by announcing our mean height or lull them by announcing our median height?"

Sample Statistic or Population Parameter
A parameter is a descriptive measurement about a population. A statistic is a descriptive measurement about a sample. We use statistics from a sample to make inferences about the population parameters. Parameters are usually represented by Greek letters and statistics by Roman letters.

Numerical Measures
Central Tendency: mean, median and mode
Spread: range, variance and standard deviation
Association: covariance and correlation
Numerical measures allow the statistics practitioner to be more precise in describing characteristics of data.

Measures of Central Tendency
The mean is the arithmetic average of all values of a variable.
Mean = (sum of all the values) / (number of observations)
(the mean of 3, 7, 4, 9, 7 is (3+7+4+9+7)/5 = 6)
Excel formula: AVERAGE(data)
Algebraic formula:
sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Measures of Central Tendency
The median is the middle observation when the data are listed from smallest to largest. If there is an even number of observations, then the median is the average of the two middle values.
4, 9, 7, , 6, 5, 5, 9 sorted: 4, 5, 5, 6, 7, 9, 9, ; median = 6.5
Excel formula: MEDIAN(data)

Mode is the most frequently occurring value. If the variables are continuous, mode is irrelevant.
Example: Interarrival times of customers to a bank
[Histogram: frequency against interarrival time (minutes), in 2.5-minute bins.]
Excel formula: MODE(data)

Mean, Median, Mode: Which is Best?
The mean is generally the first selection. However, there are several circumstances when the median is better.
When there is a relatively small number of outliers, the median usually produces a better measure of center.
[Two data sets, the second containing an outlier. First: Average = 9.8, Median = 8.5. Second: Average = 5.58, Median = 8.5.]

Comparing the Mean and the Median
For most sets of data, the mean and the median will be very close to each other in value.
[Figure: symmetric histogram with the mean and median marked.]
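As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module on the 3, 7, 4, 9, 7 data from the slides. The appended outlier value of 100 is a hypothetical addition, not taken from the slides, included only to show how the mean moves while the median stays put.

```python
import statistics

data = [3, 7, 4, 9, 7]  # example values used on the slides

mean_value = statistics.mean(data)      # arithmetic average
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

# Hypothetical outlier (not from the slides) to compare mean and median.
with_outlier = data + [100]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("mean, median, mode:", mean_value, median_value, mode_value)
print("with outlier: mean", mean_out, "median", median_out)
```

The median is unchanged by the extra value, which is the reason the slides prefer it when a few outliers are present.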

Comparing the Mean and the Median
When data are more spread out in one direction (data are skewed), the mean is pulled toward these values.
[Figure: two skewed histograms with the positions of the median and the mean marked.]

Measures of Central Tendency
What happens if you
add a constant to your data: 3, 7, 4, 9, 7 has Average = 6; (3+5), (7+5), (4+5), (9+5), (7+5) has Average = ?
multiply your data by a constant c: 3, 7, 4, 9, 7 has Average = 6; c(3), c(7), c(4), c(9), c(7) has Average = ?

Measures of Central Tendency
kth percentile = k% of the data are at or below this value
[Histogram with the lower k% of the data shaded.]
Excel formula: PERCENTILE(data,k)
25th percentile = 1st quartile
75th percentile = 3rd quartile

Measures of Spread
We need to know how spread out or dispersed the data values are relative to typical values. Understanding the variation in a set of data is of critical importance to statistics.
Minimum: smallest value in the data set. Excel formula: MIN(data)
Maximum: largest value in the data set. Excel formula: MAX(data)
Range: difference between the maximum and the minimum

Measures of Spread
Consider the data set: 3, 7, 4, 9, 7. Let's try to determine the average deviation of the data from the mean.
3, 7, 4, 9, 7: Average = 6
(3-6), (7-6), (4-6), (9-6), (7-6): Average = ?
$(3-6)^2, (7-6)^2, (4-6)^2, (9-6)^2, (7-6)^2$: Average = 4.8

Variance is the average of the squared deviations of the data values from the mean.
Sample variance: $s^2$
Population variance: $\sigma^2$

Measures of Spread
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Excel formula: VARP(data) for $\sigma^2$, VAR(data) for $s^2$
Variance increases as there is more variability around the mean. Large deviations from the mean contribute heavily to (are punished in) the variance because they are squared.

What happens if you
add a constant to your data: 3, 7, 4, 9, 7 has Variance = 4.8; (3+5), (7+5), (4+5), (9+5), (7+5) has Variance = ?
multiply your data by a constant c: 3, 7, 4, 9, 7 has Variance = 4.8; c(3), c(7), c(4), c(9), c(7) has Variance = ?

Standard deviation is the square root of variance. Standard deviation has the same unit as the data.
population standard deviation: $\sigma = \sqrt{\sigma^2}$
sample standard deviation: $s = \sqrt{s^2}$
Excel formula: STDEVP(data) for population, STDEV(data) for sample

Standard deviation is not as intuitive or appealing as the range (which gives an immediate picture, although sometimes a false one, of how far the data spread out around the center).
Empirical Rule (if the histogram is bell-shaped):
1) about 68% of all observations are within one standard deviation of the mean
2) about 95% of all observations are within two standard deviations of the mean
3) about 99.7% of all observations are within three standard deviations of the mean
Bell-shaped distributions arise in:
Modern portfolio theory: returns of a diversified asset portfolio
Operations management: process variations
HR management: employee performance

If a sample of employees were grouped according to their heights, we might see an arrangement like this. Drawing a smooth line around the group of employees produces a bell-shaped curve, which shows that most people have heights gathered around 67.5 inches.
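The two "what happens if" questions can be explored numerically. This is a minimal sketch using Python's `statistics` module, assuming `pvariance` plays the role of Excel's VARP (divide by N) and `variance` the role of VAR (divide by n - 1). The scale factor 2 is a hypothetical choice, since the constant on the slide is not legible.

```python
import statistics

data = [3, 7, 4, 9, 7]

pop_var = statistics.pvariance(data)    # divides by N, analogous to VARP
sample_var = statistics.variance(data)  # divides by n - 1, analogous to VAR
pop_sd = statistics.pstdev(data)        # square root of the population variance

shifted = [x + 5 for x in data]  # add a constant to every value
scaled = [2 * x for x in data]   # multiply by a hypothetical constant c = 2

shifted_var = statistics.pvariance(shifted)  # unchanged by the shift
scaled_var = statistics.pvariance(scaled)    # grows by a factor of c squared

print(pop_var, sample_var, pop_sd)
print(shifted_var, scaled_var)
```

Shifting the data leaves the variance alone because every deviation from the mean is unchanged; scaling by c multiplies each deviation by c and therefore the variance by c squared.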

Measures of Spread
Chebysheff's Rule
The proportion of observations in any sample that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$ for k > 1.
Example: k = 2: 75%; k = 3: 88.9%

Measures of Spread
Consider the following example in which monthly revenue and profit data have been collected for the last 5 years.
standard deviation of monthly revenues = $,
standard deviation of monthly profits = $5,
mean of monthly revenues = $4,
mean of monthly profits = $,
Do you think the company recorded a loss in any month? What is the likelihood?

Measures of Association
A scatter plot describes the relationship between two variables graphically.
Covariance and correlation summarize the strength of the linear relationship between the two variables numerically.
The two variables, say X and Y, must have the same number of observations (paired variables).

Covariance: $\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
Excel: COVAR(data1,data2)
Correlation: $\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{s_X s_Y}$
Excel: CORREL(data1,data2)

The advantage that the correlation has over covariance is that the correlation is always between -1 and +1.
Corr(X,Y) = -1: negative linear relationship
Corr(X,Y) = +1: positive linear relationship
Corr(X,Y) = 0: no linear relationship
All other values of correlation are judged in relation to these three values.
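The covariance and correlation formulas can be sketched directly in Python. The version below assumes the sample (n - 1) denominator, which keeps the correlation consistent with the sample standard deviations s_X and s_Y. The experience and salary figures are hypothetical illustration data, not values from the Beta Technologies example.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance of paired data, using an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Correlation = covariance divided by the product of the standard deviations."""
    s_x = math.sqrt(sample_cov(xs, xs))  # Cov(X, X) is the sample variance of X
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)

# Hypothetical paired observations: years of experience, salary in $1000s.
years = [1, 3, 5, 7, 9]
salary = [30, 38, 41, 55, 60]

print("covariance:", sample_cov(years, salary))
print("correlation:", sample_corr(years, salary))
```

Unlike the covariance, the correlation does not change when either variable is rescaled, which is why it is easier to judge against the reference values -1, 0 and +1.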

Measures of Association
[Four example scatter plots with their printed values: Covar(X,Y) = .88, Corr(X,Y) = .9; Covar(X,Y) = -943.63, Corr(X,Y) = -.8; Covar(X,Y) = 85.43, Corr(X,Y) = .; Covar(X,Y) = -.58, Corr(X,Y) = -.38]

Measures of Association
Remember: Correlation does not imply causation!

Probability Essentials
Consider the following variables (random variables):
Time between arrivals of customers to a bank
Number of defective items in a production batch
Return from a stock
Customer demand for a new product
Changes in interest rates

Probability Essentials
A random variable associates a numerical value with each possible outcome of a random phenomenon (experiment).
Probability is a number between 0 and 1 that measures the likelihood that some event will occur.
The probability distribution of a random variable determines the probability that the random variable will take on each of its possible values.

Probability Essentials
There are two types of random variables:
1) discrete: countable number of possible values
2) continuous: a continuum of possible values
Time between arrivals of customers to a bank
Number of defective items in a production batch
Return from a stock
Customer demand for a new product
Changes in interest rates

Probability Essentials
The manager of a computer store has kept track of the number of computers sold per day. On the basis of this information, the manager produced the following list of the number of daily sales.

number of computers sold, d    Probability P(D=d)
0                              .1
1                              .2
2                              .3
3                              .3
4                              .1

Probability Essentials
Sample space is a list of all possible outcomes of an experiment.
An event is a collection or set of one or more outcomes of a sample space.

Probability Essentials
If A is an event, then the complement of A, denoted by $\bar{A}$ (or $A^c$), is the event that A does not occur.
Event A: At most 1 sold
Event $A^c$: 2 or more sold
P(A) = .1 + .2 = .3
P($A^c$) = .3 + .3 + .1 = .7
P($A^c$) = 1 - P(A) (rule of complements)

Probability Essentials
Events are mutually exclusive if at most one of them can occur (if one occurs, none of the others can occur).
Event A: At most 1 sold
Event B: 4 sold
Event C: 2 or 3 sold
One of A, B or C must occur: the events are exhaustive.

Probability Essentials
Addition rule: If events $A_1, A_2, \ldots, A_n$ are mutually exclusive, then
P(at least one of $A_1, A_2, \ldots, A_n$) = P($A_1$) + P($A_2$) + ... + P($A_n$)
P(D ≤ 2) = P(D=0) + P(D=1) + P(D=2)
P(1 < D ≤ 3) = P(D=2) + P(D=3)
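The complement and addition rules can be checked with a few lines of Python. This sketch assumes the daily-sales probabilities are .1, .2, .3, .3, .1 for d = 0 to 4, a reading reconstructed from the slide's own calculations P(A) = .3 and P(A^c) = .7; the helper name `prob` is illustrative.

```python
# Assumed distribution of daily computer sales (reconstructed, see lead-in).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1}

def prob(event):
    """Probability of an event, given as a predicate on the outcome d.

    Outcomes are mutually exclusive, so the addition rule applies:
    the probabilities of the outcomes in the event are simply summed.
    """
    return sum(p for d, p in pmf.items() if event(d))

p_a = prob(lambda d: d <= 1)           # Event A: at most 1 sold
p_a_complement = 1 - p_a               # rule of complements
p_at_most_2 = prob(lambda d: d <= 2)   # P(D <= 2)
p_2_or_3 = prob(lambda d: 1 < d <= 3)  # P(1 < D <= 3)

print(p_a, p_a_complement, p_at_most_2, p_2_or_3)
```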

Probability Essentials
Addition rule for events that are not mutually exclusive:
If two events A and B are not mutually exclusive, that means there is a probability that both could occur: P(A and B) > 0.
Event A: At least sold
Event B: or sold
P(A or B) = P(A) + P(B) - P(A and B)

Probability Essentials
You are visiting a friend whom you already know has two kids. You knock on the door and a little girl opens the door. What is the likelihood that your friend's second kid is a boy?

Probability Essentials
We frequently need to know how two events are related. We would like to know the likelihood of one event given the occurrence of another related event.
Marketing a product:
Event A: a person chosen at random buys the product
Event B: a person in the sample space has seen an advertisement for the product
If a person has seen the advertisement, what is the probability that s/he will buy the product?

Probability Essentials
New information changes the probability of an event. The conditional probability of an event A given an event B is:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
$P(A \text{ and } B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$ (multiplication rule)

Probability Essentials
A marketing research team is interested in measuring the effectiveness of an advertisement for a new product. They take a random sample of 500 people and ask them whether they have bought the new product and whether they saw an advertisement before the purchase. Here are the results:

                                Saw advertisement   Did not see advertisement   Total
Purchased the product           175                 45                          220
Did not purchase the product    100                 180                         280
Total                           275                 225                         500
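The conditional probability can be computed straight from the survey counts. This sketch assumes the table reads 175, 45, 100 and 180 for the four cells (500 people in total), which is a reconstruction chosen to agree with the slide's results of 44% and 63.6%; the variable names are illustrative.

```python
# Assumed survey counts, keyed by (purchase outcome, advertisement exposure).
counts = {
    ("bought", "saw"): 175,
    ("bought", "not_saw"): 45,
    ("not_bought", "saw"): 100,
    ("not_bought", "not_saw"): 180,
}
total = sum(counts.values())

# Marginal probabilities: A = bought the product, B = saw the advertisement.
p_a = (counts[("bought", "saw")] + counts[("bought", "not_saw")]) / total
p_b = (counts[("bought", "saw")] + counts[("not_bought", "saw")]) / total
p_a_and_b = counts[("bought", "saw")] / total

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(f"P(A) = {p_a:.3f}, P(B) = {p_b:.3f}, P(A|B) = {p_a_given_b:.3f}")
```

Under these counts P(A|B) is well above P(A), so seeing the advertisement and buying the product are not independent events.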

Probability Essentials
The marketing research firm is interested in finding out whether seeing the advertisement (B) affects the probability that a person will buy the product (A).

                                Saw advertisement   Did not see advertisement   Total
Purchased the product           175                 45                          220
Did not purchase the product    100                 180                         280
Total                           275                 225                         500

P(A) = ? P(B) = ? P(A|B) = ?
P(A) = 220/500 = 44%
P(A|B) = 175/275 = 63.6%

Probability Essentials
If A and B are two independent events, then P(A|B) = P(A) and P(B|A) = P(B).
It follows that the joint probability of two independent events is simply the product of the probabilities of the two events.
P(A and B) = P(A|B) P(B) = P(A) P(B)

Statistician Won't Fly In A Plane
A famous statistician would never travel by airplane, because he had studied air travel and estimated the probability of there being a bomb on any given flight was 1 in a million, and he was not prepared to accept these odds. One day a colleague met him at a conference far from home. "How did you get here, by train?" "No, I flew." "What about the possibility of a bomb?" "Well, I began thinking that if the odds of one bomb are 1:million, then the odds of TWO bombs are (1/1,000,000) × (1/1,000,000). This is a very, very small probability, which I can accept. So, now I bring my own bomb along!"

Probability Essentials
Bayes Law
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$

Probability Essentials
Researchers have developed statistical models based on financial ratios that predict whether a company will go bankrupt over the next 12 months. In a test of one such model, the model correctly predicted the bankruptcy of 85% of the firms that did in fact fail, and it correctly predicted nonbankruptcy for 74% of the firms that did not fail. Suppose that we expect 8% of the firms in a particular city to fail over the next year and that the model predicts bankruptcy for a firm that you own. What is the probability that your firm will fail within the next 12 months?

Probability Essentials
F: firm fails
$F^c$: firm does not fail
B: model predicts bankruptcy
$B^c$: model predicts nonbankruptcy
P(B|F) = .85
P($B^c$|$F^c$) = .74
P(F) = .08
P(F|B) = ?
$P(F \mid B) = \frac{P(F \text{ and } B)}{P(B)} = \frac{P(B \mid F)\,P(F)}{P(B \mid F)\,P(F) + P(B \mid F^c)\,P(F^c)} = \frac{.85 \times .08}{.85 \times .08 + (1 - .74)(1 - .08)} \approx .22$
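The bankruptcy calculation is a direct application of Bayes' law and can be sketched as a small Python function. The function and parameter names are illustrative; the inputs are the 8% failure rate, the 85% correct-bankruptcy rate and the 74% correct-nonbankruptcy rate stated in the problem.

```python
def bayes_posterior(prior, p_pos_given_true, p_neg_given_false):
    """P(condition | positive prediction) by Bayes' law.

    prior              -- P(F), the unconditional probability of the condition
    p_pos_given_true   -- P(B | F), positive prediction when the condition holds
    p_neg_given_false  -- P(B^c | F^c), negative prediction when it does not
    """
    true_positive = p_pos_given_true * prior
    false_positive = (1 - p_neg_given_false) * (1 - prior)
    return true_positive / (true_positive + false_positive)

# Bankruptcy example: P(F) = .08, P(B|F) = .85, P(B^c|F^c) = .74.
p_fail_given_predicted = bayes_posterior(0.08, 0.85, 0.74)
print(round(p_fail_given_predicted, 4))
```

Even with a fairly accurate model, the posterior stays well below one half because failures are rare to begin with, so most bankruptcy predictions are false alarms.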

Bayes Rule
The probability that a person has a certain disease is .3. Medical diagnostic tests are available to determine whether the person actually has the disease. If the disease is actually present, the probability that the medical diagnostic test will give a positive result (indicating that the disease is present) is .9. If the disease is not actually present, the probability of a positive test result (indicating that the disease is present) is .. Suppose that the medical diagnostic test has given a positive result (indicating that the disease is present). What is the probability that the disease is actually present?

Distribution of a Single Random Variable

Number of Color TVs, x    Number of Households (thousands)    p(x)
0                         ,8                                   .012
1                         3,379                                .319
2                         37,96                                .374
3                         9,387                                .191
4                         7,74                                 .076
5                         ,84                                  .028
Total                     ,5                                   1.000

P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2) = .705
Cumulative probability is the probability that the random variable is less than or equal to some particular value.

Distribution of a Single Random Variable
Expected value (mean) of a probability distribution is the weighted sum of the possible values, weighted by their probabilities.
$\mu = E(X) = \sum_{\text{all } x_i} x_i\,p(x_i)$
The expected value gives the average of observed values of a random variable over a large number of observations.

x        p(x)     x·p(x)
0        .012     0
1        .319     .319
2        .374     .748
3        .191     .573
4        .076     .304
5        .028     .140
Total    1.000    2.084

St. Petersburg Lottery
Consider the following game of chance:
o You pay a fixed fee to enter and then a fair coin is tossed repeatedly until a tail appears, ending the game.
o You win 1 dollar if a tail appears on the first toss, 2 dollars if a head appears on the first toss and a tail on the second, 4 dollars if a head appears on the first two tosses and a tail on the third, 8 dollars if a head appears on the first three tosses and a tail on the fourth, etc.
o In short, you win $2^{k-1}$ dollars if the coin is tossed k times until the first tail appears.
o What would be a fair price to pay for entering the game?
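One way to approach the fair-price question is to compute the expected payoff when the game is capped at a maximum number of tosses. The sketch below assumes the payoff is 2^(k-1) dollars when the first tail comes on toss k, matching the 1, 2, 4, 8 sequence; the cap `max_tosses` is a hypothetical device for illustration and is not part of the original game.

```python
from fractions import Fraction

def truncated_expected_payoff(max_tosses):
    """Expected winnings counting only games that end within max_tosses tosses.

    The first tail appears on toss k with probability (1/2)**k and then
    pays 2**(k - 1) dollars, so every term of the sum equals 1/2.
    """
    return sum(
        Fraction(2) ** (k - 1) * Fraction(1, 2 ** k)
        for k in range(1, max_tosses + 1)
    )

for cap in (10, 20, 40):
    print(cap, truncated_expected_payoff(cap))
```

Each additional toss allowed adds another half dollar, so the expected payoff grows without bound as the cap is lifted. That is what makes the fair price hard to pin down.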
Distribution of a Single Random Variable
Variance of a probability distribution is a weighted sum of squared deviations of the possible values from the mean, where the weights are the probabilities.
$\sigma^2 = \mathrm{Var}(X) = \sum_{\text{all } x_i} (x_i - E(X))^2\,p(x_i)$
$\sigma = \mathrm{Stdev}(X) = \sqrt{\mathrm{Var}(X)}$

Distribution of a Single Random Variable

x        p(x)     (x-2.084)^2    p(x)·(x-2.084)^2
0        .012     4.343          .052
1        .319     1.175          .375
2        .374     .007           .003
3        .191     .839           .160
4        .076     3.671          .279
5        .028     8.503          .238
Total    1.000                   1.107

Discrete vs. Continuous Random Variables
There are two types of random variables:
discrete: can take on a countable number of values, e.g. X = number of heads observed in an experiment that flips a coin a fixed number of times
continuous: values are uncountable, e.g. X = time to write a statistics exam
A probability distribution is a table, formula or graph that describes the values of a random variable and the probability associated with these values.

Discrete vs. Continuous Random Variables
We cannot list all possible values of a continuous random variable. Since there is an infinite number of values, the probability of each individual value is virtually 0.
Instead of assigning probabilities to each individual value, we spread the total probability of 1 over the continuum (imagine a histogram with a large number of small intervals).

Continuous Distributions
A probability density function, usually denoted by f(x), specifies the probability distribution of a continuous variable. If the range of x is between a and b, then
o the total area under the curve between a and b is 1
o f(x) ≥ 0 for all x between a and b
o the higher f(x) is, the more likely x is.

Continuous Distributions
[Figure: probability density function (pdf) f(x) between a and b; Area = P(a ≤ x ≤ b) = 1]
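The expected value and variance of a discrete distribution follow directly from their definitions as probability-weighted sums. The sketch below assumes the color-TV probabilities are .012, .319, .374, .191, .076 and .028 for x = 0 to 5, a reading reconstructed so that the probabilities sum to 1 and the mean comes out at 2.084.

```python
import math

# Assumed distribution of the number of color TVs per household.
pmf = {0: 0.012, 1: 0.319, 2: 0.374, 3: 0.191, 4: 0.076, 5: 0.028}

# Expected value: possible values weighted by their probabilities.
mean = sum(x * p for x, p in pmf.items())

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = math.sqrt(variance)

# Cumulative probability P(X <= 2).
p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)

print(f"E(X) = {mean:.3f}, Var(X) = {variance:.3f}, Stdev(X) = {std_dev:.3f}")
print(f"P(X <= 2) = {p_at_most_2:.3f}")
```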

Continuous Distributions
[Figure: probability density function (pdf) f(x) between a and b, with the area between c and d shaded; Area = P(c ≤ x ≤ d)]

Continuous Distributions
Uniform distribution:
$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
E(x) = ?

Normal Distribution
Normal distribution is the most important distribution in statistics.
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
It has two parameters:
μ: mean
σ: standard deviation

Normal Distribution
[Figures: changing the value of μ; increasing or decreasing μ shifts the curve.]

Normal Distribution
[Figures: changing the value of σ; increasing or decreasing σ changes the spread of the curve.]

Normal Distribution
X ~ N(μ,σ)
P(a ≤ X ≤ b) = ?
P(a ≤ X ≤ b) = area under the curve between a and b

Normal Distribution
X ~ N(μ,σ)
P(X ≤ b) = area to the left of b. Excel function: NORMDIST(b, μ, σ, 1)
P(X ≤ a) = area to the left of a. Excel function: NORMDIST(a, μ, σ, 1)

Normal Distribution
X ~ N(μ,σ)
P(a ≤ X ≤ b) = (area to the left of b) - (area to the left of a)
Excel function: NORMDIST(b, μ, σ, 1) - NORMDIST(a, μ, σ, 1)

Normal Distribution
Area under the curve:
X within one σ of μ: .6827
X within two σ of μ: .9545
X within three σ of μ: .9973
[Figure: normal curve with μ-3σ, μ-2σ, μ-σ, μ, μ+σ, μ+2σ, μ+3σ marked.]

Normal Distribution
Let X ~ N(μ,σ). A linear function of X is still normally distributed:
Y = aX + b
Then Y ~ N(aμ + b, |a|σ)

Example 1
A factory produces cm-long plastic pipes which have a target diameter of 5 mm. The product is designated acceptable quality if this thickness is between 4.5 and 5.5 mm. The production process current output has N(5, .). What is the defective rate? What if they can reduce σ to . mm?

Example 2
Suppose you must establish regulations concerning the maximum number of people who can occupy a lift. You know that the weight of a person chosen at random follows a normal distribution with a mean of 7kg and a standard deviation of 5kg. If the likelihood that the total weight exceeds 5kg is required to be less than %, what is the maximum capacity for the lift (in terms of persons)?

Normal Distribution
x = NORMINV(cumulative probability, μ, σ)
In order to generate a random number coming from N(μ,σ), we use NORMINV(cumulative probability, μ, σ) with a random cumulative probability.
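The NORMDIST and NORMINV calculations have close counterparts in Python's standard library. The sketch below uses `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORMDIST(x, μ, σ, 1) and NORMINV. For the pipe example, the limits 4.5 and 5.5 mm around a 5 mm target are read from the slide, but the value σ = 0.25 mm is purely hypothetical because the slide's σ is not legible.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")

# Pipe example with a hypothetical sigma of 0.25 mm (not from the slides).
pipe = NormalDist(mu=5.0, sigma=0.25)
acceptable = pipe.cdf(5.5) - pipe.cdf(4.5)  # P(4.5 <= X <= 5.5)
defective_rate = 1 - acceptable
print(f"defective rate: {defective_rate:.4f}")

# Inverse lookup, analogous to NORMINV(0.975, 0, 1).
z_975 = std_normal.inv_cdf(0.975)
print(f"97.5th percentile of the standard normal: {z_975:.3f}")
```

With this hypothetical σ the acceptance limits sit exactly two standard deviations from the target, so the defective rate equals the area outside two σ. Halving σ would push the limits out to four standard deviations and make defects rare.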

Example 3
The lifetime of a certain manufacturer's washing machine is normally distributed with mean 4 years. Only 5% of all these washing machines last at least 5 years. What is the likelihood that this manufacturer's washing machines break down within the first year?

Example 4
Consider two retailers. Demand of each retailer is normally distributed with mean units and standard deviation units. Each retailer sets its inventory such that there is a % risk of a stockout. Suppose these two retailers consider keeping a joint inventory and set the inventory such that there is a % risk of a stockout. How much will they save (if any)?

Sampling and Estimation
Parameters describe populations. Parameters are almost always unknown.
We take a random sample of a population to obtain the necessary data.
We calculate one or more statistics from the data.
An inference is a statement about a parameter of a population. Inferential statistics concern generalizing from a sample to a population.
A critical part of inferential statistics involves determining how far sample statistics are likely to vary from each other and from the population parameter.

Key Terms in Sampling
Point estimate is a single numeric value, a best guess of a population parameter, based on the data in a sample.
The estimation error is the difference between the point estimate and the true value of the population parameter being estimated.
An interval estimate (or confidence interval) is an interval around the point estimate, where we strongly believe the true value of the population parameter lies.

Key Terms in Sampling
The sampling distribution of any point estimate is the distribution of the point estimates we would see from all possible samples (of a given sample size) from the population.
The standard error of an estimate is the standard deviation of the sampling distribution of the estimate.

Sampling Distributions
Three balls are numbered 1, 2 and 3. Select two of the balls randomly (with replacement) and compute the average of their numbers.

Outcome    Ball 1    Ball 2    Mean
1          1         1         1
2          1         2         1.5
3          1         3         2
4          2         1         1.5
5          2         2         2
6          2         3         2.5
7          3         1         2
8          3         2         2.5
9          3         3         3

Sampling Distributions

Mean    Frequency    Relative Frequency
1.0     1            .111
1.5     2            .222
2.0     3            .333
2.5     2            .222
3.0     1            .111

Central Limit Theorem
[Figure: population distribution and the distribution of the sample mean, centered at μ with limits $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$. For a series of samples, the interval from $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$ is drawn around each sample mean. Does this interval contain μ?]
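The sampling distribution in the ball example is small enough to enumerate exhaustively. This is a minimal sketch assuming three balls numbered 1, 2 and 3 and two draws with replacement; exact fractions are used so the relative frequencies are not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]

# All ordered pairs of draws with replacement, each equally likely.
outcomes = list(product(balls, repeat=2))
means = [Fraction(first + second, 2) for first, second in outcomes]

# Frequency and relative frequency of each possible sample mean.
frequency = Counter(means)
relative = {m: Fraction(count, len(outcomes)) for m, count in frequency.items()}

for m in sorted(frequency):
    print(float(m), frequency[m], round(float(relative[m]), 3))
```

The most likely sample mean is 2, which is also the population mean of the three balls, and the distribution is symmetric around it.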

Central Limit Theorem
[Figure, continued: further samples, each with the interval from $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$ compared against μ. Does this interval contain μ?]

Confidence Interval for the Mean
$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$
The above interval contains the population mean with 95% probability: it is the 95% confidence interval for the population mean.

Sampling Distributions
The sampling distribution of the sample mean when the population standard deviation is unknown:
replace σ with s
the sampling distribution is not normal anymore!
the sampling distribution is a t-distribution with n - 1 degrees of freedom.

Sampling Distributions
[Figures: standard normal density compared with t densities with 3 df and 5 df.]
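The claim that the interval contains μ about 95% of the time can be checked by simulation. The sketch below repeatedly draws samples from a normal population with known σ and counts how often the interval around the sample mean covers μ. The population values (μ = 50, σ = 10), the sample size, the number of trials and the seed are all hypothetical choices made for illustration.

```python
import random

def coverage(mu=50.0, sigma=10.0, n=30, trials=4000, seed=1):
    """Fraction of simulated samples whose 95% interval contains mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # 1.96 standard errors
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(f"observed coverage: {coverage():.3f}")
```

The observed fraction should land close to 0.95, with some run-to-run variation that shrinks as the number of trials grows.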

Sampling Distributions
[Figure: t density with 3 df.]

Confidence Interval for the Mean
Confidence interval for the population mean is:
$\bar{X} \pm t\text{-multiple} \times SE(\bar{X})$
90% CI: t-multiple = 1.699 for n = 30
95% CI: t-multiple = 2.045 for n = 30
99% CI: t-multiple = 2.756 for n = 30
For a 100(1-α)% CI, t-multiple in Excel: TINV(α, n-1)

Example
The operations manager of a breakfast cereal company would like to know the average weight of a box of cereal coming out of a particular production line.
Construct a 95% CI for the mean with observations in the sample.
Construct a 90% CI for the mean with observations in the sample.
Construct a 95% CI for the mean with observations in the sample.
Construct a 90% CI for the mean with observations in the sample.
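A confidence interval of this form can be sketched in Python once a t-multiple is available. The function below takes the t-multiple as an input (for example 2.045 for a 95% interval with n = 30, as listed above) rather than computing it, since the standard library has no t-distribution. The box weights are hypothetical illustration data, not values from the cereal example.

```python
import math
import statistics

def t_confidence_interval(sample, t_multiple):
    """Return (low, high) for x_bar +/- t_multiple * SE(x_bar)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation (n - 1)
    standard_error = s / math.sqrt(n)  # SE of the sample mean
    margin = t_multiple * standard_error
    return x_bar - margin, x_bar + margin

# Hypothetical weights (grams) of 30 cereal boxes.
weights = [368 + (i % 5) for i in range(30)]

low_95, high_95 = t_confidence_interval(weights, 2.045)  # 95% CI, n = 30
low_90, high_90 = t_confidence_interval(weights, 1.699)  # 90% CI, n = 30

print(f"95% CI: ({low_95:.2f}, {high_95:.2f})")
print(f"90% CI: ({low_90:.2f}, {high_90:.2f})")
```

The 90% interval is narrower than the 95% interval for the same data, since a smaller t-multiple buys less confidence.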