Probability for Computer Scientists

Similar documents
ECE302 Spring 2006 HW1 Solutions January 16,

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University

Section 6.2 Definition of Probability

Chapter 4 Lecture Notes

Statistics 100A Homework 2 Solutions

Probability. Sample space: all the possible outcomes of a probability experiment, i.e., the population of outcomes

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation

Random variables, probability distributions, binomial random variable

6.3 Conditional Probability and Independence

E3: PROBABILITY AND STATISTICS lecture notes

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

Probability & Probability Distributions

Chapter 4 & 5 practice set. The actual exam is not multiple choice nor does it contain like questions.

STAT 35A HW2 Solutions

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Definition and Calculus of Probability

The study of probability has increased in popularity over the years because of its wide range of practical applications.

Fundamentals of Probability

Introduction to Probability

Lecture Note 1 Set and Probability Theory. MIT Spring 2006 Herman Bennett

Lecture 6: Discrete & Continuous Probability and Random Variables

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

CONTINGENCY (CROSS- TABULATION) TABLES

WHERE DOES THE 10% CONDITION COME FROM?

Session 8 Probability

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. Part 3: Discrete Uniform Distribution Binomial Distribution

Statistics 100A Homework 8 Solutions

Sums of Independent Random Variables

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION

Statistics 100A Homework 4 Solutions

Ch. 13.3: More about Probability

EXAM. Exam #3. Math 1430, Spring April 21, 2001 ANSWERS

Joint Exam 1/P Sample Exam 1

Problem sets for BUEC 333 Part 1: Probability and Statistics

Lecture 7: Continuous Random Variables

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)

ECE302 Spring 2006 HW4 Solutions February 6,

Lesson Plans for (9 th Grade Main Lesson) Possibility & Probability (including Permutations and Combinations)

Find the indicated probability. 1) If a single fair die is rolled, find the probability of a 4 given that the number rolled is odd.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

Math 141. Lecture 2: More Probability! Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

ST 371 (IV): Discrete Random Variables

Chapter 5 A Survey of Probability Concepts

Master s Theory Exam Spring 2006

Math/Stats 342: Solutions to Homework

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

36 Odds, Expected Value, and Conditional Probability

Formula for Theoretical Probability

Section 6-5 Sample Spaces and Probability

MA 1125 Lecture 14 - Expected Values. Friday, February 28, Objectives: Introduce expected values.

Worksheet A2 : Fundamental Counting Principle, Factorials, Permutations Intro

A probability experiment is a chance process that leads to well-defined outcomes. 3) What is the difference between an outcome and an event?

Chapter 5. Discrete Probability Distributions

AP Statistics 7!3! 6!

How To Solve The Social Studies Test

Lesson 1. Basics of Probability. Principles of Mathematics 12: Explained! 314

Lab 11. Simulations. The Concept

ECON1003: Analysis of Economic Data Fall 2003 Answers to Quiz #2 11:40a.m. 12:25p.m. (45 minutes) Tuesday, October 28, 2003

Worksheet for Teaching Module Probability (Lesson 1)

Bayesian Tutorial (Sheet Updated 20 March)

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Probability Generating Functions

Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Random Variables. Chapter 2. Random Variables 1

Combinatorial Proofs

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Statistics 100A Homework 3 Solutions

AP Stats - Probability Review

Basic Probability Concepts

calculating probabilities

Curriculum Design for Mathematic Lesson Probability

Introductory Probability. MATH 107: Finite Mathematics University of Louisville. March 5, 2014

ECE302 Spring 2006 HW3 Solutions February 2,

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

A Tutorial on Probability Theory

Remarks on the Concept of Probability

A Few Basics of Probability

7.S.8 Interpret data to provide the basis for predictions and to establish

( ) is proportional to ( 10 + x)!2. Calculate the

( ) = 1 x. ! 2x = 2. The region where that joint density is positive is indicated with dotted lines in the graph below. y = x

Section 6.1 Discrete Random variables Probability Distribution

If A is divided by B the result is 2/3. If B is divided by C the result is 4/7. What is the result if A is divided by C?

Chapter 5 Section 2 day f.notebook. November 17, Honors Statistics

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

How To Understand And Solve A Linear Programming Problem

M2S1 Lecture Notes. G. A. Young ayoung

Statistics in Geophysics: Introduction and Probability Theory

REPEATED TRIALS. The probability of winning those k chosen times and losing the other times is then p k q n k.

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Basic Probability Theory II

Math 55: Discrete Mathematics

Responsible Gambling Education Unit: Mathematics A & B

6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE

Acing Math (One Deck At A Time!): A Collection of Math Games. Table of Contents

Review for Test 2. Chapters 4, 5 and 6

An Introduction to Basic Statistics and Probability

Friday, January 29, :15 a.m. to 12:15 p.m., only

Transcription:

Probability for Computer Scientists This material is provided for the educational use of students in CSE2400 at FIT. No further use or reproduction is permitted. Copyright G.A.Marin, 2008, All rights reserved. Applied Statistics: Probability 1-1

Motivation: Here are some things I ve done with prob/stat in my career: Helped Texaco analyze seismic data to find new oil reserves. Helped Texaco analyze the bidding process for offshore oil leases. Helped US Navy search for submarines and develop ASW tactics. Helped US Navy evaluate capabilities of future forces. Helped IBM analyze performance of computer networks in health, airline reservation, banking, and other industries. Evaluated new networking technologies such as IBM s Token Ring, ATM, Frame Relay Analyzed meteor burst communications for US government (and weather and earthquakes...) Applied Statistics: Probability 1-2

Motivation: I never could have imagined the opportunities I would have to use this material when I took a course like this one. You can t either. Mostly I cannot teach about all of these problem areas because you must master the basics (this course) first. It is the key. This takes steady work. Have in mind about 6-8 hours per week (beginning this week!). 20 hours after 3 weeks is nowhere as good. If you begin now and learn as you go, you ll likely succeed. If you don t do anything, say, until the night before a test or the day before an assignment is due, you will not be likely to succeed. Now is the time to commit. Applied Statistics: Probability 1-3

Suggestions on how to study Do NOT bring a computer to class. It distracts you. Watch, listen and take notes the old fashioned way (pencil and paper). For example, write: slide 7 geom series?? Writing notes/questions actually helps you pay attention. As soon after class as possible review the slides covered in class and your class notes. Do not go to the next slide until you understand everything on the current slide. Can t understand something? Ask next class period or come to my office hours. In the words of Jerry McGuire: Help me to help you. Answer all points you raised in your notes. Look things up on the web. Read the material in the text that corresponds to the slides covered. The syllabus gives you a good indication of where to look. Work a few homework problems each night (when homework is assigned). Can t work one or two of these problems? Come to my office hours. Help me to help you. It is much, much worse to come two days before 15 problems are due and say I can t do any of these. I will help you with some of the work, but it s like trying to learn a musical instrument just by watching me play it.) Applied Statistics: Probability 1-4

Introduction to Mathcad Mathcad is computational software by Mathsoft Engineering and Education, Inc. It includes a computational engine, a word processor, and graphing tools. You enter equations almost as you would write them on paper. They evaluate instantly. You MUST use Mathcad for all homework in this class. Send your mathcad worksheet to gmarin@fit.edu BEFORE class time on the due date. I could not get a Mathcad terminal is NO EXCUSE. Name the worksheet HomeworknFirstnameLastname.xmcd The letter n represents the number of the assignment. The type.xmcd is assigned automatically. If you do not follow the instructions, you will receive a zero for the homework assignment. Mathcad is available (at no charge) in Olin EC 272. Tutorial and help are included with the software; resources are also available at mathcad.com. Mathcad OVERVIEW is next. Note: Homework 1 is on the web site. Applied Statistics: Probability 1-5

Review: Finite Sums of Constants n n n-1 1 = n 1 = n+ 1 1 = n i= 1 i= 0 i= 0 n n n-1 c= cn c= c( n+ 1) c= cn, where c is any constant. i= 1 i= 0 i= 0 Example: 100 i= 1 9= 900. Applied Statistics: Probability 1-6

Review: Gauss Sum When Carl Freidrich Gauss, 1777 1855, was in the 3 rd grade he was asked to compute the sum 1+2+3+ +100. He quickly obtained the following formula: Young Gauss reasoned that if he wrote two of the desired sums as follows: S=1 + 2 + 3 + + 99 + 100 S=100+ 99 + 98 + + 1 + 1 then clearly 100 2S = 101 = 100(101) and S = i= 1 n n( n+ 1 ) In general, k =. 2 k = 1 100(101) 2 = 5, 050. Applied Statistics: Probability 1-7

Definition of Infinite Series Sum Begin with the series S = a and define the partial sums S = a, for n= 1,2,... i n i i= 1 i= 1 The series S is said to converge to s if lim S = s. i=0 n i Example: The geometric series p has the partial sums S 1-p 1 p n+ 1 2 n By division we know that = 1 + p+ p +... + p = Sn. n n n i = p, for n= 1,2,... n+ 1 1-p By definition the sum of the infinite geometric series is lim Sn = lim. n n 1 p 1 If p < 1, then the above limit is. 1 p n i= 1 Applied Statistics: Probability 1-8

Review: Gauss Related Series We obtain an infinite "series" when we wish to "add" infinitely many values. From the sum of constants we might consider the sum 1. From calculus we know that we cannot obtain a finite sum from this series and that it "diverges" to. However, if we multiply each term k k 1 by z we have the "geometric" series z = provided z < 1. 1 z Take the derivative of boths sides to obtain the Gauss series k= 1 kz 1 k 1 = z < 2 ( 1 z), for 1. k = 0 k=0 Applied Statistics: Probability 1-9

Applied Statistics: Probability 1-10 Try these 10 0 10 2 0 5 1 ( ) 2 1 ( ) 2 1 () 2 1 ( ) 2 i i i i i i i i a b c d = = = =

Power Series Definition: A power series is a function of a variable (say z) and has the form i= 0 ( ) i Pz ( ) = c z a, where the care constants and ais also a constant. In this i form the series is said to be "centered" at the value a. i Convergence: One of the following is true: Either (1) the power series converges only for z = a, Or (2) the power series converges absolutely for all z, Or (3) there is a positive number R such that the series converges absolutely for z a < R and diverges for z a > R. R is called the radius of convergence. Applied Statistics: Probability 1-11

Ratio Test (Good tool for finding radius of convergence.) Color red means not on test! k + 1 Consider the series ak and suppose that lim = ρ > 0 (or infinite). k k=0 ak If ρ<1, then the series converges absolutely. If ρ>1 or ρ =, the series diverges. 1 x Example: Determine where the series converges. k = 1 k 3 k a 1 x a k + 1 3 k x k x x k + 1 The ratio = =. The limit lim = ; k a 13 k k k+ k+ 13 3 1 x k 3 k x x thus, ρ =. It follows that the series converges for < 1 x < 3, and 3 3 the series diverges for x > 3. Applied Statistics: Probability 1-12

Review: Other Useful Series Exponential function: The sum of squares: Binomial sum: Geometric sum: n k = 0 z k = 1 z 1 z n+ 1 Applied Statistics: Probability 1-13

Review: Binomial Coefficients Pascal s triangle: n=0 n=1 n=2 n=3 n=4 1 11 121 1331 14641 3 2 3 3 3 3 + = + + + = + + + 0 1 2 3 ( ) 3 3 2 2 3 3 2 2 3 x y x xy xy y x 3xy 3xy y where k n n! n n( n 1) ( n k+ 1) = = = k k! ( n k)! k! k( k 1) 1 Applied Statistics: Probability 1-14

Floors and Ceilings As usual we define x = the greatest integer less than or equal to x where x is any real number. Similarly we define x = the smallest integer greater than or equal to x. Thus, 2.3 = 2 while 2.3 = 3. Also, -7.6 = 8 while -7.6 = 7. Only when x is an integer do we have equality of these functions: x = x= x. In fact, when x is not an integer, then x x = 1. Applied Statistics: Probability 1-15

Logarithms log 10 Conversion formula: log ba =. log10 b Any base may be used in place of 10. a Applied Statistics: Probability 1-16

Log Function (Base 10) Applied Statistics: Probability 1-17

Log(50) Base x Applied Statistics: Probability 1-18

Logarithmic Identities Applied Statistics: Probability 1-19

Notation, notation, notation Learning math in many ways is like learning a new language. You may understand a concept but be unable to write it down. You may not be able to read notation that represents a simple concept. The only way to tackle this language barrier is through repetition until the notation begins to look friendly. You must try writing results yourself (begin with blank paper) until you can recall the notation that expresses the concepts. Copy theorems from class or from the text until you can rewrite them without looking at the material and you understand what you are writing. Probability and statistics introduce their own notation to be mastered. If you begin tonight, and spend time after every class, you can learn this new language. If you wait until the week before the test, it will be much like trying to learn Spanish, or French, or English in just a few nights (1 night??). It really can t be done. Applied Statistics: Probability 1-20

Math sentences Math problems, theorem, solutions... must be written in sentences that make sense when you read them (EVEN WHEN EQUATIONS ARE USED). You will notice that I am careful to do this to the greatest extent possible on these slides (even knowning that I will explain them in class). My observation is that most students have no idea how to do this. I often see solutions like the following on homework or test papers: Evaluate = = = 4 0 2 x 3 x 3 64. 3 2 xdx. The student writes something like The answer is right BUT every step is nonsense. The = sign means is equal to. In the first step, we don t know to what the equality refers. The equality is simply wrong in the next two steps. Instead write: 4 3 2 x 4 64 64 xdx= = 0 =. 3 3 3 0 0 Applied Statistics: Probability 1-21

Permutations and Combinations Suppose that we have objects,,...,. n O1 O2 O n A permutation of order k is an "ordered" selection of k of these for 1 k n. A combination of order k is an "unordered" selection of k of these. ( ) n Common notation: P n, k or P = n( n 1) ( n k+ 1) = n k n n n C( n, k) = Ck = =. k k! Example: Given the 5 letters a,b,c,d,e how many ways can we list 3 of the 5 when order is important? Answer: P =5 =5*4*3=60. 5 3 3 Note that each choice of 3 letters (such as a,c,e) results in 6 different results: ace, aec, cae, cea, eac, eca... Example: Given the 5 letters above how many ways can we choose 3 of the 5 when order is NOT important? 3 5 5 5*4*3 k Answer: = = = 10. In this case we have the 60 that result when we care about order 3 3! 3*2*1 divided by 6 (the number of orderings of 3 fixed letters). k Applied Statistics: Probability 1-22

Definition k n = n ( n 1) ( n k+ 1) for any positive integer n and for integers k such that 1 k n. This symbol is pronouced " n to the k falling." 3 Examples: 6 = 6 5 4 = 120. 6 3 is not defined (for our 5 5 5! = purposes). k n k n n Again: Pk = n and =. k k! Applied Statistics: Probability 1-23

Permutations of Multiple Types The number of permutations of n= n + n + + n objects of which n are of 1 2 2 1 2 r 1 one type, n are of a second type,..., and n are of an rth type is n!. n! n! n! r r Example: Suppose we have 2 red buttons, 3 white buttons, and 4 blue buttons. How many different orderings (permutations) are there? 9! Answer: 1260. 2!3!4! = Applied Statistics: Probability 1-24

Try These There are 12 marbles in an urn. 8 are white and 4 are red. The white marbles are numbered w1,w2,...,w8 and the red ones are numbered r1,r2,r3,r4. For (a) - (d): Without looking into the urn you draw out 5 marbles. 5 (a) How many unique choices can you get if order matters? 12 = 95,040 12 (b) How many unique choices can you get if order does not matter? = 792 5 (c) How many ways can you choose 3 white marbles and 2 red marbles if order matters? You will fill 5 "slots" by drawing. First determine which two slots (positions) will be occupied by 2 red marbles: 3 2 multiply by orderings of 3 white and 2 red: 10 8 4 40,320. 5 = 10. Next 2 (d) How many ways can you choose 3 white marbles and 2 red marbles if 8 4 order does not matter? = 336 3 2 (e) How many marbles must you draw to be sure of getting two red ones? 10 i i = Applied Statistics: Probability 1-25

Complex Combinations How many ways are there to create a full house (3-of-akind plus a pair) using a standard deck of 52 playing cards? 13 4 12 4 = 13i4i12i6 = 3,744. 1 3 1 2 (choose denomination)x(choose 3 of 4 of given denomination)x(choose one of the remaining denominations)x(choose 2 of 4 of this second denomination). This follows from the multiplication principle (Theorem 2.3.1 in text). Applied Statistics: Probability 1-26

Try these n n Suppose =. What is n? 11 7 18 18 Suppose =. What is r? r r 2 Applied Statistics: Probability 1-27

Examples* Consider a machining operation in which a piece of sheet metal needs two identical diameter holes drilled and two identical size notches cut. We denote a drilling operation as d and a notching operation as n. In determining a schedule for a machine shop, we might be interested in the number of different possible sequences of the four operations. The number of possible sequences for two drilling operations and two notching operations is 4! 2!2! = 6 The six sequences are easily summarized: ddnn, dndn, dnnd, nddn, ndnd, nndd. *Applied Statistics and Probability for Engineers, Douglas C. Montgomery, George C. Runger, John Wiley & Sons, Inc. 2006 Applied Statistics: Probability 1-28

Example* A printed circuit board has eight different locations in which a component can be placed. If five identical components are to be placed on the board, how many different designs are possible? Each design is a subset of the eight locations that are to contain the components. The number of possible designs is, therefore, 3 8 8 8 8*7*6 = = = = 5 3 3! 3*2*1 56. *Applied Statistics and Probability for Engineers, Douglas C. Montgomery, George C. Runger, John Wiley & Sons, Inc. 2006 Applied Statistics: Probability 1-29

Sample Space Definition: The totality of the possible outcomes of a random experiment is called the Sample Space, Ω. Finite Countable { } Outcome from one roll of one die Ω= 1, 2,3, 4,5,6. The number of attempts until a message is transmitted successfully when the probability of success on any one attempt is p + Ω= { 1, 2,3, 4,5,6,... } =. Continuous (We begin with the discrete cases.) The time (in seconds) until a lightbulb burns out { t t } Ω= : 0, where is the set of all real numbers. Applied Statistics: Probability 1-30

Events Definition: An event is a collection of points from the sample space. Example: the result of one throw of die is odd. We use sets to describe events. From the die example let the set of "even" outcomes be E = 2, 4,6. { } { } Let the set of "odd" outcomes be O = 1,3,5. If Ω is finite or countable, then a simple event is an event that contains only one point from the sample space. For the die example the simple events are S1 = { 1 }, S2 = { 2 },..., S6 = { 6 }. Suppose we toss a coin until first Head appears. What are the simple events? Unless stated otherwise, ALL SUBSETS of a sample space are included as possible events. (Generally we will not be interested in most of these, and many events will have probability zero.) Applied Statistics: Probability 1-31

Describe the sample space and events Each of 3 machine parts is classified as either above or below spec. At least one part is below spec. An order for an automobile can specify either an automatic or standard transmission, premium or standard stereo, V6 or V8 engine, leather or cloth interior, and colors: red, blue, black, green, white. Orders have premium stereo, leather interior, and a V8 engine. Applied Statistics: Probability 1-32

Describe: sample space and events The number of hours of normal use of a lightbulb. Lightbulbs that last between 1500 and 1800 hours. The individual weights of automobiles crossing a bridge measured in tons to nearest hundredth of a ton. Autos crossing that weigh more than 3,000 pounds. A message is transmitted repeatedly until transmission is successful. Those messages transmitted 3 or fewer times. Applied Statistics: Probability 1-33

Operations on Events Because the sample space is a set, Ω, and any event is a subset A Ω, we form new events from existing events by using the usual set theory operations. A B Both A and B occur. A B At least one of A or B occurs. A A does not occur. S A= S A S occurs and A does not occur. the empty set (a set that contains no elements). A B= A and B are "mutually exclusive." A B Every element of A is an element of B, or, if A occurs, B occurs. Review Venn diagrams (in text). Applied Statistics: Probability 1-34

Example Four bits are transmitted over a digital communications channel. Each bit is either distorted or received without distortion. Let denote the event that the ith bit is distorted, i = 1,2,3,4. (a) Describe the sample space. A i (b) What is the event A? 1 (c) What is the event A A? 1 2 (c) What is the event A A? 1 2 (d) What is the event A? 1 Applied Statistics: Probability 1-35

Venn Diagrams Identify the following events: A B ( a) A ( b) A B () c ( A B) C ( ) ( d) B C C (e) A B C ( ) Applied Statistics: Probability 1-36

Mutually Exclusive & Collectively Exhaustive A, A,... A collection of events 1 2 is said to be mutually exclusive if A { φ if i j i Aj = A= A if i= j. A collection of events is collectively exhaustive if A =Ω. i i A collection of events forms a partition of if they are mutually exclusive and collectively exhaustive. A collection of mutually exclusive events forms a partition of an event if i j Ω E A = E. i i Applied Statistics: Probability 1-37

Partition of Ω A1 2 A A n 1 A n The sets are "events." No two of them intersect (mutually exclusive) and A i their union covers the entire sample space. Applied Statistics: Probability 1-38

Probability measure We use a probability measure to represent the relative likelihood that a random event will occur. The probability of an event A is denoted P( A). Axioms: A1. For every event A, P( A) 0. A2. P ( Ω ) = 1. A3. If A and B are mutually exclusive, then P(A B)=P(A)+P(B). A4. If the events A, A,... are mutually exclusive, then 1 2 P An = P( An). n = 1 n = 1 Applied Statistics: Probability 1-39

Theorem: Given a sample space, Ω, a "well-defined" collection of events, F, and a probability measure, P, defined on these events then the following hold: ( ) = [ ] = F [ ] [ ] [ ] [ ] [ ] [ ] F (a) P 0. (b) P A 1 P A, A. (c) P A B = P A + P B P A B, A, B F. (d) A B P A P B, A, B. You must know these and be able to use them to solve problems. Don t worry about proving them. Applied Statistics: Probability 1-40

Applying the Theorem We roll 1 die and obtain one of the numbers 1 through 6 with equal probability. (a) What is the probability that we obtain a 7? The event we want is ; thus, the probability is 0. (b) What is the probability that we do NOT get a 1? 5 The event we want is {} 1 or Ω {} 1, and P {} 1 = 1 P {} 1 =. 6 (c) What is the probability that we get a 1 or a 3? 1 1 1 P {} 1 {} 3 = P {} 1 + P {} 3 = + =. 6 6 3 (d) If E = 1, 4,5, 6, and E G, what might the event G be? { } { } { } { } G = 1, 2, 4, 5, 6, G = 1, 3, 4, 5, 6, G = 1, 2, 3, 4, 5, 6, or G = E. [ ] P[ G] Note that in all of these cases P E. Applied Statistics: Probability 1-41

Assigning Discrete Probabilities When there are exactly n possible outcomes of an experiment, x, x,..., x then i= 1 ( ) ( ) ( ) 1 2 the assigned probabilities, p x, i = 1,2,... n, must satisfy the following: (1) 0 p x 1, i = 1, 2,..., n. n (2) p x = 1. i i i n If all of the outcomes have equal probability, then each p x i ( ) 1 probability of any particular outcome on the roll of a fair die is. 6 Suppose, however, we have a biased die and the probability of a 4 1 = ; thus, the n is 3 times more likely than the probability of any other outcome. This implies that p( 1) = p( 2) = p( 3) = p( 5) = p( 6 ) = a (for example) and p( 4) = 3 a. 1 1 3 It follows that 8a = 1 a=. Thus, p() i =, i 4, and p(4) =. 8 8 8 Applied Statistics: Probability 1-42

Exercises: Suppose Bigg Fakir claims that by clairvoyance he can tell the numbers of four cards numbered 1 to 4 that are laid face down on a table. (He must choose all 4 cards before turning any over.) If he has no special powers and guesses at random, calculate the following: (a) the probability that Bigg gets at least one right (b) the probability he gets two right (c) the probability Bigg gets them all right. Note: assume that Bigg must guess the value of each card before looking at any of the cards. Hint: What is the sample space? How are the probabilities assigned? Applied Statistics: Probability 1-43

Bigg Fakir Solution The sample space: 1234 2134 3124 4123 1243 2143 3142 4132 1324 2314 3214 4213 1342 2341 3241 4231 1423 2413 3412 4312 1432 2431 3421 4321 Let R = Number of cards Bigg gets right. [ ] (a) We seek P R 1. Suppose that Bigg chooses 1234 (it does not matter). There is at least one match in the entire first column. There are three matches in each of the remaining columns. 15 5 Thus, there are 15 with at least one match and P[ R ] = =. 24 8 Try the other problems. Applied Statistics: Probability 1-44

Complex Combinations How many ways are there to create a full house (3-of-akind plus a pair) using a standard deck of 52 playing cards? 13 4 12 4 = 13i4i12i6 = 3,744. 1 3 1 2 (choose denomination)x(choose 3 of 4 of given denomination)x(choose one of the remaining denominations)x(choose 2 of 4 of this second denomination). This follows from the multiplication principle (Theorem 2.3.1 in text). Applied Statistics: Probability 1-45

What is the probability of a full house? In discrete problems we interpret probability as a ratio: number of successful outcomes total number of outcomes. In this case the number of successful outcomes is 5 52 52 52i51i50i49i48 = successes. successes + failures the number of ways to get a full house (3,744). The total number of outcomes is: = = = 2,598,960. 5 5! 54321 i i i i 3,744 Thus, the probability of getting a full house is 2,59 8,960 0.00144 This is an example of a hypergeometric distribution; we'll study this soon. A full house happens about once in every 694 hands! This is why people invented wild cards. = Applied Statistics: Probability 1-46

Conditional Probability The conditional probability of given that B has occurred is PA ( B) PAB ( ) =, provided PB ( ) 0. PB ( ) Q1: What is the probability of obtaining a total of 8 when rolling two dice? Q2: Suppose you roll two dice that you cannot see. Someone tells you that the sum is greater than 6. What is the probability that the sum is 8? A Applied Statistics: Probability 1-47

Dice Problem Let A be the event of getting 8 on the roll of two dice. Let B be the event that the sum of the two dice is greater than 6. The first question is Find P( A). Here is the sample space: ( )( )( )( )( 1, 6) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) (1,1) 1,2 1,3 1, 4 1,5 (2,1) 2,2 2,3 2, 4 2,5 2, 6 (3,1) 3,2 3,3 3, 4 3,5 3,6 (4,1) 4,2 4,3 4, 4 4,5 4, 6 (5,1) 5,2 5,3 5, 4 5,5 5, 6 (6,1) 6,2 6,3 6, 4 6,5 6, 6 Sum=8. 5 PA= ( ). 36 because we are assuming that each outcome pair has the same probability, 1/36. Applied Statistics: Probability 1-48

Dice Problem (conditional) Here we roll the dice and learn that the sum is greater than 6. Let represent the event that the sum is greater than 6. With this knowledge the sample space becomes the following: B ( 1, 6) ( 2,5)( 2, 6) ( 3, 4)( 3,5)( 3,6 ) ( 4,3)( 4, 4)( 4,5)( 4, 6) ( 5,2)( 5,3)( 5,4)( 5,5)( 5,6) (6,1)( 6,2)( 6,3)( 6, 4)( 6,5)( 6, 6) 5 It follows that PAB ( ) =. 21 Alternatively, by definition of conditional probability, we have PA ( B) PA ( ) PAB ( ) = = because A B= A. PB ( ) PB ( ) 5 Note well! PA ( ) 36 5 Furthermore, = =. PB ( ) 21 21 So the definition makes sense! 36 Applied Statistics: Probability 1-49

Try this. A university has 600 freshmen, 500 sophomores, and 400 juniors. 80 of the freshmen, 60 of the sophomores, and 50 of the juniors are Computer Science majors. For this problem assume there are NO seniors. What is the probability that a student, selected at random, is a freshman or a CS major (or both)? If a student is a CS major, what is the probability he/she is a sophomore? Applied Statistics: Probability 1-50

Use these steps to solve previous slide 1. What is the sample space? 2. What are the events (subsets) of interest? 3. What are the probabilities of the events of interest? 4. What is the answer to the problem? Applied Statistics: Probability 1-51

Curtains Suppose you are shown three curtains and told that a treasure chest is behind one of the curtains. It is equally likely to be behind curtain 1, curtain 2, or curtain 3; thus, the initial probabilties are 1/3, 1/3, 1/3 for the treasure being behind each curtain. Now suppose that Monte Hall, who knows where the treasure is, shows you that the treasure is not behind curtain number 2. The probabilities become ½, 0, ½ right? If you were asked to choose a curtain at this point you would pick either curtain and hope for the best. 1 Initially we had P(1) = P(2) = P(3) =, where P( i) = probability that the prize 3 is behind curtain i. P(1 2) P(1) After Monte shows us curtain 2, we have P(1 2) = = P(2) P(2) 1 3 1 = = = P(3 2). 2 2 3 The formula works again! Note: using the conditional probabilities you do not have to enumerate the conditional sample space. Applied Statistics: Probability 1-52

Curtains Revisited We change the game as follows. Again the treasure is hidden behind one of three curtains. At the beginning of the game you pick one of the curtains - say 2. Then Monte shows you that the treasure is NOT behind curtain 3. Now he offers you a chance to switch your choice to curtain 1 or to stay with your original choice of 2. Which should you do? Does the choice make a difference? Applied Statistics: Probability 1-53

Here s the deal After you make your choice (but have not seen what's behind your curtain), the probabilities remain,,. Right? This means that the probability that 1 1 1 3 3 3 the treasure is behind your curtain is and 2 3 1 3 the probability that it is behind one of the other curtains is. Monte will NEVER show you the treasure; thus, even after he shows you one of the curtains, the probability that the 2 treasure is behind the curtain you did not choose is 3. It is twice as likely to be behind the other curtain so... you should always switch! Moral of the story: be REALLY careful about underlying assumptions about the sample space and how it changes to create the conditional sample space. You are generally assuming the changes are random. Maybe they are not. Applied Statistics: Probability 1-54

Alternate Form We have seen that the conditional probability of event A given that event B has occurred is: PA ( B) PAB ( ) =, provided PB ( ) 0. PB ( ) Clearly this implies that PA ( B) = PABPB ( ) ( ). This is referred to as the "multiplication rule," and holds even when PB ( ) = 0. Notice that we could also write PA ( B) = PB ( APA ) ( ). Both these equations always hold for any two events. But there is a special case where the conditional probabilities above are not needed. Note: memorize these conditional probability equations TODAY. They are extremely important. Applied Statistics: Probability 1-55

Independent Events Two events A and B are independent iff the probability PA ( B) = PAPB ()(). Example (dice) Q1: If one die is rolled twice, is the probability of getting a 3 on the first roll independent of the probability of getting a 3 on the second roll? Q2: If one die is rolled twice, is the probability that their sum is greater than 5 independent of the probability that the first roll produces a 1? Applied Statistics: Probability 1-56

Dice sample spaces (Q1) The sample space associated with one roll of a die: Ω = 1, 2,3, 4,5,6. Unless otherwise stated we assume the die is fair so that the probability of 1 any one of the simple events is. The sample space associated with two 6 rolls of one die (or with one roll of a pair of dice): ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( 5)( 3,6 ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) (1,1) 1,2 1,3 1,4 1,5 1,6 (2,1) 2,2 2,3 2, 4 2,5 2, 6 (3,1) 3,2 3,3 3, 4 3, (4,1) 4,2 4,3 4, 4 4,5 4, 6 (5,1) 5,2 5,3 5,4 5,5 5,6 (6,1) 6,2 6,3 6, 4 6,5 6, 6 1 1 Clearly P (3,3) =. P(3 on first roll) = P(3 on second roll) =. 36 6 1 1 1 Because =, the two events are independent. 6 6 36 1 Applied Statistics: Probability 1-57

Dice: Q2 1 The probability of getting a 1 on the first die is. Let G5 be the event that 6 the sum of the two dice is greater than 5 and F1 be the event that the first roll produces a 1. The sample space is: F1 G5 ( ) ( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) ( )( )( )( )( ) (1,1) 1,2 1,3 1, 4 1,5 1,6 (2,1) 2,2 2,3 2, 4 2,5 2, 6 (3,1) 3,2 3,3 3,4 3,5 3,6 (4,1) 4,2 4,3 4, 4 4,5 4, 6 (5,1) 5,2 5,3 5, 4 5,5 5, 6 (6,1) 6,2 6,3 6, 4 6,5 6, 6 1 1 13 13 P[ F1 G5] = P[ F1] P[ G5 ] = =. 18 6 18 108 Thus, these two events are NOT independent. 2 1 P[ F1 G5 ] = =. 36 18 26 13 PG [ 5 ] = =. 36 18 1 P[ F1 ] =. 6 Applied Statistics: Probability 1-58

Practice Quiz 1 Explain your work as you have been taught in class. 1. A university has 600 freshmen, 500 sophomores, and 400 juniors. 80 of the freshmen, 60 of the sophomores, and 50 of the juniors are Computer Science majors. For this problem assume there are NO seniors. If a student is a CS major, what is the probability that he/she is a Junior? 2. Evaluate i= 3 i 1. 4 3. What is the probability of drawing 2 pairs in a draw of 5 cards from a standard deck of 52 cards? (A pair is two cards of the same denomination such as two aces, two sixes, or two kings.) Applied Statistics: Probability 1-59

Multiplication and Total Probability Rules* Multiplication Rule *This slide from Applied Statistics and Probability for Engineers,3rd Ed,by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006 Applied Statistics: Probability 1-60

Multiplication and Total Probability Rules* *This slide from Applied Statistics and Probability for Engineers,3 rd Ed,by Douglas C. Montgomery and George C. Runger, John Wiley & Songs, Inc. 2006 Applied Statistics: Probability 1-61

Multiplication and Total Probability Rules* Total Probability Rule Partitioning an event into two mutually exclusive subsets. Partitioning an event into several mutually exclusive subsets. *This slide from Applied Statistics and Probability for Engineers,3rd Ed,by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006 Applied Statistics: Probability 1-62

Problem 2-97a A batch of 25 injection-molded parts contains 5 that have suffered excessive shrinkage. If two parts are selected at random, and without replacement, what is the probability that the second part selected is one with excessive shrinkage? S={pairs (f,s) of first-selected, secondselected taken from 25 total with 5 defects} SD={second selected (no replace) is a defect} FD={first selected is a defect} FN={first selected is not a defect}. Applied Statistics: Probability 1-63

Problem Solution We seek P[SD]=P[SD FD]+P[SD FN] This becomes PSD [ FNPFN ] [ ] + PSD [ FDPFD ] [ ] 5 4 4 5 1 1 1 = * + * = + = = 0.2. 24 5 24 25 6 30 5 Applied Statistics: Probability 1-64

Multiplication and Total Probability Rules* Total Probability Rule (multiple events) *This slide from Applied Statistics and Probability for Engineers,3rd Ed,by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006 Applied Statistics: Probability 1-65

Total Probability Example A semiconductor manufacturer has the following data regarding the effect of contaminants on the probability that chips fail. Probability of Failure Level of Contamination 0.1 High 0.01 Medium 0.001 Low In a particular production run 20% of the chips have high-level, 30% have medium-level, and 50% have low-level contamination. What is the probability that one of the resulting chips fails? Applied Statistics: Probability 1-66

Bayes Rule Suppose the events B1, B2,..., Bn form a partition of the event A, then it follows that PAB ( j) PB ( j) PB ( j A) =, PAB ( ) PB ( ) i i i provided PA ( ) 0. Applied Statistics: Probability 1-67

Exercise: Moon Systems, a manufacturer of scientific workstations, Produces its Model 13 System at sites A,B, and C, with 20%, 35%, and 45% produced at A,B, and C, respectively. The probability that a Model 13 System will be found Defective upon receipt by a customer is 0.01 if shipped From site A, 0.06 if from B, and 0.03 if from C. (a) What is the probability that a Model 13 System selected at random at a customer location will be defective? (b) If a customer finds a Model 13 to be defective, what is the probability that it was manufactured at site B? Applied Statistics: Probability 1-68

Solution (a) Let D be the event that a Model 13 is found to be defective at a customer site. [ ] = [ ] [ ] + [ ] [ ] + [ ] [ ] We want P D P D A P A P D B P B P D C P C, where A is the event that the Model 13 was shipped from plant A and the events B and C are defined analogously. This is a very important example of conditioning an event D on three events that partition D. An equation like this can always be written when a number of events partition an event in which we're interested. [ ] = [ ] [ ] { } (The general form would be P D P D A P A where A form a partition of the event D.) Substituting the given numbers we have: i= 1 [ ] ( )( ) ( )( ) ( )( ) P D n i i i i = 0.01 0.2 + 0.06 0.35 + 0.03 0.45 = 0.037, which answers (a). Applied Statistics: Probability 1-69

Solution (b) (b) If a customer finds a Model 13 to be defective, what is the probability that it was manufactured at site B? Now we seek PB ( D) = by Bayes Law. Substituting we get PB ( D) PD ( BPB ) ( ) P( D APA ) ( ) + PD ( BPB ) ( ) + PDCPC ( ) ( ) (0.06) (0.35) = (0.01) (0.2) + (0.06) (0.35) + (0.03) (0.45) = 0.575 Applied Statistics: Probability 1-70

Bernoulli Trials Consider an experiment that has two possible outcomes, success and failure. Let the probability of success be p and the probability of failure be q where p+q=1. Now consider the compound experiment consisting of a sequence of n independent repetitions of this experiment. Such a sequence is known as a sequence of Bernoulli Trials. The probability of obtaining exactly k successes in a sequence of n Bernoulli trials is the binomial probability ( ) k n k n k p( k) = p q. Note that the sum of the probabilities pk ( ) = 1. Thus they are said to form a probability distribution. k Applied Statistics: Probability 1-71

Probability Distribution { s1 s2 } { } { } When we take a discrete or countable sample space Ω=,,... and assign probabilities to each of the possible simple events: P( s ) = p, P( s ) = p,..., 1 1 2 2 we have created a probability distribution. (Think that you have "distributed" all of the probability over all possible events.) As an example, if I toss a coin 1 1 one time then PH ( ) = and PT ( ) = represents a probability distribution. 2 2 The single coin toss distribution also is an example of a Bernoulli trial because it has only two possible outcomes (generally called "success" or "failure). Applied Statistics: Probability 1-72

Binomial Probability Distribution The binomial probabilities are defined by pk ( ) = ( k ) n k n k, where p is the probability of success and q is the probability of failure in n Bernoulli pq trials. Suppose we toss a coin 10 times and we want the [ ] [ ] total number of heads. Then p= P H, q = P T, n= 10. Using the above formula we obtain the probabilities: Binomial n=10 p=0.5 3.00E-01 2.50E-01 2.00E-01 1.50E-01 probabilities 1.00E-01 5.00E-02 0.00E+00 0 1 2 3 4 5 6 7 8 9 10 Applied Statistics: Probability 1-73

Regarding Parameters Notice that the binomial distribution is completely defined by the formula ( k ) n k n k for its probabilities, pk ( ) = pq, and by it "parameters" pand n. The binomial probability equation never changes so we regard a binomial distribution as being defined by its parameters. This is typical of all probability distributions (using their own parameters, of course). One of the problems we often face in statistics is estimating the parameters after collecting data that we know (or believe) comes from a particular probability distribution (such as the p and n for the binomial). Alternatively, we may choose to estimate "statistics" such as mean and variance that are functions of these parameters. We'll get to this, after we consider random variables and the the continuous sample space. Applied Statistics: Probability 1-74

Example (from Trivedi*) Consider a binary communication channel transmitting coded words of n bits each. Assume that the probability of successful transmission of a single bit is p and that the probability of an error is q= 1 p. Assume also that the code is capable of correcting up to e errors, where e 0. If we assume that the transmission of successive bits is independent, then the probability of successful word transmission is: w [ or fewer errors in trials] P = P e n e n qp i n i =. i= 0 i Notice that a "success for the Binomial distribution" means getting an error, which has probability q. *Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2 nd Ed, Kishor S. Trivedi, J. Wiley & Sons, NY 2002 Applied Statistics: Probability 1-75

Example A communications network is being shared by 100 workstations. Time is divided into intervals that are 100 ms long. One and only one workstation may transmit during one of these time intervals. When a workstation is ready to transmit, it will wait until the beginning of the next 100ms time interval before attempting to transmit. If more than one workstation is ready at that moment, a collision occurs; and each of the k ready workstations waits a random amount of time before trying again. If k = 1, then transmission is successful. Suppose the probability of a workstation being ready to transmit is p. Show how probability of collision varies as p varies between 0 and 0.1. Applied Statistics: Probability 1-76

Practice Quiz 2 A partial deck of playing cards (fewer than 52 cards) contains some spades, hearts, diamonds, and clubs (NOT 13 of each suit ). If a card is drawn at random, then the probability that it is a spade is 0.2. We write this as P[Spade]=0.2. Similarly, P[Heart]=0.3, P[Diamond]=0.25, P[Club]=0.25. Each of the 4 suits has some number of face cards (King, Queen, Jack). If the drawn card is a spade, the probability is 0.25 that it is a face card. If it is a heart, the probability is 0.25 that it is a face card. If it is a diamond, the probability is 0.2 that it is a face card. If it is a club, the probability is 0.1, that it is a face card. 1. What is the probability that the randomly drawn card is a face card? 2. What is the probability that the card is a Heart and a face card? 3. If the card is a face card, what is the probability that it is a spade? Applied Statistics: Probability 1-77

Discrete Random Variables G. A. Marin Applied Statistics: Probability 1-78

Review of function Defn: A function is a set of ordered pairs such that no two pairs have the same first element (unless they also have the same second element). {( ) ( ) ( )} Example: g = 1,2, 3, 5, 5,12 defines a function, g, whose "domain" consists of the real numbers 1,3,5 and whose "range" consists of the numbers 2, 5,12. All functions are said to "map" values in their domain to values in their range. Example: f( x) 2 = x + 5. Here a function is defined using a formula. {( 2 ) } This actually implies the the function is f = x, x + 5 : x is a real number. Notice the following: (a) The function has a "name." Here that name is f. (b) The implied domain of the function includes all real numbers, x, that can be plugged into the formula. In this case that includes all real no's. (c) Every number x in the domain (all reals) is "mapped" to the number 2 2 x +5. Thus f(1) = 6, f( 5) = 30, f( π) = π +5. 2 (d) Sometimes we write this as 1 6, -5 30, +5. { x x } (e) The range of f is : 5. π π Applied Statistics: Probability 1-79

Random Variable Definition: A random variable on a sample space Ω is a function that assigns a real number x to each sample point s Ω. The inverse image of X() s is the set of all points in Ω that the random variable X maps to the value x. It is denoted A = { s Ω X( s) = x}. Ω discrete X discrete Ω continuous X continuous x X Applied Statistics: Probability 1-80

Ω= { s, s, s,...} = (, ) 1 2 3 Random Variable We define Ax as the set of all points in Ω that "map" into the value x. Sometimes we write A = X x and state that A is the "inverse image" x ( ) X We write X( s) = x, where s Ω, and x. of the value x under the random variable X. For discrete random variables, x [ ] we then define the probability of the value x to equal P A x. [ ] [ ] p ( x) = P A. X OR we may be given a discrete (continuous later) random variable, a description of the values it can produce and the probability of each value. For example, For k = 1,2,..., n, P X = k = p k. In this case we need not know what the underlying experiment really is. x Applied Statistics: Probability 1-81

The role of a random variable Experiment 1: Roll 1 fair die and determine the outcome. Experiment 2: Spin an arrow that lands with equal probability on one of the numbers 1 through 6. Experiment 3: You have 6 cards numbered 1 through 6. Shuffle them and draw one at random. Replace the card and reshuffle to repeat. Notice that we d represent the sample space of each of these as {1,2,3,4,5,6} usually without drawing dice or arrows or cards, but the sample spaces really include dice, arrows, cards. For each probability distribution 1 let pi = for i = 1,2,...6. 6 The importance of the random variable is that it lets us deal with such an experimental setup without thinking dice, arrows, or cards. We say: Let X be a random variable such that takes on the discrete values 1,2,3,4,5,6. Its probability mass function is given as: the probability that X=i is 1/6. We write this as 1 px ( i ) = for i=1,2,...,6. 6 Applied Statistics: Probability 1-82

Probability Mass Function If X is a discrete random variable, then its probability mass function (pmf) is given by: px ( x) = P( X = x) = P( Ax) = P( s). The pmf satisfies the following properties: (p1) 0 px ( x) 1 for all values, x, such that P[ X = x] is defined. (p2) If X is a discrete random var iable then s A px ( xi) = 1, where the set { x1, x2,... } includes all real i numbers, x, such that px ( x) 0. x Note: you cannot define a pmf without first defining a random variable. You can, however, define a probability distribution directly on a sample space with no random variable defined. Applied Statistics: Probability 1-83

MathCad definition Note that MathCad refers to the probability mass function as the probability density function (pmf versus pdf). This is the reason that the included pmf s begin with the letter d, like dbinom. We will reserve pdf for use with continuous distributions. However, you must interpret pdf as pmf when using MathCad. The text uses probability mass function, but represents a pmf with a function, f, instead of our notation, p. (See page 70.) Applied Statistics: Probability 1-84

Discrete RV Example 1 Let the sample space Ω represent all possible outcomes of a roll of one die; thus, Ω= { } 1, 2,3,4,5,6. We define the random variable X on this sample space as 1 if i = 1, 2 follows: Xi ( ) =. Because the probability of rolling a 1 or 2 0 if i = 3, 4, 5, 6 1 if i = 1 1 3 is, we define X's probability mass function as px () i =. X has 3 2 if i = 0 3 1 a Bernoulli distribution. Alternatively, we could just write that px () 1 = and 3 2 pmf of X px ( 0 ) =, or we could define the pmf using a table: 3 Value Prob 0 2/3 1 1/3 Applied Statistics: Probability 1-85

Discrete RV Example 2 A die is tossed until the occurrence of the first 6. Let the random variable X = k if the first 6 occurs on the kth roll for integer k > 0. What is the probability mass function (pmf) for X? In order for the first 6 to occur on the 5th toss, for example, we must have the event AAAA6 occur where A means any result other than 6. Clearly, these represent a sequence of 5 Bernoulli trials where success = 6 and failure = 1 through 5. Each trial is independent; thus, the probability of this particular result is of the first 6 on the kth roll px k 1 4 5 1 625 = = 0.08. Similarly, the probability 6 6 7776 is k 1 5 1. This defines the pmf, 6 6 5 1 ( k) =. This is a particular instance of the geometric distribution. 6 6 Applied Statistics: Probability 1-86

Useful die illustrations (1) Roll a die once and the probability of getting any one number (choose one of six) is 1/6. 1 The uniform distribution for X : px ( k) =, k = 1, 2,..., 6. 6 (2) Roll a die n times and count the number of times, k, that you get, say, a 2. This is k n k k n k n 1 5 n 1 5. The binomial distribution for X : px ( k), k 0,1,... n. k = 6 6 k = 6 6 (3) Roll a die once, twice,... until you get, say, a 2 for the first time. Suppose that the first time you get the 2 is on the kth roll. The probability of this is ( ) distribution for X : p k X k 1 5 1 =, k = 1, 2,... 6 6 k 1 5 1. The geometric 6 6 Applied Statistics: Probability 1-87

Probability of Sets & Intervals For a discrete RV, X, and any set of real numbers, A, we can write: PX ( A) = p ( x). x A i X i If A= ( a, b), we write: P( X A) = P( a< X< b). If A= ( a, b], we write: P( X A) = P( a< X b), etc. For any real number xthe probability that the random variable X takes a value in the interval (, x] is especially important and is denoted as: FX( x) = P( < X x) = P( X x) = px( t), where the last equality holds t x only for discrete RVs X. The function F is called the cummulative distribution function (or just the distribution function ) of X. Applied Statistics: Probability 1-88

Simple cdf example Let X be a random variable with pmf given by: 1 2 5 px () 1 =, px ( 2 ) =, and px ( 3 ) =. 8 8 8 Then the cdf, F F X, is given by: 0 for x < 1 1 for 1 x < 2 8 ( x) = 3 for 2 x < 3 8 1 for x 3. X NOTICE that F simply adds up the probability mass function's range values as it gets to them (starting from - and moving towards + ). F starts at 0 (for a discrete random variable like this one) and adds up the probabilities until it ends at 1. The function F is defined for ALL REAL NUMBERS. (Its domain is all reals.) The range of F is always between 0 and 1. The "meaning" of F is that " F ( x) is the probability that X x." X random variable the graph of F is a step function. X For a discrete Applied Statistics: Probability 1-89

Cumulative Distribution Function Properties Important: Pa ( < X b) = F( b) F( a). (F1) 0 F ( x) 1. X (F2) Fx ( ) is an increasing function of x. (F3) lim Fx ( ) = 0 and lim Fx ( ) = 1. x - x + i 1 X (F4) For discrete X that has positive probability only at the values x, x... X 1 2 Fhas a positive jump at x equal to p ( x) and takes a constant value in the interval [ x i X i, x). Thus, it graphs a step function. i Cumulative distribution functions of discrete RVs grow only by jum ps, and cumulative distribution functions of continuous RVs have no jumps. A RV is said to be of mixed type if it has continuous intervals plus jumps. Applied Statistics: Probability 1-90

Bernoulli Distribution The RV, X, is Bernoulli (or has a Bernoulli distribution) if its pmf is given by p = p (0) = qand 0 p = p (1) = pwhere p+ q= 1. 1 X X The corresponding CDF is given by: Fx () = { 0 for x< 0 q for 0 x < 1 1 for x 1. Example: Roll a die once. Let X=1 if the result is 1 or 2. Let X=0 otherwise. This is a Bernoulli trial with p=1/3 and q=2/3. Applied Statistics: Probability 1-91

Bernoulli pmf Bernoulli Distribution p=0.5 0.6 0.5 0.4 0.3 0.2 Bernoulli Distribution p=0.5 0.1 0 0 1 Applied Statistics: Probability 1-92

Bernoulli cdf p=0.5 0 for x < 0 Write as: F( x) = 0.5 for 0 x< 1 1 otherwise. Notice that the cdf is defined for all real numbers, x. Applied Statistics: Probability 1-93

Discrete Uniform Distribution Let X be a random variable that can take any of n values x, x,..., x 1 with equal probability. The RV X is said to have a Discrete Uniform n Distribution, and has pmf given by: p ( x ) = X i { 1 for i= 1,2,..., n n 0 otherwise. If we let X take on the integer values 1,2,..., n, then its distribution function is given by 0 for x < 1 x 1 x FX ( x) = = for 1 x n i= 1 n n 1 for x> n. 1 2 n Applied Statistics: Probability 1-94

Discrete Uniform pmf n=10 Discrete Uniform pmf 0.12 0.1 0.08 0.06 0.04 Discrete Unifrom n=10 0.02 0 1 2 3 4 5 6 7 8 9 10 Applied Statistics: Probability 1-95

Discrete Uniform cdf n=10 0 for x < 1 x x FX( x) = px( xi) = for 1 x 10 i= 1 10 1 otherwise. Applied Statistics: Probability 1-96

Binomial Distribution Let Y denote the number of successes in n Bernoulli trials. n The pmf of Y is given by: p = P( Y = k) = P ( k) = k n Y n n { n k n k ( k ) p ( 1 p) for 0 k n, k an integer, 0 otherwise. The random variable Y n is said to have a binomial distribution if 0 for t < 0 t [ ] ( ) ( n) i n i P Yn t = FY t = (1 ) for 0 n i p p t n i= 0 1 for t > 0. Example: Toss a coin 10 times and count the total number of heads. This is binomial with p=0.5. Applied Statistics: Probability 1-97

Probability Mass Function Binomial n=10 p=0.5 3.00E-01 2.50E-01 2.00E-01 1.50E-01 1.00E-01 probabilities 5.00E-02 0.00E+00 0 1 2 3 4 5 6 7 8 9 10 Applied Statistics: Probability 1-98

Binomial cdf n=10 p=0.5 0 for t < 0 PY t F t p p t 1 otherwise. t ( 10 ) i 10 i [ n ] = Y ( ) = (1 ) for 0 10 n i i= 0 Applied Statistics: Probability 1-99

Geometric Distribution Consider any arbitrary sequence of Bernoulli trials and let Z be the number of trials up to and including the first success. Z is said to have a geometric distribution with pmf given by p i = q p i 1 Z () for i = 1,2,... and probabilities p+ q = 1 This is i-1 well-defined because pq = = 1. i= 1 p 1 q The distribution function of Z is given by F Z 0 for t < 1 t () t = p p = p t i= 1 i 1 t (1 ) 1 (1 ) for 1. Example: See the previous example concerning rolling 1 die until a 6 occurs. Applied Statistics: Probability 1-100

Geometric pmf p=0.5 Geometric pmf Example 0.6 0.5 0.4 0.3 geom p=0.5 0.2 0.1 0 1 2 3 4 5 6 7 8 9 10 Applied Statistics: Probability 1-101

Geometric cdf p=0.5 B F Z () t 0 for t < 1 = p p = p t > t i= 1 i 1 t (1 ) 1 (1 ) for 1. Applied Statistics: Probability 1-102

Regarding Problem 2-95 A Useful Result: ( ij ) = ( ij i ) ( i ) P a A P a A P A A A 2 A 1 a A n a 21 11 a a n1 a 22 12 a n2 a 1(m1) a 2(m2) a n(mn) Proof: ( ij ) ( ) ( ) ( ) ( ) ( ) ( ij ) P a A P a P( aij A) = = because aij A P A P( A) i ( ij i ) ( i ) ( ) ( ) P aij P Ai P aij Ai P Ai A = = P A P A P A P A ( ) ( ) i same reasons as above = P a A P A A by definition of conditional probability. Applied Statistics: Probability 1-103

Poisson Distribution A random variable, X t, has a Poisson Distribution with parameter λ>0 if its pmf is given by: k λt ( λt) ( e ) PX ( t = k) = for k= 0,1,... and t 0. (A distinct RV for each t.) k! NOTE: The Poisson is typically used to model the number of jobs arriving during time t in a time-share system, the arrival of calls at a switchboard, the arrival of messages at a terminal, etc. The parameter λ is then interpreted as an arrival rate "per unit time." That is, if t is in seconds, then λ must be the average arrivals per second. The cumulative distribution function is: 0 for x < 0 F k X ( x) = x λt ( ) ( t λt e ) for x 0. k = 0 k! Notice that in mathematical notation λ does not typically appear on the left-hand side even though the function is unspecified without it. Applied Statistics: Probability 1-104

Packet Arrival Example ---- Packet Arrivals ---- X 1 X 2 X 3 X 4 X N Each of the random variables X 1 X 2 XN k λ ( λ) ( e ) has a Poisson distribution; thus, PX ( i = k) = for i= 1, 2,..., n. k! If Y represents the total number of arrivals during any time t, then t k λt ( λt) ( e ) PY ( t = k) =, per the previous slide. k! Applied Statistics: Probability 1-105

Poisson pmf (x,3,1) Applied Statistics: Probability 1-106

Poisson cdf λ = 3 and t = 1. Applied Statistics: Probability 1-107

Poisson Example Connections arrive at a switch at a rate of 11 per ms. The arrival distribution is Poisson. (a) What is the probability that exactly 11 calls arrive in one ms? (b) What is the probability that exactly 100 calls arrive in 10 ms? (c) What is the probability that the number of calls arriving in 2 ms is greater that 7 and less than or equal to 10? Let X be the random variable giving the number of arrivals during t ms. We know t [ k] k 11t ( 11t) ( e ) [ ] k λt ( λt) ( e ) that Xt has a Poisson distribution, which implies that P Xt = k =. The k! arrival rate is 11 ; thus, P X ms [ X ] 2 t = = k 10 11 2 ( 11 2) ( e ) k! [ ] with t in ms. [ ] 11 11 ( 11) ( e ) (a) Probability of exactly 11 arrivals in one ms is P X1 = 11 = = 0.119. 11! 100 11 10 ( 11 10) ( e ) (b) Probability of 100 calls in 10 ms is P X10 = 100 = = 0.025. 100! (c) P 7 < 10 = k = 8 k! 8 2 8 22 1 22 22 22 794 = 0.003. 22 + + = 22 = e 8! 9! 10! e 10! Applied Statistics: Probability 1-108

Summary for Discrete Random Variable, X To define a pmf when X takes integer values write the following: n k n k "The pmf is px ( k) = an expression often involving k, such as ( p) ( 1 p)." k This means the probability X = k. Be sure to specify all possible values of k, such as "for integers k = 1,2,..., n." Be sure to use the values of other parameters ( pn,, λ...) that are correct for this particular problem. To define a cdf write the following: "The cdf is: F X 0 for k < whatever min x ( the expression for px k ) for 1 x n " k = 1 1 for k > n ( x) = ( ) This means the probability X x. Applied Statistics: Probability 1-109

Practice Quiz 3 A college student phones his girlfriend once each night for three nights. The 1 probability that he reaches her is anytime he calls. Suppose that the random 3 variable X equals the number of nights (out of three) that he is able to reach her. 1. What is the pmf for X? What is the name of X ' s distribution? 2. What is the cdf for X? 3. Now suppose that this student will phone once each night until the first night that he is able to reach his girlfriend. Let Y be a random variable that equals the number of nights that it takes him to reach her for the first time. (For example, Y = 2 if she doesn't answer the first night but does answer the second night.) What is the pmf for Y? What is the name of Y ' s distribution? Applied Statistics: Probability 1-110

Suppose values of X are not integers. You may have to list each possible value and its probability. For example, suppose that 1 1 1 1 X = with probability, X = 1 with probability, X = π with probability. 2 6 3 2 You can define the pmf in a table: x p ( x) X 1 2 1 π 1 2 1 6 1 3 This mean the probability X = x. Applied Statistics: Probability 1-111

The cdf for this example F X This means the probability X. 1 0 for x < 2 1 1 for x < 1 ( x) = 6 2 1 for 1 x < π 2 x 1 for x π. Applied Statistics: Probability 1-112

Defining a pmf in Mathcad ORIGIN:= 1 Let X be a random variable with a discrete uniform distribution on the integers 1,2,...,10. Its pmf is given by p X (i)=0.1 for i=1,2,...,10. We cannot use the subscript X in Mathcad so we can simplify to p(i) or px(i), if needed for clarity. To enter this pmf in Mathcad we write the following. i:= 1, 2.. 10 p := 0.1 i We have created a probability mass vector p = 1 2 3 4 5 6 7 8 9 10 1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 Applied Statistics: Probability 1-113

Define pmf continued 10 i i= 1 ( i k) Alternatively, we can define px(i) = 0.1 = and avoid storing the vector of tenths. This could avoid storing a large vector of probabilities. Applied Statistics: Probability 1-114

To define the corresponding cdf The cdf can now be defined as: FX() x := 0 ( x< 1) floor() x + px() i ( 1 x 10) + 1 ( x> 10) i = 1 1 0.8 FX( x) 0.6 0.4 0.2 0 0 5 10 x Applied Statistics: Probability 1-115

MathCad Defined cdf Many cdf's are included in Mathcad. For example, the binomial distribution (illustrated with n=20 and p=0.4.) Applied Statistics: Probability 1-116

Mean and Variance of a Discrete Random Variable Definition 2 2 Working formula: Var( X ) = E( X ) E ( X ). Applied Statistics: Probability 1-117

Mean and Variance of a Discrete Random Variable Figure 3-5 A probability distribution can be viewed as a loading with the mean equal to the balance point. Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance. Applied Statistics: Probability 1-118

Mean and Variance of a Discrete Random Variable Figure 3-6 The probability distribution illustrated in Parts (a) and (b) differ even though they have equal means and equal variances. Applied Statistics: Probability 1-119

Example 3-11 Applied Statistics: Probability 1-120

Properties of E(X) and Var(X) If X and Y are discrete random variables and a and b are real numbers, then * EaX ( ) = aex ( ) * 2 * Var( ax ) a Var( X ) ( ) E( X + Y ) = E( X ) + E( Y ) and E ax + by = ae( X ) + be( Y ) = * Var( X + Y ) = Var( X ) + Var( Y ), only when X and Y are independent. 2 2 ( + ) = + ( ) and Var ax by a Var( X ) b Var Y, only when X and Y are independent. Continuing the example from previous slide: E(5 X) = 5(12.5) = 62.5 Var(5 X ) = 25(1.85) = 46.25 Applied Statistics: Probability 1-121

Expected Value vs Average Suppose we take 10 playing cards numbered Ace, 2,3,4,5,6,7,8,9,10 and arrange them randomly, face-down, on a table. If we choose one at random (and then replace and reshuffle), it is equally likely that we get any value between 1 and 10. If the random variable V gives the value obtained, then V has a discrete uniform distribution on the integers 1 through 10. If we were asked, "What is the average face value of 1+2+3+4+5+6+7+8+9+10 these 10 cards?", we would compute = 5.5. If we're 10 asked "What is the expected value of V?", we find 1 + 2 + + 10 = 5.5. 1 1 1 ( ) ( ) ( ) 10 10 10 In fact, for any random variable with discrete uniform distribution (like our "die") the expected value is the same as the average of all possible values. This is NOT TRUE for other distributions. If we actually perform the experiment by drawing 10 times, we are not likely to get each value exactly once. For example, we might draw 1,5,2,6,8,9,7,2,3,6. The average of these outcomes is 4.9 - NOT 5.5. The more times we repeat the draw, the closer we are likely to get to the expected average of 5.5. So you might think of the expected value of a random variable as the value expected from averaging many outcomes. Applied Statistics: Probability 1-122

Binomial Mean and Variance The mean of a binomial distribution with parameter p is n n n k n k n k n k EX ( ) = k p(1 p) = k p(1 p). Let m= k 1 to get k= 0 k k= 1 k n 1 m+ 1 n n 1 m+ 1 n 1 m n m+ 1 ( 1) (1 ) ( 1) (1 = m+ p p = m+ p p) m= 0 m + 1 m= 0 ( m + 1! ) n 1 m ( n 1) = np p p = np m! m= 0 m n 1 m (1 ). Similar work will show that Var( X) = np(1 p), but there are much easier ways to show this. n 1 m Applied Statistics: Probability 1-123

Exercise The interactive computer system at Gnu Glue has 20 communication lines to the central computer system. The lines operate independently and the probability that any particular line is in use is 0.6. What is the probability that 10 or more lines are in use? What is the expected number of lines in use? What is the standard deviation of lines in use? Applied Statistics: Probability 1-124

Binomial Revisited Recall that the Binomial RV X = X + X +... + X where each X has a 1 2 Bernoulli distribution and is mutually independent with the others. Because E( X ) = p, it follows trivially that E( X) = np. Because of i independence we can write that ( n 2 ) = ( i) also. ( i) = ( i ) 2 ( ) = 2 = (1 ). i= 1 Var X Var X Var X E X E X p p p p It follows simply that Var( X ) = np(1 p). n i Applied Statistics: Probability 1-125

Poisson Mean and Variance The mean of a Poisson distribution with parameter λ > 0 is k λ k 1 λ e λ λ λ λ k = λe = λe e = λ. k! ( k 1)! k= 0 k= 1 Similarly, ( ) k 1 λ e λ = = k! ( k 1)! k λ 2 2 λ EX k λe k k= 0 k= 1 k 1 k 1 λ λ λ = λe [ ( k 1) + ] ( k 1 )! ( k 1)! k = 1 k=1 k 2 k 1 2 λ λ λ λ 2 = λ e + λe = λ + λ. ( k 2)! ( k 1)! k = 2 k=1 X E X E X λ λ λ λ 2 2 2 2 It follows that Var( ) = ( ) ( ) = + =. NOTE : The mean and variance of the Poisson random variable, Xt is λt. Applied Statistics: Probability 1-126

Exercise Suppose it has been determined that the number of inquiries that arrive per second at the central computer system can be described by a Poisson random variable with an average rate of 10 messages per second. What is the probability that no inquiries arrive in a 1-second period? What is the probability that 15 or fewer inquiries arrive in a 1-second period? What are the mean and variance of the number of arrivals in 1 second? Applied Statistics: Probability 1-127

Geometric Mean and Variance k 1 k 1 ( ) = (1 ) = (1 ). EX kp p p k p k= 1 k= 1 1 1 k k 1 Write s( p) = (1 p) =. Then s ( p) = k(1 p) =. 2 k= 0 p k= 1 p 1 1 Thus, EX ( ) = p. 2 = p p (1 p) Homework: Use similar technique to show that Var( X ) =. 2 p Applied Statistics: Probability 1-128

Discrete Uniform Distribution Let X be a random variable with a discrete uniform distribution on the integers k= 1 k= 1 k= 1 k= 1 ( 1) n n k 1 n n+ n+ 1 1, 2,..., n. Then E( X) = = k = =. n n 2n 2 ( )( ) ( ) ( )( ) n 2 n 2 k 1 2 n n+ 1 2n+ 1 n+ 1 2n+ 1 Similarly, E( X ) = = k = =. n n n 6 6 Therefore, Var( X ) ( n+ 1)( 2n+ 1) ( n+ 1) 2( n+ 1)( 2n+ 1) 3( n+ 1) 2 2 = = 6 4 12 12 2 ( n+ 1)( 4n+ 2 3n 3) ( n+ 1)( n 1) n 1 = = =. 12 12 12 11 Example: Let X be uniformly distributed on 1,2,...10. Then E( X) = and 2 99 33 Var( X ) = =. 12 4 Applied Statistics: Probability 1-129

Hypergeometric Distribution Suppose that a set of n objects includes k objects of type 1 (successes?) and n k objects of type 0 (failures perhaps?). A sample of size m is selected from the n objects "without replacement," where m n (and k n). Let X be the random variable that denotes the number of type 1 objects in the sample. Then X is said to be a hypergeometric random variable and its pdf is given by: k n k i m i for i = max 0, m+ k n to min k, m px () i = n m 0 otherwise Values of i (examples): (1) n= 20, k = 5 (type 1), m= 3 (sample size) i = 0,1,...,3 (2) Same as (1) but m= 7 i = 0,1,...5 (3) Same as (1) but m= 17 i = 2,3...,5. { } { }. Applied Statistics: Probability 1-130

Text problem 3-101 A company employs 800 men under the age of 55. Suppose that 30% carry a marker on the male chromosome that indicates an increased risk for high blood pressure. (a) If 10 men in the company are tested for the marker in this chromosome, what is the probability that exactly 1 man has the marker? Answer: Notice that this is certainly sampling without replacement. (We don't put the first man back into the pool before we draw the second one.) Let X be the number of men that have the marker in a sample of size 10. X is hypergeo- 240 560 1 9 metric. Thus, px (1) = = 0.12. 800 10 Applied Statistics: Probability 1-131

Text problem 3-101 Continued (b) If 10 men are tested for the marker, what is the probability that more than 1 has the marker? Answer: Out of 10 the number with the marker can be 0,1,2,...,10 (because 240 560 i 10 i i = 0,1,...,10 a total of 240 have the marker). Thus, px ( i) = 800 10 0 otherwise. 10 1 The answer is either p ( i) or 1 p ( i) = 0.852. X i= 2 i= 0 X Applied Statistics: Probability 1-132

Mean and Variance of Hypergeometric If X is a hypergeometric random variable with parameters n (total objects), k (number of type 1 objects), and m (sample size), then μ = E( X) = mp and 2 n m k σ = var( X) = mp(1 p), where p = (the proportion of type 1 n 1 n objects in the total). 240 Example: In the previous problem EX ( ) = 10 = 3 and 800 240 240 800 10 Var( X ) = 10 1 = 2.076. 800 800 799 Applied Statistics: Probability 1-133

Continuous Random Variables and Moments of Random Variables G. A. Marin For educational purposes only. No further distribution authorized. Applied Statistics: Probability 1-134

Continuous Cumulative Distribution Function The CDF F of a random variable X is defined to be the function X F ( x) = P( X x), < x<. X The subscript is dropped if there is no abiguity. A continuous random variable is characterized by a distribution function that is a continuous function of x for all x. If the distribution function has a derivative at all except, possibly, a finite number of points, then the random variable is said to be absolutely continuous. Example: This means the probability X x. 0, x < 0 FX ( x) = x,0 x< 1 1, x 1. Applied Statistics: Probability 1-135

Properties of CDF * 0 F( x) 1, < x< * F( x) is an increasing function of x. * F( x) = 0 and F( x) = 1. lim lim x - x + Applied Statistics: Probability 1-136

Probability Density Function df( x) For a continuous (differentiable) random variable, X, f( x) = is called the dx probability density function (pdf) of X. Thus, discrete random variables have a probability mass function and continuous random variables have a probability density function. The cumulative distribution function is used in both cases (or in "mixed" cases). We obtain the distribution function from the density function through integration: x P( X x) = F( x) = f( t) dt, < x<. - - The pdf satisfies the following properties: (1) f( x) 0 for all x. (2) f( x) dx= 1. This means NOTHING except through integration. Note: in most of our problems The pdf will be defined piecewise. Applied Statistics: Probability 1-137

Example The probability density function f is given as: k 2 for x > 2 x f( x) = 0 otherwise. What is the value of k? What is the corresponding cdf? Applied Statistics: Probability 1-138

Probabilities on Intervals and cdf Suppose X is a continuous RV with pdf given by f and cdf given by F. This [ ] [ ] [ ] [ ] [ ] implies that for any real number xp, X x = Fx ( ) = f( tdt ). It also implies that Pa X b= Fb ( ) Fa ( ) P a<x b = F( b) Fa ( ) Pa X< b= Fb ( ) Fa ( ) Pa< X< b= Fb ( ) Fa ( ). All 4 cases hold because, for a continuous random variable X, [ ] [ ] P X = a = P X = b = 0. That is, the probability of any particular value is zero in this case. [ ] [ ] Note that it is traditional to write F( x) = P X x is continuous, it is also true that F( x) = P X < x. x. If the distribution Applied Statistics: Probability 1-139

Exponential Distribution A random variable has an exponential distribution if for some λ>0 its distribution function is given by: λx 1 e, if 0 x< F( x) = 0 otherwise. It follows that its pdf is given by: f( x) = λx λe, if x 0 0 otherwise. Note that in most problems the parameter λ represents a "rate," such as a rate of arrivals or a rate of failures. Examples of use: Interarrival times at a communication switch Service times at a server Time to failure or repair of a component. Applied Statistics: Probability 1-140

Exponential pdf λ = 2 f( x) x λe λ, if x 0 = 0 otherwise. Note the values of pdf mean nothing except through integration. Applied Statistics: Probability 1-141

Exponential cdf λ = 2 λx 1 e, if 0 x< F( x) = 0 otherwise. Applied Statistics: Probability 1-142

Class Problem Suppose that we stand at a mile marker on I-4 and watch cars pass. We notice that on the average 10 cars pass by us per minute and we're given that the time lapse between two consecutive cars has an exponential distribution. If we begin timing at the moment that one car passes by, what is the probability that we will have to wait more than 20 secs for the next car to pass? Answer. Let W be waiting 1 1 time in minutes. We seek P W > = 1 F( ), 3 3 λt where Ft ( ) = 1 e, is the exponential cdf. The average "rate" is λ=10; 10 1 3 thus, the answer is P W > = e = 0.036 3 Note 20 sec = 1/3 min. Applied Statistics: Probability 1-143

Simple Exercises Use F in the previous problem to write: Probability W<6 Probability W>6 Probability W<0 Probability W<-1 Probability 2<W<5 Probability W=1. Applied Statistics: Probability 1-144

Memoryless Property If X has an exponential distribution, x> 0, and t > 0, then we know that x t+ x λy λy PX ( x) = λe dyand PX ( t+ x) = λe dy. 0 0 λ y λe dy P ( X t+ x) ( X > t) t Thus, PX ( t+ x X> t) = =. PX ( > t) λ y λe dy λt λx e (1 e ) λx = = 1 e = P( X x). λt e This is why we don't replace lightbulbs until they fail. (Would you like it if waiting time at your doctor's office was exponentially distributed?) t+ x t Applied Statistics: Probability 1-145

Exponential/Poisson Relationship Show that the time between adjacent arrivals of a Poisson Process has an exponential distribution. Hint: If N denotes the number of arrivals during time t t and N has a Poisson distribution, then the probability t that waiting time to the next event is greater than t is P[ W > t], where W is waiting time, and P[ W > t] = P[ N t = 0]. t 0 arrivals during time t. 4 arrivals during time t. Applied Statistics: Probability 1-146

Continuing k λ λ e We've seen that PW ( > t) = P( Nt = 0 ), and we know that P( Nt = k) =. k! 0 λ λ e λ λ Thus, PW ( > t) = = e, which means PW ( t) = 1 e. Whas an 0! exponential distribution. Applied Statistics: Probability 1-147

Properties of Gamma Function On the next slide we introduce the gamma distributions, which is a family of distributions that includes the exponential distribution. That definition incorporates something called the gamma function, which is defined as α 1 x ( ), 0. Γ α = x e dx α > 0 1 ( ) 2 ( ) ( ) ( ) Using integration by parts one can show Γ α = α 1 Γ α 1 for α > 1. Because Γ (1) = 1, it follows that Γ ( n) = ( n-1) Γ ( n-1) =... = ( n-1)! when n is a positive integer. Note also that ( α ) Γ = π. Also, it is well known that α-1 λx Γ x e dx=, for α > 0 and λ > 0. We shall refer to the last equation 0 α λ as the "gamma integration formula." Applied Statistics: Probability 1-148

Example 2 4 Evaluate x xe dx. 0 ( ) ( α ) 1 Answer: Recall that α λx x e dx=, for λ>0 and α > 0. 0 α λ In this case α-1 = 2 α = 3 while λ=4. It follows that 0 xe 2 4x Γ 3 2! 1 dx= = =. 3 4 64 32 Γ Applied Statistics: Probability 1-149

Gamma Distribution A random variable with pdf given by α α 1 λt λ t e f() t =, α > 0, t > 0, λ > 0 Γ( α) Is said to have a Gamma distribution with parameters λ and α and we write X GAM( λ, α). The parameter α is called the shape parameter and the parameter λ is called the scale parameter. For α=1 the gamma becomes identical to the exponential distribution. NOTE: If a sequence of random variables X1, X2,..., X k are mutually independent and identically distributed as GAM( λα, ), then their sum has a GAM( λ, kα) distribution. Applied Statistics: Probability 1-150

Gamma Density: (,, ) gxλ α scale, shape Applied Statistics: Probability 1-151

Practice Quiz 4 The pdf of a random variable, X, is given by: 0 for x < 0 3 x fx ( x) = for 0 x 4 64 0 for x > 4. Answer each of the following and EXPLAIN (show your work). ( a) What is the cdf of X? (Find it explicitly.) ( > ) ( b) What is P X 2? () c What is P( X > 2 X > 1)? BONUS: Use the gamma integration formula to evaluate the following integral: 0 xe 3 2x dx Applied Statistics: Probability 1-152

Mean and Variance of a Continuous Random Variable Definition Applied Statistics: Probability 1-153

Expected Value (continuous example) 0 for x < 0 3 x Let fx ( x) = for 0 x 4 By definition 64 0 for x > 4. 4 3 5 x x 4 1024 16 E( X) = xfx ( x) dx= xi dx= = =. 64 5 64 0 5 64 5 0 4 ( ) 4096 32 E X = x f x dx= x dx= = = 64 6 64 0 6 64 3 4 3 6 2 2 2 x x X ( ) i 0 ( ) ( 2 ) 2 32 16 Var X = E X E( X) = = 0.427 3 5 2 Applied Statistics: Probability 1-154

Existence of E(X) Continuing a previous example: Let X be a random variable with pdf given by: f( x) 2 2 for x > 2 x = 0 otherwise. 2 2 Notice that EX ( ) = x dx= dx= 2ln x =. x x 2 2 2 2 Thus, well-defined random variables may not have finite means. Similarly, a random variable may have a finite mean and not have a finite 2nd moment or a finite kth moment (to be defined). Applied Statistics: Probability 1-155

Mean and Variance of a Continuous Random Variable Expected Value of a Function of a Continuous Random Variable Applied Statistics: Probability 1-156

Other expected values Let X be a random variable with pdf given by f. Let μ = E( X) = xf( x) dx. ( μ is called the mean of X.) Then 2 2 (a) E X = x f( x) dx. ( ) 2 2 ( b) E( X + 5X 2) = ( x 5x 2) f( x) dx. ( ) (c) E(sin X) = sin x f( x) dx. 2 (d) σ 2 2 ( ) ( ) ( ) ( ). = Var X = E X μ = x μ f x dx Variance of X Applied Statistics: Probability 1-157

Try these with/without Mathcad 0 for x < 0 π Let X be a random variable with probability density f ( x) = cos( x) for 0 x. 2 π 0 for x > 2 (a) Find the cdf of X. (b) Find E( X). (c) Find E(sin X). Applied Statistics: Probability 1-158

Exponential Mean and Variance Let X be an exponential random variable with parameter λ. 1 1 Then EX ( ) = λ and Var( X) = 2. λ α 1 λx Γ( α) Proof. Recall the gamma integration formula: x e dx=. α λ 0 Γ(2) 1 Then EX ( ) = xf ( xdx ) = xe dx= = X 0 λx λ λ. λ 2 λ 2 2 2 2 λx Γ(3) 2 Similarly, Var( X) = E( X ) μ and E( X ) = x λe dx= λ =. 3 2 0 λ λ 2 1 1 Thus, Var( X ) = =. 2 2 2 λ λ λ Applied Statistics: Probability 1-159

Exponential Example The lifetime of a particular brand of lightbulb is exponentially distributed. The "mean time to failure" (MTTF) is 2000 hours. If the lightbulb is installed at time t = 0, what is the probability that it fails at time t = 3000 hours? What is the probability that it fails in fewer than 3000 hours? Applied Statistics: Probability 1-160

Gamma Mean and Variance Recall that the pdf of a Gamma random variable is given by: Think scale** shape α α 1 λt λ t e f() t =, α > 0, t > 0 Γ( α) α The expected value (mean) of a Gamma random variable is. λ α The variance is. 2 λ Note that when α =1, the distribution is exponential. Applied Statistics: Probability 1-161

Class Exercise A gamma distribution has a mean of 1.5 and a variance of 0.75. Sketch the pdf. α α We have = 1.5 and = 0.75. First, solve for α and λ. From the first 2 λ λ 1.5λ equation we get α=1.5 λ. Substitute this in second equation to get = 0.75. 2 λ This gives λ=2 which implies α=3. This is all we need to obtain pdf values. 0.596 Gam( x) 1 0.5 Scale=2 and Shape=3 Important: Make sure that you can find the scale and shape parameters from a given mean and variance. 0 0 0 1 2 3 0 x 3 Applied Statistics: Probability 1-162

Continuous Uniform Distribution A continuous random variable X is said to have a uniform distribution over the interval (a,b) if its density is given by: 1, a< x< b f( x) = b a 0, otherwise. The distribution function is: 0, x< a x a F( x) =, a x < b, b a 1, x b. Applied Statistics: Probability 1-163

Continuous Uniform Density on (3,5) Applied Statistics: Probability 1-164

Continuous Uniform cdf on (3,5) Applied Statistics: Probability 1-165

Continuous Uniform Mean and Variance Let X be a random variable having the continuous uniform distribution over the interval ( ab, ). It's density is, thus, f( x) 1 b a for a< x< b = 0 otherwise. b 1 1 2 2 1 b+ a ( ) = b a = 2 ( )( ) =. b a 2 a EX x dx b a 3 3 2 1 b a Similarly, EX ( ) = 3. It follows that b a 3 3 2 2 b a 1 b+ a ( b a) Var( X ) = 3 =. b a 2 12 Recall that if X has a "discrete" uniform distribution on 1,2,..., n, then 2 n+ 1 n 1 EX ( ) = and VarX ( ) =. 2 12 Applied Statistics: Probability 1-166

Example: Suppose that the time that it takes to drive from the Orlando airport to FIT is uniformly distributed between one hour and one and a quarter hours. (a) What is the mean driving time? (b) What is the standard deviation of the driving time? (c) What is the probability that it will take less than one hour and 5 minutes to make the trip? (d) 80% of the time the trip will take less than minutes? Applied Statistics: Probability 1-167

Means and Variances (so far) Distribution E(X) Var(X) Bernoulli Binomial Geometric Discrete Uniform Poisson Exponential Gamma Continuous Uniform p p ( 1 p) np np ( 1 p) 1 ( 1 p) p n + 1 2 2 n 1 12 α (or λ ) α (or λ) 1 1 λ α λ b a λ 2 α λ 2 + ( ) 2 a b 2 12 p 2 Applied Statistics: Probability 1-168

Practice Quiz 5 1. Suppose that the driving time between Orlando and Melbourne is uniformly distributed between one hour and one hour plus 15 minutes. (a) What is the variance of the driving time.? (b) Eighty percent of all drivers make the trip in fewer than x minutes. What is x? 2. The random variable X has an exponential distribution. What is the pdf of X? What is the mean of X? What is the variance of X? (Just write down what we known the mean and variance are. Do not derive them from the definition.) 3. The random variable X has a Binomial distribution and the total number of trials is 30. (a) If the probability of success on a single trial is 0.2, write the pmf for X. (b) If EX ( ) = 2.1, then what is the probability of success on a single trial? Applied Statistics: Probability 1-169

Normal Distribution Definition Applied Statistics: Probability 1-170

Normal Distribution Figure 4-10 Normal probability density functions for selected values of the parameters μ and σ 2. Applied Statistics: Probability 1-171

Normal Distribution Some useful results concerning the normal distribution Applied Statistics: Probability 1-172

Normal Distribution Definition : Standard Normal Applied Statistics: Probability 1-173

Normal Distribution Example 4-11 I I Figure 4-13 Standard normal probability density function. Applied Statistics: Probability 1-174

Standard Normal Table 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 Applied Statistics: Probability 1-175 3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Normal Distribution Standardizing Applied Statistics: Probability 1-176

Normal Distribution To Calculate Probability Applied Statistics: Probability 1-177

Normal Distribution Example 4-13 Applied Statistics: Probability 1-178

Normal Distribution Example 4-14 Note: In MathCad we simply compute: Mean and standard deviation Applied Statistics: Probability 1-179

Normal Distribution Example 4-14 (continued) Applied Statistics: Probability 1-180

In Mathcad [ ] We want to find the value of x such that P X x = 0.98. We use the inverse normal function in MathCad. x: = qnorm( p, μσ, ). x:= qnorm( 0.98, 10, 2) x = 14.107 Students should be able to solve problems like this and the one on slide 182 either by using MathCad or referring to a standard normal table. Applied Statistics: Probability 1-181

Normal Distribution Example 4-14 (continued) Figure 4-16 Determining the value of x to meet a specified probability. Applied Statistics: Probability 1-182

4-10 Weibull Distribution Definition Applied Statistics: Probability 1-183

4-10 Weibull Distribution Figure 4-26 Weibull probability density functions for selected values of δ and β. Applied Statistics: Probability 1-184

4-10 Weibull Distribution Applied Statistics: Probability 1-185

Recall Gamma Function The gamma function (studied previously) is defined as: α 1 x ( α) x e dx, α 0. Γ = > 0 1 ( ) ( ) ( ) ( ) Using integration by parts one can show Γ α = α 1 Γ α 1 for α > 1. Because Γ (1) = 1, it follows that Γ (n) = (n-1) Γ (n-1) =... = (n-1)! ( α ) α-1 λx Note also that Γ 2 = π. Also, it is well known that x e dx=. 0 α λ We shall refer to the last equation as the "gamma integration formula." Γ Applied Statistics: Probability 1-186

4-10 Weibull Distribution Example 4-25 Applied Statistics: Probability 1-187

4-11 Lognormal Distribution X is log-normal means ln(x) is normal. Applied Statistics: Probability 1-188

4-11 Lognormal Distribution Figure 4-27 Lognormal probability density functions with θ = 0 for selected values of ω 2. Applied Statistics: Probability 1-189

4-11 Lognormal Distribution Example 4-26 = 1 - Applied Statistics: Probability 1-190

4-11 Lognormal Distribution Example 4-26 (continued) Applied Statistics: Probability 1-191

4-11 Lognormal Distribution Example 4-26 (continued) Applied Statistics: Probability 1-192

Joint Probability Distributions When two or more random variables naturally occur together or take values that seem to be related, it is common to consider their joint probability distributions. Suppose, for example, that X and Y are two random variables whose outcomes we wish to consider jointly. In a manner similar to single random variables we consider two cases. (1) X and Y are discrete. Here we use a joint pmf and joint cdf (tbd). (2) X and Y are continuous. Here we use a joint pdf and joint cdf (tbd). Examples: (1) The location of a ship is given by its latitude and longitude. Suppose you are searching for a ship from its last known position. It is natural to consider the joint distribution of ( XY, ), the ship's probable position. (2) Similarly an aircraft's position might be predicted using a joint distribution of ( XYZ,, ), where ( XY, ) is as above and Zis altitude. Applied Statistics: Probability 1-193

Discrete Random Variables For discrete random variables X, X,..., X their joint probability mass function is given by: p ( x) = p ( x, x,..., x ) X X, X,... X 1 2 n 1 2 n 1 1 2 2 1 2 = PX ( = x, X = x,..., X = x) n n n If only two random variables are involved, we usually denote them as X and Y, instead of X, and X, and we write: XY, 1 2 p ( x, y) = P( X = x, Y = y). Note that the author writes the latter as f ( x, y) = P( X = x, Y = y). XY, Applied Statistics: Probability 1-194

Simple example The joint distribution of XY, is given as follows: ( ) 1 p ( ) XY, 1,1 =, 3 1 1 p ( ) ( ) XY, 2,1 = and pxy, 2, 2 = 6 6 1 1 1 p ( ) ( ) ( ) XY, 3,1 =, pxy, 3, 2 =, and pxy, 3,3 =. 9 9 9 Do the values of X and Y seem to "influence" each other? Applied Statistics: Probability 1-195

Marginal pmf [ X = x ] ( ) 1 2 ( ) If X1, X2,..., Xn are discrete random variables with joint pmf px1, X2,..., X x1, x2,..., x, n n then px ( x ) ( ) 1, 2,..., 1, 2,...,, where the sum is over the points in the i i = px X X x x x n n i range of X, X,..., X where X probability mass function for X. i n i ( ) Example: Using the previous slide the marginal pmf for X is given by: 1 1 1 px () 1 = p ( 2 ) ( 3 ). 1 X = p 1 X = 1 3 2 6 i = x. The function p x is called the marginal i X i The notation above is difficult; however, notice that, in the example, to get 1 ( x x ) you just sum over all the, pair values that have 1 as the value for X. 1 2 1 i 1 p X () Applied Statistics: Probability 1-196 1 1 = 3

Problem: Joint pmf for X1 and X2 X 2 = 1 X 2 = 2 X 2 = 3 X 1 = 1 X 1 = 2 X 1 = 3 1/12 1/6 1/12 1/6 1/4 1/12 1/12 1/12 0 Find: [ ] [ ] [ ] [ ] (a) P X X is even (b) P X is odd (c) P X 1.5 (d) P X is odd X is odd 1 2 1 1 2 1 Applied Statistics: Probability 1-197

Joint PMF and Independence For discrete random variables X, X,..., X their joint probability mass function is given by: p ( x) = p ( x, x,..., x ) X X, X,... X 1 2 n 1 2 n 1 1 2 2 1 2 = PX ( = x, X = x,..., X = x) n n n The discrete random variables X, X,..., X are said to be mutually X X 1 X 2 X n 1 2 1 2 independent if their joint pmf can be written as: p ( x) = p ( x ) p ( x ) iii p ( x ). n In the previous example are X and X independent? 1 2 n Applied Statistics: Probability 1-198

Joint cdf for Discrete RVs If X, X,..., X are discrete random variables, then their joint cumulative 1 2 distribution function is given by: 1 2 ( ) = ( ) F x, x,... x P X x, X x,..., X x. X, X,..., X 1 2 n 1 1 2 2 n n n n If there are only two random variables XY, XY, ( ) = ( ) F x, y P X x, Y y. ( ) involved we usually write: Returning to the "Simple Example" we find, for example, F 2 2,3 =. 3 Applied Statistics: Probability 1-199

Skip Section 5-1.3 We will not cover the material on conditional probability distributions that is in this section of the text. Applied Statistics: Probability 1-200

Double Integral: z z=f(x,y) R b f ( xyda, ) n( x) = amx ( ) f ( xydydx, ) Iterated Integral y f(x,y) Y=n(x) Y=m(x) x,y x=a R x=b x Applied Statistics: Probability 1-201

Example: Evaluate the double integral 2 xyda where R is the region bounded by the 2 curve y = x and the lines y = 0 and x= 2. R x 2 5 4 2 0 0 Region R 0 1 2 3 2 2 2 x 2 x 2 2 2 x xydydx x ydydx x y = = dx 0 0 0 0 0 0 Answer: 2 2 2 6 4 x 2 xx dx 6 0 3 0 = = = 32. Computes the volume under the surface z=2xy and above the region R. 0 x x = 2 3 Applied Statistics: Probability 1-202

3 Evaluate the double integral 4 x ydxdy where R is the region bounded by 2 y = x y = x and 2. R 10 10 8 x 2 2x 6 4 2 0 0 0 1 2 3 0 x 3 Applied Statistics: Probability 1-203

Extra Practice Evaluate the double integrals: 2 2 1. 3 for bounded by, 2, 1. R 3 4 3 2. 10 for bounded by, 0, 1. R 2 2 2 3. for bounded by, 2. R x y dxdy R y = x y = x x = x y dxdy R y = x y = x = ydxdy R y= x x= y ( ) ( ) ( ) 4. xydxdy for R the triangle with vertices 0,0, 1,1, 4,1. R 2 2 2 5. 12 for bounded by 1, 1, 0, 1. R 2 1 1 6. 3( y + 4 ) dxdy for R bounded by x= 1, x= 4, y =, y =. x x R x ydxdy R x = y x = + y y = y = Applied Statistics: Probability 1-204

Problem 3 5 4 y1( x) 3 y2( x) 2 (-1.353,1.831) The two curves are y1 = x 2 y2 = 2 x 1 (1,1) 0 2 1 0 1 2 3 x Iterated integral: 1 2 x 1.353 x 2 2 ydydx Applied Statistics: Probability 1-205

Joint Distribution of Continuous RVs The cumulative joint distribution of continuous random variables X and Y is defined by F ( x, y) = P( X x, Y y), < x<, < y<. XY, Properties: * 0 F( x, y) 1 * F( x, y) is monotone increasing in BOTH variables * P( a< X band c< Y d) = Fbd (, ) Fad (, ) Fbc (, ) + Fac (, ) lim Note: F ( x, y) = F ( x) is called the marginal cumulative distribution of X. y lim x XY, F ( x, y) = F ( y) is called the marginal cumulative distribution of Y. XY, X Y ( x) = P[ X x] = P[ X x Y < ] F ( x ) Think of F,. X Also: The marginal cdf is defined in the same manner for discrete random variables. X, Y Applied Statistics: Probability 1-206

Joint Probability Density If X and Y are both continuous random variables, there is often a function f such that F( x, y) = f ( u, v) dvdu. The function f is known as the joint probability density function. Note that x Pa ( < x bc, < y d) = f( x, y) dydx. ( ) ( ) y The marginal pdf of X is f ( x) = f( x, y) dy. Similarly, the marginal pdf of Y is f ( y) = f( x, y) dx. Y bd a c For the marginal cdfs we have: x F ( x) = f u du = f( u, v) dvdu X Y y F ( y) = f v dv= X Y x y X f ( u, v) dudv. Applied Statistics: Probability 1-207

Mathcad Example Applied Statistics: Probability 1-208

Example: Joint Density of X,Y 1 for (, ) 6 xy A Let f( x, y) = where A is the triangle shown. 0 otherwise What is the probability that X 2 and Y 2? y = 3 x 4 ( 1) A y = 3 x = 1 x = 5 Also, answer is ratio of area of small triangle to area of A. 3 [ 2, 2] 2 2 3( 1) 1 x 4 1 6 0 8 1 1 2 1 16 2 3 4 1 0 ( x 1) P X Y = dydx = y dx= ( x 1) dx 2 2 1 x = 8 x = 8 3 1 = = 6 48 16 This only works because f is constant above area A. 1. 1 6 Applied Statistics: Probability 1-209

( ) ( ) 3( x 1) 4 3 y 4 0 1 6 Marginal Density of X (previous example) f x = f x, y dy, for 1 x 5, (The value of x is held constant, and X X, Y ( x 1) 1 = dy = = 1. 6 0 8 ( x ) and the value of y is "integrated out.") 5 2 1 1 x Notice that f X ( x) dx= 8( x 1) dx= 8 x 2 1 as required for a pdf. 25 ( ) ( ) 1 1 = 8 2 5 2 1 = 1 5 1 Applied Statistics: Probability 1-210

Example: The random variables X and Y have the joint density function f( x, y) ( ) ( ) xy x y x y = 0 otherwise. Find F y, f x, and F(1,2). Y X 1 2 2 exp 2 ( + ) for > 0 and > 0 ( ) Solution : We begin by finding f x = f ( x, y) dy. ( ) 2 0 X, Y 2 2 2 x y x 1 2 2 2 2 2 0 2 0 f x = xy exp ( x + y ) dy = xe ye dy = xe. X y 2 By symmetry, fy ( y) = ye, also. This is Y's density function, and we can now find the cdf, F ( y) = te Y X y 2 2 2 t t y 0 y 2 2 2 dt = e = e 0 1. Applied Statistics: Probability 1-211

Finding F(1,2) Note that in the region of integration, x>0 & y>0, values of y do not depend on values of x. 1 2 1 2 2 (1, 2) XY, (1, 2) = exp 0 0 2 ( + ) F F xy x y dydx 2 2 2 2 2 x y x y 1 2 1 2 2 2 2 = xe ye dydx xe e dx 0 = 0 0 0 ( ) 1 2 ( 2 = 1 e xe dx 1 e ) = e 0 2 2 x x 2 2 1 = ( 2 1 ) 2 e 1 e 0.340. 1 0 Applied Statistics: Probability 1-212

xy x y x y Scatterplot of f( x, y) = 0 otherwise. 1 2 2 exp 2 ( + ) for > 0 and > 0 Applied Statistics: Probability 1-213

Lessons to Learn F(1,2)=P(X<1,Y<2) Integrate a pdf to get a cdf: x FX( x) = fx( t) dt. Limit out the other variables of a joint cdf to get a marginal cdf. lim XY y F, ( xy, ) = FX( x) Integrate out the other variables of the joint pdf to get a marginal pdf. ( ), (, ). f X x = f X Y x y dy Applied Statistics: Probability 1-214

Bivariate Normal pdf μx:= 0 σx:= 1 μy := 0 σy := 1 ρ = 0.6 ρ := 0 1 c := 2 π σx σy 1 ρ 2 Applied Statistics: Probability 1-215

Independent RVs Two random variables, X and Y, are independent if F ( x, y) = F ( x) F ( y) for < x< and < y<. XY, X Y If the corresponding density functions exist, this is equivalent to f ( x, y) = f ( x) f ( y) for < x< and < y<. X X Y EXAMPLE: { 2 2 1, x + y π 1 0, otherwise Let X and Y have joint pdf f( x, y) =. Determine the marginal pdf's of X and Y. Are X and Y independent? Applied Statistics: Probability 1-216

Solution (Independent RV s) { 1 2 2, x + y 1 π 0, otherwise We're given: f( x, y) = ; thus, the marginal density for X is 2 2 1 1 1-x 2 2 π π - 1-x π 2 1-x 2 X ( ) = (, ) = = = 1-, 1 1. - 1-x f x f x y dy dy y x x Because of the symmetry the marginal density of Y is 2 2 fy ( y) = 1 y, 1 y 1. π To check for independence notice that 1 2 2 4 fxy, (0,0) = and fx(0) fy(0) = =. π π π π Thus, X and Y are NOT independent. Applied Statistics: Probability 1-217

Sums of Independent Normals Let X X,..., X be independent random variables with distributions 1, 2 n given by X N( μ, σ ), then X N( μ, σ) i i i i i= 1 n n n 2 2 where μ= μi and σ = σi. Note that the Avg of the n random i= 1 i= 1 μ σ variables will have mean and standard deviation =. n n Applied Statistics: Probability 1-218

Example: Sum of Normals Suppose that XYW,, are three normally distributed random variables. With means 1,2,3 (respectively). The variance of each is 3. Let S = X + Y + W. Find PS ( 7). We know that S has a normal distribution with mean 6 and variance 7 6 1 3 + 3+ 3 = 9. Thus, P( S 7 ) = P Z = P Z, where Z has a 3 3 1 standard normal distribution. From a standard normal table: P Z 0.6295. 3 Applied Statistics: Probability 1-219

Sums of Random Variables with Binomial or Poisson Distributions THEOREM: Let X, X,..., X be mutually independent. i= 1 1 2 1 2 (a) If X has a binomial distribution with parameters n and p, i r i and n= n + n +... + n. r then X has a binomial distribution with parameters p r (b) If X has a Poisson distribution with parameter α, then r i X has a Possion distribution with parameter α = α. i i= 1 i= 1 i i r i Applied Statistics: Probability 1-220

Keeping Sums in the Family A sum of independent normals is normal. A sum of independent Poissons is Poisson. A sum of independent exponentials is Erlang. A sum of independent gammas is gamma. A large sum of independent, identically distributed RVs is approximately normal. Applied Statistics: Probability 1-221

Erlang Distribution When r sequential phases have identical exponential distributions, then the resulting density is known as an r-stage Erlang and is given by: r r 1 λt λ t e f( t) =, for t > 0, λ > 0, r = 1, 2,... ( r 1)! The distribution function is given by: r 1 k ( λt) λt Ft ( ) = 1 e t 0, λ > 0, r= 1, 2,... k = 0 k! Note that the exponential is a special case of the Erlang distribution with r = 1. One may also think of this as the distribution function for the sum of r mutually independent random variables each with the exponential distribution with parameter λ. Applied Statistics: Probability 1-222

Sums of r exponentials Suppose that the waiting time between arrivals of customers at the post office is exponentially distribution with parameter λ (the arrival rate per unit time). Let W be the waiting time until the rth arrival occurs. Then r [ ] = [ or more arrivals occur during time ] PW t Pr t r k = 0 ( λt) r 1 λt e = 1 Pr [ 1 or FEWER arrivals occur during time t] = 1 ; k! thus, W has an r stage Erlang distribution. r k The last result is true because the arrivals are Poisson. Applied Statistics: Probability 1-223

Example Suppose that packets arrive at a switch with a Poisson distribution that has a rate of 100 packets per second. Starting at any particular point in time let T be the elapsed time until the arrival of the 5th packet. What is the probability that 5ms Solution. T 10ms? Because arrivals are Poisson, the waiting time between arrivals is exponential with parameter 100 1000 pkts sec ms sec =0.1. The distribution of T will be Erlang 4 k 0.1t and given by: Ft ( ) = 1 e. Thus, answer is F(10) F(5). k = 0 (0.1 t) k! Applied Statistics: Probability 1-224

Functions of Multiple Random Variables Let X, X,..., X be random variables defined on the same 1 2 probability space and let Y= φ( X, X,..., X ). Then EY ( ) = n 1 2... φ( x1, x2,..., xn) p( x1, x2,..., xn) x1 x2 xn E[ φ( X1, X2,..., Xn) ] = or... φ( x1, x2,..., xn ) f ( x1, x2,..., xn) dxdx 1 2... dx - where the top line corresponds to the discrete case and the bottom line corresponds to the continuous case. n n Applied Statistics: Probability 1-225

Discrete Example The joint pmf for X and Y is given in the table. Y=1 Y=2 Y=3 X=1 1/12 1/6 1/12 X=2 1/6 1/4 1/12 X=3 1/12 1/12 0 ( 2 + Y) Find E X. Applied Statistics: Probability 1-226

Continuous Example 1 for (, ) 6 xy A Let f( x, y) = where A is the triangle shown 0 otherwise y = 3 y = 3 x 4 ( 1) A x = 1 x = 5 ( φ ) Let φ XY, = X + Y. What is E XY,? ( ) 2 ( ) 5 ( φ ( )) ( ) 3 4 ( x 1) Solution: E X, Y = x + y dydx= 92. 1 0 2 1 6 Applied Statistics: Probability 1-227

Try this one The random variables X and Y have a pdf given by the following: xy for ( xy, ) the region f( x, y) = 144 0 otherwise. R + = 2 The region R is bounded by the curve y 4 x, the x axis, and the line Find each of the following: (a) ( ) E Y x = 4. 2 2 (b) EX ( Y) ( ) ( ) (c) f x, f y X (d) F ( x), F ( y). X + Y Y Applied Statistics: Probability 1-228

Mathcad Solution Applied Statistics: Probability 1-229

Applied Statistics: Probability 1-230

Applied Statistics: Probability 1-231

Linearity of Expectation Suppose that X and Y are random variables and that a and b are any two real numbers, then EaX ( + by) = aex ( ) + bey ( ). In previous example find E(3X + 2 Y). Use this property to derive the working 2 2 2 Formula for variance: σ = EX ( ) E( X). 2 Also notice that Var( ax ) = a Var( X ) and that Var ax by a Var X b Var Y X Y 2 2 ( + ) = ( ) + ( ) if and are independent. Applied Statistics: Probability 1-232

Examples 2 Given ( ) = 5 and ( ) = 30, find ( ). EX VarX EX [ ] 2 2 2 Ans: VarX ( ) = EX ( ) EX ( ) EX ( ) = 30 + 25 = 55. Let Y = 2X + 3. What is σ Y? Ans: Var( Y ) = Var(2X + 3) = Var(2 X ) + Var(3) = 4 Var( x) + 0 = 4 55 = 220. Thus, σ = 220. Y Because a random variable is always independent of a constant. Because the variance of a constant is always 0. Applied Statistics: Probability 1-233

Independence and Expectation If X, X... X are mutually independent random variables, then 1 2 n n n n n E( X ) = E( X ) and E[ φ ( X )] = E[ φ ( X )] i i i i i i i= 1 i= 1 i= 1 i= 1 where the φ are real-valued functions for i = 1, 2,..., n. i Using this property it is easy to show that if X and Y are independent, then Var( X + Y ) = Var( X ) + Var( Y ). Example: If X and Y are independent random variables with E( X) = 5 and EY ( ) = 3, then E( XY) = E( X) EY ( ) = 15. Equality holds because X and Y are independent! Applied Statistics: Probability 1-234

Covariance of X and Y We define Cov( X, Y ) = E[( X E( X ))( Y E( Y )]. ( μ ) 2 Recall the definition that Var( X) = E X X, where μx = E( X). We could also write Var( X) = E[( X E( X))( X E( X))] = Cov( X, X). In other words, covariance is a generalization of variance to multiple random variables. For covariance we also have the working formula Cov( X, Y) = E( XY) E( X ) E( Y). Note that when X and Y are independent, Cov( X, Y) = 0; however, knowing that Cov( X, Y ) = 0 is not enough to conclude that X and Y are independent. When Cov( X, Y ) = 0, we say that X and Y are uncorrelated. When X and Y are NOT independent we have: Var( X + Y ) = Var( X ) + Var( Y ) + 2CovXY (, ). Applied Statistics: Probability 1-235

What Cov(X,Y) Implies Example. Let X and Y have the joint pmf given by the following table: Probabilities: Y = 1 Y = 2 [ ] [ ] X = 1 0.4 0.1. It is easy to see that P X = 1 = 0.5 = P X = 2. X = 2 0.1 0.4 The distribution of Y is identical, and E( X) = E( Y) = 1.5. NOTICE that, when X = 1, it is most likely that Y = 1. When X = 2, it is most likely that Y = 2. Thus, X and Y have a tendency to "vary" together (in the same direction). By the working formula for covariance, we have Cov( X, Y ) = E( XY ) E( X ) E( Y ) ( ) 2 = 1 0.4 + 2 0.1+ 2 0.1+ 4 0.4 1.5 = 2.4 2.25 = 0.15. The positive result indicates that X large suggests Y large, as expected. If we change the original table: Probabilities: Y = 1 Y = 2 X = 1 0.1 0.4. Then X tends to be small when Y is large, etc. X = 2 0.4 0.1 In this second case, the covariance is 0.15. The absolute value of 0.15 doesn't mean much unless we're comparing with other covariance values or unless we "normalize" using the variance of X and Y. Applied Statistics: Probability 1-236

Correlation Coefficient The correlation (coefficient) of X and Y, ρ( X, Y), is defined as: [ ] ( XY) ( X) Var ( Y) ( XY) cov, cov, ρ( XY, ) = =, Var σ σ provided both variances are nonzero. We know that ρ( XY, ) 1 and that ρ( XY, ) = 1 iff P Y = ax + b = 1 for some a and b. X Y ( Y) In the previous example, Var( X) =.25 and Var =.25; thus, 0.15 ρ( XY, ) = = 0.6 (on a scale that runs between -1 and +1). 0.25 Applied Statistics: Probability 1-237

Student Exercise The random variables X and Y are jointly distributed as follows: 1 1 1 PXY, (1,1) = PXY, (1,2) = PXY, (2,2) = 6 6 6 1 1 1 PXY, (2,3) = PXY, (3,3) = PXY, (3,4) =. 6 6 6 Find the covariance and correlation of X and Y. Applied Statistics: Probability 1-238

Discrete Covariance Example Using MathCad Let values of the RV X and Y be given as 1 2 xv := yv 3 4 := 2 4 5 Let the joint pmf of X and Y be given by: The expected values of X and Y. p := 0.04 0.06 0.16 0.05 0.1 0.09 0.12 0.07 0.16 0.07 0.06 0.02 EX 4 3 i = 1 j = 1 ( ) xv p i ij, := EX 2.32 = EY 4 3 i = 1 j = 1 ( ) := EY = 3.69 yv p j ij, EXY 4 3 i = 1 j = 1 ( ) := CovXY = 0.401 xv yv p i j ij, Applied Statistics: Probability 1-239

Correlation Coefficient (continued) ρ := CovXY VarX VarY VarX:= EXSq EX 2 VarY:= EYSq EY 2 VarX= 1.098 VarY = 1.454 ρ = 0.317 Applied Statistics: Probability 1-240