Applied Statistics for Engineers and Scientists: Basic Data Analysis

1 Applied Statistics for Engineers and Scientists: Basic Data Analysis
Man V. M. Nguyen, Faculty of Computer Science & Engineering, HCMUT. November 18, 2008

2 Abstract This lecture presents selected topics of Statistics and Probability I, from basic concepts to practical applications, for undergraduates in * Statistics and Applied Mathematics, * Computer Science, and * Biological Sciences. Intended for a joint program with Portland Univ. at HCMUS, HCM City, Vietnam.

3 Introduction to Statistics I The aims of the course. Randomness and uncertainty are phenomena that engineering students face both in daily life and in professional environments. The course aims to provide students in - Business Administration and Econometrics, - Computing and Biological Sciences with fundamental methodologies, together with the major formalizations and techniques of Probability and Statistics. This foundation should help you understand and efficiently resolve theoretical and practical problems that are random by nature.

4 What is Statistics? Statistics is usually defined as a branch of Applied Mathematics, which is in turn a discipline of modern mathematics. In practice, modern mathematics is one of the principal tools of statistics. Therefore, to understand statistics, one must have some knowledge of modern mathematics. Brief description of the course. We introduce basic statistical concepts and terminology that are fundamental to the use of statistics in experimental work.

5 Why Statistics? Look at the following real-life situations. 1. A recent newspaper article concluded that smoking marijuana or cigarettes at least three times a week resulted in lower grades in college. How do you think the researchers came to this conclusion? Do you believe it? Is there a more reasonable conclusion?

6 Why Statistics? 2. It is obvious to most people that, on average, men are taller than women, and yet there are some women who are taller than some men. Therefore, if you wanted to prove that men were taller, you would need to measure many people of each sex. Here is a theory: on average, men have lower resting pulse rates than women do. How could you go about trying to prove or disprove that? Would it be sufficient to measure the pulse rates of one member of each sex? Two members of each sex? What information about men's and women's pulse rates would help you decide how many people to measure?

7 Why Statistics? 3. Suppose you were to learn that the large state university in a particular state graduated more students who eventually went on to become millionaires than any of the small liberal arts colleges in the state. Would that be a fair comparison? How should the numbers be presented in order to make it a fair comparison?

8 Brief description of the course We will learn: the role of statistics in engineering and scientific experimentation, and the two major subdivisions of Statistics: * Descriptive statistics and * Inferential statistics. In Inferential statistics, we will understand the distinction between samples and populations, see how to reach decisions based on key statistics extracted from samples by relating sample statistics to population parameters, and characterize deterministic and empirical models.

9 Course structure In Part 0: Warming-up Review, we will survey some basic ideas of modern mathematics such as - the concept of functions, - equations, - the operation of summation, etc. before we venture into the statistical discussion. Part I covers methods of Descriptive Statistics. Part II discusses Probability Concepts and Distributions, and in Part III we touch on one of the most important topics of Statistics, Statistical Estimation.

10 A definition of Statistics Statistics is the science of problem-solving in the presence of variability. Two main branches: Descriptive statistics is concerned with summarizing and describing numerically a body of data. More importantly, Inferential statistics is the process of reaching generalizations about the whole (called the population) by examining a portion or many portions (called samples). Why Statistics in Science and Technology? Scientific investigations are important not only in the academic laboratories of research universities but also in the engineering laboratories of industrial manufacturers.

11 Statistics in engineering and scientific experimentation Statistical methods are applied to an enormous diversity of problems in fields such as:
* Agriculture (which varieties grow best?)
* Genetics, Biology (selecting new varieties, species)
* Economics (how are the living standards changing?)
* Market Research (comparison of advertising campaigns)
* Education (what is the best way to teach small children reading?)
* Environmental Studies (do strong electric or magnetic fields induce higher cancer rates?)
* Quality engineering

12 Statistics in Quality engineering A key motivation. Quality and productivity are characteristic goals of industrial and service processes, which are expected to result in goods and services that are highly sought by consumers and that yield profits for the firms that supply them. Urgent demands from Industry and Services. * It is no longer satisfactory just to monitor on-line industrial processes and to ensure that products are within desired specification limits. * Competition demands that a better product be produced within the limits of economic realities. * Better products are initiated in academic & industrial research laboratories and made feasible in pilot & new-product research studies. All of these activities require experimentation, data collection, and proper analysis of the data.

13 Explicit material of this course After studying the course and working through its exercises, in BA contexts, or more broadly in Econometrics, in the Software Industry, or in Biology-related sciences such as Pharmaceutics and Biomedicine, you should be able to: 1. Know introductory concepts and methods of descriptive and inferential statistics. 2. Understand and practically employ methods including: a) grouping of data, b) measures of central tendency and dispersion, c) probability concepts and distributions, d) sampling and statistical estimation, e) statistical hypothesis testing, and f) basic Linear Regression Models. The last two topics will be discussed in the next lecture note, Statistics-I-Slides-part-3and4.pdf.

14 Part 0: Warming up - a mathematical review Set Theory. Concept of Set. A set is a collection of things/objects $s$, called elements. $S = \{s : s \text{ satisfies a property } P(s)\}$. E.g., $P(s)$ = "$s$ is a national soccer team taking part in matches in Germany in July". If $S$ is a set and $x$ is a member or element of $S$ we write $x \in S$. Otherwise we write $x \notin S$. The set with elements $x_1, \ldots, x_n$ is denoted $\{x_1, \ldots, x_n\}$. The empty set with no elements is denoted $\{\}$ or $\emptyset$. A set with one element is called a singleton, e.g., $\{a\}$ is a singleton.

15 Various Number Sets - the naturals and integers Notation. The set of natural numbers $\{0, 1, \ldots\}$ is denoted by $\mathbb{N}$, the set of integers by $\mathbb{Z}$, the rational numbers by $\mathbb{Q}$, and the real numbers by $\mathbb{R}$. Elucidation. a/ The numbers 0, 1, 2, 3, and so on are called natural numbers $\mathbb{N}$. If we add or multiply any two natural numbers, the result is always a natural number. However, if we subtract or divide two natural numbers, the result is not always a natural number. b/ To overcome the limitation of subtraction, we extend the natural number system to the system of integers. We do this by including, together with all the natural numbers, all of their negatives. Thus, we can represent the system of integers $\mathbb{Z}$ in the form: ..., -3, -2, -1, 0, 1, 2, 3, ...

16 The rationals Q c/ ... we still cannot always divide any two integers. For example, 8/(-2) = -4 is an integer, but 8/3 is not an integer. To overcome this problem, we extend the system of integers to the system of rational numbers. We define a number as rational if it can be expressed as a ratio of two integers (with a nonzero denominator). Thus, all four basic arithmetic operations (addition, subtraction, multiplication and division, except division by zero) are possible in the rational number system $\mathbb{Q}$.

17 The irrationals and the reals R There also exist numbers in everyday use that are not rational; that is, they cannot be expressed as a ratio of two integers. For example, $\sqrt{2}$, $\sqrt{3}$, etc. are not rational numbers; such numbers are called irrational numbers. d/ The term real number is used to describe a number that is either rational or irrational. To give a complete definition of the real numbers $\mathbb{R}$ would involve the introduction of a number of new ideas, and we shall not take on this task now. However, it is a good idea to think about a real number in terms of decimals.

18 Set equality and subsets Two sets $A$, $B$ are equal, denoted $A = B$, if they have the same elements. Sets can be described by properties that the elements satisfy. If $P$ is a property, then the expression $\{x \mid P\}$ denotes the set of all $x$ that satisfy $P$. E.g., the set of odd natural numbers can be represented by the following equal sets: $\{x \mid x = 2k + 1 \text{ for some } k \in \mathbb{N}\} = \{1, 3, 5, \ldots\}$. Subsets. The set $A$ is a subset of $B$, denoted $A \subseteq B$, if every element of $A$ is an element of $B$, i.e. [for all $s$, if $s \in A$ then $s \in B$]. The power set of a set $S$, denoted power($S$) or $P(S)$, is the set of all subsets of $S$.

19 Set operations and the algebra of sets A few basic operations on sets are:
$S \cap R := \{x : x \in S \text{ and } x \in R\}$ (intersection)
$S \cup R := \{x : x \in S \text{ or } x \in R\}$ (union)
$S \setminus R := \{x : x \in S \text{ but } x \notin R\}$ (difference)
Quiz. Determine the set $P(X) \cap P(Y)$ if you know $X = \{a, b, 1\}$ and $Y = \{u, a, b\}$, where $P(S)$ is the set consisting of all subsets of $S$.
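The set operations above map directly onto Python's built-in `set` type. The following is a minimal sketch, not part of the original slides: the helper name `power_set` is an assumption introduced here for illustration, and the last line checks one combination of power sets (the intersection) against the general identity P(X) ∩ P(Y) = P(X ∩ Y).

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    all_combos = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in all_combos}

X = {"a", "b", 1}
Y = {"u", "a", "b"}

intersection = X & Y   # elements in both X and Y
union = X | Y          # elements in X or in Y
difference = X - Y     # elements in X but not in Y

# A set with 3 elements has 2**3 = 8 subsets.
print(len(power_set(X)))
# Subsets common to both power sets are exactly the subsets of X & Y.
print(power_set(X) & power_set(Y) == power_set(X & Y))
```

Subsets are stored as `frozenset` because ordinary Python sets are not hashable and so cannot be elements of another set.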

20 FUNCTIONS GENERAL FUNCTIONS The idea of a function is one of the most fundamental concepts in modern mathematics. A function expresses the hypothesis that one quantity depends on (or is determined by) another quantity. For example: (i) bone mass depends on the age of the subject; (ii) height depends on race, etc. If a function $f$ assigns a value $y$ in the range to a certain $x$ in the domain, then we write $y = f(x)$, where, in Modern Statistics, $x$ is called the independent variable and $y$ the dependent variable (although this terminology is sometimes controversial). In formal mathematics, we refer to the set of possible values of $x$ as the domain, and the set of possible values of $y$ as the range.

21 FUNCTIONS LINEAR FUNCTION A linear function is usually of the form: $y = a + bx$ [1] where $a$ is called the intercept (the value of $y$ when $x = 0$) and $b$ is called the slope, which represents the rate of change in $y$ per one-unit change in $x$. In a two-dimensional space $Ox$, $Oy$, for any two given points $(x_1, y_1)$ and $(x_2, y_2)$ with $x_1 \ne x_2$, the slope can be determined by the relation: $b = \frac{\text{change in } y}{\text{change in } x} = \frac{y_2 - y_1}{x_2 - x_1}$

22 LINEAR FUNCTIONS However, for a series of $n > 2$ points of $x$ and $y$, we could extend this formula into a series of $n$ simultaneous equations and estimate $a$ and $b$ by the Method of Least Squares, which is readily available in several statistical software packages. MANY LINEAR FUNCTIONS. If there are two lines, say, $y = a_1 + b_1 x$ and $y = a_2 + b_2 x$, then we can make a number of observations: (a) The two lines are parallel if and only if their slopes are equal, i.e. $b_1 = b_2$. (b) On the other hand, if $b_1 \ne b_2$, then the lines are not parallel. In Statistics, we call this phenomenon interaction.
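To make the Method of Least Squares mentioned above concrete, here is a minimal sketch using the standard closed-form estimates for a single independent variable, b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b·x̄. The function name `least_squares_line` and the sample points are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Estimate intercept a and slope b of y = a + b*x by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical points lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = least_squares_line(xs, ys)
print(a, b)
```

With points that lie exactly on a line, the estimates recover that line; with noisy data they give the line minimizing the sum of squared vertical distances.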

23 LINEAR FUNCTIONS with many independent variables $y = a + bx$ [1] Equation [1] could be expanded further to include more than one $x$ variable. For instance, if bone mineral density (BMD) is strongly dependent on age, denoted AGE, and weight, denoted WEIGHT, we may write this statement as: $\text{BMD} = a + b \cdot \text{AGE} + c \cdot \text{WEIGHT}$, where $a$, $b$ and $c$ are estimated constants. Thus, for every value of AGE and WEIGHT, a BMD could be estimated. We will examine this function in the context of regression analysis later in this series.

24 FUNCTIONS QUADRATIC FUNCTIONS We often come across situations where the functional relationship between the dependent variable ($y$) and the independent variable ($x$) is not linear, but curved. One of the popular functions is the quadratic function, which is of the form: $y = f(x) = ax^2 + bx + c$ [2] where $a$, $b$, $c$ are constants.

25 COMMON MATHEMATICAL SYMBOLS Of course, learning mathematics cannot be complete without being able to communicate in its language. Here are some of the commonly used symbols in mathematics with which you are required to be conversant:
SYMBOL             MEANING                          NOTE
$\in$              Belongs to                       relation
$\notin$           Does not belong to               relation
$\Rightarrow$      Implies; it follows that         logic operator
$\Leftarrow$       Implied by                       logic operator
$\Leftrightarrow$  Equivalent to; if and only if    logic operator
$\mathbb{R}$       Real numbers                     set notation
$\forall$          For every                        quantifier
$\exists$          There exists                     quantifier

26 Part I: Descriptive statistics * Numerical measures of location (e.g. Central tendency) * Measures of Dispersion (Variability) * Measures of Association Between Two Variables

27 Part I: Descriptive statistics Practical motivation I. Cash flow management (CFM) is a critical activity of a firm named S in HCMC. S uses independent representatives to sell its products to department stores and gift shops across the whole city. The main component of CFM is the analysis and control of accounts receivable. How: measure the average age (time duration) and value of outstanding invoices, and make meaningful decisions based on those statistics (i.e. numerical measures).

28 Concrete Data and Demand a) A recent summary of accounts receivable shows the following descriptive statistics: Mean: 40 days, Median: 35 days, Mode: 31 days. Furthermore, b) the critical and concrete demands for S's success are: the average age of outstanding invoices should not exceed 45 days, and the dollar value of invoices more than 60 days old should not exceed 5% of the total value of all accounts receivable. Question: how should you use the statistical summary a) to answer the management's concern about whether b) is satisfied?

29 Numerical measures of location: Central tendency We employ three basic measures to describe central tendency of data: 1/ Mean 2/ Median 3/ Mode

30 Central tendency- Mean Mean or the average value. The sample mean $\bar{x}$ describes the central tendency of a sample of size $n$: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. * If the number of elements (observations/items) of the entire population is $N$, the population mean is $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$. Is there another way to measure central tendency? Yes!

31 Central tendency- Median The median is the value in the middle when the data $x_1, \ldots, x_n$ of size $n$ are sorted in ascending order (smallest to largest). - If $n$ is odd, then the median is the middle value. - If $n$ is even, the median is the average of the two middle values. Example 0. For instance, find the mean and median of two data sets, representing monthly salaries of IT engineers in the US: x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325], and x' = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 10000]. Then the sample mean of data x is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{35280}{12} = 2940$. What else do you think?

32 Central tendency- Median Since $n = 12$ is even, the middle two values are 2890 and 2920; the median of data x, denoted Med(x), is the average of these values: Median = Med(x) = (2890 + 2920)/2 = 2905. Remark: Whenever a data set contains extreme values, the median is often preferred to the mean as the measure of central location. The sample data x' (the second data set, whose largest value is 10000) contains an extreme value of USD 10000, so the new sample mean is $\bar{x'} = \frac{1}{n} \sum_{i=1}^{n} x'_i = \frac{41955}{12} \approx 3496 \gg 2940$ = the old mean of data x. But the median is unchanged, reflecting central tendency better: Med(x) = Med(x') = (2890 + 2920)/2 = 2905.
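The comparison in Example 0 can be reproduced with Python's standard `statistics` module. This is a small sketch; the variable name `x_prime` stands for the second data set containing the extreme value 10000.

```python
from statistics import mean, median

x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325]
x_prime = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 10000]

mean_x, median_x = mean(x), median(x)
mean_xp, median_xp = mean(x_prime), median(x_prime)

# The mean shifts strongly with one extreme value; the median does not move.
print(mean_x, median_x)
print(mean_xp, median_xp)
```

Only the largest observation differs between the two lists, so the two middle values (2890 and 2920) and hence the median stay the same.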

33 Central tendency- Mode Frequency distributions. In any sample data x of size $n$, the number of observations $n_A$ of a particular value $A$ is its absolute frequency. The relative frequency of $A$ is $n_A / n$. * A histogram is a bar graph of a frequency distribution. * The mode of sample data x is the value that occurs with the greatest frequency. Example 1. A computing student, An, received the following grades in the subjects of his first semester 2007: y = [6, 7, 6, 8, 5, 7, 6, 9, 10, 6]

34 Mode Example
Grade   Absolute frequency   Relative frequency
5       1                    0.1
6       4                    0.4
7       2                    0.2
8       1                    0.1
9       1                    0.1
10      1                    0.1
Size of data = n = 10; relative frequency = $n_A / n$
Table: Frequency distributions of An's grades
Hence, the mode of our grade data y = [6, 7, 6, 8, 5, 7, 6, 9, 10, 6] is Mode = 6; its absolute frequency is 4 and its relative frequency is 0.4.

35 Mode Bimodal and Multimodal If the data have exactly two modes, we say the data are bimodal; if more than two modes, the data are multimodal. In practice, only unimodal or bimodal data are of interest, since the mode is then an important measure of central tendency (location) for qualitative data.
Soft drink      Absolute frequency
Coke classic    19
Diet Coke       8
Twister         5
Pepsi           13
Sprite          5
Here $A$ ranges over the soft drinks, the absolute frequencies are $n_A = 19, 8, \ldots$, and the size of the data is $n = \sum_{A} n_A = 50$.
Table: Frequency distributions of purchased soft drinks
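Absolute and relative frequencies, and from them the mode(s), can be tabulated with `collections.Counter`. The sketch below uses An's grades from Example 1; the helper name `modes` and the small bimodal list are assumptions added for illustration.

```python
from collections import Counter

grades = [6, 7, 6, 8, 5, 7, 6, 9, 10, 6]

counts = Counter(grades)                              # absolute frequencies n_A
n = len(grades)
relative = {g: c / n for g, c in counts.items()}      # relative frequencies n_A / n

def modes(data):
    """Return all values that occur with the greatest frequency, sorted."""
    freq = Counter(data)
    top = max(freq.values())
    return sorted(value for value, k in freq.items() if k == top)

print(modes(grades), counts[6], relative[6])
# Hypothetical bimodal data: both 1 and 2 occur twice.
print(modes([1, 1, 2, 2, 3]))
```

Returning a list of modes instead of a single value makes the unimodal, bimodal and multimodal cases explicit.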

36 Numerical measures of location: Spreading tendency We employ basic measures to describe spreading tendency of data: a/ Percentiles and b/ Quartiles, a specific case of percentiles

37 Percentiles A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. Given sample data x of observations, formally we have: Definition. The $p$th percentile is a value $m$ such that at least $p$ percent of the observations are less than or equal to $m$, and at least $(100 - p)$ percent of the observations are greater than or equal to this value.

38 Percentiles Example 2. Universities frequently report admission test scores in terms of percentiles. Suppose an applicant K obtains a raw score $m = 54$ (on a scale of 100) on an admission test. Can we know his chance of passing the exam in comparison with his friends? YES, if we know which percentile the value $m$ corresponds to in the set of all applicant scores! If the value $m = 54$ corresponds to, say, the 70th percentile (of all the students' scores), we know that approximately 70% of students scored lower than applicant K and approximately 30% of students scored higher.

39 Percentiles- Mathematical formula Our concern now is: given $p$%, find the value $m$ by locating its position (index) in the observed sample data x of size $n$. Calculating the $p$th percentile in 3 steps:
1. Arrange the data x in ascending order to obtain the sorted sample data y.
2. Compute an index $i = \left(\frac{p}{100}\right) n$.
3. Locate $m$ from $i$: if $i$ is not an integer, round up to the ceiling $j := \lceil i \rceil$ (the smallest integer that is bigger than $i$); then $m = y[j]$. If $i$ is an integer, then $m = (y[i] + y[i + 1])/2$.

40 Percentiles- Examples Example 3. Let us determine the 85th percentile for the salary data given in Example 0. x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325] - Arrange x to get y = x (since the data are sorted already). - Compute the index $i = \left(\frac{p}{100}\right) n = \left(\frac{85}{100}\right) 12 = 10.2$. - Round it up to the ceiling $j = 11$ (the smallest integer that is bigger than 10.2). Then $m = y[11] = 3130$. * The 50th percentile for the same data is similarly computed as: $i = \left(\frac{p}{100}\right) n = \left(\frac{50}{100}\right) 12 = 6$. So $m = (y[6] + y[7])/2 = (2890 + 2920)/2 = 2905$.
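The three-step rule can be sketched as a short function. This follows the lecture's convention (1-based positions, ceiling for a non-integer index, average of two neighbours for an integer index); the function name `percentile` and the restriction to 0 < p < 100 are assumptions made here. Statistical packages often use other interpolation rules and may return slightly different values.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) using the lecture's 3-step rule."""
    if not 0 < p < 100:
        raise ValueError("p must satisfy 0 < p < 100")
    y = sorted(data)                   # step 1: sort ascending
    q, r = divmod(p * len(y), 100)     # step 2: index i = p*n/100 = q + r/100
    if r != 0:
        # i is not an integer: 1-based position ceil(i) = q + 1, i.e. y[q] 0-based.
        return y[q]
    # i is an integer: average of 1-based positions i and i + 1.
    return (y[q - 1] + y[q]) / 2

x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325]
p85 = percentile(x, 85)
p50 = percentile(x, 50)
print(p85, p50)
```

Using `divmod` on `p * n` avoids floating-point error when deciding whether the index is an integer, at least for integer `p`.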

41 Quartiles= 25% It is often desirable to divide data into four parts. Each part contains approximately one-fourth, or 25%, of the observations. The division points are called the quartiles, and are defined as:
$Q_1$ = first quartile, or 25th percentile
$Q_2$ = second quartile, or 50th percentile (also the median)
$Q_3$ = third quartile, or 75th percentile
Table: Three Quartiles
E.g., our salary data sample, given in Example 0, is divided into four parts ($Q_1$ and $Q_3$ should be computed as we did for the 85th and 50th percentiles!): [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325], $Q_1 = 2865$, $Q_2 = 2905$, $Q_3 = 3000$.

42 Measures of Dispersion (also called Variability) Practical motivation II. You are the purchasing agent of Maximart in HCMC. You regularly place orders with two distinct suppliers of fine and luxury ceramics: Minh Long ceramic, denoted M, and a foreign brand, denoted F. After several months of operation, you find that the mean number of days required to fill orders is $\mu = 10.3$ days for both suppliers. Your concern is: do the two suppliers M and F demonstrate the same degree of reliability in terms of making deliveries on schedule? Which supplier would you prefer?

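43 Measures of Dispersion
Working days   Supplier M   Supplier F
7              0            2
8              0            1
9              1            0
10             5            3
11             4            1
12             0            1
13             0            1
14             0            0
15             0            1
Table: Frequency distributions of the days needed to deliver products (number of orders per delivery time, for the delivery data x_M = [9, 10, 10, 10, 10, 10, 11, 11, 11, 11] and x_F = [7, 7, 8, 10, 10, 10, 11, 12, 13, 15])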

44 Measures of Dispersion In total, Minh Long ceramic needs a sum of delivery days = $9 \cdot 1 + 10 \cdot 5 + 11 \cdot 4 = 9 + 50 + 44 = 103$ days, and the foreign brand F needs $(7 \cdot 2 + 8 + 10 \cdot 3 + 11 + 12 + 13 + 15) = 103$ days. With 10 orders each, obviously $\mu_M = \mu_F = 103/10 = 10.3$ days. But note that: * the 7- or 8-day deliveries shown for the foreign brand F might be viewed favorably, while * the slow 13- to 15-day deliveries for the foreign brand F could be disastrous in terms of keeping your business running smoothly (workforce busy, big sales during peak season).

45 Measures of Dispersion We understand dispersion as how far the extreme data values are from the mean! Although Minh Long ceramic and the foreign brand F have the same mean $\mu_M = \mu_F = 10.3$, the dispersion of F is greater than the dispersion of Minh Long. In that sense, the foreign brand F has large dispersion, so it is less reliable (than the Minh Long firm) in terms of making deliveries on schedule! How should we quantify this concept?

46 Measures of Dispersion Basic measures of Dispersion (Variability) are: Range, isn't it? Interquartile Range? Variance and Standard deviation: the right measure!

47 Measures of Dispersion- Range Range = the largest value minus the smallest. That is, the range of the data $x = [x_1, \ldots, x_n]$ is $\text{Max}(x) - \text{Min}(x)$. For the salary data given in Example 0, x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325], the range of the data is $3325 - 2710 = 615$. For the extreme one, x' = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 10000], the range of the data now is $10000 - 2710 = 7290$! Hence the range is not a very descriptive indicator of dispersion, because it depends only on the two extreme values.

48 Measures of Dispersion- Interquartile Range The Interquartile Range (IQR) is the range of the middle 50% of the data: Interquartile Range = $Q_3 - Q_1$. This indicator overcomes the dependency on extreme values. E.g., from [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325] with $Q_1 = 2865$, $Q_2 = 2905$, $Q_3 = 3000$, the Interquartile Range of the data is $Q_3 - Q_1 = 3000 - 2865 = 135$.
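A short sketch contrasting the two measures on the salary data with and without the extreme value. The quartiles are computed with the lecture's percentile rule; the names `percentile`, `data_range`, `iqr` and `x_prime` are assumptions introduced for this illustration.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) by the lecture's rule on sorted data."""
    y = sorted(data)
    q, r = divmod(p * len(y), 100)
    # Non-integer index: take the next position; integer index: average two values.
    return y[q] if r else (y[q - 1] + y[q]) / 2

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range Q3 - Q1."""
    return percentile(data, 75) - percentile(data, 25)

x = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325]
x_prime = x[:-1] + [10000]   # same data with the largest value replaced by 10000

print(data_range(x), iqr(x))
print(data_range(x_prime), iqr(x_prime))
```

The range jumps when a single extreme value appears, while the IQR of these two data sets is identical, since neither quartile involves the largest observation.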

49 Measures of Variability Variance and Standard deviation. The variance of a data set is a measure of variability that utilizes all the data. * The sample variance of data x of size $n$: $\text{Var}(x) = \sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1}$. * The sample standard deviation: $\sigma_x = \sqrt{\text{Var}(x)}$. * The population variance of a population of size $N$ with mean $\mu$: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.

50 Variance From Practical motivation II Recall the frequency distributions of the days needed to deliver for Supplier M and Supplier F.

51 Variance to judge Reliability Supplier M provides the data of length $n = 10$, $x_M = [9, 10, 10, 10, 10, 10, 11, 11, 11, 11]$, with sample variance $\text{Var}(M) = \frac{\sum_{i=1}^{n} x_i^2 - n \mu^2}{n - 1} = \frac{1065 - 10 \cdot 10.3^2}{9} = \frac{4.1}{9} \approx 0.456$. Supplier F similarly provides data of the same length, $x_F = [7, 7, 8, 10, 10, 10, 11, 12, 13, 15]$, with sample variance $\text{Var}(F) = \frac{1121 - 10 \cdot 10.3^2}{9} = \frac{60.1}{9} \approx 6.68$. So Supplier F is less reliable than Supplier M.

52 Standard deviation, Coefficient of Variation * The sample standard deviation is $\sigma_x$: $\sigma_M = \sqrt{\text{Var}(M)} = \sqrt{0.456} \approx 0.67$; $\sigma_F = \sqrt{\text{Var}(F)} = \sqrt{6.68} \approx 2.58$. * The Coefficient of Variation $V$ measures relative dispersion, i.e. it compares how large the standard deviation is relative to the mean: $V = \left(\frac{\sigma}{\mu}\right) 100\%$ for populations and $V = \left(\frac{\sigma_x}{\bar{x}}\right) 100\%$ for samples.

53 Coefficient of Variation In Practical motivation II, the coefficient of variation of Supplier M is $V_M = \left(\frac{\sigma_M}{\mu_M}\right) 100\% = \frac{0.67}{10.3} \cdot 100\% \approx 6.5\%$, and the coefficient of variation of Supplier F is $V_F = \left(\frac{\sigma_F}{\mu_F}\right) 100\% = \frac{2.58}{10.3} \cdot 100\% \approx 25\%$. Hence Supplier F is less reliable than Supplier M, by a ratio of almost 4 times!
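The supplier comparison can be checked with the standard `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) denominator. This is a sketch; the helper name `coefficient_of_variation` is an assumption introduced here.

```python
from statistics import mean, stdev, variance

x_M = [9, 10, 10, 10, 10, 10, 11, 11, 11, 11]
x_F = [7, 7, 8, 10, 10, 10, 11, 12, 13, 15]

def coefficient_of_variation(data):
    """Sample standard deviation relative to the sample mean, in percent."""
    return stdev(data) / mean(data) * 100

for name, data in (("M", x_M), ("F", x_F)):
    print(name, mean(data), round(variance(data), 3),
          round(stdev(data), 2), round(coefficient_of_variation(data), 1))
```

Both suppliers have the same mean of 10.3 days, so the difference in reliability only becomes visible through the variance, the standard deviation, and the coefficient of variation.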

54 Measures of Association Between Two Variables We now consider the relationship between variables. The two most important descriptive measures for this task are: Covariance, which measures the co-movement of two separate distributions, and Correlation. Let us start by looking at Practical motivation III. [Sales trend] A manager of a sound equipment store in Hanoi wants to determine the relationship between the number $x$ of weekend television commercials shown and the sales $y$ at his store during the following week. Sample data of size $n = 10$ have been recorded over 10 weeks, shown in Table 1.

55 Association Between Two Variables Table 1: Sample data for the sound equipment store. Columns: Week; Number of commercials ($x$); Sales volume $y$ (in $100s).

56 Association Between Two Variables Covariance- the 1st descriptive measure of association between 2 variables $X$, $Y$. For sample data of size $n$ with the observations $(x, y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, the sample covariance is defined as $s_{xy} = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$. In our example we have $\bar{x} = 30/10 = 3$ and $\bar{y} = 510/10 = 51$, and the sample covariance is $s_{xy} = 99/9 = 11$. For the entire population, the population covariance is $\sigma_{xy} = \frac{\sum_{i} (x_i - \mu_x)(y_i - \mu_y)}{N}$. A positive covariance indicates that $X$ and $Y$ move together in relation to their means. A negative covariance indicates that they move in opposite directions.

57 Association Between Two Variables Remark that, with axes drawn through the point $(\bar{x}, \bar{y})$: $(x_i - \bar{x})(y_i - \bar{y}) > 0 \Leftrightarrow$ the point $(x_i, y_i)$ lies in quadrants I or III; $(x_i - \bar{x})(y_i - \bar{y}) < 0 \Leftrightarrow$ the point $(x_i, y_i)$ lies in quadrants II or IV. As a result, 1. $s_{xy} > 0$ indicates a positive linear association (relationship) between $x$ and $y$; 2. $s_{xy} \approx 0$: $x$ and $y$ are not linearly associated; 3. $s_{xy} < 0$: $x$ and $y$ are negatively linearly associated. In our example, $s_{xy} = 99/9 = 11$, indicating a positive linear relationship between the number $x$ of television commercials shown and the sales $y$ at the sound equipment store. But the value of the covariance depends on the measurement units for $x$ and $y$, so its magnitude alone does not tell us how strong the relationship is. Is there a more precise measure of this relationship?

58 Association Between 2 Variables- Correlation Correlation coefficient- the second descriptive measure: $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$. Application of Correlation coefficients.
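The following sketch computes the sample covariance and the correlation coefficient from their definitions. The data are hypothetical (the store's table is not reproduced here), and the function names are assumptions. Rescaling `ys` by 100 illustrates why the covariance depends on measurement units while the correlation does not.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), using s_x**2 = s_xx and s_y**2 = s_yy."""
    s_x = sqrt(sample_covariance(xs, xs))
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ys_scaled = [100 * y for y in ys]   # same quantity in a unit 100 times smaller

print(sample_covariance(xs, ys), correlation(xs, ys))
print(sample_covariance(xs, ys_scaled), correlation(xs, ys_scaled))
```

The covariance grows by the factor 100 after rescaling, whereas the correlation coefficient stays the same and always lies between -1 and 1.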

59 Part II: Probability Concepts and Distributions What is probability? Experiments. An experiment E is a specific trial/activity (carried out by scientists or by people in general) whose outcomes possess randomness. Simple examples are: Coin throwing- throw a coin; the random outcomes are head (H) or tail (T). Temperature measurement- observe the temperature at noon in HCMC on 10 days of Summer 2007; the random outcomes are recorded in the list [34, 29, 28, 32, 31, 32, 30, 31, 30, 33] (in degrees Celsius).

60 Probability distributions Three basic concepts 1. Sample space $S$- the set of all possible outcomes. Ex. 1: Coin throwing, $S = \{H, T\}$. 2. Events- an event is a subset $A$ of the sample space $S$: $A \subseteq S$. Usually we collect all events into a set, called the event set $Q := \{A : A \subseteq S \text{ and } A \text{ is an event}\}$. When an experiment E is performed and an outcome $a$ is observed, we say that event $A$ has occurred if $a \in A$. 3. Probability distribution (probability function)- a map $P$ from $Q$ to the interval $[0, 1]$: $P : Q \to [0, 1]$, $A \in Q \mapsto P(A) = \text{Prob}(A)$ = the probability or chance that the event $A$ occurs.

61 Axioms of Probability Theory (A. Kolmogorov, 1933). A1. Probabilities are nonnegative, $0 \le P(A) \le 1$, where $P(A) := \text{Prob}(A)$. A2. The sample space $S$ has probability 1, that is $P(S) = 1$. A3. Probabilities of disjoint events $A$, $B$ with $A \cap B = \emptyset$ add up: $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B)$, in which * $A, B \subseteq S$ are events, * the sample space $S$ is formed from a random experiment E.

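62 Axioms of Probability Theory More generally, we have $P(A_1 \cup A_2 \cup \cdots \cup A_m) = P(A_1) + P(A_2) + \cdots + P(A_m)$ for $m$ mutually disjoint events, i.e. $A_i \cap A_j = \emptyset$ when $1 \le i \ne j \le m$. The so-called countable additivity of probabilities is a generalization to infinitely many mutually disjoint events: $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.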

63 Assign probabilities to events Possible ways to assign probabilities to events: a) Frequency interpretation: probability is based on history (data obtained or observed). For any event $A \subseteq S$, its probability is the relative frequency $P(A) = \text{Prob}(A) = \sum_{s \in A} P(s)$. Example 2: Suppose the temperatures in the Temperature measurement experiment above are the list [34, 29, 28, 32, 31, 32, 30, 31, 30, 33] (in degrees Celsius), and define the event $A$ = temperatures higher than 30°. The sample space $S$ is the above list, and if we suppose the chance to see any temperature in $S$ is the same, then $P(A) = \sum_{s \in A} P(s) = \frac{6}{10}$.
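Example 2 amounts to counting the observations that fall in the event and dividing by the number of observations. A minimal sketch, using exact fractions (the variable names are chosen here for illustration):

```python
from fractions import Fraction

temps = [34, 29, 28, 32, 31, 32, 30, 31, 30, 33]   # degrees Celsius

# Event A: temperature strictly higher than 30 degrees.
in_A = [t for t in temps if t > 30]

# Each of the 10 observations is assumed equally likely (weight 1/10).
p_A = Fraction(len(in_A), len(temps))
print(len(in_A), p_A)
```

`Fraction` reduces 6/10 to 3/5 automatically; the two values 30 are not counted because the event requires a temperature higher than 30.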

64 Assign probabilities to events b) Classical interpretation: compute the probability in question from other known probabilities using basic formulas, based on the assumption that all outcomes have equal probability. It applies when the sample space $S$ is finite, $|S| = n < \infty$; then for any event $A \subseteq S$, its probability is the fraction found by counting methods: $P(A) = \text{Prob}(A) = \frac{|A|}{|S|}$. Example 3: In Coin throwing, $S = \{H, T\}$, $P(H) = P(\{H\}) = \frac{1}{2} = P(T)$. c) Subjective interpretation: use a model or a hypothesis about a phenomenon possessing randomness. Example 4: P(survival after a serious surgery) is estimated by the doctor.

65 Probability of a single event Computing Rule. For finite sample spaces, we assume $S = \{s_1, s_2, \ldots, s_n\}$ and define $p_i = P(s_i)$; then $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. Fact. If all outcomes have equal probabilities, then $P(A) = \text{Prob}(A) = \frac{n_A}{n}$, where $n_A = |A|$. Example. On a single toss of a die, we get only one of six possible outcomes 1, 2, 3, 4, 5 or 6; then the sample space is $S = \{1, 2, 3, 4, 5, 6\}$, and $p_i = P(i) = 1/6$ for all $i = 1, \ldots, 6$.

66 Multiple events Rule of addition What are mutually exclusive and not mutually exclusive events? Two events $A$ and $B$ are mutually exclusive if $A \cap B = \emptyset$, i.e. the occurrence of $A$ precludes the occurrence of $B$. Then * For mutually exclusive events: $P(A \text{ and } B) = P(A \cap B) = 0$ and $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B)$. * How about non-mutually exclusive events, i.e. $P(A \text{ and } B) = P(A \cap B) \ne 0$?

67 Multiple events Rule of addition * For non-mutually exclusive events, $P(A \text{ and } B) = P(A \cap B) \ne 0$: $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$. Example. If the die is fair, then when tossing it, $p_i = P(i) = 1/6$. The probability of the event $Z$ = getting 2 or 3 or 4 is $P(Z) = P(2 \text{ or } 3 \text{ or } 4) = \sum_{s \in Z} P(s) = 3/6$. Given that event $B$ happened, what is the probability that event $A$ also happened?
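The rule of addition can be verified by direct counting on a fair die. In this sketch the helper `prob` and the second event `B` are assumptions introduced for illustration; `Z` is the event from the example above.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die

def prob(event):
    """Classical probability |A| / |S| for equally likely outcomes."""
    return Fraction(len(event & S), len(S))

Z = {2, 3, 4}   # getting 2 or 3 or 4
B = {4, 5}      # hypothetical second event, overlapping Z in the outcome 4

lhs = prob(Z | B)                          # P(Z or B)
rhs = prob(Z) + prob(B) - prob(Z & B)      # P(Z) + P(B) - P(Z and B)
print(prob(Z), lhs, rhs)
```

Without subtracting P(Z and B), the shared outcome 4 would be counted twice.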

68 Conditional probability Given that event $B$ happened, what is the probability that event $A$ also happened? Brainstorming thought: narrow down the sample space to the space where $B$ has occurred (aiming at the comparison between $A \cap B$ and $B$). The formula: the conditional probability of event $A$ given event $B$, for $P(B) > 0$, is $P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(A \cap B)}{P(B)}$. (0.1) As a result, the joint probability of two events $A$ and $B$ is $P(A \text{ and } B) = P(A \cap B) = P(B) \, P(A \mid B)$.

69 Bayes Theorem Note also that $P(B) \, P(A \mid B) = P(A) \, P(B \mid A)$ [since LHS $= P(AB) = P(BA) =$ RHS]. Theorem. For any pair of events $A$, $B$ with $P(B) > 0$, we always have: $P(A \mid B) = \frac{P(A) \, P(B \mid A)}{P(B)}$ (0.2)
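Equations (0.1) and (0.2) can be checked numerically on a fair die. The events below (an even number, a number greater than 3) and the helper names `prob` and `cond_prob` are hypothetical choices for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die

def prob(event):
    """Classical probability |A| / |S|."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), assuming P(b) > 0 (Eq. 0.1)."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}   # even outcome
B = {4, 5, 6}   # outcome greater than 3

direct = cond_prob(A, B)                          # Eq. 0.1
via_bayes = prob(A) * cond_prob(B, A) / prob(B)   # Eq. 0.2
print(direct, via_bayes)
```

Knowing that the outcome is greater than 3 raises the probability of an even outcome from 1/2 to 2/3, and Bayes' formula gives the same value from P(B | A).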

70 What are dependent events? Events $A$ and $B$ are dependent if the occurrence of one is connected in some way to the occurrence of the other. Then the joint probability of $A$ and $B$ is $P(AB) = P(A) \, P(B \mid A) = P(B \mid A) \, P(A)$, or also $P(AB) = P(BA) = P(A \mid B) \, P(B)$ (0.3) (since $P(BA) = P(B) \, P(A \mid B) = P(A \mid B) \, P(B)$). How about the independent case?

71 What are independent events? Events $A$ and $B$ are independent if the occurrence of $A$ is not connected in any way to the occurrence of $B$. Then $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$ (0.4) Rule of multiplication. By Equation 0.3, $P(AB) = P(A \mid B) \, P(B)$; so, due to Eq. 0.4, the joint probability of two independent events $A$ and $B$ is $P(AB) = P(A) \, P(B)$.
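The rule of multiplication gives a simple numerical test for independence: compare P(AB) with P(A)·P(B). The events in this sketch are hypothetical examples on a fair die, and the helper names are assumptions.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die

def prob(event):
    """Classical probability |A| / |S|."""
    return Fraction(len(event & S), len(S))

def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

A = {2, 4, 6}   # even outcome
B = {1, 2}      # outcome at most 2
C = {4, 5, 6}   # outcome greater than 3

print(independent(A, B))   # P(AB) = 1/6 and P(A)P(B) = 1/2 * 1/3
print(independent(A, C))   # P(AC) = 1/3 but P(A)P(C) = 1/2 * 1/2
```

Here A and B are independent although they share an outcome, while A and C are dependent: independence is a statement about probabilities, not about disjointness.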

72 FOR THOSE WHO LIKE FORMULAS Denote events or outcomes with capital letters $A$, $B$, $C$, and so on. If $A$ is one outcome, all other possible outcomes are part of the complement of $A$, written $A^c$. $P(A)$ is the probability that the event or outcome $A$ occurs.
Rule 0: For any event $A$, $0 \le P(A) \le 1$.
Rule 1: $P(A) + P(A^c) = 1$, or $P(A^c) = 1 - P(A)$.
Rule 2: If events $A$ and $B$ are mutually exclusive, then $P(A \text{ or } B) = P(A) + P(B)$.
Rule 3: If events $A$ and $B$ are independent, then $P(AB) = P(A) \, P(B)$.
Rule 4: If the ways in which an event $B$ can occur are a subset of those for event $A$, then $P(B) \le P(A)$.

73 Part II: Probability distributions A little introduction to Random Variables. Definition A random variable X is a function from a sample space S to the reals $\mathbb{R}$. For any $b \in \mathbb{R}$, the preimage $A := X^{-1}(b) = \{w \in S : X(w) = b\} \subseteq S$ is an event, and we define $\text{Prob}\{X = b\} := \text{Prob}(A) = \sum_{w \in A} \text{Prob}(w)$. For a finite sample space S with equally likely outcomes, obviously $\text{Prob}\{X = b\} = \text{Prob}(A) = |A| / |S|$.

74 What is a Probability Distribution? The probability distribution of a random variable describes how probabilities are distributed over the values (the range) of the random variable. For a discrete random variable X, its probability distribution is the probability function $f(x) = \text{Prob}(\{X = x\})$, which provides the probability that the r.v. X takes a particular value $x \in \text{Range}(X)$. We must have $f(x) \ge 0$ and $\sum_{x \in \text{Range}(X)} f(x) = 1$.

75 Discrete Probability Distribution Example (Die tossing) On a single toss of a die, we get only one of six possible outcomes 1, 2, 3, 4, 5 or 6; then the sample space is S = {1, 2, 3, 4, 5, 6}. Define the random variable $X : S \to \mathbb{R}_+$ to be the identity function Id, that is $X(i) = \text{Id}(i) = i$ for all $i \in S$. The probability distribution associated with X is the probability function $f(i) = \text{Prob}\{X = i\} = \text{Prob}\{X^{-1}(i)\} = |X^{-1}(i)| / |S| = 1/6$, for i = 1..6. Often for discrete distributions, we write $p_i = f(i) = P(i)$.

76 Practical motivation Why study Probability Distributions? Citibank in HCMC offers financial services, including checking and savings accounts, loans, mortgages, insurance and investment services. These complicated activities are carried out through a Citibanking system consisting of many modules, like - ATMs, or more advanced, - the Card Banking Centers (CBCs). What services would be available at CBCs, and how?

77 Motivation - Card Banking Centers What services would be available at CBCs, and how? Each CBC operates as a waiting line system with randomly arriving customers seeking service at one of the ATMs. CBC capacity studies are used - to analyze customer waiting lines and - to determine whether additional ATMs are needed. Data collected by Citibank showed that the random customer arrivals followed a probability distribution known as the Poisson distribution. Using the Poisson distribution, Citibank can compute probabilities for the number of customers arriving at a CBC during any time period and make decisions concerning the number of ATMs needed.

78 Part IIA: Useful Discrete probability distributions Discrete probability distributions, such as the one used by Citibank, are the topic of this section. A discrete random variable X is one whose range is a finite or countably infinite set (the Poisson distribution, for instance, has range {0, 1, 2, ...}). The discrete probability distribution f(x) must fulfill: $f(x) \ge 0$, and $\sum_{x \in \text{Range}(X)} f(x) = 1$. Besides the Poisson distribution, other important discrete distributions are the Bernoulli and the Binomial.

79 Useful Discrete probability distributions 1/ Bernoulli Distribution B(p). This distribution describes a random variable that can take only two possible values, i.e. $X \in \{0, 1\}$. The distribution is described by a probability function $p(1) = P(X = 1) = p$, $p(0) = P(X = 0) = 1 - p$ for some $p \in [0, 1]$. It is easy to check that $E(X) = p$, $\text{Var}(X) = p(1 - p)$. Notice that we used the following concepts of E(X) and Var(X).

80 Expectation and Variance- the Discrete Case Expectation. The expectation operator defines the expected value (or average behavior) of a random variable X as $E(X) = \sum_{x \in \text{Range}(X)} P(X = x) \cdot x$, where $P(X = x) = P(X^{-1}(x))$ and $X^{-1}(x) = \{w : X(w) = x\} \subseteq S$. Since the r.v. $X : S \to \mathbb{R}$ is an assignment of values to the points in the sample space S, you could equivalently think of $E(X) = \sum_{w \in S} P(w)\, X(w)$. The variance of a random variable X is $\text{Var}(X) = E[(X - E(X))^2]$.
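The two definitions above translate directly into sums over a probability function. The following is a minimal sketch, assuming the distribution is stored as a Python dictionary mapping each value x to P(X = x); the helper names `expectation` and `variance` and the Bernoulli parameter p = 3/10 are hypothetical choices for illustration.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum over x of P(X = x) * x, for dist = {x: P(X = x)}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Var(X) = E[(X - E(X))^2]."""
    mu = expectation(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

# Fair die: X(i) = i with probability 1/6 each.
die = {i: Fraction(1, 6) for i in range(1, 7)}

# Bernoulli B(p) with an illustrative p = 3/10.
p = Fraction(3, 10)
bernoulli = {0: 1 - p, 1: p}

print(expectation(die), variance(die))              # 7/2 and 35/12
print(expectation(bernoulli), variance(bernoulli))  # p and p(1 - p)
```

The Bernoulli case reproduces E(X) = p and Var(X) = p(1 - p) stated on the previous slide.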

81 Useful Discrete probability distributions 2/ Binomial distribution B(n, p). This distribution describes a random variable X that is the number of successes in n independent Bernoulli trials with probability of success p. In other words, X is a sum of n independent Bernoulli r.v. Therefore, X takes values in $\{0, 1, \ldots, n\}$ and the distribution is given by the probability function $p(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. It is easy to check that $E(X) = np$, $\text{Var}(X) = np(1 - p)$.
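The binomial probability function and its moments can be verified numerically. This is a small sketch using only the Python standard library; the parameters n = 10 and p = 0.5 (ten tosses of a fair coin) and the function name `binomial_pmf` are assumptions made for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical parameters for illustration: 10 tosses of a fair coin.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                             # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))               # should be n * p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # n * p * (1 - p)

print(total, mean, var)
```

Summing the probability function over k = 0..n confirms that it is a valid distribution, and the computed mean and variance agree with np and np(1 - p).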

82 Binomial process- a well-known example Let H and T be the two outcomes of an experiment such as coin throwing, with sample space $S_{Coin} = \{H, T\}$, and in general, the occurrence likelihoods $P(H) = P(\{H\}) = p$; $P(T) = 1 - p$. Assume that we perform n trials, called Bernoulli Trials, of the experiment and each trial is independent of the others. For example, the event H on the first trial is independent of the event H on the second trial, so both events have probability p. The sample space S can now be represented by $S = \{x_1 x_2 \cdots x_n : x_i \in S_{Coin}\}$. Since the trials are independent, we assign probabilities to the points in S by $P(x_1 x_2 \cdots x_n) = P(x_1)\, P(x_2) \cdots P(x_n)$.

83 Binomial process- example Question: What is the probability of exactly k successes in n trials of a binomial experiment where P(success) = p and P(failure) = 1 - p? Let X be the sum of n independent Bernoulli r.v., then X takes values in $\{0, 1, \ldots, n\}$. The event of interest is therefore $\{X = k\}$ with $k \in \{0, 1, \ldots, n\}$, which means exactly k successes in n trials. By combinatorial reasoning, the binomial distribution $X \sim \text{Bin}(n, p)$ is given by the probability function $p(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$.

84 Binomial process- example Example/Quiz. Two fair dice are tossed. If the total is 7, we win $100; if the total is 2 or 12, we lose $100; otherwise we lose $10. What is the expected value of the game? Reminder: if $V : S \to \mathbb{R}$ is an assignment of values to the points in the sample space S, then $E(V) = \sum_{w \in S} P(w)\, V(w)$. [End of Week 4]
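One way to work the quiz is to apply the reminder formula literally: enumerate all 36 equally likely outcomes of two dice and sum P(w) V(w). The sketch below does this with exact fractions; the function name `payoff` is a hypothetical helper for this example.

```python
from fractions import Fraction
from itertools import product

def payoff(total):
    """Value V of the game, in dollars, for a given total of the two dice."""
    if total == 7:
        return 100
    if total in (2, 12):
        return -100
    return -10

# All 36 ordered outcomes (a, b) of two fair dice, each with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

# E(V) = sum over w in S of P(w) * V(w)
expected_value = sum(p * payoff(a + b) for a, b in outcomes)

print(expected_value)  # 10/3
```

Grouping outcomes by total gives the same result by hand: (6/36)(100) + (2/36)(-100) + (28/36)(-10) = 120/36 = 10/3, so the game is worth about $3.33 per play on average.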

85 Part IIB: Continuous probability distributions A random variable X is a continuous random variable iff its range Range(X) is a continuous set (such as $\mathbb{R}$ or an interval in it). Continuous distributions. A continuous probability distribution refers to the range Range(X) of all possible values that a continuous random variable X can assume, together with the associated probabilities $P(X \le t)$. The probability distribution of a continuous random variable X is described by its probability density function (pdf), or simply probability function, denoted $f_X(t)$. Key continuous probability distributions include: - the normal distribution and - the exponential distribution.

86 Continuous probability distribution- determination The distribution function or cumulative distribution function (cdf) of X is the function defined by $F_X(t) = P(X \le t)$, $-\infty < t < \infty$. Let X be a r.v. with cdf $F_X(t)$. We say that X is a continuous random variable iff its range Range(X) contains an interval (either finite or infinite) of real numbers. The cdf $F_X(t)$ must have a derivative $\frac{dF_X(t)}{dt} =: f(t)$. This function is defined almost everywhere and is piecewise continuous.

87 Probability density and Cumulative function Thus, if X is a continuous r.v., then $P(X = t) = 0$. The probability function $f(t) = \frac{dF_X(t)}{dt}$ - is called the probability density function of X, and - is given by a smooth curve C such that * the total area (probability) under the curve is 1, * but the probability of any specific value is $P(X = t) = 0$.

88 Continuous probability distributions- Properties However, the probability that a continuous random variable X assumes any value within a given interval, say [a, b], is measured by the area under the curve C within that interval. In other words, the probability of the event $a \le X \le b$ is: $\text{Prob}(a \le X \le b) = \text{Prob}(a < X < b) = \int_a^b f(x)\,dx$. The mean µ of a continuous probability distribution with pdf f(x) is given by $\mu = E(X) = \int_{x \in \text{Range}(X)} x\, f(x)\,dx$, and the variance by $\text{Var}\, X = \sigma^2 = \int_{x \in \text{Range}(X)} [x - \mu]^2 f(x)\,dx$.

89 Important continuous probability distributions Two key ones: the normal distribution and the exponential distribution. The normal distribution is found to be useful in numerous areas like Medical science, Petroleum engineering, Environmental, Biological and Ecological sciences... The exponential distribution is found to be useful in numerous other areas like mass manufacturing, mechanical and electronic engineering...

90 (a) Normal distribution- the first continuous one If X is a normal random variable, its density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2}$, $-\infty < x < \infty$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$. (0.5) We write $X \sim N(\mu, \sigma^2)$, where f(x) = height of the normal curve, e = constant ≈ 2.71, π = constant ≈ 3.14, µ is the mean, and $\sigma^2$ is the variance of the normal distribution.

91 (b) Exponential distribution- the second continuous one The exponential distribution is $f(x) = f(x; \lambda) = \lambda e^{-\lambda x}$, $x \ge 0$ (0.6) where λ > 0 is a constant. The mean and the variance are $\mu = \frac{1}{\lambda}$; $\sigma^2 = \mu^2 = \frac{1}{\lambda^2}$. The exponential cumulative distribution function (cdf) is $F(a) = \text{Prob}(X \le a) = \int_0^a f(t)\,dt = \int_0^a \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda a}$, $a \ge 0$. Reminder: the relation between pdf f and cdf F is $f(t) = \frac{dF_X(t)}{dt} = F'(t)$!

92 Practical uses of Exponential distribution The exponential distribution is widely used in the field of Reliability Engineering, for example as a model of the time to failure (TTF) of a component or system. In that case, the parameter λ is called the failure rate of the system, and the distribution's mean $\mu = \frac{1}{\lambda}$ is called the mean time to failure (MTTF). Example An electronic component in an airborne radar system has a useful life X described by an exponential distribution with failure rate $10^{-4}$/h, that is $\lambda = 10^{-4}$. Compute the MTTF for this component. The mean time to failure for this component is its expected life, which is the mean $\mu = \frac{1}{\lambda} = 10^4 = 10000$ h.
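The radar example can be reproduced with the exponential cdf $F(a) = 1 - e^{-\lambda a}$. The sketch below is illustrative only: the function name `exp_cdf` is hypothetical, and evaluating the cdf at the MTTF is an extra step added here to show how the formula is used, not part of the original example.

```python
import math

def exp_cdf(a, lam):
    """F(a) = P(X <= a) = 1 - exp(-lam * a) for a >= 0, and 0 for a < 0."""
    if a < 0:
        return 0.0
    return 1.0 - math.exp(-lam * a)

lam = 1e-4          # failure rate per hour, from the example
mttf = 1.0 / lam    # mean time to failure, mu = 1 / lambda

# Probability that the component fails before reaching its MTTF: 1 - e^(-1).
p_fail_before_mttf = exp_cdf(mttf, lam)

print(mttf, p_fail_before_mttf)
```

Note that the probability of failing before the MTTF is $1 - e^{-1}$, roughly 63%, not 50%: the exponential distribution is skewed, so its mean lies above its median.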

93 Part IIB: Normal distribution- Properties The normal curve (of the probability function f(x)) is - bell-shaped, - symmetrical about the mean, and - when we move further away from the mean in both directions, the normal curve approaches the horizontal axis. A quantitative description of the three properties is given by the three most useful cases: The area of $A_1 = \{x : |x - \mu| \le \sigma\}$ takes 68.26% of the whole area (probability 1). The area of $A_2 = \{x : |x - \mu| \le 2\sigma\}$ takes 95.44% of the whole area. The area of $A_3 = \{x : |x - \mu| \le 3\sigma\}$ takes 99.74% of the whole area.
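The three areas can be computed directly rather than read from a table, using the identity $P(|X - \mu| \le k\sigma) = \text{erf}(k/\sqrt{2})$ for a normal random variable. This is a minimal sketch with the standard library; the function name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normal X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

# Areas for k = 1, 2, 3 standard deviations around the mean.
areas = {k: prob_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sigma: {100 * area:.2f}%")
```

Direct computation gives 68.27%, 95.45% and 99.73%. The slide's figures differ in the last digit because they come from doubling four-decimal table entries (for instance 2 × 0.4772 = 0.9544).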

94 Normal distribution- Computation Observation: The cdf of a normal random variable $X \sim N(\mu, \sigma^2)$, given by $F(a) = \text{Prob}(X \le a) = \int_{-\infty}^{a} f(x)\,dx$, where $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2}$, cannot be evaluated in closed form! We can compute probabilities if we use the z-transformation $z = \frac{x - \mu}{\sigma}$; then $Z \sim N(0, 1)$! Practically, we can use listed tables to look up the probabilities concerned.

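95 Normal distribution- Computation 1 Key facts: (see Table 6.1 at 233, Buz Stat. 1)
$x = \mu \Leftrightarrow z = 0$;
$x = \mu + \sigma \Leftrightarrow z = 1$;
$x = \mu + k\sigma \Leftrightarrow z = k$;
$x = \mu + 1.96\sigma \Leftrightarrow z = 1.96$;
$x = \mu + 1.645\sigma \Leftrightarrow z = 1.645$;
$x = \mu + 2.576\sigma \Leftrightarrow z = 2.576$.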

96 Normal distribution- Computation 2 For instance, take a normal random variable $X \sim N(\mu, \sigma^2)$ with µ = 10, σ = 2. Since $z = \frac{x - \mu}{\sigma}$, we have $P(10 \le X \le 14) = P(0 \le z \le 2)$. Table 6.1 at 233, Buz Stat. 1 provides that z = 2 results in probability 0.4772. So $P(10 \le X \le 14) = 0.4772$.
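The table lookup can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (available since Python 3.8) and computes the probability both directly on X and after the z-transformation; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10, 2

# Direct computation on X ~ N(10, 2^2): P(10 <= X <= 14) = F(14) - F(10).
X = NormalDist(mu=mu, sigma=sigma)
p_direct = X.cdf(14) - X.cdf(10)

# Same probability after the z-transformation z = (x - mu) / sigma.
Z = NormalDist()  # standard normal N(0, 1)
p_standard = Z.cdf((14 - mu) / sigma) - Z.cdf((10 - mu) / sigma)

print(round(p_direct, 4), round(p_standard, 4))
```

Both routes give the same value, which rounds to the four-decimal table entry for z = 2.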

97 Part III: Statistical Estimation - Interval Estimation - Population Mean- σ known case - Basics of the Sampling Distribution of the sample mean $\bar{x}$. Interval Estimation means an interval estimate of a (population) parameter p, where two statistics L, R, say, bound the possible values of p with a given probability: $L \le p \le R$.

98 What is Interval Estimation? An interval estimate of a (population) parameter p: the interval between two statistics L, R, say, that includes the true value of the parameter with some probability: $L \le p \le R$. E.g., formally, an interval estimator of the mean parameter p := µ (an important parameter, the most used in Statistics) consists of three components: two statistics L, R and the confidence coefficient or level $1 - \alpha$! The three components and the concerned parameter must satisfy: $P\{L \le \mu \le R\} = 1 - \alpha = \beta$ (0.7)

99 Part IIIA: Interval Estimation Example $P\{L \le \mu \le R\} = 1 - \alpha = \beta$ (0.8) The interval [L, R], i.e. $L \le \mu \le R$, is called a $100(1 - \alpha)\% = 100\beta\%$ confidence interval for the unknown µ. E.g., if µ is the mean of the most productive age of human beings, α = 0.1, $1 - \alpha = \beta = 0.9$, L = 35, R = 45, then the interval [35, 45], i.e. $35 \le \mu \le 45$, is called a $100(1 - \alpha)\% = 100\beta\% = 90\%$ confidence interval for the population mean µ.

100 Why do we study Interval Estimation? In Statistics, a point estimate of a population parameter is a sample statistic used to estimate that population parameter. But a point estimator cannot be expected to provide the exact value of the population parameter. Instead, an interval estimate is often computed by adding and subtracting a margin of error to and from the point estimate: An interval estimate = Point estimate ± Margin of error. For example, an interval estimate of the mean µ, $[L, R] = \hat{\mu} \pm \text{Margin of error } f(\alpha)$, means $P\{L \le \mu \le R\} = 1 - \alpha$.

101 Why do we study Interval Estimation? $[L, R] = \hat{\mu} \pm \text{Margin of error } f(\alpha)$ means $P\{L \le \mu \le R\} = 1 - \alpha$ = confidence level. Mathematically, an interval estimate refers to a range of values together with the probability, called the confidence level, that the interval includes the unknown population parameter: $L \le \mu \le R$ with $L = \hat{\mu} - f(\alpha)$, $R = \hat{\mu} + f(\alpha)$, and the probability that $\mu \in [L, R]$ is $1 - \alpha$. Here f(α) is the radius (half-width) measuring how large the interval bounding µ is!

102 Part IIIA: Population Mean- σ known case We first consider estimation of the population mean in the case where the population variance $\sigma^2$ is known. Specific practical application in business. Consider the monthly customer service survey conducted by Sicasys.com, a start-up firm oriented to biological applications at HCMC. Key facility: the firm provides a website for accepting customer orders and providing follow-up services over the Internet.

103 Aim of statistical inference Sicasys.com. The firm's quality assurance team uses a customer service survey to measure the satisfaction of customers with its website and online customer service. Statistically, how? The team sends a questionnaire each month to a random sample of customers who placed an order or requested service during previous months. Key Aim of statistical inference. To draw conclusions or make decisions about a population based on a random sample selected from the population.

104 What are components/questions of the questionnaire? We rate the satisfaction of customers by asking questions: 1. How easy is placing orders? 2. How timely is delivery? 3. How accurate is order filling? 4. How efficient is the technical advice? Summarizing data, how? Compute an overall satisfaction score x from 0 to 100. In the most recent month, a sample of n = 100 customers was surveyed, and a sample mean $\bar{x} = 82$ of customer satisfaction was then computed. * We will assume that random samples are used in the analysis.

105 What are Random Samples? Definition Random samples A sample $x_1, x_2, \ldots, x_n$ of size n is random iff the observations $\{x_i\}$ are independently and identically distributed (i.i.d.). This concept applies to both finite and infinite populations when sampling is performed with replacement. Sampling without replacement In sampling without replacement from a finite population of N items, we say that a sample of n items $\{x_i\}$ is a random sample iff each of the $\binom{N}{n}$ possible samples has an equal probability of being chosen.

106 A bit of Sampling for Statistical Inference Sampling from a finite population. A simple random sample of size n (from a finite population of size N) is a sample selected such that each possible sample of size n has the same probability of being selected. Sampling without replacement, from a finite population of N items, is the sampling procedure used most often; and when we refer to simple random sampling, we assume that the sampling is without replacement. In our instance, N = the number of all customers of Sicasys.com, and n = 100 (the number of customers to whom we sent the questionnaire in the last month).

107 The core equality of Interval Estimation Recall the key proposed equality to estimate a parameter of interest: An Interval Estimate = Point estimate ± Margin of error, An Interval Estimate of µ = $\bar{x}$ ± some error. Here, the sample mean $\bar{x}$ provides a point estimate of the population mean µ (of satisfaction scores) for the population of all Sicasys.com customers. From the survey over many months (all sampling months), we have consistently found an estimate of approximately 20 for the standard deviation, i.e. σ = 20.

108 Case of σ known- Point estimator usage Key observation- assumption: Moreover, the historical data (from the survey for all sampling months) show that the population of satisfaction scores is normally distributed, with a standard deviation σ = 20. Hence σ is known. By the proposed equality Interval Estimate of µ = $\bar{x}$ ± some error margin, how could we determine - the Margin of Error, and as a result - the Interval Estimate of the population mean µ from its Point estimate $\bar{x}$? The case of σ unknown is far more complicated, and will be discussed in Part IV: Hypothesis Testing.

109 Main Proposition 1 Proposition Given the population standard deviation σ or its estimate $\sigma_{\bar{x}}$ (by Eqn. 0.9), and provided that the population is normal or that a random sample has size at least 30 (see Fact 11), we can find the 95% confidence interval for the unknown population mean as $P(\bar{x} - 1.96\,\sigma_{\bar{x}} < \mu < \bar{x} + 1.96\,\sigma_{\bar{x}}) = 0.95$ or $P(|\mu - \bar{x}| < 1.96\,\sigma_{\bar{x}}) = 0.95$. In our instance of Sicasys.com (why do we know $\sigma_{\bar{x}} = 2$?), $P(|\mu - 82| < 3.92) = 0.95$.

110 Main Proposition 1- the general case With the Margin of Error $e = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, the general form of an Interval Estimate of a population mean µ with known standard deviation σ, with the confidence coefficient/level $1 - \alpha$, is $[\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}},\ \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}] \ni \mu$ or $P(|\mu - \bar{x}| < z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$, where $z_{\alpha/2}$ is the z value providing an area α/2 in the upper tail of the standard normal probability distribution.
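The general form can be applied to the Sicasys.com figures ($\bar{x} = 82$, σ = 20, n = 100). The following is a minimal sketch, assuming the standard error is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as in the margin-of-error formula above; the function name `z_interval` is hypothetical, and $z_{\alpha/2}$ is obtained from `statistics.NormalDist.inv_cdf` instead of a printed table.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Interval estimate of the mean with known sigma: xbar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value with upper-tail area alpha/2
    margin = z * sigma / math.sqrt(n)         # margin of error e
    return xbar - margin, xbar + margin

# Sicasys.com survey: sample mean 82, sigma = 20, n = 100 customers.
low, high = z_interval(xbar=82, sigma=20, n=100)

print(f"95% confidence interval: [{low:.2f}, {high:.2f}]")
```

With these inputs the standard error is 20/√100 = 2, the margin of error is about 1.96 × 2 = 3.92, and the interval is roughly [78.08, 85.92], in agreement with $P(|\mu - 82| < 3.92) = 0.95$.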

111 Proof of Main Proposition 1 1/ Basics of the Sampling Distribution of the sample mean $\bar{x}$: a) is the sample mean $\bar{x}$ a random variable? b) find the standard deviation $\sigma_{\bar{x}}$ of the sample mean. 2/ Central Limit Theorem- Formal form. 3/ Computing the Margin of Error from σ and n.