Statistics with Matlab for Engineers


Paul Razafimandimby
Montanuniversität Leoben
October 6, 2015

Contents
1 Introduction
2 Probability concepts: a short review
2.1 Basic definitions
2.2 Independence concept
2.3 Parameters of a random variable
2.4 Frequently used distribution
2.5 Two important Limit Theorems in probability
3 Inferential Statistics: Parameter Estimation
3.1 Sampling concepts and distributions
3.2 Parameter Estimation

1 Introduction

Roughly speaking, Statistics is the science of gaining knowledge from numerical and categorical data. It deals with the collection, analysis and interpretation of data, and with drawing conclusions from the collected data. A population is the collection or set of all individuals under consideration in a statistical study. A sample is a part (subset) of the population from which information is collected. One can distinguish two branches of Statistics.

1. Descriptive Statistics is the methodology of organizing and summarizing information. This branch of statistics deals with the construction of the distribution of the sample/population (calculation of frequencies), the visualization of data (graphs, charts, histograms), and the calculation of various descriptive measures (averages, standard deviations, percentiles).

2. Inferential Statistics is the science of drawing conclusions about a population, and of measuring the reliability of these conclusions, based on information collected from a sample of the population. Inferential statistics deals with point estimation, interval estimation and hypothesis testing, all of which rely heavily on probability theory.

Descriptive and inferential statistics are interrelated: before inferring conclusions from a statistical investigation, it is necessary to organize and summarize the information collected from a sample. Moreover, the knowledge gained from descriptive statistics usually suggests the appropriate method or approach to be used for inferential statistics.

In a statistical study, whether descriptive or inferential, the properties of a population are usually described by numerical parameters. In many cases these parameters are unknown, and a statistical study is very often oriented towards the investigation or estimation of these parameters. For this purpose, one usually uses statistical samples. A numerical value calculated from, and characterizing, a statistical sample is called a statistic; statistics are used to make inference about the unknown parameters of the whole population.

Statistics finds applications in numerous applied sciences, among others economics, political science and medicine. Of course, Statistics also plays an important role in many branches of the Engineering sciences. For instance, assuming that a factory producing light bulbs keeps using the same equipment, raw materials and methods of production, we can use statistics to make inferences about the quality of the light bulbs produced in the future.

2 Probability concepts: a short review

2.1 Basic definitions

The estimation of population parameters leads the statistician to investigate statistics of a (random) experiment, i.e., an experiment whose outcome cannot be predicted with certainty and would very likely change if the experiment were repeated.
The set of all possible outcomes of a random experiment is called the sample space. Flipping a coin and rolling a die are examples of random experiments; their sample spaces are, respectively, $\Omega_1 = \{T, H\}$ and $\Omega_2 = \{1, 2, 3, 4, 5, 6\}$. An example of a random experiment in Engineering is the determination of the probability of piston failure in each leg of steam-driven compressors. The sample space is $\Omega_3 = \{0, 1\}$, where 0 indicates a piston non-failure and 1 a failure.

A random variable $X$ is a numerical function defined on the sample space; very often we simply refer to it as the outcome of the random experiment. When rolling a die, a random variable $X$ may represent the number of dots on the upper face. In the case of the observation of piston failure, $X$ may represent the number of piston failures in a compressor.

So far we have only listed examples of discrete random variables, i.e., random variables that take values in a finite or countably infinite set of numbers, but there are also random variables that can take values in an interval of the real numbers. We call the latter continuous random variables. An example of a continuous random variable is the tensile strength (in kg/m$^3$) of cement produced by a cement factory.

An event is a subset of outcomes in the sample space, e.g. the tensile strength of cement lies in the range $[40, 50]$. An event can itself be a union of events; for example, the event that the number of dots on the upper face of a die is odd is the union of the events $\{1\}$, $\{3\}$ and $\{5\}$.

The probability measures the likelihood that an event occurs in a random experiment. It also measures the likelihood that a random variable $X$ takes an observed value $x$, or lies in a range of observed values $x < y$, i.e., $X = x$ or $x \le X \le y$. Three important axioms of probability are given below.

AXIOM 1: The probability of any event $E$ is always between 0 and 1: $P(E) \in [0, 1]$.
AXIOM 2: The probability of the sample space $\Omega$ is 1: $P(\Omega) = 1$.
AXIOM 3: For mutually disjoint events $E_1, \dots, E_k$ we have
\[ P(E_1 \cup \dots \cup E_k) = \sum_{i=1}^{k} P(E_i). \]

An impossible event has probability zero and a sure event has probability one: obtaining 7 dots when rolling a die is an impossible event, and obtaining a number of dots in $\{1, 2, 3, 4, 5, 6\}$ is a sure event.

The probability distribution (PD) of a random variable $X$ is a function describing the probabilities associated with each possible value of $X$. To determine the PD of a random variable we can use either the equal likelihood model or the relative frequency model. When the $n$ outcomes of a random experiment are equally likely, we assign to each outcome the probability $1/n$. This is the case when rolling a fair die: let $X$ be the number of dots observed on the upper face of a fair die.
The probability distribution in this experiment is $f(x) = P(X = x) = 1/6$ for any $x \in \{1, 2, 3, 4, 5, 6\}$. When the outcomes do not have the same chance of occurring, we conduct the experiment $n$ times and denote by $f$ the frequency of a particular event $E$ during our experiment. In this case, we can assign to the event $E$ the probability $f/n$.

Another way of determining the probability of an event $E$ is to use a probability density function (pdf) or a probability mass function (pmf), depending on whether we deal with a continuous or a discrete random variable. We have already seen an example of a pmf. In the case of a continuous random variable $X$ with probability density function $f(x)$, the probability that $X$ falls within the interval $[x_1, x_2]$ is
\[ P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx. \]

Note that:

1. A pmf or a pdf takes non-negative values.
2. If the sample space of a discrete random variable $X$ with pmf $f(x)$ is $\{x_1, \dots, x_n\}$, then $\sum_{i=1}^{n} f(x_i) = 1$.
3. Let $X$ be a continuous random variable with pdf $f(x)$. If $X$ does not take values within the interval $[x_1, x_2]$, then $P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx = 0$.
4. Let $X$ be a continuous random variable with pdf $f(x)$. Then $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

In many realistic situations it is more practical to use the cumulative distribution function $F(x)$, which is defined by
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(r)\,dr, \]
where $f$ is the probability density function of the continuous random variable $X$.

2.2 Independence concept

Two events $E_1$ and $E_2$ are independent if the occurrence of $E_1$ does not affect the occurrence of $E_2$, and vice versa. In mathematical terms, they are independent if their joint probability is equal to the product of their probabilities, i.e.,
\[ P(E_1 \cap E_2) = P(E_1)\,P(E_2). \]
This definition can be generalized to any number of events.
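As a quick check of the product rule for independent events, the sketch below enumerates the outcomes of a fair die under the equal likelihood model. The events `E1` (an even number of dots) and `E2` (at most two dots) and the helper `prob` are chosen here purely for illustration; they are not taken from the notes.

```python
from fractions import Fraction

# Sample space of a fair die; each outcome has probability 1/6
# (equal likelihood model).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under equal likelihood."""
    return Fraction(len(event & omega), len(omega))

# Hypothetical events chosen for illustration.
E1 = {2, 4, 6}   # the number of dots is even
E2 = {1, 2}      # the number of dots is at most 2

joint = prob(E1 & E2)            # P(E1 ∩ E2)
product = prob(E1) * prob(E2)    # P(E1) P(E2)
independent = (joint == product)

print(joint, product, independent)
```

Exact fractions are used instead of floating-point numbers so that the equality test in the definition of independence is not affected by rounding.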

In a similar way, we define the independence of two random variables $X$ and $Y$. Let $E_1$ and $E_2$ be any sets in the range of the random variables $X$ and $Y$, respectively. Then $X$ and $Y$ are independent iff
\[ P[(X \in E_1) \cap (Y \in E_2)] = P(X \in E_1)\,P(Y \in E_2). \]
In other words, two random variables $X$ and $Y$ are independent if their joint probability density (mass) function is the product of their pdfs/pmfs: $f(x, y) = f(x)\,g(y)$, where $f(x)$ and $g(y)$ are the pdf/pmf of $X$ and $Y$, respectively.

2.3 Parameters of a random variable

We now give formulas for, and interpretations of, various parameters associated with a random variable $X$.

Mean and variance of a random variable

The mean of a discrete random variable $X$ is
\[ \mu = E(X) = \sum_{i} x_i f(x_i). \]
It describes the central tendency of the distribution: we would expect the average of many observed values of a random variable to be close to the mean. Assuming that $\mu$ is finite, the variance of $X$ is
\[ \sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{i} (x_i - \mu)^2 f(x_i). \]
One important parameter associated with a random variable is its standard deviation, denoted by $\sigma$ and defined by $\sigma = \sqrt{V(X)}$. The variance or standard deviation measures the dispersion of a distribution. An observed value of a random variable with a small standard deviation is more likely to be close to the mean $\mu$.

For a continuous random variable $X$ with pdf $f(x)$, we have the following formulas for the mean and the variance:
\[ \mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx. \]
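The discrete formulas for the mean and the variance can be evaluated directly for the fair die used earlier. The following is a minimal sketch; the variable names are illustrative only.

```python
from fractions import Fraction

# pmf of a fair die: f(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The values of a pmf must sum to 1.
total = sum(pmf.values())

# Mean: mu = sum_i x_i f(x_i)
mu = sum(x * p for x, p in pmf.items())

# Variance: sigma^2 = sum_i (x_i - mu)^2 f(x_i)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Standard deviation: sigma = sqrt(V(X))
sigma = float(var) ** 0.5

print(total, mu, var, sigma)
```

For the fair die this gives $\mu = 7/2$ and $\sigma^2 = 35/12$, so the standard deviation is roughly 1.71.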

r-th moment and r-th central moment of a random variable

With the definition of the mean above, we define the r-th moment and the r-th central moment of a random variable as follows:
\[ \mu'_r = E(X^r), \qquad \mu_r = E[(X - \mu)^r]. \]

Skewness

The coefficient of skewness $\gamma_1$, which is associated with the third central moment $\mu_3$, is used to measure the asymmetry or skewness of a distribution and is given by
\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}. \]
A negative (resp. positive) coefficient of skewness means that the distribution is skewed to the left (resp. to the right). For a symmetric distribution, we have $\gamma_1 = 0$. (N.B. $\gamma_1 = 0$ does not in general imply that a distribution is symmetric.)

Kurtosis

The kurtosis measures the peakedness or flatness of a distribution near its center. It also measures the departure of the distribution from normality. It is given by
\[ \gamma_2 = \frac{\mu_4}{\mu_2^{2}}. \]
If $\gamma_2 > 3$, then the distribution has more values in the vicinity of the mean (it is more peaked than the normal distribution). A kurtosis less than 3 indicates that the distribution is flatter than the normal.

2.4 Frequently used distribution

Binomial distribution

Assume that the sample space of an experiment contains only two elements, say $\{0, 1\}$. In this case, we can define a probability mass function as follows:
\[ f(0) = P(X = 0) = 1 - p, \qquad f(1) = P(X = 1) = p, \]
where $p$ is the probability of the outcome $X = 1$. A random variable whose pmf is defined as above is called a Bernoulli random variable.
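The coefficients of skewness and kurtosis from Section 2.3 can be evaluated directly for the Bernoulli pmf just introduced. The sketch below assumes $p = 0.2$, a value chosen only for illustration, and the helper `central_moment` is a hypothetical name.

```python
def central_moment(values, probs, r):
    """r-th central moment E[(X - mu)^r] of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** r * p for x, p in zip(values, probs))

# Bernoulli pmf: f(0) = 1 - p, f(1) = p; p = 0.2 is an assumed value.
p = 0.2
values = [0, 1]
probs = [1 - p, p]

mu2 = central_moment(values, probs, 2)   # variance, equal to p(1 - p)
mu3 = central_moment(values, probs, 3)
mu4 = central_moment(values, probs, 4)

skewness = mu3 / mu2 ** 1.5   # gamma_1 = mu_3 / mu_2^(3/2)
kurtosis = mu4 / mu2 ** 2     # gamma_2 = mu_4 / mu_2^2

print(skewness, kurtosis)
```

With this choice of $p$ the skewness is positive (about 1.5), which agrees with the rule above: most of the probability sits at 0, so the distribution is skewed to the right.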

When this experiment is repeated for $n$ independent trials, we obtain a Binomial random variable $X$, where $X$ denotes the number of 1s in these $n$ trials. The pmf of $X$ is given by
\[ f(x; n, p) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}; \qquad x = 0, 1, 2, \dots, n, \]
where $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ and $x!$ is the factorial of a non-negative integer $x$. A straightforward calculation shows that
\[ E(X) = np, \qquad V(X) = np(1 - p). \]

Example: A manufacturer of light bulbs finds that on average 5% are defective. To monitor the manufacturing process, they take a random sample of size 100. If the sample contains more than five defective light bulbs, then the production must be stopped. What is the probability that the process is stopped?

Poisson distribution

A discrete random variable $X$ is a Poisson random variable with parameter $\lambda > 0$ iff its pmf is
\[ f(x; \lambda) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}; \qquad x = 0, 1, \dots. \]
We have
\[ E(X) = \lambda, \qquad V(X) = \lambda. \]

Example: What is the probability that a page has at least 2 typos if the number of typographical errors per page follows the Poisson distribution with parameter $\lambda = 0.25$?

Uniform distribution

One of the most important distributions is the uniform distribution for continuous random variables. A continuous random variable $X$ with values in an interval $(a, b)$ follows the uniform distribution iff its pdf is given by
\[ f(x; a, b) = \frac{1}{b - a}; \qquad a < x < b. \]
We have
\[ E(X) = \frac{a + b}{2}, \qquad V(X) = \frac{(b - a)^2}{12}. \]
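The light bulb and typo examples above can be answered by summing the corresponding pmf. The following is a minimal sketch using only the Python standard library; the function names `binom_pmf` and `poisson_pmf` are illustrative, and the light bulb computation assumes that the 100 sampled bulbs are defective independently of each other with probability 0.05.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial pmf: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson pmf: exp(-lam) lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

# Light bulb example: n = 100, p = 0.05. The process is stopped when
# X > 5, so P(stop) = 1 - P(X <= 5).
p_stop = 1 - sum(binom_pmf(x, 100, 0.05) for x in range(6))

# Typo example: lambda = 0.25. P(X >= 2) = 1 - P(X = 0) - P(X = 1).
p_typos = 1 - sum(poisson_pmf(x, 0.25) for x in range(2))

print(round(p_stop, 4), round(p_typos, 4))
```

Under these assumptions the process is stopped with probability about 0.38, and a page contains at least two typos with probability about 0.026.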

Normal distribution

A continuous random variable $X$ follows a normal or Gaussian distribution, and we write $X \sim N(\mu, \sigma^2)$, iff its pdf is defined by
\[ f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}, \qquad x \in (-\infty, \infty), \]
where $\mu \in (-\infty, \infty)$ and $\sigma^2 > 0$. A normal distribution is determined by its parameters $\mu$ and $\sigma^2$, and
\[ E(X) = \mu, \qquad V(X) = \sigma^2. \]
We have the following properties:

(N1) $\lim_{x \to \pm\infty} f(x; \mu, \sigma^2) = 0$.
(N2) The pdf $f(x; \mu, \sigma^2)$ attains its maximum value at $x = \mu$.
(N3) The pdf $f(x; \mu, \sigma^2)$ is symmetric about the mean $\mu$.

A random variable $X$ such that $X \sim N(0, 1)$ is called a standard normal random variable. The normal distribution is frequently used in statistics and engineering.

Exponential distribution

A continuous random variable follows an exponential distribution with parameter $\lambda$ iff its pdf is defined by
\[ f(x; \lambda) = \lambda e^{-\lambda x}; \qquad x \ge 0,\ \lambda > 0. \]
We have
\[ E(X) = \frac{1}{\lambda}, \qquad V(X) = \frac{1}{\lambda^2}. \]
Exponential random variables are used to describe (i) the time between arrivals of telephone calls: in this case, $\lambda$ is a rate with units of arrivals per time period; (ii) the time until a machine part fails: in this case, $\lambda$ is the failure rate.

Example: The time between arrivals of telephone calls at a switchboard follows an exponential distribution with a mean of 12 seconds. What is the probability that the time between arrivals is 10 seconds or less?
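The switchboard example can be worked out from the cumulative distribution function of the exponential distribution, $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, which is obtained by integrating the pdf from 0 to $x$. The sketch below uses $\lambda = 1/12$ per second, since $E(X) = 1/\lambda = 12$ seconds; the function name `exp_cdf` is illustrative.

```python
from math import exp

def exp_cdf(x, lam):
    """CDF of the exponential distribution: F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

mean = 12.0       # mean time between arrivals, in seconds
lam = 1 / mean    # rate parameter, because E(X) = 1 / lambda

# Probability that the time between arrivals is 10 seconds or less.
p = exp_cdf(10.0, lam)

print(round(p, 4))
```

The result is roughly 0.57, so a gap of at most 10 seconds between two calls is slightly more likely than not.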

The Gamma and Chi-Square distributions

In this part we review a generalization of the exponential distribution. A random variable $X$ follows the Gamma distribution with parameters $\lambda > 0$ and $t > 0$ iff its pdf is given by
\[ f(x; \lambda, t) = \frac{\lambda e^{-\lambda x} (\lambda x)^{t - 1}}{\Gamma(t)}; \qquad x \ge 0, \]
where the Gamma function $\Gamma$ is defined by
\[ \Gamma(x) = \int_{0}^{\infty} u^{x - 1} e^{-u}\,du, \qquad x > 0. \]
We have
\[ E(X) = \frac{t}{\lambda}, \qquad V(X) = \frac{t}{\lambda^2}. \]
When $t$ is a positive integer, the Gamma distribution can be used to model the amount of time one has to wait until $t$ events have occurred, if the inter-arrival times are exponentially distributed. The Gamma distribution is called a Chi-Square distribution with $\nu$ degrees of freedom when $\lambda = 0.5$ and $t = \nu/2$, where $\nu$ is a positive integer. The pdf of a Chi-Square random variable is defined by
\[ f(x; \nu) = \frac{1}{\Gamma(\nu/2)} \left( \frac{1}{2} \right)^{\nu/2} x^{\nu/2 - 1} e^{-x/2}; \qquad x \ge 0. \]
The Gamma distribution is a generalization of the exponential distribution in the sense that the former reduces to the latter when $t = 1$.

Student's t distribution

This kind of distribution is also frequently used in inferential statistics, especially when the sample size is small (usually less than 30). The pdf of a t distribution with $\nu$ degrees of freedom is defined by
\[ f(x; \nu) = \frac{1}{\sqrt{\pi\nu}}\, \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu + 1)/2}. \]
We have
\[ E(X) = 0 \ \text{for } \nu \ge 2, \qquad V(X) = \frac{\nu}{\nu - 2} \ \text{for } \nu \ge 3. \]

The pdf of a t distribution is symmetric, bell-shaped and centered at 0; however, in contrast to the normal distribution, it has heavier tails and a larger spread. Note that a t distribution with $\nu$ degrees of freedom can be defined from a standard normal random variable $Z$ and an independent chi-square random variable $U$ with $\nu$ degrees of freedom by setting
\[ X = \frac{Z}{\sqrt{U/\nu}}. \]

2.5 Two important Limit Theorems in probability

In statistics we usually want to estimate the unknown mean of a given population. For this purpose we randomly draw an observation from the population, whose distribution has mean $\mu$ and variance $\sigma^2$. We repeat this experiment $n$ times, assuming that the trials are mutually independent and identically distributed. In this way we form $n$ independent and identically distributed random variables $X_1, \dots, X_n$. Two important theorems in probability describe the behaviour of the mean
\[ S_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]
as $n$ becomes larger and larger.

Theorem 2.1 (Law of Large Numbers). As $n$ gets bigger and bigger, $S_n$ gets closer and closer to the theoretical mean $\mu$.

The next theorem describes the behaviour of the distribution of $S_n$ as $n$ gets bigger.

Theorem 2.2 (Central Limit Theorem). As $n$ gets bigger and bigger, $S_n$ becomes approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$.

These theorems have many versions, but we have simplified their statements so that they are accessible to non-mathematicians.

3 Inferential Statistics: Parameter Estimation

3.1 Sampling concepts and distributions

In what follows, a random sample of size $n$ is a sequence of independent and identically distributed (iid) random variables $X_1, \dots, X_n$, i.e., the $X_i$ are independent and have a common probability density/mass function $f(x)$. As noted earlier, a population parameter (mean, variance, quantiles, ...) is in many instances unknown, and the goal of inferential statistics is to use a random sample to estimate, or make a statement about, an unknown population parameter.
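Theorems 2.1 and 2.2 can be illustrated by simulating random samples in the sense just defined. The sketch below uses rolls of a fair die (mean 3.5, variance 35/12) as the population; the seed, the sample sizes and the helper `sample_mean` are assumptions made for this illustration. It only compares the mean and variance of $S_n$ with $\mu$ and $\sigma^2/n$; it does not test normality itself.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is reproducible

def sample_mean(n):
    """Mean S_n of n independent rolls of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

mu, sigma2 = 3.5, 35 / 12   # theoretical mean and variance of one roll

# Law of Large Numbers: S_n approaches mu as n grows.
lln = {n: sample_mean(n) for n in (10, 1000, 100000)}
for n, s in lln.items():
    print(n, s)

# Central Limit Theorem: many independent copies of S_n (n = 50) have
# a mean close to mu and a variance close to sigma2 / n.
n = 50
means = [sample_mean(n) for _ in range(2000)]
print(statistics.mean(means), statistics.variance(means), sigma2 / n)
```

A histogram of `means` would be the natural next step to see the bell shape predicted by the Central Limit Theorem.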
A statistic is a function of observed (known) random variables which is used as a point estimate for a population parameter, to obtain a confidence interval for a parameter, or as a test statistic in hypothesis testing. Before we move to the main subject of this section, let us define several statistics frequently encountered in applications.

Sample Mean and Sample Variance

The sample mean of a random sample of size $n$ is given by
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i. \]
The sample variance is defined by
\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2. \]
The sample standard deviation is the square root of the sample variance.

Sample Moments

The r-th sample moment is defined by
\[ M'_r = \frac{1}{n} \sum_{i=1}^{n} X_i^r. \]
The r-th sample central moment is given by
\[ M_r = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^r. \]
The sample coefficients of skewness and kurtosis are
\[ \hat{\gamma}_1 = \frac{M_3}{M_2^{3/2}}, \qquad \hat{\gamma}_2 = \frac{M_4}{M_2^{2}}. \]

Note that, as $X_1, \dots, X_n$ are random variables, the above quantities are also random variables. The above statistics can be used to estimate the population parameters; for instance, $\bar{X}$, $S^2$, $\hat{\gamma}_1$ and $\hat{\gamma}_2$ can be used to estimate the mean, the variance, the skewness coefficient and the kurtosis of the population. But since we are working with samples which are much smaller than the actual population, it is very likely that there are some errors in our estimates. To study the efficiency of our estimate and to manage its uncertainty, we must know the distribution of the statistic we use (only then can we perform statistical hypothesis tests and calculate confidence intervals). For instance, if we know that our sample is normal, then, by a classical theorem in probability, its mean $\bar{X}$ is also normal. In any case, whatever the distribution of the random sample is, when the sample is big enough (of size bigger than 30), the CLT tells us that the mean will be approximately normally distributed.

3.2 Parameter Estimation

One can use two types of methods to estimate a population parameter. Point estimation deals with the calculation, from the sample, of a single value (the point estimate) that, with high probability, will be close to the unknown population parameter. Interval estimation is a procedure which returns a range of values (an interval) around the point estimate that, with a certain degree of confidence, will contain the population parameter.

Point estimation

A point estimator $T$ for a parameter $\theta$ is a function of the sample data $X_i$, $i = 1, \dots, n$. One of the main interests in point estimation is measuring the performance of the estimator. To assess estimators we have four criteria: bias, mean squared error, efficiency, and standard error.

1. Bias: The bias measures the average error made in estimating $\theta$ by $T$, i.e., $\mathrm{bias}(T) = E(T - \theta)$. The estimator $T$ is said to be unbiased if $\mathrm{bias}(T) = 0$, i.e., $E(T) = \theta$.
2. Mean Squared Error: The MSE of the estimator $T$ is defined by $\mathrm{MSE}(T) = E[(T - \theta)^2]$. It is straightforward to check that $\mathrm{MSE}(T) = V(T) + [\mathrm{bias}(T)]^2$.
3. Relative Efficiency: This criterion is used to compare estimators. Assume we have two estimators $T_1$ and $T_2$ for the same parameter $\theta$. Then the relative efficiency of $T_1$ to $T_2$ is defined by $\mathrm{eff}(T_1, T_2) = \mathrm{MSE}(T_2) / \mathrm{MSE}(T_1)$. When $\mathrm{eff}(T_1, T_2) > 1$, i.e., $\mathrm{MSE}(T_2) > \mathrm{MSE}(T_1)$, then $T_1$ is more efficient than $T_2$.

4. Standard Error: The standard error is the square root of the variance of the estimator $T$. We have said that the sample mean $\bar{X}$ is an estimator of the population mean $\mu$. But how precise is this estimation? For an iid sample we have $V(\bar{X}) = \sigma^2/n$ (the variance that appears in the CLT), thus the standard error of $\bar{X}$ is $\mathrm{SE}(\bar{X}) = \sigma/\sqrt{n}$. If the standard deviation $\sigma$ is unknown, then we can derive an estimate $\widehat{\mathrm{SE}}(\bar{X})$ of the standard error by using an estimator of $\sigma$.

Method of Moments