Analisi Statistica per le Imprese
Dip. Economia Politica e Statistica
4. The internal efficiency perspective
Objective
Pursuing internal efficiency: the aim is to determine the correct use of resources, given an expected production volume.
According to the BS
The actions to be taken must link internal efficiency, growth in production/revenues, investments, and the pursuit of customer satisfaction.
Hence, according to the view represented by the system described, internal efficiency is the first step of a simultaneous estimation of the entire system.
Formally
Consider the first of the relationships in the system, where the variable X can represent quantities such as:
- personnel expenses
- raw material expenses
- investment (capital) expenses
that is, everything concerning the production factors needed to obtain a preset level of production.
Let us see which statistical methods can be applied to reach our objectives.
Deterministic relationship
A deterministic relationship $f$ between a dependent variable $Y$ and $k$ independent variables $X$ is expressed as:
$$Y = f(X) \tag{1}$$
In the simplest case, when the relationship is linear and $k = 1$, it is commonly expressed as follows:
$$Y = \beta_0 + \beta_1 X \tag{2}$$
This relationship can be represented graphically in a Cartesian coordinate system as a line with intercept $\beta_0$ and slope $\beta_1$.
Stochastic relationship
However, deterministic relationships are very rare because of randomness. The true relationship between a dependent variable $Y$ and the independent variables $X$ is approximated by the stochastic regression model:
$$Y = f(X) + \varepsilon \tag{3}$$
The term $\varepsilon$ is meant to capture the size of the approximation error. The introduction of this element is what identifies the relationship as stochastic.
If $f$ is a linear function, the regression model is linear and is expressed as:
$$Y = \beta_0 + \beta_1 X + \varepsilon \tag{4}$$
The terms $(\beta_0, \beta_1)$ are the regression parameters or model coefficients.
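To make the difference between equations (2) and (4) concrete, the following is a minimal sketch that generates data from a linear model with and without the error term. The function name `simulate_linear`, the parameter values, and the choice of a Normal error are assumptions made only for this illustration.

```python
import random

def simulate_linear(x_values, beta0, beta1, sigma, seed=0):
    """Return Y_i = beta0 + beta1 * X_i + eps_i, with eps_i drawn from N(0, sigma^2).

    With sigma = 0 the error term vanishes and the relationship is deterministic.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical values chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_det = simulate_linear(x, beta0=2.0, beta1=0.5, sigma=0.0)  # points lie exactly on the line
y_sto = simulate_linear(x, beta0=2.0, beta1=0.5, sigma=1.0)  # points scatter around the line
```

With `sigma=0.0` every point satisfies equation (2) exactly, while with `sigma=1.0` the points deviate from the line by the realized errors.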
Regression analysis is a method to study functional relationships between variables. The relationship is expressed as an equation or model that links the dependent (fitted or predicted) variable to one or more independent (predictor) variables.
Example
Suppose the customers of a bank have been classified according to two characteristics: average disposable income and monthly frequency of use of the Bancomat service. The relationship to be verified is whether the monthly frequency of use of the Bancomat service $Y$ (dependent variable) varies as income $X$ varies.
Observing a sample of size $n$, we will have $n$ observations for each variable.
Simple Linear Regression
The relationship between the dependent variable and the independent variable is expressed by the following linear model:
$$Y = \beta_0 + \beta_1 X + \varepsilon \tag{5}$$
The parameters or coefficients of the regression are $(\beta_0, \beta_1)$, and $\varepsilon$ is the random component of the model. For each observation the model can be written as:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n \tag{6}$$
Scatter plot
[Figure: scatter plot of six observations $(x_1, y_1), \dots, (x_6, y_6)$, with $x$ on the horizontal axis and $y$ on the vertical axis.]
Scatter plot
[Figure: the same scatter plot with three candidate lines, $\hat{Y} = \beta_0 + \beta_1 X$, $\hat{Y} = a + bx$ and $\hat{Y} = c + dx$.]
Optimal Parameters
One must then find the equation that best approximates the observations with coordinates $(X, Y)$. This equation is specified by the following linear model:
$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X \tag{7}$$
Hence one must find the optimal parameters $(\hat\beta_0, \hat\beta_1)$.
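Ordinary Least Squares (OLS)
The method of Ordinary Least Squares (OLS) minimizes the following auxiliary function:
$$\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 = \sum_{i=1}^{n} \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right)^2 \tag{8}$$
The function is minimized for the following parameter values:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \tag{9}$$
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} \tag{10}$$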
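As a sketch of the step between the auxiliary function and its minimizing values, the first-order conditions can be worked out as follows. The label $S$ for the sum of squares is introduced only for this derivation, and $\bar{X}$, $\bar{Y}$ denote the sample means.

```latex
S(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n} \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right)^2

% First-order condition for the intercept
\frac{\partial S}{\partial \hat\beta_0}
  = -2 \sum_{i=1}^{n} \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right) = 0
  \;\Longrightarrow\; \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}

% First-order condition for the slope
\frac{\partial S}{\partial \hat\beta_1}
  = -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right) = 0

% Substituting the intercept into the second condition
\sum_{i=1}^{n} X_i \left[ (Y_i - \bar{Y}) - \hat\beta_1 (X_i - \bar{X}) \right] = 0
  \;\Longrightarrow\;
  \hat\beta_1
  = \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} X_i (X_i - \bar{X})}
  = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
```

The last equality holds because $\sum_i \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum_i \bar{X}(X_i - \bar{X}) = 0$, so subtracting $\bar{X}$ from $X_i$ changes neither sum.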
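Ordinary Least Squares (OLS)
where lowercase letters denote deviations from the sample means:
$$x_i = X_i - \bar{X} \tag{11}$$
$$y_i = Y_i - \bar{Y} \tag{12}$$
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{13}$$
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \tag{14}$$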
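The closed-form solution in equations (9) to (14) can be sketched in a few lines of plain Python. The function name `ols_fit` and the five data points are hypothetical and serve only to show the arithmetic.

```python
def ols_fit(X, Y):
    """Return (b0, b1), the OLS intercept and slope for a simple linear regression."""
    n = len(X)
    x_bar = sum(X) / n                      # sample mean of X, eq. (13)
    y_bar = sum(Y) / n                      # sample mean of Y, eq. (14)
    x = [xi - x_bar for xi in X]            # deviations from the mean, eq. (11)
    y = [yi - y_bar for yi in Y]            # deviations from the mean, eq. (12)
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("X has no variability: the slope is not identified")
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sxx   # slope, eq. (9)
    b0 = y_bar - b1 * x_bar                           # intercept, eq. (10)
    return b0, b1

# Hypothetical data, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(X, Y)
```

For these data the sums are $\sum x_i y_i = 6$ and $\sum x_i^2 = 10$, so the slope is about 0.6 and the intercept about 2.2.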
Ordinary Least Squares (OLS)
A stochastic model is much more realistic than a deterministic model when the observations are not perfectly aligned. Introducing $\varepsilon$ brings difficulties, but it yields results that are more useful and deeper in meaning.
First consideration: Justification
How can we justify the introduction of the stochastic component?
1. There may be errors in the model
2. The number of covariates is limited
3. Randomness from empirical sampling
4. Measurement errors
Ordinary Least Squares (OLS)
Second consideration: Random Variable
The introduction of the error term allows the identification of $Y$ as a random variable (r.v.). This implies that every value expressed as a function of $Y$ is also a r.v.
Ordinary Least Squares (OLS)
Third consideration: Assumptions
1. The functional relationship must be linear
2. The independent variable has a deterministic nature
3. The error term has a null expected value: $E[\varepsilon_i] = 0$
4. The error term is homoskedastic: $\mathrm{Var}[\varepsilon_i] = \sigma^2$
5. The error terms are not autocorrelated: $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$
Credible assumptions
1. Assumption 1 seems restrictive, but many nonlinear relationships can be expressed as relationships that are linear in the parameters through appropriate transformations.
2. Assumption 2 is possibly the most unrealistic in socio-economic analysis, but it guarantees important properties through $E[X_i \varepsilon_i] = X_i E[\varepsilon_i] = 0$.
3. Assumption 3 states that the errors average to zero; when the error distribution is symmetric and unimodal, the mean is also the modal value, so zero is the most likely error value.
4. Assumption 4 can be unlikely in cross-section observations, and its failure can cause serious difficulties. We will focus on this later.
5. Assumption 5 can be unlikely in time series observations, and its failure can cause serious difficulties.
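As an illustration of the remark on assumption 1, consider a multiplicative model that is nonlinear in the variables but becomes linear in the parameters after taking logarithms. The specific functional form and the constant $A$ are assumptions chosen for this example, and the transformation requires $Y_i, X_i, A > 0$.

```latex
Y_i = A \, X_i^{\beta_1} \, e^{\varepsilon_i}
\quad\Longrightarrow\quad
\ln Y_i = \beta_0 + \beta_1 \ln X_i + \varepsilon_i,
\qquad \beta_0 = \ln A
```

The transformed model has the same form as equation (6), with $\ln Y_i$ and $\ln X_i$ in place of $Y_i$ and $X_i$.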
We will now examine the properties of the OLS estimators of the unknown parameters of the regression function. The estimates are obtained for both $(Y, X)$ from $n$ observations extracted with probabilistic sampling from a target population.
If we extracted a new sample from the same population, we would most likely obtain different results, hence the parameter estimates would be different. For this reason we can say that the estimates are tied to a r.v.
When one writes $\hat\beta$ one means:
1. the slope of the regression line, estimated from a set of observations
2. the estimator, which has a specific probability distribution
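The dependence of the estimate on the sample can be illustrated with a small simulation. This is only a sketch: the population parameters, the fixed design `X`, the Normal errors and the helper names `ols_slope` and `draw_y` are all assumptions made for the example.

```python
import random

def ols_slope(X, Y):
    """OLS slope: sum of cross-products of deviations over sum of squared deviations of X."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    return sxy / sxx

def draw_y(X, beta0, beta1, sigma, rng):
    """Draw one sample of Y from the hypothetical population Y = beta0 + beta1*X + eps."""
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in X]

rng = random.Random(42)
X = [float(i) for i in range(1, 31)]   # X kept fixed across samples (deterministic regressor)

# Two independent samples from the same population (true slope 2.0)
slope_a = ols_slope(X, draw_y(X, 1.0, 2.0, 3.0, rng))
slope_b = ols_slope(X, draw_y(X, 1.0, 2.0, 3.0, rng))
print(slope_a, slope_b)   # two different estimates of the same parameter
```

Each run of `draw_y` plays the role of a new sample, and each sample produces its own realization of the same estimator.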
Properties
Given assumptions 1-5, OLS estimators are the Best Linear Unbiased Estimators (BLUE). This means that they are the most efficient (minimum variance) unbiased (not systematically distorted) estimators within the class of linear estimators.
Here we just prove that they are linear and unbiased.
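Linear
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$
$$\hat\beta_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum w_i y_i \tag{15}$$
$$w_i = \frac{x_i}{\sum x_i^2} \tag{16}$$
$$\sum w_i = 0, \qquad \sum w_i x_i = 1 \tag{17}$$
Similarly it can be proven for $\hat\beta_0$.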
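The two properties of the weights in equation (17) can be checked numerically. The sketch below uses hypothetical data and a helper named `ols_weights`, both introduced only for illustration.

```python
def ols_weights(X):
    """Return the weights w_i = x_i / sum(x_j^2) and the deviations x_i = X_i - mean(X)."""
    n = len(X)
    x_bar = sum(X) / n
    x = [xi - x_bar for xi in X]
    sxx = sum(xi ** 2 for xi in x)
    return [xi / sxx for xi in x], x

# Hypothetical data, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

w, x = ols_weights(X)
y_bar = sum(Y) / len(Y)

sum_w = sum(w)                                             # should be (numerically) 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))              # should be (numerically) 1
slope = sum(wi * (yi - y_bar) for wi, yi in zip(w, Y))     # slope as a weighted sum of the y_i
```

Because the weights depend only on the $X_i$, the slope is a linear function of the observations of $Y$, which is what "linear estimator" means here.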
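Unbiased
$$\hat\beta_1 = \sum w_i y_i = \sum w_i (Y_i - \bar{Y}) = \sum w_i Y_i - \bar{Y} \underbrace{\sum w_i}_{0} \tag{18}$$
$$\hat\beta_1 = \sum w_i \left[ \beta_0 + \beta_1 X_i + \varepsilon_i \right] + 0 = \underbrace{\sum w_i \beta_0}_{0} + \beta_1 \underbrace{\sum w_i X_i}_{1} + \sum w_i \varepsilon_i \tag{19}$$
$$\hat\beta_1 = 0 + \beta_1 + \sum w_i \varepsilon_i \tag{20}$$
$$E\left[ \hat\beta_1 \right] = E\left[ \beta_1 + \sum w_i \varepsilon_i \right] = \beta_1 + \sum w_i \underbrace{E[\varepsilon_i]}_{0} = \beta_1 \tag{21}$$
Similarly it can be proven for $\hat\beta_0$.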
Let us add to hypotheses 1-5 the following: the error term has a Normal distribution, so
$$\varepsilon_i \sim N(0, \sigma^2)$$
and therefore
$$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$$
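OLS distribution
As $\hat\beta$ is a linear combination (weighted sum) of the $y$ and the $y$ are normally distributed, $\hat\beta$ is also normally distributed:
$$\hat\beta_1 \sim N\left( \beta_1, \frac{\sigma_\varepsilon^2}{\sum x_i^2} \right) \tag{22}$$
$$\hat\beta_0 \sim N\left( \beta_0, \sigma_\varepsilon^2 \frac{\sum X_i^2}{n \sum x_i^2} \right) \tag{23}$$
where $\sigma_\varepsilon^2 = \mathrm{Var}[\varepsilon_i]$.
OLS estimators coincide with Maximum Likelihood (ML) estimators.
The Central Limit Theorem implies that, even if the $y$ were not normally distributed, under very mild conditions the distribution above would still hold asymptotically.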
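The mean and variance of the slope estimator stated above can be checked with a Monte Carlo experiment. This is a sketch under stated assumptions: the fixed design, the parameter values, the number of replications and the helper names are hypothetical choices made for the illustration.

```python
import random
import statistics

def ols_slope(X, Y):
    """OLS slope for a simple linear regression."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    return sxy / sxx

def simulate_slopes(X, beta0, beta1, sigma, reps, seed=0):
    """Re-draw Normal errors `reps` times on a fixed X and collect the slope estimates."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        Y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in X]
        slopes.append(ols_slope(X, Y))
    return slopes

X = [float(i) for i in range(1, 21)]   # fixed regressor values
beta1, sigma = 0.5, 2.0                # hypothetical true slope and error std. deviation
slopes = simulate_slopes(X, beta0=1.0, beta1=beta1, sigma=sigma, reps=5000)

x_bar = sum(X) / len(X)
theoretical_var = sigma ** 2 / sum((xi - x_bar) ** 2 for xi in X)   # variance of the slope

mc_mean = statistics.fmean(slopes)      # should be close to beta1 (unbiasedness)
mc_var = statistics.pvariance(slopes)   # should be close to theoretical_var
```

With enough replications, `mc_mean` approaches the true slope and `mc_var` approaches $\sigma_\varepsilon^2 / \sum x_i^2$; a histogram of `slopes` would look approximately Normal.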
Residual Variance Estimator
This estimator, called the residual variance, can be obtained through Maximum Likelihood computations (here omitted; the ML solution divides by $n$, and the divisor $n - 2$ used below corrects its bias).
$$s^2 = \hat\sigma^2 = \frac{\sum \hat\varepsilon_i^2}{n - 2} = \frac{\sum_i \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right)^2}{n - 2} \tag{24}$$
$$\hat\varepsilon_i = Y_i - \hat{Y}_i \quad \text{(Residual)} \tag{25}$$
The residual variance is an unbiased and consistent estimator of the variance of the error term.
The denominator is $n - 2$ because that is the number of degrees of freedom, calculated as the number of observations minus the number of parameters required in the computation. In this case, to calculate this estimator one needs both $(\hat\beta_0, \hat\beta_1)$.
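The residual variance can be computed directly from the fitted line. The sketch below uses hypothetical data, and the function name `residual_variance` is an illustrative choice.

```python
def residual_variance(X, Y):
    """Return (s2, residuals): the residual variance with n - 2 degrees of freedom
    and the list of residuals Y_i - Y_hat_i from a simple OLS fit."""
    n = len(X)
    if n <= 2:
        raise ValueError("at least 3 observations are needed (n - 2 degrees of freedom)")
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    b1 = sxy / sxx                 # OLS slope
    b0 = y_bar - b1 * x_bar        # OLS intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    return s2, residuals

# Hypothetical data, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
s2, residuals = residual_variance(X, Y)
```

For these data the squared residuals sum to 2.4, so with $n - 2 = 3$ degrees of freedom the residual variance is about 0.8. The residuals also sum to zero, as they must when the model includes an intercept.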
Further observations
- $\mathrm{Var}(\hat\beta_1)$ is a direct function of $\mathrm{Var}(\varepsilon_i)$: strongly variable errors imply lower precision and reliability.
- $\mathrm{Var}(\hat\beta_1)$ is an inverse function of the variability of $X_i$: covariates concentrated in a narrow interval worsen the quality of $\hat\beta_1$.
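Both observations follow from the variance of the slope estimator, $\sigma_\varepsilon^2 / \sum x_i^2$, and can be illustrated numerically. The two designs and the error variances below are hypothetical values chosen for the comparison.

```python
def slope_variance(X, sigma2):
    """Theoretical variance of the OLS slope: sigma2 / sum of squared deviations of X."""
    n = len(X)
    x_bar = sum(X) / n
    return sigma2 / sum((xi - x_bar) ** 2 for xi in X)

spread = [1.0, 3.0, 5.0, 7.0, 9.0]    # covariate values spread over a wide interval
narrow = [4.0, 4.5, 5.0, 5.5, 6.0]    # same mean, concentrated in a narrow interval

var_spread = slope_variance(spread, sigma2=1.0)
var_narrow = slope_variance(narrow, sigma2=1.0)        # larger: X varies less
var_noisy = slope_variance(spread, sigma2=4.0)         # larger: errors vary more
```

Here the narrow design has a sum of squared deviations of 2.5 against 40 for the spread design, so its slope variance is 16 times larger for the same error variance.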
OLS Estimators standard error
Once we have estimated the variance of the stochastic term in the regression model, we may substitute it into the OLS estimators' variances to obtain the standard errors (s.e.) of the estimators.
$$s^2_{\hat\beta_1} = \frac{s^2}{\sum x_i^2} \tag{26}$$
$$s^2_{\hat\beta_0} = s^2 \left( \frac{\sum X_i^2}{n \sum x_i^2} \right) \tag{27}$$
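The standard errors are the square roots of the two estimated variances. The following sketch takes the residual variance as an input; the function name `standard_errors`, the data and the value of `s2` are hypothetical.

```python
import math

def standard_errors(X, s2):
    """Return (se_b0, se_b1): standard errors of the OLS intercept and slope,
    given the regressor values X and the residual variance s2."""
    n = len(X)
    x_bar = sum(X) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)     # sum of squared deviations of X
    sum_sq = sum(xi ** 2 for xi in X)            # sum of squared raw values of X
    se_b1 = math.sqrt(s2 / sxx)                  # square root of eq. (26)
    se_b0 = math.sqrt(s2 * sum_sq / (n * sxx))   # square root of eq. (27)
    return se_b0, se_b1

# Hypothetical inputs, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
s2 = 0.8
se_b0, se_b1 = standard_errors(X, s2)
```

With these inputs the estimated variances are 0.08 for the slope and 0.88 for the intercept, so the standard errors are their square roots.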