The Analysis of Interdependent Series by the Correlation Method

PhD Professor Angelica BĂCESCU-CĂRBUNARU
PhD Lecturer Monica CONDRUZ-BĂCESCU
Bucharest University of Economic Studies
Revista Română de Statistică - Supliment

Abstract

By the correlation method we can measure the degree of interdependence between two or more variables. Qualitative analysis, based on knowing the type of the variables, can explain the existence of a common cause that influences both of them. As such, we propose to consider briefly the problem of multiple correlation between a dependent variable and two or more independent variables.

Keywords: correlation analysis, functional relationship (mathematical law), the coefficient (intensity) of correlation, correlation, multifactorial (multiple) correlation, total multiple correlation, multiple partial correlation.

***

The essence of the correlation method

In statistical research we often encounter distributions where, to each unit of the considered population, two or more features of the same kind or of a different nature correspond simultaneously. Examples can be found in most areas: people's height and weight, the amount of rainfall and harvests, technical equipment and labor productivity, etc. Such distributions, called two-dimensional, suggest the existence of relationships between those features. Correlation analysis measures the degree of interdependence between two or more variables. It cannot prove a causal relationship, a relationship of cause and effect, between the variables. The interdependence can, however, be functional. By a functional relationship we understand a relationship that can be expressed by a formula, or by what mathematicians call a mathematical law, such as the linear relationship formula. The fact that two variables tend to be related, meaning that increased levels of one tend to be accompanied by an increase of the second and vice versa, does not prove that the first has a direct influence on the second or vice versa. But that association does not
prove the opposite either. Qualitative analysis, based on a thorough knowledge of the nature of the variables, is necessarily required for a correct interpretation of the correlation (intensity) coefficient. The correlation of two variables can be explained by the existence of common causes that affect both. Revenue growth can cause both an increase in the population's cash availability and a growth in the population's endowment with refrigerators. But we cannot say that the first variable is the cause of the second; rather, both are caused by the increasing incomes of the population. Determining the relationship between two variables raises the question: how close, how intense are these relationships and, consequently, how much can the estimates or predictions made on the basis of regression analysis vary? Just as the average cannot be properly interpreted without a measure of the dispersion or variability of the data from which it resulted (the most common measure being the mean square deviation, or standard deviation), so the estimates or predictions resulting from regression analysis require a measure of their variability. We will consider, as measures of the variability of estimates based on regression analysis, the standard error of the estimate and the correlation coefficient. To this end, we will first refer to a hypothetical example consisting of five pairs of associated values.

[Table: the five pairs of associated values (x, y), with the fitted values ŷ, the deviations y − ŷ, the squared deviations (y − ŷ)², and the column totals.]

The values of a and b are obtained with the help of the following formulas:

b = (Σxy − Σx·Σy/N) / (Σx² − (Σx)²/N)
and

a = Σy/N − b·Σx/N

which gives, by replacement: b = 0,9 and a = 0,3. The best-fitting equation of the straight line will be ŷ = a + bx. The regression equation is thus:

ŷ = 0,3 + 0,9x

We note that the algebraic sum of the differences (y − ŷ) is equal to zero. If we remember that the first algebraic property of the arithmetic average is that the sum of deviations around the average is equal to zero, we conclude that the regression line is a line of averages. We mention that the regression line must pass through the point with coordinates (x̄, ȳ), and this is the case for our equation: the regression line will pass through that respective point.

The multiple (multifactorial) correlation

We will only briefly examine the problem of multiple correlation, i.e., the problem of correlation between a dependent variable and two or more independent variables. For example, we may want to know not only the correlation coefficient between labor productivity and the number of workers, but also between the number of workers and their energy endowment. Or we may want to know the correlation coefficient not only between the yield per hectare and the application of nitrogenous fertilizers, but also between work output and the application of nitrogenous fertilizers and certain phosphate fertilizers. The introduction of a further independent or explanatory variable in a regression problem results in a smaller standard error of the estimate,
namely, the value of the correlation coefficient will increase. It is conceivable that, by introducing additional explanatory variables, the correlation coefficient becomes so high that almost all of the variation is explained. But the difficulties that thus appear in the calculation of the correlation coefficient may be greater than the benefit we obtain from them. Multiple correlation may be partial or total. While total multiple correlation measures the combined influence of the independent variables, multiple partial correlation measures the influence of the variation of each independent variable when the other one, or the other ones, are held constant. The partial correlation coefficient shows the relative importance of each independent variable. M. Ezechiel, an American statistician who founded methods of calculating multiple and partial correlation, gave the following example to illustrate these methods. He referred to the relationship between the increase in the benefits of a farm and its size, its number of cattle and its number of workers. Considering only the extent and the number of cattle, he calculated a correlation coefficient of 0.90. Introducing a third independent variable, the number of workers, increased the correlation coefficient. Transforming these correlation coefficients into coefficients of determination, the first two independent variables explain 8.8% of the increase in benefits, and all three independent variables 83.7%. The introduction of the third independent variable therefore increased the explained variation of farm benefits by the difference between these two percentages. If the significance of this increase is judged by comparing it to the variation that was unexplained before the introduction of the third independent variable (the number of workers), we find that 0.093, namely 9.3 percent, of the variation that remained unexplained when we considered only the extent and the number of cattle could be associated with the number of workers.
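The calculation above (relating the gain in explained variation to the previously unexplained variation, then taking the square root to obtain the partial correlation coefficient) can be sketched as follows. The two R² values used here are illustrative assumptions, not Ezechiel's exact figures, which are partly illegible in the source:

```python
import math

# Hypothetical coefficients of determination (assumed values for illustration):
r2_two = 0.818    # R² with two independent variables (size, cattle)
r2_three = 0.837  # R² after adding the third variable (workers)

# Share of the previously unexplained variation accounted for
# by the third independent variable:
increment = r2_three - r2_two
share = increment / (1.0 - r2_two)

# The partial correlation coefficient is the square root of that share.
partial_r = math.sqrt(share)

print(round(share, 3), round(partial_r, 3))  # → 0.104 0.323
```

With these assumed inputs, roughly 10% of the variation left unexplained by the first two factors is attributable to the third, giving a partial correlation coefficient of about 0.32.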
If we take the square root of 0.093, we find the partial correlation coefficient, 0.33. Multifactorial links can be expressed with the help of the multiple regression equation:

Y = f(x1, x2, ..., xn) + ε

where x1, x2, ..., xn represent the independent or factorial characteristics, and ε is a residual variable with zero average and constant variance. As noted above, the factorial variables included in the model should express the key factors influencing the phenomenon investigated. The most widely used model of multifactorial regression is the linear model, expressed as follows:
Y = a0 + a1x1 + a2x2 + ... + anxn

where a0 is a coefficient expressing the influence of the factors not included in the model, considered to have constant action, and ai (i = 1, ..., n) are the multiple regression coefficients, showing the share with which each factorial characteristic xi influences the resultant characteristic y. The calculation of the parameters a0, a1, ..., an starts from the well-known least squares method, expressed as:

Σ(y − a0 − a1x1 − a2x2 − ... − anxn)² = minimum

By derivation, a system of normal equations with n factorial variables and n + 1 parameters is obtained, as follows:

a0N + a1Σx1 + a2Σx2 + ... + anΣxn = Σy
a0Σx1 + a1Σx1² + a2Σx1x2 + ... + anΣx1xn = Σx1y
a0Σx2 + a1Σx1x2 + a2Σx2² + ... + anΣx2xn = Σx2y
...
a0Σxn + a1Σx1xn + a2Σx2xn + ... + anΣxn² = Σxny

The regression coefficients ai can have either a positive or a negative sign, and they show the type of connection (direct or inverse) between the factorial variable xi and the resultant variable y.

Conclusions

Checking the existence or non-existence, and the intensity, of these relations is the object of the analysis of interdependent series. It involves the simultaneous analysis of two variables and uses two types of statistical methods: regression and correlation. If one of the two variables is considered the explanatory or independent variable, and the other, called the resultant or dependent variable, presents changes in the case of a variation of the first, we will use the regression method to analyze the relationships between them.
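As a numerical illustration of the normal-equation system described earlier, the following sketch fits a two-factor linear model on hypothetical, noise-free data generated from y = 1 + 2x1 + 3x2, so that solving the system recovers the known coefficients. The small Gaussian-elimination helper is included only to keep the example self-contained:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data with the exact relationship y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2*u + 3*v for u, v in zip(x1, x2)]
N = len(y)

S = sum  # shorthand for the Σ sums in the normal equations
# Rows of the system correspond to the parameters a0, a1, a2.
A = [[N,     S(x1),                            S(x2)],
     [S(x1), S(u*u for u in x1),               S(u*v for u, v in zip(x1, x2))],
     [S(x2), S(u*v for u, v in zip(x1, x2)),   S(v*v for v in x2)]]
b = [S(y),
     S(u*w for u, w in zip(x1, y)),
     S(v*w for v, w in zip(x2, y))]

a0, a1, a2 = solve(A, b)
print(round(a0, 6), round(a1, 6), round(a2, 6))  # recovers 1, 2, 3
```

Because the data contain no residual component, the fitted coefficients match the generating ones; with real observations the same system yields the least-squares estimates.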