Chapter 65 Linear regression

65.1 Introduction to linear regression

Regression analysis, usually termed regression, is used to draw the line of best fit through co-ordinates on a graph. The techniques used enable a mathematical equation of the straight line form $y = mx + c$ to be deduced for a given set of co-ordinate values, the line being such that the sum of the squares of the deviations of the co-ordinate values from the line is a minimum, i.e. it is the line of best fit.

When a regression analysis is made, it is possible to obtain two lines of best fit, depending on which variable is selected as the dependent variable and which variable is the independent variable. For example, in a resistive electrical circuit, the current flowing is directly proportional to the voltage applied to the circuit. There are two ways of obtaining experimental values relating the current and voltage. Either certain voltages are applied to the circuit and the current values are measured, in which case the voltage is the independent variable and the current is the dependent variable; or the voltage is adjusted until a desired value of current is flowing and the value of voltage is measured, in which case the current is the independent variable and the voltage is the dependent variable.

65.2 The least-squares regression lines

For a given set of co-ordinate values, $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, let the $X$-values be the independent variables and the $Y$-values be the dependent values. Also let $D_1, \ldots, D_n$ be the vertical distances between the line shown as PQ in Fig. 65.1 and the points representing the co-ordinate values. The least-squares regression line, i.e. the line of best fit, is the line which makes the value of $D_1^2 + D_2^2 + \cdots + D_n^2$ a minimum value.

Figure 65.1 (the line PQ, the points $(X_1, Y_1)$, $(X_2, Y_2)$, ..., $(X_n, Y_n)$, their vertical deviations $D_1$, $D_2$, ..., $D_n$ and the horizontal deviations $H_3$, $H_4$)

The equation of the least-squares regression line is usually written as $Y = a_0 + a_1 X$, where $a_0$ is the $Y$-axis intercept value and $a_1$ is the gradient of the line (analogous to $c$ and $m$ in the equation $y = mx + c$).
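The quantity being minimised can be made concrete with a few lines of code. The following is a minimal sketch, assuming a small made-up data set and a hypothetical helper name `sum_squared_deviations` (neither comes from the text); it evaluates $D_1^2 + D_2^2 + \cdots + D_n^2$ for any candidate line $Y = a_0 + a_1 X$.

```python
def sum_squared_deviations(xs, ys, a0, a1):
    """Return D1^2 + ... + Dn^2 for the candidate line Y = a0 + a1*X.

    Each D is the vertical distance between a point and the line.
    """
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data (not taken from the text).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

exact = sum_squared_deviations(xs, ys, 0.0, 2.0)  # line through every point
worse = sum_squared_deviations(xs, ys, 1.0, 2.0)  # same gradient, raised by 1
print(exact, worse)
```

For these points the line $Y = 2X$ gives a sum of zero, while raising the line by 1 gives four deviations of size 1 and hence a sum of 4. The least-squares line is the choice of $a_0$ and $a_1$ that makes this sum as small as possible.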
The values of $a_0$ and $a_1$ to make the sum of the deviations squared a minimum can be obtained from the two equations:

$$\sum Y = a_0 N + a_1 \sum X \qquad (1)$$

$$\sum (XY) = a_0 \sum X + a_1 \sum X^2 \qquad (2)$$

where $X$ and $Y$ are the co-ordinate values, $N$ is the number of co-ordinates and $a_0$ and $a_1$ are called the regression coefficients of $Y$ on $X$. Equations (1) and (2) are called the normal equations of the regression line of $Y$ on $X$. The regression line of $Y$ on $X$ is used to estimate values of $Y$ for given values of $X$.

If the $Y$-values (vertical axis) are selected as the independent variables, the horizontal distances between the line shown as PQ in Fig. 65.1 and the co-ordinate values ($H_3$, $H_4$, etc.) are taken as the deviations. The equation of the regression line is of the form $X = b_0 + b_1 Y$ and the normal equations become:

$$\sum X = b_0 N + b_1 \sum Y \qquad (3)$$

$$\sum (XY) = b_0 \sum Y + b_1 \sum Y^2 \qquad (4)$$

where $X$ and $Y$ are the co-ordinate values, $b_0$ and $b_1$ are the regression coefficients of $X$ on $Y$ and $N$ is the number of co-ordinates. These normal equations are of the regression line of $X$ on $Y$, which is slightly different to the regression line of $Y$ on $X$. The regression line of $X$ on $Y$ is used to estimate values of $X$ for given values of $Y$.

The regression line of $Y$ on $X$ is used to determine any value of $Y$ corresponding to a given value of $X$. If the value of $X$ lies within the range of $X$-values of the extreme co-ordinates, the process of finding the corresponding value of $Y$ is called linear interpolation. If it lies outside of the range of $X$-values of the extreme co-ordinates then the process is called linear extrapolation, and the assumption must be made that the line of best fit extends outside of the range of the co-ordinate values given. By using the regression line of $X$ on $Y$, values of $X$ corresponding to given values of $Y$ may be found by either interpolation or extrapolation.

65.3 Worked problems on linear regression

Problem 1. In an experiment to determine the relationship between frequency and the inductive reactance of an electrical circuit, the following results were obtained:

Frequency (Hz)                50    100   150   200   250   300   350
Inductive reactance (ohms)    30    65    90    130   150   190   200

Determine the equation of the regression line of inductive reactance on frequency, assuming a linear relationship.

Since the regression line of inductive reactance on frequency is required, the frequency is the independent variable, $X$, and the inductive reactance is the dependent variable, $Y$.
The equation of the regression line of $Y$ on $X$ is:

$$Y = a_0 + a_1 X$$

and the regression coefficients $a_0$ and $a_1$ are obtained by using the normal equations

$$\sum Y = a_0 N + a_1 \sum X$$

$$\sum (XY) = a_0 \sum X + a_1 \sum X^2$$

(from equations (1) and (2)).

A tabular approach is used to determine the summed quantities.

Frequency, X    Inductive reactance, Y    X^2       XY       Y^2
50              30                        2500      1500     900
100             65                        10000     6500     4225
150             90                        22500     13500    8100
200             130                       40000     26000    16900
250             150                       62500     37500    22500
300             190                       90000     57000    36100
350             200                       122500    70000    40000

$\sum X = 1400$, $\sum Y = 855$, $\sum X^2 = 350\,000$, $\sum (XY) = 212\,000$, $\sum Y^2 = 128\,725$
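The column totals of the table can be checked mechanically. The sketch below is one possible way to do so, assuming the Problem 1 readings are entered as two Python lists; the variable names are illustrative only.

```python
# Problem 1 readings: frequency X (Hz) and inductive reactance Y (ohms).
X = [50, 100, 150, 200, 250, 300, 350]
Y = [30, 65, 90, 130, 150, 190, 200]

sum_x = sum(X)                             # sum of X
sum_y = sum(Y)                             # sum of Y
sum_x2 = sum(x * x for x in X)             # sum of X^2
sum_xy = sum(x * y for x, y in zip(X, Y))  # sum of XY
sum_y2 = sum(y * y for y in Y)             # sum of Y^2

print(sum_x, sum_y, sum_x2, sum_xy, sum_y2)
# prints: 1400 855 350000 212000 128725
```

The totals $\sum X$, $\sum Y$, $\sum X^2$ and $\sum (XY)$ feed the normal equations of $Y$ on $X$; $\sum Y^2$ is only needed for the regression line of $X$ on $Y$.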
The number of co-ordinate values given, $N$, is 7. Substituting in the normal equations gives:

$$855 = 7a_0 + 1400a_1 \qquad (1)$$

$$212\,000 = 1400a_0 + 350\,000a_1 \qquad (2)$$

$1400 \times (1)$ gives:

$$1\,197\,000 = 9800a_0 + 1\,960\,000a_1 \qquad (3)$$

$7 \times (2)$ gives:

$$1\,484\,000 = 9800a_0 + 2\,450\,000a_1 \qquad (4)$$

$(4) - (3)$ gives:

$$287\,000 = 0 + 490\,000a_1 \qquad (5)$$

from which,

$$a_1 = \frac{287\,000}{490\,000} = 0.586$$

Substituting $a_1 = 0.586$ in equation (1) gives:

$$855 = 7a_0 + 1400(0.586)$$

i.e.

$$a_0 = \frac{855 - 820.4}{7} = 4.94$$

Thus the equation of the regression line of inductive reactance on frequency is:

$$Y = 4.94 + 0.586X$$

Problem 2. For the data given in Problem 1, determine the equation of the regression line of frequency on inductive reactance, assuming a linear relationship.

In this case, the inductive reactance, $Y$, is the independent variable and the frequency, $X$, is the dependent variable. From the normal equations (3) and (4) of Section 65.2, the equation of the regression line of $X$ on $Y$ is:

$$X = b_0 + b_1 Y$$

and the normal equations are

$$\sum X = b_0 N + b_1 \sum Y$$

$$\sum (XY) = b_0 \sum Y + b_1 \sum Y^2$$

From the table shown in Problem 1, the simultaneous equations are:

$$1400 = 7b_0 + 855b_1$$

$$212\,000 = 855b_0 + 128\,725b_1$$

Solving these equations in a similar way to that in Problem 1 gives $b_0 = -6.15$ and $b_1 = 1.69$, correct to 3 significant figures.

Thus the equation of the regression line of frequency on inductive reactance is:

$$X = -6.15 + 1.69Y$$

Problem 3. Use the regression equations calculated in Problems 1 and 2 to find (a) the value of inductive reactance when the frequency is 175 Hz, and (b) the value of frequency when the inductive reactance is 250 ohms, assuming the line of best fit extends outside of the given co-ordinate values. Draw a graph showing the two regression lines.

(a) From Problem 1, the regression equation of inductive reactance on frequency is $Y = 4.94 + 0.586X$. When the frequency, $X$, is 175 Hz, $Y = 4.94 + 0.586(175) = 107.5$, correct to 4 significant figures, i.e. the inductive reactance is 107.5 ohms when the frequency is 175 Hz.

(b) From Problem 2, the regression equation of frequency on inductive reactance is $X = -6.15 + 1.69Y$. When the inductive reactance, $Y$, is 250 ohms, $X = -6.15 + 1.69(250) = 416.4$ Hz, correct to 4 significant figures, i.e. the frequency is 416.4 Hz when the inductive reactance is 250 ohms.

The graph depicting the two regression lines is shown in Fig. 65.2. To obtain the regression line of inductive reactance on frequency, the regression line equation $Y = 4.94 + 0.586X$ is used, and $X$ (frequency) values of 100 and 300 have been selected in order to find the corresponding $Y$ values. These values gave the co-ordinates (100, 63.5) and (300, 180.7), shown as points A and B in Fig. 65.2. Two co-ordinates for the regression line of frequency on inductive reactance are calculated using the equation $X = -6.15 + 1.69Y$, the values of inductive reactance of 50 and 150 being used to obtain the co-ordinate values. These values gave co-ordinates (78.4, 50) and (247.4, 150), shown as points C and D in Fig. 65.2.

Figure 65.2 (inductive reactance in ohms plotted against frequency in hertz, showing the points A, B, C and D)

It can be seen from Fig. 65.2 that, to the scale drawn, the two regression lines coincide. Although it is not necessary to do so, the co-ordinate values are also shown to indicate that the regression lines do appear to be the lines of best fit. A graph showing co-ordinate values is called a scatter diagram in statistics.

Problem 4. The experimental values relating centripetal force and radius, for a mass travelling at constant velocity in a circle, are as shown:

Force (N)      5     10    15    20    25    30    35    40
Radius (cm)    55    30    16    12    11    9     7     5

Determine the equations of (a) the regression line of force on radius and (b) the regression line of radius on force. Hence, calculate the force at a radius of 40 cm and the radius corresponding to a force of 32 N.

Let the radius be the independent variable, $X$, and the force be the dependent variable, $Y$. (This decision is usually based on a cause corresponding to $X$ and an effect corresponding to $Y$.) Using a tabular approach to determine the values of the summations gives:

Radius, X    Force, Y    X^2     XY     Y^2
55           5           3025    275    25
30           10          900     300    100
16           15          256     240    225
12           20          144     240    400
11           25          121     275    625
9            30          81      270    900
7            35          49      245    1225
5            40          25      200    1600

$\sum X = 145$, $\sum Y = 180$, $\sum X^2 = 4601$, $\sum (XY) = 2045$, $\sum Y^2 = 5100$

(a) The equation of the regression line of force on radius is of the form $Y = a_0 + a_1 X$ and the constants $a_0$ and $a_1$ are determined from the normal equations:

$$\sum Y = a_0 N + a_1 \sum X$$

$$\sum (XY) = a_0 \sum X + a_1 \sum X^2$$

(from equations (1) and (2)). Thus

$$180 = 8a_0 + 145a_1$$

$$2045 = 145a_0 + 4601a_1$$

Solving these simultaneous equations gives $a_0 = 33.7$ and $a_1 = -0.617$, correct to 3 significant figures. Thus the equation of the regression line of force on radius is:

$$Y = 33.7 - 0.617X$$
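The elimination used to solve the two normal equations can also be written once as a closed form. The sketch below is not the method used in the text: it assumes the standard result of eliminating $a_0$ from equations (1) and (2), namely $a_1 = \dfrac{N \sum (XY) - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$ and $a_0 = \dfrac{\sum Y - a_1 \sum X}{N}$, and the function name `regression_y_on_x` is hypothetical. It is applied to the Problem 4 readings.

```python
def regression_y_on_x(xs, ys):
    """Solve normal equations (1) and (2) for (a0, a1) in Y = a0 + a1*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a0 between the two normal equations gives the gradient a1.
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Back-substitute into equation (1) for the intercept a0.
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Problem 4 readings: radius X (cm) and force Y (N).
radius = [55, 30, 16, 12, 11, 9, 7, 5]
force = [5, 10, 15, 20, 25, 30, 35, 40]

a0, a1 = regression_y_on_x(radius, force)
print(f"a0 = {a0:.3g}, a1 = {a1:.3g}")
```

Swapping the two arguments, i.e. calling `regression_y_on_x(force, radius)`, solves normal equations (3) and (4) instead and returns the coefficients of the regression line of radius on force.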
(b) The equation of the regression line of radius on force is of the form $X = b_0 + b_1 Y$ and the constants $b_0$ and $b_1$ are determined from the normal equations:

$$\sum X = b_0 N + b_1 \sum Y$$

$$\sum (XY) = b_0 \sum Y + b_1 \sum Y^2$$

(from equations (3) and (4)). The values of the summations have been obtained in part (a), giving:

$$145 = 8b_0 + 180b_1$$

$$2045 = 180b_0 + 5100b_1$$

Solving these simultaneous equations gives $b_0 = 44.2$ and $b_1 = -1.16$, correct to 3 significant figures. Thus the equation of the regression line of radius on force is:

$$X = 44.2 - 1.16Y$$

The force, $Y$, at a radius of 40 cm is obtained from the regression line of force on radius, i.e. $Y = 33.7 - 0.617(40) = 9.02$, i.e. the force at a radius of 40 cm is 9.02 N.

The radius, $X$, when the force is 32 newtons is obtained from the regression line of radius on force, i.e. $X = 44.2 - 1.16(32) = 7.08$, i.e. the radius when the force is 32 N is 7.08 cm.

Now try the following exercise

Exercise. Further problems on linear regression

In Problems 1 and 2, determine the equation of the regression line of $Y$ on $X$, correct to 3 significant figures.

1.  X    14     18      23      30      50
    Y    900    1200    1600    2100    3800

    [$Y = -256 + 80.6X$]

2.  X    6      3      9      15     2      14     21     13
    Y    1.3    0.7    2.0    3.7    0.5    2.9    4.5    2.7

    [$Y = 0.0477 + 0.216X$]

In Problems 3 and 4, determine the equations of the regression lines of $X$ on $Y$ for the data stated, correct to 3 significant figures.

3. The data given in Problem 1. [$X = 3.20 + 0.0124Y$]

4. The data given in Problem 2. [$X = -0.056 + 4.56Y$]

5. The relationship between the voltage applied to an electrical circuit and the current flowing is as shown:

    Current (mA)           2    4     6     8     10    12    14
    Applied voltage (V)    5    11    15    19    24    28    33

    Assuming a linear relationship, determine the equation of the regression line of applied voltage, $Y$, on current, $X$, correct to 4 significant figures. [$Y = 1.142 + 2.268X$]

6. For the data given in Problem 5, determine the equation of the regression line of current on applied voltage, correct to 3 significant figures. [$X = -0.483 + 0.440Y$]

7. Draw the scatter diagram for the data given in Problem 5 and show the regression lines of applied voltage on current and current on applied voltage. Hence determine the values of (a) the applied voltage needed to give a current of 3 mA and (b) the current flowing when the applied voltage is 40 volts, assuming the regression lines are still true outside of the range of values given. [(a) 7.9 V (b) 17.1 mA]

8. In an experiment to determine the relationship between force and momentum, a force, $X$, is applied to a mass by placing the mass on an inclined plane, and the time, $Y$, for the velocity to change from $u$ m/s to $v$ m/s is measured. The results obtained are as follows:

    Force (N)    11.4    18.7    11.7    12.3    14.7    18.8    19.6
    Time (s)     0.56    0.35    0.55    0.52    0.43    0.34    0.31

    Determine the equation of the regression line of time on force, assuming a linear relationship between the quantities, correct to 3 significant figures. [$Y = 0.881 - 0.0290X$]

9. Find the equation for the regression line of force on time for the data given in Problem 8, correct to 3 decimal places. [$X = 30.194 - 34.039Y$]

10. Draw a scatter diagram for the data given in Problem 8 and show the regression lines of time on force and force on time. Hence find (a) the time corresponding to a force of 16 N, and (b) the force at a time of 0.25 s, assuming the relationship is linear outside of the range of values given. [(a) 0.417 s (b) 21.7 N]
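Answers obtained by hand for the exercise can be cross-checked numerically. The following is a rough sketch rather than a prescribed method: the helper `fit_line` is a hypothetical name, it uses the closed-form solution of the normal equations, and the data are the force and time readings of Problem 8. Fitting time on force gives the $Y$ on $X$ line, and swapping the arguments gives the $X$ on $Y$ line.

```python
def fit_line(us, vs):
    """Least-squares line v = c0 + c1*u, i.e. the regression of v on u."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_u2 = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    c1 = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
    c0 = (sum_v - c1 * sum_u) / n
    return c0, c1


# Problem 8 readings: force X (N) and time Y (s).
force_n = [11.4, 18.7, 11.7, 12.3, 14.7, 18.8, 19.6]
time_s = [0.56, 0.35, 0.55, 0.52, 0.43, 0.34, 0.31]

a0, a1 = fit_line(force_n, time_s)  # time on force:  Y = a0 + a1*X
b0, b1 = fit_line(time_s, force_n)  # force on time:  X = b0 + b1*Y

time_at_16_n = a0 + a1 * 16         # interpolation, 16 N is inside the data
force_at_0_25_s = b0 + b1 * 0.25    # extrapolation, 0.25 s is below the data

print(f"Y = {a0:.3f} + ({a1:.4f})X")
print(f"X = {b0:.3f} + ({b1:.3f})Y")
print(f"{time_at_16_n:.3f} s, {force_at_0_25_s:.1f} N")
```

The two fitted lines are different, as the text notes, so the line used for a prediction should always match the variable being estimated: the time-on-force line for a time, the force-on-time line for a force.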