Effect on R of fitting means and calculation of lack of fit (LOF) Page dm'log;clear;output;clear'; OPTIONS PS=5 LS=0 NOCENTER NODTE NONUMBER FORMCHR="----+---+=-/\<>*"; ODS HTML style=minimal body='ppendix09 LOF & Rsquare Suppliment.html'; ODS listing; TITLE 'EXST705: Marathon Footrace from Pennsylvania 00'; filename input 'ppendix09 MReg-0K Polynomial.DT'; ***********************************************; *** Finish times in a race ***; *** Data taken from various sites on the ***; *** internet reporting race results ***; ***********************************************; data P00; length hometown $ 3 gender $ 3; infile input missover; input Marathon $ ge gender $ TIME HomeTown $ 35-57; if age eq 99 then age =.; *(apparently 99 represents missing for 5 P race participants); *---+--------+--------+----3----+----4----+----5----+----6; cards; run; ; data P00; set P00; if marathon = 'P0500'; proc sort data=p00; by gender age; run; TITLE 'Scatter plot of raw data'; proc plot data=p00; by gender; plot time*age; options ls=3 ps=65; run; options ps=56 ls=80; proc glm data=p00; BY gender; TITLE 'Quadratic model fitted to raw data - separate by gender'; model time= age age*age; run; proc means data=p00 noprint; by gender age; var time; output out=meansbygender n=n mean=timemean std=std; run; TITLE 'Scatter plot of means'; proc plot data=meansbygender; by gender; plot timemean*age; options ls=3 ps=65; run; options ps=56 ls=80; proc glm data=meansbygender; BY gender; TITLE 'Quadratic model fitted on UNweighted means - separate by gender'; model timemean = age age*age; run; proc glm data=meansbygender; BY gender; weight n; TITLE 'Quadratic model fitted on weighted means - separate by gender'; model timemean = age age*age; run; proc print data=meansbygender; var gender age n timemean std; run; proc glm data=p00; BY gender; class age; TITLE 'nalysis of Variance of GE - separate by gender'; model time = age; run; data P00; set P00; age_again = age; proc glm data=p00; BY gender; class age_again; TITLE 'nalysis of Variance of GE - separate by gender'; model time = age age*age age_again; run; ODS HTML close; run; quit;
Effect on R of fitting means and calculation of lack of fit (LOF) Page EXST705: Marathon Footrace from Pennsylvania 00 Scatter plot of raw data Plot of TIME*ge. Legend: = obs, B = obs, etc. TIME 400 + B C 350 + B B B B B B B B B B B C B B C B D C B B 300 + B B B B B B B C C B B C B B B E B C D B E B B B C D B B C D B D C C B B C B C B B B E D E B D B B B C B C B B C B E D B B C D B B D C B B B C B B B C B B B C C D B D C D B B B C 50 + B D B D C C C B C C B C B B B D B B B D B B B B F B B F B B B B B B B B C C D B D B B C B B C B B B B B C B C B B B B C B B B C B B B B B B B 00 + B 50 + ---+---------+---------+---------+---------+---------+---------+---------+---------+---------+--- 5 0 5 30 35 40 45 50 55 60 ge NOTE: obs had missing values. Plot of TIME*ge. Legend: = obs, B = obs, etc. TIME 400 + B 350 + B B B B C B B B B CB B B B C B B B C BB B B B B 300 + C CB B CB E C B B B CB C B CB BB EC B BD B B B C B B BC D B B CB BC CB EB B B B CC B BC BB B B B BC BC BC FB D C B B E B D C B BB E B GC BC BC BC BB DB B D C C C D B C C FE BE B E EC EC CC ID B D C B B B D C BC D CC DC BC CB B HG C HE G CF D B B D 50 + D BB BE BB B C E CE D D B FJ BE B E CE B G CE BB HD BG GE BF GC BD EF CE D B BB BE B D FD CD CK BC FC CF EI EB LH JC CB D DB B E BE C BD ED BD CE D DI DD G FB CD DE JF EC B BB B C BB BC DE BB CF EC CH GC C FE E F EB D B B B BB B C DC EB B FC BC BE DC DB CB DC B EB C B B B C C BD BC D CG C CB DE EF GD C D BC 00 + B C B BC BD BC CB C C CB BB CE D D B D B C B B E B C CE BB BB BD C B B B B C BB B B CB BB BC B BD E D B B BB B B B B B 50 + ---+--------------+--------------+--------------+--------------+--------------+--------------+----- 0 0 30 40 50 60 70 ge NOTE: 3 obs had missing values.
Effect on R of fitting means and calculation of lack of fit (LOF) Page 3 EXST705: Marathon Footrace from Pennsylvania 00 Quadratic model fitted to raw data - separate by gender The GLM Procedure Read Used 753 75 Model Error Corrected Total 748 750 Squares Meann Square 657.907 3068.954 49863.844 003.508 55988.75 5.9 R-Square 0.0397 Coeff Var 6.47384 Root MSE 44.76056 TIME Mean 7.7068 ge ge* *ge Type I SS Meann Square 6569.88 6569.88 34688.0863 34688.0863 3.6 7.3 0.0003 ge ge* *ge Type III SS Meann Square 5598.4747 5598.4747 34688.0863 34688.0863.78 7.3 0.0004 Parameter Intercept ge ge* *ge Standardd Estimate 33.97777-4.594366 0.06807 Errorr 0.64444.9696 6 0.063999 9 t Value 6.08-3.57 4.6 Pr > t <. 000 0. 0004 <. 000 Read Used 808 805 Model Error Corrected Total 80 804 Squares Meann Square 7379.96 86895.963 307396.990 705.836 347708.96 50.94 R-Square 0.0535 Coeff Var 6.75563 Root MSE 4.3077 TIME Mean 46.4949 ge ge* *ge Type I SS Meann Square 067.0 067.0 6364.75 6364.75 64.58 37.30 ge ge* *ge Type III SS Meann Square 37836.8687 37836.8687 6364.753 6364.753.8 37.30 Parameter Intercept ge ge* *ge Standardd Estimate 80.70754 -.6374056 0.04037 Errorr 0.9479836 0.56000035 5 0.006889 t Value 5.64-4.7 6. Pr > t <. 000 <. 000 <. 000
Effect on R of fitting means and calculation of lack of fit (LOF) Page 4 EXST705: Marathon Footrace from Pennsylvania 00 Scatter plot of means Plot of timemean*ge. Legend: = obs, B = obs, etc. timemean 340 + 30 + 300 + 80 + 60 + 40 + ---+---------+---------+---------+---------+ +---------+---------+---------+---------+---------+-------- 5 0 5 30 35 40 45 50 55 60 ge NOTE: obs had missing values. Plot of timemean*ge. Legend: = obs, B = obs, etc. timemean 360 + 340 + 30 + 300 + 80 + 60 + 40 + 0 + ---+--------------+--------------+--------------+--------------+--------------+ +--------------+------- 0 0 30 40 50 60 70 NOTE: obs had missing values. ge
Effect on R of fitting means and calculation of lack of fit (LOF) Page 5 EXST705: Marathon Footrace from Pennsylvania 00 Quadratic model fitted on UNweighted means - separate by gender The GLM Procedure Read 46 Used 45 Dependent Variable: timemean Squares Mean Square Model 8548.7835 474.3958 0.08 0.0003 Error 4 788.7839 44.5544 Corrected Total 44 6367.554 R-Square Coeff Var Root MSE timemean Mean 0.347 7.337506 0.59746 80.747 Type I SS Mean Square ge 50.56378 50.56378.8 0.00 ge*ge 3338.774 3338.774 7.87 0.0076 Type III SS Mean Square ge 64.3958 64.3958 5.0 0.09 ge*ge 3338.774 3338.774 7.87 0.0076 Standard Parameter Estimate Error t Value Pr > t Intercept 35.6669 9.08948.5 ge -3.679436.606455 -.6 0.09 ge*ge 0.057346 0.003683.8 0.0076 Read 60 Used 59 Dependent Variable: timemean Squares Mean Square Model 344.5403 706.70 58.99 Error 56 6335.3833 9.709 Corrected Total 58 50747.87036 R-Square Coeff Var Root MSE timemean Mean 0.67808 6.60654 7.0799 57.9699 Type I SS Mean Square ge 6435.95373 6435.95373 90.63 ge*ge 7976.58830 7976.58830 7.34 Type III SS Mean Square ge 303.33856 303.33856 0.98 0.006 ge*ge 7976.58897 7976.58897 7.34 Standard Parameter Estimate Error t Value Pr > t Intercept 67.638495 3.950859 0.8 ge -.006 0.66407060-3.3 0.006 ge*ge 0.0393577 0.0075646 5.3
Effect on R of fitting means and calculation of lack of fit (LOF) Page 6 EXST705: Marathon Footrace from Pennsylvania 00 Quadratic model fitted on weighted means - separate by gender The GLM Procedure Read 46 Used 45 Dependent Variable: timemean Weight: n Squares Mean Square Model 657.9073 3068.9537 7.97 Error 4 7588.0980 704.4785 Corrected Total 44 3846.0053 R-Square Coeff Var Root MSE timemean Mean 0.460 5.948 4.8533 7.7068 Type I SS Mean Square ge 6569.88 6569.88 5.59 0.0003 ge*ge 34688.0863 34688.0863 0.35 Type III SS Mean Square ge 5598.4747 5598.4747 5.0 0.0004 ge*ge 34688.0863 34688.0863 0.35 Standard Parameter Estimate Error t Value Pr > t Intercept 33.97777 9.0388485 7.44 ge -4.594366.099053-3.88 0.0004 ge*ge 0.06807 0.0503 4.5 Read 60 Used 59 Dependent Variable: timemean Weight: n Squares Mean Square Model 7379.96 86895.963 44.70 Error 56 08870.690 944.95 Corrected Total 58 866.68 R-Square Coeff Var Root MSE timemean Mean 0.64839 7.88766 44.097 46.4949 Type I SS Mean Square ge 067.0 067.0 56.67 ge*ge 6364.75 6364.75 3.73 Type III SS Mean Square ge 37836.8687 37836.8687 9.46 ge*ge 6364.753 6364.753 3.73 Standard Parameter Estimate Error t Value Pr > t Intercept 80.70754.6876476 4.0 ge -.6374056 0.59783468-4.4 ge*ge 0.04037 0.0073477 5.7
Effect on R of fitting means and calculation of lack of fit (LOF) Page 7 EXST705: Marathon Footrace from Pennsylvania 00 Quadratic model fitted on weighted means - separate by gender Obs gender ge n timemean std F. 75.075 63.958 F 7 36.55 3.547 3 F 8 5 74.74 9.66 4 F 9 77.09 34.53 5 F 0 4 66.557 34.947 6 F 8 65.066 39.57 7 F 5 70.766 37.979 8 F 3 3 7.543 49.630 9 F 4 7 69.064 36.96 0 F 5 37 6.39 3.977 F 6 33 67.3 4.9 F 7 6 76.793 38.6 3 F 8 6 6.597 5.488 4 F 9 30 60.943 4.763 5 F 30 3 8.060 44.76 6 F 3 3 75.303 47.6 7 F 3 3 6.506 5.47 8 F 33 9 64.058 56.88 9 F 34 5 63.60 4.738 0 F 35 58.0 5.349 F 36 66.474 4.70 F 37 9 65.04 45.74 3 F 38 30 76.45 60.488 4 F 39 4 86.90 47.466 5 F 40 7 66.6 40.08 6 F 4 64.63 47.833 7 F 4 75.758 53.70 8 F 43 7 63.006 49.536 9 F 44 5 59.344 8.0 30 F 45 7 7.878 30.96 3 F 46 90.78 35.86 3 F 47 5 85.339 43.975 33 F 48 89.60 37.39 34 F 49 4 7.380 46.68 35 F 50 7 80.397 5.86 36 F 5 5 95.96 73.94 37 F 5 5 38.588 39.56 38 F 53 354.740 34.450 39 F 54 3 335.793 9.65 40 F 55 4 30.038 6.7 4 F 56 37.050. 4 F 57 3 94.500 36.35 43 F 58 5 339.088 53.048 44 F 59 98.050. 45 F 60 6.950. 46 F 6 38.35 37.9 47 M. 3 55.943 49.579 48 M 05.470. 49 M 8.800. 50 M 6 4 78.938 38.000 5 M 7 05.950 5.456 5 M 8 6 5.348 8.74 53 M 9 7.44 3.5 54 M 0 8 58.83 53.68 55 M 0 7.08 36.350 56 M 38 45.6 4.808 57 M 3 3 4.53 47.606 58 M 4 37 43.947 4.77 59 M 5 7 3.040 38.93 60 M 6 47 39.536 36.343 6 M 7 46 33.89 43.783 6 M 8 58 48.744 45.73 63 M 9 43 43.9 50.860 64 M 30 4 45.474 48.675 65 M 3 65 45.98 47.39 66 M 3 66 35.893 37.748 67 M 33 46 44.307 49.855 68 M 34 58 33.46 4.370 69 M 35 59 4.669 4.640 70 M 36 55 47.3 4.04 7 M 37 45 46.36 39.79 7 M 38 64 38.338 36.309 73 M 39 56 4.350 39.900 74 M 40 69 44.50 37.099 75 M 4 56 9.503 38.09 76 M 4 6 47.5 40.87 77 M 43 69 35.805 34.46 78 M 44 54 4.67 40.908 79 M 45 53 44.8 30.586 80 M 46 36 5.49 39.0 8 M 47 60 58.88 39.68 8 M 48 47 43.054 33.63 83 M 49 54 55.475 4.40 84 M 50 56 55.607 38.493 85 M 5 4 5.65 37.906 86 M 5 3 58.87 45.87 87 M 53 9 6.46 35.076 88 M 54 68.478 50. 89 M 55 9 54.3 45.57 90 M 56 3 46.088 37.578 9 M 57 63.64 38.488 9 M 58 6 83.555 60.69 93 M 59 7.836 47.3 94 M 60 67.85 34.83 95 M 6 0 64.94 4.46 96 M 6 6 34.58 5.75 97 M 63 7 98.454 53.40 98 M 64 3 3.40 67.903 99 M 65 60.55 55.73 00 M 66 4 88.5 7.9 0 M 67 4 88.083 6.504 0 M 69 33.95.549 03 M 70 3 8.43 40.94 04 M 7 98.400. 05 M 73 367.500. 06 M 77 33.670. Comparison of fits to Raw data and fits to means (weighted and unweighted) Females Males Statistic Raw data Means Weighted means Raw data Means Weighted means b 0 33.97777 35.6669 33.97777 80.70754 67.638495 80.70754 b 4.594366 3.679436 4.594366.6374056.006.6374056 b 0.06807 0.057346 0.06807 0.04037 0.0393577 0.04037 R 0.0397 0.347 0.46 0.0535 0.67808 0.64839 MSE 003.508 44.5544 704.4785 705.836 9.709 944.95
Effect on R of fitting means and calculation of lack of fit (LOF) EXST705: Marathon Footrace from Pennsylvania 00 nalysis of Variance of GE - separate by gender Page 8 The GLM Procedure Class Level Informationn Class Levels Values ge 45 7 8 9 0 3 4 5 6 77 8 9 30 3 3 33 40 4 4 43 44 45 46 47 48 49 500 5 5 53 54 55 56 Read 753 Used 75 34 35 36 57 58 59 37 38 39 60 6 Model Error Corrected Total 44 706 750 Squares Meann Square 3846.005 309.7 47035.746 0.97 55988.75.49 0.07 R-Square 0.08564 Coeff Var 6.5468 Root MSE 44.95884 TIME Mean 7.7068 ge 44 Type I SS Meann Square 3846.0053 309.74.49 0.07 ge 44 Type III SS Meann Square 3846.0053 309.74.49 0.07 Class Level Informationn Class Levels Values ge 59 34 54 6 7 8 9 0 3 44 5 6 7 8 9 30 35 36 37 38 39 40 4 4 43 444 45 46 47 48 49 50 55 56 57 58 59 60 6 6 63 644 65 66 67 69 70 7 Read 808 Used 805 3 3 33 5 5 53 73 77 Model Error Corrected Total 58 746 804 Squares Meann Square 866.68 4873.493 965046.98 698.94 347708.96.87 R-Square 0.087034 Coeff Var 6.7805 Root MSE 4.095 TIME Mean 46.4949 ge 58 Type I SS Meann Square 866.68 4873.4934.87 ge 58 Type III SS Meann Square 866.68 4873.4934.87
Effect on R of fitting means and calculation of lack of fit (LOF) Page 9 Test of Lack of Fit a measure of adequacy of the model Regression on means for Females Squares Mean Square Model 657.9073 3068.9537 7.97 Error 4 7588.098 704.4785 Corrected Total 44 3846.0053 R = 0.460 Regression on raw data for Females Squares Mean Square Model 657.907 3068.954 5.9 Error 748 49863.844 003.508 Corrected Total 750 55988.75 R = 0.0397 nalysis of variance on GE using raw data for Females Squares Mean Square Model 44 3846.005 309.7.49 0.07 Error 706 47035.746 0.97 Corrected Total 750 55988.75 R = 0.08564 Test of lack of fit for Females d.f. SSE MSE F P>F Reduced model 748 49863.844 Full model 706 47035.746 Difference 4 7588.098 704.47854 0.8435977 0.7493 Full model 706 47035.746 0.9709 Regression on means for Males Squares Mean Square Model 7379.96 86895.963 44.7 Error 56 08870.69 944.95 Corrected Total 58 866.68 R = 0.64839 Regression on raw data for Males Squares Mean Square Model 7379.96 86895.963 50.94 Error 80 307396.99 705.836 Corrected Total 804 347708.96 R = 0.0535 nalysis of variance on GE using raw data for Males Squares Mean Square Model 58 866.68 4873.493.87 Error 746 965046.98 698.94 Corrected Total 804 347708.96 R = 0.087034 Test of lack of ft for Males d.f. SSE MSE F P>F Reduced model 80 307396.99 Full model 746 965046.98 Difference 56 08870.69 944.95.4486069 0.80 Full model 746 965046.98 698.93756
Effect on R of fitting means and calculation of lack of fit (LOF) Page 0 Model to test LOF for Quadratic model fitted to raw data - separate by gender The GLM Procedure It is possible to test LOF in a single model by including the two sets of the independent variable and putting one in the class statement. The regression is fitted first with the quantitative X variable (not in the class statement) and the categorical version (the one in the class statement) is Class Level Information Class Levels Values fitted last to account for any remaining variation which is LOF. age_again 45 7 8 9 0 3 4 5 6 7 8 9 30 3 3 33 34 35 36 37 38 39 40 4 4 43 44 45 46 47 48 49 50 5 5 53 54 55 56 57 58 59 60 6 Read 753 Used 75 The usual Deviations from regression are = ( Y ˆ ij Yi.) Squares Mean Square Model 44 3846.009 309.7.49 0.07 Error 706 47035.74 0.97 PE = ( Yij Yi.) Corrected Total 750 55988.75 CSS = ( Y Y ) R-Square Coeff Var Root MSE TIME Mean 0.08564 6.5468 44.95884 7.7068 ij.. Type I SS Mean Square ge 6569.88 6569.88 3.4 0.0003 ge*ge 34688.0863 34688.0863 7.6 age_again 4 7588.038 704.47860 0.84 0.7493 LOF = ( Y Yˆ ) i. i. Type III SS Mean Square ge 0 0.00000... ge*ge 0 0.00000... age_again 4 7588.033 704.47860 0.84 0.7493 Class Level Information Class Levels Values age_again 59 6 7 8 9 0 3 4 5 6 7 8 9 30 3 3 33 34 35 36 37 38 39 40 4 4 43 44 45 46 47 48 49 50 5 5 53 54 55 56 57 58 59 60 6 6 63 64 65 66 67 69 70 7 73 77 Read 808 Used 805 Squares Mean Square Model 58 866.60 4873.493.87 Error 746 965046.96 698.94 Corrected Total 804 347708.96 R-Square Coeff Var Root MSE TIME Mean 0.087034 6.7805 4.095 46.4949 Type I SS Mean Square ge 067.0 067.0 64.87 ge*ge 6364.75 6364.75 37.47 age_again 56 08870.6938 944.95.4 0.80 Type III SS Mean Square ge 0 0.0000... ge*ge 0 0.0000... age_again 56 08870.6938 944.95.4 0.80 Compare the tests of age_again to the tests of Lack of Fit on the preceding page.