Running head: EVALUATING THE RESULTS OF A CAR CRASH STUDY USING SAS

Evaluating the results of a car crash study using Statistical Analysis System

Kennesaw State University
Abstract

Part 1.

The dataset comes from a study conducted by the National Transportation Safety Administration. The purpose of the study was to determine the extent of the injuries to crash dummies placed in the driver and front passenger seats of a car that was crashed into a wall at a speed of 35 mph. Cars of different sizes and with different safety features were used in this study, and the results were used to evaluate the seriousness of head injuries, chest deceleration, and the right and left femur loads the dummies sustained in the crash. The data set can be accessed by selecting the Crash Datafile in the lower right corner of the following website: http://lib.stat.cmu.edu/dasl/datafiles/crash.html

I selected this dataset for two reasons. First, I live in an area where frequent car travel is an absolute necessity. Second, I was interested in finding out how small cars (one of which I drive) compare to larger cars when involved in a crash under similar circumstances. I also wanted to learn how much difference safety devices (seat belts, airbags, etc.) make to the extent of injury a person suffers in such crashes.

The study contains the following eleven variables:
- CarID & Year (make, model, and year of a car) qualitative variable
- HeadIC (head injury criterion) quantitative variable
- ChestDecel (chest deceleration) quantitative variable
- LLeg (left femur load) quantitative variable
- RLeg (right femur load) quantitative variable
- DorP (dummy in a driver or passenger seat) qualitative variable
- Protection (seat belts, air bags, etc.) qualitative variable
- Doors (number of doors on a car) quantitative variable
- Year (the year the car was made) quantitative variable
- Wt (weight in pounds) quantitative variable
- Size (size) qualitative variable

(Please note that for the HeadIC, ChestDecel, LLeg, and RLeg variables, lower numbers indicate better results.)

Part 2.

Variable     Label                    N     Mean      Median    Std Dev   Minimum   Maximum
HeadIC       Head Injury Criterion    352   903.07    808.00    456.95    157.00    3665.00
ChestDecel   Chest Deceleration       352   48.37     47.00     9.43      31.00     97.00
LLeg         Left Femur Load          352   1054.01   1008.50   537.88    101.00    3347.00
RLeg         Right Femur Load         352   740.92    662.00    420.27    89.00     2856.00
Doors        Doors                    352   3.17      3.17      0.89      2.00      4.00
Year         Year                     352   88.91     89.00     1.41      87.00     91.00
Wt           Weight                   352   2930.34   2855.00   627.13    1590.00   5619.00

(N = number of observations in the study)

The table above provides measures of central tendency and dispersion for the 7 quantitative variables. One of the easiest ways to decide whether the mean or the median is the better representation of central tendency is to construct a histogram and a box plot, from which we can see what the distribution of each variable looks like. If the distribution is approximately normal, we would report the mean as the best representation of central tendency. However, if the distribution is skewed or contains outliers, the median is the better representation of central tendency.
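Appendix 1 does not list the statements used to draw the histograms and box plots, so the following is only a minimal sketch of one way they could be produced in SAS. It assumes the cleaned data set is named crash2, as in Appendix 1. PROC UNIVARIATE also prints the skewness of each variable, which gives a numeric check on the visual impression.

```sas
/* Sketch only: assumes the data set crash2 created in Appendix 1. */
proc univariate data=crash2;
    var HeadIC ChestDecel LLeg RLeg Doors Year Wt;
    histogram / normal;      /* one histogram per VAR variable, with a fitted normal curve */
run;

/* Box plot of a single variable; outliers are drawn as individual points. */
/* Repeat with the other variable names as needed.                         */
proc sgplot data=crash2;
    hbox HeadIC;
run;
```

A skewness value well above zero, together with a long right tail in the histogram, would point toward the median as the better summary.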
Based on the information we obtain from the histograms and box plots (which can be seen in Part 3 & 4), we can conclude that the best representation of central tendency for the individual variables is as follows.

The distributions of the first four quantitative variables (Head Injury Criterion, Chest Deceleration, Left Femur Load, and Right Femur Load) are all skewed to the right, and the box plots show several outliers in the distribution of each of these variables. Hence, the best representation of central tendency for these four quantitative variables is the median.

The mean and the median of the variable Doors are the same, so reporting either one as the representation of central tendency gives the same result.

The mean and the median of the variable Year are very similar, which often indicates that the distribution might be normal. The histogram of this quantitative variable confirms that the distribution is, in fact, approximately normal, and consequently we would report the mean as the better representation of central tendency.

The distribution of the Weight variable is slightly skewed to the right, and the box plot shows three outliers; therefore, the median better represents the central tendency.

The qualitative variable Car ID & Year does not have a mode. Every car was used in the study only once (some cars shared the same name and model, but the year they were made was different). As a result, every car appears in the study with the same frequency; hence, this variable has no mode.
DorP (Driver or Passenger Seat)

DorP        Frequency   Percent   Cumulative Frequency   Cumulative Percent
Driver      176         50.00%    176                    50.00%
Passenger   176         50.00%    352                    100.00%

The distribution of the qualitative variable DorP (dummy placed in a Driver or Passenger seat) is bimodal: the dummy was placed in the driver seat 50% of the time and in the passenger seat 50% of the time.

Protection

Protection                   Frequency   Percent   Cumulative Frequency   Cumulative Percent
Motorized belts              44          12.50%    44                     12.50%
Driver airbag                60          17.05%    104                    29.55%
Driver & passenger airbags   4           1.14%     108                    30.68%
Manual belts                 196         55.68%    304                    86.36%
Passive belts                48          13.64%    352                    100.00%

From the table above we can see that the mode of the qualitative variable Protection is Manual belts. This safety device was used more frequently than any other; specifically, manual belts were used 55.68% of the time.
Size

Size             Frequency   Percent   Cumulative Frequency   Cumulative Percent
Compact          86          24.43%    86                     24.43%
Heavy            16          4.55%     102                    28.98%
Light            74          21.02%    176                    50.00%
Medium           62          17.61%    238                    67.61%
Very Small Car   14          3.98%     252                    71.59%
Minivan          34          9.66%     286                    81.25%
Pickup truck     36          10.23%    322                    91.48%
Van              30          8.52%     352                    100.00%

The Size table indicates that the mode of this qualitative variable is Compact: 24.43% of the cars in the study were Compact.

Part 3 & 4.

Histogram and box plot of the variable Head Injury Criterion

The histogram shows that the distribution of the variable Head Injury Criterion is skewed to the right; the box plot also indicates that there are several outliers present in the distribution of this variable.

Histogram and box plot of the variable Chest Deceleration

From the histogram of the variable Chest Deceleration we can see that the distribution is positively skewed, and the box plot shows that there are also several outliers.

Histogram and box plot of the variable Left Femur Load
The distribution of the variable Left Femur Load is also skewed to the right, with numerous outliers visible in the box plot.

Histogram and box plot of the Right Femur Load variable

As with the previous variables, the distribution of the variable Right Femur Load is skewed to the right, and the box plot indicates the presence of several outliers.

Histogram and box plot of the Doors variable
The box plot shows that the mean and the median of the variable Doors are the same and that the distribution is approximately normal.

Histogram and box plot of the Year variable

Both graphs of the variable Year illustrate that the distribution is approximately normal.

Histogram and box plot of the Weight variable
The histogram demonstrates that the distribution of the variable Weight is skewed to the right, and from the box plot we can observe that there are also three outliers.

Bar chart and pie chart of the variable Dummy placed in Driver or Passenger seat

The bar chart and the pie chart communicate the same message with one exception: the bar chart reports frequency counts, whereas the pie chart reports percentages. Pie charts often communicate the message more clearly than bar charts because the information is presented in percentages; however, for the variable Dummy placed in Driver or Passenger seat, where the dummy was placed in the driver's seat half of the time and in the passenger's seat the other half, the information is communicated with the same clarity by either graph.
Bar chart and pie chart of the variable Protection

The bar chart of the variable Protection indicates that, out of all cars in the study, manual belts were used in 196 cars. This information is easier to interpret when reported in percentages: from the pie chart we learn that manual belts were used 55.68% of the time, which is more often than the rest of the safety devices combined.

The last categorical variable is Size; however, since this variable includes eight different types of cars used in the study, a bar chart and a pie chart would not provide clear information about its distribution. This information is better represented by the frequency table, which is the last table in Part 2.
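Appendix 1 does not include the statements behind the bar and pie charts. The sketch below shows one way such charts could be drawn for Protection, assuming the data set crash2 from Appendix 1; it is not necessarily how the original figures were made. The pie chart step uses PROC GCHART, which assumes that SAS/GRAPH is licensed.

```sas
/* Sketch only: assumes the data set crash2 created in Appendix 1. */
proc sgplot data=crash2;
    vbar Protection;                 /* bar height = frequency count (the default statistic) */
run;

/* Pie chart with slices reported as percentages; requires SAS/GRAPH. */
proc gchart data=crash2;
    pie Protection / type=percent;
run;
quit;
```

Replacing Protection with DorP gives the corresponding charts for the seat position variable.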
Scatter plot of the variables Head Injury Criterion and Chest Deceleration

The scatter plot illustrates that there is a strong positive relationship between the Head Injury Criterion variable and the Chest Deceleration variable. This is not a surprising result because, as one would expect, the more serious the impact of a crash on the chest of a person, the more serious the head injury.

In order to create the following contingency table, the quantitative variables Chest Deceleration and Weight were converted into qualitative variables as follows:
- Chest deceleration: less than 50 = Not Life Threatening; 50 to less than 70 = Might Be Life Threatening; 70 or more = Life Threatening;
- Weight: less than 2400 pounds = Small; 2400 to less than 3600 pounds = Medium; 3600 pounds or more = Heavy.
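Appendix 1 performs this conversion with IF/ELSE statements in DATA steps. As an alternative sketch, the same grouping can be expressed with user-defined formats, so that PROC FREQ bins the original numeric variables without creating new ones. The format names chestfmt and wtfmt are hypothetical, and the cut points follow the comparisons used in the Appendix 1 code (below 50, below 70; below 2400, below 3600).

```sas
/* Sketch only: chestfmt and wtfmt are illustrative names, not part of the original analysis. */
proc format;
    value chestfmt low  -< 50   = 'Not Life Threatening'
                   50   -< 70   = 'Might Be Life Threatening'
                   70   - high  = 'Life Threatening';
    value wtfmt    low  -< 2400 = 'Small'
                   2400 -< 3600 = 'Medium'
                   3600 - high  = 'Heavy';
run;

/* PROC FREQ groups observations by their formatted values. */
proc freq data=crash2;
    tables ChestDecel*Wt;
    format ChestDecel chestfmt. Wt wtfmt.;
run;
```

Because the labels live in the formats rather than in character variables, this approach also avoids any risk of a label being cut short by a character variable that is too narrow.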
Table of Chest deceleration by Weight
(each cell lists, from top to bottom: Frequency, Percent, Row Percent, Column Percent)

Chest deceleration          Small     Medium    Heavy     Total
Not Life Threatening        48        144       24        216
                            13.64%    40.91%    6.82%     61.36%
                            22.22%    66.67%    11.11%
                            66.67%    62.07%    50.00%

Might Be Life Threatening   23        83        19        125
                            6.53%     23.58%    5.40%     35.51%
                            18.40%    66.40%    15.20%
                            31.94%    35.78%    39.58%

Life Threatening            1         5         5         11
                            0.28%     1.42%     1.42%     3.13%
                            9.09%     45.45%    45.45%
                            1.39%     2.16%     10.42%

Total                       72        232       48        352
                            20.45%    65.91%    13.64%    100.00%

From the contingency table above we learn that, out of all Not Life Threatening chest injuries caused by car accidents, 66.67% happened in Medium weight cars, 22.22% in Small weight cars, and 11.11% in Heavy weight cars. Also, 40.91% of all observations were Not Life Threatening chest injuries that occurred in Medium weight cars. This might suggest that the Medium weight cars are the safest, although Medium weight cars also make up 65.91% of the observations, so the column percentages (66.67% Not Life Threatening for Small, 62.07% for Medium, and 50.00% for Heavy weight cars) should be read alongside the row percentages. The table also shows that, out of all chest injuries, 61.36% were Not Life Threatening, 35.51% Might Be Life Threatening, and only 3.13% were Life Threatening.
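As a check on how the three percentages in each cell are read, the Not Life Threatening / Medium cell (144 observations) can be worked out by hand from the row total (216), the column total (232), and the grand total (216 + 125 + 11 = 352). The symbols below are notation introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
\text{Percent}        &= \frac{n_{ij}}{n}          = \frac{144}{352} \approx 40.91\% \\
\text{Row Percent}    &= \frac{n_{ij}}{n_{i\cdot}} = \frac{144}{216} \approx 66.67\% \\
\text{Column Percent} &= \frac{n_{ij}}{n_{\cdot j}} = \frac{144}{232} \approx 62.07\%
\end{align*}
```

The row percent answers the question "of all Not Life Threatening results, what share came from Medium weight cars?", while the column percent answers "of all Medium weight cars, what share of results were Not Life Threatening?".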
Part 5.

The following four tables provide 95% and 99% confidence intervals for the quantitative variables Head Injury Criterion (HeadIC) and Weight (Wt).

Analysis Variable: HeadIC
Lower 95% CL for Mean   Upper 95% CL for Mean
725.21                  1127.12

Based on a representative sample of 20 cars, we are 95% confident that the mean Head Injury Criterion among all cars is between 725.21 and 1127.12. From Part 2 we know that the true population mean of the Head Injury Criterion variable is 903.07, which falls inside the 95% confidence interval.

Analysis Variable: HeadIC
Lower 99% CL for Mean   Upper 99% CL for Mean
651.48                  1200.85

Based on a representative sample of 20 cars, we are 99% confident that the mean Head Injury Criterion among all cars is between 651.48 and 1200.85. Since the confidence level increased, the confidence interval became wider. The true population mean of 903.07 is included in the 99% confidence interval.

Analysis Variable: Wt
Lower 95% CL for Mean   Upper 95% CL for Mean
2623.33                 3096.97
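The limits that PROC MEANS reports with the CLM option are the usual t-based interval for a mean. As a rough check (a sketch, not output from the original analysis), the sample mean and the margin of error behind the Wt limits above can be recovered from the two limits themselves. The critical value for n = 20 is taken from a standard t table, and the standard error shown is back-calculated rather than reported in the paper.

```latex
\begin{align*}
\text{CI} &= \bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad n = 20,\ \alpha = 0.05,\ t_{0.975,\,19} \approx 2.093 \\
\bar{x} &= \frac{2623.33 + 3096.97}{2} = 2860.15 \\
t_{0.975,\,19}\,\frac{s}{\sqrt{20}} &= \frac{3096.97 - 2623.33}{2} = 236.82 \\
\frac{s}{\sqrt{20}} &\approx \frac{236.82}{2.093} \approx 113.15
\end{align*}
```

The interval is centered on the mean of the 20 sampled cars (2860.15), not on the full-data mean, which is why the full-data mean can lie anywhere inside it rather than at its midpoint.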
Based on a representative sample of 20 cars, we are 95% confident that the mean weight among all cars is between 2623.33 and 3096.97 pounds. As stated in Part 2, the true population mean of the Weight variable is 2930.34, and we can see that the 95% confidence interval contains the true population mean.

Analysis Variable: Wt
Lower 99% CL for Mean   Upper 99% CL for Mean
2536.44                 3183.86

Based on a representative sample of 20 cars, we are 99% confident that the mean weight among all cars is between 2536.44 and 3183.86 pounds. The true population mean of 2930.34 falls inside the 99% confidence interval.

Part 6.

First, this section examines the relationship between the Head Injury Criterion and Protection variables. A contingency table was used to evaluate which safety device was used most often and which of the safety devices appears to be the most effective in protecting a person involved in a car accident. Second, the relationship between the Head Injury Criterion and Weight variables was evaluated. In this case three analytical tools were used: a contingency table, a bar chart, and a 100% stacked bar chart.

In the Head Injury Criterion and Protection contingency table, the quantitative variable Head Injury Criterion was converted into a qualitative variable as follows:
- less than 800 = Mild; 800 to less than 1500 = Moderate; 1500 or more = Severe.
Table of Head Injury Criterion and Protection
(each cell lists, from top to bottom: Frequency, Percent, Row Percent, Column Percent)

Head Injury   Motorized   Driver    Driver &     Manual    Passive   Total
Criterion     belts       airbag    passenger    belts     belts
                                    airbags
Mild          32          44        2            70        24        172
              9.09%       12.50%    0.57%        19.89%    6.82%     48.86%
              18.60%      25.58%    1.16%        40.70%    13.95%
              72.73%      73.33%    50.00%       35.71%    50.00%

Moderate      10          14        2            97        22        145
              2.84%       3.98%     0.57%        27.56%    6.25%     41.19%
              6.90%       9.66%     1.38%        66.90%    15.17%
              22.73%      23.33%    50.00%       49.49%    45.83%

Severe        2           2         0            29        2         35
              0.57%       0.57%     0.00%        8.24%     0.57%     9.94%
              5.71%       5.71%     0.00%        82.86%    5.71%
              4.55%       3.33%     0.00%        14.80%    4.17%

Total         44          60        4            196       48        352
              12.50%      17.05%    1.14%        55.68%    13.64%    100.00%

The contingency table of Head Injury Criterion and Protection shows that, out of all safety devices, Manual belts were used in the most cars, specifically in 55.68% of cars in this study. Out of all cars which used Manual belts as a safety device, 35.71% of the head injuries caused by a car crash were Mild, 49.49% were Moderate, and 14.80% were Severe.

The second most frequently used safety device was the Driver airbag, which was used in 17.05% of all cars. The relatively low percentage of Driver airbag devices is due to the fact that this study was conducted on cars that were made about 20 years ago. However, when looking at the Driver airbag safety device we can see that 73.33% of head injuries were Mild, 23.33% were Moderate, and only 3.33% were Severe. It is not surprising that a majority of (if not all) cars made today have airbags installed as a very important safety feature.
Table of Head Injury Criterion and Weight
(each cell lists, from top to bottom: Frequency, Percent, Row Percent, Column Percent)

Head Injury Criterion   Small     Medium    Heavy     Total
Mild                    35        122       15        172
                        9.94%     34.66%    4.26%     48.86%
                        20.35%    70.93%    8.72%
                        48.61%    52.59%    31.25%

Moderate                31        92        22        145
                        8.81%     26.14%    6.25%     41.19%
                        21.38%    63.45%    15.17%
                        43.06%    39.66%    45.83%

Severe                  6         18        11        35
                        1.70%     5.11%     3.13%     9.94%
                        17.14%    51.43%    31.43%
                        8.33%     7.76%     22.92%

Total                   72        232       48        352
                        20.45%    65.91%    13.64%    100.00%

The Head Injury Criterion and Weight contingency table indicates that a majority of cars used in this study were Medium weight cars (65.91%); 20.45% were Small weight cars and 13.64% were Heavy weight cars. Out of all Medium weight cars, 52.59% of head injuries were Mild, 39.66% were Moderate, and 7.76% were Severe. Out of all Small weight cars, 48.61% of head injuries were Mild, 43.06% were Moderate, and 8.33% were Severe. Out of all Heavy weight cars, 31.25% of head injuries were Mild, 45.83% were Moderate, and 22.92% were Severe. Also, out of all Mild injuries, 70.93% happened in the Medium weight cars, 20.35% in Small weight cars, and only 8.72% in Heavy weight cars. Once again, it looks like the Medium weight cars are the safest.
Bar chart of Weight by Head Injury Criterion
(Mild = less than 800; Moderate = 800 to less than 1500; Severe = 1500 or more)
(Small = less than 2400 pounds; Medium = 2400 to less than 3600 pounds; Heavy = 3600 pounds or more)

The bar chart above provides a visual demonstration of the information described previously in the Head Injury Criterion and Weight contingency table. The bar chart provides a frequency count of Mild, Moderate, and Severe head injuries that fall into one of three car weight categories: Small, Medium, and Heavy. The graph suggests that most of the Mild head injuries occur in the Medium weight cars. This information becomes more obvious when looking at the 100% stacked bar chart below.
100% stacked bar chart of Weight by Head Injury Criterion
(Mild = less than 800; Moderate = 800 to less than 1500; Severe = 1500 or more)
(Small = less than 2400 pounds; Medium = 2400 to less than 3600 pounds; Heavy = 3600 pounds or more)

From the 100% stacked bar chart we can observe that the Medium weight cars have the largest share of Mild head injuries (52.59%), followed by the Small weight cars (48.61%), with the Heavy weight cars placing last (31.25%). On the other hand, the Heavy weight cars have the largest share of Severe head injuries (22.92%, compared with 8.33% for Small and 7.76% for Medium weight cars).
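Appendix 1 does not show how the 100% stacked bar chart was drawn. One possible approach, sketched here under the assumption that the data set crash6 from Appendix 1 (with the variables Weightcat and Headcat) is available, is to let PROC FREQ compute the within-weight-class percentages and then plot them. The output data set name pcts is hypothetical.

```sas
/* Sketch only: pcts is an illustrative data set name.               */
/* OUTPCT adds PCT_ROW (percent within each Weightcat row) to OUT=.  */
proc freq data=crash6 noprint;
    tables Weightcat*Headcat / out=pcts outpct;
run;

/* Each bar stacks the Headcat percentages for one weight class, so every bar totals 100%. */
proc sgplot data=pcts;
    vbar Weightcat / response=pct_row group=Headcat groupdisplay=stack;
    yaxis label='Percent of observations in weight class';
run;
```

Because every bar has the same height, differences between weight classes of very different sizes are easier to compare than in the frequency bar chart.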
Based on the results from this study, it appears that the Medium weight cars (cars that weigh at least 2400 but less than 3600 pounds) show the best results when it comes to car safety: the extent of injuries sustained in a crash was less serious in these cars than in the Small and Heavy weight cars.
Appendix 1

Proc print data=crash;
run;

Proc Contents data=work.crash;
run;

Alphabetic List of Variables and Attributes
#    Variable       Type   Len   Format   Informat   Label
3    ChestDecel     Num    8                         ChestDecel
8    Doors          Num    8                         Doors
6    DorP           Char   9     $9.      $9.        DorP
2    HeadIC         Num    8                         HeadIC
4    LLeg           Num    8                         LLeg
7    Protection     Char   15    $15.     $15.       Protection
5    RLeg           Num    8                         RLeg
11   Size           Char   4     $4.      $4.        Size
10   Wt             Num    8                         Wt
9    Year           Num    8                         Year
1    caridandyear   Char   28    $28.     $28.       caridandyear

Proc format;
  value $Sizecode
    'lt'   = "Light"
    'med'  = "Medium"
    'comp' = "Compact"
    'hev'  = "Heavy"
    'van'  = "Van"
    'pu'   = "Pickup truck"
    'mpv'  = "Minivan"
    'mini' = "Very small car";
  value $Protectioncode
    'dairbag'   = "driver airbag"
    'dpairbags' = "driver & passenger airbags";
run;

Data work.crash1;
  set work.crash;
  Format Size $Sizecode. Protection $Protectioncode.;
run;

Proc Print data=work.crash1;
run;

Proc stdize data=work.crash1 reponly out=crash2;
run;

Proc means data=crash2 n mean median std min max maxdec=2;
run;

Variable     Label                    N     Mean      Median    Std Dev   Minimum   Maximum
HeadIC       Head Injury Criterion    352   903.07    808.00    456.95    157.00    3665.00
ChestDecel   Chest Deceleration       352   48.37     47.00     9.43      31.00     97.00
LLeg         Left Femur Load          352   1054.01   1008.50   537.88    101.00    3347.00
RLeg         Right Femur Load         352   740.92    662.00    420.27    89.00     2856.00
Doors        Doors                    352   3.17      3.17      0.89      2.00      4.00
Year         Year                     352   88.91     89.00     1.41      87.00     91.00
Wt           Weight                   352   2930.34   2855.00   627.13    1590.00   5619.00

Proc Freq data=crash2;
  Tables Protection DorP Size;
run;

Protection

Protection        Frequency   Percent   Cumulative Frequency   Cumulative Percent
motorized belts   44          12.50%    44                     12.50%
d airbag          60          17.05%    104                    29.55%
d&p airbags       4           1.14%     108                    30.68%
manual belts      196         55.68%    304                    86.36%
passive belts     48          13.64%    352                    100.00%
DorP

DorP        Frequency   Percent   Cumulative Frequency   Cumulative Percent
Driver      176         50.00%    176                    50.00%
Passenger   176         50.00%    352                    100.00%

Size

Size             Frequency   Percent   Cumulative Frequency   Cumulative Percent
Compact          86          24.43%    86                     24.43%
Heavy            16          4.55%     102                    28.98%
Light            74          21.02%    176                    50.00%
Medium           62          17.61%    238                    67.61%
Very Small Car   14          3.98%     252                    71.59%
Minivan          34          9.66%     286                    81.25%
Pickup truck     36          10.23%    322                    91.48%
Van              30          8.52%     352                    100.00%

data crash3;
  set crash2;
  /* Declare the length first; otherwise the first assignment fixes it and longer labels are truncated. */
  length Chestcat $25;
  if ChestDecel < 50 then Chestcat = "Not Life Threatening";
  else if ChestDecel < 70 then Chestcat = "Might Be Life Threatening";
  else Chestcat = "Life Threatening";
run;

data crash4;
  set crash3;
  length Weightcat $6;   /* long enough for "Medium" */
  if Wt < 2400 then Weightcat = "Small";
  else if Wt < 3600 then Weightcat = "Medium";
  else Weightcat = "Heavy";
run;

Proc freq data=crash4;
  tables Chestcat*Weightcat;
run;
Proc surveyselect data=crash2 out=crash5 method=srs Sampsize=20 Seed=123;
run;

Proc Print data=crash5;
run;

Selection Method        Simple Random Sampling
Input Data Set          CRASH2
Random Number Seed      123
Sample Size             20
Selection Probability   0.056818
Sampling Weight         17.6
Output Data Set         CRASH5

Proc Means data=crash5 CLM alpha=.05;
  Var HeadIC;
run;

Proc Means data=crash5 CLM alpha=.01;
  Var HeadIC;
run;

Proc Means data=crash5 CLM alpha=.05;
  Var Wt;
run;

Proc Means data=crash5 CLM alpha=.01;
  Var Wt;
run;

data crash6;
  set crash4;
  length Headcat $8;   /* long enough for "Moderate"; avoids truncation to the length of "Mild" */
  if HeadIC < 800 then Headcat = "Mild";
  else if HeadIC < 1500 then Headcat = "Moderate";
  else Headcat = "Severe";
run;

Proc freq data=crash6;
  tables Headcat*Protection;
run;

Proc freq data=crash6;
  tables Headcat*Weightcat;
run;