Math 1011 Homework Set 2
Due February 12, 2014

1. Suppose we have two lists: (i) 1, 3, 5, 7, 9, 11; and (ii) 1001, 1003, 1005, 1007, 1009, 1011.
(a) Find the average and standard deviation for each of the lists.
(b) From your result in (a), can you find any property of the average and the standard deviation? (Hint: what is the relation between (i) and (ii)?)

(a) For list (i), the average is (1 + 3 + 5 + 7 + 9 + 11)/6 = 6. The list of deviations is -5, -3, -1, 1, 3, 5. Then the SD is
\[ \sqrt{\frac{(-5)^2 + (-3)^2 + (-1)^2 + (1)^2 + (3)^2 + (5)^2}{6}} = \sqrt{\frac{35}{3}} \approx 3.42. \]
For list (ii), the average is (1001 + 1003 + 1005 + 1007 + 1009 + 1011)/6 = 1006. The list of deviations is again -5, -3, -1, 1, 3, 5. Then the SD is again
\[ \sqrt{\frac{(-5)^2 + (-3)^2 + (-1)^2 + (1)^2 + (3)^2 + (5)^2}{6}} = \sqrt{\frac{35}{3}} \approx 3.42. \]
(b) This shows a change-of-scale property: adding a constant to each entry of the list adds the same constant to the average, but leaves the SD unchanged.

2. Suppose we have two lists: (i) 1, 2, 3, 4, 5, 6, 7; and (ii) 3, 6, 9, 12, 15, 18, 21.
(a) Find the average and standard deviation for each of the lists.
(b) From your result in (a), can you find any property of the average and the standard deviation? (Hint: what is the relation between (i) and (ii)?)

(a) For list (i), the average is (1 + 2 + 3 + 4 + 5 + 6 + 7)/7 = 4. The list of deviations is -3, -2, -1, 0, 1, 2, 3. Then the SD is
\[ \sqrt{\frac{(-3)^2 + (-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2 + (3)^2}{7}} = \sqrt{4} = 2. \]
For list (ii), the average is (3 + 6 + 9 + 12 + 15 + 18 + 21)/7 = 12.
The list of deviations is -9, -6, -3, 0, 3, 6, 9. Then the SD is
\[ \sqrt{\frac{(-9)^2 + (-6)^2 + (-3)^2 + (0)^2 + (3)^2 + (6)^2 + (9)^2}{7}} = \sqrt{36} = 6. \]
(b) This shows a change-of-scale property: multiplying each entry of the list by a positive constant multiplies the average by the same constant, and likewise the SD.

3. (a) Find the average and SD of the list: 41, 48, 50, 50, 54, 57.
(b) Which numbers on the list are within 0.5 SDs of average? Within 1.5 SDs of average?

(a) The average is (41 + 48 + 50 + 50 + 54 + 57)/6 = 50. So the list of deviations is -9, -2, 0, 0, 4, 7. The SD is
\[ \sqrt{\frac{(-9)^2 + (-2)^2 + (0)^2 + (0)^2 + (4)^2 + (7)^2}{6}} = \sqrt{25} = 5. \]
(b) 48, 50, 50 are within 0.5 SDs of average, i.e. in the range from 47.5 to 52.5. 48, 50, 50, 54, 57 are within 1.5 SDs of average, i.e. in the range from 42.5 to 57.5.

4. A study on college students found that the men had an average weight of about 66 kg and an SD of about 9 kg. The women had an average weight of about 55 kg and an SD of 9 kg.
(a) Find the averages and SDs, in pounds, for both men and women (1 kg = 2.2 lb).
(b) Just roughly, what percentage of the men weighed between 57 kg and 75 kg?
(c) If you took the men and women together, would the SD of their weights be smaller than 9 kg, just about 9 kg, or bigger than 9 kg? Why? (Hint: recall that the standard deviation indicates how the data spread around the average.)

(a) Average weight of men = 66 × 2.2 ≈ 145 pounds, SD = 9 × 2.2 ≈ 20 pounds. Average weight of women = 55 × 2.2 = 121 pounds, SD = 9 × 2.2 ≈ 20 pounds.
(b) About 68%: the range from 57 kg to 75 kg is the average ± 1 SD.
(c) Bigger than 9 kg: if you take the men and the women together, the spread in weights goes up, because the two groups are centered at averages about 11 kg apart.
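The arithmetic in problems 1 to 3 can be checked with a short script. This is only a sketch: the helper names `average`, `sd`, and `within` are made up for illustration, and `sd` divides by the number of entries (not by one less), matching the computations above.

```python
import math

def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sd(values):
    """Root-mean-square of the deviations from the average (divides by n)."""
    avg = average(values)
    return math.sqrt(sum((x - avg) ** 2 for x in values) / len(values))

def within(values, k):
    """Entries that lie within k SDs of the average."""
    avg, spread = average(values), sd(values)
    return [x for x in values if abs(x - avg) <= k * spread]

list_1i = [1, 3, 5, 7, 9, 11]
list_1ii = [x + 1000 for x in list_1i]  # problem 1: add a constant
list_2i = [1, 2, 3, 4, 5, 6, 7]
list_2ii = [3 * x for x in list_2i]     # problem 2: multiply by a positive constant
list_3 = [41, 48, 50, 50, 54, 57]

for name, values in [("1(i)", list_1i), ("1(ii)", list_1ii),
                     ("2(i)", list_2i), ("2(ii)", list_2ii), ("3", list_3)]:
    print(f"list {name}: average = {average(values):g}, SD = {sd(values):.2f}")

print("within 0.5 SDs:", within(list_3, 0.5))
print("within 1.5 SDs:", within(list_3, 1.5))
```

Adding 1000 changes the average but not the SD, while multiplying by 3 multiplies both, which is the pattern described in parts (b) of problems 1 and 2.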
5. An investigator has a computer file showing family incomes for 1,000 subjects in a certain study. These range from $5,800 a year to $98,600 a year. By accident, the highest income in the file gets changed to $986,000.
(a) Does this affect the average? If so, by how much?
(b) Does this affect the median? If so, by how much? (Hint: think about whether the highest income will affect the percentage to the right of the median or not.)

(a) Yes: the average goes up by ($986,000 - $98,600)/1,000 = $887.40.
(b) No: one advantage of the median is that it is not thrown off by outliers.

6. Many observers think there is a permanent underclass in American society: most of those in poverty typically remain poor from year to year. Over the period 1970 to 2000, the percentage of the American population in poverty each year has been remarkably stable, at 12% or so. Income figures for each year were taken from the March Current Population Survey of that year; the cutoff for poverty was based on official government definitions. To what extent do these data support the theory of the permanent underclass? Discuss briefly. (Hint: the study draws conclusions about the effects of year; this is similar to the effects of age in the example discussed in class.)

The data are cross-sectional, not longitudinal, so they provide only weak support for the theory. (Longitudinal data show that most spells of poverty are short.)

7. The following list of test scores has an average of 50 and an SD of 10:
39 41 47 58 65 37 37 49 56 59 62 36 48 52 64 29 44 47 49 52 53 54 72 50 50
(a) Use the normal approximation to estimate the number of scores within 1.25 SDs of the average.
(b) How many scores really were within 1.25 SDs of the average?

(a) In standard units, the range is from -1.25 to 1.25. From the normal table, the area under the normal curve over this range is about 79%. The list has 25 entries, so the approximation is about 25 × 79% ≈ 20.
(b) Within 1.25 SDs of the average means between 37.5 and 62.5. Counting the entries in that range gives 18 scores.
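Both parts of problem 7 can be reproduced with the Python standard library. The sketch below uses `statistics.NormalDist` in place of the printed normal table, so the area it reports may differ slightly from the table value in its later decimals; the variable names are illustrative only.

```python
from statistics import NormalDist

scores = [39, 41, 47, 58, 65, 37, 37, 49, 56, 59, 62, 36, 48,
          52, 64, 29, 44, 47, 49, 52, 53, 54, 72, 50, 50]
avg, sd, k = 50, 10, 1.25  # average and SD as stated in the problem

# (a) Normal approximation: area under the standard normal curve
# between -k and k, times the number of entries.
standard_normal = NormalDist()  # mean 0, SD 1
area = standard_normal.cdf(k) - standard_normal.cdf(-k)
estimate = len(scores) * area

# (b) Actual count of scores within k SDs of the average.
actual = sum(1 for s in scores if abs(s - avg) <= k * sd)

print(f"area = {area:.1%}, estimate = {estimate:.1f}, actual = {actual}")
```

The estimate and the actual count differ by a couple of scores, which is to be expected from an approximation applied to only 25 entries.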
8. The table below shows the distribution of adults by the last digit of their age, as reported in the Census of 1880 and the Census of 1970. You might expect each of the ten possible digits to turn up for 10% of the people, but this is not the case. For example, in 1880, 16.8% of all persons reported an age ending in 0, like 30 or 40 or 50. In 1970, this percentage was only 10.6%.
(a) Draw histograms for these two distributions. (Note: you may use the convention for selecting the class intervals when the variable is discrete.)
(b) In 1880, there was a strong preference for the digits 0 and 5. How can this be explained? (Hint: in the old days, did people know their ages accurately?)
(c) In 1970, the preference was much weaker. How can this be explained?

Digit   1880 (%)   1970 (%)
0       16.8       10.6
1        6.7        9.9
2        9.4       10.0
3        8.6        9.6
4        8.8        9.8
5       13.4       10.0
6        9.4        9.9
7        8.5       10.2
8       10.2       10.0
9        8.2       10.1
Source: United States Census.

(a) See the graphs at the end. (2 points: each histogram is worth 1 point.)
(b) In 1880, people did not know their ages at all accurately, and rounded off.
(c) In 1970, people knew when they were born.

9. In a survey carried out at the University of California, Berkeley, a sample of students were interviewed and asked what their grade-point average was. A sketched histogram of the results is shown on the next page. (GPA ranges from 0 to 4, and 2 is a bare pass.)
(a) True or false: more students reported a GPA in the range 2.0 to 2.1 than in the range 1.5 to 1.6.
(b) True or false: more students reported a GPA in the range 2.0 to 2.1 than in the range 2.5 to 2.6.
(c) What accounts for the spike at 2? (Hint: recall the educational level example we discussed in class, and think about the property of the peaks.)

(a) True.
(b) True.
(c) People with failing GPAs may round them up; and 2 is such an important number for GPAs that people with GPAs just above 2 may round them down (1 point).
10. The histogram on the next page represents the birth weight of babies in some hospital. Suppose we know 30 babies weighed over 4.5 kg, and babies weighing under 2 kg are taken to the Special Care Baby Unit. (Notice that the vertical scale is not percentage per kg, that is, the vertical scale is not the density scale.)
(a) What is the total number of babies weighed? (Hint: in this case, the area of each block represents the frequency of the corresponding class interval, i.e. the number of babies with weight in that interval. To compute the total number of babies, find the total area of the blocks. Note that we already know the area of one of the blocks. By comparing the scale of the blocks, you can figure out the areas of the rest.)
(b) How many babies are taken to the Special Care Baby Unit?

(a) Count squares with edge length 0.5: there are 6 such squares in the block for babies who weighed over 4.5 kg, and 30 such squares in total. So the percentage of babies who weighed over 4.5 kg is 6/30 = 20%. Since 30 babies weighed over 4.5 kg, the total number of babies is 30/20% = 150.
(b) Again counting squares with edge length 0.5, there are 4 such squares in the block for babies who weighed under 2 kg. So the number of babies who weighed under 2 kg (and were thus taken to the Special Care Baby Unit) is 150 × 4/30 = 20.

The total points are 25.
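The square-counting argument in problem 10 amounts to finding how many babies one square represents. A minimal sketch follows; the square counts are taken from the solution text (the histogram itself is not reproduced here), and the variable names are hypothetical.

```python
from fractions import Fraction

# Counts of squares with edge length 0.5, as read off in the solution.
squares_over_4_5_kg = 6
squares_under_2_kg = 4
squares_total = 30
babies_over_4_5_kg = 30  # given in the problem

# Area is proportional to frequency, so each square stands for
# the same number of babies.
babies_per_square = Fraction(babies_over_4_5_kg, squares_over_4_5_kg)

total_babies = babies_per_square * squares_total       # part (a)
special_care = babies_per_square * squares_under_2_kg  # part (b)

print(f"total = {total_babies}, special care = {special_care}")
```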
The histograms for problem 8 part (a): You can either draw the two histograms together, or draw them separately. You don't need to mark the units on the vertical scale, and you don't need to indicate the heights either. All you need to do for the vertical axis is to sketch the heights as in the graph above.
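For a rough idea of the shapes expected in problem 8(a), the percentages from the table can be printed as text bars. This is only an illustrative sketch, not the graded figure: the bars run horizontally, the function name `text_histogram` is made up, and each digit is treated as a class interval of width 1, so the bar length is proportional to the percentage.

```python
pct_1880 = [16.8, 6.7, 9.4, 8.6, 8.8, 13.4, 9.4, 8.5, 10.2, 8.2]
pct_1970 = [10.6, 9.9, 10.0, 9.6, 9.8, 10.0, 9.9, 10.2, 10.0, 10.1]

def text_histogram(percentages, chars_per_percent=2):
    """Return one text bar per digit, with length proportional to the percentage."""
    lines = []
    for digit, pct in enumerate(percentages):
        bar = "#" * round(pct * chars_per_percent)
        lines.append(f"{digit} | {bar} {pct}%")
    return lines

for year, percentages in [(1880, pct_1880), (1970, pct_1970)]:
    print(f"Last digit of reported age, Census of {year}")
    print("\n".join(text_histogram(percentages)))
    print()
```

The 1880 bars for digits 0 and 5 stand out clearly, while the 1970 bars are nearly level.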