MS DATA ANALYSIS EXAM

INSTRUCTIONS

Please Read Carefully

This exam consists of one question, with two parts. You are asked to write a report on your analyses and we strongly recommend that you begin writing early. You will have until 5pm to complete the exam, by which time you should have handed the exam back to MariAlice McShane. We suggest that you begin writing your report no later than 11am.

Your task is to develop analyses that begin to answer the questions of interest. Use whatever computer facilities and resources you feel you need. You have priority to use all the equipment in the Statistics Department terminal rooms.

We expect your work to be no more than a first draft of a report. It may be hand written or done on a computer, but do not spend a lot of time word processing your report. Please write neatly so that we can follow your report. Include and clearly label relevant computer output.

You may use whatever notes and books you wish. You may not collaborate or discuss any aspect of this problem with anyone other than Mike Meyer, Joel Greenhouse, Chris Genovese, or Jay Kadane.

If you do not have the time to complete both parts of the problem, we would prefer to see a complete answer to the first part rather than incomplete or sketchy answers to both.
Background

We are providing you with a small subset of data from a large multi-year experiment on the effect of rising carbon dioxide (CO2) levels on agriculture, specifically the yield of cotton. The full data set has details of five season-long experiments, carried out over the period 1983-1987, examining the effects of CO2 enrichment on cotton growth. Because conditions of insufficient water and nutrients are commonplace in much of the world's agriculture and in the unmanaged biosphere, these studies employed a design whereby the effects of water and nutrient stress could be examined alongside the effects of elevated CO2 level on plant growth parameters.

The initial experiment examined the effects of varying CO2 concentration only. In the following two seasons, the interactive effects of CO2 concentration and water availability were studied. In the final two seasons, the effects of the three-way interaction between CO2 concentration, water availability, and nitrogen fertility were investigated.

The data you will analyze are from the 1985 experiment, which studied the interaction effects of CO2 and water availability. This experiment did not have a nitrogen fertility component.

The experiment was conducted at a site near the Western Cotton Research Laboratory, Phoenix, Arizona. A system of open-top chambers was constructed to provide a stable regime of CO2 enrichment through the course of a growing season. The plants were planted either in an open-top chamber or in an open field. (Hint: Note that there are really 4 levels of the CO2 variable.)

The cotton was planted on April 9th (day 99), most plants had emerged by April 16th (day 106), CO2 treatments began on May 2 (day 122), and irrigation treatments began on May 24 (day 144). The crop was harvested on Oct 3 (day 276).
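The day numbers quoted above are days of the year 1985, and some variable names later in this handout carry a day number as a suffix. The following is a minimal sketch, using only the Python standard library, of how calendar dates and day numbers can be converted into each other. The helper names day_of_year and date_from_day are illustrative and are not part of the study documentation.

```python
from datetime import date, timedelta


def day_of_year(year, month, day):
    """Return the day-of-year number (1 = January 1) for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


def date_from_day(year, doy):
    """Return the calendar date for a day-of-year number."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


# Events of the 1985 season as (month, day), taken from the text above.
EVENTS_1985 = {
    "planting": (4, 9),
    "emergence": (4, 16),
    "CO2 treatments begin": (5, 2),
    "irrigation treatments begin": (5, 24),
    "harvest": (10, 3),
}

for name, (month, day) in EVENTS_1985.items():
    print(f"{name}: day {day_of_year(1985, month, day)}")

# A day number can be turned back into a calendar date in the same way.
print(date_from_day(1985, 171))
```

Because 1985 is not a leap year, the same conversion applies to every day number in this handout.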
The particular variables included in our part of the study are below:

CO2     is the mean daily carbon dioxide concentration (µmol/mol) for the sampled plot;
CO2COD  is a one-character flag code giving additional information about the sampled plot: A = chamber at ambient (i.e., nonenriched) concentration, N = no chamber (i.e., open field), and E = chamber containing enriched concentration;
REP     is the replicate number, with a value of either 1 or 2 (two replicates were included for each CO2 treatment);
IRRIG   is a one-character code describing the irrigation regimen: 0 = water-stressed ("dry") treatment and 1 = well-watered ("wet") treatment;
XLTYLD  is the actual field-measured lint yield (dry weight basis, kg/ha), which is the yield attainable by hand harvesting;
XSDYLD  is the actual field-measured seed yield (dry weight basis, kg/ha);
XBIOM   is the field-measured aboveground dry biomass (mass of the biological material) at maturity (kg/ha);
XPLTHT  is the plant height (cm);
XWSTMH  is the dry stem weight (kg/ha).

Missing values for integer variables are represented by -9. Missing values for real variables are represented by -9.0, -9.00, etc., according to the format of each variable. Line numbers refer to positions within a given data block, containing data for a given year, site, and treatment regimen.

A full list of the variables in the study and considerable supplementary documentation are available in the file /afs/andrew/hss/stat/data/msdata/95/supplement. We are providing the supplemental material as background information. We do not expect you to,
nor do you need to, read and understand all of it.

Part I

The partial data presented below are available in /afs/andrew/hss/stat/data/msdata/95/part1.

co2  co2cod  rep  irrig  xltyld  xpltht-171
350  N       1    0      1080    39
350  N       2    0      1190    39
350  N       1    1      1320    38
350  N       2    1      1370    45
350  A       1    0       640    26
350  A       2    0       690    30
350  A       1    1      1620    36
350  A       2    1      1630    40
500  E       1    0      1080    35
500  E       2    0      1120    37
500  E       1    1      2650    42
500  E       2    1      2130    54
650  E       1    0      1510    39
650  E       2    0      1230    42
650  E       1    1      2510    48
650  E       2    1      2540    52

The experimenters are particularly concerned about how the treatments affect the yield of cotton. Your job is to analyze these data and answer that question. The variable names are those listed above plus xpltht-171, a potential concomitant variable, which is the plant height at day 171. You should assume that treatments were randomly applied to plots.

Prepare a draft report that describes your findings. You should also include a technical appendix with some details of your analyses. You might also want to list any questions about the data or the experiment that you would like the original researchers to answer. We recommend that your report contain at least the following elements:

- An executive summary of your findings. This should be addressed to someone with little or no training in statistics (a half page or so).
- A summary of the statistical issues and a description of the analyses performed. This should be something that a professional statistician (such as the examining committee) could understand (two pages or less).
- A concise summary of what other analyses you would have done if there were time (less than a page).
- An appendix with whatever other material, including computer printouts, that you would like us to consider.

Please label all plots, tables, and computer output clearly.

Part II

We are also providing an expanded data set that includes additional response variables (xsdyld and xbiom) and a covariate measured at two times during the experiment. Please outline the types of analyses you would perform on these data and the questions you would attempt to answer. If you have time, you might attempt a preliminary analysis of these data, concentrating on the relationship between the various response variables.

The data for the second part are below and are also in the file /afs/andrew/hss/stat/data/msdata/95/part2.
co2  co2cod  rep  irrig  xltyld  xpltht-171  xsdyld  xbiom  xwstmh-157  xwstmh-220
350  N       1    0      1080    39          1780     7370  210         2368
350  N       2    0      1190    39          1940     7460  235         1112
350  N       1    1      1320    38          2390     8290  148         2461
350  N       2    1      1370    45          2440     8400  440         2309
350  A       1    0       640    26          1160     6430  112         2226
350  A       2    0       690    30          1200     6490  175         1890
350  A       1    1      1620    36          2860    10680  145         3047
350  A       2    1      1630    40          2870    11630  265         2435
500  E       1    0      1080    35          1910     9530  356         4678
500  E       2    0      1120    37          1960    10090  372         3106
500  E       1    1      2650    42          4480    15710  215         4887
500  E       2    1      2130    54          3800    15510  579         5818
650  E       1    0      1510    39          2530    10640  322         5007
650  E       2    0      1230    42          2260    11580  520         1932
650  E       1    1      2510    48          4400    16640  306         7731
650  E       2    1      2540    52          4230    16810  541         6091
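As a starting point for a preliminary look at these data, the sketch below reads the table above, builds a four-level treatment label from co2cod and co2 (following the hint in the Background), and computes cell means of lint yield and pairwise Pearson correlations among the three response variables. It is only one possible first step, not a recommended analysis. It uses the Python standard library only, and the names RAW, load_rows, cell_means, pearson, and the "trt" label are illustrative assumptions rather than anything defined by the study.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Part II data, copied from the table above (whitespace-separated).
RAW = """\
co2 co2cod rep irrig xltyld xpltht-171 xsdyld xbiom xwstmh-157 xwstmh-220
350 N 1 0 1080 39 1780 7370 210 2368
350 N 2 0 1190 39 1940 7460 235 1112
350 N 1 1 1320 38 2390 8290 148 2461
350 N 2 1 1370 45 2440 8400 440 2309
350 A 1 0 640 26 1160 6430 112 2226
350 A 2 0 690 30 1200 6490 175 1890
350 A 1 1 1620 36 2860 10680 145 3047
350 A 2 1 1630 40 2870 11630 265 2435
500 E 1 0 1080 35 1910 9530 356 4678
500 E 2 0 1120 37 1960 10090 372 3106
500 E 1 1 2650 42 4480 15710 215 4887
500 E 2 1 2130 54 3800 15510 579 5818
650 E 1 0 1510 39 2530 10640 322 5007
650 E 2 0 1230 42 2260 11580 520 1932
650 E 1 1 2510 48 4400 16640 306 7731
650 E 2 1 2540 52 4230 16810 541 6091
"""


def load_rows(text):
    """Parse the whitespace-separated table into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = {}
        for name, value in zip(header, line.split()):
            # co2cod is a character flag; every other column is numeric.
            row[name] = value if name == "co2cod" else float(value)
        # Illustrative 4-level treatment label: N350, A350, E500, E650.
        row["trt"] = f"{row['co2cod']}{int(row['co2'])}"
        rows.append(row)
    return rows


def cell_means(rows, response):
    """Mean of `response` for each (treatment, irrigation) cell."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["trt"], int(row["irrig"]))].append(row[response])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rows = load_rows(RAW)

means = cell_means(rows, "xltyld")
for (trt, irrig), mean in sorted(means.items()):
    print(f"{trt} irrig={irrig}: mean lint yield {mean:.0f} kg/ha")

responses = ["xltyld", "xsdyld", "xbiom"]
corr = {
    (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
    for a, b in combinations(responses, 2)
}
for (a, b), r in corr.items():
    print(f"corr({a}, {b}) = {r:.3f}")
```

With only two replicates per cell, these summaries are descriptive. Any formal model (for example, one that includes the plant height or stem weight covariates) would need to be chosen and justified separately.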