Computer Workshop 1, Part I
Introduction to Minitab and basic commands. Manipulating data in Minitab. Describing data: calculating statistics; transformation. Outlier testing.

Problem:
1. Five months of nickel concentration (ppb) observations from four observation wells are given below:

Nickel concentration (ppb)
Month   Well 1   Well 2   Well 3   Well 4
  1      58.8     19.0     39.0      3.1
  2       1.0     81.5    150.0    940.0
  3      26.0     33.0     27.0     85.6
  4      56.0     14.0     21.4     10.0
  5       8.7     64.4    578.0    637.0

a) Obtain summary statistics for the data set as a whole.
b) Calculate the coefficient of skewness for the data set as a whole.
c) Calculate the MAD, quartile skew, and the geometric mean of the whole data set.
d) Obtain summary statistics of the concentrations by month and by well, and discuss the results.
e) What transformation would make the data approximately normally distributed? As a check, recalculate the skewness after applying the transformation.
f) Are there any outliers present in the data set given?
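The sketch below is a minimal standard-library Python cross-check of the Minitab output for parts (a), (b), (c), (e) and (f). It rests on several assumptions:
- Skewness is the adjusted sample coefficient n/((n-1)(n-2)) · Σ((x - x̄)/s)³.
- Quartiles follow the (n+1)p rule, via `statistics.quantiles` with `method="exclusive"`.
- The MAD is the unscaled median of |x - median|.
- The quartile skew is ((Q3 - Q2) - (Q2 - Q1))/(Q3 - Q1).

For part (f) it substitutes the 1.5 × IQR boxplot fence rule. That is a screening rule, not a formal outlier test. The helper names are illustrative.

```python
import math
import statistics

# Nickel concentrations (ppb), row by row (month 1 to 5) for Wells 1 to 4
NICKEL = [
    58.8, 19.0, 39.0, 3.1,
    1.0, 81.5, 150.0, 940.0,
    26.0, 33.0, 27.0, 85.6,
    56.0, 14.0, 21.4, 10.0,
    8.7, 64.4, 578.0, 637.0,
]


def skewness(x):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((xi - mean)/s)**3)."""
    n = len(x)
    m = statistics.mean(x)
    s = statistics.stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((xi - m) / s) ** 3 for xi in x)


def quartiles(x):
    """Q1, median, Q3 using the (n+1)p plotting-position rule."""
    return statistics.quantiles(x, n=4, method="exclusive")


def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(x)
    return statistics.median([abs(xi - med) for xi in x])


def quartile_skew(x):
    q1, q2, q3 = quartiles(x)
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


def boxplot_outliers(x, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; a screening rule, not a formal test."""
    q1, _, q3 = quartiles(x)
    iqr = q3 - q1
    return sorted(xi for xi in x if xi < q1 - k * iqr or xi > q3 + k * iqr)


if __name__ == "__main__":
    logs = [math.log(v) for v in NICKEL]
    print("n, mean, median, stdev:", len(NICKEL), statistics.mean(NICKEL),
          statistics.median(NICKEL), statistics.stdev(NICKEL))
    print("quartiles:", quartiles(NICKEL))
    print("skewness (raw):", skewness(NICKEL))
    print("MAD:", mad(NICKEL), "quartile skew:", quartile_skew(NICKEL))
    print("geometric mean:", statistics.geometric_mean(NICKEL))
    print("skewness (ln):", skewness(logs))
    print("boxplot-rule outliers:", boxplot_outliers(NICKEL))
```

To judge the transformation in part (e), compare the raw and log skewness values. Other transformations can be tried by swapping `math.log` for another function.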
Computer Workshop 1, Part II
Graphical description of data: dotplot, boxplot, stem-and-leaf plot, normal plot, histograms. Side-by-side boxplots. Comparing distributions.

For the data in question 1:
a) Obtain a dotplot, histogram, boxplot, and stem-and-leaf plot for the whole data set.
b) Graphically compare the concentrations by month and by well.
c) Determine whether the data as a whole are normally distributed.
d) Check the normality of the transformed data from question 1, part (e).
e) Redo all the plots in part (a) using the transformed data. Comment on the results.
f) If Wells 1 and 2 are upgradient wells and Wells 3 and 4 are downgradient wells, is there an obvious difference between the nickel concentrations of the upgradient and downgradient wells?
g) Compare the distributions of the upgradient and downgradient wells.
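For parts (c), (d), (f) and (g), the sketch below does two things:
- It computes the Anderson-Darling A² normality statistic for the raw and log-transformed nickel data, estimating the mean and standard deviation from the sample.
- It splits the wells into an upgradient group (Wells 1 and 2) and a downgradient group (Wells 3 and 4).

Treat it as a hedged cross-check. The function name is hypothetical, and it returns the unadjusted A². Minitab's normality test may apply its own small-sample adjustment and p-value approximation, so only the relative size of the two statistics should be read from it. A larger A² indicates a poorer fit to the normal distribution.

```python
import math
from statistics import NormalDist, mean, median, stdev

# Nickel (ppb) by well, months 1 to 5 (question 1 table)
WELLS = {
    "Well 1": [58.8, 1.0, 26.0, 56.0, 8.7],
    "Well 2": [19.0, 81.5, 33.0, 14.0, 64.4],
    "Well 3": [39.0, 150.0, 27.0, 21.4, 578.0],
    "Well 4": [3.1, 940.0, 85.6, 10.0, 637.0],
}


def anderson_darling(data):
    """Unadjusted A^2 against a normal with mean and sd estimated from the data."""
    x = sorted(data)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    total = 0.0
    for i in range(1, n + 1):
        lower_tail = dist.cdf(x[i - 1])        # F(x_(i))
        upper_tail = 1.0 - dist.cdf(x[n - i])  # 1 - F(x_(n+1-i))
        total += (2 * i - 1) * (math.log(lower_tail) + math.log(upper_tail))
    return -n - total / n


if __name__ == "__main__":
    nickel = [v for vals in WELLS.values() for v in vals]
    print("A^2, raw data:", anderson_darling(nickel))
    print("A^2, ln data: ", anderson_darling([math.log(v) for v in nickel]))
    up = WELLS["Well 1"] + WELLS["Well 2"]
    down = WELLS["Well 3"] + WELLS["Well 4"]
    print("median upgradient:", median(up), "median downgradient:", median(down))
```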
Computer Workshop 2
Sampling distributions. Interval estimation: parametric (t-interval) and nonparametric (s-interval) approaches. Meaning of a CI and interpreting Minitab output. Introduction to Minitab macros and bootstrapping.

Problems:
1. Compute both the nonparametric and parametric 95% interval estimates for the median of the following data:
6.0  0.5  0.4  0.7  0.8  6.0  5.0  0.6  1.2  0.3  0.2  0.5  0.5  10.0  0.2  0.2  1.7  3.0
Which is more appropriate for these data? Why?

2. A benzene concentration of 0.85 ppm was measured in an observation well. Is this concentration likely to belong to the same distribution as the data given below, or does it represent something larger? Answer this by computing the 95% parametric and nonparametric intervals. Which interval is more appropriate for these data?

Benzene (ppm)
0.001   0.030   0.100
0.003   0.040   0.454
0.007   0.041   0.490
0.020   0.077   1.020

3. Suppose that a water quality standard states that the 90th percentile of benzene concentration in drinking water shall not exceed 0.20 ppm. Has this standard been violated, at α = 0.05, by the benzene data of problem 2?

4. Use bootstrapping to obtain the standard error and an approximate 90% confidence interval for the median and the maximum of the data given in problem 1.
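The nonparametric interval in problem 1 and the bootstrap in problem 4 can be sketched in standard-library Python as follows. The sketch makes these assumptions:
- The interval uses exact binomial order-statistic ranks without interpolation. It may therefore differ slightly from Minitab's sign confidence interval, which can interpolate between order statistics.
- The bootstrap uses 2000 resamples, a fixed seed, and the simple percentile method.

The parametric (log-t) interval and the intervals for problem 2 are not covered. Function names are illustrative.

```python
import math
import random
import statistics

# Workshop 2, problem 1 data
DATA = [6.0, 0.5, 0.4, 0.7, 0.8, 6.0, 5.0, 0.6, 1.2,
        0.3, 0.2, 0.5, 0.5, 10.0, 0.2, 0.2, 1.7, 3.0]


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def median_ci_order_stats(data, conf=0.95):
    """Distribution-free CI for the median, [x_(r), x_(n-r+1)], no interpolation.

    The interval covers the median with probability 1 - 2*P(X <= r - 1),
    X ~ Binomial(n, 0.5); r is the largest rank keeping that at or above conf.
    Returns (lower, upper, achieved confidence).
    """
    x = sorted(data)
    n = len(x)
    half_alpha = (1 - conf) / 2
    if binom_cdf(0, n) > half_alpha:
        raise ValueError("sample too small for the requested confidence level")
    r = 1
    while binom_cdf(r, n) <= half_alpha:  # rank r + 1 would still meet conf
        r += 1
    return x[r - 1], x[n - r], 1 - 2 * binom_cdf(r - 1, n)


def bootstrap(data, stat, n_boot=2000, conf=0.90, seed=1):
    """Bootstrap standard error and percentile CI for stat(data)."""
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    tail = (1 - conf) / 2
    lower = reps[round(tail * n_boot)]
    upper = reps[round((1 - tail) * n_boot) - 1]
    return statistics.stdev(reps), lower, upper


if __name__ == "__main__":
    print("median:", statistics.median(DATA))
    print("95% order-statistic CI:", median_ci_order_stats(DATA))
    print("bootstrap median (SE, 90% CI):", bootstrap(DATA, statistics.median))
    print("bootstrap maximum (SE, 90% CI):", bootstrap(DATA, max))
```

Bootstrapping the maximum is a known weak case: a resample can never exceed the observed maximum, so the upper limit is stuck at that value.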
Computer Workshop 3
2-sample tests: paired and independent samples. Parametric (t-test) and nonparametric (Mann-Whitney) tests. Resampling methods for two-sample tests.

Problems:
1. The following values of specific conductance were measured on two forks of the Nile River.
a) State the appropriate null and alternative hypotheses to test whether conductance values are the same in the two forks.
b) Determine whether a parametric or nonparametric test should be used.
c) Do the appropriate statistical test and report the results.
d) Estimate the amount by which the forks differ in conductance, regardless of the test outcome.

Date    South Fork   North Fork
5/23       194          255
8/16       348          353
10/5       383          470
11/15      225          353
1/10       266          353
2/22       194          295
4/24       212          199
6/04       320          410
7/19       340          346
8/28       310          405

2. Historical water quality data for an aquifer show the following nitrate concentrations (mg/l):

Pre-1980        Post-1980
1   2   4       1    5   14
1   3   5       2    8   15
2   3   5       2   10   18
2   5   9       4   11   23
7               4

Is there a statistically significant increase in the nitrate concentration after 1980? Use parametric, nonparametric, and resampling tests.
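As one illustration of the resampling approach, the sketch below runs two permutation tests:
- For problem 1, an exact paired sign-flip permutation test, which also reports the median difference (North minus South).
- For problem 2, a one-sided Monte Carlo permutation test on the difference in means.

It makes these assumptions:
- The conductance pairs are matched by date as in the table.
- The nitrate values are read row by row, with the final row's 7 taken as pre-1980 and its 4 as post-1980.
- The permutation count, seed and function names are arbitrary choices for illustration.

The t-test and Mann-Whitney results would still come from Minitab.

```python
import itertools
import random
import statistics

# Workshop 3, problem 1: specific conductance paired by sampling date
SOUTH = [194, 348, 383, 225, 266, 194, 212, 320, 340, 310]
NORTH = [255, 353, 470, 353, 353, 295, 199, 410, 346, 405]

# Workshop 3, problem 2: nitrate (mg/l), read row by row from the table
PRE_1980 = [1, 2, 4, 1, 3, 5, 2, 3, 5, 2, 5, 9, 7]
POST_1980 = [1, 5, 14, 2, 8, 15, 2, 10, 18, 4, 11, 23, 4]


def paired_sign_flip_test(first, second):
    """Exact two-sided paired permutation test on the sum of differences.

    Under H0 each difference (second - first) is equally likely to be + or -,
    so all 2**n sign patterns are enumerated (1024 patterns for n = 10).
    Returns (median difference, p-value).
    """
    diffs = [b - a for a, b in zip(first, second)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    hits = sum(
        1 for signs in patterns
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed
    )
    return statistics.median(diffs), hits / len(patterns)


def two_sample_permutation_test(before, after, n_perm=10_000, seed=1):
    """One-sided Monte Carlo permutation test of H1: mean(after) > mean(before)."""
    rng = random.Random(seed)
    observed = sum(after) / len(after) - sum(before) / len(before)
    pooled = list(before) + list(after)
    k = len(after)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # add-one correction keeps the estimated p-value away from exactly zero
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print("problem 1 (median diff, p):", paired_sign_flip_test(SOUTH, NORTH))
    print("problem 2 (mean diff, p):", two_sample_permutation_test(PRE_1980, POST_1980))
```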
Computer Workshop 4
One-way parametric and nonparametric (Kruskal-Wallis) ANOVA. Multiple comparison tests. Checking the assumptions of ANOVA, and transformations. Two-way parametric and nonparametric ANOVA (Friedman test). Nested designs.

Problems:
1. Leachate from a landfill may have contaminated shallow groundwater with caustic, high-pH effluent. Determine whether the pH values of samples taken from three groups of piezometers are all identical. One piezometer group is known to be uncontaminated. If the pH values are not identical, which groups differ from the others, and which are contaminated?

pH of samples taken from piezometer groups
P1: 7.0  7.2  7.5  7.7  8.7  7.8
P2: 6.3  6.9  7.0  6.4  6.8  6.7
P3: 8.4  7.6  7.5  7.4  9.3  9.0  8.9

2. A Before-After-Control-Impact (BACI) experimental design with 6 random replicates per site and period was used to study the effect of an effluent discharge on the abundance of a particular species. The species abundance data were as follows:

Time            Control Area               Impact Area
Before impact   36  67  30  65  40  37     24  60  24  41  95  71
After impact    36  32  49  59  38  32      8   8  20  12   9   6

Test the hypothesis that there is no change in the abundance of the species in the impacted area that does not also occur in the control area. Test all assumptions, and use a logarithmic or rank transformation if necessary.

3. Total suspended solids (TSS) concentrations were measured at 5 locations along a river during 4 different seasons. Is there a difference in TSS concentration from season to season or from location to location?

          Location
Season    A    B    C    D    E
  1      17   19   18   20   21
  2      21   22   16   23   28
  3      19   25   17   25   29
  4      11   18   13   20   18
4. A study on pollution was conducted in a certain industrial area. As part of the study, fish were caught from three different lakes in the area, and the mercury concentration (ppm) was measured in each fish. Fish from three lakes in another area (acting as a control) were also measured.

Mercury concentration (ppm)
  Lake (impact)          Lake (control)
  A     B     C          D     E     F
 4.0   2.7   3.9        3.8   3.9   3.2
 4.6   4.4   4.1        3.8   2.7   2.8
 3.8   3.8   4.3        3.9   4.8   4.1
 3.7   5.7   3.4        3.9   3.6   3.1
 4.5   5.2   3.2        3.8   3.8   3.9
 4.2   4.6   2.5        3.7   4.7   3.5
 4.8   5.0   4.5        3.9   4.4   4.8

Can we conclude that there is really no difference between the mercury concentrations in fish at the impact and control sites? Is there a difference among the lakes within each area?
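For problems 1 and 4, the following sketch computes:
- the Kruskal-Wallis H statistic, with the usual correction for ties;
- the sums of squares and F ratios of a balanced nested ANOVA.

It relies on two stated assumptions. First, with three groups H is referred to a chi-square distribution on 2 degrees of freedom, whose upper-tail probability is exactly exp(-H/2). Second, lakes are treated as a random factor nested in area, so the area effect is tested against the lakes-within-area mean square.

F p-values are not computed, because the Python standard library has no F distribution. Compare the F ratios with tables or with the Minitab output. The names are illustrative.

```python
import math
import statistics
from collections import Counter

# Workshop 4, problem 1: pH by piezometer group
PH = {
    "P1": [7.0, 7.2, 7.5, 7.7, 8.7, 7.8],
    "P2": [6.3, 6.9, 7.0, 6.4, 6.8, 6.7],
    "P3": [8.4, 7.6, 7.5, 7.4, 9.3, 9.0, 8.9],
}

# Workshop 4, problem 4: mercury (ppm), area -> lake -> fish
MERCURY = {
    "impact": {
        "A": [4.0, 4.6, 3.8, 3.7, 4.5, 4.2, 4.8],
        "B": [2.7, 4.4, 3.8, 5.7, 5.2, 4.6, 5.0],
        "C": [3.9, 4.1, 4.3, 3.4, 3.2, 2.5, 4.5],
    },
    "control": {
        "D": [3.8, 3.8, 3.9, 3.9, 3.8, 3.7, 3.9],
        "E": [3.9, 2.7, 4.8, 3.6, 3.8, 4.7, 4.4],
        "F": [3.2, 2.8, 4.1, 3.1, 3.9, 3.5, 4.8],
    },
}


def midranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H (raw, tie-adjusted) and p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("the exp(-H/2) p-value is only valid for 3 groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t**3 - t for t in Counter(pooled).values())
    h_adj = h / (1 - ties / (n**3 - n))
    return h, h_adj, math.exp(-h_adj / 2)  # chi-square upper tail, 2 df


def nested_anova(design):
    """Balanced nested ANOVA: areas, lakes within areas, fish within lakes."""
    values = [v for lakes in design.values() for fish in lakes.values() for v in fish]
    grand = statistics.fmean(values)
    ss_area = ss_lake = ss_error = 0.0
    for lakes in design.values():
        area_mean = statistics.fmean([v for fish in lakes.values() for v in fish])
        for fish in lakes.values():
            lake_mean = statistics.fmean(fish)
            ss_area += len(fish) * (area_mean - grand) ** 2
            ss_lake += len(fish) * (lake_mean - area_mean) ** 2
            ss_error += sum((v - lake_mean) ** 2 for v in fish)
    a = len(design)
    b = len(next(iter(design.values())))
    n = len(values) // (a * b)
    df = (a - 1, a * (b - 1), a * b * (n - 1))
    ms = [ss / d for ss, d in zip((ss_area, ss_lake, ss_error), df)]
    # Lakes treated as random: area is tested against lakes(area), lakes against error.
    return {"SS": (ss_area, ss_lake, ss_error), "df": df,
            "F_area": ms[0] / ms[1], "F_lake": ms[1] / ms[2]}


if __name__ == "__main__":
    print("Kruskal-Wallis (H, H adjusted, p):", kruskal_wallis_3(list(PH.values())))
    print("Nested ANOVA:", nested_anova(MERCURY))
```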
Computer Workshop 5
Correlation analysis. Simple regression analysis and diagnostic checking. Interpretation of MINITAB output. Alternatives to OLS (Mann-Kendall line, LOWESS).

Problems:
1. Ten pairs of X and Y are given below, ordered by increasing X values:

Y:  1.22  2.20  4.80  1.28  1.97  1.46  2.64  2.34  4.84  2.96
X:     2    24    99   197   377   544   632  3452  6587  53170

Estimate the correlation between X and Y using Kendall's τ, Spearman's ρ, and Pearson's r.

2. For the data below, compute:
a) the Kendall slope estimator and the nonparametric regression equation;
b) the significance level of the test;
c) if the Y value of 62 is actually 200, how would this new value affect the estimates of the slope, intercept, τ, and significance level?

Y: 10  40  30  55  62  56
X:  1   2   3   4   5   6

3. The data below give the lowering of the streambed elevation downstream of a major dam in the years following its installation. Obtain an appropriate OLS regression to predict bed lowering (L) from years (Yrs). Also plot the LOWESS line for the data. Describe how well each line fits the data.

Yrs   Lowering (m)    Yrs   Lowering (m)    Yrs   Lowering (m)
0.5     -0.65          8      -0.485         17     -5.05
1       -1.20         10      -4.40          20     -5.10
2       -2.20         11      -4.95          22     -5.65
4       -2.60         13      -5.10          24     -5.50
6       -3.40         15      -4.90          27     -5.65

4. Median grain sizes of alluvial aquifer materials in the Arkansas River Valley were related to their yields, in gallons per day per square foot. This relationship enabled yields to be estimated at other locations from measured grain-size analyses. Using the attached data, obtain the regression equation to predict yield. Estimate the mean yield from a well where the median grain size is 0.4 mm.
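For problems 1 and 2, the sketch below computes:
- Pearson's r;
- Spearman's ρ, as Pearson's r on the ranks;
- Kendall's τ;
- the Kendall-Theil slope (the median of all pairwise slopes), with the intercept taken as median(Y) minus slope × median(X).

It assumes there are no ties, which holds for the problem 1 data. With tied values, τ and ρ need tie corrections. The significance test for part 2(b) is not included, and the function names are illustrative.

```python
import math
import statistics

# Workshop 5, problem 1 (ordered by increasing X)
X1 = [2, 24, 99, 197, 377, 544, 632, 3452, 6587, 53170]
Y1 = [1.22, 2.20, 4.80, 1.28, 1.97, 1.46, 2.64, 2.34, 4.84, 2.96]

# Workshop 5, problem 2
X2 = [1, 2, 3, 4, 5, 6]
Y2 = [10, 40, 30, 55, 62, 56]


def pearson_r(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks 1..n; assumes no tied values (true for the problem 1 data)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))


def kendall_s(x, y):
    """Kendall's S: concordant minus discordant pairs."""
    s = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)
    return s


def kendall_tau(x, y):
    """Tau = S / (n(n-1)/2); matches tau-b only when there are no ties."""
    n = len(x)
    return kendall_s(x, y) / (n * (n - 1) / 2)


def kendall_theil_line(x, y):
    """Slope = median pairwise slope; line passes through (median x, median y)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x)) if x[j] != x[i]]
    slope = statistics.median(slopes)
    return slope, statistics.median(y) - slope * statistics.median(x)


if __name__ == "__main__":
    print("r, rho, tau:", pearson_r(X1, Y1), spearman_rho(X1, Y1), kendall_tau(X1, Y1))
    print("Kendall-Theil (slope, intercept):", kendall_theil_line(X2, Y2))
    print("with 62 -> 200:", kendall_theil_line(X2, [10, 40, 30, 55, 200, 56]))
```

Part 2(c) can be explored by rerunning the last line with the altered Y value and comparing the slope, intercept and `kendall_s` results with the original ones.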
Computer Workshop 6
Multiple regression analysis and diagnostic checking. Interpretation of MINITAB output. ANCOVA. Trend analysis.

Problems:
1. The attached data set contains measurements of several variables taken at 17 different sites. There are 5 explanatory variables (x1 to x5) and 1 dependent variable, y. The goal is to produce an empirical equation that will estimate (or predict) y. For physical reasons, it is known that all explanatory variables are positively correlated with y.

2. An aquifer was investigated to determine relationships between the concentrations of uranium and other constituents in its waters. Using the attached data, construct a regression model relating uranium to total dissolved solids and bicarbonate.

3. During the period 1962-1969, the Green River Dam was constructed about 35 miles upstream of a gauging station on the Green River. The question is this: over the period of record 1952-1972 (which includes the pre-dam, transition, and regulated periods), has there been a monotonic trend in sediment transport? Also, using these data, is there a step trend in sediment transport from before the dam was built (1952-1961) to after the dam was built (1968-1972)? Data attached.
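The monotonic-trend question in problem 3 is commonly answered with the Mann-Kendall test. The sketch below computes, for a time-ordered series:
- Kendall's S;
- its tie-corrected variance;
- the continuity-corrected Z;
- a two-sided normal-approximation p-value.

The sediment data are attached rather than included here, so the usage example runs on a clearly hypothetical series. The normal approximation is usually considered adequate for about 10 or more observations. Serial correlation or seasonality would need additional handling. The step-trend question could be approached with the two-sample tests of Workshop 3. The function name is illustrative.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(series):
    """Mann-Kendall trend test on values listed in time order.

    Returns (S, Z, two-sided p). Z uses a continuity correction of 1 and a
    variance corrected for tied values; p uses the normal approximation.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if var_s == 0:  # e.g. every value identical: no evidence of a trend
        return s, 0.0, 1.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical annual values, NOT the Green River record (attached separately).
    example = [3.1, 2.8, 3.5, 3.9, 3.6, 4.2, 4.8, 4.4, 5.1, 5.6]
    print(mann_kendall(example))
```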