Introduction to Minitab and basic commands. Manipulating data in Minitab. Describing data; calculating statistics; transformation.

Computer Workshop 1, Part I

Introduction to Minitab and basic commands.
Manipulating data in Minitab.
Describing data; calculating statistics; transformation.
Outlier testing.

Problem:

1. Five months of nickel concentration (ppb) observations from four observation wells are given below:

            Nickel Conc. (ppb)
   Month   Well 1   Well 2   Well 3   Well 4
   1         58.8     19.0     39.0      3.1
   2          1.0     81.5    150.0    940.0
   3         26.0     33.0     27.0     85.6
   4         56.0     14.0     21.4     10.0
   5          8.7     64.4    578.0    637.0

   a) Obtain summary statistics for the data set as a whole.
   b) Calculate the coefficient of skewness for the data set as a whole.
   c) Calculate the MAD, quartile skew, and the geometric mean of the whole data set.
   d) Obtain summary statistics of the concentrations by month and by well and discuss the results.
   e) What transformation would make the data approximately normally distributed? As a check, recalculate the skewness after applying the transformation.
   f) Are there any outliers present in the data set given?
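The workshop itself uses Minitab; as a cross-check, parts (b), (c), and (e) can be sketched in plain Python. The formulas below are assumptions about which definitions are intended: the adjusted skewness coefficient n/((n-1)(n-2)) * sum(((x - mean)/s)^3), the MAD taken as the median absolute deviation from the median, and quartiles located at the (n+1)p positions. The function names are illustrative only.

```python
import math
import statistics as st

# Nickel concentrations (ppb) from problem 1, all months and wells pooled.
nickel = [58.8, 19.0, 39.0, 3.1,
          1.0, 81.5, 150.0, 940.0,
          26.0, 33.0, 27.0, 85.6,
          56.0, 14.0, 21.4, 10.0,
          8.7, 64.4, 578.0, 637.0]


def skewness(x):
    """Adjusted skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(x)
    m, s = st.mean(x), st.stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def mad(x):
    """Median absolute deviation from the median."""
    med = st.median(x)
    return st.median([abs(v - med) for v in x])


def quartile_skew(x):
    """((Q3 - median) - (median - Q1)) / (Q3 - Q1), quartiles at (n+1)p."""
    q1, q2, q3 = st.quantiles(x, n=4, method="exclusive")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


mean = st.mean(nickel)
median = st.median(nickel)
skew_raw = skewness(nickel)
# Part (e): a log transformation is one candidate; compare the skewness.
skew_log = skewness([math.log(v) for v in nickel])
mad_value = mad(nickel)
qs = quartile_skew(nickel)
gmean = st.geometric_mean(nickel)

print(f"mean={mean:.3f} median={median:.1f} geometric mean={gmean:.2f}")
print(f"MAD={mad_value:.2f} quartile skew={qs:.3f}")
print(f"skewness raw={skew_raw:.2f} after log={skew_log:.2f}")
```

The mean sits far above the median because three large values dominate it, which is what the positive skewness and quartile skew summarize. Other packages may use slightly different quartile positions, so small differences from Minitab output are possible.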

Computer Workshop 1, Part II

Graphical description of data.
Dotplot, boxplot, stem-and-leaf plot, normal plot, histograms.
Side-by-side boxplots.
Comparing distributions.

For the data in question 1:

   a) Obtain a dotplot, histogram, boxplot, and stem-and-leaf plot for the whole data set.
   b) Graphically compare the concentrations by month and by well.
   c) Determine whether the data as a whole are normally distributed.
   d) Check the normality of the transformed data from question 1, part (e).
   e) Redo all the plots in part (a) using the transformed data. Comment on the results.
   f) If Wells 1 and 2 are upgradient wells and Wells 3 and 4 are downgradient wells, is there an obvious difference in the nickel concentrations of the upgradient and downgradient wells?
   g) Compare the distributions of the upgradient and downgradient wells.

Computer Workshop 2

Sampling distributions.
Interval estimation: parametric (t-interval) and nonparametric (s-interval) approaches.
Meaning of CI and interpreting Minitab outputs.
Introduction to Minitab macros and bootstrapping.

Problems:

1. Compute both the nonparametric and parametric 95% interval estimates for the median of the following data.

   6.0  0.5  0.4  0.7  0.8  6.0  5.0  0.6  1.2
   0.3  0.2  0.5  0.5  10.0  0.2  0.2  1.7  3.0

   Which is more appropriate for these data? Why?

2. A concentration of 0.85 ppm of benzene was measured in an observation well. Is this concentration likely to belong to the same distribution as the data given below, or does it represent something larger? Answer this by computing the 95% parametric and nonparametric intervals. Which interval is more appropriate for these data?

   Benzene (ppm)
   0.001   0.030   0.100
   0.003   0.040   0.454
   0.007   0.041   0.490
   0.020   0.077   1.020

3. Suppose that a water quality standard states that the 90th percentile of benzene concentration in drinking water shall not exceed 0.20 ppm. Has this standard been violated at α = 0.05 by the benzene data of problem 2?

4. Using bootstrapping, obtain the standard error and an approximate 90% confidence interval for the median and for the maximum of the data given in problem 1.
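For problems 1 and 4, the nonparametric interval and the bootstrap can be sketched in Python rather than with a Minitab macro. This is an illustration under stated assumptions: the sign interval is built from order statistics with binomial(n, 1/2) tail probabilities and reports the achieved confidence of the conservative interval (no interpolation), and the bootstrap uses a simple percentile interval with 2000 resamples and a fixed seed. The names `sign_interval` and `bootstrap` are hypothetical helpers, not Minitab commands.

```python
import math
import random
import statistics as st

# Data from problem 1.
data = [6.0, 0.5, 0.4, 0.7, 0.8, 6.0, 5.0, 0.6, 1.2,
        0.3, 0.2, 0.5, 0.5, 10.0, 0.2, 0.2, 1.7, 3.0]


def sign_interval(x, conf=0.95):
    """Order-statistic interval for the median.

    Finds the largest k with P(Bin(n, 1/2) <= k - 1) <= alpha/2 and returns
    (x_(k), x_(n-k+1), achieved confidence).
    """
    xs = sorted(x)
    n = len(xs)
    alpha2 = (1 - conf) / 2
    cum, k = 0.0, 0
    while cum + math.comb(n, k) / 2 ** n <= alpha2:
        cum += math.comb(n, k) / 2 ** n
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[k - 1], xs[n - k], 1 - 2 * cum


def bootstrap(x, stat, n_boot=2000, conf=0.90, seed=1):
    """Bootstrap standard error and percentile interval for stat(x)."""
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(x, k=len(x))) for _ in range(n_boot))
    lo = reps[round((1 - conf) / 2 * n_boot)]
    hi = reps[round((1 + conf) / 2 * n_boot) - 1]
    return st.stdev(reps), lo, hi


lower, upper, achieved = sign_interval(data)
se_med, lo_med, hi_med = bootstrap(data, st.median)
se_max, lo_max, hi_max = bootstrap(data, max)

print(f"sign interval for median: ({lower}, {upper}), confidence {achieved:.4f}")
print(f"bootstrap median: SE={se_med:.3f}, 90% CI=({lo_med}, {hi_med})")
print(f"bootstrap maximum: SE={se_max:.3f}, 90% CI=({lo_max}, {hi_max})")
```

The bootstrap for the maximum can never produce a value above the largest observation, so its interval is bounded by 10.0; that limitation is worth discussing when answering problem 4.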

Computer Workshop 3

2-sample tests: paired and independent samples.
Parametric (t-test) and nonparametric (Mann-Whitney) tests.
Resampling methods for two-sample tests.

Problems:

1. The following values of specific conductance were measured on two forks of the Nile River.

   a) State the appropriate null and alternate hypotheses to see if conductance values are the same in the two forks.
   b) Determine whether a parametric or nonparametric test should be used.
   c) Do the appropriate statistical test and report the results.
   d) Estimate the amount by which the forks differ in conductance, regardless of the test outcome.

   Date    South Fork   North Fork      Date    South Fork   North Fork
   5/23       194          255          2/22       194          295
   8/16       348          353          4/24       212          199
   10/5       383          470          6/04       320          410
   11/15      225          353          7/19       340          346
   1/10       266          353          8/28       310          405

2. Historical water quality data for an aquifer show the following nitrate concentrations (mg/l):

   Pre-1980          Post-1980
   1    2    4       1    5    14
   1    3    5       2    8    15
   2    3    5       2   10    18
   2    5    9       4   11    23
   7                 4

   Is there a statistically significant increase in the nitrate concentration after 1980? Use parametric, nonparametric, and resampling tests.
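For problem 1, the measurements are paired by date, so one nonparametric option is the Wilcoxon signed-rank test on the North minus South differences. The Python sketch below is an assumption about how to approach part (c), not the worksheet's prescribed method: it computes an exact two-sided p-value by enumerating every possible assignment of signs to the ranks, which also illustrates the resampling idea. For part (d) it reports the median of the paired differences as one simple estimate. The helper `midranks` is hypothetical.

```python
import itertools
import statistics as st

# Specific conductance by date: left-hand column block first, then right-hand.
south = [194, 348, 383, 225, 266, 194, 212, 320, 340, 310]
north = [255, 353, 470, 353, 353, 295, 199, 410, 346, 405]


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Zero differences carry no sign information and are dropped.
diffs = [n - s for s, n in zip(south, north) if n != s]
ranks = midranks([abs(d) for d in diffs])
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
expected = sum(ranks) / 2  # mean of W+ when the null hypothesis holds

# Under the null hypothesis every sign pattern is equally likely.
count = 0
for signs in itertools.product((0, 1), repeat=len(ranks)):
    w = sum(r for r, s in zip(ranks, signs) if s)
    if abs(w - expected) >= abs(w_plus - expected):
        count += 1
p_value = count / 2 ** len(ranks)

median_diff = st.median(diffs)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.4f}")
print(f"median difference (North - South) = {median_diff}")
```

Full enumeration is practical here because ten pairs give only 1024 sign patterns; for larger samples a random subset of patterns would be drawn instead.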

Computer Workshop 4

1-way parametric and nonparametric ANOVA (Kruskal-Wallis) tests.
Multiple comparison tests.
Checking assumptions of ANOVA and transformations.
2-way parametric and nonparametric ANOVA (Friedman test).
Nested designs.

Problems:

1. Leachate from a landfill may have contaminated shallow groundwater with caustic, high-pH effluent. Determine whether the pH samples taken from three sets of piezometers are all identical. One piezometer group is known to be uncontaminated. If the pH values are not identical, which groups are different from the others, and which are contaminated?

   pH of samples taken from piezometer groups
   P1:  7.0  7.2  7.5  7.7  8.7  7.8
   P2:  6.3  6.9  7.0  6.4  6.8  6.7
   P3:  8.4  7.6  7.5  7.4  9.3  9.0  8.9

2. A Before-After-Control-Impact (BACI) experimental design with 6 random replicates per site and period was used in a study of the effect of effluent discharge on the abundance of a particular species. The species abundance data were as follows:

   Time             Control Area   Impact Area
   Before-Impact    36 67 30 65 40 37 24 60 24 41 95 71
   After-Impact     36 32 49 59 38 32 8 8 20 12 9 6

   Test the hypothesis that there is no change in the abundance of the species in the impacted area that does not also occur in the control area. Test all assumptions and use a logarithmic or rank transformation if necessary.

3. Total Suspended Solids (TSS) concentrations were measured at 5 locations along a river during 4 different seasons. Is there a difference in TSS concentration from season to season or from location to location?

                Location
   Season    A    B    C    D    E
   1        17   19   18   20   21
   2        21   22   16   23   28
   3        19   25   17   25   29
   4        11   18   13   20   18

4. A study on pollution was conducted in a certain industrial area. As part of the study, fish were caught from three different lakes in the area and the mercury concentration (in ppm) was measured in each. Fish from another three lakes in another area (to act as a control) were also measured.

      Lake (impact)        Lake (control)
     A     B     C        D     E     F
    4.0   2.7   3.9      3.8   3.9   3.2
    4.6   4.4   4.1      3.8   2.7   2.8
    3.8   3.8   4.3      3.9   4.8   4.1
    3.7   5.7   3.4      3.9   3.6   3.1
    4.5   5.2   3.2      3.8   3.8   3.9
    4.2   4.6   2.5      3.7   4.7   3.5
    4.8   5.0   4.5      3.9   4.4   4.8

   Can we conclude that there is really no difference between the mercury concentrations in fish at the impact and control sites? Is there a difference among the sites within each area?
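For problem 1 of this workshop (the piezometer pH data), the Kruskal-Wallis statistic can be computed by hand from the pooled ranks. The Python sketch below is illustrative, not Minitab output. It assumes the usual large-sample chi-square approximation with k - 1 = 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-H/2); that shortcut holds only for 2 degrees of freedom. With groups this small the approximation is rough, and the multiple comparisons asked for in the problem are not covered here. The helper `midranks` is hypothetical.

```python
import math
from collections import Counter

# pH of samples from the three piezometer groups (problem 1).
groups = {
    "P1": [7.0, 7.2, 7.5, 7.7, 8.7, 7.8],
    "P2": [6.3, 6.9, 7.0, 6.4, 6.8, 6.7],
    "P3": [8.4, 7.6, 7.5, 7.4, 9.3, 9.0, 8.9],
}


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


pooled = [v for vals in groups.values() for v in vals]
ranks = midranks(pooled)
n_total = len(pooled)

# Split the pooled ranks back into their groups and sum them.
rank_sums = {}
pos = 0
for name, vals in groups.items():
    rank_sums[name] = sum(ranks[pos:pos + len(vals)])
    pos += len(vals)

h = (12 / (n_total * (n_total + 1))
     * sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups)
     - 3 * (n_total + 1))

# Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
ties = Counter(pooled).values()
correction = 1 - sum(t ** 3 - t for t in ties) / (n_total ** 3 - n_total)
h_adj = h / correction

p_value = math.exp(-h_adj / 2)  # chi-square upper tail, valid for df = 2 only

print("rank sums:", rank_sums)
print(f"H (adjusted for ties) = {h_adj:.3f}, approximate p = {p_value:.4f}")
```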

Computer Workshop 5

Correlation analysis.
Simple regression analysis and diagnostic checking.
Interpretation of MINITAB outputs.
Alternatives to OLS (Mann-Kendall line, LOWESS).

Problems:

1. Ten pairs of X and Y are given below, ordered by increasing X values:

   Y:  1.22  2.20  4.80  1.28  1.97  1.46  2.64  2.34  4.84  2.96
   X:  2     24    99    197   377   544   632   3452  6587  53170

   Estimate the correlation between X and Y using Kendall's τ, Spearman's ρ, and Pearson's r.

2. For the data below, compute:

   a) the Kendall slope estimator and the nonparametric regression equation;
   b) the significance level of the test.
   c) If the Y value of 62 is actually 200, how would this new value affect the estimates of the slope, intercept, τ, and significance level?

   Y:  10  40  30  55  62  56
   X:  1   2   3   4   5   6

3. Data relating the lowering of streambed elevation downstream of a major dam to the number of years since its installation are given below. Obtain an appropriate OLS regression to predict bed lowering (L) from years (Yrs). Also plot the LOWESS line for the data. Describe how well each describes the data.

   Yrs   Lowering (m)      Yrs   Lowering (m)      Yrs   Lowering (m)
   0.5      -0.65           8       -0.485          17      -5.05
   1        -1.20          10       -4.40           20      -5.10
   2        -2.20          11       -4.95           22      -5.65
   4        -2.60          13       -5.10           24      -5.50
   6        -3.40          15       -4.90           27      -5.65

4. Median grain sizes of alluvial aquifer materials in the Arkansas River Valley were related to their yields, in gallons per day per square foot. This relationship allows yields to be estimated at other locations from measured grain-size analyses. Obtain the regression equation to predict yield, based on the data attached. Estimate the mean yield from a well where the median grain size is 0.4 mm.
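Problem 2 is small enough to work through exactly. The Python sketch below is an illustration under stated assumptions: Kendall's S is the number of concordant minus discordant pairs, τ = S / (n(n-1)/2) because the data have no ties, the two-sided significance level is found by enumerating all 720 orderings of Y, and the Kendall slope estimator (often called the Theil-Sen slope) is the median of all pairwise slopes. The intercept convention median(Y) - slope * median(X) is one common choice and is assumed here; Minitab macros or other texts may use a different one.

```python
import itertools
import statistics as st

x = [1, 2, 3, 4, 5, 6]
y = [10, 40, 30, 55, 62, 56]


def kendall_s(x, y):
    """Concordant minus discordant pairs (tied pairs contribute 0)."""
    s = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        prod = (x[j] - x[i]) * (y[j] - y[i])
        s += (prod > 0) - (prod < 0)
    return s


def kendall_line(x, y):
    """Median of pairwise slopes, with intercept median(y) - slope*median(x)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in itertools.combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = st.median(slopes)
    return slope, st.median(y) - slope * st.median(x)


n = len(x)
s_obs = kendall_s(x, y)
tau = s_obs / (n * (n - 1) / 2)

# Exact two-sided p-value: every ordering of y is equally likely under H0.
extreme = sum(1 for perm in itertools.permutations(y)
              if abs(kendall_s(x, perm)) >= abs(s_obs))
p_value = extreme / 720  # 720 = 6! orderings

slope, intercept = kendall_line(x, y)

# Part (c): replace the Y value of 62 with 200 and recompute.
y_200 = [200 if v == 62 else v for v in y]
s_200 = kendall_s(x, y_200)
slope_200, intercept_200 = kendall_line(x, y_200)

print(f"S = {s_obs}, tau = {tau:.3f}, exact two-sided p = {p_value:.4f}")
print(f"Y = {intercept:.2f} + {slope:.2f} X")
print(f"with 200: S = {s_200}, Y = {intercept_200:.2f} + {slope_200:.2f} X")
```

Because S depends only on the ordering of the Y values, changing 62 to 200 leaves S, τ, and the significance level unchanged, and the median-based slope moves only slightly. An OLS fit to the same altered data would shift much more.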

Computer Workshop 6

Multiple regression analysis and diagnostic checking.
Interpretation of MINITAB outputs.
ANCOVA.
Trend analysis.

Problems:

1. The attached data are measurements of several variables taken at 17 different sites. There are 5 explanatory variables (x1 to x5) and 1 dependent variable y. The goal is to produce an empirical equation that will estimate (or predict) y. For physical reasons, it is known that all explanatory variables are positively correlated with y.

2. An aquifer was investigated to determine relationships between uranium and other concentrations in its waters. Construct a regression model to relate uranium to total dissolved solids and bicarbonate, using the data attached.

3. During the period 1962-1969 the Green River Dam was constructed about 35 miles upstream of a gauging station on the Green River. Over the period of record 1952-1972 (which includes pre-dam, transition, and regulated periods), has there been a monotonic trend in sediment transport? Also, using these data, is there a step trend in sediment transport from before the dam was built (1952-1961) to after the dam was built (1968-1972)? Data attached.