Exploratory Data Analysis as an Efficient Tool for Statistical Analysis: A Case Study From Analysis of Experiments
Developments in Applied Statistics
Anuška Ferligoj and Andrej Mrvar (Editors)
Metodološki zvezki, 19, Ljubljana: FDV, 2003

Katarina Košmelj, Andrej Blejec, and Drago Kompan

Biotechnical Faculty, University of Ljubljana, Slovenia. National Institute of Biology, Slovenia.

Abstract

Goat breeders investigated the effects of three different additives on the number of somatic cells in goat's milk: additive DHA, which is of fish origin, additive ALFA, of plant origin, and additive EPA, of fish origin. A control treatment contained no additive. The objective of the experiment was to answer two questions: Do any of these additives significantly reduce the number of somatic cells (SC) in goat's milk? For treatments resulting in a reduction, how long does the effect persist? Standard statistical methods used in the first phase did not give satisfactory results, so we analyzed the experiment in the context of exploratory data analysis. We used several graphical displays to establish which transformations had to be used in the preliminary phase and which statistical methods should be applied to answer the experimenters' questions. The results showed that the ALFA treatment is the best; its effect was significant up to the 54th day of the experiment. The exploratory approach generated a new hypothesis: the procedure used for drug administration may increase the number of somatic cells because of the stress caused to the animal. In this paper we try to show that, in the analysis of experiments, the common approach based mostly on modeling and hypothesis testing can be expanded by an intense exploration of the data.

1 Introduction

Goat breeders carried out an experiment with three new treatments and a control. Two questions were put to the statistician: Which treatment is the best? How long does the effect of the best treatment persist? These are typical questions in
the analysis of experiments, where statistical methods such as ANOVA are commonly used. In this case study, other statisticians undertook a standard statistical analysis based on modelling; however, the results were not satisfactory to the experimenters. When the experimenters approached us for an alternative view, we decided on an approach based on exploratory data analysis (Tukey, 1977).

In the next section we describe the experiment in detail, and in the third we present the pathway followed to obtain the answers. We started our analysis with graphical displays of the data. These plots were very informative and led us step by step to the next phase of the statistical analysis. In the last section some conclusions are drawn.

2 Description of the experiment

2.1 Objectives of the experiment

The quality of milk is determined on the basis of its content of microorganisms and somatic cells. The somatic cells considered here are leukocytes and epithelial cells from the udder of the animal. Increased numbers of leukocytes may be due to inflammation, to bacterial infection, or to other factors such as the status of lactation (during the lactation period the number of somatic cells increases), age of the animal, period of the year, udder lesion, quantity of milk, and diet. Factors such as stress and anxiety increase the number of somatic cells as well. The number of somatic cells per ml (SC) in goat's milk is considerably higher than in cow's milk, and variations in content are also higher. Standards for the number of SC in goat's milk have not yet been set (Haenlein, 2002). Researchers at the Zootechnical Department of the Biotechnical Faculty in Ljubljana, Slovenia, suggested that some kind of food additive might decrease the number of SC.
This led to the following experiment: the breeders investigated the effects of three different additives on the number of SC in goat's milk: additive DHA, which is of fish origin, additive ALFA, of plant origin, and additive EPA, of fish origin. A control treatment contained no additive. The objective of the experiment was to answer two questions:

Question 1: Do any of these additives significantly reduce the number of SC in goat's milk?
Question 2: For treatments resulting in a reduction, how long does the effect persist?

Sixty goats were involved in the experiment. The animals were of the same age and the same breed, and all were at the beginning of lactation.
2.2 Time schedule of the experiment

The experiment lasted for 64 days, from Sept 7 th, 2000 to November 9 th, 2000, and was conducted in three phases.

a) The preliminary phase lasted for 0 days and was designed as the adaptation process. According to the experience of the experimenters, it takes about a week for the animals to adapt to a new environment and to new personnel. All the animals were put in the same shed. On the 0 th day the goats were divided into 4 groups: DHA, ALFA, EPA, CONTROL.
b) The treatment phase, during which additives were administered, lasted from the th to the 4 th day. After the morning milking the animals were chased from one part of the shed to a specific restricted area, where a technician put the additive directly in the muzzle of each animal. The control goats were also let through this area.
c) The observation phase, during which the animals were observed, lasted up to the 64th day.

2.3 Data acquisition

From each animal a sample of milk was taken at the morning and evening milking. It was analyzed with respect to several milk parameters, including the number of SC in a standard milk sample. After the 9 th day, laboratory analyses were done every fifth day. Some animals had to be eliminated from the statistical analysis because of missing data (as explained further below). The data for 54 animals were taken into account for the statistical analysis.

3 Statistical analysis

3.1 Data presentation and transformation

The data were first presented graphically. For each animal we plotted the number of SC as a function of time. Figure 1 displays the time series for four selected goats, each from a different treatment group.
Figure 1 and other plots not presented here reveal several interesting facts:
- there are great differences in the number of SC between animals (note that the scales on the y-axis differ);
- SC oscillates greatly for each animal, and the peaks are quite prominent;
- variability within the same day is visible: SC is generally higher in the morning than in the evening.
Figure 1: Number of somatic cells per ml (SC) for four goats, each from a different treatment group (panels: DHA, ALFA, EPA, CONTROL). The labels refer to the individual animals.

The first step in the analysis was to transform the data. The number of SC measures a concentration, which is defined as a ratio of two quantities. The obvious transformation to start with was therefore the logarithmic transformation. For ease of interpretation we used the logarithm to base 10.

Figure 2 shows that the animals were not comparable in the preliminary phase, so the effect of the different treatments cannot be assessed directly. Some kind of standardization is necessary to make the animals' data comparable before the treatment phase. The adaptation phase was seen to end before day 7, as the breeders expected. Visual inspection shows modest variability with no noise (outliers) in this period. Therefore the period from days 7 to 0 was considered the representative period, i.e. the period when the animals were in their standard (inherent) state. The vertical lines in Figure 2 delineate this period.
Figure 2: Logarithm of the number of somatic cells per ml (SC) for four goats, each from a different treatment group (panels: DHA, ALFA, EPA, CONTROL; y-axis: log(SC)). The vertical lines delineate the representative period (see text) of the animals (day 7 to day 0).

The data are seen to be highly skewed, and there are several extreme outliers. The commonly used standardization based on the mean and standard deviation was therefore not a good choice for our data. We replaced the mean by the median (Me) and the standard deviation by the average absolute deviation from the median (AAD). The transformed time series S(t) for each animal is calculated as follows:

$$S(t) = \frac{y(t) - Me_0}{AAD_0},$$

where Me_0 and AAD_0 are, respectively, the median and the average absolute deviation for the representative period, and y(t) is the logarithm of the original time series.

Within the representative period (i.e. the period from day 7 to day 0) the transformed data are standardized: for each animal, S(t) has median zero and average absolute deviation one over that period. Outside the representative period, the transformed time series S(t) is expressed relative to the representative period.
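The standardization above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' code: the function name `robust_standardize`, its parameters, the day numbers, and the SC counts are all made up for the example.

```python
import math
from statistics import median


def robust_standardize(sc_counts, days, rep_first_day, rep_last_day):
    """Return S(t) = (y(t) - Me_0) / AAD_0, with y(t) = log10(SC).

    Me_0 and AAD_0 are the median and the average absolute deviation
    from the median, both computed over the representative period only.
    """
    y = [math.log10(count) for count in sc_counts]
    reference = [v for v, d in zip(y, days) if rep_first_day <= d <= rep_last_day]
    if not reference:
        raise ValueError("no observations in the representative period")
    me0 = median(reference)
    aad0 = sum(abs(v - me0) for v in reference) / len(reference)
    if aad0 == 0:
        raise ValueError("AAD_0 is zero; S(t) is undefined")
    return [(v - me0) / aad0 for v in y]


# Hypothetical data: one measurement per day, days 1 to 4 taken as the
# representative period (placeholder values, not the experiment's schedule).
days = [1, 2, 3, 4, 5, 6]
sc_counts = [100, 1000, 10000, 1000, 100000, 10]
s = robust_standardize(sc_counts, days, rep_first_day=1, rep_last_day=4)
```

Because AAD_0 sits in the denominator, an animal whose values barely vary during the representative period gets very large values of S(t) elsewhere, so the guard against a zero AAD_0 is more than a formality.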
It should also be noted that some animals had to be eliminated from the subsequent analysis because of missing data in the representative period. The final database for the statistical analysis was as follows: 3 animals for DHA, animals for EPA and for ALFA, and 7 animals for the control.

Figure 3: Adjusted time series S(t) for four goats, each from a different treatment group (panels: DHA, ALFA, EPA, CONTROL). The thick horizontal lines indicate the medians for three periods: the representative period (days 7 to 0; this value is always zero), the period when the additives were administered (days to 4), and that from days 5 to 9.

We calculated two additional medians from the adjusted time series: the median for the time at which the additives were administered ( th to 4 th day) and the median for the period from the 5 th to the 9 th day. These are denoted by Me_1 and Me_2, respectively, and are shown in Figure 3 by thick horizontal lines. For the goat from the ALFA treatment group, negative medians are evident (Figure 3). The plots for the animals from EPA and DHA are very similar to each other; the two medians for the control goat are positive as well.
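A period median such as Me_1 or Me_2 is simply the median of S(t) restricted to a window of days. The sketch below is illustrative only: `period_median` is a hypothetical helper, and the day windows and values are placeholders rather than the experiment's actual schedule or data.

```python
from statistics import median


def period_median(adjusted_series, days, first_day, last_day):
    """Median of the adjusted series S(t) over the days first_day..last_day."""
    window = [s for s, d in zip(adjusted_series, days) if first_day <= d <= last_day]
    if not window:
        raise ValueError("no observations in the requested period")
    return median(window)


# Hypothetical adjusted series with a morning and an evening value per day.
days = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
s_t = [0.5, -0.5, 0.0, 0.2, -1.0, -2.0, -1.5, -0.5, -0.4, -0.6, -0.2, -0.8]

# Placeholder windows: days 3-4 stand in for the administration period,
# days 5-6 for the period that follows it.
me_1 = period_median(s_t, days, first_day=3, last_day=4)
me_2 = period_median(s_t, days, first_day=5, last_day=6)
```

Reducing each animal's series to one median per period gives one value per animal, which is the form needed for the between-group comparisons that follow.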
Question 1: Which treatment is best?

The distributions of the medians Me_1 and Me_2 obtained for all the animals are shown in Figure 4 as a series of box-plots.

Figure 4: Box-plots for Me_1 (left; groups ordered ALFA, DHA, EPA, CONTROL) and for Me_2 (right; groups ordered ALFA, EPA, DHA, CONTROL) for each of the four treatment groups. The lines under the treatment labels mark the groups obtained by the multiple comparison test.

For the ALFA treatment, the values of Me_1 and Me_2 are all negative, while for the DHA and EPA treatments the majority of values are positive. For the control group, too, the majority of values are positive, some of them unexpectedly high (we would expect them to be around zero). When the distributions for Me_1 and Me_2 are compared, the same situation is identified, though the values for Me are higher than for Me.

We used distribution-free tests (Siegel and Castellan, 1988; Hutchinson, 2000) to assess the effect of the treatments. Kruskal-Wallis one-way analysis of variance showed highly significant differences between the distributions of the medians for the different treatments. A multiple comparison test revealed two disjoint groups: the first consists of the ALFA treatment alone; the second consists of the EPA and DHA treatments and the control. The result is identical for Me_1 and Me_2 (see Figure 4).

The answer to Question 1 is straightforward: the ALFA treatment is shown to be the best, as it significantly reduces the number of somatic cells in the goat's milk. Its effect is evident both at the time it is administered and afterwards. No significant reduction is effected by the DHA or EPA treatment (both of fish origin).
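For readers who want to see what the Kruskal-Wallis statistic computes, here is a small self-contained sketch using the textbook formula with the usual tie correction. It is not the software the authors used, the per-animal medians below are invented, and the multiple comparison step is not shown.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Invented per-animal medians for four groups (ALFA, DHA, EPA, CONTROL).
toy_groups = [[-3.0, -2.5, -2.0], [0.5, 1.0, 1.5], [0.8, 1.2, 2.0], [0.9, 1.1, 3.0]]
h_stat = kruskal_wallis_h(toy_groups)
# With four groups, H is usually referred to a chi-square distribution with
# 3 degrees of freedom (5% critical value 7.815); that approximation is
# rough for groups as small as these.
```

The test works on ranks only, which is why it suits skewed data with outliers: any monotone rescaling of the pooled values leaves H unchanged.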
3.2 Question 2: How long does the effect of ALFA last?

Again we started with a graphical presentation. For the ALFA group we plotted the values of the adjusted time series S(t) for all animals (see Figure 5, left). For each time point we calculated the median of these values and displayed the time series of these grand medians (solid line). The grand median is negative over the whole period, and significantly so up to the 54th day (Wilcoxon's test).

Figure 5: Values of the adjusted time series S(t) for all animals at each time point. Left: ALFA group; right: CONTROL group. The solid line connects the grand medians at each time point. The vertical lines define the period during which the additives were administered.

3.3 Ancillary findings of the analysis

The results show that the control group is an essential part of the analysis. In the control group, the medians Me_1 and Me_2 are all positive rather than around zero, as had been expected. This is confirmed by the plot in Figure 5, where it is evident that, after the period of administration started, the grand median is positive up to the 50th day. The breeders suspected that chasing the animals from one side of the shed to the technician's area may be very stressful for the animals and may increase the number of somatic cells. If this is true, then the effect of the other additives might be underestimated.

We checked for outliers in the control group. It turned out that the majority of extreme values (over 0 in Figure 5, right) belong to one particular animal, CONTROL/6464. Figure 6 shows its characteristics on the original scale and on the adjusted scale. For this goat, SC in the preliminary period was extremely low and stable; when the treatment phase started, it increased to about 000. The
extreme values in the adjusted time series are caused by the low variability in the representative period.

Figure 6: Time series for animal CONTROL/6464: original time series (left), adjusted time series (right).

4 Discussion

The exploratory approach we followed is based on a step-by-step exploration of the data. Graphics are an essential part of it. We started with a graphical display of the time series for each goat. The plots spoke for themselves: they revealed several interesting facts about the data under study and showed the way ahead, namely the use of appropriate transformations and distribution-free tests.

The choice of transformations was based on subject-matter knowledge (e.g. the logarithmic transformation is appropriate for variables measuring concentration) and on careful inspection of the plots. The latter revealed the need for standardization to make the animals' data comparable before the treatment phase. Subject-matter knowledge was also necessary for this procedure: the experimenters defined the representative period for the animals.

We used two alternative measures of variation in the standardization procedure: the average absolute deviation, AAD (as described in the text), and the median of the absolute deviations, MAD (not presented in the text), as proposed by Huber (1981). Both transformations produced the same rank order of values, so the results of the distribution-free tests were the same.

We considered the medians as the representative values for the specific time intervals. We are aware that the variability due to subject is not taken into account in the Kruskal-Wallis test. We tried to compensate for this deficiency by a careful inspection of the within-group variability (see Figure 5). The inspection of the control goats' data generated a new research hypothesis: the procedure used for
drug administration may increase the number of somatic cells because of the stress caused to the animal.

To sum up: in this paper we have tried to show that, in the analysis of experiments, the common approach based mostly on modeling and hypothesis testing can be expanded by an intense exploration of the data. The case study from the field of animal breeding illustrates this. However, close collaboration between experimenters and statisticians is a prerequisite for this.

Acknowledgments

We express our thanks to an anonymous referee whose comments helped us to improve the paper considerably.

References

[1] Haenlein, G.F.W. (2002): Relationship of somatic cell counts in goat milk to mastitis and productivity. Small Ruminant Research, 45.
[2] Huber, P.J. (1981): Robust Statistics. New York: John Wiley and Sons.
[3] Hutchinson, T.P. (2000): Letter to the editor: ANOVA with skewed data. Environmetrics, 11.
[4] Siegel, S. and Castellan, N.J. (1988): Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.
[5] Tukey, J.W. (1977): Exploratory Data Analysis. Reading: Addison-Wesley.
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
II. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
One-Way Analysis of Variance (ANOVA) Example Problem
One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means
UNIVERSITY OF NAIROBI
UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
MEASURES OF LOCATION AND SPREAD
Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
Chapter 7. One-way ANOVA
Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks
Exploratory Data Analysis. Psychology 3256
Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find
Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model
Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity
T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these
12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
Standard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
Permutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
Exploratory Data Analysis -- Introduction
Lecture Notes in Quantitative Biology Exploratory Data Analysis -- Introduction Chapter 19.1 Revised 25 November 1997 ReCap: Correlation Exploratory Data Analysis What is it? Exploratory vs confirmatory
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
How To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
Non-Parametric Tests (I)
Lecture 5: Non-Parametric Tests (I) KimHuat LIM [email protected] http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent
Tutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls [email protected] MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
Introduction to Statistics with GraphPad Prism (5.01) Version 1.1
Babraham Bioinformatics Introduction to Statistics with GraphPad Prism (5.01) Version 1.1 Introduction to Statistics with GraphPad Prism 2 Licence This manual is 2010-11, Anne Segonds-Pichon. This manual
Parametric and Nonparametric: Demystifying the Terms
Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD
Identifying Market Price Levels using Differential Evolution
Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand [email protected] WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary
SPSS Explore procedure
SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,
Recall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
Come scegliere un test statistico
Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
Robust procedures for Canadian Test Day Model final report for the Holstein breed
Robust procedures for Canadian Test Day Model final report for the Holstein breed J. Jamrozik, J. Fatehi and L.R. Schaeffer Centre for Genetic Improvement of Livestock, University of Guelph Introduction
TRAINING PROGRAM INFORMATICS
MEDICAL UNIVERSITY SOFIA MEDICAL FACULTY DEPARTMENT SOCIAL MEDICINE AND HEALTH MANAGEMENT SECTION BIOSTATISTICS AND MEDICAL INFORMATICS TRAINING PROGRAM INFORMATICS FOR DENTIST STUDENTS - I st COURSE,
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression
Introduction to Minitab and basic commands. Manipulating data in Minitab Describing data; calculating statistics; transformation.
Computer Workshop 1 Part I Introduction to Minitab and basic commands. Manipulating data in Minitab Describing data; calculating statistics; transformation. Outlier testing Problem: 1. Five months of nickel
MEASURES OF VARIATION
NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are
SPSS Tests for Versions 9 to 13
SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST
UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Sun Li Centre for Academic Computing [email protected]
Sun Li Centre for Academic Computing [email protected] Elementary Data Analysis Group Comparison & One-way ANOVA Non-parametric Tests Correlations General Linear Regression Logistic Models Binary Logistic
3: Summary Statistics
3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
Instructions for SPSS 21
1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
Comparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
N-Way Analysis of Variance
N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data
Version 4.0. Statistics Guide. Statistical analyses for laboratory and clinical researchers. Harvey Motulsky
Version 4.0 Statistics Guide Statistical analyses for laboratory and clinical researchers Harvey Motulsky 1999-2005 GraphPad Software, Inc. All rights reserved. Third printing February 2005 GraphPad Prism
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS
Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical
STAT355 - Probability & Statistics
STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive
CHAPTER 14 NONPARAMETRIC TESTS
CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences
Introduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
Analysis of Variance ANOVA
Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.
On Mardia s Tests of Multinormality
On Mardia s Tests of Multinormality Kankainen, A., Taskinen, S., Oja, H. Abstract. Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution.
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Terminating Sequential Delphi Survey Data Collection
A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to
Common Tools for Displaying and Communicating Data for Process Improvement
Common Tools for Displaying and Communicating Data for Process Improvement Packet includes the following tools: Box and Whisker Plot, Check Sheet, Control Chart, Histogram, Pareto Diagram, Run Chart, Scatter Plot
How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)
Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades
THE KRUSKAL-WALLIS TEST
THE KRUSKAL-WALLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON
Data Transforms: Natural Logarithms and Square Roots
Data Transforms: Natural Log and Square Roots 1 Data Transforms: Natural Logarithms and Square Roots Parametric statistics in general are more powerful than non-parametric statistics as the former are based
First-year Statistics for Psychology Students Through Worked Examples. 3. Analysis of Variance
First-year Statistics for Psychology Students Through Worked Examples 3. Analysis of Variance by Charles McCreery, D.Phil Formerly Lecturer in Experimental Psychology Magdalen College Oxford Copyright
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
How Far is too Far? Statistical Outlier Detection
How Far is too Far? Statistical Outlier Detection Steven Walfish President, Statistical Outsourcing Services 30-325-329 Outline What is an Outlier, and Why are
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
Data analysis process
Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis
What Does the Normal Distribution Sound Like?
What Does the Normal Distribution Sound Like? Ananda Jayawardhana Pittsburg State University Published: June 2013 Overview of Lesson In this activity, students conduct an investigation
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
13: Additional ANOVA Topics. Post hoc Comparisons
13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
Nonparametric tests these test hypotheses that are not statements about population parameters (e.g.,
CHAPTER 13 Nonparametric and Distribution-Free Statistics Nonparametric tests these test hypotheses that are not statements about population parameters (e.g., 2 tests for goodness of fit and independence).
Figure: scatter plots showing a) perfect linear correlation (r = 1), b) no correlation (r = 0), c) positive correlation (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Implications of Big Data for Statistics Instruction 17 Nov 2013
Implications of Big Data for Statistics Instruction 17 Nov 2013 Implications of Big Data for Statistics Instruction Mark L. Berenson Montclair State University MSMESB Mini Conference DSI Baltimore November
Version 5.0. Statistics Guide. Harvey Motulsky President, GraphPad Software Inc. 2007 GraphPad Software, inc. All rights reserved.
Version 5.0 Statistics Guide Harvey Motulsky President, GraphPad Software Inc. All rights reserved. This Statistics Guide is a companion to GraphPad Prism 5. Available for both Mac and Windows, Prism makes
Analyzing Experimental Data
Analyzing Experimental Data The information in this chapter is a short summary of some topics that are covered in depth in the book Students and Research written by Cothron, Giese, and Rezba. See the end
t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. www.excelmasterseries.com
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots Describing Quantitative Data with Numbers 1.3 I can: Calculate and interpret measures of center (mean, median) in context; Calculate
DETECTING MASTITIS COW-SIDE. J. Eric Hillerton Institute for Animal Health Compton, United Kingdom
DETECTING MASTITIS COW-SIDE J. Eric Hillerton Institute for Animal Health Compton, United Kingdom The most important practical concern on mastitis is to prevent it happening. Assuming that there has been
Statistics in Medicine Research Lecture Series CSMC Fall 2014
Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power
CHI-SQUARE: TESTING FOR GOODNESS OF FIT
CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity
INFLUENCE OF MEASUREMENT SYSTEM QUALITY ON THE EVALUATION OF PROCESS CAPABILITY INDICES
METALLURGY AND FOUNDRY ENGINEERING Vol. 38, 2012, No. 1 http://dx.doi.org/10.7494/mafe.2012.38.1.25 Andrzej Czarski*, Piotr Matusiewicz* INFLUENCE OF MEASUREMENT SYSTEM QUALITY ON THE EVALUATION OF PROCESS
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
Part II Chapter 9 Chapter 10 Chapter 11 Chapter 12 Chapter 13 Chapter 14 Chapter 15 Part II
Part II covers diagnostic evaluations of historical facility data for checking key assumptions implicit in the recommended statistical tests and for making appropriate adjustments to the data (e.g., consideration
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 1. Descriptive Statistics Statistics
Normality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author.
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.
BioVisualization: Enhancing Clinical Data Mining
BioVisualization: Enhancing Clinical Data Mining Even as many clinicians struggle to give up their pen and paper charts and spreadsheets, some innovators are already shifting health care information technology
Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases:
Profile Analysis Introduction Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: 1) Comparing the same dependent variables
Defining the Beginning: The Importance of Research Design
Research and Management Techniques for the Conservation of Sea Turtles K. L. Eckert, K. A. Bjorndal, F. A. Abreu-Grobois, M. Donnelly (Editors) IUCN/SSC Marine Turtle Specialist Group Publication No. 4,
Chapter 12 Nonparametric Tests. Chapter Table of Contents
Chapter 12 Nonparametric Tests Chapter Table of Contents: Overview (Testing for Normality, Comparing Distributions); One-Sample Tests; Two-Sample Tests (Comparing Two Independent Samples)
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
Magruder Statistics & Data Analysis
Magruder Statistics & Data Analysis Caution: There will be Equations! Based Closely On: Program Model The International Harmonized Protocol for the Proficiency Testing of Analytical Laboratories, 2006
Chapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
