Are our data symmetric?

Statistical Methods in Medical Research 2003; 12: 505–513

Sumithra J Mandrekar and Jayawant N Mandrekar
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA

Skewness indicates a lack of symmetry in a distribution. Knowing whether the underlying data are symmetric is essential for parametric analysis, for fitting distributions and for transforming the data. The coefficient of skewness is the measure most commonly used to identify a lack of symmetry in the underlying data, although graphical procedures can also be effective. We discuss three methods to assess skewness: the traditional coefficient of skewness, the skewness index based on the L-moments discussed by Hosking, and the asymptotic test of symmetry developed by Randles et al. With this work, we provide easy-to-implement S-PLUS functions and discuss the advantages and shortcomings of each technique.

1 Introduction

The first step in any statistical analysis is to summarize the characteristics of the underlying data. All standard statistical packages routinely provide summary statistics, and these often include a sample skewness score, which is a measure of symmetry. Symmetry is a rather complex property of probability distributions, and deviations from it are difficult to identify in a small number of observations. Broadly speaking, a dataset or a distribution is symmetric if it looks the same to the right and to the left of its center point. One of the many reasons for checking symmetry in a given dataset is that many statistical tests rely strongly on the assumption of normality, and a normal distribution is symmetric, so marked asymmetry is evidence against normality. Thus, a skewness measure can inform decisions about data transformation, outlier detection and distribution fitting, so that an appropriate analysis procedure (parametric versus nonparametric) is employed.
In this paper we compare three methods used to assess skewness: the traditional coefficient of skewness, the skewness index based on the L-moments discussed by Hosking [1], and the asymptotic distribution-free test of symmetry (symmetry test) developed by Randles et al. [2] Royston [3] has demonstrated that the L-moment-based skewness measure has no serious drawbacks in practical data applications, whereas the traditional coefficient of skewness suffers from several theoretical and practical disadvantages. Further, using a Monte Carlo study, Randles et al. [2] showed that the nonparametric test of symmetry is superior to the test based on the sample skewness coefficient. In the past, each of these indices has been explored individually; we aim to compare the three competitors on complexity, accuracy and accessibility, and to offer practical guidelines using real-life and simulated datasets.

Address for correspondence: Sumithra J Mandrekar, Research Associate (Lead Statistician), Cancer Center Statistics, Kahler 1A, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA. E-mail: mandrekar.sumithra@mayo.edu

© Arnold / sm346oa

2 Skewness indices

In this section we briefly describe the three procedures used to compute skewness. We present only the formulae needed to compute these indices/test statistics; readers are referred to Hosking [1] and Randles et al. [2] for the theoretical background.

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered random sample of size $n$ from the distribution of a random variable $X$ with mean $\mu$ and variance $\sigma^2$. The coefficient of skewness is defined as

$$S_1 = \frac{m_3}{m_2^{3/2}}, \qquad m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r,$$

where $m_r$ is the $r$th sample moment about the sample mean. For symmetric distributions $S_1$ has expectation 0; that is, when the data are symmetric, the sample skewness coefficient is near zero. If $S_1 > 0$ the distribution is asymmetric with a positive skew, and if $S_1 < 0$ it is asymmetric with a negative skew. The larger the absolute value of $S_1$, the more asymmetric the distribution. (See Gupta [4] for a test based on this sample skewness coefficient.)

The sample L-skewness is estimated by [1]

$$S_2 = \frac{l_3}{l_2}, \qquad l_2 = 2w_2 - \bar{x}, \qquad l_3 = 6w_3 - 6w_2 + \bar{x},$$

where

$$w_2 = \frac{\sum_{i=2}^{n} (i-1)\,x_{(i)}}{n(n-1)}, \qquad w_3 = \frac{\sum_{i=3}^{n} (i-1)(i-2)\,x_{(i)}}{n(n-1)(n-2)},$$

and $-1 < S_2 < 1$. An alternative L-skewness index, $S_2' = (1 + S_2)/(1 - S_2)$, has also been defined by Hosking [1], and its properties have been discussed [1,3]. $S_2'$ is easier to interpret than $S_2$ because it is the ratio of the length of the upper tail to that of the lower tail in samples of size 3. $S_2'$ therefore ranges from 0 to $\infty$, and values of 1, >1 and <1 indicate symmetric, positively skewed and negatively skewed distributions, respectively. For $S_2$, a value of 0 indicates symmetry, $-1 < S_2 < 0$ indicates a negatively skewed distribution and $0 < S_2 < 1$ a positively skewed distribution.
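To make these formulas concrete, here is a minimal Python sketch of $S_1$, $S_2$ and $S_2'$. It is not the authors' S-PLUS code; the function names `sample_skewness`, `l_skewness` and `l_skewness_ratio` are hypothetical, and the code simply transcribes the definitions above, with the 1-based ranks $i$ taken from `enumerate(..., start=1)`.

```python
from typing import Sequence


def sample_skewness(x: Sequence[float]) -> float:
    """S1 = m3 / m2**(3/2), where m_r is the r-th sample moment about the mean (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5


def l_skewness(x: Sequence[float]) -> float:
    """S2 = l3 / l2, computed from the ordered sample via w2 and w3."""
    xs = sorted(x)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    if n < 3:
        raise ValueError("L-skewness needs at least 3 observations")
    mean = sum(xs) / n
    # enumerate(..., start=1) gives the 1-based rank i used in the formulas.
    # Terms with i = 1 (in w2) and i = 1, 2 (in w3) are zero, so summing over
    # every i is equivalent to the lower limits i = 2 and i = 3.
    w2 = sum((i - 1) * xi for i, xi in enumerate(xs, start=1)) / (n * (n - 1))
    w3 = sum((i - 1) * (i - 2) * xi for i, xi in enumerate(xs, start=1)) / (
        n * (n - 1) * (n - 2)
    )
    l2 = 2 * w2 - mean
    l3 = 6 * w3 - 6 * w2 + mean
    return l3 / l2


def l_skewness_ratio(s2: float) -> float:
    """S2' = (1 + S2) / (1 - S2): 1 = symmetric, > 1 positive skew, < 1 negative skew."""
    return (1 + s2) / (1 - s2)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0]  # small right-skewed example (hypothetical data)
    s2 = l_skewness(data)
    print(sample_skewness(data), s2, l_skewness_ratio(s2))
```

For the toy data `[1, 2, 3, 10]`, $l_2 = 7/3$ and $l_3 = 3/2$, so $S_2 = 9/14 \approx 0.643$ and $S_2' = 4.6$; for equally spaced data such as `[1, 2, 3, 4, 5]` both $S_1$ and $S_2$ are exactly 0. For reference, $S_1$ as defined here (divisor $n$) is the quantity `scipy.stats.skew` returns with its default `bias=True`.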
Both $S_1$ and $S_2$ ($S_2'$) are measures of skewness (their numerical values quantify symmetry or asymmetry, as the case may be), with no widely used test statistics associated with them. The distribution-free test of symmetry, in contrast, tests whether a univariate distribution is symmetric about some unknown value against the general alternative of asymmetry, and it is asymptotically distribution-free over a broad class of symmetric distributions. To describe the test statistic proposed by Randles et al. [2], which is based on the $n$ unordered observations of $X$, first consider every triple $(X_i, X_j, X_k)$, $1 \le i < j < k \le n$ (all the notation used in this paper is consistent with the discussion in Wolfe and Hollander [5]). A set of three distinct observations is called a right triple when the middle observation is closer to the smallest than to the largest (so the triple is skewed to the right), and a left triple when the middle observation is closer to the largest than to the smallest (so it is skewed to the left). Define

$$f^*(X_i, X_j, X_k) = \operatorname{sign}(X_i + X_j - 2X_k) + \operatorname{sign}(X_i + X_k - 2X_j) + \operatorname{sign}(X_j + X_k - 2X_i),$$

where $\operatorname{sign}(y) = -1$, $1$ or $0$ if $y$ is less than, greater than or equal to 0, respectively. The triple is a right triple if $f^* = 1$, a left triple if $f^* = -1$, and neither a left nor a right triple if $f^* = 0$. Note that the test statistic remains well defined when zeros occur in the computation of $X_i + X_j - 2X_k$ for any $i, j, k$. We then compute the following for the entire dataset:

$$T = [\text{number of right triples}] - [\text{number of left triples}];$$

for each fixed $t = 1, \ldots, n$,

$$B_t = [\text{number of right triples involving } X_t] - [\text{number of left triples involving } X_t];$$

and for each fixed pair $(s, t)$, $1 \le s < t \le n$,

$$B_{s,t} = [\text{number of right triples involving } X_s \text{ and } X_t] - [\text{number of left triples involving } X_s \text{ and } X_t].$$

The test statistic combines these counts of right and left triples over the entire dataset, and when $n$ is large its distribution is well approximated by the normal distribution. In particular, the test statistic $S_3$ is given by

$$S_3 = \frac{T}{\hat{\sigma}},$$

where

$$\hat{\sigma}^2 = \frac{(n-3)(n-4)}{(n-1)(n-2)} \sum_{t=1}^{n} B_t^2 + \frac{n-3}{n-4} \sum_{s=1}^{n-1} \sum_{t=s+1}^{n} B_{s,t}^2 + \frac{n(n-1)(n-2)}{6} - \left[1 - \frac{(n-3)(n-4)(n-5)}{n(n-1)(n-2)}\right] T^2.$$

From a sample of size $n$ there are $\binom{n}{3}$ distinct triples, and if the null hypothesis of symmetry holds, we expect half of them to be right triples and half of them to be left triples.
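The computation just described can be sketched by brute force in Python. This is an illustrative sketch, not the authors' S-PLUS function: the names `f_star` and `triples_test` are hypothetical, it assumes the definitions of $f^*$, $T$, $B_t$, $B_{s,t}$ and $\hat{\sigma}^2$ given above, and the two-sided p-value uses the large-sample normal approximation. Because it visits all $\binom{n}{3}$ triples, it is only practical for moderate $n$.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def _sign(y: float) -> int:
    """-1, 1 or 0 according as y is negative, positive or zero."""
    return (y > 0) - (y < 0)


def f_star(a: float, b: float, c: float) -> int:
    """+1 for a right triple, -1 for a left triple, 0 otherwise (argument order is irrelevant)."""
    return _sign(a + b - 2 * c) + _sign(a + c - 2 * b) + _sign(b + c - 2 * a)


def triples_test(x: Sequence[float]) -> Tuple[int, float, float]:
    """Return (T, S3, two-sided p-value) using the normal approximation."""
    n = len(x)
    if n < 5:
        raise ValueError("need at least 5 observations (the variance uses n - 4)")
    T = 0
    B = [0] * n                              # B_t, indexed from 0
    B_pair: Dict[Tuple[int, int], int] = {}  # B_{s,t} for s < t
    for i, j, k in combinations(range(n), 3):  # every triple with i < j < k
        f = f_star(x[i], x[j], x[k])
        if f == 0:
            continue  # neither a right nor a left triple
        T += f
        for t in (i, j, k):
            B[t] += f
        for pair in ((i, j), (i, k), (j, k)):
            B_pair[pair] = B_pair.get(pair, 0) + f

    var = (
        (n - 3) * (n - 4) / ((n - 1) * (n - 2)) * sum(b * b for b in B)
        + (n - 3) / (n - 4) * sum(b * b for b in B_pair.values())
        + n * (n - 1) * (n - 2) / 6
        - (1 - (n - 3) * (n - 4) * (n - 5) / (n * (n - 1) * (n - 2))) * T ** 2
    )
    if var <= 0:
        raise ValueError("variance estimate is not positive; sample may be too small")
    s3 = T / math.sqrt(var)
    p_two_sided = math.erfc(abs(s3) / math.sqrt(2))  # equals 2 * (1 - Phi(|S3|))
    return T, s3, p_two_sided
```

As a check, for the hypothetical data `[1, 2, 4, 8, 16, 32, 64]` every triple is a right triple, so $T = 35$, each $B_t = 15$, each $B_{s,t} = 5$, $\hat{\sigma}^2 = 280$ and $S_3 \approx 2.09$; for data symmetric about a point, such as `[-3, -2, -1, 0, 1, 2, 3]`, the right and left triples cancel and $S_3 = 0$.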
Roughly speaking, any substantial deviation in either direction (more right triples or more left triples) indicates asymmetry in the underlying population. For a specified significance level $\alpha$ and large $n$, the null hypothesis of symmetry is rejected in favour of the general alternative of asymmetry if $|S_3| \ge z_{\alpha/2}$. Appropriate one-sided tests can be used to check for specific deviations (right or left skewness) from symmetry (see Wolfe and Hollander [5] for further details). Although computationally intensive, this test gives accurate results even for small sample sizes and displays good power in detecting asymmetric distributions compared with sample skewness measures [2]. We have written functions in S-PLUS (version 6.0, Release 1) for computing this test statistic as well as the L-skewness.

3 Some considerations

3.1 Accuracy and interpretability

The sample skewness coefficient is sensitive to even small changes in the tails of the distribution, whereas the L-skewness and the symmetry test are sensitive to changes in the shape of the main portion of the distribution (the middle, as opposed to the tails). The sample skewness coefficient is susceptible to moderate outliers because the cubes of extreme deviations are highly influential. Royston [3] further demonstrated that the sample skewness coefficient is a poor estimator of skewness in skewed distributions compared with the L-skewness, which is more reliable. The symmetry test is not effective at identifying asymmetry when sample sizes are small (<20) [2,6]. Both the symmetry test and the L-skewness can provide a measure of relative skewness, whereas the sample skewness coefficient is less interpretable in terms of distribution features.

3.2 Complexity

The sample skewness coefficient is the easiest to compute, followed by the L-skewness and then the symmetry test statistic. The L-skewness requires the data to be sorted in increasing order, whereas the symmetry test requires examining every triple of observations to compute the test statistic. Although the symmetry test displays good power, it is computationally intensive when the sample size is large.

3.3 Accessibility

The sample skewness coefficient is part of many standard statistical packages and hence is accessible and quick to compute. Our readily available S-PLUS functions make it feasible to compute the L-skewness and perform the symmetry test. There are, however, trade-offs between computation time and power for the symmetry test when the sample size is large.
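The cost trade-off can be made concrete with a little arithmetic: $S_1$ needs one pass over the data, the L-skewness needs a sort ($O(n \log n)$), but the symmetry test must examine all $\binom{n}{3}$ triples. The sketch below is illustrative only; the helper name `triple_counts` is hypothetical, and the sample sizes are chosen simply to span the sizes mentioned in this paper.

```python
from math import comb
from typing import Dict, Iterable


def triple_counts(sizes: Iterable[int]) -> Dict[int, int]:
    """Number of distinct triples, C(n, 3), that the symmetry test must examine."""
    return {n: comb(n, 3) for n in sizes}


if __name__ == "__main__":
    # 20: small-sample threshold quoted above; 250: largest simulated n (Section 5);
    # 424: PBC patients in Section 4 (illustrative choices only).
    for n, k in triple_counts((20, 250, 424)).items():
        print(f"n = {n:>3}: {k:>12,} triples")
```

$\binom{n}{3} = n(n-1)(n-2)/6$ grows roughly as $n^3/6$: 1,140 triples at $n = 20$, 2,573,000 at $n = 250$ and 12,614,424 at $n = 424$, so doubling $n$ multiplies the work by about eight.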
4 An illustration

Between January 1974 and May 1984, Mayo Clinic conducted a double-blinded randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo [7]. There were 424 patients who met the eligibility criteria seen at the Clinic while the trial was open for patient registration: 312 cases participated in the trial and had largely complete data, and 112 cases did not participate in the trial but consented to have basic measurements recorded; six of the latter were lost to follow-up. This dataset has been used for several purposes: estimating survival distributions, testing for differences between the drug and placebo groups, and estimating covariate effects via a regression model. The variables of interest here are serum cholesterol (mg/dl), albumin (gm/dl) and triglycerides (mg/dl), for which we check symmetry using all three approaches. As a first step, an exploratory analysis (Figure 1) revealed asymmetric distributions for all the variables except albumin (which is symmetric once a few outlying values are removed).

Figure 1. Histograms of serum cholesterol, albumin and triglycerides from the primary biliary cirrhosis (PBC) study.

The formal computations of the skewness indices (Table 1) support the visual observation that all of the variables are positively skewed except albumin, which has a slight negative skew. Removing a few small values (≤ 2.6 gm/dl) makes the distribution of albumin symmetric (Table 1 and Figure 1).

Table 1 Skewness indices for the variables from the primary biliary cirrhosis (PBC) study

Variable                 $S_1$    $S_2$ ($S_2'$)    $S_3$ (stat / p-value)
Serum cholesterol        3.39     0.43 (2.49)       … (<0.001)
Triglycerides            2.51     0.28 (1.77)       9.13 (<0.001)
Albumin                  …        … (0.86)          3.02 (0.002)
Albumin (>2.6 gm/dl)     …        … (0.98)          0.59 (0.56)

p-values reported for $S_3$ statistics are two-sided.

One major issue with using $S_1$, $S_2$ ($S_2'$) or $S_3$ as skewness indicators is quantification. Before we proceed, it is important to note that we cannot compare across the measures because their ranges differ; hence all comparisons are made within a skewness index, across the variables. The first thing to note is that all the variables (except albumin) are positively skewed, irrespective of which measure is used, because the deviations from symmetry arise from changes in the tails as well as in the middle of the distribution. For instance, $S_1$ is 3.39 units above zero (where zero indicates symmetry) for serum cholesterol and 2.51 units above zero for triglycerides. $S_2$ is 0.43 units above zero for serum cholesterol and 0.28 units above zero for triglycerides ($S_2'$ is 1.49 and 0.77 units above one, respectively), where zero indicates symmetry for $S_2$ and one indicates symmetry for $S_2'$. If we go by the magnitude of the sample skewness coefficient and/or the L-skewness, does it mean that the distribution of serum cholesterol is more positively skewed than that of triglycerides, and if so, by how much? The same question arises with $S_3$, where the p-values for both serum cholesterol and triglycerides are very small, indicating asymmetry in both distributions. Under these circumstances, the L-skewness index $S_2'$ can provide some indication of the relative skewness, as discussed below.

Let us explore the triglyceride variable a little further. In addition to the histogram of the raw data (Figure 1), quantile–quantile (Q–Q) plots of triglycerides on the linear and logarithmic scales (Figure 2) show that a log-transformation makes the distribution of triglycerides approximately normal (symmetric). The skewness indices computed on the log-transformed variable ($S_1 = 0.35$; $S_2 = 0.05$ ($S_2' = 1.10$); $S_3 = 1.57$, p-value = 0.06) provide evidence of a considerable reduction in asymmetry. In terms of interpretability, the L-skewness index $S_2'$ of the untransformed triglyceride variable suggests that the upper tail (in samples of size 3) is about two times longer than the lower tail, whereas for the log-transformed variable the upper tail is only about 10% longer than the lower tail.

Figure 2. Q–Q plots of the triglyceride variable on linear and logarithmic scales.

5 Are these necessary and sufficient?

As a simple illustration, we generate random samples of size 10, 30, 75, 100 and 250 from a standard normal distribution (mean = 0, variance = 1) and compute the skewness indices by all three methods (Table 2).

Table 2 Skewness indices for the simulations from a standard normal distribution

Sample size    $S_1$    $S_2$ ($S_2'$)    $S_3$ (stat / p-value)
10             …        … (0.04)          0.22 (0.83)
30             …        … (0.48)          0.28 (0.78)
75             …        … (2.99)          0.69 (0.49)
100            …        … (3.7)           0.36 (0.72)
250            …        … (9.4)           1.16 (0.25)

p-values reported for $S_3$ statistics are two-sided.

Based on the values of the sample skewness coefficient and the L-skewness, we would conclude that the data are not symmetric for any sample size. Yet these samples are generated from a standard normal (symmetric) distribution; the Q–Q plots (Figure 3 shows Q–Q plots for sample sizes 30 and 250) reveal longer tails and slight deviations in the middle of the distribution compared with a standard normal distribution. Because there are changes in both the tails and the middle of the distribution, they are captured by both the sample skewness coefficient and the L-skewness indices and interpreted as asymmetry. Remember that the sample skewness coefficient is very sensitive to even small changes in the extremes, whereas the L-skewness responds to overall changes in the shape of the distribution, particularly in the middle. This is a classic case in which decisions on data transformations to meet the normality assumption should not be based on these two skewness measures (here the underlying data are already normal). There is also a lack of consistency between the two skewness indices in their conclusions about asymmetry.
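The simulation design described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the seed (2003) and the function names `s1`, `s2` and `simulate` are hypothetical, Python's `random.Random.gauss` stands in for the S-PLUS random number generator, and $S_3$ is omitted for brevity. Because the random draws differ, the printed values will not match Table 2.

```python
import random
from typing import Dict, List, Sequence, Tuple


def s1(x: Sequence[float]) -> float:
    """Sample skewness coefficient S1 = m3 / m2**(3/2) (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def s2(x: Sequence[float]) -> float:
    """Sample L-skewness S2 = l3 / l2 from the ordered sample."""
    xs, n = sorted(x), len(x)
    mean = sum(xs) / n
    w2 = sum((i - 1) * v for i, v in enumerate(xs, start=1)) / (n * (n - 1))
    w3 = sum((i - 1) * (i - 2) * v for i, v in enumerate(xs, start=1)) / (
        n * (n - 1) * (n - 2)
    )
    return (6 * w3 - 6 * w2 + mean) / (2 * w2 - mean)


def simulate(
    sizes: Sequence[int] = (10, 30, 75, 100, 250), seed: int = 2003
) -> Dict[int, Tuple[float, float]]:
    """Draw one N(0, 1) sample per size and return {n: (S1, S2)}."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    results: Dict[int, Tuple[float, float]] = {}
    for n in sizes:
        sample: List[float] = [rng.gauss(0.0, 1.0) for _ in range(n)]
        results[n] = (s1(sample), s2(sample))
    return results


if __name__ == "__main__":
    for n, (a, b) in simulate().items():
        print(f"n = {n:>3}: S1 = {a:+.3f}  S2 = {b:+.3f}  S2' = {(1 + b) / (1 - b):.3f}")
```

Rerunning with a few different seeds should show nonzero $S_1$ and $S_2$ values of either sign arising from sampling variation alone, even though every sample comes from a symmetric distribution.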
Based on $S_1$, all of the samples except the first and the last ($n$ = 10, 250) show evidence of a negative skew, whereas based on $S_2$ ($S_2'$) only the first two ($n$ = 10, 30) are classified as negatively skewed. Based on the symmetry test, we conclude that the data are symmetric for every sample size, which is to be expected because all of the samples are generated from a normal (symmetric) distribution.

Figure 3. Sample Q–Q plots for the simulated normal random variables. Top: n = 30; bottom: n = 250.

6 Discussion

The big question is: which measure is appropriate? It probably suffices to say that this depends on the situation, particularly on how important the symmetry of the underlying data is for the purposes of the study. The first step should still be a quick plot of the data to get a sense of its distribution. As we have shown, several factors such as sample size, interpretability, complexity and accessibility play a vital role in selecting a skewness measure. Each measure has its own strengths and weaknesses. The sample skewness coefficient and the L-skewness are both readily available (the L-skewness is not routinely produced as part of statistical output, but it can be coded easily and quickly) and are computationally less intensive than the symmetry test. Between the two, the L-skewness has been shown to be more interpretable and less sensitive to extreme deviations in the tails. The symmetry test displays good power in detecting asymmetry and is asymptotically distribution-free over a broad class of symmetric distributions. We therefore propose that if complexity and computation time are not constraints (mainly an issue for large sample sizes), the symmetry test is considerably better than either of the summary skewness measures.

References

1 Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B 1990; 52.
2 Randles RH, Fligner MA, Policello GE II, Wolfe DA. An asymptotically distribution-free test for symmetry versus asymmetry. Journal of the American Statistical Association 1980; 75(369).
3 Royston P. Which measures of skewness and kurtosis are best? Statistics in Medicine 1990; 11.
4 Gupta MK. An asymptotically nonparametric test of symmetry. Annals of Mathematical Statistics 1967; 38(3).
5 Wolfe DA, Hollander M. Nonparametric Statistical Methods, 2nd edn. New York: John Wiley.
6 Davis CE, Quade D. U-statistics for skewness or symmetry. Communications in Statistics – Theory and Methods 1978; A7(5).
7 Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: John Wiley, 1991.


1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

More information

Using kernel methods to visualise crime data

Using kernel methods to visualise crime data Submission for the 2013 IAOS Prize for Young Statisticians Using kernel methods to visualise crime data Dr. Kieran Martin and Dr. Martin Ralphs kieran.martin@ons.gov.uk martin.ralphs@ons.gov.uk Office

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Testing for differences I exercises with SPSS

Testing for differences I exercises with SPSS Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1. Normal Distribution Definition A continuous random variable has a normal distribution if its probability density e -(y -µ Y ) 2 2 / 2 σ function can be written as for < y < as Y f ( y ) = 1 σ Y 2 π Notation:

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

7 Generalized Estimating Equations

7 Generalized Estimating Equations Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can

More information

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Non-Inferiority Tests for Two Proportions

Non-Inferiority Tests for Two Proportions Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Dongfeng Li. Autumn 2010

Dongfeng Li. Autumn 2010 Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Measuring Skewness: A Forgotten Statistic?

Measuring Skewness: A Forgotten Statistic? Measuring Skewness: A Forgotten Statistic? David P. Doane Oakland University Lori E. Seward University of Colorado Journal of Statistics Education Volume 19, Number 2(2011), www.amstat.org/publications/jse/v19n2/doane.pdf

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES STATISTICAL SIGNIFICANCE OF RANKING PARADOXES Anna E. Bargagliotti and Raymond N. Greenwell Department of Mathematical Sciences and Department of Mathematics University of Memphis and Hofstra University

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

More information

The Assumption(s) of Normality

The Assumption(s) of Normality The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew

More information

Stat 5102 Notes: Nonparametric Tests and. confidence interval

Stat 5102 Notes: Nonparametric Tests and. confidence interval Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012] Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1 Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

Duration and Bond Price Volatility: Some Further Results

Duration and Bond Price Volatility: Some Further Results JOURNAL OF ECONOMICS AND FINANCE EDUCATION Volume 4 Summer 2005 Number 1 Duration and Bond Price Volatility: Some Further Results Hassan Shirvani 1 and Barry Wilbratte 2 Abstract This paper evaluates the

More information

Monte Carlo analysis used for Contingency estimating.

Monte Carlo analysis used for Contingency estimating. Monte Carlo analysis used for Contingency estimating. Author s identification number: Date of authorship: July 24, 2007 Page: 1 of 15 TABLE OF CONTENTS: LIST OF TABLES:...3 LIST OF FIGURES:...3 ABSTRACT:...4

More information

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.) Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

1.7 Graphs of Functions

1.7 Graphs of Functions 64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most

More information