Are our data symmetric?
Statistical Methods in Medical Research 2003; 12: 505-513

Sumithra J Mandrekar and Jayawant N Mandrekar
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA

Skewness indicates a lack of symmetry in a distribution. Knowing the symmetry of the underlying data is essential for parametric analysis, fitting distributions or transforming the data. The coefficient of skewness is the most commonly used measure for identifying a lack of symmetry in the underlying data, although graphical procedures can also be effective. We discuss three different methods to assess skewness: the traditional coefficient of skewness index, the skewness index based on the L-moments discussed by Hosking, and the asymptotic test of symmetry developed by Randles et al. With this work, we provide easy-to-implement S-PLUS functions and discuss the advantages and shortcomings of each technique.

1 Introduction

The first step in any statistical analysis includes summarizing the characteristics of the underlying data. All standard statistical packages routinely provide summary statistics, and these often include a sample skewness score, which is a measure of symmetry. Symmetry is a rather complex property of probability distributions and it is difficult to identify deviations from it in a small number of observations. Broadly speaking, a dataset or a distribution is said to be symmetric if it looks the same to the right and left of the center point. One of the numerous reasons for checking symmetry in a given dataset is that many statistical tests rely strongly on the assumption of normality, which in turn requires symmetry. Thus, a skewness measure can provide valuable information on issues such as data transformation, outlier detection and distribution fitting, so as to ensure that an appropriate analysis procedure (parametric versus nonparametric) is employed.
In our paper we compare three different methods used to assess skewness: the traditional coefficient of skewness index, the skewness index based on the L-moments discussed by Hosking [1], and the asymptotic distribution-free test of symmetry (symmetry test) developed by Randles et al. [2] Royston [3] has demonstrated that the L-moments-based skewness measure has no serious drawbacks in practical data applications, whereas the traditional coefficient of skewness index suffers from several theoretical and practical disadvantages. Further, using a Monte Carlo study, Randles et al. [2] showed that the nonparametric test of symmetry is superior to the test based on the sample skewness index. In the past, each of these indices has been explored individually. We aim to compare these three competitors on their complexity, accuracy and accessibility, and to offer practical guidelines using real-life and simulated datasets.

Address for correspondence: Sumithra J Mandrekar, Research Associate (Lead Statistician), Cancer Center Statistics, Kahler 1A, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA. mandrekar.sumithra@mayo.edu

© Arnold / sm346oa
2 Skewness indices

In this section we briefly describe the three procedures used to compute skewness. We present only the formulae necessary to compute these indices and test statistics; readers are referred to Hosking [1] and Randles et al. [2] for further details about the theoretical background.

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered random sample of size $n$ from a distribution of the random variable $X$ with mean $\mu$ and variance $\sigma^2$. The coefficient of skewness is defined as

$$S_1 = \frac{m_3}{(m_2)^{3/2}}, \qquad \text{where } m_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}.$$

Here, $m_r$ is the $r$th sample moment about the sample mean. For symmetrical distributions, $S_1$ has expectation 0; that is, when the data are symmetric, the sample skewness coefficient is near zero. If $S_1 > 0$, the distribution is asymmetric with a positive skew, and if $S_1 < 0$, the distribution is asymmetric with a negative skew. The larger the absolute value of $S_1$, the more asymmetric the distribution. (See Gupta [4] for a test based on this sample skewness coefficient.)

The estimate of the sample L-skewness is given by [1]

$$S_2 = \frac{l_3}{l_2},$$

where

$$l_2 = 2w_2 - \bar{x}, \qquad l_3 = 6w_3 - 6w_2 + \bar{x},$$

$$w_2 = \frac{\sum_{i=2}^{n} (i-1)\, x_{(i)}}{n(n-1)}, \qquad w_3 = \frac{\sum_{i=3}^{n} (i-1)(i-2)\, x_{(i)}}{n(n-1)(n-2)},$$

and $-1 < S_2 < 1$.

An alternative L-skewness index, $S_2' = (1 + S_2)/(1 - S_2)$, has also been defined by Hosking [1] and its properties have been discussed [1,3]. The index $S_2'$ is easier to interpret than $S_2$, as it is the ratio of the length of the upper tail to the lower tail in samples of size 3. $S_2'$ therefore ranges from 0 to $\infty$, and values of 1, $>1$ and $<1$ indicate symmetric, positively skewed and negatively skewed distributions, respectively. In the case of $S_2$, a value of 0 indicates symmetry, $-1 < S_2 < 0$ indicates a negatively skewed distribution and $0 < S_2 < 1$ indicates a positively skewed distribution.
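The two indices defined in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' S-PLUS functions; the function names `sample_skewness` and `l_skewness` are hypothetical, and the code assumes at least three observations that are not all equal.

```python
from math import fsum


def sample_skewness(x):
    """S_1 = m_3 / m_2**1.5, with m_r the r-th sample moment about the mean."""
    n = len(x)
    mean = fsum(x) / n
    m2 = fsum((v - mean) ** 2 for v in x) / n
    m3 = fsum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def l_skewness(x):
    """Return (S_2, S_2') computed from the ordered sample via w_2 and w_3."""
    xs = sorted(x)  # the L-skewness formulas use the order statistics x_(i)
    n = len(xs)
    mean = fsum(xs) / n
    # enumerate from i = 1 so that v is the order statistic x_(i);
    # the terms with i = 1 (and i = 2 for w_3) are zero, matching the sums' limits
    w2 = fsum((i - 1) * v for i, v in enumerate(xs, start=1)) / (n * (n - 1))
    w3 = fsum((i - 1) * (i - 2) * v for i, v in enumerate(xs, start=1)) / (
        n * (n - 1) * (n - 2)
    )
    l2 = 2 * w2 - mean
    l3 = 6 * w3 - 6 * w2 + mean
    s2 = l3 / l2
    return s2, (1 + s2) / (1 - s2)
```

For an evenly spaced sample such as `[1, 2, 3, 4, 5]` both `sample_skewness` and the first component of `l_skewness` are zero and the ratio index equals one, which matches the interpretation of symmetry given in the text.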
Both $S_1$ and $S_2$ ($S_2'$) are measures of skewness (whose numerical values quantify symmetry or asymmetry as the case may be), with no widely used test statistics associated with them. The distribution-free test of symmetry, however, tests whether a univariate distribution is symmetric about some unknown value against a broad class of asymmetric distribution alternatives.

To discuss the test statistic proposed by Randles et al. [2] based on the $n$ unordered observations of $X$, first consider every triple $(X_i, X_j, X_k)$, $1 \le i < j < k \le n$ (all the notation used in this paper is consistent with the discussion given in Wolfe and Hollander [5]). A set of three distinct observations is called a right triple when the middle observation is closer to the smallest than to the largest (and hence is skewed to the right), and is called a left triple when the middle observation is closer to the largest than to the smallest (and hence is skewed to the left). Define

$$f^*(X_i, X_j, X_k) = \operatorname{sign}(X_i + X_j - 2X_k) + \operatorname{sign}(X_i + X_k - 2X_j) + \operatorname{sign}(X_j + X_k - 2X_i),$$

where $\operatorname{sign}(y) = -1, 1, 0$ if $y$ is less than, greater than or equal to 0, respectively. If $f^*(X_i, X_j, X_k) = 1$, the triple is a right triple; if its value is $-1$, it is a left triple; and if its value is 0, it is neither a left nor a right triple. Note that the test statistic is well defined when zeros occur in the computation of $X_i + X_j - 2X_k$ for any $i, j, k$.

We then compute the following for the entire dataset. Let

$$T = [\text{number of right triples}] - [\text{number of left triples}];$$

for each fixed $t = 1, \ldots, n$, let

$$B_t = [\text{number of right triples with } X_t] - [\text{number of left triples with } X_t];$$

and for each fixed pair $(s, t)$, $1 \le s < t \le n$, let

$$B_{s,t} = [\text{number of right triples with } X_s, X_t] - [\text{number of left triples with } X_s, X_t].$$

The test statistic is based on the above combinations of the number of right and left triples in the entire dataset and, when $n$ is large, its distribution is well approximated by the normal distribution. In particular, the test statistic $S_3$ is given by

$$S_3 = \frac{T}{\hat{\sigma}},$$

where

$$\hat{\sigma}^2 = \frac{(n-3)(n-4)}{(n-1)(n-2)} \sum_{t=1}^{n} B_t^2 + \frac{(n-3)}{(n-4)} \sum_{s=1}^{n-1} \sum_{t=s+1}^{n} B_{s,t}^2 + \frac{n(n-1)(n-2)}{6} - \left[ 1 - \frac{(n-3)(n-4)(n-5)}{n(n-1)(n-2)} \right] T^2.$$

From a sample of size $n$, there are $\binom{n}{3}$ distinct triples, and if the null hypothesis of symmetry holds, then we expect half of them to be right triples and half of them to be left triples.
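The counting of right and left triples can be sketched as follows. This is a brute-force illustration, not the authors' S-PLUS implementation; the names `f_star` and `triples_test` are hypothetical. It assumes that the variance estimate reads as the sum of the $B_t^2$ term, the $B_{s,t}^2$ term and $n(n-1)(n-2)/6$, minus the bracketed multiple of $T^2$ (restated in the code comments), and it uses the large-sample normal approximation for a two-sided p-value.

```python
from itertools import combinations
from math import erfc, sqrt


def sign(y):
    """Return -1, 0 or 1 according to the sign of y."""
    return (y > 0) - (y < 0)


def f_star(a, b, c):
    """+1 for a right triple, -1 for a left triple, 0 otherwise."""
    return sign(a + b - 2 * c) + sign(a + c - 2 * b) + sign(b + c - 2 * a)


def triples_test(x):
    """Return (S_3, two-sided p-value) using the normal approximation."""
    n = len(x)
    if n < 5:
        raise ValueError("the variance estimate divides by (n - 4); need n >= 5")
    T = 0              # right triples minus left triples, whole sample
    B = [0] * n        # same difference over triples containing observation t
    B_pair = {}        # same difference over triples containing the pair (s, t)
    for i, j, k in combinations(range(n), 3):
        f = f_star(x[i], x[j], x[k])
        T += f
        for t in (i, j, k):
            B[t] += f
        for pair in ((i, j), (i, k), (j, k)):
            B_pair[pair] = B_pair.get(pair, 0) + f
    # Assumed form of the variance estimate (see the lead-in above)
    var = (
        (n - 3) * (n - 4) / ((n - 1) * (n - 2)) * sum(b * b for b in B)
        + (n - 3) / (n - 4) * sum(b * b for b in B_pair.values())
        + n * (n - 1) * (n - 2) / 6
        - (1 - (n - 3) * (n - 4) * (n - 5) / (n * (n - 1) * (n - 2))) * T * T
    )
    if var <= 0:
        raise ValueError("non-positive variance estimate")
    s3 = T / sqrt(var)
    p_two_sided = erfc(abs(s3) / sqrt(2))  # equals 2 * (1 - Phi(|S_3|))
    return s3, p_two_sided
```

The loop visits every triple once, so the running time grows roughly with the cube of the sample size, which is the computational burden the text refers to.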
Roughly speaking, any substantial deviation in either direction (either more right triples or more left triples) is indicative of asymmetry in the underlying population. The null hypothesis of symmetry against the general alternative of asymmetry, at a specified level of significance $\alpha$ and large $n$, is rejected if $|S_3| \ge z_{\alpha/2}$. Appropriate one-sided tests can be done to check for specific deviations (right or left skewness) from symmetry (see Wolfe and Hollander [5] for further details). Although computationally intensive, the results from this test are accurate even for small sample sizes and display good power in detecting asymmetric distributions as compared to sample skewness measures [2]. We have written functions for computing this test statistic as well as the L-skewness in S-PLUS (version 6.0, Release 1).

3 Some considerations

3.1 Accuracy and interpretability

The sample skewness coefficient is sensitive to even small changes in the tail of the distribution, whereas L-skewness and symmetry tests are sensitive to changes in the shape of the main portion (in the middle as opposed to the tail). The sample skewness coefficient is susceptible to moderate outliers in the sample, since cubes of extreme deviations are highly influential. Royston [3] further demonstrated that the sample skewness coefficient is a poor estimator of skewness in skew distributions as compared to the L-skewness, which is more reliable. The symmetry test is not effective at identifying asymmetry when sample sizes are small (<20) [2,6]. Both the symmetry test and the L-skewness can provide a measure of relative skewness, whereas the sample skewness coefficient is less interpretable in terms of the distribution features.

3.2 Complexity

The sample skewness coefficient is the easiest to compute, followed by the L-skewness and the symmetry test statistics. The L-skewness requires the data to be sorted in increasing order, whereas the symmetry test requires considering every triple of observations to compute the test statistic. Although the symmetry test displays good power, it is computationally intensive when the sample size is large.

3.3 Accessibility

The sample skewness coefficient is part of many standard statistical packages and hence is accessible and also efficient in terms of the time required for computation. Our readily available S-PLUS functions make it feasible to compute the L-skewness and perform the symmetry test. There are, however, trade-offs between computation time and power for the symmetry test when the sample size is large.
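To give a feel for the computational burden mentioned in Section 3.2, the short sketch below counts how many triples of observations the symmetry test has to examine for a few sample sizes. The helper name `triple_count` is hypothetical, and the sample sizes are chosen only for illustration.

```python
from math import comb


def triple_count(n):
    """Number of distinct triples among n observations, i.e. n choose 3."""
    return comb(n, 3)


# Illustrative sample sizes; the count grows roughly like n**3 / 6
for n in (10, 30, 75, 100, 250):
    print(f"n = {n:3d}: {triple_count(n):,} triples")
```

Sorting for the L-skewness, by comparison, needs on the order of n log n operations, which is why the triple count dominates for large samples.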
4 An illustration

Between January 1974 and May 1984, Mayo Clinic conducted a double-blinded randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo [7]. There were 424 patients who met the eligibility criteria seen at the Clinic while the trial was open for patient registration: 312 cases had complete data, 112 cases did not participate in the trial but consented to have basic measurements recorded, and six were lost to follow-up. This dataset has been used for several purposes: estimating survival distribution, testing for differences between the drug and placebo groups, and estimating covariate effects via a regression model. The variables of interest are serum cholesterol (mg/dl), albumin (gm/dl), and triglycerides (mg/dl), for which we check symmetry using all three approaches. As a first step, an exploratory analysis (Figure 1) revealed asymmetric distributions for all the variables except albumin (which is symmetric once a few outlying values are removed).

Figure 1. Histograms of serum cholesterol, albumin and triglycerides from the primary biliary cirrhosis (PBC) study.

The results from the formal computations of the skewness indices (Table 1) support the visual observation that all of the variables are positively skewed except albumin, which has a slight negative skew. The removal of a few small values (≤ 2.6 gm/dl) makes the distribution of albumin symmetric (Table 1 and Figure 1).

Table 1. Skewness indices for the variables from the primary biliary cirrhosis (PBC) study

Variable             S_1     S_2 (S_2')     S_3 statistic (p-value)
Serum cholesterol    3.39    0.43 (2.49)    … (<0.001)
Triglycerides        2.51    0.28 (1.77)    9.13 (<0.001)
Albumin              …       … (0.86)       3.02 (0.002)
Albumin (>2.6)       …       … (0.98)       0.59 (0.56)

p-values reported for S_3 statistics are two-sided.

One major issue with using $S_1$, $S_2$ ($S_2'$) or $S_3$ as skewness indicators is quantification. Before we proceed, it is important to note that we cannot compare across the measures, as the ranges are different; hence all comparisons are made within a skewness index, across the variables. The first thing to note is that all the variables (except albumin) are positively skewed, irrespective of which measure is used, as the deviations from symmetry are caused by changes in the tails as well as the middle of the distribution. For instance, $S_1$ is 3.39 units above zero (where zero indicates symmetry) in the case of serum cholesterol, and 2.51 units above zero in the case of triglycerides. $S_2$ ($S_2'$) is 0.43 units above zero (1.49 units above one) for serum cholesterol, where zero indicates symmetry for $S_2$ and one indicates symmetry for $S_2'$, and 0.28 units above zero (0.77 units above one) for triglycerides. If we go by the magnitude of the sample skewness coefficient and/or the L-skewness, would it mean that the distribution of serum cholesterol is more positively skewed than that of triglycerides and, if so, by how much? The case is similar with $S_3$, where the p-values for both serum cholesterol and triglycerides are very small, thus indicating the presence of asymmetry in the distribution. Under these circumstances, the L-skewness index, $S_2'$, can provide some indication of the relative skewness, as discussed below.

Let us explore the triglyceride variable a little further. In addition to the histogram of the raw data (Figure 1), a quantile-quantile (Q-Q) plot of triglycerides on the linear and logarithmic scales (Figure 2) shows that a log-transformation makes the distribution of triglycerides approximately normal (symmetric). Computation of the skewness indices ($S_1 = 0.35$; $S_2 = 0.05$ ($S_2' = 1.10$); $S_3 = 1.57$, p-value $= 0.06$) on the log-transformed variable provides evidence of a considerable reduction in the asymmetry. In terms of interpretability, the L-skewness index, $S_2'$, of the untransformed triglyceride variable suggests that the upper tail (in samples of size 3) is about two times longer than the lower tail, whereas in the log-transformed variable the upper tail is only about 10% longer.

Figure 2. Q-Q plots of the triglyceride variable on linear and logarithmic scales.

5 Are these necessary and sufficient?

As a simple illustration, we generate random samples of size 10, 30, 75, 100 and 250 from a standard normal distribution (mean = 0, variance = 1), and compute the skewness indices based on all three methods (Table 2).

Table 2. Skewness indices for the simulations from a standard normal distribution

Sample size    S_1    S_2 (S_2')    S_3 statistic (p-value)
10             …      … (0.04)      0.22 (0.83)
30             …      … (0.48)      0.28 (0.78)
75             …      … (2.99)      0.69 (0.49)
100            …      … (3.7)       0.36 (0.72)
250            …      … (9.4)       1.16 (0.25)

p-values reported for S_3 statistics are two-sided.

Based on the values of the sample skewness coefficient and the L-skewness, we would conclude that the data are not symmetric for any sample size. Now, these samples are generated from a standard normal distribution (a symmetric distribution); however, the Q-Q plots (Figure 3 gives Q-Q plots for sample sizes 30 and 250) show the presence of longer tails and slight deviations toward the middle of the distribution as compared to a standard normal distribution. Since there are changes in both the tail and the middle of the distribution, they are captured by both the sample skewness coefficient and the L-skewness indices and interpreted as asymmetry. Remember that the sample skewness coefficient is very sensitive to even small changes in the extremes, whereas the L-skewness is responsive when there are overall changes to the shape of the distribution, particularly in the middle. Here is a classic case where we should not make decisions on data transformations to meet the normality assumption (as in our case, the underlying data are already normal) based on these two skewness measures. Also, there is a lack of consistency across the two skewness indices in terms of conclusions of asymmetry.
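The kind of simulation described in this section can be sketched as follows. This is a minimal illustration that uses Python's standard library rather than S-PLUS; the function names and the seed value are arbitrary assumptions, so the numbers it produces will not reproduce Table 2. It only shows that the sample skewness coefficient of a finite normal sample is generally not exactly zero.

```python
import random
from math import fsum


def sample_skewness(x):
    """S_1 = m_3 / m_2**1.5, with m_r the r-th sample moment about the mean."""
    n = len(x)
    mean = fsum(x) / n
    m2 = fsum((v - mean) ** 2 for v in x) / n
    m3 = fsum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def simulate(sizes=(10, 30, 75, 100, 250), seed=2003):
    """Draw one standard normal sample per size and return {n: S_1}."""
    rng = random.Random(seed)  # arbitrary seed, fixed only for repeatability
    return {
        n: sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for n in sizes
    }


results = simulate()
for n, s1 in results.items():
    print(f"n = {n:3d}: S_1 = {s1:+.3f}")
```

Re-running with different seeds gives a sense of how much the coefficient fluctuates from sample to sample even though the parent distribution is symmetric.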
Based on $S_1$, all of the samples except the first and the last ($n = 10, 250$) show evidence of a negative skew, whereas based on $S_2$ ($S_2'$), only the first two ($n = 10, 30$) are classified as negatively skewed. Based on the symmetry test, we conclude that the data are symmetric for every sample size, which is to be expected as all of the samples are generated from a normal (symmetric) distribution.

6 Discussion

The big question is: which measure is appropriate? It probably suffices to say that this is situation dependent, particularly on how important the symmetry of the underlying data is for the purposes of the study. The first step would still be to plot the data quickly to get a sense of their distribution. As we have shown, several factors such as sample size, interpretability, complexity and accessibility play a vital role in the selection of the skewness measure. Each measure has its own share of positives and negatives. The sample skewness coefficient and the L-skewness are both readily available (the L-skewness is not routinely produced as part of statistical output, but can be coded easily and quickly) and computationally less intensive than the symmetry test. Between them, it has been shown that the L-skewness is more interpretable and less sensitive to extreme deviations in the tails. The symmetry test displays good power in detecting asymmetry against a broad class of asymmetric distribution alternatives. We therefore propose that if complexity and computational time are not constraints (mainly in the case of large sample sizes), then the symmetry test is considerably better than either of the summary skewness measures.

References

1 Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B 1990; 52:
2 Randles RH, Fligner MA, Policello GE II, Wolfe DA. An asymptotically distribution-free test for symmetry versus asymmetry. Journal of the American Statistical Association 1980; 75(369):
3 Royston P. Which measures of skewness and kurtosis are best? Statistics in Medicine 1990; 11:
4 Gupta MK. An asymptotically nonparametric test of symmetry. Annals of Mathematical Statistics 1967; 38(3):
5 Wolfe DA, Hollander M. Nonparametric Statistical Methods, 2nd edn. New York: John Wiley,
6 Davis CE, Quade D. U-statistics for skewness or symmetry. Communications in Statistics: Theory and Methods 1978; A7(5):
7 Fleming TR, Harrington DP. Counting processes and survival analysis. New York: John Wiley, 1991.
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationUsing kernel methods to visualise crime data
Submission for the 2013 IAOS Prize for Young Statisticians Using kernel methods to visualise crime data Dr. Kieran Martin and Dr. Martin Ralphs kieran.martin@ons.gov.uk martin.ralphs@ons.gov.uk Office
More informationWeek 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
More informationSample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance
More informationHow To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More informationTwo-Sample T-Tests Assuming Equal Variance (Enter Means)
Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of
More informationTesting for differences I exercises with SPSS
Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More informationComparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
More informationNormal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.
Normal Distribution Definition A continuous random variable has a normal distribution if its probability density e -(y -µ Y ) 2 2 / 2 σ function can be written as for < y < as Y f ( y ) = 1 σ Y 2 π Notation:
More informationTwo-Sample T-Tests Allowing Unequal Variance (Enter Difference)
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
More informationChapter 7 Section 7.1: Inference for the Mean of a Population
Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used
More information7 Generalized Estimating Equations
Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can
More informationHISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
More informationLecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
More informationNon-Inferiority Tests for Two Proportions
Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationMeasuring Skewness: A Forgotten Statistic?
Measuring Skewness: A Forgotten Statistic? David P. Doane Oakland University Lori E. Seward University of Colorado Journal of Statistics Education Volume 19, Number 2(2011), www.amstat.org/publications/jse/v19n2/doane.pdf
More informationNCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
More informationSTATISTICAL SIGNIFICANCE OF RANKING PARADOXES
STATISTICAL SIGNIFICANCE OF RANKING PARADOXES Anna E. Bargagliotti and Raymond N. Greenwell Department of Mathematical Sciences and Department of Mathematics University of Memphis and Hofstra University
More informationChapter 1 Introduction. 1.1 Introduction
Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations
More informationBA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420
BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test
More informationThe Assumption(s) of Normality
The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew
More informationStat 5102 Notes: Nonparametric Tests and. confidence interval
Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationPie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.
Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationSurvival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
More informationNonparametric Statistics
Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric
More informationDepartment of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationQuantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
More informationThis unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
More informationHypothesis testing. c 2014, Jeffrey S. Simonoff 1
Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there
More informationAlgebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationProjects Involving Statistics (& SPSS)
Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationDuration and Bond Price Volatility: Some Further Results
JOURNAL OF ECONOMICS AND FINANCE EDUCATION Volume 4 Summer 2005 Number 1 Duration and Bond Price Volatility: Some Further Results Hassan Shirvani 1 and Barry Wilbratte 2 Abstract This paper evaluates the
More informationMonte Carlo analysis used for Contingency estimating.
Monte Carlo analysis used for Contingency estimating. Author s identification number: Date of authorship: July 24, 2007 Page: 1 of 15 TABLE OF CONTENTS: LIST OF TABLES:...3 LIST OF FIGURES:...3 ABSTRACT:...4
More informationCenter: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More information1.7 Graphs of Functions
64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most
More information