# CHAPTER THREE: COMMON DESCRIPTIVE STATISTICS


The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance, standard error of the mean, and confidence intervals. These statistics summarize data and provide information about the sample from which the data were drawn and the accuracy with which the sample represents the population of interest. The mean, median, and mode are measurements of the central tendency of the data. The range, standard deviation, and variance describe the dispersion or variability of the data about the measurements of central tendency, while the standard error of the mean and confidence intervals describe how precisely the sample estimates the population mean.

MEASUREMENTS OF CENTRAL TENDENCY

The appropriateness of using the mean, median, or mode in data analysis depends on the nature of the data set and its distribution (normal vs non-normal).

The mean (denoted by $\bar{x}$) is calculated by dividing the sum of the individual data points (where $\Sigma$ means "sum of") by the number of observations (denoted by $n$). It is the arithmetic average of the observations and is used to describe the center of a data set.

$$\text{mean} = \bar{x} = \frac{\sum x}{n}$$

The mean is commonly used to describe numerical data that are normally distributed. It is very sensitive to extreme values in the data set. For example, the mean of the data set {1,2,3,4,5} is 15/5 or 3. If the number 20 is substituted for the 5, the data set becomes {1,2,3,4,20} and the mean is 30/5 or 6. Whereas a mean of 3 accurately describes the center of the first data set, a mean of 6 does not accurately describe the center of the second data set, since four of its five values lie below 6. Thus, the mean is sensitive to extreme values or outliers and may not accurately represent the true center of the data if such outlying values are present. The mean is most appropriate when the data are approximately normally distributed, or at least symmetric with no extreme values, as is the case in the first data set.

The median is another method of describing the center of a data set.
It is the middle value of the ordered data if the number of observations, $n$, is odd, or the average of the two middle values if $n$ is even. By definition, half of the data points lie above the median and half lie below it. For example, the median for each of the above data sets is 3 despite the outlying value of 20 in the second data set. The median is therefore useful for describing the center of a data set that is non-normally distributed, as is the case in the second data set. It is also commonly used with ordinal data, which are ranked but not truly numerical.

The mode is the value which occurs most frequently in the data. There may be one or more modes for each data set, and this makes the mode a useful method for describing a population that is bimodal. In the data set {2,3,4,4,5,6,7,7,8,9}, for example, there are two modes, 4 and 7. Whereas a data set may have one or more modes, there can only be one mean and one median.

The mean, median, and mode can be used to evaluate the symmetry of a data distribution. This is essential to choosing the appropriate statistical test. If the mean and median are approximately equal, the data are usually symmetric, which is consistent with a normal distribution, and a test for normally distributed data may be used. If the mean and median differ markedly, the data are likely skewed and a test for non-normally distributed data is appropriate.

MEASUREMENTS OF VARIABILITY

As we have seen, the mean, median, and mode are used to describe the central tendency of the data. Used alone, however, they do not adequately describe a data set. We also need a way of accurately describing the dispersion or variability of the data about the measurements of central tendency. Remember that it is this variability that mandates the use of statistical analysis.

One method of describing this variability is the range, which is defined as the difference between the smallest and largest values in the data set.
It is frequently reported as the minimum and maximum values and is used to demonstrate the presence of extreme values or outliers, which tend to pull the mean (and, to a much lesser extent, the median) in one direction or another.
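
As a rough illustration of the measures described so far, the sketch below applies Python's standard `statistics` module to the three example data sets from the text. The helper name `describe` is an assumption introduced here for illustration, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode


def describe(values):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    lowest, highest = min(values), max(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),   # every value tied for most frequent
        "min": lowest,
        "max": highest,
        "range": highest - lowest,    # difference between largest and smallest
    }


# The three example data sets used in the text.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 20]
bimodal = [2, 3, 4, 4, 5, 6, 7, 7, 8, 9]

for name, data in [("symmetric", symmetric),
                   ("with_outlier", with_outlier),
                   ("bimodal", bimodal)]:
    print(name, describe(data))
```

For the data set containing the outlier, the mean moves from 3 to 6 while the median stays at 3, and the range widens from 4 to 19. Note that when every value occurs once, as in the first two data sets, `multimode` returns all of the values, which is a reminder that the mode is only informative when some values repeat.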

A PRACTICAL GUIDE TO BIOSTATISTICS

Perhaps the most commonly used method to describe the variability of a data set is the standard deviation. It is the cornerstone of most of the commonly used statistical tests. The standard deviation is an estimate of the average distance of the values from their mean. Assuming the data are normally distributed (i.e., they follow a bell-shaped curve with half of the data lying on either side of the mean), approximately 68% of the data will lie within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and approximately 99.7% (often rounded to 99%) within 3 standard deviations.

Figure 3-1: The Normal Distribution (horizontal axis marked at $\bar{x} - 2\,sd$, $\bar{x} - 1\,sd$, $\bar{x}$, $\bar{x} + 1\,sd$, and $\bar{x} + 2\,sd$; regions labeled 68%, 95%, and 99%)

The standard deviation is easily calculated by virtually any computer, but the equation is included below as we will use it repeatedly in describing the commonly used statistical tests for data analysis. The variance is the square of the standard deviation.

$$\text{standard deviation} = sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $x$ = each data point, $\bar{x}$ = mean, and $n$ = number of observations.

Assume we wish to calculate the mean and standard deviation for a series of serum sodium measurements to determine their central tendency and variability. Such calculations are the basis for the normal laboratory ranges which we use every day. The normal range for most laboratory tests is defined as the mean ± 2 standard deviations, which encompasses the test results for approximately 95% of the population. Although easily calculated with a computer, creation of a table such as the one below illustrates the steps involved in calculating the standard deviation.

Table columns: Serum Na ($x$), $(x - \bar{x})$, $(x - \bar{x})^2$

mean $= \bar{x} = 140$; sum of $(x - \bar{x})^2 = 81$

$$sd = \sqrt{\frac{81}{9}} = \sqrt{9} = 3$$

In describing these data, we would state that the mean is 140 mEq/L with a standard deviation of 3 and variance of 9, and that approximately 95% of the values are within 6 mEq/L of the mean (2 standard deviations). This is similar to the normal clinical range for serum sodium ( mEq/L).
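
The steps in the standard deviation calculation can be sketched in Python as follows. The individual sodium values below are hypothetical and are not the values from the book's table; the function name `sample_sd` is likewise an assumption made for illustration. The last line reproduces only the summary arithmetic given in the text (a sum of squared deviations of 81 divided by $n - 1 = 9$).

```python
import math
from statistics import stdev


def sample_sd(values):
    """Return (mean, sum of squared deviations, variance, standard deviation)."""
    n = len(values)
    xbar = sum(values) / n
    # Sum of squared deviations of each point from the mean.
    sum_sq = sum((x - xbar) ** 2 for x in values)
    variance = sum_sq / (n - 1)          # sample variance uses n - 1
    return xbar, sum_sq, variance, math.sqrt(variance)


# Hypothetical serum sodium values in mEq/L (NOT the book's data).
sodium = [136, 137, 138, 139, 140, 140, 141, 142, 143, 144]

xbar, sum_sq, variance, sd = sample_sd(sodium)
print(f"mean={xbar}, sum of squares={sum_sq}, variance={variance:.2f}, sd={sd:.2f}")
print("matches statistics.stdev:", math.isclose(sd, stdev(sodium)))
print(f"mean ± 2 sd: {xbar - 2 * sd:.1f} to {xbar + 2 * sd:.1f}")

# Summary arithmetic from the text: sqrt(81 / 9) = 3
print(math.sqrt(81 / 9))
```

Dividing by $n - 1$ rather than $n$ matches the equation above and the behavior of `statistics.stdev`; `statistics.pstdev` divides by $n$ and would be the choice only if the data were the entire population.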

The mean, standard deviation, and variance are inappropriate if the data contain significant outliers or are skewed (i.e., non-normally distributed). In such a situation, the median and range may more accurately describe the central tendency and dispersion of the data set. Charts and other graphics can similarly be used to describe data where the variability is notable.

STANDARD ERROR OF THE MEAN

The standard error of the mean (se) is frequently confused with the standard deviation. Whereas the standard deviation describes the variability of the data about the mean, the standard error of the mean estimates how closely the sample mean approximates the population mean. It is essentially the standard deviation of all possible sample means if we were to repeatedly draw samples from our population and calculate the mean for a particular variable of interest in each sample. It is calculated by dividing the standard deviation by the square root of $n$, the number of observations, and will always be smaller than the standard deviation whenever $n$ is greater than 1.

$$se = \frac{sd}{\sqrt{n}}$$

As the sample size ($n$) increases and approaches the population size, the standard error of the mean approaches zero, as the sample mean will more closely approximate the true population mean. The standard deviation, however, remains fairly constant with changes in sample size, as the sample standard deviation is an estimate of the population standard deviation whereas the standard error of the mean is not.

The standard error of the mean is useful in comparing the means of two samples to determine whether they are representative of the same population. It does not describe the variability of the data, as does the standard deviation, and should not be used in place of the standard deviation. The mean and standard deviation allow the clinician to apply the data to individual patients. The standard error of the mean describes only the sample as a group and cannot be applied to patients clinically.
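
To make the relationship between the standard deviation and the standard error concrete, the short sketch below holds the standard deviation fixed at 3 (the value from the serum sodium example) and varies the sample size. The function name `standard_error` and the chosen sample sizes are assumptions for illustration only.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sd divided by the square root of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


# With sd held at 3, each fourfold increase in n halves the standard error.
for n in (10, 40, 160, 640):
    print(f"n={n:>3}  se={standard_error(3, n):.3f}")
```

With $sd = 3$ and $n = 10$ the standard error is about 0.95; it falls toward zero as $n$ grows even though the standard deviation itself is unchanged.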

The following example illustrates the use of descriptive statistics in the reporting of data. The Preoperative Evaluation study compared surgical patients who had pulmonary artery catheters placed preoperatively in an intensive care unit ("study patients") with those who had catheters placed in the operating room prior to their operation ("control patients"). The data presented are the group means for intensive care unit (ICU) and total hospital length of stay (LOS) and for ICU and total hospital charges.

Preoperative Pulmonary Artery Catheterization

| | ICU LOS | Hospital LOS | ICU Charges | Hospital Charges |
|---|---|---|---|---|
| Study | 8.7 days | 33.2 days | \$38,557 | \$78,887 |
| Control | 5.7 days | 21.4 days | \$21,140 | \$56,517 |

Study: patients with pulmonary artery catheters placed preoperatively in the ICU. Control: patients with pulmonary artery catheters placed in the operating room. Data presented as the mean of each group.

From these data, we might conclude that patients who had pulmonary artery catheters placed preoperatively in the ICU had increased ICU and total hospital LOS and increased ICU and total hospital charges compared to patients whose pulmonary artery catheter was placed in the operating room. Such conclusions might be justified if the data were normally distributed, but without some measure of variability we do not know that to be true. If we review the raw data, in fact, we find that the study data are heavily skewed by a single patient (whom we will call "patient E") who had an ICU LOS of 179 days, a hospital LOS of 254 days, and total hospital charges of \$838,282. Since the data are skewed and therefore non-normally distributed, we cannot use the mean to accurately describe the central tendency of the data, and any conclusions we might make using the currently available data would be invalid. Consider the effect on the data when patient E is excluded from the data analysis.

Without Patient E:

| | ICU LOS | Hospital LOS | ICU Charges | Hospital Charges |
|---|---|---|---|---|
| Study | 4.6 days | 20.3 days | \$15,031 | \$49,289 |
| Control | 5.7 days | 21.4 days | \$21,140 | \$56,517 |

After excluding patient E from the analysis, our conclusions would be very different. We would now conclude that preoperative evaluation decreases ICU and total hospital LOS as well as ICU and total hospital charges. From a statistical standpoint, however, patient E cannot be excluded from the data analysis, as this introduces selection bias into the data and clearly alters the results of the study. A better method for presenting these non-normally distributed data would be to report not only the mean, but also the median and range of each group to demonstrate the presence and impact of outlying data such as that of patient E.

Preoperative Pulmonary Artery Catheterization

| Group | Statistic | ICU LOS | Hospital LOS | ICU Charges | Hospital Charges |
|---|---|---|---|---|---|
| Study | mean | 8.7 days | 33.2 days | \$38,557 | \$78,887 |
| | median | 3 days | 15 days | \$10,496 | \$39,758 |
| | range | days | days | \$1,123-\$304,152 | \$7,144-\$838,282 |
| Control | mean | 5.7 days | 21.4 days | \$21,140 | \$56,517 |
| | median | 3 days | 14 days | \$9,836 | \$40,314 |
| | range | 1-43 days | days | \$1,714-\$188,245 | \$9,644-\$289,045 |

When appropriate statistics for non-normally distributed data are used, it becomes clear that the two groups are similar and that our original conclusions, which were dependent upon the data being normally distributed, were in error. Presenting the data in this way makes the presence and effect of outliers clear to the reader. It also emphasizes the importance of reviewing the raw data to establish the presence of significant trends or errors in data entry that might be missed if the data were analyzed using only the mean. This can be done most effectively by tabulating the occurrences of each value (a frequency distribution) or graphing the data (a histogram or scatter plot) such that outliers, skewed distributions, and data entry errors become readily apparent.
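
The kind of tabulation described above can be sketched in a few lines of Python. The lengths of stay below are hypothetical, since the study's raw data are not reproduced in the text; only the single value of 179 days echoes patient E. The helper name `frequency_table` is an assumption for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ICU lengths of stay in days (NOT the study's raw data).
icu_los = [1, 2, 2, 3, 3, 3, 4, 5, 6, 179]


def frequency_table(values):
    """Return (value, count) pairs sorted by value: a frequency distribution."""
    return sorted(Counter(values).items())


# A crude text histogram: one '#' per observation at each value.
for value, count in frequency_table(icu_los):
    print(f"{value:>4} days | {'#' * count}")

print(f"mean   = {mean(icu_los):.1f} days")
print(f"median = {median(icu_los)} days")
print(f"range  = {min(icu_los)}-{max(icu_los)} days")
```

In this made-up sample the lone 179-day stay sits far from every other row of the tabulation and pulls the mean to 20.8 days, while the median remains 3 days. Reporting the median and range alongside the mean exposes the outlier without removing it from the analysis.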

Had we not identified the effect of patient E on the data, we might easily have concluded that a significant difference existed between the two groups (committing a Type I error) and have altered our clinical practices based on these erroneous conclusions.

CONFIDENCE INTERVALS

Probability limits establish a range and specify the probability that an observed value occurs within that range. They can be used to determine whether a particular value is likely to have come from a specified population. For example, the 95% probability interval (assuming a normally distributed data set) is approximately given by the mean ± 2 standard deviations.

Confidence intervals are similar to probability limits. They establish a range and specify the probability of the true population mean being within that range. They are most commonly reported as the 95% confidence interval, which is defined as the mean ± 2 standard errors of the mean.

$$95\%\ \text{confidence interval} = \bar{x} \pm 2\,se$$

If we were to look at repeated samples from our population of interest and calculate the 95% confidence interval for each, 95% of the confidence intervals so obtained would contain the true population mean. A 99% confidence interval represents the mean ± approximately 2.6 standard errors of the mean. The width of the confidence interval is determined by the standard error of the mean and therefore by the sample size; as the sample size increases, the confidence interval becomes narrower as the sample and true population means approach one another. Similarly, a narrow confidence interval suggests that the sample is very representative of the population from which it was drawn.

In the situation in which the data are skewed or non-normally distributed, a confidence interval for the median, rather than the mean, may be preferable. Another option is to transform the data to a normal distribution so that they can be analyzed using the mean and standard deviation (see Chapter Six).
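
As a minimal sketch of the confidence interval calculation, the code below uses the summary values from the serum sodium example (mean 140, standard deviation 3) together with $n = 10$, which is inferred from the denominator of 9 in that example. The function name `confidence_interval` and its `multiplier` parameter are assumptions for illustration. The multiplier of 2 follows the approximation used in the text; for a sample this small, a multiplier taken from the t distribution would be somewhat larger.

```python
import math


def confidence_interval(xbar, sd, n, multiplier=2.0):
    """Return (lower, upper) limits: mean ± multiplier × standard error."""
    se = sd / math.sqrt(n)
    return xbar - multiplier * se, xbar + multiplier * se


low, high = confidence_interval(140, 3, 10)
print(f"approximate 95% CI: {low:.1f} to {high:.1f}")

# A larger sample with the same sd gives a narrower interval.
low_big, high_big = confidence_interval(140, 3, 1000)
print(f"same sd, n=1000:    {low_big:.1f} to {high_big:.1f}")
```

The two numbers returned are the confidence limits; the range between them is the confidence interval.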

Confidence limits and confidence intervals are occasionally used interchangeably in the medical literature. Technically, confidence limits refer to the outer boundaries of the interval while the confidence interval refers to the range of values between the limits. It is helpful when reporting a sample mean as an estimate of a population mean to report the confidence limits, as they describe how certain you are that the sample mean represents the true population mean. Confidence intervals also provide information on the size of the difference between groups, whereas the more commonly used p-values only indicate whether a significant difference exists. Some researchers thus prefer to report confidence intervals instead of, or in addition to, p-values, as the confidence interval provides more information on the data and its validity than does a p-value alone. Further, confidence intervals provide information on important effects that, although not statistically significant, may be useful clinically. They are also helpful in determining whether the sample size was large enough to detect a significant difference to begin with (i.e., whether the study had sufficient statistical power).

COEFFICIENT OF VARIATION, BIAS, AND PRECISION

Several descriptive statistics exist for the special situation in which we wish to compare two laboratory tests or measurements to determine whether they accurately measure the same physiologic parameter. The coefficient of variation is one method that is frequently used to describe the reliability of a laboratory test or measurement. It is calculated by dividing the standard deviation by the mean and multiplying by 100%.

$$\text{coefficient of variation} = cov = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

As it has no units, the coefficient of variation can be used to compare two tests which are measured on different scales. It is commonly used to describe the reliability and measurement error associated with electronic instruments and monitors. For example, a cardiac output computer might be described as having a coefficient of variation of 5%, which means that the standard deviation of sequential cardiac output measurements is about 5% of their mean.

Two other statistics which are used to gauge measurement accuracy are bias and precision. Note that this bias is different from statistical bias, which was discussed in the previous chapter. The bias of two methods is simply the difference between their measurements. For example, if a cardiac output by thermodilution is 3.4 L/min and that by radionuclide ventriculography is 3.2 L/min, the bias involved in these two measurements is 3.4 - 3.2, or 0.2 L/min. Precision is defined as the standard deviation of the bias (the difference between measurements) and is a measure of the variability of the difference.
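
These three statistics can be sketched as follows. Only the first pair of cardiac outputs (3.4 and 3.2 L/min) comes from the text; the remaining pairs are hypothetical. Summarizing the bias over several pairs as the mean of the paired differences is an assumption made here, as the text defines bias only for a single pair. The function names are likewise illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (unitless)."""
    return stdev(values) / mean(values) * 100


def bias_and_precision(method_a, method_b):
    """Return (mean paired difference, standard deviation of the differences)."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    return mean(differences), stdev(differences)


# Paired cardiac outputs in L/min. The first pair is from the text;
# the other four pairs are made up for illustration.
thermodilution = [3.4, 4.1, 5.0, 4.6, 3.9]
radionuclide = [3.2, 4.0, 5.3, 4.4, 3.8]

bias, precision = bias_and_precision(thermodilution, radionuclide)
print(f"bias = {bias:.2f} L/min, precision = {precision:.2f} L/min")
print(f"CoV of thermodilution values = {coefficient_of_variation(thermodilution):.1f}%")
```

A small bias with a large precision value would indicate that the two methods agree on average but disagree substantially on individual measurements, which is why both numbers are reported together.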
Bias and precision are both necessary to evaluate the agreement between two methods of measurement.

DESCRIBING NOMINAL DATA

Proportions, percentages, ratios, and rates are used to describe nominal data. A proportion is defined as the number of observations possessing a given characteristic of interest divided by the total number of observations. If we are evaluating successful extubation of patients from mechanical ventilation, the proportion of patients successfully extubated is:

$$\frac{\text{successfully extubated patients}}{\text{successfully extubated} + \text{reintubated patients}}$$

A percentage is defined as a proportion multiplied by 100%. A ratio compares the incidence of an event or disease in one group with that in another group. The ratio of successfully extubated to reintubated patients is therefore:

$$\frac{\text{successfully extubated patients}}{\text{reintubated patients}}$$

The odds for an event (see Chapter One) is an example of a ratio.

A rate is defined as a proportion multiplied by a particular base and is expressed per unit time. If, for example, it is known that the proportion of male patients with angina who have an acute myocardial infarction is 0.05 (i.e., 5 out of every 100 patients) per year, the incidence rate for myocardial infarction in male patients with angina will be 0.05 × 100,000 = 5,000 per 100,000 patients per year. Rates will be discussed further in the next chapter.
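
The four measures for nominal data reduce to simple arithmetic, sketched below. The extubation counts (45 and 5) are hypothetical; the proportion of 0.05 per year is the myocardial infarction example from the text, for which 0.05 × 100,000 gives 5,000 per 100,000 patients per year. The function names are assumptions for illustration.

```python
def proportion(with_characteristic, total):
    """Observations with the characteristic divided by all observations."""
    return with_characteristic / total


def rate_per_base(prop, base=100_000):
    """Express a proportion per `base` individuals (per unit time)."""
    return prop * base


# Hypothetical counts of extubation outcomes.
extubated, reintubated = 45, 5

p = proportion(extubated, extubated + reintubated)
print(f"proportion = {p:.2f}")                        # 45 / 50
print(f"percentage = {p * 100:.0f}%")
print(f"ratio      = {extubated / reintubated:.1f}")  # 45 / 5

# Myocardial infarction example from the text: 0.05 per year.
print(f"rate       = {rate_per_base(0.05):.0f} per 100,000 patients per year")
```

Note the different denominators: the proportion divides by everyone in the group, whereas the ratio divides one subgroup by the other.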


### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

ANALYSIS OF DISCRT VARIABLS / 5 CHAPTR FIV ANALYSIS OF DISCRT VARIABLS Discrete variables are those which can only assume certain fixed values. xamples include outcome variables with results such as live

### "Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals." 1

BASIC STATISTICAL THEORY / 3 CHAPTER ONE BASIC STATISTICAL THEORY "Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals." 1 Medicine

### CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS

TABLES, CHARTS, AND GRAPHS / 75 CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS Tables, charts, and graphs are frequently used in statistics to visually communicate data. Such illustrations are also a frequent

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Means, standard deviations and. and standard errors

CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### Content Sheet 7-1: Overview of Quality Control for Quantitative Tests

Content Sheet 7-1: Overview of Quality Control for Quantitative Tests Role in quality management system Quality Control (QC) is a component of process control, and is a major element of the quality management

### Lesson 4 Measures of Central Tendency

Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### CHAPTER THREE. Key Concepts

CHAPTER THREE Key Concepts interval, ordinal, and nominal scale quantitative, qualitative continuous data, categorical or discrete data table, frequency distribution histogram, bar graph, frequency polygon,

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

### Standard Deviation Estimator

CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

### How To Write A Data Analysis

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### Week 4: Standard Error and Confidence Intervals

Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

### MEASURES OF VARIATION

NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Describing and presenting data

Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

### 2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Module 4: Data Exploration

Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### Statistical estimation using confidence intervals

0894PP_ch06 15/3/02 11:02 am Page 135 6 Statistical estimation using confidence intervals In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena

### Confidence Intervals for One Standard Deviation Using Standard Deviation

Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

### Introduction Hypothesis Methods Results Conclusions Figure 11-1: Format for scientific abstract preparation

ABSTRACT AND MANUSCRIPT PREPARATION / 69 CHAPTER ELEVEN ABSTRACT AND MANUSCRIPT PREPARATION Once data analysis is complete, the natural progression of medical research is to publish the conclusions of

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Descriptive Analysis

Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,

### COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY EXERCISE 8/5/2013. MEASURE OF CENTRAL TENDENCY: MODE (Mo) MEASURE OF CENTRAL TENDENCY: MODE (Mo)

COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY Prepared by: Jess Roel Q. Pesole CENTRAL TENDENCY -what is average or typical in a distribution Commonly Measures: 1. Mode. Median 3. Mean quantified

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

### Measures of Central Tendency and Variability: Summarizing your Data for Others

Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :

### CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics 4.1 Overview 4.2 Discrete or continuous

### Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### 4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

### Descriptive Statistics and Exploratory Data Analysis

Descriptive Statistics and Exploratory Data Analysis Dean's Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux,

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1 OVERVIEW STATISTICS PANIK...THE THEORY AND METHODS OF COLLECTING, ORGANIZING, PRESENTING, ANALYZING, AND INTERPRETING DATA SETS SO AS TO DETERMINE THEIR ESSENTIAL

### Descriptive statistics parameters: Measures of centrality

Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between

### This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### 3.2 Measures of Spread

3.2 Measures of Spread In some data sets the observations are close together, while in others they are more spread out. In addition to measures of the center, it's often important to measure the spread

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there's no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Describing Data: Measures of Central Tendency and Dispersion

100 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 8 Describing Data: Measures of Central Tendency and Dispersion In the previous chapter we

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### 99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### 430 Statistics and Financial Mathematics for Business

Prescription: 430 Statistics and Financial Mathematics for Business Elective prescription Level 4 Credit 20 Version 2 Aim Students will be able to summarise, analyse, interpret and present data, make predictions

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 Descriptive Statistics 2.1 Frequency Tables 2.2 Measures of Central Tendencies

### Using SPSS, Chapter 2: Descriptive Statistics

1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution 6.4.1 Characteristics of the Normal Distribution 6.4.2 The Standardized Normal Distribution 6.4.3 Meaning of Areas under

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Global Leadership MBA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Two-sample inference: Continuous data

Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Descriptive Statistics

Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašić University of Delaware Sarajevo Graduate School of Business Often our interest in data analysis

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### The Dummy's Guide to Data Analysis Using SPSS

The Dummy's Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Reserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Confidence Intervals for the Difference Between Two Means

Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

### 03 The full syllabus. 03 The full syllabus continued. For more information visit www.cimaglobal.com PAPER C03 FUNDAMENTALS OF BUSINESS MATHEMATICS

03 The full syllabus 03 The full syllabus continued PAPER C03 FUNDAMENTALS OF BUSINESS MATHEMATICS Syllabus overview This paper primarily deals with the tools and techniques to understand the mathematics

### Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics I. KEY CONCEPTS A. Population. Definitions: 1. The entire set of members in a group. EXAMPLES: All U.S. citizens; all Notre Dame Students. 2. All values of

### Using Excel for descriptive statistics

FACT SHEET Using Excel for descriptive statistics Introduction Biologists no longer routinely plot graphs by hand or rely on calculators to carry out difficult and tedious statistical calculations. These

### How To: Analyse & Present Data

INTRODUCTION The aim of this How To guide is to provide advice on how to analyse your data and how to present it. If you require any help with your data analysis please discuss with your divisional Clinical

### The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

### 1.7 Graphs of Functions

64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Data Transforms: Natural Logarithms and Square Roots

Data Transforms: Natural Log and Square Roots 1 Data Transforms: Natural Logarithms and Square Roots Parametric statistics in general are more powerful than non-parametric statistics as the former are based

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box