We begin with the set of data below, with the measurements indicating total protein, measured in µg/ml. This is an example of raw data.


CHAPTER 2
Descriptive Statistics

We begin with the set of data below, with the measurements indicating total protein, measured in µg/ml. This is an example of raw data.

[Table: raw data, 61 total protein measurements in µg/ml]

This is a sample of size n = 61. What can we tell about this data in its current form? Not much, actually.

Ordered Array: data arranged from smallest to largest (usually).

So we arrange our data into an ordered array.

[Table: the same 61 measurements as an ordered array]

Now what can we say about our data? The minimum value is […] and the maximum is […]; the middle of the data is in the 60s or 70s. Even ordering this data does not give us a good picture of what is happening.

Grouped Data: the Frequency Distribution

We need to select a set of contiguous, non-overlapping intervals, referred to as the class intervals, such that each value in the set of observations can be placed in exactly one interval. Generally, 5 ≤ (number of intervals k) ≤ 15. One can use Sturges' rule as a guide:

$$k = 1 + 3.322 \log_{10} n,$$

where n is the number of observations. In our example, n = 61, giving us $k = 1 + 3.322 \log_{10} 61 \approx 6.93$. So, rounding off, k = 7.

From our data, the range is R = maximum − minimum = […].
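As a quick check of the Sturges' rule arithmetic, here is a minimal Python sketch. The function name `sturges_intervals` is ours, chosen for illustration; only the sample size n = 61 comes from the example.

```python
import math

def sturges_intervals(n: int) -> float:
    """Unrounded number of class intervals from Sturges' rule: 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

k_raw = sturges_intervals(61)   # n = 61 in the protein example
k = round(k_raw)                # round to a whole number of intervals
print(f"k = {k_raw:.2f}, rounded to {k}")   # k = 6.93, rounded to 7
```

The rule is only a guide; the final choice of intervals also depends on picking a convenient width.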

The interval width is then w = R/k = […].

Now, when the nature of the data makes them appropriate, class interval widths of 5, 10, or multiples of 10 units make the summarization more comprehensible. Here we will choose class intervals of 20 µg/ml, with the first interval beginning at 30. We will label the intervals by their midpoints.

Class intervals     Frequency   Midpoint
 30 ≤ x <  50         […]          40
 50 ≤ x <  70         26           60
 70 ≤ x <  90         […]          80
 90 ≤ x < 110         […]         100
110 ≤ x < 130         […]         120
130 ≤ x < 150         […]         140
150 ≤ x < 170         […]         160

For computations with grouped data, each element in an interval is given the value of the midpoint of the interval. Thus each of the 26 values in the interval 50 ≤ x < 70 is treated as though it is 60.

Note. A value falling on the interval boundary is placed in the higher valued interval (to the right on a number line).

Although we can now see where the majority of the data lies and how it is spread (and graphs add to this), the data items lose their individual values to the midpoint value of the interval in which they lie.

Relative Frequencies: the proportion of values falling into a class interval. We divide the number of values in each category by the total number of values. There are times when we will interpret the relative frequencies as the probability of occurrence within a given interval, called the experimental probability or the empirical probability.

In the following table we also incorporate cumulative frequencies and relative cumulative frequencies.

                                          Relative    Cumulative   Cumulative Relative
Class intervals   Midpoint   Frequency    Frequency   Frequency    Frequency
 30 ≤ x <  50        40        […]          […]         […]          […]
 50 ≤ x <  70        60        26           […]         […]          […]
 70 ≤ x <  90        80        […]          […]         […]          […]
 90 ≤ x < 110       100        […]          […]         […]          […]
110 ≤ x < 130       120        […]          […]         […]          […]
130 ≤ x < 150       140        […]          […]         […]          […]
150 ≤ x < 170       160        […]          […]         […]          […]

Except for round-off errors, the sum in the Relative Frequency column should always be 1.
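The tallying described above can be sketched in a few lines of Python. The function `frequency_table` and the list `sample` are hypothetical: the values are made up for illustration and are not the chapter's protein data. The interval start (30), width (20) and count (7) follow the example.

```python
def frequency_table(data, start, width, k):
    """Tally data into k class intervals [lo, lo + width), the first starting at `start`."""
    counts = [0] * k
    for x in data:
        i = int((x - start) // width)   # a boundary value lands in the higher interval
        if not 0 <= i < k:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[i] += 1
    n = len(data)
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        lo = start + i * width
        cumulative += f
        # (lower, upper, midpoint, frequency, relative, cumulative, cumulative relative)
        rows.append((lo, lo + width, lo + width / 2, f, f / n, cumulative, cumulative / n))
    return rows

# Hypothetical measurements, NOT the chapter's data.
sample = [34.2, 50.0, 55.1, 61.3, 69.9, 70.0, 88.4, 95.0, 112.6, 131.5]
for lo, hi, mid, f, rel, cum, cum_rel in frequency_table(sample, start=30, width=20, k=7):
    print(f"{lo:>3} <= x < {hi:<3}  {mid:>5.0f}  {f:>2}  {rel:.3f}  {cum:>2}  {cum_rel:.3f}")
```

Note that 50.0 and 70.0 are counted in the intervals that begin at those values, matching the boundary rule in the Note.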

Frequency Histogram and Frequency Polygon: special types of bar and line graphs.

Here we show the frequency polygon superimposed over the frequency histogram, as created in Maple. They are commonly separate graphs.

In this case, the bars of the histogram are labeled by their midpoints on the horizontal axis. The points on the horizontal axis where the bars meet are called cut points, which may be used instead of the midpoints to label the horizontal axis. The frequency polygon is always labeled by the midpoints.

The area under the histogram is 61 × 20 = 1220 (n × interval width). With the lines of the frequency polygon joining the midpoints of the tops of the bars, along with the midpoints of the adjoining empty intervals, the area under the frequency polygon is the same as that of the frequency histogram.

Suppose we look at the same data with class intervals of width 10. The following table is also from Maple.
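The equal-area claim can be checked numerically. In this sketch the function names and the `counts` list are hypothetical; the polygon is closed by zero-height points at the midpoints of the empty intervals on either side, as described above.

```python
def histogram_area(counts, width):
    """Total area of the bars: each bar is `width` wide and `count` high."""
    return width * sum(counts)

def polygon_area(counts, width):
    """Trapezoid-rule area under the frequency polygon.

    Zero-frequency points are added at the midpoints of the adjoining
    empty intervals so the polygon returns to the horizontal axis.
    """
    heights = [0] + list(counts) + [0]
    return sum(width * (a + b) / 2 for a, b in zip(heights, heights[1:]))

counts = [1, 4, 2, 1, 1, 1, 0]   # hypothetical frequencies, not the chapter's
print(histogram_area(counts, 20), polygon_area(counts, 20))
```

Both areas equal n × interval width, because every bar height enters two neighbouring trapezoids with weight one half each.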

The columns are not labeled, but are class interval, frequency, relative frequency, cumulative frequency, and relative cumulative frequency. The frequency histogram follows.

With this histogram, the two values over 130 appear to be outliers (somewhat disjoint from the rest of the data).

Relative Frequency Histogram and Relative Frequency Polygon

Maple. See hist.mw or hist.pdf.

Stem-and-Leaf Displays: a stem-and-leaf display bears a strong resemblance to the histogram and serves the same purpose.

Here are the ages of 48 students in a statistics course:

[Table: ages of the 48 students]

1) Use the first part of each data value as a stem; write the stems vertically.
2) Use the last part as a leaf. We sometimes truncate or round, since leaves are one digit only.

The last step is to put the leaves in increasing order. We can split stems to show more detail: leaves 0 to 4 and leaves 5 to 9.
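The two steps can be sketched as follows. The function `stem_and_leaf` and the `ages` list are hypothetical (the 48 ages from the notes are not reproduced here); tens digits are assumed to be the stems and units digits the leaves, with non-integer values truncated.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with tens digits as stems and units digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting first puts leaves in increasing order
        stem, leaf = divmod(int(v), 10)   # int() truncates any decimal part
        leaves[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}"
            for stem in range(min(leaves), max(leaves) + 1)]

ages = [19, 22, 20, 31, 19, 25, 47, 21, 20, 33]   # hypothetical ages
print("\n".join(stem_and_leaf(ages)))
```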

Advantages: a quick visual picture of the data; we see the actual values.
Disadvantages: best for small data sets (n ≤ 100); can give a poor picture of the data.

Statistic: a descriptive measure computed from a sample.
Parameter: a descriptive measure computed from a population.

Measures of Central Tendency: mean, median, and mode. We want a single value that is typical of the data as a whole.

(Arithmetic) Mean: the average.

X = random variable (RV)
x_i = specific values of X
N = number of values in a finite population
n = number of values in a sample

For ungrouped data:

population: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

sample: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

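Example (Protein). $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ = […]/61 = […]

For grouped data:

Class intervals   Midpoint = x_i   Frequency = f_i   x_i f_i
 30 ≤ x <  50          40              […]             […]
 50 ≤ x <  70          60              26              […]
 70 ≤ x <  90          80              […]             […]
 90 ≤ x < 110         100              […]             […]
110 ≤ x < 130         120              […]             […]
130 ≤ x < 150         140              […]             […]
150 ≤ x < 170         160              […]             […]

$\bar{x} = \dfrac{\sum_{i=1}^{7} x_i f_i}{n} = \dfrac{\sum_{i=1}^{7} x_i f_i}{61}$ = […]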
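To compare the two versions of the mean, here is a small sketch. The names `mean` and `grouped_mean` and all numbers except the midpoints are hypothetical; the frequencies are the tallies of the made-up `sample`, not the chapter's.

```python
def mean(values):
    """Arithmetic mean of ungrouped data: sum of x_i divided by n."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Mean of grouped data: every value is replaced by its interval midpoint."""
    return sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

# Hypothetical data, and its tallies for intervals of width 20 starting at 30.
sample = [34.2, 50.0, 55.1, 61.3, 69.9, 70.0, 88.4, 95.0, 112.6, 131.5]
midpoints = [40, 60, 80, 100, 120, 140, 160]
freqs = [1, 4, 2, 1, 1, 1, 0]

print(mean(sample), grouped_mean(midpoints, freqs))
```

The two results differ because grouping discards the individual values.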

Properties of the Mean
(1) Uniqueness: for a given set of data, there is exactly one arithmetic mean.
(2) Simplicity: the arithmetic mean is easily understood and easy to compute.
(3) Since each and every value in a set of data enters into the computation of the mean, it is affected by each value. Extreme values, therefore, have an influence on the mean and, in some cases, can so distort it that it becomes undesirable as a measure of central tendency.

Outliers (Extreme Values): values that deviate appreciably from most of the measurements in a data set.

Robust Estimators: estimators that are insensitive to outliers.

Trimmed Mean: a robust estimator of central tendency. For a set of sample data containing n measurements we calculate the 100α percent trimmed mean as follows:
(1) Order the measurements.
(2) Discard the smallest 100α percent and the largest 100α percent of the measurements. The recommended value of α is something between .1 and .2.
(3) Compute the arithmetic mean of the remaining measurements.

Example (Protein).
(1) The 5% trimmed mean (removing 3 elements from each end of the data) is […].
(2) The 10% trimmed mean (removing 6 elements from each end of the data) is […].
(3) The 20% trimmed mean (removing 12 elements from each end of the data) is […].
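The three steps can be sketched as below. The helper names are ours, and rounding the number of discarded values down is an assumption; it does reproduce the counts 3, 6 and 12 quoted for n = 61.

```python
def trim_count(n, alpha):
    """Number of values discarded from EACH end (assumption: rounded down)."""
    return int(alpha * n)

def trimmed_mean(values, alpha):
    """100*alpha percent trimmed mean: order, discard both tails, average the rest."""
    ordered = sorted(values)
    k = trim_count(len(ordered), alpha)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print([trim_count(61, a) for a in (0.05, 0.10, 0.20)])   # [3, 6, 12]
print(trimmed_mean([1, 2, 3, 4, 100], 0.2))              # hypothetical data with one outlier
```

In the second call the outlier 100 is discarded along with the smallest value, so the result is the mean of 2, 3 and 4.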

Median: a value that divides the ordered array into two equal parts. We order the data points from smallest to largest and then take item (n + 1)/2 in order.

Example.
(1) 1, 3, 8, […], […]: here n = 5, so (n + 1)/2 = 6/2 = 3 and the median is the 3rd item, 8.
(2) […], […], 8, 11, […], […]: here n = 6, so (n + 1)/2 = 7/2 = 3.5 and the median is halfway between the 3rd and 4th items, (8 + 11)/2 = 9.5.
(3) For our data set with n = 61, the median of the ungrouped data is […].
(4) For the grouped data above, the median is 60, the 31st element of the set where each data point takes on the value of the midpoint of its class interval.

Does this seem like a good measure of central tendency in this case? Obviously not! When your only source is grouped data, don't put too much confidence in the mean and median.

Properties of the Median
(1) Uniqueness: as was true with the mean, there is a unique median for a given set of data.
(2) Simplicity: the median is easy to calculate.
(3) Robustness: it is not as drastically affected by extreme values as is the mean.

Mode: the value that occurs most frequently. If all the data items are different, there is no mode. A set of data may have more than one mode (this is common for grouped data). A data set with two modes is called bimodal.
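The position rule (n + 1)/2 translates directly into code. This is a minimal sketch with a hypothetical function name; the example lists reuse the surviving values 1, 3, 8 and 8, 11 from the examples above, with the remaining entries made up.

```python
def median(values):
    """Median as the item at 1-based position (n + 1)/2 of the ordered array."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2
    lower = ordered[int(pos) - 1]
    if pos == int(pos):                       # odd n: a single middle item
        return lower
    return (lower + ordered[int(pos)]) / 2    # even n: halfway between two items

print(median([1, 3, 8, 12, 15]))        # n = 5, position 3
print(median([2, 5, 8, 11, 14, 20]))    # n = 6, position 3.5
```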

Skewness: classification of data distributions on the basis of whether they are symmetric or asymmetric.
(1) Symmetric: the left half of its graph (histogram or frequency polygon) will be a mirror image of its right half.
(2) Asymmetric: not symmetric.

Definition. If the graph (histogram or frequency polygon) of a distribution is asymmetric, the distribution is said to be skewed. If a distribution is not symmetric because its graph extends further to the right than to the left, that is, if it has a long tail to the right, we say that the distribution is skewed to the right or positively skewed. If a distribution is not symmetric because its graph extends further to the left than to the right, that is, if it has a long tail to the left, we say that the distribution is skewed to the left or negatively skewed.

The Skewness Statistic

$$\text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\sqrt{n - 1}\; s^3}$$
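A direct transcription of the first form of the skewness statistic is sketched below; the function name and the test lists are hypothetical.

```python
import math

def skewness(values):
    """sqrt(n) * sum of cubed deviations / (sum of squared deviations)^(3/2)."""
    n = len(values)
    xbar = sum(values) / n
    sum_sq = sum((x - xbar) ** 2 for x in values)
    sum_cube = sum((x - xbar) ** 3 for x in values)
    return math.sqrt(n) * sum_cube / sum_sq ** 1.5

print(skewness([1, 2, 3, 4, 5]))     # symmetric data
print(skewness([1, 2, 3, 4, 20]))    # long tail to the right
print(skewness([-20, 1, 2, 3, 4]))   # long tail to the left
```

The sign of the result follows the direction of the longer tail, since cubing preserves the sign of each deviation.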

The skewness statistic is 0 for a perfectly symmetric distribution, positive for a positively skewed distribution (skewed to the right), and negative for a negatively skewed distribution (skewed to the left).

Typically, for unimodal distributions, if it is skewed to the left, mean < median < mode, and if it is skewed to the right, mode < median < mean.

If you set a distribution on a fulcrum, the mean is where it balances. The median is the point that divides the area in half, and the mode is the highest point.

Measures of Dispersion: describe the variation, spread, and scatter of the distribution.

Range: the difference between the largest and smallest values in a set of observations.

$$\text{Range} = x_L - x_S$$

This conveys minimal information and is a poor measure for large samples.

Variance: measures dispersion based on how the data points are scattered about the mean.

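Sample Variance (ungrouped)

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$

Problem (Page 53 #2.5.2). $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{540}{7} \approx 77.14$

[Table with columns $x_i$, $x_i - \bar{x}$, $(x_i - \bar{x})^2$, and $x_i^2$]

Except for rounding errors, the sum of the $x_i - \bar{x}$ column is always 0.

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ = […], or $s^2 = \dfrac{7 \sum x_i^2 - \left(\sum x_i\right)^2}{7(6)}$ = […]

Example (Protein). s² = […]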
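Both forms of the sample variance give the same number, which the following sketch checks on a small hypothetical list (not the data of Problem 2.5.2). The function names are ours.

```python
def sample_variance(values):
    """Definitional form: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n (n - 1))."""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
xbar = sum(data) / len(data)
print(sum(x - xbar for x in data))                        # deviations sum to 0
print(sample_variance(data), sample_variance_shortcut(data))
```

The shortcut form avoids computing the deviations, which is convenient by hand, though it can lose precision on a computer when the values are large relative to their spread.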

Sample Variance (grouped)

$$s^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i - 1}$$

Example (Protein). $\bar{x}$ = […]

Class intervals   x_i    x_i − x̄   (x_i − x̄)²   f_i   (x_i − x̄)² f_i
 30 ≤ x <  50      40     […]        […]         […]       […]
 50 ≤ x <  70      60     […]        […]         26        […]
 70 ≤ x <  90      80     […]        […]         […]       […]
 90 ≤ x < 110     100     […]        […]         […]       […]
110 ≤ x < 130     120     […]        […]         […]       […]
130 ≤ x < 150     140     […]        […]         […]       […]
150 ≤ x < 170     160     […]        […]         […]       […]

s² = […]

Notice how the variance changes with the grouping.

We divide by n − 1 instead of n, and by $\sum f_i - 1$ instead of $\sum f_i$, in order to use the sample variance in inference procedures discussed later. This is because dividing by n − 1 gives an unbiased estimator of the population variance. Also, we say we have n − 1 degrees of freedom, i.e., once we have made n − 1 choices, the last choice is determined.

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Problem: the variance units are the square of the data units.
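The grouped formula can be sketched the same way. The function name and the frequencies are hypothetical; only the midpoints follow the class intervals of the example.

```python
def grouped_variance(midpoints, freqs):
    """Grouped sample variance: sum of (x_i - xbar)^2 f_i over (sum of f_i) - 1."""
    n = sum(freqs)
    xbar = sum(x * f for x, f in zip(midpoints, freqs)) / n
    return sum((x - xbar) ** 2 * f for x, f in zip(midpoints, freqs)) / (n - 1)

midpoints = [40, 60, 80, 100, 120, 140, 160]
freqs = [1, 4, 2, 1, 1, 1, 0]   # hypothetical frequencies, not the chapter's
print(grouped_variance(midpoints, freqs))
```

Here the grouped mean is 80 and the weighted squared deviations total 8800, so the result is 8800/9.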

Standard Deviation (SD): the square root of the variance. It has the same units as the data.

Sample SD: $s = \sqrt{s^2}$

Problem (Page 53 #2.5.2). $s = \sqrt{2200} \approx 46.90$

Example (Protein). s = […]

Population SD: $\sigma = \sqrt{\sigma^2}$

Coefficient of Variation: used for comparing the variation of two or more distributions. This would seem to require ratio scales. The coefficient of variation expresses the SD as a percentage of the mean.

$$CV = \frac{s}{\bar{x}}(100)$$

Example. $\bar{x} = 10$, s = 5, CV = (5/10)(100) = 50% vs. $\bar{x} = 100$, s = 5, CV = (5/100)(100) = 5%.

Five Number Summary

Definition. Given a set of n observations $x_1, x_2, \ldots, x_n$, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100 − p) percent or less of the observations are greater than P.

Notation. $P_{10}$ denotes the 10th percentile, etc. $P_{25}$ is called the first quartile ($Q_1$). $P_{50}$, the median, is the middle or second quartile ($Q_2$). $P_{75}$ is the third quartile ($Q_3$).
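The coefficient of variation example above is a one-line computation; the function name below is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """SD expressed as a percentage of the mean."""
    return 100 * sd / mean

print(coefficient_of_variation(10, 5))    # 50.0
print(coefficient_of_variation(100, 5))   # 5.0
```

The same SD of 5 is large relative to a mean of 10 and small relative to a mean of 100, which is the comparison the CV is meant to capture.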

1st quartile: $Q_1$ = the $\dfrac{n + 1}{4}$th ordered observation.

Example (Protein). $\dfrac{n + 1}{4} = \dfrac{61 + 1}{4} = \dfrac{62}{4} = 15.5$. Thus take the number 1/2 way from the 15th to the 16th observation, where the 16th observation is 57.90:

$Q_1$ = (15th) + .5(57.90 − (15th)) = […]

2nd quartile: $Q_2$ = the $\dfrac{2(n + 1)}{4} = \dfrac{n + 1}{2}$th ordered observation.

Example (Protein). $\dfrac{n + 1}{2} = \dfrac{61 + 1}{2} = \dfrac{62}{2} = 31$. Thus take the 31st observation: $Q_2$ = […]

3rd quartile: $Q_3$ = the $\dfrac{3(n + 1)}{4}$th ordered observation.

Example (Protein). $\dfrac{3(n + 1)}{4} = \dfrac{3(61 + 1)}{4} = \dfrac{3(62)}{4} = \dfrac{186}{4} = 46.5$. Thus take the number 1/2 way from the 46th to the 47th observation, where the 47th observation is 84.70:

$Q_3$ = (46th) + .5(84.70 − (46th)) = […]

The five-number summary is then

minimum, $Q_1$, median, $Q_3$, maximum

Example (Protein). The five-number summary is […].
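The position rule j(n + 1)/4 with linear interpolation can be sketched as follows. The function names are ours, and the integers 1 to 61 stand in for the protein data so that each quartile equals its position.

```python
def ordered_value(ordered, pos):
    """Value at 1-based position `pos`, interpolating between neighbouring items."""
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return ordered[lo - 1]
    return ordered[lo - 1] + frac * (ordered[lo] - ordered[lo - 1])

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using positions j(n + 1)/4, j = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for these positions")
    q1, q2, q3 = (ordered_value(ordered, j * (n + 1) / 4) for j in (1, 2, 3))
    return ordered[0], q1, q2, q3, ordered[-1]

# Stand-in data with n = 61: positions 15.5, 31 and 46.5 give those same values.
print(five_number_summary(range(1, 62)))
```

Statistical packages use several different quartile conventions, so results from software may differ slightly from this rule.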

Definition. The interquartile range (IQR) is the difference between the third and first quartiles: $IQR = Q_3 - Q_1$.

Box-and-Whisker Plots (or Boxplots)

This is a graphical representation of the five-number summary. It can be drawn vertically (left) or horizontally (right).

The box shows the interquartile range, extending from $Q_1$ to $Q_3$. The width of the box is arbitrary. The line through the box shows the median. The whiskers extend from the box to the minimum and maximum values.

It is different in SPSS.
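The SPSS-style labeling described next, based on distance beyond the box in units of the IQR, can be sketched as a small classifier. The function name, the returned labels, and the treatment of values exactly on a cut-off are our assumptions, not SPSS's documented behaviour.

```python
def classify(value, q1, q3):
    """Label a value by how far it lies beyond the box, in units of the IQR."""
    iqr = q3 - q1
    distance = max(q1 - value, value - q3, 0)   # 0 when the value is inside the box
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "outlier"
    return "within whiskers"

# Hypothetical quartiles Q1 = 50 and Q3 = 70, so IQR = 20.
for v in (60, 95, 105, 140, 10):
    print(v, classify(v, 50, 70))
```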

The whiskers extend to a maximum of 1.5(IQR) beyond the box. Values from 1.5(IQR) to 3(IQR) beyond the box are given their own marker and are termed outliers. Values more than 3(IQR) beyond the box are given a different marker and are termed extremes.

Kurtosis: a measure of the degree to which a distribution is peaked or flat in comparison to a normal distribution, whose graph is characterized by a bell shape.

[Figure: the three basic types of curves: mesokurtic, leptokurtic, platykurtic]

$$\text{Kurtosis} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2} - 3 = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1)^2 s^4} - 3$$

Summary

In describing the center and dispersion of a data distribution, one usually provides either the mean and standard deviation or the five-number summary, the choice depending on the shape of the distribution: mean and standard deviation for symmetric data and the five-number summary for non-symmetric data.

Maple. See centdist.mw and centdist.pdf.
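The kurtosis statistic defined above can be transcribed directly. The function name and both test lists are hypothetical.

```python
def kurtosis(values):
    """n * sum of 4th-power deviations / (sum of squared deviations)^2, minus 3."""
    n = len(values)
    xbar = sum(values) / n
    sum_sq = sum((x - xbar) ** 2 for x in values)
    sum_fourth = sum((x - xbar) ** 4 for x in values)
    return n * sum_fourth / sum_sq ** 2 - 3

print(kurtosis([1, 2, 3, 4, 5]))                       # flat, evenly spread data
print(kurtosis([0, 0, 0, 0, 0, 0, 0, 0, -10, 10]))     # sharp peak with two far values
```

Subtracting 3 centres the statistic so that flatter-than-normal data give negative values and more peaked, heavier-tailed data give positive values.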


CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13 COMMON DESCRIPTIVE STATISTICS / 13 CHAPTER THREE COMMON DESCRIPTIVE STATISTICS The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance,

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Chapter 2 Data Exploration

Chapter 2 Data Exploration Chapter 2 Data Exploration 2.1 Data Visualization and Summary Statistics After clearly defining the scientific question we try to answer, selecting a set of representative members from the population of

More information

Implications of Big Data for Statistics Instruction 17 Nov 2013

Implications of Big Data for Statistics Instruction 17 Nov 2013 Implications of Big Data for Statistics Instruction 17 Nov 2013 Implications of Big Data for Statistics Instruction Mark L. Berenson Montclair State University MSMESB Mini Conference DSI Baltimore November

More information

Mean = (sum of the values / the number of the value) if probabilities are equal

Mean = (sum of the values / the number of the value) if probabilities are equal Population Mean Mean = (sum of the values / the number of the value) if probabilities are equal Compute the population mean Population/Sample mean: 1. Collect the data 2. sum all the values in the population/sample.

More information

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability. Glossary Brase: Understandable Statistics, 10e A B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

More information

Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

More information

Describing Data: Measures of Central Tendency and Dispersion

Describing Data: Measures of Central Tendency and Dispersion 100 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 8 Describing Data: Measures of Central Tendency and Dispersion In the previous chapter we

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

1 Descriptive statistics: mode, mean and median

1 Descriptive statistics: mode, mean and median 1 Descriptive statistics: mode, mean and median Statistics and Linguistic Applications Hale February 5, 2008 It s hard to understand data if you have to look at it all. Descriptive statistics are things

More information

Statistics Revision Sheet Question 6 of Paper 2

Statistics Revision Sheet Question 6 of Paper 2 Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of

More information

Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

Module 2: Introduction to Quantitative Data Analysis

Module 2: Introduction to Quantitative Data Analysis Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

The Normal Distribution

The Normal Distribution Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Study Guide for Essentials of Statistics for the Social and Behavioral Sciences by Barry H. Cohen and R. Brooke Lea. Chapter 1

Study Guide for Essentials of Statistics for the Social and Behavioral Sciences by Barry H. Cohen and R. Brooke Lea. Chapter 1 Distributions Study Guide for Essentials of Statistics for the Social and Behavioral Sciences by Barry H. Cohen and R. Brooke Lea Chapter 1 Guidelines for Frequency Distributions The procedure for constructing

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Descriptive Analysis

Descriptive Analysis Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Basics of Statistics

Basics of Statistics Basics of Statistics Jarkko Isotalo 30 20 10 Std. Dev = 486.32 Mean = 3553.8 0 N = 120.00 2400.0 2800.0 3200.0 3600.0 4000.0 4400.0 4800.0 2600.0 3000.0 3400.0 3800.0 4200.0 4600.0 5000.0 Birthweights

More information

TEACHER NOTES MATH NSPIRED

TEACHER NOTES MATH NSPIRED Math Objectives Students will understand that normal distributions can be used to approximate binomial distributions whenever both np and n(1 p) are sufficiently large. Students will understand that when

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

EXPLORING SPATIAL PATTERNS IN YOUR DATA

EXPLORING SPATIAL PATTERNS IN YOUR DATA EXPLORING SPATIAL PATTERNS IN YOUR DATA OBJECTIVES Learn how to examine your data using the Geostatistical Analysis tools in ArcMap. Learn how to use descriptive statistics in ArcMap and Geoda to analyze

More information

Statistics Chapter 2

Statistics Chapter 2 Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

More information