Treatment and analysis of data Applied statistics Lecture 3: Sampling and descriptive statistics


Topics covered:
- Parameters and statistics
- Sample mean and sample standard deviation
- Order statistics and quantiles
- Confidence intervals and confidence levels
- Error bars and box plots
- Histograms
- Cumulative and percentile plots
- Probability plots

Sept-Oct 2006 Statistics for astronomers (L. Lindegren, Lund Observatory) Lecture 3, p. 1

Population, parameters, sample and statistics (p. 2)

Sample space (in probability theory) corresponds to population (in statistics).

A (random) sample is drawn from the population:
- Sampling (data collection): population -> data
- Data analysis: data -> statistics such as m and s
- Inference: statistics -> population, described by certain parameters such as μ and σ

Parameters and statistics (p. 3)

A parameter is a quantity that describes a population (e.g., the population mean μ and the population standard deviation σ).

Data are obtained by sampling the population (e.g., $x_1, x_2, \ldots, x_n$). Any function of the data is called a statistic.

Examples of statistics:
- $n$: the number of data points
- $\min(x_1, x_2, \ldots, x_n)$: the smallest data value
- $x_1 + n^{1/3}$: not a very useful statistic
- $m = (x_1 + x_2 + \cdots + x_n)/n$: the sample mean
- $s = \left[ \sum_i (x_i - m)^2 / (n - 1) \right]^{1/2}$: the sample standard deviation

Descriptive statistics (p. 4)

Simple calculations on the data allow us to condense them into a form that is useful, e.g., in order to:
- summarize results in a way that is quickly grasped
- assess the quality of the data
- compare different sets of data
- explore what kind of information the data may contain
- support a statement (make a conclusion more convincing)

When the data represent a more or less unknown distribution, the most important statistics may be:
- some measure of location, such as the sample mean or median
- some measure of scale (or scatter, or precision), such as the sample standard deviation or interquartile range

This is often supported by graphics, which give much more complete information on distributions. (A graph is also a statistic.)
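As an illustration of the statistics listed under "Parameters and statistics", the following minimal sketch computes n, the smallest value, the sample mean m and the sample standard deviation s. The data values and the function names are made up for the example.

```python
import math

def sample_mean(x):
    """Sample mean m = (x_1 + ... + x_n) / n."""
    return sum(x) / len(x)

def sample_std(x):
    """Sample standard deviation s, using the (n - 1) denominator."""
    n = len(x)
    m = sample_mean(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(data)
m = sample_mean(data)
s = sample_std(data)
print(f"n = {n}, min = {min(data)}, m = {m:.3f}, s = {s:.3f}")
```

Each printed quantity is a function of the data only, which is what makes it a statistic rather than a parameter.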

Sample mean and sample standard deviation (p. 5)

[Slide content (formulas) not preserved in this transcription.]

Important comment (p. 6)

Be careful to distinguish between the sample standard deviation

$$ s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2 } $$

which measures the dispersion among the values $x_1, x_2, \ldots, x_n$ around the sample mean value $m$, and the standard deviation of the sample mean, which is usually estimated as

$$ D[m] = \frac{s}{\sqrt{n}} = \sqrt{ \frac{1}{n(n-1)} \sum_{i=1}^{n} (x_i - m)^2 } $$

and which may be quoted as the standard error (1σ uncertainty) of $m$.

E.g.: "the mean value and dispersion of the data are 12.3 ± 2.5" is ambiguous!
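To make the distinction concrete, here is a small sketch that reports both quantities separately for one data set. The data values are hypothetical; `statistics.stdev` uses the same (n − 1) denominator as s above.

```python
import math
import statistics

data = [12.1, 9.8, 15.2, 11.4, 13.0, 10.7, 14.9, 12.3, 8.6, 15.0]  # hypothetical data
n = len(data)
m = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation, (n - 1) denominator
sem = s / math.sqrt(n)       # estimated standard deviation of the mean, D[m]

print(f"dispersion of the data: s = {s:.2f}")
print(f"mean with standard error: m = {m:.2f} +/- {sem:.2f} (s.e.)")
```

Stating which of the two numbers follows the ± sign removes the ambiguity noted above.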

Alternative measures of location and scale (p. 7)

The sample mean and sample standard deviation are very sensitive to outliers or strongly deviating points. In manual data analysis one can often cope interactively with these cases, but for automatic analysis it is better to use a more robust method.

In such cases, or when the distribution is known or suspected to be non-gaussian, there are many other useful measures of location and scale.

Instead of the sample mean $m$ we may use the sample median $x_{\rm med}$ (see below).

Instead of the sample standard deviation (= RMS deviation from the sample mean), we may use the mean absolute deviation from the mean:

$$ \mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - m| $$

Often the sample median is used instead of the sample mean when calculating the MAD. In fact, for any fixed sample the median minimizes the MAD, so it is logical to use the median and MAD together.

Order statistics (p. 8)

[Slide content not preserved in this transcription.]
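The robustness argument for the median and MAD can be tried on a small sample containing one outlier. This is only a sketch: the data and the helper name `mean_abs_dev` are invented for the example.

```python
import statistics

def mean_abs_dev(x, center):
    """Mean absolute deviation of the values x about a chosen center."""
    return sum(abs(xi - center) for xi in x) / len(x)

data = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 25.0]  # hypothetical data, one outlier
m = statistics.mean(data)
med = statistics.median(data)

mad_mean = mean_abs_dev(data, m)
mad_median = mean_abs_dev(data, med)
print(f"mean = {m:.2f}, median = {med:.2f}")
print(f"MAD about mean = {mad_mean:.2f}, MAD about median = {mad_median:.2f}")
```

The outlier pulls the mean well away from the bulk of the data while the median stays put, and the MAD about the median is never larger than the MAD about the mean.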

Sample quantiles (p. 9)

[Slide content not preserved in this transcription.]

Quantiles for the normal (Gaussian) distribution (p. 10)

- 68% of the area is within ±1σ
- 32% of the area is outside ±1σ
- 4.6% of the area is outside ±2σ
- 0.3% of the area is outside ±3σ

[Figure: normal density, frequency versus value, with the value axis marked from −3σ to +3σ.]
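The areas quoted for the normal distribution can be checked with the complementary error function, since the fraction outside ±kσ equals erfc(k/√2). A minimal sketch (the function name is chosen for the example):

```python
import math

def fraction_outside(k):
    """Fraction of a normal distribution lying outside +/- k sigma."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3):
    outside = fraction_outside(k)
    print(f"+/-{k} sigma: {100 * outside:.1f}% outside, {100 * (1 - outside):.1f}% within")
```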

Confidence intervals and levels: normal case (1) (p. 11)

[Figure: normal distribution, with axes labelled frequency, percentile, value and standard deviations.]

Alternatively, the precision can be specified as a confidence interval, with an associated confidence level (CL):

x = 3.7 ± 2.5 (90% CL), or 1.2 < x < 6.2 (90% CL)
x > 1.2 (95% CL) [one-sided confidence interval]

Confidence intervals and levels: normal case (2) (p. 12)

Confidence level   Two-sided confidence interval (for normal distribution)
50%                [−0.67σ, +0.67σ]
68%                [−1.00σ, +1.00σ]
90%                [−1.65σ, +1.65σ]
95%                [−1.96σ, +1.96σ]
99%                [−2.58σ, +2.58σ]
99.9%              [−3.29σ, +3.29σ]

Caution: older astronomical literature (before 1960) often uses the probable error (p.e.), which corresponds to 50% CL or ±0.67σ. Thus:

(standard error) ≈ 1.5 × (probable error)
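The entries of the confidence-level table follow from the inverse normal cdf: a two-sided interval at level CL extends to Φ⁻¹(0.5 + CL/2) standard deviations on either side. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name is chosen for the example.

```python
from statistics import NormalDist

def two_sided_halfwidth(cl):
    """Half-width, in units of sigma, of the central interval containing a fraction cl."""
    return NormalDist().inv_cdf(0.5 + cl / 2.0)

for cl in (0.50, 0.68, 0.90, 0.95, 0.99, 0.999):
    print(f"{100 * cl:g}% CL: +/- {two_sided_halfwidth(cl):.2f} sigma")

# Ratio of the standard error to the probable error (the 50% half-width)
print(f"s.e. / p.e. = {1.0 / two_sided_halfwidth(0.50):.3f}")
```

The computed values agree with the table to within rounding. Note that 68% is itself a rounded level: exactly ±1σ corresponds to about 68.27%.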

Deviations from the normal distribution (p. 13)

Actual errors rarely follow the normal distribution:
- usually points beyond ±3σ are much more frequent than expected for a normal distribution (0.3%)
- the distribution is often skew, especially in the tails
- sometimes the distribution is completely different, e.g. exponential

Although the standard deviation is applicable to many non-normal cases, it could be misleading without further specification of the distribution.

For instance, given only the information x = 3.7 ± 1.5 (s.e.), one might conclude that x > 8.2 is very unlikely (0.15%). However, if x has a lognormal distribution, the probability is in fact 2-3%.

Quantiles (fractiles), percentiles, quartiles, etc. (p. 14)

Other names for quantiles at certain q-values:
- Q(0.5) = median (or 50th percentile)
- Q(0.25) = lower quartile (or 25th percentile)
- Q(0.75) = upper quartile (or 75th percentile)
- Q(0.1) = first decile, Q(0.2) = second decile, etc. [not so often used]

The interquartile range IQR = Q(0.75) − Q(0.25) is sometimes used as a measure of precision (equal to 1.35σ for a normal distribution).

Half the "intersextile range" (not a standard term), [Q(5/6) − Q(1/6)]/2, equals 0.97σ for a normal distribution and is useful as a robust assessment of the dispersion.

NOTE: The terms quantile, fractile, and percentile are used almost synonymously in the literature, while median, quartile, decile etc. have very specific meanings.
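The two conversion factors quoted for the normal distribution (IQR = 1.35σ and half the intersextile range = 0.97σ) can be verified from the inverse normal cdf. A minimal sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sigma 1

iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
half_intersextile = (std_normal.inv_cdf(5 / 6) - std_normal.inv_cdf(1 / 6)) / 2

print(f"IQR = {iqr:.2f} sigma")
print(f"half intersextile range = {half_intersextile:.2f} sigma")
```

Because half the intersextile range is so close to 1σ, it can be read directly as a robust estimate of σ for nearly gaussian data.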

Error bars and box plots (p. 15)

Error bars usually indicate ±1σ (i.e. the confidence interval at 68% CL). If not, the exact meaning must definitely be stated in the figure caption.

Box plots (or box-whisker plots) mark, from top to bottom:
- outliers (> 1.5 IQR from median)
- highest non-outlier
- upper quartile
- median
- lower quartile
- lowest non-outlier
- outlier(s)

Histograms (p. 16)

One-dimensional sample distributions are often shown as histograms. A histogram displays the number of data points per bin, versus the position of the bin (or the density of data points, if unequal bin sizes are used).

E.g., define the sequence $x_0, x_1, \ldots, x_n$, which are the boundaries of $n$ bins. Equal bins of size $\Delta x$ are obtained as $x_i = x_0 + i\,\Delta x$, $i = 1, 2, \ldots, n$.

Let $h_i$ be the number of data points with $x_{i-1} \le x < x_i$. (Note the position of <.)

In the histogram, $h_i$ (or sometimes $h_i / \Delta x_i$) is plotted as a bar from $x_{i-1}$ to $x_i$.

Things to consider when constructing a histogram:
- Which bin size to use? This is a compromise between resolution and noise. In any case, be careful to specify the bin size if it is not clear from the graph!
- Where to start ($x_0$)? This is often arbitrary!
- What to do with points outside the range $x_0$ to $x_n$ (if any)?

A difficulty with histograms is that they look radically different depending on the choices you make!
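The binning rule $x_{i-1} \le x < x_i$ can be sketched as follows. The function name, the data and the choice to return the number of out-of-range points separately are all assumptions made for the example; bin edges that are not exactly representable in floating point may need more careful handling.

```python
import math

def histogram_counts(data, x0, dx, n_bins):
    """Count points in n_bins equal bins [x0 + (i-1)*dx, x0 + i*dx), i = 1..n_bins.

    Returns (counts, n_outside), where n_outside is the number of points
    falling outside [x0, x0 + n_bins*dx).
    """
    counts = [0] * n_bins
    n_outside = 0
    for x in data:
        k = math.floor((x - x0) / dx)  # 0-based bin index
        if 0 <= k < n_bins:
            counts[k] += 1
        else:
            n_outside += 1
    return counts, n_outside

data = [0.5, 1.0, 1.9, 2.0, 2.1, 3.7, 4.0, 5.5, 6.0]  # hypothetical data

# Same data and bin size, two different starting values x0
counts_a, outside_a = histogram_counts(data, x0=0.0, dx=2.0, n_bins=3)
counts_b, outside_b = histogram_counts(data, x0=-1.0, dx=2.0, n_bins=3)
print(counts_a, outside_a)
print(counts_b, outside_b)
```

Even for this tiny data set, shifting the starting value by half a bin changes the shape of the histogram and the number of points left outside.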

Different histograms of the same data... (1) (p. 17)

[Figure: four histograms, each with bin size = 2.]

These histograms (of the same 200 points) differ only in the choice of starting value $x_0$.

Different histograms of the same data... (2) (p. 18)

[Figure: four histograms with bin sizes 2, 1, 1 and 0.5.]

These histograms (of the same 200 points) differ in bin size as well.

It is better to make the bins too narrow than too wide: the eye can smooth out the noise but cannot recover lost resolution!

Note that the uncertainty of any histogram value $h_i$ is of order $\pm\sqrt{h_i}$.

Cumulative plots (p. 19)

An alternative to the histogram is to plot the cumulative fraction, analogous to the cumulative distribution function (cdf):

Theoretical distributions            Empirical data
cumulative distribution function     cumulative fraction
probability density function         histogram

The cumulative fraction is a step function that increments by 1/n for each data point, starting from 0 and ending at 1.

Cumulative plot, example (p. 20)

[Figure: cumulative fraction plot, n = 200.]

Cumulative fraction plot for the same 200 data points as in the histograms. The two modes can be seen as the steeper parts of the curve around 10 and 15.
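The cumulative fraction is simple to compute: sort the data and step up by 1/n at each value. A minimal sketch with invented data and function name:

```python
def cumulative_fraction(data):
    """Return the sorted values and the cumulative fraction reached at each value."""
    xs = sorted(data)
    n = len(xs)
    fractions = [(i + 1) / n for i in range(n)]
    return xs, fractions

data = [3.1, 1.2, 4.8, 2.7, 3.9]  # hypothetical data
xs, fractions = cumulative_fraction(data)
for x, f in zip(xs, fractions):
    print(f"{x:4.1f}  {f:.1f}")
```

Unlike a histogram, this representation involves no choice of bin size or starting value.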

Cumulative plots, some more examples (1) (p. 21)

You can transform the scale of data values to emphasize important intervals. For example, for strictly positive data it often makes sense to use a logarithmic scale (this and the following examples are from bardeen.physics.csbsju.edu/stats/).

Cumulative plots, some more examples (2) (p. 22)

[Figure: cumulative plots of two samples, B1 and B2, with a box plot and a mean/dispersion plot (right).]

Cumulative plots are excellent for comparing two samples: do they have the same distribution? (Cf. the K-S test.) This also works for samples of unequal size.

The two samples B1 and B2 are clearly drawn from different populations. This is also evident from the box plot, but not from the mean/dispersion plot (right).
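The visual comparison of two cumulative plots can be quantified by the largest vertical distance between the two step functions, which is the quantity the two-sample K-S test is based on. The sketch below only computes that distance; converting it into a significance level is not covered here. Sample values and function names are hypothetical.

```python
from bisect import bisect_right

def cumulative_fraction_at(sorted_sample, x):
    """Fraction of the sample with values <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def max_cdf_difference(sample1, sample2):
    """Largest vertical distance between the two cumulative fraction curves."""
    s1, s2 = sorted(sample1), sorted(sample2)
    # Both curves are step functions, so it is enough to check the data values.
    return max(
        abs(cumulative_fraction_at(s1, x) - cumulative_fraction_at(s2, x))
        for x in s1 + s2
    )

b1 = [1.0, 2.0, 3.0, 4.0]            # hypothetical sample B1
b2 = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]  # hypothetical sample B2 (different size)
d = max_cdf_difference(b1, b2)
print(f"maximum difference = {d:.3f}")
```

Because each curve is normalised to run from 0 to 1, the comparison works for samples of unequal size.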

Percentile plots (p. 23)

The ragged appearance of the cumulative plot can be disturbing to the eye, especially for small n.

It may then be better to use a percentile plot (red line), which simply connects the n points with $x_{(i)}$ as abscissa and $p = i/(n+1)$ as ordinate.

This is actually a better estimate of the cumulative distribution function than the cumulative fraction plot.

Percentile plot, example (p. 24)

[Figure: percentile plot, n = 200.]

Percentile plot for the same 200 data points as in the histograms and as in the cumulative fraction plot (slide 20).
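The coordinates of a percentile plot, as defined under "Percentile plots", can be generated in a few lines. The data and the function name are made up for the example.

```python
def percentile_points(data):
    """Return (x_(i), p_i) with p_i = i / (n + 1) for the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]

data = [3.1, 1.2, 4.8, 2.7]  # hypothetical data
for x, p in percentile_points(data):
    print(f"{x:4.1f}  {p:.1f}")
```

With p = i/(n+1) the ordinates never reach 0 or 1, which is what later allows them to be passed through an inverse cdf.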

Transformed percentiles... (p. 25)

[Figure: two panels, n = 200 each.]

Sometimes it is useful to transform the percentile scale to bring out more clearly the important parts of the distribution. In this example (a sample drawn from $\chi^2_3$) we are concerned about the tail of large values, which is difficult to see in the standard percentile plot (left). By plotting $1 - p$ instead of $p$ and using a logarithmic scale, the tail is emphasized.

Probability plots (p. 26)

As $n \to \infty$ the percentile plot converges to the cdf $F(x)$.

To see if the data follow a given distribution $F(x)$, we could make a percentile plot with $F^{-1}(i/(n+1))$ on the y-axis instead of $i/(n+1)$. If the data follow $F(x)$ we should then get (approximately) a straight line. This is a probability plot.

The nice thing about probability plots is that any linear transformation $a x_i + b$ of the data will just shift and change the slope of the curve, but a straight line (for example) remains straight.

The most common type of this plot is the normal probability plot, using the standard normal cdf

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{t^2}{2} \right) dt $$

The abscissae are $x_{(i)}$ and the ordinates are $\Phi^{-1}(i/(n+1))$ for $i = 1, 2, \ldots, n$.
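A normal probability plot needs only the ordered data and the normal scores Φ⁻¹(i/(n+1)). The sketch below generates the plot coordinates for a simulated gaussian sample and fits a straight line to them; the sample parameters (mean 2, s.d. 5, n = 50), the seed and the least-squares fit are choices made for this illustration, not part of the method described above.

```python
import random
from statistics import NormalDist

def normal_probability_points(data):
    """Return (x_(i), z_i) with z_i = Phi^{-1}(i / (n + 1)) for the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(x, std_normal.inv_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)]

random.seed(1)  # arbitrary seed, for reproducibility only
sample = [random.gauss(2.0, 5.0) for _ in range(50)]
points = normal_probability_points(sample)

# Least-squares line x = a + b*z: for gaussian data, a roughly estimates
# the mean and b roughly estimates the standard deviation.
zs = [z for _, z in points]
xs = [x for x, _ in points]
z_mean = sum(zs) / len(zs)
x_mean = sum(xs) / len(xs)
b = sum((z - z_mean) * (x - x_mean) for z, x in zip(zs, xs)) / sum((z - z_mean) ** 2 for z in zs)
a = x_mean - b * z_mean
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```

Plotting `xs` against `zs` with any plotting tool gives the probability plot itself; systematic curvature away from the fitted line indicates a departure from normality.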

The inverse standard normal cdf (p. 27)

To make normal probability plots you need to be able to compute the inverse standard normal cdf $\Phi^{-1}(p)$ for any $0 < p < 1$.

Routines for this are available in most numerical/statistical packages (and can be found e.g. in Numerical Recipes).

If not readily available, use the following approximation, which is always good enough for probability plots (maximum error is 0.003; Abramowitz & Stegun, Handbook of Mathematical Functions, eq. 26.2.22):

$$ \Phi^{-1}(p) = \begin{cases} \dfrac{2.30753 + 0.27061\,t}{1 + 0.99229\,t + 0.04481\,t^2} - t & \text{if } 0 < p \le 0.5 \\[2ex] -\Phi^{-1}(1 - p) & \text{if } 0.5 < p < 1 \end{cases} $$

where $t = \sqrt{-2 \ln p}$.

The values $\Phi^{-1}(p)$ are sometimes called the normal scores.

Percentile vs. probability plot (1) (p. 28)

[Figure: percentile plot, n = 50, with the 1st sextile (p = 1/6) marked at −3, the median at 1 and the 5th sextile (p = 5/6) at 7.]

Percentile plot for 50 random numbers from a normal distribution with mean = 2 and s.d. = 5. Note that you can use the percentile plot to estimate quantiles, e.g. the median and the first/last sextiles.
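The rational approximation for the inverse standard normal cdf is easy to code. The transcription does not preserve the numerical coefficients, so the sketch below takes them from Abramowitz & Stegun eq. 26.2.22, which is the formula in that handbook with the quoted maximum error of about 0.003; treat this identification as an assumption. The result is compared with `statistics.NormalDist().inv_cdf` on a grid of p values.

```python
import math
from statistics import NormalDist

def inv_norm_cdf_approx(p):
    """Approximate inverse standard normal cdf (Abramowitz & Stegun eq. 26.2.22)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    if p > 0.5:
        return -inv_norm_cdf_approx(1.0 - p)  # use the symmetry of the normal cdf
    t = math.sqrt(-2.0 * math.log(p))
    return (2.30753 + 0.27061 * t) / (1.0 + 0.99229 * t + 0.04481 * t * t) - t

# Compare with the library routine on a grid of p values
exact = NormalDist()
max_err = max(
    abs(inv_norm_cdf_approx(k / 1000) - exact.inv_cdf(k / 1000))
    for k in range(1, 1000)
)
print(f"maximum error on the grid: {max_err:.4f}")
```

An error of this size is invisible on a probability plot, which is why the approximation is adequate for that purpose even though it would be too coarse for computing precise tail probabilities.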

Percentile vs. probability plot (2) (p. 29)

[Figure: normal probability plot with the median and the ±1σ quantiles marked.]

Normal probability plot for the same 50 random numbers. The approximately straight relationship suggests that the data are indeed gaussian. The median and the quantiles corresponding to ±1σ for the normal distribution are easily found.

Normal probability plots, expected variation (n = 20) (p. 30)

[Figure: normal probability plots showing the expected variation for n = 20.]

Normal probability plots, expected variation (n = 200) (p. 31)

[Figure: normal probability plots showing the expected variation for n = 200.]

Normal probability plot for a non-normal sample (p. 32)

[Figure: histogram (bin size = 0.5) and normal probability plot, n = 200.]

Normal probability plot for the bimodal sample earlier plotted in the histograms (slides 17-18).

Normal probability plot for a non-normal sample (p. 33)

[Figure: normal probability plot, n = 200.]

Typical normal probability plot for a sample that is nearly gaussian, but with some outliers.

Normal probability plot for a non-normal sample (p. 34)

[Figure: normal probability plot, n = 200.]

Normal probability plot for a sample drawn from the Cauchy distribution with location α = 2 and scale β = 5.

Cauchy probability plot for the Cauchy sample (p. 35)

[Figure: Cauchy probability plot, n = 200.]

Probability plots may not be very useful for extreme distributions like Cauchy!

Cauchy probability plot for the same sample as in the previous slide. The inverse cdf for the standard Cauchy distribution is $F^{-1}(p) = \tan[(p - 0.5)\pi]$.

Related to probability plots (p. 36)

[Figure: logarithmic plot titled "...th Century's 100 largest disasters worldwide", with axis ticks at 10^0, 10^1 and 10^2. Series: Technological ($10B), Natural ($100B), US power outages (10M of customers). A reference line has slope = −1 (α = 1).]
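The Cauchy inverse cdf quoted above serves two purposes: it gives the ordinates of a Cauchy probability plot, and it can be used to draw a Cauchy sample from uniform random numbers. The sketch below does both; the location and scale follow the example (α = 2, β = 5), while the seed, the sample generation by cdf inversion and the function names are assumptions of this illustration.

```python
import math
import random

def cauchy_inv_cdf(p):
    """Inverse cdf of the standard Cauchy distribution."""
    return math.tan((p - 0.5) * math.pi)

def cauchy_probability_points(data):
    """Return (x_(i), F^{-1}(i / (n + 1))) for the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    return [(x, cauchy_inv_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)]

# Draw a Cauchy sample with location alpha and scale beta by inverting the cdf
alpha, beta = 2.0, 5.0
random.seed(3)  # arbitrary seed, for reproducibility only
sample = [alpha + beta * cauchy_inv_cdf(random.random()) for _ in range(200)]
points = cauchy_probability_points(sample)
print(points[0], points[len(points) // 2], points[-1])
```

The first and last points are typically far from the rest, which compresses the central part of the plot and illustrates why probability plots are of limited use for such heavy-tailed distributions.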

A histogram plot from Hipparcos data analysis (p. 37)

[Figure: ESA SP-1200 Vol. 3, Fig.]

Normalised differences between the FAST and NDAC parallax estimates for successive solutions (12, 18, 30, 37 months of data). n = 40, ,000.

The same data in a normal probability plot (p. 38)

Real data are sometimes surprisingly Gaussian!

[Figure: ESA SP-1200 Vol. 3, Fig.]

Normalised differences between the FAST and NDAC parallax estimates for successive solutions (12, 18, 30, 37 months of data). n = 40, ,000.


More information

Nominal Scaling. Measures of Central Tendency, Spread, and Shape. Interval Scaling. Ordinal Scaling

Nominal Scaling. Measures of Central Tendency, Spread, and Shape. Interval Scaling. Ordinal Scaling Nominal Scaling Measures of, Spread, and Shape Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning The lowest level of

More information

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics)

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) 1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) As well as displaying data graphically we will often wish to summarise it numerically particularly if we wish to compare two or more data sets.

More information

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

More information

Statistical Concepts and Market Return

Statistical Concepts and Market Return Statistical Concepts and Market Return 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 2 2. Some Fundamental Concepts... 2 3. Summarizing Data Using Frequency Distributions...

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

If the Shoe Fits! Overview of Lesson GAISE Components Common Core State Standards for Mathematical Practice

If the Shoe Fits! Overview of Lesson GAISE Components Common Core State Standards for Mathematical Practice If the Shoe Fits! Overview of Lesson In this activity, students explore and use hypothetical data collected on student shoe print lengths, height, and gender in order to help develop a tentative description

More information

13.2 Measures of Central Tendency

13.2 Measures of Central Tendency 13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Data Analysis: Describing Data - Descriptive Statistics

Data Analysis: Describing Data - Descriptive Statistics WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

More information

Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

More information

Statistical Analysis I

Statistical Analysis I CTSI BERD Research Methods Seminar Series Statistical Analysis I Lan Kong, PhD Associate Professor Department of Public Health Sciences December 22, 2014 Biostatistics, Epidemiology, Research Design(BERD)

More information

Income Distribution and Poverty Methods for Using Available Data in Global Analysis

Income Distribution and Poverty Methods for Using Available Data in Global Analysis Income Distribution and Poverty Methods for Using Available Data in Global Analysis Eric Kemp-Benedict Original Report: April 7, 997 Revised Report: May 7, PoleStar echnical Note No 4 Disclaimer PoleStar

More information

Lecture Topic 6: Chapter 9 Hypothesis Testing

Lecture Topic 6: Chapter 9 Hypothesis Testing Lecture Topic 6: Chapter 9 Hypothesis Testing 9.1 Developing Null and Alternative Hypotheses Hypothesis testing can be used to determine whether a statement about the value of a population parameter should

More information

vs. relative cumulative frequency

vs. relative cumulative frequency Variable - what we are measuring Quantitative - numerical where mathematical operations make sense. These have UNITS Categorical - puts individuals into categories Numbers don't always mean Quantitative...

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Probabilistic Analysis

Probabilistic Analysis Probabilistic Analysis Tutorial 8-1 Probabilistic Analysis This tutorial will familiarize the user with the basic probabilistic analysis capabilities of Slide. It will demonstrate how quickly and easily

More information

4. DESCRIPTIVE STATISTICS. Measures of Central Tendency (Location) Sample Mean

4. DESCRIPTIVE STATISTICS. Measures of Central Tendency (Location) Sample Mean 4. DESCRIPTIVE STATISTICS Descriptive Statistics is a body of techniques for summarizing and presenting the essential information in a data set. Eg: Here are daily high temperatures for Jan 6, 29 in U.S.

More information

Dongfeng Li. Autumn 2010

Dongfeng Li. Autumn 2010 Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 6. The Standard Deviation as a Ruler and the Normal Model. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 6 The Standard Deviation as a Ruler and the Normal Model Copyright 2012, 2008, 2005 Pearson Education, Inc. The Standard Deviation as a Ruler The trick in comparing very different-looking values

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

Descriptive statistics

Descriptive statistics Overview Descriptive statistics 1. Classification of observational values 2. Visualisation methods 3. Guidelines for good visualisation c Maarten Jansen STAT-F-413 Descriptive statistics p.1 1.Classification

More information

Home Runs, Statistics, and Probability

Home Runs, Statistics, and Probability NATIONAL MATH + SCIENCE INITIATIVE Mathematics American League AL Central AL West AL East National League NL West NL East Level 7 th grade in a unit on graphical displays Connection to AP* Graphical Display

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

More information

Statistics Summary (prepared by Xuan (Tappy) He)

Statistics Summary (prepared by Xuan (Tappy) He) Statistics Summary (prepared by Xuan (Tappy) He) Statistics is the practice of collecting and analyzing data. The analysis of statistics is important for decision making in events where there are uncertainties.

More information

Monte Carlo Method: Probability

Monte Carlo Method: Probability John (ARC/ICAM) Virginia Tech... Math/CS 4414: The Monte Carlo Method: PROBABILITY http://people.sc.fsu.edu/ jburkardt/presentations/ monte carlo probability.pdf... ARC: Advanced Research Computing ICAM:

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

More information

Visual Display of Data in Stata

Visual Display of Data in Stata Lab 2 Visual Display of Data in Stata In this lab we will try to understand data not only through numerical summaries, but also through graphical summaries. The data set consists of a number of variables

More information

Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Georgia Standards of Excellence Curriculum Frameworks. Mathematics. GSE Coordinate Algebra Unit 4: Describing Data

Georgia Standards of Excellence Curriculum Frameworks. Mathematics. GSE Coordinate Algebra Unit 4: Describing Data Georgia Standards of Excellence Curriculum Frameworks Mathematics GSE Coordinate Algebra Unit 4: Describing Data Unit 4 Describing Data Table of Contents OVERVIEW... 3 STANDARDS ADDRESSED IN THIS UNIT...

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

AP Statistics: Syllabus 3

AP Statistics: Syllabus 3 AP Statistics: Syllabus 3 Scoring Components SC1 The course provides instruction in exploring data. 4 SC2 The course provides instruction in sampling. 5 SC3 The course provides instruction in experimentation.

More information

3. Continuous Random Variables

3. Continuous Random Variables 3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such

More information

Statistics Chapter 3 Averages and Variations

Statistics Chapter 3 Averages and Variations Statistics Chapter 3 Averages and Variations Measures of Central Tendency Average a measure of the center value or central tendency of a distribution of values. Three types of average: Mode Median Mean

More information

COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

More information

Although scatter plots and trend lines may reveal a pattern, the relationship of the variables may indicate a correlation, but not causation.

Although scatter plots and trend lines may reveal a pattern, the relationship of the variables may indicate a correlation, but not causation. Middletown Public Schools Mathematics Unit Planning Organizer Subject Math Grade/Course Algebra I Unit 5 Scatterplots and Trendlines Duration 14 instructional days + days reteaching/enrichment Big Idea(s)

More information

Lecture 7 Linear Regression Diagnostics

Lecture 7 Linear Regression Diagnostics Lecture 7 Linear Regression Diagnostics BIOST 515 January 27, 2004 BIOST 515, Lecture 6 Major assumptions 1. The relationship between the outcomes and the predictors is (approximately) linear. 2. The error

More information

Module 4: Data Exploration

Module 4: Data Exploration Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

More information

MAT 12O ELEMENTARY STATISTICS I

MAT 12O ELEMENTARY STATISTICS I LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 12O ELEMENTARY STATISTICS I 3 Lecture Hours, 1 Lab Hour, 3 Credits Pre-Requisite:

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Continuous Distributions

Continuous Distributions MAT 2379 3X (Summer 2012) Continuous Distributions Up to now we have been working with discrete random variables whose R X is finite or countable. However we will have to allow for variables that can take

More information

Continuous Random Variables

Continuous Random Variables Continuous Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Continuous Random Variables 2 11 Introduction 2 12 Probability Density Functions 3 13 Transformations 5 2 Mean, Variance and Quantiles

More information

Lean Six Sigma Training/Certification Book: Volume 1

Lean Six Sigma Training/Certification Book: Volume 1 Lean Six Sigma Training/Certification Book: Volume 1 Six Sigma Quality: Concepts & Cases Volume I (Statistical Tools in Six Sigma DMAIC process with MINITAB Applications Chapter 1 Introduction to Six Sigma,

More information

Appendix E: Graphing Data

Appendix E: Graphing Data You will often make scatter diagrams and line graphs to illustrate the data that you collect. Scatter diagrams are often used to show the relationship between two variables. For example, in an absorbance

More information

Introduction to Descriptive Statistics

Introduction to Descriptive Statistics Mathematics Learning Centre Introduction to Descriptive Statistics Jackie Nicholas c 1999 University of Sydney Acknowledgements Parts of this booklet were previously published in a booklet of the same

More information