Treatment and analysis of data. Applied statistics. Lecture 3: Sampling and descriptive statistics


Sept-Oct 2006. Statistics for astronomers (L. Lindegren, Lund Observatory). Lecture 3.

Topics covered:
  Parameters and statistics
  Sample mean and sample standard deviation
  Order statistics and quantiles
  Confidence intervals and confidence levels
  Error bars and box plots
  Histograms
  Cumulative and percentile plots
  Probability plots

Population, parameters, sample and statistics

The sample space of probability theory corresponds to the population in statistics. A (random) sample is drawn from the population. Sampling (data collection) leads from the population to the data, and data analysis leads from the data to statistics such as m and s. The population is described by certain parameters, such as μ and σ, and inference is the step from the statistics back to these parameters.
Parameters and statistics

A parameter is a quantity that describes a population (e.g. the population mean μ and the population standard deviation σ). Data are obtained by sampling the population (e.g. $x_1, x_2, \ldots, x_n$). Any function of the data is called a statistic.

Examples of statistics:
  $n$ : the number of data points
  $\min(x_1, x_2, \ldots, x_n)$ : the smallest data value
  $x_1 + n^{1/3}$ : not a very useful statistic
  $m = (x_1 + x_2 + \cdots + x_n)/n$ : the sample mean
  $s = \left[ \sum_i (x_i - m)^2 / (n - 1) \right]^{1/2}$ : the sample standard deviation

Descriptive statistics

Simple calculations on the data condense them into a form that is useful, for example, in order to:
  summarize results in a way that is quickly grasped
  assess the quality of the data
  compare different sets of data
  explore what kind of information the data may contain
  support a statement (make a conclusion more convincing)

When the data represent a more or less unknown distribution, the most important statistics may be:
  some measure of location, such as the sample mean or median
  some measure of scale (or scatter, or precision), such as the sample standard deviation or interquartile range

These are often supported by graphics, which give much more complete information on distributions. (A graph is also a statistic.)
Sample mean and sample standard deviation

Important comment

Be careful to distinguish between the sample standard deviation

$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2} $$

which measures the dispersion among the values $x_1, x_2, \ldots, x_n$ around the sample mean value $m$, and the standard deviation of the sample mean, which is usually estimated as

$$ D[m] = \frac{s}{\sqrt{n}} = \sqrt{\frac{1}{n(n-1)} \sum_{i=1}^{n} (x_i - m)^2} $$

and which may be quoted as the standard error (1σ uncertainty) of $m$.

E.g.: "the mean value and dispersion of the data are 12.3 ± 2.5" is ambiguous!
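To make the distinction concrete, the following minimal sketch computes both quantities for a small made-up sample. The function name and the data values are illustrative assumptions, not part of the lecture; only the Python standard library is used.

```python
import math
import statistics

def mean_s_and_standard_error(data):
    """Return (m, s, D[m]) for a sequence of at least two numbers."""
    n = len(data)
    m = statistics.fmean(data)      # sample mean
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    return m, s, s / math.sqrt(n)   # D[m] = s / sqrt(n), the standard error of m

# Made-up illustrative values, not data from the lecture.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, s, se = mean_s_and_standard_error(sample)
print(f"m = {m:.2f}, dispersion s = {s:.2f}, standard error D[m] = {se:.2f}")
```

Quoting the result as "m ± s (dispersion)" or "m ± D[m] (s.e.)" removes the ambiguity, since the two numbers differ by a factor of sqrt(n).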
Alternative measures of location and scale

The sample mean and sample standard deviation are very sensitive to outliers or strongly deviating points. In manual data analysis one can often cope interactively with these cases, but for automatic analysis it is better to use a more robust method. In such cases, or when the distribution is known or suspected to be non-Gaussian, there are many other useful measures of location and scale.

Instead of the sample mean $m$ we may use the sample median $x_\mathrm{med}$ (see below).

Instead of the sample standard deviation (= RMS deviation from the sample mean), we may use the mean absolute deviation from the mean:

$$ \mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - m| $$

Often the sample median is used instead of the sample mean when calculating the MAD. In fact, for any fixed sample the median minimizes the MAD, so it is logical to use the median and MAD together.

Order statistics
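As a check on the remarks above about the median and the MAD, here is a small sketch with a made-up sample containing one strong outlier. The helper name `mean_abs_deviation` and the data are assumptions chosen for illustration.

```python
import statistics

def mean_abs_deviation(data, center):
    """Mean absolute deviation of the data about an arbitrary center."""
    return sum(abs(x - center) for x in data) / len(data)

# Made-up values; 100.0 plays the role of a strongly deviating point.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

m = statistics.fmean(sample)        # pulled far towards the outlier
x_med = statistics.median(sample)   # middle value of the sorted sample
s = statistics.stdev(sample)

mad_about_mean = mean_abs_deviation(sample, m)
mad_about_median = mean_abs_deviation(sample, x_med)

print(f"mean = {m}, median = {x_med}, s = {s:.1f}")
print(f"MAD about mean = {mad_about_mean:.1f}, about median = {mad_about_median:.1f}")
```

In this example the single outlier drags the mean well away from the bulk of the data, while the median stays put, and the deviation about the median is the smaller of the two.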
Sample quantiles

Quantiles for the normal (Gaussian) distribution

  68% of the area is within ±1σ
  32% of the area is outside ±1σ
  4.6% of the area is outside ±2σ
  0.3% of the area is outside ±3σ

[Figure: normal frequency curve; vertical axis "frequency", horizontal axis "value" marked at -3σ, -2σ, -1σ, 0, +1σ, +2σ, +3σ.]
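The quoted areas can be reproduced from the standard normal cdf. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name is an illustrative assumption.

```python
from statistics import NormalDist

def fraction_outside(k):
    """Fraction of a normal distribution lying outside +/- k standard deviations."""
    return 2.0 * (1.0 - NormalDist().cdf(k))

for k in (1, 2, 3):
    print(f"outside +/-{k} sigma: {100.0 * fraction_outside(k):.2f}%")
```

The printed values round to the 32%, 4.6% and 0.3% given above.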
Confidence intervals and levels: normal case (1)

[Figure: normal frequency curve with a percentile scale, plotted against the value in standard deviations.]

Alternatively, the precision can be specified as a confidence interval, with an associated confidence level (CL):

  x = 3.7 ± 2.5 (90% CL), or 1.2 < x < 6.2 (90% CL)
  x > 1.2 (95% CL) [one-sided confidence interval]

Confidence intervals and levels: normal case (2)

  Confidence level    Two-sided confidence interval (for normal distribution)
  50%                 [-0.67σ, +0.67σ]
  68%                 [-1.00σ, +1.00σ]
  90%                 [-1.65σ, +1.65σ]
  95%                 [-1.96σ, +1.96σ]
  99%                 [-2.58σ, +2.58σ]
  99.9%               [-3.29σ, +3.29σ]

Caution: older astronomical literature (< 1960) often uses the probable error (p.e.), which corresponds to 50% CL or ±0.67σ. Thus: (standard error) ≈ 1.5 × (probable error).
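The half-widths in the table follow from the inverse standard normal cdf: a two-sided interval at confidence level CL leaves (1 - CL)/2 in each tail. The sketch below assumes a normal distribution throughout; the function name is illustrative.

```python
from statistics import NormalDist

def two_sided_multiplier(cl):
    """Half-width, in units of sigma, of the two-sided interval at confidence level cl."""
    return NormalDist().inv_cdf(0.5 * (1.0 + cl))

for cl in (0.50, 0.68, 0.90, 0.95, 0.99, 0.999):
    print(f"{100.0 * cl:5.1f}% CL: +/-{two_sided_multiplier(cl):.2f} sigma")

# Ratio of standard error to probable error for a normal distribution.
se_over_pe = 1.0 / two_sided_multiplier(0.50)
print(f"standard error / probable error = {se_over_pe:.2f}")
```

Note that ±1.00σ corresponds to a confidence level of about 68.3%, so the 68% row of the table is a rounded value.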
Deviations from the normal distribution

Actual errors rarely follow the normal distribution:
  usually points beyond ±3σ are much more frequent than expected for a normal distribution (0.3%)
  the distribution is often skew, especially in the tails
  sometimes the distribution is completely different, e.g. exponential

Although the standard deviation is applicable to many non-normal cases, it could be misleading without further specification of the distribution. For instance, given only the information x = 3.7 ± 1.5 (s.e.) one might conclude that x > 8.2 is very unlikely (0.15%). However, if x has a lognormal distribution, the probability is in fact 2-3%.

Quantiles (fractiles), percentiles, quartiles, etc.

Other names for quantiles at certain q-values:
  Q(0.5) = median (or 50th percentile)
  Q(0.25) = lower quartile (or 25th percentile)
  Q(0.75) = upper quartile (or 75th percentile)
  Q(0.1) = first decile, Q(0.2) = second decile, etc. [not so often used]

The interquartile range IQR = Q(0.75) - Q(0.25) is sometimes used as a measure of precision (equal to 1.35σ for a normal distribution).

Half the "intersextile range" (not a standard term), [Q(5/6) - Q(1/6)]/2, equals 0.97σ for a normal distribution and is useful as a robust assessment of the dispersion.

NOTE: The terms quantile, fractile, and percentile are used almost synonymously in the literature, while median, quartile, decile etc. have very specific meanings.
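The two conversion factors quoted for the normal distribution can be checked directly, and the same quantile differences can be taken from a sample. The sketch below uses `statistics.NormalDist` and `statistics.quantiles` from the Python standard library; the sample values and the variable names are assumptions made for illustration.

```python
from statistics import NormalDist, quantiles

def Q(q):
    """Quantile function of the standard normal distribution (sigma = 1)."""
    return NormalDist().inv_cdf(q)

iqr_in_sigma = Q(0.75) - Q(0.25)
half_intersextile_in_sigma = (Q(5.0 / 6.0) - Q(1.0 / 6.0)) / 2.0
print(f"IQR = {iqr_in_sigma:.3f} sigma")
print(f"half intersextile range = {half_intersextile_in_sigma:.3f} sigma")

# Sample version: the default method of statistics.quantiles places the
# i-th sorted value at p = i/(n+1). The values below are made up.
sample = [float(v) for v in range(1, 12)]          # 1, 2, ..., 11
lower_q, median, upper_q = quantiles(sample, n=4)
sample_iqr = upper_q - lower_q
print(f"sample IQR = {sample_iqr}")
# Only meaningful as a sigma estimate if the data are roughly normal.
print(f"IQR / 1.35 = {sample_iqr / iqr_in_sigma:.2f}")
```

Dividing a sample IQR by 1.35 gives a dispersion estimate that is comparable to s for normal data but much less affected by a few outliers.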
Error bars and box plots

Error bars usually indicate ±1σ (i.e. the confidence interval at 68% CL). If not, the exact meaning must definitely be stated in the figure caption.

Box plots (or box-whisker plots) show, from top to bottom:
  outliers (> 1.5 IQR from median)
  highest non-outlier
  upper quartile
  median
  lower quartile
  lowest non-outlier
  outlier

Histograms

One-dimensional sample distributions are often shown as histograms. A histogram displays the number of data points per bin, versus the position of the bin (or the density of data points, if unequal bin sizes are used).

E.g., define the sequence $x_0, x_1, \ldots, x_n$, which are the boundaries of $n$ bins. Equal bins of size $\Delta x$ are obtained as $x_i = x_0 + i\,\Delta x$, $i = 1, 2, \ldots, n$. Let $h_i$ be the number of data points with $x_{i-1} \le x < x_i$. (Note the position of <.) In the histogram, $h_i$ (or sometimes $h_i/\Delta x_i$) is plotted as a bar from $x_{i-1}$ to $x_i$.

Things to consider when constructing a histogram:
  Which bin size to use? A compromise between resolution and noise. In any case, be careful to specify the bin size if it is not clear from the graph!
  Where to start ($x_0$)? Often arbitrary!
  What to do with points outside $[x_0, x_n]$ (if any)?

A difficulty with histograms is that they look radically different depending on the choices you make!
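The binning rule can be sketched in a few lines. This is one possible implementation under the half-open convention stated above; the function name, the decision to count out-of-range points separately, and the data are assumptions. Python indexes the bins from 0, so `counts[i]` here corresponds to $h_{i+1}$ in the notation of the text.

```python
import math

def histogram(data, x0, dx, nbins):
    """Count points in nbins equal bins of width dx starting at x0.

    Bin i (0-based) holds points with x0 + i*dx <= x < x0 + (i+1)*dx.
    Points outside [x0, x0 + nbins*dx) are counted separately.
    """
    counts = [0] * nbins
    outside = 0
    for x in data:
        i = math.floor((x - x0) / dx)
        if 0 <= i < nbins:
            counts[i] += 1
        else:
            outside += 1
    return counts, outside

# Made-up values; note that 2.0 falls in [2, 3) and 4.0 falls outside [0, 4).
sample = [0.5, 1.0, 1.5, 2.0, 2.0, 3.9, 4.0]
counts, outside = histogram(sample, x0=0.0, dx=1.0, nbins=4)
print(counts, outside)
```

Reporting the number of points that fall outside the plotted range is one way to answer the third question in the list above.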
Different histograms of the same data... (1)

[Figure: four histogram panels, each with bin size = 2.]

These histograms (of the same 200 points) differ only in the choice of starting value $x_0$.

Different histograms of the same data... (2)

[Figure: four histogram panels, with bin size = 2, 1, 1 and 0.5.]

These histograms (of the same 200 points) differ in bin size as well. It is better to make the bins too narrow than too wide: the eye can smooth out the noise but cannot recover lost resolution! Note that the uncertainty of any histogram value $h_i$ is of order $\pm\sqrt{h_i}$.
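The effect of the starting value can be seen even on a tiny made-up sample. The sketch below bins the same eight values with two choices of $x_0$ and prints each count with its approximate $\sqrt{h_i}$ uncertainty; the data and names are illustrative assumptions, not the 200 points of the figures.

```python
import math
from collections import Counter

def bin_counts(data, x0, dx):
    """Count points per bin; bin k covers x0 + k*dx <= x < x0 + (k+1)*dx."""
    return Counter(math.floor((x - x0) / dx) for x in data)

# Made-up values for illustration.
sample = [1.2, 1.9, 2.1, 2.8, 3.1, 3.9, 4.2, 5.0]
dx = 2.0

for x0 in (0.0, 1.0):
    counts = bin_counts(sample, x0, dx)
    for k in sorted(counts):
        h = counts[k]
        left = x0 + k * dx
        print(f"x0 = {x0}: bin [{left}, {left + dx}) h = {h} +/- {math.sqrt(h):.1f}")
```

With $x_0 = 0$ the counts peak in the middle bin, while with $x_0 = 1$ they fall off monotonically, although the data are identical. The differences are comparable to the $\sqrt{h_i}$ uncertainties, which is why such features should not be over-interpreted.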
Cumulative plots

An alternative to the histogram is to plot the cumulative fraction, analogous to the cumulative distribution function (cdf):

  Theoretical distributions            Empirical data
  cumulative distribution function     cumulative fraction
  probability density function         histogram

The cumulative fraction is a step function that increments by 1/n for each data point, starting from 0 and ending at 1.

Cumulative plot, example

[Figure: cumulative fraction plot, n = 200.]

Cumulative fraction plot for the same 200 data points as in the histograms. The two modes can be seen as the steeper parts of the curve around 10 and 15.
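A minimal sketch of the cumulative fraction as a step function is given below. It returns a callable that evaluates the fraction of data points less than or equal to x; the function name and data are assumptions for illustration.

```python
from bisect import bisect_right

def cumulative_fraction(data):
    """Return F, where F(x) is the fraction of data points <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return F

# Made-up values; the repeated 2.0 gives a step of height 2/n.
F = cumulative_fraction([3.0, 1.0, 2.0, 2.0])
for x in (0.0, 1.0, 2.0, 2.5, 3.0):
    print(x, F(x))
```

No bin size or starting value has to be chosen, which is the main practical advantage over the histogram.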
Cumulative plots, some more examples (1)

You can transform the scale of data values to emphasize important intervals. For example, for strictly positive data it often makes sense to use a logarithmic scale (this and the following examples are from bardeen.physics.csbsju.edu/stats/).

Cumulative plots, some more examples (2)

[Figure: cumulative plots, box plot and mean/dispersion plot for two samples, B1 and B2.]

Cumulative plots are excellent for comparing two samples: do they have the same distribution? (Cf. the KS test.) This also works for samples of unequal size. The two samples B1 and B2 are clearly drawn from different populations. This is also evident from the box plot, but not from the mean/dispersion plot (right).
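The comparison of two cumulative plots can be quantified by the largest vertical distance between them, which is the statistic used by the two-sample KS test mentioned above. The sketch below computes only that distance, not a significance level; the function name and the two small samples are assumptions for illustration.

```python
from bisect import bisect_right

def max_ecdf_difference(a, b):
    """Largest vertical distance between the cumulative fractions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    na, nb = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both step functions only change at data values, so it is enough
    # to compare them at every point of the pooled sample.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / na
        fb = bisect_right(b_sorted, x) / nb
        d = max(d, abs(fa - fb))
    return d

# Made-up samples of unequal size.
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = max_ecdf_difference(sample_1, sample_2)
print(f"maximum distance = {d:.3f}")
```

Because each cumulative fraction is normalized to run from 0 to 1, the comparison works for samples of unequal size.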
Percentile plots

The ragged appearance of the cumulative plot can be disturbing to the eye, especially for small n. It may then be better to use a percentile plot (red line), which simply connects the n points with $x_{(i)}$ as abscissa and $p = i/(n+1)$ as ordinate. This is actually a better estimate of the cumulative distribution function than the cumulative fraction plot.

Percentile plot, example

[Figure: percentile plot, n = 200.]

Percentile plot for the same 200 data points as in the histograms and in the cumulative fraction plot ("Cumulative plot, example").
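The points of a percentile plot are easy to generate: sort the data and pair the i-th sorted value with i/(n+1). The function name and the three data values below are illustrative assumptions.

```python
def percentile_points(data):
    """Return the list of (x_(i), i/(n+1)) pairs, i = 1..n, for a percentile plot."""
    xs = sorted(data)
    n = len(xs)
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]

# Made-up values for illustration.
for x, p in percentile_points([3.0, 1.0, 2.0]):
    print(x, p)
```

Unlike the cumulative fraction, the ordinates never reach 0 or 1, which also keeps them inside the domain of an inverse cdf.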
Transformed percentiles...

[Figure: two panels, each n = 200; left: standard percentile plot, right: 1 - p on a logarithmic scale.]

Sometimes it is useful to transform the percentile scale to bring out more clearly the important parts of the distribution. In this example (a sample drawn from $\chi^2_3$) we are concerned about the tail of large values, which is difficult to see in the standard percentile plot (left). By plotting $1 - p$ instead of $p$ and using a logarithmic scale, the tail is emphasized.

Probability plots

As $n \to \infty$ the percentile plot converges to the cdf $F(x)$. To see if the data follow a given distribution $F(x)$, we could make a percentile plot with $F^{-1}(i/(n+1))$ on the y-axis instead of $i/(n+1)$. If the data follow $F(x)$ we should then get (approximately) a straight line. This is a probability plot.

The nice thing about probability plots is that any linear transformation $a x_i + b$ of the data will just shift and change the slope of the curve, but a straight line (for example) remains straight.

The most common type of this plot is the normal probability plot, using the standard normal cdf

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^2}{2}\right) dt $$

The abscissae are $x_{(i)}$ and the ordinates are $\Phi^{-1}(i/(n+1))$ for $i = 1, 2, \ldots, n$.
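The recipe for the normal probability plot can be sketched as follows. The straightness of the plot is summarized here by the correlation coefficient of the plotted points, which is an illustrative choice made for this sketch and not a procedure taken from the lecture. The data are constructed, not real: one set is an exact linear transformation of normal quantiles and the other is its exponential (lognormal-like).

```python
import math
from statistics import NormalDist

def normal_probability_points(data):
    """Return (x_(i), Phi^-1(i/(n+1))) pairs for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    phi_inv = NormalDist().inv_cdf
    return [(x, phi_inv(i / (n + 1))) for i, x in enumerate(xs, start=1)]

def correlation(points):
    """Pearson correlation of the plotted points (1.0 means a perfect straight line)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Constructed data: exact normal scores z, a linear transform, and a skewed transform.
z = [NormalDist().inv_cdf(i / 50) for i in range(1, 50)]
gaussian_like = [2.0 + 5.0 * v for v in z]   # mean 2, s.d. 5: stays a straight line
skewed = [math.exp(v) for v in z]            # lognormal-like: curved plot

r_gaussian = correlation(normal_probability_points(gaussian_like))
r_skewed = correlation(normal_probability_points(skewed))
print(f"r (linear transform of normal scores) = {r_gaussian:.4f}")
print(f"r (exponential of normal scores)      = {r_skewed:.4f}")
```

The linear transformation only changes the slope and offset of the line, so the correlation stays at 1, whereas the skewed data bend away from a straight line.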
The inverse standard normal cdf

To make normal probability plots you need to be able to compute the inverse standard normal cdf $\Phi^{-1}(p)$ for any 0 < p < 1. Routines for this are available in most numerical/statistical packages (and can be found e.g. in Numerical Recipes). If not readily available, use the following approximation, which is always good enough for probability plots (maximum error is 0.003; Abramowitz & Stegun, Handbook of Mathematical Functions, formula 26.2.22):

$$
\Phi^{-1}(p) \approx
\begin{cases}
\dfrac{2.30753 + 0.27061\,t}{1 + 0.99229\,t + 0.04481\,t^2} - t & \text{if } 0 < p \le 0.5 \\[2ex]
-\Phi^{-1}(1 - p) & \text{if } 0.5 < p < 1
\end{cases}
\qquad \text{where } t = \sqrt{-2 \ln p}
$$

The values $\Phi^{-1}(p)$ are sometimes called the normal scores.

Percentile vs. probability plot (1)

[Figure: percentile plot, n = 50, with the ordinates 1/6 and 5/6 marked to read off the 1st sextile, the median and the 5th sextile.]

Percentile plot for 50 random numbers from a normal distribution with mean = 2 and s.d. = 5. Note that you can use the percentile plot to estimate quantiles, e.g. the median and the first/last sextiles.
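Returning to the approximation for the inverse standard normal cdf discussed earlier in this section, the sketch below implements the rational approximation 26.2.22 from Abramowitz & Stegun and compares it with `statistics.NormalDist.inv_cdf`. Treating 26.2.22 as the formula meant by the lecture is an assumption, based on the matching stated error of 0.003; the function name is illustrative.

```python
import math
from statistics import NormalDist

# Coefficients of Abramowitz & Stegun formula 26.2.22 (assumed to be the
# approximation referred to in the text; stated error below 0.003).
A0, A1 = 2.30753, 0.27061
B1, B2 = 0.99229, 0.04481

def approx_inverse_normal_cdf(p):
    """Approximate Phi^-1(p) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p > 0.5:
        # Use the symmetry Phi^-1(p) = -Phi^-1(1 - p) for the upper half.
        return -approx_inverse_normal_cdf(1.0 - p)
    t = math.sqrt(-2.0 * math.log(p))
    return (A0 + A1 * t) / (1.0 + B1 * t + B2 * t * t) - t

# Compare with the library routine on a grid of probabilities.
exact = NormalDist().inv_cdf
worst = max(abs(approx_inverse_normal_cdf(k / 1000) - exact(k / 1000))
            for k in range(1, 1000))
print(f"largest error on the grid: {worst:.4f}")
```

Where a library routine such as `NormalDist.inv_cdf` is available it is the simpler choice; the approximation is mainly useful in environments without one.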
Percentile vs. probability plot (2)

[Figure: normal probability plot, n = 50, with the median and the quantiles corresponding to ±1σ marked.]

Normal probability plot for the same 50 random numbers. The approximately straight relationship suggests that the data are indeed Gaussian. The median and the quantiles corresponding to ±1σ for the normal distribution are easily found.

Normal probability plots, expected variation (n = 20)
Normal probability plots, expected variation (n = 200)

Normal probability plot for a non-normal sample

[Figure: normal probability plot, n = 200, with the corresponding histogram (bin size = 0.5).]

Normal probability plot for the bimodal sample plotted earlier in the histograms ("Different histograms of the same data").
Normal probability plot for a non-normal sample

[Figure: normal probability plot, n = 200.]

Typical normal probability plot for a sample that is nearly Gaussian, but with some outliers.

Normal probability plot for a non-normal sample

[Figure: normal probability plot, n = 200.]

Normal probability plot for a sample drawn from the Cauchy distribution with location α = 2 and scale β = 5.
Cauchy probability plot for the Cauchy sample

[Figure: Cauchy probability plot, n = 200.]

Probability plots may not be very useful for extreme distributions like Cauchy!

Cauchy probability plot for the same sample as in the previous slide. The inverse cdf for the standard Cauchy distribution is $F^{-1}(p) = \tan[(p - 0.5)\pi]$.

Related to probability plots

[Figure: logarithmic plot of the century's 100 largest disasters worldwide; series: Technological ($10B), Natural ($100B), US power outages (10M of customers); reference line labelled "Slope = 1 (α = 1)".]
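For the Cauchy probability plot described above, the ordinates are obtained by applying the inverse Cauchy cdf to i/(n+1) in place of the normal scores. A minimal sketch, with illustrative function names and made-up data:

```python
import math

def cauchy_inverse_cdf(p):
    """Inverse cdf of the standard Cauchy distribution, F^-1(p) = tan[(p - 0.5) pi]."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.tan((p - 0.5) * math.pi)

def cauchy_probability_points(data):
    """Return (x_(i), F^-1(i/(n+1))) pairs for a Cauchy probability plot."""
    xs = sorted(data)
    n = len(xs)
    return [(x, cauchy_inverse_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)]

# Made-up values for illustration.
for x, y in cauchy_probability_points([5.0, -3.0, 2.0]):
    print(f"{x:6.1f} {y:8.3f}")
```

Since the ordinates use i/(n+1), the argument never reaches 0 or 1, where the tangent diverges. The extreme points nevertheless spread out very quickly for large n, which is consistent with the remark that such plots are of limited use for the Cauchy distribution.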
A histogram plot from Hipparcos data analysis

ESA SP-1200 Vol. 3, Fig. Normalised differences between the FAST and NDAC parallax estimates for successive solutions (12, 18, 30, 37 months of data). n = 40, ,000.

The same data in a normal probability plot

Real data are sometimes surprisingly Gaussian!

ESA SP-1200 Vol. 3, Fig. Normalised differences between the FAST and NDAC parallax estimates for successive solutions (12, 18, 30, 37 months of data). n = 40, ,000.
Income Distribution and Poverty Methods for Using Available Data in Global Analysis Eric KempBenedict Original Report: April 7, 997 Revised Report: May 7, PoleStar echnical Note No 4 Disclaimer PoleStar
More informationLecture Topic 6: Chapter 9 Hypothesis Testing
Lecture Topic 6: Chapter 9 Hypothesis Testing 9.1 Developing Null and Alternative Hypotheses Hypothesis testing can be used to determine whether a statement about the value of a population parameter should
More informationvs. relative cumulative frequency
Variable  what we are measuring Quantitative  numerical where mathematical operations make sense. These have UNITS Categorical  puts individuals into categories Numbers don't always mean Quantitative...
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationProbabilistic Analysis
Probabilistic Analysis Tutorial 81 Probabilistic Analysis This tutorial will familiarize the user with the basic probabilistic analysis capabilities of Slide. It will demonstrate how quickly and easily
More information4. DESCRIPTIVE STATISTICS. Measures of Central Tendency (Location) Sample Mean
4. DESCRIPTIVE STATISTICS Descriptive Statistics is a body of techniques for summarizing and presenting the essential information in a data set. Eg: Here are daily high temperatures for Jan 6, 29 in U.S.
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationDescriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
More informationChapter 6. The Standard Deviation as a Ruler and the Normal Model. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 6 The Standard Deviation as a Ruler and the Normal Model Copyright 2012, 2008, 2005 Pearson Education, Inc. The Standard Deviation as a Ruler The trick in comparing very differentlooking values
More informationThe right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median
CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box
More informationDescriptive statistics
Overview Descriptive statistics 1. Classification of observational values 2. Visualisation methods 3. Guidelines for good visualisation c Maarten Jansen STATF413 Descriptive statistics p.1 1.Classification
More informationHome Runs, Statistics, and Probability
NATIONAL MATH + SCIENCE INITIATIVE Mathematics American League AL Central AL West AL East National League NL West NL East Level 7 th grade in a unit on graphical displays Connection to AP* Graphical Display
More informationVariables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
More informationDescribe what is meant by a placebo Contrast the doubleblind procedure with the singleblind procedure Review the structure for organizing a memo
Readings: Ha and Ha Textbook  Chapters 1 8 Appendix D & E (online) Plous  Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability
More informationStatistics Summary (prepared by Xuan (Tappy) He)
Statistics Summary (prepared by Xuan (Tappy) He) Statistics is the practice of collecting and analyzing data. The analysis of statistics is important for decision making in events where there are uncertainties.
More informationMonte Carlo Method: Probability
John (ARC/ICAM) Virginia Tech... Math/CS 4414: The Monte Carlo Method: PROBABILITY http://people.sc.fsu.edu/ jburkardt/presentations/ monte carlo probability.pdf... ARC: Advanced Research Computing ICAM:
More informationExploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More informationLecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
More informationVisual Display of Data in Stata
Lab 2 Visual Display of Data in Stata In this lab we will try to understand data not only through numerical summaries, but also through graphical summaries. The data set consists of a number of variables
More informationSociology 6Z03 Topic 15: Statistical Inference for Means
Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationGeorgia Standards of Excellence Curriculum Frameworks. Mathematics. GSE Coordinate Algebra Unit 4: Describing Data
Georgia Standards of Excellence Curriculum Frameworks Mathematics GSE Coordinate Algebra Unit 4: Describing Data Unit 4 Describing Data Table of Contents OVERVIEW... 3 STANDARDS ADDRESSED IN THIS UNIT...
More informationNorthumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data  November 2012  This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationWeek 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
More informationAP Statistics: Syllabus 3
AP Statistics: Syllabus 3 Scoring Components SC1 The course provides instruction in exploring data. 4 SC2 The course provides instruction in sampling. 5 SC3 The course provides instruction in experimentation.
More information3. Continuous Random Variables
3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such
More informationStatistics Chapter 3 Averages and Variations
Statistics Chapter 3 Averages and Variations Measures of Central Tendency Average a measure of the center value or central tendency of a distribution of values. Three types of average: Mode Median Mean
More informationCOMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in
More informationAlthough scatter plots and trend lines may reveal a pattern, the relationship of the variables may indicate a correlation, but not causation.
Middletown Public Schools Mathematics Unit Planning Organizer Subject Math Grade/Course Algebra I Unit 5 Scatterplots and Trendlines Duration 14 instructional days + days reteaching/enrichment Big Idea(s)
More informationLecture 7 Linear Regression Diagnostics
Lecture 7 Linear Regression Diagnostics BIOST 515 January 27, 2004 BIOST 515, Lecture 6 Major assumptions 1. The relationship between the outcomes and the predictors is (approximately) linear. 2. The error
More informationModule 4: Data Exploration
Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive
More informationMAT 12O ELEMENTARY STATISTICS I
LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 12O ELEMENTARY STATISTICS I 3 Lecture Hours, 1 Lab Hour, 3 Credits PreRequisite:
More informationGood luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
More informationContinuous Distributions
MAT 2379 3X (Summer 2012) Continuous Distributions Up to now we have been working with discrete random variables whose R X is finite or countable. However we will have to allow for variables that can take
More informationContinuous Random Variables
Continuous Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Continuous Random Variables 2 11 Introduction 2 12 Probability Density Functions 3 13 Transformations 5 2 Mean, Variance and Quantiles
More informationLean Six Sigma Training/Certification Book: Volume 1
Lean Six Sigma Training/Certification Book: Volume 1 Six Sigma Quality: Concepts & Cases Volume I (Statistical Tools in Six Sigma DMAIC process with MINITAB Applications Chapter 1 Introduction to Six Sigma,
More informationAppendix E: Graphing Data
You will often make scatter diagrams and line graphs to illustrate the data that you collect. Scatter diagrams are often used to show the relationship between two variables. For example, in an absorbance
More informationIntroduction to Descriptive Statistics
Mathematics Learning Centre Introduction to Descriptive Statistics Jackie Nicholas c 1999 University of Sydney Acknowledgements Parts of this booklet were previously published in a booklet of the same
More information