Kernel density function for protein abundance


March, 2012

Kernel density functions clearly show the differential abundance of a protein between one set of samples (that is, one sample category) and another category. Since kernel density functions are less familiar than other ways of displaying the measurements, the first section of this document reviews the more familiar ways of plotting the data and shows that the kernel density function provides an informative alternate view. The second section shows what the kernel density function is and how it is a perfectly reasonable extension of the familiar histogram.

Scaffold Q+ estimates the abundance of each protein from the measurements made on the protein's peptides. In SILAC these measurements are made for each peptide. In iTRAQ or TMT these measurements are made by measuring the intensities of the reporter ions on the spectra.

Why kernel density graphs are interesting

In order to tell if proteins are expressed at different abundances under different conditions, it is important that the uncertainty of the measurements is calculated and displayed. The following examples show the simple case where there are samples in two categories. The protein is differentially expressed if the difference in the abundance of the protein between the two categories is significantly larger than the errors of measurement.

Line graph

Suppose, as in Figure 1, that only one iTRAQ spectrum is measured. In this case it is impossible to know if there is differential expression between the sample categories because no error bars are displayed.

Figure 1. The iTRAQ measurements for a single spectrum without error bars for samples in two categories. The first category has samples Quant 1 and Quant 3 and the second category has samples Quant 2 and Quant 4.

If all the measurements for the protein are displayed as in Figure 2, the spread of the measured values for each sample begins to give a sense of the difference between the sample categories.

Figure 2. Multiple peptide measurements give a sense of the accuracy with which the protein abundance is measured.

Box plot

A better feeling for the data, especially if there are a number of measurements, can be obtained with a box plot as in Figure 3. In a box plot the median value is in the middle and the ends of the box are at the 25th and 75th percentile points.

Figure 3. Box plot of the data shows the data summarized for each sample. The measurement error is estimated by the height of the boxes and by the scatter in the medians of the samples.

The box plots show the difference between samples, but what we really want to know is the difference between sample categories. To decide if the protein is differentially expressed you need to meld two things together in your mind: the variation between samples and the size of the error bars for each sample. This can get challenging in a case like that shown in Figure 4.

Figure 4. Example of a box plot with many samples each having substantial error bars. It is not easy to see if there is a significant difference between the samples.

Kernel density

To see if the proteins are differentially expressed, a graph that summarizes the data at the category level would be very useful. Let's return to the simpler two-category example from Figure 3 to look at the categories graphically. The curves shown in Figure 5, which provide this category-level summary, are called kernel density functions. The red kernel density function looks quite like a bell-shaped normal distribution. It summarizes all the measurements made by all the spectra for all the samples in category 1.

Figure 5. In this kernel density function for the protein shown in Figure 3 the blue line is for the reference category and the red line for category 1.

This density function is interpreted much like a normal probability distribution. For example, it shows that these measurements fall between about 1,100 and 2,300 and are most likely to be around 1,700. The area under the whole curve is 1.0 and the area under the curve between 1,500 and 2,000 is the probability that the data is in this range.

The blue curve in Figure 5 is the kernel density function for the reference category. This kernel density doesn't look like a normal distribution, since it has a shoulder on the right side. The shoulder in Figure 5 arises because the two blue samples in Figure 3 have different intensities. However, it is still a probability density distribution and is interpreted like other such curves. So, for example, while the blue curve isn't as tall as the red curve, it is wider, so that the area under the blue curve is also 1.0.

From Figure 5 it is apparent that the measurements for the two different categories overlap quite a bit. Does this overlap mean that the proteins are not differentially expressed? Not necessarily. A protein is differentially expressed if its abundance in one category of samples is different from its abundance in the second category. But recall that the abundance of the protein is estimated by averaging the individual measurements.

Distribution of means

So in order to clearly show differential expression, a graph should show whether one average is significantly different from the second. Naively you might think that calculating the average (whether mean or median) gives only one value. However, when you calculate the mean from a list of numbers, it is not precise, since the numbers that the mean is calculated from are only estimates of the true abundance. This uncertainty in the mean is sometimes called the confidence interval for the mean.

Figure 6. The range of values for the average of the pink kernel density function is plotted as the pink shaded area. Similarly the blue shaded area shows the range of values for the average of the blue kernel density function.

In Figure 6 the uncertainty in the mean for each category is plotted as a shaded area. This estimate of the mean is also plotted as a probability density function. Statisticians tell us that the uncertainty of the mean is less than the uncertainty in the original measurements. In fact, roughly speaking, the width of the mean kernel density function will be half the width of the data kernel density function if there are 4 measurements, one third if there are 9 measurements, one quarter if there are 16 measurements, and so on. In other words, the width shrinks as one over the square root of the number of measurements. This is strictly true for normal distributions. Since, as we have seen, some distributions are not even approximately normal, Scaffold Q+ estimates the uncertainty in the mean using a statistical procedure called bootstrapping. See the chapter on bootstrapping the mean for details.

Differential expression

For the data in our example, Figure 6 shows that the shaded areas overlap only very little. That means that there is only a small probability that the means are the same. In other words, the protein is differentially expressed.

Of course there are also a variety of statistical tests that are used to tell if a protein is differentially expressed. But many of these tests make assumptions about the data. The most common assumption is that the data follows a normal distribution. By looking at the kernel density graphs you can assess whether this assumption is correct and so whether the statistical test will give reasonable results.
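The narrowing of the mean's distribution described above can be illustrated with a small bootstrap. This is only a minimal sketch, not the procedure Scaffold Q+ actually implements: the intensities are made-up values, and the function name, number of resamples and seed are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_means(values, n_resamples=2000, seed=0):
    """Return means of resamples drawn with replacement from values."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


# Hypothetical intensities for one sample category (not read from the figures).
intensities = [1450, 1520, 1610, 1680, 1700, 1730, 1790, 1850, 1990]

means = bootstrap_means(intensities)
spread_of_data = statistics.stdev(intensities)
spread_of_mean = statistics.stdev(means)

# With 9 measurements the spread of the mean should be roughly
# one third of the spread of the data.
print(round(spread_of_mean / spread_of_data, 2))
```

Because the resampling makes no assumption about the shape of the data, the same sketch applies to a distribution with a shoulder like the blue curve in Figure 5.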

In summary, the kernel density graphs in Scaffold Q+ provide an easy way to see if a protein is differentially expressed.

Kernel density functions can also represent two additional features that are important when estimating protein abundances. The first is the weight you want to give to each measurement and the second is the accuracy of each measurement. These topics are discussed in the next section.

What a kernel density function is

A kernel density function is a kind of histogram. A histogram is a very common way to summarize a set of measurements graphically so that they are easier to understand. The kernel density function has all the advantages of a histogram plus several more. In particular, the kernel density function is a probability distribution that can take into consideration the weight and accuracy of each data point. To see what a kernel density function is and how it takes these things into consideration, we will look at a simple artificial example in some detail.

Original data

We will start our example with a set of seven measurements as shown in Figure 7.

Figure 7. The seven data points that will be used to demonstrate what a kernel density function is. The values are on the x-axis; the y-axis in this figure is irrelevant.

Histogram

The first step in our journey to understand kernel density functions is to make a histogram. Here we will make a histogram, as shown in Figure 8, with a bin width of 1.0. Each data point corresponds to a little box. Some bins will have one box and some will have two boxes stacked up. Note that in our histogram in Figure 8 each box has the same area, in this case 1.0.

Figure 8. A histogram of the data in Figure 7. The vertical axis is the count of the number of data points in the bins.

Another way to view this histogram leaves out the internals and gives a line outlining the graph as in Figure 9. This graph is a sum of the boxes in the histogram.

Figure 9. This simplified view of the histogram is made by summing the number of boxes in each bin.

Weighted histogram

In iTRAQ and SILAC experiments some of the data points are less trustworthy than others. For example, these data points might be measurements where the signal is in the noise or where the signal is saturated.

One way to deal with these less trustworthy data points is to weight them less when calculating the mean or median. As a really simple example of a weighted mean, first consider the average of 2 and 6, which is (2 + 6)/2 = 4. Now if the measurement 2 has a weight of 3 and the measurement 6 has a weight of 1, then the weighted mean is (2*3 + 6*1)/(3 + 1) = 3. While the mean (4) is halfway between 2 and 6, the weighted mean (3) is closer to 2 than to 6. The chapter on Weights discusses how Scaffold Q+ comes up with weights for the data.

Without worrying about what the weights should be, let's look at how they can be incorporated into a histogram. For our example data in Figure 7, let's assign some weights to the measurements. To make a histogram of this weighted data, the heights of the boxes that make up the histogram are multiplied by the weighting factors. The weighted histogram in Figure 10 shows that not all the boxes are the same height.

Figure 10. A weighted histogram reflects the weights of the measurements by varying the height of the boxes that make up the histogram.

Once again we can show, as in Figure 11, a graph that is a sum of the heights of the boxes in the histogram.
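The weighted mean and the weighted histogram can be sketched in a few lines. This is an illustrative sketch only: the values 8.4, 9.2, 9.7 and 10.6 appear in this document, but the remaining data values and all of the weights are made up, and the function names are hypothetical.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_histogram(values, weights, bin_width=1.0):
    """Add up the weights of the points in bins centered on multiples of bin_width."""
    heights = defaultdict(float)
    for v, w in zip(values, weights):
        center = round(v / bin_width) * bin_width
        heights[center] += w
    return dict(sorted(heights.items()))


print(weighted_mean([2, 6], [3, 1]))  # 3.0, matching the worked example

# Assumed example data: only 8.4, 9.2, 9.7 and 10.6 come from the text.
values = [8.4, 9.2, 9.7, 10.3, 10.6, 11.2, 12.1]
weights = [0.5, 1.0, 1.0, 0.8, 1.0, 0.6, 0.3]

hist = weighted_histogram(values, weights)
for center, height in hist.items():
    print(center, round(height, 2))
```

With all weights set to 1.0 the same function gives the ordinary count histogram of Figure 9.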

Figure 11. The weighted histogram is made from the sum of the heights of each point in each bin.

The histogram in Figure 11 does a better job than Figure 9 of representing the original data if not all the data is equally trustworthy. The center value of the histogram is approximately the weighted median of the original data points. In the next section we will take one more step that makes the center of the figure the weighted median.

Data centered histogram

The histogram above has bins of equal width centered at 8, 9, 10, 11 and 12. A variation on the histogram is to plot each box centered on the original data point. That is, the first box will be centered at 8.4, the second at 9.2, the third at 9.7 and so on. In Figure 12 these boxes are shown without being stacked up.

Figure 12. The boxes from Figure 10 are arranged with their centers located at the original data point instead of on integer boundaries. The sides of the boxes have been made slightly slanted so they don't step on each other.

In the same way that the histogram was made in previous sections, we can add up the heights of the boxes at each point to make a data centered histogram as in Figure 13.

Figure 13. A histogram created by summing at each point the boxes in Figure 12. The little spikes are artifacts caused by the slanted sides of the boxes in Figure 12.

The histogram in Figure 13 summarizes the weighted data. The center of this distribution, that is its median, is the weighted median of the original data points.

Histograms of data that has error bars

We have just seen how to modify the histogram concept to reflect different weights for different data points. While the data points are weighted differently in the series of histograms shown above, it was also assumed that the data values are known precisely. You might think that the poor data points would be given small weights and that it is therefore overkill to also penalize the same data points by giving them large error bars. However, to a large extent the weights affect where the center of the histogram is and not its width, and the error bars affect the histogram's width and not its center.

Now let's see how to modify the histogram concept to reflect uncertainty in the data values. For our sample data, let each data point be given error bars that reflect how precisely the point was measured. For example, data point 8.4 becomes the range between 7.1 and 9.7. Similarly data point 10.6 becomes 10.6 plus or minus 1.1. How the error bars are estimated is covered in the chapter on Error Bars. These error bars are shown in Figure 14.

Figure 14. Data points from Figure 7 with error bars.

Now let us take the boxes from Figure 12 and stretch them out to the widths of the error bars. At the same time, let's scale down the height of each box so that its area is still the same as it was in Figure 12. The rescaled boxes are shown in Figure 15.

Figure 15. Each weighted data value box in Figure 12 is now stretched out to reflect the error bars associated with the measured precision of the data points.

Once again the box heights in Figure 15 can be summed at each point to give another generalization of the histogram as shown in Figure 16.
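The stretching and summing of the boxes can be sketched as follows. This is a minimal illustration under stated assumptions: each point is a (value, weight, half-width) tuple, only the half-widths 1.3 for 8.4 and 1.1 for 10.6 come from the text, and every other number and name here is made up.

```python
def box_height(x, center, half_width, area):
    """Height at x of a box with the given area spanning center +/- half_width."""
    if abs(x - center) <= half_width:
        # A wider box is proportionally lower, so its area stays the same.
        return area / (2.0 * half_width)
    return 0.0


def summed_boxes(x, points):
    """Sum the box heights at x; points holds (value, weight, half_width) tuples."""
    return sum(box_height(x, v, h, w) for v, w, h in points)


# Assumed example data: (value, weight, error-bar half-width).
points = [
    (8.4, 0.5, 1.3),
    (9.2, 1.0, 0.6),
    (9.7, 1.0, 0.5),
    (10.3, 0.8, 0.7),
    (10.6, 1.0, 1.1),
    (11.2, 0.6, 0.9),
    (12.1, 0.3, 1.5),
]

for x in (7.5, 9.5, 10.5, 12.5):
    print(x, round(summed_boxes(x, points), 3))
```

Here the weight of a point is used as the area of its box, so the total area under the summed curve equals the sum of the weights no matter how wide the error bars are.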

Figure 16. This histogram takes into account both the weighting of the data points and the precision of the measurement at each point.

This version of the histogram more accurately reflects all that we know about the data. Not only the data values but also their weights and precisions are factored into the graph.

This version of the histogram assumes that the true value of a data point is equally likely to be at any point in the range covered by the error bars. That is, the first data point, the one that was measured at 8.4, is equally likely to be anywhere between 7.1 and 9.7. A more realistic assumption might be that it is more likely to be close to the 8.4 that was originally measured. This further modification to the concept of the histogram is covered in the next section.

Kernel density function

The box representing each data point on the histogram can be thought of as a probability distribution with a uniform probability function over the distance represented by the error bars. Other probability functions can be used instead. For instance, the real value of each point might be distributed like the normal probability distribution around the measured value. Physicists have used this function and called their version of the histogram the Gaussian ideograph.

Statisticians have found that, for approximating a histogram, assigning each data point to almost any function that looks sort of hump shaped works about as well. This hump function is the kernel of the kernel density function. For example, the cosine function shown in Figure 17 as the solid line works as well as the normal curve shown as a dashed line.

Figure 17. The cosine curve, the blue solid line, matches the normal distribution curve, the dashed red line, fairly well except for the tails.

Now if we replace each box in Figure 15 with a cosine kernel we get Figure 18.

Figure 18. The data points can be represented by kernels which represent the probability that the measurements were not precise.

As we have done before, we can sum the values of all the curves to get the generalized histogram that is shown in Figure 19.
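Replacing each box with a cosine hump and summing can be sketched as below. The exact cosine form used by Scaffold Q+ is not given in this document, so the raised-cosine kernel here, the choice to make the kernel half-width equal to the error-bar half-width, and all data values other than 8.4 and 10.6 with their error bars are assumptions for illustration.

```python
import math


def cosine_kernel(u):
    """Raised-cosine hump on [-1, 1] with unit area (an assumed kernel form)."""
    if abs(u) > 1.0:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * u))


def smoothed_histogram(x, points):
    """Sum of weighted kernels at x; points holds (value, weight, half_width) tuples."""
    # Dividing by the half-width keeps the area of each kernel equal to its weight.
    return sum(w * cosine_kernel((x - v) / h) / h for v, w, h in points)


# Assumed example data: (value, weight, kernel half-width).
points = [
    (8.4, 0.5, 1.3),
    (9.2, 1.0, 0.6),
    (9.7, 1.0, 0.5),
    (10.3, 0.8, 0.7),
    (10.6, 1.0, 1.1),
    (11.2, 0.6, 0.9),
    (12.1, 0.3, 1.5),
]

for x in (8.0, 9.5, 10.5, 12.0):
    print(x, round(smoothed_histogram(x, points), 3))
```

Unlike the boxes, each hump falls smoothly to zero at its edges, so the summed curve has no steps.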

Figure 19. The smoothed histogram summarizes the data, its weights and the measurement accuracy of each data point.

The probability density function is much like a histogram except that the area under the curve adds up to 1.0. The smoothed histogram can be converted into a probability distribution by rescaling the y-axis so that the area under the curve is 1.0. This probability distribution is called the kernel density function. The kernel density function is shown in Figure 20.

Figure 20. The kernel density function shows the probability distribution for the data points. The red vertical line is the center of the distribution. The yellow lines mark off the interquartile range at 25% and 75%. The original data points are shown as the diamonds along the x-axis.
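The rescaling to unit area and the location of the center and quartile lines can be sketched on a sampled curve. This is a hedged illustration, not the Scaffold Q+ implementation: the curve is a made-up triangle standing in for a smoothed histogram, and the function names and trapezoid-rule integration are assumptions.

```python
def normalize(xs, ys):
    """Rescale ys so that the trapezoid-rule area under the curve is 1.0."""
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))
    return [y / area for y in ys]


def quantile(xs, density, q):
    """Return the x where the cumulative area under density first reaches q."""
    cumulative = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        piece = width * (density[i] + density[i + 1]) / 2.0
        if piece > 0.0 and cumulative + piece >= q:
            # Interpolate linearly inside the interval that crosses q.
            return xs[i] + width * (q - cumulative) / piece
        cumulative += piece
    return xs[-1]


# Hypothetical smoothed histogram: a triangle on [0, 2] with an arbitrary height.
xs = [i * 0.01 for i in range(201)]
ys = [5.0 * (1.0 - abs(x - 1.0)) for x in xs]

density = normalize(xs, ys)
q25, median, q75 = (quantile(xs, density, q) for q in (0.25, 0.5, 0.75))
print(round(q25, 2), round(median, 2), round(q75, 2))
```

The median found this way is the center line of the distribution, and the 25% and 75% points bound the interquartile range.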

Kernel density function vs normal distribution

How does this kernel density function compare to the normal distribution? The normal distribution of the same data shown above in Figure 20 is shown below in Figure 21.

Figure 21. The normal probability distribution for the same data as represented in Figure 20 by the kernel density function. The yellow lines mark off the interquartile range at 25% and 75%. The original data points are shown as the diamonds along the x-axis.

The normal distribution is a theoretical distribution. It is commonly used because statisticians have a theorem that says that if you have a large enough number of data points, your data frequently looks like a normal distribution. They also have all sorts of statistical tests that you can apply to data that follows a normal distribution.

The limitations of using a normal distribution and the standard statistical tests are:

1. Real data may not follow a normal distribution very well. For example, the iTRAQ data in Figure 22 below shows a decidedly non-normal aspect.

Figure 22. A) The protein intensity was measured in two samples, each of which has a distribution shown by the box plot. B) Combining these two samples gives an asymmetric probability distribution, not a normal distribution.

2. A normal distribution treats all the data as points. All errors are assumed to be of the same size and kind and to be treatable in one way. However, in iTRAQ datasets, for example, some errors depend on the intensity of the data, some depend upon the sample, and some are random.

The kernel density function, like the histogram, is an empirical distribution. It shows the data as it is, not as it is fitted to a standard distribution. The advantages of this are that each data point can be 1) weighted based upon its trustworthiness and 2) given a probability distribution based upon its accuracy. It also allows data from different samples, each with its own distribution, to be combined into one overall distribution.

The disadvantage of the kernel density function is that ordinary statistical tests such as the t-test and ANOVA can't be applied. However, there are alternative statistical procedures, such as the bootstrap and the permutation test, that can be applied.
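As an illustration of one of the alternative procedures named above, a two-category permutation test on the difference of means can be sketched as follows. This is a generic textbook-style sketch with made-up intensities, not a description of what Scaffold Q+ computes; the function name, the number of permutations and the seed are assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often shuffled category labels give a mean difference
    at least as large as the one observed."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Adding one to both counts keeps the estimate away from exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical intensities for a reference category and category 1.
reference = [1700, 1750, 1800, 1820, 1690, 1760]
category_1 = [2100, 2150, 2080, 2200, 2120, 2170]

p_value = permutation_p_value(reference, category_1)
print(p_value < 0.05)
```

The test makes no assumption that the data follows a normal distribution, which is why it remains usable where the t-test and ANOVA are not.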


More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

The Perverse Nature of Standard Deviation Denton Bramwell

The Perverse Nature of Standard Deviation Denton Bramwell The Perverse Nature of Standard Deviation Denton Bramwell Standard deviation is simpler to understand than you think, but also harder to deal with. If you understand it, you can use it for sensible decision

More information

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Measurement with Ratios

Measurement with Ratios Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

chapter >> Making Decisions Section 2: Making How Much Decisions: The Role of Marginal Analysis

chapter >> Making Decisions Section 2: Making How Much Decisions: The Role of Marginal Analysis chapter 7 >> Making Decisions Section : Making How Much Decisions: The Role of Marginal Analysis As the story of the two wars at the beginning of this chapter demonstrated, there are two types of decisions:

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Chapter 4. Polynomial and Rational Functions. 4.1 Polynomial Functions and Their Graphs

Chapter 4. Polynomial and Rational Functions. 4.1 Polynomial Functions and Their Graphs Chapter 4. Polynomial and Rational Functions 4.1 Polynomial Functions and Their Graphs A polynomial function of degree n is a function of the form P = a n n + a n 1 n 1 + + a 2 2 + a 1 + a 0 Where a s

More information

Lecture 19: Chapter 8, Section 1 Sampling Distributions: Proportions

Lecture 19: Chapter 8, Section 1 Sampling Distributions: Proportions Lecture 19: Chapter 8, Section 1 Sampling Distributions: Proportions Typical Inference Problem Definition of Sampling Distribution 3 Approaches to Understanding Sampling Dist. Applying 68-95-99.7 Rule

More information

sample median Sample quartiles sample deciles sample quantiles sample percentiles Exercise 1 five number summary # Create and view a sorted

sample median Sample quartiles sample deciles sample quantiles sample percentiles Exercise 1 five number summary # Create and view a sorted Sample uartiles We have seen that the sample median of a data set {x 1, x, x,, x n }, sorted in increasing order, is a value that divides it in such a way, that exactly half (i.e., 50%) of the sample observations

More information

Frequency Distributions

Frequency Distributions Descriptive Statistics Dr. Tom Pierce Department of Psychology Radford University Descriptive statistics comprise a collection of techniques for better understanding what the people in a group look like

More information

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative

More information

Probability. Distribution. Outline

Probability. Distribution. Outline 7 The Normal Probability Distribution Outline 7.1 Properties of the Normal Distribution 7.2 The Standard Normal Distribution 7.3 Applications of the Normal Distribution 7.4 Assessing Normality 7.5 The

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

1051-232 Imaging Systems Laboratory II. Laboratory 4: Basic Lens Design in OSLO April 2 & 4, 2002

1051-232 Imaging Systems Laboratory II. Laboratory 4: Basic Lens Design in OSLO April 2 & 4, 2002 05-232 Imaging Systems Laboratory II Laboratory 4: Basic Lens Design in OSLO April 2 & 4, 2002 Abstract: For designing the optics of an imaging system, one of the main types of tools used today is optical

More information

Mathematical goals. Starting points. Materials required. Time needed

Mathematical goals. Starting points. Materials required. Time needed Level S6 of challenge: B/C S6 Interpreting frequency graphs, cumulative cumulative frequency frequency graphs, graphs, box and box whisker and plots whisker plots Mathematical goals Starting points Materials

More information

seven Statistical Analysis with Excel chapter OVERVIEW CHAPTER

seven Statistical Analysis with Excel chapter OVERVIEW CHAPTER seven Statistical Analysis with Excel CHAPTER chapter OVERVIEW 7.1 Introduction 7.2 Understanding Data 7.3 Relationships in Data 7.4 Distributions 7.5 Summary 7.6 Exercises 147 148 CHAPTER 7 Statistical

More information

Graphical Integration Exercises Part Four: Reverse Graphical Integration

Graphical Integration Exercises Part Four: Reverse Graphical Integration D-4603 1 Graphical Integration Exercises Part Four: Reverse Graphical Integration Prepared for the MIT System Dynamics in Education Project Under the Supervision of Dr. Jay W. Forrester by Laughton Stanley

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

Continuing, we get (note that unlike the text suggestion, I end the final interval with 95, not 85.

Continuing, we get (note that unlike the text suggestion, I end the final interval with 95, not 85. Chapter 3 -- Review Exercises Statistics 1040 -- Dr. McGahagan Problem 1. Histogram of male heights. Shaded area shows percentage of men between 66 and 72 inches in height; this translates as "66 inches

More information

SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. It includes the construction of graphs, charts and tables, as well various descriptive measures such

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Scatter Plots with Error Bars

Scatter Plots with Error Bars Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

More information

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Questions: Does it always take the same amount of force to lift a load? Where should you press to lift a load with the least amount of force?

Questions: Does it always take the same amount of force to lift a load? Where should you press to lift a load with the least amount of force? Lifting A Load 1 NAME LIFTING A LOAD Questions: Does it always take the same amount of force to lift a load? Where should you press to lift a load with the least amount of force? Background Information:

More information

AP * Statistics Review. Descriptive Statistics

AP * Statistics Review. Descriptive Statistics AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Exploratory Spatial Data Analysis

Exploratory Spatial Data Analysis Exploratory Spatial Data Analysis Part II Dynamically Linked Views 1 Contents Introduction: why to use non-cartographic data displays Display linking by object highlighting Dynamic Query Object classification

More information

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set. SKEWNESS All about Skewness: Aim Definition Types of Skewness Measure of Skewness Example A fundamental task in many statistical analyses is to characterize the location and variability of a data set.

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Updates to Graphing with Excel

Updates to Graphing with Excel Updates to Graphing with Excel NCC has recently upgraded to a new version of the Microsoft Office suite of programs. As such, many of the directions in the Biology Student Handbook for how to graph with

More information

Analyzing Data with GraphPad Prism

Analyzing Data with GraphPad Prism 1999 GraphPad Software, Inc. All rights reserved. All Rights Reserved. GraphPad Prism, Prism and InStat are registered trademarks of GraphPad Software, Inc. GraphPad is a trademark of GraphPad Software,

More information

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Petrel TIPS&TRICKS from SCM

Petrel TIPS&TRICKS from SCM Petrel TIPS&TRICKS from SCM Knowledge Worth Sharing Histograms and SGS Modeling Histograms are used daily for interpretation, quality control, and modeling in Petrel. This TIPS&TRICKS document briefly

More information

Mathematics (Project Maths Phase 1)

Mathematics (Project Maths Phase 1) 2012. M128 S Coimisiún na Scrúduithe Stáit State Examinations Commission Leaving Certificate Examination, 2012 Sample Paper Mathematics (Project Maths Phase 1) Paper 2 Ordinary Level Time: 2 hours, 30

More information