- A few more notes about Z
- SPSS and the normal curve
- Chapter 6: Samples vs. Populations
- Convenience/accidental sampling: why online polls suck




Last day, we looked at the relationship between standard scores (z-scores) and raw scores.

For example, suppose the average alcohol consumption of all towns has a mean of μ = 8/week and a standard deviation of σ = 2/week. If people in Burnaby drank an average of 7.2/week, their z-score would be

Z = (X - μ) / σ = (7.2 - 8) / 2 = -0.4, so by the z-table they drink more than 34.46% of towns as a whole, and less than 65.54% of towns as a whole.
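If you want to check the table lookup without a z-table, Python's standard-library `statistics.NormalDist` can stand in for it. This is a minimal sketch using the Burnaby numbers above; the variable names are just illustrative.

```python
from statistics import NormalDist

mu, sigma = 8.0, 2.0   # mean and SD of weekly consumption across all towns
x = 7.2                # Burnaby's average

z = (x - mu) / sigma           # standardize the raw score
below = NormalDist().cdf(z)    # area to the left of z under the standard normal curve
above = 1 - below              # area to the right of z

print(f"z = {z:.2f}")
print(f"drinks more than {below:.2%} of towns, less than {above:.2%}")
```

`NormalDist()` with no arguments is the standard normal curve (mean 0, SD 1), which is exactly what a z-table tabulates.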

Z-scores and SPSS. Start with the Dragons data set from the web page. It has a bunch of variables for 300 adult bearded dragons (artificially made, sorry). We'll be using this data set for some future exercises, so it has more than we need at the moment.

Go to Analyze > Descriptive Statistics > Frequencies, and choose Weight and Length. Go to Statistics, and choose Mean, Median, and Standard Deviation. Go to Charts, select Histogram, and check the box Include normal curve.

The height of each bar in the histogram is the number of bearded dragons in that equally spaced category. For length, the bars are about the same height as the normal curve, so length is approximately normal.

The weight of bearded dragons is right-skewed, so weight is non-normal. Consistent with this, the mean weight is greater than the median.
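SPSS does the work here, but the mean-versus-median check is easy to sketch in Python as well. The weights below are made-up numbers chosen only to show a right skew; they are not from the Dragons data set.

```python
from statistics import mean, median, stdev

# Hypothetical weights (g) with a long right tail: a few very heavy animals
weights = [310, 320, 330, 335, 340, 350, 360, 420, 510, 650]

m = mean(weights)
md = median(weights)
print(f"mean = {m:.1f}, median = {md:.1f}, sd = {stdev(weights):.1f}")

# In a right-skewed distribution the long tail pulls the mean above the median
if m > md:
    print("mean > median: consistent with right skew")
```

The median only cares about the middle values, while the mean is dragged toward the few large ones, which is why the comparison is a quick skewness check.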

Basil has a length of 24 cm. Given that μ = 27.83 cm and σ = 5.06 cm, his z-score is Z = (X - μ) / σ = (24 - 27.83) / 5.06 = -0.76. By the z-table, he's longer than 22.36% of the dragons.
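Here is the same calculation for Basil as a short sketch with `statistics.NormalDist`. It rounds z to two decimals first, to mimic looking it up in a z-table.

```python
from statistics import NormalDist

mu, sigma = 27.83, 5.06   # mean and SD of length (cm), from the SPSS output
x = 24.0                  # Basil's length (cm)

z = (x - mu) / sigma                    # about -0.757
z_table = round(z, 2)                   # z-tables use two decimals: -0.76
pct_below = NormalDist().cdf(z_table)   # share of dragons shorter than Basil

print(f"z = {z_table}, longer than {pct_below:.2%} of dragons")
```

If you skip the rounding step, you get a slightly different area. That's one reason table answers and software answers can disagree in the second decimal.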

We can verify this by getting the 22.36th percentile: go back to Analyze > Descriptive Statistics > Frequencies, and open Statistics again.

Then click Percentile(s), put in 22.36% and click Add.

For this data set, the 22.36th percentile of length comes out close to 24 cm, which is Basil's length. We only have a sample of dragons, so it's not going to be dead on. For perfect precision, we would need the entire population of bearded dragons.
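A small simulation shows why a sample's percentile won't land exactly on the normal-model value. This sketch assumes that length really is normal with the mean and SD above. It draws a hypothetical sample of 300 lengths (not the Dragons data) and compares the model's 22.36th percentile with the share of the sample below 24 cm.

```python
import random
from statistics import NormalDist

mu, sigma = 27.83, 5.06
model = NormalDist(mu, sigma)

# Percentile implied by the normal model: very close to 24 cm
model_p = model.inv_cdf(0.2236)

# Simulated sample of 300 lengths (seeded so the run is repeatable)
random.seed(1)
sample = [random.gauss(mu, sigma) for _ in range(300)]
share_below_24 = sum(x < 24 for x in sample) / len(sample)

print(f"model 22.36th percentile: {model_p:.2f} cm")
print(f"share of simulated sample below 24 cm: {share_below_24:.2%}")
```

Change the seed and the sample share moves around, while the model value stays put. That movement is the sample-to-sample variation the slide is pointing at.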

Beginning of Chapter 6: Samples and Populations

Usually we're interested in the features of an entire population, but often it's impossible to get information about every single member of that population. Instead we take a sample, which is a small portion of the population of interest. We hope the sample represents the population fairly.

Example: Blood test. If you're going for a blood test, you're interested in knowing the state of all your blood. Rather than take ALL the blood out of you to test, the clinic will take a SAMPLE of your blood as a representative.

Example: Phone polls. In an opinion poll, we're interested in the opinion of all the people in an area (the parameter). What we get are the opinions of the people that we call and ask (the statistic).

The parameter (of the population) is what we want. A statistic (of a sample) is what we get.

The symbols we use reflect this relationship. Statistics, the values pertaining to samples, have ordinary-looking symbols like x̄ for the mean or s for the standard deviation. Parameters, the values related to populations, have fancy Greek symbols like μ for the mean and σ for the standard deviation.

Mnemonic (memory trick):

Application: Label each of the bolded values as a statistic or a parameter.

Of the 1046 people polled, 719 knew where the circuit breaker was in their home. (Statistic: the 1046 people polled are a SAMPLE.)

Of all the people in Vancouver, 70% know where the circuit breaker is in their home. (Parameter: all of Vancouver is the population.)

A car was tested and found to consume 7.8 L per 100 km on the highway. Canada consumes 24.2 barrels of oil per year per capita.

Alice won the election with 55% of the votes. But the week before, the polls showed her at 42%.
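Putting numbers on the circuit-breaker example helps. The statistic is a proportion computed from the 1046 people polled, and we can set it beside the 70% figure for all of Vancouver. This is only arithmetic on the numbers given above.

```python
polled = 1046   # sample size
knew = 719      # people in the sample who knew where the breaker was

p_hat = knew / polled   # sample proportion: a statistic
p = 0.70                # Vancouver-wide proportion: a parameter

print(f"statistic = {p_hat:.3f}, parameter = {p:.2f}")
print(f"difference = {p_hat - p:+.3f}")
```

The statistic (about 0.687) is close to the parameter but not equal to it, which is what we expect from a sample.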

In all of these sample examples, we're making one really big assumption: the sample is representative of the population. This lets us take the sample and generalize it to the whole population. For example, the car we tested consumed 7.8 L/100 km, so we assume that most cars of the same model and year will have similar mileage.

To make this assumption of representation, our sample has to be chosen randomly. Random, for our purposes, means every member of the population has an equal chance to be in the sample. (Important!)

A simple random sample, or SRS, is a sample in which every member has an equal chance of being in the sample AND this chance is independent of the other members. In other words, an SRS is a random sample with no other structure or plan to it. (Also important.)
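One way to sketch an SRS in Python is `random.sample`, which draws k distinct members at random without replacement and uses no other structure. The drum of names here is hypothetical.

```python
import random

# Hypothetical drum of 500 names
drum = [f"entrant_{i}" for i in range(1, 501)]

random.seed(42)                   # fixed seed only so the example is repeatable
winners = random.sample(drum, 5)  # draw 5 names without replacement

print(winners)
```

No name can be drawn twice. Picking one name doesn't make any particular other name more or less likely, which matches the "no other structure / plan" part of the definition.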

Example: Raffle tickets. From a large drum of names, pick a few. This is: SRS.

Example: Opinion Polls. Opinion polls are done by choosing phone numbers at random and calling them. This is: SRS. It's a Simple Random Sample (SRS) because choosing one phone number isn't going to affect choosing another one.

Example: Class opinion. I try to get an opinion from the class by asking the front row. This is: Not Random!! Why is a non-random sample bad in this case? People in the front of the class tend to be more engaged in the material and less likely to slumber, so engaged people are overrepresented.

Also, the people in the front have selected themselves to be there. That's a common problem with polls.

Polls on webpages and social media are self-selected. This means people are choosing for themselves to respond, rather than being randomly chosen.

This is called convenience sampling, or accidental sampling. It's easy, but it has a lot of problems. People who don't know about the poll or decide not to be polled have zero chance of being in the sample.
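A toy simulation shows how that zero-chance problem biases the result. Everything here is hypothetical. There are 1000 people, and the 200 who saw an online poll happen to be more supportive than everyone else. A convenience sample of only those 200 is compared with an SRS of the same size drawn from everyone.

```python
import random

# Hypothetical population of 1000: (saw_the_poll, supports_the_idea)
population = (
    [(True, True)] * 150 + [(True, False)] * 50 +    # saw the poll: 75% support
    [(False, True)] * 320 + [(False, False)] * 480   # didn't see it: 40% support
)

def support_rate(people):
    """Share of the given people who support the idea."""
    return sum(supports for _, supports in people) / len(people)

true_rate = support_rate(population)   # the parameter: 0.47

# Convenience sample: only people who saw the poll can respond
convenience = [person for person in population if person[0]]
conv_rate = support_rate(convenience)

# Simple random sample of the same size, drawn from everyone
random.seed(7)
srs = random.sample(population, len(convenience))
srs_rate = support_rate(srs)

print(f"true {true_rate:.2f}, convenience {conv_rate:.2f}, SRS {srs_rate:.2f}")
```

The convenience sample always reports 75%, no matter how many people respond. A bigger self-selected sample doesn't fix the problem, because the 800 people who never saw the poll still have zero chance of being included.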

This is also why I made a to-do about the representativeness assumption in the class survey in the first week. Like the first-row sample, it's probably overrepresenting the engaged students, but making it random and compulsory seemed like overkill.

(for interest) Convenience/Accidental sampling can also be easy to manipulate. A specific group within the population can make a dedicated effort to throw the results in one direction artificially.

- Stratified Samples
- Systematic Samples
- Samples can vary
- If time: Landlines and the Canadian election