COMP6053 lecture: The chi-squared test and distribution.

Save this PDF as:

Size: px
Start display at page:

Transcription

1 COMP6053 lecture: The chi-squared test and distribution

2 Assumption of normal error term In all of the techniques we've covered so far, there has been a background assumption that the noise in the data generating process was normally distributed. Taking ANOVA as an example, we assume that each group has a characteristic mean value and that a normal random variate is added to that mean to get the actual data point.

3 Normality assumption in ANOVA

4 A reasonable assumption? Often the normal-error-term assumption is a safe one: the truth is close enough that our inferential logic works well. But sometimes it is obvious that the data generating process could not have been some systematic effect plus a normal error term.

5 Distribution of X across 2 groups Values in group A and B are clearly being generated by two very different -- and nonnormal -- processes.

6 Distribution of X across 2 groups If we were trying to test for a difference between the means of samples A and B, we can't really trust the answer we'd get from a 2-sample t-test. We need some statistical tests that don't make the normally-distributed-errors assumption.

7 Non-parametric statistics The term "non-parametric" refers to statistical tools that do not assume normality in the error term of the generation process. There are non-parametric equivalents for most of the techniques we've discussed. Why not use them all the time? The mathematics are trickier, and we lose some power by dropping the normality assumption.

8 The chi-squared test This is a versatile non-parametric test. We can use it to ask whether a sample fits an expected distribution (of arbitrary shape). Consider throwing a die many times to see whether or not it's fair. We expect a uniform distribution: an equal number of 1s, 2s, 3s, etc.

9 The chi-squared test Most of the time we won't get a perfectly uniform distribution though. How far away from uniform would the results have to be before we would suspect the die was not fair? How might we measure deviation from the expected values?

10 Sample of 300 throws

11 Difference from what was expected? Obs. Exp. O - E (O-E)^

12 Difference from what was expected? We're "expecting" to see each number 50 times. We can tally up how far from the expected value each observation is. We then try squaring those values to deal with negative numbers. We're on our way to some measure of deviation from the expected values.

13 Scaling our measure The particular size of these numbers depends on our sample size: if we'd thrown the dice 30 times or 3000 times we'd have different numbers. We scale the "observed minus expected, squared" term somewhat by dividing through by the expected value.

14 Scaling our measure Obs. (O-E)^2 (O-E)^2 / E

15 The chi-squared test The final step is to sum the (O-E)^2 / E terms. The total value is the chi-squared statistic. In our example this number is But what distribution should we expect this to have? How do we translate this number to a p-value?

16 Repeated sampling experiment Let's generate 500 samples of 300 die throws, and calculate the chi-squared statistic each time. That should give us an idea of how often extreme deviations from uniformity (and thus large values on the chi-squared statistic) come up by chance.

17 Repeated sampling experiment

18 Repeated sampling experiment In practice, a p-value of 0.05 (i.e., the topmost 5% of the distribution) equates to a chisquared statistic of around 10. The exact value is For p = 0.01, the critical chi-squared value is Our observed value of 7.88 is not unusual.

19 How is the chi-squared statistic distributed? Imagine that we were doing the same kind of test not with die-rolling but with coin tossing. If we toss a fair coin 100 times, clearly the expected values for heads and tails are 50 each. In reality we'll get combinations like 48/52, 53/47, 45/55, etc. What's the distribution of those numbers?

20 The binomial distribution The binomial distribution describes the number of heads we expect to get from throwing a coin multiple times. The binomial describes the outcome of multiple Bernoulli trials, i.e., any situation where there's a probability p of one outcome and 1-p for the other. What does it look like?

21 Binomial normal distribution As N gets larger, the binomial approximates the normal distribution very closely.

22 From binomial to chi-squared So if the number of heads we get in multiple coin tosses is binomially distributed... And the binomial is approximately normal... Then the chi-squared procedure of squaring the differences from the expected value, then dividing by the expected value, will give us what?

23 From binomial to chi-squared Each term in our chi-squared procedure is taking an approximately normal distribution and squaring it. The chi-squared distribution is in fact the sum of K squared-standard-normal deviates (K is the degrees of freedom of the test). Ironic that we've returned to the normal distribution in an effort to avoid assuming it was present.

24 Degrees of freedom? If we toss a coin 100 times, it's easy to see how many degrees of freedom there are: if there are 53 heads, there must be 47 tails. Thus there's 1 degree of freedom for the chisquared test here. Similarly with the die: once we know how many 1s, 2s, 3s, 4s, and 5s, the number of 6s is determined. So there are 5 degrees of freedom.

25 The chi-squared distribution

26 Another use of chi-squared tests We've seen how to use the test to look at whether a one-dimensional distribution fits its expected values reasonably closely. What about two-dimensional distributions?

27 Recap: when to use which test? If we have a continuous outcome measure, and a binary predictor variable, we use a two-sample t-test. If we have a continuous outcome measure, and one or more categorical predictor variables, we use ANOVA. If we have a continuous outcome measure, and a single continuous predictor variable, we use simple linear regression.

28 Recap: when to use which test? If we have a continuous outcome measure and multiple continuous and categorical predictor variables, we use multiple regression. If we have a binary categorical outcome measure and multiple predictor variables, we use logistic regression.

29 When to use a chi-squared test? If we have a categorical outcome measure, and one (or more) categorical predictor variables, it turns out we can use a chisquared test. We're asking whether the observed incidence of outcomes for each level of the predictor variable could have happened by chance.

30 Voting example Suppose we're asking whether sex has any relevance to predicting the way people will vote. We ask 100 randomly selected men and 100 randomly selected women which party they voted for at the last election. Note that both variables are categorical.

31 Voting example Labour Tory Lib dem Men Women Number of men and women voting for each party shown, and the marginal totals.

32 Voting example: expected values Labour Tory Lib dem Men Women Using the marginal totals, we can easily calculate the expected number of counts in each cell if there were no relationship between sex and voting preference.

33 Voting example: chi-squared We now have six places to calculate "observed minus expected". We can use the same logic of the chisquared test as outlined earlier. Sum of: ( O - E ) ^ 2 / E.

34 How many degrees of freedom? How many degrees of freedom? You might think 5, because we have 6 observed counts. But there are not 5 numbers here that are free to vary if we are to hit the marginal totals. DF = num columns minus one, times num rows minus one. In this case: 2 x 1 = 2.

35 Voting example The calculation for our voting example works out at a chi-squared value of The critical value for a chi-squared distribution with 2 DF, p = 0.05, is Our test statistic is larger than the critical value, so we reject the null hypothesis of no link between sex and voting (assuming we're happy with the p = 0.05 threshold).

36 How to do this in R In R the easiest way to run a chi-squared test is to first produce a table. For the example: chisq.test(table(sex,voting)) The output includes the test statistic, the degrees of freedom, and the associated p- value.

37 Further non-parametric tests No time to cover them all in this lecture, but there are many tests that do not assume a normally distributed error term in the data generation process. The most useful is the rank-sum test (also known as the Wilcoxon rank-sum test and the Mann-Whitney U test).

38 Further non-parametric tests We would use the rank-sum test to investigate whether these two samples come from distributions with the same mean.

39 Further non-parametric tests It's based on the logic that if distribution A has a higher mean than distribution B, then values from sample A should be in the top half of the joint sample more often than those from sample B. Used in the same situations as a two-sample t-test. wilcox.test(avalues,bvalues)

40 Additional material There is no R script for this lecture; you just need the commands chisq.test, table, and wilcox. Here is the Python program for generating the histograms and sampling experiment.

Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

Solutions: Problems for Chapter 3. Solutions: Problems for Chapter 3

Problem A: You are dealt five cards from a standard deck. Are you more likely to be dealt two pairs or three of a kind? experiment: choose 5 cards at random from a standard deck Ω = {5-combinations of

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

Elementary Statistics

lementary Statistics Chap10 Dr. Ghamsary Page 1 lementary Statistics M. Ghamsary, Ph.D. Chapter 10 Chi-square Test for Goodness of fit and Contingency tables lementary Statistics Chap10 Dr. Ghamsary Page

Chi Square Distribution

17. Chi Square A. Chi Square Distribution B. One-Way Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes

Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared

11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

Experimental Designs (revisited)

Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

About Chi Squares TABLE OF CONTENTS About Chi Squares... 1 What is a CHI SQUARE?... 1 Chi Squares... 1 Goodness of fit test (One-way χ 2 )... 1 Test of Independence (Two-way χ 2 )... 2 Hypothesis Testing

Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS Hypothesis 1: People are resistant to the technological change in the security system of the organization. Hypothesis 2: information hacked and misused. Lack

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

Regression 3: Logistic Regression

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

data visualization and regression

data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species

Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

List of Examples. Examples 319

Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

13.0 Central Limit Theorem

13.0 Central Limit Theorem Discuss Midterm/Answer Questions Box Models Expected Value and Standard Error Central Limit Theorem 1 13.1 Box Models A Box Model describes a process in terms of making repeated

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

Chapter 19 The Chi-Square Test

Tutorial for the integration of the software R with introductory statistics Copyright c Grethe Hystad Chapter 19 The Chi-Square Test In this chapter, we will discuss the following topics: We will plot

AP Statistics 7!3! 6!

Lesson 6-4 Introduction to Binomial Distributions Factorials 3!= Definition: n! = n( n 1)( n 2)...(3)(2)(1), n 0 Note: 0! = 1 (by definition) Ex. #1 Evaluate: a) 5! b) 3!(4!) c) 7!3! 6! d) 22! 21! 20!

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

Chapter 16: law of averages

Chapter 16: law of averages Context................................................................... 2 Law of averages 3 Coin tossing experiment......................................................

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

The Math. P (x) = 5! = 1 2 3 4 5 = 120.

The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

Guide to Microsoft Excel for calculations, statistics, and plotting data

Page 1/47 Guide to Microsoft Excel for calculations, statistics, and plotting data Topic Page A. Writing equations and text 2 1. Writing equations with mathematical operations 2 2. Writing equations with

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

Name: Date: Use the following to answer questions 3-4:

Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

Law of Large Numbers. Alexandra Barbato and Craig O Connell. Honors 391A Mathematical Gems Jenia Tevelev

Law of Large Numbers Alexandra Barbato and Craig O Connell Honors 391A Mathematical Gems Jenia Tevelev Jacob Bernoulli Life of Jacob Bernoulli Born into a family of important citizens in Basel, Switzerland

Sums of Independent Random Variables

Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

Motor and Household Insurance: Pricing to Maximise Profit in a Competitive Market

Motor and Household Insurance: Pricing to Maximise Profit in a Competitive Market by Tom Wright, Partner, English Wright & Brockman 1. Introduction This paper describes one way in which statistical modelling

Violent crime total. Problem Set 1

Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the

Expected Value and the Game of Craps

Expected Value and the Game of Craps Blake Thornton Craps is a gambling game found in most casinos based on rolling two six sided dice. Most players who walk into a casino and try to play craps for the

HLM software has been one of the leading statistical packages for hierarchical

Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

The InStat guide to choosing and interpreting statistical tests

Version 3.0 The InStat guide to choosing and interpreting statistical tests Harvey Motulsky 1990-2003, GraphPad Software, Inc. All rights reserved. Program design, manual and help screens: Programming:

Chapter 16. Law of averages. Chance. Example 1: rolling two dice Sum of draws. Setting up a. Example 2: American roulette. Summary.

Overview Box Part V Variability The Averages Box We will look at various chance : Tossing coins, rolling, playing Sampling voters We will use something called s to analyze these. Box s help to translate

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

Review of Random Variables

Chapter 1 Review of Random Variables Updated: January 16, 2015 This chapter reviews basic probability concepts that are necessary for the modeling and statistical analysis of financial data. 1.1 Random

TI-Inspire manual 1. Instructions. Ti-Inspire for statistics. General Introduction

TI-Inspire manual 1 General Introduction Instructions Ti-Inspire for statistics TI-Inspire manual 2 TI-Inspire manual 3 Press the On, Off button to go to Home page TI-Inspire manual 4 Use the to navigate

6.042/18.062J Mathematics for Computer Science. Expected Value I

6.42/8.62J Mathematics for Computer Science Srini Devadas and Eric Lehman May 3, 25 Lecture otes Expected Value I The expectation or expected value of a random variable is a single number that tells you

Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

CHAPTER 12. Chi-Square Tests and Nonparametric Tests LEARNING OBJECTIVES. USING STATISTICS @ T.C. Resort Properties

CHAPTER 1 Chi-Square Tests and Nonparametric Tests USING STATISTICS @ T.C. Resort Properties 1.1 CHI-SQUARE TEST FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS (INDEPENDENT SAMPLES) 1. CHI-SQUARE TEST FOR

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

In the past, the increase in the price of gasoline could be attributed to major national or global

Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

Chapter 2: Systems of Linear Equations and Matrices:

At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.

5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives

5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives We often have to compute the total of the observations in a

Confidence Intervals on Effect Size David C. Howell University of Vermont

Confidence Intervals on Effect Size David C. Howell University of Vermont Recent years have seen a large increase in the use of confidence intervals and effect size measures such as Cohen s d in reporting

COMP6053 lecture: Time series analysis, autocorrelation. jn2@ecs.soton.ac.uk

COMP6053 lecture: Time series analysis, autocorrelation jn2@ecs.soton.ac.uk Time series analysis The basic idea of time series analysis is simple: given an observed sequence, how can we build a model that

ACMS 10140 Section 02 Elements of Statistics October 28, 2010. Midterm Examination II

ACMS 10140 Section 02 Elements of Statistics October 28, 2010 Midterm Examination II Name DO NOT remove this answer page. DO turn in the entire exam. Make sure that you have all ten (10) pages of the examination

Study Design and Statistical Analysis

Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

Math/Stats 342: Solutions to Homework

Math/Stats 342: Solutions to Homework Steven Miller (sjm1@williams.edu) November 17, 2011 Abstract Below are solutions / sketches of solutions to the homework problems from Math/Stats 342: Probability

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

STAT 35A HW2 Solutions

STAT 35A HW2 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/09/spring/stat35.dir 1. A computer consulting firm presently has bids out on three projects. Let A i = { awarded project i },

EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

www.rmsolutions.net R&M Solutons

Ahmed Hassouna, MD Professor of cardiovascular surgery, Ain-Shams University, EGYPT. Diploma of medical statistics and clinical trial, Paris 6 university, Paris. 1A- Choose the best answer The duration

Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative