# Introduction to Regression and Data Analysis

Size: px
Start display at page:

Transcription

1 Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008

3 If you are interested in whether one variable differs among possible groups, for instance, then regression isn t necessarily the best way to answer that question. Often you can find your answer by doing a t-test or an ANOVA. The flow chart shows you the types of questions you should ask yourselves to determine what type of analysis you should perform. Regression will be the focus of this workshop, because it is very commonly used and is quite versatile, but if you need information or assistance with any other type of analysis, the consultants at the Statlab are here to help. II. Regression: An Introduction: A. What is regression? Regression is a statistical technique to determine the linear relationship between two or more variables. Regression is primarily used for prediction and causal inference. In its simplest (bivariate) form, regression shows the relationship between one independent variable (X) and a dependent variable (Y), as in the formula below: The magnitude and direction of that relation are given by the slope parameter ( 1 ), and the status of the dependent variable when the independent variable is absent is given by the intercept parameter ( 0 ). An error term (u) captures the amount of variation not predicted by the slope and intercept terms. The regression coefficient (R 2 ) shows how well the values fit the data. Regression thus shows us how variation in one variable co-occurs with variation in another. What regression cannot show is causation; causation is only demonstrated analytically, through substantive theory. For example, a regression with shoe size as an independent variable and foot size as a dependent variable would show a very high regression coefficient and highly significant parameter estimates, but we should not conclude that higher shoe size causes higher foot size. All that the mathematics can tell us is whether or not they are correlated, and if so, by how much. It is important to recognize that regression analysis is fundamentally different from ascertaining the correlations among different variables. Correlation determines the strength of the relationship between variables, while regression attempts to describe that relationship between these variables in more detail. B. The linear regression model (LRM) The simple (or bivariate) LRM model is designed to study the relationship between a pair of variables that appear in a data set. The multiple LRM is designed to study the relationship between one variable and several of other variables. In both cases, the sample is considered a random sample from some population. The two variables, X and Y, are two measured outcomes for each observation in the data set. For 3

4 example, let s say that we had data on the prices of homes on sale and the actual number of sales of homes: Price(thousands of \$) Sales of new homes x y This data is found in the file house sales.jmp. We want to know the relationship between X and Y. Well, what does our data look like? We will use the program JMP (pronounced jump ) for our analyses today. Start JMP, look in the JMP Starter window and click on the Open Data Table button. Navigate to the file and open it. In the JMP Starter, click on Basic in the category list on the left. Now click on Bivariate in the lower section of the window. Click on y in the left window and then click the Y, Response button. Put x in the X, Regressor box, as illustrated above. Now click OK to display the following scatterplot. 4

5 We need to specify the population regression function, the model we specify to study the relationship between X and Y. This is written in any number of ways, but we will specify it as: where Y is an observed random variable (also called the response variable or the lefthand side variable). X is an observed non-random or conditioning variable (also called the predictor or right-hand side variable). 1 is an unknown population parameter, known as the constant or intercept term. 2 is an unknown population parameter, known as the coefficient or slope parameter. u is an unobserved random variable, known as the error or disturbance term. Once we have specified our model, we can accomplish 2 things: Estimation: How do we get good estimates of 1 and 2? What assumptions about the LRM make a given estimator a good one? Inference: What can we infer about 1 and 2 from sample information? That is, how do we form confidence intervals for 1 and 2 and/or test hypotheses about them? The answer to these questions depends upon the assumptions that the linear regression model makes about the variables. The Ordinary Least Squres (OLS) regression procedure will compute the values of the parameters 1 and 2 (the intercept and slope) that best fit the observations. 5

6 Obviously, no straight line can exactly run through all of the points. The vertical distance between each observation and the line that fits best the regression line is called the residual or error. The OLS procedure calculates our parameter values by minimizing the sum of the squared errors for all observations. Why OLS? It has some very nice mathematical properties, and it is compatible with Normally distributed errors, a very common situation in practice. However, it requires certain assumptions to be valid. C. Assumptions of the linear regression model 1. The proposed linear model is the correct model. Violations: Omitted variables, nonlinear effects of X on Y (e.g., area of circle = *radius 2 ) 2. The mean of the error term (i.e. the unobservable variable) does not depend on the observed X variables. 3. The error terms are uncorrelated with each other and exhibit constant variance that does not depend on the observed X variables. Violations: Variance increases as X or Y increases. Errors are positive or negative in bunches called heteroskedasticity. 4. No independent variable exactly predicts another. Violations: Including monthly precipitation for 12 months, and annual precipitation in the same model. 5. Independent variables are either random or fixed in repeated sampling If the five assumptions listed above are met, then the Gauss-Markov Theorem states that the Ordinary Least Squares regression estimator of the coefficients of the model is the Best Linear Unbiased Estimator of the effect of X on Y. Essentially this means that it is the most accurate estimate of the effect of X on Y. III. Deriving OLS estimators The point of the regression equation is to find the best fitting line relating the variables to one another. In this enterprise, we wish to minimize the sum of the squared deviations (residuals) from this line. OLS will do this better than any other process as long as these conditions are met. 6

7 The best fit line associated with the n points (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ) has the form where y = mx + b slope = m = intercept = b = So we can take our data from above and substitute in to find our parameters: Price(thousands of \$) Sales of new homes X y xy x ,160 25, ,540 32, ,400 40, ,500 48, ,680 57, ,400 67, ,600 78,400 Sum Slope = m = Intercept = b = 7

8 To have JMP calculate this for you, click on the small red triangle just to the left of Bivariate Fit of y By x. From the drop down menu, choose Fit Line to produce the following output. Thus our least squares line is y = x 8

9 IV. Interpreting data Coefficient: (Parameter Estimate) for every one unit increase in the price of a house, fewer houses are sold. Std Error: if the regression were performed repeatedly on different datasets (that contained the same variables), this would represent the standard deviation of the estimated coefficients t-ratio: the coefficient divided by the standard error, which tells us how large the coefficient is relative to how much it varies in repeated sampling. If the coefficient varies a lot in repeated sampling, then its t-statistic will be smaller, and if it varies little in repeated sampling, then its t-statistic will be larger. Prob> t : the p-value is the result of the test of the following null hypothesis: in repeated sampling, the mean of the estimated coefficient is zero. E.g., if p = 0.001, the probability of observing an estimate of that is at least as extreme as the observed estimate is 0.001, if the true value of is zero. In general, a p-value less than some threshold, like 0.05 or 0.01, will mean that the coefficient is statistically significant. Confidence interval: the 95% confidence interval is the set of values that lie within 1.96 standard deviations of the estimated Rsquare: this statistic represents the proportion of variation in the dependent variable that is explained by the model (the remainder represents the proportion that is present in the error). It is also the square of the correlation coefficient (hence the name). V. Multivariate regression models. Now let s do it again for a multivariate regression, where we add in number of red cars in neighborhood as an additional predictor of housing sales: Price(thousands of \$) Sales of new homes Number of red cars x y cars

10 Our general regression equation this time is this: To calculate a multiple regression, go the JMP Starter window and select the Model category. Click on the top button, Fit Model for the following dialog: Click on y in the left window, then on the Y button. Now click on x and then Add in the Construct Model Effects section. Repeat with cars then click on Run Model to produce the following window. (I ve hidden a few results to save space. Click on the blue triangles to toggle between displayed and hidden.) 10

12 Other violations of the regression assumptions are addressed below. In all of these cases, be aware that Ordinary Least Squares regression, as we have discussed it today, gives biased estimates of parameters and/or standard errors. Non-linear relationship between X and Y: use Non-linear Least Squares or Maximum Likelihood Estimation Endogeneity: use Two-Stage Least Squares or a lagged dependent variable Autoregression: use a Time-Series Estimator (ARIMA) Serial Correlation: use Generalized Least Squares Heteroskedasticity: use Generalized Least Squares How will you know if the assumptions are violated? You usually will NOT find the answer from looking at your regression results alone. Consider the four scatter plots below: 12

13 Clearly they show widely different relationships between x and y. However, in each of these four cases, the regression results are exactly the same: the same best fit line, the same t-statistics, the same R

14 In the first case, the assumptions are satisfied, and linear regression does what we would expect it to. In the second case, we clearly have a non-linear (in fact, a quadratic) relationship. In the third and fourth cases, we have heteroskedastic errors: the errors are much greater for some values of x than for others. The point is, the assumptions are crucial, because they determine how well the regression model fits the data. If you don t look closely at your data and make sure a linear regression is appropriate for it, then the regression results will be meaningless. VI. Additional resources If you believe that the nature of your data will force you to use a more sophisticated estimation technique than Ordinary Least Squares, you should first consult the resources listed on the Statlab Web Page at You may also find that Google searches of these regression techniques often find simple tutorials for the methods that you must use, complete with help on estimation, programming, and interpretation. Finally, you should also be aware that the Statlab consultants are available throughout regular Statlab hours to answer your questions (see the areas of expertise for Statlab consultants at

15 Purpose of statistical analysis Summarizing univariate data Exploring relationships between variables Testing significance of differences Descriptive statistics (mean, standard deviation, variance, etc) Form of data Number of groups Frequencies Measurements One: mean compared to a specified value Two Multiple Number of variables Number of variables One-sample t-test Independent samples Related samples Independent samples Related samples One: compared to theoretical distribution Two: tested for association Two: degree of relationship Multiple: effect of 2+ predictors on a dependant variable Form of data Form of data One independent variable Multiple independent variables Repeatedmeasures ANOVA Chi-square goodness-of-fit test Chi-square test for association Level of measurement Multiple regression Ordinal Interval Ordinal Interval Ordinal Interval Mann-Whitney U test Independentsamples t-test One-way ANOVA Multifactorial ANOVA Spearman's rho Pearson's correlation cooeficient Wilcoxon matchedpairs test Pairedsamples t-test Source: Corston and Colman (2000) Figure 9. Choosing an appropriate statistical procedure

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### 4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### An SPSS companion book. Basic Practice of Statistics

An SPSS companion book to Basic Practice of Statistics SPSS is owned by IBM. 6 th Edition. Basic Practice of Statistics 6 th Edition by David S. Moore, William I. Notz, Michael A. Flinger. Published by

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Algebra 1 Course Information

Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Practical I conometrics data collection, analysis, and application Christiana E. Hilmer Michael J. Hilmer San Diego State University Mi Table of Contents PART ONE THE BASICS 1 Chapter 1 An Introduction

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

### Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

### Module 5: Statistical Analysis

Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

### Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

### LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

### Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Teaching Business Statistics through Problem Solving

Teaching Business Statistics through Problem Solving David M. Levine, Baruch College, CUNY with David F. Stephan, Two Bridges Instructional Technology CONTACT: davidlevine@davidlevinestatistics.com Typical

### APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

### DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un

### Lin s Concordance Correlation Coefficient

NSS Statistical Software NSS.com hapter 30 Lin s oncordance orrelation oefficient Introduction This procedure calculates Lin s concordance correlation coefficient ( ) from a set of bivariate data. The

### The Statistics Tutor s Quick Guide to

statstutor community project encouraging academics to share statistics support resources All stcp resources are released under a Creative Commons licence The Statistics Tutor s Quick Guide to Stcp-marshallowen-7

### Instructions for SPSS 21

1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### Correlational Research

Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

### 430 Statistics and Financial Mathematics for Business

Prescription: 430 Statistics and Financial Mathematics for Business Elective prescription Level 4 Credit 20 Version 2 Aim Students will be able to summarise, analyse, interpret and present data, make predictions

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

### Mathematics within the Psychology Curriculum

Mathematics within the Psychology Curriculum Statistical Theory and Data Handling Statistical theory and data handling as studied on the GCSE Mathematics syllabus You may have learnt about statistics and

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Calibration and Linear Regression Analysis: A Self-Guided Tutorial Part 1 Instrumental Analysis with Excel: The Basics CHM314 Instrumental Analysis Department of Chemistry, University of Toronto Dr. D.

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive