# Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time Introduction p Statistical Methods Used p and under Males p.

Save this PDF as:

Size: px
Start display at page:

Download "Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p."

## Transcription

1 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p Statistical Methods Used p and under Males p and up Males p and under Females p and up Females p References p. 20

2 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 2 Introduction In the world of swimming, there is a commonly accepted notion of the perfect swimmer body. Coaches, spectators, and swimmers alike use the phrase when referring to the top notch swimmers by saying, Of course Michael Phelps is fast, he has the perfect swimmer s body. Typically what is meant by this saying is that the athlete is tall, slender around with waist with broad shoulders, and has disproportionately long arms. Well if long arms help a swimmer pull faster, it could be thought that long feet make the swimmer kick faster. To see if the perfect swimmer body actually indicates a swimmer s speed, I set up a project to test this theory. I have been a swim coach for 5 years now, so access to swimmer measurements was not hard to come by. I coach swimmers ages 4-18 (typically referred to as age group swimmers) and ranging in all abilities. I decided to set up a project where I measured height, wingspan, shoulder circumference (around the shoulders at armpit level), waist circumference, foot length, how many years each swimmer has participated in competitive swimming, and their 50 yard freestyle time. Every variable was measured in centimeters, besides years of swimming which was measured in years. If this was a swimmer s first year, it was considered 1 year of swimming. Collecting the data was not done in the best manner, since I had to choose swimmers whose parents I knew would be comfortable with me taking these measurements. Therefore, I do not have a random sample of swimmers, but the ages and range of ability still vary greatly. I ended up getting measurements from nineteen females ranging in ages 7-17, and nineteen males ranging in ages I knew if I analyzed all of the ages for each gender together, that a trend would occur because as much as a 17 year old female is taller than a 7 year old female, she would also be faster, naturally, because of strength and coordination that is developed through growing up. The same can be said for the males. Therefore, I split up both genders into two age groups: 10 and under, and 11 and up. I was able to collect the following data: 10 and under males: Swimmer Age Height Wingspan Shoulder Circumference Waist Foot length Years of swimming 50 yard freestyle time

3 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 3 11 and up males: Swimmer Age Height Wingspan Shoulder Circumference Waist Foot length Years of swimming 50 yard freestyle time and under females: Swimmer Age Height Wingspan Shoulder Circumference Waist Foot Length Years of Swimming 50 yard freestyle time

4 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 4 11 and up females: Swimmer Age Height Wingspan Shoulder Circumference Waist Foot Length Years of Swimming 50 yard freestyle time The rest of this paper will focus on each data set individually and running statistical analyses involving multiple linear regression models. However, to understand the significance of the models and how they re formed, one would need to be aware of certain statistical definitions and methods. Therefore, these will be discussed in the next section.

5 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 5 Statistical Methods Used To understand multiple regression models, one must first understand simple linear regression models. These models will always be in the form where: is the dependent or response variable is the independent or predictor variable is the random error component is the y-intercept of the line is the slope of the line The model above is set up so that it fits the entire population. However, this is hardly ever known so we take a sample instead and set up a similar model based on this sample: This formula yields the least-squares line where and are estimates of the population parameters based off our sample and can be calculated as follows: This introduces even more notation that needs to be discussed. Pearson s correlation coefficient (denoted by r) is a measure of the strength and direction of the linear relationship between x and y. This value always falls between -1 and 1. The closer the absolute value of r is to one, the stronger the relationship is between x and y. The formula for this correlation coefficient is: where is the sum of the cross products, is the sum of the squared deviation in x and, is the sum of the squared deviation in y. Also, is known as the coefficient of determination and represents the proportion of the total sample variability that is explained by the linear relationship between x and y. For a model to be considered a good prediction, it is best for to be close to one. This way we know the simple linear regression model explains most of the variability within the sample. Now that we have a formula to predict a variable, how can we determine whether or not the predictor variable is a useful aid in predicting our dependent variable? Here one could use a hypothesis test set us as follows:

6 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 6 Essentially, if then for all values of x we would predict one constant value of y. So we assume this is true, and then reject if we have enough evidence to show that this is not true, and hence x is a useful predictor of y. To test this hypothesis, we would use a t-distribution with a t-statistic of where and n is the number of observations in the sample. The p-value will be the area under the curve to the right and left of the value. This area represents the probability of observing what was true in the sample if the null hypothesis is true for the population. Therefore, we want this value to be small, meaning if then there is hardly any chance we should see this event happening, which leads us to believe is most likely not equal to 0. For this paper, a p-value of less that 10% or 0.10 will be used in order for an event to be called statistically significant. Instead of doing all of these calculations by hand, Excel will calculate them for us by producing an ANOVA table (Analysis of Variance Table). The bottom portion of the ANOVA table will label the rows with intercept and the X variable and the columns will be labeled Coefficients, Standard Error, t Stat, and P-value in order respectively. The coefficient column will give us in the intercept row and in the X row. Each row will also have a p-value. If the P-value is below 0.10, we consider that variable to be significant in predicting the y-variable. Also, from this table we can form the least square line by using and that is computed. The top portion of the ANOVA table gives us data about the regression model as a whole. This becomes more important when we start adding more predictor variables. Regression models that include more than one independent variable are called regression models and follow the form where: y is the dependent variable,, are the independent variables determines the contribution of the independent variable Multiple regression is very similar to simple linear regression except now we introduce an F-distribution which we use to determine the usefulness of the entire model instead of just one variable at a time. Here we can set up our hypothesis test for usefulness as

7 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 7 Then we would use the F-statistic to calculate the p-value just like we did with the t-statistic before. The F-statistic and resulting p-value (labeled Significance F) can be found in the top portion of the ANOVA table. The top portion of the ANOVA table will have 3 rows: Model, Error, and Total. There will be 5 other columns, but our main focus will be on the F Value column and p-value. These are the cells we will use to determine how useful the regression model is. The lower portion of an ANOVA table for multiple regression will be only slightly different by having more than one X-variable listed. All of the predictor variables will be shown with the value of resulting t stat and p-value for each predictor variable. To determine a multiple regression model for the swimmers, I first ran an analysis through excel that used all 7 predictor (independent) variables, and the one dependent variable. Essentially, I created a multiple regression equation of the following model: and the For each group I ran this analysis on, the model failed to be useful in predicting a swimmer s 50 freestyle time. Thinking there must be a better model out there, I used a method referred to as Backward Elimination. Using this, I would eliminate the variable with the highest (least statistically significant) p- value, and run the regression analysis again. If the Significance F value is statistically significant (below 0.10) and each variable s p-value is statistically significant, then this is the final model used. If not, I again eliminate the variable with the highest p-value and run the regression analysis again. I continue doing this until all variables in the model show statistical significance. Note: All statistical definitions, concepts, and equations, should be credited to the online notes from Paul Holmes STAT 6315 class. See the reference page for more details. Also, if the reader would like further explanations of concepts, please consult the following textbook: Introduction to the Practice of Statistics (7th Edition) by Moore, McCabe & Craig (ISBN-13: )

8 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 8 Males Ages 10 and Under To determine a multiple regression model for the 10 and under males, I first ran an analysis through excel that used all 7 predictor (independent) variables, and the one dependent variable. Essentially, this follows the model where is the constant intercept and each are slope coefficients for each independent variable. Excel produced the following ANOVA table: Multiple R R Square Adjusted R Square Standard Error Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P- value Lower 95% Upper 95% Intercept Age Height Wingspan Shoulder Circumference Waist Foot length Years of swimming Using the backwards elimination method I removed the following variables one by one (in order): age, shoulder circumference, foot length, and wingspan. This leaves the following variables in the model: Height, waist, and years of swimming. The following ANOVA table showed my final regression analysis:

9 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 9 Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 9 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept Height Waist Years of swimming This gives us the following model to predict a 10 and under male swimmer s 50 yard freestyle time: Examining these coefficients we see that (holding all else constant): For each 1cm increase in height, we predict the swimmer to be seconds faster in the 50 freestyle For each 1cm increase in waist, we predict the swimmer to be seconds slower in the 50 freestyle For each 1 additional year of swimming, we predict the swimmer to be seconds faster in the 50 freestyle Overall, our model is statistically significant at the <2% level, which falls below the 10% mark and 86.42% of the variability in the sample is explained through this model.

10 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 10 Males 11 and Up To determine a multiple regression model for the 11 and up males, I first ran an analysis through excel that used all 7 predictor (independent) variables, and the one dependent variable. Essentially, this follows the model where is the constant intercept and each are slope coefficients for each independent variable. Excel produced the following ANOVA table: Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 10 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P- value Lower 95% Upper 95% Intercept Age Height Wingspan Shoulder Circumference Waist Foot length Years of swimming

11 50 yard freestyle time Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 11 Using the backwards elimination method I removed the following variables one by one (in order): height, years of swimming, foot length, waist, shoulder circumference, and age. This left only wingspan in the model. This meant that a simple linear regression model was the best fit for the data I obtained. The following ANOVA table showed my final regression analysis: Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 10 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept Wingspan This gives us the following model to predict a 10 and under male swimmer s 50 yard freestyle time: Examining the slope of this line we see that for each 1cm increase in wingspan, we predict the swimmer to be seconds faster in the 50 freestyle. This model would produce the following graph along with a scatterplot of the data Linear Regression Model y = x Wingspan

12 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 12 Overall, our model is statistically significant at the <1% level, which falls way below the 10% mark and 70.09% of the variability in the sample is explained through this model.

13 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 13 Females 10 and Under To determine a multiple regression model for the 10 and under males, I first ran an analysis through excel that used all 7 predictor (independent) variables, and the one dependent variable. Essentially, this follows the model where is the constant intercept and each are slope coefficients for each independent variable. Excel produced the following ANOVA table: SUMMARY OUTPUT Regression Statistics Multiple R 1 R Square 1 Adjusted R Square Standard Error 0 Observations 8 ANOVA df SS MS F Significance F Regression #NUM! #NUM! Residual Total Coefficients Standard Error t Stat P- value Lower 95% Upper 95% Intercept #NUM! Age #NUM! Height #NUM! Wingspan #NUM! Shoulder Circumference #NUM! Waist #NUM! Foot Length #NUM! Years of Swimming #NUM!

14 Wingspan Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 14 As you can see, Excel was producing #NUM! for all p-values, and for the F-statistic and Significance of F value. In short, this shows that some sort of error within the data set is keeping Excel from being able to calculate what is being asked. After some further exploration, I found that my sample size was too small for the amount of variables I was including. However, I have a suspicion that height and wingspan might be dependent, and hence not really adding two separate variables to the model. (I.e. wingspan and height might be adding the same information to the model.) To test this idea, I ran a Chi-Square analysis between height and wingspan. Essentially, a Chi-Square distribution will inform me if the two variables are in fact dependent on each other. For this analysis, I needed the data to be put into a contingency table, and hence I needed the data to be categorical. For logistic reasons, I took the range of data I had for height, divided it by two, and made two categories: short and long where the interval for short was 124.5cm-133.7cm and the interval for long was 133.8cm-143.0cm. I did the same thing for wingspan and decided that the interval for short was 113.9cm-130cm and the interval for long was 130.1cm-146.2cm. Then I was able to form the following contingency table: Height Small Long Total Small Long Total For this analysis, we will set up a hypothesis test as follows: Our test statistic for this analysis will be where i and j refer to the row and column number respectively, and. In this situation,. Then, Using 8 as our test statistic and a degrees of freedom of (number of rows 1) x (number of columns 1) = (2-1) x (2-1) = 1, we get a p-value of which is below 0.1, so we reject the null hypothesis. Hence, wingspan and height are dependent on each other, and there is no need to include both variables in the model. To determine which variable to delete, I ran a simple linear regression analysis on both variables

15 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 15 separately and determined that height had a significance level of and wingspan had a significance level of Since wingspan was least statistically significant, I deleted that one and ran a multiple regression analysis and received the following ANOVA table: SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 8 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept Age Height Shoulder Circumference Waist Foot Length Years of Swimming Surprisingly, this showed that every remaining independent variable was statistically significant at the 10% level, so the regression model to predict a 10 and under female s 50 yard freestyle time is: Examining these coefficients we see that (holding all else constant): For a one year increase in age, we predict the swimmer to be seconds faster in the 50 freestyle.

16 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 16 For a 1cm increase in height, we predict the swimmer to be seconds faster in the 50 freestyle. For a 1cm increase in the shoulder circumference, we predict the swimmer to be seconds slower in the 50 freestyle. For a 1cm increase in the waist, we predict the swimmer to be seconds faster in the 50 freestyle. For a 1cm increase in the foot length, we predict the swimmer to be seconds slower in the 50 freestyle. For a one year increase in years of swimming, we predict the swimmer to be seconds faster. Some of these coefficients do seem strange to me. In particular, I do not expect a bigger foot size to make a swimmer slower. These unusual responses could be due to a small sample size. Therefore, I would recommend further studies be done with a larger sample of swimmers. Overall, our model is statistically significant at the <3% level, which falls below the 10% mark and 99.98% of the variability in the sample is explained through this model.

17 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 17 Females 11 and Up To determine a multiple regression model for the 11 and up females, I first ran an analysis through excel that used all 7 predictor (independent) variables, and the one dependent variable. This follows the model where is the constant intercept and each are slope coefficients for each independent variable. Excel produced the following ANOVA table: Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 11 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept Age Height Wingspan Shoulder Circumference Waist Foot Length Years of Swimming

18 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 18 Here we see that the overall model produced is in fact statistically significant since Significance F = However, I would prefer each variable to also be statistically significant so using the backwards elimination method I removed the following variables one by one (in order): age, wingspan, and height. This left the following variables in the model: shoulder circumference, waist, foot length, and years of swimming. The following ANOVA table showed my final regression analysis: Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 11 ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept Shoulder Circumference Waist Foot Length Years of Swimming This gives us the following model to predict an eleven and up female swimmer s 50 yard freestyle time: Examining these coefficients we see that (holding all else constant): For each 1cm increase in shoulder circumference, we predict the swimmer to be seconds faster in the 50 freestyle For each 1cm increase in waist, we predict the swimmer to be seconds slower in the 50 freestyle For each 1 cm increase in footlength, we predict the swimmer to be seconds faster in the 50 freestyle

19 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 19 For each one additional year of swimming, we predict the swimmer to be seconds faster in the 50 freestyle Overall, our model is statistically significant at the <1% level, which falls way below the 10% mark and 93.27% of the variability in the sample is explained through this model.

20 Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 20 References Holmes, P. Course Notes 2 [PDF document]. Retrieved from the University of Georgia s elearning Commons website: anager?contentid= Holmes, P. Course Notes 3 [PDF document]. Retrieved from the University of Georgia s elearning Commons website: anager?contentid= Holmes, P. Course Notes 4 [PDF document]. Retrieved from the University of Georgia s elearning Commons website: anager?contentid=

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Simple Methods and Procedures Used in Forecasting

Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

### Name: Date: Use the following to answer questions 2-3:

Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Outline: Demand Forecasting

Outline: Demand Forecasting Given the limited background from the surveys and that Chapter 7 in the book is complex, we will cover less material. The role of forecasting in the chain Characteristics of

### Guide to Microsoft Excel for calculations, statistics, and plotting data

Page 1/47 Guide to Microsoft Excel for calculations, statistics, and plotting data Topic Page A. Writing equations and text 2 1. Writing equations with mathematical operations 2 2. Writing equations with

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Using Excel for Statistical Analysis

Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

Business Valuation Review Regression Analysis in Valuation Engagements By: George B. Hawkins, ASA, CFA Introduction Business valuation is as much as art as it is science. Sage advice, however, quantitative

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

### Assignments Analysis of Longitudinal data: a multilevel approach

Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans

Using Your TI-83/84/89 Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

### The Internal Rate of Return Model for Life Insurance Policies

The Internal Rate of Return Model for Life Insurance Policies Abstract Life insurance policies are no longer seen solely as a means of insuring life. Due to many new features introduced by life insurers,

### LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

### 1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

### Violent crime total. Problem Set 1

Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the

### Module 5: Statistical Analysis

Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

### An SPSS companion book. Basic Practice of Statistics

An SPSS companion book to Basic Practice of Statistics SPSS is owned by IBM. 6 th Edition. Basic Practice of Statistics 6 th Edition by David S. Moore, William I. Notz, Michael A. Flinger. Published by

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

### CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS Hypothesis 1: People are resistant to the technological change in the security system of the organization. Hypothesis 2: information hacked and misused. Lack

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

### Multiple Regression Analysis A Case Study

Multiple Regression Analysis A Case Study Case Study Method 1 The first step in a case study analysis involves research into the subject property and a determination of the key factors that impact that

### HLM software has been one of the leading statistical packages for hierarchical

Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### The Big Picture. Correlation. Scatter Plots. Data

The Big Picture Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, We have just completed a length series of lectures on ANOVA where we considered

### socscimajor yes no TOTAL female 25 35 60 male 30 27 57 TOTAL 55 62 117

Review for Final Stat 10 (1) The table below shows data for a sample of students from UCLA. (a) What percent of the sampled students are male? 57/117 (b) What proportion of sampled students are social

### 1 Theory: The General Linear Model

QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

Table of Contents Preface Chapter 1: Introduction 1-1 Opening an SPSS Data File... 2 1-2 Viewing the SPSS Screens... 3 o Data View o Variable View o Output View 1-3 Reading Non-SPSS Files... 6 o Convert

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### Main Effects and Interactions

Main Effects & Interactions page 1 Main Effects and Interactions So far, we ve talked about studies in which there is just one independent variable, such as violence of television program. You might randomly

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

### RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS I. Basic Course Information A. Course Number and Title: MATH 111H Statistics II Honors B. New or Modified Course:

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### Basic Probability and Statistics Review. Six Sigma Black Belt Primer

Basic Probability and Statistics Review Six Sigma Black Belt Primer Pat Hammett, Ph.D. January 2003 Instructor Comments: This document contains a review of basic probability and statistics. It also includes

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### 10. Analysis of Longitudinal Studies Repeat-measures analysis

Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Analysis of Variance. MINITAB User s Guide 2 3-1

3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### Homework #1 Solutions

Homework #1 Solutions Problems Section 1.1: 8, 10, 12, 14, 16 Section 1.2: 2, 8, 10, 12, 16, 24, 26 Extra Problems #1 and #2 1.1.8. Find f (5) if f (x) = 10x x 2. Solution: Setting x = 5, f (5) = 10(5)

### Chapter 23 Inferences About Means

Chapter 23 Inferences About Means Chapter 23 - Inferences About Means 391 Chapter 23 Solutions to Class Examples 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

### Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

### www.rmsolutions.net R&M Solutons

Ahmed Hassouna, MD Professor of cardiovascular surgery, Ain-Shams University, EGYPT. Diploma of medical statistics and clinical trial, Paris 6 university, Paris. 1A- Choose the best answer The duration

### Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

### Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

### Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

Babraham Bioinformatics Introduction to Statistics with GraphPad Prism (5.01) Version 1.1 Introduction to Statistics with GraphPad Prism 2 Licence This manual is 2010-11, Anne Segonds-Pichon. This manual

### The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests

Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather

### Inferential Statistics. What are they? When would you use them?

Inferential Statistics What are they? When would you use them? What are inferential statistics? Why learn about inferential statistics? Why use inferential statistics? When are inferential statistics utilized?

### SPSS-Applications (Data Analysis)

CORTEX fellows training course, University of Zurich, October 2006 Slide 1 SPSS-Applications (Data Analysis) Dr. Jürg Schwarz, juerg.schwarz@schwarzpartners.ch Program 19. October 2006: Morning Lessons

### ECONOMETRICS PROJECT ECON-620 SALES FORECAST FOR SUBMITED TO: LAURENCE O CONNELL

ECONOMETRICS PROJECT ECON-620 SALES FORECAST FOR SUBMITED TO: LAURENCE O CONNELL SUBMITTED BY: POOJA OZA: 0839378 NISHTHA PARIKH: 0852970 SUNNY SINGH: 0368016 YULIYA INOZEMYSEVA: 0817808 TABLE OF CONTENTS

### Introduction Course in SPSS - Evening 1

ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies

### Chapter 19 The Chi-Square Test

Tutorial for the integration of the software R with introductory statistics Copyright c Grethe Hystad Chapter 19 The Chi-Square Test In this chapter, we will discuss the following topics: We will plot

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### data visualization and regression

data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Calibration and Linear Regression Analysis: A Self-Guided Tutorial Part 1 Instrumental Analysis with Excel: The Basics CHM314 Instrumental Analysis Department of Chemistry, University of Toronto Dr. D.