# The Numbers Behind the MLB

Anonymous Students: AD, CD, BM (TF: Kevin Rader)

## Transcription

### Abstract

This project measures the effects of various baseball statistics on the win percentage of all the teams in MLB. Data were collected from espn.com for the 2010 regular season for the 30 teams in MLB. First, simple linear regression models were run in Stata for each of the variables. Next, a multiple regression model was used as the basis for our model. The final model, after performing a step-wise regression with significance set at a p-value of .05 or less, reflects that the significant determining variables are batting average, strikeouts, quality starts, and errors. The signs of the coefficients were not surprising: each was positive or negative as expected. However, which variables proved to be significant, and the fact that payroll, home runs, and league were not, was relatively surprising. Lastly, the researchers discuss the implications of these results and possible strategies that MLB teams could employ to increase their regular season win percentages in the future.

### Introduction

The motivation for this team of researchers was their personal interest in sports. Baseball was the sport of choice because of its heavy reliance on statistics. When one thinks of baseball, it's all about the numbers: runs, strikeouts, payroll, etc. So our group decided to boil down the numbers and find out which ones really matter. After all, the final number anyone really cares about is win percentage. That's the number that's going to get you to the playoffs. For this reason, our team selected a variety of variables and ran a multiple regression for win percentage to determine which variables were significant indicators of win percentage.

MLB consists of 30 teams. Data were collected from espn.com for each team during the 2010 regular season.
The independent variables considered were league (National or American), strikeouts, quality starts, home runs, payroll, errors, and batting average. The dependent variable in the model is win percentage, measured as an actual percentage, ranging from 0 to 100.

The variable league was included as a dummy variable coded 0 or 1: 1 represents the National League and 0 represents the American League. This variable allowed us to test whether win percentages differ between teams in the two leagues.

Strikeouts and quality starts fall under the category of pitching statistics. The researchers were curious whether there was a correlation between pitching and overall wins, and these variables seemed best suited to the regression. ERA was not chosen as the pitching statistic because there is an obvious correlation between ERA and win percentage. Rather, the group was interested in determining the effects of statistics less directly related to win percentage. Because these statistics reflect good pitching, the researchers expect a positive correlation between them and win percentage.

The offensive statistics included home runs and team batting average. As with ERA, runs scored were not incorporated into the model because the correlation between runs and wins is too direct. Batting average was rescaled to a percentage between 0 and 100 by multiplying the traditional batting average by 100, which makes the output easier to read. The team thought that home runs would bring an interesting perspective to the model, considering the range of teams known for big hitters and those that score by other methods.
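The variable coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow (they used Excel and Stata): the helper `encode_team` and the sample record are hypothetical, and the output keys simply mirror the Stata variable names `winpct`, `nl1al0`, and `bafinal` that appear in the appendix.

```python
def encode_team(wins, losses, league, batting_avg):
    """Return the regression variables for one team.

    Hypothetical helper: rescales the win fraction and the batting
    average to the 0-100 range and codes league as 1 = National,
    0 = American.
    """
    if league not in ("NL", "AL"):
        raise ValueError("league must be 'NL' or 'AL'")
    games = wins + losses
    return {
        "winpct": 100.0 * wins / games,
        "nl1al0": 1 if league == "NL" else 0,
        "bafinal": 100.0 * batting_avg,
    }


# Made-up record, not a row from the 2010 data set.
example = encode_team(wins=81, losses=81, league="NL", batting_avg=0.250)
print(example)
```

With both percentages on a 0-100 scale, a coefficient on `bafinal` reads directly as "points of win percentage per point of batting average".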

Since offense is crucial for a successful team, one would hope to see a positive coefficient on these two offensive variables.

Alongside the pitching statistics, the variable errors represents the defensive statistic for each team. Baseball is usually thought of in terms of hitting and pitching, so it will be interesting to see whether defensive statistics, such as errors, play a role in determining win percentage. Since the goal of baseball is not only to maximize runs scored but also to minimize runs scored by the other team, the team would expect to see a negative correlation between errors and win percentage.

Lastly, the researchers wanted to look at payroll and its effect on win percentage. You hear so much about individual player earnings in the media. After looking at the data, it is funny to think that a single player on one team (Alex Rodriguez) makes almost as much money as an entire other team (the Pittsburgh Pirates). There are a lot of outliers in this category, however, so the group is not expecting a very strong correlation.

### Methods

The researchers collected data from various pages of espn.com. The data were organized into an Excel chart, making sure to line up each statistic with the correct team. The data consist of 30 observations for each variable, since there are only 30 teams in MLB. From there, the team exported the data into Stata.

First, the group ran a simple linear regression of win percentage on each of the x variables. These simple linear regressions gave the group members a sense of the relationship between each variable and win percentage. After determining the correlation for each variable on its own, the team ran a multiple linear regression to study the effect of each variable when the remaining variables were also taken into consideration. A p-value of .05 or less was used to determine the significance of any given variable.
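For readers who want to see what each simple linear regression computes, here is a minimal least-squares sketch in plain Python. It is not the authors' code (they ran `regress` in Stata), the function name `simple_ols` is hypothetical, and the numbers are made up so that they fall exactly on a line; they are not the 2010 data.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers for illustration only (errors committed, win percentage).
errors = [80, 90, 100, 110, 120]
winpct = [58.0, 54.0, 50.0, 46.0, 42.0]
b0, b1 = simple_ols(errors, winpct)
print(b0, b1)
```

A negative slope here corresponds to the expectation stated earlier that more errors go with a lower win percentage.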
Next, a step-wise regression was performed to eliminate variables that were not significant. This left the team with the final model for the project. This final model recognized four variables as significant indicators of win percentage, in addition to the constant. While it is questionable to include more than 3 variables given the limited number of observations in the data, the team feels that the model is well suited to the goals of this project.

With the final model determined, it was necessary to verify that all assumptions held, so the team ran several tests in Stata. These included `hettest` for heteroskedasticity, `ovtest` for nonlinearity, the Shapiro-Wilk test to see whether the residuals were normal, and a test for collinearity.

### Results

The results of the project were interesting. After running the stepwise multiple regression, the data reflect that the only significant variables in determining win percentage are batting average, errors, strikeouts, and quality starts. With the lowest p-value, 0.001, batting average was the most significant variable. The coefficient for batting average was 2.602, indicating that when all other variables are held constant, a 1 percentage point increase in batting average is associated with a 2.602 percentage point increase in win percentage for any given team. This strong relationship makes sense, since wins are directly related to runs scored, and runs are directly related to hits and batting average. The coefficients and p-values for the intercept and all significant variables are shown below.
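The step-wise procedure used here is backward elimination: start from the full model, drop the term with the largest p-value if it is at or above the threshold, refit, and repeat. The sketch below shows only that control flow. The function `backward_eliminate`, the generic variable names, and especially the placeholder p-values are all invented for illustration; they are not the paper's output, and a real `fit_pvalues` would refit the regression on every call.

```python
def backward_eliminate(candidates, fit_pvalues, threshold=0.05):
    """Drop the least significant term until every p-value is below threshold.

    candidates: variable names in the starting (full) model.
    fit_pvalues: callable taking a list of names and returning
        {name: p-value} for a model refit with exactly those terms.
    """
    kept = list(candidates)
    removed = []
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < threshold:
            break  # every remaining term is significant
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


# Placeholder p-values, invented for illustration; NOT the paper's results.
# This stub ignores the model composition, which a real refit would not.
FAKE_P = {"a": 0.60, "b": 0.01, "c": 0.20, "d": 0.03}


def fake_fit(names):
    return {name: FAKE_P[name] for name in names}


kept, removed = backward_eliminate(["a", "b", "c", "d"], fake_fit)
print(kept, removed)
```

Passing the fit as a callback keeps the elimination loop separate from whichever regression routine supplies the p-values.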

| Variable | Coefficient | p-value |
| --- | --- | --- |
| intercept | | |
| batting average (%) | 2.602 | 0.001 |
| errors | | |
| strikeouts | | |
| quality starts | | |

(Only the batting average values, which are quoted in the text, survived transcription.)

The final step-wise regression model is shown below.

`. sw, pr(.05): regress winpct nl1al0 k qs hr payrollmillions errors bafinal`

Beginning with the full model, Stata removed payrollmillions, then hr, then nl1al0, each with a p-value at or above .05. The final model, with F(4, 25), retains bafinal, k, qs, errors, and _cons. The numeric output (F statistic, Prob > F, R-squared, adjusted R-squared, Root MSE, and the coefficient table) did not survive transcription.

As shown in the Stata output, the R² and adjusted R² for this model (values not preserved) indicate that the model is a relatively strong predictor of win percentage. Additionally, it is important to note that adjusted R² is greater in the step-wise final model than in the multiple linear regression with all variables included, which supports it as the better model.

The results of the simple linear regressions for each x variable independently demonstrated nothing too surprising. Although the most significant variable in the final model appears to be batting average, the simple linear regression results indicate that errors is the most significant variable on its own. For this reason, the scatter plots for the simple linear regressions of both variables are shown below.
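The adjusted R² comparison rests on the standard definition below, written with the sample size n = 30 and the predictor counts implied by the F statistics in the Stata output (F(7, 22) for the full model, F(4, 25) for the final one). The R² values themselves are not reproduced here, so this is a sketch of the mechanics only.

$$
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1},
\qquad
\frac{n-1}{n-k-1} =
\begin{cases}
29/22 & \text{full model } (k = 7),\\
29/25 & \text{final model } (k = 4).
\end{cases}
$$

Because 29/25 is smaller than 29/22, dropping three variables that barely lower R² can raise adjusted R², which is the pattern the comparison describes.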

### Conclusions and Discussion

When analyzing the results of the final multiple regression model, the coefficients appear as one would predict. Good qualities of a team (batting average, strikeouts, and quality starts) all have positive coefficients, and the variable errors has a negative coefficient. Surprisingly, the payroll of a team was not a significant indicator of win percentage. While this may be surprising to an everyday baseball fan, the team was wary of this variable due to the large number of outliers.

So, what do these results mean for the future of baseball? Because batting average and errors were found to be the most significant predictors, perhaps teams should target their payrolls toward players who will improve these team statistics. For example, they should focus on offensive players with a high batting average who also play good defense and commit few errors. To a lesser extent, teams should also recruit pitchers who have a history of quality starts and a high number of strikeouts.

The dummy variable league proved not to be significant, indicating that win percentage for any given team does not differ between the American League and the National League if all other variables are held constant. In other words, once the other variables are accounted for, league has no detectable direct effect on win percentage.

Home runs were also removed in the final model. This is surprising because home runs are directly associated with runs scored, and runs are a strong indicator of win percentage. On the other hand, this could reflect the number of runs scored by hits other than home runs, in addition to walks and errors. The removal of this variable suggests that getting more hits overall is more important than getting more home runs.

Interestingly, the simple linear regression of win percentage on payroll indicated that payroll was a significant predictor on its own.
However, in the multiple regression model, payroll was the least significant variable and was therefore the first removed. The group has concluded that this discrepancy reflects poor allocation of teams' payrolls. For example, rather than using its payroll to sign players as described above, a team may sign big-name players whose statistics are relatively poor in light of their salaries. Essentially, these players are paid more than they're worth just to draw more fan interest.

Lastly, it is important to mention that the tests run in the final sections of the project verify that our assumptions of linearity, normality of residuals, etc. all hold. This means that the final model requires no transformations before being used as a predictive model. Some of these issues were taken care of by our initial transformations of data values onto similar scales. For example, win percentage and batting average were both converted to actual percentages in the range of 0 to 100. This was done so that the output could be read more easily in terms of a 1 unit increase in x and its effect on y.

The major weakness of this project is that our data were limited to 30 observations. The sample was limited because there are only 30 teams in MLB, and we wanted to include data from a single season only, since strategies have changed throughout the history of baseball. Additionally, the team members analyzed only the regular season, not the playoffs. For example, the Giants won the World Series, and they tied for the third highest number of quality starts and had the highest number of strikeouts within our data. This means that these two variables may be more significant for the playoffs than indicated, but because our data were limited to the regular season, these effects were not taken into account.

In conclusion, if a team's goal is to increase its win percentage in the regular season, our model is a strong indicator of the significant variables. However, because of the structure of the MLB playoff system, the team with the highest win percentage does not necessarily win the World Series. Therefore, a different study should be conducted to analyze the most significant variables in determining success in the post-season.

### References

- Stata Software. Gould, Bill. Version [missing].
- MLB Statistics. Elias Sports Bureau.

### Appendix

Simple Linear Regressions:
| Command | F(1, 28) | Root MSE |
| --- | --- | --- |
| `. regress winpct nl1al0` | 0.07 | |
| `. regress winpct k` | | |
| `. regress winpct qs` | 8.58 | |
| `. regress winpct hr` | 6.60 | 6.22 |
| `. regress winpct payrollmillions` | 4.34 | |
| `. regress winpct errors` | | |
| `. regress winpct bafinal` | 7.50 | |

(Blank cells, along with the coefficient tables, Prob > F, and R-squared values, did not survive transcription.)

`. corr winpct errors` (obs=30): correlation value not preserved.

Multiple Linear Regression:

`. regress winpct nl1al0 k qs hr payrollmillions errors bafinal`

F(7, 22); the F statistic and the coefficient table (nl1al0, k, qs, hr, payrollmillions, errors, bafinal, _cons) were not preserved.

Step-Wise Multiple Linear Regression:

`. sw, pr(.05): regress winpct nl1al0 k qs hr payrollmillions errors bafinal`

Beginning with the full model, Stata removed payrollmillions, then hr, then nl1al0, each with a p-value at or above .05. Final model: F(4, 25), with terms bafinal, k, qs, errors, and _cons; numeric values were not preserved.

Test for Heteroskedasticity:

`. hettest`: Breusch-Pagan / Cook-Weisberg test for heteroskedasticity. Ho: Constant variance. Variables: fitted values of winpct. chi2(1) = 0.07; Prob > chi2 not preserved.

Test for Non-Linearity:

`. ovtest`: Ramsey RESET test using powers of the fitted values of winpct. Ho: model has no omitted variables. F(3, 22) = 0.57; Prob > F not preserved.

Test for Normality of Noise:

`. predict res` (option xb assumed; fitted values)

`. swilk res`: Shapiro-Wilk W test for normal data on res; Obs, W, V, z, and Prob>z not preserved.

Test for Collinearity:

`. corr k qs errors bafinal` (obs=30): pairwise correlation matrix not preserved.

The final model passes all tests checking the initial assumptions of the project.
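The collinearity check relies on pairwise correlations among the four retained predictors (Stata's `corr`). A minimal pure-Python sketch of that calculation follows; the function names and the toy columns are hypothetical, and the paper's actual correlation values are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Toy columns, made up for illustration only.
toy = {
    "k": [1000, 1100, 1200, 1300],
    "qs": [70, 80, 75, 95],
    "errors": [110, 100, 90, 80],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Off-diagonal values close to 1 or -1 would signal that two predictors carry nearly the same information, which is the situation a collinearity check is meant to flag.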


Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

### Solution Let us regress percentage of games versus total payroll.

Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

### Maximizing Precision of Hit Predictions in Baseball

Maximizing Precision of Hit Predictions in Baseball Jason Clavelli clavelli@stanford.edu Joel Gottsegen joeligy@stanford.edu December 13, 2013 Introduction In recent years, there has been increasing interest

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### A Guide to Baseball Scorekeeping

A Guide to Baseball Scorekeeping Keeping score for Claremont Little League Spring 2015 Randy Swift rjswift@cpp.edu Paul Dickson opens his book, The Joy of Keeping Score: The baseball world is divided into

### BEAVER COUNTY FASTPITCH RULES FOR 2013 SEASON

BEAVER COUNTY FASTPITCH RULES FOR 2013 SEASON The League Website is www.eteamz.com/bcgfpl this will be updated as league schedule of events are completed, team schedules and scores are submitted. Basic

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Statistical Predictors of March Madness: An Examination of the NCAA Men s Basketball Championship

Wright 1 Statistical Predictors of March Madness: An Examination of the NCAA Men s Basketball Championship Chris Wright Pomona College Economics Department April 30, 2012 Wright 2 1. Introduction 1.1 History

### The Value of Major League Baseball Players An Empirical Analysis of the Baseball Labor Market

HAVERFORD COLLEGE The Value of Major League Baseball Players An Empirical Analysis of the Baseball Labor Market Chris Maurice 4/29/2010 The goal of this thesis was to test the efficiency of the baseball

### The Effects of Atmospheric Conditions on Pitchers

The Effects of Atmospheric Conditions on Rodney Paul Syracuse University Matt Filippi Syracuse University Greg Ackerman Syracuse University Zack Albright Syracuse University Andrew Weinbach Coastal Carolina

### Team Success and Personnel Allocation under the National Football League Salary Cap John Haugen

Team Success and Personnel Allocation under the National Football League Salary Cap 56 Introduction T especially interesting market in which to study labor economics. The salary cap rule of the NFL that

### TULANE BASEBALL ARBITRATION COMPETITION BRIEF FOR THE TEXAS RANGERS. Team 20

TULANE BASEBALL ARBITRATION COMPETITION BRIEF FOR THE TEXAS RANGERS Team 20 1 Table of Contents I. Introduction...3 II. Nelson Cruz s Contribution during Last Season Favors the Rangers \$4.7 Million Offer...4

### Figure 1. Figure 2. Figure 3. Figure 4. normality such as non-parametric procedures.

Introduction to Building a Linear Regression Model Leslie A. Christensen The Goodyear Tire & Rubber Company, Akron Ohio Abstract This paper will explain the steps necessary to build a linear regression

### Using JMP with a Specific

1 Using JMP with a Specific Example of Regression Ying Liu 10/21/ 2009 Objectives 2 Exploratory data analysis Simple liner regression Polynomial regression How to fit a multiple regression model How to

### Determinants of Demand for Cable TV Services in the Era of Internet Communication Technologies

Pace University DigitalCommons@Pace Honors College Theses Pforzheimer Honors College 2015 Determinants of Demand for Cable TV Services in the Era of Internet Communication Technologies Michael Gorodetsky

### BIOL 933 Lab 6 Fall 2015. Data Transformation

BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations

### Examining if High-Team Payroll Leads to High-Team Performance in Baseball: A Statistical Study. Nicholas Lambrianou 13'

Examining if High-Team Payroll Leads to High-Team Performance in Baseball: A Statistical Study Nicholas Lambrianou 13' B.S. In Mathematics with Minors in English and Economics Dr. Nickolas Kintos Thesis

### EXPECTANCY THEORY AND MAJOR LEAGUE BASEBALL PLAYER COMPENSATION EDWARD J. LEONARD

EXPECTANCY THEORY AND MAJOR LEAGUE BASEBALL PLAYER COMPENSATION by EDWARD J. LEONARD A thesis submitted is partial fulfillment of the requirements for the Honors in the Major Program in Management in the

### Indian School of Business Forecasting Sales for Dairy Products

Indian School of Business Forecasting Sales for Dairy Products Contents EXECUTIVE SUMMARY... 3 Data Analysis... 3 Forecast Horizon:... 4 Forecasting Models:... 4 Fresh milk - AmulTaaza (500 ml)... 4 Dahi/

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### Addressing Alternative. Multiple Regression. 17.871 Spring 2012

Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate

### MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal

MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims

### c 2015, Jeffrey S. Simonoff 1

Modeling Lowe s sales Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental