The Numbers Behind the MLB
Anonymous Students: AD, CD, BM (TF: Kevin Rader)


Abstract
This project measures the effects of various baseball statistics on the win percentage of all 30 teams in MLB. Data was collected from espn.com for the 2010 regular season. First, simple linear regression models were run in Stata for each of the variables. Next, a multiple regression model was used as the basis for our model. The final model, obtained by backward stepwise regression with significance set at a p-value of .05 or less, indicates that the significant determining variables are batting average, strikeouts, quality starts, and errors. The signs of the coefficients were not surprising, since each was positive or negative as expected; however, which variables proved significant, and the fact that payroll, home runs, and league did not, was relatively surprising. Lastly, the researchers discuss the implications of these results and possible strategies that MLB teams could employ to increase their regular-season win percentages in the future.

Introduction
The motivation for this team of researchers was their personal interest in sports. Baseball was the sport of choice because of its heavy reliance on statistics. When one thinks of baseball, it's all about the numbers: runs, strikeouts, payroll, etc., so our group decided to boil down the numbers and find out which ones really matter. After all, the final number anyone really cares about is win percentage; that's the number that gets a team to the playoffs. For this reason, our team selected a variety of variables and ran a multiple regression of win percentage on them to determine which were significant predictors. The MLB consists of 30 teams, and data was collected from espn.com for each team for the 2010 regular season.
The independent variables considered were league (National or American), strikeouts, quality starts, home runs, payroll, errors, and batting average. The dependent variable is win percentage, expressed as a percentage on a 0 to 100 scale. League was entered as a dummy variable coded 1 for the National League and 0 for the American League, which allowed us to test whether win percentage differed between teams in the two leagues.

Strikeouts and quality starts fall under the category of pitching statistics. The researchers were curious whether pitching was correlated with overall wins, and these variables seemed best suited for the regression. ERA was not chosen as the pitching statistic because there is an obvious correlation between ERA and win percentage; rather, the group was interested in the effects of statistics less directly related to win percentage. Because these statistics reflect good pitching, the researchers expected a positive correlation between them and win percentage.

The offensive statistics included home runs and team batting average. As with ERA, runs scored were not incorporated into the model because runs are too directly correlated with wins. Batting average was rescaled to a more output-friendly percentage between 0 and 100; that is, the traditional batting average was multiplied by 100. The team thought that home runs would bring an interesting perspective to the model, given the range between teams known for big hitters and teams that score by other methods.
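The variable coding described above can be sketched in a few lines of Python. The helper `code_team` and its inputs (`wins`, `losses`, `league`, `batting_avg`) are hypothetical, since the report does not show how its spreadsheet was built; the output keys reuse the Stata variable names that appear in the appendix (`winpct`, `nl1al0`, `bafinal`), and the input numbers are illustrative rather than a real 2010 team.

```python
def code_team(wins: int, losses: int, league: str, batting_avg: float) -> dict:
    """Return the rescaled regression variables for one team (hypothetical helper)."""
    if league not in ("NL", "AL"):
        raise ValueError("league must be 'NL' or 'AL'")
    if wins + losses == 0:
        raise ValueError("team must have played at least one game")
    return {
        # Win percentage on a 0-100 scale, e.g. 81-81 -> 50.0
        "winpct": 100.0 * wins / (wins + losses),
        # Dummy variable: 1 = National League, 0 = American League
        "nl1al0": 1 if league == "NL" else 0,
        # Traditional batting average (e.g. .255) multiplied by 100 -> about 25.5
        "bafinal": 100.0 * batting_avg,
    }


# Illustrative input only; not a real 2010 team line
row = code_team(wins=81, losses=81, league="NL", batting_avg=0.255)
print(row)
```

On this scale, a coefficient on `bafinal` reads as the change in win percentage points per one-point change in batting average (for example, .255 to .265).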
Since offense is crucial for a successful team, one would hope to see positive coefficients on the two offensive variables, home runs and batting average. Alongside the pitching statistics, errors serves as the defensive statistic for each team. Baseball is commonly thought of as all about hitting and pitching, so it will be interesting to see whether a defensive statistic such as errors plays a role in determining win percentage. Since the goal of baseball is not only to maximize runs scored but also to minimize runs scored by the other team, the team expected a negative correlation between errors and win percentage.

Lastly, the researchers wanted to look at payroll and its effect on win percentage. Individual player salaries get a great deal of media attention; after looking at the data, it is striking that one player on Team X (Alex Rodriguez) makes almost as much money as an entire Team Y (Pittsburgh Pirates). There are many outliers in this category, however, so the group did not expect a very strong correlation.

Methods
The researchers collected data from various pages of espn.com and organized it in an Excel spreadsheet, taking care to line up each statistic with the correct team. The data consists of 30 observations for each variable, since there are only 30 teams in the MLB. From there, the team exported the data into Stata. First, the group ran a simple linear regression of win percentage on each of the x variables, which gave a sense of the relationship between each variable and win percentage on its own. The team then ran a multiple linear regression to study the effect of each variable while the remaining variables were also taken into account. A p-value of .05 or less was used to determine the significance of any given variable.
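The first step of the Methods, a simple regression of win percentage on one x variable, can be sketched in plain Python. This is an illustrative sketch, not Stata's `regress`; the function name `simple_ols` and the toy data are hypothetical. It also shows why the appendix reports an F(1, 28) statistic for each simple regression: with one regressor and n = 30 teams, the F statistic equals the square of the slope's t statistic, with 1 and n - 2 = 28 degrees of freedom. Converting t into a p-value requires the t distribution, which this standard-library-only sketch does not compute.

```python
import math


def simple_ols(x: list[float], y: list[float]) -> dict:
    """Least-squares fit of y = b0 + b1*x, with the slope's t statistic and F = t^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                          # slope
    b0 = my - b1 * mx                       # intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)         # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)   # total sum of squares
    s2 = sse / (n - 2)                      # residual variance (Root MSE squared)
    se_b1 = math.sqrt(s2 / sxx)             # standard error of the slope
    t = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return {"b0": b0, "b1": b1, "r2": 1 - sse / sst, "t": t,
            "F": t * t, "df": (1, n - 2), "root_mse": math.sqrt(s2)}


fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # toy numbers, not MLB data
print(fit)
```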
Next, a backward stepwise regression was performed to eliminate variables that were not significant, which produced the final model for the project. This final model identified four variables, in addition to the constant, as significant predictors of win percentage. While it is questionable to include more than three variables given the limited number of observations, the team feels that the model is well suited to the goals of this project. With the final model determined, it was necessary to verify that the regression assumptions held, so the team ran several tests in Stata: hettest (Breusch-Pagan / Cook-Weisberg) for heteroskedasticity, ovtest (Ramsey RESET) for nonlinearity, the Shapiro-Wilk test for normality of the residuals, and a correlation matrix to check for collinearity.

Results
The results of the project were interesting. After running the stepwise multiple regression, the data indicate that the only significant variables in determining win percentage are batting average, errors, strikeouts, and quality starts. With the lowest p-value, 0.001, batting average was the most significant variable. Its coefficient was 2.602, indicating that, with all other variables held constant, a 1-percentage-point increase in batting average (for example, from .250 to .260) is associated with a 2.602-percentage-point increase in win percentage. This strong relationship makes sense, since wins are directly related to runs scored, and runs are directly related to hits and batting average. The coefficients and p-values for the intercept and all significant variables are shown below.
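Stata's `sw, pr(.05): regress ...` performs backward selection: it starts from the full model and repeatedly drops the regressor with the largest p-value until every remaining p-value is below .05. Below is a minimal Python sketch of that loop, under stated assumptions: it uses NumPy for least squares and a hand-written evaluation of the regularized incomplete beta function (the standard continued-fraction method) to get two-sided t-test p-values. It is not Stata's code, the function names are hypothetical, and the usage data are synthetic, not the 2010 MLB data.

```python
import math

import numpy as np


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1 -> converged
            break
    return h


def t_pvalue(t, df):
    """Two-sided p-value P(|T| > |t|) for Student's t with df degrees of freedom."""
    x = df / (df + t * t)          # P(|T| > |t|) = I_x(df/2, 1/2)
    a, b = df / 2.0, 0.5
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ols_pvalues(X, y):
    """OLS with an intercept; returns coefficients and their two-sided p-values."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    df = n - k - 1
    s2 = resid @ resid / df                           # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return beta, [t_pvalue(float(bj / sj), df) for bj, sj in zip(beta, se)]


def backward_stepwise(X, y, names, p_remove=0.05):
    """Drop the least significant regressor until every remaining p-value < p_remove."""
    keep, removed = list(range(X.shape[1])), []
    while keep:
        _, p = ols_pvalues(X[:, keep], y)
        slope_p = p[1:]                               # skip the intercept
        worst = max(range(len(slope_p)), key=slope_p.__getitem__)
        if slope_p[worst] < p_remove:
            break
        removed.append((names[keep[worst]], slope_p[worst]))
        del keep[worst]
    return [names[i] for i in keep], removed


# Synthetic demo (NOT the 2010 MLB data): two real effects plus two noise columns
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 50 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=30)
kept, dropped = backward_stepwise(X, y, ["x1", "x2", "noise1", "noise2"])
print("kept:", kept)
print("removed (name, p):", dropped)
```

As in Stata's output, each removal is recorded with the p-value that triggered it; a noise column can occasionally survive by chance at the .05 level, which is one reason stepwise results on 30 observations deserve caution.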
Variable               Coefficient   p-value
intercept
batting average (%)    2.602         0.001
errors
strikeouts
quality starts

The final stepwise regression model is shown below.

. sw, pr(.05): regress winpct nl1al0 k qs hr payrollmillions errors bafinal
  begin with full model
  p >= 0.0500  removing payrollmillions
  p >= 0.0500  removing hr
  p >= 0.0500  removing nl1al
  F(4, 25); coefficients reported for bafinal, k, qs, errors, _cons

As shown in the Stata output, the R² and adjusted R² for this model indicate that it is a relatively strong predictor of win percentage. Additionally, the adjusted R² is greater for the stepwise final model than for the multiple linear regression with all variables included, suggesting that the final model is the better of the two: the dropped variables added too little explanatory power to justify their extra parameters.

The simple linear regressions for each x variable individually showed nothing too surprising. Although batting average appears to be the most significant variable in the final model, the simple linear regression results indicate that errors is the most significant variable on its own. For this reason, the scatter plots for the simple linear regressions of both variables are shown below.
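The comparison of adjusted R² values above can be made concrete. Plain R² never increases when regressors are removed, but adjusted R² penalizes each additional regressor, so dropping weak variables can raise it. The sketch below applies the standard formula with n = 30 teams; the function name is hypothetical, and the R² inputs are placeholders, not the values from the report's Stata output.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 30
full = adjusted_r2(0.70, n, 7)   # hypothetical R^2 with all 7 regressors
final = adjusted_r2(0.68, n, 4)  # hypothetical R^2 with the 4 retained regressors
print(round(full, 4), round(final, 4))  # prints 0.6045 0.6288
```

Here the smaller model has the lower raw R² but the higher adjusted R², which is the pattern the report describes.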
Conclusions and Discussion
When analyzing the results of the final multiple regression model, the coefficients appear as one would predict. Good qualities of a team (batting average, strikeouts, and quality starts) all have positive coefficients, and errors has a negative coefficient. Surprisingly, payroll was not a significant predictor of win percentage. While this may surprise an everyday baseball fan, the team was wary of this variable because of its large number of outliers.

So, what do these results mean for the future of baseball? Because batting average and errors were found to be the most significant predictors, perhaps teams should direct their payrolls toward players who will improve these team statistics: for example, position players with a high batting average who also play good defense and commit few errors. To a lesser extent, teams should also recruit pitchers who have a history of quality starts and a high strikeout record.

The dummy variable league proved not to be significant, indicating that, with all other variables held constant, win percentage does not differ between American League and National League teams. In other words, league does not have a direct effect on win percentage in this model; because no league interaction terms were included, the model does not test whether each variable's effect differs between the leagues.

Home runs were also removed from the final model. This is surprising because home runs directly produce runs, and runs are a strong predictor of win percentage. On the other hand, this may reflect the number of runs scored on hits other than home runs, as well as on walks and errors. The removal of this variable suggests that getting more hits overall is more important than hitting more home runs.

Interestingly, the simple linear regression of win percentage on payroll indicated that payroll was a significant predictor on its own.
However, in the multiple regression model, payroll was the least significant variable and was therefore the first removed. The group interprets this discrepancy as reflecting poor allocation of teams' payrolls. For example, rather than using their payroll to sign players as described above, teams may sign big-name players whose statistics are relatively poor in light of their salaries. Essentially, these players are paid more than they're worth in order to draw fan interest.
Lastly, the tests run in the final section of the project indicate that our assumptions of linearity, normality of residuals, constant variance, and absence of collinearity all held. This means that the final model requires no transformations before being used as a predictive model. Some of these issues were addressed by our initial rescaling of data values onto similar scales: for example, win percentage and batting average were both converted to percentages in the range 0 to 100, so that the output could be read more easily in terms of the effect of a 1-unit increase in x on y.

The major weakness of this project is that our data was limited to 30 observations. The sample size was limited because there are only 30 teams in the MLB, and we wanted to include data from only a single season, since strategies have changed throughout the history of baseball. Additionally, the team analyzed only the regular season, not the playoffs. For example, the Giants won the World Series, and they tied for the third-highest number of quality starts and had the highest number of strikeouts in our data. This suggests that these two variables may matter more in the playoffs than our model indicates, but because our data was limited to the regular season, such effects were not captured.

In conclusion, if a team's goal is to increase its regular-season win percentage, our model is a strong indicator of the significant variables. However, because of the structure of the MLB playoff system, the team with the highest win percentage does not necessarily win the World Series. Therefore, a separate study should be conducted to analyze the most significant variables in determining success in the postseason.

References
Stata Software. Gould, Bill.
MLB Statistics. Elias Sports Bureau.

Appendix
Simple Linear Regressions
. regress winpct nl1al               F(1, 28) = 0.07
. regress winpct k
. regress winpct qs                  F(1, 28) = 8.58
. regress winpct hr                  F(1, 28) = 6.60, Root MSE = 6.22
. regress winpct payrollmillions     F(1, 28) = 4.34
. regress winpct errors
. regress winpct bafinal             F(1, 28) = 7.50
. corr winpct errors                 (obs=30)

Multiple Linear Regression
. regress winpct nl1al0 k qs hr payrollmillions errors bafinal
  F(7, 22); coefficients reported for nl1al, k, qs, hr, payrollmillions, errors, bafinal, _cons

Stepwise Multiple Linear Regression
. sw, pr(.05): regress winpct nl1al0 k qs hr payrollmillions errors bafinal
  begin with full model
  p >= 0.0500  removing payrollmillions
  p >= 0.0500  removing hr
  p >= 0.0500  removing nl1al
  F(4, 25); coefficients reported for bafinal, k, qs, errors, _cons

Test for Heteroskedasticity
. hettest
  Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: fitted values of winpct
  chi2(1) = 0.07

Test for Non-Linearity
. ovtest
  Ramsey RESET test using powers of the fitted values of winpct
  Ho: model has no omitted variables
  F(3, 22) = 0.57

Test for Normality of Noise
. predict res
  (option xb assumed; fitted values)
. swilk res
  Shapiro-Wilk W test for normal data (Obs, W, V, z, Prob>z reported for res)

Test for Collinearity
. corr k qs errors bafinal
  (obs=30)

The final model passes all tests checking the initial assumptions of the project.
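One detail in the normality check deserves care. The log shows `predict res` followed by "(option xb assumed; fitted values)", which means `res` held fitted values rather than residuals; in Stata, residuals are requested with `predict res, residuals`. The NumPy sketch below makes that distinction explicit and also mirrors the collinearity check as a correlation matrix of the regressors. The function names are hypothetical and the data are synthetic; the residual vector is what a normality test such as Shapiro-Wilk (for example, `scipy.stats.shapiro`) should receive.

```python
import numpy as np


def fit_and_residuals(X, y):
    """OLS with an intercept; returns (fitted values, residuals)."""
    Z = np.column_stack([np.ones(len(y)), X])     # prepend the constant term
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta                              # what a plain `predict` stores (xb)
    return fitted, y - fitted                      # residuals: what a normality test needs


def correlation_matrix(X):
    """Pairwise Pearson correlations among the columns of X (one regressor per column)."""
    return np.corrcoef(X, rowvar=False)


# Synthetic demo (NOT the MLB data): 30 rows, 3 regressors
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=30)
fitted, resid = fit_and_residuals(X, y)
print("mean residual:", resid.mean())            # ~0 by construction with an intercept
print(np.round(correlation_matrix(X), 2))        # large off-diagonal values suggest collinearity
```

Pairwise correlations only reveal collinearity between two regressors at a time; a variable that is nearly a combination of several others would not necessarily show up in this matrix.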
Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
More informationSolution Let us regress percentage of games versus total payroll.
Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)
More informationMaximizing Precision of Hit Predictions in Baseball
Maximizing Precision of Hit Predictions in Baseball Jason Clavelli clavelli@stanford.edu Joel Gottsegen joeligy@stanford.edu December 13, 2013 Introduction In recent years, there has been increasing interest
More informationDoing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
More informationA Guide to Baseball Scorekeeping
A Guide to Baseball Scorekeeping Keeping score for Claremont Little League Spring 2015 Randy Swift rjswift@cpp.edu Paul Dickson opens his book, The Joy of Keeping Score: The baseball world is divided into
More informationBEAVER COUNTY FASTPITCH RULES FOR 2013 SEASON
BEAVER COUNTY FASTPITCH RULES FOR 2013 SEASON The League Website is www.eteamz.com/bcgfpl this will be updated as league schedule of events are completed, team schedules and scores are submitted. Basic
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationLab 5 Linear Regression with Withinsubject Correlation. Goals: Data: Use the pig data which is in wide format:
Lab 5 Linear Regression with Withinsubject Correlation Goals: Data: Fit linear regression models that account for withinsubject correlation using Stata. Compare weighted least square, GEE, and random
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationRegression analysis in practice with GRETL
Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationStatistical Predictors of March Madness: An Examination of the NCAA Men s Basketball Championship
Wright 1 Statistical Predictors of March Madness: An Examination of the NCAA Men s Basketball Championship Chris Wright Pomona College Economics Department April 30, 2012 Wright 2 1. Introduction 1.1 History
More informationThe Value of Major League Baseball Players An Empirical Analysis of the Baseball Labor Market
HAVERFORD COLLEGE The Value of Major League Baseball Players An Empirical Analysis of the Baseball Labor Market Chris Maurice 4/29/2010 The goal of this thesis was to test the efficiency of the baseball
More informationThe Effects of Atmospheric Conditions on Pitchers
The Effects of Atmospheric Conditions on Rodney Paul Syracuse University Matt Filippi Syracuse University Greg Ackerman Syracuse University Zack Albright Syracuse University Andrew Weinbach Coastal Carolina
More informationTeam Success and Personnel Allocation under the National Football League Salary Cap John Haugen
Team Success and Personnel Allocation under the National Football League Salary Cap 56 Introduction T especially interesting market in which to study labor economics. The salary cap rule of the NFL that
More informationTULANE BASEBALL ARBITRATION COMPETITION BRIEF FOR THE TEXAS RANGERS. Team 20
TULANE BASEBALL ARBITRATION COMPETITION BRIEF FOR THE TEXAS RANGERS Team 20 1 Table of Contents I. Introduction...3 II. Nelson Cruz s Contribution during Last Season Favors the Rangers $4.7 Million Offer...4
More informationFigure 1. Figure 2. Figure 3. Figure 4. normality such as nonparametric procedures.
Introduction to Building a Linear Regression Model Leslie A. Christensen The Goodyear Tire & Rubber Company, Akron Ohio Abstract This paper will explain the steps necessary to build a linear regression
More informationUsing JMP with a Specific
1 Using JMP with a Specific Example of Regression Ying Liu 10/21/ 2009 Objectives 2 Exploratory data analysis Simple liner regression Polynomial regression How to fit a multiple regression model How to
More informationDeterminants of Demand for Cable TV Services in the Era of Internet Communication Technologies
Pace University DigitalCommons@Pace Honors College Theses Pforzheimer Honors College 2015 Determinants of Demand for Cable TV Services in the Era of Internet Communication Technologies Michael Gorodetsky
More informationBIOL 933 Lab 6 Fall 2015. Data Transformation
BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations
More informationExamining if HighTeam Payroll Leads to HighTeam Performance in Baseball: A Statistical Study. Nicholas Lambrianou 13'
Examining if HighTeam Payroll Leads to HighTeam Performance in Baseball: A Statistical Study Nicholas Lambrianou 13' B.S. In Mathematics with Minors in English and Economics Dr. Nickolas Kintos Thesis
More informationEXPECTANCY THEORY AND MAJOR LEAGUE BASEBALL PLAYER COMPENSATION EDWARD J. LEONARD
EXPECTANCY THEORY AND MAJOR LEAGUE BASEBALL PLAYER COMPENSATION by EDWARD J. LEONARD A thesis submitted is partial fulfillment of the requirements for the Honors in the Major Program in Management in the
More informationIndian School of Business Forecasting Sales for Dairy Products
Indian School of Business Forecasting Sales for Dairy Products Contents EXECUTIVE SUMMARY... 3 Data Analysis... 3 Forecast Horizon:... 4 Forecasting Models:... 4 Fresh milk  AmulTaaza (500 ml)... 4 Dahi/
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationAddressing Alternative. Multiple Regression. 17.871 Spring 2012
Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate
More informationMGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal
MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims
More informationc 2015, Jeffrey S. Simonoff 1
Modeling Lowe s sales Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have
More informationData Management Summative MDM 4U1 Alex Bouma June 14, 2007. Sporting Cities Major League Locations
Data Management Summative MDM 4U1 Alex Bouma June 14, 2007 Sporting Cities Major League Locations Table of Contents Title Page 1 Table of Contents...2 Introduction.3 Background.3 Results 511 Future Work...11
More informationLecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More information