# Violent crime total. Problem Set 1

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the difficulty of exam questions, this one is not. Descriptive Statistics 1. Go to the Bureau of Justice Statistics. Find the data for reported crime in Pennsylvania for Put these data in a spreadsheet. These data are available in convenient formats at 2. Graph the total violent crime by year. 7 Violent crime total Violent crime total 1 Year Graph the total violent crime rate by year.

2 6 Violent Crime rate Violent Crime rate 1 Year If you were trying to give someone a sense of the evolution of crime over time, is the graph from #2 or #3 more useful? Why? It turns out that population growth in PA is more or less linear, so the total and rate series give, more or less, the same perspective (especially since inter Census years are linearly interpolated for population ). In terms of which is better, it depends on the questions of interest. If one is interested in getting some sense of how likely an individual is to be a victim of violent crime, rates are more useful. If, instead, someone was interested in budgetary issues, totals may be more useful. 5. Calculate the standard deviation of violent crime rates in PA over this period. In excel, the standard deviation macro is =STDEV.P(cell range) [for the purposes of this class, ignore the difference between a standard deviation for a population and for a sample] Calculate the standard deviation of homicide rates in PA over this period. 7. From the Bureau of Justice Statistics, pull crime totals and rates by all available categories of crime for the year 29 into a spreadsheet Create a histogram for violent crime rates. You need to load the Analysis Tool Pack. To see how to do that, search in the help.

3 3 2 1 Histogram Bin 9. Get state per capita GDP data from the Bureau of Economic Analysis for 29. Create a scatterplot of violent crime rates on the vertical axis and per capita GDP on the horizontal axis. What does the relationship appear to be? Does this relationship change if DC is eliminated from the graph? Data available at: Scatter with DC Violent Crime rate Violent Crime rate Linear (Violent Crime rate) With DC Excluded

4 Violent Crime rate Violent Crime rate Linear (Violent Crime rate)

6 n! x Px ( ) p 1 p n x! x! n x So the likelihood of picking 15 correct days out of 2 is 2! * !15! But we presumably also need to factor in the possibility that he could have done better as well (i.e., the relevant question is probably what is the likelihood that he would have chosen at least 15 correct days to trade), so we also want to add in P(16), P(17), P(18), P(19), and P(2). Each of those can be calculated the same way as above (replacing x with 16, 17, etc). P(16) =.5 P(17) =.1 P(18) =.2 P(19) =.2 P(2) =.1 So the total likelihood of picking at least 15 correct days to trade out of 2 is.21, so he had only slightly more than a 2 percent chance of this happening randomly.

7 Problem Set 3 Assume the same set up as in problem set 2. However, in this case, assume that the number of suspicious trades is 4. 4 trials are enough to use the normal distribution as an approximation for the binomial distribution. Determine the likelihood of having at least x number of trades occurring on days where the British funds appreciate in value where: x = 21 x = 22 x = 23 x = 24 The easiest way to handle this problem, because of the fact that 4 trials allows us to use the normal distribution as an approximation to the binomial distribution, is to set this up as a simple hypothesis test, where the hypothesis that the customer is trading at random (at least with respect to market timing issues). We would expect, under random trading, that the customer would choose correctly 2 times out of 4. However, there will be variation around this. We need to convert each x to standard deviation terms, so we need to calculate the standard deviation. Recall that in the binomial distribution, variance is simply n*p*(1 p) => 4*.5*.5 = 1, so the standard deviation is 1. We thus need to determine how likely one is to observe more than a 1 standard deviation departure from the mean, 2 standard deviations, 3 standard deviations, and 4 standard deviations. If you consult the table of critical values for the standard normal distribution (back cover of text, or fold out chart, or a number of other sources), you will find that the probability mass to the left of z standard deviations is: z = 1:.8413 z = 2:.9772 z = 3:.9987 z = 4:.999+ so the likelihood of observing at least 21 correctly timed trades at random is < 16%, 22 is < 3%, 23 is <.2%, and 24 is <.1%.

8 Problem Set 4 You are asked to perform a preliminary analysis of some evidence in a discrimination suit. The claim asserts that FSU Corp. discriminates against black employees. While there is no direct evidence of discrimination, you are told that a random sample of 3 black FSU employees has an average wage of \$3,, whereas the average wage for non black employees is \$37,. You ask human resources to calculate the variance of the salaries of non black FSU employees. You are told that the variance in nonblack wages is 58,65, Is this evidence generally consistent with anti black discrimination? We need to set this up as a hypothesis test. The hypothesized mean is \$37,. The test statistic is 3, 37, normalized in standard deviation terms. So we need to calculate: 3, 37, 7, 7, 7, 1.71 var 58, 65,519 22,553 4,11 n Because we are dealing with a relatively small sample (n = 3), we need to consult the t distribution which tells us that for 29 degrees of freedom (n 1), the critical values associated with the 95% confidence interval are 2.45 and Since our test statistic is inside of this range, we cannot reject the hypothesis that the black employees are paid comparably to the white employees. 2. At what type 1 error level is this evidence consistent with anti black discrimination? Consulting the t distribution table (again using the 29 degrees of freedom row), we need to find 1.71 as a critical value. The closest we can find in the table (though of course we could get more precision consulting electronic sources) is which is associated with a p value of <.1 (two tailed test). This tells us that the black mean wage is consistent with the non black mean wage at any type 1 error of less than 1 percent or so. 3. If you are the defendant, what kinds of arguments do you make on your behalf in light of this evidence? If you are the defendant, you use the analysis above to suggest that it is not possible to distinguish the black pay from non black pay; therefore, there can be no inference of discrimination. 4. If you are the plaintiff, what kinds of arguments do you make on your behalf in light of this evidence? If you are the plaintiff, you should argue that we should only be using a one tailed test since there would not be any inference of discrimination from a legal standpoint if the black mean wage were above the non black mean wage. Thus, none of the type 1 error should be apportioned to the right hand tail. Using a one tailed test, the critical value associated with a 5 % type 1 error is Our

9 test statistic is less than this critical value, so the black mean wage is not consistent with the nonblack mean wage. Both sides may then argue that different attributes should be controlled for (in a regression framework) or that the comparison mean wage should be calculated on different bases (just whites vs non blacks, etc) depending on which calculations are more favorable to their positions.

10 Problem Set 5 Use the excel file called PS5.xlsx for this problem set. The wage variable is a random variable drawn from a normal distribution; treat it as the relevant population. Sample1...sample6 are 6 independently drawn random samples from wage. 1. Create histograms for wage and each of the samples. How are they similar? How are they different? Histograms are found in the data analysis add in pack (as described in an earlier problem set). Wage Histogram Bin Sample1 Histogram More Bin Sample 2

11 Sample 3 Sample 4 Sample More Bin Histogram More Bin Histogram More Bin Histogram

12 Histogram More Bin Sample 6 Histogram More Bin Similarity: They are all centered on something close to the population mean (although not exactly) Differences: The samples are generally more narrow in their ranges than the population (min for each sample > min for pop; max for each sample < max for pop). Generally, you can t tell with great confidence that the samples are actually coming from the distribution from a visual standpoint. 2. Calculate the mean, variance, and standard deviation (using excel commands) for wage. =average(a2..a11) which yields 149,998 =var.p(a2..a11) which yields 621,549,795 =stdev.p(a2..a11) which yields 24, Calculate the means for sample 1 through sample 6. For each one, test the hypothesis that the sample mean is equivalent to the population mean at the 2% type 1 error level For each of these we need a test statistic and we need to compare that test statistic to the appropriate critical value from the t distribution (since n = 1 for the samples). The critical values for the t distribution with 99 degrees of freedom (n 1) at the 2% type 1 error are 1.29 and 1.29 (note I used degrees of freedom for 1 since that s what the book prints and the difference between dof 99 and dof 1 is small, but you could be more precise using electronic sources).

13 Now we need t statistics. The general form will be: x 149,998 t sd x 1 Sample 1: average = 147,316; sd = 23,458; t = 1.14, so we can t reject Sample 2: average = 144,68; sd = 23,676; t = 2.28, so we reject Sample 3: average = 148,11; sd = 23,952; t =.83, so we can t reject Sample 4: average = 155,729; sd = 25,29; t = 2.27, so we reject Sample 5: average = 149,192; sd = 24,81; t =.33, so we can t reject Sample 6: average = 15,281; sd = 25,72; t =.11, so we can t reject So, as you can see, while we don t reject in 4/6 times, we do incorrectly reject one third of the time (more or less close to our chosen 2% type 1 error). 4. Wagesub is a random variable; test whether it comes from the same distribution as the wage variable at the 1% type one error level. Using the same formula for the test statistic, we get: 96,26 149,998 t ,642 1 Which is well beyond the critical value of 2.63, so we can safely reject the hypothesis that wagesub comes from the same distribution. 5. Subsample1 through subsample6 are random draws from wagesub. Test whether each comes from the same distribution as wage providing the appropriate p value. Subsample Average SD t All are well below the critical value of 2.77 which is the 29 dof critical value for the 1% type 1 error, so we know that p <.1 for each of these.

14 Problem Set 6 Use the dataset PS6.xlsx for this problem set. It contains state level data for 29, including state population, violent crime total, and property crime total (from the Bureau of Justice Statistics). It also includes real per capita state GDP (rgdppc; from the Bureau of Economic Analysis), the unemployment rate (from the Bureau of Labor Statistics), and an indicator for the fraction of people indicating they are heavy drinkers, have less than a high school education, and have a college degree or higher (from the Behavioral Risk Factor Surveillance Survey done by the Centers for Disease Control). 1. When analyzing the determinants of crime, does it make more sense to examine the levels or the rate of crime and why? It depends on the question, in some sense. For example, if one were attempting to estimate a criminal enforcement budget for an entire state, looking at total crime in the state could be interesting. Generally speaking, however, for most policy purposes, we probably care about crime rates. For example, almost regardless of anything else, California will have more total crime than anywhere else solely because of its size, but that doesn t tell us much about the causes or predictors of crime in a policy relevant sense. 2. Regress the violent crime rate (crime per 1, people is the general convention) on the heavy drinker variable and a constant. Interpret your coefficient, including its statistical significance. Discuss what the p value represents. Excel output: SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 51 ANOVA df SS MS F Regression Residual Total Coefficients Standard Error t Stat P value Intercept Heavy Drinking

15 This suggests that violent crimes per 1, residents increases by about 14 additional crimes for every 1% increase in the heavy drinking rate. However, the standard errors are very large, leaving us with a t stat of.6 which is not statistically significant (i.e., it cannot be distinguished from random noise around a zero effect). We conclude from this that we are not very confident that there is a nonzero effect of heavy drinking incidence on the rate of violent crime for conventional choices of a type 1 error. The p value, in fact, tells us that we would not reject the null of zero effect unless we had a two tailed type 1 error of greater than 57 percent. 3. Intuitively, discuss what is likely to happen to the coefficient for the heavy drinking variable if you added the unemployment rate to the regression in #2. Run the regression with the unemployment rate and discuss the interpretation of each coefficient, including a discussion of how the heavy drinking coefficient changes. Presumably, both heavy drinking and unemployment are positively associated with violent crime. Also, presumably, drinking goes up during times of unemployment, so we might expect the magnitude of the heavy drinking effect to decline when we include unemployment since the heavy drinking effect in #2 was likely picking up some of the unemployment effect as well. Excel output: SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 51 ANOVA df SS MS F Regression Residual Total Coefficients Standard Error t Stat P value Intercept unemployment Heavy Drinking Our intuition was borne out wrt the heavy drinking since the magnitude of the effect declines once we control for unemployment. This suggests that the number of violent crimes per 1, population increases by almost 35 crimes when unemployment increases by 1%. This effect is statistically significant. We know this because the t stat is 2.4 which is larger than the critical value associated with a 5% type 1 error. In fact, the p value suggests that we would reject the null of no effect for any

16 type 1 error greater than 1.9% (two tailed). The heavy drinking interpretation is the same as above (with an effect of 1.5 crimes) and it is still not statistically significant. 4. Add the per capita income variable to the regression from #3. How do the coefficients change. Excel Output SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 51 ANOVA df SS MS F Regression Residual Total Coefficients Standard Error t Stat P value Intercept rgdppc E 8 unemployment Heavy Drinking The interpretations of unemployment and heavy drinking are the same as above except now the effect of a 1% increase in the unemployment rate is to increase the violent crime rate by 41 crimes per 1, population (statistically significant at any type 1 error above.3%) and the heavy drinking effect flips signs such that now we would estimate that a 1 % increase in the heavy drinking rate would be associated with a reduction of almost 47 violent crimes per 1, population (and the effect is statistically significant at any type 1 error above 2%). The income effect implies that when real per capita GDP increase by \$1, crime goes up by.1 violent crimes per 1, population. While this seems silly, the effect is statistically significant at any type 1 error bigger than.3%. Of course, there s some omitted variable bias at work here. 5. Run the kitchen sink regression (income, unemployment, drinking, both education variables). Discuss reasons why this model is superior to the model estimated in #4. Excel Output:

17 SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 51 ANOVA df SS MS F Regression Residual Total Coefficients Standard Error t Stat P value Intercept rgdppc E 6 unemployment Heavy Drinking BelowHS CollegePlus Presumably, education has some effect on crime and education levels are probably correlated with income, unemployment, and heavy drinking, leading to omitted variables problems. For example, income probably goes up with education and crime probably declines with both of these, so not controlling for education probably understates the negative effect of income on crime. However, when we compare the estimates, this does not seem to be borne out, but a similar intuition for unemployment does seem to be borne out. Until we face a degree of freedom problem (and with n=51, we probably don t face one until we have > 2 controls in the model), it s probably better to control for stuff. Unfortunately, we still have lots of stuff missing here, so we do not have any real confidence in any of these estimates.

19 Coefficients Standard Error t Stat P value Intercept E X Variable E 5 X Variable 1 is the market return. To get our predicted return, we take this implied equation *market return = predicted HSY return on July 25. The S&P 5 return on July 25 was.9, so our predicted HSY is.3+(.36*.9) =.3 The actual HSY return was.253, so the abnormal return is =.25 (or 25%). We now need to standardize this, so we need some estimate of the standard deviation of HSY returns (before July 25) which we can calculate in excel. If we use stdev.s, we get.13. Our standardized abnormal return then is.25/.13 = standard deviations away from zero. This is statistically significant at the 5% level (in fact, p <.1, so it s significant at pretty much any type 1 error level you would choose). b. The estimate it using the 1 step dummy variable regression approach. You need to create an event dummy which takes the value of for all of the non July 25 dates, and then regress HSY on the S&P 5 indicator and the event dummy you created (including the observation for July 25). The excel output is: SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 1 ANOVA df SS MS F Regression Residual Total Standard Coefficients Error t Stat P value Intercept E

21 Problem Set 7 answer correction: Note in 2a, the abnormal return should have been.253 (.3) =.256. Thus, the standardized abnormal return should be.256/.13 = All of the rest remains unchanged. Sorry for the confusion.

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### ACTM State Exam-Statistics

ACTM State Exam-Statistics For the 25 multiple-choice questions, make your answer choice and record it on the answer sheet provided. Once you have completed that section of the test, proceed to the tie-breaker

Business Valuation Review Regression Analysis in Valuation Engagements By: George B. Hawkins, ASA, CFA Introduction Business valuation is as much as art as it is science. Sage advice, however, quantitative

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### The Effects of Unemployment on Crime Rates in the U.S.

The Effects of Unemployment on Crime Rates in the U.S. Sandra Ajimotokin, Alexandra Haskins, Zach Wade April 14 th, 2015 Abstract This paper aims to analyze the relationship between unemployment and crime

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s.

Instrumental Variables Regression Instrumental Variables (IV) estimation is used when the model has endogenous s. IV can thus be used to address the following important threats to internal validity: Omitted

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Earnings Announcement and Abnormal Return of S&P 500 Companies. Luke Qiu Washington University in St. Louis Economics Department Honors Thesis

Earnings Announcement and Abnormal Return of S&P 500 Companies Luke Qiu Washington University in St. Louis Economics Department Honors Thesis March 18, 2014 Abstract In this paper, I investigate the extent

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

### Stata Walkthrough 4: Regression, Prediction, and Forecasting

Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### The Effect of Prison Populations on Crime Rates

The Effect of Prison Populations on Crime Rates Revisiting Steven Levitt s Conclusions Nathan Shekita Department of Economics Pomona College Abstract: To examine the impact of changes in prisoner populations

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Drawing a histogram using Excel

Drawing a histogram using Excel STEP 1: Examine the data to decide how many class intervals you need and what the class boundaries should be. (In an assignment you may be told what class boundaries to

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### Outline: Demand Forecasting

Outline: Demand Forecasting Given the limited background from the surveys and that Chapter 7 in the book is complex, we will cover less material. The role of forecasting in the chain Characteristics of

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### AP Statistics 2005 Scoring Guidelines

AP Statistics 2005 Scoring Guidelines The College Board: Connecting Students to College Success The College Board is a not-for-profit membership association whose mission is to connect students to college

### Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

### AP Statistics 2010 Scoring Guidelines

AP Statistics 2010 Scoring Guidelines The College Board The College Board is a not-for-profit membership association whose mission is to connect students to college success and opportunity. Founded in

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### 12: Analysis of Variance. Introduction

1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

### Problems With Using Microsoft Excel for Statistics

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001 Problems With Using Microsoft Excel for Statistics Jonathan D. Cryer (Jon-Cryer@uiowa.edu) Department of Statistics

### 1. How different is the t distribution from the normal?

Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

### Implied Volatility Skews in the Foreign Exchange Market. Empirical Evidence from JPY and GBP: 1997-2002

Implied Volatility Skews in the Foreign Exchange Market Empirical Evidence from JPY and GBP: 1997-2002 The Leonard N. Stern School of Business Glucksman Institute for Research in Securities Markets Faculty

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### The Volatility Index Stefan Iacono University System of Maryland Foundation

1 The Volatility Index Stefan Iacono University System of Maryland Foundation 28 May, 2014 Mr. Joe Rinaldi 2 The Volatility Index Introduction The CBOE s VIX, often called the market fear gauge, measures

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Spreadsheet software for linear regression analysis

Spreadsheet software for linear regression analysis Robert Nau Fuqua School of Business, Duke University Copies of these slides together with individual Excel files that demonstrate each program are available

### Chapter 6: t test for dependent samples

Chapter 6: t test for dependent samples ****This chapter corresponds to chapter 11 of your book ( t(ea) for Two (Again) ). What it is: The t test for dependent samples is used to determine whether the

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Response to Critiques of Mortgage Discrimination and FHA Loan Performance

A Response to Comments Response to Critiques of Mortgage Discrimination and FHA Loan Performance James A. Berkovec Glenn B. Canner Stuart A. Gabriel Timothy H. Hannan Abstract This response discusses the

### Paper No 19. FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80

Paper No 19 FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80 Question No: 1 ( Marks: 1 ) - Please choose one Scatterplots are used

### Characteristics of Binomial Distributions

Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

### FINANCIAL ECONOMICS OPTION PRICING

OPTION PRICING Options are contingency contracts that specify payoffs if stock prices reach specified levels. A call option is the right to buy a stock at a specified price, X, called the strike price.

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### Market Efficiency and Behavioral Finance. Chapter 12

Market Efficiency and Behavioral Finance Chapter 12 Market Efficiency if stock prices reflect firm performance, should we be able to predict them? if prices were to be predictable, that would create the