Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Name:
Section:

I pledge my honor that I have not violated the Honor Code.
Signature:

This exam has 34 pages. You have 3 hours to complete this exam. When time is called, please stop writing immediately. There are 9 questions. Unless otherwise indicated, each part of each question is worth 2 points. You may use a calculator and two letter-size (both sides) cheat sheets of your own notes. Present your answers in a clear and concise manner.

Question 1: 13 parts, 26 points
Question 2: 5 parts, 10 points
Question 3: 6 parts, 12 points
Question 4: 3 parts, 6 points
Question 5: 7 parts, 15 points
Question 6: 7 parts, 15 points
Question 7: 3 parts, 6 points
Question 8: 9 parts, 18 points
Question 9: 6 parts, 12 points
Total: 120 points

Question # 1.

The data in this question are returns on three portfolios called Market, SMB, and HML. These portfolios were made famous by Eugene Fama and Kenneth French and have been widely used by finance practitioners for over 30 years. For this question I collected monthly annualized returns on the three portfolios from January 1983 through February 2008, for a total of n = 302 observations. All returns are in percent.

[Figure: time series plot of SMB]

Use the time series plot above to answer the following questions.

(a.) The sample mean of the SMB returns is
Answer: (ii)
(i.) … (ii.) … (iii.) 3.07 (iv.) …

(b.) The sample standard deviation of SMB returns is approximately
Answer: (ii)
(i.) 1.56 (ii.) 3.27 (iii.) 5.38 (iv.) 9.12

NOTE: You can see this by noting that about 5% of the returns lie outside -6.5 to 6.5, which would be roughly 2 standard deviations above and below the mean. Remember to use the empirical rule as a rough approximation.

(c.) Suppose I conduct a hypothesis test for the null hypothesis that the above data are i.i.d. The p-value associated with this test is approximately
Answer: (iii)
(i.) … (ii.) … (iii.) 0.98 (iv.) 1.14

NOTE: From the plot, the data look approximately i.i.d. Therefore we would expect a large p-value from the test. A large p-value would provide no evidence against the null hypothesis that they are i.i.d.

Below are histograms of the three variables. The horizontal and vertical axes are the same in all three plots.

[Figure: histograms of SMB, Market, and HML]

Answer the following questions using the histograms on the previous page.

(d.) Which of the three portfolio returns has the largest sample variance?
Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market

(e.) Which of the three portfolio returns is most left-skewed?
Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market (iv.) none of them

(f.) I conducted a test of normality on each variable. The null hypothesis is that the data are i.i.d. normal. Which produces the smallest p-value?
Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market (iv.) all three are the same

NOTE: A small p-value provides evidence against the null hypothesis. Of the three histograms, the Market histogram looks the least bell-shaped, i.e. the least normal.

(g.) The sample variance of the HML portfolio returns is approximately
Answer: (iii)
(i.) 3.12 (ii.) 6.57 (iii.) 9.52 (iv.) …

NOTE: Remember to use the empirical rule as a rough approximation. We can see that 95% of the HML returns are between -5.5 and 6.5. This means that the standard deviation must be about 3. We can see that $9.52 \approx 3.085^2$.

Below is a scatter plot of SMB versus HML returns.

[Figure: scatter plot of SMB versus HML returns, with the February 2000 observation labeled]

Answer the following questions using the scatter plot on the previous page.

(h.) The sample correlation between the SMB and HML returns is
Answer: (ii)
(i.) … (ii.) … (iii.) 0.42 (iv.) 0.85

(i.) If we deleted the February 2000 observation (indicated on the plot) from the sample, the sample correlation would be
Answer: (ii)
(i.) closer to -1 (ii.) closer to 0 (iii.) closer to 1 (iv.) unchanged

(j.) What is the sample covariance between SMB and HML?

$s_{SMB,HML} = s_{SMB}\, s_{HML}\, r = (3.27)(3.085)(-0.42) = -4.24$

Suppose that we estimate the following linear regression model:

$SMB_i = \alpha + \beta\, HML_i + \varepsilon_i$

(k.) The R-squared of this regression will be approximately
Answer: (i)
(i.) 0.17 (ii.) 0.42 (iii.) 0.73 (iv.) 0.85

NOTE: Remember that for simple linear regression, the R-squared is equal to the correlation squared!

(l.) What is our estimate of the slope, b?

NOTE: You have to know the formula for the slope coefficient from class.

$b = s_{SMB,HML} / s^2_{HML} = -4.24 / 9.52 = -0.445$

(m.) Suppose we know that the HML return next month will be 2%, and we use the regression above to construct a 95% plug-in predictive interval for next month's SMB return. Such an interval implicitly assumes that
Answer: (ii)
(i.) HML is i.i.d. normal (ii.) ε is i.i.d. normal (iii.) both (i) and (ii) (iv.) none of the above

NOTE: To compute the predictive interval for y, we don't need to make assumptions about the distribution of the right-hand side variable x, but we do need to make an assumption about the distribution of the errors ε.
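The arithmetic in parts (j) through (l) can be checked with a few lines of Python. This is only a sketch: the two standard deviations are the ones quoted in the solutions, and the correlation is taken to be -0.42, where the negative sign is an assumption consistent with the answer to part (i).

```python
# Summary statistics quoted in the exam solutions.
s_smb = 3.27      # sample standard deviation of SMB
s_hml = 3.085     # sample standard deviation of HML
r = -0.42         # sample correlation (negative sign is an assumption)

cov = r * s_smb * s_hml      # sample covariance, part (j)
r_squared = r ** 2           # R-squared of the simple regression, part (k)
slope = cov / s_hml ** 2     # least-squares slope b, part (l)

print(cov, r_squared, slope)
```

Note that the slope can equivalently be written as r times the ratio of the two standard deviations, which is why its sign always matches the sign of the correlation.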

Question # 2. Multiple choice. For each question, choose one answer.

(a.) Fill in the blanks in the following phrase, in order:
Answer: (iv)

Statistical methods draw conclusions about unknown ____ based on ____ computed from ____.

(i.) parameters, samples, a statistic.
(ii.) samples, statistics, a parameter.
(iii.) statistics, parameters, a sample.
(iv.) parameters, statistics, a sample.
(v.) samples, parameters, a statistic.

(b.) Suppose that most of the observations in a given data set are of the same magnitude, except for a few data points that are substantially larger. Which of the following would be true?
Answer: (ii)

(i.) The sample mean would be smaller than the median, and the histogram would be skewed with a long right tail.
(ii.) The sample mean would be larger than the median, and the histogram would be skewed with a long right tail.
(iii.) The sample mean would be smaller than the median, and the histogram would be skewed with a long left tail.
(iv.) The sample mean would be larger than the median, and the histogram would be skewed with a long left tail.
(v.) The sample mean and median would be approximately the same, and the histogram would be roughly symmetric.

NOTE: We saw an example of this in the bank arrival time data. Large outliers affect the mean more than the median. The histogram had a pronounced right tail.

(c.) An achievement test is given each year to 3rd graders in a certain school district. Scores on the test are normally distributed with a mean of 100 points and a standard deviation of 15 points. If Jane's z-score was 1.2, how many points did she score on the test?
Answer: (v)
(i.) 82 (ii.) 88 (iii.) 100 (iv.) 112 (v.) 118

(d.) The Central Limit Theorem implies that:
Answer: (v)

(i.) If we simulate 5,000 i.i.d. draws from any probability distribution, the histogram will appear bell-shaped.
(ii.) If we simulate 5,000 i.i.d. draws from any probability distribution, the time series plot should not display any obvious patterns.
(iii.) The average of 5,000 i.i.d. draws from any probability distribution should exactly equal the population mean.
(iv.) If our sample consists of 5,000 i.i.d. draws from any probability distribution, the data points will be approximately normally distributed around the sample mean.
(v.) If we start with 5,000 i.i.d. draws from any probability distribution and let $\bar{x}_1$ be the average of the first 50 data points, $\bar{x}_2$ be the average of the 51st through 100th data points, $\bar{x}_3$ be the average of the 101st through 150th data points, etc., then a histogram of the numbers $\bar{x}_1$ through $\bar{x}_{100}$ should appear bell-shaped.
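Option (v) of part (d) describes an experiment that is easy to simulate. The sketch below is illustrative only: the choice of an exponential distribution with mean 1, the seed, and the variable names are all assumptions, not part of the exam.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# 5,000 i.i.d. draws from a strongly right-skewed distribution
# (exponential with mean 1; this choice is an assumption for illustration).
draws = [random.expovariate(1.0) for _ in range(5000)]

# Averages of consecutive, non-overlapping blocks of 50 draws: 100 averages.
block = 50
averages = [statistics.mean(draws[i:i + block])
            for i in range(0, len(draws), block)]

print(len(averages))                # number of block averages
print(statistics.mean(averages))    # should be near the population mean of 1
print(statistics.stdev(averages))   # should be near 1 / sqrt(50), about 0.14
```

A histogram of `draws` would be heavily skewed, while a histogram of `averages` should look roughly bell-shaped, which is the distinction between options (i) and (v).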

(e.) You obtain a sample of 25 students from the same high school. Based on this data, a 95% confidence interval for the expected value of a student's SAT score is 900 to …. Which of the following is a valid interpretation of this interval?
Answer: (iv)

(i.) 95% of the 25 students in the sample have an SAT score between 900 and ….
(ii.) 95% of the population of students at this high school will have an SAT score between 900 and ….
(iii.) Given the outcomes in this sample, there is a 95% probability that the true expected value of SAT scores is between 900 and ….
(iv.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the resulting intervals would contain the true expected value of a student's SAT score.
(v.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the sample means would be between 900 and ….

NOTE: Before we see the data, the 95% confidence interval has a 95% probability of covering the true (population) value of the parameter. For a particular sample (after we have seen the data), the true value is either in the interval or it is not. Answer (iii) may seem correct, but it is technically wrong.

Question # 3.

The following joint probability distribution is based on survey data collected by a major financial publication in …. For a randomly selected person living in the U.S., define the random variable S as the percentage of retirement income invested in the stock market. Define the random variable A as

A = 1 if the person is below 30 years of age
A = 2 if the person is between 30 and 50 years old
A = 3 if the person is above 50 years old

Based on the survey, we have come up with the following joint probability distribution for S and A:

          S = 10%   S = 30%   S = 60%
A = 1     …         …         …
A = 2     …         …         …
A = 3     …         …         …

(a.) What is the marginal probability that A = 3?

P(A = 3) = P(A = 3, S = 0.1) + P(A = 3, S = 0.3) + P(A = 3, S = 0.6) = … = 0.43

(b.) What is the expected value of S?

First, we need to know the marginal distribution of S. This is

s      p(s)
10%    0.19
30%    0.54
60%    0.27

$E[S] = 0.19(10) + 0.54(30) + 0.27(60) = 34.3\%$

(c.) What is the standard deviation of S?

$V[S] = 0.19(10 - 34.3)^2 + 0.54(30 - 34.3)^2 + 0.27(60 - 34.3)^2 = 300.51$

This implies that the standard deviation is $SD(S) = \sqrt{300.51} = 17.3\%$.

(d.) What is the probability that a randomly selected investor is below 50 years of age and has 30% or more of his retirement savings invested in the stock market?

… = 0.48

(e.) Suppose we know a particular investor has only 10% of her retirement savings invested in the stock market. What is the probability she is over 50 years old?

$P(A = 3 \mid S = 10\%) = P(A = 3, S = 10\%) / P(S = 10\%) = 0.10 / 0.19 = 0.526$
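The mean and standard deviation in parts (b) and (c) follow directly from the marginal distribution of S. In the sketch below, only the probability 0.19 survives in the transcription; the values 0.54 and 0.27 are inferred assumptions, chosen because they are the only pair that sums with 0.19 to one and reproduces the stated mean of 34.3%.

```python
from math import sqrt

# Marginal distribution of S (percent invested in stocks).
# 0.19 is from the text; 0.54 and 0.27 are inferred assumptions.
p_s = {10: 0.19, 30: 0.54, 60: 0.27}

mean_s = sum(s * p for s, p in p_s.items())                  # E[S]
var_s = sum(p * (s - mean_s) ** 2 for s, p in p_s.items())   # V[S]
sd_s = sqrt(var_s)                                           # SD(S)

print(mean_s, var_s, sd_s)
```

That the inferred probabilities also reproduce the stated standard deviation of 17.3% is a useful cross-check on the reconstruction.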

(f.) Are A and S independent? Briefly justify your answer.

From part (e), $P(A = 3 \mid S = 10\%) = 0.526$, while from part (a) the marginal probability of being over 50 is $P(A = 3) = 0.43$. If A and S were independent, these two probabilities would be equal. Therefore, the random variables A and S are not independent.

Question # 4.

Suppose starting next Monday that I go to a casino every night for a week and play 125 hands of blackjack. Suppose that I bet $10 per hand, so on each hand I will either lose $10, push, win $10, or double down and win $20 (assume that house rules prohibit me from doubling down or splitting more than once). Suppose that my winnings on each hand, i = 1, 2, ..., 125, are a random variable $W_i$ with $E[W_i] = 0.10$ and $\sigma(W_i) = 12.30$ (in dollars).

(a.) Suppose I look at my average winnings per hand on a given night, $\bar{W} = (W_1 + W_2 + \dots + W_{125}) / 125$, where each hand $W_i$ is i.i.d. What are the expected value and variance of $\bar{W}$?

We can use our linear formulas from Lecture #4 to show that:

$E[\bar{W}] = 0.10$
$V[\bar{W}] = (12.3)^2 / 125 = 1.21$

(b.) Now suppose I do this every night for a month (30 days). Assuming I play 125 hands each night, on approximately how many nights will I average a $1 or more loss per hand?

By the Central Limit Theorem, we know that approximately $\bar{W} \sim N(0.10, 1.21)$. On a normal distribution with µ = 0.1 and σ = 1.1, $P(\bar{W} < -1) = 0.16$ because -1 is 1 standard deviation to the left of the mean. Consequently, I average a $1 loss or more on about 0.16 × 30 = 4.8 nights, i.e. roughly 5 nights.
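The solution to part (b) uses the empirical rule, which gives 0.16 for the area more than one standard deviation below the mean. As a rough check, the same probability can be computed from the normal CDF. The helper `normal_cdf` below is a hypothetical name introduced for this sketch; it is built from the standard library's `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu_hand, sd_hand, n_hands = 0.10, 12.30, 125

var_mean = sd_hand ** 2 / n_hands   # variance of the nightly average
sd_mean = sqrt(var_mean)            # about 1.1

# Probability of averaging a loss of $1 or more per hand on a given night.
p_loss = normal_cdf(-1.0, mu_hand, sd_mean)
expected_nights = 30 * p_loss

print(var_mean, p_loss, expected_nights)
```

The exact normal probability is slightly below 0.16, so the expected number of nights is a little under the 4.8 quoted in the solution; both round to about 5.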

(c.) Your answer to part (b) implicitly makes use of the Central Limit Theorem. Which of the following correctly justifies your use of the CLT?
Answer: (ii)

(i.) Outcomes for each hand are very nearly normally distributed.
(ii.) Outcomes for each hand are i.i.d. and I am playing a large number of hands per night.
(iii.) Outcomes for each hand are i.i.d. and I am playing for a sufficiently large sample of days.
(iv.) Outcomes for each night are i.i.d. and I am averaging over a large sample of nightly outcomes.

NOTE: We are taking an average over the hands played each night, which are assumed to be i.i.d.

Question # 5.

First People's Bank (FPB) has most of their commercial loan department working with small business clients. The bank's managers consider this their most important growth area and several years ago hired a consulting team to improve two aspects of their loan process. In particular, they want to decrease the default rate on the loans (that is, the proportion of loans for which the borrower is unable to make payments). They also want to improve customer service by decreasing the time it takes to process loan applications.

Historically (prior to the consultants being hired), management has found that the number of business days required to process a small business loan application is i.i.d. normal with a mean of 14 and a variance of 4.

(a.) Before the consultants were hired, approximately what percentage of loan applications were processed in 10 days or less?

Use the normal distribution with µ = 14 and σ = 2. P(X < 10) is approximately 0.025 because 10 is 2 standard deviations to the left of the mean.

(b.) The consulting team identified and implemented a number of measures to speed up the application process. Management has reviewed a sample of 25 loan applications processed after these measures were implemented. The average processing time in the sample was 11.2 days and the sample standard deviation of processing times is 2.0 days. Were the consultants' measures effective? Formulate an appropriate hypothesis test or confidence interval and state your conclusions.

The null hypothesis is $H_0: \mu = 14$. This is like saying that the processing time is the same as it was before the consultants were hired.

$z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n}) = (11.2 - 14) / (2 / \sqrt{25}) = -7$

We reject the null hypothesis and conclude that the measures were effective.

(c.) If we treat the estimates in part (b) (mean of 11.2 and standard deviation of 2.0) as if they were the actual mean and standard deviation, approximately what percentage of loans will be processed in 10 days or less?
Answer: (iii)
(i.) 5.3% (ii.) 15.9% (iii.) 27.4% (iv.) 51.1%

NOTE: 10 is less than 1 standard deviation below the mean, so P(X < 10) is bigger than 16% and less than 50%.

Historically (prior to the consultants being hired), 15% of FPB's small business loans resulted in default. The consulting team trained FPB's analysts to use software designed to reduce the default rate by more effectively identifying high-risk businesses that are more likely to default.

(d.) In a typical year, FPB grants 120 loans to small businesses. Assume that defaults are i.i.d. events; that is, if two firms are granted loans, whether the first firm defaults is independent of whether the second firm defaults. Let Y be the number of loans granted in a typical year that will eventually end up in default. What is the distribution of Y?

Binomial(120, 0.15). We are looking at n = 120 i.i.d. Bernoulli outcomes where each one has p = 0.15.
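The processing-time calculations in parts (a) through (c) can be reproduced as follows. This is a sketch under the stated normal model; `normal_cdf` is a hypothetical helper name defined here from `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Part (a): before the consultants, processing time ~ N(14, 2^2).
p_before = normal_cdf(10, mu=14, sigma=2)

# Part (b): z-statistic for H0: mu = 14 using the sample of 25 applications.
x_bar, mu0, s, n = 11.2, 14.0, 2.0, 25
z = (x_bar - mu0) / (s / sqrt(n))

# Part (c): treat the sample estimates as the true mean and standard deviation.
p_after = normal_cdf(10, mu=11.2, sigma=2)

print(p_before, z, p_after)
```

The exact normal probability for part (a) is a little below the empirical-rule value of 2.5%, and the value for part (c) matches answer (iii).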

(e.) Give an interval that is 95% likely to contain the number of loans granted in a typical year which will eventually end up in default.

$E[Y] = np = 120(0.15) = 18$
$V[Y] = np(1 - p) = 120(0.15)(0.85) = 15.3$

A 95% interval is $18 \pm 2\sqrt{15.3}$, which is approximately (10, 26).

(f.) Looking at a sample of 100 loans granted after FPB's analysts started using the new software, management finds that 7 of those loans ended up in default. Was the new software effective in reducing defaults? Formulate an appropriate hypothesis test and state your conclusions.

An appropriate null hypothesis is $H_0: p = 0.15$.

$z = (\hat{p} - p_0) / \sqrt{p_0(1 - p_0)/n} = (0.07 - 0.15) / \sqrt{(0.15)(0.85)/100} = -2.24$

We would reject the null hypothesis at the 5% level and conclude that the software is effective.
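Both calculations on this page use only the binomial mean and variance formulas. A minimal sketch, with variable names chosen here for illustration:

```python
from math import sqrt

# Part (e): Y ~ Binomial(120, 0.15); mean plus or minus two standard deviations.
n_loans, p0 = 120, 0.15
mean_y = n_loans * p0
var_y = n_loans * p0 * (1 - p0)
half_width = 2 * sqrt(var_y)
interval = (mean_y - half_width, mean_y + half_width)

# Part (f): z-test for H0: p = 0.15 using 7 defaults in 100 loans.
n_sample, defaults = 100, 7
p_hat = defaults / n_sample
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_sample)

print(interval, z)
```

Note that the standard error in the test uses the null value p0 rather than the sample proportion, matching the formula in the solution.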

In reality, defaults on small business loans are probably not independent. One reason for this is that a broad economic downturn can cause lots of small businesses to default in a relatively short time period. Because of this, defaults may be positively correlated across firms. That is, if we look at a sample of n loans given in the same year, and let $X_i = 1$ if loan i ends up in default and 0 otherwise for i = 1, 2, 3, ..., n, we now assume that:

$cov(X_i, X_j) > 0$ for any loans $i \neq j$

(g.) [3 points] Let n = 120 and again define $Y = X_1 + X_2 + \dots + X_{120}$ as the number of loans given in a particular year that end up in default. Suppose we still believe that any single loan has a 15% chance of ending up in default. This is the same random variable we considered in parts (d)-(e), except there we assumed that the individual loan defaults were i.i.d. and now we are assuming $cov(X_i, X_j) > 0$. How does this affect the expected value of Y? How does it affect the variance of Y? Briefly explain.

The expected value of a sum of random variables is always the sum of the expected values, so E[Y] is unaffected. However, the variance will be affected. We know that in general

$V[Y] = V[X_1] + V[X_2] + \dots + V[X_{120}] + 2\,[Cov(X_1, X_2) + \dots + Cov(X_{119}, X_{120})]$.

Since the covariances are positive, this means that the variance of Y is substantially larger.

NOTE: When discussing the effect on the variance, it would be fine if you just state the case for n = 2, i.e. $V[Y] = V[X_1] + V[X_2] + 2\,Cov(X_1, X_2)$.
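To get a feel for how much the variance can grow, the sketch below evaluates the general formula under one simplifying assumption that is not in the exam: every pair of loans shares the same correlation `rho`. The function name and the value 0.05 are hypothetical.

```python
def variance_of_sum(n, p, rho):
    """Variance of a sum of n Bernoulli(p) variables that all share
    the same pairwise correlation rho (an illustrative assumption)."""
    var_single = p * (1 - p)        # V[X_i]
    cov_pair = rho * var_single     # Cov(X_i, X_j) for i != j
    # n variance terms plus 2 * (number of pairs) covariance terms.
    return n * var_single + n * (n - 1) * cov_pair

independent = variance_of_sum(120, 0.15, 0.0)
correlated = variance_of_sum(120, 0.15, 0.05)

print(independent, correlated)
```

Because there are 120 variance terms but 120 × 119 covariance terms, even a small positive correlation dominates the total, which is why the solution calls the increase substantial.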

Question # 6.

In this problem we estimate the market model using returns on an asset GE and returns on the S&P 500. The sample size is n = 254.

GE: returns on General Electric stock
Market: the market portfolio (the S&P 500)

I took monthly returns on each asset and ran the following regression:

$GE = \alpha + \beta\, Market + \varepsilon$

Some of the results from running this regression in StatPro are reported here:

ANOVA table
Source        df    SS    MS    F    p-value
Explained     …
Unexplained   …

Regression coefficients
            Coefficient    Std Err
Constant    …
SP500       …

(a.) Give a 95% confidence interval for β, the coefficient on Market.

$b \pm 2 s_b$ = … ± … = (1.1123, …)

(b.) Test the null hypothesis that the Market is not related to GE (β = 0) at the 5% level.

$t = (b - \beta_0) / s_b$ = …

We would reject the null hypothesis at the 5% level.

(c.) What is the standard deviation of the residuals $s_e$?

$s_e = \sqrt{\text{unexplained sum of squares} / (n - 2)} = \sqrt{\dots / 252}$ = …

(d.) What is the sample correlation between the Fitted Values and Residuals?

The sample correlation between the fitted values ŷ and the residuals e is zero. This is one of the major properties of the residuals and is a result of using least squares.
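The property in part (d) holds for any least-squares fit, so it can be illustrated without the GE data, whose numbers did not survive in this transcription. The data below are entirely hypothetical and serve only to show the mechanics.

```python
from math import sqrt

# Hypothetical data, not from the exam.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Sample covariance between fitted values and residuals (zero up to rounding),
# which means the sample correlation is zero as well.
fit_bar = sum(fitted) / n
res_bar = sum(resid) / n
cov_fit_resid = sum((f - fit_bar) * (e - res_bar)
                    for f, e in zip(fitted, resid)) / (n - 1)

# Residual standard deviation, dividing by n - 2 as in part (c).
s_e = sqrt(sum(e ** 2 for e in resid) / (n - 2))

print(cov_fit_resid, s_e)
```

The residuals also sum to zero whenever the regression includes an intercept, which is the other standard property of least-squares residuals.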

Suppose returns for the Market next month are given by:

Date     SP500    Fitted Values    Residuals
4/1/…    …        ?                ?

(e.) [3 points] Construct a 95% plug-in predictive interval for GE for this month.

A 95% plug-in predictive interval is:

$(a + b x - 2 s_e,\; a + b x + 2 s_e)$ = (… ± …) = (…, …)

(f.) In the table above, what is the fitted value? Can you calculate the residual?

The fitted value is: $a + b x$ = …

No. You cannot calculate the residual because you have not observed the value of GE for this month yet.

(g.) Test the null hypothesis that $H_0: \alpha$ = … at the 5% level.

$t = (a - \alpha_0) / s_a$ = …

We would clearly reject this null hypothesis at the 5% level.

Question # 7.

When coded messages are received, there are sometimes errors in transmission creating uncertainty about the message that was actually sent. In particular, Morse code uses dots and dashes as a way to encode messages. Specifically, each letter of the alphabet and each number are given a special sequence of dots and dashes. Let the random variable S = 1 if a dot is sent and S = 0 if a dash is sent. Define the random variable R = 1 if a dot is received and R = 0 if a dash is received.

Dots and dashes are known to occur in the proportion 3:4. This means that P(S = 1) = 3/7 and P(S = 0) = 4/7. Suppose there is interference on the transmission line, and with probability 1/8 a dot is mistakenly received as a dash, and vice versa.

(a.) What is the probability that a dot was received given a dot was sent, $P(R = 1 \mid S = 1)$?

There is a 1/8 chance of a mistake, which makes the probability of getting it right equal to:

$P(R = 1 \mid S = 1) = 7/8$

(b.) What is the marginal probability that a dot is received, P(R = 1)?

The marginal probability is the sum of the two joint probabilities.

$P(R = 1) = P(R = 1 \mid S = 1)P(S = 1) + P(R = 1 \mid S = 0)P(S = 0) = (7/8)(3/7) + (1/8)(4/7) = 25/56$

(c.) If we receive a dot, can we be sure that a dot was sent? Calculate the probability of a dot being sent given that a dot was received.

No, we cannot be sure. We use Bayes' Rule.

$P(S = 1 \mid R = 1) = P(R = 1 \mid S = 1)P(S = 1) / P(R = 1) = (7/8)(3/7) / (25/56) = 21/25 = 0.84$
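The Morse code calculation can be carried out in exact arithmetic with Python's `fractions` module. This is a small sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_dot_sent = Fraction(3, 7)    # P(S = 1)
p_dash_sent = Fraction(4, 7)   # P(S = 0)
p_error = Fraction(1, 8)       # probability a symbol is flipped in transmission

# Law of total probability: P(R = 1).
p_dot_received = (1 - p_error) * p_dot_sent + p_error * p_dash_sent

# Bayes' Rule: P(S = 1 | R = 1).
posterior = (1 - p_error) * p_dot_sent / p_dot_received

print(p_dot_received, posterior)
```

Working with fractions avoids rounding and makes it easy to see that the answer simplifies to 21/25.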

Question # 8.

Suppose I toss two six-sided dice. Let the random variable $X_1$ be the number shown on the first die. Let the random variable $X_2$ be the number shown on the second die. The possible outcomes for each of $X_1$ and $X_2$ are 1, 2, 3, 4, 5, or 6. Each of the six outcomes is equally likely. Assume that $X_1$ and $X_2$ are independent.

If it helps you visualize, the joint distribution of $X_1$ and $X_2$ would look like:

                  X_2
           1     2     3     4     5     6
X_1   1   1/36  1/36  1/36  1/36  1/36  1/36
      2   1/36  1/36  1/36  1/36  1/36  1/36
      3   1/36  1/36  1/36  1/36  1/36  1/36
      4   1/36  1/36  1/36  1/36  1/36  1/36
      5   1/36  1/36  1/36  1/36  1/36  1/36
      6   1/36  1/36  1/36  1/36  1/36  1/36

(a.) What is $P(X_1 > 3)$?

Using the marginal distribution of $X_1$, we get

$P(X_1 > 3) = P(X_1 = 4) + P(X_1 = 5) + P(X_1 = 6) = 1/6 + 1/6 + 1/6 = 1/2$

(b.) Given that $X_1 + X_2 = 10$, what is the probability that $X_1 = 5$ and $X_2 = 5$?

There are 3 ways for the sum to equal 10: (4,6), (5,5), (6,4), and these are all equally likely. Therefore, the probability is 1/3.

(c.) Given that $X_1 = 5$, what is the expected value of $X_1 + X_2$? (In other words, suppose the first die shows a 5. What is the expected value of the sum of the two die rolls?)

$E[X_2] = (1)(1/6) + (2)(1/6) + (3)(1/6) + (4)(1/6) + (5)(1/6) + (6)(1/6) = 3.5$

We know that $X_1 = 5$. Therefore, we get:

$E[X_1 + X_2] = E[5 + X_2] = 5 + E[X_2] = 8.5$

(d.) The popular dice game craps begins with a person (the "shooter") rolling two dice. If the sum of the two dice ($X_1 + X_2$) equals 7 or 11, the shooter is said to have rolled a "natural" and automatically wins. If the sum is 2, 3, or 12, the shooter is said to "crap out" and automatically loses. What is the probability that $X_1 + X_2$ equals 2, 3, or 12?

$P(X_1 = 1, X_2 = 1) + P(X_1 = 1, X_2 = 2) + P(X_1 = 2, X_2 = 1) + P(X_1 = 6, X_2 = 6) = 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 1/9$

Let's suppose the first time the two dice are rolled, the total is ten ($X_1 + X_2 = 10$). In this case, 10 becomes the "point." The shooter then continues rolling the two dice over and over again (both dice are always thrown at the same time). Each time the two dice are thrown, one of the following three things happens:

If the total is 10, the game ends and the shooter wins.
If the total is 7, the game ends and the shooter loses.
Otherwise, the game continues, and the shooter rolls again.

Theoretically this could continue forever!

(e.) Each time the two dice are rolled, what is the probability the game continues?

The game ends if either a 7 or 10 is rolled. We need to compute these probabilities. There are six ways to roll a 7 and three ways to roll a 10.

$P(X_1 + X_2 = 7 \text{ or } X_1 + X_2 = 10) = 6/36 + 3/36 = 9/36 = 1/4$

The probability that the game continues is $1 - P(X_1 + X_2 = 7 \text{ or } X_1 + X_2 = 10)$, or 3/4.
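Every probability in parts (a) through (e) comes from counting outcomes among the 36 equally likely pairs, so the answers can be checked by brute-force enumeration. In the sketch below, the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome pair."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_first_gt_3 = prob(lambda o: o[0] > 3)                       # part (a)
p_55_given_10 = (prob(lambda o: o == (5, 5))
                 / prob(lambda o: sum(o) == 10))              # part (b)
e_sum_given_5 = (sum(Fraction(sum(o)) for o in outcomes if o[0] == 5)
                 / sum(1 for o in outcomes if o[0] == 5))     # part (c)
p_crap_out = prob(lambda o: sum(o) in (2, 3, 12))             # part (d)
p_end = prob(lambda o: sum(o) in (7, 10))                     # part (e)
p_continue = 1 - p_end

print(p_first_gt_3, p_55_given_10, e_sum_given_5, p_crap_out, p_continue)
```

Enumeration is slower than the formulas but leaves little room for miscounting, which makes it a convenient cross-check for problems of this size.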

(f.) Suppose we know the game is going to end on the next roll. What is the probability the shooter wins?

Similar to part (b), if we know we're going to get 7 or 10, there are 9 total possibilities, 3 of which result in a win. Therefore, it is 3/9 or 1/3.

(g.) Starting with i = 2, let $U_i = 1$ if the game ends on the i-th roll and 0 otherwise. What is the probability distribution of $U_i$? (Hint: use your answer to (e).)

$U_i \sim \text{Bernoulli}(0.25)$

Assume that each $U_i$ is i.i.d. Let R be a random variable equal to the number of rolls before the game ends. R can be any positive integer (1, 2, 3, ...). We assumed the first roll was 10. If the game ends on the second roll ($U_2 = 1$), then R = 1. If the game ends on the third roll (that is, $U_2 = 0$ and $U_3 = 1$), then R = 2. If the game ends on the fourth roll (that is, $U_2 = 0$, $U_3 = 0$, and $U_4 = 1$), then R = 3, etc. As an interesting side note, it turns out that the probability distribution of the random variable R is known as the geometric distribution.

(h.) What is the probability that R = 3, i.e. the craps game ends after four rolls?

$P(R = 3) = P(U_2 = 0, U_3 = 0, U_4 = 1) = P(U_2 = 0)P(U_3 = 0)P(U_4 = 1) = (3/4)(3/4)(1/4) = 9/64$

Be careful! Actual rules for craps can differ from what we've assumed here (e.g., sometimes a 12 will end the game as well as a 7). In casinos, betting the "pass line" is equivalent to betting that the shooter wins as we defined it here. After the point is established, you can then "take odds," which here would mean betting that a 10 will be rolled before a 7. The interesting thing is the odds bet is actually a fair bet (if the point is 10, it would pay 2-to-1), i.e. there is no house advantage! Because of this, many casinos limit odds bets to 6-7 times your bet on the pass line.

(i.) What is the probability that R > 3, i.e. the craps game lasts longer than four rolls?

Here, you must recognize that this is 1 minus the probability of being less than or equal to 3.

$P(R > 3) = 1 - P(R = 1) - P(R = 2) - P(R = 3) = 1 - 1/4 - 3/16 - 9/64 = 27/64 \approx 0.42$

We could go on from here, and you'd see that, despite it being possible for craps to continue forever, there's a 90% probability the game ends by R = 8.
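The geometric probabilities in parts (h) and (i), and the closing remark about R = 8, can be verified with exact fractions. The function names below are hypothetical and introduced only for this sketch.

```python
from fractions import Fraction

p_end = Fraction(1, 4)   # probability any given roll ends the game

def p_r_equals(k):
    """P(R = k): the game continues k - 1 times, then ends."""
    return (1 - p_end) ** (k - 1) * p_end

def p_r_greater(k):
    """P(R > k): the game continues on each of the first k rolls."""
    return (1 - p_end) ** k

p_r3 = p_r_equals(3)                   # part (h)
p_gt3 = p_r_greater(3)                 # part (i)
p_within_8 = 1 - p_r_greater(8)        # closing remark

print(p_r3, p_gt3, float(p_within_8))
```

The shortcut P(R > k) = (3/4)^k gives the same answer as subtracting the individual probabilities from 1, and it scales better when k is large.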

Question # 9.

Suppose we are working with 0-1 data (i.e., a dummy variable), and as usual we have assumed that $X_i \sim \text{Bernoulli}(p)$ i.i.d. We are going to look at a sample of size n, and use the sample proportion of ones, $\hat{p}$, as an estimator of p. Recall that for dummy variables, the sample proportion is an average:

$\hat{p} = (X_1 + X_2 + \dots + X_n) / n$

(a.) Suppose that n is large enough for us to use the Central Limit Theorem. What is the sampling distribution of $\hat{p}$? (HINT: Your answer should depend on the unknown parameter p.)

The sampling distribution is $\hat{p} \sim N\left(p,\; p(1 - p)/n\right)$.

Now suppose we want to build a confidence interval for p, but we run into two issues. First, we have a sample of only n = 10 observations. Second, our actual data is

{1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

All ten of our observations in the sample are equal to one, which means that $\hat{p} = 1$!

(b.) Give a 90% confidence interval for p using your answer to part (a). (NOTE: the appropriate critical value here is 1.64, but it doesn't matter, you still get a very silly answer!)

Based on the sampling distribution from part (a), the confidence interval is (1, 1), because the estimated standard error $\sqrt{\hat{p}(1 - \hat{p})/n}$ is zero when $\hat{p} = 1$. This is pretty obviously messed up... we are in no way absolutely certain that the true value of p is equal to 1 based on a sample of n = 10 observations!
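The degenerate interval in part (b) is easy to reproduce. A minimal sketch, using the critical value 1.64 given in the question:

```python
from math import sqrt

data = [1] * 10             # all ten observations equal one
n = len(data)
p_hat = sum(data) / n       # sample proportion

# Plug-in standard error from the CLT-based sampling distribution.
se = sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - 1.64 * se, p_hat + 1.64 * se)

print(se, interval)
```

Because the standard error collapses to zero, no choice of critical value can widen the interval, which is why the note says the critical value does not matter.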

Because n = 10 is a relatively small sample size and our data is highly non-normal, we should probably not rely on the Central Limit Theorem here. However, we can actually build a 90% confidence interval for p without using the CLT. Remember that p is a probability, so it must be somewhere between 0 and 1.

(c.) Suppose we knew the true value of p. Without using the CLT, we can find the sampling distribution of $\hat{p}$ by recognizing that $\hat{p} = Y/n$, where Y is a random variable. What is the probability distribution of Y?

The random variable Y is the sum of n = 10 i.i.d. Bernoulli random variables. Therefore, the distribution of Y is binomial(n, p), or binomial(10, p).

(d.) Suppose that p = 0.9. What is $P(\hat{p} = 1)$ in a sample of size n = 10? Should our 90% confidence interval include p = 0.9? (i.e., is 0.9 a reasonable value of p?)

With p = 0.9, $P(\hat{p} = 1) = (0.9)^{10} = 0.349$. Our confidence interval SHOULD include p = 0.9. With n = 10 observations and p = 0.9, it is definitely possible (there is a 35% chance) we would see a value of $\hat{p} = 1$.

(e.) Suppose that p = 0.7. What is $P(\hat{p} = 1)$ in a sample of size n = 10? Should our 90% confidence interval include p = 0.7?

With p = 0.7, $P(\hat{p} = 1) = (0.7)^{10} = 0.028$. Our confidence interval should probably NOT include p = 0.7. With n = 10 observations and p = 0.7, it is pretty unlikely (there is only a 3% chance) we would see a value of $\hat{p} = 1$.

(f.) Based on the sample of n = 10 observations on the previous page, give a 90% confidence interval for p without using the Central Limit Theorem.

Our confidence interval should obviously include p = 1. What is the smallest value of p we'd call reasonable? Well, for a 90% CI, we'd rule out any p for which $P(\hat{p} = 1) < 0.10$. Solving $p^{10} = 0.1$ for p, we get $p = (0.1)^{1/10} = 0.794$. The exact 90% confidence interval is (0.794, 1).
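The reasoning in parts (d) through (f) reduces to evaluating p to the tenth power and inverting it. A short sketch, with the function name `p_all_ones` chosen here for illustration:

```python
n = 10
alpha = 0.10   # one minus the 90% confidence level

def p_all_ones(p, n=10):
    """P(p_hat = 1): all n independent Bernoulli(p) draws equal one."""
    return p ** n

prob_at_09 = p_all_ones(0.9)    # part (d)
prob_at_07 = p_all_ones(0.7)    # part (e)

# Part (f): smallest p that is not ruled out, from solving p^n = alpha.
lower_bound = alpha ** (1 / n)

print(prob_at_09, prob_at_07, lower_bound)
```

Any p at or above the lower bound makes the observed all-ones sample at least 10% likely, so those values stay in the interval; anything below is ruled out.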



More information

University of Chicago Graduate School of Business. Business 41000: Business Statistics

University of Chicago Graduate School of Business. Business 41000: Business Statistics Name: University of Chicago Graduate School of Business Business 41000: Business Statistics Special Notes: 1. This is a closed-book exam. You may use an 8 11 piece of paper for the formulas. 2. Throughout

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Problem sets for BUEC 333 Part 1: Probability and Statistics

Problem sets for BUEC 333 Part 1: Probability and Statistics Problem sets for BUEC 333 Part 1: Probability and Statistics I will indicate the relevant exercises for each week at the end of the Wednesday lecture. Numbered exercises are back-of-chapter exercises from

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

AP STATISTICS (Warm-Up Exercises)

AP STATISTICS (Warm-Up Exercises) AP STATISTICS (Warm-Up Exercises) 1. Describe the distribution of ages in a city: 2. Graph a box plot on your calculator for the following test scores: {90, 80, 96, 54, 80, 95, 100, 75, 87, 62, 65, 85,

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10 4:00 (GB 220) and Wednesdays 2:10 4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Expected Value and the Game of Craps

Expected Value and the Game of Craps Expected Value and the Game of Craps Blake Thornton Craps is a gambling game found in most casinos based on rolling two six sided dice. Most players who walk into a casino and try to play craps for the

More information

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values. MA 5 Lecture 4 - Expected Values Friday, February 2, 24. Objectives: Introduce expected values.. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the

More information

University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key

University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key Name: OUTLINE SOLUTIONS University of Chicago Graduate School of Business Business 41000: Business Statistics Solution Key Special Notes: 1. This is a closed-book exam. You may use an 8 11 piece of paper

More information

Chapter 4. Probability Distributions

Chapter 4. Probability Distributions Chapter 4 Probability Distributions Lesson 4-1/4-2 Random Variable Probability Distributions This chapter will deal the construction of probability distribution. By combining the methods of descriptive

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption

Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption Last time, we used the mean of one sample to test against the hypothesis that the true mean was a particular

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

The Math. P (x) = 5! = 1 2 3 4 5 = 120.

The Math. P (x) = 5! = 1 2 3 4 5 = 120. The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Description. Textbook. Grading. Objective

Description. Textbook. Grading. Objective EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

More information

1. How different is the t distribution from the normal?

1. How different is the t distribution from the normal? Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the

More information