# Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Download "Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011"

## Transcription

Name:

Section:

I pledge my honor that I have not violated the Honor Code.

Signature:

This exam has 34 pages. You have 3 hours to complete this exam. When time is called please stop writing immediately. There are 9 questions. Unless otherwise indicated, each part of each question is worth 2 points. You may use a calculator and two letter size (both sides) cheat sheets of your own notes. Present your answers in a clear and concise manner.

| Question | Parts | Points |
|---|---|---|
| 1 | 13 | 26 |
| 2 | 5 | 10 |
| 3 | 6 | 12 |
| 4 | 3 | 6 |
| 5 | 7 | 15 |
| 6 | 7 | 15 |
| 7 | 3 | 6 |
| 8 | 9 | 18 |
| 9 | 6 | 12 |
| Total | | 120 |

Question # 1. The data in this question are returns on three portfolios called Market, SMB, and HML. These portfolios were made famous by Eugene Fama and Kenneth French and have been widely used by finance practitioners for over 30 years. For this question I collected monthly annualized returns on the three portfolios from January 1983 through February 2008, for a total of n = 302 observations. All returns are in percent.

[Figure: Time series plot of SMB]

Use the time series plot above to answer the following questions.

(a.) The sample mean of the SMB returns is

Answer: (ii)

(i.) (ii.) (iii.) 3.07 (iv.)

(b.) The sample standard deviation of SMB returns is approximately

Answer: (ii)

(i.) 1.56 (ii.) 3.27 (iii.) 5.38 (iv.) 9.12

NOTE: You can see this by noting that about 5% of the returns lie outside the range -6.5 to 6.5, which is roughly 2 standard deviations above and below the mean. Remember to use the empirical rule as a rough approximation.

(c.) Suppose I conduct a hypothesis test for the null hypothesis that the above data are i.i.d. The p-value associated with this test is approximately

Answer: (iii)

(i.) (ii.) (iii.) 0.98 (iv.) 1.14

NOTE: From the plot, the data look approximately i.i.d. Therefore we would expect a large p-value from the test. A large p-value provides no evidence against the null hypothesis that the data are i.i.d.

Below are histograms of the three variables. The horizontal and vertical axes are the same in all three plots.

[Figure: histograms of SMB, Market, and HML]

Answer the following questions using the histograms on the previous page.

(d.) Which of the three portfolio returns has the largest sample variance?

Answer: (iii)

(i.) SMB (ii.) HML (iii.) Market

(e.) Which of the three portfolio returns is most left-skewed?

Answer: (iii)

(i.) SMB (ii.) HML (iii.) Market (iv.) none of them

(f.) I conducted a test of normality on each variable. The null hypothesis is that the data are i.i.d. normal. Which produces the smallest p-value?

Answer: (iii)

(i.) SMB (ii.) HML (iii.) Market (iv.) all three are the same

NOTE: A small p-value provides evidence against the null hypothesis. Of the three histograms, the Market histogram looks the least bell-shaped, i.e. the least normal.

(g.) The sample variance of the HML portfolio returns is approximately

Answer: (iii)

(i.) 3.12 (ii.) 6.57 (iii.) 9.52 (iv.)

NOTE: Remember to use the empirical rule as a rough approximation. We can see that 95% of the HML returns are between -5.5 and 6.5. This means that the standard deviation must be about 3. We can see that $9.52 \approx 3.085^2$.

Below is a scatter plot of SMB versus HML returns.

[Figure: scatter plot of SMB versus HML, with the February 2000 observation labeled]

Answer the following questions using the scatter plot on the previous page.

(h.) The sample correlation between the SMB and HML returns is

Answer: (ii)

(i.) (ii.) (iii.) 0.42 (iv.) 0.85

(i.) If we deleted the February 2000 observation (indicated on the plot) from the sample, the sample correlation would be

Answer: (ii)

(i) closer to -1 (ii) closer to 0 (iii) closer to 1 (iv) unchanged

(j.) What is the sample covariance between SMB and HML?

$s_{SMB,HML} = s_{SMB}\, s_{HML}\, r = (3.27)(3.085)(-0.42) \approx -4.24$

Suppose that we estimate the following linear regression model:

$SMB_i = \alpha + \beta\, HML_i + \varepsilon_i$

(k.) The R-squared of this regression will be approximately

Answer: (i)

(i.) 0.17 (ii.) 0.42 (iii.) 0.73 (iv.) 0.85

NOTE: Remember that for simple linear regression, the R-squared is equal to the correlation squared!

(l.) What is our estimate of the slope, b?

NOTE: You have to know the formula for the slope coefficient from class.

$b = \dfrac{s_{SMB,HML}}{s^2_{HML}} = \dfrac{-4.24}{9.52} \approx -0.445$

(m.) Suppose we know that the HML return next month will be 2%, and we use the regression above to construct a 95% plug-in predictive interval for next month's SMB return. Such an interval implicitly assumes that

Answer: (ii)

(i.) HML is i.i.d. normal (ii.) ε is i.i.d. normal (iii.) both (i) and (ii) (iv.) none of the above

NOTE: To compute the predictive interval for y, we don't need to make assumptions about the distribution of the right-hand side variable x, but we do need to make an assumption about the distribution of the errors ε.
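The arithmetic behind parts (j) through (l) can be checked with a few lines of Python. This is only a sketch: it uses the standard deviations quoted in the solutions (3.27 and 3.085) and assumes the sample correlation is -0.42. The negative sign is an assumption, chosen because it is consistent with an R-squared of about 0.17 and with the correlation moving toward 0 when the outlier is removed.

```python
# Summary statistics quoted in the solutions (returns in percent).
s_smb = 3.27      # sample standard deviation of SMB
s_hml = 3.085     # sample standard deviation of HML
r = -0.42         # sample correlation (the negative sign is an assumption)

cov = r * s_smb * s_hml      # sample covariance: s_xy = r * s_x * s_y
slope = cov / s_hml ** 2     # least-squares slope of SMB on HML
r_squared = r ** 2           # R-squared equals r^2 in simple regression

print(f"covariance = {cov:.3f}")
print(f"slope b    = {slope:.3f}")
print(f"R-squared  = {r_squared:.3f}")
```

Note that the slope can also be written as r times the ratio of standard deviations, which is why its sign always matches the sign of the correlation.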

Question # 2. Multiple choice. For each question, choose one answer.

(a.) Fill in the blanks in the following phrase, in order:

Answer: (iv)

Statistical methods draw conclusions about unknown ______ based on ______ computed from ______.

(i.) parameters, samples, a statistic.
(ii.) samples, statistics, a parameter.
(iii.) statistics, parameters, a sample.
(iv.) parameters, statistics, a sample.
(v.) samples, parameters, a statistic.

(b.) Suppose that most of the observations in a given data set are of the same magnitude, except for a few data points that are substantially larger. Which of the following would be true?

Answer: (ii)

(i.) The sample mean would be smaller than the median, and the histogram would be skewed with a long right tail.
(ii.) The sample mean would be larger than the median, and the histogram would be skewed with a long right tail.
(iii.) The sample mean would be smaller than the median, and the histogram would be skewed with a long left tail.
(iv.) The sample mean would be larger than the median, and the histogram would be skewed with a long left tail.
(v.) The sample mean and median would be approximately the same, and the histogram would be roughly symmetric.

NOTE: We saw an example of this in the bank arrival time data. Large outliers affect the mean more than the median. The histogram had a pronounced right tail.

(c.) An achievement test is given each year to 3rd graders in a certain school district. Scores on the test are normally distributed with a mean of 100 points and a standard deviation of 15 points. If Jane's z-score was 1.2, how many points did she score on the test?

Answer: (v)

(i.) 82 (ii.) 88 (iii.) 100 (iv.) 112 (v.) 118

(d.) The Central Limit Theorem implies that:

Answer: (v)

(i.) If we simulate 5,000 i.i.d. draws from any probability distribution, the histogram will appear bell-shaped.
(ii.) If we simulate 5,000 i.i.d. draws from any probability distribution, the time series plot should not display any obvious patterns.
(iii.) The average of 5,000 i.i.d. draws from any probability distribution should exactly equal the population mean.
(iv.) If our sample consists of 5,000 i.i.d. draws from any probability distribution, the data points will be approximately normally distributed around the sample mean.
(v.) If we start with 5,000 i.i.d. draws from any probability distribution and let $\bar{x}_1$ be the average of the first 50 data points, $\bar{x}_2$ be the average of the 51st through 100th data points, $\bar{x}_3$ be the average of the 101st through 150th data points, etc., then a histogram of the numbers $\bar{x}_1$ through $\bar{x}_{100}$ should appear bell-shaped.
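Option (v) in part (d) describes an experiment that is easy to simulate. The sketch below is illustrative only: the exponential distribution and the fixed seed are assumptions chosen to give clearly skewed raw data, and skewness is used as a rough numeric stand-in for "looks bell-shaped".

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# 5,000 i.i.d. draws from a right-skewed distribution (exponential, mean 1).
draws = [random.expovariate(1.0) for _ in range(5000)]

# Average consecutive, non-overlapping blocks of 50 draws: 100 batch means.
batch_means = [statistics.fmean(draws[i:i + 50]) for i in range(0, 5000, 50)]

def skewness(xs):
    """Average cubed z-score; near 0 for symmetric, bell-shaped data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

print("number of batch means  :", len(batch_means))
print("skewness of raw draws  :", round(skewness(draws), 2))
print("skewness of batch means:", round(skewness(batch_means), 2))
```

The raw draws stay strongly skewed no matter how many there are, which is why options (i) and (iv) are wrong; only the averages move toward a bell shape.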

(e.) You obtain a sample of 25 students from the same high school. Based on this data, a 95% confidence interval for the expected value of a student's SAT score is 900 to

Which of the following is a valid interpretation of this interval?

Answer: (iv)

(i.) 95% of the 25 students in the sample have an SAT score between 900 and
(ii.) 95% of the population of students at this high school will have an SAT score between 900 and
(iii.) Given the outcomes in this sample, there is a 95% probability that the true expected value of SAT scores is between 900 and
(iv.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the resulting intervals would contain the true expected value of a student's SAT score.
(v.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the sample means would be between 900 and

NOTE: Before we see the data, the 95% confidence interval has a 95% probability of covering the true (population) value of the parameter. For a particular sample (after we have seen the data), the true value is either in the interval or it is not. Answer (iii) may seem correct, but it is technically wrong.

Question # 3. The following joint probability distribution is based on survey data collected by a major financial publication. For a randomly selected person living in the U.S., define the random variable S as the percentage of retirement income invested in the stock market. Define the random variable A as

- A = 1 if the person is below 30 years of age
- A = 2 if the person is between 30 and 50 years old
- A = 3 if the person is above 50 years old

Based on the survey, we have come up with the following joint probability distribution for S and A:

[Table: joint probabilities of S and A, with columns S = 10%, 30%, 60% and one row for each value of A]

(a.) What is the marginal probability that A = 3?

$P(A = 3) = P(A = 3, S = 0.1) + P(A = 3, S = 0.3) + P(A = 3, S = 0.6) = 0.43$

(b.) What is the expected value of S?

First, we need to know the marginal distribution of S. This is

| s | p(s) |
|---|---|
| 10% | 0.19 |
| 30% | 0.54 |
| 60% | 0.27 |

$E[S] = 0.19(10) + 0.54(30) + 0.27(60) = 34.3\%$

(c.) What is the standard deviation of S?

$V[S] = 0.19(10 - 34.3)^2 + 0.54(30 - 34.3)^2 + 0.27(60 - 34.3)^2 = 300.51$

This implies that the standard deviation is $SD(S) = \sqrt{300.51} = 17.3\%$.

(d.) What is the probability that a randomly selected investor is below 50 years of age and has 30% or more of his retirement savings invested in the stock market?

$P(A < 3, S \geq 30\%) = 0.48$

(e.) Suppose we know a particular investor has only 10% of her retirement savings invested in the stock market. What is the probability she is over 50 years old?

$P(A = 3 \mid S = 10\%) = \dfrac{P(A = 3, S = 10\%)}{P(S = 10\%)} = \dfrac{0.10}{0.19} = 0.526$

(f.) Are A and S independent? Briefly justify your answer.

From part (e), $P(A = 3 \mid S = 10\%) = 0.526$, while the marginal probability of being over 50 is $P(A = 3) = 0.43$. Because the conditional and marginal probabilities differ, the random variables A and S are not independent.
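The marginal distribution of S is pinned down by two facts stated in the solutions, P(S = 10%) = 0.19 and E[S] = 34.3%, together with the requirement that probabilities sum to 1. The sketch below solves for the other two probabilities and then checks the standard deviation and the conditional probability. The joint probability P(A = 3, S = 10%) = 0.10 is an assumption inferred from the quoted answer 0.526.

```python
from fractions import Fraction
from math import sqrt

# Known from the solution text (S measured in percent).
p10 = Fraction(19, 100)       # P(S = 10%)
mean_s = Fraction(343, 10)    # E[S] = 34.3

# Two unknowns, two linear equations:
#   p30 + p60 = 1 - p10
#   30*p30 + 60*p60 = E[S] - 10*p10
rest = 1 - p10
p60 = (mean_s - 10 * p10 - 30 * rest) / 30
p30 = rest - p60

dist = {10: p10, 30: p30, 60: p60}
variance = sum(p * (s - mean_s) ** 2 for s, p in dist.items())
sd = sqrt(variance)

# Part (e): conditional probability, with the joint value assumed to be 0.10.
p_joint = Fraction(10, 100)
p_cond = p_joint / p10

print("P(S=30%) =", float(p30), " P(S=60%) =", float(p60))
print("SD(S)    =", round(sd, 1))
print("P(A=3 | S=10%) =", round(float(p_cond), 3))
```

Since the recovered standard deviation agrees with the 17.3% in part (c), the three marginal probabilities are mutually consistent with every number quoted in the solution.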

Question # 4. Suppose starting next Monday that I go to a casino every night for a week and play 125 hands of blackjack. Suppose that I bet \$10 per hand, so on each hand I will either lose \$10, push, win \$10, or double down and win \$20 (assume that house rules prohibit me from doubling down or splitting more than once). Suppose that my winnings on each hand, i = 1, 2, ..., 125, is a random variable $W_i$ with $E[W_i]$ = \$0.10 and $\sigma(W_i)$ = \$12.30.

(a.) Suppose I look at my average winnings per hand on a given night,

$\bar{W} = \dfrac{W_1 + W_2 + \cdots + W_{125}}{125}$

where the hands $W_i$ are i.i.d. What are the expected value and variance of $\bar{W}$?

We can use our linear formulas from Lecture #4 to show that:

$E[\bar{W}] = 0.10$

$V[\bar{W}] = \dfrac{(12.3)^2}{125} = 1.21$

(b.) Now suppose I do this every night for a month (30 days). Assuming I play 125 hands each night, on approximately how many nights will I average a \$1 or more loss per hand?

By the Central Limit Theorem, we know $\bar{W} \sim N(0.10, 1.21)$. On a normal distribution with µ = 0.1 and σ = 1.1, $P(\bar{W} < -1) = 0.16$ because -1 is 1 standard deviation to the left of the mean. Consequently, I average a \$1 loss or more on about 0.16 × 30 = 4.8 nights.

(c.) Your answer to part (b) implicitly makes use of the Central Limit Theorem. Which of the following correctly justifies your use of the CLT?

Answer: (ii)

(i.) Outcomes for each hand are very nearly normally distributed.
(ii.) Outcomes for each hand are i.i.d. and I am playing a large number of hands per night.
(iii.) Outcomes for each hand are i.i.d. and I am playing for a sufficiently large sample of days.
(iv.) Outcomes for each night are i.i.d. and I am averaging over a large sample of nightly outcomes.

NOTE: We are taking an average over the hands played each night, which are assumed to be i.i.d.
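The blackjack calculation in parts (a) and (b) can be reproduced with the standard library's `NormalDist`. This is a sketch that takes the per-hand mean (0.10) and standard deviation (12.30) from the question and relies on the same CLT approximation as the solution.

```python
from statistics import NormalDist

mu_hand, sd_hand, n_hands = 0.10, 12.30, 125

mean_avg = mu_hand                   # E[W-bar] equals the per-hand mean
var_avg = sd_hand ** 2 / n_hands     # V[W-bar] = sigma^2 / n
sd_avg = var_avg ** 0.5

# CLT approximation for one night's average winnings per hand.
nightly = NormalDist(mu=mean_avg, sigma=sd_avg)
p_loss = nightly.cdf(-1.0)           # P(W-bar < -1): average loss of $1 or more
expected_nights = 30 * p_loss        # expected count over 30 nights

print(f"V[W-bar]        = {var_avg:.2f}")
print(f"P(W-bar < -1)   = {p_loss:.3f}")
print(f"expected nights = {expected_nights:.1f}")
```

The exact normal probability is slightly below the empirical-rule value of 0.16, which is why the solution describes 4.8 nights as approximate.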

Question # 5. First People's Bank (FPB) has most of their commercial loan department working with small business clients. The bank's managers consider this their most important growth area and several years ago hired a consulting team to improve two aspects of their loan process. In particular, they want to decrease the default rate on the loans (that is, the proportion of loans for which the borrower is unable to make payments). They also want to improve customer service by decreasing the time it takes to process loan applications.

Historically (prior to the consultants being hired), management has found that the number of business days required to process a small business loan application is i.i.d. normal with a mean of 14 and a variance of 4.

(a.) Before the consultants were hired, approximately what percentage of loan applications were processed in 10 days or less?

Use the normal distribution with µ = 14 and σ = 2. $P(X < 10) \approx 0.025$ because 10 is 2 standard deviations to the left of the mean.

(b.) The consulting team identified and implemented a number of measures to speed up the application process. Management has reviewed a sample of 25 loan applications processed after these measures were implemented. The average processing time in the sample was 11.2 days and the sample standard deviation of processing times is 2.0 days. Were the consultants' measures effective? Formulate an appropriate hypothesis test or confidence interval and state your conclusions.

The null hypothesis is $H_0: \mu = 14$. This is like saying that the processing time is the same as it was before the consultants were hired.

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{11.2 - 14}{2 / \sqrt{25}} = -7$

We reject the null hypothesis and conclude that the measures were effective.

(c.) If we treat the estimates in part (b) (mean of 11.2 and standard deviation of 2.0) as if they were the actual mean and standard deviation, approximately what percentage of loans will be processed in 10 days or less?

Answer: (iii)

(i.) 5.3% (ii.) 15.9% (iii.) 27.4% (iv.) 51.1%

NOTE: 10 is less than 1 standard deviation below the mean, so $P(X < 10)$ is bigger than 16% and less than 50%.

Historically (prior to the consultants being hired), 15% of FPB's small business loans resulted in default. The consulting team trained FPB's analysts to use software designed to reduce the default rate by more effectively identifying high risk businesses that are more likely to default.

(d.) In a typical year, FPB grants 120 loans to small businesses. Assume that defaults are i.i.d. events; that is, if two firms are granted loans, whether the first firm defaults is independent of whether the second firm defaults. Let Y be the number of loans granted in a typical year that will eventually end up in default. What is the distribution of Y?

Binomial(120, 0.15). We are looking at n = 120 i.i.d. Bernoulli outcomes where each one has p = 0.15.
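The processing-time calculations in parts (a) through (c) follow one pattern: standardize, then look up a normal probability. The sketch below uses `NormalDist` for exact normal probabilities, whereas the solutions rely on the empirical rule, so the values differ slightly from the rounded ones quoted.

```python
from math import sqrt
from statistics import NormalDist

# (a) Historical process: normal with mean 14 and variance 4 (sd 2).
p_hist = NormalDist(mu=14, sigma=2).cdf(10)

# (b) z statistic for H0: mu = 14 with x-bar = 11.2, s = 2.0, n = 25.
x_bar, mu0, s, n = 11.2, 14, 2.0, 25
z = (x_bar - mu0) / (s / sqrt(n))

# (c) Treat the sample estimates as if they were the true parameters.
p_new = NormalDist(mu=11.2, sigma=2.0).cdf(10)

print(f"(a) P(X < 10) before = {p_hist:.4f}")
print(f"(b) z statistic      = {z:.1f}")
print(f"(c) P(X < 10) after  = {p_new:.3f}")
```

A z statistic this far from zero leaves no doubt about rejecting the null, regardless of whether a normal or t critical value is used.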

(e.) Give an interval that is 95% likely to contain the number of loans granted in a typical year which will eventually end up in default.

$E[Y] = np = 120(0.15) = 18$

$V[Y] = np(1 - p) = 120(0.15)(0.85) = 15.3$

A 95% interval is $18 \pm 2\sqrt{15.3}$, which is approximately (10, 26).

(f.) Looking at a sample of 100 loans granted after FPB's analysts started using the new software, management finds that 7 of those loans ended up in default. Was the new software effective in reducing defaults? Formulate an appropriate hypothesis test and state your conclusions.

An appropriate null hypothesis is $H_0: p = 0.15$.

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.07 - 0.15}{\sqrt{0.15(0.85)/100}} = -2.24$

We would reject the null hypothesis at the 5% level and conclude that the software is effective.
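Parts (e) and (f) can be checked numerically. The sketch below reproduces the mean, variance, and two-standard-deviation interval for the binomial count, then adds an exact binomial coverage check that is not part of the original solution, and finally computes the z statistic for the proportion test.

```python
from math import comb, sqrt

# (e) Number of defaults among n = 120 independent loans, each with p = 0.15.
n, p = 120, 0.15
mean_y = n * p
var_y = n * p * (1 - p)
lo = mean_y - 2 * sqrt(var_y)
hi = mean_y + 2 * sqrt(var_y)

# Extra check (not in the original solution): exact binomial probability
# that Y lands between 10 and 26 inclusive.
coverage = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 27))

# (f) One-sample z test for a proportion, H0: p = 0.15, 7 defaults in 100 loans.
p_hat, p0, m = 7 / 100, 0.15, 100
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / m)

print(f"interval: ({lo:.1f}, {hi:.1f})")
print(f"exact coverage of 10..26: {coverage:.3f}")
print(f"z = {z:.2f}")
```

The standard error in part (f) is computed under the null value 0.15 rather than the sample proportion, matching the formula in the solution.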

In reality, defaults on small business loans are probably not independent. One reason for this is that a broad economic downturn can cause lots of small businesses to default in a relatively short time period. Because of this, defaults may be positively correlated across firms. That is, if we look at a sample of n loans given in the same year, and let $X_i = 1$ if loan i ends up in default and 0 otherwise for i = 1, 2, 3, ..., n, we now assume that:

$\mathrm{Cov}(X_i, X_j) > 0$ for any loans $i \neq j$

(g.) [3 points] Let n = 120 and again define $Y = X_1 + X_2 + \cdots + X_{120}$ as the number of loans given in a particular year that end up in default. Suppose we still believe that any single loan has a 15% chance of ending up in default. This is the same random variable we considered in parts (d)-(e), except there we assumed that the individual loan defaults were i.i.d. and now we are assuming $\mathrm{Cov}(X_i, X_j) > 0$. How does this affect the expected value of Y? How does it affect the variance of Y? Briefly explain.

The expected value of a sum of random variables is always the sum of the expected values, so E[Y] is unaffected. However, the variance will be affected. We know that in general

$V[Y] = V[X_1] + V[X_2] + \cdots + V[X_{120}] + 2\left[\mathrm{Cov}(X_1, X_2) + \cdots + \mathrm{Cov}(X_{119}, X_{120})\right]$

Since the covariances are positive, the variance of Y is substantially larger than under independence.

NOTE: When discussing the effect on the variance, it would be fine if you just state the case for n = 2, i.e. $V[Y] = V[X_1] + V[X_2] + 2\,\mathrm{Cov}(X_1, X_2)$.
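To see how much positive covariance can inflate the variance, the sketch below assumes every pair of loans shares the same correlation `rho`. That equal-correlation setup and the value 0.05 are hypothetical choices for illustration; the exam only states that the covariances are positive.

```python
def var_of_default_count(n, p, rho):
    """Variance of Y = X_1 + ... + X_n when every pair has correlation rho."""
    var_one = p * (1 - p)        # V[X_i] for a Bernoulli(p) variable
    cov_pair = rho * var_one     # Cov(X_i, X_j) for i != j
    # n variance terms plus n(n-1) ordered pairs, i.e. 2 * (number of pairs).
    return n * var_one + n * (n - 1) * cov_pair

v_indep = var_of_default_count(120, 0.15, 0.0)    # i.i.d. case from part (e)
v_corr = var_of_default_count(120, 0.15, 0.05)    # rho = 0.05 is hypothetical

print(f"independent: V[Y] = {v_indep:.2f}")
print(f"correlated : V[Y] = {v_corr:.2f}")
```

Even a small pairwise correlation matters because the number of covariance terms grows with the square of n while the number of variance terms grows only with n.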

Question # 6. In this problem we estimate the market model using returns on an asset GE and returns on the S&P 500. The sample size is n = 254.

- GE: returns on General Electric stock
- Market: The market portfolio (the S&P 500)

I took monthly returns on each asset and ran the following regression:

$GE = \alpha + \beta\, Market + \varepsilon$

Some of the results from running this regression in StatPro are reported here:

[Table: ANOVA table with columns Source, df, SS, MS, F, p-value and rows Explained, Unexplained]

[Table: Regression coefficients with columns Coefficient, Std Err and rows Constant, SP500]

(a.) Give a 95% confidence interval for β, the coefficient on Market.

The interval is $b \pm 2\, s_b$; its lower endpoint is 1.1123.

(b.) Test the null hypothesis that the Market is not related to GE (β = 0) at the 5% level.

$t = \dfrac{b - \beta_0}{s_b}$

We would reject the null hypothesis at the 5% level.

(c.) What is the standard deviation of the residuals $s_e$?

$s_e = \sqrt{\dfrac{\text{unexplained sum of squares}}{n - 2}}$, where n - 2 = 252.

(d.) What is the sample correlation between the Fitted Values and Residuals?

The sample correlation between the fitted values $\hat{y}$ and the residuals e is zero. This is one of the major properties of the residuals and is a result of using least squares.

Suppose returns for the Market next month are given by:

[Table: one row dated 4/1/ with columns SP500, Fitted Values, Residuals; the fitted value and residual are shown as "?"]

(e.) [3 points] Construct a 95% plug-in predictive interval for GE for this month.

A 95% plug-in predictive interval is:

$(a + b\,x - 2\, s_e,\ a + b\,x + 2\, s_e)$

(f.) In the table above, what is the fitted value? Can you calculate the residual?

The fitted value is $a + b\,x$. No, you cannot calculate the residual, because you have not observed the value of GE for this month yet.

(g.) Test the null hypothesis that H 0 : α = at the 5% level.

$t = \dfrac{a - \alpha_0}{s_a}$

We would clearly reject this null hypothesis at the 5% level.

Question # 7. When coded messages are received, there are sometimes errors in transmission creating uncertainty about the message that was actually sent. In particular, Morse code uses dots and dashes as a way to encode messages. Specifically, each letter of the alphabet and each number are given a special sequence of dots and dashes. Let the random variable S = 1 if a dot is sent and S = 0 if a dash is sent. Define the random variable R = 1 if a dot is received and R = 0 if a dash is received.

Dots and dashes are known to occur in the proportion 3:4. This means that $P(S = 1) = \frac{3}{7}$ and $P(S = 0) = \frac{4}{7}$. Suppose there is interference on the transmission line, and with probability $\frac{1}{8}$ a dot is mistakenly received as a dash, and vice versa.

(a.) What is the probability that a dot was received given a dot was sent, $P(R = 1 \mid S = 1)$?

There is a $\frac{1}{8}$ chance of a mistake, which makes the probability of getting it right equal to:

$P(R = 1 \mid S = 1) = \frac{7}{8}$

(b.) What is the marginal probability that a dot is received, $P(R = 1)$?

The marginal probability is the sum of the two joint probabilities.

$P(R = 1) = P(R = 1 \mid S = 1)P(S = 1) + P(R = 1 \mid S = 0)P(S = 0) = \frac{7}{8} \cdot \frac{3}{7} + \frac{1}{8} \cdot \frac{4}{7} = \frac{25}{56}$

(c.) If we receive a dot, can we be sure that a dot was sent? Calculate the probability of a dot being sent given that a dot was received.

We use Bayes' Rule.

$P(S = 1 \mid R = 1) = \dfrac{P(R = 1 \mid S = 1)P(S = 1)}{P(R = 1)} = \dfrac{\left(\frac{7}{8}\right)\left(\frac{3}{7}\right)}{\frac{25}{56}} = \frac{21}{25} = 0.84$
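The Morse code calculation is a direct application of the law of total probability followed by Bayes' Rule. The sketch below uses exact fractions so that no rounding enters the answer; the variable names are illustrative.

```python
from fractions import Fraction

p_dot_sent = Fraction(3, 7)     # P(S = 1)
p_dash_sent = Fraction(4, 7)    # P(S = 0)
p_error = Fraction(1, 8)        # probability a symbol is flipped in transit

# Law of total probability: a dot is received either correctly or by error.
p_dot_recv = (1 - p_error) * p_dot_sent + p_error * p_dash_sent

# Bayes' Rule: P(S = 1 | R = 1).
posterior = (1 - p_error) * p_dot_sent / p_dot_recv

print("P(R = 1)         =", p_dot_recv)
print("P(S = 1 | R = 1) =", posterior)
```

Receiving a dot raises the probability that a dot was sent from 3/7 to 21/25, but it does not make it certain.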

Question # 8. Suppose I toss two six-sided dice, like those in this picture:

Let the random variable $X_1$ be the number shown on the first die. Let the random variable $X_2$ be the number shown on the second die. The possible outcomes for each of $X_1$ and $X_2$ are 1, 2, 3, 4, 5, or 6. Each of the six outcomes is equally likely. Assume that $X_1$ and $X_2$ are independent. If it helps you visualize, the joint distribution of $X_1$ and $X_2$ would look like:

| | $X_2 = 1$ | $X_2 = 2$ | $X_2 = 3$ | $X_2 = 4$ | $X_2 = 5$ | $X_2 = 6$ |
|---|---|---|---|---|---|---|
| $X_1 = 1$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |
| $X_1 = 2$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |
| $X_1 = 3$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |
| $X_1 = 4$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |
| $X_1 = 5$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |
| $X_1 = 6$ | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 | 1/36 |

(a.) What is $P(X_1 > 3)$?

Using the marginal distribution of $X_1$, we get

$P(X_1 > 3) = P(X_1 = 4) + P(X_1 = 5) + P(X_1 = 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$

(b.) Given that $X_1 + X_2 = 10$, what is the probability that $X_1 = 5$ and $X_2 = 5$?

There are 3 ways for the sum to equal 10: (4,6), (5,5), (6,4), and these are all equally likely. Therefore, the probability is $\frac{1}{3}$.

(c.) Given that $X_1 = 5$, what is the expected value of $X_1 + X_2$? (In other words, suppose the first die shows a 5. What is the expected value of the sum of the two die rolls?)

$E[X_2] = (1)(1/6) + (2)(1/6) + (3)(1/6) + (4)(1/6) + (5)(1/6) + (6)(1/6) = 3.5$

We know that $X_1 = 5$. Therefore, we get:

$E[X_1 + X_2] = E[5 + X_2] = 5 + E[X_2] = 8.5$
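With only 36 equally likely outcomes, parts (a) through (c) can be verified by brute-force enumeration. The sketch below lists every pair of die faces and counts; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false function of an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# (a) P(X1 > 3)
p_a = prob(lambda o: o[0] > 3)

# (b) P(X1 = 5 and X2 = 5 | X1 + X2 = 10) = joint / conditioning event
p_b = prob(lambda o: o == (5, 5)) / prob(lambda o: sum(o) == 10)

# (c) E[X1 + X2 | X1 = 5]: average the sum over outcomes with a first-die 5.
given_5 = [o for o in outcomes if o[0] == 5]
e_c = Fraction(sum(sum(o) for o in given_5), len(given_5))

print(p_a, p_b, e_c)
```

Enumeration works here because the conditioning events simply restrict attention to a subset of outcomes that remain equally likely.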

(d.) The popular dice game craps begins with a person (the "shooter") rolling two dice. If the sum of the two dice ($X_1 + X_2$) equals 7 or 11, the shooter is said to have rolled a "natural" and automatically wins. If the sum is 2, 3, or 12, the shooter is said to "crap out" and automatically loses. What is the probability that $X_1 + X_2$ equals 2, 3, or 12?

$P(X_1 = 1, X_2 = 1) + P(X_1 = 1, X_2 = 2) + P(X_1 = 2, X_2 = 1) + P(X_1 = 6, X_2 = 6) = \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9}$

Let's suppose the first time the two dice are rolled, the total is ten ($X_1 + X_2 = 10$). In this case, 10 becomes the "point". The shooter then continues rolling the two dice over and over again (both dice are always thrown at the same time). Each time the two dice are thrown, one of the following three things happens:

- If the total is 10, the game ends and the shooter wins.
- If the total is 7, the game ends and the shooter loses.
- Otherwise, the game continues, and the shooter rolls again.

Theoretically this could continue forever!

(e.) Each time the two dice are rolled, what is the probability the game continues?

The game ends if either a 7 or 10 is rolled. We need to compute these probabilities. There are six ways to roll a 7 and three ways to roll a 10.

$P(X_1 + X_2 = 7 \text{ or } X_1 + X_2 = 10) = \frac{6}{36} + \frac{3}{36} = \frac{9}{36} = \frac{1}{4}$

The probability that the game continues is $1 - P(X_1 + X_2 = 7 \text{ or } X_1 + X_2 = 10) = \frac{3}{4}$, or 0.75.

(f.) Suppose we know the game is going to end on the next roll. What is the probability the shooter wins?

Similar to part (b), if we know we're going to get 7 or 10, there are 9 total possibilities, 3 of which result in a win. Therefore, it is $\frac{3}{9}$, or $\frac{1}{3}$.

(g.) Starting with i = 2, let $U_i = 1$ if the game ends on the i-th roll and 0 otherwise. What is the probability distribution of $U_i$? (Hint: use your answer to (e).)

$U_i \sim \text{Bernoulli}(0.25)$

Assume that the $U_i$ are i.i.d. Let R be a random variable equal to the number of rolls before the game ends. R can be any positive integer (1, 2, 3, ...). We assumed the first roll was 10. If the game ends on the second roll ($U_2 = 1$), then R = 1. If the game ends on the third roll (that is, $U_2 = 0$ and $U_3 = 1$), then R = 2. If the game ends on the fourth roll (that is, $U_2 = 0$, $U_3 = 0$, and $U_4 = 1$), then R = 3, etc. As an interesting side note, it turns out that the probability distribution of the random variable R is known as the geometric distribution.

(h.) What is the probability that R = 3, i.e. the craps game ends after four rolls?

$P(R = 3) = P(U_2 = 0, U_3 = 0, U_4 = 1) = P(U_2 = 0)P(U_3 = 0)P(U_4 = 1) = \frac{3}{4} \cdot \frac{3}{4} \cdot \frac{1}{4} = \frac{9}{64}$

Be careful! Actual rules for craps can differ from what we've assumed here (e.g., sometimes a 12 will end the game as well as a 7). In casinos, betting the "pass line" is equivalent to betting that the shooter wins as we defined it here. After the point is established, you can then "take odds", which here would mean betting that a 10 will be rolled before a 7. The interesting thing is the odds bet is actually a fair bet (if the point is 10, it would pay 2-to-1), i.e. there is no house advantage! Because of this many casinos limit odds bets to 6-7 times your bet on the pass line.

(i.) What is the probability that R > 3, i.e. the craps game lasts longer than four rolls?

Here, you must recognize that this is 1 minus the probability of being less than or equal to 3.

$P(R > 3) = 1 - P(R = 1) - P(R = 2) - P(R = 3) = 1 - \frac{1}{4} - \frac{3}{16} - \frac{9}{64} = \frac{27}{64} \approx 0.42$

We could go on from here, and you'd see that, despite it being possible for craps to continue forever, there's a 90% probability the game ends within the first 8 rolls.
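The craps probabilities in parts (d) through (i) all follow from counting dice outcomes and then applying the geometric distribution. The sketch below works in exact fractions; `p_R_equals` is a hypothetical helper name, and "within the first 8 rolls" is interpreted as R ≤ 8, which is an assumption about the intended meaning.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def p_sum_in(values):
    """Probability that the two dice total one of the given values."""
    return Fraction(sum(1 for a, b in rolls if a + b in values), len(rolls))

p_crap_out = p_sum_in({2, 3, 12})              # part (d)
p_end = p_sum_in({7, 10})                      # part (e): game ends this roll
p_win_given_end = p_sum_in({10}) / p_end       # part (f)

def p_R_equals(k):
    """Geometric pmf: k - 1 continuing rolls, then one ending roll."""
    return (1 - p_end) ** (k - 1) * p_end

p_R3 = p_R_equals(3)                                   # part (h)
p_R_gt3 = 1 - sum(p_R_equals(k) for k in (1, 2, 3))    # part (i)
p_within_8 = 1 - (1 - p_end) ** 8                      # P(R <= 8)

print(p_crap_out, p_end, p_win_given_end, p_R3, p_R_gt3)
print(round(float(p_within_8), 3))
```

Part (i) also equals (3/4) cubed, since the game lasts beyond R = 3 exactly when the first three rolls after the point all fail to end it.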

Question # 9. Suppose we are working with 0-1 data (i.e., a dummy variable), and as usual we have assumed that $X_i \sim \text{Bernoulli}(p)$ i.i.d. We are going to look at a sample of size n, and use the sample proportion of ones, $\hat{p}$, as an estimator of p. Recall that for dummy variables, the sample proportion is an average:

$\hat{p} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$

(a.) Suppose that n is large enough for us to use the Central Limit Theorem. What is the sampling distribution of $\hat{p}$? (HINT: Your answer should depend on the unknown parameter p.)

The sampling distribution is $\hat{p} \sim N\left(p, \dfrac{p(1 - p)}{n}\right)$.

Now suppose we want to build a confidence interval for p, but we run into two issues. First, we have a sample of only n = 10 observations. Second, our actual data is

{1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

All ten of our observations in the sample are equal to one, which means that $\hat{p} = 1$!

(b.) Give a 90% confidence interval for p using your answer to part (a). (NOTE: the appropriate critical value here is 1.64, but it doesn't matter, you still get a very silly answer!)

Based on the sampling distribution from part (a), the confidence interval is (1, 1). This is pretty obviously messed up... we are in no way absolutely certain that the true value of p is equal to 1 based on a sample of n = 10 observations!

Because n = 10 is a relatively small sample size and our data is highly non-normal, we should probably not rely on the Central Limit Theorem here. However, we can actually build a 90% confidence interval for p without using the CLT. Remember that p is a probability, so it must be somewhere between 0 and 1.

(c.) Suppose we knew the true value of p. Without using the CLT, we can find the sampling distribution of $\hat{p}$ by recognizing that $\hat{p} = \frac{Y}{n}$, where Y is a random variable. What is the probability distribution of Y?

The random variable Y is the sum of n = 10 i.i.d. Bernoulli random variables. Therefore, the distribution of Y is binomial(n, p), or binomial(10, p).

(d.) Suppose that p = 0.9. What is $P(\hat{p} = 1)$ in a sample of size n = 10? Should our 90% confidence interval include p = 0.9? (i.e., is 0.9 a reasonable value of p?)

With p = 0.9, $P(\hat{p} = 1) = (0.9)^{10} = 0.35$. Our confidence interval SHOULD include p = 0.9. With n = 10 observations and p = 0.9, it is definitely possible (there is a 35% chance) we would see a value of $\hat{p} = 1$.

(e.) Suppose that p = 0.7. What is $P(\hat{p} = 1)$ in a sample of size n = 10? Should our 90% confidence interval include p = 0.7?

With p = 0.7, $P(\hat{p} = 1) = (0.7)^{10} = 0.03$. Our confidence interval should probably NOT include p = 0.7. With n = 10 observations and p = 0.7, it is pretty unlikely (there is only a 3% chance) we would see a value of $\hat{p} = 1$.

(f.) Based on the sample of n = 10 observations on the previous page, give a 90% confidence interval for p without using the Central Limit Theorem.

Our confidence interval should obviously include p = 1. What is the smallest value of p we'd call reasonable? Well, for a 90% CI, we'd rule out any p for which $P(\hat{p} = 1) < 0.1$. Solving $p^{10} = 0.1$ for p, we get $p = (0.1)^{1/10} = 0.794$. The exact 90% confidence interval is (0.794, 1).
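The contrast between the CLT-based interval and the exact interval can be made concrete in a few lines. The sketch below first shows why the normal-approximation interval collapses to a single point when every observation equals one, then reproduces the probabilities from parts (d) and (e) and the lower bound from part (f).

```python
n = 10
p_hat = 1.0      # all ten observations equal one
z_crit = 1.64    # critical value quoted in part (b)

# CLT-based interval: the estimated standard error is zero when p_hat = 1.
se = (p_hat * (1 - p_hat) / n) ** 0.5
clt_interval = (p_hat - z_crit * se, p_hat + z_crit * se)

# Exact reasoning: P(p_hat = 1) = p**n for a given true p.
p_all_ones_09 = 0.9 ** n     # part (d)
p_all_ones_07 = 0.7 ** n     # part (e)

# Part (f): smallest p that is not ruled out at the 10% level solves p**n = 0.1.
alpha = 0.10
lower = alpha ** (1 / n)
exact_interval = (lower, 1.0)

print("CLT interval  :", clt_interval)
print("P(all ones | p = 0.9):", round(p_all_ones_09, 3))
print("P(all ones | p = 0.7):", round(p_all_ones_07, 3))
print("exact interval:", (round(lower, 3), 1.0))
```

The exact interval is one-sided in effect: the data cannot rule out p = 1, so only the lower endpoint carries information.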


### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### 5. Linear Regression

5. Linear Regression Outline Simple linear regression Linear model

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R² and the coefficient of correlation

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

Name: University of Chicago Graduate School of Business Business 41000: Business Statistics Special Notes: 1. This is a closed-book exam. You may use an 8.5 × 11 piece of paper for the formulas. 2. Throughout

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Problem sets for BUEC 333 Part 1: Probability and Statistics

Problem sets for BUEC 333 Part 1: Probability and Statistics I will indicate the relevant exercises for each week at the end of the Wednesday lecture. Numbered exercises are back-of-chapter exercises from

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### 1 Simple Linear Regression I Least Squares Estimation

Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

### Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG820). December 15, 2012.

Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG820). December 15, 2012. 1. (3p) The joint distribution of the discrete random variables X and

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### AP Statistics 2010 Scoring Guidelines

AP Statistics 2010 Scoring Guidelines The College Board The College Board is a not-for-profit membership association whose mission is to connect students to college success and opportunity. Founded in

### August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) (2) In a large random sample,

### Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 One year ago a friend asked me to put down some easy-to-read notes

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Homework 5 Solutions

Math 130 Assignment Chapter 18: 6, 10, 38 Chapter 19: 4, 6, 8, 10, 14, 16, 40 Chapter 20: 2, 4, 9 Chapter 18 Homework 5 Solutions 18.6] M&M's. The candy company claims that 10% of the M&M's it produces

### ACTM State Exam-Statistics

ACTM State Exam-Statistics For the 25 multiple-choice questions, make your answer choice and record it on the answer sheet provided. Once you have completed that section of the test, proceed to the tie-breaker

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 Descriptive Statistics 2.1 Frequency Tables 2.2 Measures of Central Tendencies

### BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

### Simple Linear Regression

STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

### AP STATISTICS (Warm-Up Exercises)

AP STATISTICS (Warm-Up Exercises) 1. Describe the distribution of ages in a city: 2. Graph a box plot on your calculator for the following test scores: {90, 80, 96, 54, 80, 95, 100, 75, 87, 62, 65, 85,

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodness-of-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction 2. Hypothesis Testing 3. Hypothesis Tests Concerning the Mean 4. Hypothesis Tests

### Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we've covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### 2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.\$ and Sales \$: 1. Prepare a scatter plot of these data. The scatter plots for Adv.\$ versus Sales, and Month versus

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### ..., for $x = 0, 1, 2, 3, \ldots$ (4.1); $(1 + 1/n)^n \to 2.71828\ldots$; $\sum_{x=0}^{\infty} b^x/x! = e^b$

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781–1840). In addition, poisson is French for fish. In this chapter we will

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Math 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304. jones/courses/141

Math 141 Lecture 7: Variance, Covariance, and Sums Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Last Time Variance: expected squared deviation from the mean: Standard

### Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

### Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

### MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample

MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother's height ( momheight ) X 2 = father's height ( dadheight ) X 3 = 1 if

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10–4:00 (GB 220) and Wednesdays 2:10–4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about exploring linear relationships between a dependent variable and one or more independent variables. Regression

### Two-sample t-tests. - Independent samples - Pooled standard deviation - The equal variance assumption

Two-sample t-tests. - Independent samples - Pooled standard deviation - The equal variance assumption Last time, we used the mean of one sample to test against the hypothesis that the true mean was a particular

### Expected Value and the Game of Craps

Expected Value and the Game of Craps Blake Thornton Craps is a gambling game found in most casinos based on rolling two six sided dice. Most players who walk into a casino and try to play craps for the

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction 2 Testing distributional assumptions

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.

MA 1125 Lecture 14 - Expected Values Friday, February 28, 2014. Objectives: Introduce expected values. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Name: Date: Use the following to answer questions 3-4:

Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

### Chris Slaughter, DrPH. GI Research Conference June 19, 2008

Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

### Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

### Null Hypothesis H 0. The null hypothesis (denoted by H 0

Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

### WISE Sampling Distribution of the Mean Tutorial

Name Date Class WISE Sampling Distribution of the Mean Tutorial Exercise 1: How accurate is a sample mean? Overview A friend of yours developed a scale to measure Life Satisfaction. For the population

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### The Math. P (x) = 5! = 1 · 2 · 3 · 4 · 5 = 120.

The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

### University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key

Name: OUTLINE SOLUTIONS University of Chicago Graduate School of Business Business 41000: Business Statistics Solution Key Special Notes: 1. This is a closed-book exam. You may use an 8.5 × 11 piece of paper

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference that is frequently used concerns tests of hypotheses.

### Chapter 4. Probability Distributions

Chapter 4 Probability Distributions Lesson 4-1/4-2 Random Variable Probability Distributions This chapter will deal with the construction of probability distribution. By combining the methods of descriptive

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of