Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011


 Agatha Gallagher
Name:
Section:
I pledge my honor that I have not violated the Honor Code. Signature:
This exam has 34 pages. You have 3 hours to complete this exam. When time is called, please stop writing immediately. There are 9 questions. Unless otherwise indicated, each part of each question is worth 2 points. You may use a calculator and two letter-size cheat sheets (both sides) of your own notes. Present your answers in a clear and concise manner.
Question 1: 13 parts, 26 points
Question 2: 5 parts, 10 points
Question 3: 6 parts, 12 points
Question 4: 3 parts, 6 points
Question 5: 7 parts, 15 points
Question 6: 7 parts, 15 points
Question 7: 3 parts, 6 points
Question 8: 9 parts, 18 points
Question 9: 6 parts, 12 points
Total: 120 points
Question # 1. The data in this question are returns on three portfolios called Market, SMB, and HML. These portfolios were made famous by Eugene Fama and Kenneth French and have been widely used by finance practitioners for over 30 years. For this question I collected monthly annualized returns on the three portfolios from January 1983 through February 2008, for a total of n = 302 observations. All returns are in percent.
[Figure: time series plot of SMB]
Use the time series plot above to answer the following questions.
(a.) The sample mean of the SMB returns is Answer: (ii)
(i.) (ii.) (iii.) 3.07 (iv.)
(b.) The sample standard deviation of SMB returns is approximately Answer: (ii)
(i.) 1.56 (ii.) 3.27 (iii.) 5.38 (iv.) 9.12
NOTE: You can see this by noting that about 5% of the returns lie outside -6.5 to 6.5, which is roughly 2 standard deviations above and below the mean. Remember to use the empirical rule as a rough approximation.
(c.) Suppose I conduct a hypothesis test of the null hypothesis that the above data are i.i.d. The p-value associated with this test is approximately Answer: (iii)
(i.) (ii.) (iii.) 0.98 (iv.) 1.14
NOTE: From the plot, the data look approximately i.i.d. Therefore we would expect a large p-value from the test. A large p-value provides no evidence against the null hypothesis that the data are i.i.d.
Below are histograms of the three variables. The horizontal and vertical axes are the same in all three plots.
[Figure: histograms of SMB, Market, and HML returns]
Answer the following questions using the histograms above.
(d.) Which of the three portfolio returns has the largest sample variance? Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market
(e.) Which of the three portfolio returns is most left-skewed? Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market (iv.) none of them
(f.) I conducted a test of normality on each variable. The null hypothesis is that the data are i.i.d. normal. Which produces the smallest p-value? Answer: (iii)
(i.) SMB (ii.) HML (iii.) Market (iv.) all three are the same
NOTE: A small p-value provides evidence against the null hypothesis. Of the three histograms, the Market histogram looks the least bell-shaped, i.e. the most non-normal.
(g.) The sample variance of the HML portfolio returns is approximately Answer: (iii)
(i.) 3.12 (ii.) 6.57 (iii.) 9.52 (iv.)
NOTE: Remember to use the empirical rule as a rough approximation. About 95% of the HML returns lie between -5.5 and 6.5, a range of roughly 12, or 4 standard deviations. This means that the standard deviation must be about 3, so the variance is about \(3^2 = 9\). Indeed, \(9.52 \approx (3.085)^2\).
Below is a scatter plot of SMB versus HML returns.
[Figure: scatter plot of SMB versus HML returns, with the February 2000 observation labeled]
Answer the following questions using the scatter plot above.
(h.) The sample correlation between the SMB and HML returns is Answer: (ii)
(i.) (ii.) (iii.) 0.42 (iv.) 0.85
(i.) If we deleted the February 2000 observation (indicated on the plot) from the sample, the sample correlation would be Answer: (ii)
(i) closer to -1 (ii) closer to 0 (iii) closer to 1 (iv) unchanged
(j.) What is the sample covariance between SMB and HML?
\(s_{SMB,HML} = s_{SMB}\, s_{HML}\, r_{SMB,HML} = (3.27)(3.085)(-0.42) \approx -4.24\)
Suppose that we estimate the following linear regression model:
\(SMB_i = \alpha + \beta\, HML_i + \varepsilon_i\)
(k.) The R-squared of this regression will be approximately Answer: (i)
(i.) 0.17 (ii.) 0.42 (iii.) 0.73 (iv.) 0.85
NOTE: Remember that for simple linear regression, the R-squared equals the squared correlation: \((-0.42)^2 = 0.1764\), closest to 0.17.
(l.) What is our estimate of the slope, b?
NOTE: You have to know the formula for the slope coefficient from class.
\(b = \frac{s_{SMB,HML}}{s^2_{HML}} = \frac{-4.24}{9.52} \approx -0.445\)
(m.) Suppose we know that the HML return next month will be 2%, and we use the regression above to construct a 95% plug-in predictive interval for next month's SMB return. Such an interval implicitly assumes that Answer: (ii)
(i.) HML is i.i.d. normal (ii.) ε is i.i.d. normal (iii.) both (i) and (ii) (iv.) none of the above
NOTE: To compute the predictive interval for y, we don't need to make assumptions about the distribution of the right-hand-side variable x, but we do need to make an assumption about the distribution of the errors ε.
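The arithmetic in parts (j) through (l) can be checked with a short Python sketch. It uses only the summary statistics quoted above; the negative sign of the correlation is an assumption inferred from the factor (-0.42) in part (j), since the answer choices for part (h) are not legible in this transcription.

```python
# Summary statistics from Question 1 (returns in percent).
s_smb = 3.27   # sample SD of SMB
s_hml = 3.085  # sample SD of HML
r = -0.42      # sample correlation (sign inferred from part (j))

# (j) covariance from the correlation: s_xy = s_x * s_y * r
cov_smb_hml = s_smb * s_hml * r

# (k) in simple linear regression, R-squared is the squared correlation
r_squared = r ** 2

# (l) least-squares slope of SMB on HML: b = s_xy / s_x^2
b = cov_smb_hml / s_hml ** 2

print(f"cov = {cov_smb_hml:.2f}, R^2 = {r_squared:.4f}, b = {b:.3f}")
```

The slope can equivalently be written as \(b = r\, s_{SMB} / s_{HML}\), which gives the same value.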
Question # 2. Multiple choice. For each question, choose one answer.
(a.) Fill in the blanks in the following phrase, in order: Answer: (iv)
Statistical methods draw conclusions about unknown ______ based on ______ computed from ______.
(i.) parameters, samples, a statistic. (ii.) samples, statistics, a parameter. (iii.) statistics, parameters, a sample. (iv.) parameters, statistics, a sample. (v.) samples, parameters, a statistic.
(b.) Suppose that most of the observations in a given data set are of the same magnitude, except for a few data points that are substantially larger. Which of the following would be true? Answer: (ii)
(i.) The sample mean would be smaller than the median, and the histogram would be skewed with a long right tail.
(ii.) The sample mean would be larger than the median, and the histogram would be skewed with a long right tail.
(iii.) The sample mean would be smaller than the median, and the histogram would be skewed with a long left tail.
(iv.) The sample mean would be larger than the median, and the histogram would be skewed with a long left tail.
(v.) The sample mean and median would be approximately the same, and the histogram would be roughly symmetric.
NOTE: We saw an example of this in the bank arrival time data. Large outliers affect the mean more than the median, and the histogram had a pronounced right tail.
(c.) An achievement test is given each year to 3rd graders in a certain school district. Scores on the test are normally distributed with a mean of 100 points and a standard deviation of 15 points. If Jane's z-score was 1.2, how many points did she score on the test? Answer: (v)
(i.) 82 (ii.) 88 (iii.) 100 (iv.) 112 (v.) 118
(d.) The Central Limit Theorem implies that: Answer: (v)
(i.) If we simulate 5,000 i.i.d. draws from any probability distribution, the histogram will appear bell-shaped.
(ii.) If we simulate 5,000 i.i.d. draws from any probability distribution, the time series plot should not display any obvious patterns.
(iii.) The average of 5,000 i.i.d. draws from any probability distribution should exactly equal the population mean.
(iv.) If our sample consists of 5,000 i.i.d. draws from any probability distribution, the data points will be approximately normally distributed around the sample mean.
(v.) If we start with 5,000 i.i.d. draws from any probability distribution and let \(\bar{x}_1\) be the average of the first 50 data points, \(\bar{x}_2\) be the average of the 51st through 100th data points, \(\bar{x}_3\) be the average of the 101st through 150th data points, etc., then a histogram of the numbers \(\bar{x}_1\) through \(\bar{x}_{100}\) should appear bell-shaped.
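Option (v) can be illustrated with a small simulation. This is a sketch, not part of the exam: the exponential distribution (rate 1) is an assumed stand-in for "any probability distribution", chosen because it is strongly right-skewed, and the seed is arbitrary.

```python
import random
import statistics

random.seed(1)  # arbitrary fixed seed so the sketch is reproducible

# 5,000 i.i.d. draws from a right-skewed distribution (assumed: exponential, rate 1)
draws = [random.expovariate(1.0) for _ in range(5000)]

# Option (v): average non-overlapping blocks of 50 draws,
# giving x-bar_1 (draws 1-50), x-bar_2 (draws 51-100), ..., x-bar_100
block = 50
batch_means = [statistics.fmean(draws[i:i + block]) for i in range(0, len(draws), block)]

# The raw draws are skewed, but the 100 batch means cluster around the
# population mean 1.0 with SD close to 1/sqrt(50), roughly 0.14
print(len(batch_means))
print(round(statistics.fmean(batch_means), 2), round(statistics.stdev(batch_means), 2))
```

A histogram of `batch_means` would look roughly bell-shaped, while a histogram of `draws` would not, which is exactly the distinction between options (i) and (v).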
(e.) You obtain a sample of 25 students from the same high school. Based on these data, a 95% confidence interval for the expected value of a student's SAT score is 900 to
Which of the following is a valid interpretation of this interval? Answer: (iv)
(i.) 95% of the 25 students in the sample have an SAT score between 900 and
(ii.) 95% of the population of students at this high school will have an SAT score between 900 and
(iii.) Given the outcomes in this sample, there is a 95% probability that the true expected value of SAT scores is between 900 and
(iv.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the resulting intervals would contain the true expected value of a student's SAT score.
(v.) If all high schools were the same and we repeated this procedure at many other schools, 95% of the sample means would be between 900 and
NOTE: Before we see the data, the 95% confidence interval has a 95% probability of covering the true (population) value of the parameter. For a particular sample (after we have seen the data), the true value is either in the interval or it is not. Answer (iii) may seem correct, but it is technically wrong.
Question # 3. The following joint probability distribution is based on survey data collected by a major financial publication in
For a randomly selected person living in the U.S., define the random variable S as the percentage of retirement income invested in the stock market. Define the random variable A as
A = 1 if the person is below 30 years of age
A = 2 if the person is between 30 and 50 years old
A = 3 if the person is above 50 years old
Based on the survey, we have come up with the following joint probability distribution for S and A:
[Table: joint probability distribution of S (columns 10%, 30%, 60%) and A (rows 1, 2, 3)]
(a.) What is the marginal probability that A = 3?
\(P(A = 3) = P(A = 3, S = 10\%) + P(A = 3, S = 30\%) + P(A = 3, S = 60\%) = 0.43\)
(b.) What is the expected value of S?
First, we need the marginal distribution of S:
s      p(s)
10%    0.19
30%    0.54
60%    0.27
\(E[S] = 0.19(10) + 0.54(30) + 0.27(60) = 34.3\%\)
(c.) What is the standard deviation of S?
\(V[S] = 0.19(10 - 34.3)^2 + 0.54(30 - 34.3)^2 + 0.27(60 - 34.3)^2 \approx 300.5\)
This implies that the standard deviation is \(SD(S) = \sqrt{300.5} \approx 17.3\%\).
(d.) What is the probability that a randomly selected investor is below 50 years of age and has 30% or more of his retirement savings invested in the stock market?
\(P(A < 3, S \ge 30\%) = P(A = 1, S = 30\%) + P(A = 1, S = 60\%) + P(A = 2, S = 30\%) + P(A = 2, S = 60\%) = 0.48\)
(e.) Suppose we know a particular investor has only 10% of her retirement savings invested in the stock market. What is the probability she is over 50 years old?
\(P(A = 3 \mid S = 10\%) = \frac{P(A = 3, S = 10\%)}{P(S = 10\%)} = \frac{0.10}{0.19} \approx 0.526\)
(f.) Are A and S independent? Briefly justify your answer.
From part (e), \(P(A = 3 \mid S = 10\%) = 0.526\), while the marginal probability of being over 50 is \(P(A = 3) = 0.43\). Since these differ, the random variables A and S are not independent.
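A short Python check of parts (b), (c), and (e). The probability 0.19 for S = 10% is from the solution above; the marginal probabilities 0.54 (S = 30%) and 0.27 (S = 60%) and the joint probability P(A = 3, S = 10%) = 0.10 are assumptions, back-solved so that the stated answers E[S] = 34.3%, SD(S) = 17.3%, and 0.526 are reproduced.

```python
import math

# Marginal distribution of S (percent of retirement income in stocks).
# 0.19 is from the exam; 0.54 and 0.27 are back-solved assumptions.
p_s = {10: 0.19, 30: 0.54, 60: 0.27}

# (b) expected value and (c) variance / standard deviation
mean_s = sum(s * p for s, p in p_s.items())
var_s = sum(p * (s - mean_s) ** 2 for s, p in p_s.items())
sd_s = math.sqrt(var_s)

# (e) P(A = 3 | S = 10%) = P(A = 3, S = 10%) / P(S = 10%)
p_a3_and_s10 = 0.10  # assumed joint probability, back-solved from 0.526
p_a3_given_s10 = p_a3_and_s10 / p_s[10]

print(round(mean_s, 1), round(sd_s, 1), round(p_a3_given_s10, 3))
```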
Question # 4. Suppose that, starting next Monday, I go to a casino every night for a week and play 125 hands of blackjack. Suppose that I bet $10 per hand, so on each hand I will either lose $10, push, win $10, or double down and win $20 (assume that house rules prohibit me from doubling down or splitting more than once). Suppose that my winnings on each hand, i = 1, 2, ..., 125, are random variables \(W_i\) with
\(E[W_i] = \$0.10, \quad \sigma(W_i) = \$12.30\)
(a.) Suppose I look at my average winnings per hand on a given night,
\(\bar{W} = \frac{W_1 + W_2 + \cdots + W_{125}}{125}\),
where the hands \(W_i\) are i.i.d. What are the expected value and variance of \(\bar{W}\)?
We can use our linear formulas from Lecture #4 to show that:
\(E[\bar{W}] = 0.10, \quad V[\bar{W}] = \frac{(12.3)^2}{125} \approx 1.21\)
(b.) Now suppose I do this every night for a month (30 days). Assuming I play 125 hands each night, on approximately how many nights will I average a loss of $1 or more per hand?
By the Central Limit Theorem, \(\bar{W}\) is approximately \(N(0.10, 1.21)\). For a normal distribution with µ = 0.1 and σ = 1.1, \(P(\bar{W} < -1) \approx 0.16\), because -1 is 1 standard deviation to the left of the mean. Consequently, I average a loss of $1 or more on about 0.16 × 30 = 4.8 nights.
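The calculations in parts (a) and (b) can be sketched with the standard-library `statistics.NormalDist`. As in part (b), the sketch takes E[W_i] = +$0.10, so that a $1 average loss sits one standard deviation below the mean; it computes the normal tail directly instead of using the empirical-rule value 0.16.

```python
import math
from statistics import NormalDist

mu_hand = 0.10      # E[W_i]: expected winnings per hand, in dollars
sigma_hand = 12.30  # sigma(W_i): SD of winnings per hand, in dollars
n_hands = 125       # hands played per night
n_nights = 30       # nights in the month

# (a) mean and variance of the nightly average W-bar (i.i.d. hands)
mean_avg = mu_hand
var_avg = sigma_hand ** 2 / n_hands  # about 1.21
sd_avg = math.sqrt(var_avg)          # about 1.10

# (b) CLT: W-bar is approximately normal; an average loss of $1 or more means W-bar <= -1
p_loss = NormalDist(mu=mean_avg, sigma=sd_avg).cdf(-1.0)
expected_nights = p_loss * n_nights

print(f"V[W-bar] = {var_avg:.2f}, P(W-bar <= -1) = {p_loss:.3f}, nights = {expected_nights:.1f}")
```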
(c.) Your answer to part (b) implicitly makes use of the Central Limit Theorem. Which of the following correctly justifies your use of the CLT? Answer: (ii)
(i.) Outcomes for each hand are very nearly normally distributed.
(ii.) Outcomes for each hand are i.i.d. and I am playing a large number of hands per night.
(iii.) Outcomes for each hand are i.i.d. and I am playing for a sufficiently large sample of days.
(iv.) Outcomes for each night are i.i.d. and I am averaging over a large sample of nightly outcomes.
NOTE: We are taking an average over the hands played each night, which are assumed to be i.i.d.
Question # 5. First People's Bank (FPB) has most of its commercial loan department working with small business clients. The bank's managers consider this their most important growth area and several years ago hired a consulting team to improve two aspects of their loan process. In particular, they want to decrease the default rate on the loans (that is, the proportion of loans for which the borrower is unable to make payments). They also want to improve customer service by decreasing the time it takes to process loan applications.
Historically (prior to the consultants being hired), management has found that the number of business days required to process a small business loan application is i.i.d. normal with a mean of 14 and a variance of 4.
(a.) Before the consultants were hired, approximately what percentage of loan applications were processed in 10 days or less?
Use the normal distribution with µ = 14 and σ = 2. \(P(X < 10) \approx 2.5\%\), because 10 is 2 standard deviations to the left of the mean.
(b.) The consulting team identified and implemented a number of measures to speed up the application process. Management has reviewed a sample of 25 loan applications processed after these measures were implemented. The average processing time in the sample was 11.2 days, and the sample standard deviation of processing times is 2.0 days. Were the consultants' measures effective? Formulate an appropriate hypothesis test or confidence interval and state your conclusions.
The null hypothesis is \(H_0: \mu = 14\), against the alternative \(\mu < 14\). The null is like saying that the processing time is the same as it was before the consultants were hired.
\(z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{11.2 - 14}{2/\sqrt{25}} = -7\)
We reject the null hypothesis and conclude that the measures were effective.
(c.) If we treat the estimates in part (b) (mean of 11.2 and standard deviation of 2.0) as if they were the actual mean and standard deviation, approximately what percentage of loans will be processed in 10 days or less? Answer: (iii)
(i.) 5.3% (ii.) 15.9% (iii.) 27.4% (iv.) 51.1%
NOTE: 10 is less than 1 standard deviation below the mean, so P(X < 10) is bigger than 16% and less than 50%.
Historically (prior to the consultants being hired), 15% of FPB's small business loans resulted in default. The consulting team trained FPB's analysts to use software designed to reduce the default rate by more effectively identifying high-risk businesses that are more likely to default.
(d.) In a typical year, FPB grants 120 loans to small businesses. Assume that defaults are i.i.d. events; that is, if two firms are granted loans, whether the first firm defaults is independent of whether the second firm defaults. Let Y be the number of loans granted in a typical year that will eventually end up in default. What is the distribution of Y?
Y ~ Binomial(120, 0.15). We are looking at n = 120 i.i.d. Bernoulli outcomes, each with p = 0.15.
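Parts (a) through (c) of Question 5 can be reproduced with `statistics.NormalDist`. This sketch treats the sample standard deviation 2.0 as σ in the z statistic, as the solution does; the normal tail in part (a) comes out near 2.3%, slightly below the empirical-rule figure of 2.5%.

```python
import math
from statistics import NormalDist

# (a) before the consultants: processing time ~ N(14, variance 4), so sigma = 2
before = NormalDist(mu=14, sigma=2)
p_before = before.cdf(10)  # normal tail, about 0.023

# (b) z statistic for H0: mu = 14, using n = 25, x-bar = 11.2, s = 2.0
xbar, s, n, mu0 = 11.2, 2.0, 25, 14
z = (xbar - mu0) / (s / math.sqrt(n))  # about -7

# (c) treat the sample estimates as the true mean and SD
after = NormalDist(mu=11.2, sigma=2.0)
p_after = after.cdf(10)  # about 0.274

print(round(p_before, 4), round(z, 2), round(p_after, 3))
```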
(e.) Give an interval that is 95% likely to contain the number of loans granted in a typical year which will eventually end up in default.
E[Y] = np = 120(0.15) = 18
V[Y] = np(1 - p) = 120(0.15)(0.85) = 15.3
A 95% interval is \(18 \pm 2\sqrt{15.3} \approx 18 \pm 7.8\), which is approximately (10, 26).
(f.) Looking at a sample of 100 loans granted after FPB's analysts started using the new software, management finds that 7 of those loans ended up in default. Was the new software effective in reducing defaults? Formulate an appropriate hypothesis test and state your conclusions.
An appropriate null hypothesis is \(H_0: p = 0.15\), against the alternative p < 0.15.
\(z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.07 - 0.15}{\sqrt{0.15(0.85)/100}} \approx -2.24\)
We would reject the null hypothesis at the 5% level and conclude that the software is effective.
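A sketch of parts (e) and (f). The mean ± 2 SD interval follows the solution; the exact binomial probability of landing in 10 through 26 is an extra check added here, computed with `math.comb`, and is not part of the original answer.

```python
import math

n, p = 120, 0.15

# (e) mean, variance, and a mean +/- 2 SD interval for Y ~ Binomial(120, 0.15)
mean_y = n * p           # 18
var_y = n * p * (1 - p)  # 15.3
half_width = 2 * math.sqrt(var_y)
interval = (mean_y - half_width, mean_y + half_width)

# Extra check (not in the exam): exact binomial probability that 10 <= Y <= 26
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(10, 27))

# (f) z test of H0: p = 0.15 using 7 defaults out of 100 loans
p_hat, p0, m = 7 / 100, 0.15, 100
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / m)

print(f"interval = ({interval[0]:.1f}, {interval[1]:.1f}), exact = {exact:.3f}, z = {z:.2f}")
```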
In reality, defaults on small business loans are probably not independent. One reason is that a broad economic downturn can cause many small businesses to default in a relatively short time period. Because of this, defaults may be positively correlated across firms. That is, if we look at a sample of n loans given in the same year, and let \(X_i = 1\) if loan i ends up in default and 0 otherwise for i = 1, 2, 3, ..., n, we now assume that:
\(\text{Cov}(X_i, X_j) > 0\) for any loans \(i \ne j\)
(g.) [3 points] Let n = 120 and again define \(Y = X_1 + X_2 + \cdots + X_{120}\) as the number of loans given in a particular year that end up in default. Suppose we still believe that any single loan has a 15% chance of ending up in default. This is the same random variable we considered in parts (d) and (e), except that there we assumed the individual loan defaults were i.i.d., and now we are assuming \(\text{Cov}(X_i, X_j) > 0\). How does this affect the expected value of Y? How does it affect the variance of Y? Briefly explain.
The expected value of a sum of random variables is always the sum of the expected values, so E[Y] = 120(0.15) = 18 is unaffected. However, the variance is affected. In general,
\(V[Y] = V[X_1] + V[X_2] + \cdots + V[X_{120}] + 2\,[\text{Cov}(X_1, X_2) + \cdots + \text{Cov}(X_{119}, X_{120})]\)
Since the covariances are positive, the variance of Y is larger than the value 15.3 we computed under independence.
NOTE: When discussing the effect on the variance, it would be fine to just state the case for n = 2, i.e. \(V[Y] = V[X_1] + V[X_2] + 2\,\text{Cov}(X_1, X_2)\).
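To see how positive covariance inflates V[Y], here is a sketch that assumes every pair of loans has the same covariance c (an equicorrelation assumption not made in the exam). The value c = 0.005 is hypothetical, chosen only for illustration; it corresponds to a correlation of about 0.04 between any two loans.

```python
def var_of_sum(n: int, var_each: float, cov_pair: float) -> float:
    """Variance of Y = X_1 + ... + X_n when every X_i has the same variance
    and every pair (i != j) has the same covariance (equicorrelation assumption)."""
    n_pairs = n * (n - 1) // 2  # number of distinct pairs (i, j) with i < j
    return n * var_each + 2 * n_pairs * cov_pair

n, p = 120, 0.15
var_bernoulli = p * (1 - p)  # 0.1275, variance of one default indicator

independent = var_of_sum(n, var_bernoulli, 0.0)   # 15.3, as in part (e)
correlated = var_of_sum(n, var_bernoulli, 0.005)  # hypothetical covariance c = 0.005

print(round(independent, 2), round(correlated, 2))
```

Even this small covariance adds 2 × 7,140 × 0.005 = 71.4 to the variance, because there are 7,140 distinct pairs among 120 loans.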
Question # 6. In this problem we estimate the market model using returns on an asset, GE, and returns on the S&P 500. The sample size is n = 254.
GE: returns on General Electric stock
Market: the market portfolio (the S&P 500)
I took monthly returns on each asset and ran the following regression:
\(GE = \alpha + \beta\, Market + \varepsilon\)
Some of the results from running this regression in StatPro are reported here:
[Table: ANOVA table (columns Source, df, SS, MS, F, p-value; rows Explained, Unexplained) and regression coefficients (columns Coefficient, Std Err; rows Constant, SP)]
(a.) Give a 95% confidence interval for β, the coefficient on Market.
\(b \pm 2 s_b\) = (1.1123, )
(b.) Test the null hypothesis that the Market is not related to GE (β = 0) at the 5% level.
\(t = \frac{b - \beta_0}{s_b}\)
We would reject the null hypothesis at the 5% level.
(c.) What is the standard deviation of the residuals, \(s_e\)?
\(s_e = \sqrt{\frac{\text{unexplained sum of squares}}{n - 2}} = \sqrt{\frac{\text{unexplained sum of squares}}{252}}\)
(d.) What is the sample correlation between the fitted values and the residuals?
The sample correlation between the fitted values \(\hat{y}\) and the residuals e is zero. This is one of the major properties of the residuals and is a result of using least squares.
Suppose returns for the Market next month are given by:
[Table: columns SP500, Fitted Values, Residuals; one row dated 4/1/, with the residual entered as ??]
(e.) [3 points] Construct a 95% plug-in predictive interval for GE for next month.
A 95% plug-in predictive interval is \((a + b x - 2 s_e,\; a + b x + 2 s_e)\).
(f.) In the table above, what is the fitted value? Can you calculate the residual?
The fitted value is \(a + b x\).
No. You cannot calculate the residual because you have not yet observed the value of GE for next month.
(g.) Test the null hypothesis \(H_0: \alpha = \) at the 5% level.
\(t = \frac{a - \alpha_0}{s_a}\)
We would clearly reject this null hypothesis at the 5% level.
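The formulas used in Question 6 can be collected into small Python helpers. Because the StatPro output is not legible in this transcription, the inputs in the usage lines are hypothetical numbers for illustration, not the exam's estimates; the function names are likewise made up for this sketch.

```python
import math

def slope_ci(b: float, s_b: float) -> tuple[float, float]:
    """Approximate 95% confidence interval b +/- 2 * s_b."""
    return (b - 2 * s_b, b + 2 * s_b)

def t_stat(estimate: float, null_value: float, std_err: float) -> float:
    """t = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_err

def residual_sd(unexplained_ss: float, n: int) -> float:
    """s_e = sqrt(unexplained SS / (n - 2)) for a simple regression."""
    return math.sqrt(unexplained_ss / (n - 2))

def plug_in_interval(a: float, b: float, x: float, s_e: float) -> tuple[float, float]:
    """95% plug-in predictive interval: fitted value a + b*x, plus or minus 2*s_e."""
    fitted = a + b * x
    return (fitted - 2 * s_e, fitted + 2 * s_e)

# Hypothetical inputs for illustration only (not the exam's StatPro output)
print(slope_ci(1.2, 0.05))
print(t_stat(1.2, 0.0, 0.05))
print(residual_sd(252.0, 254))
print(plug_in_interval(0.5, 1.2, 2.0, 1.0))
```

The plug-in interval ignores estimation error in a and b, which is why the exam calls it a plug-in interval.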
Question # 7. When coded messages are received, there are sometimes errors in transmission, creating uncertainty about the message that was actually sent. In particular, Morse code uses dots and dashes as a way to encode messages: each letter of the alphabet and each number is given a special sequence of dots and dashes. Let the random variable S = 1 if a dot is sent and S = 0 if a dash is sent. Define the random variable R = 1 if a dot is received and R = 0 if a dash is received. Dots and dashes are known to occur in the proportion 3:4. This means that \(P(S = 1) = 3/7\) and \(P(S = 0) = 4/7\). Suppose there is interference on the transmission line, and with probability 1/8 a dot is mistakenly received as a dash, and vice versa.
(a.) What is the probability that a dot was received given a dot was sent, \(P(R = 1 \mid S = 1)\)?
There is a 1/8 chance of a mistake, which makes the probability of getting it right equal to \(P(R = 1 \mid S = 1) = 7/8\).
(b.) What is the marginal probability that a dot is received, \(P(R = 1)\)?
The marginal probability is the sum of the two joint probabilities:
\(P(R = 1) = P(R = 1 \mid S = 1)P(S = 1) + P(R = 1 \mid S = 0)P(S = 0) = \frac{7}{8}\cdot\frac{3}{7} + \frac{1}{8}\cdot\frac{4}{7} = \frac{25}{56} \approx 0.446\)
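(c.) If we receive a dot, can we be sure that a dot was sent? Calculate the probability of a dot being sent given that a dot was received.
We use Bayes' rule:
\(P(S = 1 \mid R = 1) = \frac{P(R = 1 \mid S = 1)P(S = 1)}{P(R = 1)} = \frac{(7/8)(3/7)}{25/56} = \frac{21}{25} = 0.84\)
So we cannot be sure: even when a dot is received, there is a 16% chance that a dash was sent.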
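Exact fractions make the Bayes' rule calculation in Question 7 easy to verify. This sketch uses Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_dot_sent = Fraction(3, 7)   # P(S = 1)
p_dash_sent = Fraction(4, 7)  # P(S = 0)
p_flip = Fraction(1, 8)       # chance a symbol is received as the other symbol

p_dot_recv_given_dot = 1 - p_flip  # (a) P(R = 1 | S = 1) = 7/8
p_dot_recv_given_dash = p_flip     # P(R = 1 | S = 0) = 1/8

# (b) law of total probability: sum of the two joint probabilities
p_dot_recv = p_dot_recv_given_dot * p_dot_sent + p_dot_recv_given_dash * p_dash_sent

# (c) Bayes' rule
p_dot_sent_given_recv = p_dot_recv_given_dot * p_dot_sent / p_dot_recv

print(p_dot_recv, p_dot_sent_given_recv)  # prints: 25/56 21/25
```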
Question # 8. Suppose I toss two six-sided dice.
[Figure: two six-sided dice]
Let the random variable \(X_1\) be the number shown on the first die. Let the random variable \(X_2\) be the number shown on the second die. The possible outcomes for each of \(X_1\) and \(X_2\) are 1, 2, 3, 4, 5, or 6. Each of the six outcomes is equally likely. Assume that \(X_1\) and \(X_2\) are independent. If it helps you visualize, the joint distribution of \(X_1\) and \(X_2\) looks like this (rows are values of \(X_1\), columns are values of \(X_2\)):
          X2=1   X2=2   X2=3   X2=4   X2=5   X2=6
X1=1      1/36   1/36   1/36   1/36   1/36   1/36
X1=2      1/36   1/36   1/36   1/36   1/36   1/36
X1=3      1/36   1/36   1/36   1/36   1/36   1/36
X1=4      1/36   1/36   1/36   1/36   1/36   1/36
X1=5      1/36   1/36   1/36   1/36   1/36   1/36
X1=6      1/36   1/36   1/36   1/36   1/36   1/36
(a.) What is \(P(X_1 > 3)\)?
Using the marginal distribution of \(X_1\), we get \(P(X_1 > 3) = P(X_1 = 4) + P(X_1 = 5) + P(X_1 = 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}\)
(b.) Given that \(X_1 + X_2 = 10\), what is the probability that \(X_1 = 5\) and \(X_2 = 5\)?
There are 3 ways for the sum to equal 10: (4,6), (5,5), (6,4), and these are all equally likely. Therefore, the probability is 1/3.
(c.) Given that \(X_1 = 5\), what is the expected value of \(X_1 + X_2\)? (In other words, suppose the first die shows a 5. What is the expected value of the sum of the two die rolls?)
\(E[X_2] = (1)(1/6) + (2)(1/6) + (3)(1/6) + (4)(1/6) + (5)(1/6) + (6)(1/6) = 3.5\)
We know that \(X_1 = 5\), and \(X_2\) is independent of \(X_1\). Therefore, we get:
\(E[X_1 + X_2 \mid X_1 = 5] = E[5 + X_2] = 5 + E[X_2] = 8.5\)
(d.) The popular dice game craps begins with a person (the "shooter") rolling two dice. If the sum of the two dice \((X_1 + X_2)\) equals 7 or 11, the shooter is said to have rolled a "natural" and automatically wins. If the sum is 2, 3, or 12, the shooter is said to "crap out" and automatically loses. What is the probability that \(X_1 + X_2\) equals 2, 3, or 12?
\(P(X_1 = 1, X_2 = 1) + P(X_1 = 1, X_2 = 2) + P(X_1 = 2, X_2 = 1) + P(X_1 = 6, X_2 = 6) = \frac{4}{36} = \frac{1}{9}\)
Let's suppose the first time the two dice are rolled, the total is ten \((X_1 + X_2 = 10)\). In this case, 10 becomes the "point". The shooter then continues rolling the two dice over and over again (both dice are always thrown at the same time). Each time the two dice are thrown, one of the following three things happens:
If the total is 10, the game ends and the shooter wins.
If the total is 7, the game ends and the shooter loses.
Otherwise, the game continues, and the shooter rolls again.
Theoretically this could continue forever!
(e.) Each time the two dice are rolled, what is the probability the game continues?
The game ends if either a 7 or a 10 is rolled, so we need these probabilities. There are six ways to roll a 7 and three ways to roll a 10:
\(P(X_1 + X_2 = 7 \text{ or } X_1 + X_2 = 10) = \frac{6}{36} + \frac{3}{36} = \frac{9}{36} = \frac{1}{4}\)
The probability that the game continues is \(1 - \frac{1}{4} = \frac{3}{4}\), or 0.75.
(f.) Suppose we know the game is going to end on the next roll. What is the probability the shooter wins?
Similar to part (b), if we know we're going to get a 7 or a 10, there are 9 equally likely possibilities, 3 of which result in a win. Therefore, the probability is 3/9 = 1/3.
(g.) Starting with i = 2, let \(U_i = 1\) if the game ends on the ith roll and 0 otherwise. What is the probability distribution of \(U_i\)? (Hint: use your answer to (e).)
\(U_i \sim \text{Bernoulli}(0.25)\)
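Parts (a) through (f) of Question 8 can all be checked by enumerating the 36 equally likely outcomes. The helper functions `prob` and `cond_prob` below are hypothetical names introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) for two independent fair dice
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, 36)

def prob(event):
    """Probability of an event, given as a predicate on (x1, x2)."""
    return sum(p_each for o in outcomes if event(o))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

p_a = prob(lambda o: o[0] > 3)                                        # (a)
p_b = cond_prob(lambda o: o == (5, 5), lambda o: sum(o) == 10)        # (b)
given_5 = [o for o in outcomes if o[0] == 5]
e_c = sum(Fraction(x1 + x2) for x1, x2 in given_5) / len(given_5)     # (c)
p_d = prob(lambda o: sum(o) in (2, 3, 12))                            # (d)
p_e = 1 - prob(lambda o: sum(o) in (7, 10))                           # (e)
p_f = cond_prob(lambda o: sum(o) == 10, lambda o: sum(o) in (7, 10))  # (f)

print(p_a, p_b, e_c, p_d, p_e, p_f)  # prints: 1/2 1/3 17/2 1/9 3/4 1/3
```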
Assume that the \(U_i\) are i.i.d. Let R be a random variable equal to the number of rolls before the game ends. R can be any positive integer (1, 2, 3, ...). We assumed the first roll was 10. If the game ends on the second roll \((U_2 = 1)\), then R = 1. If the game ends on the third roll (that is, \(U_2 = 0\) and \(U_3 = 1\)), then R = 2. If the game ends on the fourth roll (that is, \(U_2 = 0\), \(U_3 = 0\), and \(U_4 = 1\)), then R = 3, etc. As an interesting side note, the probability distribution of the random variable R is known as the geometric distribution.
(h.) What is the probability that R = 3, i.e. the craps game ends after four rolls?
\(P(R = 3) = P(U_2 = 0, U_3 = 0, U_4 = 1) = P(U_2 = 0)P(U_3 = 0)P(U_4 = 1) = \frac{3}{4}\cdot\frac{3}{4}\cdot\frac{1}{4} = \frac{9}{64}\)
Be careful! Actual rules for craps can differ from what we've assumed here (e.g., sometimes a 12 will end the game as well as a 7). In casinos, betting the "pass line" is equivalent to betting that the shooter wins as we defined it here. After the point is established, you can then "take odds", which here would mean betting that a 10 will be rolled before a 7. The interesting thing is that the odds bet is actually a fair bet (if the point is 10, it pays 2-to-1), i.e. there is no house advantage! Because of this, many casinos limit odds bets to 67 times your bet on the pass line.
(i.) What is the probability that R > 3, i.e. the craps game lasts longer than four rolls?
Here, you must recognize that this is 1 minus the probability of R being less than or equal to 3:
\(P(R > 3) = 1 - P(R = 1) - P(R = 2) - P(R = 3) = 1 - \frac{16}{64} - \frac{12}{64} - \frac{9}{64} = \frac{27}{64} \approx 0.42\)
We could go on from here, and you'd see that, despite it being possible for craps to continue forever, there's about a 90% probability that the game ends within 8 rolls after the point is established: \(P(R \le 8) = 1 - (3/4)^8 \approx 0.90\).
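The geometric probabilities in parts (h) and (i), and the roughly 90% figure, can be verified with exact fractions. This sketch assumes, as the question does, that each roll after the point ends the game with probability 1/4, independently of earlier rolls.

```python
from fractions import Fraction

p_end = Fraction(1, 4)  # P(U_i = 1): a 7 or a 10 ends the game on any given roll

def p_r_equals(k: int) -> Fraction:
    """P(R = k): k - 1 rolls that continue the game, then one roll that ends it."""
    return (1 - p_end) ** (k - 1) * p_end

p_r3 = p_r_equals(3)                                   # (h) 9/64
p_r_gt3 = 1 - sum(p_r_equals(k) for k in range(1, 4))  # (i) 27/64
p_r_le8 = 1 - (1 - p_end) ** 8                         # P(R <= 8), about 0.90

print(p_r3, p_r_gt3, float(p_r_le8))
```

Note that \(P(R > 3) = (3/4)^3\): the game lasts beyond four rolls exactly when the three rolls after the point all continue it.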
Question # 9. Suppose we are working with 0-1 data (i.e., a dummy variable), and as usual we assume that \(X_i \sim \text{Bernoulli}(p)\) i.i.d. We are going to look at a sample of size n and use the sample proportion of ones, \(\hat{p}\), as an estimator of p. Recall that for dummy variables, the sample proportion is an average:
\(\hat{p} = \frac{X_1 + X_2 + \cdots + X_n}{n}\)
(a.) Suppose that n is large enough for us to use the Central Limit Theorem. What is the sampling distribution of \(\hat{p}\)? (HINT: Your answer should depend on the unknown parameter p.)
The sampling distribution is \(\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)\).
Now suppose we want to build a confidence interval for p, but we run into two issues. First, we have a sample of only n = 10 observations. Second, our actual data are {1, 1, 1, 1, 1, 1, 1, 1, 1, 1}. All ten observations in the sample are equal to one, which means that \(\hat{p} = 1\)!
(b.) Give a 90% confidence interval for p using your answer to part (a). (NOTE: The appropriate critical value here is 1.64, but it doesn't matter; you still get a very silly answer!)
Plugging \(\hat{p} = 1\) into the standard error gives \(\sqrt{\hat{p}(1 - \hat{p})/n} = 0\), so the confidence interval is \(1 \pm 1.64 \times 0 = (1, 1)\). This is pretty obviously messed up... we are in no way absolutely certain that the true value of p is equal to 1 based on a sample of n = 10 observations!
Because n = 10 is a relatively small sample size and our data are highly non-normal, we should probably not rely on the Central Limit Theorem here. However, we can actually build a 90% confidence interval for p without using the CLT. Remember that p is a probability, so it must be somewhere between 0 and 1.
(c.) Suppose we knew the true value of p. Without using the CLT, we can find the sampling distribution of \(\hat{p}\) by recognizing that \(\hat{p} = Y/n\), where Y is a random variable. What is the probability distribution of Y?
The random variable Y is the sum of n = 10 i.i.d. Bernoulli random variables. Therefore, Y ~ Binomial(n, p), i.e. Binomial(10, p).
(d.) Suppose that p = 0.9. What is \(P(\hat{p} = 1)\) in a sample of size n = 10? Should our 90% confidence interval include p = 0.9? (i.e., is 0.9 a reasonable value of p?)
With p = 0.9, \(P(\hat{p} = 1) = (0.9)^{10} \approx 0.349\). Our confidence interval SHOULD include p = 0.9. With n = 10 observations and p = 0.9, it is definitely possible (there is a 35% chance) that we would see \(\hat{p} = 1\).
(e.) Suppose that p = 0.7. What is \(P(\hat{p} = 1)\) in a sample of size n = 10? Should our 90% confidence interval include p = 0.7?
With p = 0.7, \(P(\hat{p} = 1) = (0.7)^{10} \approx 0.028\). Our confidence interval should probably NOT include p = 0.7. With n = 10 observations and p = 0.7, it is pretty unlikely (there is only about a 3% chance) that we would see \(\hat{p} = 1\).
(f.) Based on the sample of n = 10 observations above, give a 90% confidence interval for p without using the Central Limit Theorem.
Our confidence interval should obviously include p = 1. What is the smallest value of p we'd call reasonable? For a 90% CI, we'd rule out any p for which \(P(\hat{p} = 1) < 0.10\). Solving \(p^{10} = 0.1\) for p, we get \(p = (0.1)^{1/10} \approx 0.794\). The exact 90% confidence interval is (0.794, 1).
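A Python sketch of parts (d) through (f). It relies only on the fact used above, that \(P(\hat{p} = 1) = p^{10}\), and inverts it for the lower confidence limit.

```python
n = 10
alpha = 0.10  # 90% interval: rule out p when P(p-hat = 1) < 0.10

# P(p-hat = 1) = P(all n Bernoulli draws equal 1) = p ** n
p_all_ones = {p: p ** n for p in (0.9, 0.7)}

# Smallest "reasonable" p solves p ** n = alpha
lower = alpha ** (1 / n)

for p, prob in p_all_ones.items():
    print(f"p = {p}: P(p-hat = 1) = {prob:.3f}")
print(f"exact 90% interval: ({lower:.3f}, 1)")
```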
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationProblem sets for BUEC 333 Part 1: Probability and Statistics
Problem sets for BUEC 333 Part 1: Probability and Statistics I will indicate the relevant exercises for each week at the end of the Wednesday lecture. Numbered exercises are backofchapter exercises from
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGrawHill/Irwin, 2010, ISBN: 9780077384470 [This
More information1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
More informationSolutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG820). December 15, 2012.
Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG8). December 15, 12. 1. (3p) The joint distribution of the discrete random variables X and
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationComparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
More informationRegression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between
More informationAP Statistics 2010 Scoring Guidelines
AP Statistics 2010 Scoring Guidelines The College Board The College Board is a notforprofit membership association whose mission is to connect students to college success and opportunity. Founded in
More informationAugust 2012 EXAMINATIONS Solution Part I
August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,
More informationStatistics 151 Practice Midterm 1 Mike Kowalski
Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easytoread notes
More informationSection 1: Simple Linear Regression
Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationHomework 5 Solutions
Math 130 Assignment Chapter 18: 6, 10, 38 Chapter 19: 4, 6, 8, 10, 14, 16, 40 Chapter 20: 2, 4, 9 Chapter 18 Homework 5 Solutions 18.6] M&M s. The candy company claims that 10% of the M&M s it produces
More informationACTM State ExamStatistics
ACTM State ExamStatistics For the 25 multiplechoice questions, make your answer choice and record it on the answer sheet provided. Once you have completed that section of the test, proceed to the tiebreaker
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationBA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420
BA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420 1. Which of the following will increase the value of the power in a statistical test
More informationSimple Linear Regression
STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze
More informationAP STATISTICS (WarmUp Exercises)
AP STATISTICS (WarmUp Exercises) 1. Describe the distribution of ages in a city: 2. Graph a box plot on your calculator for the following test scores: {90, 80, 96, 54, 80, 95, 100, 75, 87, 62, 65, 85,
More informationChiSquare Test. Contingency Tables. Contingency Tables. ChiSquare Test for Independence. ChiSquare Tests for GoodnessofFit
ChiSquare Tests 15 Chapter ChiSquare Test for Independence ChiSquare Tests for Goodness Uniform Goodness Poisson Goodness Goodness Test ECDF Tests (Optional) McGrawHill/Irwin Copyright 2009 by The
More informationTwosample hypothesis testing, II 9.07 3/16/2004
Twosample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For twosample tests of the difference in mean, things get a little confusing, here,
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationAP Statistics 2002 Scoring Guidelines
AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought
More informationHypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam
Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests
More informationResults from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu
Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six freeresponse questions Question #1: Extracurricular activities
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More information2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or
Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus
More informationCurriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 20092010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 20092010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More information7 Hypothesis testing  one sample tests
7 Hypothesis testing  one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X
More information, for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0
Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after SimeonDenis Poisson (1781 1840). In addition, poisson is French for fish. In this chapter we will
More informationSampling and Hypothesis Testing
Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationInternational Statistical Institute, 56th Session, 2007: Phil Everson
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA Email: peverso1@swarthmore.edu 1. Introduction
More informationMath 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304. jones/courses/141
Math 141 Lecture 7: Variance, Covariance, and Sums Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Last Time Variance: expected squared deviation from the mean: Standard
More informationInterpreting Data in Normal Distributions
Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,
More informationQuantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
More informationMATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample
MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of
More informationMULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationUsing R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
More informationPermutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. JaeWan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
More informationSTA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science
STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10 4:00 (GB 220) and Wednesdays 2:10 4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto
More informationREGRESSION LINES IN STATA
REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression
More informationTwosample ttests.  Independent samples  Pooled standard devation  The equal variance assumption
Twosample ttests.  Independent samples  Pooled standard devation  The equal variance assumption Last time, we used the mean of one sample to test against the hypothesis that the true mean was a particular
More informationExpected Value and the Game of Craps
Expected Value and the Game of Craps Blake Thornton Craps is a gambling game found in most casinos based on rolling two six sided dice. Most players who walk into a casino and try to play craps for the
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrclmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More information93.4 Likelihood ratio test. NeymanPearson lemma
93.4 Likelihood ratio test NeymanPearson lemma 91 Hypothesis Testing 91.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental
More informationMA 1125 Lecture 14  Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.
MA 5 Lecture 4  Expected Values Friday, February 2, 24. Objectives: Introduce expected values.. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationName: Date: Use the following to answer questions 34:
Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin
More informationChris Slaughter, DrPH. GI Research Conference June 19, 2008
Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationt Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
ttests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationStatistics 641  EXAM II  1999 through 2003
Statistics 641  EXAM II  1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the
More informationNull Hypothesis H 0. The null hypothesis (denoted by H 0
Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property
More informationWISE Sampling Distribution of the Mean Tutorial
Name Date Class WISE Sampling Distribution of the Mean Tutorial Exercise 1: How accurate is a sample mean? Overview A friend of yours developed a scale to measure Life Satisfaction. For the population
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationJoint Exam 1/P Sample Exam 1
Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question
More informationMathematics. Probability and Statistics Curriculum Guide. Revised 2010
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More information1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
More informationThe Math. P (x) = 5! = 1 2 3 4 5 = 120.
The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct
More informationUniversity of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key
Name: OUTLINE SOLUTIONS University of Chicago Graduate School of Business Business 41000: Business Statistics Solution Key Special Notes: 1. This is a closedbook exam. You may use an 8 11 piece of paper
More informationIntroduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.
Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.
More informationChapter 4. Probability Distributions
Chapter 4 Probability Distributions Lesson 41/42 Random Variable Probability Distributions This chapter will deal the construction of probability distribution. By combining the methods of descriptive
More informationINTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the oneway ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More information1. How different is the t distribution from the normal?
Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. tdistributions.
More information9. Sampling Distributions
9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling
More information