Test Bias


As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic error), and the validity of test score interpretations can be compromised by response biases that systematically obscure the psychological differences among respondents. Now we will examine the possibility that the validity of test score interpretations can be compromised further by test biases that systematically obscure the differences (or lack thereof) among groups of respondents.

Psychological tests are often used to make important decisions that affect the lives of real people: which colleges (if any) will decide to accept you, in which class your child will be enrolled, and whether an employer will decide to hire you. To the degree that such decisions are based on tests that are biased in favor of or against specific groups of people, such biases have extremely important personal and societal implications.

Suppose you are interested in studying the possibility that gender differences exist in mathematical ability. You give a reasonably reliable mathematics test to a representative group of males and females, and you find that, on average, males have higher math scores than females. As a researcher, you would be tempted to interpret your test scores in terms of the psychological construct that they are intended to reflect: that males tend to have greater mathematical ability than females. However, it is possible that the participants' test scores should not be interpreted as reflecting purely their mathematical ability. That is, it is possible that the test is biased in some way. For example, if the males' test scores overestimated their true mathematical ability and the females' test scores underestimated their true ability, then the test is biased. In this case,
the difference between the test scores for males and females might be due to test score bias, not due to a difference in their true mathematical abilities.

There are two general types of test bias. Roughly speaking, they reflect biases in the meaning of a test and biases in the use of a test. Construct bias occurs when a test has different meanings for two groups, in terms of the precise construct that the test is intended to measure. Construct bias has to do with the relationship of observed scores to true scores on a psychological test. If this relationship can be shown to be systematically different for different groups, then we might conclude that the test is biased. Construct bias can lead to situations in which two groups have the same average true score on a psychological construct but different average test scores.

The second type of bias is predictive bias, which occurs when a test's use has different implications for two groups. Predictive bias has to do with the relationship between scores on two different tests. One of these tests (the predictor test) is thought to provide values that can be used to predict scores on the other test (the outcome test or measure). For example, college admissions officers might use SAT test scores to predict freshman GPAs. The SAT would be the predictor test and GPAs would be the outcome measure. In this context, test bias concerns the extent to which the link between predictor test true scores and outcome test observed scores differs for two groups. If the SAT is more strongly predictive of GPA for one group than for another, then the SAT suffers from predictive bias, in terms of its use as a predictor of GPA.

The two types of bias (construct and predictive) are independent. For example, a test might have no construct bias but suffer from predictive bias. The SAT might accurately reflect true academic aptitude differences among groups of people (and thus have no construct bias),
but academic aptitude might not be associated with freshman GPA equally for two groups of people (and thus predictive bias would exist).

There are several ways to operationally define and identify test score bias. There are at least two categories of procedures that can be used to identify test score bias: (a) internal methods to identify construct bias and (b) external methods to identify predictive bias. We emphasize the operational nature of this task to remind you that test score bias, in both of its forms, is a theoretical concept, in part because both types of bias depend on the theoretical notion of a true score. There is no one way to detect test score bias, any more than there is one way to calculate directly such psychometric test score properties as reliability or validity. There are, however, various generally accepted ways to estimate the degree to which test bias exists.

An overarching issue in the definition and detection of test bias is that the existence of a group difference in test scores does not necessarily mean that test scores are biased. Suppose you find that females have higher scores on self-esteem than males. This difference is not prima facie evidence that the test is biased (Jensen, 1980; 1998; Thorndike, 1971). The participants' test scores might in fact be good estimates of their true self-esteem. In such a case, the test is not biased, and the group difference in test scores reflects a real difference in average self-esteem. Consider doing a study in which you weigh representative groups of males and females. You would doubtless find that the average weight of females is lower than the average weight of males. You would not take this difference to mean that the scale you used to measure weight produced scores that were biased.

Why Worry about Test Score Bias?

It is likely that everyone reading this book has taken a psychological test of some kind. Virtually all children schooled in the United States or other industrialized countries are exposed
on a regular basis to academic achievement tests. In the United States, most students who plan to attend an institution of higher education have taken the SAT or ACT test. Most graduate schools in the United States require student applicants to take the GRE. Applicants for most Federal Government jobs are required to take a civil service examination, and corporations regularly test job applicants, and sometimes even employees, using psychological tests.

Scores on these and other types of psychological tests are often used to make important decisions about people. In educational settings, intelligence test scores are used to place children in special programs. Intelligence test scores are used by law courts to make decisions about who can and who cannot be sentenced to death following a murder conviction. Educational institutions use scores on standardized tests to make admissions decisions. Corporations and governments often make job decisions about people based, at least in part, on test scores. In the United States, most public school teachers have to take and pass standardized tests to become certified school teachers. The use of psychological tests in our society is pervasive, and scores on these tests can have an important impact on people's lives and on our public and private institutions.

Because testing is a pervasive feature of our society and because test scores have important consequences for people, we would like to develop tests that produce scores that allow us to differentiate among people based on true score differences and not on particular group membership. For example, if we have a self-esteem test, then we would like to be sure that scores are determined only by self-esteem and not contaminated by some other extraneous factor such as the biological sex of the person taking the test. In other words, we want unbiased tests.

Our desire for unbiased test scores is rooted in our belief that we should not discriminate for or against a person because of their biological sex, their ethnicity, their race, their religious
preference, or their age. In some cases, the list of groups that should be protected from test score bias has been expanded to include factors such as sexual preference, pregnancy, marital status, linguistic background, and various disabilities. In each of these cases, we should be confident that observed score differences on psychological tests are a function of true score differences. It is especially important to be able to show that test scores are not biased in those instances in which average observed scores on some type of psychological test differ between groups.

Detecting Construct Bias: Internal Evaluation of a Test

Construct bias is often evaluated by examining responses to individual items on a test. An item on a test is biased if people belonging to different groups responded in different ways to the item and if it could be shown that these differing responses were not related to group differences associated with the psychological attribute that the test was designed to measure. For example, suppose you had a 100-item mechanical aptitude test. If you selected one item from the test and found that males' responses were similar to females' responses, then the item would not appear to be biased (assuming that the males and females had the same level of mechanical aptitude). On the other hand, if males and females with the same level of aptitude responded in different ways to the item, then you would suspect some type of bias in the item.

Most psychological tests are composite tests; that is, they contain multiple items. For such composite tests, the overall test bias is a function of the bias associated with each of the items. If none of the items on a test seem to be biased, then we would assume that the total test score is unbiased. However, if one or more items seem to be biased, then we would suspect that the total test score might also be biased.

Remember that test bias concerns the relationship between group differences in true scores and group differences in observed test scores. In the case of construct bias, a test item
would be biased if responses to the item for people who belong to one group reflect their true scores on the relevant psychological attribute but responses to the item from people in another group are not simply a function of that attribute (we are assuming some minimum degree of reliability for the test of interest). Of course, we can never know a person's true score with respect to any attribute. Therefore, the procedures that we are going to discuss are estimates of the existence and degree of construct bias.

Construct bias is related to the meaning of test scores. Evidence of construct bias suggests that scores on a test might have different meanings for different groups of people. If we had evidence that suggested that scores on our mechanical aptitude test suffered from construct bias related to the biological sex of those taking the test, we would have to entertain the possibility that the scores measure different psychological attributes in the two groups. For example, the males' responses to the test might be determined primarily by a single construct, mechanical aptitude, but the females' responses might be determined by two constructs: mechanical aptitude and stereotype threat (the tendency to behave in ways that confirm stereotypes about one's group; Spencer, Steele, & Quinn, 1999). Thus, the mechanical aptitude test does not measure the exact same psychological attributes for the two sexes.

Several procedures that can be used to estimate the existence and degree of construct bias will be described. These procedures focus on the internal structure of the test. The internal structure of a test has to do with the way that the parts of a test are related to each other. Most simply, internal structure refers to the pattern of correlations among items and/or the correlations between each item and the total test score. To evaluate the presence of construct bias, we examine the internal structure of a test separately for two groups. If the two groups exhibit the same internal structure to their test responses, then we conclude that the test is unlikely to suffer
from construct bias. However, if the two groups exhibit different internal structures to their test responses, then we conclude that the test is likely to suffer from construct bias. There are at least four methods for detecting construct bias.

Item Discrimination Index ($I_d$)

One method of detecting construct bias is by computing item discrimination indexes separately for two groups. An item's discrimination index reflects the degree to which the item is related to the total test score (i.e., that people who answered the item correctly tended to do better on the test as a whole than people who answered the item incorrectly), and by implication, it indicates that the item is highly similar to most of the other items on a test. In this way, item discrimination indexes reflect the structure of associations among test items.

The $I_d$ is calculated by computing a total score on a test and then rank ordering the scores from highest to lowest. You then select those people whose test scores put them in the top 30% of the scorers and those people whose test scores put them in the bottom 30% of the scorers. Now you go to each test item (each question) and compute the proportion of people in each of the two groups who answer the item correctly. For example, suppose that there are 50 people in the top group of test takers and you find that 40 of these people answered question #1 correctly. The proportion of people in this group to answer the item correctly would be .80. Now imagine that in the low scoring group only 10 of these 50 people answered question #1 correctly. The proportion of low scorers to answer the item correctly would be .20. Now if you subtract the low score proportion from the high score proportion, you get the discrimination index for item #1 (see item discrimination example Table 1.1).

$$I_d = P_{hi} - P_{low}$$

where $P$ = proportion of people in a group who answer an item correctly.
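The calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `discrimination_index`, its arguments, and the 0/1 coding of item responses are hypothetical choices made here, not part of the chapter. The last lines reproduce the worked example from the text (40 of 50 high scorers and 10 of 50 low scorers answering correctly).

```python
def discrimination_index(total_scores, item_correct, fraction=0.30):
    """Return I_d = P_hi - P_low for a single item.

    total_scores: total test score for each respondent.
    item_correct: 1 if that respondent answered the item correctly, else 0
                  (hypothetical coding chosen for this sketch).
    fraction:     share of respondents in each extreme group (30% in the text).
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("inputs must have the same length")
    # Rank respondents from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    k = round(len(order) * fraction)  # size of each extreme group
    if k == 0:
        raise ValueError("too few respondents for the requested fraction")
    high, low = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    return p_high - p_low


# Worked example from the text: proportions of .80 and .20.
p_hi, p_low = 40 / 50, 10 / 50
print(round(p_hi - p_low, 2))
```

To compare groups, the same function would simply be called once on the males' data and once on the females' data, and the two resulting index values compared.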

Historically, the item discrimination index was developed in association with classical test theory. The index is an important measure of the extent to which responses to test items can be used to differentiate among people on the basis of the amount of their knowledge of some topic, or on the amount of some other type of psychological attribute. Again, assume we give a mechanical aptitude test to a group of people. If people who have high mechanical aptitude have a high probability of answering a particular aptitude question correctly while people with low mechanical aptitude have a low probability of answering the item correctly, the question would have a high item discrimination index value (e.g., .90). On the other hand, if the item discrimination index for a question was low (e.g., .10), then the low aptitude respondents answered the item correctly nearly as often as the high aptitude respondents. Thus, the item does not clearly discriminate among people with varying levels of the construct being measured.

The item discrimination index can be used to estimate construct bias. Specifically, we would select an item, compute its discrimination index separately for two groups of people, and then compare the groups' indexes. If the two discrimination index values are approximately equal, then we conclude that the item is probably not biased. However, if the two discrimination index values are not approximately equal, then we conclude that the item is probably biased in some way. That is, we would conclude that the item seems to belong on the test for one group, but not for the other group. By including the item on the test for both groups, the test seems to be somewhat different for the two groups. This analysis would be conducted for each of the items on the test.

An important feature of the item discrimination index as a measure of construct bias is that it is independent of how many people in each of the groups being compared answer an item correctly. For example, we might find that one of our mechanical aptitude items was
answered correctly by only 40% of the males but by 60% of the females. Even so, the item discrimination index for the question could be the same for both groups. In this case, we would assume that the item is functioning as a measure of mechanical aptitude in the same way for both groups, but that females know more about the material than males (i.e., more of them answered the item correctly).

Factor Analysis

A second method for examining construct bias is by conducting a factor analysis of items separately for two or more groups of people. Factor analysis is an important method for evaluating the internal structure of a test. It is a statistical procedure for partitioning the variance or covariance among test items into clusters, or factors, that in some sense hang together. It is sometimes the case that responses to a group of items on a test are more highly positively correlated with each other than they are with responses given to other items on the test. The group of items that are highly correlated with each other statistically hang together, and they are believed to reflect a factor. If all of the items on a test have similar correlations with each other (i.e., there is no evidence of multiple groups of items), then we say that the test is homogeneous, or that all of the test score variance, other than error variance, is accounted for by a single factor.

Factor analysis can be used to evaluate the internal structure of a test separately for two groups of people. For example, we might find that, among males, the mechanical aptitude test has a strong single-factor structure: all of the items seem to be highly correlated with each other, suggesting that the test is essentially a measure of one and only one construct. To evaluate the potential presence of construct bias, we would need to examine the factor structure for females' responses to the test items as well. If we found a single factor among females'
responses, then we would conclude that the aptitude test has the same internal structure for males and females. Consequently, we would conclude that the test does not suffer from construct bias. However, we might conduct a factor analysis of females' responses and find two or more factors. In this case, we would conclude that the test has a different internal structure for males and females, and we would then conclude that the test does indeed suffer from construct bias. That is, the total test score reflects different psychological factors for males and for females.

Differential Item Functioning Analyses

Perhaps the best way to evaluate construct bias is a procedure called differential item functioning analysis. Differential item functioning analysis is a feature of a psychometric approach called Item Response Theory (IRT). An important aspect of IRT is the assumption that it is possible to estimate respondents' trait levels directly from empirical sources of data. The trait levels are, in essence, estimates of participants' true scores for the psychological attribute that is being measured. If we assume that we know the trait levels for all the people in two groups and we have their responses to a test item, then we can see if the trait levels and the item responses match up in the same way for both groups. If they do not, then it is possible that the item is biased.

IRT is based on the idea that there is a function relating a participant's trait level to the probability that he or she will answer a question on a test correctly. For example, we might find that an individual with a trait level that is one standard deviation above the mean has a .80 probability of answering a particular item correctly, but that an individual with a trait level that is one standard deviation below the mean has only a .20 probability of answering the item correctly. If you have a group of people take a test and you know their respective trait levels, then you can use specialized statistical software to draw an item characteristic curve (ICC) to
illustrate this function for each item. Furthermore, if you have two groups of people, then you can draw ICCs separately for each group. To evaluate the presence of construct bias, you would compare the ICCs of the two groups. If the item is not biased, then the two groups' ICCs should be very similar. That is, the probability that two people will answer an item correctly should be the same if the two people have the same trait level. However, if the item is biased, then the two groups' ICCs will be dissimilar. That is, the probability that two people (e.g., a male and a female) will answer an item correctly might be different even if the two people have the same trait level. Such a situation would clearly reflect the presence of construct bias.

For example, suppose that you want to determine if an item on a mechanical aptitude test was biased with respect to biological sex. You could compute mechanical aptitude knowledge scores for each person in a study (these represent their trait levels), and you could compute the probability that the item is answered correctly for each person. You use this information to draw an ICC (see Figure 1.1). Now, you sort the people in your study into two groups (i.e., a group of males and a group of females) and draw ICC curves separately for each group. If the curves overlap, you would probably conclude that the item is not biased. Suppose, however, that you obtained the results illustrated in Figures 1.2 and 1.3. Results such as these would lead you to suspect item bias. Figure 1.3 is an example of uniform bias. In this example it appears that females, with the same mechanical knowledge as males, find the item more difficult to answer than the males do. Figure 1.4 illustrates non-uniform bias, a situation in which the ICCs differ in shape as well as location. In this case it appears that the item is measuring different traits for males and females.

The ICC approach is a visual method for detecting construct bias, but there are IRT methods that are even more precise ways of evaluating the presence of construct bias (e.g., Smith & Reise, 1998).
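The idea of comparing two groups' ICCs can be made concrete with a small sketch. Note the assumptions: the chapter does not say which IRT model generates its curves, so the two-parameter logistic form used here is a choice made for illustration, and the parameter values, the dictionary names, and the helper `max_icc_gap` are all hypothetical. The two groups share a slope but differ in item difficulty, which corresponds to the uniform-bias pattern described above.

```python
import math


def icc(theta, a, b):
    """Two-parameter logistic ICC (assumed model): P(correct | trait level theta).

    a: item discrimination (slope); b: item difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters estimated separately for each group.
male_params = {"a": 1.2, "b": 0.0}
female_params = {"a": 1.2, "b": 0.5}  # same slope, harder item: uniform bias


def max_icc_gap(params_1, params_2, thetas):
    """Largest absolute difference between two ICCs over a grid of trait levels."""
    return max(abs(icc(t, **params_1) - icc(t, **params_2)) for t in thetas)


thetas = [x / 10 for x in range(-30, 31)]  # trait levels from -3 to +3
print(f"largest gap between the two ICCs: "
      f"{max_icc_gap(male_params, female_params, thetas):.3f}")
```

A gap near zero across the whole trait range would correspond to overlapping curves; giving the two groups different `a` values as well would produce curves that differ in shape, the non-uniform pattern.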

Although IRT's differential item functioning analysis is a strong method for identifying construct bias, it has a downside. IRT analyses are quite complex in a variety of ways: which model to use, how to determine whether parameter differences between groups are real or simply due to measurement error, the need for very large sample sizes, the need for item samples and samples of people that are heterogeneous enough to represent the complete range of traits the test is designed to measure, and the need for specialized statistical software to conduct the analyses. These complexities are such that IRT is still emerging as a widely appreciated and understood method of detecting construct bias.

Rank Order

There is another quick and computationally easy way to get an estimate of construct bias if you have test items that can be ranked in order of difficulty. Using our 100-item aptitude test as an example, some of the test questions will probably be easier to answer than others. These questions can be ranked in order of difficulty. The rankings can be done separately for different groups (e.g., males and females). If the item ranks differ across groups, then we would suspect that test score construct bias exists. We would suspect this because each item does not appear to be a measure of the same thing for both groups. You can use the ranks to compute Spearman's rank order correlation coefficient (rho, interpreted in the same way as $r_{xy}$) to index rank order consistency across groups. If rho is low (e.g., < .90), we might suspect construct bias. If you found evidence of construct bias, you would probably want to follow up on the finding with additional analyses to identify the particular source of the low correlation coefficient (see Jensen, 1980). Notice that the correlation between the ranks can be high even if the proportion of correct responses to each item differs across groups.
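The rank-order check lends itself to a short sketch. The item proportions below are made up for illustration, the helper names `ranks` and `spearman_rho` are hypothetical, and the shortcut formula used for rho assumes that no two items are tied in difficulty within a group. The example is built so that one group answers every item correctly more often while the ordering of items stays identical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no-ties case only."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical proportions answering each of five items correctly.
males = [0.40, 0.55, 0.30, 0.70, 0.62]
females = [0.60, 0.75, 0.50, 0.90, 0.82]  # higher throughout, same item ordering

print(spearman_rho(males, females))
```

Because the items fall in the same order of difficulty for both groups, rho comes out at its maximum even though the groups differ in how often they answer correctly.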
Using our aptitude test as an example, males might be less likely than females to give correct answers to the test questions, but the rank ordering of
questions according to difficulty might be the same across groups. Again, as with the item discrimination index, group differences in correct responding are not by themselves an indication of test score bias.

Detecting Predictive Bias: External Evaluation of a Test

Predictive bias concerns the degree to which a test's scores are equally predictive of an outcome for two groups. For example, scores on the SAT are thought to measure academic achievement. On the assumption that academic achievement measured during secondary school years might be related to academic achievement during the freshman year in college (e.g., as measured by freshman GPA), institutions of higher education often use SAT scores to make admissions decisions. The idea is that it is possible to predict, at least with some degree of accuracy, student freshman year academic performance based on SAT scores. If it could be shown that the ability to successfully predict freshman academic achievement from SAT scores is different for different groups of people, then we might suspect that the SAT suffers from predictive test score bias.

The existence of predictive bias is examined by obtaining scores on two variables or measures. Analyses are then conducted to examine the degree to which scores on the main test of interest (the predictor test) can be used to predict people's scores on another psychological measure (the outcome measure) that is thought to be conceptually related to scores on the main test of interest.

Detection of predictive bias begins with the assumption that one size fits all: that the test is equally predictive for all groups. As we will illustrate, analyses are conducted to evaluate this assumption. If those analyses confirm that the test is equally predictive for both groups, then we conclude that the test probably does not suffer from predictive bias (at least with regard to the specific outcome in question and the specific groups in question). However, if
those analyses indicate that one size does not fit all (that the test is not equally predictive for both groups), then we conclude that the test might suffer from predictive bias.

Imagine that you are a training program selection officer working for a corporation that spends large sums of money training employees to develop mechanical skills needed by the corporation to run its operations. Your job is to select the most promising candidates for this training program. Because of the cost of the program, it is essential that you select only those people who are most likely to perform well in the training program. Your job depends on how well you make these selections.

In an attempt to improve your selection success rate, you develop a mechanical aptitude test that you give to all trainee candidates. You assume that scores on the test are going to be related to some outcome measure of post-training performance. For example, following training, each trainee might be rated by a supervisor in terms of the trainee's level of mechanical competency. Further, you assume that there should be a positive linear relationship between the pre-training aptitude test scores and the post-training supervisor ratings of competence. That is, candidates with high aptitude scores (i.e., predictor scores) should have better ratings (i.e., outcome scores) than candidates with lower aptitude scores.

In your development and evaluation of the aptitude test, you might be concerned about predictive test bias. Formally speaking, predictive bias has to do with the use of test scores to predict a relevant outcome (e.g., behavior, competency, or performance) in situations other than the testing situation in which the predictor test was administered. Thus, if you had reason to believe that the aptitude test was strongly predictive of supervisor ratings for males but not for females, then you would suspect that the test was predictively biased.

To evaluate the efficacy of your new aptitude test and to evaluate any potential predictive bias, you will need to examine two issues: (a) does your test actually help you predict the
outcome of training, and (b) does your test predict the outcome of training equally well for various groups of trainees? To address both issues, you will need data that can be used to evaluate the predictive effectiveness of your test. Data of this kind could be obtained by testing all trainees before they enter the program and then recording their scores on the outcome measure at the end of the training program. The two issues are often addressed by using a statistical procedure called regression, with which you can use the pre-training mechanical aptitude test scores to calculate predicted post-training supervisor rating scores.

Basics of Regression Analysis

Regression analysis is based on the assumption that there is a linear relationship between aptitude test scores and outcome scores. If there is such a relationship, then the formula for a straight line can be used to predict outcome scores from aptitude scores:

$$\hat{Y} = a + b(X)$$

where $\hat{Y}$ is the predicted training outcome score for an individual training candidate, $a$ is the intercept (the predicted value of a person's outcome score if that person had an aptitude test score of zero), $b$ is the regression coefficient or slope (a number that tells you how much of a change you would expect to see in $\hat{Y}$ for a one-point increase in aptitude test scores), and $X$ is an individual's aptitude test score. Many popular statistical software packages can be used to conduct the regression analysis, which produces values of $a$ and $b$. Once you have obtained the values for the intercept and slope of the regression equation, you can evaluate the predictive ability of the test. For example, you can take any individual's score on the aptitude test ($X$), plug it into the regression equation, and calculate a predicted score on the supervisor ratings ($\hat{Y}$) for that individual.
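For readers who want to see what a statistics package does to produce $a$ and $b$, here is a minimal sketch using the standard least-squares formulas. The data are invented for illustration and the function names `fit_line` and `predict` are hypothetical; nothing here comes from the chapter's own tables.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for Y-hat = a + b * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of X and Y divided by the variation in X.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: forces the line through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Plug an aptitude score into the regression equation."""
    return a + b * x


# Hypothetical aptitude scores and supervisor ratings for five trainees.
aptitude = [60, 65, 70, 75, 80]
ratings = [88, 95, 96, 101, 100]

a, b = fit_line(aptitude, ratings)
print(f"a = {a:.2f}, b = {b:.2f}, "
      f"predicted rating at X = 72: {predict(a, b, 72):.2f}")
```

In practice a dedicated package would also report standard errors and significance tests, which this sketch deliberately leaves out.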

To illustrate this process, we will use the data in Table 1.2. In this table, we have aptitude scores for four trainees, along with each trainee's outcome score (note that an analysis of this kind would involve many more than four trainees). Based on a regression analysis conducted by using SPSS, the intercept ($a$) is 56.03 and the slope ($b$) is .58. These results tell us that a trainee with an aptitude score of zero is predicted to obtain an outcome rating of 56.03 and that a one-point difference in aptitude scores is associated with a .58 difference in outcome scores. As mentioned earlier, these values can be used to obtain predicted scores for all trainees, by plugging their aptitude scores into the following regression equation:

$$\hat{Y} = 56.03 + .58(X)$$

Predicted supervisor rating = 56.03 + .58(Aptitude Score)

For example, a trainee with an aptitude score of 69 is predicted to earn a supervisor rating of 96.05:

$$\hat{Y} = 56.03 + .58(69) = 96.05$$

Similarly, a trainee with an aptitude score of 70 is predicted to earn a supervisor rating of 96.63:

$$\hat{Y} = 56.03 + .58(70) = 96.63$$

Note that the difference between these two predictions is .58 (96.63 - 96.05 = .58), which reflects the slope in the regression equation. That is, a one-point difference in aptitude test score (70 - 69 = 1) is associated with a .58 difference in outcome scores.

If we calculate predicted rating scores for a wide range of aptitude test scores, then we can generate a regression line or a line of best fit. Each point on a regression line is associated
with the most likely (predicted) $Y$ value for each possible $X$ value. The line is used to illustrate the association between predictor test scores and outcome scores. In Table 1.2, we have computed predicted scores ($\hat{Y}$) for each trainee. In Figure 11.4, we have plotted each of our four candidates' observed outcome scores against his or her observed predictor test score, and we have drawn a regression line that reflects each candidate's predicted outcome score.

Notice that each trainee's predicted score on the outcome differs from his or her actual score on the outcome. For example, for trainee #1, the observed outcome score is 75, which is not the score predicted for that trainee. A difference between a predicted score and an observed outcome score is referred to as a residual. The standard deviation of the residuals is the standard error of estimate ($se_e$) that we discussed previously. The $se_e$ computed for the data in Table 1.2 reflects the inaccuracy of predictions; the larger the value, the less accurate the predictions. A test with strong predictive power will result in relatively small residuals, which will result in a relatively small $se_e$.

One Size Fits All: The Common Regression Equation

The estimation of predictive bias usually begins by establishing what would happen if no bias exists. If a test is not biased, then one regression equation should be equally applicable to different groups of people. The assumption that different groups share a common regression equation is based on the idea that one size fits all: regardless of biological sex, ethnicity, culture, or whichever group difference is being considered, a single regression equation adequately reflects the predictive ability of the test in question.

Imagine that you give your aptitude test to a large number of trainee candidates (e.g., 100). Assume there is an equal number of male and female candidates and you want to make sure that your aptitude test is not biased with respect to the biological sex of the candidates. To
18 begin your examination of this issue, you could compute the regression equation based on the data from the entire sample, regardless of sex. Imagine that you found that the intercept from this regression equation is a = 56.03, the slope is b =.58, and the standard error of estimate is se e = These values represent the common regression equation, and they will be called the common intercept, the common slope, and the common standard error of estimate, respectively, Again, if you aptitude test is unbiased in terms of gender, then the common regression equation (calculated from males and females together) should be equally applicable to males and females. To evaluate the presence of predictive bias, additional regression analyses must be conducted. To determine whether the common regression equation is indeed equally applicable to males and females, we must calculate one regression equation for males and a one for females. We must then compare these group-level regression equations with the common regression equation. Three of values that we just discussed can be used to assess predictive test bias: a) the intercept b) the slope, and c) the standard error of estimate. If the group-level values do not match the common regression equation, then you might suspect that your aptitude test scores are biased. In practice, there are a variety of sophisticated statistical analyses that can be conducted on these values to precisely estimate the presence of predictive test bias, but our discussion will focus on the more conceptual level. To elucidate the meaning of various patterns of results, we first focus on the meaning of biased intercepts, then on biased slopes. However, in practice, it may be more likely that groups would differ on both of these elements of prediction rather than being exactly equal on one but differing on the other. Thus, we will also illustrate the effect of bias in terms of intercepts and slopes. 18
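The comparison described above (one common regression equation versus separate group-level equations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the aptitude scores and ratings are hypothetical values made up for the sketch (they are not the chapter's data), the helper name `fit_line` is invented here, and the standard error of estimate is computed with an n - 2 divisor, which the text does not specify.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs.

    Returns (intercept, slope, se_e), where se_e is the standard error of
    estimate computed as sqrt(SS_residual / (n - 2)). The n - 2 divisor is
    an assumption; the chapter does not state which divisor it uses.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    se_e = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return intercept, slope, se_e


# Hypothetical data: both groups share a slope of 0.5 but have
# different intercepts (58 for males, 54 for females).
aptitude = [60, 65, 70, 75, 80]
male_ratings = [58 + 0.5 * x for x in aptitude]
female_ratings = [54 + 0.5 * x for x in aptitude]

# Common equation: pool both groups. Group-level equations: fit separately.
common = fit_line(aptitude + aptitude, male_ratings + female_ratings)
males = fit_line(aptitude, male_ratings)
females = fit_line(aptitude, female_ratings)

for label, (a, b, se) in [("common", common), ("males", males), ("females", females)]:
    print(f"{label:8s} intercept={a:6.2f} slope={b:5.2f} se_e={se:5.2f}")
```

In this made-up data set the group-level slopes match the common slope while the group-level intercepts fall on either side of the common intercept, which is the pattern the next section calls intercept bias.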

Intercept Bias

Suppose that group-level regression analyses reveal that males and females have slopes and $se_e$ values that are similar to those of the common regression equation, but that their intercept values differ from the common intercept. In this case, you would suspect that your test suffers from intercept bias.

For example, imagine that, in your evaluation of your aptitude test, you conduct regression analyses separately for the 50 males and the 50 females. You find that, for both groups, the slope is b = .58 and the $se_e$ is 6.76, which are equal to the common slope and common $se_e$. However, you find that the intercept for males, $a_{\text{male}}$, and the intercept for females, $a_{\text{female}}$, are not the same. Note that these group-level intercept values differ from the common intercept, indicating that one size does not fit all, at least in terms of the intercept. Thus, the test appears to suffer from intercept bias.

What are the implications of intercept bias? The fact that the males' intercept is higher than the females' intercept indicates that males at any given level of aptitude will tend to receive higher supervisor ratings (outcome scores) than females at the same level of aptitude. To illustrate this, let us compute the predicted outcome score for a male with an aptitude score of 70 and for a female with an aptitude score of 70:

Predicted Outcome Score for Male = $a_{\text{male}} + .58(70)$

Predicted Outcome Score for Female = $a_{\text{female}} + .58(70)$

These computations show that for a male and a female who have exactly the same level of aptitude, the male is predicted to obtain a supervisor rating that is 4 points higher than the female's. If we assume that the supervisor ratings are themselves unbiased (an assumption that we will revisit later in this chapter), then this discrepancy indicates that the aptitude test does not work the same way for males and females.

As we saw earlier, the common regression equation resulted in a predicted supervisor rating of 96.63 for a trainee who had an aptitude score of 70. Comparing this result to the results of our group-level predictions, the common regression equation appears to underestimate the prediction for males and to overestimate the prediction for females. Thus, one size does not fit all, and the test appears to be predictively biased.

If a test suffers only from intercept bias (i.e., the group-level intercepts are not equivalent to the common intercept, but the slopes and $se_e$ values are unbiased), then the size of the group discrepancy is constant across all aptitude scores. We saw a four-point discrepancy for a male and a female who both had an aptitude score of 70, and if the aptitude test suffers only from intercept bias, then the sex difference will be four points at every level of aptitude. This is illustrated in Figure 1.5, which presents a common regression line (dashed) and two group-level regression lines. As this figure illustrates, the lines are parallel, indicating that a male trainee of a given aptitude level will obtain a predicted rating that is always four points higher than that of a female who has the same level of aptitude.

Slope Bias

A second way in which a test can be predictively biased is through slope bias. Suppose that group-level regression analyses reveal that males and females have intercept and $se_e$ values that are similar to those of the common regression equation, but that their slope values differ from the common slope. This would indicate that the connection between predictor scores and outcome scores differs between the two groups.
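The contrast between a constant discrepancy and a changing discrepancy can be summarized algebraically. As a sketch, write the group-level regression equations with subscripts to distinguish the groups (the subscripted symbols are notation adopted for this illustration), and subtract one from the other:

```latex
\begin{align*}
\hat{Y}_{\text{male}} &= a_{\text{male}} + b_{\text{male}} X \\
\hat{Y}_{\text{female}} &= a_{\text{female}} + b_{\text{female}} X \\
\hat{Y}_{\text{male}} - \hat{Y}_{\text{female}}
  &= (a_{\text{male}} - a_{\text{female}}) + (b_{\text{male}} - b_{\text{female}})\,X
\end{align*}
```

When the two slopes are equal, the second term vanishes and the discrepancy is the constant $a_{\text{male}} - a_{\text{female}}$ at every aptitude level, which corresponds to parallel regression lines. When the slopes differ, the discrepancy changes by $b_{\text{male}} - b_{\text{female}}$ for every one-point change in $X$, so the lines are not parallel.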

For example, imagine that your analyses reveal that, for both groups, the intercept is a = 56.03 and the $se_e$ is 6.76, which are equal to the common intercept and common $se_e$. However, you find that the slope for males is b = .53 and the slope for females is b = .60. Note that these group-level slope values differ from the common slope (i.e., .58), indicating that one size does not fit all in terms of the connection between predictor test scores and outcome scores.

Slope bias has important implications for the degree of discrepancy between the groups' predicted outcome scores. The fact that the males' slope is weaker than the females' slope indicates that the amount of bias is not constant across aptitude levels. To illustrate this, let us compute the predicted outcome score for a male with an aptitude score of 70 and for a female with an aptitude score of 70:

Predicted Male Outcome Score = 56.03 + .53(70) = 93.13

Predicted Female Outcome Score = 56.03 + .60(70) = 98.03

This shows that, for a male and a female with an aptitude score of 70, the female will be predicted to have an outcome score that is 4.9 points higher than the male's. Now, let us compute the predicted outcome score for a male and a female who have aptitude scores of 60:

Predicted Male Outcome Score = 56.03 + .53(60) = 87.83

Predicted Female Outcome Score = 56.03 + .60(60) = 92.03

In this case, the female will be predicted to have an outcome score that is 4.2 points higher than the male's. Thus, the bias (i.e., the degree to which the predicted outcome score differs for males and females who have the same level of aptitude) is relatively small at relatively low levels of aptitude, but it is larger at higher levels of aptitude. That is, the discrepancy between male and female predicted scores will tend to increase as scores on the aptitude test increase. This type of pure slope bias is illustrated in Figure 1.6, which shows that the regression lines for males and for females gradually move apart.

Intercept and Slope Bias

So far, we have illustrated pure intercept bias and pure slope bias, that is, cases in which either the intercept is biased or the slope is biased, but not both. To summarize, pure intercept bias indicates that there is a discrepancy between groups' predicted scores and that the size of this discrepancy does not change as aptitude scores increase or decrease. In contrast, pure slope bias indicates that the size of this discrepancy does change as aptitude scores increase or decrease.

It is also possible (perhaps even more likely than either form of pure bias) for intercept and slope biases to exist simultaneously. In this case, there will be a complex relationship between aptitude scores and the outcome scores for the different groups. For example, we might find that, for people who have low levels of aptitude, the predicted outcome scores for males are higher than the predicted outcome scores for females. But our analyses might also reveal that, for people who have high levels of aptitude, the predicted outcome scores for males are lower than the predicted outcome scores for females. Although many patterns of discrepancy might occur, one possible outcome of this type is illustrated in the accompanying figure.

Standard Error of Estimate

The standard error of estimate is a value that represents the accuracy of your prediction of an outcome score based on values from a test used to make the prediction. In our example, we are using aptitude test scores to predict scores on a training program effectiveness outcome measure. Our aptitude test would be considered biased if the group-level $se_e$ values differed from the common $se_e$ value. This bias would indicate that you can make more accurate predictions for people in one of the groups than for people in the other group.

Outcome Score Bias

Our discussion of predictive bias has focused on the possibility that the scores on the predictor test are biased. However, it is also possible that scores on the outcome variable are biased. For example, the supervisors who provide the post-training ratings of competence might be biased in favor of one group and against another. A test used to measure outcomes, such as our 100-item mechanical competency test, could also be biased. We have been assuming that the outcome measure is not biased, but of course it could be.

The Effect of Reliability

As a final note, we should acknowledge that the standard error of estimate, the regression coefficient, and the intercept are all sensitive to test reliability. In our discussion of predictive bias, we have been assuming high reliabilities for both the predictor test scores and the outcome test scores (e.g., $R_{xx}$ greater than .90). A drop in test score reliability can have a profound effect on these parameters and thereby, at least potentially, affect predictive bias. These effects are complex and beyond the scope of our discussion, but we recommend Jensen (1980) to interested readers.

Summary

We have been focusing on test bias, which traditionally refers to the possibility that true differences among groups are systematically obscured (or artificially created). Although there are widely used methods for coping with response biases, the methods that have been proposed for coping with test bias tend to be somewhat controversial and are beyond the scope of our current discussion. For a recent survey of the issues, interested readers are directed to Sackett, Schmitt, and Ellingson (2001).

In sum, the validity of test score interpretation and use is a fundamental concern to behavioral scientists who are interested in psychological measurement. Through decades of conceptual and methodological development, psychometricians, test users, and test developers have articulated the meaning and evaluation of validity. Although threats to validity do exist, psychologists and others interested in psychological measurement have made great strides in identifying such threats and in developing strategies for detecting, preventing, or minimizing them. Nevertheless, psychological tests should always be used and interpreted with close regard for the theoretical and evidential basis of their meaning and application.

Table 1.1 Item Discrimination Index Example

Item Discrimination Index: Notice that $I_d$ does NOT depend solely on the proportion of test takers who get the item correct. If you look at item 9, you will see that the $I_d$ for that item is .45, which is exactly the same as the $I_d$ for item 5, although far fewer people answered item 9 correctly. Although I used 30% to identify the top and bottom groups, the percentage used to identify these groups does not have to be 30%. You will find that the percentage tends to range from 25% to 33%.

$N_{top}$ = 20, $N_{bottom}$ = 20
$I_d$ = TOP% - BOTTOM%

Columns: Items | TOP 30% | BOTTOM 30% | Proportion Correct, TOP | Proportion Correct, BOTTOM | $I_d$
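The point made in the note to Table 1.1 (two items can share the same discrimination index even when far fewer people answer one of them correctly) can be checked with a short sketch. The counts below are hypothetical values chosen only to reproduce an index of .45 with 20 test takers in each group; they are not the entries of Table 1.1, and the function name `item_discrimination` is invented for this illustration.

```python
def item_discrimination(correct_top, correct_bottom, n_top=20, n_bottom=20):
    """Item discrimination index: proportion correct in the top group
    minus proportion correct in the bottom group."""
    return correct_top / n_top - correct_bottom / n_bottom


# Hypothetical counts of correct answers (not the values in Table 1.1).
easier_item = item_discrimination(correct_top=18, correct_bottom=9)   # .90 - .45
harder_item = item_discrimination(correct_top=10, correct_bottom=1)   # .50 - .05

print(f"easier item: I_d = {easier_item:.2f}")
print(f"harder item: I_d = {harder_item:.2f}")
```

Both hypothetical items discriminate equally well between the top and bottom groups, even though 27 of the 40 test takers answered the first correctly and only 11 of 40 answered the second correctly.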

Table 1.2 Data for Illustrating Regression Analysis

Columns: Trainee | Aptitude Test Score | Supervisor Rating | Predicted Supervisor Rating

Variance ($se_e$) =
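As a rough sketch of how the predicted ratings and residuals in a table like this are obtained, the following Python applies the chapter's regression equation (intercept 56.03, slope .58) to a couple of trainees. The observed ratings are hypothetical values used only for illustration, the function names are invented here, and the residual is taken as observed minus predicted, a sign convention the text does not specify.

```python
# Regression equation reported in the chapter.
INTERCEPT = 56.03
SLOPE = 0.58


def predicted_rating(aptitude):
    """Predicted supervisor rating for a given aptitude test score."""
    return INTERCEPT + SLOPE * aptitude


def residual(aptitude, observed_rating):
    """Observed minus predicted rating (sign convention is an assumption)."""
    return observed_rating - predicted_rating(aptitude)


# Hypothetical (aptitude score, observed supervisor rating) pairs.
trainees = [(69, 94.0), (70, 99.0)]

for aptitude, observed in trainees:
    print(
        f"aptitude={aptitude} predicted={predicted_rating(aptitude):.2f} "
        f"observed={observed:.2f} residual={residual(aptitude, observed):.2f}"
    )
```

The two predictions differ by the slope, since the aptitude scores differ by one point.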


Figure 1.4 Scatterplot and Regression Line for Trainees' Aptitude Scores and Supervisor Ratings (axes: Aptitude Test Score and Supervisor Rating; the regression line is drawn through the plotted points)



Chapter 2 - Why RTI Plays An Important. Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004 Chapter 2 - Why RTI Plays An Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004 How Does IDEA 2004 Define a Specific Learning Disability? IDEA 2004 continues to

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Florida s Plan to Ensure Equitable Access to Excellent Educators. heralded Florida for being number two in the nation for AP participation, a dramatic

Florida s Plan to Ensure Equitable Access to Excellent Educators. heralded Florida for being number two in the nation for AP participation, a dramatic Florida s Plan to Ensure Equitable Access to Excellent Educators Introduction Florida s record on educational excellence and equity over the last fifteen years speaks for itself. In the 10 th Annual AP

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Raw Score to Scaled Score Conversions

Raw Score to Scaled Score Conversions Jon S Twing, PhD Vice President, Psychometric Services NCS Pearson - Iowa City Slide 1 of 22 Personal Background Doctorate in Educational Measurement and Statistics, University of Iowa Responsible for

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS)

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS) Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS) April 30, 2008 Abstract A randomized Mode Experiment of 27,229 discharges from 45 hospitals was used to develop adjustments for the

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

This chapter discusses some of the basic concepts in inferential statistics.

This chapter discusses some of the basic concepts in inferential statistics. Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details

More information

Interpreting and Using SAT Scores

Interpreting and Using SAT Scores Interpreting and Using SAT Scores Evaluating Student Performance Use the tables in this section to compare a student s performance on SAT Program tests with the performance of groups of students. These

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Technical Information

Technical Information Technical Information Trials The questions for Progress Test in English (PTE) were developed by English subject experts at the National Foundation for Educational Research. For each test level of the paper

More information

Missing data in randomized controlled trials (RCTs) can

Missing data in randomized controlled trials (RCTs) can EVALUATION TECHNICAL ASSISTANCE BRIEF for OAH & ACYF Teenage Pregnancy Prevention Grantees May 2013 Brief 3 Coping with Missing Data in Randomized Controlled Trials Missing data in randomized controlled

More information

Feifei Ye, PhD Assistant Professor School of Education University of Pittsburgh feifeiye@pitt.edu

Feifei Ye, PhD Assistant Professor School of Education University of Pittsburgh feifeiye@pitt.edu Feifei Ye, PhD Assistant Professor School of Education University of Pittsburgh feifeiye@pitt.edu Validity, reliability, and concordance of the Duolingo English Test Ye (2014), p. 1 Duolingo has developed

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student

More information

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender,

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, This essay critiques the theoretical perspectives, research design and analysis, and interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, Pair Composition and Computer

More information

Chapter 8: Quantitative Sampling

Chapter 8: Quantitative Sampling Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or

More information

Eight things you need to know about interpreting correlations:

Eight things you need to know about interpreting correlations: Research Skills One, Correlation interpretation, Graham Hole v.1.0. Page 1 Eight things you need to know about interpreting correlations: A correlation coefficient is a single number that represents the

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Relating the ACT Indicator Understanding Complex Texts to College Course Grades

Relating the ACT Indicator Understanding Complex Texts to College Course Grades ACT Research & Policy Technical Brief 2016 Relating the ACT Indicator Understanding Complex Texts to College Course Grades Jeff Allen, PhD; Brad Bolender; Yu Fang, PhD; Dongmei Li, PhD; and Tony Thompson,

More information

Review of Fundamental Mathematics

Review of Fundamental Mathematics Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

More information

Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

Teaching Multivariate Analysis to Business-Major Students

Teaching Multivariate Analysis to Business-Major Students Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

CENTER FOR EQUAL OPPORTUNITY. Preferences at the Service Academies

CENTER FOR EQUAL OPPORTUNITY. Preferences at the Service Academies CENTER FOR EQUAL OPPORTUNITY Preferences at the Service Academies Racial, Ethnic and Gender Preferences in Admissions to the U.S. Military Academy and the U.S. Naval Academy By Robert Lerner, Ph.D and

More information

Solution: The optimal position for an investor with a coefficient of risk aversion A = 5 in the risky asset is y*:

Solution: The optimal position for an investor with a coefficient of risk aversion A = 5 in the risky asset is y*: Problem 1. Consider a risky asset. Suppose the expected rate of return on the risky asset is 15%, the standard deviation of the asset return is 22%, and the risk-free rate is 6%. What is your optimal position

More information

Paid and Unpaid Labor in Developing Countries: an inequalities in time use approach

Paid and Unpaid Labor in Developing Countries: an inequalities in time use approach Paid and Unpaid Work inequalities 1 Paid and Unpaid Labor in Developing Countries: an inequalities in time use approach Paid and Unpaid Labor in Developing Countries: an inequalities in time use approach

More information

Measurement. How are variables measured?

Measurement. How are variables measured? Measurement Y520 Strategies for Educational Inquiry Robert S Michael Measurement-1 How are variables measured? First, variables are defined by conceptual definitions (constructs) that explain the concept

More information

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

More information

Reception baseline: criteria for potential assessments

Reception baseline: criteria for potential assessments Reception baseline: criteria for potential assessments This document provides the criteria that will be used to evaluate potential reception baselines. Suppliers will need to provide evidence against these

More information

The Capital Asset Pricing Model (CAPM)

The Capital Asset Pricing Model (CAPM) Prof. Alex Shapiro Lecture Notes 9 The Capital Asset Pricing Model (CAPM) I. Readings and Suggested Practice Problems II. III. IV. Introduction: from Assumptions to Implications The Market Portfolio Assumptions

More information

Time Series and Forecasting

Time Series and Forecasting Chapter 22 Page 1 Time Series and Forecasting A time series is a sequence of observations of a random variable. Hence, it is a stochastic process. Examples include the monthly demand for a product, the

More information