
Test Bias

As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic error), and the validity of test score interpretations can be compromised by response biases that systematically obscure the psychological differences among respondents. Now we will examine the possibility that the validity of test score interpretations can be compromised further by test biases that systematically obscure the differences (or lack thereof) among groups of respondents.

Psychological tests are often used to make important decisions that affect the lives of real people: which colleges (if any) will accept you, which class your child will be enrolled in, and whether an employer will hire you. To the degree that such decisions are based on tests that are biased in favor of or against specific groups of people, such biases have extremely important personal and societal implications.

Suppose you are interested in studying the possibility that gender differences exist in mathematical ability. You give a reasonably reliable mathematics test to a representative group of males and females, and you find that, on average, males have higher math scores than females. As a researcher, you would be tempted to interpret your test scores in terms of the psychological construct that they are intended to reflect: that males tend to have greater mathematical ability than females. However, it is possible that the participants' test scores should not be interpreted as reflecting purely their mathematical ability. That is, it is possible that the test is biased in some way. For example, if the males' test scores overestimated their true mathematical ability and the females' test scores underestimated their true ability, then the test is biased. In this case,
the difference between the test scores for males and females might be due to test score bias, not to a difference in their true mathematical abilities.

There are two general types of test bias. Roughly speaking, they reflect biases in the meaning of a test and biases in the use of a test. Construct bias occurs when a test has different meanings for two groups, in terms of the precise construct that the test is intended to measure. Construct bias has to do with the relationship of observed scores to true scores on a psychological test. If this relationship can be shown to be systematically different for different groups, then we might conclude that the test is biased. Construct bias can lead to situations in which two groups have the same average true score on a psychological construct but different average test scores.

The second type of bias is predictive bias, which occurs when a test's use has different implications for two groups. Predictive bias has to do with the relationship between scores on two different tests. One of these tests (the predictor test) is thought to provide values that can be used to predict scores on the other test (the outcome test or measure). For example, college admissions officers might use SAT scores to predict freshman GPAs. The SAT would be the predictor test and GPA would be the outcome measure. In this context, test bias concerns the extent to which the link between predictor test true scores and outcome test observed scores differs for two groups. If the SAT is more strongly predictive of GPA for one group than for another, then the SAT suffers from predictive bias in terms of its use as a predictor of GPA.

The two types of bias, construct and predictive, are independent. For example, a test might have no construct bias but suffer from predictive bias. The SAT might accurately reflect true academic aptitude differences among groups of people (and thus have no construct bias),
but academic aptitude might not be associated with freshman GPA equally for two groups of people (and thus predictive bias would exist).

There are several ways to operationally define and identify test score bias, which fall into at least two categories of procedures: a) internal methods to identify construct bias and b) external methods to identify predictive bias. We emphasize the operational nature of this task to remind you that test score bias, in both of its forms, is a theoretical concept, in part because both types of bias depend on the theoretical notion of a true score. There is no one way to detect test score bias, any more than there is one way to directly calculate psychometric test score properties such as reliability or validity. There are, however, various generally accepted ways to estimate the degree to which test bias exists.

An overarching issue in the definition and detection of test bias is that the existence of a group difference in test scores does not necessarily mean that test scores are biased. Suppose you find that females have higher scores than males on a self-esteem test. This difference is not prima facie evidence that the test is biased (Jensen, 1980, 1998; Thorndike, 1971). The participants' test scores might in fact be good estimates of their true self-esteem. In such a case, the test is not biased, and the group difference in test scores reflects a real difference in average self-esteem. Consider doing a study in which you weigh representative groups of males and females. You would doubtless find that the average weight of females is lower than the average weight of males. You would not take this difference to mean that the scale you used to measure weight produced biased scores.

Why Worry about Test Score Bias?

It is likely that everyone reading this book has taken a psychological test of some kind.

Virtually all children schooled in the United States or other industrialized countries are exposed
on a regular basis to academic achievement tests. In the United States, most students who plan to attend an institution of higher education have taken the SAT or ACT. Most graduate schools in the United States require applicants to take the GRE. Applicants for most Federal Government jobs are required to take a civil service examination, and corporations regularly test job applicants, and sometimes even employees, using psychological tests.

Scores on these and other types of psychological tests are often used to make important decisions about people. In educational settings, intelligence test scores are used to place children in special programs. Intelligence test scores are used by courts to make decisions about who can and who cannot be sentenced to death following a murder conviction. Educational institutions use scores on standardized tests to make admissions decisions. Corporations and governments often make job decisions about people based, at least in part, on test scores. In the United States, most public school teachers have to take and pass standardized tests to become certified. The use of psychological tests in our society is pervasive, and scores on these tests can have an important impact on people's lives and on our public and private institutions.

Because testing is a pervasive feature of our society and because test scores have important consequences for people, we would like to develop tests that produce scores that allow us to differentiate among people based on true score differences and not on group membership. For example, if we have a self-esteem test, then we would like to be sure that scores are determined only by self-esteem and are not contaminated by some extraneous factor such as the biological sex of the person taking the test. In other words, we want unbiased tests.

Our desire for unbiased test scores is rooted in our belief that we should not discriminate for or against a person because of their biological sex, their ethnicity, their race, their religious
preference, or their age. In some cases, the list of groups that should be protected from test score bias has been expanded to include factors such as sexual preference, pregnancy, marital status, linguistic background, and various disabilities. In each of these cases, we should be confident that observed score differences on psychological tests are a function of true score differences. It is especially important to be able to show that test scores are not biased in those instances in which average observed scores on some type of psychological test differ between groups.

Detecting Construct Bias: Internal Evaluation of a Test

Construct bias is often evaluated by examining responses to individual items on a test. An item on a test is biased if people belonging to different groups respond to the item in different ways and if it can be shown that these differing responses are not related to group differences in the psychological attribute that the test was designed to measure. For example, suppose you had a 100-item mechanical aptitude test. If you selected one item from the test and found that males' responses were similar to females' responses, then the item would not appear to be biased (assuming that the males and females had the same level of mechanical aptitude). On the other hand, if males and females with the same level of aptitude responded in different ways to the item, then you would suspect some type of bias in the item.

Most psychological tests are composite tests; they contain multiple items. For such composite tests, the overall test bias is a function of the bias associated with each of the items. If none of the items on a test seem to be biased, then we would assume that the total test score is unbiased. However, if one or more items seem to be biased, then we would suspect that the total test score might also be biased.
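
One simple way to picture comparing people "with the same level of aptitude" is a stratified comparison. The sketch below is a hypothetical illustration rather than a procedure from this chapter: it assumes that each respondent's total test score can stand in for their aptitude level, uses made-up 0/1 responses to a single item, and introduces the helper name `proportion_correct_by_level` purely for illustration. Within each total-score level, it reports the proportion of each group that answers the item correctly.

```python
from collections import defaultdict

def proportion_correct_by_level(records):
    """records: iterable of (group, total_score, item_correct) tuples.

    Returns {total_score: {group: proportion correct}}, so groups are
    compared only among people at the same (matched) total-score level.
    """
    # level -> group -> [number correct, number of people]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, total, correct in records:
        cell = counts[total][group]
        cell[0] += correct
        cell[1] += 1
    return {
        level: {g: c / n for g, (c, n) in groups.items()}
        for level, groups in sorted(counts.items())
    }

# Hypothetical data: (group, total test score, 1 = item answered correctly)
data = [
    ("M", 60, 1), ("M", 60, 0), ("F", 60, 0), ("F", 60, 0),
    ("M", 80, 1), ("M", 80, 1), ("F", 80, 1), ("F", 80, 0),
]
for level, by_group in proportion_correct_by_level(data).items():
    print(level, by_group)
```

In this toy data, males answer the item correctly more often than females at both matched levels, which is the pattern that would prompt a closer look at the item. With real data, much larger samples would be needed before reading anything into such gaps.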
Remember that test bias concerns the relationship between group differences in true scores and group differences in observed test scores. In the case of construct bias, a test item
would be biased if responses to the item from people who belong to one group reflect their true scores on the relevant psychological attribute but responses to the item from people in another group are not simply a function of that attribute (we are assuming some minimum degree of reliability for the test of interest). Of course, we can never know a person's true score on any attribute. Therefore, the procedures that we are going to discuss provide estimates of the existence and degree of construct bias.

Construct bias is related to the meaning of test scores. Evidence of construct bias suggests that scores on a test might have different meanings for different groups of people. If we had evidence that scores on our mechanical aptitude test suffered from construct bias related to the biological sex of those taking the test, we would have to entertain the possibility that the scores measure different psychological attributes in the two groups. For example, the males' responses to the test might be determined primarily by a single construct, mechanical aptitude, but the females' responses might be determined by two constructs, mechanical aptitude and stereotype threat (the tendency to behave in ways that confirm stereotypes about one's group; Spencer, Steele, & Quinn, 1999). Thus, the mechanical aptitude test would not measure exactly the same psychological attributes for the two sexes.

Several procedures that can be used to estimate the existence and degree of construct bias are described below. These procedures focus on the internal structure of the test, that is, the way that the parts of a test are related to each other. Most simply, internal structure refers to the pattern of correlations among items and/or the correlations between each item and the total test score. To evaluate the presence of construct bias, we examine the internal structure of a test separately for two groups.
If the two groups exhibit the same internal structure in their test responses, then we conclude that the test is unlikely to suffer
from construct bias. However, if the two groups exhibit different internal structures in their test responses, then we conclude that the test is likely to suffer from construct bias. There are at least four methods for detecting construct bias.

Item Discrimination Index ($I_d$)

One method of detecting construct bias is to compute item discrimination indexes separately for two groups. An item's discrimination index reflects the degree to which the item is related to the total test score (i.e., whether people who answered the item correctly tended to do better on the test as a whole than people who answered it incorrectly), and, by implication, the degree to which the item is similar to most of the other items on the test. In this way, item discrimination indexes reflect the structure of associations among test items.

To calculate $I_d$, compute each person's total score on the test and rank order the scores from highest to lowest. Then select the people whose test scores put them in the top 30% of scorers and the people whose scores put them in the bottom 30%. Next, for each test item, compute the proportion of people in each of the two groups who answer the item correctly. For example, suppose that there are 50 people in the top group of test takers and you find that 40 of these people answered question #1 correctly. The proportion of people in this group who answered the item correctly would be .80. Now imagine that in the low-scoring group only 10 of its 50 people answered question #1 correctly. The proportion of low scorers who answered the item correctly would be .20. If you subtract the low-score proportion from the high-score proportion, you get the discrimination index for item #1: .80 - .20 = .60 (see the item discrimination example in Table 1.1).

$$I_d = P_{hi} - P_{low}$$

where $P_{hi}$ and $P_{low}$ are the proportions of people in the high-scoring and low-scoring groups, respectively, who answer the item correctly.
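
The steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming 0/1 item scoring and that the top and bottom 30% are taken by simple sorting (people tied at the cutoff keep their original order); the function names `extreme_groups` and `discrimination_index` are hypothetical. Computing the index separately for each group of interest just means running the same steps on each group's data.

```python
def extreme_groups(responses, fraction=0.30):
    """Split people into top and bottom groups by total test score."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

def discrimination_index(high, low, item):
    """I_d = P_hi - P_low: proportion correct in the high group minus the low group."""
    p_hi = sum(person[item] for person in high) / len(high)
    p_low = sum(person[item] for person in low) / len(low)
    return p_hi - p_low

# Worked example from the text: 40 of 50 high scorers and 10 of 50 low scorers
# answer question #1 (item index 0) correctly.
high = [[1]] * 40 + [[0]] * 10
low = [[1]] * 10 + [[0]] * 40
print(round(discrimination_index(high, low, 0), 2))  # 0.6

# Starting from hypothetical raw 0/1 responses (10 people, 3 items):
responses = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
             [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
top, bottom = extreme_groups(responses)
print([round(discrimination_index(top, bottom, i), 2) for i in range(3)])  # [1.0, 0.67, 0.67]
```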

Historically, the item discrimination index was developed in association with classical test theory. The index is an important measure of the extent to which responses to test items can be used to differentiate among people on the basis of their knowledge of some topic, or on the basis of the amount of some other psychological attribute. Again, assume we give a mechanical aptitude test to a group of people. If people who have high mechanical aptitude have a high probability of answering a particular aptitude question correctly while people with low mechanical aptitude have a low probability of answering it correctly, the question would have a high item discrimination index (e.g., .90). On the other hand, if the item discrimination index for a question were low (e.g., .10), then the low-aptitude respondents answered the item correctly nearly as often as the high-aptitude respondents. Thus, the item does not clearly discriminate among people with varying levels of the construct being measured.

The item discrimination index can be used to estimate construct bias. Specifically, we would select an item, compute its discrimination index separately for two groups of people, and then compare the groups' indexes. If the two discrimination index values are approximately equal, then we conclude that the item is probably not biased. However, if the two values are not approximately equal, then we conclude that the item is probably biased in some way. That is, we would conclude that the item seems to belong on the test for one group but not for the other group. By including the item on the test for both groups, the test would, in effect, be somewhat different for the two groups. This analysis would be conducted for each of the items on the test.

An important feature of the item discrimination index as a measure of construct bias is that it can be the same for two groups even when the groups differ in how many of their members answer the item correctly. For example, we might find that one of our mechanical aptitude items was
answered correctly by only 40% of the males but by 60% of the females. Even so, the item discrimination index for the question could be the same for both groups. In this case, we would assume that the item is functioning as a measure of mechanical aptitude in the same way for both groups, but that the females know more about the material than the males (i.e., more of them answered the item correctly).

Factor Analysis

A second method for examining construct bias is to conduct a factor analysis of items separately for two or more groups of people. Factor analysis is an important method for evaluating the internal structure of a test. It is a statistical procedure for partitioning the variance or covariance among test items into clusters, or factors, of items that in some sense hang together. Responses to a subset of items on a test are sometimes more highly positively correlated with each other than with responses to the other items on the test. Items that are highly correlated with each other statistically hang together, and they are believed to reflect a factor. If all of the items on a test have similar correlations with each other (i.e., there is no evidence of multiple clusters of items), then we say that the test is homogeneous, or that all of the test score variance, other than error variance, is accounted for by a single factor.

Factor analysis can be used to evaluate the internal structure of a test separately for two groups of people. For example, we might find that, among males, the mechanical aptitude test has a strong single-factor structure: all of the items seem to be highly correlated with each other, suggesting that the test is essentially a measure of one and only one construct. To evaluate the potential presence of construct bias, we would need to examine the factor structure of females' responses to the test items as well. If we found a single factor among females'
responses, then we would conclude that the aptitude test has the same internal structure for males and females. Consequently, we would conclude that the test does not suffer from construct bias. However, we might conduct a factor analysis of females' responses and find two or more factors. In this case, we would conclude that the test has a different internal structure for males and females, and we would then conclude that the test does indeed suffer from construct bias. That is, the total test score reflects different psychological factors for males and for females.

Differential Item Functioning Analyses

Perhaps the best way to evaluate construct bias is a procedure called differential item functioning analysis. Differential item functioning analysis is a feature of a psychometric approach called Item Response Theory (IRT). An important aspect of IRT is the assumption that it is possible to estimate respondents' trait levels directly from empirical data. The trait levels are, in essence, estimates of participants' true scores for the psychological attribute that is being measured. If we assume that we know the trait levels for all the people in two groups and we have their responses to a test item, then we can see whether the trait levels and the item responses match up in the same way for both groups. If they do not, then it is possible that the item is biased.

IRT is based on the idea that there is a function relating a participant's trait level to the probability that he or she will answer a question on a test correctly. For example, we might find that an individual with a trait level one standard deviation above the mean has a .80 probability of answering a particular item correctly, but that an individual with a trait level one standard deviation below the mean has only a .20 probability of answering the item correctly.
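
The text does not commit to a particular form for this function. A common choice in IRT is the two-parameter logistic model, $P(\theta) = 1 / (1 + e^{-a(\theta - b)})$, where $\theta$ is the trait level in standard-deviation units, $b$ is the item's difficulty (the trait level at which $P = .50$), and $a$ is its discrimination (how steeply the curve rises). The sketch below assumes that model; with $b = 0$ and $a = \ln 4 \approx 1.386$, values chosen here purely for illustration, it reproduces the .80 and .20 probabilities in the example.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve.

    theta: trait level (standard-deviation units)
    a: discrimination (how steeply probability rises with theta)
    b: difficulty (trait level at which P = .50)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = math.log(4), 0.0  # illustrative parameters, not estimated from data
for theta in (-1.0, 0.0, 1.0):
    print(f"theta = {theta:+.0f} SD -> P(correct) = {icc_2pl(theta, a, b):.2f}")
```

Evaluating this function over a range of trait levels and plotting the results is exactly what an ICC shows.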
If you have a group of people take a test and you know their respective trait levels, then you can use specialized statistical software to draw an item characteristic curve (ICC) to
illustrate this function for each item. Furthermore, if you have two groups of people, then you can draw ICCs separately for each group. To evaluate the presence of construct bias, you would compare the two groups' ICCs. If the item is not biased, then the two groups' ICCs should be very similar. That is, two people with the same trait level should have the same probability of answering the item correctly. However, if the item is biased, then the two groups' ICCs will be dissimilar. That is, the probability that two people (e.g., a male and a female) will answer an item correctly might differ even if the two people have the same trait level. Such a situation would clearly reflect the presence of construct bias.

For example, suppose that you want to determine whether an item on a mechanical aptitude test is biased with respect to biological sex. You could compute mechanical aptitude scores for each person in a study (these represent their trait levels), and you could compute the probability that the item is answered correctly for each person. You use this information to draw an ICC (see Figure 1.1). Now you sort the people in your study into two groups (i.e., a group of males and a group of females) and draw ICCs separately for each group. If the curves overlap, you would probably conclude that the item is not biased. Suppose, however, that you obtained the results illustrated in Figures 1.2 and 1.3. Results such as these would lead you to suspect item bias. Figure 1.2 is an example of uniform bias. In this example, it appears that females with the same mechanical knowledge as males find the item more difficult to answer than males do. Figure 1.3 illustrates non-uniform bias, a situation in which the ICCs differ in shape as well as location. In this case, it appears that the item is measuring different traits for males and females.
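
Under the same assumed two-parameter logistic model, uniform and non-uniform bias can be mimicked by giving the two groups different item parameters. In this hypothetical sketch, shifting only the difficulty $b$ produces uniform bias: the females' curve sits to the right of the males' curve, so the item is harder for females at every trait level. Changing the discrimination $a$ instead makes the curves cross, so which group is favored depends on the trait level, which is the signature of non-uniform bias. All parameter values are invented for illustration.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct answer at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical item parameters (a = discrimination, b = difficulty) for each group.
scenarios = {
    "uniform":     {"males": (1.5, 0.0), "females": (1.5, 0.5)},  # only b differs
    "non-uniform": {"males": (2.0, 0.0), "females": (0.7, 0.0)},  # a differs; curves cross
}

for label, groups in scenarios.items():
    print(label)
    for theta in thetas:
        p_m = icc_2pl(theta, *groups["males"])
        p_f = icc_2pl(theta, *groups["females"])
        # Positive gap: males more likely to answer correctly at this same trait level.
        print(f"  theta={theta:+.1f}  males={p_m:.2f}  females={p_f:.2f}  gap={p_m - p_f:+.2f}")
```

In the uniform scenario the gap is positive at every trait level; in the non-uniform scenario it is negative below the mean, zero at the mean, and positive above it. Actual differential item functioning analyses estimate these parameters from data and test whether group differences exceed what sampling error would produce.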
The ICC approach is a visual method for detecting construct bias, but there are IRT methods that evaluate the presence of construct bias even more precisely (e.g., Smith & Reise, 1998).

Although IRT's differential item functioning analysis is a strong method for identifying construct bias, it has a downside. IRT analyses are complex in a variety of ways: deciding which model to use; determining whether parameter differences between groups are real or simply due to measurement error; the need for very large sample sizes; the need for samples of items and samples of people that are heterogeneous enough to represent the complete range of the trait the test is designed to measure; and the need for specialized statistical software to conduct the analyses. Because of these complexities, IRT is still only emerging as a widely appreciated and understood method of detecting construct bias.

Rank Order

There is another quick and computationally easy way to estimate construct bias if you have test items that can be ranked in order of difficulty. Using our 100-item aptitude test as an example, some of the test questions will probably be easier to answer than others. These questions can be ranked in order of difficulty, and the rankings can be done separately for different groups (e.g., males and females). If the item ranks differ across groups, then we would suspect that test score construct bias exists, because the items do not appear to measure the same thing for both groups. You can use the ranks to compute Spearman's rank-order correlation coefficient (rho, interpreted in the same way as r) to index rank-order consistency across groups. If rho is low (e.g., below .90), we might suspect construct bias. If you found evidence of construct bias, you would probably want to follow up on the finding with additional analyses to identify the particular source of the low correlation coefficient (see Jensen, 1980). Notice that the correlation between the ranks can be high even if the proportion of correct responses to each item differs across groups.
Using our aptitude test as an example, males might be less likely than females to give correct answers to the test questions, but the rank ordering of
questions according to difficulty might be the same across groups. Again, as with the item discrimination index, group differences in correct responding are not by themselves an indication of test score bias.

Detecting Predictive Bias: External Evaluation of a Test

Predictive bias concerns the degree to which a test's scores are equally predictive of an outcome for two groups. For example, scores on the SAT are thought to measure academic achievement. On the assumption that academic achievement measured during the secondary school years might be related to academic achievement during the freshman year in college (e.g., as measured by freshman GPA), institutions of higher education often use SAT scores to make admissions decisions. The idea is that it is possible to predict, at least with some degree of accuracy, students' freshman-year academic performance from their SAT scores. If it could be shown that the ability to predict freshman academic achievement from SAT scores differs across groups of people, then we might suspect that the SAT suffers from predictive test score bias.

The existence of predictive bias is examined by obtaining scores on two variables or measures. Analyses are then conducted to examine the degree to which scores on the main test of interest (the predictor test) can be used to predict people's scores on another psychological measure (the outcome measure) that is thought to be conceptually related to scores on the main test of interest. Detection of predictive bias begins with the assumption that one size fits all: that the test is equally predictive for all groups. As we will illustrate, analyses are conducted to evaluate this assumption. If those analyses confirm that the test is equally predictive for both groups, then we conclude that the test probably does not suffer from predictive bias (at least with regard to the specific outcome and the specific groups in question). However, if
those analyses indicate that one size does not fit all (that is, the test is not equally predictive for both groups), then we conclude that the test might suffer from predictive bias.

Imagine that you are a training program selection officer working for a corporation that spends large sums of money training employees to develop the mechanical skills it needs to run its operations. Your job is to select the most promising candidates for this training program. Because of the cost of the program, it is essential that you select only those people who are most likely to perform well in it. Your job depends on how well you make these selections. In an attempt to improve your selection success rate, you develop a mechanical aptitude test that you give to all trainee candidates. You assume that scores on the test will be related to some outcome measure of post-training performance. For example, following training, each trainee might be rated by a supervisor in terms of the trainee's level of mechanical competency. Further, you assume that there should be a positive linear relationship between the pre-training aptitude test scores and the post-training supervisor ratings of competence. That is, candidates with high aptitude scores (i.e., predictor scores) should have better ratings (i.e., outcome scores) than candidates with lower aptitude scores.

In your development and evaluation of the aptitude test, you might be concerned about predictive test bias. Formally speaking, predictive bias has to do with the use of test scores to predict a relevant outcome (e.g., behavior, competency, or performance) in situations other than the testing situation in which the predictor test was administered. Thus, if you had reason to believe that the aptitude test was strongly predictive of supervisor ratings for males but not for females, then you would suspect that the test was predictively biased.

To evaluate the efficacy of your new aptitude test and any potential predictive bias, you will need to examine two issues: a) does your test actually help you predict the
outcome of training, and b) does your test predict the outcome of training equally well for various groups of trainees? To address both issues, you will need data that can be used to evaluate the predictive effectiveness of your test. Data of this kind could be obtained by testing all trainees before they enter the program and then recording their scores on the outcome measure at the end of the program. The two issues are often addressed using a statistical procedure called regression, with which you can use the pre-training mechanical aptitude test scores to calculate predicted post-training supervisor ratings.

Basics of Regression Analysis

Regression analysis is based on the assumption that there is a linear relationship between aptitude test scores and outcome scores. If there is such a relationship, then the formula for a straight line can be used to predict outcome scores from aptitude scores:

$$Y' = a + b(X)$$

where $Y'$ is the predicted training outcome score for an individual training candidate, $a$ is the intercept (the predicted value of a person's outcome score if that person had an aptitude test score of zero), $b$ is the regression coefficient or slope (a number that tells you how much change you would expect to see in $Y'$ for a one-point increase in aptitude test scores), and $X$ is an individual's aptitude test score. Many popular statistical software packages can be used to conduct the regression analysis, which produces values of $a$ and $b$. Once you have obtained the values for the intercept and slope of the regression equation, you can evaluate the predictive ability of the test. For example, you can take any individual's score on the aptitude test ($X$), plug it into the regression equation, and calculate a predicted supervisor rating ($Y'$) for that individual.
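
For readers who want to see where $a$ and $b$ come from, the sketch below computes them by ordinary least squares ($b$ = sum of cross-products of deviations divided by the sum of squared deviations of $X$; $a = \bar{Y} - b\bar{X}$), which is what statistical packages do for a simple regression. The aptitude and rating values are hypothetical and are not the Table 1.2 data; the function names are illustrative. The last lines plug in the intercept and slope reported in the text.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept (a) and slope (b) for Y' = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x):
    """Predicted outcome score Y' for predictor score x."""
    return a + b * x

# Hypothetical aptitude (X) and supervisor rating (Y) scores; not the Table 1.2 data.
aptitude = [50, 60, 70, 80]
rating = [85, 90, 97, 102]
a, b = fit_line(aptitude, rating)
print(f"a = {a:.2f}, b = {b:.2f}")

# Using the intercept and slope reported in the text (a = 56.03, b = .58):
for x in (69, 70):
    print(x, round(predict(56.03, 0.58, x), 2))
```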

To illustrate this process, we will use the data in Table 1.2. In this table, we have aptitude scores for four trainees, along with each trainee's outcome score (note that an analysis of this kind would involve many more than four trainees). Based on a regression analysis conducted using SPSS, the intercept (a) is 56.03 and the slope (b) is .58. These results tell us that a trainee with an aptitude score of zero is predicted to obtain an outcome rating of 56.03, and that a one-point difference in aptitude scores is associated with a .58 difference in outcome scores. As mentioned earlier, these values can be used to obtain predicted scores for all trainees by plugging their aptitude scores into the following regression equation:

$$Y' = 56.03 + .58(X)$$

Predicted supervisor rating = 56.03 + .58(Aptitude score)

For example, a trainee with an aptitude score of 69 is predicted to earn a supervisor rating of 96.05:

$$Y' = 56.03 + .58(69) = 96.05$$

Similarly, a trainee with an aptitude score of 70 is predicted to earn a supervisor rating of 96.63:

$$Y' = 56.03 + .58(70) = 96.63$$

Note that the difference between these two predictions is .58 (96.63 - 96.05 = .58), which reflects the slope in the regression equation. That is, a one-point difference in aptitude test scores (70 - 69 = 1) is associated with a .58 difference in outcome scores.

If we calculate predicted rating scores for a wide range of aptitude test scores, then we can generate a regression line, or line of best fit. Each point on a regression line is associated
17 with the most likely (predicted) Y value for each possible X-value. The line is used to illustrate the association between predictor test scores and outcome scores. In Table 1.2, we have computed predicted scores ( Y ) for each trainee. In Figure 11.4, we have plotted each of our four candidate s observed outcome score against his or her observed predictor test score, and we have draw a regression line that reflect each candidate s predicted outcome score. Notice that each trainee s predicted score on the outcome differs from his or her actual score on the outcome. For example, for trainee #1, the observed outcome score is 75 but the predicted outcome score is A difference between a predicted score and an observed outcome score is referred to as a residual. The standard deviation of the residuals is the standard error of estimate ( se ) that we discussed previously. The se for the data in Table 1.2 is e This value reflects the inaccuracy of predictions; the larger the value, the less accurate the predictions. A test with strong predictive power will result in relatively small residuals, which will result in a relatively small se e. One Size Fits All: The Common Regression Equation The estimation of predictive bias usually begins by establishing what would happen if no bias exists. If a test is not biased, then one regression equation should be equally applicable to different groups of people. The assumption that different groups share a common regression equation is based on the idea that one size fits all regardless of biological sex, ethnicity, culture, or whichever group difference is being considered, a single regression equation adequately reflects the predictive ability of the test in question. Imagine that you give your aptitude test to a large number of trainee candidates (e.g., 100). 
Assume there is an equal number of male and female candidates, and that you want to make sure that your aptitude test is not biased with respect to the biological sex of the candidates. To begin your examination of this issue, you could compute the regression equation based on the data from the entire sample, regardless of sex. Imagine that you found that the intercept from this regression equation is $a = 56.03$, the slope is $b = .58$, and the standard error of estimate is $s_e = 6.76$. These values represent the common regression equation, and they will be called the common intercept, the common slope, and the common standard error of estimate, respectively. Again, if your aptitude test is unbiased with respect to sex, then the common regression equation (calculated from males and females together) should be equally applicable to males and females.

To evaluate the presence of predictive bias, additional regression analyses must be conducted. To determine whether the common regression equation is indeed equally applicable to males and females, we must calculate one regression equation for males and one for females. We must then compare these group-level regression equations with the common regression equation. Three of the values that we just discussed can be used to assess predictive test bias: (a) the intercept, (b) the slope, and (c) the standard error of estimate. If the group-level values do not match the common regression equation, then you might suspect that your aptitude test scores are biased. In practice, a variety of sophisticated statistical analyses can be conducted on these values to estimate the presence of predictive test bias precisely, but our discussion will remain at a more conceptual level.

To elucidate the meaning of various patterns of results, we first focus on the meaning of biased intercepts, then on biased slopes. In practice, however, groups are probably more likely to differ on both of these elements of prediction than to be exactly equal on one but differ on the other. Thus, we will also illustrate the effect of bias in both intercepts and slopes.
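The comparison of the common equation with the group-level equations can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure (the text used SPSS).

- The sketch fits ordinary least-squares lines with the closed-form formulas.
- It computes $s_e$ as the standard deviation of the residuals, dividing by $n$ to follow the text's definition. SPSS and most regression software divide by $n - 2$ instead.
- It checks the worked predictions from the common equation.
- The function names (`fit_line`, `predict`, `compare_groups`) and the small two-group data set are hypothetical, invented only for illustration.

```python
import math


def fit_line(xs, ys):
    """Return (intercept a, slope b, s_e) for an ordinary least-squares line.

    s_e is the standard deviation of the residuals with n in the denominator
    (an assumption matching the text's wording; many packages use n - 2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # OLS residuals (with an intercept) have mean zero, so this is their SD.
    s_e = math.sqrt(sum(r ** 2 for r in residuals) / n)
    return a, b, s_e


def predict(a, b, x):
    """Predicted outcome score: Y-hat = a + b * X."""
    return a + b * x


def compare_groups(groups):
    """Fit the common equation (all groups pooled) plus one equation per group."""
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    results = {"common": fit_line(all_x, all_y)}
    for name, (xs, ys) in groups.items():
        results[name] = fit_line(xs, ys)
    return results


# Worked predictions from the common equation in the text (a = 56.03, b = .58).
print(round(predict(56.03, 0.58, 69), 2))  # 96.05
print(round(predict(56.03, 0.58, 70), 2))  # 96.63

# Hypothetical aptitude scores and supervisor ratings (not the book's data).
groups = {
    "male": ([55, 60, 65, 70, 75], [89, 92, 95, 97, 101]),
    "female": ([55, 60, 65, 70, 75], [84, 88, 91, 93, 96]),
}
for name, (a, b, s_e) in compare_groups(groups).items():
    print(f"{name:>6}: a = {a:.2f}, b = {b:.2f}, s_e = {s_e:.2f}")
```

If the group-level `a`, `b`, or `s_e` values depart from the common ones, that is the signal the text describes. A formal test of the difference would need the more sophisticated analyses mentioned above.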

Intercept Bias

Suppose that group-level regression analyses reveal that males and females have slopes and $s_e$ values that are similar to the common regression equation, but that their intercept values differ from the common intercept. In this case, you would suspect that your test suffers from intercept bias. For example, imagine that, in your evaluation of your aptitude test, you conduct regression analyses separately for the 50 males and the 50 females. You find that, for both groups, the slope is $b = .58$ and $s_e$ is 6.76, which are equal to the common slope and common $s_e$. However, you find that the intercept for males ($a_{\text{male}}$) is 4 points higher than the intercept for females ($a_{\text{female}}$). Note that these group-level intercept values differ from the common intercept, indicating that one size does not fit all, at least in terms of the intercept. Thus, the test appears to suffer from intercept bias.

What are the implications of intercept bias? The fact that the males' intercept is higher than the females' intercept indicates that males at any given level of aptitude will tend to receive higher supervisor ratings (outcome scores) than females at the same level of aptitude. To illustrate this, let us compute the predicted outcome score for a male with an aptitude score of 70 and for a female with an aptitude score of 70:

$$\hat{Y}_{\text{male}} = a_{\text{male}} + .58(70)$$

$$\hat{Y}_{\text{female}} = a_{\text{female}} + .58(70)$$

Because both groups share the same slope, the $.58(70)$ terms cancel, so the difference between these two predictions equals the difference between the intercepts. These computations show that, for a male and a female who have exactly the same level of aptitude, the male is predicted to obtain a supervisor rating that is 4 points higher than the female. If we assume that the supervisor ratings are themselves unbiased (an assumption that we will revisit later in this chapter), then this discrepancy indicates that the aptitude test does not work the same way for males and females. As we saw earlier, the common regression equation resulted in a predicted supervisor rating of 96.63 for a trainee who had an aptitude score of 70. Comparing this result to the results of our group-level predictions, the common regression equation appears to underestimate the prediction for males and to overestimate the prediction for females. Thus, one size does not fit all, and the test appears to be predictively biased.

If a test suffers only from intercept bias (i.e., the group-level intercepts are not equivalent to the common intercept, but the slopes and $s_e$ values are unbiased), then the size of the group discrepancy will be constant across all aptitude scores. We saw a four-point discrepancy for a male and a female who both had an aptitude score of 70. If the aptitude test suffers only from intercept bias, then the sex difference will be four points at every level of aptitude. This is illustrated in Figure 1.5, which presents a common regression line (dashed) and two group-level regression lines. As this figure illustrates, the lines are parallel: a male trainee of a given aptitude level will obtain a predicted rating that is always four points higher than a female who has the same level of aptitude.

Slope Bias

A second way in which a test can be predictively biased is through slope bias. Suppose that group-level regression analyses reveal that males and females have intercept and $s_e$ values that are similar to the common regression equation, but that their slope values differ from the common slope. This would indicate that the connection between predictor scores and outcome scores differs between the two groups.
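The intercept-bias pattern described earlier (parallel group lines with a constant gap) can be checked with a few lines of Python. The group intercepts used here, 58.03 for males and 54.03 for females, are hypothetical placeholders that the text does not report. They were chosen only so that they differ by 4 points and sit symmetrically around the common intercept of 56.03. The slope of .58 is the common slope from the text.

```python
def predict(a, b, x):
    """Predicted outcome score: Y-hat = a + b * X."""
    return a + b * x


COMMON_INTERCEPT = 56.03
COMMON_SLOPE = 0.58
# Hypothetical group intercepts that differ by 4 points (not given in the text).
A_MALE, A_FEMALE = 58.03, 54.03

for aptitude in (50, 60, 70, 80):
    male = predict(A_MALE, COMMON_SLOPE, aptitude)
    female = predict(A_FEMALE, COMMON_SLOPE, aptitude)
    # With equal slopes, the gap equals A_MALE - A_FEMALE at every aptitude level.
    print(aptitude, round(male - female, 2))  # each line shows a gap of 4.0

# At aptitude 70 the common equation falls between the two group predictions.
print(round(predict(COMMON_INTERCEPT, COMMON_SLOPE, 70), 2),
      round(predict(A_MALE, COMMON_SLOPE, 70), 2),
      round(predict(A_FEMALE, COMMON_SLOPE, 70), 2))  # 96.63 98.63 94.63
```

Under these hypothetical intercepts, the common line underpredicts males and overpredicts females by the same amount at every aptitude score. With unequal group intercepts that are not symmetric, the two errors would differ in size but the gap between the groups would still be constant.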

For example, imagine that your analyses reveal that, for both groups, the intercept is $a = 56.03$ and $s_e$ is 6.76, which are equal to the common intercept and common $s_e$. However, you find that the slope for males is $b = .53$ and the slope for females is $b = .60$. Note that these group-level slope values differ from the common slope (i.e., .58), indicating that one size does not fit all in terms of the connection between predictor test scores and outcome scores.

Slope bias has important implications for the degree of discrepancy between the groups' predicted outcome scores. The fact that the males' slope is weaker than the females' slope indicates that the amount of bias is not constant across aptitude levels. To illustrate this, let us compute the predicted outcome score for a male with an aptitude score of 70 and for a female with an aptitude score of 70:

$$\hat{Y}_{\text{male}} = 56.03 + .53(70) = 93.13$$

$$\hat{Y}_{\text{female}} = 56.03 + .60(70) = 98.03$$

This shows that, for a male and a female with an aptitude score of 70, the female will be predicted to have an outcome score that is 4.9 points higher than the male. Now, let us compute the predicted outcome score for a male and a female who have aptitude scores of 60:

$$\hat{Y}_{\text{male}} = 56.03 + .53(60) = 87.83$$

$$\hat{Y}_{\text{female}} = 56.03 + .60(60) = 92.03$$

In this case, the female will be predicted to have an outcome score that is 4.2 points higher than the male. Because the two groups share the same intercept, the gap is simply the slope difference multiplied by the aptitude score: $(.60 - .53)(70) = 4.9$ and $(.60 - .53)(60) = 4.2$. Thus, the bias (i.e., the degree to which the predicted outcome score differs for males and females who have the same level of aptitude) is relatively small for relatively low levels of aptitude, but it is larger for higher levels of aptitude. That is, the discrepancy between male and female predicted scores will tend to increase as scores on the aptitude test increase. This type of pure slope bias is illustrated in Figure 1.6, which shows that the regression lines for males and for females gradually move apart.

Intercept and Slope Bias

So far, we have illustrated pure intercept bias and pure slope bias: cases in which either the intercept is biased or the slope is biased, but not both. To summarize, pure intercept bias indicates that there is a discrepancy between groups' predicted scores, and that the size of this discrepancy does not change as aptitude scores increase or decrease. In contrast, pure slope bias indicates that the size of this discrepancy does change as aptitude scores increase or decrease. It is also possible, and perhaps even more likely than either form of pure bias, for intercept and slope biases to exist simultaneously. In this case, there will be a complex relationship between aptitude scores and the outcome scores for the different groups. For example, we might find that, for people who have low levels of aptitude, the predicted outcome scores for males are higher than the predicted outcome scores for females. But our analyses might also reveal that, for people who have high levels of aptitude, the predicted outcome scores for males are lower than the predicted outcome scores for females. Many patterns of discrepancy might occur. One possible outcome of this type is illustrated in a figure at the end of this chapter.

Standard Error of Estimate

The standard error of estimate is a value that represents the accuracy of your prediction of an outcome score based on scores from the test used to make the prediction. In our example, we are using aptitude test scores to predict scores on a training program effectiveness outcome measure. Our aptitude test would be considered biased if the group-level $s_e$ values differed from the common $s_e$ value. This bias would indicate that you can make more accurate predictions for people in one of the groups than for people in the other group.

Outcome Score Bias

Our discussion of predictive bias has focused on the possibility that the scores on the predictor test are biased. However, it is also possible that scores on the outcome variable could be biased. For example, the supervisors who provide the post-training ratings of competence might be biased in favor of one group and against another. The test we use to measure outcomes, such as our 100-item mechanical competency test, could also be biased. We have been assuming that the outcome measure is not biased, but of course it could be.

The Effect of Reliability

As a final note, we should acknowledge that the standard error of estimate, the regression coefficient, and the intercept are all sensitive to test reliability. In our discussion of predictive bias, we have been assuming high reliabilities for both the predictor test scores and the outcome test scores (e.g., $R_{xx}$ greater than .90). A drop in test score reliability can have a profound effect on these parameters and thereby, at least potentially, affect predictive bias. These effects are complex and beyond the scope of our discussion, but for interested readers we recommend Jensen (1980).

Summary

We have been focusing on test bias, which traditionally refers to the possibility that true differences among groups are systematically obscured (or artificially created). There are widely used methods for coping with response biases. The methods that have been proposed for coping with test bias, however, tend to be somewhat controversial and beyond the scope of our current discussion. For a recent survey of the issues, interested readers are directed to Sackett, Schmitt, and Ellingson (2001). In sum, the validity of test score interpretation and use is a fundamental concern to behavioral scientists who are interested in psychological measurement. Through decades of conceptual and methodological development, psychometricians, test users, and test developers have articulated the meaning and evaluation of validity. Threats to validity do exist. Even so, psychologists and others interested in psychological measurement have made great strides in identifying such threats and in developing strategies for detecting, preventing, or minimizing them. Nevertheless, psychological tests should always be used and interpreted with close regard for the theoretical and evidential basis of their meaning and application.
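The slope-bias arithmetic and the crossing pattern discussed in the Intercept and Slope Bias section can be reproduced with a short Python sketch.

The first part uses the text's values: common intercept 56.03, male slope .53, female slope .60. It shows that the female-minus-male gap is $(.60 - .53)X$, which grows with aptitude.

The second part illustrates the combined intercept-and-slope case. It uses hypothetical parameters (not from the text) whose lines cross, so that males are predicted higher at low aptitude and lower at high aptitude. The crossing point is where the two equations give equal predictions: $X = (a_{\text{female}} - a_{\text{male}}) / (b_{\text{male}} - b_{\text{female}})$.

The helper names `predict` and `crossing_point` are illustrative.

```python
def predict(a, b, x):
    """Predicted outcome score: Y-hat = a + b * X."""
    return a + b * x


def crossing_point(a1, b1, a2, b2):
    """Aptitude score at which two regression lines give the same prediction."""
    if b1 == b2:
        return None  # parallel lines (pure intercept bias) never cross
    return (a2 - a1) / (b1 - b2)


# Pure slope bias, using the values from the text.
A_COMMON, B_MALE, B_FEMALE = 56.03, 0.53, 0.60
for aptitude in (60, 70):
    male = predict(A_COMMON, B_MALE, aptitude)
    female = predict(A_COMMON, B_FEMALE, aptitude)
    print(aptitude, round(male, 2), round(female, 2), round(female - male, 2))
# 60 87.83 92.03 4.2
# 70 93.13 98.03 4.9

# Intercept and slope bias together: hypothetical values, not from the text.
a_m, b_m = 62.0, 0.50  # males: higher intercept, flatter slope
a_f, b_f = 50.0, 0.65  # females: lower intercept, steeper slope
x_cross = crossing_point(a_m, b_m, a_f, b_f)
print(round(x_cross, 2))  # 80.0: males predicted higher below 80, lower above
```

Returning `None` for equal slopes reflects the pure intercept-bias case, in which the group lines stay parallel and never meet.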

Table 1.1 Item Discrimination Index Example

Item Discrimination Index: $I_d = p_{\text{top}} - p_{\text{bottom}}$, the proportion correct in the top 30% group minus the proportion correct in the bottom 30% group ($N_{\text{top}} = 20$, $N_{\text{bottom}} = 20$).

Item | Top 30% | Bottom 30% | Proportion Correct (Top) | Proportion Correct (Bottom) | $I_d$

Notice that $I_d$ DOES NOT depend solely on the proportion of test takers who get the item correct. If you look at item 9, you will see that the $I_d$ for that item is .45, which is exactly the same as the $I_d$ for item 5, although far fewer people answered item 9 correctly. Although I used 30% to identify the top and bottom groups, the percentage used to identify these groups does not have to be 30%; you will find that it tends to range from 25% to 33%.
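The item discrimination index in Table 1.1 is the proportion correct in the top group minus the proportion correct in the bottom group. The Python sketch below computes it from counts of correct answers.

The counts for items 5 and 9 are hypothetical, because the table's numbers were not preserved. They were chosen to reproduce the point made in the note: both items have $I_d = .45$ even though far fewer people answered item 9 correctly. The group sizes of 20 follow the table. The function name `discrimination_index` is illustrative.

```python
def discrimination_index(correct_top, n_top, correct_bottom, n_bottom):
    """I_d = proportion correct in top group - proportion correct in bottom group."""
    return correct_top / n_top - correct_bottom / n_bottom


N_TOP = N_BOTTOM = 20  # group sizes from Table 1.1

# Hypothetical counts of correct answers: (top group, bottom group).
items = {5: (18, 9), 9: (10, 1)}

for item, (top, bottom) in items.items():
    i_d = discrimination_index(top, N_TOP, bottom, N_BOTTOM)
    # Proportion correct among the 40 test takers in the two extreme groups.
    total_p = (top + bottom) / (N_TOP + N_BOTTOM)
    print(f"item {item}: I_d = {i_d:.2f}, proportion correct overall = {total_p:.3f}")
# item 5: I_d = 0.45, proportion correct overall = 0.675
# item 9: I_d = 0.45, proportion correct overall = 0.275
```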

Table 1.2 Data for Illustrating Regression Analysis

Trainee | Aptitude Test Score | Supervisor Rating | Predicted Supervisor Rating ($\hat{Y}$)

Variance; $s_e$


Figure 1.4 Scatterplot and Regression Line for Trainees' Aptitude Scores and Supervisor Ratings (x-axis: Aptitude Test Score; y-axis: Supervisor Rating)



More information

Misconceptions, Problems, and Fallacies in Correlational Analysis

Misconceptions, Problems, and Fallacies in Correlational Analysis Misconceptions, Problems, and Fallacies in Correlational Analysis James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Misconceptions,

More information

Teaching Multivariate Analysis to Business-Major Students

Teaching Multivariate Analysis to Business-Major Students Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis

More information

Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M

EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 1 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M CONTENTS Reliability And Validity Sample Size Determination Sampling

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

When Does it Make Sense to Perform a Meta-Analysis?

When Does it Make Sense to Perform a Meta-Analysis? CHAPTER 40 When Does it Make Sense to Perform a Meta-Analysis? Introduction Are the studies similar enough to combine? Can I combine studies with different designs? How many studies are enough to carry

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

AP Statistics 1998 Scoring Guidelines

AP Statistics 1998 Scoring Guidelines AP Statistics 1998 Scoring Guidelines These materials are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use must be sought from the Advanced Placement

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

3:3 LEC - NONLINEAR REGRESSION

3:3 LEC - NONLINEAR REGRESSION 3:3 LEC - NONLINEAR REGRESSION Not all relationships between predictor and criterion variables are strictly linear. For a linear relationship, a unit change on X produces exactly the same amount of change

More information

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student

More information

Simultaneous Equation Models As discussed last week, one important form of endogeneity is simultaneity. This arises when one or more of the

Simultaneous Equation Models As discussed last week, one important form of endogeneity is simultaneity. This arises when one or more of the Simultaneous Equation Models As discussed last week, one important form of endogeneity is simultaneity. This arises when one or more of the explanatory variables is jointly determined with the dependent

More information

CHAPTER 1 The Item Characteristic Curve

CHAPTER 1 The Item Characteristic Curve CHAPTER 1 The Item Characteristic Curve Chapter 1: The Item Characteristic Curve 5 CHAPTER 1 The Item Characteristic Curve In many educational and psychological measurement situations, there is an underlying

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Psychometrics 101 Part 2: Essentials of Test Score Interpretation. Steve Saladin, Ph.D. University of Idaho

Psychometrics 101 Part 2: Essentials of Test Score Interpretation. Steve Saladin, Ph.D. University of Idaho Psychometrics 101 Part 2: Essentials of Test Score Interpretation Steve Saladin, Ph.D. University of Idaho Standards for Educational and Psychological Testing 15.10 Those responsible for testing programs

More information

Item response theory (IRT) is a second contemporary alternative to classical test

Item response theory (IRT) is a second contemporary alternative to classical test 13-Furr-45314.qxd 8/30/2007 5:44 PM Page 314 CHAPTER 13 Item Response Theory and Rasch Models Item response theory (IRT) is a second contemporary alternative to classical test theory (CTT). Although the

More information

Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

More information

Testing Scientific Explanations (In words slides page 7)

Testing Scientific Explanations (In words slides page 7) Testing Scientific Explanations (In words slides page 7) Most people are curious about the causes for certain things. For example, people wonder whether exercise improves memory and, if so, why? Or, we

More information

Step 8: Considering Validity and Discussing Limitations Written and Compiled by Amanda J. Rockinson-Szapkiw and Anita Knight

Step 8: Considering Validity and Discussing Limitations Written and Compiled by Amanda J. Rockinson-Szapkiw and Anita Knight Step 8: Considering Validity and Discussing Limitations Written and Compiled by Amanda J. Rockinson-Szapkiw and Anita Knight Introduction It is important to think about threats to validity prior to planning

More information

AP Statistics 2008 Scoring Guidelines Form B

AP Statistics 2008 Scoring Guidelines Form B AP Statistics 2008 Scoring Guidelines Form B The College Board: Connecting Students to College Success The College Board is a not-for-profit membership association whose mission is to connect students

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

For example, enter the following data in three COLUMNS in a new View window.

For example, enter the following data in three COLUMNS in a new View window. Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Empirical Methods in Applied Economics

Empirical Methods in Applied Economics Empirical Methods in Applied Economics Jörn-Ste en Pischke LSE October 2005 1 Observational Studies and Regression 1.1 Conditional Randomization Again When we discussed experiments, we discussed already

More information

What Does the Correlation Coefficient Really Tell Us About the Individual?

What Does the Correlation Coefficient Really Tell Us About the Individual? What Does the Correlation Coefficient Really Tell Us About the Individual? R. C. Gardner and R. W. J. Neufeld Department of Psychology University of Western Ontario ABSTRACT The Pearson product moment

More information

Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 2 Simple Linear Regression

Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 2 Simple Linear Regression Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 2 Simple Linear Regression Hi, this is my second lecture in module one and on simple

More information

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

www.collegeboard.com Research Report No. 2008-1 Predicting Grades in Different Types of College Courses

www.collegeboard.com Research Report No. 2008-1 Predicting Grades in Different Types of College Courses Research Report No. 2008-1 Predicting Grades in Different Types of College Courses Brent Bridgeman, Judith Pollack, and Nancy Burton www.collegeboard.com College Board Research Report No. 2008-1 ETS RR-08-06

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender,

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, This essay critiques the theoretical perspectives, research design and analysis, and interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, Pair Composition and Computer

More information

Chapter 6 Partial Correlation

Chapter 6 Partial Correlation Chapter 6 Partial Correlation Chapter Overview Partial correlation is a statistical technique for computing the Pearson correlation between a predictor and a criterion variable while controlling for (removing)

More information

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test. Dr. Tom Pierce Radford University Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

More information

Asymmetry and the Cost of Capital

Asymmetry and the Cost of Capital Asymmetry and the Cost of Capital Javier García Sánchez, IAE Business School Lorenzo Preve, IAE Business School Virginia Sarria Allende, IAE Business School Abstract The expected cost of capital is a crucial

More information

The Correlation Heuristic: Interpretations of the Pearson Coefficient of Correlation are Optimistically Biased

The Correlation Heuristic: Interpretations of the Pearson Coefficient of Correlation are Optimistically Biased The Correlation Heuristic: Interpretations of the Pearson Coefficient of Correlation are Optimistically Biased Eyal Gamliel, Behavioral Sciences Department, Ruppin Academic Center. E-mail: eyalg@ruppin.ac.il

More information

SOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures.

SOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures. 1 SOME NOTES ON STATISTICAL INTERPRETATION Below I provide some basic notes on statistical interpretation for some selected procedures. The information provided here is not exhaustive. There is more to

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results. BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

More information