GENDER BIAS IN STUDENT EVALUATIONS OF TEACHING

John F. Repede, McColl School of Business, Queens University of Charlotte, 1900 Selwyn Ave., Charlotte, NC 28274, (704) 337-2347, repedej@queens.edu
M. Cherie Clark, Department of Psychology, Queens University of Charlotte, 1900 Selwyn Ave., Charlotte, NC 28274, (704) 337-2479, clarkc@queens.edu

ABSTRACT

A large data set of student course evaluations from a private, liberal arts, master's university is analyzed. Both the faculty and the student body are approximately seventy percent female. Some previously published research found higher evaluations or perceptions of women, others found higher scores for men, and yet others reported no differences. The effects of institution, discipline, and course have been reported to mediate some but not all gender biases. The data set was analyzed to isolate gender from other factors such as rank and seniority. There was no difference by gender in student evaluation scores of faculty at senior faculty ranks. However, at junior ranks, male faculty received significantly higher teaching evaluations than their female colleagues. A survey was created to assess explicit gender and other biases among the student raters of faculty. Results of the analysis of the data set and of the gender bias survey are reported. Implications for future research and for the use of student evaluations are discussed.

INTRODUCTION

Effective teaching has many definitions [9]. As a result, there is little agreement on how to evaluate teaching effectiveness. Agreement centers on the ideas that institutions should first identify the uses of their evaluation system and should formulate their own system that includes student input. In an increasingly customer-centric system of higher education, it is difficult to argue against giving students an opportunity to provide feedback on their perceptions of the education they are receiving.
There is a large body of work that questions how student feedback should be collected and interpreted and how that information should be used (e.g., in promotion and tenure decisions). An additional body of literature has identified a number of factors that appear to mediate evaluations of faculty, orthogonal to actual teaching performance. These include factors related to the professor, such as attractiveness, ethnicity, and gender, and factors more relevant to the course, such as discipline, level, and institution type. While the literature has been clear in identifying these biases as potential contributors to evaluations, administrators often fail to consider bias in the interpretation and use of evaluation data. In our university, course evaluations are conducted at the end of each semester for every course. Students are asked to rate their agreement with each of sixteen aspects of the course and instructor on a five-point scale. Contrary to what would be predicted by chance, thirty-four percent of all course evaluations contained the highest scores possible across every item in the evaluation. Although we might aspire to having students who are raving fans, this type of
response confounds other analyses and may be due to reasons other than the students' perception of the highest level of excellence in the course and instructor. In the remaining sections, a brief review of relevant literature from a vast body of work on student course evaluations is presented. We then analyze the differences observed between male and female faculty evaluations. A survey instrument was designed to elicit whether students have a gender bias in evaluating instructors. The findings from an analysis of this survey are presented. Conclusions and implications are discussed.

LITERATURE REVIEW

A considerable amount of research has focused on the reliability and validity of student evaluations. In an effort to clarify the issues surrounding student evaluations, Peterson et al. [11] developed a taxonomy of the literature based on a Journal Storage (JSTOR) search of student evaluations of teaching, which returned over 5,200 listings. Within Peterson's taxonomy, the work reported here fits into the category of factors influencing students' ratings. They divide this category into teaching-related factors and non-teaching-related factors, with the latter group including semester, course session, faculty type, course level, course focus, and course type. They found that non-teaching factors could cloud the assessment of teaching-related factors. In a specific test of factors that influence evaluations, Felton, Mitchell, and Stinson [5] found that about half of the variation in students' evaluations of instructor quality could be explained by the students' ratings of the instructor's easiness and sexiness. They caution, "If these findings reflect the thinking of American college students when they complete in-class student opinion surveys, then universities need to rethink the validity of student opinion surveys as a measure of teaching effectiveness" [5, p. 91].
Based on a review of the literature, Wright [15] also found that student consumers rate instructors using a different set of criteria than do faculty peers or administrators. He further suggests that students may, in fact, use criteria unrelated to learning and may even prefer styles detrimental to their learning. The work of Shevlin et al. [13] suggests that there is a central trait that influences students' evaluations of the lecturer. They used a confirmatory factor analysis model of lecturer ability, module effectiveness, and lecturer charisma. They found that the charisma factor explained 69% of the variance in lecturer ability and 37% of the variance in module attributes. In a review and call for more research on course evaluations, Trout [14] suggested that "what numerical forms apparently measure is the degree to which students are happy or satisfied with the instructor (personality), the course (requirements), and the outcome (grade)." Feeley [4] defined a halo effect in student evaluations as "the individual rater's failure to discriminate among conceptually distinct aspects of a stimulus person's behavior" [4, p. 226]. He presents a detailed review of studies of the halo effect in psychological measurement. He found considerable overlap among factors related and unrelated to teaching effectiveness, content, and
teaching behaviors. He concluded that student evaluations of teaching are influenced by a halo effect. Repede, Clark, and McGrath [12] found a halo effect that artificially inflated course evaluations. The halo effect was attributed to students' perceptions of instructor likeability and sense of humor. Other factors, however, clearly affect likeability. Results from studies of gender differences in student evaluations of teaching have been complex and at times equivocal. Research in this area was at its peak in the 1980s and 1990s, with numerous simulation, survey, and actual-teaching-evaluation examinations. While some research found higher evaluations or perceptions of women, others found higher scores for men, and yet others reported no differences [7]. The effects of institution, discipline, and course appear to mediate some but not all gender biases, with women receiving higher ratings in feminine-stereotyped courses (e.g., service-related courses) and men receiving higher scores in masculine-stereotyped courses (e.g., business courses) [1]. Other research found a relationship between student gender and faculty gender, with female students rating women faculty higher than men, while male students showed no gender bias [2]. One fairly consistent earlier finding was that classes taught by women contained more interaction and more participation than classes taught by men [7]. Interestingly, in those same studies, classes with more student participation were associated with lower competency ratings for faculty [8]. There is no question about the continued existence of general gender biases favoring men, at least on an implicit level. People are much less willing to openly acknowledge explicit sexism (or racism, for that matter). Since faculty evaluations would fall prey to implicit bias, it would make sense that these biases translate into differences in evaluations of men and women faculty.
Most researchers have concluded that general gender-based attitudes can and do affect student evaluations of faculty. However, there has been little recent research on gender bias in student evaluations of university faculty. This paper presents an examination of both implicit and explicit bias in faculty evaluations.

METHODOLOGY

Our university is a small, liberal arts, master's university. The university uses an in-class survey administered at the end of each semester for students to rate sixteen aspects of the course and instructor (Appendix 1). Because the survey is administered within regular class meetings, it has a response rate exceeding ninety-five percent. The survey was developed in 1994, using questions identified in the literature at that time as exhibiting the highest validity and reliability. Students rate their agreement with these sixteen items on a five-point Likert scale. The survey is constructed in such a way that all items are positive expressions, so strongly agreeing (5 points) is the highest rating for each item. A computed variable, total score, was also created within this study; its maximum possible value is 80 points, representing a perfect score of all fives. A total of 22,224 student evaluations of teaching in undergraduate lecture courses were completed in the fall and spring semesters from fall 2008 through fall 2010. Online, hybrid, and
directed-study courses were excluded from this study. De-identified faculty data were linked to course evaluations. These data included faculty status (full-time vs. part-time and adjunct), faculty seniority (professor and associate vs. assistant and instructor), and faculty gender. This research received the approval of the Queens University IRB. An analysis of variance (ANOVA) was initially conducted to explore the main and interaction effects of gender, seniority, and status on the dependent variable total score. The main effect of gender and the interaction effect of gender and seniority were significant (p < .001 for each). The main effect of seniority was not significant (p = .28). No other effects were significant. Based on this result, t-tests of means for total score were conducted, controlling for seniority. Within the subset of evaluations of faculty at the rank of associate or full professor, there was no significant difference by faculty gender (Table 1). For junior faculty, however, the difference in total score by gender was significant (Table 2). A series of t-tests for the mean by gender of each of the sixteen evaluation items was conducted. Students rated male faculty significantly higher than female faculty on every one of the sixteen items (Table 3).

TABLE 1: EVALUATION TOTAL SCORE BY GENDER FOR SENIOR FACULTY

  Gender      N    Mean   Std. Dev.   Std. Error   Sig.
  F        3644   71.70      15.552         .258   .191
  M        3458   71.24      13.582         .231

TABLE 2: EVALUATION TOTAL SCORE BY GENDER FOR JUNIOR FACULTY

  Gender      N    Mean   Std. Dev.   Std. Error   Sig.
  F        9892   68.52      16.593         .167   .000
  M        2923   70.82      14.733         .273
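The junior-faculty comparison can be reproduced from summary statistics alone. The sketch below computes Welch's t statistic from the reported means, standard deviations, and group sizes for junior faculty (F: N = 9892, mean 68.52, SD 16.593; M: N = 2923, mean 70.82, SD 14.733). Note the paper does not state which t-test variant was used; Welch's unequal-variance form is our assumption for illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic from summary statistics.

    Assumption: the paper reports t-tests but not the exact variant;
    the unequal-variance (Welch) form is used here for illustration.
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Reported junior-faculty summary statistics (F minus M).
t = welch_t(68.52, 16.593, 9892, 70.82, 14.733, 2923)
# |t| of roughly 7.2 on thousands of observations is consistent with the
# reported two-tailed significance of .000 (i.e., p < .001).
```

With group sizes this large, the pooled-variance and Welch forms give nearly identical results, so the choice of variant does not affect the conclusion.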
TABLE 3: EVALUATION ITEMS BY GENDER FOR JUNIOR FACULTY

  Item   Gender      N   Mean   Std. Dev.   Std. Error   Sig.
  R1     F        9892   4.44       1.091         .011   .000
         M        2923   4.53        .956         .018
  R2     F        9892   4.40       1.131         .011   .002
         M        2923   4.48       1.029         .019
  R3     F        9892   4.38       1.152         .012   .000
         M        2923   4.51       1.019         .019
  R4     F        9892   4.15       1.291         .013   .000
         M        2923   4.34       1.150         .021
  R5     F        9892   3.99       1.377         .014   .000
         M        2923   4.24       1.224         .023
  R6     F        9892   4.28       1.202         .012   .000
         M        2923   4.42       1.060         .020
  R7     F        9892   4.25       1.191         .012   .000
         M        2923   4.37       1.097         .020
  R8     F        9892   4.35       1.158         .012   .000
         M        2923   4.48       1.036         .019
  R9     F        9892   4.26       1.230         .012   .000
         M        2923   4.36       1.129         .021
  R10    F        9892   4.44       1.134         .011   .000
         M        2923   4.58        .993         .018
  R11    F        9892   4.40       1.200         .012   .000
         M        2923   4.54       1.050         .019
  R12    F        9892   4.22       1.272         .013   .000
         M        2923   4.37       1.093         .020
  R13    F        9892   4.34       1.256         .013   .000
         M        2923   4.50       1.066         .020
  R14    F        9892   4.28       1.272         .013   .000
         M        2923   4.45       1.097         .020
  R15    F        9892   4.19       1.280         .013   .000
         M        2923   4.30       1.144         .021
  R16    F        9892   4.15       1.366         .014   .000
         M        2923   4.36       1.192         .022

A survey instrument was designed to elicit more explicit gender bias in evaluating instructors. In order to reduce reactivity and to gather data on other areas of bias, students were asked to compare men and women faculty, majority and minority ethnicity/race faculty, junior and senior faculty, and older and younger faculty. The survey asks students to rate each of these instructor groups on each of the sixteen items in the course evaluation (Appendix 2). Anonymous demographic data about the student respondents were also collected as part of the survey. The survey was a voluntary, anonymous questionnaire presented to a convenience sample of undergraduate students. Fifty-nine students responded to the survey. The proportions of students by ethnicity, gender, and class are approximately equal to those proportions within the student body as a whole (Table 4).

TABLE 4: FREQUENCY OF RESPONSES

  What is your student status?
  Gender    Freshman   Sophomore   Junior   Senior   Total
  Male             5           5        7        4      21
  Female           2           3       19       14      38
  Total            7           8       26       18      59

  Ethnicity
  Gender    Majority   Minority   Total
  Male            14          6      20
  Female          28          9      37
  Total           42         15      57
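The claim that the survey sample mirrors the student body can be illustrated with a simple goodness-of-fit check. The sketch below compares the observed gender split among respondents (21 male, 38 female) against the approximately 70% female student body stated in the abstract. This particular test is our illustration, not an analysis from the paper.

```python
# Chi-square goodness-of-fit: does the 59-respondent sample match a
# 30% male / 70% female student body? (Illustrative check, not from the paper.)
observed = [21, 38]                 # male, female respondents
proportions = [0.30, 0.70]          # approximate student-body split
expected = [p * sum(observed) for p in proportions]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# chi_square is well below 3.841, the .05 critical value for df = 1,
# so the sample's gender split is consistent with the student body.
```

The same comparison could be repeated for ethnicity and class standing using the other marginals of the frequency table.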
FINDINGS

A series of paired t-tests of means was conducted for each of the faculty groups on the evaluation item "This was an excellent course." There were no significant differences (Table 5).

TABLE 5: THIS WAS AN EXCELLENT COURSE

  Pair                                                 Mean   Std. Dev.   Std. Error       t   df   Sig. (2-tailed)
  1  Part-time/adjunct - Full-time Faculty           .03333     1.14931       .14837    .225   59   .823
  2  Male - Female Faculty                           .15000      .68458       .08838   1.697   59   .095
  3  Minority - Majority Race/Ethnicity Faculty      .16667      .71702       .09257   1.800   59   .077
  4  Senior - Junior Faculty                        -.03333      .51967       .06709   -.497   59   .621
  5  Younger - Older Faculty                        -.01695      .47312       .06160   -.275   58   .784

A similar test was conducted for the item "This was an excellent instructor." The only significant difference was on the dimension of faculty ethnicity (Table 6).

TABLE 6: THIS WAS AN EXCELLENT INSTRUCTOR

  Pair                                                 Mean   Std. Dev.   Std. Error       t   df   Sig. (2-tailed)
  1  Part-time/adjunct - Full-time Faculty           .07018      .97942       .12973    .541   56   .591
  2  Male - Female Faculty                           .01754      .40050       .05305    .331   56   .742
  3  Minority - Majority Race/Ethnicity Faculty      .17544      .57080       .07560   2.320   56   .024
  4  Senior - Junior Faculty                        -.01724      .51269       .06732   -.256   57   .799
  5  Younger - Older Faculty                        -.06897      .55763       .07322   -.942   57   .350
Based on the significant difference by ethnicity on "This is an excellent instructor," minority and majority ethnicity faculty groups were compared across all items. Survey respondents reported that minority faculty demonstrated each and every item on the survey more often (i.e., were rated better) than majority ethnicity faculty. These differences were significant (p < .05) for seven of the sixteen items (Table 7).

TABLE 7: EVALUATION ITEMS BY ETHNICITY (MINORITY - MAJORITY FACULTY)

  Pair      Mean   Std. Dev.   Std. Error       t   df   Sig. (2-tailed)
  1       .13333      .59565       .07690   1.734   59   .088
  2       .13333      .43048       .05557   2.399   59   .020
  3       .15000      .48099       .06210   2.416   59   .019
  4       .14754      .40150       .05141   2.870   60   .006
  5       .16667      .71702       .09257   1.800   59   .077
  6       .11667      .64022       .08265   1.412   59   .163
  7       .21667      .64022       .08265   2.621   59   .011
  8       .18644      .65586       .08539   2.183   58   .033
  9       .18333      .46910       .06056   3.027   59   .004
  10      .08333      .49717       .06418   1.298   59   .199
  11      .13333      .50310       .06495   2.053   59   .045
  12      .16949      .49663       .06466   2.621   58   .011
  13      .07143      .49935       .06673   1.070   55   .289
  14      .07143      .49935       .06673   1.070   55   .289
  15      .12069      .42209       .05542   2.178   57   .034
  16      .17544      .57080       .07560   2.320   56   .024

CONCLUSIONS AND IMPLICATIONS

A number of different results emerged from the data. First, gender differences in evaluations were complex, in keeping with previous literature. In the explicit test of gender bias (survey items), students declared a clear lack of bias on all aspects of teaching. However, on the implicit measures of evaluation, biases were evident, but again complex. Women who hold junior faculty positions were rated lower than junior men, with no similar difference at senior ranks. So while students say they are not biased, in practice they are. This is consistent with the broader literature on explicit and implicit gender and ethnicity bias. It is socially unacceptable to say that men are better than women or that majority ethnicity/race faculty are better than minority ethnicity/race faculty. As was the case in early changes in sexism, showing that you favor minority over majority individuals is preferred. This appears to be the case in the survey of explicit biases. Only the implicit measures in this study revealed gender bias, showing ongoing negative views of women faculty, specifically at the junior level. Gender bias at lower ranks may be much more problematic than at senior levels, since course evaluations impact junior faculty more than senior faculty. Senior faculty are more immune to poor evaluations from students since they do not have tenure and promotion looming over them. Junior faculty do have to meet expectations for tenure and promotion, and course evaluations are often a heavily weighted factor in these decisions.
Moreover, although there are more women in academia now than in the 1980s and 1990s, women still disproportionately hold lower-level positions (part-time, non-tenured) and receive lower salaries than men. Finkelstein [6], in summarizing an extensive review of studies on female faculty, found that women tended to be segregated by discipline and by institutional type; to be disproportionately represented at lower ranks; to be promoted at a slower rate than their male colleagues; to participate less in governance and administration; and to be compensated at a rate that averaged only 85 percent of that of their male colleagues. Newell and Kuh [10], who conducted a fairly large national survey of professors of higher education, reported that women had generally lower academic-year salaries and heavier teaching loads than men, that they perceived more pressure to publish, and that they were less happy with the structure of their departments. Differences such as these have led some authors to use the term "chilly" to describe the academic climate experienced by women faculty members. Add implicit biases, and junior women faculty have an even harder path to success. Another interesting possibility is not to view junior (usually younger) women faculty as receiving lower evaluations than men or senior faculty, but rather to view junior male faculty as advantaged relative to other groups. This finding is interesting in light of the high percentage (70%) of female students. It may be that female students see young male faculty as better than other groups based on general likeability and attractiveness. Young women may see these young men as more attractive and therefore as better teachers. This would be consistent with literature finding that evaluations are higher for those faculty perceived as more attractive.
In our earlier study of halo effects in teaching evaluations, our analysis suggested that "if a university wants only to maximize teaching evaluation scores, then the university should hire likeable faculty who have a good sense of humor" [12, p. 849]. The work reported here might suggest that our original suggestion be amended to young, likeable, male faculty who have a good sense of humor. Alternatively, biases could and should be considered in the interpretation and use of evaluation data.
APPENDIX 1: COURSE-INSTRUCTOR EVALUATION

(Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree)

1. Course objectives were clearly stated on the syllabus
2. Course requirements and grading system were clearly stated on the syllabus
3. Specific course content was related to the overall course objectives
4. This course significantly increased my understanding of the subject
5. This was an excellent course
6. Tests, projects, presentations, and papers were graded fairly
7. Tests and assignments were returned promptly
8. The instructor was well prepared for class
9. The instructor used class time productively
10. The instructor demonstrated knowledge of the subject
11. The instructor showed enthusiasm and genuine interest in this course
12. The instructor demanded the best work possible from me
13. The instructor was courteous and respectful to students
14. I felt free to express ideas and ask questions in class
15. The instructor was available outside of class for help
16. This is an excellent instructor

APPENDIX 2: RESPONSE STYLE SURVEY (Sample Question)
REFERENCES

[1] Basow, S. A. Student Evaluations: The Role of Gender Bias and Teaching Styles. In Career Strategies for Women in Academe: Arming Athena, Eds. Lynn H. Collins, Joan C. Chrisler, and Kathryn Quina. Thousand Oaks: Sage Publications, 1998.
[2] Bennett, S. K. Student Perceptions and Expectations for Male and Female Instructors: Evidence Relating to the Question of Gender Bias in Teaching Evaluation. Journal of Educational Psychology, 1982, 74(2), 170-179.
[3] Cooper, W. H. Ubiquitous Halo. Psychological Bulletin, 1981, 90, 218-244.
[4] Feeley, T. H. Evidence of Halo Effects in Student Evaluations of Communication Instruction. Communication Education, 2002, 51(3), 225-236.
[5] Felton, J., Mitchell, J., & Stinson, M. Web-based Student Evaluations of Professors: The Relations Between Perceived Quality, Easiness and Sexiness. Assessment & Evaluation in Higher Education, 2004, 29(1), 91-108.
[6] Finkelstein, M. J. The Status of Academic Women: An Assessment of Five Competing Explanations. Review of Higher Education, 1984, 7, 233-246.
[7] Goodwin, L. D. & Stevens, E. A. The Influence of Gender on University Faculty Members' Perceptions of Good Teaching. The Journal of Higher Education, 1993, 64(2), 166-185.
[8] Macke, A. S., Richardson, L. W., & Cook, J. Sex-typed Teaching Styles of University Professors and Student Reactions. Columbus: The Ohio State University Research Foundation, 1980.
[9] Marsh, H. W. Weighting for the Right Criteria in the Instructional Development and Effectiveness Assessment (IDEA) System: Global and Specific Ratings of Teaching Effectiveness and Their Relations to Course Objectives. Journal of Educational Psychology, 1994, 86, 631-648.
[10] Newell, L. J. & Kuh, G. D. Taking Stock: The Higher Education Professoriate. Review of Higher Education, 1989, 13, 63-90.
[11] Peterson, R. L., Berenson, M. L., Misra, R. B., & Radosevich, D. J.
An Evaluation of Factors Regarding Students' Assessment of Faculty in a Business School. Decision Sciences Journal of Innovative Education, 2008, 6(2), 375-402.
[12] Repede, J., Clark, C., & McGrath, R. Can Evaluations of Teaching Be Too Good? Proceedings of the Southeast Decision Sciences Institute, 2011, 843-851.
[13] Shevlin, M., Banyard, P., Davies, M. & Griffiths, M. The Validity of Student Evaluation of Teaching in Higher Education: Love me, love my lectures? Assessment & Evaluation in Higher Education, 2000, 25(4), 397-405. [14] Trout, P. Flunking the Test: The Dismal Record of Student Evaluations. Academe Online: Magazine of the AAUP, 2000, 86(4). [15] Wright, R. Student Evaluations of Faculty: Concerns Raised in the Literature, and Possible Solutions. College Student Journal, 2006, 40(2), 417-422.