Calibrated Peer Review Essays Increase Student Confidence in Assessing Their Own Writing

By Lauren Likkel

The online writing software Calibrated Peer Review (CPR) is a useful tool for assigning essays in large college classes. Students submit essays online and are guided in rating essays against criteria written by the instructor. The instructor does not have to grade the essays, and CPR has educational benefits that make it desirable. I used the CPR system for four essays in the largest section of my undergraduate astronomy course, and to create a comparison group, I assigned the same essays in the sections of the course that did not use the CPR system. I surveyed students at the beginning and end of the semester about their confidence in evaluating their own essays and their confidence in their ability to write a good essay. I found that students who were not already confident in their ability to evaluate their own writing tended to gain confidence after using the CPR system. There was no increase in confidence in the comparison group, indicating that the CPR system helps students become more confident at assessing their own writing.

When I first tried written assignments in my large-enrollment classes, the time it took to grade them was prohibitive. I wanted to use writing assignments because they improve writing ability and help students learn course content (e.g., Felder & Brent, 1992). But at primarily undergraduate institutions like mine, we do not have graduate students to grade papers for us, so it is difficult to include significant writing assignments in a large class. However, I found that I can include essay assignments without an overwhelming grading burden if I use the Calibrated Peer Review (CPR) web-based tool. With CPR, students submit an essay online, which is scored by three other students; I do not do the grading.
I now use CPR for five short writing assignments in my 120-student introductory astronomy course and in the online version of the course. Although CPR is useful for making writing assignments practical in large classes, the research I present here was inspired by the educational benefits of CPR. One useful aspect of the CPR system is that students are asked to reflect on their own work, and the instructor can base part of a student's score on how accurately the student assesses the quality of his or her own essay. I wondered whether students using CPR became more confident in their ability to judge the quality of their own writing. Students must believe that they can evaluate what they have written if they are going to develop proficient writing skills, an important outcome of a quality liberal arts education. In this article, I present the results of my research investigating the effect of CPR assignments on students' perception of their ability to evaluate their own essays.
CPR

CPR was developed by the University of California, Los Angeles (UCLA) Chemistry Department, with support from the National Science Foundation and the Howard Hughes Medical Institute, as a way to incorporate writing into large-enrollment classes (Russell, Chapman, & Wegner, 1998). CPR is not discipline specific and has been used in the teaching of many subjects, for example, biology (Robinson, 2001; Gerdeman, Russell, & Worden, 2007), physiology (Pelaez, 2002), neuroscience (Prichard, 2005), geology (Heise, Palmer-Julson, & Su, 2002), and engineering (Carlson & Berry, 2003).

Briefly, here's how the CPR system works: The instructor develops instructions and resources for the essay assignment, crafts questions for a rubric to guide the student in evaluating the essays, writes three sample calibration essays, and decides how to award points for each part of the CPR system. Students write the essay; evaluate the calibration essays and see how their ratings match the instructor's; evaluate three peer essays (anonymously); and finally, evaluate their own essay. A full description of CPR is available at ucla.edu, and reviews are available (e.g., Prichard, 2005; Strong, 2008; Teacher Education, 2003; Walvoord, Hoefnagels, Gaffin, Chumchal, & Long, 2008).

Actively involving students in the material leads to learning gains well beyond the straight presentation of facts in the classroom (e.g., McDermott, 1993), and CPR is an effective tool for teaching specific concepts. In the CPR system, important concepts are embedded in the grading rubric and the three calibration essays that the instructor writes, and they may be contained in the three peer essays that the student reviews. The student must be engaged with the material to complete the assignment and, after multiple exposures to the concepts, will know the material at a deeper level.
In fact, students tend to score higher on traditional exams when they use the CPR system in their coursework (Enders, Jenkins, & Hoverman, 2010; Pelaez, 2002; Stokstad, 2001). Both clear writing and evaluative thinking are integral to the CPR system. In one study, students using CPR improved their technical writing and critical-thinking skills more from the first assignment to the second than students who completed traditional assignments (Heise et al., 2002). Several studies have reported that using CPR facilitated improvement of writing and reviewing skills, especially for students who initially performed at lower levels (e.g., Gunersel & Simpson, 2009, and references therein). However, results are not consistent, with some studies finding no improvement in technical writing skills (Walvoord et al., 2008). Studies have also shown that students using the CPR system improve their ability to correctly evaluate their own writing (Gerdeman et al., 2007; Stokstad, 2001).

The research project

I surveyed students about their ability to evaluate their own writing, both before and after they used the CPR system for four essay assignments (Table 1). I was interested in their perception of that ability, which I describe as their confidence in evaluating their own work. I included a comparison group to compare the effect of CPR against the use of traditional writing assignments. At the beginning and end of the semester (September and December 2005), students filled out a multiple-choice survey that included questions about the confidence they had in evaluating their own essays. The results were confidential, but students' names were associated with their responses so that I could track how individual students changed their responses by the end of the semester. Only responses from students who agreed to participate and who took both surveys were included in the data analysis.

TABLE 1
The four essay topics.

(1) Sunset/sunrise location: where on the horizon the sun rises and sets at different times of the year
    Concept: Seasonal changes relating to equinoxes and solstices
(2) Should Intelligent Design be taught in a high school science class?
    Concept: What composes a scientific theory
(3) Evidence for the Big Bang Theory
    Concept: Lines of evidence that led to the Big Bang Theory, were predicted by it, or that support it
(4) Write an astronomy essay for a nonscientific audience
    Concept: Summarize an astronomy topic (specific topic not assigned) aimed at a popular level

The two key questions used in the study (Table 2) were "When you write an essay, can you tell if it is a good essay?" and "How skilled do you feel you are at assessing your own writing?" These questions are similar in order to provide a check on reliability. A third question was "Do you know how to write a good essay?" The questions were posed in multiple-choice format, with the answer choices listed in Table 2.

TABLE 2
The survey questions and possible responses.

1. When you write an essay, can you tell if it is a good essay?
   (A) Yes, I have a good idea of the quality.
   (B) I am not usually sure, but have a general idea.
   (C) I am not very confident about how good it is.
   (D) No, I can't really tell if it will be graded high or low.
   (E) I choose not to answer this.
2. How skilled do you feel you are at assessing your own writing?
   (A) I can read my own written work and know its quality.
   (B) I have a good idea of the quality of my own written work.
   (C) I don't usually have a good idea of the quality of my own written work.
   (D) I cannot read my own written work and know its quality.
   (E) I choose not to answer this.
3. Do you know how to write a good essay?
   (A) Yes.
   (B) I have learned to write an essay fairly well.
   (C) I haven't really learned how to write a good essay.
   (D) No.
   (E) I choose not to answer this.

The participants in the study were students in three sections of the introductory astronomy course (Survey of Astronomy) at the University of Wisconsin-Eau Claire (enrollment at this university is about 10,000 students). I taught all three sections and used the same material and assignments for each, including the same four essay assignments (Table 1). One section (the CPR group) used the CPR system for the essays. The comparison group, composed of students in two smaller sections, did not use the CPR system. The comparison group's essays were graded with the same criteria used in the CPR system, and students were provided a score and a few written comments. Only students who completed at least three of the essay assignments were included in the study: 104 students in the CPR group and 34 in the comparison group.

Results

The student responses to the three research questions on the surveys are given in Tables 3-5. The possible responses to the questions (Table 2) indicate confidence levels, ranging from A as most confident to D as not confident. The changes in individual students' responses between the two surveys are also tabulated as more positive, no change, or more negative. For example, if a student chose answer B to a question on the first survey and chose answer A on the final survey, the change was more positive, but if the student chose B on the first survey and the final-survey answer was C or D, the change was more negative. Because not all students answered all three questions on both surveys, the total number of students differs slightly for each question. The percentages in the tables have been rounded, so they may not total 100%. To evaluate the differences between groups and between surveys, I used a χ2 test of independence with two degrees of freedom (Preacher, 2001).

Survey question: When you write an essay, can you tell if it is a good essay?

For this survey question (Table 2), the responses of the CPR group and the comparison group were statistically equal on the first survey. For example, about half of each group answered "(A) Yes, I have a good idea of the quality" (Table 3a). By the end of the semester, the two groups clearly differed (p < .003), with 77% of the CPR group now selecting (A) but still only 52% of the comparison group answering (A) (Table 3a).

TABLE 3
Question 1: When you write an essay, can you tell if it is a good essay?

a. Survey responses.

                        First survey          Last survey
Response confidence     CPR       non-CPR     CPR       non-CPR
Confident               52% (55)  48% (16)    77% (80)  52% (17)
Fairly confident        38% (39)  42% (14)    19% (20)  33% (11)
Not very confident       9% (9)    6% (2)      4% (4)    9% (3)
Not confident            1% (1)    3% (1)      0% (0)    6% (2)

b. How students changed responses by the end of the semester.

Change by end of semester   CPR, all students (104)   non-CPR (33)
More positive               34% (35)                  18% (6)
No change                   59% (61)                  61% (20)
More negative                8% (8)                   21% (7)

c. Changed responses after excluding the most confident students.*

Change by end of semester   CPR, less confident (54)   non-CPR (21)
More positive               65% (35)                   29% (6)
No change                   20% (11)                   38% (8)
More negative               15% (8)                    33% (7)

Note: CPR = Calibrated Peer Review.
*Excludes students who were confident at both the beginning and the end of the semester.
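The more positive / no change / more negative coding described above can be expressed as a small helper function. This is a sketch, not the author's actual analysis code; `code_change` and `CONFIDENCE` are hypothetical names, and the A-D ordering (A most confident) follows Table 2, with choice E treated as missing and excluded.

```python
# Sketch of the response-change coding described in the text (hypothetical
# helper, not the study's actual analysis code). Choices A-D run from most
# to least confident (Table 2); E ("I choose not to answer") is missing data.

CONFIDENCE = {"A": 4, "B": 3, "C": 2, "D": 1}  # higher = more confident

def code_change(first, last):
    """Classify a student's change between the first and last surveys."""
    if first not in CONFIDENCE or last not in CONFIDENCE:
        return None  # excluded: student chose E or skipped the question
    if CONFIDENCE[last] > CONFIDENCE[first]:
        return "more positive"
    if CONFIDENCE[last] < CONFIDENCE[first]:
        return "more negative"
    return "no change"
```

For example, `code_change("B", "A")` yields "more positive" and `code_change("B", "C")` yields "more negative", matching the worked example in the text.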
Comparing how each group changed from the first survey to the last survey showed a clear change for the CPR group (p < .004) but no statistically significant change in the responses from the comparison group. Across the two surveys (Table 3b), I found that 34% of the CPR group responded more positively at the end of the semester than they had at the beginning, and 8% responded less positively (59% did not change their answers). In contrast, only 18% of the comparison group responded more positively at the end of the semester, and 21% ended less confident in their ability to tell if they had written a good essay (61% did not change their answers, similar to the CPR group).
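The group-versus-group χ2 test of independence can be reproduced from the Table 3b counts. The sketch below assumes plain Python rather than the online calculator actually used in the study (Preacher, 2001); it builds the 2 × 3 contingency table of more positive / no change / more negative counts and uses the fact that, for two degrees of freedom, the chi-square p-value has the closed form exp(-χ2/2).

```python
import math

# 2 x 3 contingency table from Table 3b: rows = CPR group, comparison group;
# columns = more positive, no change, more negative.
observed = [[35, 61, 8],
            [6, 20, 7]]

row_totals = [sum(row) for row in observed]        # [104, 33]
col_totals = [sum(col) for col in zip(*observed)]  # [41, 81, 15]
n = sum(row_totals)                                # 137

# Pearson chi-square statistic: sum of (O - E)^2 / E over all six cells,
# where E is the expected count under independence.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# df = (2 - 1) * (3 - 1) = 2, so the survival function is exp(-x/2).
p = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these counts the statistic comes out to about 6.2, giving p ≈ .045, consistent with the p < .05 reported for this comparison.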
Comparing statistically how the individual students changed their answers (more positive, no change, or less positive) shows that students in the CPR group changed differently from those in the comparison group (p < .05). Students who did not change their answer to this question by the second survey usually selected the most positive answer on both surveys. They included half of the CPR group (50 of 104 students) and a third of the comparison group (12 of 33 students). I did a comparison excluding these students to see the effect the CPR system had on the confidence level of the less-confident students (Table 3c). After excluding students who chose the most positive response at the beginning (and end) of the semester, I found that 65% of students in the CPR group moved to a more positive response by the end of the semester, compared with only 29% of the comparison group. Comparing how these less-confident students changed their answers shows that students in the CPR group changed differently from those in the comparison group (p < .02).

Survey question: How skilled do you feel you are at assessing your own writing?

For this question, only 17% in the CPR group and 9% in the comparison group chose the most positive answer at the beginning of the semester (Table 4a). But by the end of the semester, 42% of the CPR group responded that way, compared with 18% of the comparison group. Statistically, the groups were the same on the first survey but were clearly different on the last survey (p < .03). Comparing how each group changed from the first survey to the last survey showed a dramatic change for the CPR group (p < .001) but no significant change in the responses for the comparison group. Across the surveys, 38% of students in the CPR group responded more positively at the end of the semester than at the beginning (57% did not change their response), compared with only 21% in the comparison group (and a full 79% did not change their response; Table 4b).

TABLE 4
Question 2: How skilled do you feel you are at assessing your own writing?

a. Survey responses.

                        First survey          Last survey
Response confidence     CPR       non-CPR     CPR       non-CPR
Confident               17% (17)   9% (3)     42% (42)  18% (6)
Fairly confident        68% (69)  64% (21)    52% (53)  67% (22)
Not very confident      15% (15)  24% (8)      6% (6)   12% (4)
Not confident            0% (0)    3% (1)      0% (0)    3% (1)

b. How students changed responses by the end of the semester.

Change by end of semester   CPR, all students (101)   non-CPR (33)
More positive               38% (38)                  21% (7)
No change                   57% (58)                  79% (26)
More negative                5% (5)                    0% (0)

c. Changed responses after excluding the most confident students.*

Change by end of semester   CPR, less confident (88)   non-CPR (30)
More positive               43% (38)                   23% (7)
No change                   51% (45)                   77% (23)
More negative                6% (5)                     0% (0)

Note: CPR = Calibrated Peer Review.
*Excludes students who were confident at both the beginning and the end of the semester.

How individual students changed their answers (more positive, no change, or less positive) was compared statistically, and no significant difference between groups was found. However, about 10% of students selected the most positive response on both the first and last surveys (13 of 101 students in the CPR group and 3 of 33 in the comparison group). After excluding these students, 43% of the CPR students, but only 23% of the comparison group, changed to a more positive response by the end of the semester (Table 4c). This shows a clear difference, with a significantly larger percentage of initially nonconfident students in the CPR group gaining confidence than in the comparison group (p < .04).

Survey question: Do you know how to write a good essay?

For this question, the groups were not statistically different on the first survey, but on the last survey there was a small but statistically significant difference between the CPR group and the comparison group (p < .05; Table 5a). Comparing how each group changed from the first survey to the last survey showed no significant change for either group. Across surveys, the fraction of students in each group who changed to a more positive response was about the same: 24% and 22% for the CPR and comparison groups, respectively (Table 5b). Excluding students who chose the most positive response on both surveys, 41% of the CPR group moved to a more positive response, compared with 27% of the comparison group (Table 5c). The difference in how individual students changed their responses between the CPR and comparison groups on this question is not statistically significant.

TABLE 5
Question 3: Do you know how to write a good essay?

a. Survey responses.

                        First survey          Last survey
Response confidence     CPR       non-CPR     CPR       non-CPR
Confident               43% (45)  28% (9)     61% (63)  34% (11)
Fairly confident        51% (53)  63% (20)    37% (38)  63% (20)
Not very confident       5% (5)    9% (3)      3% (3)    3% (1)
Not confident            1% (1)    0% (0)      0% (0)    0% (0)

b. How students changed responses by the end of the semester.

Change by end of semester   CPR, all students (104)   non-CPR (32)
More positive               24% (25)                  22% (7)
No change                   73% (76)                  69% (22)
More negative                3% (3)                    9% (3)

c. Changed responses after excluding the most confident students.*

Change by end of semester   CPR, less confident (61)   non-CPR (26)
More positive               41% (25)                   27% (7)
No change                   54% (33)                   62% (16)
More negative                5% (3)                    12% (3)

Note: CPR = Calibrated Peer Review.
*Excludes students who were confident at both the beginning and the end of the semester.
Thus, using the CPR system did not have a large effect on students' opinion of their knowledge of how to write a good essay. This is in contrast to the results for the other two questions, which show that many students in the CPR group changed their perception of their self-assessment skill, in particular students who initially had lower confidence.

Discussion

I found that using the CPR system for essay assignments positively influenced many students' perception of their ability to accurately assess what they have written. In contrast, there was no statistically significant change for the group that wrote essays but did not use the CPR system. I was surprised to find such a clear effect with only four CPR essay assignments. Students who used CPR, if not already confident about that ability, were about twice as likely to show increased confidence as students whose essays were scored with traditional grading. The result appears to be due to the use of the CPR system and not to the fact that the students wrote essays or otherwise became more confident during the course of the semester.

A component of the CPR system that may have caused the increase in confidence is the intensive student use of an instructor-provided grading rubric for each assignment. In a CPR assignment, students received guidance on how to evaluate the essay as well as a score and minor comments. Students in the comparison group did not see the rubric and did not evaluate their own essay or any other essay. They received scores on their work and some written comments, but apparently that feedback on the quality of their essays was not sufficient to increase their confidence in evaluating their work. In the CPR system, students gain experience with assessing essays and must fully examine the rubric for each assignment. They must evaluate three sample essays, comparing their ratings to those of the instructor; evaluate three peer essays; and finally, rate their own essay.
Understanding and using the rubric may be responsible for increasing students' confidence in evaluating their own essays. This may occur because they learn to focus on specifics when evaluating an essay, because they see that a rubric relates to the instructions given for writing an essay, or because they recognize their misconception that grading essays is subjective. An integral part of the CPR system is that students must evaluate their own essays and compare their evaluation with the ratings of others. This self-assessment component might also be responsible for building confidence in self-evaluation. Reflecting on and rating their own essay is perhaps the key to building confidence in evaluating their own work, yet students are rarely asked to carefully rate their work and compare their ratings to the ratings of others.

It is possible that the way the rubrics are constructed influences the research results in studies involving CPR. Each set of grading guidelines was specific to the assignment, with assignment-related questions such as "Did the author avoid the common misconception that...?" and "Did the author make the point that...?" I did not investigate whether either the form or the content of the rubrics was related to increasing student confidence. It is also possible that the educational benefits of the CPR system could be gained with carefully constructed assignments and procedures. This is of particular interest if an instructor did not plan to use CPR but wanted to have the same educational benefits. Future research could investigate which of the writing and reviewing components of the CPR system are responsible for the improved student outcomes, as suggested, for example, by Gerdeman et al. (2007).

Conclusion

In the semesters since I collected the data for this research, I have continued to use CPR in my Survey of Astronomy classes of all enrollment sizes. In addition to the educational benefits of CPR cited earlier (that CPR assignments help students learn the material and develop thinking, writing, and reviewing skills), I have shown from the research reported here that many students also become more confident in their ability to evaluate the quality of their own work. These results bolster the already strong reasons why instructors might consider using the CPR system in their classes.

Details of implementation can be important for learning outcomes and student satisfaction (e.g., Walvoord et al., 2008). It takes a couple of hours to draft a CPR assignment and several more hours to develop it carefully, but once created, it can be used in future classes. A carefully constructed grading rubric is critical to prevent students from being unhappy about perceived uneven grading. Student reaction to CPR will be better if you speak of its benefits and carefully explain, in class or in writing, how to get started. It takes some instructor time to deal with problems that arise, but students will respond better to the system if you are available to help and are flexible on deadlines. It is important to emphasize to your students that you use the CPR system to help them learn the material and to develop thinking, writing, and reviewing skills; otherwise they may think you are trying to avoid grading. The CPR system has documented educational benefits relative to traditional grading of assignments.
There is currently no cost to use the system, only a modest cost is expected for future versions, and it reduces the grading burden on the instructor. CPR allows me to include valuable writing assignments in large classes with confidence that students benefit from the assignments in multiple ways.

Acknowledgment

Support through the University of Wisconsin-Eau Claire's Center for Teaching and Learning was instrumental in the completion of this work.

References

Carlson, P. A., & Berry, F. C. (2003, November). Calibrated Peer Review and assessing learning outcomes. In Proceedings of the 33rd ASEE/IEEE Frontiers in Education Conference (pp. F3E1-F3E6). Boulder, CO: Frontiers in Education.

Enders, F. B., Jenkins, S., & Hoverman, V. (2010). Calibrated Peer Review for interpreting linear regression parameters: Results from a graduate course. Journal of Statistics Education, 18(2).

Felder, R. M., & Brent, R. (1992). Writing assignments: Pathways to connections, clarity, creativity. College Teaching, 40.

Gerdeman, R. D., Russell, A. A., & Worden, K. J. (2007). Web-based student writing and reviewing in a large biology lecture course. Journal of College Science Teaching, 36(5).

Gunersel, A. B., & Simpson, N. (2009). Improvement in writing and reviewing skills with Calibrated Peer Review. International Journal for the Scholarship of Teaching and Learning, 3(2).

Heise, E. A., Palmer-Julson, A., & Su, T. M. (2002). Geological Society of America, Abstracts with Programs, 34, A-345.

McDermott, L. C. (1993). How we teach and how students learn: A mismatch? American Journal of Physics, 61.

Pelaez, N. J. (2002). Problem-based writing with peer review improves academic performance in physiology. Advances in Physiology Education, 26.

Preacher, K. J. (2001). Calculation for the chi-square test: An interactive calculation tool for chi-square tests of goodness of fit and independence [Computer software]. Available from

Prichard, J. R. (2005). Writing to learn: An evaluation of the Calibrated Peer Review program in two neuroscience courses. The Journal of Undergraduate Neuroscience Education, 4, A34-A39.

Robinson, R. (2001). An application to increase student reading and writing skills. The American Biology Teacher, 63.

Russell, A. A., Chapman, O. L., & Wegner, P. A. (1998). Molecular science: Network-deliverable curricula. Journal of Chemical Education, 75.

Stokstad, E. (2001). Reading, writing, and chemistry are potent mix. Science, 293(5535).

Strong, K. E. (2008). CPR: Adopting an out-of-discipline innovation. College Teaching Methods & Styles Journal, 4.

Teacher Education. (2003). Review of Calibrated Peer Review. Available at CompositeReview.htm?id=

Walvoord, M. E., Hoefnagels, M. H., Gaffin, D. D., Chumchal, M. M., & Long, D. A. (2008). An analysis of Calibrated Peer Review (CPR) in a science lecture classroom. Journal of College Science Teaching, 37(4).

Lauren Likkel is a professor in the Physics and Astronomy Department at the University of Wisconsin-Eau Claire.

Journal of College Science Teaching, Vol. 41, No. 3
Copyright of Journal of College Science Teaching is the property of the National Science Teachers Association, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.