Computer-based testing: An alternative for the assessment of Turkish undergraduate students



Computers & Education 51 (2008) 1198–1204
www.elsevier.com/locate/compedu

Omur Akdemir a,*, Ayse Oguz b

a Assistant Professor and Chair, Computer Education and Instructional Technology Department, Eregli Education Faculty, Zonguldak Karaelmas University, Kdz. Eregli, Zonguldak, Turkey
b Assistant Professor, Science Education Department, Education Faculty, Mugla University, Kotekli-Mugla, Turkey

* Corresponding author. Tel.: +90 5369711033. E-mail addresses: omurakdemir@gmail.com (O. Akdemir), ayseoguz@mu.edu.tr (A. Oguz).

Received 28 March 2007; received in revised form 14 November 2007; accepted 15 November 2007
doi:10.1016/j.compedu.2007.11.007

Abstract

The virtually error-free, high-speed data processing capability of computers has made them popular assessment tools in education. An important concern for developing countries considering computers as an educational assessment tool, before making a substantial investment, is the effect of computer-based testing on students' test scores compared with paper-and-pencil tests. This study investigated whether the test scores of Turkish students differed between a computer-based test and a paper-and-pencil test, with forty-seven undergraduate students studying at a public university located in the Black Sea region of Turkey. The findings showed that undergraduate students' test scores did not differ between the computer-based test and the paper-and-pencil test, which led to the conclusion that computer-based testing can be considered a promising alternative assessment technique for undergraduate students in Turkey.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Computer-based tests; Paper-based test; Test performance

1. Introduction

The use of computers has become popular in education in the last decade. Computers are used in education for many purposes, ranging from presenting course content to student assessment. Their virtually error-free, high-speed data processing has made computers accepted assessment tools in education. Accordingly, computer-based testing (CBT) has become one of the most common forms of testing since the 1990s (Educational Testing Service, 1992). Computer-based testing has been developing quickly since then, as new question formats, alternative models of measurement, improvements in test administration, immediate feedback to test takers, and more efficient information gathering are possible through computers (Mills, 2002; Wise & Plake, 1990). Organizations and test developers are increasingly moving their paper-and-pencil tests to computer-based tests (Mills, 2002).

Although more and more computer-based tests are gradually being used as assessment tools, concerns have been raised about the extent to which individuals' scores differ between computer-based tests and paper-based tests. Earlier computerized measurement studies concentrated on comparing individuals' scores on Internet-based surveys with their scores on surveys completed with paper and pencil (e.g., Church, 2001; Mead & Drasgow, 1993; Potosky & Bobko, 1997). Later studies compared participants' performance on computer-based tests and paper-based tests. Despite the many likely advantages of computer-based testing, there are potentially unexpected disadvantages to using computers for testing (e.g., Bunderson, Inouye, & Olsen, 1989; Mazzeo & Harvey, 1988). While several possible disadvantages of computers have been noted in various studies, the American Psychological Association (APA, 1986) pointed out that the computer-based and paper-based versions of the Graduate Record Examination (GRE) showed extremely tight equivalence between scores on the two versions. Similarly, Bugbee's (1996) research showed the equivalence of paper-and-pencil-administered tests and computer-based tests and drew a number of general conclusions about their equivalence. Bugbee (1996) listed the following conditions that should be taken into consideration before adopting computer-based testing: (1) CBTs can be equivalent to paper-and-pencil tests; (2) special considerations are necessary when computers are used for tests; and (3) test users must have at least a basic understanding of the computer and its vicissitudes, in conjunction with knowledge of the psychometric properties of the test, to use computer testing effectively.

Several research studies have investigated whether test takers' prior computer experience affects their scores on computer-based tests as compared to their scores on paper-based tests. The findings of these studies are contradictory. Some studies reported that computer unfamiliarity was related to lower performance on computerized tests for students with no previous experience with computers (Bugbee, 1996; Lee, Moreno, & Sympson, 1986). However, other studies found no relationship between computer inexperience or anxiety and performance on computerized versus paper-based tests (Mills, 2002; Smith & Caputi, 2004; Wise & Plake, 1989; Wise & Plake, 1990). In a similar study, Lee et al. (1986) found no significant difference in performance among test takers who had no previous experience with computers. Correspondingly, Powers and O'Neill (1993) found that extra assistance from a test supervisor on a CBT did not have a noticeable effect on test performance. In their research on the effects of administering tests via computers, Wise and Plake (1989) found that computer anxiety and computer experience did not significantly affect CBT scores. In their extensive review of the literature, Bunderson et al. (1989) found that about half of the studies indicated the equivalence of computerized tests and paper-based tests. In the following years, Eid's (2004) study revealed that students achieved similar scores on math problem-solving tests administered on the computer and on paper; computer experience and anxiety did not affect students' online test scores.
Even though numerous studies found that computer experience did not directly affect test performance, some studies still claimed that computer experience and other factors, such as computer anxiety and computer attitude, influence performance on computerized tests (Chua, Chen, & Wong, 1999; Mahar, Henderson, & Deane, 1997). Furthermore, studies conducted with primary school students showed that computer versions of tests were more difficult than paper versions (Choi & Tinkler, 2002; Pomplun & Custer, 2005). The guidelines of the American Psychological Association (APA, 1986) list three equivalences between conventional tests and CBTs that should be taken into consideration: (1) descriptive statistics (means, variances, distributions, and rank orders of scores); (2) construct validity; and (3) reliability. Further studies offer additional criteria, such as identical reliability and comparable correlations with other variables (Bartram, 1994; Mazzeo & Harvey, 1988). Investigating possible factors that affect test takers' performance, studies have claimed that computers and computer testing hardware and software are other factors that affect performance on computerized tests, since these factors have limited the usefulness of the computer-based testing format (Olsen, Cox, Price, Strozeski, & Vela, 1990).

Moreover, the instructions for computer-based tests must not be taken from the paper-based version but should be adapted to the computer (Kyllonen, 1991). In addition, studies have shown that factors often ignored, such as the presentation of stimuli on a computer monitor, monitor size, screen refresh rate, and even two computerized forms of the same test created by different companies, can affect test takers' performance on computerized tests (Krantz, 2000; Kveton, Jelinek, Voboril, & Klimusova, 2004). These incompatible findings lead to the conclusion that the effects of using computers as an assessment tool will continue to be discussed.

Computers are becoming an unavoidable part of individuals' daily lives. Every day, more and more services and functions are becoming automated with computers in colleges, universities, libraries, and offices. It is not surprising, then, that in the near future computers and their applications will be an indispensable tool for a variety of educational purposes, but the question remains whether students' test scores differ between computer-based and paper-based tests. The purpose of this study is to investigate whether the test scores of selected Turkish undergraduate students differ between a computer-based test and a paper-based test. The following research questions were formulated for the purpose of the investigation:

1. Are the test scores of selected Turkish undergraduate students different in the computer-based test and in the paper-based test?
2. Are the test scores of selected male Turkish undergraduate students different in the computer-based test and in the paper-based test?
3. Are the test scores of selected female Turkish undergraduate students different in the computer-based test and in the paper-based test?

2. Method

2.1. Instructional context

The study was conducted at a public university located in the Black Sea region of Turkey with undergraduate students enrolled in the department of Primary School Teaching and the department of Turkish Language and Literature. Students in these departments had completed the educational measurement course, offered by the same instructor, in the spring semester before the study was conducted. This investigation focused on determining whether the test scores of Turkish students differ between the computer-based test and the paper-based test.

2.2. Participants

The participants were student teachers enrolled at the four-year public college. Students are admitted to the college according to their scores on the nationwide centralized university entrance exam and their preferences. Generally coming from middle-class working families, students come to the college from different parts of the nation. Data were collected in the fall of 2006 from forty-seven junior student teachers selected randomly from the department of Primary School Teaching and the department of Turkish Language and Literature.

2.3. Instrument

Two versions of a thirty-question multiple-choice test investigating students' knowledge of educational measurement were developed. Content validity of the multiple-choice test was ensured by three professors who had previously taught the educational measurement course. The paper-and-pencil and computer-based versions of the multiple-choice test included the same thirty questions.
A JavaScript routine was also written for the computer-based version of the test to score it immediately upon completion; the paper-and-pencil version of the test was scored by hand. To familiarize participants with the computer-based test environment, a sample computer-based exercise test was also developed.
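To illustrate the scoring rule applied in this study (the number of correct answers, with no penalty for incorrect answers, as described in the analysis section below), the following is a minimal sketch of an immediate-scoring routine. It is written in Python purely for illustration; the authors' actual routine was a JavaScript script, and the answer key and responses shown here are invented.

# Minimal illustration of immediate scoring for a multiple-choice test:
# the score is the number of correct answers; incorrect answers are not
# penalized. The answer key and responses below are hypothetical.

ANSWER_KEY = ["A", "C", "B", "D", "A"]  # the study used thirty items

def score_responses(responses, key=ANSWER_KEY):
    """Return the number of responses that match the answer key."""
    return sum(given == correct for given, correct in zip(responses, key))

if __name__ == "__main__":
    submitted = ["A", "C", "D", "D", "B"]  # hypothetical student submission
    print(f"Score: {score_responses(submitted)} / {len(ANSWER_KEY)}")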

2.4. Design

The participants were selected randomly from among junior students of the department of Primary School Teaching and the department of Turkish Language and Literature. The paper-and-pencil version of the test was completed first by the forty-seven students, and the participating student teachers' responses were then scored. Four weeks after the administration of the paper-and-pencil test, the computer-based exercise test was completed by the students in the computer clusters of the school of education. Any problems or questions that students had were answered by the attending research assistant. After participants became familiar with the computer-based test, they completed the computer-based version of the test containing the educational measurement questions. After completing the computer-based test, students were able to see their scores on the computer, and these scores were recorded by the research assistant. Participants were given one hour to complete each form of the test (not including the time given for the exercise before the administration of the computer-based test).

3. Analysis

Participants' numbers of correct answers (incorrect answers were not counted against them) on the paper-based and computer-based versions of the test were entered into a statistical analysis package (SPSS 13) for analysis. One-way analysis of variance was used to test the three hypotheses. All statistical analyses reported in this research were conducted at a significance level of .05.

4. Results

Data were collected from seventeen male and thirty female student teachers. The distribution of participants' scores on the paper and computer versions of the test is presented in Table 1. The first research question investigated whether the test scores of selected Turkish undergraduate students were different in the computer-based test and in the paper-based test. One-way analysis of variance failed to reject the first null hypothesis that test scores of selected Turkish undergraduate students were not different in the computer-based test and in the paper-based test (F = 2.078, p > 0.05). Participating students' scores did not vary between the paper-based and computer-based versions of the test (see Table 2).

Table 1. The distribution of participants' scores on the paper and computer versions of the test

Group            N    Mean   SD    SE    95% CI for mean   Minimum   Maximum
Paper-based      47   12.9   2.1   .31   12.2, 13.5        8.0       17.0
Computer-based   47   13.6   2.6   .39   12.8, 14.4        9.0       21.0
Total            94   13.2   2.4   .25   12.7, 13.7        8.0       21.0

Table 2. One-way ANOVA comparing scores of participants on the paper and computer versions of the test

Source           Sum of squares   df   Mean square   F       Sig.
Between groups   12.298           1    12.298        2.078   .153
Within groups    544.511          92   5.919
Total            556.809          93
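As an arithmetic check, the F ratio and significance value reported in Table 2 can be reproduced from the tabulated sums of squares and degrees of freedom. The sketch below uses Python with SciPy's F distribution; the numbers are copied from Table 2, and the same check applies to Tables 4 and 6 below.

# Reproduce the one-way ANOVA statistics of Table 2 (paper-based vs.
# computer-based scores) from the reported sums of squares and degrees
# of freedom. Requires SciPy.
from scipy.stats import f

ss_between, df_between = 12.298, 1    # between-groups values from Table 2
ss_within, df_within = 544.511, 92    # within-groups values from Table 2

ms_between = ss_between / df_between  # mean square between = 12.298
ms_within = ss_within / df_within     # mean square within, approx. 5.919

f_ratio = ms_between / ms_within                # approx. 2.078
p_value = f.sf(f_ratio, df_between, df_within)  # approx. 0.153, i.e. p > .05

print(f"F({df_between}, {df_within}) = {f_ratio:.3f}, p = {p_value:.3f}")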

The second research question explored whether the test scores of selected male Turkish undergraduate students were different in the computer-based test and in the paper-based test. The distribution of male participants' scores on the paper and computer versions of the test is presented in Table 3. Results of the one-way analysis of variance failed to reject the second null hypothesis that test scores of selected male Turkish undergraduate students were not different in the computer-based test and in the paper-based test (F = 1.128, p > 0.05). Male students' scores were not statistically different between the paper-based and computer-based versions of the test (see Table 4).

Table 3. The distribution of male participants' scores on the paper and computer versions of the test

Group            N    Mean   SD    SE    95% CI for mean   Minimum   Maximum
Paper-based      17   12.0   2.1   .52   10.8, 13.1        8.0       15.0
Computer-based   17   12.8   2.6   .64   11.5, 14.2        9.0       18.0
Total            34   12.4   2.4   .41   11.5, 13.2        8.0       18.0

Table 4. One-way ANOVA comparing scores of male participants on the paper and computer versions of the test

Source           Sum of squares   df   Mean square   F       Sig.
Between groups   6.6              1    6.6           1.128   .296
Within groups    187.7            32   5.8
Total            194.3            33

The last research question investigated whether the test scores of selected female Turkish undergraduate students were different in the computer-based test and in the paper-based test. The distribution of female participants' scores on the paper and computer versions of the test is presented in Table 5. One-way analysis of variance failed to reject the last null hypothesis that test scores of selected female Turkish undergraduate students were not different in the computer-based test and in the paper-based test (F = 1.093, p > 0.05). Female students' scores were not statistically different between the paper-based and computer-based versions of the test (see Table 6).

Table 5. The distribution of female participants' scores on the paper and computer versions of the test

Group            N    Mean   SD    SE    95% CI for mean   Minimum   Maximum
Paper-based      30   13.4   2.0   .36   12.6, 14.1        9.0       17.0
Computer-based   30   14.0   2.6   .48   13.0, 15.0        9.0       21.0
Total            60   13.7   2.3   .30   13.1, 14.3        9.0       21.0

Table 6. One-way ANOVA comparing scores of female participants on the paper and computer versions of the test

Source           Sum of squares   df   Mean square   F       Sig.
Between groups   6.0              1    6.0           1.093   .300
Within groups    319.2            58   5.5
Total            325.2            59

5. Discussion and conclusion

The main purpose of this study was to compare selected Turkish undergraduate students' performance on computer-based tests and paper-and-pencil tests. Even though some researchers have pointed out that computer-based tests produced lower achievement scores than paper-based tests (Bunderson et al., 1989; Mazzeo & Harvey, 1988), participating students' scores in this study did not vary between the paper-based and computer-based versions of the test.

This study also investigated whether student performance differed between the computer-based test and the paper-and-pencil test for male and female participants. No significant difference was found between the two versions of the test for either the male or the female student group. Eid's (2004) study, which was conducted with fifth-grade female students, found similar scores on a math problem-solving test administered on the computer and on paper. Although both male and female participants were included in the present study, its results are compatible with the results of Eid's (2004) study. Therefore, even though more studies are needed, the results of this study indicate that gender differences may not be a factor affecting participants' scores on the paper-based and computer-based versions of the test.

Studies comparing the effects of computer-based tests and paper-based tests on student achievement have mostly been conducted in countries where the duration of the technology integration process has been short. This study is among the pioneers conducted in a developing country to investigate the effects of computer-based tests and paper-based tests on the achievement of undergraduate students studying at a school of education. Conducted in a different culture, this study makes an important contribution to the literature by investigating the question with participants from a different cultural background.

Even though the possible effects of the time allowed to complete both forms of the test were not considered in the study design, it was observed during the data collection process that all participants finished both forms of the test within the given time. Therefore, the possible influence of time was disregarded when the results of the study were discussed. The computer experience of participants could also have affected the results of the study. The study group had completed all levels of computer courses in their departments before the study and was expected to have sufficient skills to use a computer to complete the computer-based test. Therefore, the possible effects of prior computer experience are not expected to have influenced the results of this study, but future researchers are recommended to measure their participants' level of computer experience if adequate information is not available for their study group.

Taken collectively, the study compared undergraduate students' performance on a computer-based test with their performance on a paper-and-pencil test. The study was conducted in only two departments of a public university located in Turkey; thus, the results cannot be generalized beyond the sample population studied. Future researchers should continue this line of investigation with larger groups of participants in different departments. In addition, the current research compared only students' multiple-choice test performance on the computer-based test with their test performance on the paper-and-pencil test, since nationwide centralized tests in Turkey contain only multiple-choice questions. Therefore, other question formats, such as short-answer and multi-select questions, should also be made part of computer-based tests, and their effects on students' achievement should be investigated in different subjects to gain a better understanding of the effects of other test types. The findings of this study lead to the conclusion that no limitations on the computer-based testing method in Turkey were found.
Students in Turkish schools have to take several exams during each academic year. As a result, a considerable amount of time for scoring and financial resources for test duplication have to be allocated each year for paper-and-pencil tests. After the initial investment, computer-based testing can offer many benefits, as computers can carry out routine work, facilitate the standardization of procedures, substantially save time and decrease the cost of data entry, and reduce scoring errors (e.g., Bunderson et al., 1989; Lee et al., 1986; Mills, 2002; Smith & Caputi, 2004; Wise & Plake, 1989; Wise & Plake, 1990). Moreover, illuminating the possible effects of using computers in educational testing, as compared to the classical means that have been used for student assessment for years, would lead to a better understanding of the possible uses of computers in educational testing. This study showed that student achievement does not vary between the administration of computer-based tests and paper-based tests, which indicates that computer-based testing could be an alternative to paper-based testing for Turkish students.

References

American Psychological Association (APA) (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.
Bartram, D. (1994). Computer-based assessment. In C. L. Cooper (Ed.), International review of industrial and organizational psychology (pp. 31–69). London: Wiley.
Bugbee, A. C. (1996). The equivalence of paper-and-pencil and computer-based testing. Journal of Research on Computing in Education, 28(3), 282–299.
Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized educational measurement. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 367–407). New York: American Council on Education/Macmillan.
Choi, S. W., & Tinkler, T. (2002, April). Evaluating comparability of paper-and-pencil and computer-based assessment in a K-12 setting. Paper presented at the annual meeting of AERA, New Orleans.
Chua, S. L., Chen, D., & Wong, A. F. L. (1999). Computer anxiety and its correlates: A meta-analysis. Computers in Human Behavior, 15, 609–623.
Church, A. H. (2001). Is there a method to our madness? The impact of data collection methodology on organizational survey results. Personnel Psychology, 54, 937–969.
Educational Testing Service (1992). Computer-based testing at ETS 1991–1992. Princeton, NJ: Author.
Eid, G. K. (2004). An investigation into the effects and factors influencing computer-based online math problem-solving in primary schools. Journal of Educational Technology Systems, 33(3), 223–240.
Krantz, J. H. (2000). Tell me, what did you see? The stimulus on computers. Behavior Research Methods, Instruments & Computers, 32(2), 221–229.
Kveton, P., Jelinek, M., Voboril, D., & Klimusova, H. (2004). Computer-based tests: The impact of test design and problem of equivalency. Computers in Human Behavior, 23(1), 32–51.
Kyllonen, P. C. (1991). Principles for creating a computerized test battery. Intelligence, 15(1), 1–15.
Lee, J., Moreno, K. E., & Sympson, J. B. (1986). The effects of past computer experience on computerized aptitude test performance. Educational and Psychological Measurement, 46, 727–733.
Mahar, D., Henderson, R., & Deane, F. (1997). The effects of computer anxiety and computer experience on users' performance of computer-based tasks. Personality and Individual Differences, 22(5), 683–692.
Mazzeo, J., & Harvey, A. I. (1988). The equivalence of scores from automated and conventional educational and psychological tests (College Board Report No. 88-8). New York: College Entrance Examination Board.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458.
Mills, C. N. (Ed.) (2002). Computer-based testing: Building the foundation for future assessment. NJ: Lawrence Erlbaum.
Olsen, J. B., Cox, A., Price, C., Strozeski, M., & Vela, I. (1990). Development, implementation and validation of a computerized test for statewide assessment. Educational Measurement: Issues and Practice, 9(2), 7–10.
Pomplun, M., & Custer, M. (2005). The score comparability of computerized and paper-and-pencil formats for K-3 reading tests. Journal of Educational Computing Research, 32(2), 153–166.
Potosky, E., & Bobko, P. (1997). Computer versus paper-and-pencil administration mode and response distortion in noncognitive selection tests. Journal of Applied Psychology, 82, 293–299.
Powers, D. E., & O'Neill, K. (1993). Inexperienced and anxious computer users: Coping with a computer-administered test of academic skills. Educational Assessment, 1(2), 153–173.
Smith, B., & Caputi, P. (2004). The development of the attitude towards computerized assessment scale. Journal of Educational Computing Research, 31(4), 407–422.
Wise, S. L., & Plake, B. S. (1990). Computer-based testing in higher education. Measurement and Evaluation in Counseling and Development, 23, 3–10.
Wise, S. L., & Plake, B. S. (1989). Research on the effects of administering tests via computers. Educational Measurement: Issues and Practice, 8(3), 5–10.