Device Comparability of Tablets and Computers for Assessment Purposes
National Council on Measurement in Education, Chicago, IL

Laurie Laughlin Davis, Ph.D., Xiaojing Kong, Ph.D., Yuanyuan McBride, Ph.D.

April 2015
Abstract

The definition of what it means to take a test online continues to evolve with the inclusion of a broader range of item types and a wide array of devices used by students to access test content. To ensure the validity and reliability of test scores for all students, device comparability research should be conducted to evaluate the impact of testing device on student test performance. The current study examined the comparability of test scores across tablets and computers for high school students in three commonly assessed content areas and for a variety of item types. Results indicate no statistically significant differences across device type for any content area or item type. Student survey results suggest that students may prefer taking tests on devices with which they have more experience, but that even the limited exposure to tablets in this study increased positive responses toward testing on tablets.

Keywords: tablets, mode comparability, device comparability, score comparability
Device Comparability of Tablets and Computers for Assessment Purposes

Use of digital devices for instruction and assessment continues to increase as Bring Your Own Device (BYOD), 1:1 technology programs, and flipped learning change the way students interact with academic content, with their teachers and peers, and demonstrate their mastery of academic knowledge and skills (Hamdon, McKnight, McKnight, & Arfstrom, 2013; Johnson, 2012; McCrea, 2011; Ballagas, Rohs, Sheridan, & Borchers, 2004). According to the Speak Up 2013 national survey of K-12 students (Speak Up, 2013), the most common academic uses for devices include looking up information on the internet (reported by 63% of students), accessing online textbooks (43%), communicating with peers and teachers (42%), and taking an online test (40%). Online testing, as the fourth most common academic use of devices, merits specific attention as the definition of what it means to take an online test continues to evolve and diversify.

Online testing has moved beyond the traditional multiple choice item as the primary vehicle for evaluating student competencies. Instead, this generation of tests includes a variety of technology-enhanced item types (TEIs for short) that seek to provide more authentic measurement of student skills, with reduced guessing and more use of constructed response. Additionally, the devices used by students to access online test content vary considerably across classrooms and school districts. Desktop computers with large monitors, external keyboards, and pointing devices (i.e., mice) may exist side-by-side in school districts with more portable laptop computers, which have smaller keyboards and monitors and a variety of pointing device options. Laptop computers, in turn, may exist side-by-side
in school districts with even more portable tablet devices, which have yet smaller screens and keyboards (onscreen or external) and use fingers as the pointing device.

The increasing diversity of technology in schools raises the question of whether all devices are created equal relative to how students can use them to demonstrate their knowledge and skills when taking an online test. Professional testing standards (APA, 1986; AERA, APA, & NCME, 2014, Standards 9.7 & 9.9) require evidence that neither the mode of delivery nor the device used to access test content influences the interpretation of students' scores and assessment outcomes. Thus it is important to evaluate the comparability of scores across devices. Way, Davis, Keng, and Strain-Seymour (in press) suggest that the form factor of the device is an important consideration when evaluating comparability. The form factor describes the way in which the student uses the device to access and manipulate digital content: the more similar the form factors of two devices, the more comparable the scores resulting from testing on those devices can be expected to be. Desktop computers and laptop computers have similar form factors, and research has shown that student performance across these devices is relatively comparable (Keng, Kong, & Bleil, 2011; Sandene, Horkay, Bennett, Allen, Braswell, Kaplan, & Oranje, 2005; Bridgeman, Lennon, & Jackenthal, 2001; Powers & Potenza, 1996). However, there are significant differences in form factor between computers and touch-screen tablets that may influence students' experience in using the devices and, in turn, their resulting test scores. Specifically, the screen size of tablets (typically 7-inch to 10-inch) is smaller than that of computers, which may limit how much information students can see on screen at one time.
Additionally, the method of input for student responses on tablets relies on using the finger to select and move information on screen and either the onscreen keyboard or an external keyboard to enter text.
While there have been a number of qualitative cognitive labs and usability studies examining student interactions with tablets (Yu, Lorié, & Sewall, 2014; Pisacreta, 2013; Lopez & Wolf, 2013; Strain-Seymour, Craft, Davis, & Elbom, 2013; Davis, Strain-Seymour, & Gay, 2013), the quantitative research evidence in this area is more limited. Davis, Orr, Kong, and Lin (in press) found no effects of device (tablet vs. computer) on student writing of short essays for either 5th-grade or high school students. Similarly, Olsen (2014) concluded that there were no effects of device (tablet vs. computer) for a set of multiple choice items in reading, writing, and mathematics across a K-12 grade span of tests. As both the PARCC and Smarter Balanced assessment consortia allow students to use tablets across a wide variety of item types, the need for further research in this area is evident (PARCC, 2013; SBAC, 2013). As such, the current study examined the comparability of student test scores across tablets and computers for three commonly assessed content areas (reading, mathematics, and science) and for a variety of item types (multiple choice and TEIs).

Method

Participants

Data were collected in spring 2014 from a sample of 964 high school students from five school districts in Virginia: Frederick, Henrico, Isle of Wight, Prince George, and Stafford. Student participants were required to have completed or be currently enrolled in coursework in Algebra I, Biology, and English II by the time of the study. All students who participated in the study had prior experience with taking tests online as part of the Virginia Standards of Learning (SOL) assessment program. Schools were offered incentives based on the proportion of eligible students who participated in the study. Schools that provided 50% or
more of their eligible students for study participation were provided with one iPad and cover. Schools that provided 75% or more of their eligible students were provided with two iPads and covers. Recruitment of participating schools was conducted with the support of the Virginia Department of Education. Rather than obtaining a permission slip for each student in the study, schools were asked to notify parents and legal guardians of the study and to follow an opt-out procedure for student permissions. Some schools additionally followed their own local policies with regard to obtaining permission slips for students to participate in the study. A total of 964 students submitted responses to the test. Table 1 shows the demographic characteristics of students participating in the study, broken out by study condition.

Table 1: Demographic characteristics of participants

                    Tablet        Computer
Male                258 (53.2%)   248 (51.8%)
Female              224 (46.2%)   227 (47.4%)
Missing             3 (0.6%)      4 (0.8%)
White               310 (63.9%)   305 (63.7%)
African-American    105 (21.6%)   87 (18.2%)
Hispanic            28 (5.8%)     33 (6.9%)
Other               36 (7.4%)     46 (9.6%)
Missing             6 (1.2%)      8 (1.7%)
TOTAL               485           479

Apparatus/Materials

Hardware

Researchers arrived at each school location and set up the study equipment prior to beginning the study sessions. Each school participated in both the computer and tablet conditions. Specific set-up requirements differed across school locations, but, in general,
students in the computer condition participated in computer labs or classrooms using school-provided computers. Computers for this study included a mix of desktop and laptop models, with the only specification being that they meet the requirements for running the testing software. For some schools, students had been issued their own computers (1:1 device assignment with laptops), whereas for other schools students worked on computers in computer labs. For the majority of schools, tablets were rented and provided for the study so that data collection could be conducted efficiently across a one- to two-day period with up to 100 students testing at one time. One school (Smithfield High School in Isle of Wight) had issued tablets to each of its students (1:1 device assignment with tablets), and students used their own tablets for the study. In all cases, tablets were required to be full-size (9.7-inch) iPads running iOS 6 or higher. For rented tablets, study facilitators installed and configured software in advance of the study session. For student-provided tablets, this process was conducted under the direction of study facilitators prior to beginning the test session. For this study, tablets were used without peripherals (e.g., styluses, stands, or external keyboards). Students used the onscreen keyboard to provide brief typewritten responses to fill-in-the-blank questions. Figure 1 provides photos showing the configuration of a testing room in one participating school where both computer and tablet conditions were set up within the same room.
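The tablet requirement above (full-size 9.7-inch iPads running iOS 6 or higher) amounts to a simple eligibility check. The sketch below is illustrative only; the function name and parameters are hypothetical and not part of any study software:

```python
def tablet_eligible(model: str, screen_inches: float, ios_major_version: int) -> bool:
    """Check a device against the study's stated tablet requirements:
    a full-size (9.7-inch) iPad running iOS 6 or higher.
    """
    return (
        model.lower() == "ipad"          # iPads only
        and screen_inches >= 9.7         # full-size screen
        and ios_major_version >= 6       # iOS 6 or higher
    )

print(tablet_eligible("iPad", 9.7, 7))   # True
print(tablet_eligible("iPad", 7.9, 7))   # False: 7.9-inch screen is not full size
print(tablet_eligible("iPad", 9.7, 5))   # False: iOS version too old
```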
Figure 1: Example classroom set-up for data collection

Software

Students accessed the test content through the online testing software application. The software used in this study was accessible by computer via any web browser without special software installation. To access the software from the tablets, an application had to be downloaded, so a small amount of additional set-up was needed. Once launched (from either the computer or the tablet), the software locks out all other applications to allow for secure test administration. The tools offered within the software were limited for this study and included only a four-function calculator (which should have been sufficient to answer all math and science items on the test). Additionally, students did have access to flag-and-review functionality, which allowed them to flag an item for review and navigate back to that item from later points in the test. Due to the timing of the study and the version of software available, no other test-taking tools (e.g., answer choice eliminator, highlighter) were available. Students were, however, given scratch paper and pencils to support any interim work or calculations they wanted to record.
Measures

Each student in the study responded to a set of 59 items divided into 3 sections (reading, science, and mathematics) and a short set of survey questions about their experiences. Only items which could be computer or algorithmically scored (i.e., short fill-in items) were included. Study content was selected from a variety of sources and was not intended to reflect or explicitly align with a specific set of content standards or test blueprint. Reading passages were selected to reflect a range of genres and lengths. In mathematics, item selection targeted computation, geometry, and pre-algebra strands. The majority of content was selected to reflect skills somewhat below grade level (grade range 7-10) for participating high school students so that students could reasonably be expected to have learned the material. Table 1 shows the item allocation by item type for the study.

Table 1. Test Blueprint for the Study

Item Type            % of Test
Multiple Choice      44%
Drag and Drop        10%
Fill in the Blank    19%
Hot Spot             10%
Inline Choice        10%
Multiple Select      5%
Graph Point          2%
TOTAL                100%

The test was sequenced so that students completed the reading section first, followed by the science section, and finally the math section. In between sections, students were prompted with a motivation screen (see Figure 2) designed to provide positive reinforcement for their effort in the study (Bracey, 1996; Brown & Walberg, 1993).
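The blueprint percentages can be cross-checked against the 59-item test length. The counts below are rounded inferences from the percentages, not figures reported in the study:

```python
# Item-type percentages from the test blueprint. Counts are inferred by
# rounding (percentage x 59 items); they are illustrative cross-checks,
# not numbers taken from the study itself.
percent_of_test = {
    "multiple choice": 44, "drag and drop": 10, "fill in the blank": 19,
    "hot spot": 10, "inline choice": 10, "multiple select": 5, "graph point": 2,
}
TOTAL_ITEMS = 59

assert sum(percent_of_test.values()) == 100  # blueprint percentages total 100%

inferred_counts = {k: round(v / 100 * TOTAL_ITEMS) for k, v in percent_of_test.items()}
print(inferred_counts)
print(sum(inferred_counts.values()))  # 59: rounded counts recover the test length
```

Note that the inferred single graph-point item agrees with the paper's later reference to "the single graph point item on the test form."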
Figure 2: Example motivation screen

Procedures

Students were randomly assigned to condition either in advance of the study (based on classroom assignment) or at the time of the study (students were alternately assigned to the tablet or computer condition). At the beginning of each study session, a facilitator introduced themselves, briefly discussed the purpose of the study, provided directions to the students about what to do, and answered any questions. The version of the software used for this study differed from the version students had used previously for online testing; however, study facilitators reviewed functionality such as navigation and tools with the students prior to the beginning of the test session. Students were then given 80 minutes to read and respond to the test items. Following completion of all three subject area sections, students were asked to complete a 10-question survey about their home and school use of different devices as well as their experience in the study itself.

Results

Although random assignment to condition was employed throughout the data collection process, state reading test scores were obtained and used as a check on the effectiveness of the random assignment. Table 2 shows a comparison of scale scores on the state high school
reading assessment for students in the tablet and computer conditions. Mean scores were very similar and differences were not statistically significant.

Table 2: State reading scale scores by study condition

Study Condition   N*   State Reading Score, Mean (SD)
Tablet                 (40.12)
Computer               (32.62)

*Note that reading scores were not available for all students who participated in the study.

Motivation Filtering

Motivation filtering (Wise, Kingsbury, Thomason, & Kong, 2004; Sundre & Wise, 2003) was conducted based on student response times within each section of the test. Motivation filtering was applied to remove student participants from the data when it was clear that their responses were provided too quickly to support accurate score interpretations. Descriptive information about student response time for each section is presented in Table 3. Response time information was examined separately for students in the computer and tablet conditions but was very similar for both. Based on review of the response time information, students who spent less than four minutes responding to the reading section (n=8), less than two minutes responding to the science section (n=11), and less than two minutes responding to the math section (n=47) were excluded from further analysis.

Table 3: Response time (in minutes) by content area

Content Area    Tablet Mean (SD)   Computer Mean (SD)
Reading         (6.64)             (7.35)
Science         (6.35)             (6.75)
Mathematics     9.80 (4.16)        9.04 (4.29)
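The motivation-filtering step above amounts to a threshold filter on per-section response times. In this minimal sketch, the thresholds are those reported in the study (four minutes for reading, two for science, two for math), while the record layout and function name are hypothetical:

```python
# Minimum acceptable response time (in minutes) per section, as reported
# in the study; students under any threshold are filtered out.
MIN_MINUTES = {"reading": 4.0, "science": 2.0, "math": 2.0}

def filter_motivated(records):
    """Keep records whose response time meets the minimum in every section.

    Each record is a dict with a response time (in minutes) per section;
    this layout is illustrative, not the study's actual data format.
    """
    return [
        r for r in records
        if all(r[section] >= cutoff for section, cutoff in MIN_MINUTES.items())
    ]

students = [
    {"id": 1, "reading": 22.5, "science": 12.0, "math": 9.8},   # retained
    {"id": 2, "reading": 3.2,  "science": 14.1, "math": 10.0},  # rushed reading
    {"id": 3, "reading": 25.0, "science": 1.5,  "math": 8.7},   # rushed science
]
print([r["id"] for r in filter_motivated(students)])  # [1]
```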
Content Area and Item Type Analysis

Multiple choice items were scored 0/1 (wrong/right), as was the single graph point item on the test form. All other item types were scored to allow for partial credit where applicable. Points were assigned for full credit (worth 2 points), partial credit (worth 1 point), or no credit (worth 0 points) as indicated in Figure 3, where n is the number of correct components within an item for Group 1 items and the number of components within an item for Group 2 items, and Sum reflects the sum of +1 for all correctly selected components and -1 for all incorrectly selected components. Additional details on the scoring rules for each item can be found in Appendix A.

Group 1 (multiple select; hot spot; drag and drop with a single bay or single dragger):
  Fully correct -> 2; Sum = n - 1 -> 1; else -> 0

Group 2 (fill in the blank; inline choice; drag and drop with multiple bays):
  Fully correct -> 2; Correct picks >= n/2 -> 1; else -> 0

Figure 3: Scoring rules for assigning partial credit

The primary inferential analysis was an independent samples t-test conducted for the total score within each of the three content areas (across all item types) and within each item type (across all content areas). Student responses to each item were scored and aggregated within content area to produce a total score for reading, science, and mathematics. Additionally, responses were aggregated by item type across all content areas to produce a total score for each item type. Table 4 provides the descriptive statistics for total score in each content area. Means across study condition are very similar, and t-tests showed no statistically significant differences (alpha level 0.01) between student performance on tablets and computers.
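The partial-credit rules in Figure 3 can be sketched as two small scoring functions. Function and parameter names here are hypothetical, and the "at least n/2" reading of the Group 2 partial-credit rule is an assumption:

```python
def score_group1(n, signed_sum):
    """Group 1 (multiple select, hot spot, single-bay drag and drop).

    n is the number of correct components; signed_sum adds +1 for each
    correctly selected component and -1 for each incorrect selection.
    """
    if signed_sum == n:        # fully correct
        return 2
    if signed_sum == n - 1:    # one short of fully correct
        return 1
    return 0

def score_group2(n, correct_picks):
    """Group 2 (fill in the blank, inline choice, multi-bay drag and drop).

    n is the number of components; partial credit assumes the rule is
    'at least half the picks correct' (an interpretation, not confirmed).
    """
    if correct_picks == n:     # fully correct
        return 2
    if correct_picks >= n / 2:
        return 1
    return 0

print(score_group1(4, 4), score_group1(4, 3), score_group1(4, 1))  # 2 1 0
print(score_group2(4, 4), score_group2(4, 2), score_group2(4, 1))  # 2 1 0
```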
Table 4: Mean score by content area

Content Area    Statistical Test
Reading         t = 1.35; p = .1767
Science         t = 0.27; p = .7900
Mathematics     t = -0.34; p = .7369

Table 5 provides the descriptive statistics for total score by item type. Means across study condition are very similar, and t-tests showed no statistically significant differences (alpha level 0.01) between student performance on tablets and computers.

Table 5: Mean score by item type

Item Type          Statistical Test
Multiple Choice    t = 0.24
Drag and Drop      t = 0.54
Hot Spot           t = -0.72
Fill in the Blank  t = 0.83
Multiple Select    t = 2.04; p = .0412
Inline Choice      t = -0.25
Graph Point        t = -1.13; p = .2596

Item Analysis

Individual item means were divided by the total number of points available for each item to create a p-value on a scale of 0 to 1 for each item. Item p-values for the tablet
condition were then compared to item p-values for the computer condition using an independent samples t-test. Figure 4 shows the p-value differences for items within each content area. Across all 59 items, only 2 items (both within the reading content area) showed statistically significant differences at the p < 0.01 level, favoring tablet: items 7 (t = 2.77; p = .0056) and 18 (t = 2.60; p = .0093). Both items were of the multiple choice item type.

Figure 4: Differences in item p-value (tablet - computer) by content area

Survey Analysis

Figures 5 through 11 show selected responses to the survey questions. A total of 962 high school students responded to the survey. However, student response rates to individual questions varied somewhat; therefore, percentages are reported in relation to the number of students responding to each question and do not include missing responses. For some survey questions, students were asked to choose all responses that applied, so percentages may total more than 100%.

As seen in Figures 5 and 6, more students in this study report using tablets and smart phones for personal use than for school work (44% vs. 28% for tablets and 84% vs. 31% for
smart phones), though rates of usage for desktop and laptop computers are more similar between personal use and school work (27% vs. 36% for desktop computers and 61% vs. 66% for laptop computers). Figure 7 shows information about students' previous use of devices specifically for the purpose of taking tests. Not surprisingly, most students reported previous experience taking tests on paper (95%) and on computer (desktop at 85% and laptop at 75%), but few had experience taking a test on a tablet (24%) or smart phone (5%).

Figure 5: Device use for personal/recreational purposes (n=961): desktop computer 27%, laptop computer 61%, touch-screen tablet 44%, smart phone 84%, none of the above 1%
Figure 6: Device use for school work (n=960): desktop computer 36%, laptop computer 66%, touch-screen tablet 28%, smart phone 31%, none of the above 5%

Figure 7: Previous experience with devices for taking tests (n=960): paper 95%, desktop computer 85%, laptop computer 75%, touch-screen tablet 24%, smart phone 5%

Figure 8 shows the responses of students, broken out by study condition, when asked what mode or device type they preferred for taking a test. Interestingly, the most frequently endorsed options were paper only (selected by 30% of students in the computer condition and 22% of students in the tablet condition) or paper and computer (selected by 27% of students in the computer condition and 18% of students in the tablet condition). Across both study
conditions, few students indicated that they would prefer to take a test on a touch-screen device (such as a tablet or smart phone). However, those who participated in the tablet condition of the study tended to select those options at a higher rate than those who participated in the computer condition (e.g., 14% vs. 7% for touch-screen only; 13% vs. 9% for computer and touch-screen). As such, it appears that the experience of taking a test on a tablet during the study may have positively impacted student perceptions toward the device.

Figure 8: Device preference for taking tests, by study condition (n=959)

Figures 9 through 11 show student perceptions of the difficulty of questions in each of the three content area sections (reading, science, and math), broken out by study condition. For all three content areas, students in the tablet condition were more likely than students in the computer condition to respond that they thought the questions were either very easy or easy. This was most pronounced for reading, where 58% of students in the tablet condition selected very easy or easy compared with 44% of students in the computer condition. Differences were somewhat smaller for science (49% for tablet vs. 40% for computer) and math (41% for tablet vs. 37% for computer). This suggests that the experience of taking a test on a tablet
18 Device Comparability 17 during the study may have also positively impacted student perceptions toward the test questions themselves. Overall, I Found the Test Questions in the Reading Section Computer Tablet 47% 32% 36% 21% 26% 23% 6% 5% 3% 2% Very Easy Easy Average Hard Very Hard Figure 9: Question Difficulty by Study Condition for Reading (n=958) Overall, I Found the Test Questions in the Science Section Computer Tablet 48% 42% 31% 25% 15% 18% 8% 7% 4% 2% Very Easy Easy Average Hard Very Hard Figure 10: Question Difficulty by Study Condition for Science (n=957)
Figure 11: Perceived question difficulty for the math section, by study condition (n=956)

Additional Qualitative Analysis

In addition to the formal survey conducted across all study participants, one participating school division (Prince George; n=322 students, half of whom tested in the tablet condition) captured comments from their students after the study and summarized these as likes and dislikes about testing on tablets. The study researchers used these comments to create word clouds (as seen in Figures 12 and 13), with larger words indicating more frequent mention within the student comments and smaller words indicating less frequent mention.
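Word clouds of this kind are driven by simple word-frequency counts over the free-text comments, with more frequent words drawn larger. A minimal sketch follows; the comments, stopword list, and function name are illustrative, not the study's data:

```python
from collections import Counter
import re

# A small illustrative stopword list; real cleanups are usually longer.
STOPWORDS = frozenset({"the", "a", "to", "and", "it", "was", "i", "in", "on"})

def word_frequencies(comments):
    """Tally content-word frequencies across free-text comments.

    Counts like these drive word-cloud sizing: the more often a word
    appears, the larger it is drawn.
    """
    counts = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

likes = [
    "It was easy to read and navigate",
    "The touch screen felt modern and easy",
    "Easy to zoom in on the reading",
]
print(word_frequencies(likes).most_common(1))  # [('easy', 3)]
```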
Figure 12: Word Cloud for Tablet Likes

Figure 13: Word Cloud for Tablet Dislikes

A summary of the themes that emerged from review of these word clouds is provided in Table 6. Many of the themes for tablet likes centered on the novelty of the device or the method of interacting with it (e.g., new, modern, look, touch, feel). Many of the themes for tablet dislikes had to do with the physical impacts of using the device (e.g., small, glare, fingers, neck).
Table 6. Summary of Themes from Tablet Like and Dislike Word Clouds

Tablet Likes         Tablet Dislikes
Easy/Easier          Uncomfortable
Reading/Read         Small
Navigate             Glare
Look, touch, feel    Stand
New                  Fingers, Neck
Convenient           Sitting
Zoom                 Hurt
Practical            Typing
Modern               Math

Discussion

Overall, the results from this study are consistent with the findings of previous studies in this area (Davis, Orr, Kong, & Lin, in press; Olsen, 2014). There were no observable performance differences in student test scores across devices for any of the three content areas or for any item type evaluated (though admittedly the number of observations per item type varied considerably, with some item types having very few items included in the study). In many ways this is reassuring, as use of tablets for large-scale testing has moved forward with the PARCC and Smarter Balanced assessments as well as with many individual state next-generation assessment programs. The findings of this study should not, however, be interpreted to suggest that tablets be introduced only as a means of expanding the set of available devices on testing day. Student familiarity with tablets in an academic context is crucial, and tablets are best used as part of a technology-rich learning environment throughout the school year.

The pattern of high school student preferences observed in this study relative to preference of device for testing is consistent with previous findings (Davis, Orr, Kong, & Lin, in press; Strain-Seymour, Craft, Davis, & Elbom, 2013; Davis, Strain-Seymour, & Gay, 2013). Older students tend to be more aware of the uses and consequences associated with test scores
than younger students (among whom preference for tablets tends to be greater), such that they prefer to use devices with which they have more experience. However, this study did demonstrate that students who experienced testing on the tablet as part of this research held somewhat more positive views of using tablets for testing than those who did not have that opportunity. This suggests that with additional experience and practice using tablets for test-taking (such as through tutorials or practice test sessions), student comfort levels with tablets may further improve.

Despite the lack of statistically significant differences in student performance for reading, there is some evidence in the survey and qualitative results to suggest that reading may be an area where students prefer to work with tablets. The scrolling interface used to present the reading passages provides for a very natural gesture with the finger on the touch-screen device, whereas the use of the mouse as an intermediary device to scroll the passage on the computer may be somewhat more cumbersome. Additionally, tablets and e-readers are becoming more and more common, such that students may find interacting with text on a touch-screen device a very familiar activity. However, it should be noted that the reading passages presented in the current study were relatively short. Different performance and preference outcomes might be observed if longer reading selections were studied. Future research might look specifically at student interactions with reading on tablets to understand more about these factors.

Lastly, this study used a single size (10-inch form factor) of tablet for data collection. This decision was deliberate and reflects current practice relative to allowable device sizes on large-scale assessment programs. However, many school districts have smaller tablets (7- to 8-inch, 5-inch, or even 3- to 4-inch smart phones) that students use in daily classroom activities.
Future research might consider the comparability of student performance across a range of tablet and touch-screen sizes. This might allow districts to test their students with the same devices they use in classroom activities, and might encourage districts that currently do not use tablets at all to consider purchasing smaller, more cost-effective tablets if they can be used for both instructional and assessment purposes.
References

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Author.

American Psychological Association Committee on Professional Standards and Committee on Psychological Tests and Assessments (APA). (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.

Ballagas, R., Rohs, M., Sheridan, J., & Borchers, J. (2004). BYOD: Bring Your Own Device. UbiComp 2004 Workshop on Ubiquitous Display Environments, September, Nottingham, UK.

Bennett, R. E. (2003). Online assessment and the comparability of score meaning (ETS-RM ). Princeton, NJ: Educational Testing Service.

Bracey, G. W. (1996). Altering the motivation in testing. Phi Delta Kappan.

Bridgeman, B., Lennon, M. L., & Jackenthal, A. (2001). Effects of screen size, screen resolution, and display rate on computer-based test performance (ETS-RR-01-23). Princeton, NJ: Educational Testing Service.

Bridgeman, B., Lennon, M. L., & Jackenthal, A. (2003). Effects of screen size, screen resolution, and display rate on computer-based test performance. Applied Measurement in Education, 16(3).

Brown, S. M., & Walberg, H. J. (1993). Motivational effects on test scores of elementary students. Journal of Educational Research, 86(3).

Davis, L. L., Orr, A., Kong, X., & Lin, C. (in press). Assessing student writing on tablets. Educational Assessment.

Davis, L. L., Strain-Seymour, E., & Gay, H. (2013). Testing on tablets: Part II of a series of usability studies on the use of tablets for K-12 assessment programs. Retrieved from II_formatted.pdf

Hamdon, N., McKnight, P., McKnight, K., & Arfstrom, K. M. (2013). A review of flipped learning. Retrieved from ppedlearning.pdf

Johnson, D. (2012). Power up! On board with BYOD. Educational Leadership, 70(2).
Keng, L., Kong, X. J., & Bleil, B. (2011, April). Does size matter? A study on the use of netbooks in K-12 assessment. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Kingston, N. M. (2009). Comparability of computer- and paper-administered multiple-choice tests for K-12 populations: A synthesis. Applied Measurement in Education, 22(1).

Lopez, A., & Wolf, M. K. (2013, December). A study on the use of tablet computers to assess English learners' language proficiency. Paper presented at the annual meeting of the California Educational Research Association, Anaheim, CA.

Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114.

McCrea, B. (2011). Evolving 1:1. THE Journal.

Olsen, J. B. (2014, April). Score comparability for web and iPad delivered adaptive tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Philadelphia, PA.

Partnership for the Assessment of Readiness for College and Careers (PARCC). (2013, February). Technology guidelines for PARCC assessments version 2.1, February 2013 update. Retrieved from date.pdf

Pisacreta, D. (2013, June). Comparison of a test delivered using an iPad versus a laptop computer: Usability study results. Paper presented at the Council of Chief State School Officers (CCSSO) National Conference on Student Assessment (NCSA), National Harbor, MD.

Powers, D. E., & Potenza, M. T. (1996). Comparability of testing using laptop and desktop computers (ETS Report No. RR-96-15). Princeton, NJ: Educational Testing Service.

Sandene, B., Horkay, N., Bennett, R., Allen, N., Braswell, J., Kaplan, B., & Oranje, A. (2005). Online assessment in mathematics and writing: Reports from the NAEP Technology-Based Assessment Project, Research and Development Series (NCES ). U.S.
Department of Education, National Center for Education Statistics. Washington, DC: U.S. Government Printing Office. Retrieved from: Smarter Balanced Assessment Consortium (SBAC 2013, February). The Smarter Balanced technology strategy framework and system requirements specifications. Retrieved from Strategy-Framework-Executive-Summary_ pdf
26 Device Comparability 2 Speak Up (2013). The new digital learning playbook:understanding the spectrum of students activities and aspirations. Retrieved from Strain-Seymour, E., Craft, J., Davis, L.L, & Elbom, J. (2013). Testing on tablets: Part I of a series of usability studies on the use of tablets for K-12 assessment programs. Retrieved from Sundre, D.L, & Wise, S.L. (April, 2003). Motivation Filtering : An exploration of the impact of low examinee motivation on the psychometric quality of tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL. Texas Education Agency (2008). A review of literature on the comparability of scores obtained from examinees on computer-based and paper-based tests. Retrieved from erature_review_of_comparability_report.pdf. Wang, S. (2004). Online or paper: Does delivery affect results? Administration mode comparability study for Stanford Diagnostic Reading and Mathematics tests. San Antonio, Texas: Harcourt. Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olsen, J. (2007). A meta-analysis of testing mode effects in grade K-12 mathematics tests. Educational and Psychological Measurement, 67(2), Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olsen, J. (2008). Comparability of computerbased and paper-and-pencil testing in K 12 reading assessments. Educational and Psychological Measurement, 68(1), Way, W.D., Davis, L.L., Keng, L., & Strain-Seymour, E. (in press). From standardization to personalization: The comparability of scores based on different testing conditions, modes, and devices. In Technology in testing: Measurement issues, ed. F. Drasgow. Vol 2 of the NCME book series. Wise, S.L, Kingsbury, G.G., Thomason, J., & Kong, X. (April, 2004). An investigation of motivation filtering in a statewide achievement testing program. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA. Winter, P. (2010). 
Evaluating the comparability of scores from achievement test variations. Council of Chief State School Officers: Washington, DC. Retrieved from: Yu, L., Lorié, W. & Sewall, L. (2014, April). Testing on tablets. Paper presented at the Annual meeting of the National Council on Measurement in Education, Philadelphia, PA.
Appendix A: Scoring Rules by Item Type

Multiple-choice (MC) items: 1 for a correct answer, 0 for an incorrect answer.

Inline choice items:
- If the item contains one inline question, it is treated as an MC item.
- If the item contains multiple (n) inline questions, each answer is scored 1 or 0 and the answer scores are summed:
  - If the sum is less than n/2, code the score as incorrect (0).
  - If the sum is equal to or greater than n/2 but less than n, code the score as partially correct (1).
  - If the sum is equal to n, code the score as correct (2).
  The possible scores for this item are 0, 1, and 2.

Fill-in-the-blank items:
- If the item contains one fill-in-the-blank question, it is scored as either correct (1) or incorrect (0).
- If the item contains multiple (n) fill-in-the-blank questions, each answer is scored 1 or 0 and the answer scores are summed, using the same rule as multi-question inline choice items. The possible scores for this item are 0, 1, and 2.

Multiple-select and hot-spot items:
- Score each correct pick as +1 and each incorrect pick as -1 and sum the pick scores, with n equal to the number of correct answers:
  - If the sum is less than n-1, code the score as incorrect (0).
  - If the sum is equal to n-1, code the score as partially correct (1).
  - If the sum is equal to n, code the score as correct (2).
  The possible scores for this item are 0, 1, and 2.

Graph-point items: 1 for a correct answer, 0 for an incorrect answer.

Drag-and-drop items:
- Some drag-and-drop items in this study can be treated as multiple-select items:
  - one bay with several draggers; or
  - multiple bays with one dragger (the dragger is dropped into the appropriate bay or bays).
- The remaining drag-and-drop items are of two kinds:
  1. Multiple identical bays with several draggers (more draggers than bays): one dragger is dropped into each bay.
  2. "Matching game" items: there are multiple bays (each of which is unique) and multiple draggers, and the number of draggers may equal or exceed the number of bays. The test taker must match each dragger to one of the bays; a bay may accept either a single dragger or multiple draggers. Each dragger placed in the correct bay is scored 1, otherwise 0, and the dragger scores are summed, with n equal to the number of draggers:
     - If the sum is less than n/2, code the score as incorrect (0).
     - If the sum is equal to or greater than n/2 but less than n, code the score as partially correct (1).
     - If the sum is equal to n, code the score as correct (2).
     The possible scores for this item are 0, 1, and 2.
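The partial-credit rules above can be summarized in code. The following sketch is illustrative only; the function names and signatures are ours, not part of the study's scoring engine:

```python
def score_multipart(subscores):
    """Score an item made of n dichotomous parts (multi-question inline
    choice, fill-in-the-blank, or matching drag-and-drop) on a 0/1/2 scale.

    subscores: list of 0/1 flags, one per sub-question or dragger.
    """
    n = len(subscores)
    total = sum(subscores)
    if total == n:
        return 2        # all parts correct -> correct (2)
    if total >= n / 2:
        return 1        # at least half correct -> partially correct (1)
    return 0            # fewer than half correct -> incorrect (0)


def score_multiselect(picks, key):
    """Score a multiple-select or hot-spot item: +1 per correct pick,
    -1 per incorrect pick, with n equal to the number of correct answers.

    picks: set of options the student selected.
    key:   set of correct options.
    """
    n = len(key)
    total = sum(1 if p in key else -1 for p in picks)
    if total == n:
        return 2        # full key selected with no wrong picks
    if total == n - 1:
        return 1        # one point short of the key -> partial credit
    return 0            # anything lower -> incorrect
```

For example, a three-part inline choice item with two parts answered correctly (`score_multipart([1, 1, 0])`) earns partial credit (1), while selecting two of three hot spots with no stray picks (`score_multiselect({"A", "B"}, {"A", "B", "C"})`) also earns 1.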