Assessing Student Writing on Tablets

National Council on Measurement in Education
Philadelphia, PA

Laurie Laughlin Davis, Ph.D.
Aline Orr, Ph.D.
Xiaojing Kong, Ph.D.
Chow-Hong Lin, Ph.D.

April, 2014
Writing on Tablets

Abstract

There is an increasing expectation that schools should be able to use tablets for formative, diagnostic, and high-stakes summative assessments. To assure the validity and reliability of test scores, research should be conducted to understand the full impact of introducing tablets into assessment programs so that their use is fair to all students. This paper focuses specifically on the comparability of student writing on tablets and laptops. Data were collected in spring 2013 from a sample of 826 students from Virginia and South Dakota at two grade levels. Each student in the study was provided with a laptop, a tablet, or a tablet with an external keyboard and asked to respond to a grade-level-appropriate essay prompt. Results indicated no difference in essay scores or surface-level essay features (such as word count) across study conditions. Additionally, student perceptions of the device keyboards and of the quality of the writing they generated on the devices were generally positive. However, the relatively short length of the essays written, the fact that reference materials were not required to respond to the essay prompts, and the general motivational level of student participants were likely contributing factors to the observed outcomes.

Keywords: tablets, writing assessment, score comparability
Assessing Student Writing on Tablets

Like many academic disciplines, writing has evolved over time to accommodate the introduction of new tools and new technologies into classroom instruction and assessment practices. Just as mathematics instruction has evolved from using a slide rule to a physical calculator to graphing and virtual calculators, written composition has evolved from handwritten compositions to typewritten compositions to the inclusion of word processing tools such as cut, copy, paste, and even spell-check. Way, Davis, & Strain-Seymour (2008) reported that the cognitive processes students use when writing depend to a large extent on what tools they are using to write. When handwriting compositions, students typically make use of pre-writing skills such as drafting and outlining; however, when students have word processing tools available they use fewer pre-writing skills and instead opt to write and revise on the fly using the convenient revision tools within the word processing software.

Tablets introduce a new twist on digital writing techniques with the advent of the virtual (also known as onscreen or soft) keyboard as well as screen sizes typically smaller than what students would have with desktop or laptop computers. The virtual keyboard differs from a traditional physical keyboard in a number of ways. Perhaps most importantly, the virtual keyboard allows two states relative to hand positioning (fingers are off the keys; fingers are depressing a key) compared to the three states possible with a physical keyboard (fingers are off the keys; fingers are resting on the keys; fingers are depressing a key) (Findlater & Wobbrock, 2012). This lack of a resting state with a virtual keyboard creates challenges for the use of conventional keyboarding techniques, in which students are taught to rest their fingers on home row. The compact size of a virtual keyboard also means that students are working in a
more constrained space when reaching fingers to select keys. Both of these issues result in significantly slower typing speeds when using a virtual keyboard (Pisacreta, 2013). Additionally, virtual keyboards typically have multiple screens, with alpha characters displayed on one screen and numeric and symbolic characters displayed on one or more alternate screens. Students must know how to navigate between these screens and where to find the specific character they are looking for (Davis & Strain-Seymour, 2013b). Because of this, students can make typing mistakes with the virtual keyboard that they would not normally make with a physical keyboard, such as selecting a comma instead of an apostrophe (Lopez & Wolf, 2013). Similarly, the virtual keyboard is hidden when not in use, which requires students to know how to open and close the keyboard when needed. When open, the virtual keyboard also takes up a significant amount of screen real estate and often pushes content off the screen, requiring the student to scroll up to locate or reference information not in view.

However, student familiarity and flexibility with devices may overcome any potential disadvantages. Strain-Seymour, Craft, Davis, & Elbom (2012) postulated that facility with using virtual keyboards may be inversely related to the level of keyboarding training. They observed that students (typically younger) who had little or no keyboarding training were very facile with the virtual keyboard and even indicated a preference for it over a physical keyboard that was external to the tablet device. In addition, tablet manufacturers have taken steps to enhance the usability of the virtual keyboard with features such as haptic feedback (the key vibrates to let the student know it has been selected) or a magnified image of the key which is highlighted when the key is pressed. In fact, students report that they like these and other features of the virtual keyboard.
For example, some students report that they like that, when activated, the caps lock key on the virtual keyboard turns all the letters from lowercase to capitals, which makes the state of the caps lock function very clear (Strain-Seymour, Craft, Davis, & Elbom, 2013).

Academic use of digital devices is increasing as 1:1 technology initiatives (one device for each student) and concepts such as the flipped classroom become more commonplace in educational pedagogy (Hamdon, McKnight, McKnight, & Arfstrom, 2013; McCrea, 2011). According to survey results from a 2011 report by Simba Information (Raugust, 2011), 75% of educators said students in their districts use device technology (including tablets, smartphones, e-readers, and even MP3 players) for educational purposes in school. The report lists increased student engagement as the primary driver for incorporation of device technology and, based on observations from early school pilots, suggests that using device technology may result in higher rates of homework completion and even increased test scores. Many schools are experimenting with Bring Your Own Device (BYOD; Johnson, 2012; Ballagas, Rohs, Sheridan, & Borchers, 2004) initiatives or rerouting technology budgets to low-cost device purchases to capitalize on these perceived positive effects as well as the potential cost savings. Consistent with this increased classroom presence, the number of instructional programs and educational apps for devices has also continued to grow (Ash, 2013; Glader, 2013; Rosin, 2013; van Mantgem, 2008).

As devices become more prevalent in classrooms, support for devices in assessment should follow, allowing students to access assessment content and respond in a way consistent with their classroom and real-life experiences. Students should be able to use the same device on testing day that they use in the classroom or as part of instruction.

Professional testing standards (APA, 1986; AERA, APA, NCME, 1999, Standard 4.10)
require that comparability across different modes of testing be evaluated such that the mode of delivery should not influence the student's scores or assessment outcomes. The majority of comparability research conducted to date has focused on differences between paper and computer-delivered assessments (Winter, 2010; Kingston, 2009; Texas Education Agency, 2008; Wang, Jiao, Young, Brooks, & Olson, 2008; Wang, 2004; Wang, Jiao, Young, Brooks, & Olson, 2007; Meade & Drasgow, 1993). Bennett (2003), however, suggested a broader definition of comparability as the commonality of score meaning across testing conditions, including delivery modes, computer platforms, and scoring presentation. It is in this context that device comparability becomes an important consideration, especially as the volume and variety of digital devices used in classrooms continue to grow.

Some early research in the area of device comparability considered differences between laptop and desktop keyboards. Powers and Potenza (1996) found differences in student writing performance which they believed to be due to the different sizes and layouts of the device keyboards. However, nearly 20 years after this finding, little attention seems to be given to this issue, as users' facility with laptop keyboards and the design of the keyboards themselves have improved.

Another key feature which distinguishes devices is screen size. Bridgeman, Lennon, and Jackenthal (2001) compared different computer monitor sizes and found that the amount of information available on screen without scrolling contributed to differences in student performance in verbal skills when students had to scroll to view reading materials. A more recent study by Keng, Kong, and Bleil (2011) found no differences in student performance when using netbooks vs. laptops or desktops when the amount of information shown on screen was kept constant.
Davis, Strain-Seymour, and Gay (2013) conducted a qualitative study which compared student interactions with a 10-inch tablet and a 7-inch tablet for a set of test questions using a think-aloud protocol. Students generally found the 10-inch screen size to be acceptable for viewing and working with test content, but found the smaller 7-inch screen size to be more challenging. The current study looks at the comparability of student essay scores across devices with different screen sizes and different types of keyboards.

Method

Overview

This study examined the quality and characteristics of student academic writing when the writing activity was conducted using different digital devices. Three device conditions (laptop, tablet with onscreen keyboard, and tablet with external keyboard) were evaluated for students at two different grade levels (grade 5 and high school). The devices differed in terms of screen size (15-inch laptop vs. 10-inch tablet) as well as type of keyboard (virtual vs. physical).

Participants

Data collection included students from 5th and 10th grades in South Dakota and students from 5th and 11th grades in Virginia. Recruitment of participating schools was conducted with the support of the respective state departments of education. Five school districts located in the vicinity of Sioux Falls, South Dakota (Brandon Valley, Chester, Beresford, Sioux Falls, and Tea) and one school division from Virginia (Isle of Wight) volunteered to participate. Schools were asked to distribute permission slips to all participating students to obtain the consent of legal guardians for students to participate in the study. Students were offered entry into a random drawing for a $25 gift card for their participation and best effort. A total of […] 5th grade and 442 high school (combined 10th and 11th grade) students successfully submitted essays to the online system. A handful of students experienced
technical difficulties in submitting their essays (primarily due to temporary loss of internet connection), but were allowed to respond to the survey and were still considered for the random drawing for the gift card. Upon extraction of student essays, a spot check of student responses was conducted. Five essays were deleted from analysis due to unusable student responses (e.g., gibberish or random character entry). Valid essays were obtained from 387 5th grade students and 439 high school students. Table 1 shows the demographic characteristics of these students.

Table 1: Demographic characteristics of participants

                     5th grade      High School
                     n (percent)    n (percent)
Male                 187 (48%)      202 (46%)
Female               200 (52%)      237 (54%)
White                308 (80%)      356 (81%)
African-American      41 (11%)       46 (10%)
Hispanic               2 (< 1%)       7 (2%)
Other                 18 (5%)        22 (5%)
Missing               18 (5%)         8 (2%)
South Dakota         243 (63%)      231 (53%)
Virginia             144 (37%)      208 (47%)
TOTAL                387 (100%)     439 (100%)

Measures

Essay prompts were selected from the Pearson Write-to-Learn item bank by an English language arts content specialist. Write-to-Learn is a formative writing assessment program that uses automated scoring technology to provide feedback to students on classroom writing assignments. Each prompt is intended to elicit a short ([…] word) response from students. Although automated scoring was not used for this study, the bank provided a rich source from
which to draw grade-level-appropriate essay prompts which would align with writing instruction across the two participating states. A single prompt was selected for each grade level (5th grade and high school) and was intended to reflect an average level of difficulty for that grade level. Figure 1 shows the prompts used in the study.

Grade 5 Prompt - Expository
Sometimes, teamwork is needed to accomplish a goal. Think about a time you worked as part of a team to accomplish a goal. Think about what your team was trying to do. Think about how you helped the team. Write an essay describing a time you worked as part of a team. Explain how teamwork helped your team meet your goal. Be sure to include specific details and examples in your essay.

High School Prompt - Persuasive
Read the following statement: Social networking sites allow people to form meaningful relationships with each other. Do you agree or disagree with the statement? Write an argument: state a clear claim. Support your claim with valid reasoning and relevant and sufficient evidence.

Figure 1: Essay Prompts Used in the Study

Students were presented with each essay prompt embedded in an online data collection tool. The tool had three sections: the first section asked students to enter their student ID, the second section contained the essay prompt itself, and the third section contained a text box area for students to compose their response to the essay prompt. In addition to the essay prompt, the second section also included directions for students on how to cut, copy, and paste within both the laptop and tablet environments. Appendix A shows a screenshot of the interface for the online tool. Following submission of their response to the
essay prompt, students were asked to complete a 15-question survey about their home and school use of different devices as well as their experience in the study itself.

Procedures

Researchers arrived at each school location and set up the study equipment before the study sessions began. Researchers set up stations in the study room representing each of the three study conditions: a) laptop computer, b) 10-inch tablet with a foldable cover laid flat on the table, or c) 10-inch tablet set upright in a stand with a paired Bluetooth keyboard in front. In addition, some students in condition b or c may have had a stylus available to them for use with the tablet [1]. Students cycled through the study room in groups of […] until all participating students at the school were tested. Upon entering the room, students were randomly assigned to a station. Figure 2 provides photos that show the station configuration within one of the study locations.

The laptops used in this study were Dell Latitude model E550 with 15-inch screens. The tablets used in this study were second-generation iPads with a 9.7-inch screen. For condition b, the virtual keyboards were configured to turn off the auto-correct, auto-capitalization, spell check, and split keyboard features. The keyboard clicks feature was turned on so that students would have audible confirmation of striking a key with the onscreen keyboard. Students may have changed the keyboard configuration during their use (and were given no specific instruction in this regard), but the keyboard configuration was reset to the default configuration between testing sessions. For condition c, the external keyboards were Amazon Basic Bluetooth keyboards that were individually paired with each tablet. This specific keyboard was selected after a review of different keyboard options (Frakes, 2013) and a trial run of several different keyboards by the researchers. The final selection was based on the overall quality of the typing experience as well as price point, in an attempt to reflect what schools might realistically purchase for student use.

Figure 2: Photos of study stations in Tea Elementary School in Tea, South Dakota

At the beginning of each study session, a facilitator introduced themselves, briefly discussed the purpose of the study, provided directions to the students about what to do, and answered any questions. Students were then given 40 minutes to read and respond to the essay prompt and an additional […] minutes to answer the survey questions. Total study participation time for each student was approximately 60 minutes.

[1] Initially the stylus was intended to be an additional independent variable to allow for a finer-grained look at precision of cursor placement for essay revision; however, in practice students made so little use of the stylus and reported so little revision within their essays that this variable was dropped from the study analysis. Additional detail on this decision is presented in Appendix B.
Data Analyses

Dependent variables

Student essays were extracted from the online data collection tool and passed through parsing software that generated a set of essay features describing the surface characteristics of each student response. These features included:

  Character count: total number of characters in the student response
  Word count: total number of words in the student response
  Sentence count: total number of sentences in the student response
  Words per sentence: average number of words per sentence in the student response
  Percent of content (non-function) words: percent of the total number of words in the student response which are content words rather than function words (e.g., a, an, the)
  Characters per content (non-function) word: average number of characters per word in the student response, excluding function words (e.g., a, an, the)
  Percent of misspelled words: percent of words in the student response which are misspelled

In addition, student responses were scored by professional scorers who were blind to the study condition. Scorers applied a 6-point holistic rubric (6 = An Exceptionally Skillful Composition; 1 = A Below Basic Composition). The full rubric is provided in Appendix C. Student responses were double-scored and the final score was the average of the two scores.

Although random assignment to condition was employed throughout the data collection process, state test scores were obtained and used as covariates in all inferential data analyses to statistically control for academic proficiency. For 5th grade students (in both Virginia and South Dakota) as well as 11th grade students in Virginia, current-year reading scores on the state assessment were used for this purpose. For 10th grade students in South Dakota no state
assessment scores were available, so writing performance from essays in the Write-to-Learn assessment was used instead. These students participated in the Write-to-Learn assessment program outside of this study and so had a history of writing scores available from their classroom writing assessments in the Write-to-Learn system. For students with more than one writing score, the average of all writing scores was used. All assessment scores were standardized by converting the individual scores to z-scores using the average of each group. For example, the z-scores for Virginia and South Dakota 5th grade students were calculated separately using the average of reading scores for each group.

Analyses

A descriptive review of all dependent variables and survey responses was conducted. The primary inferential analysis was a one-way ANCOVA conducted on the final essay scores resulting from the professional scoring process, run for the 5th grade students and separately for the high school students. The three device conditions (laptop, tablet with virtual keyboard, and tablet with physical keyboard) were used as the independent variable and the z-scores derived from the students' state reading or Write-to-Learn scores were used as a covariate.

Results

Table 2 provides the mean (and standard deviation) of final essay scores for each study condition along with the sample size for that condition. Means across study conditions are very similar and are within rounding of 3 out of 6 rubric score points for all study conditions for both 5th grade and high school students. As might be expected from the descriptive results, the ANCOVA showed a non-significant main effect for study condition for both the 5th grade (F(2,378)=.08, p>.10) and the high school groups (F(2,431)=1.26, p>.10) after controlling for student reading/writing ability.
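The covariate standardization and the one-way ANCOVA described above can be illustrated with a minimal sketch. The paper does not state which software or standard deviation formula was used, so the sample standard deviation, the classical adjusted sums-of-squares formulation, and the function names `zscores_by_group` and `one_way_ancova` are all assumptions made for illustration. The sketch returns only the F statistic and its degrees of freedom; it does not compute a p-value.

```python
from statistics import mean, stdev


def zscores_by_group(scores, groups):
    """Standardize each score against the mean and SD of its own group.

    Assumption: the sample SD (n - 1 denominator) is used; the paper does
    not say which SD formula was applied.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items()}
    return [(s - stats[g][0]) / stats[g][1] for s, g in zip(scores, groups)]


def _sums(ys, xs):
    """Sum of squares of y, sum of squares of x, and their cross-product."""
    my, mx = mean(ys), mean(xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    ss_x = sum((x - mx) ** 2 for x in xs)
    sp = sum((y - my) * (x - mx) for y, x in zip(ys, xs))
    return ss_y, ss_x, sp


def one_way_ancova(y, x, condition):
    """Return (F, df_between, df_error) for the condition effect on y,
    adjusted for a single covariate x with a common slope across conditions."""
    levels = sorted(set(condition))
    total_y, total_x, total_sp = _sums(y, x)

    # Pool the within-condition sums of squares and cross-products.
    within_y = within_x = within_sp = 0.0
    for level in levels:
        ys = [a for a, c in zip(y, condition) if c == level]
        xs = [b for b, c in zip(x, condition) if c == level]
        ss_y, ss_x, sp = _sums(ys, xs)
        within_y += ss_y
        within_x += ss_x
        within_sp += sp

    # Remove the variance in y that is explained by the covariate.
    adj_total = total_y - total_sp ** 2 / total_x
    adj_within = within_y - within_sp ** 2 / within_x

    df_between = len(levels) - 1
    df_error = len(y) - len(levels) - 1  # one extra df lost to the covariate
    f_stat = ((adj_total - adj_within) / df_between) / (adj_within / df_error)
    return f_stat, df_between, df_error


# Toy data (not from the study): three conditions, three students each.
essay_scores = [1, 3, 2, 1, 3, 2, 4, 6, 5]
covariate = [-1, 0, 1, -1, 0, 1, -1, 0, 1]
conditions = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
print(one_way_ancova(essay_scores, covariate, conditions))
```

With 387 students, three conditions, and one covariate, the error degrees of freedom would be 383 for complete data; the reported F(2,378) therefore suggests that a few students lacked a covariate score, though the paper does not say so explicitly.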
Table 2: Mean essay score by condition

5th Grade
Study Condition                                 Mean (SD)      N
Laptop (condition A)                            2.92 (1.23)    71
Tablet with virtual keyboard (condition B)      3.03 (1.03)    158
Tablet with physical keyboard (condition C)     2.92 (1.16)    158

High School
Study Condition                                 Mean (SD)      N
Laptop (condition A)                            3.22 (1.11)    87
Tablet with virtual keyboard (condition B)      3.08 (1.08)    178
Tablet with physical keyboard (condition C)     3.19 (1.08)    174

Figures 3 and 4 show the distribution of final essay scores by study condition for 5th grade and high school students. For 5th grade students the distributions show a slight positive skew (skewness values of 0.12, 0.05, and 0.32 for conditions A, B, and C, respectively), with fewer students obtaining the highest score points. Results for high school students are similar, with skewness values of 0.25, 0.03, and 0.05 for conditions A, B, and C, respectively. No systematic pattern of differences in student performance is apparent across study conditions.
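As a point of reference for reading the skewness values above, the following is a minimal sketch of one common skewness statistic, the moment coefficient (third central moment divided by the 1.5 power of the second). The paper does not say which skewness formula or software produced its values, so this choice and the function name `skewness` are assumptions; statistical packages often report a sample-size-adjusted variant that differs slightly.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5.

    Positive values indicate a longer right tail, i.e. most scores sit
    toward the low end with fewer at the highest score points.
    """
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - center) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Toy scores (not from the study).
print(skewness([1, 2, 3, 4, 5]))  # symmetric distribution
print(skewness([0, 0, 0, 4]))     # long right tail
```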
Figure 3: 5th grade essay score distribution
[Chart: percentage of students (0% to 30%) by essay score for the tablet with virtual keyboard, tablet with physical keyboard, and laptop conditions]

Figure 4: High school essay score distribution
[Chart: percentage of students (0% to 30%) by essay score for the tablet with virtual keyboard, tablet with physical keyboard, and laptop conditions]
Tables 3 and 4 show the mean values for the various essay surface features extracted by the parsing software for the 5th grade and high school student essays, respectively. Patterns across study conditions are very similar and support the overall descriptive and inferential results seen with the final essay scores.

Table 3. Grade 5 Essay Feature Means

                                              Tablet with Virtual    Tablet with Physical    Laptop
                                              Keyboard (n=158)       Keyboard (n=158)        (n=71)
Character Count                               […]                    […]                     […]
Word Count                                    […]                    […]                     […]
Sentence Count                                […]                    […]                     […]
Words per Sentence                            […]                    […]                     […]
% Content (Non-Function) Words                48%                    49%                     48%
Characters per Content (Non-Function) Word    […]                    […]                     […]
% Misspelled Words                            5%                     6%                      5%

Table 4. High School Essay Feature Means

                                              Tablet with Virtual    Tablet with Physical    Laptop
                                              Keyboard (n=178)       Keyboard (n=174)        (n=87)
Character Count                               […]                    […]                     […]
Word Count                                    […]                    […]                     […]
Sentence Count                                […]                    […]                     […]
Words per Sentence                            […]                    […]                     […]
% Content (Non-Function) Words                53%                    53%                     53%
Characters per Content (Non-Function) Word    […]                    […]                     […]
% Misspelled Words                            4%                     4%                      4%

Figures 5 through 11 show selected results from the student surveys. A total of […] 5th graders and 432 high school students responded to the survey. However, student response rates to individual questions varied somewhat, so percentages are reported relative to the number of students responding to each question and do not include missing responses. For the survey questions displayed in Figures 5-7, students were asked to choose all responses that applied, so percentages will total more than 100%.
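The reporting rule for the choose-all-that-apply questions can be sketched briefly. The data layout here (one set of selected options per student, with `None` marking a student who skipped the question) and the function name `multi_select_percentages` are assumptions for illustration, not the structure of the study's survey data.

```python
from collections import Counter


def multi_select_percentages(responses):
    """Percent of responding students who selected each option.

    Students who skipped the question (None) are excluded from the
    denominator, so percentages are relative to respondents only.
    """
    answered = [r for r in responses if r is not None]
    counts = Counter(option for r in answered for option in set(r))
    return {option: 100 * n / len(answered) for option, n in counts.items()}


# Toy responses (not from the study): the third student skipped the question.
responses = [{"laptop", "tablet"}, {"laptop"}, None, {"desktop"}]
percentages = multi_select_percentages(responses)
print(percentages)
print(sum(percentages.values()))  # exceeds 100 because students may pick several options
```

Because one student can contribute to several options, the option percentages are not shares of a whole and should not be expected to sum to 100%.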
Figure 5: Device use at home

Figure 6: Device use in school
As seen in Figures 5 and 6, a large percentage of students in the study reported using laptop computers, desktop computers, and/or touch-screen tablets regularly at home and in school. Very few students reported using a stylus or a separate keyboard with a touch-screen tablet. Both at home and in school, a higher percentage of high school students (71% and 58%, respectively) than 5th grade students (56% and 42%, respectively) reported using laptop computers. However, a larger proportion of 5th grade students (51%) than high school students (37%) reported using touch-screen tablets at home.

Figure 7: Device use for writing

Figure 7 shows that a large proportion of high school and 5th grade students reported using a laptop (74% and 48%, respectively) or desktop computer (55% and 52%, respectively) to write essays. Interestingly, large proportions of 5th grade and high school students also indicated they used paper and pencil (61% and 57%, respectively) to write their essays. Very few students from either grade reported using touch-screen tablets to write essays.
Figure 8: Ease of keyboard use by study condition (5th grade)

Figure 9: Ease of keyboard use by study condition (high school)
Figure 8 shows that 94% of 5th grade students found the virtual keyboard either very easy or somewhat easy to use, compared to 93% for the laptop keyboard. However, 5th graders' top preference was for the tablet with external keyboard, as 99% reported that they found this keyboard either very easy or somewhat easy to use. Figure 9 shows a slightly different trend for high school students. Only 71% of high school students found the virtual keyboard either very easy or somewhat easy to use, compared to 99% for the laptop keyboard. The external keyboard did seem to improve high school students' comfort level, as 94% reported that they found this keyboard either very easy or somewhat easy to use.

Figure 10: Student evaluation of their essay quality (5th grade)
Figure 11: Student evaluation of their essay quality (high school)

Figure 10 shows that a majority of 5th graders who used the tablet with either the virtual keyboard or the physical keyboard reported that the essay they wrote for the study was better than what they usually write (52% for the virtual keyboard and 63% for the physical keyboard). This contrasts with only 25% of 5th grade students in the laptop condition who reported that the essay they wrote for the study was better than what they usually write. Very few 5th grade students in any condition reported that the essay they wrote for the study was worse than what they usually write. Results for high school students, as seen in Figure 11, show a somewhat different pattern, with 29% of students who used the tablet with the virtual keyboard reporting that the essay they wrote for the study was worse than what they usually write. Interestingly, most high school students who used the tablet with the physical keyboard reported that the essay they wrote for the study was either about the same as (70%) or better than (21%) what they usually write. This is similar to the responses from students who used the laptop (79% reported their
essay was about the same and 6% reported that their essay was better than what they usually write).

Discussion

Overall, the results from this study are striking in that there were no observable performance differences in student writing across study conditions. In many ways this runs counter to the expectations that most adults might have given their own experiences in using virtual keyboards. What is even more surprising is that these findings seem to hold across both grade levels studied. Previous research suggested that older students, who had more training and experience with keyboarding skills, might be expected to struggle more with the virtual keyboard than younger students who had not yet developed this facility. While the survey responses of the high school students did indicate a definite preference for a physical keyboard, this preference did not translate into a performance difference across conditions. In addition, while high school students' perceptions of the virtual keyboard were not as strongly positive as their perceptions of the physical keyboards, neither were they completely negative, as 71% of high school students reported finding the virtual keyboard somewhat or very easy to use.

Both the survey responses and the lack of performance differences across conditions suggest a certain degree of robustness and/or adaptability among students in this study when working across different devices. However, the extent of this adaptability may not have been fully tested within the context of the current study. Essay responses were generally very short (in the vicinity of 275 words on average for high school students and 230 words on average for 5th grade students). It is possible that responses of these lengths did not create the same level of challenge for students in using the virtual keyboard that they might have
encountered if they were asked to write a lengthier response or paper. Motivation also likely played a role, as there were no stakes for students associated with their performance on the study essay. While the possibility of winning a gift card for their participation and best effort may have partially offset this, it is unclear to what degree this experimental manipulation was effective. The shorter than expected essay lengths (compared to the […] word responses observed with these prompts in Write-to-Learn) give some indication that students in the study (especially high school students) may not have provided their best effort. Lastly, the essay prompts themselves were relatively short and did not require reference to external stimuli such as reading passages, charts, or data tables for students to be able to craft a response. As such, the screen real estate taken up by the virtual keyboard may have been less of an issue, since students did not have to manage multiple pieces of information that might otherwise have required them to scroll between the stimuli and the response area. Further research should be conducted to evaluate how lengthier responses and/or greater dependency on external stimuli might impact the quality of student writing.

It is likely that students' familiarity and proficiency with writing on touch-screen devices will improve over time. Additionally, touch-screen devices themselves will continue to evolve to incorporate new methods for text entry such as adaptive keyboards, gestural input, selectable touch menus, Swype, and split keyboards (Pierce, 2012; Findlater & Wobbrock, 2012). While testing programs with significant written components should monitor these developments closely, the current study suggests that at least basic writing tasks can be assessed on tablets and laptops in a comparable way.