FURTHER RESEARCH ON THE READING-STORAGE TEST AS A MEASURE OF GAIN DURING READING a

Ronald P. Carver b
University of Missouri at Kansas City

Journal of Reading Behavior, 1975, VII

Abstract. The sensitivity of the reading-storage technique as a measure of gain during reading was investigated. Gains on this test were related to three other variables: passage difficulty, reader ability, and the understanding judgments of readers themselves. Reading-storage tests on 16 prose passages were administered to about 600 students in Grades 1-12. The passages ranged in difficulty from beginning reader level to college level. After reading a passage, the Ss made understanding judgments and took a reading-storage test on the passage. The same tests were given to other Ss who had not had the opportunity to read the passage first. The gain on the reading-storage test between nonreading and reading was highly correlated with understanding judgments. It was concluded that the reading-storage test was sensitive to the gain due to reading as long as the difficulty level of the material was approximately equal to the reading ability level of the individual.

Some measures are more sensitive to gains due to reading than others. For example, Tuinman (1973-74) found that scores on some standardized reading tests were more dependent upon reading the passages than scores on other tests, i.e., standardized tests vary with respect to their sensitivity to gains due to reading the passages. Often, a researcher desires to investigate the beneficial effects of reading experimental passages that are not part of standardized tests. In these situations it is desirable to have a good dependent variable, i.e., one that is highly sensitive to the effect of reading. At present, the objectively developed measure that many researchers would consider using in this situation is the cloze task. Yet, the regular cloze task seems to be very insensitive to gains due to reading (e.g., see Carver, 1974b).
A measure that appears to be more sensitive than cloze to gains due to reading is the reading-storage type of test (Carver, 1974b). Like the cloze test, this test is also developed in a completely objective manner (see Carver, 1975a). The reading-storage measure has also been compared to the paraphrase type of test, and it appears to be almost equal in sensitivity to this subjectively developed type of measure (Carver, 1975b). The reading-storage test seems to have potential, and it seems to be as good as or better than previously used measures. At least, its properties appear to deserve further research.

One primary purpose of the present research was to investigate how the difficulty level of the reading material affected scores on the reading-storage measure, especially gains on this measure. Do individuals show more gain on this test for less difficult passages as compared to more difficult passages? Another primary purpose was to investigate how the level of ability of the individual affected scores on this measure, especially gains on this measure. Do higher ability individuals show more gain on this measure than lower ability individuals? The previously cited research on the reading-storage test has involved college students as subjects, i.e., readers at the higher levels of reading ability. It seemed important to determine the sensitivity of the reading-storage test at other levels of reading ability. Another purpose of the research was to compare scores on this new type of test to understanding judgments made by the subjects themselves. Previous research has indicated that college students can reliably and validly rate passages they have read with respect to the percent of the passage they thought they understood (e.g., see Carver, 1973; Carver, 1974b; Carver, 1975b). Yet, the validity of this technique might be questionable at the lower levels of reading ability.

In summary, the reading-storage technique was investigated further by systematically studying its relationship to three other variables: the reading difficulty of passages, the reading ability of individuals, and the understanding judgments of individuals.

METHOD

Subjects

Altogether, about 600 students in Grades 1-12 of a small-town, rural school system c were tested.

a This research was supported by the Office of Naval Research Personnel and Training Research Programs, Contract N000-7-C-00.
b Request reprints from the author, School of Education, University of Missouri - Kansas City, 5100 Rockhill Road, Kansas City, Missouri 64110.
The tests were administered to all students in each grade who were attending class that particular day. In Grades 1-6, the tests were administered in 6 separate classrooms, and in Grades 7-12, the tests were administered to each grade separately.

Reading Ability

The National Reading Standards (NRS), a new reading test (Carver, in press), had been administered to these students on a previous occasion, so these scores were available for separating individuals into levels of reading ability. There are 5 levels of this test with two forms, A and B, at each level. The higher levels of the test include passages at higher levels of difficulty. The task is a multiple-choice version of the cloze task, called the reading-input task (see Carver, 1975a). Two or more levels of one form of the test were administered and then two or more levels of the other form of the test were administered. This sequential type of testing, as prescribed in the test manual, was designed to provide the most reliable score on each form of the test. The NRS provides grade ability scores (G_a) varying from Grade 0 to Grade 16, and these G_a scores are further categorized into level ability (L_a) scores as follows: Level 1, Grades 1-3; Level 2, Grades 4-6; Level 3, Grades 7-9; Level 4, Grades 10-12; Level 5, Grades 13-15; Level 6, Grade 16+.

Passages and Tests

The 16 experimental passages were sampled from the 330 100-word Bormuth (1969) passages. Four passages were sampled from each of the first four levels of difficulty as measured by the RIDE Scale. The RIDE Scale estimates difficulty in terms of average word length in letters per word (Carver, 1974a). This brief introduction to the RIDE Scale will facilitate the explanation of the study design, to be presented later. However, the primary measure of material difficulty used in this research was the Rauding Scale of Prose Difficulty (Carver, 1974a). The more valid Rauding Scale had not been completed when the study was designed, and this is the reason why the RIDE Scale was used in the original sampling design. The Rauding Scale uses the subjective judgments of three qualified experts to rate passage difficulty on a grade difficulty (G_d) scale which corresponds to the grade ability scale noted earlier. In order to qualify as an expert, an individual must first pass a qualification test, called the Rauding Scale Qualification Test (Carver, 1974c).

c The helpful cooperation of the staff and students of the Pierce City, Missouri, R-6 School System is gratefully acknowledged: Mr. J. D. Smith, Superintendent; Mr. Donald Trotter, Secondary Principal; and Mr. Earle Staponski, Elementary Principal.
When an individual's grade ability, G_a, as measured by the NRS, equals the grade difficulty, G_d, of the passage as indicated by the Rauding Scale, then the probability is .50 that the individual can read and understand the passage. The use of an ability scale and a difficulty scale that have been calibrated to reflect the same dimension has major advantages. Although traditional standardized reading tests and readability formulas both give grade level scores, the commonality of the scale is more apparent than real. For example, scoring at the eighth grade level on a standardized reading test does not necessarily mean that an individual can read and understand material that has received an eighth grade rating by a readability formula. Use of the NRS test for estimating reader ability and use of the Rauding Scale for estimating material difficulty assures that both measures are reflecting values along the same measurement scale. The grade difficulty, G_d, values may be further categorized into the less refined level of difficulty (L_d) values in the same manner as explained earlier for G_a and L_a, except that there is no Grade 0 difficulty and Level 6 difficulty includes Grade 16-18 difficulties. The G_d values were not used in the present study except for determining the L_d values. Their description, however, facilitates the description of the L_d values.
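The categorization of grade scores into levels is a simple banding rule. As a minimal sketch (the function name is ours, and the band boundaries are an assumption reconstructed from the Level 1 through Level 6 ranges listed under Reading Ability — Grades 1-3, 4-6, 7-9, 10-12, 13-15, and 16+):

```python
def level_from_grade(g):
    """Map a grade score onto the 6-point level scale.

    Assumed bands: Level 1 = Grades 1-3, Level 2 = 4-6, Level 3 = 7-9,
    Level 4 = 10-12, Level 5 = 13-15, Level 6 = 16+. The same banding
    applies to ability (G_a -> L_a) and, apart from the endpoint
    differences noted in the text, to difficulty (G_d -> L_d).
    """
    for upper, level in [(3, 1), (6, 2), (9, 3), (12, 4), (15, 5)]:
        if g <= upper:
            return level
    return 6

print(level_from_grade(8))
```

Under these assumed bands, a Grade 8 ability score falls at Level 3, the same level as a passage rated between Grade 7 and Grade 9 in difficulty.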
Table 1 contains the Bormuth I.D. numbers for the sixteen passages, their RIDE Levels, and their estimated difficulty using the Rauding Scale, i.e., G_d and L_d. The Rauding Scale values were determined as part of another study reported elsewhere (Carver, 1974a). Notice that there were three passages at Level 1 on the Rauding Scale, five at Level 2, and the remaining eight distributed across Levels 3, 4, and 5.

Table 1: Descriptive Information for the Sixteen Experimental Passages (Bormuth Identification Number, RIDE Level, G_d, and L_d for each passage)

The RS-tests on the passages were developed using the standard algorithm for developing this type of test (see Carver, 1975a). Figure 1 includes an example of the reading-storage type of test. This figure includes a reading passage and a reading-storage test on the passage. Notice that every other word on each line has been deleted except for the initial letter. One of the five initial letters has been replaced with a wrong letter. The task for the individual is to read the passage and then recognize and circle the wrong letter on each line without referring back to the passage. Notice that the wrong letter has already been circled for the first five items in the example.

Procedures

Each member of each class of students was given a test booklet which contained directions, an example passage, an example test, and six RS-tests. They were told approximately how long the testing would take and that they could find out their scores on the tests. After the directions and examples had been
completed, the reading and testing were initiated. The 1st, 3rd, and 5th tests were administered after the Ss had had an opportunity to read the corresponding passages first. The Ss were given one minute to read each of the three passages and then four minutes to work on each of the RS-tests on those passages. The 2nd, 4th, and 6th tests were administered without any opportunity to read the passage on which the test was based. Under these nonreading conditions, the Ss were asked to make their best guesses on the tests.

EXAMPLE PASSAGE

This is our Post Office. It is in our city. Many people work here. There is a Post Office in every city in our country. And Post Offices in every country in the world. A Post Office helper must be honest. He must be a good worker. A Post Office helper handles lots of mail. A Post Office helper handles lots of money. The Post Office sends letters and packages, magazines, and newspapers all over the world. It sends small animals and plants, too. It sends money for us. It saves money for us. It puts money to work for us, too.

Figure 1: An example RS-test on an example passage.

The Ss were also instructed to rate their percentage of understanding for each passage they read, immediately after they had read a passage and immediately before taking the test on the passage. They were instructed to circle one of the 11 understanding judgments printed at the top of their tests, ranging from 0 to 100 in increments of 10.
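The construction rule illustrated in Figure 1 — every other word reduced to its initial letter, with one surviving initial letter on each line replaced by a wrong letter — can be sketched as follows. This is only an illustration of the rule, not Carver's (1975a) published algorithm; the function name is ours, and real RS-tests also control how many reduced words fall on each line (five):

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def make_rs_line(words, rng):
    """Turn a run of words into one RS-test line: every other word is
    reduced to its initial letter, then one reduced word's letter is
    swapped for a wrong letter (the item the reader must circle).
    Returns the line and the index of the planted wrong letter."""
    out, reduced = [], []
    for i, w in enumerate(words):
        if i % 2 == 1:                 # delete every other word,
            out.append(w[0].upper())   # keeping only its initial letter
            reduced.append(i)
        else:
            out.append(w.upper())
    target = rng.choice(reduced)
    wrong = rng.choice([c for c in LETTERS if c != out[target]])
    out[target] = wrong
    return " ".join(out), target

line, item = make_rs_line("this is our post office".split(), random.Random(0))
```

A scorer only needs the `target` index to check whether the reader circled the planted wrong letter.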
They were told to estimate the percentage of the sentences contained in the passage which they thought they had understood.

Design

The first two RS-tests, i.e., one given after reading the passage and one given without getting to read the passage, were exactly the same for all Ss and were
regarded by E as practice. The last four RS-tests were all at the same RIDE Level for each S. Within each set of four passages at each RIDE Level, the order of presentation of the four tests was varied according to a Latin Square design so that each test was administered once in the four possible order positions to a set of four Ss. Since there were four possible orders of tests at each RIDE Level and since there were four RIDE Levels altogether, this made a total of 16 different test booklets. After the test booklets had been assembled, they were stacked in order so that when they were passed out to the Ss, each consecutive set of 16 individuals would receive all 16 possible treatment conditions. This design provided control over possible practice or fatigue effects associated with the order of presentation, and it also allowed each test to be administered under both the reading and nonreading conditions.

Data Analysis

Each RS-test was scored, a correction for guessing formula was applied, and finally these scores were converted into a percent correct score. The average of the two NRS level scores, Form A and Form B, was used to determine each individual's level of ability, L_a. For all the Ss at each L_a, the mean of the RS-test scores on all the passages at each L_d was calculated. The same type of analysis was conducted for the percent understanding judgments, except that due to a great deal of skew in these data the medians were used instead of means. The final sample for analysis included 98 individuals who were administered the RS-tests and who also had two NRS test scores available.

RESULTS AND DISCUSSION

Fig. 2 contains the mean percent correct scores on the RS-tests as a function of the level of difficulty, L_d. In Fig. 2 there is a curve presented for each level of ability, L_a. The values in Fig. 2 are for the two RS-tests administered to the Ss under the reading condition.
Notice that for each level of individual ability, L_a, the RS-test scores decrease almost linearly as a function of the passage difficulty, L_d. The exception to linearity is the Level 1 ability readers, whose scores approached zero at the higher difficulty levels. Notice also that each higher level of ability, L_a, had higher RS-test scores. Fig. 3 presents the same relationships as Fig. 2 except the scores are on the tests that were given under the nonreading condition. Notice that individuals at the lower levels of reading ability, Levels 1-2, tend to score near zero at all five difficulty levels, Levels 1-5. For the higher levels of reading ability, Levels 3-5, scores under the nonreading condition tend to increase as the difficulty level of the material decreases. Under the nonreading condition, there was also a tendency for the higher ability levels, L_a, to score higher on the RS-test. Fig. 4 contains the gain in RS-test means from nonreading to reading as a function of the level of difficulty, L_d, of the material. As in Fig. 2 and Fig. 3, there
is a curve for each of the five levels of reading ability, L_a. The values in Fig. 4 were calculated by subtracting each value in Fig. 3 from its counterpart in Fig. 2. Notice that the gains for the Level 4 and 5 ability readers seem to be restricted by a ceiling on the maximum possible gain. The gains for the individuals at Level 1 reading ability seem to be similarly affected by the absolute minimum gain of zero percentage points. Notice again that there was a general tendency for the higher ability levels, L_a, to gain more than the lower ability levels, and there also tended to be higher gains on the lower difficulty levels, L_d, as compared to the higher difficulty levels.

Figure 2: Mean percent correct on the RS-tests as a function of passage difficulty level, L_d, for different ability levels, L_a.
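The values behind Figs. 2-4 come from the scoring steps described under Data Analysis. As a minimal sketch — assuming the classical rights-minus-wrongs guessing correction with k = 5 alternatives per item, since the article does not state which correction formula was used, and with function names of our own choosing:

```python
def percent_correct(right, wrong, n_items, k=5):
    """Percent correct after a correction for guessing. The classical
    R - W/(k-1) correction is assumed here; k = 5 because each RS-test
    line offers five initial letters, one of which is the wrong one."""
    corrected = right - wrong / (k - 1)
    return max(0.0, 100.0 * corrected / n_items)

def mean_gain(reading, nonreading):
    """Gain due to reading: reading-condition mean minus
    nonreading-condition mean (the quantity plotted in Fig. 4)."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(reading) - mean(nonreading)
```

For example, a perfect 10-item test scores 100 percent, while 2 right and 8 wrong corrects down to chance, i.e., 0 percent.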
Figure 3: Mean percent correct on the RS-tests given without a prior reading of the passages, i.e., nonreading, as a function of passage difficulty level, L_d, for different ability levels, L_a.

Returning to the results under the nonreading condition in Fig. 3, there appears to be a pronounced interaction between the effects of difficulty levels and ability levels upon the reading-storage test scores. All scores seem to approach zero as difficulty levels increase, but there is a disproportionate decrease in scores for each lower ability level as difficulty levels increase. This indicates that individuals at the highest ability levels are able to correctly infer a great deal about all of the words in easy passages when they are given a complete knowledge of only every other word in the passages, and that this ability increases disproportionately with
the joint increase of ability and decrease of difficulty.

Figure 4: Gain from nonreading to reading on the RS-tests as a function of passage difficulty level, L_d, for different ability levels, L_a.

If there had been individuals in the study at Level 6 ability, extrapolations from these data would suggest that they would score about 70 percent on the Level 1 reading-storage tests without any opportunity to read the original passages. These data serve to point out the inherent difficulty of developing tests to determine what was gained as a result of reading. The higher ability individuals are able to use the information given in the test itself, their general background knowledge about the subject matter, their general knowledge of the redundancies of the language itself, and their superior intellectual
skills to correctly infer the answers to test items on prose without ever reading the specific prose in question.

Figure 5: Median percent understanding ratings as a function of passage difficulty level, L_d, for different ability levels, L_a.

Fig. 5 contains the percent understanding medians as a function of difficulty levels. As in Figs. 2-4, there is a curve for each of the five levels of reading ability, L_a. It may be noted that these curves are not smooth, thus suggesting a certain amount of unreliability. For individuals at Levels 2-5 in ability, there is a general decrease in percent understanding as passage difficulty increases. However, for the Level 1 individuals, percent understanding seems to fluctuate around 50 percent for all five levels of passage difficulty. It seems likely that the Level 1 individuals simply did not understand "percent," because if circles were drawn randomly around one of the 11 possible understanding estimates then 50 would be the expected median. These data also indicate a general tendency for the individuals at higher ability levels, L_a, to understand more than the individuals at the lower ability levels at each level of difficulty.
One surprising result in Fig. 5 is the consistency of the percent understanding ratings when L_a - L_d = 1: the medians were all close to 75 percent. By definition this is the Rauding Level (Carver, 1976). When an individual is given material to read at a difficulty level that is one level below his ability level, then the individual is said to be reading at his Rauding Level. The individual is expected to be able to read and understand most of the material at his or her Rauding Level. Previous data from teachers, rating whether or not their students could read passages at varying difficulty levels, indicated that the probability was about .75 that individuals could read and understand passages at their Rauding Level (Carver, in press). Each student received a rating by his teacher regarding whether the teacher thought the student could read and understand each of a set of passages. The mean rating for all the students at each ability level, L_a, and all the passages at each difficulty level, L_d, was calculated, and the result was that about 75% of the passages that were one difficulty level below the ability level of the individual were rated as being capable of being understood when read. Therefore, this previous finding that teachers estimate students are able to read and understand about 75% of the passages that are one difficulty level below their ability level seems to compare favorably with the current finding that students themselves, on the average, estimated that they understood about 75 percent of each passage that was one level below their ability level. The percent understanding data in Fig. 5 roughly parallel the gain data in Fig. 4 for individuals at Levels 2-5 of ability. The correlation between the 20 percent understanding medians in Fig. 5 and their 20 corresponding gain means in Fig. 4 was .86. This high relationship suggests that reading-storage test gains could be used to estimate the degree to which a passage was understood.
It should be remembered that this would be a relative prediction, because a gain of a given number of percentage points on a highly difficult RS-test would not necessarily mean that only that percent of understanding had been gained. Even these relative estimates would not be valid, however, for high ability individuals who were reading low difficulty passages, because their gain scores are artifactually attenuated. Gain scores probably should not be used to estimate relative degrees of understanding when the score on the nonreading reading-storage test is already high. The reading-storage test gain data in Fig. 4 and the percent understanding data in Fig. 5 also may be analyzed with respect to ability-difficulty differences, L_a - L_d. For each ability-difficulty difference, the mean of all the values included in each difference category was determined. These data are presented in Fig. 6. The percent understanding data from the individuals at Level 1 ability were not included in this analysis. Also presented in Fig. 6 are the complete data, mentioned earlier, regarding the teacher ratings of whether or not students could read and understand specific passages (data taken from Carver, in press). It is readily apparent from Fig. 6 that the teachers' judgments were the most sensitive to the entire range of ability-difficulty differences; they ranged from about 0 to 100. The student understanding judgments ranged from about 0 to about 90, and the gains on the reading-storage test spanned a considerably narrower range. The average
gains on the reading-storage tests presented in Fig. 6 do not discriminate nearly as well as the teacher judgment data, but they do discriminate about as well as the students' judgments. It should also be noted in Fig. 6 that the discrimination is comparable between the students' own judgments of understanding and the teachers' judgments when the ability level is above the difficulty level, i.e., L_a > L_d.

Figure 6: Percent understanding as rated by the individuals, percent of passages that could be read and understood as rated by teachers, and RS-test gain as a function of the difference between the level of ability of the individual and the level of difficulty of the passage, L_a - L_d.

This comparability
may be noted by the almost perfect coincidence between the curve for the understanding judgments of the students and the curve for the teacher judgments at ability-difficulty differences of 0 and above. The reading-storage test was most sensitive to gains when the ability-difficulty differences ranged between -1 and +1, i.e., from one ability level below the difficulty level, -1, to one ability level above the difficulty level, +1. The rise in the RS-test gain, in percentage points, from -1 to +1 exceeded the corresponding rise of the understanding ratings but fell well short of the 55-point rise of the teacher judgments. Thus, for ability-difficulty differences ranging from -1 to +1, the reading-storage test was more sensitive to the gain due to reading than the understanding ratings of students but considerably less sensitive than the teachers' judgments of how much the students would understand when they read. It appears that the gains on the reading-storage test do tend to reflect the degree to which a passage was read and understood. However, reading ability levels and passage difficulty levels seem to place limits upon the efficacy of this test. Gain on the test is likely to underestimate the degree to which Level 4 and 5 individuals understand Level 1 and 2 passages. No data were available on Level 6 individuals, but it could be assumed that the same limitation would hold for these individuals, except that it may extend to even higher difficulty levels. The understanding judgments appear to be relatively valid for individuals at Level 2 ability and above. As mentioned earlier, the reason they are not valid for Level 1 individuals is probably because these individuals do not understand percent. The evidence suggests that even the individuals at Levels 2-5 of reading ability were overestimating their percent of understanding when the ability level is below the difficulty level, i.e., L_a < L_d.
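The Fig. 6 analysis collapses the L_a x L_d cells into ability-difficulty difference categories, averaging all cell values that share the same difference. A sketch of that grouping step, using hypothetical cell means (the dictionary values below are illustrative only, not the study's data):

```python
from collections import defaultdict

def means_by_difference(cell_means):
    """Average a {(L_a, L_d): mean} table within each
    ability-difficulty difference, d = L_a - L_d, as in Fig. 6."""
    buckets = defaultdict(list)
    for (la, ld), value in cell_means.items():
        buckets[la - ld].append(value)
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}

# hypothetical cell means, NOT the study's data
cells = {(2, 1): 30.0, (3, 2): 50.0, (1, 2): 10.0, (3, 1): 60.0}
print(means_by_difference(cells))
```

Each difference category pools every ability-difficulty pairing that produces it, which is why the curves in Fig. 6 are smoother than the individual cell curves in Figs. 4 and 5.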
The evidence also suggests that individuals in Levels 2-5 are relatively accurate in estimating their percent understanding for material at or near their Rauding Level, i.e., L_a - L_d = 1, whereas the reading-storage test is most sensitive to gain differences at or near the Reading Level, i.e., L_a - L_d = 0.

CONCLUSIONS AND IMPLICATIONS

In general, gain scores on the reading-storage test tend to be higher for low difficulty material as compared to high difficulty material. This is the type of result that would be expected from a measure of the effect of reading a passage, i.e., a good dependent variable for investigating the reading effect. This relationship between test gains and material difficulty seems to hold for elementary through high school level readers, i.e., Levels 2, 3, and 4, but it does not hold for beginning level readers, Level 1, or college level readers, Level 5. The gain scores of the Level 1 readers drop to zero beyond the easiest materials and remain there for the higher levels of material difficulty. The gain scores of the Level 5 readers approach their ceiling at the moderately difficult materials, and the gain does not increase at the lower levels of material difficulty. In general, gain scores on the reading-storage test tend to be higher for high ability individuals as compared to low ability individuals. Again, this is the type of
result which would be expected from a good dependent variable for investigating the reading effect. This relationship between test gains and reader ability seems to hold for all the difficulty levels, except that the gains at the lower levels of difficulty appear to be approximately equal to each other for the higher ability individuals. There seems to be a high relationship between gains on the reading-storage measure and other estimates of understanding during reading. When these gain scores at various ability-difficulty differences are compared to these other estimates of understanding, it appears reasonable to conclude that the gain on the reading-storage test is reasonably sensitive to gain during reading as long as the difficulty level of the material, L_d, is not more than one level above or below the ability level of the individual, L_a. Since much research involves the use of reading material at a difficulty level that is approximately equal to the ability level of the individuals participating in the research, it appears that the reading-storage test is a sensitive measure for use in many reading research situations where a measure of gain due to reading is desired.

REFERENCES

BORMUTH, J. R. Development of readability analyses. U. S. Office of Education Final Report, Proj. No., Contract No. OEC, University of Chicago, March 1969.
CARVER, R. P. Understanding, information-processing, and learning from prose materials. Journal of Educational Psychology, 1973, 64.
CARVER, R. P. Improving reading comprehension: Measuring readability. Washington, D. C.: American Institutes for Research, Final Report 7-, May 1974. (a)
CARVER, R. P. Measuring the primary effect of reading: Reading-storage technique, understanding judgments, and cloze. Journal of Reading Behavior, 1974, 6. (b)
CARVER, R. P. Manual for the Rauding Scale Qualification Test. Kansas City, Mo.: Revrac Publications, 1974. (c)
CARVER, R. P. Revised procedures for developing reading-input materials and reading-storage tests. Journal of Reading Behavior, 1975, in press. (a)
CARVER, R. P. Comparing the reading-storage test to the paraphrase test as measures of the primary effect of prose reading. Journal of Educational Psychology, 1975, in press. (b)
CARVER, R. P. Manual for the National Reading Standards. Kansas City, Mo.: Revrac Publications, in press.
TUINMAN, J. J. Determining the passage dependence of comprehension questions in 5 major tests. Reading Research Quarterly, 1973-74, 9, 206-223.