1 Predictive Validity of the Get Ready to Read! Screener Concurrent and Long-Term Relations With Reading-Related Skills Journal of Learning Disabilities Volume 42 Number 2 March/April Hammill Institute on Disabilities / hosted at Beth M. Phillips Christopher J. Lonigan Marcy A. Wyatt Florida State University This study examined concurrent and longitudinal relations for the Get Ready to Read! (GRTR) emergent literacy screener. This measure, within a battery of oral language, letter knowledge, decoding, and phonological awareness tests, was administered to 204 preschool children (mean age = 53.6, SD = 5.78; 55% male) from diverse socioeconomic backgrounds. Subgroups were reassessed at 6 months and 16 and 37 months later. Results indicate strong relations between the GRTR and the literacy and language assessments. Long-term follow-up indicated that the screener was significantly related to some reading-related measures, including decoding skills. These results support the utility of the GRTR as a brief, valid measure of children s emergent literacy skills. The GRTR holds promise as a tool useful for educators, parents, and others in regular contact with preschool children to help determine those who may be at risk for later reading difficulties and could benefit from intervention and focused instruction in emergent literacy. Keywords: preschool age; identification; assessment; early literacy; reading The past several decades have been witness to a notable increase in knowledge about the causes, correlates, and predictors of children s reading success and failure. An associated phenomenon is greater interest in early identification and intervention for children determined to be at risk for reading problems. In large measure, this stems from research suggesting that children s reading performance is highly stable from early in elementary school (e.g., Juel, 1988) and from research indicating more success with prevention and earlier intervention than with remediation for older students (Berninger et al., 2002; Coyne, Kame enui, Simmons, & Harn, 2004; Torgesen, 2000). Of course, to successfully prevent reading difficulties for at-risk students, they first have to be identified accurately. As the focus of federal and state educational policies and early learning standards has shifted toward the inclusion of early literacy skill development, the need has grown to identify constructs and reliable measures of these constructs that best predict who will and will not develop significant reading difficulties or well-belowaverage reading ability. Several large-scale longitudinal studies have demonstrated that assessments of key skills, such as phonological awareness (PA), letter knowledge, rapid naming (RN), and oral language, conducted just before or during the onset of reading instruction (i.e., in kindergarten and first grade) are significant predictors of reading ability 1 or more years later (e.g., Catts, Fey, Zhang, & Tomblin, 1999; de Jong & van der Leij, 2003; Schatschneider, Fletcher, Francis, Carlson, & Foorman, 2004; Wagner, Torgesen, & Rashotte, 1994). Whereas most of the predictive studies have first assessed children in kindergarten or just before, a few studies have investigated the predictive power of batteries administered to preschool children. For instance, studies by Badian (e.g., 1988, 1994, 1998) have shown that multidimensional assessments have adequate predictive power up to 9 years later (e.g., Badian, 1998, reported an 87% hit rate). Badian (1994) achieved good results with a battery that included demographic and 133
2 134 Journal of Learning Disabilities family history information as well as child performance on PA, RN, and visual matching tasks. Fowler and Cross (1986) used a combination of family characteristics and child data to predict reading and math performance for preschool children after they completed 2 to 3 years of formal schooling and found that a score of 7 or higher on their 21-point risk index had a 98% positive predictive power for successful grade completion. Studies of preschool children that have relied only on child assessments also have predicted successfully later reading abilities (e.g., Blatchford, Burke, Farquhar, & Plewis, 1987; Bowey, 1995; Chaney, 1998; Stevenson & Newman, 1986; Storch & Whitehurst, 2002). For both kindergarten and preschool children, several published norm- or criterion-referenced measures have been created to serve as diagnostic assessment instruments. Examples include the Test of Phonological Awareness 2 (Torgesen & Bryant, 2004), the Test of Preschool Early Literacy (TOPEL; Lonigan, Wagner, Torgesen, & Rashotte, 2007), the Test of Early Reading Ability 3 (Reid, Hresko, & Hammill, 2001), the Texas Primary Reading Inventory (TPRI; Texas Education Agency, 1999), and the Phonological Awareness Literacy Screening in both kindergarten and preschool versions (PALS-K and PALS-PreK; Invernizzi, Juel, Swank, & Meier, 2007; Invernizzi, Sullivan, Meier, & Swank, 2004). These measures have demonstrated adequate psychometric properties and concurrent validity; however, few of them have established longitudinal predictive capacity in terms of either correlations with later decoding or comprehension measures or power to identify children at risk for actual reading disabilities (Havey, Story, & Buker, 2002). An alternative possibility to criterion- or norm-referenced tests is kindergarten-entry readiness tests that typically include broad measures of cognitive and academic skills. For example, Duncan and Rafter (2005) found the Phelps Kindergarten Readiness Scale to be a moderate predictor of Woodcock-Johnson Achievement scores across an 8-month interval in kindergarten. Some of these tests were designed to determine whether children had adequate skill levels to begin school, a practice that, although still widespread, has not shown demonstrable validity (May & Kundert, 1997; Morrison, Griffith, & Alberts, 1997; Shepard, 1997; Stipek, 2002). Several reviews report widely varying success for these kindergarten entry measures in predicting later academic outcomes, including reading performance (e.g., La Paro & Pianta, 2000; Pianta & McCoy, 1997; Shepard, 1997; Tramontana, Hooper, & Selzer, 1988). La Paro and Pianta s (2000) meta-analysis of 32 studies predicting academic outcomes in kindergarten or first grade from academic measures in preschool showed an average predictive correlation of.43, with a range from.08 to.72. Data from these studies and from the recently completed National Early Literacy Panel s predictive study (Lonigan, Schatschneider, & Westberg, in press) suggest that the measures that are more successful predictors of decoding skill typically include key emergent literacy skills, such as letter knowledge and PA, rather than general cognitive or developmental abilities (e.g., Bramlett, Rowell, & Mandenberg, 2000; Chew & Morris, 1989; Flynn & Rahbar, 1998; Lonigan, Schatschneider, & Westberg, in press). Although findings from predictive studies are largely favorable and suggest that it is possible to identify children at risk for reading problems, most of the research batteries or clinical assessments are lengthy and time consuming and often require highly trained assessors or specialized equipment. This is also an issue for many, although not all, of the kindergarten-entry diagnostic assessments mentioned previously. These factors make such assessments prohibitively expensive and inefficient for school systems that want to screen large numbers of students each year. That is, center-based programs prefer to use informal teacher judgments or brief, universal survey measures initially that can be used to help identify those children who may benefit from receiving a longer, diagnostic assessment, as determined formally by a cutoff score indicating categorical risk status. Moreover, there is a significant issue of how to identify preschool children who may not be attending center-based educational or child care programs where a trained teacher could administer a measure. To be truly universally applicable, a screening tool would need to be usable by parents, pediatricians, and other professionals who come into regular contact with 3- and 4-year-old children. Development of Get Ready to Read! To help fill the need for a reliable, research-based screening measure for use with 3- to- 5-year-old children, Whitehurst and Lonigan developed the Get Ready to Read! (GRTR) screening tool in conjunction with the National Center for Learning Disabilities (NCLD; Whitehurst, 2001). The goal for development was to create a brief, user-friendly measure that had strong concurrent relations to more comprehensive measures of children s print knowledge, letter knowledge, and PA skills that themselves had established validity in predicting reading skill. Items similar to those from measures used in prior research were used to create an initial pool of 100 candidate
3 Phillips et al. / GRTR Screener Follow-Up 135 items, all in a multiple-choice format using easily identified pictures of both target and alternate responses. All items required only a pointing response from the child, an asset when developing a measure designed to be administered by lay individuals. Elimination of redundancy and items that were deemed overly challenging led to the creation of a 60-item alpha pool. The 60 items assessed print concepts, letter-name and letter-sound knowledge, rhyming, initial sound matching, compound word blending, and emergent writing (Whitehurst, 2001). These items were then administered to cohorts of 3-, 4-, and 5-year-old children in preschool classrooms (N = 342 total). The children were representative of a wide array of socioeconomic backgrounds but were somewhat overrepresentative of lower-income students attending Head Start (i.e., N = 223), as these children are known to be at risk for later literacy problems. Most of the children (e.g., 80%) were 4-year-olds. Analyses of internal consistency, item-total correlations, correlations with the Developing Skills Checklist (DSC; CTB/McGraw-Hill, 1990), and item difficulty were used to reduce the alpha pool to the final 20-item screener. This measure demonstrated a coefficient alpha of.78 and a split-half reliability of.80. The total score correlation with the DSC was.69. Children s scores showed a relatively normal distribution, with 68% of the children scoring between 5 and 13 correct, and scores from 1 to 20 were represented (M = 9.14, SD = 4.31). GRTR scores were somewhat correlated with age, although visual and statistical analyses suggested that this correlation was driven mostly by the 3- and 5-year-old children in the sample (i.e., the correlation was reduced and nonsignificant within the exclusively 4-year-old sample). Furthermore, reliability and validity coefficients for subsamples of middle- and lower-income children were highly comparable (i.e., splithalf reliability of.78 and.80 for middle- and lower-income samples, respectively), with only mean scores differing significantly. Principal components analysis indicated that one factor best described the screening measure in its current 20-item form (Whitehurst, 2001). Since its development, NCLD has conducted several large-scale demonstration projects involving dissemination of the GRTR and evaluation of growth in scores from the beginning to the end of the preschool year. Data from one such project, a study that included approximately 1,200 4-year-old children attending preschools in Arizona, Georgia, and Maryland, indicated that the average raw score in the fall was whereas the average score at the end of the year was 16.14, a gain of 15% on average (NCLD, 2003). In this study, about two thirds of the children achieved scores equal to or higher than 16 out of 20 on the end-of-year administration. According to Whitehurst (2001), scores in this range are indicative of strong skills. To date, the only published studies using the GRTR tool were reported by Molfese and colleagues (e.g., Molfese et al., 2006; Molfese, Molfese, Modglin, Walker, & Neamon, 2004). The 2004 study was a concurrent validity assessment of the relations between the GRTR and measures of general cognitive ability, expressive and receptive vocabulary, rhyming, blending, and environmental print. The sample included and 4-year-old children attending subsidized preschool programs for children from economically disadvantaged families. Results indicated that the 4-year-old children scored higher, on average, than did the 3-year-old children on all measures except expressive vocabulary; GRTR mean scores were 7.9 and for 3- and 4-year-old children, respectively. The GRTR had statistically significant and moderate to large correlations with all measures except the blending task for 4-year-old children, and with all measures except environmental print for 3-year-old children. In the later study, Molfese et al. (2006) found that the GRTR administered in the spring significantly correlated with fall to spring gains in letter knowledge as well as with both fall and spring letter knowledge total scores. GRTR scores also differentiated children in the high- and low-letter knowledge growth subgroups. Whereas existing data on the GRTR are promising and suggest that the GRTR measures constructs relevant to early literacy skills in a reliable and valid way, the ultimate test of a screening tool s utility is how well it predicts later success on reading-related measures. Therefore, the goal of this study was to evaluate the short- and long-term predictive relations of the 20-item GRTR measure both to more detailed measures of emergent literacy skills and to measures of actual reading ability. These analyses are conceptualized as an intermediary goal in the process of determining whether the GRTR is appropriate for use as a formal screening measure, with a cutoff score establishing risk for poor performance on a larger diagnostic measure. Specifically, the research questions were as follows: (a) What are the concurrent validity relations between the GRTR and longer single-focus measures of language, print, decoding, and phonological ability? (b) What are the shortterm predictive validity relations between the GRTR and these established measures? and (c) What is the longer term predictive validity of the GRTR with respect to measures of language, foundational literacy skills such as letter knowledge and phonological awareness, and actual decoding and reading comprehension skill?
4 136 Journal of Learning Disabilities Participants Method Original sample. All participants were randomly selected for participation in this project from three longitudinal assessment projects ongoing in fall Participants were selected for inclusion if they were 3- to 5-years-old at the time and were attending a Head Start center (41.2%), public prekindergarten (33.8%), or a private preschool center (25%). Fewer than 11% of these children were exposed to a research-based curriculum or intervention specifically targeting early literacy skill development, although their various classroom curricula did sometimes include literacy-focused activities. The GRTR was completed by 204 children who represented a range of socioeconomic backgrounds. Children ranged in age from 37 to 63 months (M = 52.15, SD = 5.57) and included 113 boys (55.4%). Whereas some of the children were designated by their schools as having mild to moderate language delays or other mild disabilities (this was a condition of preferential admission for the public preschool programs), the sample did not include any children who were known to have moderate to severe developmental disabilities (as reported by teachers or as observed by assessors). Consistent with the desire to have a sample overrepresentative of children who may be at risk for reading difficulties because of poverty and associated factors and with the fact that in the local area, the majority of lower-socioeconomic status (SES) families are minorities, the sample included 65.7% African American children, 30.9% Caucasian children, and 3.4% children of other ethnic backgrounds. Although no exclusionary rules were in place with regard to home language, consistent with these ethnic backgrounds, it is estimated that less than 1% of the children were not native English speakers as observed by the field assessors. Parent report of home language was not collected. Follow-up samples. Between 3 and 7 months (M = 5.6 months, SD = 0.86) following the initial assessment, 159 of the original participants were available for assessment. Four participants were removed from analyses because of excessive missing data, leaving a final sample of 155 children (76% of original sample). At baseline, these children were 38 to 62 months old (M = 52.17, SD = 5.50) and included 85 boys (54.8%). Their age range at follow-up was 43 to 69 months. As with the original sample, the majority of these children were African American (74.8%), 22.6% were Caucasian, and 2.6% represented other ethnic backgrounds. This sample did not differ significantly from the larger baseline sample in age or gender but did differ in ethnicity such that there was a greater representation of African American children within the short-term follow-up sample, χ 2 (2, N = 204) = 23.99, p <.001. This subsample did not differ significantly from the larger group on average screener score or any baseline measure except receptive vocabulary (i.e., full sample: M = 81.11, SD = 17.44; short-term follow-up sample: M = 79.48, SD = 17.29), F(1, 203) = 5.76, p <.05. A subset of the original sample was available for longterm follow-up assessment 16 to 37 months later. The wide range of follow-up intervals resulted from these children being part of three separate longitudinal study cohorts that had different follow-up timelines and between one and three waves of follow-up. Moreover, these varying intervals and the smaller sample size are indicative of the transitory living circumstances of many of the children from lower-ses families who participated. It was often very difficult to acquire and maintain contact with their families to arrange for follow-up assessment once they left their preschool centers. Some children not included at a first follow-up wave because they were not locatable may have been found later and included for a second follow-up wave. Data for this project are not demarcated by wave, although length of interval is noted. A small subset of children participated in assessments during multiple waves of follow-up; to provide larger sample sizes on some of the most relevant measures, we included only the first wave of follow-up data for these participants. Overall, this long-term follow-up sample included 114 children (56% of the original sample). At the time of screening, these children were 39 to 63 months old (M = 53.86, SD = 5.55) and included 68 boys (59.6%). At the time of follow-up, the children ranged in age from 58 to 98 months (M = 80.04, SD = 8.66), with intervals between assessments ranging in length between 16 and 37 months (M = 26.18, SD = 6.02). Comparable to the full baseline sample, this subset included 64.9% African American children, 30.7% Caucasian children, and 4.4% children of other ethnicities. This sample did not differ significantly from the larger baseline sample in age, gender, or ethnicity. When compared on baseline data, the long-term follow-up sample was equivalent to the full sample on all measures (i.e., all ps >.05). Moreover, the sample retained the original proportionality of including approximately two children from a lower-income background for every child from a middle-income background. Measures Baseline and Younger Child Follow-Up Measures Get Ready to Read! screener. The GRTR, as was described above, is a 20-item multiple-choice measure
5 Phillips et al. / GRTR Screener Follow-Up 137 that includes items measuring letter-name and lettersound knowledge (5 items), PA (7 items), print concepts (5 items), and emergent writing knowledge (3 items). Each item, including a sample item, includes an orally presented question (e.g., point to the letter that makes the /b/ sound) and a picture page with four choices, including the target response and three foils (i.e., B, L, K, S). Children were instructed to respond by pointing to their answer choice. Development study findings indicated that the measure represents a single factor; therefore, only total scores were computed for analyses. At baseline, participating children were administered the GRTR in one-to-one sessions in their preschool centers. Administration time was approximately 5 minutes per child. Cronbach s alpha for the full initial sample (N = 204) was.79, and it was.78 for the subgroup that participated in the short-term follow-up and.71 for the subsample that participated in the long-term follow-ups. Oral language measures. The Peabody Picture Vocabulary Test Revised (PPVT-R; Dunn & Dunn, 1981) and the Expressive One-Word Picture Vocabulary Test Revised (EOWPVT-R; Gardner, 1990) were administered at baseline and at long-term follow-up. Manuals for both measures report high reliability and validity for this age group. Decoding measures. All children completed the Word Identification (WID) and Word Attack (WA) subtests of the Woodcock Reading Mastery Test Revised (WRMT-R; Woodcock, 1987) at baseline and one or more follow-ups. For WID, children were asked to read aloud a series of ageappropriate real words; likewise, for WA, children were asked to read aloud a series of pronounceable nonwords. All children also completed a task in which they had to read 15 high-frequency words (Frequent Words) printed on large index cards. All items were scored 1 (correct) or 0 (incorrect); no credit was given if the child only named the letters or spelled the word. This measure was administered at all assessment points. Raw scores were used for these decoding measures at all time points because Frequent Words is not a standardized measure and because we have administered the WRMT measures to children younger than the standardization sample as a means of having continuity of measurement for decoding skill. Print knowledge. Children were shown individual cards depicting 25 uppercase letters (due to a clerical error, the letter W was not included in the assessment) and asked to say the letter name. Letters were shown to the children in a standardized, nonalphabetical order. Administration stopped if the child failed to correctly name 5 consecutive letters. Immediately following this task, children were shown 8 of the letters (i.e., M, B, D, A, C, O, P, S) and asked to say what sound the letter made. Children were shown these letters regardless of whether they had been able to state the letter s name. If the child responded with the letter name, they were told, That s right, but what sound does it make? and were not given credit for naming the letter or saying a word that began with the letter. All letter-name and lettersound items were scored 1 (correct) or 0 (incorrect). At baseline and long-term follow-up, children also completed an adapted version of Clay s (1979) Concepts About Print Test on which they were asked to identify common features of a book (e.g., front cover, print is read left to right) and name several punctuation marks. Only children younger than 84 months at the time of follow-up assessment completed these measures, as most children at or near that age score at or close to ceiling on these measures. Phonological awareness tasks. All children younger than 84 months at the time of their baseline assessment or any follow-up completed a group of eight PA tasks. These tasks represented an early research version of the PA subtest on the TOPEL (Lonigan et al., 2007). Children completed three blending tasks that assessed the child s PA by asking him or her to blend words, syllables, and phonemes to create real words (e.g., What word do you get when you say cow boy together? What word do you get when you say /m/ /oo/ /n/ together? ). Children also completed three elision tasks in which they had to remove phonemes, syllables, or half of a compound word to state what word remained (e.g., Say sunshine. Now say sunshine without saying sun. Say candy. Now say candy without saying dee. ). Children completed two tasks involving rhyming words, one in which they had to identify which of two choices rhymed with a presented word, and one in which they had to identify which of three presented words did not rhyme with a presented word. Four of these eight tasks involved the use of pictures for some or all words to reduce the memory load of the tasks. Within each skill area (i.e., rhyming, blending, elision), the relevant tasks were summed to create composite scores. Internal consistency coefficients for these measures in prior research ranged from adequate to excellent (e.g., average alpha coefficients for each composite were.59 for rhyming,.82 for blending, and.75 for elision; Lonigan, Anthony, et al., in press). Prior investigations using these and similar measures indicate significant correlations with measures of RN, phonological memory (PM), and letter knowledge, as well as with decoding tasks (e.g., Lonigan, Anthony, et al., in press; Lonigan, Burgess, & Anthony, 2000). Rapid naming tasks. At baseline and short-term follow-up assessments only, children were administered three RN task versions in which they had to orally label four familiar objects arrayed in random order in six rows of four on a single page. Across the three versions, the four pictures rhymed (i.e., cat, bat, hat, rat), did not
6 138 Journal of Learning Disabilities rhyme (dog, ball, man, tree), or were two sizes of circles or squares that the children had to identify as big or little. Children practiced naming each item before timing began, and they completed two trials for each array. Scoring was the seconds required to name all objects, and the score used here represents the average naming time across the three arrays. In a recent large study of more than to 5-year-old children, these RN tasks had strong test-retest reliabilities ranging from.82 to.85 (Lonigan, Anthony, et al., in press). Whereas the measures of PA and lexical access (i.e., RN) abilities used in this study have face validity for the construct they are intended to assess, use of these or similar measures with other preschool samples provides evidence for their predictive convergent and discriminant validity. In a sample of 100 preschool and kindergarten children (M age = 68.0 months, SD = 11.12) who completed the measures of PA and lexical access abilities used in this study and the Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999) 18 to 24 months later, partial correlations (controlling for age at preschool testing) between composite PA (r =.31) and lexical access (r =.31) measures with CTOPP PA and lexical access subtest standard scores, respectively, were statistically significant for within-construct longitudinal correlations and not significant for between-construct longitudinal correlations (average r =.08), and the longitudinal correlations within construct were significantly higher than the longitudinal correlations across constructs. Older Child Follow-Up Measures Children who were at least 60 months of age or older (to minimize floor effects with younger children) at the time of their long-term follow-up were administered some or all of the measures described below instead of or in addition to the previously described blending, elision, and RN measures. Decision rules based on ages (i.e., 60 months or older, younger than 84 months) were used within these projects to maximize the likelihood that children would provide a score for each construct of interest (e.g., PA) that was not at floor or ceiling levels. Thus, children between these two decision marks may have received both the measures designed for younger children (e.g., the eight PA tests) and the measures designed for older children (e.g., the CTOPP subtests). Including children who received either one or both of the measurement sets for each construct in the analyses maximized the sample sizes available at follow-up. Comprehensive Test of Phonological Processing. The CTOPP (Wagner et al., 1999) assesses skills in all three domains of phonological processing PA, RN, and PM and composite standard scores are available for each domain. Between six and eight subtests of the CTOPP were administered to older children at their long-term follow-up, depending on their age and the project from which they were drawn. Depending on the subtests administered, the test yields up to three composite standard scores for PA, RN, and PM. Internal consistency reliability for the three composites ranges from.83 to.96 for children age 5 years and older, and 2-week testretest coefficients range from.79 to.86. Validity for the CTOPP is supported by significant concurrent and predictive relations with the WRMT-R, the Wide Range Achievement Test Third Edition, and the Gray Oral Reading Test Third Edition. Test of Word Reading Efficiency (TOWRE). The TOWRE (Torgesen, Wagner, & Rashotte, 1999) includes two subtests, each of which require a child to accurately read aloud as many words (Sight Word Efficiency) or pronounceable nonwords (Phonetic Decoding Efficiency) as he or she can within 45 seconds. The TOWRE is normed for children starting at age 6 years and yields several types of standard scores, normal curve equivalents, and percentile ranks, and an overall standard score combines the results of both subtests. The concurrent correlation of Sight Word Efficiency with the WRMT-R WID was.89, whereas the correlation between Phonetic Decoding Efficiency and the WRMT-R WA was.85 (Torgesen, Wagner, Herron, & Rashotte, 1998). For this study, only the combined total word reading standard score (TOWRE-SS) was used. The TOWRE was administered only at long-term follow-up to children older than 60 or 84 months, depending on the particular project s protocol (see Table 1). Passage Comprehension (PC). A subset of children completed PC from the WMRT-R at long-term followup. This measure has adequate reliability and validity within this age group (Pae et al., 2005; Woodcock, 1987). Because some of the children who received this measure were younger than the standardization sample, normative standard scores were not used. Gray Oral Reading Test 4th Edition (GORT-4). A subset of the participants received the GORT-4 at long-term follow-up assessment. The GORT-4 (Wiederholt & Bryant, 2001) yields scores of reading rate, accuracy, fluency, and comprehension appropriate for children age 6 years to 18 years. An overall reading ability standardized Oral Reading Quotient (ORQ) combines all subcomponent standard scores. All coefficient alphas for the subscores and ORQ across all ages and student subgroups exceed.90, and the average alternate form and test-retest reliabilities for the four subscores all exceed.85 (Wiederholt & Bryant, 2001). Substantial concurrent and predictive relations with other established measures of reading and related abilities