Applied Neuropsychology: Child
Publisher: Routledge

Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children

Marla Green Bartoi (a), Jaclyn Beth Issner (b), Lesley Hetterscheidt (c), Alicia M. January (d), Jeffrey Garth Kuentzel (a) & Douglas Barnett (a)
(a) Psychology, Wayne State University, Detroit, Michigan
(b) Child/Adolescent Psychiatry, University of Michigan, Ann Arbor, Michigan
(c) Pine Rest Christian Mental Health Services, Grand Rapids, Michigan
(d) Psychology, Marquette/Shriners Hospitals for Children, Chicago, Illinois

Published online: 29 Jul 2014

To cite this article: Marla Green Bartoi, Jaclyn Beth Issner, Lesley Hetterscheidt, Alicia M. January, Jeffrey Garth Kuentzel & Douglas Barnett (2014): Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children, Applied Neuropsychology: Child
APPLIED NEUROPSYCHOLOGY: CHILD, 0: 1-8, 2014
Copyright © Taylor & Francis Group, LLC

Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children

Marla Green Bartoi
Psychology, Wayne State University, Detroit, Michigan

Jaclyn Beth Issner
Child/Adolescent Psychiatry, University of Michigan, Ann Arbor, Michigan

Lesley Hetterscheidt
Pine Rest Christian Mental Health Services, Grand Rapids, Michigan

Alicia M. January
Psychology, Marquette/Shriners Hospitals for Children, Chicago, Illinois

Jeffrey Garth Kuentzel and Douglas Barnett
Psychology, Wayne State University, Detroit, Michigan

We examined the stability of Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) scores among 51 diverse, clinically referred 8- to 16-year-olds (M age = years, SD = 2.36). Children were referred to and tested at an urban, university-based training clinic; 70% of eligible children completed follow-up testing 12 months to 40 months later (M = 22.05, SD = 5.94). Stability for index scores ranged from .58 (Processing Speed) to .81 (Verbal Comprehension), with a stability of .86 for Full-Scale IQ. Subtest score stability ranged from .35 (Letter-Number Sequencing) to .81 (Vocabulary). Indexes believed to be more susceptible to concentration (Processing Speed and Working Memory) had lower stability. We also examined attention problems as a potential moderating factor of WISC-IV index and subtest score stability. Children with attention problems had significantly lower stability for the Digit Span and Matrix Reasoning subtests compared with children without attention problems. These results provide support for the temporal stability of the WISC-IV and also provide some support for the idea that attention problems contribute to children producing less stable IQ estimates when completing the WISC-IV. 
We hope our report encourages further examination of this hypothesis and its implications.

Key words: attention problems, stability, WISC-IV

Address correspondence to Marla Green Bartoi, Ph.D., Psychology, Wayne State University, 60 Farnsworth, Psychology Clinic, Detroit, MI.

The Wechsler Intelligence Scale for Children and its revisions (WISC, Revised [WISC-R], Third Edition [WISC-III], Fourth Edition [WISC-IV]; Wechsler, 1949, 1991, 2003a, 2003b) have been the most frequently used standardized, nationally normed, psychometric measure of a child's intellectual functioning as compared with same-age peers (Reschley, 1997). Intelligence is generally considered a stable ability measurable by at least the preschool years (Kaufman, 2009; Sattler, 2001). Through three revisions, the WISC Full-Scale
IQ (FSIQ) score has been found to be highly stable (.79 to .95) across intervals of a few months to a few years (Canivez & Watkins, 1998; Gehman & Matyas, 1956; Schwean & Saklofske, 1998; Wechsler, 1974, 1991). The WISC-IV (Wechsler, 2003a, 2003b) includes numerous changes that might have an impact on score stability. These revisions include eliminating or substituting subtests (e.g., Picture Arrangement, Arithmetic), adding new subtests (e.g., Matrix Reasoning), reducing timed tasks, and using a new factor structure yielding four index scores: Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), and Processing Speed Index (PSI). Stability of the WISC-IV was examined across 12 to 63 days among 243 children from the standardization sample with the following results: IQ = .89, VCI = .85, PRI = .85, WMI = .85, and PSI = .79 (Wechsler, 2003b). Because of the widespread clinical use of the WISC-IV, its numerous revisions, and the unique social and motivational context of a potentially high-stakes clinical assessment, we think it is necessary to examine this new edition beyond its standardization sample, especially in practical clinical use. Ryan, Glass, and Bartels (2010) examined test-retest stability among 43 children attending private school over an 11-month interval and found reliabilities of .88, .76, .68, .75, and .54 for IQ, VCI, PRI, WMI, and PSI, respectively. In their study of WISC-IV process score stability, Ryan, Umfleet, and Kane (2013) suggested that future stability studies include a more diverse sample and different test-retest intervals. They also suggested that these studies include a sample of children with attention-deficit disorder. Consequently, we designed a study utilizing a diverse sample taking the test following a clinical referral. 
In addition, we were interested in whether and how attention problems and impulsivity, common concerns among clinically referred children, might negatively affect test performance and lead to inconsistent retest performance. Findings on the effects of attention problems and attention-deficit hyperactivity disorder (ADHD) on IQ scores based on prior editions of the WISC have been mixed. In one study, attention problems were estimated to have a 2- to 5-point effect on IQ and were particularly likely to appear on subtests such as Digit Span, Coding, Information, and Arithmetic, the latter two being optional subtests on the WISC-IV (Jepsen, Fagerlund, & Mortensen, 2009). Another study showed that nonverbal tests have lower stability estimates than verbal measures among children with attention problems (Nyden, Billstedt, Hjelmquist, & Gillberg, 2001). Perhaps nonverbal tasks are more susceptible to lapses in attention because they are less interactive and more time-dependent than verbal subtests. A few studies have examined the stability of IQ scores among children with ADHD in previous versions of the Wechsler intelligence tests. Nyden et al. (2001) found stable FSIQ, Verbal IQ (VIQ), and Performance IQ (PIQ) scores 1 to 2 years after initial assessments for boys with ADHD tested with the WISC-R (n = 1) or WISC-III (n = 13), with a significant increase in nonverbal subtest scores over time. Among clinically referred children with ADHD, Schwean and Saklofske (1998) also reported finding significant and high levels of WISC-III stability 30 months later in an unpublished study. Stability was .84, .86, .74, .87, .74, .74, and .58 for IQ, VIQ, PIQ, VCI, Perceptual Organization Index (POI), Freedom from Distractibility Index, and PSI, respectively. Notably, the PSI, recognized as a measure of attention, had the lowest stability among the index scores. 
Although we have not found any publications that examine the stability of the WISC-IV among children with and without attention problems, a few studies have examined whether and how attention problems and ADHD may influence its scores. The WISC-IV manual (Wechsler, 2003b) reported that children with ADHD scored significantly lower on the PSI, WMI, and FSIQ compared with children without ADHD. Mayes and Calhoun (2007) found that children with ADHD scored significantly lower on the WMI and PSI on both the WISC-III and the WISC-IV, relative to their own performance on the VCI, POI (WISC-III), and PRI (WISC-IV). Schwean and Saklofske (2005) reported that while children with ADHD scored lower on the VIQ than the PIQ as well as on the VCI and Perceptual Organization Index (POI) factors on the WISC-III (Schwean, Saklofske, Yackulic, & Quinn, 1993), this was less pronounced on the WISC-IV. Perhaps, the WISC-IV is less susceptible to attention problems than its predecessor (Schwean & Saklofske, 2005). One reason for this difference may be the reduced dependence on time constraints on the WISC-IV. Time-dependent tests may be more susceptible to momentary lapses in attention than untimed tests. We expected lapses in attention among children with higher attention problems would lead to inconsistent responding and thereby lower their mean scores and stability estimates as compared with those of children with fewer attention problems. We expected that these attention effects would be most apparent on timed tasks (e.g., Block Design, Coding) during which even momentary lapses in attention and impulse control might result in delays on timed tasks and thereby have an impact on scores. Moreover, we thought impulsive responding especially would have a negative impact on subtests that relied upon multiple choice (e.g., Matrix Reasoning, Picture Concepts). 
Lastly, because working memory is often poorer among children with attention problems (Martinussen, Hayden, Hogg-Johnson, & Tannock, 2005), we also thought WMI scores would be lower among children with attention problems.
METHOD

Participants

Participants were 51 children aged 8 to 16 years old (M age = years, SD = 2.36) who had been referred for and completed a psychoeducational evaluation, minimally 1 year prior. The majority of the children came from lower-income households and were ethnically diverse with 43% Caucasian, 55% African American, and 2% Hispanic. Of the 51 participants, 35 were boys (69%). At the initial assessment, all children were referred by their parents because of concerns regarding academic or behavioral functioning, or a combination of the two. As a result of the assessment, participants were judged by the evaluator and Ph.D.-level supervisor as to whether the child met criteria for any Diagnostic and Statistical Manual for Mental Disorders-Fourth Edition diagnoses. Furthermore, the parents were provided these diagnoses during a feedback session and in writing as part of a psychoeducational evaluation report provided at the end of the Time 1 (T1) assessment. See Table 1 for percentages of participants who met criteria for the diagnoses in the full sample as well as in the high- and low-attention groups. Only those classified in the high-attention group had diagnoses of ADHD (n = 10), oppositional defiant disorder (n = 2), conduct disorder (n = 2), and enuresis (n = 1); the high-attention group did not have diagnoses of a nonverbal learning disability, phonological disorder, or anxiety. 

TABLE 1
Diagnoses in the Full Sample and in the Low- and High-Attention Problem Groups
Columns: Diagnosis; Total Sample (n = 51), Percentage; Low-Attention Problems (n = 21), Percentage; High-Attention Problems (n = 29), Percentage
Rows: Attention-Deficit Hyperactivity Disorder; Reading Learning Disability; Writing Learning Disability; Math Learning Disability; Nonverbal Learning Disability; Phonological Disorder; Mild Mental Retardation; Mood Disorder; Anxiety Disorder; Oppositional Defiant Disorder; Conduct Disorder; Enuresis; No Diagnosis
Also, 12 participants in the high-attention group did not have any diagnoses, whereas only 5 participants in the low-attention group had no diagnosis. Because the reliability of the diagnosticians was not established systematically and because the diagnoses were not independent of the assessment data gathered, no analyses were conducted by clinical diagnosis. The average time between testing at T1 and Time 2 (T2) was 22.05 months (SD = 5.94, range = 12 to 40 months).

Procedures

Families were identified from a clinic database containing assessment information on all children seen at a Midwestern, urban, university training clinic between 2003 and [...]. One year after their initial assessment, parents and children were invited back for an assessment follow-up research study when they met the following criteria: (a) the parent sought the assessment because of concern about problems their child was having at school; (b) the child completed the WISC-IV; (c) the child was aged 8 to 16 years old; and (d) the child had an IQ greater than 60. The criteria were utilized so that we could focus our resources on a more homogeneous sample. There were 72 children who met our criteria. Unless we heard otherwise from a parent, we made repeated attempts to contact each family by phone and mail. Ultimately, 51 (70%) agreed and completed the assessment. Informed consent was obtained verbally over the phone during recruitment and again in writing, along with child assent, at the onset of the research visit. Parents were promised and paid a $50 honorarium for taking the time and resources to travel to and complete our follow-up study.

Measures

Attention problems. Children were judged to have attention problems based on parent and teacher ratings gathered within a few weeks of the initial child assessment. Specifically, the child's parent completed the Child Behavior Checklist (CBCL; Achenbach, 1991a) and their teacher completed the Teacher Report Form (TRF; Achenbach, 1991b) at T1. 
Both forms list more than 112 child emotion and behavior problems that the informant rated as 0 (not true), 1 (sometimes true), or 2 (very true) for the child in question. The CBCL and TRF are nationally normed, widely used, and well-validated measures of child psychopathology. Both include a 10-item subscale for attention problems (e.g., "can't concentrate," "impulsive"). Their validity is supported by significant correlations with other established measures of corresponding child
behavioral functioning and also their ability to discriminate statistically between clinically referred and nonreferred, normal children (Achenbach, 1991a, 1991b). A T score of 65 is considered by the test developers to be the cutoff for borderline clinical significance. Among the children in our sample, the average attention scores were [...] (SD = 9.96, n = 48) and [...] (SD = 7.46, n = 41) for parent and teacher reports, respectively. For the purposes of our analyses, we coded children as having an attention problem if the child received a score of greater than 65 on the Attention Problems Scale as rated by parent or teacher; all others were coded as not having attention problems. We chose this definition of attention problems for a variety of reasons: (a) the ratings were based on adults who know the child well; (b) they were independent of the child's behavior when completing the WISC; (c) every child had at least one rating from parent or teacher (n = 39 had both); and (d) parent and teacher ratings on attention problems were moderately associated (r = .41, n = 39, p < .01). In contrast, an official diagnosis of ADHD would be based on examiner judgments and the pattern of WISC-IV performance itself, and consequently, it would not be independent of the WISC-IV. In addition, a diagnosis, in contrast to standardized ratings, would be harder to reproduce reliably across studies. However, it is noteworthy that even though an ADHD diagnosis was not used in determining attention group categorization, there were no participants with an ADHD diagnosis in the low-attention problems group, and there were 10 (34.5%) participants with an ADHD diagnosis in the high-attention problems group (see Table 1 for more details on diagnoses). The diagnoses were not independent of the assessment data gathered; rather, they were based upon the data.

Intellectual functioning. 
Children completed the WISC-IV (Wechsler, 2003a) first as part of a psychoeducational assessment conducted because of school problems and a second time as part of a follow-up research study on child assessment. The WISC-IV is a standardized, nationally normed, individually administered IQ test that is widely used in assessments to assist in treatment planning and academic placement of children, including eligibility for special education programs (Kaufman, 2009; Sattler, 2008). The WISC-IV is considered to have excellent reliability and validity. Only the 10 required subtests were administered. Examiners were advanced graduate students in clinical psychology who had been extensively trained in administering and scoring the WISC-IV. Different examiners were employed at the two assessments. Scoring was reviewed in a standardized independent double-checking procedure (Kuentzel, Hetterscheidt, & Barnett, 2011).

RESULTS

The double-checking procedure revealed that 1 child's initial Comprehension subtest and another child's follow-up Symbol Search subtest had administration errors, and they were excluded from analyses. Consequently, analyses for these two scores were based on 50 rather than 51 participants. For these 2 participants, subtest scores were prorated to complete the composite indexes. Also, 1 child did not have any parent- or teacher-reported attention ratings at T1, so that child's scores were excluded from the analyses that compared attention groups. There were no significant outliers, skew, or kurtosis on any of the focal variables. Test-retest means, standard deviations, and Pearson product-moment correlation coefficients between test and retest scores were calculated for the WISC-IV subtests, indexes, and FSIQ and are reported in Table 2. Correlation coefficients for the composite scores ranged from .58 (PSI) to .86 (FSIQ), and subtest reliabilities ranged from .35 (Letter-Number Sequencing) to .81 (Vocabulary). 
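The stability coefficients reported above are Pearson correlations between each child's first and second score on the same scale. As a minimal sketch of that calculation, the Python below computes the coefficient by hand. The function name `pearson_r` and the two score lists are hypothetical illustrations, not data from this study.

```python
import math


def pearson_r(t1, t2):
    """Pearson product-moment correlation between paired T1 and T2 scores."""
    if len(t1) != len(t2) or len(t1) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(t1)
    mean1 = sum(t1) / n
    mean2 = sum(t2) / n
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(t1, t2))
    sxx = sum((a - mean1) ** 2 for a in t1)
    syy = sum((b - mean2) ** 2 for b in t2)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical composite scores for eight children (NOT study data)
t1_scores = [85, 92, 78, 101, 110, 95, 88, 73]
t2_scores = [88, 90, 80, 99, 113, 91, 92, 70]

stability = pearson_r(t1_scores, t2_scores)
print(f"test-retest stability r = {stability:.2f}")
```

A coefficient computed this way reflects how well children keep their rank order across testings; it does not by itself show whether group means shifted, which is why means and standard deviations are reported alongside it.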
Individual variation in scores across the retest interval is presented in cumulative frequency distributions in Table 3. More than 80% of the children scored within 10 points (±5) on FSIQ and the VCI. However, this number falls to <65% for PRI and <60% for PSI and WMI. Overall, the changes in scores over time were normally distributed for FSIQ and the indexes such that participants were as likely to improve at T2 testing as they were to deteriorate, even for those who showed large retest changes (e.g., >15-point difference). However, for the PSI, there were 2 participants who showed especially large changes; they deteriorated by 35 and 36 points, respectively. In contrast, for the WMI, there was 1 participant who showed a 30-point improvement. Time between testing was not related to variation in any subtests or indexes across the retest interval.

TABLE 2
Comparison of T1 and T2 WISC-IV Composite Scores and Subtests
Columns: Variable; First Testing (M, SD); Second Testing (M, SD); Stability (Pearson's r)
Rows: Full-Scale IQ; Verbal Comprehension; Perceptual Reasoning; Working Memory; Processing Speed; Block Design; Similarities; Digit Span; Picture Concepts; Coding; Vocabulary; Letter-Number Sequencing; Matrix Reasoning; Comprehension; Symbol Search
Note. n = 51 except for Comprehension T1 and Symbol Search T2 where n = 50. There were no significant changes in scores over time. All correlations were significant at p < .01.

TABLE 3
Test-Retest Changes: Cumulative Frequency Distributions (in Percentages) of Wechsler Intelligence Scale for Children-Fourth Edition FSIQ and Index Scores
Columns: Difference Between T1 and T2 Scores; Full-Scale IQ; Verbal Comprehension; Perceptual Reasoning; Processing Speed; Working Memory

TABLE 4
Comparison of Test-Retest Stability Scores for Children With Low and High Levels of Attention Problems
Columns: Variable; Low-Attention Problems (n = 21) (Pearson's r); High-Attention Problems (n = 29) (Pearson's r); Z Score
Rows: Full-Scale IQ; Verbal Comprehension; Perceptual Reasoning; Working Memory; Processing Speed; Block Design; Similarities; Digit Span; Picture Concepts; Coding; Vocabulary; Letter-Number Sequencing; Matrix Reasoning; Comprehension; Symbol Search
Note. Significance levels are marked at p < .05 and p < .01.

To examine our hypotheses regarding the role of attention problems on the stability of WISC-IV scores, we calculated stability coefficients for the overall group, as well as for the high-attention problem and low-attention problem groups. Stability coefficients were calculated by correlating the scores from T1 to T2 using Pearson correlation statistics. Fisher's (1924) r-to-z transformation was utilized to convert the Pearson's r coefficients to the normally distributed variable z to compare stability of IQ scores for the low-attention problems group to those of the high-attention problems group. 
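The group comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' exact procedure: it assumes the standard test for two independent correlations, with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)), and it reports both one- and two-tailed p values because the text does not say which was used. The function name `compare_independent_r` is hypothetical; the example inputs are the Digit Span and Matrix Reasoning coefficients and the group sizes (21 and 29) given in this article.

```python
import math


def compare_independent_r(r1, n1, r2, n2):
    """Compare two independent Pearson correlations via Fisher's r-to-z.

    Returns (z, one-tailed p, two-tailed p). Assumes the standard formula
    with SE = sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 cases")
    # Fisher's transformation: z' = atanh(r)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Upper-tail area of the standard normal distribution at |z|
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    p_two = min(1.0, 2.0 * p_one)
    return z, p_one, p_two


# Low-attention group (n = 21) vs. high-attention group (n = 29)
z_ds, p1_ds, p2_ds = compare_independent_r(0.73, 21, 0.40, 29)  # Digit Span
z_mr, p1_mr, p2_mr = compare_independent_r(0.88, 21, 0.50, 29)  # Matrix Reasoning

print(f"Digit Span:       z = {z_ds:.2f}, one-tailed p = {p1_ds:.3f}, two-tailed p = {p2_ds:.3f}")
print(f"Matrix Reasoning: z = {z_mr:.2f}, one-tailed p = {p1_mr:.3f}, two-tailed p = {p2_mr:.3f}")
```

Under these assumptions the two comparisons give z values of roughly 1.65 and 2.70. With groups this small the standard error is large (about 0.31 on the transformed scale), so only sizable differences between coefficients can reach significance.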
Time between testing was not significantly different, t(39) = 1.22, p > .05, when comparing high-attention group members (M = [...] months, SD = 4.55) to low-attention group members (M = [...] months, SD = 6.08). Furthermore, time between testing was normally distributed in both groups. As indicated in Table 4, the high-attention problems group showed significantly lower stability than the low-attention problems group in performance on the Digit Span (r = .40 vs. .73, p < .05) and Matrix Reasoning (r = .50 vs. .88, p < .01) subtests for the high- and low-attention problem groups, respectively. t tests showed no significant differences at T1 or T2 in FSIQ, indexes, or subtests when comparing high- and low-attention problem groups. Hence, children with higher attention problems did not show significantly different scores when compared with children with lower attention problems. See Table 5 for the means and standard deviations for WISC-IV scores at T1 and T2 for the low- and high-attention problem groups.

TABLE 5
Comparison of Low- and High-Attention Problems WISC-IV Composite Scores and Subtests
Entries are M (SD); entries showing only a parenthesized value give the SD alone.

                            Low-Attention Problems (N = 21)     High-Attention Problems (N = 29)
Variable                    First Testing    Second Testing     First Testing    Second Testing
Full-Scale IQ               (18.96)          (20.88)            (12.31)          (12.19)
Verbal Comprehension        (15.09)          (18.03)            (13.63)          (15.82)
Perceptual Reasoning        (21.05)          (22.84)            (13.29)          (12.63)
Working Memory              (17.14)          (19.24)            (8.48)           (12.19)
Processing Speed            (15.23)          (16.74)            (15.95)          (11.28)
Block Design                8.38 (3.22)      8.38 (3.92)        8.90 (3.74)      8.66 (2.74)
Similarities                9.62 (2.89)      9.05 (4.18)        9.69 (3.37)      9.69 (3.39)
Digit Span                  8.33 (2.96)      8.48 (3.17)        8.14 (1.60)      8.21 (2.02)
Picture Concepts            9.86 (3.53)      (3.95)             9.62 (2.69)      9.45 (3.05)
Coding                      7.38 (2.78)      7.43 (2.66)        7.69 (3.39)      7.34 (2.26)
Vocabulary                  9.52 (2.54)      9.24 (3.02)        9.79 (2.40)      9.72 (3.17)
Letter-Number Sequencing    8.29 (4.41)      8.24 (4.17)        8.90 (2.44)      8.76 (2.77)
Matrix Reasoning            9.43 (4.32)      8.14 (4.18)        9.00 (2.41)      9.10 (2.51)
Comprehension               9.40 (2.89)      (2.84)             9.55 (2.20)      9.72 (2.49)
Symbol Search               8.10 (3.43)      8.52 (3.91)        9.28 (2.83)      8.96 (2.19)

DISCUSSION

Our study provides new data supporting the stability of the WISC-IV as well as extending psychometric support for WISC-IV scores to African American and Caucasian 8- to 16-year-olds referred because of scholastic problems. Our findings were in line with existing estimates including those of the test manual (Ryan et al., 2010; Wechsler, 2003b). We think it is important to make data available on youth with clinical concerns because they are a group who frequently seek out evaluations by clinical and school psychologists and differ from the standardization sample in their motivations for testing. Moreover, our sample had a roughly equal number of African American and Caucasian children and supported the stability of the WISC-IV for both ethnic groups as well as for children from low-income families. We did limit our sample to children with initial IQs greater than 60 on the WISC-IV. Consequently, our findings cannot generalize to those who were more than 2 standard deviations below the mean. 
On the other hand, our stability estimates appear rather robust, given that we restricted the variance, which would presumably attenuate stability estimates. Although our overall findings appear to support the stability of WISC-IV factor scores, we also found that task characteristics and child attention problems appear to interact and perhaps influence the stability of factor and subtest scores. We say interaction because not all areas and subtests of the WISC-IV demonstrated equivalent stability. For instance, we replicated the frequently found pattern whereby FSIQ was most stable, with VCI and PRI close behind. The WMI and PSI stability coefficients drop off quite a bit (see Table 2), a finding consistent with existing studies. The more modest estimates for WMI and PSI are hypothesized to be a result of both being made up of fewer subtests than the FSIQ, VCI, and PRI, as well as a result of WMI and PSI involving cognitive processes that may be less consistent or reliable. Performance on the subtests comprising the PSI and WMI is considered to be more dependent on attention and concentration, which frequently vary as a function of the examinee's mental state (e.g., mood, motivation level) and/or circumstantial factors such as fatigue (Duckworth, Quinn, Lynam, Loeber, & Stouthamer-Loeber, 2011; McGee, Clark, & Symons, 2000). Given that indexes more dependent on attention had lower long-term stability, we hypothesized that children with higher levels of attention problems, as measured on the CBCL and TRF scales (Achenbach, 1991a, 1991b), would have comparatively less stable performances on the WISC-IV. Significant differences in stability, however, were found for just two subtests (i.e., Matrix Reasoning and Digit Span). Interestingly, the stability of the WMI, arguably the index most affected by attention (Martinussen et al., 2005), did not differ significantly between children with low and high levels of attention problems. 
In addition, although Letter-Number Sequencing had the lowest stability for the overall sample, no difference in stability between children with low and high levels of attention problems was found. Perhaps attention lapses affected both children with and without clinically significant attention problems enough to lower stability for both groups. The significant difference in stability for Digit Span between the high- and low-attention problems groups is not surprising, given its attention demands. What is more surprising, however, is the significant difference in stability for Matrix Reasoning. When compared with Block Design, a subtest that requires handling and manipulating objects, Matrix Reasoning may be less engaging for children with attention problems. Performance on Matrix Reasoning also may be affected by impulsivity, such that children with ADHD features may be too impatient to carefully consider all response options before making their choice; clinicians should observe closely for this possibility. This could be investigated further by (after the entire test has been completed) asking examinees who appeared impulsive about strategies that they used for Matrix Reasoning (Sattler, 2008), as impulsive children may admit that they guessed. A testing-of-limits approach might include (after the entire WISC-IV was administered under standard conditions) readministering failed Matrix Reasoning items with coaching or scaffolding to carefully consider all five response choices before selecting one. Children could also be reminded to double-check their answers. Why the difference in stability between attention groups was not also found
for Picture Concepts is unclear, but it may be a result of the more meaningful, familiar, and engaging stimuli in that subtest as compared with the abstract patterns in Matrix Reasoning. Another factor contributing to the equivalence in stability between attention groups for Block Design may be the order in which the subtests are administered. Perhaps children with attention problems are less affected during the subtests that appear earlier in the testing session. Because children with ADHD in Wechsler's (2003b) standardization sample had lower scores on FSIQ, PSI, and WMI on the WISC-IV, we expected that the children in our sample with higher attention problems would exhibit a similar pattern, but their scores were not lower than those of their counterparts with lower levels of attention problems. However, all of the children in our sample had been referred for academic problems, so those reported to have low levels of attention problems likely had other kinds of impairments, including lower general cognitive abilities (see Table 1 for the range of diagnoses). Given that children with higher attention problems in our sample did tend to show as much long-term stability on most subtests and indexes of the WISC-IV as children with lower attention problems, perhaps the WISC-IV can successfully and reliably maintain children's attention to the tasks. One feature of the test that may promote sustaining the child's interest is the alternating order of the verbal- and performance-based tasks. The individual administration format of the WISC-IV may also contribute to its success in reliably maintaining examinees' attention. The WISC-IV also purports to have decreased its reliance on timed tasks (Wechsler, 2003b), perhaps reducing its susceptibility to being influenced by brief lapses in attention. 
When comparing children with high-attention problems to children with low-attention problems, this study used a cutoff of greater than 65 (T score) on Attention Problems on either the CBCL or the TRF. Because parent and teacher reports are sometimes incongruent, we did not want to restrict our attention-problems group to only children for whom there were congruent reports. If we had restricted group membership to children with a score of greater than 65 as reported by both the teacher and the parent, we may have excluded children who were in fact showing attention problems in at least one setting. The CBCL and TRF cutoff was used instead of comparing children diagnosed with ADHD to children without the diagnosis because WISC-IV performance had been considered when arriving at the diagnosis, which would have created a confound. Moreover, it would be difficult to establish the reliability and validity of the diagnoses. In contrast, the standardized nature of the CBCL and TRF makes replication more straightforward. Notably, the validity of the CBCL and TRF Attention Problems scale is limited by the parents' and teachers' perceptions, their ability to observe and report accurately, and their level of familiarity with the child, as well as their threshold for perceiving/tolerating different behaviors in children. Consequently, utilizing both reports reduced the impact of particular context or rater effects. A potential weakness of the present study may be related to the use of graduate student examiners, because they have been noted in the literature to make frequent scoring errors (Ryan & Schnakenberg-Ott, 2003). However, our clinic employs a double-checking system so that all scoring is thoroughly reviewed by an advanced graduate student (Kuentzel et al., 2011), and all clinicians administering the WISC-IV at T1 and T2 were extensively trained advanced graduate students in clinical psychology. 
Another possible limitation may have been the potential change in motivation between testing at T1 and T2. At T1, children were referred to the clinic for testing to determine diagnostic impressions and recommendations, whereas at T2, they were recruited for the sole purpose of research. This possible change in the children's testing investment may have had an impact on the stability of scores.

Although a strength of the study was our use of a relatively ethnically diverse referred sample, a definite limitation was the sample size, which may have limited our ability to detect differences in the stability of WISC-IV scores between our low- and high-attention problems groups. For example, we had marginally significant differences in stability for FSIQ, PSI, PRI, and Symbol Search, which might have been statistically significant with a larger sample. The relatively small sample size and our decision to leave out children who scored less than 60 may have affected the overall stability coefficients that we obtained, although their magnitudes were comparable with what has been reported in the literature. An additional strength is the length of the retest interval, which constitutes a more rigorous test of temporal stability and may be more clinically meaningful than the shorter intervals (e.g., 1 month) that have appeared in the literature. Moreover, time between testing administrations varied but had no effect on variation between scores.

As noted, the children in our low-attention problems group had other significant cognitive, behavioral, and academic problems. Future research should compare WISC-IV stability of children with attention problems to a matched community sample.

Intelligence tests are not just of scientific interest; they are regularly used for a variety of potentially life-changing reasons, including determining children's eligibility for special education services and programs for the gifted.
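The sample-size limitation noted above can be made concrete with a short sketch of Fisher's r-to-z test for comparing two independent correlations. This test is assumed here purely for illustration; the section itself does not state which test was used to compare stability coefficients, and the coefficients and group sizes below are hypothetical, not the study's data.

```python
import math


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Compare two independent correlations with Fisher's r-to-z transform.

    Returns the z statistic and the two-tailed p value.
    """
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed p under the normal curve
    return z, p


# Hypothetical stability coefficients for a low- and a high-attention
# problems group, evaluated at two hypothetical group sizes.
z_small, p_small = fisher_z_test(0.90, 25, 0.75, 25)
z_large, p_large = fisher_z_test(0.90, 100, 0.75, 100)
```

With these hypothetical values, the same difference between coefficients falls short of the conventional .05 threshold at 25 children per group but passes it at 100 per group, which is the sense in which a larger sample could turn a marginal difference into a significant one.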
Our findings provide evidence for the stability of IQ scores as well as for the potential for attention problems to negatively affect IQ estimates. Individualized testing reports frequently include confidence intervals to
acknowledge the role of nonintellectual, situational factors that may have an impact on scores (Sattler, 2008). However, our findings suggest that confidence intervals may not be uniform for all children, and greater error in IQ estimation may be expected among children with attention problems, particularly for Digit Span and Matrix Reasoning. Despite significant stability for FSIQ (.86) during a nearly 2-year period as well as an overall average change score of near 0, we found that individual FSIQ changes ranged from a decrease of 20 points to an increase of 14 points. Moreover, nearly 15% of the children in our sample demonstrated IQ change that fell outside the 95% confidence interval for IQ. Future research is needed to examine whether the direction and magnitude of this change is a predictable consequence of adherence to intervention recommendations made following the assessment.

REFERENCES

Achenbach, T. M. (1991a). Manual for the Child Behavior Checklist and 1991 Profile. Burlington: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991b). Manual for the Teacher's Report Form and 1991 Profile. Burlington: University of Vermont, Department of Psychiatry.
Canivez, G. L., & Watkins, M. W. (1998). Long-term stability of the Wechsler Intelligence Scale for Children-Third Edition. Psychological Assessment, 10,
Duckworth, A. L., Quinn, P. D., Lynam, D. R., Loeber, R., & Stouthamer-Loeber, M. (2011). Role of test motivation in intelligence testing. Proceedings of the National Academy of Sciences, 108,
Fisher, R. A. (1924). On a distribution yielding the error functions of several well known statistics. Proceedings of the International Congress of Mathematics, 2,
Gehman, I. H., & Matyas, R. P. (1956). Stability of the WISC and Binet tests. Journal of Consulting Psychology, 20,
Jepsen, J. R. M., Fagerlund, B., & Mortensen, E. L. (2009). Do attention deficits influence IQ estimate in children and adolescents with ADHD? Journal of Attention Disorders, 12,
Kaufman, A. (2009). IQ testing 101. New York, NY: Springer.
Kuentzel, J. G., Hetterscheidt, L. A., & Barnett, D. (2011). Testing intelligently includes double-checking Wechsler IQ scores. Journal of Psychoeducational Assessment, 29,
Martinussen, R., Hayden, J., Hogg-Johnson, S., & Tannock, R. (2005). A meta-analysis of working memory impairments in children with attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 44,
Mayes, S. D., & Calhoun, S. L. (2007). Wechsler Intelligence Scale for Children-Third and Fourth Edition predictors of academic achievement in children with attention-deficit/hyperactivity disorder. School Psychology Quarterly, 22,
McGee, R. A., Clark, S. E., & Symons, D. K. (2000). Does the Conners Continuous Performance Test aid in ADHD diagnosis? Journal of Abnormal Child Psychology, 28,
Nyden, A., Billstedt, E., Hjelmquist, E., & Gillberg, C. (2001). Neurocognitive stability in Asperger syndrome, ADHD, and reading and writing disorder: A pilot study. Developmental Medicine and Child Neurology, 43,
Reschly, D. J. (1997). Diagnostic and treatment utility of intelligence tests. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp ). New York, NY: Guilford.
Ryan, J. J., Glass, L. A., & Bartels, J. M. (2010). Stability of the WISC-IV in a sample of elementary and middle school children. Applied Neuropsychology, 17,
Ryan, J. J., & Schnakenberg-Ott, S. D. (2003). Scoring reliability on the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). Assessment, 10,
Ryan, J. J., Umfleet, L. G., & Kane, A. (2013). Stability of WISC-IV process scores. Applied Neuropsychology: Child, 2,
Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). San Diego, CA: Jerome M. Sattler.
Sattler, J. M. (2008). Assessment of children: Cognitive applications (5th ed.). San Diego, CA: Jerome M. Sattler.
Schwean, V. L., & Saklofske, D. H. (1998). WISC-III assessment of children with attention deficit/hyperactivity disorder. In A. Prifitera & D. H. Saklofske (Eds.), WISC-III clinical use and interpretation (pp ). San Diego, CA: Academic.
Schwean, V. L., & Saklofske, D. H. (2005). Assessment of attention deficit hyperactivity disorder with the WISC-IV. In A. Prifitera & D. H. Saklofske (Eds.), WISC-IV clinical use and interpretation: Scientist-practitioner perspectives (pp ). Amsterdam, The Netherlands: Elsevier Academic.
Schwean, V. L., Saklofske, D. H., Yackulic, R. A., & Quinn, D. (1993). WISC-III performance of ADHD children. Journal of Psychoeducational Assessment, WISC-III Monograph,
Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1991). WISC-III manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (2003a). WISC-IV administration and scoring manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (2003b). WISC-IV technical and interpretive manual. San Antonio, TX: Psychological Corporation.