Performance Validity and Symptom Validity in Neuropsychological Assessment




Journal of the International Neuropsychological Society (2012), 18, 1–7. Copyright © INS. Published by Cambridge University Press, 2012. doi:10.1017/S1355617712000240

DIALOGUE

Performance Validity and Symptom Validity in Neuropsychological Assessment

Glenn J. Larrabee, Ph.D. Independent Practice, Sarasota, Florida

(RECEIVED November 14, 2011; FINAL REVISION February 7, 2012; ACCEPTED February 8, 2012)

Abstract: Failure to evaluate the validity of an examinee's neuropsychological test performance can alter prediction of external criteria in research investigations and, in the individual case, result in inaccurate conclusions about the degree of impairment resulting from neurological disease or injury. The terms "performance validity," referring to validity of test performance (PVT), and "symptom validity," referring to validity of symptom report (SVT), are suggested to replace less descriptive terms such as "effort" or "response bias." Research is reviewed demonstrating strong diagnostic discrimination for PVTs and SVTs, with a particular emphasis on minimizing false positive errors, facilitated by identifying performance patterns or levels of performance that are atypical for bona fide neurologic disorder. It is further shown that false positive errors decrease, with a corresponding increase in the positive probability of malingering, when multiple independent indicators are required for diagnosis. The rigor of PVT and SVT research design is related to a high degree of reproducibility of results, and large effect sizes of d = 1.0 or greater, exceeding effect sizes reported for several psychological and medical diagnostic procedures.
(JINS, 2012, 18, 1–7)

Keywords: Malingering, False positive errors, Sensitivity, Specificity, Diagnostic probability, Research design

Bigler acknowledges the importance of evaluating the validity of examinee performance, but raises concerns about the meaning of "effort"; the issue of what "near pass" performance means (i.e., scores that fall just within the range of invalid performance); the possibility that such scores may represent false positives; and the fact that there are no systematic lesion localization studies of Symptom Validity Test (SVT) performance. Bigler also discusses the possibility that illness behavior and diagnosis threat (i.e., the influence of expectations) can affect performance on SVTs. He further questions whether performance on SVTs may be related to actual cognitive abilities and to the neurobiology of drive, motivation, and attention. Last, he raises concerns about the rigor of the existing research underlying the development of SVTs.

Bigler and I are in agreement about the need to assess the validity of an examinee's performance. Failure to do so can lead to misleading results. My colleagues and I (Rohling et al., 2011) reviewed several studies in which performance on response bias indicators (another term for SVTs) attenuated the correlation between neuropsychological test measures and an external criterion.

(Correspondence and reprint requests to: Glenn J. Larrabee, 2650 Bahia Vista Street, Suite 308, Sarasota, FL 34239. E-mail: glarrabee@aol.com)
For example, grade point average and Full Scale IQ correlated below the expected level of strength until those subjects failing an SVT were excluded (Greiffenstein & Baker, 2003); olfactory identification was only correlated with measures of brain injury severity (e.g., Glasgow Coma Scale) for those subjects passing an SVT (Green, Rohling, Iverson, & Gervais, 2003); California Verbal Learning Test scores did not discriminate traumatic brain injury patients with abnormal CT or MRI scans from those with normal scans until patients failing SVTs were excluded (Green, 2007); patients with moderate or severe traumatic brain injury (TBI), 88% of whom had abnormal CT or MRI, plus patients with known cerebral impairment (stroke, aneurysm) did not differ from those with uncomplicated mild TBI, psychiatric disorders, orthopedic injuries or chronic pain until those failing an SVT were dropped from comparison (Green, Rohling, Lees-Haley, & Allen, 2001). An association between memory complaints and performance on the California Verbal Learning Test, which goes counter to the oft-replicated finding of no association between memory or cognitive complaints and actual test performance (Brulot, Strauss, & Spellacy, 1997; Hanninen et al., 1994; Larrabee & Levin, 1986; Williams, Little, Scates, & Blockman, 1987), disappeared when those failing an SVT were excluded (Gervais, Ben-Porath,

Wygant, & Green, 2008). Subsequent to the Rohling et al. (2011) review, Fox (2011) showed that the association between neuropsychological test performance and presence/absence of brain injury was demonstrated only in patients passing SVTs.

Some of the debate regarding symptom validity testing results from use of the term "effort." Effort suggests a continuum, ranging from excellent to very poor. SVTs are constructed, however, based on patterns of performance that are atypical in either pattern or degree in comparison to the performance of patients with bona fide neurologic disorder. Consequently, SVTs demonstrate a discontinuity in performance rather than a continuum of performance, with most neurologic patients either not showing the atypical pattern or performing at ceiling on a particular SVT. Examples of atypical patterns of performance include poorer performance on measures of attention than on measures of memory (Mittenberg, Azrin, Millsaps, & Heilbronner, 1993), or poorer performance on gross compared to fine motor tests (Greiffenstein, Baker, & Gola, 1996). Examples of atypical degree include motor function performance at levels rarely seen in patients with severe neurologic dysfunction (Greiffenstein, 2007). Performance on these tasks is thus so atypical for bona fide neurologic disease that persons with significant neurologic disorders rarely fail effort tests. For example, the meta-analysis of Vickery, Berry, Inman, Harris, and Orey (2001) reported a 95.7% specificity, or a 4.3% false positive rate. Additionally, modern SVT research typically sets specificity at 90% or better on individual tests, yielding a false positive rate of 10% or less (Boone, 2007; Larrabee, 2007; Morgan & Sweet, 2009). If SVTs are unlikely to be failed by persons with significant neurologic dysfunction, then performance on these tasks requires only very minimal levels of neurocognitive capacity and, consequently, very little effort.
As a result, I have recently advocated for referring to SVTs as measures of performance validity, to clarify the extent to which a person's test performance is or is not an accurate reflection of their actual level of ability (Bigler, Kaufmann, & Larrabee, 2010; Larrabee, 2012). This term is much more descriptive than "effort," "symptom validity," or "response bias," and is in keeping with the longstanding convention of psychologists commenting on the validity of test results. Moreover, the term symptom validity is actually more descriptive of subjective complaint than of performance. Thus, I recommend that we use two descriptive terms in evaluating the validity of an examinee's neuropsychological examination: (1) performance validity, to refer to the validity of actual ability task performance, assessed either by stand-alone tests such as Dot Counting or by atypical performance on neuropsychological tests such as Finger Tapping, and (2) symptom validity, to refer to the accuracy of symptomatic complaint on self-report measures such as the MMPI-2.

As previously noted, false positive rates are typically 10% or less on individual Performance Validity Tests (PVTs). For example, the manual for the Test of Memory Malingering (TOMM; Tombaugh, 1996) contains detailed information about the performance of aphasic, TBI, dementia, and neurologic patients, very few of whom (with the exception of dementia) perform below the recommended cutoff. Three patients with severe anoxic encephalopathy and radiologically confirmed hippocampal damage scored in the valid performance range on the recognition trials of the Word Memory Test (Goodrich-Hunsaker & Hopkins, 2009). Similarly, psychiatric disorders have not been found to impact PVT scores, including depression (Rees, Tombaugh, & Boulay, 2001), depression and anxiety (Ashendorf, Constantinou, & McCaffrey, 2004), and depression with chronic pain (Iverson, Le Page, Koehler, Shojania, & Badii, 2007).
Illness behavior and diagnosis threat do not appear to impact PVT scores. Acute pain (cold pressor) has no impact on performance on Reliable Digit Span (Etherton, Bianchini, Ciota, & Greve, 2005) or on the TOMM (Etherton, Bianchini, Greve, & Ciota, 2005). Suhr and Gunstad (2005) did not find differences on the WMT for mild TBI subjects in the diagnosis threat condition compared to those in the non-threat group. Arguments that neurological mechanisms related to drive and motivation underlie PVT performance are not supported, given that PVT profiles are typically valid in patients with significant neurologic disease due to diverse causes; these tasks require little in the way of effort, drive, motivation or, as mentioned, general neurocognitive capacity. That is, performance is usually near ceiling even in the context of severe, objectively verified cerebral disorders. For example, on TOMM Trial 2, 21 aphasic patients averaged 98.7% correct, and 22 TBI patients (range of 1 day to 3 months of coma) averaged 98.2% correct; indeed, one patient with a gunshot wound, right frontal lobectomy, and 38 days of coma scored 100% on Trial 2 (Tombaugh, 1996). In this vein, a patient with significant abulia due to severe brain trauma, who would almost certainly require 24-hr supervision and be minimally testable from a neuropsychological standpoint, would not warrant SVT or PVT assessment. In such a patient, there would of course be a legitimate concern about false positive errors on PVTs. As with any mental test, consideration of context is necessary and expected.

One of Bigler's major concerns, the "near pass" (i.e., performance falling just within the invalid range on a PVT), is not restricted to PVT investigations; it is a pervasive concern in the field of assessment. One's child does or does not reach the cutoff to qualify for the gifted class or for help for a learning disability.
One's neuropsychological test score does or does not reach a particular level of normality/abnormality (Heaton, Miller, Taylor, & Grant, 2004). Current PVT research focuses on avoiding the error of misidentifying as invalid the performance of a patient with a bona fide condition who is actually producing a valid performance. Moreover, there is a strong statistical argument for keeping false positive errors at a minimum: Positive Predictive Power (PPP), or the probability of a diagnosis, is more dependent upon Specificity (accurately diagnosing a person without the target disorder as not having the disorder) than Sensitivity (correctly identifying persons with the target disorder as having the disorder; see Straus, Richardson, Glasziou, & Haynes, 2005). Since the basic formula for PPP is (True Positives) ÷ (True Positives + False Positives), the PVT investigator attempts to keep false positives at a minimum. As noted in the previous meta-analysis (Vickery, Berry, Inman, Harris, & Orey, 2001) as well as in recent reviews

(Boone, 2007; Larrabee, 2007), false positives are typically 10% or less, with much lower sensitivities (56% per Vickery et al., 2001). Researchers also advocate reporting the characteristics of subjects identified as false positive cases in individual PVT investigations (Larrabee, Greiffenstein, Greve, & Bianchini, 2007; also see Victor, Boone, Serpa, Buehler, & Ziegler, 2009). This clarifies the characteristics of those patients with truly severe mental disorders who fail PVTs on the basis of actual impairment. This information provides the clinician with concrete injury/clinical fact patterns that legitimately correlate with PVT failure, thereby facilitating individual comparisons on a case-by-case basis (e.g., coma and structural lesions in the brain, Larrabee, 2003a; unequivocally severe and obvious neurologic symptoms, Merten, Bossink, & Schmand, 2007; or need for 24-hr supervision, Meyers & Volbrecht, 2003). Authors of PVTs have also included comparison groups with various neurologic, psychiatric and developmental conditions to further reduce the chances of false positive identification on an individual PVT (Boone, Lu, & Herzberg, 2002a, 2002b; Tombaugh, 1996).

PVTs have two applications in current neuropsychological practice: (1) screening data for a research investigation to remove the effects of invalid performance (see Larrabee, Millis, & Meyers, 2008), and (2) evaluation of an individual patient to determine whether that patient's performance is valid and, forensically, to address the issue of malingering. Concerns about false positives are of far greater import in the second application, since there really is no consequence to the patient whose data are excluded from clinical research. Concerns about false positive identification ("near passes") in the individual case are addressed by the diagnostic criteria for Malingered Neurocognitive Dysfunction (MND; Slick, Sherman, & Iverson, 1999). Slick et al.
define malingering as the volitional exaggeration or fabrication of cognitive dysfunction for the purpose of obtaining substantial material gain, or avoiding or escaping legally mandated formal duty or responsibility. The criteria for MND require a substantial external incentive (e.g., litigation, criminal prosecution) and multiple sources of evidence from behavior (e.g., PVTs) and symptom report (e.g., SVTs) to define probable malingering, whereas significantly worse-than-chance performance defines definite malingering. Moreover, these sources of evidence must not be the product of neurological, psychiatric or developmental factors (note the direct relevance of this last criterion to the issue of false positives). The Slick et al. criteria for MND have led to extensive subsequent research using these criteria for known-groups investigations of the detection of malingering (Boone, 2007; Larrabee, 2007; Morgan & Sweet, 2009). These criteria have also influenced the development of criteria for Malingered Pain-Related Disability (MPRD; Bianchini, Greve, & Glynn, 2005). As my colleagues and I have pointed out (Larrabee et al., 2007), the diagnostic criteria for MND and MPRD share key features: (1) the requirement for a substantial external incentive, (2) the requirement for multiple indicators of performance invalidity or symptom exaggeration, and (3) test performance and symptom report patterns that are atypical in pattern and degree for bona fide neurologic, psychiatric or developmental disorders. It is the combined improbability of findings, in the context of external incentive, without any viable alternative explanation, that establishes the intent of the examinee to malinger (Larrabee et al., 2007). Research using the Slick et al. MND criteria shows the value of requiring multiple failures on PVTs and SVTs to determine probabilities of malingering in contexts with substantial external incentives.
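The statistical argument above, that PPP depends more on specificity than on sensitivity, can be made concrete with a short sketch. The numbers below are illustrative only (they are not drawn from any of the cited studies); the base rate of .40 follows the figure used later in this article.

```python
# Illustrative sketch: Positive Predictive Power (PPP) from sensitivity,
# specificity, and base rate. PPP = TP / (TP + FP); expressed as proportions,
# PPP = sens * p / (sens * p + (1 - spec) * (1 - p)).

def ppp(sensitivity: float, specificity: float, base_rate: float) -> float:
    true_pos = sensitivity * base_rate            # proportion correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - base_rate)  # proportion wrongly flagged
    return true_pos / (true_pos + false_pos)

# Holding sensitivity fixed at .50, raising specificity sharply raises PPP,
# which is why PVT research prioritizes minimizing false positives.
for spec in (0.80, 0.90, 0.99):
    print(f"spec={spec:.2f} -> PPP={ppp(0.50, spec, 0.40):.3f}")
```

With these invented values, PPP climbs from roughly .63 at 80% specificity to above .97 at 99% specificity, while the same change in sensitivity would move PPP far less.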
I (Larrabee, 2003a) demonstrated that requiring failure of two embedded/derived PVTs and/or SVTs resulted in a sensitivity of .875 and a specificity of .889 for discriminating litigants (primarily with uncomplicated MTBI) performing significantly worse than chance from clinical patients with moderate and severe TBI. Requiring that patients fail 3 or more PVTs and SVTs resulted in a sensitivity of .542 but a specificity of 1.00 (i.e., there were no false positives). These data were replicated by Victor et al. (2009), using a different set of embedded/derived indicators in a similar research design, yielding a sensitivity of .838 and specificity of .939 for failure of any two PVTs, and a sensitivity of .514 and specificity of .985 for failure of three or more PVTs. The drop in false alarm rate and increase in specificity in going from two to three failed PVTs/SVTs is directly related to the PPP of malingering, as demonstrated by the methodology of chaining likelihood ratios (Grimes & Schulz, 2005). The positive likelihood ratio is defined by the ratio of sensitivity to the false positive rate. Hence, a score falling at a particular PVT cutoff that has an associated sensitivity of .50 and specificity of .90 would yield a likelihood ratio of .50 ÷ .10, or 5.0. If this is then premultiplied by the base rate odds of malingering (assuming a malingering base rate of .40, per Larrabee, Millis, & Meyers, 2009, the base rate odds are (base rate) ÷ (1 − base rate), or (.40) ÷ (1 − .40), or .67), this value becomes .67 × 5.0, or 3.35. This can be converted back to a probability of malingering by the formula (odds) ÷ (odds + 1), in this case 3.35 ÷ 4.35, or .77. If the indicators are independent, they can be chained, so that the post-test odds obtained after premultiplying one indicator by the base rate odds become the new pretest odds by which a second independent indicator is multiplied.
Thus, if a second PVT is failed at the same cutoff, yielding sensitivity of .50 and specificity of .90, this yields a second likelihood ratio of 5.0, which is now multiplied by the post-test odds of 3.35 obtained after failure of the first indicator. This yields new post-test odds of 16.75, which can be converted back to a probability by dividing 16.75 by 17.75 to yield a probability of malingering of .94, in settings with substantial external incentive. The interested reader is referred to a detailed explanation of this methodology (Larrabee, 2008). The method of chaining likelihood ratios shows how the probability of confidently determining malingered performance is enhanced by requiring multiple PVT and SVT failure, consistent with other results (Larrabee, 2003a; Victor et al., 2009). Boone and Lu (2003) make a related point regarding the decline in false positive rate when several independent tests, each with a false positive rate of .10, are used: failure of two PVTs yields a probability (false positive rate) of .01 (.1 × .1), whereas failure of

three PVTs yields a probability of .001 (.1 × .1 × .1), and failure of six PVTs yields a probability as low as one in a million (.1 × .1 × .1 × .1 × .1 × .1). Said differently, the standard of multiple PVT and SVT failure protects against false positive diagnostic errors. Per Boone and Lu (2003), Larrabee (2003a, 2008), and Victor et al. (2009), failure of three independent PVTs is associated with essentially no false positive errors, a highly compelling empirically based conclusion in the context of any form of diagnostic testing.

PVT performance can vary in persons identified with multiple PVT failures and should be assessed throughout an examination (Boone, 2009). Malingering can lower performance by as much as 1 to 2 SD on select sensitive tests of memory and processing speed (Larrabee, 2003a), and PVT failure can lower the overall test battery mean (Green et al., 2001) by over 1 SD. In the presence of malingering, poor performances are more likely the result of intentional underperformance, particularly in conditions such as uncomplicated mild TBI in which pronounced abnormalities are unexpected (McCrea et al., 2009), and normal-range performances themselves are likely underestimates of actual level of ability.

Last, there is a lengthy history of strong experimental design in PVT and SVT investigations. Research designs in malingering were discussed over 20 years ago in Rogers' first book (Rogers, 1988). The two most rigorous and clinically relevant designs are the simulation design and the known-groups or criterion-group design (Heilbronner et al., 2009). In the simulation design, a non-injured group of subjects is specifically instructed to feign deficits on PVTs, SVTs, and neuropsychological ability tests; their scores are then contrasted with scores produced by a group of persons with bona fide disorders, usually patients with moderate or severe TBI.
The resulting patterns discriminate known feigning from the legitimate performance profiles associated with moderate and severe TBI, thereby minimizing false positive diagnosis in the TBI group. The disadvantage is that questions arise as to how well the performance of non-injured persons feigning deficit generalizes to that of actual malingerers, who have significant external incentives, for example, millions at stake in a personal injury claim. The second design, criterion groups, contrasts the performance of a group of litigating subjects, usually those with alleged non-complicated mild TBI, who have failed multiple PVTs and SVTs, commonly using the Slick et al. MND criteria, with a group of clinical patients, typically with moderate and severe TBI. This has the advantage of using a group with real-world incentives that is unlikely to have significant neurological damage and persistent neuropsychological deficits (McCrea et al., 2009), holding false positive errors at a minimum by determining performance patterns that are not characteristic of moderate and severe TBI. Although random assignment cannot be used for the simulation and criterion-group designs just described, these designs are appropriate for case-control comparisons. PVT and SVT research using simulation and criterion-group designs has, for the most part, yielded very consistent and replicable findings. For example, Heaton, Smith, Lehman, and Vogt (1978) reported an average dominant-plus-non-dominant Finger Tapping score of 63.1 for a sample of simulators, essentially identical to the score of 63.0 for the simulators in Mittenberg, Rotholc, Russell, and Heilbronner (1996).
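In a criterion-groups design, a cutting score of this kind is evaluated by computing sensitivity in the criterion (e.g., definite MND) group and specificity in the clinical comparison group. A minimal sketch follows; the scores, group sizes, and resulting hit rates are entirely invented for illustration and are not data from any cited study.

```python
# Hypothetical illustration of evaluating a cutting score in a
# criterion-groups design: sensitivity in the "definite MND" group,
# specificity in the moderate/severe TBI comparison group.
# All scores below are invented for illustration only.

def classify_below(scores: list[float], cutoff: float) -> float:
    """Proportion of scores falling below the cutoff (flagged as invalid)."""
    return sum(s < cutoff for s in scores) / len(scores)

definite_mnd = [45, 52, 58, 60, 66, 70, 48, 55]   # invented litigant scores
moderate_tbi = [64, 71, 80, 59, 75, 68, 73, 77]   # invented clinical scores

cutoff = 63  # e.g., a dominant-plus-non-dominant Finger Tapping cutoff
sensitivity = classify_below(definite_mnd, cutoff)        # hit rate in MND group
specificity = 1.0 - classify_below(moderate_tbi, cutoff)  # correct passes in TBI group
print(sensitivity, specificity)
```

Replication across independent samples, as described above, amounts to obtaining nearly the same optimal cutoff when this procedure is repeated on new criterion and comparison groups.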
In a criterion-groups design, I reported an optimal dominant-plus-non-dominant-hand Finger Tapping score of less than 63 for discriminating subjects with definite MND from patients with moderate or severe TBI (Larrabee, 2003a), which was identical to the cutting score one would obtain by combining the male and female subjects in the criterion-groups investigation of Arnold et al. (2005). In a criterion-groups design, I (Larrabee, 2003b) reported optimal MMPI-2 FBS Symptom Validity cutoffs of 21 or 22 for discriminating subjects with definite MND from those with moderate or severe TBI, identical to the value of 21 or 22 for discriminating subjects with probable MND from patients with moderate or severe TBI reported by Ross, Millis, Krukowski, Putnam, and Adams (2004). As already noted, Victor et al. (2009) obtained very similar sensitivities and specificities for failure of any two, or any three or more, PVTs to the values I obtained for failure of any two, or three or more, PVTs or SVTs (Larrabee, 2003a). My colleagues and I have relied upon the similarity of findings in simulation and criterion-group designs to link together research supporting the psychometric basis of the MND criteria (Larrabee et al., 2007). The similarity of findings on individual PVTs for simulators and for litigants with definite MND (defined by worse-than-chance performance) demonstrates that worse-than-chance performance reflects intentional underperformance; in other words, the definite MND subjects performed identically to non-injured persons feigning impairment, who are known to be intentionally underperforming because they have been instructed to do so. Additionally, the PVT and neuropsychological test performance of persons with probable MND (defined by multiple PVT failure, independent of the particular PVT or neuropsychological test data being compared) did not differ from that of persons with definite MND, establishing the validity of the probable MND criteria.
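The chaining of likelihood ratios described earlier (a likelihood ratio of 5.0 premultiplied by base-rate odds of about .67, then by each further independent indicator) can be reproduced in a few lines. This is only a restatement of the worked arithmetic in the text, not a validated diagnostic procedure; the sensitivity, specificity, and base rate are the article's example values.

```python
# Chaining likelihood ratios, as summarized in the text (see Larrabee, 2008).
# LR+ = sensitivity / (1 - specificity); posterior odds = prior odds * LR+,
# applied once per independent failed indicator.

def chain_probability(base_rate: float, likelihood_ratios: list[float]) -> float:
    odds = base_rate / (1.0 - base_rate)   # prior (base-rate) odds of malingering
    for lr in likelihood_ratios:           # one LR per independent failed PVT/SVT
        odds *= lr
    return odds / (odds + 1.0)             # convert odds back to a probability

lr = 0.50 / (1.0 - 0.90)                   # sensitivity .50, specificity .90 -> LR of 5.0
print(chain_probability(0.40, [lr]))       # ~.77 after one failed indicator
print(chain_probability(0.40, [lr, lr]))   # ~.94 after two failed indicators

# Boone and Lu (2003): multiplying independent false positive rates of .10
print(0.1 ** 2, 0.1 ** 3, 0.1 ** 6)        # two, three, and six failed PVTs
```

Carrying exact fractions rather than the rounded .67 gives .769 and .943, matching the article's rounded figures of .77 and .94.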
Last, the paper by Bianchini, Curtis, and Greve (2006), showing a dose-effect relationship between PVT failure and amount of external incentive, supports the conclusion that intent is causally related to PVT failure.

In closing, the science behind measures of performance and symptom validity is rigorous, well developed, replicable, and specifically focused on keeping false positive errors at a minimum. I have also argued for a change in terminology that may reduce some of the confusion in this area, recommending the use of Performance Validity Test (PVT) for measures directed at assessing the validity of a person's performance, and Symptom Validity Test (SVT) for measures directed at assessing the validity of a person's symptomatic complaint.

ACKNOWLEDGMENT

This manuscript has not been previously published electronically or in print. Portions of this study were presented as a debate at the 38th Annual International Neuropsychological Society meeting, "Admissibility and Appropriate Use of Symptom Validity Science in Forensic Consulting," in Acapulco, Mexico, February 2010,

moderated by Paul M. Kaufmann, J.D., Ph.D. Dr. Larrabee is engaged in the full-time independent practice of clinical neuropsychology with a primary emphasis in forensic neuropsychology. He is the editor of Assessment of Malingered Neuropsychological Deficits (2007, Oxford University Press) and Forensic Neuropsychology: A Scientific Approach (2nd Edition, 2012, Oxford University Press), and receives royalties from the sale of these books.

REFERENCES

Arnold, G., Boone, K.B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPherson, S. (2005). Sensitivity and specificity of Finger Tapping Test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120.

Ashendorf, L., Constantinou, M., & McCaffrey, R.J. (2004). The effect of depression and anxiety on the TOMM in community-dwelling older adults. Archives of Clinical Neuropsychology, 19, 125–130.

Bianchini, K.J., Curtis, K.L., & Greve, K.W. (2006). Compensation and malingering in traumatic brain injury: A dose-response relationship? The Clinical Neuropsychologist, 20, 831–847.

Bianchini, K.J., Greve, K.W., & Glynn, G. (2005). On the diagnosis of malingered pain-related disability: Lessons from cognitive malingering research. Spine Journal, 5, 404–417.

Bigler, E.D., Kaufmann, P.M., & Larrabee, G.J. (2010, February). Admissibility and appropriate use of symptom validity science in forensic consulting. Talk presented at the 38th Annual Meeting of the International Neuropsychological Society, Acapulco, Mexico.

Boone, K.B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford.

Boone, K.B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729–741.

Boone, K.B., & Lu, P.H. (2003). Noncredible cognitive performance in the context of severe brain injury. The Clinical Neuropsychologist, 17, 244–254.
Boone, K.B., Lu, P., & Herzberg, D.S. (2002a). The b Test: Manual. Los Angeles, CA: Western Psychological Services.
Boone, K.B., Lu, P., & Herzberg, D.S. (2002b). The Dot Counting Test: Manual. Los Angeles, CA: Western Psychological Services.
Brulot, M.M., Strauss, E., & Spellacy, F. (1997). Validity of the Minnesota Multiphasic Personality Inventory-2 correction factors for use with patients with suspected head injury. The Clinical Neuropsychologist, 11, 391–401.
Etherton, J.L., Bianchini, K.J., Ciota, M.A., & Greve, K.W. (2005). Reliable Digit Span is unaffected by laboratory-induced pain: Implications for clinical use. Assessment, 12, 101–106.
Etherton, J.L., Bianchini, K.J., Greve, K.W., & Ciota, M.A. (2005). Test of Memory Malingering performance is unaffected by laboratory-induced pain: Implications for clinical use. Archives of Clinical Neuropsychology, 20, 375–384.
Fox, D.D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25, 488–495.
Gervais, R.O., Ben-Porath, Y.S., Wygant, D.B., & Green, P. (2008). Differential sensitivity of the Response Bias Scale (RBS) and MMPI-2 validity scales to memory complaints. The Clinical Neuropsychologist, 22, 1061–1079.
Goodrich-Hunsaker, N.J., & Hopkins, R.O. (2009). Word Memory Test performance in amnesic patients with hippocampal damage. Neuropsychology, 23, 529–534.
Green, P. (2007). The pervasive influence of effort on neuropsychological tests. Physical Medicine and Rehabilitation Clinics of North America, 18, 43–68.
Green, P., Rohling, M.L., Iverson, G.L., & Gervais, R.O. (2003). Relationships between olfactory discrimination and head injury severity. Brain Injury, 17, 479–496.
Green, P., Rohling, M.L., Lees-Haley, P.R., & Allen, L.M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060.
Greiffenstein, M.F. (2007). Motor, sensory, and perceptual-motor pseudoabnormalities.
In G.J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 100–130). New York: Oxford University Press.
Greiffenstein, M.F., & Baker, W.J. (2003). Premorbid clues? Preinjury scholastic performance and present neuropsychological functioning in late postconcussion syndrome. The Clinical Neuropsychologist, 17, 561–573.
Greiffenstein, M.F., Baker, W.J., & Gola, T. (1996). Motor dysfunction profiles in traumatic brain injury and post-concussion syndrome. Journal of the International Neuropsychological Society, 2, 477–485.
Grimes, D.A., & Schulz, K.F. (2005). Epidemiology 3: Refining clinical diagnosis with likelihood ratios. Lancet, 365, 1500–1505.
Hanninen, T., Reinikainen, K.J., Helkala, E.-L., Koivisto, K., Mykkanen, L., Laakso, M., … Riekkinen, R.J. (1994). Subjective memory complaints and personality traits in normal elderly subjects. Journal of the American Geriatric Society, 42, 1–4.
Heaton, R.K., Miller, S.W., Taylor, M.J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Professional manual. Lutz, FL: Psychological Assessment Resources.
Heaton, R.K., Smith, H.H., Jr., Lehman, R.A., & Vogt, A.J. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46, 892–900.
Heilbronner, R.L., Sweet, J.J., Morgan, J.E., Larrabee, G.J., Millis, S.R., & Conference Participants (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
Iverson, G.L., Le Page, J., Koehler, B.E., Shojania, K., & Badii, M. (2007). Test of Memory Malingering (TOMM) scores are not affected by chronic pain or depression in patients with fibromyalgia. The Clinical Neuropsychologist, 21, 532–546.
Larrabee, G.J. (2003a).
Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G.J. (2003b). Detection of symptom exaggeration with the MMPI-2 in litigants with malingered neurocognitive dysfunction. The Clinical Neuropsychologist, 17, 54–68.
Larrabee, G.J. (Ed.) (2007). Assessment of malingered neuropsychological deficits. New York: Oxford University Press.
Larrabee, G.J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.
Larrabee, G.J. (2012). Assessment of malingering. In G.J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed., pp. 116–159). New York: Oxford University Press.
Larrabee, G.J., Greiffenstein, M.F., Greve, K.W., & Bianchini, K.J. (2007). Refining diagnostic criteria for malingering. In G.J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 334–371). New York: Oxford University Press.

Larrabee, G.J., & Levin, H.S. (1986). Memory self-ratings and objective test performance in a normal elderly sample. Journal of Clinical and Experimental Neuropsychology, 8, 275–284.
Larrabee, G.J., Millis, S.R., & Meyers, J.E. (2008). Sensitivity to brain dysfunction of the Halstead-Reitan vs. an ability-focused neuropsychological battery. The Clinical Neuropsychologist, 22, 813–825.
Larrabee, G.J., Millis, S.R., & Meyers, J.E. (2009). 40 plus or minus 10, a new magical number: Reply to Russell. The Clinical Neuropsychologist, 23, 746–753.
McCrea, M., Iverson, G.L., McAllister, T.W., Hammeke, T.A., Powell, M.R., Barr, W.B., & Kelly, J.P. (2009). An integrated review of recovery after mild traumatic brain injury (MTBI): Implications for clinical management. The Clinical Neuropsychologist, 23, 1368–1390.
Merten, T., Bossink, L., & Schmand, B. (2007). On the limits of effort testing: Symptom validity tests and severity of neurocognitive symptoms in nonlitigant patients. Journal of Clinical and Experimental Neuropsychology, 29, 308–318.
Meyers, J.E., & Volbrecht, M.E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Mittenberg, W., Azrin, R., Millsaps, C., & Heilbronner, R. (1993). Identification of malingered head injury on the Wechsler Memory Scale-Revised. Psychological Assessment, 5, 34–40.
Mittenberg, W., Rotholc, A., Russell, E., & Heilbronner, R. (1996). Identification of malingered head injury on the Halstead-Reitan Battery. Archives of Clinical Neuropsychology, 11, 271–281.
Morgan, J.E., & Sweet, J.J. (Eds.) (2009). Neuropsychology of malingering casebook. New York: Psychology Press.
Rees, L.M., Tombaugh, T.N., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506.
Rogers, R. (1988). Researching dissimulation. In R. Rogers (Ed.), Clinical assessment of malingering and deception (pp.
309–327). New York: Guilford Press.
Rohling, M.L., Larrabee, G.J., Greiffenstein, M.F., Ben-Porath, Y.S., Lees-Haley, P., Green, P., & Greve, K.W. (2011). A misleading review of response bias: Comment on McGrath, Mitchell, Kim, and Hough (2010). Psychological Bulletin, 137, 708–712.
Ross, S.R., Millis, S.R., Krukowski, R.A., Putnam, S.H., & Adams, K.M. (2004). Detecting probable malingering on the MMPI-2: An examination of the Fake-Bad Scale in mild head injury. Journal of Clinical and Experimental Neuropsychology, 26, 115–124.
Slick, D.J., Sherman, E.M.S., & Iverson, G.L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Straus, S.E., Richardson, W.S., Glasziou, P., & Haynes, R.B. (2005). Evidence-based medicine: How to practice and teach EBM (3rd ed.). New York: Elsevier Churchill Livingstone.
Suhr, J.A., & Gunstad, J. (2005). Further exploration of the effect of diagnosis threat on cognitive performance in individuals with mild head injury. Journal of the International Neuropsychological Society, 11, 23–29.
Tombaugh, T.N. (1996). TOMM: Test of Memory Malingering. New York: Multi-Health Systems.
Vickery, C.D., Berry, D.T.R., Inman, T.H., Harris, M.J., & Orey, S.A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.
Victor, T.L., Boone, K.B., Serpa, J.G., Buehler, J., & Ziegler, E.A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 23, 297–313.
Williams, J.M., Little, M.M., Scates, S., & Blockman, N. (1987). Memory complaints and abilities among depressed older adults. Journal of Consulting and Clinical Psychology, 55, 595–598.

doi:10.1017/S1355617712000392

DIALOGUE RESPONSE

Response to Bigler

Glenn J.
Larrabee

Bigler (this issue) and I apparently agree about the importance of symptom validity testing and about my recommendation to adopt a new terminology: performance validity to address the validity of performance on measures of ability, and symptom validity to address the validity of symptom report on measures such as the MMPI-2. We appear to differ on issues related to false positives and the rigor of performance and symptom validity research designs. The study by Locke, Smigielski, Powell, and Stevens (2008) is cited by Bigler as demonstrating potential false positive errors due to TOMM scores falling in a near-miss zone just below cutoff. This interpretation suggests a continuum of performance. Review of Bigler's Figure 1 and Locke et al.'s Table 2 shows, however, that the frequency distribution of TOMM scores does not reflect a continuum but rather two discrete distributions: (1) a sample of 68 ranging from 45 to 50 (mean = 49.31, SD = 1.16) and (2) a sample of 19 ranging from 22 to 44 (mean = 35.11, SD = 6.55) [note that Bigler interprets two distributions below 45, but the sample size is too small to establish their presence]. Clearly,

Locke et al. did not view TOMM failures as false positives in their sample. Although Locke et al. found that performance on neurocognitive testing was significantly lower in this group, TOMM failure was not related to severity of brain injury, depression, or anxiety; only disability status predicted TOMM failure. They concluded: "This study suggests that reduced effort occurs outside forensic settings, is related to neuropsychometric performance, and urges further research into effort across various settings" (p. 273). As previously noted in my primary review, several factors minimize the significance of false positive errors. First, scores reflecting invalid performance are atypical in pattern or degree for bona fide neurological disorder. Second, cutoff scores are typically set to keep false positive errors at or below 10%. Third, investigators are encouraged to specify the characteristics of bona fide clinical patients who fail PVTs, representing false positives, to enhance the clinical use of the PVT in the individual case. Fourth, appropriate use of PVTs in the individual case requires the presence of multiple abnormal scores on independent PVTs, occurring in the context of external incentive, with no compelling neurologic, psychiatric, or developmental explanation for PVT failure, before one can conclude the presence of malingering (cf. Slick, Sherman, & Iverson, 1999). Bigler also criticizes the research in this area as being, at best, Class III level research (American Academy of Neurology, AAN; Edlund, Gronseth, So, & Franklin, 2004), noting that the research is typically retrospective, using samples of convenience, with study authors not blind to group assignment. Review of the AAN guidelines, however, shows that retrospective investigations using case control designs can meet Class II standards (p. 20).
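The aggregation logic behind the fourth point, which Larrabee (2008) frames in terms of likelihood ratios (see also Grimes & Schulz, 2005), can be illustrated with a minimal sketch of the underlying Bayesian arithmetic. The base rate, sensitivity, and false positive rate below are purely hypothetical, chosen only to show how independent PVT failures compound; the function name is ours, not from the cited papers.

```python
def posterior_probability(base_rate, likelihood_ratios):
    """Chain likelihood ratios over independent indicators:
    posterior odds = prior odds x LR1 x LR2 x ..."""
    odds = base_rate / (1.0 - base_rate)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Hypothetical PVT with sensitivity .50 and a false positive rate
# capped at .10 gives a positive likelihood ratio of .50/.10 = 5.
# Two independent failures against a 30% base rate:
p = posterior_probability(0.30, [5.0, 5.0])
print(round(p, 3))  # -> 0.915
```

The sketch also shows why the independence requirement matters: chaining likelihood ratios from redundant indicators would overstate the posterior probability.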
Moreover, there is no requirement for "masked or independent assessment, if the reference standards for presence of disorder and the diagnostic tests are objective" (italics added). The majority of studies cited in recent reviews (Boone, 2007; Larrabee, 2007; Morgan & Sweet, 2009) follow case control designs contrasting either non-injured simulators or criterion/known groups of definite or probable malingerers, classified using objective test criteria from Slick et al. (1999), with groups of clinical patients with significant neurologic disorder (usually moderate/severe TBI) and/or psychiatric disorder (i.e., major depressive disorder). As such, these investigations would meet AAN Class II criteria. In my earlier review in this dialogue, I described a high degree of reproducibility of results in performance and symptom validity research. Additionally, the effect sizes generated by this research are uniformly large, for example, d = −1.34 for Reliable Digit Span (Jasinski, Berry, Shandera, & Clark, 2011); d = .96 for the MMPI-2 FBS (Nelson, Sweet, & Demakis, 2006), replicated at d = .95 incorporating 43 new studies (Nelson, Hoelzle, Sweet, Arbisi, & Demakis, 2010); and d = 2.02 for the two-alternative forced-choice Digit Memory Test (Vickery, Berry, Inman, Harris, & Orey, 2001). These effect sizes exceed those reported for several psychological and medical tests (Meyer et al., 2001). Effect sizes of this magnitude are striking, considering that the discrimination is between feigned performance and legitimate neuropsychological abnormalities, rather than between feigned performance and normal performance. Reproducible results and large effect sizes cannot occur without rigorous experimental design.

REFERENCES

Bigler, E.D. (2012). Symptom validity testing, effort, and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 000–000.
Boone, K.B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford.
Edlund, W., Gronseth, G., So, Y., & Franklin, G. (2004). Clinical practice guideline process manual: For the Quality Standards Subcommittee (QSS) and the Therapeutics and Technology Assessment Subcommittee (TTA). St. Paul: American Academy of Neurology.
Jasinski, L.J., Berry, D.T.R., Shandera, A.L., & Clark, J.A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33, 300–314.
Larrabee, G.J. (Ed.) (2007). Assessment of malingered neuropsychological deficits. New York: Oxford University Press.
Locke, D.E.C., Smigielski, J.S., Powell, M.R., & Stevens, S.R. (2008). Effort issues in post-acute outpatient acquired brain injury rehabilitation seekers. NeuroRehabilitation, 23, 273–281.
Meyer, G.J., Finn, S.E., Eyde, L.D., Kay, G.G., Moreland, K.L., Dies, R.R., … Reed, G.M. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56, 128–165.
Morgan, J.E., & Sweet, J.J. (Eds.) (2009). Neuropsychology of malingering casebook. New York: Psychology Press.
Nelson, N.W., Hoelzle, J.B., Sweet, J.J., Arbisi, P.A., & Demakis, G.J. (2010). Updated meta-analysis of the MMPI-2 Symptom Validity Scale (FBS): Verified utility in forensic practice. The Clinical Neuropsychologist, 24, 701–724.
Nelson, N.W., Sweet, J.J., & Demakis, G.J. (2006). Meta-analysis of the MMPI-2 Fake Bad Scale: Utility in forensic practice. The Clinical Neuropsychologist, 20, 39–58.
Slick, D.J., Sherman, E.M.S., & Iverson, G.L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Vickery, C.D., Berry, D.T.R., Inman, T.H., Harris, M.J., & Orey, S.A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.

Journal of the International Neuropsychological Society (2012), 18, 1–11. Copyright © INS. Published by Cambridge University Press, 2012. doi:10.1017/S1355617712000252

DIALOGUE

Symptom Validity Testing, Effort, and Neuropsychological Assessment

Erin D. Bigler
Department of Psychology, Brigham Young University, Provo, Utah; Neuroscience Center, Brigham Young University, Provo, Utah; Department of Psychiatry, University of Utah, Salt Lake City, Utah; The Brain Institute of Utah, University of Utah, Salt Lake City, Utah

(RECEIVED November 14, 2011; FINAL REVISION February 4, 2012; ACCEPTED February 8, 2012)

Abstract: Symptom validity testing (SVT) has become a major theme of contemporary neuropsychological research. However, many issues about the meaning and interpretation of SVT findings will require the best in research design and methods to more precisely characterize what SVT tasks measure and how SVT test findings are to be used in neuropsychological assessment. Major clinical and research issues are overviewed, including the use of the effort term to connote validity of SVT performance, the use of cut-scores, the absence of lesion-localization studies in SVT research, neuropsychiatric status and SVT performance, and the rigor of SVT research designs. Case studies that demonstrate critical issues involving SVT interpretation are presented. (JINS, 2012, 18, 1–11)

Keywords: Symptom validity testing, SVT, Effort, Response bias, Validity

Symptom validity testing (SVT) has emerged as a major theme of neuropsychological research and clinical practice. Neuropsychological assessment methods and procedures strive for and require the most valid and reliable techniques to assess cognitive and neurobehavioral functioning, to make neuropsychological inferences and diagnostic conclusions. From the beginnings of neuropsychology as a discipline, issues of test reliability and validity have always been a concern (Filskov & Boll, 1981; Lezak, 1976).
However, neuropsychology's initial focus was mostly on test development, standardization, and the psychometric properties of a test, and not on independent measures of test validity. A variety of SVT methods are now available (Larrabee, 2007). While contemporary neuropsychological test development has begun to more directly incorporate SVT indicators embedded within the primary neuropsychological instrument (Bender, Martin Garcia, & Barr, 2010; Miller et al., 2011; Powell, Locke, Smigielski, & McCrea, 2011), traditional neuropsychological test construction and the vast majority of standardized tests currently in use do not. Current practice has been to use what are referred to as stand-alone SVT measures that are separately administered during the neuropsychological examination (Sollman & Berry, 2011). SVT performance is then used to infer validity, or lack thereof, for the battery of all neuropsychological tests administered during that test session. The growth in SVT research has been exponential. Using the search words "symptom validity testing" in a National Library of Medicine literature search yielded only one study before 1980, five articles during the 1980s, but hundreds thereafter. SVT research of the last decade has led to important practice conclusions as follows: (1) professional societies endorse SVT use (Bush et al., 2005; Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009), (2) passing a SVT implies valid performance, (3) SVT measures have good face validity as cognitive measures; all have components that are easy for healthy controls and even for the majority of neurological patients to pass with few or no errors, and (4) groups that perform below established cut-score levels on a SVT generally exhibit lower neuropsychological test scores.

(Correspondence and reprint requests to: Erin D. Bigler, Department of Psychology and Neuroscience Center, 1001 SWKT, Brigham Young University, Provo, UT 84602. E-mail: erin_bigler@byu.edu)
This last observation has been interpreted as demonstrating that SVT performance taps a dimension of effort to perform, where SVT failure reflects non-neurological factors that reduce neuropsychological test scores and invalidate findings

(West, Curtis, Greve, & Bianchini, 2011). On forced-choice SVTs, the statistical improbability of below-chance performance implicates malingering (the examinee knows the correct answer but selects the incorrect one to feign impairment). A quote from Millis (2009) captures the importance of why neuropsychology must address the validity of test performance: "All cognitive tests require that patients give their best effort (italics added) when completing them. Furthermore, cognitive tests do not directly measure cognition: they measure behavior from which we make inferences about cognition. People are able to consciously alter or modify their behavior, including their behavior when performing cognitive tests. Ostensibly poor or impaired test scores will be obtained if an examinee withholds effort (e.g., reacting slowly to reaction time tests). There are many reasons why people may fail to give best effort on cognitive testing: financial compensation for personal injury; disability payments; avoiding or escaping formal duty or responsibilities (e.g., prison, military, or public service, or family support payments or other financial obligations); or psychosocial reinforcement for assuming the sick role (Slick, Sherman, & Iverson, 1999). … Clinical observation alone cannot reliably differentiate examinees giving best effort from those who are not." (Millis & Volinsky, 2001, p. 2409)

This short review examines key SVT concepts and the effort term as used in neuropsychology. Effort seems to capture a clinical descriptive of patient test performance that, at first blush, seems straightforward enough. However, effort also has neurobiological underpinnings, a perspective often overlooked in SVT research and clinical application. Furthermore, does the term effort suggest intention, such as genuine effort (the patient is trying their best)?
Or, if maximum effort is not being applied to test performance or exhibited by the patient, when does it reflect performance that may not be trustworthy? What is meant by effort?

EFFORT: ITS MULTIPLE MEANINGS

In the SVT literature, when the effort term is linked with other nouns, verbs, and/or adjectives, such statements appear to infer or explain a patient's mental state, including motivation. For example, it is common to see commentaries in the SVT literature referring to cognitive effort, mental effort, insufficient effort, poor effort, invalid effort, or even faked effort. Some of these terms suggest that neuropsychological techniques, and in particular the SVT measure itself, are capable of inferring intent. Can they (see discussion by Dressing, Widder, & Foerster, 2010)? There are additional terms often used to describe SVT findings, including response bias, disingenuous, dissimulation, non-credible, malingered, or non- or sub-optimal, further complicating SVT nomenclature. There is no agreed-upon consensus definition within neuropsychology of what effort means.

Fig. 1. The distribution of Test of Memory Malingering (TOMM) scores that were above (green) the cut-point of 45 compared to those who performed below it (blue-red). Note the bi-modal distribution of those who failed, compared to the peaked performance of 50/50 correct by those who pass. Below-chance responding is the most accepted SVT feature indicative of malingering. As indicated by the shading of blue emerging into red, performance approaching chance likely reflects a considerable likelihood of malingering. However, recalling that all of these patients had some form of ABI, those in blue are much closer to the cut-point, raising the question of whether their neurological condition may contribute to their SVT performance.
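The below-chance criterion is simply a binomial tail probability: random guessing on a two-alternative forced-choice SVT yields about 50% correct, so scoring far below that is statistically improbable unless the examinee deliberately selects wrong answers. A minimal sketch, assuming a 50-item test with chance probability .5 as on TOMM Trial 2 (the function name is ours):

```python
from math import comb

def p_at_or_below(correct, n_items=50, p_chance=0.5):
    """P(X <= correct) for X ~ Binomial(n_items, p_chance):
    the chance of doing this badly, or worse, by guessing alone."""
    return sum(
        comb(n_items, k) * p_chance**k * (1.0 - p_chance)**(n_items - k)
        for k in range(correct + 1)
    )

# Scoring 15/50 or worse by pure guessing is well under 1% likely,
# which is why markedly below-chance scores implicate intent.
print(p_at_or_below(15))
```

Note that this logic applies only to the clearly below-chance group; the near-pass scores discussed above sit well within the range a neurologically impaired but compliant examinee could produce.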
Below Cut-Score SVT Performance and Neuropsychological Test Findings: The Issue

An exemplary SVT study was conducted by Locke, Smigielski, Powell, and Stevens (2008). This study is singled out for this review because it was performed at an academic medical center (many SVT studies are based on practitioners' clinical cases), had institutional review board (IRB) approval (most SVT studies do not), examined non-forensic cases (no case was in litigation, although some cases had already been judged disabled and were receiving compensation), and was based on consecutive clinical referrals (independently diagnosed with some type of acquired brain injury [ABI] before the neuropsychological assessment), with all patients being seen for treatment recommendations and/or actual treatment. Figure 1 shows the distribution of pass/fail SVT scores, where 21.8% performed below the SVT cut-point [pass ≥ 45/50 items correct on Trial 2 of the Test of Memory Malingering (TOMM); Tombaugh, 1996], which Locke et al. defined as

constituting a group of ABI patients exhibiting sub-optimal effort. Of greater significance for neuropsychology is that the subjects in the failed SVT group, while being matched on all other demographic factors, performed worse and statistically lower across most (but not all) neuropsychological test measures. As seen in Figure 1, the modal response is a perfect or near-perfect TOMM score of 50/50 correct, presumably reflective of valid test performance. The fact that all subjects had ABI previously and independently diagnosed before the SVT was administered, and most performed without error, is a testament to the ease of performing an SVT. For those who scored below the cut-point and thereby failed the SVT measure, a distinct bi-modal distribution emerges. One group, shown in red (see Figure 1), hovers around chance (clearly an invalid performance), but the majority of others who fail, shown in blue, hover much closer to, but obviously below, the SVT cut-point. For the purposes of this review, the distinctly above-chance but below cut-score performing group will be labeled the Near-Pass SVT group. It is with this group that the clinician/researcher is confronted with Type I and II statistical errors when attempting to address validity issues of neuropsychological test findings. All current SVT methods acknowledge that certain neurological conditions may influence SVT performance, but few provide much in the way of guidelines as to how this problem should be addressed.

Type II Error and the Patient With a Near-Pass SVT

Two cases among many seen in our University-based neuropsychology program are representative of the problems with classifying neuropsychological test performance as invalid in the Near-Pass SVT performing patient. [Footnote: It is impossible to discuss SVT measures without discussing some by name. This should not be considered as any type of endorsement or critique of the SVT mentioned. SVT inclusion of named tests in this review occurred merely on the basis of the research being cited.] Figure 2 depicts the post-temporal lobectomy MRI in a patient with intractable epilepsy who underwent a partial right hippocampectomy. Pre-surgery he passes all SVT measures but post-surgery passes some but not others.

Fig. 2. MRI showing partial temporal lobectomy. Pre-surgery: Test of Memory Malingering (TOMM) Trial 1 = 50/50; Trial 2 = 50/50; Test of Neuropsychological Malingering (TNM) = 90% correct (see Hall and Pritchard, 1996). Post-surgery: TOMM Trial 1 = 42/50; Trial 2 = 46/50; Delayed = 44/50; Rey 15-Item = 6/15; Word Memory Test (WMT): IR = 67.5%, 30-min delay = 75%; Free Recall = 5%; Free Recall Delay = 7.5%; Free Recall Long Delay = 0.0.

Figure 3 shows abnormal medial temporal damage in a patient with herpes simplex encephalitis who fails SVT measures.

Fig. 3. The fluid attenuated inversion recovery (FLAIR) sequence magnetic resonance imaging (MRI) was performed in the sub-acute stage, demonstrating marked involvement of the right medial temporal lobe region of this patient (see arrow). The patient was attempting to return to college and was being evaluated for special assistance placement. On the Wechsler Memory Scale (WMS-IV) he obtained the following: Auditory Memory = 87; Visual Memory = 87; Visual Working Memory = 73; Immediate Memory = 86; Delayed Memory = 84. Immediate Recall: 77.5% (Fail); Delayed Recall: 72.5% (Fail); Consistency: 65.0% (Fail); Multiple Choice: 50.0% (Warning); Paired Associate: 50.0% (Warning); Free Recall: 47.5%; Test of Memory Malingering (TOMM): Trial 1: 39; Trial 2: 47.

In both cases, SVT failure is not below chance; both patients have distinct, bona fide, and unequivocal structural damage to critical brain regions involved in memory. Is not their SVT performance a reflection of the underlying damage and its disruption of memory performance? If a neuropsychologist interpreted these test data as invalid because of failed SVT performance, is that not ignoring the obvious and committing a Type II error? So when is failed SVT performance just a reflection of underlying neuropathology? For example, cognitive impairment associated with Alzheimer's disease is sufficient to impair SVT performance, resulting in performance below recommended cut-score levels and therefore constituting

a SVT failure (Merten, Bossink, & Schmand, 2007). However, in such a circumstance impaired neuropsychological performance and the low SVT score are both thought to reflect genuinely impaired cognitive ability.

No Systematic Lesion/Localization Studies of SVT Performance

Neuropsychological assessment has a long tradition of examining lesion effects (or lack thereof) on neuropsychological test performance (Cipolotti & Warrington, 1995). There are no systematic studies of lesion effects on SVT performance. Despite the assumption of minimal requirements to perform a SVT task, episodic memory is being tapped, where functional neuroimaging studies demonstrate expected medial temporal, cingulate, and parietofrontal attentional network activation during SVT performance (Allen, Bigler, Larsen, Goodrich-Hunsaker, & Hopkins, 2007; Browndyke et al., 2008; Larsen, Allen, Bigler, Goodrich-Hunsaker, & Hopkins, 2010). Goodrich-Hunsaker and Hopkins (2009) and Wu, Allen, Goodrich-Hunsaker, Hopkins, and Bigler (2010) have shown in case studies that five patients with hippocampal damage (3 with anoxic injury, 2 with TBI) can pass SVTs. However, this woefully undersamples the possible lesions and abnormalities that have the potential to directly disrupt SVT performance. While not lesion studies, Merten et al. (2007) demonstrated SVT failure related to dementia severity, and Gorissen, Sanz, and Schmand (2005) demonstrated high SVT failure rates in neuropsychiatric patients, particularly those with schizophrenia. If certain neuropathological conditions are more likely to affect SVT performance, such findings would be critical for SVT interpretation. Currently, there are no recommended adjustments in SVT cut-scores based on location, size, or type of lesion that may be present.

Illness Behavior and Cognitive Performance

In non-litigating neurological and neuropsychiatric clinical populations, SVT failure rates of 15 to 30% or higher have been reported (Williams, 2011).
Does this mean invalid neuropsychological test data occur in up to one-third of all patients seen for neuropsychological assessment, based on not passing an externally administered SVT? Are these patients malingering? What is the role of SVT performance in rendering differential diagnoses involving malingering, somatoform disorder, and other neuropsychiatric conditions? Figure 4, from Metternich, Schmidtke, and Hull (2009), provides a model showing the overlap between functional and constitutional memory factors that may adversely affect cognition in the neuropsychiatric patient. In reviewing this model, the reader should note that there are numerous meta-cognitive as well as neural pathways that could legitimately disrupt SVT performance as a result of the disorder. Wager-Smith and Markou (2011) review the effects of stress, cytokine, and neuroinflammatory reactions that relate to sickness behavior and impaired cognition. In the context of the Metternich et al. (2009) model, sickness behaviors may interact with stress-mediated biological factors that appear psychological yet disrupt cognition. Does any of this reflect differences in how SVT cut-scores should be established if a known neuropsychiatric disorder is present?

Fig. 4. Theoretical model postulating how stress may influence cognition. Primary influences may come from purely psychosocial variables, directly from physiologically activated stress variables, or the combination of the two. Reprinted from Journal of Psychosomatic Research, Volume 66, Issue 05, "How are memory complaints in functional memory disorder related to measures of affect, metamemory and cognition?" by Birgitta Metternich, Klaus Schmidtke and Michael Hull, pp. 435–444. Copyright © 2009, with permission from Elsevier.
Diagnosis Threat and SVT Performance

The diagnosis-threat literature clearly demonstrates, both experimentally and clinically, that performance expectations influence actual cognitive test performance and the perceived influence of symptoms on cognitive test results (Ozen & Fernandes, 2011; Suhr & Gunstad, 2005). Likewise, placebo research on cognitive performance plainly demonstrates the influential role that expectations have on symptom generation and test performance (Pollo & Benedetti, 2009). Thus, psychological state and trait characteristics and the perception of well-being versus illness may influence cognitive performance (Pressman & Cohen, 2005; Walkenhorst & Crowe, 2009). Is near-pass SVT performance an expected human dimension of diagnosis threat? Can these factors be disentangled by SVT performance from the degree and type of brain injury, medication status, litigation status, and levels of psychological distress, including premorbid conditions and other non-neurological factors (see discussion by Suhr, Tranel, Wefel, & Barrash, 1997)?

Effort or Ability?

If SVT performance required minimal to no cognitive effort, then experimental paradigms using cognitive load as a

distraction should result in minimal change in SVT performance. Batt, Shores, and Chekaluk (2008) examined non-litigating severe TBI patients on SVT measures under a condition in which distraction occurred during the SVT learning phase, demonstrating the influence of cognitive processing on SVT performance. Cognitive neuroscience routinely uses simple cognitive tasks, as simple as SVT measures, to experimentally manipulate conditions and tease out neural and experimental effects on cognition (Graham, Barense, & Lee, 2010). Unfortunately, other than Batt et al. and a handful of other studies, the cognitive neuroscience of SVT performance has been ignored. For example, it is unknown whether the foil stimuli used in an SVT task are equivalent, or whether they are uniquely influenced by certain types of structural damage or by a particular neuropsychiatric or neurological condition.

Design Issues in SVT Research

The review to this point has raised several interpretative questions concerning Near-Pass SVT subjects. Answering these questions requires better-designed SVT studies that address the ambiguities of past SVT findings. Williams (2011) points out that some messy SVT research designs are unavoidable, given the impossibility of getting genuine malingerers to volunteer for standardization studies. Standardization studies have had to rely on simulator studies and clinical samples, mostly samples of convenience and mostly forensic samples.

Circular reasoning, tautology and SVT research

If one uses SVT performance as the only index of effort and then concludes that SVT failure is a sign of poor effort, yet there are no other independent measures of test-behavior compliance or willingness to engage, process, and perform, or even to malinger, is this not a tautological argument? In such studies the only classifier defining poor effort is the SVT performance itself.
While such studies often classify subjects by secondary-gain identifiers (litigation, disability determination, etc.), such identifiers are not direct measures of effort, only of secondary gain. Tautology also involves the unnecessary repetitive use of different words with the same meaning. The terms used interchangeably with SVT and effort include response bias, invalid or failed performance, symptom amplification, performance exaggeration, underperformance or distortion, symptom embellishment, disingenuous, sub-optimal, poor effort, non-credible, faked, and malingered. It is not uncommon to see statements like "Failed SVT performance was associated with invalid neuropsychological test performance that was deemed non-credible due to sub-optimal effort." The tautological problems with such a statement should be obvious.

Rigor of SVT studies

Class I and II level research, as endorsed by the National Institutes of Health (NIH) and all major medical societies, involves independently conducted investigations with external review and monitoring, where investigators are independent of the outcome at all levels of the investigation (Edlund, Gronseth, So, & Franklin, 2004). The best Class I and II investigations are those in which a priori consensus diagnostic standards are in place that are independent of any outcome measure; data are collected prospectively and independently from those in charge of their analysis; and clinicians involved in diagnostic decision making are independent of those who analyze the data, and are also blinded. In Class I or II investigations, clinicians and data managers cannot also be the statisticians. Explicitly different roles at all levels of data acquisition, tabulation, analysis, and report writing increase the likelihood of unbiased findings. Institution-based investigations require human subjects review, consent, and IRB approval. Few current SVT studies meet the Class I or II research standard or are subject to IRB approval.
SVT studies that come from clinical practitioners in private practice not affiliated with an institution do not fall under any external review process whatsoever. Important investigational research comes from clinicians in private practice, but rarely does this research meet a Class I or II standard.

IS THERE A NEUROBIOLOGY OF DRIVE, EFFORT, MOTIVATION, AND ATTENTION?

Much of the discussion up to this point has focused on cognitive elements of SVT performance, but there is also a behavioral dimension that centers on the neurobiology of drive, effort, and motivation (Sarter, Gehring, & Kozak, 2006). Patients with frontotemporolimbic damage may be apathetic, with problems sustaining drive and goal-directed behaviors (Lezak, Howieson, & Loring, 2004). Apathy is a common consequence of traumatic brain injury (TBI; Marin & Wilkosz, 2005). What happens during neuropsychological assessment of the patient with neurogenic drive and motivation problems? The following test scores were obtained in a patient approximately one year post-TBI whom the family described as unmotivated; neuroimaging demonstrated extensive bi-frontal and right parietal encephalomalacia and generalized atrophy: TOMM: Trial 1 = 45/50, Trial 2 = 50/50; Rey 15-Item: 10/15; Word Memory Test (WMT): immediate recognition (IR) = 78%, delayed recognition (DR) = 85%, consistency (CNS) = 78%. The 45/50 on TOMM Trial 1 represents a pass, but is right at the cut-score, while the perfect 50/50 on Trial 2 represents a pass by all standards. The Rey 15-Item performance represents a pass (although a borderline score by some standards); however, 78% correct on the WMT IR and CNS scales represents a failure by WMT standards. How does brain damage to motivational and attentional networks affect SVT performance, and should patients with obvious structural lesions be evaluated by different cut-score standards?

Which Test to Use?
There are now numerous SVT measures available for use in general neuropsychological practice (Grant & Adams, 2009;

Lezak, Howieson, & Loring, 2004; Mitrushina, Boone, Razani, & D'Elia, 2005; Strauss, Sherman, & Spreen, 2006; Tate, 2010) and potentially even more in a forensic setting (Boone, 2007; Larrabee, 2007; Morgan & Sweet, 2009). While SVT use is endorsed by professionals, SVT test selection relies solely on the judgment of the researcher and/or clinician. With such a broad array of SVT measures available, which ones should be used, and in what circumstances? A reasonable argument has also been made that multiple SVTs are needed, especially in any lengthy or forensic assessment (Boone, 2009; Larrabee, 2008), but again there are no agreed-upon professional standards as to the correct number, order, or context. Administration of multiple SVT measures also raises further questions: how should failures on some measures but not others be interpreted, and is there an order effect in SVT administration (Ryan, Glass, Hinds, & Brown, 2010)?

THE PROBLEM WITH CUT SCORES

Dwyer (1996) reviewed the methods of cut-score development, concluding that cut-scores (a) always entail judgment, (b) inherently result in some misclassification, and (c) impose artificial pass/fail dichotomies, and that (d) "no true cut scores exist" (p. 360). Given Dwyer's comments, should cut-scores be adjusted or customized to specific clinical conditions? Kirkwood and Kirk (2010), in a university-based assessment clinic, evaluated 193 consecutively referred children with mild TBI. This study had IRB approval, and the investigators examined both the TOMM and the Medical Symptom Validity Test (MSVT; Green, 2004). Using their terminology, Kirkwood and Kirk (2010) found 17% of the sample to exhibit suboptimal effort. Only one failure was thought to be influenced by litigation. They also attempted to identify other potential sources for SVT failure, including impulsivity, distractibility during testing, pre-injury neuropsychiatric diagnosis, and potential effects of reading disability.
This study unmistakably demonstrates the complexities and potential issues in a clinical sample that may lead to sub-optimal performance. In the Merten et al. (2007) study mentioned earlier, which examined bona fide neurological patients with and without clinically obvious symptoms, only 15/48 (31%) passed all SVTs. Only 1/24 (4%) of those with clinically obvious cognitive symptoms was able to pass all SVTs. These authors conclude: "Results clearly show that many of these bona fide patients fail on most SVTs. Had the recommended cutoffs been applied rigorously without taking the clinical picture into consideration, these patients would have been incorrectly classified as exerting insufficient effort" (p. 314). Donders and Strong (2011) attempted to replicate a logistic regression method that had been developed by Wolfe et al. (2010) to identify embedded effort indicators on the California Verbal Learning Test-Second Edition (CVLT-II). They not only applied the logistic regression method of Wolfe et al. but also used an externally administered SVT. However, the limited interrelationship and inter-test agreement among actual test performances, embedded indicators of effort, and the external SVT led them to conclude that this method was not ready for clinical application. These studies underscore the difficulties that need to be addressed and accounted for in the neuropsychological application of current SVT technology. Similarly, Powell et al. (2011) show that supposed markers of suboptimal effort derived from the Trail Making Test also have limited predictive ability, again demonstrating the problem of disentangling true neuropsychological performance from associated elements of effort, drive, motivation, and attention when attempting to use embedded methods that were not explicitly designed to simultaneously assess validity.
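Dwyer's misclassification point, and the base-rate problem that findings such as Merten et al.'s illustrate, can be made concrete with a short Bayes calculation. The sketch below is purely illustrative: the sensitivity, specificity, and base-rate values are hypothetical and are not taken from any cited study.

```python
# Hypothetical sketch of how base rates drive SVT misclassification.
# All numeric values below are illustrative assumptions, not study data.

def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(performance truly invalid | failed SVT), via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# In a setting where invalid performance is common (assumed base rate 40%) ...
ppv_forensic = positive_predictive_value(0.85, 0.90, 0.40)
# ... versus a clinical sample of bona fide patients (assumed base rate 5%).
ppv_clinical = positive_predictive_value(0.85, 0.90, 0.05)

print(f"PPV at 40% base rate: {ppv_forensic:.2f}")  # 0.85
print(f"PPV at  5% base rate: {ppv_clinical:.2f}")  # 0.31
```

Holding test accuracy constant, the proportion of SVT failures that are false positives rises sharply as the base rate of invalid performance falls, which is precisely the hazard of applying cut-scores validated in high-base-rate forensic samples to bona fide clinical samples.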
Cut-scores are a necessary part of contemporary neuropsychological testing because they provide a method for classification, but cut-scores are best used in the context of guidelines rather than as a dichotomous defining point for the presence or absence of a deficit (Strauss et al., 2006).

False Memory and Dissociative Reactions

Mental health issues as they relate to false memory have been the topic of considerable controversy in clinical, research, and legal settings (Loftus & Davis, 2006). Interestingly, even an animal model of false memory has been developed (McTighe, Cowell, Winters, Bussey, & Saksida, 2010). Are some SVT failures by neurological or neuropsychiatric patients generated by false memories? For example, cognitive neuroscience often examines how confident a subject is in their response when assessing false memory (Moritz & Woodward, 2006). No SVT studies to date have tackled this dimension.

Does Failed SVT Always Equate With Invalid Performance for All Neuropsychological Measures?

In the Locke et al. (2008) investigation, not all neuropsychological test scores were significantly suppressed in the failed-SVT group. The Category Test and two of the three Wisconsin Card Sorting measures did not differ between the group that passed the SVT and the group that failed, nor did scores on the Beck Depression and Anxiety scales. Whitney, Shepard, Mariner, Mossbarger, and Herman (2010) found that Wechsler Test of Adult Reading (WTAR) scores were no different between those with passed or failed SVT performance, suggesting that WTAR findings "remain robust even in the face of suboptimal effort" (p. 196). Does this mean valid neuropsychological test findings occur on some tests even in the presence of SVT failure?

SVT Deception

Is the SVT measure infallible?
DenBoer and Hall (2007) have shown that simulators can be taught to detect SVT tasks, pass them, and then go on to fail the more formal neuropsychological measures (see also Rüsseler, Brett, Klaue, Sailer, & Munte, 2008). If the SVT task can be faked, how would the clinician and/or researcher know?

CONCLUSIONS

SVT findings may offer important information about neuropsychological test performance, but if an oversimplified view

of SVT-test behavior dichotomizes neuropsychological performance as either valid (above cut-score) or invalid (below cut-score), clinically important Type I and II errors are unavoidable. As shown in this review, patients with legitimate neurological and/or neuropsychiatric conditions fail SVTs for reasons that are likely neurogenic. There can be no debate that issues of poor effort and secondary gain may have so profound an effect as to completely invalidate neuropsychological test findings, rendering them uninterpretable (see Stevens, Friedel, Mehren, & Merten, 2008). However, considerably more SVT research is needed to address the issues raised in this review.

ACKNOWLEDGMENTS

Parts of this review were presented as a debate, "Admissibility and Appropriate Use of Symptom Validity Science in Forensic Consulting," moderated by Paul M. Kaufmann, J.D., Ph.D., at the 38th Annual International Neuropsychological Society meeting in Acapulco, Mexico. Dr. Bigler co-directs Brigham Young University's Neuropsychological Assessment and Research Clinic, which provides a service to the community evaluating a broad spectrum of clinical referrals. For legal referrals the University is compensated for the evaluation. Dr. Bigler also performs forensic consultation for which he is directly compensated. His research is National Institutes of Health (NIH) funded, but no NIH grant funds directly supported the writing of this commentary. Dr. Bigler has no financial interest in any commercial symptom validity test (SVT) measure. The assistance of Jo Ann Petrie, Thomas J. Farrer, and Tracy J. Abildskov in the preparation of this manuscript is gratefully acknowledged.

REFERENCES

Allen, M.D., Bigler, E.D., Larsen, J., Goodrich-Hunsaker, N.J., & Hopkins, R.O. (2007). Functional neuroimaging evidence for high cognitive effort on the Word Memory Test in the absence of external incentives. Brain Injury, 21, 1425-1428.
doi:10.1080/02699050701769819
Batt, K., Shores, E.A., & Chekaluk, E. (2008). The effect of distraction on the Word Memory Test and Test of Memory Malingering performance in patients with a severe brain injury. Journal of the International Neuropsychological Society, 14, 1074-1080. doi:10.1017/s135561770808137x
Bender, H.A., Martin Garcia, A., & Barr, W.B. (2010). An interdisciplinary approach to neuropsychological test construction: Perspectives from translation studies. Journal of the International Neuropsychological Society, 16, 227-232. doi:10.1017/s1355617709991378
Boone, K.B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: The Guilford Press.
Boone, K.B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729-741. doi:10.1080/13854040802427803
Browndyke, J.N., Paskavitz, J., Sweet, L.H., Cohen, R.A., Tucker, K.A., Welsh-Bohmer, K.A., … Schmechel, D.E. (2008). Neuroanatomical correlates of malingered memory impairment: Event-related fMRI of deception on a recognition memory task. Brain Injury, 22, 481-489. doi:10.1080/02699050802084894
Bush, S.S., Ruff, R.M., Troster, A.I., Barth, J.T., Koffler, S.P., Pliskin, N.H., … Silver, C.H. (2005). Symptom validity assessment: Practice issues and medical necessity: NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419-426. doi:10.1016/j.acn.2005.02.002
Cipolotti, L., & Warrington, E.K. (1995). Neuropsychological assessment. Journal of Neurology, Neurosurgery and Psychiatry, 58, 655-664.
DenBoer, J.W., & Hall, S. (2007). Neuropsychological test performance of successful brain injury simulators. The Clinical Neuropsychologist, 21, 943-955. doi:10.1080/13854040601020783
Donders, J., & Strong, C.A. (2011). Embedded effort indicators on the California Verbal Learning Test-Second Edition (CVLT-II): An attempted cross-validation. The Clinical Neuropsychologist, 25, 173-184. doi:10.1080/13854046.2010.536781
Dressing, H., Widder, B., & Foerster, K. (2010). [Symptom validity tests in psychiatric assessment: A critical review]. Versicherungsmedizin, 62, 163-167.
Dwyer, C.A. (1996). Cut scores and testing: Statistics, judgment, truth, and error. Psychological Assessment, 8, 360-362.
Edlund, W., Gronseth, G., So, Y., & Franklin, G. (2004). Clinical practice guideline process manual: For the Quality Standards Subcommittee (QSS) and the Therapeutics and Technology Assessment Subcommittee (TTA). St. Paul: American Academy of Neurology.
Filskov, S.B., & Boll, T.J. (1981). Handbook of clinical neuropsychology. New York: John Wiley & Sons.
Goodrich-Hunsaker, N.J., & Hopkins, R.O. (2009). Word Memory Test performance in amnesic patients with hippocampal damage. Neuropsychology, 23, 529-534. doi:10.1037/a0015444
Gorissen, M., Sanz, J.C., & Schmand, B. (2005). Effort and cognition in schizophrenia patients. Schizophrenia Research, 78, 199-208. doi:10.1016/j.schres.2005.02.016
Graham, K.S., Barense, M.D., & Lee, A.C. (2010). Going beyond LTM in the MTL: A synthesis of neuropsychological and neuroimaging findings on the role of the medial temporal lobe in memory and perception. Neuropsychologia, 48, 831-853. doi:10.1016/j.neuropsychologia.2010.01.001
Grant, I., & Adams, K.M. (2009). Neuropsychological assessment of neuropsychiatric and neuromedical disorders. New York: Oxford University Press.
Green, P. (2004). Medical Symptom Validity Test (MSVT) for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing.
Hall, H., & Pritchard, D. (1996). Detecting malingering and deception: Forensic distortion analysis. Florida: St. Lucie Press.
Heilbronner, R.L., Sweet, J.J., Morgan, J.E., Larrabee, G.J., & Millis, S.R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093-1129. doi:10.1080/13854040903155063
Kirkwood, M.W., & Kirk, J.W. (2010). The base rate of suboptimal effort in a pediatric mild TBI sample: Performance on the Medical Symptom Validity Test. The Clinical Neuropsychologist, 24, 860-872. doi:10.1080/13854040903527287
Larrabee, G.J. (2007). Assessment of malingered neuropsychological deficits. New York: Oxford University Press.
Larrabee, G.J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666-679. doi:10.1080/13854040701494987

Larsen, J.D., Allen, M.D., Bigler, E.D., Goodrich-Hunsaker, N.J., & Hopkins, R.O. (2010). Different patterns of cerebral activation in genuine and malingered cognitive effort during performance on the Word Memory Test. Brain Injury, 24, 89-99. doi:10.3109/02699050903508218
Lezak, M.D. (1976). Neuropsychological assessment. New York: Oxford University Press.
Lezak, M.D., Howieson, D.B., & Loring, D.W. (2004). Neuropsychological assessment. New York: Oxford University Press.
Locke, D.E., Smigielski, J.S., Powell, M.R., & Stevens, S.R. (2008). Effort issues in post-acute outpatient acquired brain injury rehabilitation seekers. NeuroRehabilitation, 23, 273-281.
Loftus, E.F., & Davis, D. (2006). Recovered memories. Annual Review of Clinical Psychology, 2, 469-498. doi:10.1146/annurev.clinpsy.2.022305.095315
Marin, R.S., & Wilkosz, P.A. (2005). Disorders of diminished motivation. The Journal of Head Trauma Rehabilitation, 20, 377-388.
McTighe, S.M., Cowell, R.A., Winters, B.D., Bussey, T.J., & Saksida, L.M. (2010). Paradoxical false memory for objects after brain damage. Science, 330, 1408-1410. doi:10.1126/science.1194780
Merten, T., Bossink, L., & Schmand, B. (2007). On the limits of effort testing: Symptom validity tests and severity of neurocognitive symptoms in nonlitigant patients. Journal of Clinical and Experimental Neuropsychology, 29, 308-318. doi:10.1080/13803390600693607
Metternich, B., Schmidtke, K., & Hull, M. (2009). How are memory complaints in functional memory disorder related to measures of affect, metamemory and cognition? Journal of Psychosomatic Research, 66, 435-444. doi:10.1016/j.jpsychores.2008.07.005
Miller, J.B., Millis, S.R., Rapport, L.J., Bashem, J.R., Hanks, R.A., & Axelrod, B.N. (2011). Detection of insufficient effort using the Advanced Clinical Solutions for the Wechsler Memory Scale, Fourth Edition. The Clinical Neuropsychologist, 25, 160-172. doi:10.1080/13854046.2010.533197
Millis, S.R. (2009). Methodological challenges in assessment of cognition following mild head injury: Response to Malojcic et al. 2008. Journal of Neurotrauma, 26, 2409-2410. doi:10.1089/neu.2008.0530
Millis, S.R., & Volinsky, C.T. (2001). Assessment of response bias in mild head injury: Beyond malingering tests. Journal of Clinical and Experimental Neuropsychology, 23, 809-828.
Mitrushina, M., Boone, K.B., Razani, J., & D'Elia, L.F. (2005). Handbook of normative data for neuropsychological assessment. New York: Oxford University Press.
Morgan, J.E., & Sweet, J.J. (2009). Neuropsychology of malingering casebook. New York: Psychology Press.
Moritz, S., & Woodward, T.S. (2006). Metacognitive control over false memories: A key determinant of delusional thinking. Current Psychiatry Reports, 8, 184-190.
Ozen, L.J., & Fernandes, M.A. (2011). Effects of diagnosis threat on cognitive and affective functioning long after mild head injury. Journal of the International Neuropsychological Society, 17, 219-229. doi:10.1017/s135561771000144x
Pollo, A., & Benedetti, F. (2009). The placebo response: Neurobiological and clinical issues of neurological relevance. Progress in Brain Research, 175, 283-294. doi:10.1016/s0079-6123(09)17520-9
Powell, M.R., Locke, D.E., Smigielski, J.S., & McCrea, M. (2011). Estimating the diagnostic value of the Trail Making Test for suboptimal effort in acquired brain injury rehabilitation patients. The Clinical Neuropsychologist, 25, 108-118. doi:10.1080/13854046.2010.532912
Pressman, S.D., & Cohen, S. (2005). Does positive affect influence health? Psychological Bulletin, 131, 925-971. doi:10.1037/0033-2909.131.6.925
Rüsseler, J., Brett, A., Klaue, U., Sailer, M., & Munte, T.F. (2008). The effect of coaching on the simulated malingering of memory impairment. BMC Neurology, 8, 37. doi:10.1186/1471-2377-8-37
Ryan, J.J., Glass, L.A., Hinds, R.M., & Brown, C.N. (2010). Administration order effects on the Test of Memory Malingering. Applied Neuropsychology, 17, 246-250. doi:10.1080/09084282.2010.499802
Sarter, M., Gehring, W.J., & Kozak, R. (2006). More attention must be paid: The neurobiology of attentional effort. Brain Research Reviews, 51, 145-160. doi:10.1016/j.brainresrev.2005.11.002
Slick, D.J., Sherman, E.M., & Iverson, G.L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545-561.
Sollman, M.J., & Berry, D.T. (2011). Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology, 26, 744-789.
Stevens, A., Friedel, E., Mehren, G., & Merten, T. (2008). Malingering and uncooperativeness in psychiatric and psychological assessment: Prevalence and effects in a German sample of claimants. Psychiatry Research, 157, 191-200. doi:10.1016/j.psychres.2007.01.003
Strauss, E., Sherman, E.M.S., & Spreen, O. (2006). A compendium of neuropsychological tests. New York: Oxford University Press.
Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19, 500-514.
Suhr, J.A., & Gunstad, J. (2005). Further exploration of the effect of diagnosis threat on cognitive performance in individuals with mild head injury. Journal of the International Neuropsychological Society, 11, 23-29. doi:10.1017/s1355617705050010
Tate, R.L. (2010). A compendium of tests, scales and questionnaires: The practitioner's guide to measuring outcomes after acquired brain impairment. New York: Psychology Press.
Tombaugh, T. (1996). TOMM: Test of Memory Malingering. New York: Multi-Health Systems.
Wager-Smith, K., & Markou, A. (2011). Depression: A repair response to stress-induced neuronal microdamage that can grade into a chronic neuroinflammatory condition? Neuroscience and Biobehavioral Reviews, 35, 742-764. doi:10.1016/j.neubiorev.2010.09.010
Walkenhorst, E., & Crowe, S.F. (2009). The effect of state worry and trait anxiety on working memory processes in a normal sample. Anxiety, Stress, and Coping, 22, 167-187. doi:10.1080/10615800801998914
West, L.K., Curtis, K.L., Greve, K.W., & Bianchini, K.J. (2011). Memory in traumatic brain injury: The effects of injury severity and effort on the Wechsler Memory Scale-III. Journal of Neuropsychology, 5, 114-125. doi:10.1348/174866410x521434
Whitney, K.A., Shepard, P.H., Mariner, J., Mossbarger, B., & Herman, S.M. (2010). Validity of the Wechsler Test of Adult Reading (WTAR): Effort considered in a clinical sample of U.S. military veterans. Applied Neuropsychology, 17, 196-204. doi:10.1080/09084282.2010.499787
Williams, J.M. (2011). The malingering factor. Archives of Clinical Neuropsychology, 26, 280-285. doi:10.1093/arclin/acr009

Wolfe, P.L., Millis, S.R., Hanks, R., Fichtenberg, N., Larrabee, G.J., & Sweet, J.J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24, 153-168. doi:10.1080/13854040903107791
Wu, T.C., Allen, M.D., Goodrich-Hunsaker, N.J., Hopkins, R.O., & Bigler, E.D. (2010). Functional neuroimaging of symptom validity testing in traumatic brain injury. Psychological Injury & Law, 3, 50-62. doi:10.1007/s12207-010-9067-y

doi:10.1017/s1355617712000409

DIALOGUE RESPONSE

Response to Larrabee

Erin D. Bigler

Neuropsychology needs objective methods that confidently and accurately reflect the validity of brain-behavior relationships as measured by neuropsychological assessment techniques. Symptom validity testing (SVT) has emerged as a method designed to address the validity of neuropsychological test performance; but, just like the field of neuropsychology itself, SVT research is new and evolving. Within any new research endeavor, first-generation studies often demonstrate broad support for a new construct, but as the research expands more complex issues arise that require refinements in theory and practice (Oner & Dhert, 2011). Such is the case with SVT research and its clinical application. One goal of the dialogue with Larrabee on the current status of SVT research and clinical application was to highlight areas of agreement and disagreement. My review challenges some SVT assumptions, pointing out the need for refinements in methods and theory and calling for improved research designs that will hopefully lead to a more complete understanding of SVT use and interpretation in neuropsychological assessment. Larrabee (this issue), in response to my SVT review (see Bigler, this issue), argues for a change in terminology, abandoning the singular term "effort" in favor of "performance validity" and "symptom validity," and offers cogent reasoning and research to support such a distinction.
In my opinion, the term "effort" as a singular descriptor in neuropsychology should be abandoned in favor of the performance validity and symptom validity terms suggested by Larrabee in his commentary. As already stated in the critique, there are simply too many potential meanings of the term "effort" or "effort tests," spanning the biological to the inference of intent. In biological terms, effort suggests neural factors associated with basic drives and emotional states (see Sarter, Gehring, & Kozak, 2006). Within cognitive neuroscience, effort relates directly to the complexity of stimulus processing (Kohl, Wylie, Genova, Hillary, & Deluca, 2009) and to levels of motivation (Bonnefond, Doignon-Camus, Hoeft, & Dufour, 2011; Harsay et al., 2011). In forensic and applied neuropsychology, the term suggests some intention on the subject's part, where poor effort may be equated with malingering (see Williams, 2011). These multiple meanings make the term imprecise when used in neuropsychological parlance to describe test behavior. The performance validity and symptom validity terminology represents a far more accurate description of what is being assessed, and neuropsychology will be better served by following Larrabee's recommendation. There are also two basic agreements on what may be considered SVT tenets: (1) questions of symptom and performance invalidity are proportional to the number of SVT items not passed; and (2) close-to- or below-chance SVT performance levels are the clearest and most indisputable indicators of invalidity. In my opinion, little debate about these two points is needed. For forced-choice SVT measures, invalid neuropsychological test performance may be assumed when SVT performance falls substantially below a conventionally established cut-score. SVT performance at, near, or below chance reflects invalid test performance.
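The below-chance tenet follows from simple binomial reasoning: on a two-alternative forced-choice SVT, a patient who merely guesses is expected to score about 50%, so scores far below that level are improbable without some recognition and avoidance of the correct answers. A minimal sketch (the 50-item, two-alternative format is illustrative; actual tests and cut-offs vary):

```python
from math import comb

def prob_score_at_most(k, n=50, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): the chance of scoring k or fewer
    correct on an n-item two-alternative forced-choice test by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Scoring 18/50 or worse by pure guessing is unlikely (roughly 3%) ...
print(f"P(score <= 18 of 50 by chance) = {prob_score_at_most(18):.4f}")
# ... so markedly below-chance scores suggest the correct answers
# were recognized and deliberately avoided.
```

This is why at- or near-chance scores are ambiguous (guessing alone produces them) while substantially below-chance scores are the most indisputable indicator of invalid responding.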
Despite these points of agreement, there are two major SVT topics where our opinions diverge: (1) the false positive/false negative problem and interpretative validity issues, and (2) the rigor of SVT study designs.

THE FALSE POSITIVE PROBLEM AND INTERPRETATIVE VALIDITY ISSUES

The most effective SVT will minimize false positive and false negative classifications, with the false positive typically being the more serious error. False positive classification occurs when failed SVT scores are used to designate invalid neuropsychological test performance when, in fact, the failed SVT performance occurs because of the underlying neurological and/or neuropsychiatric condition. The clinical gravity of a false positive SVT decision for neuropsychology is obvious: in the face of a false positive SVT indicating

invalidity of neuropsychological test findings, proper clinical diagnosis, service, and treatment may be improperly made, withheld, denied, or delayed. As a profession, neuropsychology needs to make sure that the best research informs the clinician and/or researcher with the most complete and correct information for making SVT interpretive statements. As pointed out in the critique, several participants in the Locke, Smigielski, Powell, and Stevens (2008) study who failed SVTs (none were in litigation, and all had been independently diagnosed with an acquired brain injury) scored within the near-miss zone of SVT performance. Do these SVT scores truly reflect invalid performance across all neuropsychological tests administered that cannot be explained by the underlying neurological/neuropsychiatric condition? Is there something unique about these injuries that lowered performance on the SVT? How many of these subjects' SVT scores represent a true false positive, and how would the neuropsychologist know? These are important questions without answers. The very nature of an SVT cut-score is to make a dichotomous decision, and if that SVT cut-score is applied to all types of neurological/neuropsychiatric conditions, this becomes a one-size-fits-all approach. The Diagnostic and Statistical Manual, Fourth Edition (DSM-IV) lists 17 Axis I or II general categories with over 450 separate diagnostic codes, and about as many International Classification of Diseases, Tenth Revision (ICD-10) classifications involving neurological disorders are also listed in the DSM-IV (see American Psychiatric Association, 1994).
Unanswered SVT questions thereby remain: whether SVT findings broadly apply across all DSM-IV classifications; whether different SVTs should be used depending on the disorder being assessed; when in the assessment protocol an SVT should be administered (as the first test, somewhere in the middle, at multiple points, or whether it does not matter); and whether different cut-scores apply for different patient demographics. More research is needed to address these basic SVT questions and others not listed.

THE QUALITY OF SVT RESEARCH DESIGN

Larrabee spends a good deal of his commentary defending the rigor of SVT research. As already stated, there is sufficient convergence and quality of research to support the two broad SVT tenets stated above; issues of research design quality are not directed at these fundamental points. The opening statements of the American Academy of Clinical Neuropsychology (AACN) document on effort, response bias, and malingering discuss the necessity of ever-improving research designs to advance the field, where Heilbronner, Sweet, Morgan, Larrabee, and Millis (2009) state the following: "... science-driven healthcare specialties create progress by a process of challenging current and new ideas through intellectual discourse and empirical hypothesis testing" (p. 1094). McGrath and colleagues (see McGrath, Kim, & Hough, 2011; McGrath, Mitchell, Kim, & Hough, 2010), in their reviews and commentaries on response bias research within applied psychological assessment, emphasize that SVT research designs must be maximally stringent before any blanket acceptance of SVT interpretive statements can be made, especially in terms of Type II statistical errors. Given these guidelines, it seems a nonarguable point that neuropsychology should seek the best designed, most rigorous studies on which to base applied decision making. The better the research design, the more generalizable the findings.
As pointed out in Bigler (this issue), research design rigor can be assessed straightforwardly using the American Academy of Neurology (AAN) rating method (see Edlund, Gronseth, So, & Franklin, 2004). As a historical note, it was this method of rating the quality of neuropsychological research involving cases of dementia, cerebrovascular disease, traumatic brain injury (TBI), and epilepsy that in 1996 allowed the AAN Therapeutics and Technology Assessment (TTA) subcommittee (see American Academy of Neurology, 1996) to grant a Class II, Type A rating for using neuropsychological assessment techniques to evaluate the cognitive and neurobehavioral effects of these specific neurological conditions. A Type A rating means that the technique is "established as useful/predictive for a given condition in the specified population" (p. 598). The AAN publication predates the development of current SVT methods, although some response bias and validity issues were discussed in the AAN statement. AAN guidelines are clear that Type A ratings come only after established Class I or II designed studies, and then only after a comprehensive review by the TTA subcommittee. By AAN research design classification standards, Class I is the most rigorous and Class IV the least. Cappa, Conger, and Conger (2011) provide another method of rating experimental design quality for neuropsychological outcome research by assessing nine points related to study design. These nine points are summed to create four classifications, from best to worst: commendable, adequate, marginal, and flawed. Regardless of whether the AAN (1996) guidelines or those from Cappa et al. (2011) are used to rate the rigor of SVT study designs, the best designed studies ("Class I" or "commendable") will be those with a priori defined criteria that require a prospective experimental design; uniform recruitment in which investigators and/or clinicians have well defined and independent roles, especially in diagnostic decision making and classification; and appropriate blinding of the study, including all aspects of data coding, entry, and analysis, to list just some of the key elements. By AAN standards, Class II may include retrospective studies but still requires investigator independence, blinded assessments, and blinded data analysis. Class III may be retrospective and partially unblinded but still requires independence of the investigators. Class IV may include case series and be based on expert opinion where non-independence of the investigators is present. Most of the SVT research cited in my review and Larrabee's comment would merit no better than a Class III AAN rating, or an adequate-to-marginal rating by Cappa et al. standards. As pointed out by Edlund et al. (2004), Class III
and IV level research is important for hypothesis building and proof-of-concept studies. Clearly, solid SVT research has been done; that is why both the AACN (Heilbronner et al., 2009) and the National Academy of Neuropsychology (NAN; see Bush et al., 2005) have position papers on the use of SVT measures. This does not mean, however, that as a profession we should be content with Class III and IV level SVT research, or that as clinicians and researchers we should not demand better designed studies. Larrabee makes the point about the importance of known-groups or criterion designs as an example of the rigor of existing SVT investigations. However, as pointed out in the AACN consensus statement, specifically in the section on known groups, Heilbronner et al. (2009) note that "[d]eveloping appropriate external criteria for defining response bias can be a major methodological challenge" (p. 1118). For example, several SVT studies have used a forensic sample to establish a known group with objective brain damage demonstrated by radiological evidence of abnormality. However, careful reading of these studies shows that the determination of who is in the objectively brain damaged group is based entirely on the retrospectively obtained clinical record and whatever radiological report the author/investigator may have had available. None of these studies provide any quality control over the neuroimaging method used, the sensitivity of the neuroimaging tool to detect the problem, or the radiologist making the rating. Without the uniformity that comes from exactly the same procedure prospectively performed on all subjects, these known groups of "objective indicators of brain damage" versus "no objective indicators of brain damage" become ill-defined and potentially meaningless. Retrospective data sets based on forensic or clinical samples will never be Class I or II, or "commendable," research designs.
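The point-summing approach of Cappa et al. (2011) described above can be sketched abstractly. Note that the nine criterion names and the tier cut-points below are hypothetical placeholders (loosely echoing the Class I design elements listed earlier), not the actual Cappa, Conger, and Conger items, which the present text does not enumerate:

```python
# Hypothetical sketch of a Cappa-style study-quality rating:
# nine design criteria are summed and mapped to one of four tiers.
CRITERIA = [
    "prospective_design", "a_priori_criteria", "uniform_recruitment",
    "investigator_independence", "blinded_diagnosis", "blinded_coding",
    "blinded_analysis", "adequate_sample", "validated_measures",
]

def rate_study(criteria_met: set[str]) -> str:
    """Sum the design criteria a study satisfies and map the total to a tier.

    The cut-points (8, 6, 4) are illustrative only.
    """
    score = sum(1 for c in CRITERIA if c in criteria_met)
    if score >= 8:
        return "commendable"
    if score >= 6:
        return "adequate"
    if score >= 4:
        return "marginal"
    return "flawed"
```

For instance, a retrospective, partially unblinded study meeting only three criteria would land in the lowest tier, which is the pattern the text describes for most existing SVT research.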
Better SVT research, prospectively designed and independently conducted, is needed. In the spirit of the 2009 AACN recommendations on the assessment of effort, response bias, and malingering, "... progress ... is made via ... a process of challenging current and new ideas through intellectual discourse and empirical hypothesis testing" (Heilbronner et al., 2009, p. 1094). This is the challenge for the next generation of SVT studies: better research design, less reliance on samples of convenience, and a focus on prospectively designed, independently conducted investigations.

REFERENCES

American Academy of Neurology. (1996). Assessment: Neuropsychological testing of adults. Considerations for neurologists. Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology. Neurology, 47, 592–599.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed., DSM-IV). Washington, DC: American Psychiatric Association.
Bigler, E.D. (2012). Symptom validity testing, effort, and neuropsychological assessment. Journal of the International Neuropsychological Society, 18.
Bonnefond, A., Doignon-Camus, N., Hoeft, A., & Dufour, A. (2011). Impact of motivation on cognitive control in the context of vigilance lowering: An ERP study. Brain and Cognition, 77, 464–471. doi:10.1016/j.bandc.2011.08.010
Bush, S.S., Ruff, R.M., Troster, A.I., Barth, J.T., Koffler, S.P., Pliskin, N.H., … Silver, C.H. (2005). Symptom validity assessment: Practice issues and medical necessity: NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426. doi:10.1016/j.acn.2005.02.002
Cappa, K.A., Conger, J.C., & Conger, A.J. (2011). Injury severity and outcome: A meta-analysis of prospective studies on TBI outcome. Health Psychology, 30, 542–560. doi:10.1037/a0025220
Edlund, W., Gronseth, G., So, Y., & Franklin, G. (2004). Clinical practice guideline process manual: For the Quality Standards Subcommittee (QSS) and the Therapeutics and Technology Assessment Subcommittee (TTA). St. Paul: American Academy of Neurology.
Harsay, H.A., Cohen, M.X., Oosterhof, N.N., Forstmann, B.U., Mars, R.B., & Ridderinkhof, K.R. (2011). Functional connectivity of the striatum links motivation to action control in humans. The Journal of Neuroscience, 31, 10701–10711. doi:10.1523/jneurosci.5415-10.2011
Heilbronner, R.L., Sweet, J.J., Morgan, J.E., Larrabee, G.J., & Millis, S.R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129. doi:10.1080/13854040903155063
Kohl, A.D., Wylie, G.R., Genova, H.M., Hillary, F.G., & Deluca, J. (2009). The neural correlates of cognitive fatigue in traumatic brain injury using functional MRI. Brain Injury, 23, 420–432. doi:10.1080/02699050902788519
Locke, D.E., Smigielski, J.S., Powell, M.R., & Stevens, S.R. (2008). Effort issues in post-acute outpatient acquired brain injury rehabilitation seekers. NeuroRehabilitation, 23, 273–281.
McGrath, R.E., Kim, B.H., & Hough, L. (2011). Our main conclusion stands: Reply to Rohling et al. (2011). Psychological Bulletin, 137, 713–715. doi:10.1037/a0023645
McGrath, R.E., Mitchell, M., Kim, B.H., & Hough, L. (2010). Evidence for response bias as a source of error variance in applied assessment. Psychological Bulletin, 136, 450–470. doi:10.1037/a0019216
Oner, F.C., & Dhert, W.J. (2011). Challenging the medico-industrial-administrative complex. The Spine Journal, 11, 698–699. doi:10.1016/j.spinee.2011.08.010
Sarter, M., Gehring, W.J., & Kozak, R. (2006). More attention must be paid: The neurobiology of attentional effort. Brain Research Reviews, 51, 145–160. doi:10.1016/j.brainresrev.2005.11.002
Williams, J.M. (2011). The malingering factor. Archives of Clinical Neuropsychology, 26, 280–285. doi:10.1093/arclin/acr009