Does Emotional Intelligence Meet Traditional Standards for an Intelligence? Some New Data and Conclusions

Emotion, 2001, Vol. 1, No. 3, 196-231. Copyright 2001 by the American Psychological Association, Inc. 1528-3542/01/$5.00. DOI: 10.1037//1528-3542.1.3.196

Richard D. Roberts, University of Sydney
Moshe Zeidner, University of Haifa
Gerald Matthews, University of Cincinnati

Performance-based measures of emotional intelligence (EI) are more likely than measures based on self-report to assess EI as a construct distinct from personality. A multivariate investigation was conducted with the performance-based Multi-Factor Emotional Intelligence Scale (MEIS; J. D. Mayer, D. Caruso, & P. Salovey, 1999). Participants (N = 704) also completed the Trait Self-Description Inventory (TSDI, a measure of the Big Five personality factors; Christal, 1994; R. D. Roberts et al.), and the Armed Services Vocational Aptitude Battery (ASVAB, a measure of intelligence). Results were equivocal. Although the MEIS showed convergent validity (correlating moderately with the ASVAB) and divergent validity (correlating minimally with the TSDI), different scoring protocols (i.e., expert and consensus) yielded contradictory findings. Analyses of factor structure and subscale reliability identified further measurement problems. Overall, it is questionable whether the MEIS operationalizes EI as a reliable and valid construct.

Richard D. Roberts, Department of Psychology, University of Sydney, Sydney, Australia; Moshe Zeidner, Department of Psychology, University of Haifa, Mount Carmel, Israel; Gerald Matthews, Department of Psychology, University of Cincinnati. The data gathered in this investigation were collected by personnel of the (now defunct) Human Effectiveness Directorate of the United States Air Force Research Laboratory, Brooks Air Force Base, Texas. We acknowledge the support of this institution.
We thank Janice Heresford, Ralph Sayler, Rich Walker, and Mary Lawson for their efforts in data collection and the scoring of test protocols. We also thank Lazar Stankov and Gerry Pallier. Correspondence concerning this article should be addressed to Moshe Zeidner, Center for Interdisciplinary Research of Emotions, University of Haifa, Mount Carmel 31905 Israel. Electronic mail may be sent to zeidner@research.haifa.ac.il.

Emotional intelligence (EI) is a relatively new domain of psychological investigation, having recently gathered considerable momentum with widespread, international media attention. Daniel Goleman's (1995) book on the topic appeared on The New York Times best-seller list, which led to a Time magazine article devoted to detailed exposition of the topic (Gibbs, 1995). More recently, the influential electronic magazine Salon devoted a lengthy article to the discussion of its application in the workforce (Paul, 1999). Clearly, this attention was inspired by a veritable plethora of trade texts (and Web sites) dealing with self-help and management practices, assessment, and organization-based applications implicit to the concept of EI (see, e.g., Abraham, 2000; Bar-On, 1997, 2000; Bar-On, Brown, Kirkcaldy, & Thome, 2000; Cooper & Sawaf, 1997; Epstein, 1998; Ryback, 1998; Saarni, 1999; Weisinger, 1998).

EI as a concept has prospered, in part, because of the increasing personal importance of emotion management for individuals in modern society. Indeed, researchers have commonly claimed that EI predicts important educational and occupational criteria beyond that predicted by general intellectual ability (e.g., Elias & Weissberg, 2000; Fisher & Ashkanasy, 2000; Fox & Spector, 2000; Goleman, 1995; Mehrabian, 2000; Saarni, 1999; Scherer, 1997).
Furthermore, EI's chief proponents appear to have made strides toward understanding its nature, components, determinants, effects, developmental track, and modes of modification (see Matthews, Zeidner, & Roberts, in press, for a critical review).

EI first appeared in the scientific literature in the early 1990s (Mayer, DiPaulo, & Salovey, 1990; Salovey & Mayer, 1990), where the term was used to denote a type of intelligence that involved the ability to process emotional information. Subsequently, researchers have proposed that EI incorporates a set of conceptually related psychological processes involving the processing of affective information. These processes include: (a) the verbal and nonverbal appraisal and expression of emotion in the self and others, (b) the regulation of emotion in the self and others, and (c) the utilization of emotion to facilitate thought (see Mayer & Geher, 1996; Mayer & Salovey, 1997; Salovey & Mayer, 1990). Although various authors have proposed that EI is a type of intelligence, in the traditional sense, contemporary research and theory lacks any clear conceptual model of intelligence within which to place the construct. For example, Spearman's (1927) model of g (general ability) affords no special role for EI. Neither is emotional (or social, for that matter) intelligence included in Thurstone's (1938) list of primary mental abilities or Guttman's (1965a, 1965b) radex model of intelligence.

Although EI has captured the public's imagination during the past 5 years, the concept's origins trace back to a number of constructs emanating from traditional psychometric models of intelligence. We now briefly examine the construct's historical roots and conceptual linkages, pointing to similarities and differences between EI and a variety of cognate constructs (i.e., social intelligence, crystallized ability, behavioral cognition, personal intelligence) identified within influential theories of intelligence.

EI Within the Context of Intelligence Theory and Assessment

Intelligence: Overview

General intelligence refers to a person's overall capacity for adaptation through effective cognition and information processing.
It may be seen as a general competence of the mind (mental ability) or of higher order faculties such as understanding, reasoning, problem solving, and learning, especially of complex, structured material (cognitive ability; Brody, 1992). However, the concept of general intelligence says little about the more specific competencies that comprise it. Thus, psychologists have sought to partition the domain of intelligence into more manageable chunks, including less narrow (but still broad) categories of abilities (e.g., crystallized intelligence [Gc]) or more specific abilities (e.g., verbal comprehension). These various levels of conceptualization have led to taxonomic models, which have recently been synthesized inside Carroll's (1993) three-stratum model. Carroll found, after reanalysis of virtually all data sets collected in the 20th century, a hierarchy of structures. At the lowest level of the hierarchy there are around 70 fairly narrow primary abilities. Correlations between primary abilities allow them to be clustered together to define eight broad abilities at the second stratum, and these broad abilities cluster to define general intelligence at the third stratum.

The identification of general intelligence through psychometric studies has stimulated over a century of debate on the nature of intelligence and the proper way to assess this construct. As the American Psychological Association's (APA's) Task Force on Intelligence (APA Public Affairs Office, 1997) has stated, it is generally (though not universally) agreed that the conventional psychometric approach has successfully identified a reliable quality of the individual that predicts important real-world criteria. The conventional psychometric approach is seen as the most influential and the most systematically researched, although other conceptions of intelligence also have much to offer.
The intelligence literature also contains criticisms of the notion that there is a consensual definition of intelligence shared by most psychologists (see various chapters in Sternberg, 2000b), especially in view of cultural differences in conceptions of intelligence (see Sternberg, 2000a).

The importance of conventional, cognitive intelligence has been challenged by recent suggestions that there are many different kinds of intelligence (e.g., Gardner, 1983). As these include abilities such as musical intelligence, it is difficult to assume that the same criterion for inclusion holds true for all intelligence constructs. Thus, many psychometricians would concur that the defining attribute of a cognitive ability test is that there is one correct answer based on logical, empirical, semantic, or even normative criteria (Guttman, 1965a, 1965b; Nunnally, 1978; Zeidner & Feitelson, 1989). Equally important, however, the psychometric criteria developed in studies of cognitive ability may not be applicable to other domains of intelligence, such as managing emotion. Having discussed some of these issues, we now turn to examine conceptual linkages between EI and a variety of related constructs identified within disparate models of intelligence.

EI and social intelligence. Many commentators suppose that EI derives from the broader construct of social intelligence (e.g., Bar-On, 2000; Gardner, 1983; Goleman, 1995). Contemporary individual-differences perspectives on the construct of social intelligence have their origins in Thorndike's (1920) influential, tripartite division of intelligence into the following broad classes: (a) abstract-scholastic intelligence: the ability to understand and manage ideas; (b) mechanical-visuospatial intelligence: the ability to understand and manipulate concrete objects; and (c) social (practical) intelligence: the ability to understand and manage people and act wisely in social contexts. Thorndike's abstract definition of social intelligence as wisdom in social contexts was translated quickly into standardized instruments for measuring individual differences in this construct.

In the 1930s, the study of social intelligence was largely a study of how people make judgments regarding others and the accuracy of such social judgments. By the 1950s, however, this body of work had polarized to form two distinct traditions: (a) an intelligence tradition, which was interested in the abilities of person perception, and (b) a social psychological tradition, which focused on the social determinants of person perception. In recent times, there has been growing convergence between these distinctive domains. Thus, researchers from the domain of individual differences have become more interested in social facets of ability, and social psychologists have become more interested in cognitive determinants of perception (Mayer & Geher, 1996).

Despite considerable interest and numerous attempts to define and measure social intelligence over the past eight decades, these attempts have proved problematic (see Kihlstrom & Cantor, 2000, for a review of attempts to conceptualize and measure this construct). Although defining social intelligence seemed easy enough, the measurement of the construct proved to be an almost insurmountable task.
Thus, social intelligence has been studied less than other forms of intelligence because it seems the hardest of the three broad classes of intelligence to distinguish from the others, both theoretically and empirically (Mayer & Geher, 1996). The inability to discriminate between general and social intelligence, coupled with difficulties in selecting external criteria against which to validate experimental scales, led to a decline in research focusing on social intelligence as a distinct intellectual entity, until the recent upsurge of interest in EI.

EI and the behavioral facet of the structure of intellect model. EI involves the processing of both information that refers directly to emotion (e.g., one's own mood) and information on behaviors that have emotional connotations (e.g., violent behaviors). Intelligence in understanding behaviors and their significance already appears in Guilford's (1959) structure of intellect model. Guilford postulated a facet model of intelligence, on the basis of all possible combinations of three major facets: (a) operations (i.e., cognition, memory, divergent production, convergent production, and evaluation); (b) content (i.e., figural, semantic, symbolic, and behavioral); and (c) products (i.e., units, classes, relations, systems, transformation, and implications). Figural, semantic, and symbolic content largely corresponds to the abstract material contained in standard intelligence tests. However, the behavioral domain is neglected in conventional tests and seems to correspond to social-emotional intelligence. In particular, EI overlaps with the cognition of behavioral content (e.g., ability to identify internal status of individuals, interpretation of consequences of social behavior, etc.). In fact, the test items designed to gauge behavioral cognition, constructed by Guilford's team (e.g., O'Sullivan, Guilford, & de Mille, 1965), are reminiscent of current behavioral measures of EI.
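As an illustration only, the facet structure described above can be made concrete by enumerating every combination of the three facets. The sketch below uses the facet values exactly as listed in the text; the variable names are hypothetical, and the count it prints follows purely from the arithmetic of 5 operations, 4 contents, and 6 products rather than from any claim about how many abilities have been empirically identified.

```python
from itertools import product

# Facet values as listed in the text (names of the variables are illustrative).
OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
CONTENTS = ["figural", "semantic", "symbolic", "behavioral"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformation", "implications"]

# Each (operation, content, product) triple is one cell of the facet model.
cells = list(product(OPERATIONS, CONTENTS, PRODUCTS))

# Cells with behavioral content: the domain the text links to
# social-emotional intelligence.
behavioral_cells = [c for c in cells if c[1] == "behavioral"]

# Cognition of behavioral content: the region the text says EI overlaps with.
behavioral_cognition = [c for c in behavioral_cells if c[0] == "cognition"]

print(len(cells), len(behavioral_cells), len(behavioral_cognition))
```

On this reading, cognition of behavioral content occupies a small slice of the full model, one cell for each product type.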
Furthermore, references to empathic ability (whose definition parallels that of major facets of EI remarkably closely) can be found in some of Guilford's (1959) earliest writings on the structure of intellect model.¹

¹ As testament to the preceding assertion consider the following quotation: "[E]fforts are being made to bring within the sphere of psychological measurement what is often called empathic ability. This is an assumed ability to know psychological dispositions of other persons: their perceptions, thoughts, feelings, and attitudes, as well as their traits. . . . The understanding of such abilities would be of utmost importance to all those who deal directly with people in any professional way: politicians and teachers as well as psychiatrists, psychologists, and social workers" (Guilford, 1959, p. 395).

EI and crystallized intelligence. The theory of fluid (Gf) and crystallized (Gc) intelligence proposed by Cattell (1971), Horn (1988), and their associates (see Horn & Noll, 1994; Horn & Stankov, 1982) is arguably the most efficacious empirically based psychometric model of intelligence (see Stankov, Boyle, & Cattell, 1995). Researchers have speculated that, within this theory, EI will constitute an additional aspect of Gc (possibly one or more of its underlying primary mental abilities). This assertion is based on the assumption that the appraisal, expression, regulation, and utilization of emotion develops through experience and social interaction in much the same way as other psychological processes comprising Gc (see Davies, Stankov, & Roberts, 1998).

EI and personal intelligence. The concept of EI strongly overlaps with Gardner's (1983) notion of social intelligence, which he referred to as a type of personal intelligence. Indeed, part of Gardner's definition focuses specifically on the processing of affective information. Current conceptualization of EI (e.g., Mayer, Salovey, & Caruso, 2000a) focuses on one's ability to accurately identify, appraise, and discriminate among emotions in oneself and others, understand emotions, assimilate emotions in thought, and regulate both positive and negative emotions in self and others. This conceptualization encompasses the following subtypes of personal intelligence described by Gardner (1983) within his theory of multiple intelligences: (a) intrapersonal intelligence: the ability to access one's own feeling life; to identify, label, and discriminate among one's feelings; and to represent them symbolically; and (b) interpersonal intelligence: the ability to discern the moods, intentions, and desires of others. Thus, whereas intrapersonal intelligence refers to the person's ability to gain access to his or her own internal emotional life, interpersonal intelligence represents the individual's ability to understand other people, to know what they feel, and to notice and make distinctions among other individuals.

In sum, the current definition and conceptualization of EI, as a cognitive ability, overlaps considerably with Gardner's notion of personal intelligence, subsuming both intrapersonal and interpersonal forms of intelligence. In attempting to locate these intelligences within the traditional psychometric domain, Carroll (1993) suggested that interpersonal intelligence is a specialized type of acquired knowledge (i.e., Gc). However, Gardner's intrapersonal intelligence (access to one's own feelings) finds no counterpart in Carroll's taxonomic model.
It may be argued, however, that this situation has arisen because adequate assessment of this type of intelligence has never appeared in the extant factor-analytic literature.

Conceptualizing and Assessing EI

Models of EI

One of the difficulties currently encountered in research on EI would appear to be the multitude of qualities covered by the concept (see Roberts, in press). Indeed, many qualities appear to overlap with well-established personality constructs, such as the Big Five personality factor model (see Davies et al., 1998; McCrae, 2000). Mayer, Caruso, and Salovey (1999, 2000) warned that careful analysis is required to distinguish what is (and what is not) part of EI (see also Mayer, Salovey, & Caruso, 2000a, 2000b). Throughout, Mayer and colleagues distinguished between (a) mental ability models, focusing on aptitude for processing affective information, and (b) mixed models that conceptualize EI as a diverse construct, including aspects of personality as well as the ability to perceive, assimilate, understand, and manage emotions. These mixed models include motivational factors and affective dispositions (e.g., self-concept, assertiveness, empathy; see Bar-On, 1997; Goleman, 1995). In contrast, Mayer and colleagues have proposed a four-branch mental ability model of EI, which encompasses the following psychological processes (see, e.g., Mayer, Caruso, & Salovey, 1999, 2000; Mayer & Salovey, 1997; Mayer, Salovey, & Caruso, 2000a, 2000b; Salovey & Mayer, 1990):

1. The verbal and nonverbal appraisal and expression of emotion in the self and others. EI has been defined as "the ability to perceive emotions, to access and generate emotions so as to assist thought, to understand emotions and emotional knowledge, and to reflectively regulate emotions so as to promote emotional and intellectual growth" (Mayer & Salovey, 1997, p. 5).
Inside this definitional framework, the most fundamental level of EI includes the perception, appraisal, and expression of emotions (Mayer, Caruso, & Salovey, 1999). In other words, implicit in this aspect of EI is the individual's awareness of both their emotions and their thoughts concerning their emotions, the ability to monitor and differentiate among emotions, and the ability to adequately express emotions.

2. The utilization of emotion to facilitate thought and action. This component of EI involves assimilating basic emotional experiences into mental life (Mayer, Caruso, & Salovey, 1999, 2000). This includes weighing emotions against one another and against other sensations and thoughts and allowing emotions to direct attention (e.g., holding an emotional state in consciousness long enough to compare its correspondence with similar sensations in sound, color, and taste). Marshaling emotions in the service of a goal is essential for selective attention, self-monitoring, self-motivation, and so forth.

3. Understanding and reasoning about emotions.

This aspect of EI involves perceiving the lawfulness underlying specific emotions (e.g., to understand that anger arises when justice is denied or when an injustice is performed against oneself or one's loved ones). This process also involves the understanding of emotional problems, such as knowing what emotions are similar and what relation they convey.

4. The regulation of emotion in the self and others. According to Mayer, Caruso, and Salovey (1999), the highest level in the hierarchy of EI skills is the management and regulation of emotions. This facet of EI involves knowing how to calm down after feeling stressed out or alleviating the stress and emotion of others. This facet facilitates social adaptation and problem solving.

The Assessment of EI: Self-Report and Performance Approaches

Although several measures have been (or are currently being) designed for the assessment of EI, it remains uncertain whether there is anything about EI that psychologists working within the fields of personality, intelligence, and applied psychological research do not already know. Moreover, the increased media attention and the vast number of trade texts devoted to the topic of EI often subsume findings from these fields in a faddish way rather than deal directly with the topic as defined by its chief exponents. In short, like many psychological constructs, EI is often loosely defined in the literature, causing considerable confusion among researchers in the field.

Nevertheless, since the term first appeared, there has been a rapid propagation of measures of EI (for a review, see Ciarrochi, Chan, Caputi, & Roberts, 2001).
Popular measures of EI include the Bar-On Emotional Quotient Inventory (Bar-On, 1997, 2000), the EQ Map Test (Cooper & Sawaf, 1997), the Schutte Self-Report Inventory (Schutte et al., 1998), the Trait Meta-Mood Scale (Salovey, Mayer, Goldman, Turvey, & Palfai, 1995), and the Multi-Factor Emotional Intelligence Scale (Mayer, Caruso, & Salovey, 1999).² The content of these EI measures varies as a function of the different theoretical conceptualizations and interpretations of EI appearing in the literature (Mayer, Salovey, & Caruso, 2000a, 2000b). However, many commentators classify indicators of EI according to whether they derive from self-reports of typical behaviors in everyday life, as opposed to objective performance in controlled experimental settings. A brief overview and critique of these two distinctive approaches to the assessment of EI follows.

Self-Report Measures of EI

Self-report measures have been designed to assess beliefs and perceptions about an individual's competencies in specific domains of EI (Salovey, Woolery, & Mayer, 2000). These indexes generally ask a person to endorse a series of descriptive statements, usually on some form of rating scale. For example, in the Schutte Self-Report Inventory (Schutte et al., 1998) individuals rate themselves on a scale from 1 (strongly disagree) to 5 (strongly agree) on 33 statements (e.g., "I know why my emotions change," "I expect good things to happen"). Self-report measures typically sample a diversity of constructs, and hence assume a mixed model of EI (i.e., as both ability and personality trait), in Mayer, Caruso, and Salovey's (e.g., 1999, 2000) terminology.

A number of problems and serious omissions currently plague the research on EI that uses self-report methodologies (cf. Petrides & Furnham, 2000).
These self-report scales rely on a person's self-understanding; if the self-reports are inaccurate, these measures yield information concerning only the person's self-perception (rather than his or her actual level) of EI. Self-perceptions may not be particularly accurate or even available to conscious interpretation, being vulnerable to the entire gamut of response sets and social desirability factors afflicting self-report measures, as well as deception and impression management. These problems are, of course, common to all scales based on self-report, including personality assessment. To counteract this criticism in other fields where self-reports are used, researchers have devised a number of procedures, including comparing self-assessed responses with reports provided by a respondent's peers (see, e.g., Costa & McCrae, 1992; Pavot, Diener, & Suh, 1998; Stoeber, 1998).³ However, validation studies of this type appear not to have been conducted with respect to self-report measures of EI. Hence, whether extant scales are free from response biases and social desirability effects remains an open, empirical question in urgent need of detailed investigation.⁴

² A number of measures of psychological constructs developed before EI gained widespread notoriety have also been used as proxies in the assessment of EI. These include instruments designed to assess alexithymia (e.g., Toronto Alexithymia Scale; Bagby, Parker, & Taylor, 1994), empathy (e.g., Questionnaire Measure of Emotional Empathy; Mehrabian & Epstein, 1970), emotional control (e.g., Emotional Control Questionnaire; Roger & Najarian, 1989), and defensive avoidance (e.g., repression-sensitization scale; Weinberger, Schwarz, & Davidson, 1979).

This issue notwithstanding, it is questionable whether items asking participants to self-appraise intellectual ability (e.g., "I am an extremely intelligent student") would make for a valid measure of general intelligence. Under the assumption that EI constitutes a traditional form of intelligence, the usefulness of analogous items about one's EI seems doubtful (Salovey et al., 2000). Note that past research has reported rather modest associations between self-rated and actual ability measures, with self-report accounting for less than 10% of intelligence-score variance. Thus, a meta-analytic review of 55 studies by Mabe and West (1982) yielded a mean correlation (validity coefficient of self-rating) of .34 between self-evaluations of intelligence and objective intelligence test scores. More recent studies (see, e.g., Paulhus, Lysy, & Yik, 1998) have concurred that the correlations between self-reports of intelligence and mental test performance tend to be rather modest (about r = .30).

Finally, tests of EI that assess noncognitive traits (e.g., assertiveness, optimism, impulse control) seem to be tapping dimensions of individual differences that are entirely different from contemporary notions of what constitutes intelligence (Davies et al., 1998). Indeed, the information derived from these instruments appears more pertinent to constructs comprising existing personality models (see McCrae, 2000).
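The link between the correlations quoted above and the percentage of variance accounted for can be checked with a short calculation. The sketch below assumes the usual reading of a squared Pearson correlation as the proportion of variance two measures share; the function name and labels are illustrative and do not come from the studies cited.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given a Pearson r."""
    return r ** 2

# Correlations quoted in the text; the labels are only descriptive.
reported = [
    ("mean validity coefficient, Mabe and West (1982)", 0.34),
    ("approximate value in more recent studies", 0.30),
]

for label, r in reported:
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

Under this assumption, a correlation of .30 corresponds to 9% shared variance and a correlation of .34 to roughly 11.6%.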
Empirical data pointing to the substantial relationship between EI and existing personality measures have, curiously, actually been used in support of the discriminant validity and conceptual soundness of EI (see, e.g., Bar-On, 2000). For example, a recent study by Dawda and Hart (2000) revealed average correlations approaching .50 between measures of the Big Five personality factors (i.e., Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness) and general EI derived from the Bar-On (1997) Emotional Quotient Inventory (see Table 7, p. 807). Noting the relative independence of each of the Big Five personality factors (e.g., Costa & McCrae, 1992), we believe these data suggest that the Bar-On Emotional Quotient Inventory is nothing but a proxy measure of a composite of Big Five personality constructs, weighted most strongly toward low neuroticism.

Performance-Based EI Measures

In view of the foregoing problems associated with the use of self-report measures, several authors have advocated the development of more objective, ability-based indicators of EI (e.g., Mayer, Caruso, & Salovey, 1999, 2000; Mayer & Salovey, 1997; Mayer, Salovey, & Caruso, 2000a, 2000b). According to these authors, ability testing is the gold standard in intelligence research, because intelligence refers to the actual capacity to perform well at mental problems, not just one's beliefs about such capacities (see also Carroll, 1993). Under this framework, a psychological instrument directly measures ability by having a person solve a problem (e.g., identify the emotion in a person's face, story, or painting). In addition, the examinee's answer should be available for evaluation against accuracy criteria (Mayer & Geher, 1996). Consequently, task-based measures engage participants in exercises designed to assess competencies supporting EI skills.
The ability-based mode of assessment proposed by Mayer and Salovey (1997), together with its underlying four-branch conceptual model of EI, has gained currency, largely because it appears ability-oriented and empirically based. Their four-branch model, described above, is currently operationalized through the Multi-Factor Emotional Intelligence Scale (MEIS; Mayer, Caruso, & Salovey, 1999), and the recently developed Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT; Mayer, Caruso, & Salovey, 2000). As further discussed below, there is considerable difficulty in determining objectively correct responses to stimuli involving emotional content and in applying truly veridical criteria in scoring tasks of emotional ability.

³ It is worth noting that this is considered by researchers working within the field of personality as a measure of consensus. Given the consensus scoring procedure alluded to previously (and described shortly in some depth for performance-based assessment of EI), an intriguing conceptual question may be posed: To what extent are these two forms of consensus empirically equivalent?

⁴ It is noted that peer assessment may not be a foolproof remedy to self-report biases. It may be subject to its own particular biases, such as the nature of the situations in which the peer encounters the person rated. One of the issues frequently encountered is the problem of peer agreement, which for many traits is rather low. The additional problem for emotional reaction is that it often may not be visible, even to peers.

Proponents of EI ability measures have thus promoted three alternative scoring procedures to discriminate right from wrong answers on ability tests of EI (Mayer, Caruso, & Salovey, 1999), which are described as follows:

Consensus scoring. An examinee receives credit for endorsing responses that the group endorses. Thus, if the group agrees that a face (or design, passage of music, etc.) conveys a happy or sad emotion, then that becomes the correct response. This approach assumes that observations for a large number of people can be pooled and can serve as reliable measures.

Expert scoring. Experts in the field of emotions (e.g., psychologists, psychiatrists, philosophers, and so forth) examine certain stimuli (e.g., a face, passage of music, or design) and then use their best judgment to determine the emotion expressed in that stimulus. Presumably, the expert brings professional know-how (along with a history of behavioral knowledge) to bear on judgments about emotional meanings. However, researchers have argued that an expert's assessment may be no more than a reliable indicator of the group consensus, albeit a particularly sensitive one (Legree, 1995). The test taker receives credit for ratings that correspond to those used by the experts.

Target scoring. A judge (i.e., the test taker) assesses what a target (artist, photographer, musician, and so forth) is portraying at the time the target individual is engaged in some emotional activity (e.g., writing a poem, playing a musical score, painting, sculpting, photographing a picture, etc.). A series of emotion-rating scales is then used to match the emotions conveyed by the stimuli to those reported by the target.
It is commonly held that the target has more information than is available to the outside observer (Bar-On, 1997; Mayer, Caruso, & Salovey, 1999, 2000; Mayer & Geher, 1996), and the target's report is therefore used as the criterion for scoring judges' responses. Target scoring has received rather little attention in previous research, ostensibly because it is suitable only for emotion-identification tasks and not for other, higher level aspects of EI. Hence, we will not discuss target scoring at length in the current article, although it seems promising for measuring some aspects of EI and might be explored further.

Issues pertaining to the scoring of EI tests. The use of multiple scoring methods in objective assessment of EI contrasts with the scoring of conventional intelligence tests. The logic of facet-analytic thinking (see, e.g., Guttman & Levy, 1991; Most & Zeidner, 1995; Zeidner & Feitelson, 1989) is that the main criterion for an intelligence task is the application of a veridical criterion against which one judges a response as correct or incorrect. Often, intelligence test items are based on some formal, rule-bound system that indicates unequivocally whether an answer is correct. Various formal systems are used depending on item content, such as mathematics (numerical tests), logic (reasoning tests), geometry (spatial tests), and the semantics of language (verbal tests). It is also relatively straightforward to determine which individuals are expert in these areas and thus are professionally qualified to act as arbiters. In contrast, items used in early IQ tests that depended on subjective judgment, such as deciding which of several faces was most attractive, have been largely removed from tests, due, in part, to the risk of cultural bias.

This is not to say that conventional intelligence testing is entirely free from scoring problems. An anonymous reviewer of this article pointed out that series completion problems such as 2, 4, 6, . . . ?
could be completed in any way; use of the simplest rule (add 2) is arbitrary (but consensual). 5 In addition, individual testing, especially of children, may require a judgment on the part of the tester as to whether a question has been correctly answered. Concerns also linger over the extent to which intelligence testing is truly culture-fair, despite efforts to remove obvious sources of cultural bias. Nevertheless, there is generally a clear rationale for justifying the correctness of an answer, and it is rare for well-informed people to dispute the correct answer to an item.

5 We are grateful to two anonymous reviewers for comments on earlier versions of this article.

The assessment of EI as a mental ability depends on the presumption that answers to stimuli assessing various facets of feelings can be categorized as correct or incorrect (Mayer & Salovey, 1997). If this presumption is incorrect, no scoring method can meet the basic psychometric criterion for ability tests, namely, the existence of a true and unequivocal veridical standard against which to judge responses. In fact, the likelihood of there being a veridical standard depends on the nature of the EI test item. As with cognitive intelligence, items may refer to psychological processes at different levels of abstraction from raw sense data. EI may, in principle, be assessed through lower order processes linked to sensation and perception, such as detecting the presence of an emotion in a face stimulus presented tachistoscopically or deciding that two words have similar valence. Alternatively, EI test items may refer to higher order reasoning processes, such as choosing how to cope with a stressful encounter.

Mayer, Salovey, and Caruso (2000a) arranged the four branches in a hierarchy beginning with lower level or basic skills of perception and appraisal of emotion, and finishing, at the highest level, with synthetic skills for emotion management that integrate lower level skills. Basic skills appear to be those most open to objective assessment (although it is likely that perception and appraisal of emotion also involve high-level inference). For example, facial expression of emotion is sufficiently well understood (e.g., Ekman, 1999) that objectively scored tests of identification of facial emotion may be feasible. In such a case, expert scoring seems appropriate and there is no place for consensus scoring.

Conversely, items for tests of the managing emotions branch are more problematic. Certain emotional reactions may be assessed according to logically consistent criteria only by reference to personal and societal standards (Matthews & Zeidner, 2000; Matthews, Zeidner, & Roberts, in press). For example, what is the best or right response to being insulted or mocked by a coworker? Clearly, the best response would depend on the situation, the person's experience with insults, cultural norms, the individual's position in the status hierarchy, and so forth. Even within a single specified situation, it is often difficult to specify the best response: there are multiple criteria for adaptation that may conflict (e.g., preserving self-esteem, maintaining good relationships with others, advancing in one's career). None of the scoring methods appears to be very satisfactory for higher level aspects of EI (which may be those most relevant to real-world functioning).
Experts may be able to use psychological research to provide answers (as did Mayer, Caruso, & Salovey, 1999), but there are two fundamental limitations to expert knowledge in this area. First, research typically reveals only statistical rather than directly contingent relationships; for example, being mocked by a coworker typically (but not invariably) leads to anger. Second, there are multiple domains of expertise leading to conflicting viewpoints. If we present the question of how a child's emotional problems can best be managed to a cognitive therapist, an evolutionary psychologist, a psychoanalyst, a social worker, a high school teacher, and a gender studies professor, what is the probability that these experts will agree on a solution? (We might feel fortunate to find agreement between any two of the above.)

The adequacy of consensus judgments is based on evolutionary and cultural foundations, where the consistency of emotionally signaled information appears paramount (Bar-On, 1997; Mayer, Caruso, & Salovey, 1999). Researchers have argued that the pooled responses of large normative samples are accurate (Legree, 1995), although more evidence is needed. Even if that is the case, there are serious concerns about bias in consensus judgment. Consensus may be influenced by nonveridical cultural beliefs, such as the traditional British belief that a stiff upper lip is always the best response to emotional problems. There are also concerns about the validity of consensus judgments that cross gender and cultural boundaries. The popular Venus and Mars view of gender relations is that men are good at understanding the emotions of other men but are inept at understanding women's feelings, and vice versa. In the worst case, consensus scoring may simply indicate the extent of agreement with cultural or gender-based prejudices.
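The contrast between consensus and expert scoring can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the MEIS scoring algorithm: the proportion-based consensus credit, the function names, and the toy responses are all hypothetical and introduced here only for exposition.

```python
from collections import Counter
from math import sqrt


def consensus_scores(responses):
    """Score each examinee by agreement with the group.

    Assumed rule: the credit for an item is the proportion of the whole
    sample that chose the same option as the examinee.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # One tally of chosen options per item.
    tallies = [Counter(person[j] for person in responses) for j in range(n_items)]
    return [
        sum(tallies[j][person[j]] / n_people for j in range(n_items))
        for person in responses
    ]


def expert_scores(responses, expert_key):
    """Score each examinee by the number of items matching the expert key."""
    return [
        float(sum(choice == key for choice, key in zip(person, expert_key)))
        for person in responses
    ]


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data: four examinees label the emotion in three stimuli.
responses = [
    ["happy", "sad", "anger"],
    ["happy", "sad", "fear"],
    ["happy", "fear", "fear"],
    ["sad", "sad", "fear"],
]
expert_key = ["happy", "sad", "anger"]  # hypothetical expert judgments

consensus = consensus_scores(responses)
expert = expert_scores(responses, expert_key)
convergence = pearson_r(consensus, expert)

print(consensus)  # [1.75, 2.25, 1.75, 1.75]
print(expert)     # [3.0, 2.0, 1.0, 1.0]
print(round(convergence, 2))
```

In this toy sample the examinee who agrees with the expert on every item does not obtain the highest consensus score, so the two protocols rank examinees differently and the correlation between them is weak.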
If we are prepared to set such difficulties of scoring principles aside, perhaps we can proceed pragmatically, as Binet did in developing intelligence tests that would discriminate children of high academic potential. Testing EI may well be worthwhile if there is evidence that EI tests are reliable, in measuring some underlying quality accurately, and valid, in predicting relevant criteria better than other tests. Given that the MEIS is a new measure, it may be inappropriate to stifle research prematurely by applying overly stringent criteria. However, it is essential that there is convergence between different scoring methods, or the construct may be judged as unreliable. Mayer, Caruso, and Salovey (1999) pointed out that as the different criteria represent different perspectives, it is unlikely that they would be in complete agreement. These authors go on to state that there should be a general rough convergence, which would substantiate the view that EI is, in fact, an intelligence. Unfortunately, it is unclear how high correlations should be to attain rough convergence or whether it is satisfactory for correlations to be substantial but considerably less than unity (e.g., in the range of 0.50-0.70). The pragmatic approach raises the issue of empirical findings that are based on the MEIS, which will be considered next.

EI: Empirical Findings

These theoretical issues notwithstanding, recent research by Mayer, Caruso, and Salovey (1999) suggests that state-of-the-art objective measures of EI meet the standards of validity and reliability expected of traditional cognitive-ability measures. Indeed, although the scientific study of EI has only recently begun, the scant empirical evidence available is contradictory. A brief examination of these conflicting results follows.

EI Measures: Positive Results

Mayer, Caruso, and Salovey (1999) have argued that standard criteria need to be met before any (new) form of intelligence can be considered to constitute a legitimate scientific domain. These authors have focused on the following three standards, which have been replicated many times in psychometric studies of intelligence (and its taxonomic structure) over the past century (see, e.g., Carroll, 1993; Cattell, 1971; Guttman & Levy, 1991; Horn & Hofer, 1992; Jensen, 1998):

1. An intelligence should be capable of reflecting mental performance rather than preferred ways of behaving, or a person's self-esteem, or nonintellectual attainments (Mayer, Caruso, & Salovey, 1999, pp. 269-270). In short, this so-called conceptual criterion asserts that the concept in question be operationalized as a set of abilities (in this case, emotion-related abilities) that have clearly defined performance components.

2. A (new) intelligence should meet prescribed correlational criteria. For example, tests for different aspects of such an intelligence should be positively intercorrelated. Measures of a new ability should be related to existing psychometric intelligence tests (specifically demonstrating the positive manifold phenomenon represented by a nonnegative matrix of correlation coefficients, as prescribed by Guttman's first law of intelligence; Guttman & Levy, 1991). 6

3. Measures of intelligence should vary with experience and age. 7

Researchers have claimed that available evidence supports the notion that EI meets all three criteria and so is a legitimate form of intelligence (Mayer & Cobb, 2000; Mayer & Salovey, 1993, 1997; Mayer, Salovey, & Caruso, 2000a, 2000b; Salovey et al., 2000).
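The correlational criterion can be illustrated with a short sketch that checks whether a set of test scores forms a nonnegative correlation matrix, which is the sense in which positive manifold is used in this article. The test names, the scores, and the helper functions are hypothetical, and the check is purely descriptive: it applies no significance test and says nothing about how large the correlations ought to be.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def correlation_matrix(scores):
    """Return {(test_a, test_b): r} for every distinct pair of tests."""
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


def is_positive_manifold(correlations):
    """True when no pairwise correlation is negative."""
    return all(r >= 0 for r in correlations.values())


# Hypothetical scores of five examinees on three tests.
scores = {
    "perception": [1, 2, 3, 4, 5],
    "understanding": [2, 1, 4, 3, 5],
    "vocabulary": [1, 3, 2, 5, 4],
}
correlations = correlation_matrix(scores)
print(is_positive_manifold(correlations))  # True

# Adding a hypothetical measure that runs opposite to the others breaks it.
with_reversed = dict(scores, defensiveness=[5, 4, 3, 2, 1])
print(is_positive_manifold(correlation_matrix(with_reversed)))  # False
```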
With respect to operationalization criteria, EI has been measured by a series of ability tasks on state-of-the-art instruments, such as the MEIS, and has been objectively scored by using consensus, expert, and (for some scales) target criteria. These criteria are claimed to converge (i.e., to be positively correlated) to a satisfactory degree (Mayer, Salovey, & Caruso, 2000b). In the Mayer, Caruso, and Salovey (1999) data, correlations between consensus and expert test scores ranged from .16 to .95, with half of the 12 correlations exceeding an r of .52. A median of .52 suggests the desired rough convergence, though it is questionable whether correlations of this magnitude are sufficient to establish a reliable common element to the two forms of scoring. Moreover, Mayer, Caruso, and Salovey (1999, 2000) asserted that the four-branch model has (more or less) been vindicated by a series of factor analyses, such that the component tests adhere to the stated performance model. Finally, subtests comprising the MEIS are generally claimed to exhibit satisfactory levels of internal consistency reliability (see also Ciarrochi, Chan, & Caputi, 2000).

In fulfilling the second criterion, which essentially captures major features of construct validation, measures of EI have been shown to have concurrent validity with cognate measures of EI, such as empathy, parental warmth, and emotional openness (Mayer, Caruso, & Salovey, 1999; Mayer & Geher, 1996), which serve as criteria for validity assessment. More important, consensus and target scores appear to correlate to a similar degree with selected outside criteria (e.g., empathy, self-reported Scholastic Assessment Test [SAT] scores, decreased emotional defensiveness) in student populations (Mayer & Geher, 1996), although comparability of consensus and expert scores as predictors has been neglected. Other evidence comes from studies using questionnaire measures of EI.
For example, this form of EI predicts first-year college students' success (Schutte et al., 1998). Self-reported EI is also negatively related to alexithymia (i.e., difficulties in identifying, evaluating, describing, and expressing feelings), as measured by the Toronto Alexithymia Scale (e.g., Schutte et al., 1998; Taylor, 2000).

6 In the interests of economy of expression, we use the term positive manifold throughout this article to refer to a nonnegative matrix of correlation coefficients. Strictly speaking, however, as one reviewer pointed out, positive manifold more correctly refers to an untested, mathematical hypothesis first put forward by Thurstone (1931).

7 As one reviewer noted, Criteria 2 and 3, put forth by Mayer, Caruso, and Salovey (1999), may be problematic as criteria because these conditions may be construed as empirical findings rather than definitional features. This would appear a distinct possibility, especially with respect to Criterion 3, perhaps less so with respect to Criterion 2, as many intelligence researchers take positive manifold to be a lawful phenomenon of ability measures. Thus, for example, Guttman and Levy (1991) proposed, as the first law of intelligence, that positive intercorrelation of test items is an empirical law for a certain class of items. This issue aside, we do not wish to suggest that we wholeheartedly endorse any of these criteria (as evidenced later in our exposition). Rather, we present them here as the three standards espoused by Mayer, Caruso, and Salovey (1999) for establishing that a new domain (such as EI) constitutes a form of intelligence.

However, arguably, the most important construct validation criterion is the extent to which EI overlaps with other intelligence(s). In their pioneering study, Mayer, Caruso, and Salovey (1999) claimed that MEIS measures were sufficiently differentiated from verbal intelligence to provide unique variance but also sufficiently correlated to indicate that concepts underlying the MEIS form an intelligence. Somewhat curiously, the verbal intelligence measure used in the Mayer, Caruso, and Salovey (1999) study (i.e., the Army Alpha) is seldom used in contemporary investigations of cognitive ability. Moreover, another study, which included an oft-used measure of cognitive abilities, came up with a notably different finding that might be construed as questioning the claim that EI meets the standards expected of an intelligence. In particular, Ciarrochi et al. (2000) found near-zero correlations between general EI, measured by total MEIS scores, and the Australian version of the Raven's Standard Progressive Matrices test (RSPM; Australian Council of Educational Research [ACER], 1989), and negative correlations between an understanding and managing emotions factor and RSPM score!

With respect to their third criterion, Mayer, Caruso, and Salovey (1999) reported that differences in mean EI scores observed for adolescents and adults serve as evidence supporting the developmental criterion. Note, however, that the above study was based on a cross-sectional design and thus allows interpretation only in terms of age group differences, not developmental differences.
Consensus scoring raises another interesting issue concerning developmental differences. In particular, if one takes the consensus of the younger group as the measure by which one should score these scales, it remains plausible that these age trends will reverse. In their study, Mayer, Caruso, and Salovey (1999) actually used an independent adult sample to obtain the consensus scores, meaning that this rival hypothesis certainly cannot be ruled out. In any event, the developmental criterion espoused by Mayer, Caruso, and Salovey (1999) is imprecise. In the intelligence literature, a particularly important finding is that certain classes of cognitive ability (e.g., Gf) actually decline with age (see, e.g., Carroll, 1993; Cattell, 1971; Horn & Hofer, 1992). It is difficult to envisage what developmental trend, other than complete insensitivity to age, would call into question the validity of any given measure.

EI: Negative Results

Mayer and Salovey (1993) had originally described EI as a type of social intelligence. However, despite much research, the independence of social intelligence from other types of intelligence (i.e., verbal) has not been successfully demonstrated (Carroll, 1993; Cronbach, 1960). Indeed, there is some evidence relating EI to Gc, through its mutual relationships with putative measures of social intelligence (Davies et al., 1998). Davies et al. (1998) found a range of measures purportedly assessing EI to have poor psychometric properties. These authors found low correlations among three factors defining the EI construct in their study: appraisal of emotions in the external world (perception) and appraisal of emotions in the self (awareness and clarity). A positive outcome evidenced in the Davies et al. investigation was that the perception of consensus-judged emotion in external objects represents a clearly defined unifactorial construct. However, two problems exist with emotion perception as a facet of EI.
First, the scales have evidenced relatively low reliability. Second, consensus scoring may define the factor rather than emotional content per se. This methods-factor issue is an important one, certainly worthy of more careful consideration than it has been given to date.

One of the main criticisms subsequently leveled at the Davies et al. (1998) investigation was that the EI measures were still in their infancy, such that their conclusions appeared premature (e.g., Mayer, Caruso, & Salovey, 1999; Mayer & Cobb, 2000; Mayer, Salovey, & Caruso, 2000a, 2000b). Thus, explicitly citing this reference, Mayer and Cobb (2000) noted that the Davies et al. study preceded publication of the highly reliable MEIS (p. 173). The question that should then be posed is, To what extent do available data support the efficacy of the MEIS, which debatably would now appear as the premier vehicle for the assessment of EI (see, e.g., Ciarrochi et al., 2001)? In their recent psychometric analysis of scores obtained from the MEIS, Mayer, Caruso, and Salovey (1999) demonstrated that for consensus scores, reliabilities ranged from .49 to .94. 8

8 The reader should note that, for comparative purposes, we present each of the reliabilities for respective tests and scoring protocols obtained by Mayer, Caruso, and Salovey (1999) as a companion to similar analysis we conduct as part of the present study.

Indeed, Ciarrochi et