Assessing speaking in the revised FCE

Nick Saville and Peter Hargreaves

This paper describes the Speaking Test which forms part of the revised First Certificate in English (FCE) examination produced by the University of Cambridge Local Examinations Syndicate (UCLES), and introduced for the first time in December 1996 (see First Certificate in English: Handbook, UCLES, 1997). The aim is to present the new test as the outcome of a rational process of test development, and to consider why the new design provides improvements in the assessment of speaking within the FCE context.

While examinations by their nature tend to be conservative, the Cambridge examinations produced over the years have kept pace with changes in English teaching, so that modifications to the examinations have taken place in an evolutionary way. FCE, first introduced in 1939 under the title Lower Certificate in English, has been revised periodically over the years in order to keep pace with changes in language teaching and language use, and also as part of an ongoing commitment to test validation. Prior to the revision introduced in 1996, FCE underwent major revisions in 1984, and before that in 1973. By changing in this way, it has been possible to continue to achieve positive impact in the contexts where the examinations are used, especially in relation to English language learning and teaching around the world.

In this respect, one of the key features of UCLES EFL examinations has been a focus on the assessment of speaking by means of a face-to-face speaking test as an obligatory component of the examinations. As part of the revisions to FCE and CPE, and in order to keep up with developments in the field, UCLES has introduced new procedures, and a number of different speaking-test formats have been used. Across the range of examinations that are now produced by UCLES, there is currently no single model for testing speaking. Some examinations, like the International English Language Testing System (IELTS), employ a speaking test in the one-to-one format (i.e. with one candidate and one examiner), and all tests are recorded so that they can be rated by other examiners at a later stage. Other examinations make use of a group format in the speaking tests, with more than two candidates assessed together (as in the Certificate in English for English Language Teachers, CEELT).

Elicitation and ratings

In designing a face-to-face speaking test such as those employed by UCLES, the test developer has to produce a suitable procedure which involves two main aspects:
a) the elicitation of an appropriate sample of spoken English;
b) the rating of that sample in terms of pre-defined descriptions of performance in spoken English, whether as a whole, or broken down into different criteria (e.g. accuracy, range, pronunciation, etc.).

These aspects in turn depend on two factors: the availability of valid and reliable materials and criterion rating scales, and the development and support of a professional cadre of oral examiners.

In designing a speaking test, there are no right or wrong solutions to this problem; as Bachman and Palmer (1996) point out, an appropriate ('useful') outcome is achieved by balancing the essential qualities of validity, reliability, impact, and practicality to meet the requirements of the testing context.

Despite the variety of formats which are used for testing speaking, UCLES has taken steps in recent years towards harmonization of approach in relation to the following examinations, which form the Cambridge 5-Level System:

Cambridge Level 5 - Certificate of Proficiency in English (CPE)
Cambridge Level 4 - Certificate in Advanced English (CAE)
Cambridge Level 3 - First Certificate in English (FCE)
Cambridge Level 2 - Preliminary English Test (PET)
Cambridge Level 1 - Key English Test (KET)

The aim is to establish common features which can be applied appropriately at the different levels. Some of the more important features which have been identified in this process have been incorporated into the revision of the FCE, and can be summarized as follows:

a) A paired format, based on two candidates and two oral examiners.
b) Of the two oral examiners, one acts as interlocutor, and his or her most important role is to manage the discourse (i.e. ensure that an appropriate sample is elicited from each of the paired candidates); the other acts as assessor, and is not involved in the interaction.
c) There are different phases or parts to the FCE Speaking Test, which facilitate the assessment of different patterns of interaction, participant roles, discourse, rhetorical functions, etc.
d) Standardization of formats is achieved partly by the use of controlled interlocutor frames, and partly by the use of tasks based on visual stimuli from generic sets appropriate to the level and nature of the examination.
e) Both the interlocutor and the assessor rate the candidates' performance, but the interlocutor provides a global/holistic assessment, while the assessor provides an analytical assessment.

With the revision of FCE, there are now four examinations which make use of the paired format: KET, PET, FCE, and CAE (CPE is currently under review, and still retains the option of the one-to-one approach).
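The division of labour between the two examiners, and the split between global and analytical ratings in features b) and e) above, can be pictured as a simple data structure. The sketch below is purely illustrative: the class and field names are ours, not UCLES's, and it models only the shape of the arrangement.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    INTERLOCUTOR = "interlocutor"  # manages the discourse; gives a global rating
    ASSESSOR = "assessor"          # observes without interacting; gives analytical ratings


@dataclass
class Examiner:
    name: str
    role: Role


@dataclass
class PairedSpeakingTest:
    """One administration: two candidates and two oral examiners (illustrative model)."""
    candidates: tuple[str, str]
    interlocutor: Examiner
    assessor: Examiner

    def __post_init__(self) -> None:
        # The paired format depends on the two examiners holding distinct roles.
        if self.interlocutor.role is not Role.INTERLOCUTOR:
            raise ValueError("first examiner must act as interlocutor")
        if self.assessor.role is not Role.ASSESSOR:
            raise ValueError("second examiner must act as assessor")
```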
The paired format

The decision to use the paired format as the standard model for the main suite speaking tests has been a key feature in balancing the essential test qualities in relation to the contexts where these examinations are used. The paired test was first used with FCE and CPE as an optional format during the 1980s. When CAE was introduced in 1991, the paired format was established as an obligatory feature of one of the main suite tests for the first time. This was extended to KET in 1993, to the revised PET in 1995, and most recently to the revised FCE in 1996.

Before the decision was made to extend the use of the paired format across the range of examinations, and especially to FCE, the various alternative formats were evaluated. In particular, feedback was collected from a wide range of stakeholders in the tests from around the world (including oral examiners, teachers, students, and candidates taking the tests). In addition, a range of validation projects carried out by the UCLES EFL Division in the 1990s has contributed to a greater understanding of this kind of assessment procedure (e.g. Lazaraton 1996a, 1996b; Milanovic, Saville, Pollitt, and Cook 1996; Young and Milanovic 1992).

One of the major advantages of the paired format is the use of two examiners to assess a candidate. This adds to the fairness of the assessment, and helps to reassure candidates that their mark does not depend on just one person. The paired format also allows more varied patterns of interaction during the examination; whereas in the one-to-one model there is only one interaction pattern possible (i.e. interaction between one examiner and one candidate), the paired format provides the potential for various interaction patterns between each candidate and the examiner, and between the candidates themselves. In addition, the paired format has the potential for positive washback, in encouraging more interaction between learners in the classroom.

Any test format presents test developers with a range of potential problems and issues which need to be addressed, and the paired format of the speaking tests is no exception. Critics of the paired format are often concerned with issues relating to the pairing of the candidates. It is argued, for example, that the paired format may not provide each candidate with an equal opportunity to perform to the best of their ability, or that the pairings may influence the assessment (e.g. due to a mismatch of language level, as when a good candidate is paired with a weaker one, or when one candidate is paired with another of a different age, gender, or nationality). Many of these concerns cannot be addressed with definitive answers. However, the potential problems need to be seen within the context of the overall design of the examination, and balanced against the positive advantages. For example, UCLES has attempted to address the issue of how much spoken language is produced by each candidate:

a) by paying close attention to the design of the different parts of the tests;
b) by providing examiners with an interlocutor frame to follow whilst administering the examination.

These features, together with comprehensive training for oral examiners (described below), help to ensure that a balanced sample of speech is elicited, and that each candidate receives an equal opportunity to perform to the best of his or her ability during the examination. Moreover, the EFL Division at UCLES has been conducting research since 1992 on the discourse produced in paired-format speaking tests, and work specifically related to the speech of candidates in FCE has been going on since 1995. The purpose of this research is to gain a better understanding of the features of the language produced during a paired-format test. Initial findings related to the revised FCE suggest that the features of candidate language predicted by the test specifications were present in the samples which were analysed.

In this regard, it is important to understand how the format of the revised FCE Speaking Test was arrived at, and the steps taken to ensure that the assessment is standardized. The second half of this paper describes in more detail the features of the revised FCE Speaking Test, and the way that oral examiners are trained and co-ordinated to carry out the test procedures, and to make appropriate ratings.

Features of the revised FCE Speaking Test

The initial context for the most recent revision of FCE was provided by the existing uses of the examination, and the nature of the current candidature. What was already known of existing standards - the expected level of performance by FCE candidates (centred on passing candidates with a grade C) - provided the background for the revised assessment criteria, and the application of the rating scales. The rating scales themselves (for use by oral examiners) were redeveloped in relation to the harmonized approach to the assessment of speaking, described above. Within this approach, all criteria used in the assessment were defined, and related to a model of Communicative Language Ability (CLA). The criteria and scales for the revised FCE are derived from the same model which underpins the revision project as a whole, based on the developments in this area during the 1980s, e.g. the work of Canale and Swain (1980), Bachman (1990), and the Council of Europe specifications for Waystage and Threshold (1990). See Figure 1.

Figure 1: Spoken language ability. The model divides spoken language ability into language competence, comprising grammatical competence (syntax, morphology, vocabulary, pronunciation), discourse competence (rhetorical organisation, coherence, cohesion), and pragmatic competence (e.g. sensitivity to illocution); and strategic competence, comprising interaction skills and non-verbal features of interaction.
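The hierarchy in Figure 1 can be restated in a compact form. The nested structure below simply transcribes the figure's labels; the rendering as Python is ours, for ease of reference, and is not part of the UCLES model.

```python
# The CLA model of Figure 1 as a nested structure. Labels are taken from the
# figure; the dictionary form itself is an illustrative rendering only.
SPOKEN_LANGUAGE_ABILITY = {
    "language competence": {
        "grammatical": ["syntax", "morphology", "vocabulary", "pronunciation"],
        "discourse": ["rhetorical organisation", "coherence", "cohesion"],
        "pragmatic": ["sensitivity to illocution"],  # given as an example in the figure
    },
    "strategic competence": [
        "interaction skills",
        "non-verbal features of interaction",
    ],
}
```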
The revised FCE has five assessment criteria in all, four analytical and one global: grammar and vocabulary, discourse management, pronunciation, interactive communication, and global achievement.

Test format and task features

In the revised FCE, the Speaking Test consists of four parts, each of which focuses on a different type of interaction: between the interlocutor and each candidate, between the two candidates, and among all three. The patterns of discourse vary within each part of the test, and candidates are encouraged to prepare for the Speaking Test by practising talking individually, and in small groups with the teacher and with peers. The aim is to help them to be aware of, and to practise, the norms of turn-taking, and the appropriate ways of participating in a conversation, or taking up a topic under discussion. This is seen as one aspect of positive impact that the test can achieve.

Oral examiners make use of a task features specification which summarizes the features of the task which are appropriate to the level and purpose of the examination. Each part of the test is a separate task, with the following specific features: interaction pattern (examiner to candidate(s), candidate to candidate, etc.), input (verbal and/or visual), and output by candidates. The expected output of the candidates is predicted from the combination of features for each task, and is judged in relation to their performance in these tasks, which have been designed according to the level of FCE, and in order to provide an appropriate level of difficulty for the typical FCE candidature.

As noted above, the tasks include different interaction patterns, different discourse types (short turn, long turn, etc.), and have features such as turn-taking, collaborating, initiating/responding, and exchanging information. Examples of other task features include functions such as describing and comparing, stating and supporting an opinion, agreeing and disagreeing, speculating, and expressing certainty and uncertainty. This is summarized in Table 1.

Table 1: Task features (discourse features and functions together specify the expected candidate output)

Part 1: Interview (3 minutes)
Interaction pattern: the interlocutor interviews the candidates
Input: verbal questions
Discourse features: responding to questions; expanding on responses
Functions: giving personal information; talking about present circumstances; talking about past experience; talking about future plans

Part 2: Individual long turn (4 minutes)
Interaction pattern: the interlocutor delegates an individual task to each candidate
Input: visual stimuli with verbal rubrics
Discourse features: sustaining a long turn; managing discourse (coherence and clarity of message; organization of language and ideas; accuracy and appropriacy of linguistic resources)
Functions: giving information; expressing opinions, e.g. through comparing and contrasting, explaining and giving reasons

Part 3: Two-way collaborative task (3 minutes)
Interaction pattern: the interlocutor delegates a collaborative task to the pair of candidates
Input: visual/written stimuli, with verbal rubrics
Discourse features: turn-taking (initiating and responding appropriately); negotiating
Functions: exchanging information and opinions; expressing and justifying opinions; agreeing and/or disagreeing; suggesting; speculating

Part 4: Three-way discussion (4 minutes)
Interaction pattern: the interlocutor leads a discussion with the two candidates
Input: verbal prompts
Discourse features: initiating and responding appropriately; developing topics
Functions: exchanging information and opinions; expressing and justifying opinions; agreeing and/or disagreeing

Each task has its own focus:

Part 1 - Interview
The interlocutor directs the conversation by asking each candidate to give some basic personal information about him or herself. The candidates do not need to talk to each other in this part of the test, though they may if they wish.

Part 2 - Long turn
Each candidate is given the opportunity to talk without interruption on his or her own for about one minute. Each candidate is asked to compare and contrast two colour photographs, commenting on the pictures, and giving some personal reaction to them. They are not required to describe the photographs in detail.
Part 3 - Two-way collaborative task
The candidates are provided with a visual stimulus (one or several photographs, line drawings, computer graphics, etc.) to form the basis for a task which they attempt together. Sometimes the candidates may be asked to agree on a decision or conclusion, whereas at other times they may be told that they may agree to disagree. In all cases, it is the working towards the completion of the task that counts, rather than the actual completion of the task.
Part 4 - Three-way discussion
The interlocutor again directs the conversation by encouraging the candidates to broaden and discuss further the topics introduced in Part 3.

In the information about the test which is provided to candidates, it is made clear that they must be prepared to provide full but natural answers to questions asked either by the interlocutor or the other candidate, and to speak clearly and audibly. They should not be afraid to ask for clarification if they have not understood what has been said. If misunderstandings arise during the test, candidates should ask the interlocutor, or each other, to explain further. Obviously, no marks are gained by remaining silent, and equally, no marks are lost for seeking clarification on what is required. On the contrary, this is an important feature of strategic ability, which is one of the criteria for assessment (under the interactive communication scale).

While it is the role of the interlocutor, where necessary, to manage or direct the interaction, ensuring that both candidates are given an equal opportunity to speak, it is also the responsibility of the candidates to maintain the interaction as much as possible. Candidates who are able to balance their turns in the interchange will utilize to best effect the amount of time available, and so provide the oral examiners with an adequate amount of language to assess.

From the point of view of ratings, an advantage of the paired format is that two independent ratings are obtained for each candidate, thus making the examination fairer. In the revised FCE both the assessor and the interlocutor record marks using the same criteria, although the two examiners are expected to have slightly different perspectives on the performance due to their different roles - the interlocutor as participant, and the assessor as observer. To reflect this, the revised FCE makes use of two types of rating scale: a set of analytical scales derived from the criteria in the model of Communicative Language Ability, and a global scale which combines the criteria in the analytical scales in an appropriate way. This rating procedure involves the assessor marking each candidate on the four analytical scales as the test is in progress, and at the end, the interlocutor giving a single global score for each candidate based on the global achievement scale. There is no requirement for the examiners to discuss and agree the marks, and the final assessment is derived from the two ratings when the mark sheets are returned to Cambridge.
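The text above specifies who rates what, but not the arithmetic by which the two examiners' ratings are combined; that derivation takes place in Cambridge and is not detailed here. As a purely hypothetical illustration of the flow of marks, the sketch below assumes a 0-5 band for every scale and a simple average of the two perspectives; neither assumption is drawn from UCLES documentation.

```python
from statistics import mean

# The four analytical scales named in the text.
ANALYTICAL_SCALES = (
    "grammar and vocabulary",
    "discourse management",
    "pronunciation",
    "interactive communication",
)


def derive_final_mark(analytical: dict[str, float], global_score: float) -> float:
    """Combine the assessor's four analytical marks with the interlocutor's
    single global mark. Equal weighting of the two perspectives is an
    illustrative assumption, not the published derivation."""
    if set(analytical) != set(ANALYTICAL_SCALES):
        raise ValueError("one mark is required per analytical scale")
    assessor_view = mean(analytical.values())  # assessor as observer
    interlocutor_view = global_score           # interlocutor as participant
    return (assessor_view + interlocutor_view) / 2


# Example: a candidate marked 4, 3, 4, 4 analytically and 4 globally.
marks = dict(zip(ANALYTICAL_SCALES, (4, 3, 4, 4)))
print(derive_final_mark(marks, 4))  # 3.875
```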
Examiner training

Successful elicitation and accurate ratings are to a large extent dependent on the knowledge and ability of the oral examiners. In the first instance, careful test design can help to ensure that the examiners are likely to find the elicitation procedures and rating scales easy to apply. However, the successful functioning of a speaking test, such as that used in the revised FCE and most other Cambridge examinations, relies heavily on a system for training and standardizing the oral examiners. For UCLES this is a major undertaking, as there are currently about 7,000 approved UCLES EFL oral examiners around the world involved in conducting one or more of the Speaking Tests for the Cambridge EFL examinations. The major objectives in regard to the performance of these oral examiners are that:

a) they consistently apply the Speaking Test procedures to obtain representative, valid samples of the candidates' spoken English, in accordance with the test specifications;
b) they rate the samples of spoken English accurately and consistently, in terms of the pre-defined descriptions of performance, using the rating scales provided by UCLES.

Over the years UCLES has developed a two-pronged approach to ensuring that these objectives can be met, based, firstly, on a network of professionals with various levels of (overlapping) responsibility, and, secondly, on a set of procedures which apply to each professional level.

In the network of professionals there are three levels, in addition to UCLES' own staff. At the operational level there are the oral examiners. At the next level up, in countries where there are sufficient numbers of oral examiners to merit it, team leaders are engaged by local secretaries with responsibility for the professional supervision of oral examiners, in a ratio of about one team leader to between five and 30 oral examiners, depending on such factors as the distribution of oral examiners, the location of centres, etc. Finally, in countries where the number of team leaders (and hence oral examiners) merits it, senior team leaders have been appointed by UCLES to supervise team leaders, in an average ratio of one senior team leader to 15 team leaders. This forms a hierarchy of responsibilities. See Figure 2.

Figure 2: The hierarchy of professional responsibilities - UCLES at the top, then senior team leaders, team leaders, and oral examiners.

The levels in this hierarchy are not sealed off from each other: it is a requirement that team leaders and senior team leaders must also be practising oral examiners, in order to ensure that they can draw on their experience when it comes to dealing with the concerns of oral examiners.
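The quoted ratios (roughly one team leader to between five and 30 oral examiners, and on average one senior team leader to 15 team leaders) allow a rough sizing of this hierarchy. The sketch below is back-of-envelope arithmetic only; the midpoint of 15 examiners per team leader is our assumption, not a UCLES figure.

```python
import math


def staffing_estimate(oral_examiners: int,
                      examiners_per_tl: int = 15,
                      tls_per_stl: int = 15) -> tuple[int, int]:
    """Rough sizing of the supervision hierarchy. The text quotes a ratio of
    one team leader per 5-30 examiners (15 is a midpoint chosen here for
    illustration) and about one senior team leader per 15 team leaders."""
    team_leaders = math.ceil(oral_examiners / examiners_per_tl)
    senior_team_leaders = math.ceil(team_leaders / tls_per_stl)
    return team_leaders, senior_team_leaders


# With the roughly 7,000 approved oral examiners mentioned above:
print(staffing_estimate(7000))  # (467, 32)
```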
The set of procedures which regulate the activities of these three professional levels is summarized by the acronym R-I-T-C-M-E, where the initials stand for Recruitment, Induction, Training, Co-ordination, Monitoring, and Evaluation. Each of these procedures is defined by a list of Minimum Professional Requirements (MPRs) appropriate to the level of professional responsibility. These MPRs set down the minimum levels and standards (for recruitment, induction programmes, etc.) which must be achieved in order to meet the professional requirements of administering Cambridge EFL Speaking Tests, and to sustain a fully effective team leader system.

The first two procedures covered by R-I-T-C-M-E, recruitment and induction, typically apply only once to an applicant oral examiner for a given examination. The remainder of the procedures are recurrent, and to some extent cyclical for each examination, in so far as the outcome of monitoring and evaluation feeds into training and co-ordination. After the initial training of examiners, standardization of assessment is maintained by the annual co-ordination sessions of oral examiners approved for the relevant examination, and by monitoring visits to centres by team leaders. During co-ordination sessions, examiners watch and discuss sample speaking tests recorded on video, and then conduct practice tests with volunteer candidates in order to establish a common standard of assessment. The sample tests on video are selected by UCLES to demonstrate a range of task types and different levels of competence, and are pre-marked by a team of experienced assessors.

In this context, monitoring and evaluation refer both to the test procedures (e.g. whether the procedure elicits an appropriate sample), and to the performance of the oral examiners. This latter kind of monitoring and evaluation forms part of the human resource appraisal system, which is necessary to guarantee the quality of the assessments. During monitoring, team leaders complete evaluation sheets for the oral examiners being monitored; they discuss the results with the oral examiners themselves, and with the local secretaries as part of a planning/review meeting. The evaluation sheets are then sent to the senior team leaders, and finally on to Cambridge for analysis. In addition, greater use is now being made of audio recordings to monitor both candidate output and examiner performance.
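Because the co-ordination videos are pre-marked by experienced assessors, a natural standardization check is how closely an individual examiner's marks track the reference marks. The function below is a minimal sketch of such a check, assuming a half-band tolerance; both the tolerance and the measure itself are illustrative, and do not reflect the statistics actually used in Cambridge.

```python
def within_tolerance(examiner_marks: list[float],
                     reference_marks: list[float],
                     tolerance: float = 0.5) -> float:
    """Proportion of an examiner's marks that fall within `tolerance` bands
    of the pre-marked reference scores (illustrative check only)."""
    if len(examiner_marks) != len(reference_marks):
        raise ValueError("one mark is required per sample test")
    hits = sum(abs(e - r) <= tolerance
               for e, r in zip(examiner_marks, reference_marks))
    return hits / len(reference_marks)


# An examiner who marks four video samples as 3, 4.5, 2, 5 against
# reference marks of 3, 4, 3, 5 agrees on three of the four samples:
print(within_tolerance([3, 4.5, 2, 5], [3, 4, 3, 5]))  # 0.75
```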
Conclusion

This paper has described the Speaking Test in the revised FCE in relation to the format of the test, and the way in which assessments are made, focusing in particular on the role of the oral examiners. While no test achieves a perfect balance of the necessary qualities, it is believed that the current balance, with the recent revisions, represents several steps forward in terms of improvements over earlier solutions. An ongoing commitment to validation involving data collection, monitoring, and evaluation will ensure that the evolutionary process of change continues. In this way, and as our knowledge of the complexities of spoken language grows, further revisions can be expected in the future.

References

Bachman, L. F. 1990. Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F. and A. S. Palmer. 1996. Language Testing in Practice. Oxford: Oxford University Press.
Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing'. Applied Linguistics 1/1: 1-47. Oxford: Oxford University Press.
van Ek, J. A. and J. L. M. Trim. 1990. Threshold Level 1990. Strasbourg: Council of Europe.
van Ek, J. A. and J. L. M. Trim. 1990. Waystage 1990. Strasbourg: Council of Europe.
Lazaraton, A. 1996a. 'Interlocutor support in oral proficiency interviews: the case of CASE'. Language Testing 13: 151-72. London: Edward Arnold.
Lazaraton, A. 1996b. 'A qualitative approach to monitoring examiner conduct in the Cambridge Assessment of Spoken English (CASE)' in Studies in Language Testing 3: Performance Testing, Cognition and Assessment: Selected Papers from the 15th Language Testing Research Colloquium (LTRC): 18-33. Cambridge: Cambridge University Press/UCLES.
Milanovic, M., N. Saville, A. Pollitt, and A. Cook. 1996. 'Developing rating scales for CASE: theoretical concerns and analyses' in Validation in Language Testing. Clevedon: Multilingual Matters.
University of Cambridge Local Examinations Syndicate. 1997. First Certificate in English: Handbook. Cambridge: University of Cambridge Local Examinations Syndicate.
Young, R. and M. Milanovic. 1992. 'Discourse variation in oral proficiency interviews'. Studies in Second Language Acquisition 14: 403-24. Cambridge: Cambridge University Press.

The authors

Nick Saville has been Group Manager for Test Development and Validation within the EFL Division of the University of Cambridge Local Examinations Syndicate (UCLES) since 1994. His own research interest is in the development and validation of procedures for oral assessment, and he was a member of the UCLES development team that worked on the revision of the FCE Speaking Test.
E-mail: <saville.n@ucles.org.uk>

Peter Hargreaves joined UCLES as Director of the EFL Division after working for the British Council for over twenty years. His early background in ELT was in teacher training, but he moved into testing with the Council after obtaining his doctorate in English grammar. He now heads a team of about 70 staff at UCLES, working on the Cambridge EFL examinations and Integrated Language Teaching Schemes.