Running head: REVIEW OF IELTS, MET, AND TOEFL

Review of the IELTS, TOEFL-iBT and MET English Proficiency Exams

Kristen Foster

Colorado State University
Abstract

Large-scale proficiency exams are used to make high-stakes decisions about test-takers, and as potential users of these exams, teachers of English as a second or foreign language should possess the ability to evaluate them for usefulness, including reliability and validity. The purpose of this review is to describe the format, purpose, and evidence of usefulness of three large-scale proficiency exams: IELTS, TOEFL-iBT, and MET. Based on this evaluation, the author has concluded that she would most readily endorse the IELTS exam as a reliable and valid measure of English language proficiency.

Keywords: English as a second/foreign language, proficiency exam, large-scale exam, IELTS, TOEFL-iBT, MET
Introduction

The use of large-scale proficiency exams for assessing the English-language proficiency of non-native speakers of English is a widespread practice that plays a critical role in making important, high-stakes decisions about test-takers (Uysal, 2010). These decisions can include those concerning employment or admission to English-medium universities, and for many test-takers, the results of these tests hold great importance for their academic and professional prospects and success (Bachman & Palmer, 1996). Because of the weight of the decisions made on the basis of test scores, many test-takers invest a great deal of time, energy and money in preparing for proficiency exams. This is merely one reason why it is essential for test developers, administrators, and users to regularly examine large-scale testing procedures in order to make sure that they meet professional standards and to contribute to their further development (Uysal, 2010, p. 314). As a specific kind of test user, teachers of English as a second/foreign language should possess the ability to evaluate these large-scale proficiency exams in order to determine their potential uses and benefits for specific contexts (Stoynoff & Chapelle, 2005). The purpose of the present paper is to report on information gathered while reviewing three large-scale, widely administered English-language proficiency exams: the International English Language Testing System (IELTS), the Michigan English Test (MET), and the Test of English as a Foreign Language (TOEFL). These tests are commonly used to assess the English language proficiency of adult speakers of English. My motivation in reviewing these three tests, as opposed to other widely administered proficiency exams, is tied to my future teaching goals. Upon graduating with an M.A. in TESL/TEFL, I hope to travel abroad and teach English to adults in academic contexts.
My goal in reviewing these tests is to gain insight into their purposes and formats, as well as to find evidence of their reliability and validity in measuring the constructs they purport to measure.
International English Language Testing System (IELTS)

Publisher: University of Cambridge ESOL Examinations, 1 Hills Road, Cambridge, CB1 2EU, United Kingdom; telephone 44-1223-553997; ielts@cambridgeesol.org. IDP: IELTS Australia, 535 Bourke St, Melbourne VIC 3000, Australia; telephone 61-3-9612-4400; ielts@idp.com. The British Council, Bridgewater House, 58 Whitworth Street, Manchester, M1 6BB, United Kingdom; telephone 44-161-957-7755; ielts@britishcouncil.org. IELTS International, 825 Colorado Boulevard, Suite 112, Los Angeles, CA 90041; telephone 1-323-255-2771; ielts@ieltsintl.org

Publication Date: 1989 (introduced as ELTS in 1980)

Target Population: Adult speakers of English

Cost: Variable, depending on test center location (typically $185 in the USA). Test locations, dates and costs can be accessed at http://www.ielts.org/test_centre_search/search_results.aspx.

Overview

The IELTS is a large-scale test of English-language proficiency that was first introduced by the University of Cambridge ESOL Examinations in 1980 as the English Language Testing System (ELTS) (Stoynoff & Chapelle, 2005). Since that time, the test has undergone extensive revisions in order to address validity concerns and can now be taken online. It is one of the most popular ESL tests throughout the world, and is unique in that it claims to assess English as an international language (Uysal, 2010). It also claims to be a task-based test (IELTS Guide, 2012, p. 5) that reflects current thinking and theory about communicative language ability and English for Specific Purposes. The IELTS is currently offered in over 800 testing centers all over the world. Table 1 provides an overview of the purpose, structure, scoring, statistical distribution of scores, standard error of measurement, and evidence of reliability and validity of the IELTS.
Table 1

Extended description of IELTS Exam

Test Purpose

IELTS is a test that measures the English language proficiency of adults who wish to work or study in English-language contexts. Test-takers choose between two modules, Academic or General Training, depending on their expected future context. The IELTS Academic module purports to measure the English language proficiency necessary for academic, higher-learning contexts. It tests general
REVIEW OF IELTS, TOEFL AND MET 5 academic English ability, and is intended to assess whether or not the test-taker is prepared to study or train in English speaking contexts (IELTS: Ensuring quality). The IELTS General Training module emphasizes survival skills in a broad social and educational context (IELTS: Ensuring quality, p. 3). It is intended to assess the English language proficiency of test takers who plan to undergo nonacademic training or work in an English-speaking context. It is sometimes used for immigration purposes. Test Structure IELTS offers two modules: IELTS Academic and IELTS General Training. Both modules consist of four sections taken in the following order: listening, reading, writing and speaking sections. Test takers are given the option to take the speaking section up to a week before or after the listening, reading and writing sections. The listening and speaking components are the same for both modules. The listening component allows test-takers 30 minutes to listen to four recorded texts, monologues and conversations by a range of native speakers and to write answers to a series of questions based on these recordings. There are a total of 40 questions in the listening component. The speaking component consists of three tasks. The first task takes between four and five minutes, and asks test-takers to answer general questions about themselves and a range of familiar topics, such as their home, family, work, studies, and interests. The second task takes up to three minutes, and asks testtakers to respond to a particular topic based on a booklet. This task is involves reciprocal interaction with the test proctor. The third task takes between four and five minutes, and requires test-takers to respond to further questions that are connected to the topic from the previous task. 
The IELTS Academic module reading section allows test-takers 60 minutes to read and answer questions in response to three authentic English texts, which range from the descriptive and factual to the discursive and analytical and are selected from books, journals, magazines and newspapers (IELTS Guide, 2011, p. 3). There are a total of 40 questions in this reading section. The IELTS Academic module writing section allows test-takers 60 minutes to complete two tasks. In the first task, test-takers are given a graph, table, chart, or diagram and asked to describe, summarize or explain the information in their own words. In the second task, test-takers are asked to write a short essay in response to a point of view, argument, or problem. Test-takers are expected to respond to both tasks in a formal, academic style. The IELTS General Training module reading section is divided into three subsections, and allows test-takers 60 minutes to complete the tasks. The first subsection consists of two or three factual texts; the second consists of two work-related factual texts; and the third consists of one longer text on a topic of general interest. The texts are authentic materials that test-takers could encounter on a daily basis in an English-speaking country (IELTS Guide, p. 3). The IELTS General Training module writing section allows test-takers 60 minutes to complete two tasks. In the first task, test-takers are given a situation and asked to write a letter that either explains the situation or asks for
REVIEW OF IELTS, TOEFL AND MET 6 information. The second task requires that test-takers write a short essay in response to a point of view, argument or problem. Scoring of the Test All sections (reading, writing, listening and speaking) of the IELTS test are scored and reported individually using the following IELTS 9-band scale. These four sub-scores are averaged to provide an overall test score from 0-9. Scores can be reported using half-bands (e.g., 7.5). 9 Expert User 8 Very Good User 7 Good User 6 Competent User 5 Modest User 4 Limited User Has fully operational command of the language; appropriate, accurate and fluent with complete understanding Has fully operational command of the language with only occasional unsystematic inaccuracies and inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation well. Has operational command of the language, though with occasional inaccuracies, inappropriacies and misunderstandings in some situations. Generally handles complex language well and understand detailed reasoning. Has generally effective command of the language despite some inaccuracies, inappropriacies and misunderstandings. Can use and understand fairly complex language, particularly in familiar situations. Has partial command of the language, coping with overall meaning in most situations, though is likely to make many mistakes. Should be able to handle basic communication in own field. Basic competence is limited to familiar situations. Has frequent problems in understanding and expression. Is not able to use complex language. 3 2 Extremely Limited User Intermittent User Conveys and understands only general meaning in very family situations. Frequent breakdowns in communication occur. No real communication is possible except for th emost basic information using isolated words or short formulae in familiar situations and to meet immediate needs. Has great diffuiculty understand spoken and written English. 
1 Non-user: Essentially has no ability to use the language beyond possibly a few isolated words.
0 Did not attempt the test: No assessable information provided.

(Reproduced from IELTS Guide for Educational Institutions, 2011, p. 12)

The reading and listening sections each contain 40 items, for which test-takers receive one point if answered correctly and zero points if not answered or answered incorrectly. IELTS reports that prior to releasing test versions, test task items are pre-tested in order to identify differences in difficulty level between test versions (IELTS Guide, 2012, p. 8). Using this information, raw scores out of 40 are calibrated between test versions in order to ensure that test-takers' raw scores reflect the appropriate band scores. The following tables report the minimum number of correct items needed to achieve the corresponding band score for the Listening, Academic Reading, and General Training Reading sections. More items must be answered correctly on the General Training Reading section than on the Academic Reading section in order to receive the same score, because the Academic texts are lexically and textually more complex than the General Training texts.

Listening
Band Score    Raw Score (out of 40)
5             16
6             23
7             30
8             35

Academic Reading
Band Score    Raw Score (out of 40)
5             15
6             23
7             30
8             35

General Training Reading
Band Score    Raw Score (out of 40)
4             15
5             23
6             30
7             34
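The raw-score-to-band thresholds reported above amount to a simple lookup. The following sketch is illustrative only: it uses just the four published anchors per section (bands between and outside these anchors vary by test version and are not reproduced here), and reports the highest listed band a given raw score reaches.

```python
# Published minimum raw scores (out of 40) for selected IELTS band scores.
# Only the anchors listed in the tables above are included; thresholds for
# other bands vary by test version and are not published here.
THRESHOLDS = {
    "Listening": {5: 16, 6: 23, 7: 30, 8: 35},
    "Academic Reading": {5: 15, 6: 23, 7: 30, 8: 35},
    "General Training Reading": {4: 15, 5: 23, 6: 30, 7: 34},
}

def highest_listed_band(section, raw_score):
    """Return the highest listed band whose minimum raw score is met, or None."""
    reached = [band for band, minimum in THRESHOLDS[section].items()
               if raw_score >= minimum]
    return max(reached) if reached else None

# A raw score of 30/40 reaches band 7 on Listening or Academic Reading,
# but only band 6 on the harder-calibrated General Training Reading section.
print(highest_listed_band("Listening", 30))                 # 7
print(highest_listed_band("General Training Reading", 30))  # 6
```

This illustrates the calibration point made above: the same raw score of 30 maps to a lower band on the General Training Reading section than on the Academic Reading section.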
The Writing and Speaking sections are rated by IELTS raters who are trained and certified by Cambridge ESOL. The writing and speaking sections are awarded band scores using detailed performance descriptors that weight each of the following criteria equally:

Writing: task achievement/task response; coherence and cohesion; lexical resource; and grammatical range and accuracy.

Speaking: fluency and coherence; lexical resource; grammatical range and accuracy; and pronunciation.

Statistical Distribution of Scores

The distribution of scores and standard error of measurement were provided only for the Listening, Academic module Reading, and General Training module Reading sections of the IELTS. According to the 2011 IELTS Researchers Test Performance report, the reliability of the Writing and Speaking modules cannot be reported in the same manner as for Reading/Listening because they are not item-based; candidates' writing and speaking performances are rated by trained and standardized examiners according to detailed descriptive criteria and rating scales.

Section                      Mean    SD     SEM
Listening                    6.1     1.3    0.39
Academic Reading             5.9     1.2    0.379
General Training Reading     5.7     1.5    0.424

Standard Error of Measurement

The average standard error of measurement for the IELTS Listening, Academic Reading, and General Training Reading sections is .3976. This translates into less than half a band score (IELTS Research Reports, 2009).

Evidence of Reliability

Reliability estimates of the listening and reading portions of the IELTS were provided for 2011 test results using Cronbach's alpha (IELTS Researchers, 2011). The average reliability estimates provided were 0.91 for the Listening module, 0.92 for the General Training Reading module, and 0.90 for the Academic Reading module.
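The average SEM figure of .3976 quoted above is simply the mean of the three section SEMs (the reported figure appears to truncate, rather than round, the final digit). A quick check:

```python
# Section SEMs reported for the 2011 IELTS Listening, Academic Reading,
# and General Training Reading sections (table above).
sems = {"Listening": 0.39,
        "Academic Reading": 0.379,
        "General Training Reading": 0.424}

average_sem = sum(sems.values()) / len(sems)  # (0.39 + 0.379 + 0.424) / 3
print(f"{average_sem:.4f}")  # ~0.3977, i.e. less than half a band score
```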
Evidence of Validity

According to Breeze and Miller (2011), studies exploring the predictive validity of the IELTS have produced varied and sometimes contradictory results; for example, one study (Allwright & Banerjee, 1997) concluded that a positive correlation existed between international students' English-medium academic success and their overall IELTS band scores, while others (e.g., Cotton & Conrow, 1998) discovered minimal correlation between students' academic success and overall band scores. Positive correlations between academic performance and the Academic Reading module, however, have been more readily found (e.g., Dooey & Oliver, 2002, as cited in Breeze & Miller, 2011).
Michigan English Test (MET)

Publisher: Cambridge Michigan Language Assessments, Argus 1 Building, 535 West William St., Suite 310, Ann Arbor, Michigan, 48103-4978 USA; phone: +1 734-615-9446; fax: +1 734-615-6586; met@cambridgemichigan.org; www.cambridgemichigan.org

Publication Date: 2008

Target Population: Adult and adolescent speakers of English

Cost: Varies by testing center; exact cost N/A

Overview

The MET is a test of general English language proficiency developed by Cambridge Michigan Language Assessments (CaMLA) and was first administered in 2008 (MET Test Administration, 2011). It purports to measure English language ability ranging from high beginner to low advanced in personal, public, occupational, and educational contexts. The MET developers claim that its utility is multi-faceted in that it can be used for educational purposes, such as when finishing an English language course, or for employment purposes, such as applying for a job or pursuing a promotion that requires an English language qualification (p. 1). Table 2 provides an overview of the purpose, structure, scoring, statistical distribution of scores, standard error of measurement, and evidence of reliability and validity of the MET.

Table 2

Extended description of MET Exam

Test Purpose

The Michigan English Test is purported to be for adult and adolescent speakers of English who wish to assess their general English language proficiency in social, educational, and workplace contexts. It emphasizes the ability of the examinee to communicate effectively in English (MET: Information, 2012, p. 1). The test publishers claim that the MET can be used for both employment and educational purposes, but that it is not intended to be used as an admissions test for students applying to colleges or universities.

Test Structure

The MET consists of two sections.
Section I assesses listening skills, and consists of multiple-choice questions based on conversations and talks in workplace, social, and educational settings. Test-takers are given 45 minutes to complete the 60 items in this section. Section II assesses reading and grammar, and consists of two parts. The first part has 25 multiple-choice grammar questions, and the second part has 50 multiple-choice questions based on written texts in workplace, social and educational settings. Test-takers are given 90 minutes to complete this section.
Scoring

The MET is a pencil-and-paper test, and test-takers record their answers on answer sheets designed specifically for the test, which are then scanned. Correct answers carry equal weight within each section, and no points are deducted for incorrect answers. Test-takers typically receive score reports within four weeks of scoring, and test scores are valid for up to two years following test administration. The MET does not report scores based on percentages, and there is no cut score. Rather, in order to ensure the comparability of test scores across test-takers and test administrations, scaled scores are assigned in order to give an indication of where test-takers lie on the scale of language ability. The maximum score is 80 for each of Sections I and II, and the final score is the sum of the two section scores, for a possible total score of 160. Scaled MET scores relate to the Common European Framework of Reference (CEFR), which describes six levels of language ability:

A1-A2: Basic User
B1-B2: Independent User
C1-C2: Proficient User

The following table outlines the MET scaled scores as they correspond to the CEFR levels.

CEFR Level    MET Section I Scaled Score    MET Section II Scaled Score
A2            39 or below                   39 or below
B1            40-52                         40-52
B2            53-63                         53-63
C1            64 and above                  64 and above

Statistical Distribution of Scores

Information concerning the statistical distribution of MET scores was not available, nor was it delivered upon request.

Standard Error of Measurement

Estimates of the reliability and standard error of measurement (SEM) were provided in the MET: Test Administration Report (2011). These estimates were based upon scaled scores reported from each month of distribution. The following standard error of measurement figures were averaged from the reliability data provided for the months of January through November, 2011 (MET Test, 2011).
Section    SEM
I          2.795
II         2.35

Evidence of Reliability

The following reliability estimates were averaged from the estimates provided for the months of January through November, 2011 (MET Test, 2011). The report asserts that for high-stakes exams such as the MET, a reliability estimate of 0.80 and above is expected and acceptable (p. 4), and that all MET sections demonstrated very high reliability, between 0.91 and 0.94.

Section    Reliability Estimate
I          0.925
II         0.931

Evidence of Validity

Information concerning evidence of the validity of MET scores was not available, nor was it delivered upon request.
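The correspondence between MET scaled scores and CEFR levels reported in Table 2 can be captured as a small lookup. The boundaries below are taken directly from the table; the function itself is only an illustrative sketch, not CaMLA's reporting software.

```python
def met_cefr_level(scaled_score):
    """Map a MET section scaled score to its reported CEFR level.

    The same boundaries apply to both Section I and Section II.
    """
    if scaled_score <= 39:
        return "A2"   # 39 or below
    elif scaled_score <= 52:
        return "B1"   # 40-52
    elif scaled_score <= 63:
        return "B2"   # 53-63
    else:
        return "C1"   # 64 and above

print(met_cefr_level(45))  # B1
print(met_cefr_level(64))  # C1
```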
Test of English as a Foreign Language-Internet-Based (TOEFL-iBT)

Publisher: Educational Testing Service, Mail Stop 33-L, Princeton, NJ 08541 USA; phone: 1-609-683-2008; toeflnews@ets.org

Publication Date: Paper-based test: 1964; Internet-based exam launched in 2005

Target Population: Adult speakers of English

Cost: Varies between countries, typically $160 to $250

Overview

The TOEFL test was first developed and administered in 1964 by the Educational Testing Service (ETS), and it is now offered online under the TOEFL-iBT brand. According to ETS, the TOEFL iBT is the most widely accepted English language assessment, used for admissions purposes in more than 130 countries (Enright, 2011, p. 1). Since the test's development in 1964, developers have undertaken several major revisions informed by developments in theories of communicative language ability. The TOEFL iBT reflects the most recent of these revisions. According to ETS, the online test includes integrated test tasks that engage multiple language skills to simulate language use in academic settings (p. 1), as well as test materials that reflect the reading and listening demands of real-world academic environments (p. 1). Table 3 provides an overview of the purpose, structure, scoring, statistical distribution of scores, standard error of measurement, and evidence of reliability and validity of the TOEFL-iBT.

Table 3

Extended description of TOEFL Exam

Test Purpose

The purpose of the TOEFL iBT test is to assess the English-language proficiency of non-native English speakers. Scores from the exam are most often used as a measure of the ability of international students to use English in an academic, English-medium environment.

Test Structure

The TOEFL iBT consists of four sections: reading, listening, speaking and writing. The reading section requires test-takers to respond to 3-4 passages that are taken from academic texts.
Test-takers are given 60-80 minutes to respond to between 36 and 56 questions that are meant to evaluate test-takers' ability to understand factual information, infer information from passages, understand vocabulary in context, and understand an author's purpose. The listening section requires test-takers to listen to lectures, conversations, and classroom discussions. Test-takers are given 60-90 minutes to respond to between 34 and 51 questions that are meant to measure test-takers' ability to comprehend main ideas or important details,
recognize a speaker's attitude or purpose, understand the organization of given information, and make inferences about or connections between pieces of information. The speaking section requires test-takers both to express an opinion on a familiar topic and to speak based on reading or listening tasks. Test-takers are given 20 minutes to respond to 6 tasks, two of which are independent and four of which are integrated. The writing section requires test-takers to complete an integrated and an independent task. The integrated task asks test-takers to write an essay based on reading and listening tasks. The independent task asks test-takers to support an opinion in writing. Test-takers are given 50 minutes to respond to the 2 tasks.

Scoring

The highest possible score for the TOEFL-iBT is 120; each of the four sections is scored on a scale of 0-30. Scores from each of the sections are reported with performance levels as follows:

Skill          Score Range    Levels
Reading        0-30           High (22-30), Intermediate (15-21), Low (0-14)
Listening      0-30           High (22-30), Intermediate (15-21), Low (0-14)
Speaking       0-30           Good (26-30), Fair (18-25), Limited (10-17), Weak (0-9)
Writing        0-30           Good (24-30), Fair (17-23), Limited (1-16)
Total Score    0-120

ETS does not set cut scores for the exam; rather, institutions typically set their own minimum score requirements. Automated scoring and human rating are used in conjunction to provide scores for the two writing tasks: content and meaning are assessed by a human rater, while linguistic features are evaluated using automated scoring techniques. ETS-certified raters score speaking responses. Each of the six tasks is rated on a scale from 0 to 4, and these ratings are summed and converted to a scaled score of 0 to 30.
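As described above, each of the six speaking tasks is rated 0-4, and the ratings are summed and converted to the 0-30 scale. ETS uses its own conversion procedure for that last step; the sketch below simply assumes a linear rescaling of the raw sum (0-24) onto 0-30 to illustrate the arithmetic, and should not be taken as ETS's actual conversion table.

```python
def speaking_scaled_score(ratings):
    """Convert six 0-4 speaking task ratings to a 0-30 scaled score.

    Assumes a simple linear rescaling of the raw sum (0-24) onto 0-30;
    ETS's published conversion may differ from this sketch.
    """
    assert len(ratings) == 6 and all(0 <= r <= 4 for r in ratings)
    raw_sum = sum(ratings)            # raw total, 0-24
    return round(raw_sum * 30 / 24)   # rescale onto the 0-30 reporting scale

print(speaking_scaled_score([4, 4, 4, 4, 4, 4]))  # 30
print(speaking_scaled_score([3, 3, 3, 2, 3, 2]))  # 20
```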
Statistical Distribution of Scores

ETS reported the following means and standard deviations for test-takers who took the TOEFL-iBT between January and December of 2011 (Test and Score Data, 2012).

Section     Mean     SD
Reading     20.15    6.75
Listening   20.05    6.7
Speaking    20.4     4.6
Writing     21       5
Total       81.5     20.5

Standard Error of Measurement

ETS reported standard error of measurement (SEM) values for each of the four sections, and for total test scores. These measures are based on operational data collected in 2007 (Enright & Tyson, 2011).

Section     SEM
Reading     3.35
Listening   3.2
Speaking    1.62
Writing     2.76
Total       5.64

Evidence of Reliability

ETS uses test specifications, or a detailed operational definition of test characteristics (Enright & Tyson, 2011a, p. 3), in order to ensure that scores are comparable across test forms and testing contexts. The following estimates of reliability are based on operational data collected in 2007 (Enright & Tyson, 2011a).

Section     Reliability Estimate
Reading     0.85
Listening   0.85
Speaking    0.88
Writing     0.74
Total       0.94

In order to ensure inter-rater reliability, ETS claims to conduct daily calibration exercises with its human raters. These exercises include task familiarization, guidance on scoring the task, and practice on a range of responses (TOEFL iBT Test Scores, 2012).
Evidence of Validity

ETS has published a report on the evidence gathered in support of the validity of the TOEFL iBT (Enright & Tyson, 2011b). In this report, six propositions concerning validity were made and responded to. The first proposition, that the content of the test is relevant to and representative of the kinds of tasks and written and oral texts that students encounter in college and university settings (p. 3), was supported by the test developers' research on language use at institutions of higher education that use English as the language of delivery. The second proposition, that tasks and scoring criteria are appropriate for obtaining evidence of test-takers' academic language abilities (p. 3), was validated through numerous pilot tests and field studies that examined task design, test design, and rubric development. The third proposition, that academic language proficiency is actually assessed through a combination of the strategies, processes, and linguistic knowledge utilized by test-takers while responding to tasks, was validated through the examination of discourse characteristics of written and spoken responses and strategies used in answering reading comprehension questions (p. 3). The fourth proposition, that the structure of the test is consistent with theoretical views of the relationships among English language skills (p. 3), was validated by conducting a statistical factor analysis of a test form distributed during a field study. The fifth proposition, that test performance can be interpreted as an indication of academic language proficiency, was demonstrated through the analysis of relationships between test scores and self-assessment, academic placements, local assessments of international teaching assistants, [and] performance on simulated academic listening tasks (p. 3).
The sixth proposition, that results of the test are used in an appropriate manner in order to create positive consequences for test-takers, has been addressed through a long-term empirical study of test washback, the development and publication of materials that test-takers can use to prepare for the test, and materials made available to help test users interpret test scores.
Discussion

As a language teacher who will inevitably need both to recommend proficiency exams and to help students prepare for them, I must be able to evaluate how accurately these exams assess test-takers' English language ability. Evaluating a test's format, purpose, reliability, and validity can lead to a more informed judgment of its usefulness, and this is what I have attempted to do in my review of these exams. My choice to review the IELTS, MET, and TOEFL iBT is based upon the context in which I foresee myself teaching English in the near future: I hope to return to South Korea to teach English as a foreign language to university-level students. Many of these students will likely be planning to take part in study abroad programs or to enter four-year or graduate degree programs in an English-speaking country, and they will at some point need to take a large-scale English proficiency exam in order to gain entry into English-medium academic programs. On the basis of my evaluations and the available evidence of reliability and validity, I have concluded that the IELTS is the most appropriate large-scale proficiency exam for the context in which I plan to teach. The IELTS and TOEFL-iBT are similar in terms of cost, time needed to take the test, global distribution of testing centers, and organizational acceptance of scores. However, as mentioned by Uysal (2010), the IELTS differs from the MET, TOEFL iBT, and other large-scale proficiency exams in that it assesses English as an international language. This is certainly ideologically appealing, as English is increasingly discussed in terms of its status as a lingua franca, and communicative competence is viewed as the ability to engage in meaningful exchanges with an array of speakers in a variety of contexts, rather than the ability to reproduce only standard or conventional forms of the language (Seidlhofer, 2011).
Ideals alone are not enough to justify the usefulness of a test, however, and so I point to the rich body of research that is conducted and published each year (e.g., the IELTS Research Reports series, published since 1998) concerning such varied aspects of the IELTS as its development, improvement, reliability, and validity. This plentiful, ongoing research highlights the continued development and quality of the IELTS exam. While ETS similarly publishes research reports on the TOEFL iBT that address issues such as reliability and validity (e.g., Enright & Tyson, 2011a), these reports are admittedly designed for the consumption of both researchers and the general public, and are thus less transparent than the IELTS Research Reports in terms of how the research was gathered and evaluated. By contrast, there was little information available anywhere in support of the usefulness of the MET. Upon first reading the purpose statement of the MET, I was skeptical of the exam's ability to reliably and validly measure the very inclusive range of skills it purports to measure. Test materials claim that it is designed to measure general English language proficiency in social, educational, and workplace contexts, but unlike the IELTS exam, with its separate Academic and General Training forms, the MET has test-takers using the results for academic and professional purposes complete the same types of tasks. This raises issues of construct validity, as academic language ability is often viewed as different from general or social language ability; and I was not, in fact, able to find or acquire (the test publisher did not respond to my request) any published evidence of the statistical reliability or validity of the test. Interestingly, I also could not find any testing centers in the U.S.
that administer the MET exam, or institutions that accept its results; perhaps this lack of visibility is due to the MET's lack of published evidence of validity and reliability.
References

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.

Breeze, R., & Miller, P. (2011). Report 5: Predictive validity of the IELTS listening test as an indicator of student coping ability in Spain. Retrieved from http://www.ielts.org/pdf/vol12_report_5.pdf

Enright, M. (2011). TOEFL program history. Retrieved from http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v6.pdf

Enright, M., & Tyson, E. (2011a). Reliability and comparability of TOEFL iBT scores. Retrieved from http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf

Enright, M., & Tyson, E. (2011b). Validity evidence supporting the interpretation and use of TOEFL iBT scores. Retrieved from http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf

IELTS guide for educational institutions, governments, professional bodies and commercial organizations. (2011). Retrieved from http://www.ielts.org/pdf/guide%20for%20institutions%20and%20organisations%202011.pdf

IELTS guide for teachers. (2012). Retrieved from http://www.ielts.org/pdf/ielts_guide_for_teachers_britishenglish_web.pdf

IELTS researchers test performance-2011. (2011). Retrieved from http://www.ielts.org/researchers/analysis_of_test_data/test_performance_2011-1.aspx

MET: Information bulletin. (2012). Retrieved from http://www.cambridgemichigan.org/sites/default/files/resources/met_ib.pdf
MET: Test administration report. (2011). Retrieved from http://www.cambridgemichigan.org/sites/default/files/resources/programreports/met_report_2011.pdf

Seidlhofer, B. (2011). Understanding English as a lingua franca. Oxford: Oxford University Press.

Stoynoff, S., & Chapelle, C. A. (2005). ESOL tests and testing: A resource for teachers and program administrators. Alexandria, VA: TESOL Publications.

Test and score data summary for TOEFL iBT tests and TOEFL PBT tests. (2012). Retrieved from http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf

TOEFL iBT test scores. (2012). Retrieved from http://www.ets.org/toefl/ibt/scores/

Uysal, H. H. (2010). A critical review of the IELTS writing test. ELT Journal, 64(3), 314-320.