CORPORA IN LEXICOGRAPHY (PART ONE)

Similar documents
Some Reflections on the Making of the Progressive English Collocations Dictionary

EFL Learners Synonymous Errors: A Case Study of Glad and Happy

The Oxford Learner s Dictionary of Academic English

Corpus and Discourse. The Web As Corpus. Theory and Practice MARISTELLA GATTO LONDON NEW DELHI NEW YORK SYDNEY

Sketch Engine. Sketch Engine. SRDANOVIĆ ERJAVEC Irena, Web 1 Word Sketch Thesaurus Sketch Difference Sketch Engine

ONLINE ENGLISH LANGUAGE RESOURCES

Oxford and the Dictionary

The growing prosperity of on-line dictionaries

j A Handbook of Lexicography

User studies, user behaviour and user involvement evidence and experience from The Danish Dictionary

Reference Books. (1) English-English Dictionaries. Fiona Ross FindYourFeet.de

Download Check My Words from:

COLLOCATION TOOLS FOR L2 WRITERS 1

ASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES

CORPUS VS DICTIONARY IN EFL CLASSES

ELECTRONIC DICTIONARIES AND ESP STUDENTS

George Mikros, Villy Tsakona, Maria Drakopoulou, Alexandra Koutra, Evangelia Triantafylli and Sofia Trypanagnostopoulou University of Athens

English for Business Communications (8959) Marking Guide for Tutors

Online Law Dictionaries: How to Provide Help for EFL Text Production by Law Students

OVERSEAS TEACHERS COURSES

Beyond single words: the most frequent collocations in spoken English

Some remarks on recent trends in lexicography

Review Article. Current lexicographical tools in EFL: monolingual resources for the advanced learner

Dictionary workshop 1. Introduction 2. Dictionaries monolingual dictionaries bilingual dictionaries source language equivalents target language

ESCUELA OFICIAL DE IDIOMAS DE PAMPLONA IRUÑEAKO HIZKUNTZA ESKOLA OFIZIALA DEPARTAMENTO DE INGLES INGELESA SAILA BIBLIOGRAFIA NIVEL BÁSICO 1

Investigating the usefulness of lexical phrases in contemporary coursebooks

Real-Time Identification of MWE Candidates in Databases from the BNC and the Web

Grammar in Dictionaries of Languages for Special Purposes

Department of English Course Description

Glossary of translation tool types

Register Differences between Prefabs in Native and EFL English

THE UNIVERSITY OF BIRMINGHAM. English Language & Applied Linguistics SECOND TERM ESSAY

UK Data Archive Data Dictionary

Libros de consulta recomendados para el Nivel Básico

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features


IELTS ONLINE PRACTICE TEST FREE SAMPLE

Macmillan Practice Online guide for teachers

GRAMMAR, SYNTAX, AND ENGLISH LANGUAGE LEARNERS

Syllabus: a list of items to be covered in a course / a set of headings. Language syllabus: language elements and linguistic or behavioral skills

Presenting Examples in Learners Dictionaries to Assist Chinese Learners in Writing English Texts

Teaching Framework. Framework components

COMPILING A DICTIONARY OF COLLOCATION: PROBLEMS AND SOLUTIONS

A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students

ENCODING REFERENCE WORKS

Student Handbook. Part C Courses & Examinations

Most EFL teachers in Japan find that there are groups of verbs which consistently

TOP 10 RESOURCES FOR TEACHERS OF ENGLISH LANGUAGE LEARNERS. Melissa McGavock Director of Bilingual Education

MASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS

Master of Arts in Linguistics Syllabus

行 政 院 國 家 科 學 委 員 會 專 題 研 究 計 畫 成 果 報 告

Problems and Measures Regarding Waste 1 Management and 3R Era of public health improvement Situation subsequent to the Meiji Restoration

Challenges in Internalisation

An Overview of Applied Linguistics

FOREIGN LANGUAGE PROGRAMME PREPARATION FOR THE TOEIC EXAM

counterargument with a prerequisite for completing undergraduate studies. A description is usually arranged spatially but can also.

Teaching terms: a corpus-based approach to terminology in ESP classes

Information for candidates For exams from 2015

Differences in linguistic and discourse features of narrative writing performance. Dr. Bilal Genç 1 Dr. Kağan Büyükkarcı 2 Ali Göksu 3

Introducing TshwaneLex A New Computer Program for the Compilation of Dictionaries

AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom

ICAME Journal No. 24. Reviews

Online dictionaries of English

Class contents and exam requirements Code (20421) English Language, Second language B1 business

Automatically Generated Customizable Online Dictionaries

Straightforward Practice Online guide for Teachers

The Facilitating Role of L1 in ESL Classes

Why major in linguistics (and what does a linguist do)?

21st CENTURY ENGLISH VOCABULARY

Date submitted: June 1, 2011

What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project

BASIC TERMS. 1) barrister lawyer who presents a case to a higher court. 2) solicitor lawyer who advises clients

Simple maths for keywords

AN ONLINE WRITING SUPPORT INITIATIVE FOR FIRST-YEAR INFORMATION TECHNOLOGY STUDENTS

The Rise of Documentary Linguistics and a New Kind of Corpus

Problems and Possibilities of Information Technology in the Initial Teacher Training of Teachers of English and Modern Languages

COLLOCATIONS AND EXAMPLES OF USE: A LEXICAL-SEMANTIC APPROACH TO TERMINOLOGY

The PALAVRAS parser and its Linguateca applications - a mutually productive relationship

CHINESE SECOND LANGUAGE

On Connotation, Denotation and All That, or: Why a Nigger Is Not a Black Person

Collecting Polish German Parallel Corpora in the Internet

A Tribute to John McHardy Sinclair (14 June March 2007)

AN ONLINE SPANISH LEARNERS DICTIONARY: THE DAELE PROJECT

24 Cambridge grammar English: A comprehensive guide spoken and Carter, Ronald K 112. written English grammar and usage

icompilecorpora: A Web-based Application to Semi-automatically Compile Multilingual Comparable Corpora

2009- Acıbadem University, Foreign Languages Department Foreign Languages Coordinator

CHINESE SECOND LANGUAGE ADVANCED

Oxford Dictionary of Current Idiomatic English

Get Ready for IELTS Writing. About Get Ready for IELTS Writing. Part 1: Language development. Part 2: Skills development. Part 3: Exam practice

GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning

Reading Listening and speaking Writing. Reading Listening and speaking Writing. Grammar in context: present Identifying the relevance of

Special Topics in Computer Science

INSTITUTO POLITÉCNICO DE VISEU ESCOLA SUPERIOR DE TECNOLOGIA E GESTÃO. Unidade Curricular. Ano 1.º Semestre 1.º. Ano lectivo ECTS 4

The Teacher as Student in ESP Course Design

International summer schools for teachers of English and EFL teacher trainers

An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System

Developing a User-based Method of Web Register Classification

IELTS ONLINE PRACTICE TEST FREE SAMPLE

ENGLISH ANTONYMS DICTIONARY PDF

Transcription:

Aston Corpus Summer School 2011 CORPORA IN LEXICOGRAPHY (PART ONE) Iztok Kosem Trojina, Institute for Applied Slovene Studies Ljubljana, Slovenia Contact: iztok.kosem@trojina.si

Lexicography The art of compiling, writing and editing dictionaries (Wikipedia) Lexicographer: "writer of dictionaries; a harmless drudge (Johnson) "LEXICOGRAPHER, n. A pestilent fellow who, under the pretense of recording some particular stage in the development of a language, does what he can to arrest its growth, stiffen its flexibility and mechanize its methods." (Ambrose Bierce, The Devil's Dictionary, 1911) First discipline to fully utilize corpus data High demands of lexicographers pushing the functionality of corpus tools

Exercise 1 In the Sketch Engine, select the CAJA corpus. Do the simple search for the lemma authority. Make a sample of 50 concordances (use the function Sample in the Sketch Engine). Analyse the concordances. If you were making a dictionary entry, how many different senses of authority would you record?

Corpora in lexicography Pre-computer age: Samuel Johnson s dictionary (1755), OED (1928), Noah Webster s dictionary (1828) Index cards (e.g. OED had 20 million index cards, 5 million citations) Gathering citations by hand, bias towards atypical 1980s COBUILD Corpus: 18M words (initially 7M), written & spoken (UK & US) Corpus-driven approach! Dictionaries, grammars, word bank Others followed: Longman, Cambridge, Oxford, Macmillan

Corpora in lexicography 1990s-2000s: large corpora: greater depth of analysis, variety of texts, statistical accuracy British National Corpus (100 million words; UK) Bank of English (520 million words; UK, US, Aus, Can) 2000- : huge corpora Web used as data source Corpus collections of publishing houses:

Oxford English Corpus (2 billion words)

Cambridge International Corpus (1,153 billion) spoken AmE, 30 written AmE - academic, 9 written AmE - business, 40 written AmE, 275 spoken BrE - business, 1 written BrE - business, 60 written BrE - academic, 20 spoken BrE, 18 written BrE, 700

Longman Corpus Network (330 million words) Longman/Lancas ter Corpus, 30 Longman Learners Corpus, 10?, 138 Longman Written American Corpus, 100 PICAE, 37 Longman Spoken American Corpus, 5 Spoken British National Corpus, 10

Corpora in lexicography Bilingual lexicography: Corpora used less than in monolingual lexicography Oxford Hachette English-French French-English dictionary (1994) Comprehensive English-Slovene dictionary (2005) Slovene content Reference corpus of Slovene (FIDA) English content Reference corpus of English (BNC) parallel corpora used more and more

Corpus-based vs. corpus-driven

Corpus-based vs. corpus-driven The results of [corpus] analysis are incorporated into specially designed usage notes and study pages in Cambridge dictionaries In addition, dictionary examples illustrating word use can be taken from the corpus, making them sound natural and realistic. (Cambridge University Press) A corpus-driven approach involves a bottom-up methodology, beginning by selecting unedited examples from the corpus, identifying their shared and individual features, and only then grouping them for the purpose of lexicographic presentation. (Krishnamurthy, 2008: 231)

Corpus information in dictionary entries Headword list examples Labels Examples Phrases Collocations definitions like COBUILD Usage notes

Longman Dictionary of Contemporary English

Longman Exams Dictionary

result n. (Macmillan English Dictionary)

Trends in modern lexicography Large corpora more data to analyse More data better for computers (to exclude noise) Automating as much as possible Automatic data collection and annotation (WebBootCat, Baroni et al., 2006) Identifying salient data and presenting them to the lexicographer Lexicographer validates the data, makes the final selection Technology: from supportive to proactive role (Rundell, 2011)