# Simple maths for keywords

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all list, but different lists according to the frequency range the user is interested in. The method includes a variable which allows the user to focus on higher or lower frequency words. This word is twice as common here as there. Such observations are entirely central to corpus linguistics. We very often want to know which words are distinctive of one corpus, or text type, versus another. The simplest way to make the comparison is expressed in my opening sentence. Twice as common means the word s frequency (per thousand words, or million words) in the one corpus is twice its frequency in the other. We count occurrences in each corpus, divide each number by the number of words in that corpus, optionally multiply by 1,000 or 1,000,000 to give frequencies per thousand or million, and divide the one number by the other to give a ratio. (Since the thousands or millions cancel out when we do the division, it makes no difference whether we use thousands or millions. In the below I will assume millions and will use wpm for words per million, as in other sciences which often use parts per million. ) It is often instructive to find the ratio for all words, and to sort words by the ratio to find the words that are most associated with each corpus as against the other. This will give a first pass at two keywords lists, one (taken from the top of the sorted list) of corpus1 vs corpus2, and the other, taken from the bottom of the list (with scores below 1 and getting close to 0), for corpus2 vs corpus1. (In the below I will refer to the two corpora as the focus corpus fc, for which we want to find keywords, and the reference corpus rc: we divide relative frequency in the focus corpus by relative frequency in the reference corpus and are interested in the high-scoring words.) There are four problems with keyword lists prepared in this way. 1. All corpora are different, usually in a multitude of ways. We probably want to examine a keyword list because of one particular dimension of difference between fc and rc perhaps a difference of genre, or of region, or of domain. The list may well be dominated by other differences, which we are not at all interested in. Keyword lists tend to work best where the corpora are very well matched in all regards except the one in question. This is a question of how fc and rc have been prepared. It is often the greatest source of bewilderment when users see keyword lists (and makes keyword lists good tools for identifying the characteristics of corpora). However it is an issue of corpus construction, which is not the topic of this paper, so it is not discussed further. 2. Burstiness. If a word is the topic for one text in the corpus, it may well be used many times in that text with its frequency in that corpus mainly coming

2 from just one text. Such bursty words do a poor job of representing the overall contrast between fc and rc. This, again, is not the topic of this paper. A range of solutions have been proposed, as reviewed by Gries (2007). The method we use in our experiments is average reduced frequency (ARF, Savický and Hlaváčová 2002) which discounts frequency for words with bursty distributions: for a word with an even distribution across a corpus, ARF will be equal to raw frequency, but for a word with a very bursty distribution, only occurring in a single short text, ARF will be a little over You can t divide by zero. It is not clear what to do about words which are present in fc but absent in rc. 4. Even setting aside the zero cases, the list will be dominated by words with very few words in the reference corpus: there is nothing very surprising about a contrast between 10 in fc and 1 in rc, giving a ratio of 10, and we expect to find many such cases, but we would be very surprised to find words with frequency per million of 10,000 in fc and only 1,000 in rc, even though that also gives a ratio of 10. Simple ratios will give a list of rarer words. The last problem has been the launching point for an extensive literature. The literature is shared with the literature on collocation statistics, since formally, the problems are similar: in both cases we compare the frequency of the keyword in condition 1 (which is either in fc or with collocate x ) with frequency in condition 2 ( in rc or not with collocate x ). The literature starts with Church and Hanks (1989) and other much-cited references include Dunning (1993), Pederson. Proposed statistics include Mutual Information (MI), Log Likelihood and Fisher s Exact Test, see Chapter X of Manning and Schütze (1999). I have argued elsewhere (Kilgarriff 2005) that the mathematical sophistication of MI, Log Likelihood and Fisher s Exact Test is of no value to us, since all it serves to do is to disprove a null hypothesis - that language is random - which is patently untrue. Sophisticated maths needs a null hypothesis to build on and we have no null hypothesis: perhaps we can meet our needs with simple maths. A common solution to the zeros problem is add one. If we add one to all the frequencies, including those for words which were present in fc but absent in rc, then we have no zeros and can compute a ratio for all words. A word with 10 wpm in fc and none in rc gets a ratio of 11:1 (as we add 1 to 10 and 1 to 0) or 11. Add one is widely used as a solution to a range of problems associated with low and zero frequency counts, in language technology and elsewhere (Manning and Schütze 1999). Add one (to all counts) is the simplest variant: there are sometimes reasons for adding some other constant, or variable amount, to all frequencies. This suggests a solution to problem 4. Consider what happens when we add 1, 100, or 1000 to all counts-per-million from both corpora. The results, for the three

4 (http://www.sketchengine.co.uk) After some experimentation, we have set a default value of N=100, but it is a parameter that the user can change: for rarer phenomena like collocations, a lower setting seems appropriate. i To demonstrate the method we use BAWE, the British Academic Written English corpus (Nesi 2008). It is a carefully structured corpus so, for the most part, two different subcorpora will vary only on the dimension of interest and not on other dimensions, cf problem 1 above. The corpus comprises student essays, of which a quarter are Arts and Humanities (ArtsHum) and a quarter are Social Sciences (SocSci). Fig 1 shows keywords of ArtsHum vs. SocSci with parameter N=10. Fig 1: Keywords for ArtsHum compared with SocSci essays in BAWE, with parameter N=10, using the Sketch Engine.

5 We see here a range of nouns and adjectives relating to the subject matter of the arts and humanities, and which are, plausibly, less often discussed in the social sciences. Fig 2: Keywords for ArtsHum compared with SocSci essays in BAWE, with parameter N=1000, using the Sketch Engine. Fig. 2 shows the same comparison but with N=1000. Just one word, god (lowercased in the lemmatisation process) is in the top-15 samples of both lists, and Fig 1 has historian against Fig 2 s history. Many of the words in both lists are from the linguistics domain, with them providing corroborating evidence for this being distinctive of ArtsHum in BAWE, but none are shared. Fig 1 has the less-frequent verb, noun, adjective, pronoun, tense, native and speaker (the last two possibly in the compound native speaker) whereas Fig 2 has the closely-allied but more-frequent language, word, write, english and object. (Object may be in this list by virtue of its linguistic meaning, the object of the verb, or its general or verbal meanings, the object of the exercise, he objected ; this would require further examination to resolve.) The most striking feature of the Fig 2 is the four third-person and one first-person pronouns. Whereas the first list told us about recurring topics in ArtsHum vs. SocSci, the second also tells us about a grammatical contrast between the two varieties. In summary I have summarised the challenges in finding keywords of one corpus vs. another. I have proposed a method which is both simpler than other proposals and which has the advantage of a parameter which allows the user to specify whether they want to focus on higher-frequency or lower-frequency keywords. I have pointed out that there are

6 no theoretical reasons for using more sophisticated maths and I have demonstrated the proposed method with corpus data. The new method is implemented in the Sketch Engine corpus query tool and I hope it will be implemented in other tools shortly. References Church, K. and P. Hanks Word association norms, mutual information and lexicography. Computational Linguistics 16(1), 22_29. Dunning, T Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19(1), Gries, S Dispersion and Adjusted Frequencies in Corpora. Corpus Linguistics 13 (4), pp Kilgarriff, A Language is never ever ever random. Corpus Linguistics and Linguistic Theory 1 (2): Manning, C. and H. Schütze Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA. Nesi, H. (2008) BAWE: An introduction to a new resource. In Proc. 8th Teaching and Language Corpora Conference. Lisbon, Portugal: Pedersen, T Fishing for exactness. Proceedings of the Conference of the South-Central SAS Users Group, P. Savický and J. Hlaváčová Measures of word commonness. Journal of Quantitative Linguistics, 9: i Note that the value of the parameter is related to the value of one million that we use for normalising frequencies. If we used words per thousand rather than words per million, the parameter would need scaling accordingly.

### Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine

Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine Adam Kilgarriff adam@lexmasterclass.com Lexical Computing Ltd., Brighton, UK The Sketch Engine is a leading corpus

### Studying the role of as in consider and regard sentences with syntactic weight and educational materials

Studying the role of as in consider and regard sentences with syntactic weight and educational materials Tetsu NARITA Introduction When we say consider or regard in conversation, we either do or do not

### Chi-square test considered harmful: Better methods for testing the significance of word frequencies

: Better methods for testing the significance of word frequencies Jefrey Lijffijt *, Tanja Säily **, Terttu Nevalainen ** * Department of Information and Computer Science, Aalto University ** Department

### Tickbox Lexicography. Adam Kilgarriff 1, Vojtěch Kovář 2, Pavel Rychlý 3

Tickbox Lexicography Adam Kilgarriff 1, Vojtěch Kovář 2, Pavel Rychlý 3 Lexical Computing Ltd, Masaryk University Abstract Corpus lexicography involves, first, an analysis of a word, and then, copying

### The Oxford Learner s Dictionary of Academic English

ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students

### Collocations. The definition of Collocations Frequency Mean and Variance

Collocations The definition of Collocations Frequency Mean and Variance Collocations Collocations of a given word are statements of the habitual or customary places of that word J.R. Firth Examples of

### Finding Multiwords of More Than Two Words 1

Finding Multiwords of More Than Two Words 1 Adam Kilgarriff, Pavel Rychlý, Vojtěch Kovář & Vít Baisa Lexical Computing Ltd., Brighton, United Kingdom NLP Centre, Faculty of Informatics, Masaryk University,

### What belongs in a dictionary? The Example of Negation in Czech

What belongs in a dictionary? The Example of Negation in Czech Dominika Kovarikova, Lucie Chlumska & Vaclav Cvrcek Keywords: negation, lexicography, grammatical category, frequency, lemmatization. Abstract

### Afterword. Reflecting on Interaction and English Grammar. Now that you have finished this look at how interaction affects English grammar,

330 Afterword Reflecting on Interaction and English Grammar Now that you have finished this look at how interaction affects English grammar, what are the new insights that you have about how English works?

### Chi-Square Test. J. Savoy Université de Neuchâtel

Chi-Square Test J. Savoy Université de Neuchâtel C. D. Manning & H. Schütze : Foundations of statistical natural language processing. The MIT Press. Cambridge (MA) 1 Discriminating Features How can we

### Cambridge English: Proficiency (CPE) Frequently Asked Questions (FAQs)

Cambridge English: Proficiency (CPE) Frequently Asked Questions (FAQs) Is there a wordlist for Cambridge English: Proficiency exams? No. Examinations that are at CEFR Level B2 (independent user), or above

### Real-Time Identification of MWE Candidates in Databases from the BNC and the Web

Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text

### Unit 17 Collocation and pedagogical lexicography (Case study 1) 17.1 Introduction

Unit 17 Collocation and pedagogical lexicography (Case study 1) 17.1 Introduction We introduced collocation statistics in unit 6.5 and discussed the use of corpora in lexicographic and collocation studies

### Unlock the Oxford 3000

Unlock the Oxford 3000 The words students need to know to succeed in English Which words should students learn to succeed in English? Patrick White, Head of Dictionaries and Reference Grammar in the English

### Information Retrieval Statistics of Text

Information Retrieval Statistics of Text James Allan University of Massachusetts, Amherst (NTU ST770-A) Fall 2002 All slides copyright Bruce Croft and/or James Allan Outline Zipf distribution Vocabulary

### Differences in linguistic and discourse features of narrative writing performance. Dr. Bilal Genç 1 Dr. Kağan Büyükkarcı 2 Ali Göksu 3

Yıl/Year: 2012 Cilt/Volume: 1 Sayı/Issue:2 Sayfalar/Pages: 40-47 Differences in linguistic and discourse features of narrative writing performance Abstract Dr. Bilal Genç 1 Dr. Kağan Büyükkarcı 2 Ali Göksu

### DRAFT! c January 7, 1999 Christopher Manning & Hinrich Schütze. 141. 5 Collocations

DRAFT! c January 7, 1999 Christopher Manning & Hinrich Schütze. 141 5 Collocations COMPOSITIONALITY TERM TECHNICAL TERM TERMINOLOGICAL PHRASE A COLLOCATION is an expression consisting of two or more words

### Bilingual Word Sketches: the translate Button

Bilingual Word Sketches: the translate Button Vít Baisa, Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý Lexical Computing Ltd, UK; Faculty of Informatics, Masaryk University, Czech Republic

### Sentences and paragraphs

Sentences and paragraphs From the Skills Team, University of Hull What is a sentence? This is not an easy question to answer but we shall try to define the main characteristics of a sentence in the written

### A Swedish Academic Word List: Methods and Data

A Swedish Academic Word List: Methods and Data Håkan Jansson, Sofie Johansson Kokkinakis, Judy Ribeck & Emma Sköldberg Keywords: Swedish language, language learning and teaching, academic vocabulary, corpus-based

### DEVELOPMENT AND ANALYSIS OF HINDI-URDU PARALLEL CORPUS

DEVELOPMENT AND ANALYSIS OF HINDI-URDU PARALLEL CORPUS Mandeep Kaur GNE, Ludhiana Ludhiana, India Navdeep Kaur SLIET, Longowal Sangrur, India Abstract This paper deals with Development and Analysis of

### Describing and Explaining Grammar and Vocabulary in ELT, Dilin Liu. Routledge, New York (2014). xxii pp., ISBN: (pbk).

Iranian Journal of Language Teaching Research 3(2), (July, 2015) 123-127 123 Content list available at www.urmia.ac.ir/ijltr Iranian Journal of Language Teaching Research (Book Review) Urmia University

### ESOL 94S- Ford Section 3.1 & 3.2: Input Analysis

ESOL 94S- Ford Section 3.1 & 3.2: Input Analysis 3.1: Variation in Language The English language, a phrase heard very frequently, gives the impression that English is one uniform system of communication

### REGISTER ANALYSIS OF ENGLISH BIOLOGY TEXTS: A CORPUS-BASED EXPLORATORY STUDY OF GRAMMAR

WoPaLP, Vol. 7, 2013 Borza 29 REGISTER ANALYSIS OF ENGLISH BIOLOGY TEXTS: A CORPUS-BASED EXPLORATORY STUDY OF GRAMMAR Natália Borza Language Pedagogy PhD Program, Eötvös Loránd University, Budapest nataliaborza@gmail.com

### stress, intonation and pauses and pronounce English sounds correctly. (b) To speak accurately to the listener(s) about one s thoughts and feelings,

Section 9 Foreign Languages I. OVERALL OBJECTIVE To develop students basic communication abilities such as listening, speaking, reading and writing, deepening their understanding of language and culture

### Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues

Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries

### 12 FIRST QUARTER. Class Assignments

12 FIRST QUARTER Class Assignments August 6- Handout textbooks. August 7- Overview of the course. Go over senior dates. Go over class syllabus. August 10- Part 2 Chapter 1 Parts of Speech. Nouns. Daily

### Producing a dictionary of collocations

Ústav Českého národního korpusu Czech National Corpus Institute Producing a dictionary of collocations Michael Rundell Macmillan Dictionaries and Lexicography MasterClass Outline Why a collocations dictionary?

### VERIFYING HEAPS LAW USING GOOGLE BOOKS NGRAM DATA

VERIFYING HEAPS LAW USING GOOGLE BOOKS NGRAM DATA Vladimir V. Bochkarev, Eduard Yu.Lerner, Anna V. Shevlyakova Kazan Federal University, Kremlevskaya str.18, Kazan 420018, Russia E-mail: vladimir.bochkarev@kpfu.ru

### Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.

### Get Ready for IELTS Writing. About Get Ready for IELTS Writing. Part 1: Language development. Part 2: Skills development. Part 3: Exam practice

About Collins Get Ready for IELTS series has been designed to help learners at a pre-intermediate level (equivalent to band 3 or 4) to acquire the skills they need to achieve a higher score. It is easy

### Laying the Foundation: Important Terminology. encompasses: syntax, morphology, phonology and semantics )

REFERENCE GUIDE # 1 (MAKE A COPY OF THIS TO KEEP IN YOUR ENGLISH 101 FOLDER) BASIC GENERAL INFORMATION FOR REFERENCE IN MRS. WHITE S ENGLISH GRAMMAR 101 CLASS (3 PAGES) Laying the Foundation: Important

### Two Correlated Proportions (McNemar Test)

Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

### Assessing Writing Performance Level B1

Assessing Writing Performance Level B1 Writing assessment by examiners in the Cambridge English: Preliminary (PET), Preliminary (PET) for Schools and Business Preliminary exams (BEC) Cambridge English

### Age Tagging and Word Frequency for Learner s Dictionaries

Age Tagging and Word Frequency for Learner s Dictionaries Hanhong LI & Alex C. Fang Dialogue Systems Group Department of Chinese, Translation and Linguistics City University of Hong Kong Presented in American

### Academic Writing in English: a corpus-based inquiry into the linguistic characteristics of levels B1-C2

Academic Writing in English: a corpus-based inquiry into the linguistic characteristics of levels B1-C2 Rebecca Present-Thomas VU University Amsterdam Promotors: John H.A.L. de Jong Bert Weltens Background

### construct for language tests?

How valid is the CEFR as a construct for language tests? Cecilie Hamnes Carlsen 5 th ALTE conference, Paris, April 9 th 10 th 2014 [ ] the CEFR levels are neither based on empirical evidence taken from

### Permutation Test & Monte Carlo Sampling

Permutation Test & Monte Carlo Sampling by MA, Jianqiang March 18th, 2009 Outline Introduction to Permutation Test Permutation Test in Linguistics : Measuring Syntactic Differences Brief View of Monte

### National Quali cations EXEMPLAR PAPER ONLY

H National Quali cations EXEMPLAR PAPER ONLY EP42/H/02 Mark Spanish Directed Writing FOR OFFICIAL USE Date Not applicable Duration 1 hour and 40 minutes *EP42H02* Fill in these boxes and read what is printed

### The validity of the Linguistic Fingerprint in forensic investigation Rebecca Crankshaw (Linguistics)

The validity of the Linguistic Fingerprint in forensic investigation Rebecca Crankshaw (Linguistics) Abstract This article is concerned with the definition and validity of a linguistic fingerprint, a relatively

### Keywords academic writing phraseology dissertations online support international students

Phrasebank: a University-wide Online Writing Resource John Morley, Director of Academic Support Programmes, School of Languages, Linguistics and Cultures, The University of Manchester Summary A salient

### What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project

Proceedings of elex 2011, pp. 203-208 What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project Carolin Müller-Spitzer, Alexander Koplenig, Antje Töpel Institute

### Discourse Markers in English Writing

Discourse Markers in English Writing Li FENG Abstract Many devices, such as reference, substitution, ellipsis, and discourse marker, contribute to a discourse s cohesion and coherence. This paper focuses

### HFCC Math Lab Beginning Algebra 13 TRANSLATING ENGLISH INTO ALGEBRA: WORDS, PHRASE, SENTENCES

HFCC Math Lab Beginning Algebra 1 TRANSLATING ENGLISH INTO ALGEBRA: WORDS, PHRASE, SENTENCES Before being able to solve word problems in algebra, you must be able to change words, phrases, and sentences

### CHI-SQUARE TEST. The one-sample t test. Welch s two-sample t test in R. Welch s two-sample t test

The one-sample t test CHI-SQUARE TEST John Fry Boise State University One common statistic for hypothesis testing is the t statistic t = x µ s2 /N The t test looks at the mean x and variance s 2 of a sample

### Author Profiling: Predicting Age and Gender from Blogs

Author Profiling: Predicting Age and Gender from Blogs Notebook for PAN at CLEF 2013 K Santosh, Romil Bansal, Mihir Shekhar, and Vasudeva Varma International Institute of Information Technology, Hyderabad

### ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering

### Opportunities for multi-levelling

Course rationale: Senior NCEA ESOL This course is designed to meet the English language learning needs of students who are working at ELLP stage 1 and stage 2 of The English Language Learning Progressions

### Automatic Acquisition of a Slovak Lexicon from a Raw Corpus

Automatic Acquisition of a Slovak Lexicon from a Raw Corpus Benoît Sagot INRIA-Rocquencourt, Projet Atoll, Domaine de Voluceau, Rocquencourt B.P. 105 78 153 Le Chesnay Cedex, France Abstract. This paper

### Sketch Engine. Sketch Engine. SRDANOVIĆ ERJAVEC Irena, Web 1 Word Sketch Thesaurus Sketch Difference Sketch Engine

Sketch Engine SRDANOVIĆ ERJAVEC Irena, Sketch Engine Sketch Engine Web 1 Word Sketch Thesaurus Sketch Difference Sketch Engine JpWaC 4 Web Sketch Engine 1. 1980 10 80 Kilgarriff & Rundell 2002 500 1,000

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### The Economic Journal of Takasaki City University of Economics vol.52 No pp Akiko Okamura

The Economic Journal of Takasaki City University of Economics vol.52 No.1 2009 pp.17 26 Akiko Okamura This study aims to investigate how the speaker employs personal pronouns (we, you, I) in academic speech

### Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español

Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español Orsolya Vincze, Margarita Alonso Ramos Universidade da Coruña, Campus da Zapateira s/n, A Coruña 15071, Spain

### Module 13. Software Reliability and Quality Management. Version 2 CSE IIT, Kharagpur

Module 13 Software Reliability and Quality Management Lesson 32 Software Reliability Issues Specific Instructional Objectives At the end of this lesson the student would be able to: Differentiate between

### Basic Spelling Rules: Learn the four basic spelling rules and techniques for studying hard-to-spell words. Practice spelling from dictation.

CLAD Grammar & Writing Workshops Adjective Clauses This workshop includes: review of the rules for choice of adjective pronouns oral practice sentence combining practice practice correcting errors in adjective

### Marking the Word Frequency: A Comparative Study of English and Chinese Learner s Dictionaries

US-China Foreign Language, ISSN 1539-8080 February 2012, Vol. 10, No. 2, 909-914 D DAVID PUBLISHING Marking the Word Frequency: A Comparative Study of English and Chinese Learner s Dictionaries QIAN Gui-qin

### Corpus and Discourse. The Web As Corpus. Theory and Practice MARISTELLA GATTO LONDON NEW DELHI NEW YORK SYDNEY

Corpus and Discourse The Web As Corpus Theory and Practice MARISTELLA GATTO B L O O M S B U R Y LONDON NEW DELHI NEW YORK SYDNEY Contents List of Figures xiii List of Tables xvii Preface xix Acknowledgements

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### The Michigan State University - Certificate of English Language Proficiency (MSU-CELP)

The Michigan State University - Certificate of English Language Proficiency (MSU-CELP) The Certificate of English Language Proficiency Examination from Michigan State University is a four-section test

### Syntax II: Issues in Syntax Spring Semester 2013

Syntax II: Issues in Syntax Spring Semester 2013 ENGL 627S / LING 522 T-TH 1:30-2:45pm, Heav 110 Instructor Dr. Elaine Francis Email: ejfranci@purdue.edu Office: Heav 408 Office hours: Tues-Thurs 9:45-10:15am

### Supporting Online Material for

www.sciencemag.org/cgi/content/full/316/5828/1128/dc1 Supporting Online Material for Cognitive Supports for Analogies in the Mathematics Classroom Lindsey E. Richland,* Osnat Zur, Keith J. Holyoak *To

### Fall 2010 Practice Exam 1 Without Essay

Class: Date: Fall 2010 Practice Exam 1 Without Essay Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A researcher wants to visually display the U.S. divorce

### Recursion Through Dictionary Definition Space: Concrete Versus Abstract Words

Recursion Through Dictionary Definition Space: Concrete Versus Abstract Words Graham Clark August 11, 2003 Abstract At least a subset of our vocabulary must be grounded in the sensorimotor world for the

### Corpus Linguistics and Grammar Teaching

Corpus Linguistics and Grammar Teaching Douglas Biber & Susan Conrad Susan Conrad is Professor of Applied Linguistics at Portland State University. In addition to doing teacher training, she has taught

Frequently Asked Questions About Using The GRE Search Service General Information Who can use the GRE Search Service? Institutions eligible to participate in the GRE Search Service include (1) institutions

### Beyond single words: the most frequent collocations in spoken English

Beyond single words: the most frequent collocations in spoken English Dongkwang Shin and Paul Nation This study presents a list of the highest frequency collocations of spoken English based on carefully

### Complexity of Chinese Characters

Original Paper Forma, 15, 409 414, 2000 Complexity of Chinese Characters Tatsuo TOGAWA 1, Kimio OTSUKA 1, Shizuo HIKI 2 and Hiroko KITAOKA 3 1 Institute of Biomaterials and Bioeng., Tokyo Medical and Dental

### EFL Learners Synonymous Errors: A Case Study of Glad and Happy

ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 1, No. 1, pp. 1-7, January 2010 Manufactured in Finland. doi:10.4304/jltr.1.1.1-7 EFL Learners Synonymous Errors: A Case Study of Glad and

### Frequency analysis of second-level domain names and detection of pseudo-random domain generation

Frequency analysis of second-level domain names and detection of pseudo-random domain generation Ryan Doyle 12 June 2010 rd@ryandoyle.net Abstract This paper will analyse the letter distribution found

### Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged

### Learning Rules to Improve a Machine Translation System

Learning Rules to Improve a Machine Translation System David Kauchak & Charles Elkan Department of Computer Science University of California, San Diego La Jolla, CA 92037 {dkauchak, elkan}@cs.ucsd.edu

### ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking

ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France annlor@limsi.fr Cyril Grouin LIMSI-CNRS rue John von Neumann 91400

### CHAPTER V DISCUSSION. of the English proceeding of national and international conference.

CHAPTER V DISCUSSION In this chapter, the writer presents and discusses the data result from the previous chapter. After analyze them, she would like to discuss the result which has been found in the finding

### Multilingual Information Retrieval Using English and Chinese Queries

Multilingual Information Retrieval Using and Chinese Queries Aitao Chen School of Information Management and Systems University of California, Berkeley CLEF 2001 Workshop: 3-4 Sept, 2001, Darmstadt, Germany

### USING CORPUS LINGUISTICS TOOLS TO HELP TRANSLATION STUDENTS CREATE TECHNICAL GLOSSARIES

USING CORPUS LINGUISTICS TOOLS TO HELP TRANSLATION STUDENTS CREATE TECHNICAL GLOSSARIES Alexandre Trigo Veiga São Paulo Catholic University Associação Cultura Inglesa Brazil Abstract The creation of glossaries

### English Skills: Structuring Sentences Workshop 6. Promoting excellence in learning and teaching.

English Skills: Structuring Sentences Workshop 6 After successful completion of this workshop you will be able to: ü Write simple, compound and complex sentences confidently ü Use active voice and avoid

### GMAT.cz www.gmat.cz info@gmat.cz. GMAT.cz KET (Key English Test) Preparating Course Syllabus

Lesson Overview of Lesson Plan Numbers 1&2 Introduction to Cambridge KET Handing Over of GMAT.cz KET General Preparation Package Introduce Methodology for Vocabulary Log Introduce Methodology for Grammar

### Tsujimura, Natsuko: An Introduction to Japanese Linguistics (Second Edition). Malden et al.: Blackwell Publishing, 2006, 510 pp., \$ 49.

Tsujimura, Natsuko: An Introduction to Japanese Linguistics (Second Edition). Malden et al.: Blackwell Publishing, 2006, 510 pp., \$ 49.95 Reviewed by Peter Backhaus * To start with a clarifying note, An

### Cohesive writing 1. Conjunction: linking words What is cohesive writing?

Cohesive writing 1. Conjunction: linking words What is cohesive writing? Cohesive writing is writing which holds together well. It is easy to follow because it uses language effectively to guide the reader.

### English Appendix 2: Vocabulary, grammar and punctuation

English Appendix 2: Vocabulary, grammar and punctuation The grammar of our first language is learnt naturally and implicitly through interactions with other speakers and from reading. Explicit knowledge

### THE DEPARTMENT OF PSYCHOLOGY GUIDE TO WRITING RESEARCH REPORTS

THE DEPARTMENT OF PSYCHOLOGY GUIDE TO WRITING RESEARCH REPORTS The following set of guidelines provides psychology students at Essex with the basic information for structuring and formatting reports of

### MORPHOLOGICAL ANALYSIS AS A STEP IN AUTOMATED SYNTACTIC ANALYSIS OF A TEXT

GUSTAV LEUNBACH MORPHOLOGICAL ANALYSIS AS A STEP IN AUTOMATED SYNTACTIC ANALYSIS OF A TEXT Introduction. The general purpose of this study is to investigate the possibility of an almost completely automated

### Web opinion mining: How to extract opinions from blogs?

Web opinion mining: How to extract opinions from blogs? Ali Harb ali.harb@ema.fr Mathieu Roche LIRMM CNRS 5506 UM II, 161 Rue Ada F-34392 Montpellier, France mathieu.roche@lirmm.fr Gerard Dray gerard.dray@ema.fr

### Web Content Summarization Using Social Bookmarking Service

ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School

### 12 FIRST QUARTER. Class Assignments

August 7- Go over senior dates. Go over school rules. 12 FIRST QUARTER Class Assignments August 8- Overview of the course. Go over class syllabus. Handout textbooks. August 11- Part 2 Chapter 1 Parts of

### Reading, Thinking Creatively, and Writing

Reading, Thinking Creatively, and Writing Overview http://www.uvu.edu/owl/resource/eslresources/index.html ANTECEDENTS AND APPOSITIVES http://owl.english.purdue.edu/handouts/grammar/g_appos.html Appositives

There are three kinds of people in the world those who are good at math and those who are not. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Positive Views The record of a month

### Summary Report on Measuring Language Outcomes from Newborn Hearing Screening

Summary Report on Measuring Language Outcomes from Newborn Hearing Screening Introduction: The primary focus of Quality Assurance standards for Newborn Hearing Screening Wales (NBHSW) is on process measures.

### Grammar Imitation Lessons: Simple Sentences

Name: Block: Date: Grammar Imitation Lessons: Simple Sentences This week we will be learning about simple sentences with a focus on subject-verb agreement. Follow along as we go through the PowerPoint

### COLLOCATION TOOLS FOR L2 WRITERS 1

COLLOCATION TOOLS FOR L2 WRITERS 1 An Evaluation of Collocation Tools for Second Language Writers Ulugbek Nurmukhamedov Northern Arizona University COLLOCATION TOOLS FOR L2 WRITERS 2 Abstract Second language

### Author Gender Identification of English Novels

Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in

### It s not an English class : Is correct grammar an important part of mathematical proof writing at the undergraduate level?

It s not an English class : Is correct grammar an important part of mathematical proof writing at the undergraduate level? Kristen Lew Rutgers University Juan Pablo Mejía-Ramos Rutgers University We studied

### EAP 1161 1660 Grammar Competencies Levels 1 6

EAP 1161 1660 Grammar Competencies Levels 1 6 Grammar Committee Representatives: Marcia Captan, Maria Fallon, Ira Fernandez, Myra Redman, Geraldine Walker Developmental Editor: Cynthia M. Schuemann Approved:

### Section 8 Foreign Languages. Article 1 OVERALL OBJECTIVE

Section 8 Foreign Languages Article 1 OVERALL OBJECTIVE To develop students communication abilities such as accurately understanding and appropriately conveying information, ideas,, deepening their understanding

### LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES

LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES Rūta Marcinkevičienė, Gintarė Grigonytė Vytautas Magnus University, Kaunas, Lithuania Abstract The paper overviews the process of compilation of the first