US-China Foreign Language, ISSN 1539-8080
February 2012, Vol. 10, No. 2, 909-914
DAVID PUBLISHING

Marking the Word Frequency: A Comparative Study of English and Chinese Learner's Dictionaries

QIAN Gui-qin
Ludong University, Yantai, China

"Frequency first" is one of the basic principles of second language teaching and learning. Learner's dictionaries, being a pedagogically indispensable tool in SLA (Second Language Acquisition), apply a frequency marking policy in their compilation. The present paper explores the methods, the granularity, and the differences of word frequency rating and marking in English learner's dictionaries. It is argued that frequency computing supported by a large and balanced corpus should be incorporated into the frequency marking in contemporary learner's dictionary-making.

Keywords: SLA (Second Language Acquisition), learner's dictionary, word frequency

Introduction

According to the structuralists, language is a hierarchical system, and so is the lexicon. The hierarchy of the lexicon can be demonstrated by the different usage frequencies of the words in one and the same language. Nation (1990) found that the 10 most frequent words in English cover 25% of the texts in the Longman Corpus Network, and the first 1,000 English words, though a small fraction of the English vocabulary, cover 71.4% of the texts in a corpus of 35 million word tokens (Coxhead, 1998). The frequency difference is pedagogically significant in the practice of second language teaching and acquisition. According to the principle of economy of language, the most frequent words should be acquired first by second language learners, because frequently used words occur repeatedly in different genres of texts and form the core part of the lexicon.
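The coverage figures cited above are ratios of running tokens: the occurrences of the N most frequent word types divided by the total number of tokens in the corpus. A minimal sketch of that computation follows. The function name `coverage_of_top_n` and the toy sentence are assumptions made for illustration; they are not taken from Nation (1990) or Coxhead (1998), and real studies also lemmatize or group word families, which this sketch does not attempt.

```python
from collections import Counter


def coverage_of_top_n(tokens, n):
    """Share of running tokens accounted for by the n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # most_common(n) returns (word, count) pairs sorted by descending count
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Hypothetical toy corpus, for illustration only
toy_tokens = "the cat sat on the mat and the dog sat on the rug".split()

print(coverage_of_top_n(toy_tokens, 1))  # coverage of the single most frequent type
print(coverage_of_top_n(toy_tokens, 3))  # coverage of the three most frequent types
```

On a real corpus, calling the function with n = 10 and n = 1,000 would give the kind of percentages quoted in the text, provided the same tokenization is used throughout.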
User-oriented learner's dictionaries, serving as an indispensable tool in SLA (Second Language Acquisition), should adopt a uniform and systematic way of marking word frequency to help foster learners' awareness of it. The present paper, based on the theory of vocabulary gradation, explores how English and Chinese learner's dictionaries respectively mark the word frequency of their headwords and what strategies should be implemented.

Richards's (1976) claim that "knowing a word means knowing the degree of probability of encountering that word in speech or print", echoed by Nation's statement that a word's frequency is a part of word knowledge, places a relatively high demand on the lexical acquisition of non-native speakers. Obviously, it is unrealistic to expect a second language learner to be familiar with the frequencies of all the words in the target language.

QIAN Gui-qin, lecturer at School of Foreign Languages, Ludong University.
It has also been statistically shown that English words form a structure of various frequency strata, each achieving a different coverage rate of the texts in a corpus. Table 1 shows the correlation between the 2,000 most frequently used English words and their respective text coverage (Nation, 2001).

Table 1
The Top 2,000 English Words and Their Text Coverage in Different Text Genres

Lexicon stratum          Dialogue (%)   Fiction (%)   Newspaper (%)   Academic writing (%)
The first 1,000 words    84.3           82.3          75.6            73.5
The second 1,000 words   6.0            5.1           4.7             4.6
Technical terms          1.9            1.7           3.9             8.5
Others                   7.8            10.9          15.8            13.4

It is self-evident that, in the process of second language learning, learners should first grasp the top 2,000-3,000 English words, which is necessary to ensure that the learners' hard work in vocabulary learning is in direct proportion to their language decoding and encoding activities (West, 1953). Therefore, an exact marking of word frequency in dictionaries can enhance learners' acquisition of the vocabulary of the target language.

English learner's dictionaries, aiming to fulfill the need of "first things first" in vocabulary gradation, have a threshold of approximately 3,000 basic English words and mark their frequencies. Cambridge Advanced Learner's Dictionary (hereafter abbreviated to CALD) claims in its preface that the introduction of the word frequency of its headwords is one of its distinguishing characteristics. In fact, marking word frequency is commonplace in nearly all the mainstream contemporary English learner's dictionaries. However, there still exist some variations among English learner's dictionaries in the frequency marking of headwords.

The Methods of Word Frequency Marking in English Learner's Dictionaries

Two methods are adopted in English learner's dictionaries to mark the word frequency of their headwords. The first one is to mark word frequency through typographic changes.
The second one is to use visually different symbols, labels, or graphs to show the frequency differences of words. The two methods can be employed simultaneously in one and the same dictionary; Longman Dictionary of Contemporary English (the 5th edition) (LDOCE5) is of this kind. Some dictionaries, such as Collins Cobuild Advanced Learner's Dictionary (the 5th edition) (CCALD5), use just one way to show word frequency.

First let us see how typographic variations can demonstrate word frequency in English learner's dictionaries. Take Oxford Advanced Learner's Dictionary (the 7th edition) (OALD7) and CCALD5 as examples. The two dictionaries are similar in their typography: all the headwords are printed in blue, in sharp contrast to the right-branch explanations in black. In OALD7, the fonts of the active words are a bit larger than those of the passive ones. Colors are also used as a visually striking way to distinguish the productive words from the receptive ones. In LDOCE5, all the productive words are red, and the receptive ones black. In Macmillan English Dictionary for Advanced Learners (the 2nd edition) (MEDAL2), the active words are printed in red, and the passive ones in black. In CALD2 (the 2nd edition of CALD), the productive words are blue, in contrast to the black receptive words. The visual contrast between frequent and infrequent words turns out to be an effective way to make the productive words the salient points in the dictionary word list.

Besides typography, some symbols and abbreviations are used in English learner's dictionaries to mark word frequency. The symbol of a key is used in OALD7 to highlight the top 3,000 English words, and in CCALD5 three diamonds, two diamonds, and one diamond are employed to denote the top 1,000, 2,000, and 3,000 English words respectively. MEDAL2 uses stars to rate its words, with three stars representing the basic words, two stars the frequent words, and one star the less frequent ones. What deserves the metalexicographer's attention is the dual-track approach that MEDAL2 takes, dealing with the passive and active words in radically different ways. The active words are provided with detailed explanations, collocations, grammatical patterns, registers, and even connotations. For the passive words, a much simpler method is employed: only brief definitions are given, with no grammatical or pragmatic information and no examples. The dual-track approach to treating headwords is theoretically and practically plausible in that many more headwords can be incorporated in the wordlist of the dictionary while its core part, i.e., the frequently used words, is highlighted.

Abbreviations are also used to grade words according to their frequencies. In LDOCE5, the abbreviations S and W stand for spoken and written words respectively. Abbreviations plus numbers, for example, S(W)1, S(W)2, and S(W)3, are employed to denote the top 1,000, 2,000, and 3,000 English words in spoken (written) English.
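The banded labels just described can be read as a simple mapping from a word's frequency rank to a band number. The sketch below illustrates that reading under stated assumptions: the helper names `band_label` and `frequency_labels` are hypothetical, the ranks in the usage line are invented, and it is assumed that a word receives the label of the highest band it falls into (so rank 1,500 yields band 2). It is not a description of how the LDOCE5 compilers actually assigned labels.

```python
def band_label(rank, prefix, band_size=1000, max_band=3):
    """Map a 1-based frequency rank to a label such as 'S1', or None if unmarked.

    Assumption: bands are consecutive blocks of `band_size` ranks, and words
    ranked beyond `band_size * max_band` receive no label.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    band = (rank - 1) // band_size + 1
    return f"{prefix}{band}" if band <= max_band else None


def frequency_labels(spoken_rank, written_rank):
    """Combine spoken (S) and written (W) labels, dropping unmarked ones."""
    labels = [band_label(spoken_rank, "S"), band_label(written_rank, "W")]
    return [label for label in labels if label is not None]


# Hypothetical ranks: frequent in speech, less frequent in writing
print(frequency_labels(800, 2500))
```

Keeping the spoken and written ranks separate is what allows a single headword to carry two different bands at once.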
Other devices used in LDOCE5 to distinguish word frequency are frequency graphs, which statistically show the frequency variation of the same headword in written and spoken form, reveal frequency differences between British and American English, or provide the percentage of a given syntactic pattern. It is noticeable that frequency graphs in LDOCE5 are not employed systematically. In LDOCE5, 53 headwords are provided with frequency graphs. Compared with the 100,000 reference units in the dictionary, the number of frequency graphs is too small to be representative. What is worse, there is some inconsistency in the allotment of frequency graphs. For example, both of the lexemes absolutely and actually have frequency graphs. The graph for absolutely illustrates that it is used mainly in spoken contexts, which, however, is already clearly indicated by its labels S1 and W3. Besides, a usage note concerning the difference in its typical occurrences is given just below the headword. A redundancy thus occurs, which wastes the dictionary's storage space. Actually is representative of another type: it is labeled S1 and W1, with no marked difference between its occurrence in spoken and written form, so a detailed frequency graph showing the subtle discrepancy is badly needed. There are 244 words marked in LDOCE5 according to their frequencies, of which 103 headwords are not used equally in spoken and written English and 141 words have the same occurrence frequencies in both spoken and written contexts. It seems that LDOCE5 has not yet established a criterion to determine which headwords should be provided with a frequency mark. A systematic and consistent criterion and treatment is desirable in LDOCE5's allotment of frequency graphs.

To Which Linguistic Unit the Frequency Rating Is Attached: Lexeme or Sememe?

Researchers have not reached a consensus on the two terms lexeme and sememe.
According to Cruse (1986), a lexeme is a set comprised of form-related lexical units, and the definition of each lexical unit is a sememe. Therefore, a lexeme is a two-fold sign with one form and various definitions rolled into one; the sememe, however, is connected only with meaning. In dictionaries, a sememe is approximately equal to one definition of the headword. It is clear that the meanings of a polysemous word do not have the same status in terms of their occurrences in human communication.

CALD2 is the only English learner's dictionary which attaches the frequency rating to sememes instead of lexemes. That is to say, CALD2 adopts a much more specific way to discriminate the frequency differences of the related sememes in one and the same lexeme. The frequency marking procedure in CALD2 goes like this: First, the dictionary compilers choose the most frequent words from the Cambridge International Corpus and the Cambridge Learner's Corpus; then the examples of the frequent words are numbered, on the basis of which the occurrence frequencies of all the sememes are computed; lastly, the sememes concerned are graded and marked. Take the lexeme good as an example. In CALD2, the lexeme good has nine definitions (equal to sememes): seven of them are marked E, meaning elementary words, one is marked I, standing for improved words, and one definition is not marked at all because of its rare occurrence in communication. In LDOCE5, MEDAL2, CCALD5, and OALD7, the lexeme good has 17, 18, 21, and 24 definitions respectively, all of which carry only the same symbol or abbreviation. In LDOCE5, good is marked with W1 and S1, and in MEDAL2 it is marked with three stars, without any discrimination between the definitions.
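The sememe-level procedure described for CALD2 (number the corpus examples by sense, compute each sense's frequency, then grade it) can be sketched roughly as follows. This is an illustration under assumptions, not the compilers' actual method: the sense identifiers such as `good_1` are hypothetical, the counts are invented, and the thresholds follow the per-10-million bands that this paper reports for the E, I, and A labels, with the treatment of exact boundary values (for example, a rate of exactly 400) chosen arbitrarily here.

```python
from collections import Counter


def per_ten_million(count, corpus_size):
    """Normalize a raw count to occurrences per 10 million corpus words."""
    return count * 10_000_000 / corpus_size


def grade(rate):
    """Grade a normalized rate; boundary handling is an assumption of this sketch."""
    if rate > 400:
        return "E"
    if rate >= 200:
        return "I"
    if rate >= 100:
        return "A"
    return None  # too rare to be marked


def grade_senses(sense_tags, corpus_size):
    """Count sense-tagged corpus examples and grade each sense separately."""
    counts = Counter(sense_tags)
    return {
        sense: grade(per_ten_million(count, corpus_size))
        for sense, count in counts.items()
    }


# Hypothetical sense-tagged examples drawn from a 10-million-word corpus
tags = ["good_1"] * 500 + ["good_2"] * 250 + ["good_3"] * 120 + ["good_4"] * 30
print(grade_senses(tags, 10_000_000))
```

Because each sense is graded on its own count, two definitions of the same headword can end up with different labels, or with none.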
There is no denying that such vague frequency marking blurs the distinction between the frequently used definitions and the less frequent ones, which turns out to be a real handicap for learners trying to locate the most frequent sememe(s) within the lexeme set.

The Granularity of the Word Frequency Marking of the Active Words

In the "Big Five" English learner's dictionaries, the granularity of word frequency marking is treated in two ways. The first is just to mark the productive words, with no discrimination between their internal frequency differences. OALD7 serves as a typical example of this type. The second is to subdivide the productive words according to their occurrence frequencies. The other four mainstream English learner's dictionaries all fall into this category.

Within the second type, CCALD5, LDOCE5, MEDAL2, and CALD2 all subdivide the active words into three frequency groups. CCALD5, LDOCE5, and MEDAL2 do so evenly, which is termed even frequency marking: LDOCE5 and CCALD5 have 3,000 frequently used words marked, with 1,000 words as a group, while MEDAL2 has 7,500 words marked with stars, with 2,500 words as a subcategory. The only exception is CALD2, which adopts a non-even frequency marking policy. In CALD2, sememes marked E are words which everyone needs to know to communicate effectively, usually with over 400 occurrences per 10 million corpus words, and total 4,900 in the dictionary. Sememes marked I are also common in native speaker English, typically with 200-400 occurrences per 10 million words, adding up to 3,300. Sememes marked A typically occur around 100-200 times per 10 million corpus words and are needed by advanced learners to make their English more fluent and natural; 3,700 words are marked A. Even frequency marking is thought to be a relatively subjective labeling, lacking a theoretical underpinning, while non-even frequency marking, supported by frequency computations based on a large, balanced, and even dynamic corpus, mirrors the real distribution and usage of linguistic units. Other contemporary English learner's dictionaries should follow the lead of CALD2.

The Difference of the Frequency Marking in English Learner's Dictionaries

Quantitatively speaking, the productive words marked in CALD2 amount to 11,900, while OALD7, LDOCE5, and CCALD5 all mark 3,000 frequent words. Obviously there is a large gap between CALD2 and the three other English learner's dictionaries. The main reason lies in the fact that CALD2, as discussed in the preceding part, attaches the frequency markers to sememes instead of lexemes.

What deserves our attention is that there is disagreement as to the frequency marking of the same linguistic unit. For example, the headword convince is marked among the top 3,000 words in both LDOCE5 and OALD7. The word convinced, however, is not marked at all, indicating its rare occurrence in ordinary communication. But in CALD2, both convince and convinced are labeled I, an abbreviation denoting the top 2,000 English words. These examples raise the question of the stability of word frequency marking. Does this kind of disagreement often occur in dictionary marking? We take the lexemes beginning with the letter A as a data sample, with the aim of finding out to what extent two dictionaries conflict with each other in word frequency rating.

Table 2
The Frequency Marking of the Active Words Beginning With Letter A in LDOCE5 and CCALD5

Frequency             LDOCE5 (spoken)   LDOCE5 (written)   CCALD5
The top 1,000 words   S1: 65            W1: 86             50
The top 2,000 words   S2: 84            W2: 63             72
The top 3,000 words   S3: 56            W3: 71             110

It can be seen clearly from Table 2 that the lexemes under letter A show great differences in word frequency rating.
CCALD5 has many more headwords in the top 3,000 band. Besides, LDOCE5 maintains a balance among the three groups of words of different occurrence frequencies, while CCALD5 shows a tendency of growth from the most frequent band to the least frequent one. It is argued that the disagreement in word frequency marking lies in the fact that different English learner's dictionaries are compiled on the basis of different corpora. As far as dictionary users are concerned, different and sometimes even conflicting frequency markings leave them puzzled or lost.

Conclusions

The frequency marking of headwords or their definitions is a user-friendly treatment for dictionary users, who, as second language learners, have to fulfill both language decoding and encoding tasks.
Therefore, a systematic demarcation should be made between the active and passive words in dictionaries. Frequency rating in dictionaries can help to highlight the frequent words and deal with these words in great detail; hence the breadth and depth of treatment of the frequent words is a necessity. In the present technology-dominated world, frequency computing supported by a large and balanced corpus should be incorporated into the frequency marking in learner's dictionaries.

References

Coxhead, A. (1998). An academic word list (English Language Institute Occasional Publication). Wellington: School of Linguistics and Applied Language Studies, Victoria University of Wellington.
Cruse, D. A. (1986). Lexical semantics. Cambridge: Cambridge University Press.
Lyons, J. (1977). Semantics. Cambridge: Cambridge University Press.
Nation, I. S. P. (1990). Teaching and learning vocabulary. New York: Heinle and Heinle.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Richards, J. C. (1976). The role of vocabulary teaching. TESOL Quarterly, 10(1).
West, M. (1953). A general service list of English words. London: Longman.