Towards a Dynamic Combinatorial Dictionary: A Proposal for Introducing Interactions between Collocations in an Electronic Dictionary of English Word Combinations

Moisés Almela, Pascual Cantos, Aquilino Sánchez
Universidad de Murcia (Spain)
Depto. de Fil. Inglesa, Fac. de Letras, Campus de la Merced, Murcia (Spain)
[email protected], [email protected], [email protected]

Abstract

This paper presents an academic (non-commercial) lexicographic project called Dynamic Combinatorial Dictionary, which is currently being developed by members of the LACELL Research Group at the University of Murcia. The aim of this project is to bring e-lexicography into closer alignment with lexical models that cannot be implemented in printed dictionaries. Theoretically, the project is informed by the Lexical Constellation Model. The main difference between this model and the mainstream approaches to collocation lies in its suitability for recognising more than one domain of lexical attraction within the same collocational window. We will distinguish two different manifestations of this multiplicity of domains. The first one is the phenomenon of indirect collocation, which has been investigated in previous Lexical Constellation research, and the second one is inter-collocability. This concept refers to positive or negative dependency relations between collocational pairs (not between words). It will be argued that incorporating inter-collocability features into lexical entries can lead to significant advances in the field of combinatorial lexicography.

Keywords: collocation; lexical constellations; corpus linguistics; e-lexicography; combinatorial dictionaries.

1. Introduction

The potential of electronic formats for increasing the variety of contextual data offered to the user, as well as for facilitating an interactive management of the information contained in lexical entries, is underexploited in current combinatorial dictionaries.
This is in part due to the fact that the design of electronic combinatorial dictionaries is to a large extent informed by the design of earlier printed dictionaries. At present, the difference between electronic and printed developments in combinatorial lexicography lies more in the material format (i.e. in the medium) than in the kind and amount of information provided. In this study, we present a proposal for exploiting more effectively and thoroughly the opportunities created by the electronic format in combinatorial lexicography.

Our research is motivated by the idea that in an electronic dictionary it is possible to incorporate collocational information of a qualitatively different kind from the one that is offered to the user of a conventional collocation dictionary. More specifically, we submit that collocational information in an electronic dictionary need not be restricted to dependencies between words, and that it can be extended to include dependencies between different collocations.

The paper is structured as follows. First, in the next section we shall explain the theoretical framework of the proposal, which is based on Cantos & Sánchez's (2001) Lexical Constellation Model. It will be argued that the analysis of collocation as a relationship between lexical items is incomplete and should be complemented with a description of interactions between different collocations of a lemma. In Section 3 the workings of the model are illustrated with reference to the collocational profile of the noun goods. The lexicographic treatment of this information is illustrated in Section 4, where we present parts of a sample entry from the Dynamic Combinatorial Dictionary (DCD). The advantages of the DCD over conventional approaches to combinatorial lexicography are also explained in this section.

2. The Lexical Constellation Model

The Lexical Constellation Model (henceforth: LCM) originated from the observation that the node, i.e.
the word under investigation in corpus collocational research, does not exert an unlimited influence on its environment (Cantos & Sánchez, 2001). This means that the node is not the only lexical item to restrict the range of lexical choices in its textual environment. In the syntagmatic context of the node there are other lexical items which can be endowed with a context-predictive potential. To express it in more formal terms, we can say that what differentiates the LCM approach from mainstream approaches to collocation is its determination to resolve difficulties caused by the phenomenon of lexical gravity overlaps (or lexical gravity interference).

The term lexical gravity, as is well known, was used by Mason (2000) to denote the context-predictive potential associated with the selection of a word in the discourse. To quote the author, lexical gravity can be defined as "the restriction a word imposes on the variability of its context" (Mason, 2000: 270). The lexical gravity of a word is the influence it exerts on restricting the choice of possible words in specific positions of its textual environment.
The problem brought to the fore by LCM research is that lexical gravity can be exerted by more than one item in the same textual window. The imposition of restrictions on lexical choices in the context of the node is not an exclusive function of the node. The lexical gravity exerted by collocates of the node can interfere with the gravity attributed to the node. This raises the need to distinguish which features of lexical gravity are a contribution of the node and which ones are contributions of other elements.

In this respect, the LCM outperforms the conventional approaches to collocation. The received models of collocation are not suitable for dealing with the problem of lexical gravity interference. The reason for this is that they are linear, in the sense that they fail to divide the collocational patterns of the node into different domains of lexical attraction. The LCM seeks to resolve this problem by comparing the influence of the node and the influence exerted by other items or structures that co-exist within the same textual window.

Figure 1: Structure of a plain collocational network
Figure 2: Structure of a lexical constellation (type 1)
Figure 3: Structure of a lexical constellation (type 2)

The differences between plain (or linear) collocational analysis and constellational analysis are graphically represented in Figures 1, 2 and 3. In the three figures, a dot represents a lexical item, and a line represents a relationship of statistically significant co-occurrence [1]. Thus, a pair of dots connected by a line represents a collocational bi-gram. Additionally, in Figures 2 and 3 each circle symbolises a domain of lexical attraction. Figures 2 and 3 represent different types of lexical constellations, the central category of description in the LCM. A lexical constellation is a collocational network hierarchically organised in two or more centres of lexical attraction.
The first type of lexical constellation shown above (Figure 2) corresponds to the phenomenon of indirect collocation; the second type corresponds to patterns of inter-collocability [2] (Figure 3). These two classes of lexical constellations are described separately in the next subsections.

2.1 Indirect collocation

The phenomenon of indirect collocation was the first problem of lexical gravity interference to be investigated within the framework of the LCM (Cantos & Sánchez, 2001). This problem originates when a word, so to speak, intrudes one of its collocates into the context of another word. The phenomenon of indirect collocation is thus responsible for a large part of the unwanted items that are found in collocate lists.

The strategy adopted by the proponents of the LCM in order to detect cases of indirect lexical attraction among statistical collocates is based on comparisons of conditional probabilities. Once the statistically significant co-occurrences of a node have been extracted from the corpus using the conventional parameters of collocational analysis (defining a span, establishing a frequency threshold, selecting an association measure, etc.), the method proceeds to calculate the values of conditional probabilities between three different words: the node, the collocate and a candidate sub-collocate (i.e. a collocate which is suspected of being indirectly attracted because it shares more semantic features with other collocates than with the node).

[1] Following the Sinclairian line of thinking, collocation is defined in this study in statistical terms. Thus, it denotes a pair or group of words which co-occur with a probability greater than chance. However, we must be aware that this definition of collocation is controversial and has been criticised by notable experts in the field, especially by Bosque (2001). We will not tackle the debate here because the issue lies beyond the specific aims set for the present investigation.
[2] To avoid possible misunderstandings, a brief terminological note is in order here. The term inter(-)collocation is sometimes used in the literature to denote a reciprocal relationship of collocation. Thus, if a word a is a statistically significant co-occurrence of b, and b is a statistically significant co-occurrence of a, the two terms are said to form an inter-collocation. This notion of inter-collocation is not equivalent to the phenomenon that we call inter-collocability. The latter refers to a relationship between different collocational pairs.
Conditional probabilities are indicative of the strength of the dependency of one event on another event. For example, if we want to know how probable the event b (say, the occurrence of a word b) is given the occurrence of a as a fact, we can divide the number of joint occurrences of a and b by the total number of occurrences of a in a corpus. The value indicates the proportion of occurrences of a that take place in the company of b. This can be interpreted as an estimation of the dependency of the event a on the event b. The notation is P(b | a), which is read as follows: the probability of b given the occurrence of a.

Thus, in previous research it was shown that dental collocates with incidence not because it is attracted towards incidence but because it has a strong dependency on another collocate of incidence, i.e. caries (Almela, 2011; Almela, Cantos & Sánchez, 2011). The probability of finding dental given caries in the Bank of English (55.2%) is more than a hundred times higher than that of finding dental given the occurrence of incidence in the same corpus (0.5%). These data are consistent with the observation that dental shares more semantic features with caries than with incidence. Thus, in Figure 2, the biggest circle can stand for incidence, the intermediate one for caries, and the smallest one for dental.

More generally, it was also found that collocates of incidence referring to body parts (dental, heart, lung, etc.) are more strongly attracted to other collocates of incidence, especially to those denoting a disease or health problem (e.g. caries, attack, cancer, etc.), than they are to the node. Hence, in expressions such as incidence of dental caries, incidence of heart attack or incidence of lung cancer, the modifier can be categorised as an indirect collocate of incidence (Almela, 2011; Almela, Cantos & Sánchez, 2011).

It is important to point out that grammar alone does not provide an explanation for the phenomenon of indirect collocation.
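As a brief aside, the comparison of conditional probabilities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the decision rule (a higher conditional probability given the collocate than given the node) is a simplified reading of the method, and the raw counts are invented so that they reproduce the reported percentages (55.2% and 0.5%); they are not the actual Bank of English frequencies.

```python
def conditional_probability(joint_count: int, given_count: int) -> float:
    """Estimate P(b | a) as f(a, b) / f(a) from raw corpus counts."""
    if given_count <= 0:
        raise ValueError("the conditioning event must occur at least once")
    if not 0 <= joint_count <= given_count:
        raise ValueError("joint count must lie between 0 and the given count")
    return joint_count / given_count


def prefers_collocate(f_cand_colloc: int, f_colloc: int,
                      f_cand_node: int, f_node: int) -> bool:
    """True if the candidate is more probable given the collocate than given the node."""
    return (conditional_probability(f_cand_colloc, f_colloc)
            > conditional_probability(f_cand_node, f_node))


# Hypothetical counts, chosen only to reproduce the reported percentages.
F_CARIES, F_DENTAL_CARIES = 1_000, 552
F_INCIDENCE, F_DENTAL_INCIDENCE = 20_000, 100

p_dental_given_caries = conditional_probability(F_DENTAL_CARIES, F_CARIES)
p_dental_given_incidence = conditional_probability(F_DENTAL_INCIDENCE, F_INCIDENCE)

print(f"P(dental | caries)    = {p_dental_given_caries:.1%}")
print(f"P(dental | incidence) = {p_dental_given_incidence:.1%}")
print("candidate indirect collocate:",
      prefers_collocate(F_DENTAL_CARIES, F_CARIES,
                        F_DENTAL_INCIDENCE, F_INCIDENCE))
```

With real data, the four counts would come from corpus queries restricted to the chosen collocational span.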
Admittedly, in the incidence examples discussed above there is a close correlation between phrase structure and the structure of the lexical constellation: the noun that modifies incidence is the direct collocate, and in turn, the modifier of the second noun is the indirect collocate of incidence. However, it should be added that in other cases the node has a closer syntactic connection with the indirect collocate than with the direct collocate. For instance, in expressions such as caused by faulty design or caused by a defective gene, the adjective is a direct collocate of the verb, and the noun is an indirect collocate (the data supporting this conclusion will be published in forthcoming research).

In sum, the analysis of indirect collocation in the LCM serves to uncover some discrepancies between statistical significance and lexical relevance. From the fact that two or more words co-occur significantly in a corpus it does not necessarily follow that they are attracted to one another. One of the reasons for this is that there can be more than one centre of attraction within the same textual window.

The detection of cases of indirect collocation is useful in combinatorial lexicography, because it helps us to optimise the criteria for selecting the collocates of a headword. However, the implications are similar in printed and electronic dictionaries: the exclusion of irrelevant collocates is advisable in both types of dictionaries. Therefore, in what follows we will concentrate our analysis on the phenomenon of inter-collocability. As will be argued below, this second aspect of lexical constellations has important implications for the micro-structural design of collocation dictionaries, and consequently, it bears greater relevance for the discussion of issues that are specific to the field of electronic lexicography.

2.2 Inter-collocability

The concept of inter-collocability denotes the existence of dependency relations between different collocations of a word.
The manifestations of inter-collocability are varied. In a positive sense, inter-collocability can be defined as the contribution which a collocation makes to the activation of another collocation of the same node. In a negative sense, inter-collocability can be defined in terms of restrictions on the combinational possibilities among different collocates of the same node.

As a method for identifying cases of inter-collocability we can use a variant of the technique employed for identifying cases of indirect collocation. Instead of calculating and comparing conditional probabilities between individual members of overlapping collocations, e.g. P(a | b), P(a | c), P(b | c), P(b | a), etc., we can calculate conditional probabilities between events of a larger size, for instance, the probability that a collocate of the node is selected given as a fact the co-occurrence of the node and another collocate: P(c1 | n, c2), where n stands for the node, and c1 and c2 represent two different collocates. This value can then be compared with the corresponding conditional probability at the intra-collocational level, namely: P(c1 | n).

Thus, given two collocates c1 and c2 of a node n, we can say that there is a relationship of positive inter-collocability between the pairs (n,c1) and (n,c2) if the probability of (n,c1) co-occurring with c2 is higher than the probability of the node occurring with c2 alone, or if the probability that (n,c2) co-occurs with c1 is higher than the probability of the node co-occurring with c1. In the first case, we can say that c2 is a positive co-collocate of c1, because the collocation (n,c2) is made more probable by the selection of c1; conversely, in the second case we say that c1 is a positive co-collocate of c2, because the collocation (n,c1) is made
more probable by the selection of c2. The relationship of positive co-collocation can be mutual; that is, it can be observed in the two directions, from c1 to c2 and vice versa.

As for negative inter-collocability, we can say that c2 is a negative co-collocate of c1 if the capacity of the node for predicting the choice of c1 is higher than the capacity of the collocation (n,c2) for predicting the choice of c1. This indicates that the collocation of the node with c2 diminishes the probability of finding the collocation (n,c1); conversely, we can say that c1 is a negative co-collocate of c2 if the selection of the collocation (n,c1) diminishes the probability of (n,c2). Like positive inter-collocability, negative inter-collocability can be mutual: c1 and c2 can be negative co-collocates of one another.

Inter-collocability is extremely frequent in patterns consisting of a verb and a noun phrase, especially when the noun phrase features a modifier-noun collocation. This reflects a characteristic of argument structure that we can describe as valency stratification. The capacity of a predicative lexeme, typically a verb, for restricting the lexical class of its arguments can extend over more than one layer of phrase structure. At one level, the valency carrier restricts the class of the head of the valency filler (i.e. the noun heading the argument phrase). For instance, return selects nouns denoting data (e.g. value, string, integer, list, row, zero, tuple, etc.) or goods (goods, vehicle, equipment, medicines, etc.), among many others.

This aspect of argument structure has been extensively described under different names. In valency theory it is described as a feature of semantic valency, along with semantic roles. In generative grammar, the terms employed are selectional restrictions and s-selection. Bosque opts for the term lexical restrictions (Bosque, 2001, 2004).
This aspect of valency patterning has also been extensively described in valency dictionaries and similar reference works. In Herbst et al.'s (2004) Valency Dictionary of English, the arguments of verbs are assigned general semantic categories. For instance, the direct object of translate (in its primary meaning) is categorised as text. Similarly, in P. Hanks's Pattern Dictionary the same argument of translate is categorised as document (for more detailed information on the semantic categorisation of arguments in this dictionary, see Hanks & Pustejovsky, 2005).

Less explored, however, is the second stratum of semantic valency. On top of restricting the lexical class of the head noun, the verb can also impose constraints on the collocability of different words within the argument phrase. Generally, these constraints exhibit a high level of semantic regularity; this justifies the treatment of valency stratification as a special feature of semantic valency patterning rather than as an idiosyncratic restriction.

One way of discovering patterns of valency stratification is to analyse adjectival co-collocates of verbs. The probability that the noun co-occurs with one or other adjectival collocate is often readjusted to the selection of a specific verbal collocate. Another way of approaching the phenomenon of valency stratification is by analysing inter-collocability relations in the reverse direction, that is, by analysing verbal co-collocates of adjectives. Because different modifier-noun collocations are associated with different verbs, the probability of finding a specific verb-noun collocation will be affected by the selection of different modifier-noun collocations. In principle, we can assume that these procedures are complementary. Both of them will be applied in the next section.

3. Lexical constellations at work

In this section the analytical framework sketched out above is applied to the description of lexical constellations formed with the noun goods.
The analysis will be focused on capturing features of inter-collocability and valency stratification in verb-noun and modifier-noun collocations.

3.1 Method and results

The data and the examples have been extracted from the ukWaC corpus (1,565,274,190 tokens), accessible through the SketchEngine query system. All queries are syntactically restricted. We have taken into account only occurrences of the noun phrase (i.e. the adjective-noun collocation) as a direct object of the verb in an active construction, or as the subject in a passive construction (the connection between the two constructions is that in both cases the collocation ADJ+goods performs the semantic role THEME). The WordSketch function proved very useful in limiting our queries to the foregoing grammatical scheme. Nevertheless, manual supervision was required in order to detect possible parsing errors.

Following the remarks made at the end of Section 2.2, we have approached the phenomenon of inter-collocability from two complementary perspectives. Tables 1-3 reflect the perspective provided by the analysis of adjectival co-collocates, and Tables 4-6 reflect the perspective provided by verbal co-collocates. The criteria applied in the selection of the potential co-collocates were aimed at testing the initial hypothesis that the lexical constellations of goods follow highly systematic semantic patterns (at this point it should be remembered that in Section 2.2 valency stratification was described as a special feature of semantic valency). The verbs return, replace and reject have been selected because they share important aspects of meaning. In collocation with goods they denote an action whereby the consumer does not accept the goods initially bought or received. As for the adjectives faulty, defective and damaged, they all describe a flaw or imperfection.
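The comparison that underlies both perspectives, as defined in Section 2.2, can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the tables: the class and function names are hypothetical, the counts in the usage lines are invented, and the three-way label (including "neutral" for equal values) is an assumption added for completeness. The direction follows the convention of the tables, where c2 is assessed as a co-collocate of c1 by comparing P(c2 | n, c1) with P(c2 | n).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    f_n: int        # occurrences of the node n
    f_n_c1: int     # occurrences of the collocation (n, c1)
    f_n_c2: int     # occurrences of the collocation (n, c2)
    f_n_c1_c2: int  # occurrences of n, c1 and c2 together


def p_inter(c: Counts) -> float:
    """P(c2 | n, c1): probability of c2 given the collocation (n, c1)."""
    return c.f_n_c1_c2 / c.f_n_c1


def p_intra(c: Counts) -> float:
    """P(c2 | n): probability of c2 given the node alone."""
    return c.f_n_c2 / c.f_n


def co_collocate_status(c: Counts) -> str:
    """Label c2 as a positive, negative or neutral co-collocate of c1."""
    inter, intra = p_inter(c), p_intra(c)
    if inter > intra:
        return "positive"
    if inter < intra:
        return "negative"
    return "neutral"


# Invented counts for illustration only.
attracting = Counts(f_n=100_000, f_n_c1=1_500, f_n_c2=400, f_n_c1_c2=60)
avoiding = Counts(f_n=100_000, f_n_c1=300, f_n_c2=400, f_n_c1_c2=0)

print(co_collocate_status(attracting), f"{p_inter(attracting):.2%}", f"{p_intra(attracting):.2%}")
print(co_collocate_status(avoiding), f"{p_inter(avoiding):.2%}", f"{p_intra(avoiding):.2%}")
```

Swapping the roles of c1 and c2 in the counts gives the comparison in the reverse direction.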
The results are shown in Tables 1-6. In each table the left-most column is a list of collocates of goods. In Tables 1-3 these collocates are modifiers (more specifically adjectives) [3], and in Tables 4-6 they are verbs. A frequency threshold and a statistical filter were applied to all the collocates. We made sure that all of them co-occur at least three times with goods (in the specified grammatical framework), and that they all are statistically significant co-occurrences of this noun. Statistical significance was defined in terms of logDice; for an explanation of the advantages of this measure the reader is referred to Rychlý (2008). Again, these data were obtained from the WordSketch function at SketchEngine.

The next two columns indicate raw frequency data. The first of them indicates the frequency of the whole 3-gram (verb, modifier, noun) in the corpus (for instance, the frequency of return defective goods). A minimum frequency threshold of 3 was also applied in this column. This was motivated by purely practical reasons that are independent of the research methodology: the list of 3-grams with a frequency lower than three would generate excessively long tables, difficult to fit into the size of this paper. The second frequency data column corresponds to the collocational pair formed by the noun (goods) and each of the collocates listed in the left-most column. Thus, in Tables 1-3 this column specifies the frequency of modifier-noun collocations (e.g. faulty + goods, defective + goods, etc.), while in Tables 4-6 the same column indicates the frequency of verb-noun collocations (e.g. return + goods, replace + goods, and so on). These data were obtained by checking the results in different SketchEngine tools (Concordance, WordSketch, Collocation, etc.).

As for the last two columns, they indicate values of conditional probabilities between collocations and between words, respectively.
The first of these columns returns the value of P(m | v,n) in Tables 1-3, and of P(v | m,n) in Tables 4-6. The first formula can be read as the probability that the modifier occurs given the occurrence of the verb+noun collocation (where the noun is always goods). Correspondingly, the second formula can be read as the probability that the verb occurs given the occurrence of the modifier+noun collocation. Finally, the right-most column returns the value of P(m | n) in Tables 1-3, and of P(v | n) in Tables 4-6. The first value reflects the probability that the modifier occurs given the occurrence of the noun; the second one specifies the probability that the verb occurs given the selection of the noun. In all the tables the order of the rows is determined by the difference between the values of these two columns. Thus, the word at the top of the list is the best candidate for positive co-collocate [4].

Collocate     f(v,m,n)   f(m,n)   P(m|v,n)   P(m|n)
faulty        ...        ...      ...%       0.36%
unwanted      ...        ...      ...%       0.15%
defective     ...        ...      ...%       0.14%
unused        ...        ...      ...%       0.02%
undamaged     ...        ...      ...%       0.01%
damaged       ...        ...      ...%       0.21%
non-faulty    ...        ...      ...%       0.01%
stolen        ...        ...      ...%       0.44%
Table 1: Adjectival co-collocates of return. [5]

Collocate     f(v,m,n)   f(m,n)   P(m|v,n)   P(m|n)
faulty        ...        ...      ...%       0.36%
defective     ...        ...      ...%       0.14%
damaged       ...        ...      ...%       0.21%
electrical    ...        ...      ...%       0.86%
Table 2: Adjectival co-collocates of replace. [6]

Collocate     f(v,m,n)   f(m,n)   P(m|v,n)   P(m|n)
faulty        ...        ...      ...%       0.36%
defective     ...        ...      ...%       0.14%
Table 3: Adjectival co-collocates of reject. [7]

Collocate     f(v,m,n)   f(v,n)   P(v|m,n)   P(v|n)
return        ...        ...      ...%       1.50%
replace       ...        ...      ...%       0.16%
receive       ...        ...      ...%       0.92%
buy           ...        ...      ...%       1.60%
reject        ...        ...      ...%       0.11%
supply        ...        ...      ...%       0.97%
collect       ...        ...      ...%       0.27%
sell          ...        ...      ...%       2.25%
Table 4: Verbal co-collocates of faulty. [8]

[3] A priori we did not decide to exclude noun modifiers from this list (e.g. consumer goods, household goods, etc.). However, for some reason, none of the modifiers that met the conditions set in the first three columns were nouns; all of them were adjectives.

[4] In Tables 5 and 6, the position of deliver at the bottom of the list might be misleading. The value of P(v | n) in this row is inflated by occurrences of the idiom deliver the goods. If we were able to exclude this idiom from the count of collocations of deliver + goods, the difference with P(v | m,n) would be greater in Table 5, and in Table 6 the value of P(v | m,n) would be higher than P(v | n). However, the occurrences of deliver the goods as an idiom cannot be separated automatically from those of deliver the goods as a collocation, and doing it manually is far too time-consuming a task to be considered a convenient method in lexicography.

[5] F(return, goods) = ...
[6] F(replace, goods) = ...
[7] F(reject, goods) = 111
Collocate     f(v,m,n)   f(v,n)   P(v|m,n)   P(v|n)
return        ...        ...      ...%       1.50%
replace       ...        ...      ...%       0.16%
reject        ...        ...      ...%       0.11%
inspect       ...        ...      ...%       0.12%
deliver       ...        ...      ...%       1.94%
Table 5: Verbal co-collocates of defective. [9]

Collocate     f(v,m,n)   f(v,n)   P(v|m,n)   P(v|n)
receive       ...        ...      ...%       0.82%
replace       ...        ...      ...%       0.16%
return        ...        ...      ...%       1.50%
inspect       ...        ...      ...%       0.12%
deliver       ...        ...      ...%       1.94%
Table 6: Verbal co-collocates of damaged. [10]

[8] F(faulty, goods) = ...
[9] F(defective, goods) = ...
[10] F(damaged, goods) = 209

The frequency of the noun remains constant in all the tables. The frequency of the noun goods in the corpus is 99393 (substantivisations of the adjective good were excluded from this count). Besides, the frequency of verb-noun collocations remains constant within each of the first three tables. Likewise, the frequency of modifier-noun collocations remains constant within each of the last three tables (4-6). Therefore, these figures are indicated in a footnote added to the caption.

3.2 Analysis and discussion

The results displayed in Tables 1-6 lend strength to the initial hunch that the lexical constellations of goods exhibit a high degree of semantic systematicity. The strongest positive co-collocates tend to be grouped together around a common core of meaning. In Tables 1-3 the dominant group of adjectives is formed by words depicting a flaw: faulty, defective, damaged. Observe that faulty and defective occur in the three tables, and that in all of them faulty lies at the top. The fact that unwanted is a stronger co-collocate than defective in Table 1 does not run counter to the general pattern, because the meaning of unwanted is conceptually related to faulty, defective and damaged (as a rule, goods that are in a bad condition are not desired by the consumer).

Particularly significant are the values of conditional probabilities in Table 2. Observe that the capacity of the collocation replace goods for predicting the choice of faulty is more than 50 times higher than the capacity of the noun goods for predicting the selection of faulty. This constellation is thus a very good example of the kind of dependency relation depicted in Figure 3 (see Section 2). If we insert these lexical items in Figure 3 we obtain the picture below:

Figure 4: Positive inter-collocability

The results displayed in Tables 4-6 are equally coherent from the point of view of meaning. The dominant group is formed by verbs implying a decision of non-acceptance of the goods received: return, replace, reject. The verbs return and replace appear in the three tables, and in two of them, return is the strongest co-collocate.

Overall, the semantic regularities observed in these lexical constellations suggest that verb-noun collocations expressing non-acceptance of goods are likely to converge with adjective-noun collocations describing goods as having a flaw. This speaks strongly for the conception of lexical constellations as surface lexical realisations of underlying conceptual (cognitive) structures. In the same line of reasoning, it would be interesting to determine the extent to which lexical constellations are language-independent. Obviously, this objective cannot be pursued in the present article, because it requires more empirical research in English and in other languages.

Another interesting remark concerns the consistency of the findings obtained in the two groups of tables (1-3 and 4-6). The output of Tables 1-3 overlaps with the input of Tables 4-6, and vice versa. The dominant adjectives in Tables 1-3 coincide roughly with the elements analysed in Tables 4-6, and conversely, the dominant verbs in Tables 4-6 contain the elements analysed in Tables 1-3. This confirms the claim made in Section 2.2 that co-collocation can be mutual. Defective is a co-collocate of return, and conversely, return is a co-collocate of defective (see Tables 1 and 4).
The same holds true for other pairs: (defective, replace), (defective, reject), (faulty, return), (faulty, replace), (faulty, reject), (damaged, return), (damaged, replace). This reinforces the idea that the two perspectives on valency stratification (the one provided by verbal co-collocates and the one provided by modifiers) are complementary and lead to relatively similar results.

Finally, it should be noted that the prevalence of positive co-collocates over negative ones in Tables 1-6 results
mainly from the decision to set a minimum frequency threshold for the 3-gram. If the analysis had been focused on verbs or adjectives occurring in low-frequency 3-grams, we would have obtained several prominent patterns of negative inter-collocability. Interestingly, these patterns can also be characterised by a high degree of semantic regularity. A case in point is the relationship between verbs such as ship and transport and the adjectives analysed in Tables 4-6. There is evidence that ship and transport, which are quasi-synonyms, are negative verbal co-collocates of faulty, defective and damaged. The probability of these verbs given the occurrence of goods is 0.31 percent in the case of ship (309/99393), and 0.43 percent in the case of transport (426/99393). These figures, however low, are considerably greater than the probability of these verbs occurring in the context of modifier-noun collocations such as faulty goods, defective goods, or damaged goods. In almost all these cases the probability is zero. In the whole ukWaC corpus, which, it should be emphasised, contains more than one billion words, there is not a single instance of 3-grams such as ship defective goods, transport faulty goods, transport damaged goods, etc. The sequence ship faulty goods yields one hit, but obviously the value of P(ship | faulty, goods) is lower than P(ship | goods). Clearly, the collocations ship/transport goods tend to avoid the selection of modifiers describing a flaw or imperfection. This can be interpreted as an indication that semantic systematicity is a characteristic both of positive and of negative inter-collocability.

4. Lexical constellations in lexicography

From the previous sections we can draw the overall conclusion that the choice of a collocation influences the range of choice of other collocations in the same context. The choice of a collocation can contribute to activating or blocking other collocations of the same node.
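The blocking effect can be checked against the figures reported at the end of Section 3.2 for ship and transport. The sketch below is a worked calculation only: the variable names are hypothetical, and it assumes that the value 209 surviving in the footnotes to Tables 4-6 is F(damaged, goods), which agrees with the reported P(m | n) of 0.21%. The frequency of faulty goods is not used directly; it is only approximated from the rounded 0.36% shown in the tables.

```python
F_GOODS = 99393                 # frequency of the noun goods
F_SHIP_GOODS = 309              # ship + goods
F_TRANSPORT_GOODS = 426         # transport + goods
F_DAMAGED_GOODS = 209           # assumption: footnote value for damaged + goods
F_TRANSPORT_DAMAGED_GOODS = 0   # no hit reported for "transport damaged goods"

# Intra-collocational probabilities P(v | n).
p_ship = F_SHIP_GOODS / F_GOODS
p_transport = F_TRANSPORT_GOODS / F_GOODS

# Inter-collocational probability P(transport | damaged, goods).
p_transport_given_damaged = F_TRANSPORT_DAMAGED_GOODS / F_DAMAGED_GOODS

# "ship faulty goods" has one hit, so P(ship | faulty, goods) = 1 / F(faulty, goods).
# This is below P(ship | goods) whenever F(faulty, goods) exceeds F_GOODS / F_SHIP_GOODS.
min_f_faulty_goods = F_GOODS / F_SHIP_GOODS
# Rough estimate from the rounded P(faulty | goods) = 0.36%.
approx_f_faulty_goods = 0.0036 * F_GOODS

print(f"P(ship | goods)               = {p_ship:.2%}")
print(f"P(transport | goods)          = {p_transport:.2%}")
print(f"P(transport | damaged, goods) = {p_transport_given_damaged:.2%}")
print(f"threshold for F(faulty, goods): {min_f_faulty_goods:.1f}")
print(f"approximate F(faulty, goods):   {approx_f_faulty_goods:.1f}")
```

Since the approximate frequency of faulty goods lies above the threshold, a single hit of ship faulty goods is compatible with the negative pattern described in the text.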
Once this fact has been established, the question that needs to be addressed is: should lexical constellations be recorded in combinatorial dictionaries, and if so, what are the appropriate lexicographic techniques for dealing with them? The first part of the question is answered in 4.1. The answer to the second part of the question is given in 4.2. In Section 4.3 we explain the guidelines for our lexicographic project and present some examples.

4.1 The relevance of constellational information

Lexical constellations provide a potentially useful type of information in a collocation dictionary. One of the main functions of this kind of dictionary is to assist the user (typically a foreign or second language speaker) in achieving native-like, fluent composition. Lexical constellations are precisely one of the principal resources of fluency and cohesion in a text, because they make the word fit within a context broader than the simple collocational bi-gram. Compared to the simple collocation, a lexical constellation provides, so to speak, an extended pattern of lexical cohesion.

Apart from this general consideration, there are two more specific arguments for introducing lexical constellations into collocation dictionaries. The first of these arguments concerns the strength of constellational patterns. In some respects, these patterns are stronger than most of the simple collocational bi-grams recorded in a conventional combinatorial dictionary. Observe, for example, that the dependency of the collocation defective goods on return, measured in terms of conditional probability, is ten times higher than the dependency of goods on return (see Table 5). In this light it is difficult to justify why the weaker pattern (the bi-gram) should be included in a dictionary while the stronger pattern (the constellation) is omitted.

A further argument for the incorporation of collocational data refers to the connection of form and meaning.
The syntagmatic behaviour of words is closely associated with their semantic properties. Therefore, collocation is more than a surface co-occurrence pattern; it also provides a representation of word meaning (Renouf, 1996). Knowing the collocations of words is a contributing factor to the development of lexical semantic competence. This idea, which was formulated by Firth in his well-known definition of meaning by collocation, has inspired much of the work conducted in corpus-driven lexicology, both theoretical and applied. Lexical constellations can help to provide a much more detailed and refined account of the connections between context and meaning. Notice, for example, that some semantic aspects of adjectives such as faulty, defective, or damaged are better represented by their verbal co-collocates (reject, return, replace) than by the noun (goods). The discovery of a flaw is causally connected with the decision of non-acceptance, and this decision is implied by the meaning of verbs such as reject, return, or replace, but not by the meaning of goods.

Considering these arguments, we can conclude that lexical constellations can improve the utility of collocation dictionaries. Having answered this question, the next problem to be resolved concerns the know-how. Clearly, the incorporation of lexical constellations requires the development of innovative practices, because current collocational dictionaries do not provide this kind of information. This gives rise to the question: what exactly are the changes that have to be introduced in order to accommodate lexical constellations into collocation dictionaries? The issue is addressed below.

4.2 The treatment of constellational information

By definition, a lexical constellation always involves some form of interaction between different collocations of a node. However, in a standard collocation dictionary the different words in an entry are directly related to the headword and not to one another.
Therefore, the main obstacle that has to be overcome in order to integrate constellations within collocation dictionaries is the lack of explicit connection between different collocates of a headword. For example, in Macmillan Collocations Dictionary (MCD), to quote the most recent major dictionary of English collocations, faulty and return are presented as different categories of collocates of the noun goods. Faulty is one of the three adjectives in this entry, along with defective and damaged, which are labelled as expressing the meaning "not working properly". Return is one of the four verbal collocates in the same entry which are ascribed the meaning "send goods" (the others are deliver, transport and ship). From this information we can gather that return faulty goods and transport faulty goods are possible lexical combinations expressing the meaning "send goods which do not work properly". What we are not told, however, is that the selection of the modifier is adjusted to the choice of a verbal collocate, and that the selection of return goods makes the selection of the adjective faulty highly probable, while the collocations transport goods and faulty goods tend to avoid each other. That is to say, the MCD does not inform us that some pairs of verbal and adjectival collocates of goods are more likely to converge in the same complex expression than others.

The incorporation of lexical constellations requires us to take a step from an intra-collocational to an inter-collocational perspective. Thus far, the analysis of syntagmatic dependencies in collocation dictionaries, both printed and electronic, has been focused on relationships between the parts of a collocation. In this sense, we can say that combinatorial lexicography has done justice to Sinclair's (1991) remark that the choice of a word affects the choice of other words in its vicinity. What combinatorial lexicography has so far failed to reflect is the fact that the choice of a collocation can also affect the choice of other collocations in its vicinity.

These facts are not reflected in the MCD or in any other major collocation dictionary, because the design does not contemplate any form of interaction between different collocations in an entry. The same remark applies to other important combinatorial dictionaries of English, notably the BBI and the OCD, or of other languages such as Spanish (e.g. REDES). In a conventional collocation dictionary the user is not provided with information concerning how different elements and sections in an entry can or tend to be combined in the discourse. There is, of course, information about the relationship between the headword and each of the collocates. However, this is not complemented by any specification of whether particular collocations or groups of collocations of the lemma tend to attract or repel each other. This criticism can also be made of electronic collocation dictionaries, such as the Diccionario de Colocaciones del Español (DiCE, an online dictionary of Spanish collocations), as well as of electronic versions of printed dictionaries (e.g. the OCD on CD-ROM). In none of these resources is the user provided with specifications of how the selection of a collocate influences the range of choice of further collocates of the same headword. Observe, for example, Figure 5, where we reproduce an entry from the OCD on CD-ROM. Here, we find some of the adjectival and verbal collocates of goods mentioned above (faulty, defective, deliver, transport, etc.), but again, no specification is given of their inter-collocability.

Figure 5: An entry from the OCD on CD-ROM
Proceedings of eLex 2011

Logically, the possibilities of accommodating lexical constellations are not equal for printed dictionaries and electronic dictionaries. The printed format imposes a number of material conditions which render the incorporation of lexical constellations virtually impracticable. Supplying this kind of information in a printed dictionary would imply an excessive increase in size, probably beyond what is commercially viable. However, these practical difficulties can be resolved in an electronic dictionary. The user interface allows an interactive management of the information contained in lexical entries. With a simple click, the user can choose to expand the information on the collocations associated with a particular item, and precisely, one of the choices that can be made available in this menu is the generation of a list of collocates that are attracted to specific collocations of the lemma. For these reasons, we think that, in the present state of the art, the project of developing a collocation dictionary that includes lexical constellations is conceivable only in electronic format.

Figure 6: Extract from a DCD entry (first stage)

4.3 The Dynamic Combinatorial Dictionary

The treatment of lexical constellations in our lexicographic project, the DCD, follows four main guidelines: dynamicity, progressiveness, compactness and systematicity. Firstly, the micro-structural design is dynamic, because the information presented in a lexical entry is readjusted to the selections made by the dictionary user. This is why the project has been called a Dynamic Combinatorial Dictionary. This means, for example, that by clicking on the collocate faulty under the entry for goods the positive verbal co-collocates (e.g. return, replace, reject) are foregrounded, and the negative ones are omitted.
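The readjustment of an entry to the user's selection can be pictured with a small data structure. What follows is only an illustrative sketch, not the DCD implementation: the class name Entry, the field positive_links and the method names are hypothetical, and the collocates are limited to those mentioned in the text. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical model of a dictionary entry with inter-collocability links."""

    headword: str
    # Collocates grouped by grammatical category (the plain collocational view).
    collocates: dict[str, list[str]]
    # ASSUMPTION: each collocate maps to the set of its positive co-collocates.
    positive_links: dict[str, set[str]] = field(default_factory=dict)

    def default_view(self) -> dict[str, list[str]]:
        """Return every collocate, as a conventional collocation dictionary would."""
        return {category: list(words) for category, words in self.collocates.items()}

    def select(self, collocate: str) -> dict[str, list[str]]:
        """Return only the positive co-collocates of the selected collocate."""
        attracted = self.positive_links.get(collocate, set())
        view: dict[str, list[str]] = {}
        for category, words in self.collocates.items():
            if collocate in words:
                continue  # the selected collocate's own category is not repeated
            kept = [word for word in words if word in attracted]
            if kept:
                view[category] = kept
        return view


goods = Entry(
    headword="goods",
    collocates={
        "Adjective": ["faulty", "defective", "damaged"],
        "Verb": ["return", "replace", "reject", "ship", "transport"],
    },
    positive_links={"faulty": {"return", "replace", "reject"}},
)

print(goods.select("faulty"))  # {'Verb': ['return', 'replace', 'reject']}
```

In this sketch the negative co-collocates (ship, transport) are not stored explicitly; they simply fail the positive-link filter and so drop out of the view once faulty is selected.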
Figure 7: Extracts from a DCD entry (second stage)

Secondly, the step from simple collocational bi-grams to lexical constellations is made in a progressive manner. As a default option, the entry offers only plain collocational information. The user is not provided with information on lexical constellations before s/he clicks on a specific collocate in search for more detailed information, and when this happens, the entry zooms in to show only the most relevant contextual data. That is, in the transition from purely collocational information to constellational information the dictionary leaves out all those elements which are not positive co-collocates of the items selected by the user. The principle behind this criterion is one of user-friendliness. It is not advisable to increase at the same time the level of detail and the amount of information. An increase in the depth of information should be compensated by a decrease in the width of information.

Figure 8: Extracts from a DCD entry (third stage)
The level of detail or granularity in DCD entries unfolds gradually through three different steps. In a first stage, the screen displays collocational pairs, similarly to conventional collocation dictionaries (Figure 6). In a second stage, the screen displays a semantic description of lexical constellations related to the collocate on which the user has clicked (see Figure 7). Finally, in a third stage, the user is provided with a series of examples representing different lexical realisations of the constellation (see Figure 8). This list is accessed by clicking on the semantic description of the constellation. Where relevant, the list includes references to other headwords sharing in the same lexical constellation pattern (e.g. cargo, load, substance, etc., in the lower part of Figure 8). In these cases, the words are underlined so that the user can follow the link to the corresponding noun entry.

Concerning the third guideline, i.e. compactness, information about lexical constellations is presented in a format as succinct as possible. One implication is that labels such as lexical constellation, inter-collocability or positive co-collocate are not explicitly mentioned anywhere in the entry. This marks a difference with some collocation dictionaries, especially in the Meaning-Text Theory (MTT) framework (notably the DiCE), which make extensive use of specialised terms that are not known to the wider audience and the lay speaker. These terms include MTT jargon such as gloss and lexical function labels such as Magn, AntiBon, etc. In the DCD project we try to make the dictionary accessible by keeping metalinguistic data to a minimum. Metalinguistic information is reduced to basic grammatical categories (Verb, Noun, Adjective, etc.) and to semantic labels. For similar reasons, probability and statistical data are not shown to the user.
The structure of constellations is signalled only by means of symbols such as arrows, and by highlighting words in authentic examples (see Figures 7 and 8).

Finally, the fourth guiding principle is the maximisation of systematicity. This apparently trivial statement contains important implications for the design of dictionary entries. It entails, among other things, the attempt at subsuming as much lexical information as possible under general combination rules. This implies first and foremost that semantic labels will be used to show the interconnectedness of several collocational patterns. This practice, i.e. the grouping of different collocations under meaning categories, has been adopted to a greater or lesser extent by previous collocation dictionaries such as MCD, REDES and the DiCE, but not by others such as the OCD or the BBI. The specific challenge faced now by the DCD is to extend this strategy to apply to the description of semantic regularities underlying lexical constellations. This problem is resolved by inserting semantic paraphrases of constellations at an intermediate stage between collocational information and real examples of constellations (see Figures 7 and 8). The rationale behind this emphasis on the connection of combinatorial and semantic properties of words is our effort to bridge the distance between the collocation dictionary and the general-purpose dictionary. In line with neo-Firthian thinking, it is our conviction that a well-organised, detailed description of the syntagmatic behaviour of a word has a definitional value. Collocation provides a representation of word meaning, as Firth suggested.

5. Conclusion

In this article we have argued that the mainstream approaches to collocation have missed an important aspect of collocational patterning, namely, the operation of dependency relations between different collocations.
Crucially, this level of analysis should not be confused with observation of dependency relations between the parts of a collocation. Collocability must be analysed at a different level than inter-collocability. It has also been argued that the LCM provides an adequate analytical framework for inter-collocability. After applying the methodology of constellational analysis to collocational patterns of the noun goods, we have confirmed that different collocations influence in different ways the selection of other collocations of the same noun. Finally, we have explained that dealing with lexical constellations in a dictionary is only possible in an electronic format and requires us to introduce a number of substantial changes with respect to the conventional micro-structural design of collocation dictionaries (including electronic ones). Some of these changes have been illustrated with reference to sample parts from the DCD.

6. Acknowledgements

The project presented in this paper is generously funded by a grant from Fundación Séneca, Agencia de Ciencia y Tecnología de la Región de Murcia (Ref / PHCS/08). We are most grateful for this financial support.

7. References

Almela, M. (2011). Improving corpus-driven methods of semantic analysis: a case study of the collocational profile of incidence. English Studies, 92(1), pp

Almela, M., Cantos, P. & Sánchez, A. (2011). From collocation to meaning: revising corpus-based techniques of lexical semantic analysis. In I. Balteiro (ed.) New Approaches to Specialized English Lexicology and Lexicography. Newcastle u. T.: Cambridge Scholars Press, pp

The BBI Dictionary of English Word Combinations (1997). Compiled by M. Benson, E. Benson & R. Ilson. Amsterdam: John Benjamins.

Bosque, I. (2001). Sobre el concepto de colocación y sus límites. Lingüística Española Actual, 23(1), pp

Bosque, I. (2004). La direccionalidad en los diccionarios combinatorios y el problema de la selección léxica. In T. Cabré (ed.) Lingüística teórica: anàlisi i perspectives. Bellaterra: Universitat Autònoma de Barcelona, pp

Cantos, P. & Sánchez, A. (2001). Lexical constellations: what collocates fail to tell. International Journal of Corpus Linguistics, 6(2), pp

DiCE: Diccionario de colocaciones del español. Accessed at:

Hanks, P. & Pustejovsky, J. (2005). A Pattern Dictionary for Natural Language Processing. Revue Française de Linguistique Appliquée, 10, pp

Herbst, T., Heath, D., Roe, I.F. & Götz, D. (2004). A Valency Dictionary of English. A Corpus-Based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives. Berlin: Mouton de Gruyter.

Mason, O. (2000). Parameters of collocation: the word in the centre of gravity. In J.M. Kirk (ed.) Corpora Galore. Analyses and techniques in describing English. Amsterdam: Rodopi, pp

Macmillan Collocations Dictionary for Learners of English (2010). Compiled by M. Rundell. Oxford: Macmillan.

Oxford Collocations Dictionary for Students of English (2009). Compiled by C. McIntosh. Oxford: Oxford University Press.

A Pattern Dictionary of English Verbs. Accessed at:

REDES: Diccionario combinatorio del español contemporáneo (2004). Compiled by I. Bosque. Madrid: SM.

Renouf, A. (1996). Les nyms: en quête du thésaurus des textes. Lingvisticae Investigationes, 20(1), pp

Rychlý, P. (2008). A lexicographer-friendly association score. In P. Sojka & A. Horák (eds.) Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN Brno: Masaryk University, pp

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
DiCE in the web: An online Spanish collocation dictionary
GRANGER, S.; PAQUOT, M. (EDS.). 2010. ELEXICOGRAPHY IN THE 21ST CENTURY: NEW CHALLENGES, NEW APPLICATIONS. PROCEEDINGS OF ELEX2009, LOUVAIN-LA-NEUVE, 22-24 OCTOBER 2009. CAHIERS DU CENTAL 7. LOUVAIN-LA-NEUVE,
Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español
Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español Orsolya Vincze, Margarita Alonso Ramos Universidade da Coruña, Campus da Zapateira s/n, A Coruña 15071, Spain
Methodological Issues for Interdisciplinary Research
J. T. M. Miller, Department of Philosophy, University of Durham 1 Methodological Issues for Interdisciplinary Research Much of the apparent difficulty of interdisciplinary research stems from the nature
Teaching terms: a corpus-based approach to terminology in ESP classes
Teaching terms: a corpus-based approach to terminology in ESP classes Maria João Cotter Lisbon School of Accountancy and Administration (ISCAL) (Portugal) Abstract This paper will build up on corpus linguistic
Information Technology Security Evaluation Criteria. ITSEC Joint Interpretation Library (ITSEC JIL)
S Information Technology Security Evaluation Criteria ITSEC Joint Interpretation Library (ITSEC JIL) Version 2.0 November 1998 This document is paginated from i to vi and from 1 to 65 ITSEC Joint Interpretation
The Oxford Learner s Dictionary of Academic English
ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students
Simple maths for keywords
Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all
EFL Learners Synonymous Errors: A Case Study of Glad and Happy
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 1, No. 1, pp. 1-7, January 2010 Manufactured in Finland. doi:10.4304/jltr.1.1.1-7 EFL Learners Synonymous Errors: A Case Study of Glad and
For example, estimate the population of the United States as 3 times 10⁸ and the
CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number
2. Basic Relational Data Model
2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that
Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse
Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse Rob Sanderson, Matthew Brook O Donnell and Clare Llewellyn What happens with
Collated Food Requirements. Received orders. Resolved orders. 4 Check for discrepancies * Unmatched orders
Introduction to Data Flow Diagrams What are Data Flow Diagrams? Data Flow Diagrams (DFDs) model that perspective of the system that is most readily understood by users the flow of information around the
THE USE OF SPECIALISED CORPORA: IMPLICATIONS FOR RESEARCH AND PEDAGOGY
THE USE OF SPECIALISED CORPORA: IMPLICATIONS FOR RESEARCH AND PEDAGOGY Luz Gil Salom Universidad Politécnica de Valencia 1. Introduction Corpus-based genre analysis usually employ specialised corpora with
MEANINGS CONSTRUCTION ABOUT SAMPLING DISTRIBUTIONS IN A DYNAMIC STATISTICS ENVIRONMENT
MEANINGS CONSTRUCTION ABOUT SAMPLING DISTRIBUTIONS IN A DYNAMIC STATISTICS ENVIRONMENT Ernesto Sánchez CINVESTAV-IPN, México Santiago Inzunza Autonomous University of Sinaloa, México [email protected]
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Broad and Integrative Knowledge. Applied and Collaborative Learning. Civic and Global Learning
1 2 3 4 5 Specialized Knowledge Broad and Integrative Knowledge Intellectual Skills Applied and Collaborative Learning Civic and Global Learning The Degree Qualifications Profile (DQP) provides a baseline
Syllabus: a list of items to be covered in a course / a set of headings. Language syllabus: language elements and linguistic or behavioral skills
Lexical Content and Organisation of a Language Course Syllabus: a list of items to be covered in a course / a set of headings Language syllabus: language elements and linguistic or behavioral skills Who
2010 School-assessed Task Report. Media
2010 School-assessed Task Report Media GENERAL COMMENTS Task summary This task involves three outcomes, two in Unit 3 and the one in Unit 4. In Unit 3, students undertake Outcome 2 Media Production Skills,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Analyzing Research Articles: A Guide for Readers and Writers 1. Sam Mathews, Ph.D. Department of Psychology The University of West Florida
Analyzing Research Articles: A Guide for Readers and Writers 1 Sam Mathews, Ph.D. Department of Psychology The University of West Florida The critical reader of a research report expects the writer to
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Lexico-Semantic Relations Errors in Senior Secondary School Students Writing ROTIMI TAIWO Obafemi Awolowo University, Ile-Ife, Nigeria
Nordic Journal of African Studies 10(3): 366-373 (2001) Lexico-Semantic Relations Errors in Senior Secondary School Students Writing ROTIMI TAIWO Obafemi Awolowo University, Ile-Ife, Nigeria ABSTRACT The
MATRIX OF STANDARDS AND COMPETENCIES FOR ENGLISH IN GRADES 7 10
PROCESSES CONVENTIONS MATRIX OF STANDARDS AND COMPETENCIES FOR ENGLISH IN GRADES 7 10 Determine how stress, Listen for important Determine intonation, phrasing, points signaled by appropriateness of pacing,
Modeling Guidelines Manual
Modeling Guidelines Manual [Insert company name here] July 2014 Author: John Doe [email protected] Page 1 of 22 Table of Contents 1. Introduction... 3 2. Business Process Management (BPM)... 4 2.1.
QUALITATIVE RESEARCH. [Adapted from a presentation by Jan Anderson, University of Teesside, UK]
QUALITATIVE RESEARCH [Adapted from a presentation by Jan Anderson, University of Teesside, UK] QUALITATIVE RESEARCH There have been many debates around what actually constitutes qualitative research whether
Competencies of BSc and MSc programmes in Electrical engineering and student portfolios
C:\Ton\DELTA00Mouthaan.doc 0 oktober 00 Competencies of BSc and MSc programmes in Electrical engineering and student portfolios Ton J.Mouthaan, R.W. Brink, H.Vos University of Twente, fac. of EE, The Netherlands
SUBGROUPS OF CYCLIC GROUPS. 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by
SUBGROUPS OF CYCLIC GROUPS KEITH CONRAD 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by g = {g k : k Z}. If G = g, then G itself is cyclic, with g as a generator. Examples
Some Reflections on the Making of the Progressive English Collocations Dictionary
43 Some Reflections on the Making of the Progressive English Collocations Dictionary TSUKAMOTO Michihisa Faculty of International Communication, Aichi University E-mail: [email protected] 1939
COLLOCATION TOOLS FOR L2 WRITERS 1
COLLOCATION TOOLS FOR L2 WRITERS 1 An Evaluation of Collocation Tools for Second Language Writers Ulugbek Nurmukhamedov Northern Arizona University COLLOCATION TOOLS FOR L2 WRITERS 2 Abstract Second language
Data Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
TEACHERS VIEWS AND USE OF EXPLANATION IN TEACHING MATHEMATICS Jarmila Novotná
TEACHERS VIEWS AND USE OF EXPLANATION IN TEACHING MATHEMATICS Jarmila Novotná Abstract This study analyses teachers of mathematics views on explications in teaching mathematics. Various types of explanations
Quality Control in Spreadsheets: A Software Engineering-Based Approach to Spreadsheet Development
Quality Control in Spreadsheets: A Software Engineering-Based Approach to Spreadsheet Development Kamalasen Rajalingham, David Chadwick, Brian Knight, Dilwyn Edwards Information Integrity Research Centre
CFSD 21 ST CENTURY SKILL RUBRIC CRITICAL & CREATIVE THINKING
Critical and creative thinking (higher order thinking) refer to a set of cognitive skills or strategies that increases the probability of a desired outcome. In an information- rich society, the quality
SUPPORTING LOGISTICS DECISIONS BY USING COST AND PERFORMANCE MANAGEMENT TOOLS. Zoltán BOKOR. Abstract. 1. Introduction
SUPPORTING LOGISTICS DECISIONS BY USING COST AND PERFORMANCE MANAGEMENT TOOLS Zoltán BOKOR Department of Transport Economics Faculty of Transportation Engineering Budapest University of Technology and
Project Management in Marketing Senior Examiner Assessment Report March 2013
Professional Diploma in Marketing Project Management in Marketing Senior Examiner Assessment Report March 2013 The Chartered Institute of Marketing 2013 Contents This report contains the following information:
Trailblazing Metadata: a diachronic and spatial research platform for object-oriented analysis and visualisations
Trailblazing Metadata: a diachronic and spatial research platform for object-oriented analysis and visualisations Pim van Bree ([email protected]) researcher and software engineer at LAB1100, Geert Kessels
(Refer Slide Time 00:56)
Software Engineering Prof.N. L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-12 Data Modelling- ER diagrams, Mapping to relational model (Part -II) We will continue
Corpus-Based Text Analysis from a Qualitative Perspective: A Closer Look at NVivo
David Durian Northern Illinois University Corpus-Based Text Analysis from a Qualitative Perspective: A Closer Look at NVivo This review presents information on a powerful yet easy-to-use entry-level qualitative
Analyzing survey text: a brief overview
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
STAGE 1 COMPETENCY STANDARD FOR PROFESSIONAL ENGINEER
STAGE 1 STANDARD FOR PROFESSIONAL ENGINEER ROLE DESCRIPTION - THE MATURE, PROFESSIONAL ENGINEER The following characterises the senior practice role that the mature, Professional Engineer may be expected
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students
69 A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students Sarathorn Munpru, Srinakharinwirot University, Thailand Pornpol Wuttikrikunlaya, Srinakharinwirot University,
ESP in European Higher Education. Integrating Language and Content
ESP in European Higher Education. Integrating Language and Content Inmaculada Fortanet-Gómez & Christine A. Räisänen (eds). Amsterdam: John Benjamins, 2008. 285 pages. ISBN 978-90-272-0520-9. This book
User studies, user behaviour and user involvement evidence and experience from The Danish Dictionary
User studies, user behaviour and user involvement evidence and experience from The Danish Dictionary Henrik Lorentzen, Lars Trap-Jensen Society for Danish Language and Literature, Copenhagen, Denmark E-mail:
TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS
TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS 1. Bandwidth: The bandwidth of a communication link, or in general any system, was loosely defined as the width of
A System for Labeling Self-Repairs in Speech 1
A System for Labeling Self-Repairs in Speech 1 John Bear, John Dowding, Elizabeth Shriberg, Patti Price 1. Introduction This document outlines a system for labeling self-repairs in spontaneous speech.
Part 1 Foundations of object orientation
OFWJ_C01.QXD 2/3/06 2:14 pm Page 1 Part 1 Foundations of object orientation OFWJ_C01.QXD 2/3/06 2:14 pm Page 2 1 OFWJ_C01.QXD 2/3/06 2:14 pm Page 3 CHAPTER 1 Objects and classes Main concepts discussed
Probability Using Dice
Using Dice One Page Overview By Robert B. Brown, The Ohio State University Topics: Levels:, Statistics Grades 5 8 Problem: What are the probabilities of rolling various sums with two dice? How can you
The compositional semantics of same
The compositional semantics of same Mike Solomon Amherst College Abstract Barker (2007) proposes the first strictly compositional semantic analysis of internal same. I show that Barker s analysis fails
COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES
COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES JULIA IGOREVNA LARIONOVA 1 ANNA NIKOLAEVNA TIKHOMIROVA 2 1, 2 The National Nuclear Research
Linear Codes. Chapter 3. 3.1 Basics
Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length
WRITING A CRITICAL ARTICLE REVIEW
WRITING A CRITICAL ARTICLE REVIEW A critical article review briefly describes the content of an article and, more importantly, provides an in-depth analysis and evaluation of its ideas and purpose. The
Measurement in ediscovery
Measurement in ediscovery A Technical White Paper Herbert Roitblat, Ph.D. CTO, Chief Scientist Measurement in ediscovery From an information-science perspective, ediscovery is about separating the responsive
English Descriptive Grammar
English Descriptive Grammar 2015/2016 Code: 103410 ECTS Credits: 6 Degree Type Year Semester 2500245 English Studies FB 1 1 2501902 English and Catalan FB 1 1 2501907 English and Classics FB 1 1 2501910
Problem Solving Basics and Computer Programming
Problem Solving Basics and Computer Programming A programming language independent companion to Roberge/Bauer/Smith, "Engaged Learning for Programming in C++: A Laboratory Course", Jones and Bartlett Publishers,
IB Business & Management. Internal Assessment. HL Guide Book
IB Business & Management Internal Assessment HL Guide Book And Summer Assignment 2012-2013 1 Summer 2012 Summer Reading Assignment-You must read and complete the following assignments: 1. Select one of
KNOWLEDGE ORGANIZATION
KNOWLEDGE ORGANIZATION Gabi Reinmann Germany [email protected] Synonyms Information organization, information classification, knowledge representation, knowledge structuring Definition The term
Problem of the Month: Digging Dinosaurs
: The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of
King s College London - FILM STUDIES 6AAQS400 INDEPENDENT STUDY GUIDELINES 2013-14 for final year students
King s College London - FILM STUDIES 6AAQS400 INDEPENDENT STUDY GUIDELINES 2013-14 for final year students Convenors: Mark Betz (through summer 2013, then from 1 January 2014) Belén Vidal (1 September
Study Plan for Master of Arts in Applied Linguistics
Study Plan for Master of Arts in Applied Linguistics Master of Arts in Applied Linguistics is awarded by the Faculty of Graduate Studies at Jordan University of Science and Technology (JUST) upon the fulfillment
A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts
[Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational
Register Differences between Prefabs in Native and EFL English
Register Differences between Prefabs in Native and EFL English MARIA WIKTORSSON 1 Introduction In the later stages of EFL (English as a Foreign Language) learning, and foreign language learning in general,
1. Learner language studies
1. Learner language studies The establishment of learner language research as a particular area of linguistic investigation can be traced to the late 1940s and early 1950s, when CA and EA started to compare
[Refer Slide Time: 05:10]
Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture
Objectives After completion of study of this unit you should be able to:
Data Flow Diagram Tutorial Objectives After completion of study of this unit you should be able to: Describe the use of data flow diagrams Produce a data flow diagram from a given case study including
CHAPTER 7 Test Calibration
CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the
CHANCE ENCOUNTERS. Making Sense of Hypothesis Tests. Howard Fincher. Learning Development Tutor. Upgrade Study Advice Service
CHANCE ENCOUNTERS Making Sense of Hypothesis Tests Howard Fincher Learning Development Tutor Upgrade Study Advice Service Oxford Brookes University Howard Fincher 2008 PREFACE This guide has a restricted
Guide for the Development of Results-based Management and Accountability Frameworks
Guide for the Development of Results-based Management and Accountability Frameworks August, 2001 Treasury Board Secretariat TABLE OF CONTENTS Section 1. Introduction to the Results-based Management and
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
Noam Chomsky: Aspects of the Theory of Syntax notes
Noam Chomsky: Aspects of the Theory of Syntax notes Julia Krysztofiak May 16, 2006 1 Methodological preliminaries 1.1 Generative grammars as theories of linguistic competence The study is concerned with
TEACHING INTERCULTURAL COMMUNICATIVE COMPETENCE IN BUSINESS CLASSES
TEACHING INTERCULTURAL COMMUNICATIVE COMPETENCE IN BUSINESS CLASSES Roxana CIOLĂNEANU Abstract Teaching a foreign language goes beyond teaching the language itself. Language is rooted in culture; it
Chapter 4 DECISION ANALYSIS
Chapter 4 DECISION ANALYSIS CONTENTS 4.1 PROBLEM FORMULATION Influence Diagrams Payoff Tables Decision Trees 4.2 DECISION MAKING WITHOUT PROBABILITIES Optimistic Approach
Orthogonal Projections
Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X_1, ..., X_k be a family of linearly independent (column) vectors
LAB 3: Introduction to Domain Modeling and Class Diagram
LAB 3: Introduction to Domain Modeling and Class Diagram OBJECTIVES Use the UML notation to represent classes and their properties. Perform domain analysis to develop domain class models. Model the structural
CREATING LEARNING OUTCOMES
CREATING LEARNING OUTCOMES What Are Student Learning Outcomes? Learning outcomes are statements of the knowledge, skills and abilities individual students should possess and can demonstrate upon completion
psychology and its role in comprehension of the text has been explored and employed
The role of background knowledge in language comprehension has been formalized as schema theory. Any text, either spoken or written, does not by itself carry meaning. Rather, according to schema theory,
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents a standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Corpus and Discourse. The Web As Corpus. Theory and Practice MARISTELLA GATTO LONDON NEW DELHI NEW YORK SYDNEY
Corpus and Discourse The Web As Corpus Theory and Practice MARISTELLA GATTO BLOOMSBURY LONDON NEW DELHI NEW YORK SYDNEY Contents List of Figures xiii List of Tables xvii Preface xix Acknowledgements
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text
THE UNIVERSITY OF BIRMINGHAM. English Language & Applied Linguistics SECOND TERM ESSAY
THE UNIVERSITY OF BIRMINGHAM English Language & Applied Linguistics SECOND TERM ESSAY Student Number: 1277536 MA - TEFL/TESL 2012/2013 Title of option(s) for which work is being submitted: Business English
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
Chapter 6. Data-Flow Diagrams
Chapter 6. Data-Flow Diagrams Table of Contents Objectives... 1 Introduction to data-flow diagrams... 2 What are data-flow diagrams?... 2 An example data-flow diagram... 2 The benefits of data-flow diagrams...
8. Management System
Department of Global Business and Transportation Introduction The subject of transportation management suggests an associated management system. This note discusses building such a system. A management
Inflation. Chapter 8. 8.1 Money Supply and Demand
Chapter 8 Inflation This chapter examines the causes and consequences of inflation. Sections 8.1 and 8.2 relate inflation to money supply and demand. Although the presentation differs somewhat from that
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
Week 3. COM1030. Requirements Elicitation techniques. 1. Researching the business background
Aims of the lecture: 1. Introduce the issue of a systems requirements. 2. Discuss problems in establishing requirements of a system. 3. Consider some practical methods of doing this. 4. Relate the material
estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS
WP. 2 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Bonn, Germany, 25-27 September
Marketing Mix Modelling and Big Data P. M. Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA
We Can Early Learning Curriculum PreK Grades 8–12 INSIDE ALGEBRA, GRADES 8–12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
Choosing a CAQDAS Package Using Software for Qualitative Data Analysis : A step by step Guide
A working paper by Ann Lewins & Christina Silver, 6th edition April 2009 CAQDAS Networking Project and Qualitative Innovations in CAQDAS Project. (QUIC) See also the individual software reviews available
From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted
Text Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
The Logical Framework Approach An Introduction
The Logical Framework Approach An Introduction 1. What is the Logical Framework Approach? 1.1. The background The Logical Framework Approach (LFA) was developed in the late 1960s to assist the US Agency
