Deriving Concept Hierarchies from Text by Smooth Formal Concept Analysis

Size: px
Start display at page:

Download "Deriving Concept Hierarchies from Text by Smooth Formal Concept Analysis"

Transcription

1 # Deriving Concept Hierarchies from Text by Smooth Formal Concept Analysis Philipp Cimiano, Steffen Staab, Julien Tane Institute AIFB University of Karlsruhe Abstract We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from texts based on Formal Concept Analysis. Our approach is based on the assumption that verbs pose strong selectional restrictions on their arguments. The conceptual hierarchy is then built on the basis of the inclusion relations between the extensions of the selectional restrictions of all the verbs, while the verbs themselves provide intensional descriptions for each concept. We formalize this idea in terms of FCA and show how our approach can be used to acquire a concept hierarchy for the tourism domain out of texts. In particular, we focus on the question if smoothing techniques have an influence on the quality of the generated concept hierarchies. We evaluate our approach by considering an already existing ontology for this domain. 1 Introduction Taxonomies or conceptual hierarchies are crucial for any knowledge-based system, i.e. any system making use of declarative knowledge about the domain it deals with. However, it is also well known that every knowledge-based system suffers from the so called knowledge acquisition bottleneck, i.e. the difficulty to actually model the knowledge relevant for the domain in question. In particular ontology development is known to be a hard and timeconsuming task in the sense that the more people are involved in it, the harder it is to actually agree on a certain conceptualization. On the other hand, ontologies are only useful if a larger community agrees on them. This trade-off seems definitely difficult (if not impossible) to overcome ([Pinto and Martins, 23]). In this paper we present a novel method to automatically acquire taxonomies from texts based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data ([Ganter and Wille, 1999]). In this context, we address in particular the question whether the classical data sparseness problem one typically encounters when working with corpora ([Manning and Schuetze, 1999]) can be overcome by a smoothing technique based on the conceptual clustering algorithm presented in [Faure and Nedellec, 1998]. In particular, we cluster nouns and verbs based on the distributional hypothesis, i.e. the assumption that nouns and verbs are similar to the extent to which they share contexts. In this line, the context of a noun is represented as a vector containing all the verbs the noun appears as direct object of and the context of a verb is symmetrically represented by a vector containing all the nouns appearing as its direct objects. On the basis of such a representation, similar nouns and verbs are pairwise clustered together and the joint frequencies in the corpus are updated accordingly. As a result, some previously unseen events may yield a nonzero frequency. After this smoothing step, the significant verb/term pairs are considered for FCA as attribute/object pairs in order to yield a taxonomy. The main benefits of our method are its adaptivity as it can be applied to arbitrary texts and domains, its speed (compared to the process of hand-coding an ontology) as well as its robustness in the sense that it will not fail due to social aspects as present in traditional ontology development projects. Furthermore, if the corpora are updated regularly, it is also possible to let the ontology evolve according to the changesin the corpus ([Stojanovic et al., 22]). This is in line with the domain and corpus specific form of lexicon as envisioned in [Buitelaar, 2]. 2 The General Idea An ontology is a formal specification of a conceptualization ([Gruber, 1993]). A conceptualization can be understood as an abstract representation of the world or domain we want to model for a certain purpose. The ontological model underlying this work is the one in [Bozsak et al., 22]: Definition 1 (Ontology) An ontology is a structure consisting of (i) two disjoint sets and called concept identifiers and relation identifiers respectively, (ii) a partial order on called concept hierarchy or taxonomy, (iii) a function called signature and (iv) a partial order on called relation hierarchy, where" # " & implies ( " # ( ( " & ( and -. " # -. " & for each 3 5 ( " # (. Furthermore, for each ontology O we will define a lexicon 9 : as well as a mapping < : A C D by which each concept is mapped to its possible lexical realizations. In addition, we will also consider the inverse function < : F 9 : A. Thus in our model a concept can be expressed through different expressions (synonyms) and one expression can refer to different concepts, i.e. expressions can be polysemous. The aim of the approach presented in this paper is now to automatically acquire the partial order between a given set of concepts. The general idea underlying our approach can be best illustrated with an example. In the context of the tourism domain, we all have for example the knowledge that things like a hotel, acar, abike, atrip or

2 bookable joinable rentable excursion trip driveable apartment rideable car motor-bike Figure 1: Hierarchy for the tourism example Figure 2: The tourism lattice an excursion can be booked. Furthermore, we know that we can rent a car, abike or an apartment and that we can drive a car or a bike, but only ride a bike. Moreover, we know that we can join an excursion or a trip. We can now represent this knowledge in form of a matrix as depicted in table 1. On the basis of this knowledge, we could intuitively build a conceptual hierarchy as depicted in figure 1. If we furthermore reflect about the intuitive method we have used to construct this conceptual hierarchy, we would come to the conclusion that we have basically mapped the inclusion relations between the sets of the verbs arguments to a partial order and furthermore have used the verbs itself to provide an intensional description of the abstract or non-lexical concepts we have created to group together certain lexicalized concepts. In the next section we introduce Formal Concept Analysis and show how it can be used to formalize the intuitive method described above. 3 Formal Concept Analysis Formal Concept Analysis (FCA) is a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Such data are structured into units which are formal abstractions of concepts 1 of human thought allowing meaningful comprehensible interpretation ([Ganter and Wille, 1999]). Central to FCA is the notion of a formal context: 1 Throughout this paper we will use the notion concept in the sense of formal concept as used in FCA (see below) as well in the ontological sense as defi ned in section 2. The meaning should in any case be clear from the context. Definition 2 (Formal Context) Atriple(,, ) is called a formal context if and are sets and is a binary relation between and. The elements in are called objects, those in attributes and I the incidence of the context. For and dually for,we define: ( ( Intuitively speaking, A is the set of all the attributes common to the objects in A, while B is respectively the set of all the objects which have in common with each other the attributes in B. Furthermore, we define what a formal concept is: Definition 3 (Formal Concept) Apair(, )isaformal concept of (,, ) if and only if + In other words, (, )isaformal concept if and only if the set of all attributes shared by the objects in A is identical with B and on the other hand A is also the set of all the objects which have in common with each other the attributes in B. is then called the extent and the intent of the concept (, ). The concepts of a given context are naturally ordered by the subconcept-superconcept relation as defined by: # # & & 2 # & bookable rentable driveable rideable joinable apartment x x car x x x motor-bike x x x x excursion x x trip x x Table 1: Tourism domain knowledge as matrix or which is equivalent: # # & & 2 & # Thus, formal concepts are partially ordered with regard to inclusion of their extents or (which is equivalent) to inverse inclusion of their intent. Thus, table 1 represents the incidence of the formal context in form of a matrix. The corresponding sub- /superconcept partial order computer by FCA is depicted

3 in figure 2 in form of a lattice. The representation makes use of reduced labeling as described in [Ganter and Wille, 1999] such that each object and each attribute is entered only once in the diagram. Finally, it is just left to clarify how we obtain a concept hierarchy, i.e. our partial order out of a lattice such as depicted in figure 2. We accomplish this by creating for each node in the lattice a concept labeled with the intent of the node as well as a subconcept of this concept for each element in the extent of that node. Furthermore, we also remove the bottom element of the lattice and preserve the other nodes and edges. Thus, in general we yield a partial order which can be represented as a DAG. In particular, for the lattice in figure 2 we yield the partial order in figure 1. 4 Smoothing FCA The decisive question is now where to get from the objects as well as the corresponding attributes in order to create a taxonomy by using Formal Concept Analysis. A straightforward idea is to extract verb/object dependencies from texts and turn the head of the subcategorized objects into FCA objects and the corresponding verbs together with the postfix able into FCA attributes. As already mentioned before, we concentrated on the tourism domain. In particular, we used two rather small domain-specific corpora as well as one larger corpus of a general nature. The idea behind this decision is to get not only data about domain-specific terms but also make use of more general resources to get additional data about more general terms. The first domain-specific corpus was acquired from a web-page containing information about the history, cultural events, accommodation facilities, etc. of Mecklenburg Vorpommern, a region in north-east Germany. The second domain-specific corpus is a collection of texts from As general corpus we used the British National Corpus. The total size of the three corpora together was roughly over 1 million words. In order to acquire the verb-object dependencies from these corpora, we used LoPar, a trainable statistical left-corner parser ([Schmid, 2]). From the parser s output we extracted the respective heads with tgrep 2 and lemmatized them. It is important to mention that by this method we also yield multi-word terms. Regarding the postprocessed output of the parser, it has to be taken into account that on the one hand some verb/object dependencies can be spurious and that on the other hand not all the pairs are significant from a statistical point of view. Thus an important issue is actually to reduce the noise produced by the parser before feeding the output into FCA. For this purpose, we calculate the conditional probability that a certain (multi-word) term appears as direct object of a certain verb : ( where is the number of occurrences of a term as object of a verb and is the number of occurrences of verb with a direct object. As mentioned already in the introduction, in order to overcome the typical data sparseness problem we make use of a word clustering method based on the conceptual clustering approach presented in [Faure and Nedellec, ]. There, nouns are clustered based on the assumption that they are similar to the extent to which they share contexts. In the approach presented here, we will consider as the context of a noun all the verbs it appears with as direct object. As context for a verb we will consider all the nouns that appear as its direct objects. We will formalize these contexts respectively as sets C( ) for a noun and C( )foraverb. As similarity measure we use the cosine metric, i.e. for two nouns # and & the similarity is calculated as follows:! " # %! " # 5 # & '! " # %! " # For verbs # and & the similarity is symmetrically computed as follows:! " # %! " # 5 # & '! " # %! " # Furthermore, for a noun, the most similar noun is defined as follows: " 2 4! 6 5 and symmetrically also for verbs. As in [Faure and Nedellec, 1998], we also use an iterative clustering approach. In each iteration, the most similar terms are clustered together, where only those pairs # & are considered as potential candidates to be clustered which are mutually most similar to each other, i.e " # & and " & #. After each iteration - consisting in pairwise clusterings - the joint frequencies for verbs and nouns are recalculated and the conditional probabilities are updated. Then, in a bootstrapping fashion, these updated probabilities are considered in the next clustering iteration. It is important to mention that in our approach nouns and verbs are clustered alternately at each iteration. In particular, odd iterations will correspond to noun clustering phases and even iterations will correspond to verb clustering phases. Furthermore, it is also important to mentioned that previously pairwise clustered pairs are stored such that no pair is clustered more than once. Finally, we then only consider those verb/term pairs (v,t) as attribute-object pairs for FCA for which the conditional probability : ( is above some threshold : after the clustering iteration 5. For the Formal Concept Analysis we use the Concepts tool downloadable from The processing time for building the lattice was for all thresholds below 1 seconds. Thus, the FCA processing time can certainly be neglected in comparison to the parsing and clustering time. 5 Evaluation Before actually presenting the results of our evaluation, we first have to describe the task we are evaluating against. Basically, the task can be described as follows: given a set of concepts relevant for a certain domain, order these concepts hierarchically in form of a taxonomy. Certainly, this is not a trivial task and as shown in [Maedche and Staab, 22] human agreement on such a task has also its limits. In order to evaluate our automatically generated taxonomies, we compare them with the tourism domain gold standard used in the study presented in [Maedche and Staab, 22]. For this purpose, we translated the concepts of this gold standard from german to english thus yielding

4 5 a reference ontology consisting of 3 concepts with most of them ( 95%) having also an english label. In what follows, we will refer to this reference ontology simply as gold standard. Certainly, it is not clear how two ontologies (as a whole) can be compared to each other in terms of similarity. In fact, the only work in this direction the authors are aware of is the one in [Maedche and Staab, 22]. There, ontologies are seen as a semiotic sign system and compared at a syntactic, i.e. lexical, as well as semantic level. In this line, we present a comparison based on lexical overlap as well as taxonomic similarity between ontologies. Lexical overlap (LO) of two ontologies 5 and 8 will be measured as the recall of the lexicon W X compared to the lexicon W X,i.e. " " W 5 8 & g h " g " h 3 g h In order to compare the taxonomy of the ontologies, we use the semantic cotopy (SC) presented in [Maedche and Staab, 22]. The semantic cotopy of a concept is defined as the set of all its super- and subconcepts: F F F, where F. Now the taxonomic overlap of two ontologies 5 and 8 is computed as follows: 5 8 & where 5 8 & " " 5 8 & 5 8 & S 5 8 & S and TO and TO are defined as follows: 5 8 & 5 8 & In order to compare the performance of our approach with the human performance on the task, we will interpret our results with regard to the study presented in [Maedche and Staab, 22]. In this study, four subjects were asked to model a taxonomy on the basis of 31 lexical entries relevant for the tourism domain. The taxonomic overlap ( ) between the manually engineered ontologies reached from 47% to 87% with an average of 56.35%. Thus, it is clear that any automatic approach to derive a conceptual hierarchybetween a set of concepts has definitely its limits. 5.1 Results We generated different concept hierarchies with the approach described in section 4 by using different values for the threshold parameter :. In particular, we used the values : # Q # Q & # ' ).Furthermore, we experimented with up to 11 clustering iterations with the maximal number of terms being clustered at each of them set to 5. The reason for using up to 11 iterations is that at the 12th iteration no more mutually similar pairs were found. Figure 3 shows the nouns and verbs which 3 As the terms to be ordered hierarchically are given there is no need to measure the lexical precision. LO F t Baseline 18.78% 19.44% 19.11%.7 Iteration % 19.44% 19.11%.7 Iteration % 19.44% 19.11%.7 Iteration % 22.25% 19.52%.3 Iteration % 22.25% 19.52%.3 Iteration % 21.41% 19.52%.5 Iteration % 21.41% 19.52%.5 Iteration % 21.8% 19.88%.5 Iteration % 21.8% 19.88%.5 Iteration % 22.65% 2.35%.5 Iteration % 22.65% 2.35%.5 Iteration % 23.% 2.5%.5 Table 2: Best result of each iteration were pairwise clustered at each of the iterations. The odd numbers correspond to noun clustering phases while the even numbers correspond to verb clustering phases. The values in parenthesis indicate the average mutual distance ( S ) between the two nouns. Obviously, there are also spurious pairs, but in general the clustering seems reasonable. We then compared these 88 (11x8) automatically generated taxonomies against the taxonomy of the gold standard in terms of an F-Measure balancing lexical overlap (LO) and taxonomic overlap against each other: ` e * * W - W Figures 4-13 show the results of this comparison against the baseline without clustering in terms of the F-Measure for the different number of iterations as well as the different threshold values for :. It is interesting to observe that there are increases in F-Measure only after iterations with an odd number, i.e. iterations corresponding to a noun clustering phase. However, it can not be concluded from this observation that the verb clustering has no influence on the results. In fact, as the joint frequencies of verb and nouns are updated after each iteration, the influence of the verb clustering could be an indirect one. Further experiments seem necessary to clarify this question. Table 2 presents the best results of each iteration. It shows that the best results are increasing nonmonotonically with each clustering iteration. Overall, the results with clustering are almost 1.5% better than the baseline without clustering. 5.2 Discussion of Results It has become clear that the described smoothing method can in fact improve the overall results of the FCA-based approach to the automatic acquisition of concept hierarchies presented in this paper. In order to compare the results of the approach presented in this paper against human performance on the task, we first have to assess human performance in terms of the F-Measure. The assumption will be that if humans have to order hierarchically a set of terms they will always succeed in ordering all of them. In this sense, assuming for humans a LO of 1% and considering the average human agreement of 56.35% in terms of as given above, we yield a human average performance on the task of F=72.8% which we are still very far away from. On the other hand, the assumption of a human LO of 1%

5 .2.18 Iteration 1: whirlpool solarium (.87) ballroom lounge (.72) cinema theatre (.76) badminton billiard (.82) swimming kitchenette (.92) Iteration 2: Iteration 3: Iteration 4: Iteration 5: Iteration 6: Iteration 7: know think (.7) leave enter (.77) love like (.66) work go (.79) keep open (.75) yacht boat (.25) club party (.11) telephone terrace (.11) pension musical (.12) drier radio (.14) put hold (.44) provide offer (.55) watch saw (.65) move turn (.45) follow hear (.64) person family (.9) tent caravan (.6) park kiosk (.9) masseur basilica (.9) animation booking (.9) start enjoy (.37) require involve (.43) contain include (.35) consider describe (.4) gallery shower () period daytime (.6) time distance (.6) autumn cure (.5) agreement situation (.5) Iteration 8: have remember (.63) Iteration 9: night bank (.3) organization room () group place () view law () city sight () Iteration 1: no mutually similar verbs found Iteration 11: bicycle bed (.1) station service (.1) Figure 3: Results of the clustering iteration Figure 4: Iteration 1 iteration Figure 5: Iteration 2 iteration Figure 6: Iteration 3 iteration Figure 7: Iteration 4

6 iteration Figure 8: Iteration 5 iteration Figure 9: Iteration 6 iteration Figure 1: Iteration 7 iteration iteration Figure 12: Iteration 9 iteration Figure 13: Iteration Figure 14: Iteration 11 iteration 11 Figure 11: Iteration 8

7 as well as the above average agreement may hold for relatively trivial domains such as tourism, but are certainly too optimistic for more technical domains such as for example bio-medicine. Thus, we believe that the more specific and technical the underlying corpus is, the closer our approach will get to human performance. In the future, we hope to support this claim with further experiments on more technical domains. 6 Discussion of Related Work In this section, we discuss some work related to the automatic acquisition of taxonomies out of texts. The early work of [Hindle, 199] on noun classification from predicate-argument structures is very related to the approach presented here. Hindle s work is as ours based on the distributional hypothesis, i.e. that nouns are similar to the extent that they share contexts. The central idea of his approach is that nouns may be grouped according to the extent to which they appear in similar verb frames. In particular, he takes into account nouns appearing as subjects and objects of verbs, but does not distinguish between them in his similarity measure. Our approach goes one step further in the sense that we do not only group nouns together, but also derive a hierarchical order between them. Also very related to the work presented here is the approach of [Faure and Nedellec, 1998]. Their work is also based on the distributional hypothesis and they present an iterative bottom-up clustering approach of nouns appearing in similar contexts. At each step, they cluster together the two most similar extents of some argument position of two verbs. However, their approach requires manual validation after each clustering step so that in our view it can not be called unsupervised or automatic anymore. [Hearst, 1992] aims at the acquisition of hyponym relations from Grolier s American Academic Encyclopedia. In order to identify these relations, she makes use of lexico-syntactic patterns manually acquired from her corpus. Hearst s approach is characterized by a high precision in the sense that the quality of the learned relations is very high. However, her approach suffers from a very low recall which is due to the fact that the patterns are very rare. Thus it is doubtful that her approach would be scalable to full text articles such as considered here. [Pereira et al., 1993] present a top-down clustering approach to build an unlabeled hierarchy of nouns. As in our approach, they also make use of verb-object relations to represent the context of a certain noun. In contrast to Pereira et al. s work, in our approach the clustering is only used as a method in order to overcome data sparseness, while the hierarchy is then built at a second step on the basis of FCA. [Schulte im Walde, 2] approaches the clustering of verbs based on syntactic subcatgorization frames. Her model allows to account for polysemy as verbs can be assigned to different clusters. Interestingly, her clustering results get worse when adding selectional restrictions to the subcategorization frames. [Basili et al., 1993] also address the problem of clustering verbs with the aim of building a verb hierarchy. For this purpose they make use of an incremental conceptual clustering algorithm which does not only consider syntactic relations but manually assigned selectional restrictions. Their algorithm is able to consider multiple instances of a given verb such that it can in principle account for polysemous verbs. [Bisson et al., 2] present an interesting framework and a corresponding workbench - Mo K - allowing users to design conceptual clustering methods to assist them in an ontology building task. The framework is general enough to integrate different clustering methods. Thus, it would be certainly interesting to clarify if the approach presented in this paper is compatible with their framework. [Velardi et al., 21] present the OntoLearn system which discovers i) the domain concepts relevant for a certain domain, i.e. the relevant terminology, ii) named entities, iii) vertical (is-a or taxonomic) relations as well iv) as certain relations between concepts based on specific syntactic relations. In their approach a vertical relation is established between a term : # and a term : &,i.e.is-a(: #,: & ), if the head of : & matches the head of : # and additionally the former is additionally modified in : #. Thus, a vertical relation is for example established between the term international credit card and the term credit card, i.e. is-a(international credit card,credit card). This approach is certainly very simple and could be complemented by the one presented in this paper. [Caraballo, 1999] also uses clustering methods to derive an unlabeled hierarchy of nouns by using data on conjunctions of nouns and appositive constructs collected from the Wall Street Journal corpus. Interestingly, at a second step she also labels the abstract concepts of the hierarchy by considering the Hearst patterns in which the children of the concept in question appear as hyponyms. The most frequent hypernym is then chosen in order to label the concept. Furthermore, at a further step she also compresses the produced ontological tree by eliminating internal nodes without a label. The final ontological tree is then evaluated by presenting a random choice of clusters and the corresponding hypernym to three human judges for validation. [Sanderson and Croft, 1999] describe an interesting approach to automatically derive a hierarchy by considering the document a certain term appears in as context. In particular, they present a document-based definition of subsumption according to which a certain term : # is more special than a term : & if : & also appears in all the documents in which : # appears. [Hahn and Schanttinger, 1998] aim at learning the correct ontological class for unknown words. For this purpose, when encountering an unknown word in a text they initially create one hypothesis space for each concept the unknown word could actually belong to. These initial hypothesis spaces are then iteratively refined on the basis of evidence extracted from the linguistic context the unknown word appears in. In their approach, evidence is formalized in the form of quality labels attached to each hypothesis space. At the end the hypothesis space with maximal evidence with regard to the qualification calculus used is chosen as the correct ontological concept for the word in question. [Maedche et al., 22] present a classification approach based on a combination of the k nearest neighbor (knn) method and the tree-ascending classification algorithm in order to learn the appropriate class in a given thesaurus for a certain term. In [Maedche and Staab, 2], they moreover describe an approach to learning generic binary relations between terms with regard to given a certain taxonomy by computing association rules. In general, we believe that a combination approach of different methodologies is the key towards automatically generating acceptable and reasonable taxonomies which can

8 be used as a starting point for applications and then be refined during their life-cycle. From this point of view all the above approaches are definitely related to ours. 7 Conclusion and Further Work We have presented a method for the automatic acquisition of taxonomies out of text which is in line with the idea of a dynamic and corpus-specific lexicon as presented in [Buitelaar, 2]. In future work, we would like to apply our approach to other domains as well as other languages, in particular german. On the other hand, our approach suffers from the fact that it completely neglects polysemy. Further work will thus address the question if polysemy can be taken into account in some way. Further work will also consider other similarity measures as well as clustering techniques. Finally, we also aim at learning other relations than taxonomic ones. For this purpose we envision an approach as described in [Resnik, 1997] in order to learn relations at the right level of abstraction with regard to our automatically acquired taxonomy. Acknowledgments This work was carried out within the ISTdot.kom project ( sponsored by the European Commission as part of the framework V, (grant IST ). dot.kom involves the University of Sheffi eld (UK), ITC- Irst (I), Ontoprise (D), the Open University (UK), Quinary (I) and the University of Karlsruhe (D). Its objectives are to develop Knowledge Management and Semantic Web methodologies based on Adaptive Information Extraction from Text. We would also like to thank Martin Kavalec for kindly providing the Lonely Planet Corpus. References [Basili et al., 1993] R. Basili, M.T. Pazienza, and P. Velardi. Hierarchical clustering of verbs. In Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text, [Bisson et al., 2] G. Bisson, C. Nedellec, and L. Canamero. Designing clustering methods for ontology building - The Mo K workbench. In Proceedings of the ECAI Ontology Learning Workshop, 2. [Bozsak et al., 22] E. Bozsak, M. Ehrig, S. Handschuh, A. Hotho, A. Maedche, B. Motik, D. Oberle, C. Schmitz, S. Staab, L. Stojanovic, N. Stojanovic, R. Studer, G. Stumme, Y. Sure, J. Tane, R. Volz, and V. Zacharias. Kaon - towards a large scale semantic web. In Proceedings of the Third International Conference on E-Commerce and Web Technologies (EC-Web 22). Springer Lecture Notes in Computer Science, 22. [Buitelaar, 2] Paul Buitelaar. Semantic lexicons. In K. Simov and A. Kiryakov, editors, Ontologies and Lexical Knowledge Bases, pages [Caraballo, 1999] S.A. Caraballo. Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguists, pages , [Faure and Nedellec, 1998] D. Faure and C. Nedellec. A corpusbased conceptual clustering method for verb frames and ontology. In P. Verlardi, editor, Proceedings of the LREC Workshop on Adapting lexical and corpus resources to sublanguages and applications, [Ganter and Wille, 1999] B. Ganter and R. Wille. Formal Concept Analysis Mathematical Foundations. Springer Verlag, [Gruber, 1993] T.R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Analysis in Conceptual Analysis and Knowledge Representation. Kluwer, [Hahn and Schanttinger, 1998] U. Hahn and K. Schanttinger. Towards text knowledge engineering. In AAAI 98/IAAI 98 Proceedings of the 15th National Conference on Artifi cial Intelligence and the 1th Conference on Innovative Applications of Artifi cial Intelligence, [Hearst, 1992] M.A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, [Hindle, 199] D. Hindle. Noun classifi cation from predicateargument structures. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages , 199. [Maedche and Staab, 2] A. Maedche and S. Staab. Mining ontologies from text. In Proceedings of the European Conference on Artifi cial Intelligence (ECAI 2), 2. [Maedche and Staab, 22] A. Maedche and S. Staab. Measuring similarity between ontologies. In Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW). Springer, 22. [Maedche et al., 22] A. Maedche, V. Pekar, and S. Staab. Ontology learning part one - on discovering taxonomic relations from the web. In Web Intelligence. Springer, 22. [Manning and Schuetze, 1999] C. Manning and H. Schuetze. Foundations of Statistical Language Processing. MIT Press, [Pereira et al., 1993] F. Pereira, N. Tishby, and L. Lee. Distributional clustering of english words. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages , [Pinto and Martins, 23] H.S. Pinto and J.P. Martins. Ontologies: How can they be built? Knowledge and Information Systems, 23. [Resnik, 1997] Philip Resnik. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, [Sanderson and Croft, 1999] M. Sanderson and B. Croft. Deriving concept hierarchies from text. In Research and Development in Information Retrieval, pages [Schmid, 2] Helmut Schmid. Lopar: Design and implementation. In Arbeitspapiere des Sonderforschungsbereiches 34, number [Schulte im Walde, 2] S. Schulte im Walde. Clustering verbs semantically according to their alternation behaviour. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-), 2. [Stojanovic et al., 22] L. Stojanovic, N. Stojanovic, and A. Maedche. Change discovery in ontology-based knowledge management systems. In Proceedings of the 2nd International Workshop on Evolution and Change in Data Management (ECDM 22), 22. [Velardi et al., 21] P. Velardi, P. Fabriani, and M. Missikoff. Using text processing techniques to automatically enrich a domain ontology. In Proceedings of the ACM International Conference on Formal Ontology in Information Systems, 21.

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,

More information

Clustering of Polysemic Words

Clustering of Polysemic Words Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,

More information

Accessing Distributed Learning Repositories through a Courseware Watchdog

Accessing Distributed Learning Repositories through a Courseware Watchdog Accessing Distributed Learning Repositories through a Courseware Watchdog Christoph Schmitz, Steffen Staab, Rudi Studer, Gerd Stumme, Julien Tane Learning Lab Lower Saxony (L3S), Expo Plaza 1, D 30539

More information

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

The Courseware Watchdog: an Ontology-based tool for finding and organizing learning material

The Courseware Watchdog: an Ontology-based tool for finding and organizing learning material The Courseware Watchdog: an Ontology-based tool for finding and organizing learning material Julien Tane, Christoph Schmitz, Gerd Stumme, Steffen Staab, Rudi Studer Learning Lab Lower Saxony (L3S), Expo

More information

Integrating Information Extraction, Ontology Learning and Semantic Browsing into Organizational Knowledge Processes

Integrating Information Extraction, Ontology Learning and Semantic Browsing into Organizational Knowledge Processes Integrating Information Extraction, Ontology Learning and Semantic Browsing into Organizational Knowledge Processes José Iria 1, Fabio Ciravegna 1, Philipp Cimiano 2, Alberto Lavelli 3, Enrico Motta 4,

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

The Semantic Web relies heavily on formal ontologies to structure data for comprehensive

The Semantic Web relies heavily on formal ontologies to structure data for comprehensive T h e S e m a n t i c W e b Ontology Learning for the Semantic Web Alexander Maedche and Steffen Staab, University of Karlsruhe The Semantic Web relies heavily on formal ontologies to structure data for

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge Multi-Algorithm Mapping with Automatic Weight Assignment and Background Knowledge Shailendra Singh and Yu-N Cheah School of Computer Sciences Universiti Sains Malaysia 11800 USM Penang, Malaysia shai14@gmail.com,

More information

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process

More information

Ontology Learning for the Semantic Web

Ontology Learning for the Semantic Web Ontology Learning for the Semantic Web Alexander Maedche and Steffen Staab Institute AIFB, D-76128 Karlsruhe, Germany http://www.aifb.uni-karlsruhe.de/wbs and Ontoprise GmbH, Haid-und-Neu-Strasse 7, 76131

More information

A Framework for Ontology-Based Knowledge Management System

A Framework for Ontology-Based Knowledge Management System A Framework for Ontology-Based Knowledge Management System Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: jnwu@dlut.edu.cn Abstract Knowledge

More information

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES

More information

Semantic Resource Management for the Web: An E-Learning Application

Semantic Resource Management for the Web: An E-Learning Application Semantic Resource Management for the Web: An E-Learning Application Julien Tane Christoph Schmitz Gerd Stumme Learning Lab Lower Saxony, Expo Plaza 1, D-30539 Hannover, Germany @learninglab.de

More information

A Pattern-based Framework of Change Operators for Ontology Evolution

A Pattern-based Framework of Change Operators for Ontology Evolution A Pattern-based Framework of Change Operators for Ontology Evolution Muhammad Javed 1, Yalemisew M. Abgaz 2, Claus Pahl 3 Centre for Next Generation Localization (CNGL), School of Computing, Dublin City

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,

More information

Mining Domain Specific Texts and Glossaries to Evaluate and Enrich Domain Ontologies

Mining Domain Specific Texts and Glossaries to Evaluate and Enrich Domain Ontologies Mining Domain Specific Texts and Glossaries to Evaluate and Enrich Domain Ontologies Viral Parekh Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County Baltimore,

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach -

Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach - Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach - Philipp Sorg and Philipp Cimiano Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany {sorg,cimiano}@aifb.uni-karlsruhe.de

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources

Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle

More information

Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets

Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells Computer Science Dep., Universidad Autonoma de Madrid, 28049 Madrid, Spain

More information

The Theory of Concept Analysis and Customer Relationship Mining

The Theory of Concept Analysis and Customer Relationship Mining The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

More information

Ontology Evolution for Customer Services

Ontology Evolution for Customer Services Ontology Evolution for Customer Services Tho T. Quan 1 and Thai D. Nguyen 2 1 Faculty of Computer Science and Engineering, Hochiminh City University of Technology Hochiminh City Vietnam qttho@cse.hcmut.edu.vn

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

What Is This, Anyway: Automatic Hypernym Discovery

What Is This, Anyway: Automatic Hypernym Discovery What Is This, Anyway: Automatic Hypernym Discovery Alan Ritter and Stephen Soderland and Oren Etzioni Turing Center Department of Computer Science and Engineering University of Washington Box 352350 Seattle,

More information

KNOWLEDGE ORGANIZATION

KNOWLEDGE ORGANIZATION KNOWLEDGE ORGANIZATION Gabi Reinmann Germany reinmann.gabi@googlemail.com Synonyms Information organization, information classification, knowledge representation, knowledge structuring Definition The term

More information

Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1

Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1 Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1 Hassina Bounif, Stefano Spaccapietra, Rachel Pottinger Database Laboratory, EPFL, School of Computer and Communication

More information

A typology of ontology-based semantic measures

A typology of ontology-based semantic measures A typology of ontology-based semantic measures Emmanuel Blanchard, Mounira Harzallah, Henri Briand, and Pascale Kuntz Laboratoire d Informatique de Nantes Atlantique Site École polytechnique de l université

More information

Basic Parsing Algorithms Chart Parsing

Basic Parsing Algorithms Chart Parsing Basic Parsing Algorithms Chart Parsing Seminar Recent Advances in Parsing Technology WS 2011/2012 Anna Schmidt Talk Outline Chart Parsing Basics Chart Parsing Algorithms Earley Algorithm CKY Algorithm

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

BUSINESS RULES AND GAP ANALYSIS

BUSINESS RULES AND GAP ANALYSIS Leading the Evolution WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Discovery and management of business rules avoids business disruptions WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Business Situation More

More information

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Overview of MT techniques. Malek Boualem (FT)

Overview of MT techniques. Malek Boualem (FT) Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,

More information

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

More information

Visualizing WordNet Structure

Visualizing WordNet Structure Visualizing WordNet Structure Jaap Kamps Abstract Representations in WordNet are not on the level of individual words or word forms, but on the level of word meanings (lexemes). A word meaning, in turn,

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY

SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY M. EHRIG, C. TEMPICH AND S. STAAB Institute AIFB University of Karlsruhe 76128 Karlsruhe, Germany E-mail: {meh,cte,sst}@aifb.uni-karlsruhe.de

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval

Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval Mariângela Vanzin, Karin Becker, and Duncan Dubugras Alcoba Ruiz Faculdade de Informática - Pontifícia Universidade Católica do Rio

More information

Ontology construction on a cloud computing platform

Ontology construction on a cloud computing platform Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is

More information

Identifying Thesis and Conclusion Statements in Student Essays to Scaffold Peer Review

Identifying Thesis and Conclusion Statements in Student Essays to Scaffold Peer Review Identifying Thesis and Conclusion Statements in Student Essays to Scaffold Peer Review Mohammad H. Falakmasir, Kevin D. Ashley, Christian D. Schunn, Diane J. Litman Learning Research and Development Center,

More information

Quotes from Object-Oriented Software Construction

Quotes from Object-Oriented Software Construction Quotes from Object-Oriented Software Construction Bertrand Meyer Prentice-Hall, 1988 Preface, p. xiv We study the object-oriented approach as a set of principles, methods and tools which can be instrumental

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

More information

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering

More information

Information Need Assessment in Information Retrieval

Information Need Assessment in Information Retrieval Information Need Assessment in Information Retrieval Beyond Lists and Queries Frank Wissbrock Department of Computer Science Paderborn University, Germany frankw@upb.de Abstract. The goal of every information

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D. Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

More information

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Outline Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Jinfeng Yi, Rong Jin, Anil K. Jain, Shaili Jain 2012 Presented By : KHALID ALKOBAYER Crowdsourcing and Crowdclustering

More information

Semantic Clustering in Dutch

Semantic Clustering in Dutch t.van.de.cruys@rug.nl Alfa-informatica, Rijksuniversiteit Groningen Computational Linguistics in the Netherlands December 16, 2005 Outline 1 2 Clustering Additional remarks 3 Examples 4 Research carried

More information

Natural Language Processing: Mature Enough for Requirements Documents Analysis?

Natural Language Processing: Mature Enough for Requirements Documents Analysis? Natural Language Processing: Mature Enough for Requirements Documents Analysis? Leonid Kof Technische Universität München, Fakultät für Informatik, Boltzmannstr. 3, D-85748 Garching bei München, Germany

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.

Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information

More information

LexOnt: A Semi-automatic Ontology Creation Tool for Programmable Web

LexOnt: A Semi-automatic Ontology Creation Tool for Programmable Web LexOnt: A Semi-automatic Ontology Creation Tool for Programmable Web Knarig Arabshian Bell Labs, Alcatel-Lucent Murray Hill, NJ Peter Danielsen Bell Labs, Alcatel-Lucent Naperville, IL Sadia Afroz Drexel

More information

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD. Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.

More information

Text Analytics. A business guide

Text Analytics. A business guide Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?

Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle? See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/47901002 Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?

More information

Improving information retrieval effectiveness by using domain knowledge stored in ontologies

Improving information retrieval effectiveness by using domain knowledge stored in ontologies Improving information retrieval effectiveness by using domain knowledge stored in ontologies Gábor Nagypál FZI Research Center for Information Technologies at the University of Karlsruhe Haid-und-Neu-Str.

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015 ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

More information

Learning Translation Rules from Bilingual English Filipino Corpus

Learning Translation Rules from Bilingual English Filipino Corpus Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,

More information

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

More information

Annotation for the Semantic Web during Website Development

Annotation for the Semantic Web during Website Development Annotation for the Semantic Web during Website Development Peter Plessers, Olga De Troyer Vrije Universiteit Brussel, Department of Computer Science, WISE, Pleinlaan 2, 1050 Brussel, Belgium {Peter.Plessers,

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Chapter 8 The Enhanced Entity- Relationship (EER) Model

Chapter 8 The Enhanced Entity- Relationship (EER) Model Chapter 8 The Enhanced Entity- Relationship (EER) Model Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 Outline Subclasses, Superclasses, and Inheritance Specialization

More information

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets Disambiguating Implicit Temporal Queries by Clustering Top Ricardo Campos 1, 4, 6, Alípio Jorge 3, 4, Gaël Dias 2, 6, Célia Nunes 5, 6 1 Tomar Polytechnic Institute, Tomar, Portugal 2 HULTEC/GREYC, University

More information

Ontology Modeling Using UML

Ontology Modeling Using UML Ontology Modeling Using UML Xin Wang Christine W. Chan Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 wangx@cs.uregina.ca, chan@cs.uregina.ca Abstract Ontology

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems 1 Ciro Cattuto, 2 Dominik Benz, 2 Andreas Hotho, 2 Gerd Stumme 1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation,

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Personalized Hierarchical Clustering

Personalized Hierarchical Clustering Personalized Hierarchical Clustering Korinna Bade, Andreas Nürnberger Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany {kbade,nuernb}@iws.cs.uni-magdeburg.de

More information

Creating Template Contract Documents using Multi- Agent Text Understanding and Clustering in Cars Insurance Domain

Creating Template Contract Documents using Multi- Agent Text Understanding and Clustering in Cars Insurance Domain Creating Template Contract Documents using Multi- Agent Text Understanding and Clustering in Cars Insurance Domain Igor Minakov 1, George Rzevski 2, Petr Skobelev 1, Simon Volman 1 1 MAGENTA Development,

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Learning of Ontologies for the Web: the Analysis of Existent Approaches

Learning of Ontologies for the Web: the Analysis of Existent Approaches Learning of Ontologies for the Web: the Analysis of Existent Approaches Borys Omelayenko Vrije Universiteit Amsterdam, Division of Mathematics and Computer Science, De Boelelaan 1081a, 1081 HV Amsterdam,

More information

RRSS - Rating Reviews Support System purpose built for movies recommendation

RRSS - Rating Reviews Support System purpose built for movies recommendation RRSS - Rating Reviews Support System purpose built for movies recommendation Grzegorz Dziczkowski 1,2 and Katarzyna Wegrzyn-Wolska 1 1 Ecole Superieur d Ingenieurs en Informatique et Genie des Telecommunicatiom

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

A Semantically Enriched Competency Management System to Support the Analysis of a Web-based Research Network

A Semantically Enriched Competency Management System to Support the Analysis of a Web-based Research Network A Semantically Enriched Competency Management System to Support the Analysis of a Web-based Research Network Paola Velardi University of Roma La Sapienza Italy velardi@di.uniroma1.it Alessandro Cucchiarelli

More information

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Parsing Software Requirements with an Ontology-based Semantic Role Labeler Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software

More information

On the role of a Librarian Agent in Ontology-based Knowledge Management Systems

On the role of a Librarian Agent in Ontology-based Knowledge Management Systems On the role of a Librarian Agent in Ontology-based Knowledge Management Systems Nenad Stojanovic Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany nst@aifb.uni-karlsruhe.de Abstract: In

More information

Change Management for Metadata Evolution

Change Management for Metadata Evolution Change Management for Metadata Evolution Diana Maynard 1, Wim Peters 1, Mathieu d Aquin 2, and Marta Sabou 2 1 Department of Computer Science, University of Sheffield Regent Court, 211 Portobello Street,

More information