Unsupervised Paraphrase Acquisition via Relation Discovery

Takaaki Hasegawa
Cyberspace Laboratories, Nippon Telegraph and Telephone Corporation
1-1 Hikarinooka, Yokosuka, Kanagawa, Japan

Satoshi Sekine and Ralph Grishman
New York University
715 Broadway, 7th floor, New York, NY 10003, U.S.A.
sekine@cs.nyu.edu

Abstract

One of the difficulties in Natural Language Processing is that there are many ways to express the same thing or event. Such expressions are called paraphrases. Paraphrases are important in applications such as IR, QA and IE, and one of the difficulties in paraphrase research is acquiring the requisite paraphrase knowledge. In this paper, we describe an unsupervised method to discover paraphrases containing two named entities from a large untagged corpus. The proposed method consists of two stages. First, it finds relations between named entities using similarity of context and clustering. Then, the phrases which express the relation are selected from each cluster to acquire paraphrases. Our experiments with one year of newspaper text reveal that we can discover a variety of paraphrases with high precision and high recall.

1 Introduction

One of the difficulties in Natural Language Processing is that there are many ways to express the same thing or event. If the expression is a word or a short phrase (like "corporation" and "company"), it is called a synonym. There has been a lot of research on such lexical relations, along with the creation of resources such as WordNet. If the expression is longer or more complicated (like "A buys B" and "A's purchase of B"), it is called a paraphrase, i.e. one of a set of phrases which express the same thing or event. Recently, this topic has been getting more attention, as is evident from the Paraphrase Workshops in 2003 and 2004, driven by the needs of various NLP applications.
For example, in Information Retrieval (IR), we have to match a user's query to the expressions in the desired documents, while in Question Answering (QA), we have to find the answer to the user's question even if the formulation of the answer in the document is different from the question. Also, in Information Extraction (IE), in which the system tries to extract elements of some events (e.g. date and company names of a corporate merger event), several event instances from different news articles have to be aligned even if they are expressed differently.

The importance of paraphrase is widely recognized; however, the major obstacle is the construction of paraphrase knowledge. For example, we can easily imagine that the number of paraphrases for "A buys B" is enormous, and it is not possible to create comprehensive knowledge by hand. Also, we don't know how many kinds of such paraphrase sets are necessary to cover even some everyday things or events. Up to now, most IE researchers have been creating paraphrase knowledge (or IE patterns) by hand and for specific tasks. As a result, IE can only be performed for a pre-defined task, like corporate mergers or management succession. Creating an IE system for a new domain requires spending a long time building the knowledge, so it is too costly to make IE technology open-domain like IR or QA.

In this paper, we propose an unsupervised method to discover paraphrases from a large untagged corpus. We focus on phrases which contain two Named Entities (NEs), as those types of phrases are very important for IE applications. After tagging a large corpus with an automatic NE tagger, the method tries to find sets of paraphrases automatically without being given a seed phrase or any kind of cue. The proposed approach uses the relation discovery method described in (Hasegawa et al. 04). It is an unsupervised method for finding common relations in a large corpus. We will describe this method below, as it is integral to our paraphrase discovery procedure.

The rest of this paper is organized as follows. We discuss prior work in paraphrase discovery and its limitations in section 2. We describe our method in section 3. Then we report experiments and evaluations in section 4, and discuss the results in section 5.

2 Prior Work

There have been several efforts to discover paraphrases automatically from corpora. One general approach uses comparable documents, which are sets of documents whose contents are known to be almost the same. In other words, those methods need comparable corpora, implicit or explicit, such as different newspaper stories about the same event (Shinyama and Sekine 03) or different translations of the same story (Barzilay 01). They basically try to find paraphrases in the comparable parts of documents using clues like named entities. However, the availability of comparable corpora is limited; in particular, in the case of Barzilay's approach, the availability of multiple translations of the same story is clearly limited. This is a significant limitation on this general approach.

Another approach to finding paraphrases is to find phrases which take similar subjects and objects in large corpora by using mutual information of word distribution (Lin and Pantel 01). This approach is designed to accumulate phrases useful for the QA task, given a pair consisting of two important phrases from the question and the answer. So, this approach needs a phrase as an initial seed, and thus the possible relationships to be extracted are naturally limited.

There has also been work using a bootstrapping approach (Brin 98; Agichtein and Gravano 00; Ravichandran and Hovy 02).
Their basic strategy is, for a given pair of entity types, to start with some examples, like several famous book title and author pairs, and find expressions which contain those names; then, using the found expressions, to find more author and book title pairs. This can be repeated several times to collect a list of author and book title pairs and expressions. Ravichandran demonstrated that the collected list improved the accuracy of a QA system. However, those methods need initial seeds, so the relation between entities has to be known in advance. This limitation is the obstacle to making the technology open-domain.

3 Paraphrase Acquisition

3.1 Overview

Our goal is to discover the paraphrases that represent a particular relation between two named entities. If we could identify pairs of named entities (such as "Cingular" and "AT&T Wireless") which have a particular relation (such as "merger & acquisition"), we could also find paraphrases expressing the relation between these two names. Under this assumption, we propose an approach to paraphrase acquisition via relation discovery from large text documents. Our approach is fully unsupervised, and we only need a named entity tagger and large text corpora. The outline of the method is as follows:

1. Tag named entities in text corpora
2. Discover particular relations by clustering named entity pairs by their context
3. Select phrases which express the relation from those in the cluster

Figure 1 shows the overview of the method. First, from the NE-tagged newspaper corpus, we extract expressions containing frequently appearing pairs of named entities; in the figure, these are expressions containing the pairs of COMPANY A and B, C and D, and E and F. Then, we accumulate the context words intervening between these entities, such as "is offering to buy" and "negotiates to acquire" for A and B.
If the contexts for A and B and those for E and F are similar, it is likely that these pairs represent the same relation; in the figure, the pairs A and B and E and F both have the M&A relation. By this method, we believe we can accumulate instances of phrases, as well as instances of relations, which are important in the text.
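The context accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the input format (each sentence as a list of `(token, ne_type)` tuples with every named entity pre-merged into one token), the function name `collect_contexts`, and the choice to pair only adjacent named entities are all assumptions. The limit of 5 intervening words is the one given in Section 3.3; stemming is omitted here.

```python
from collections import defaultdict

MAX_GAP = 5  # Section 3.3: at most 5 intervening words


def collect_contexts(sentences, max_gap=MAX_GAP):
    """Map each ordered NE pair to the word sequences found between the two NEs.

    sentences: list of sentences, each a list of (token, ne_type) tuples,
    where ne_type is None for ordinary words (hypothetical input format).
    """
    contexts = defaultdict(list)
    for sent in sentences:
        ne_positions = [i for i, (_, ne) in enumerate(sent) if ne is not None]
        # Simplification: only adjacent NEs are paired, so that no third
        # named entity appears inside the collected context.
        for left, right in zip(ne_positions, ne_positions[1:]):
            between = tuple(tok.lower() for tok, _ in sent[left + 1:right])
            if len(between) <= max_gap:
                # The key is ordered: (A, B) and (B, A) are kept apart.
                contexts[(sent[left], sent[right])].append(between)
    return contexts
```

Keeping the key ordered mirrors the statement in Section 3.3 that different orders of occurrence of the same named entities count as different co-occurrences.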
[Figure 1. Overview of the method. Starting from an NE-tagged corpus (newspaper): 1) extract expressions between two NE instances; 2) build clusters of NE pairs; 3) find phrases which express the relation (paraphrases) in each cluster. Example NE pairs and expressions shown in the figure:
<Company-A, Company-B>: "A is offering to buy B", "A's proposed acquisitions of B", "A's interest in B", "A negotiates to acquire B", "A is discussing with B"
<Company-C, Company-D>: "C's parent company D", "C is a subsidiary of D"
<Company-E, Company-F>: "E's acquisition of F", "E would buy F"]

Next, we try to acquire phrases that represent the relation from the expressions found in each cluster. The expressions in the cluster include expressions irrelevant to the relation, such as "A is discussing with B", which is not really the M&A relation, so we apply two constraints in order to select only the phrases expressing the relation. One is the phrase duplication constraint, where the phrase has to appear with some minimum number of NE pair instances in the cluster. The other is the common word constraint, which selects phrases that contain a frequent word in the cluster. For example, if the word "acquisition" appears frequently in the cluster, phrases including the word "acquisition" are likely to be phrases expressing the relation, here the M&A relation.

3.2 Named entity tagging

Our proposed method is fully unsupervised. We do not need comparable corpora or any manually selected initial seeds. Instead, we use a named entity (NE) tagger. Current automatic named entity taggers have quite satisfactory performance. In addition, the set of NE types has been extended. For example, (Sekine et al. 02) proposed 150 NE types. Extending the NE types would lead to more effective relation discovery. For example, if the type ORGANIZATION is divided into several subtypes, like COMPANY, MILITARY, GOVERNMENT and so on, the discovery procedure could detect more specific relations such as those between COMPANYs.
We use an extended NE tagger (reference to be provided in the final paper).

3.3 Relation Discovery

We define the co-occurrence between NE pairs as follows: two named entities are considered to co-occur if they appear with no more than 5 intervening words in the same sentence. We collect the intervening words between two named entities for each co-occurrence. These words, which are stemmed, are regarded as the context of the pair of named entities. Different orders of occurrence of the same named entities are considered as different co-occurrences. Less frequent NE pairs are eliminated because they might be less reliable for relation discovery. We set the co-occurrence frequency threshold to 30.

The vector space model of the context words and the cosine similarity of the vectors are used to calculate the similarities between NE pairs. A context vector for each NE pair consists of the bag of words formed from all intervening words (excluding stop words) of the two named entities. Each word of a context vector is weighted by tf*idf, the product of term frequency and inverse document frequency. Term frequency is the number of occurrences of a word in the collected context words. Document frequency is the number of documents which include the word. The similarity of two pairs of named entities is calculated by the cosine similarity of the two vectors.

We compare NE pairs of the same NE types, e.g., PERSON-GPE pairs (a GPE is a geographical-political entity, i.e. a region with a government). In this paper, we will refer to a pair of named entity types as a domain. In addition to the PERSON-GPE domain, we will report on our experiment on the COMPANY-COMPANY domain.

3.4 Clustering

After we calculate the similarity among context vectors of NE pairs, we make clusters of NE pairs based on the similarity. We adopt hierarchical clustering and use complete linkage to avoid the chain effect of single-link clustering, which could join two not-so-similar members into a single cluster. In the complete linkage method, the distance between clusters is taken to be the distance between the furthest nodes in the two clusters. Now we have sets of named entity pairs which are likely to express the same relation.

3.5 Selection of Paraphrases

Even though a set of named entity pairs in the same relation has been found, not all of the phrases used in those clusters express the relation.
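The similarity computation of Section 3.3 and the clustering of Section 3.4 can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the paper does not give the exact idf formula, so `log(n_docs / df)` is assumed here; the function names, the dictionary-based input format and the naive merge loop are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(contexts, doc_freq, n_docs):
    """contexts: {ne_pair: list of context words}; returns {ne_pair: {word: weight}}."""
    vectors = {}
    for pair, words in contexts.items():
        tf = Counter(words)
        # Assumed idf = log(N / df); the source only names "tf*idf".
        vectors[pair] = {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def complete_linkage(vectors, threshold):
    """Merge clusters greedily while the complete-link similarity exceeds threshold."""
    clusters = [[pair] for pair in vectors]

    def link(a, b):
        # Complete linkage: a cluster pair is only as similar as its
        # least similar pair of members.
        return min(cosine(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        best, i, j = max(
            (link(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best <= threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

With a threshold just above 0, as used in Section 4.1, two clusters are merged only if every member of one shares at least one weighted context word with every member of the other.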
In order to filter out the phrases which do not express the relation, we applied two constraints:

[Phrase duplication constraint:] A phrase must be shared by at least two NE pairs in a cluster.
[Common word constraint:] A phrase must include one of the frequent common words in a cluster.

The phrase duplication constraint requires a phrase to have appeared with multiple NE pairs in the same cluster. It is intended to delete phrases which appear accidentally or are specific to a particular NE pair. In the common word constraint, we rely on the idea that words appearing frequently in the cluster are relevant to the relation of the cluster; if a phrase contains one or more such words, the phrase is considered to express the relation.

4 Experiment

We will report on our experiment in two successive stages. The first stage was relation discovery and the second stage was paraphrase acquisition. We conducted the experiment with one year of The New York Times (1995) as our corpus to verify the method.

4.1 Relation discovery

First, the frequent NE pairs are found, and the NE pairs along with their intervening words are extracted and clustered. In order to evaluate the result, we analyzed all the extracted NE pair instances manually and identified the relations for two different domains. One was the PERSON-GPE domain, in which 177 distinct NE pairs were obtained and manually classified into 38 relations. The other was the COMPANY-COMPANY domain, in which we obtained 65 distinct NE pairs and manually classified them into 10 relations.

We evaluated automatically extracted clusters consisting of two or more pairs. For each cluster, the most frequent relation represents the relation of the cluster. For example, if a cluster contains seven NE pairs of relation A and three NE pairs of relation B, the cluster is labeled as A.
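The two selection constraints of Section 3.5 can be sketched as follows. This is only one possible reading and is not taken from the paper: the source does not spell out how "frequent common words" are scored, so the sketch assumes a word's relative frequency is its share of all non-stop-word occurrences over the distinct phrases of the cluster, and keeps a phrase when the summed relative frequency of its words exceeds 0.4 (the value mentioned in Section 4.2). The stop-word list, the input format and the name `select_phrases` are hypothetical. The two constraints are combined by disjunction, as in the combined setting of Section 4.2.

```python
from collections import Counter

# Illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "in", "with", "for", "s"}


def select_phrases(cluster, min_pairs=2, min_score=0.4):
    """cluster: {ne_pair: list of phrases}, each phrase a tuple of words.

    Returns the set of phrases that satisfy either constraint.
    """
    # Phrase duplication constraint: count distinct NE pairs per phrase.
    pair_count = Counter(ph for phrases in cluster.values() for ph in set(phrases))
    duplicated = {ph for ph, n in pair_count.items() if n >= min_pairs}

    # Common word constraint (assumed scoring, see the lead-in).
    word_count = Counter(
        w for ph in pair_count for w in set(ph) if w not in STOP_WORDS
    )
    total = sum(word_count.values())

    def score(ph):
        if not total:
            return 0.0
        return sum(word_count[w] for w in set(ph) if w not in STOP_WORDS) / total

    common = {ph for ph in pair_count if score(ph) > min_score}
    return duplicated | common
```

On a toy M&A cluster, a phrase such as "is discussing with" occurs with only one NE pair and carries no frequent word, so it is dropped, while phrases containing "buy" or shared across pairs are kept.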
When the relation of an NE pair is the same as the label of the cluster, it is counted as correct; the correct pair count, $N_{\mathrm{correct}}$, is defined as the total number of correct pairs in all clusters. Other NE pairs in the cluster are counted as incorrect; the incorrect pair count, $N_{\mathrm{incorrect}}$, is likewise defined as the total number of incorrect pairs in all clusters. We evaluate the clusters based on Recall, Precision and F-measure. The definitions of these measures are as follows.

[Recall (R)] How many correct pairs are detected out of all the key pairs? The key pair count, $N_{\mathrm{key}}$, is defined as the total number of pairs manually classified in clusters of two or more pairs.

$R = N_{\mathrm{correct}} / N_{\mathrm{key}}$

[Precision (P)] How many correct pairs are detected among the pairs clustered automatically?

$P = N_{\mathrm{correct}} / (N_{\mathrm{correct}} + N_{\mathrm{incorrect}})$

[F-measure (F)] F-measure is defined as a combination of recall and precision according to the following formula:

$F = 2RP / (R + P)$

These values vary depending on the threshold of cosine similarity. We fixed the cosine threshold at a single value just above 0 for both domains, which gives almost maximum F values for both domains. This setting does not require parameter optimization, and we believe it works for other domains as well, because it means that all members of a cluster have to have at least one word in common with the other members of the same cluster. We got 34 clusters in the PER-GPE domain and 15 clusters in the COM-COM domain. Table 1 shows the result in both domains. We achieved an F-measure of 80 in the PER-GPE domain and 75 in the COM-COM domain.

[Table 1. Result of relation clustering. Columns: Domain, Prec., Recall, F. Rows: PER-GPE, COM-COM.]

4.2 Paraphrase acquisition

In the second stage, we acquire paraphrases from the clusters of the same relation. Although we obtained some meaningful relations in smaller clusters, we focus on the larger clusters, those with more than 4 members. We found that all large clusters have meaningful major relations and that the common words in those clusters accurately represented the relations. The large clusters represent the President, Senator, Prime Minister, Governor, Secretary, Republican and Coach relations in the PER-GPE domain, and the M&A, Parent and Alliance relations in the COM-COM domain.
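The cluster evaluation of Section 4.1 (majority labeling followed by Recall, Precision and F-measure) can be sketched as below. This is a minimal illustration; the function name `evaluate`, the input format and passing the key pair count as a plain argument are assumptions made for the example.

```python
from collections import Counter


def evaluate(clusters, gold, n_key):
    """Score automatic clusters against manual relation labels.

    clusters: list of lists of NE pairs (automatic clustering)
    gold:     {ne_pair: manually assigned relation}
    n_key:    number of key pairs, as defined in Section 4.1
    """
    n_correct = n_incorrect = 0
    for members in clusters:
        if len(members) < 2:
            continue  # only clusters of two or more pairs are evaluated
        labels = Counter(gold[pair] for pair in members)
        # The most frequent relation labels the cluster; its members
        # count as correct, all others as incorrect.
        majority = labels.most_common(1)[0][1]
        n_correct += majority
        n_incorrect += len(members) - majority

    clustered = n_correct + n_incorrect
    recall = n_correct / n_key if n_key else 0.0
    precision = n_correct / clustered if clustered else 0.0
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom else 0.0
    return precision, recall, f_measure
```

For a single cluster with seven pairs of relation A and three of relation B, the cluster is labeled A, giving 7 correct and 3 incorrect pairs and therefore a precision of 0.7.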
We made a reference data set of paraphrases by looking at the phrases in each cluster for both domains. We eliminated phrases that occurred only once and phrases which consist of only symbols and stop words from the evaluation. With respect to the major relation, each phrase is categorized into one of the following 4 classes. Table 2 shows the distribution of the phrases.

[Class 1:] Phrases which represent the major relation (i.e. strict paraphrases)
[Class 2:] Phrases which almost represent the major relation but include extra words (i.e. more restrictive relations)
[Class 3:] Phrases which suggest broader meaning than just the major relation (i.e. more general relations)
[Class 4:] Phrases which cannot be regarded as representing the major relation (i.e. others)

[Table 2. Reference data of phrase classes. Header: Phrase class, total. Rows: PER-GPE, COM-COM.]

[Table 3. Evaluation result for paraphrase discovery. Domains: P-G, C-C. Column groups: Baseline, Phrase duplication, Common word, Phrase+Common, each reporting R, P and F.]
Then, we evaluated the result of the paraphrase acquisition experiment for the PER-GPE domain and the COM-COM domain. There are three criteria: 1) setting the key phrases to be those in Class 1 (strict paraphrases), 2) those in Class 1 plus Class 2, and 3) those in Class 1, Class 2 plus Class 3 (loose paraphrases). The loose paraphrases could be useful in an IE application. Even though the phrases are not interchangeable in general, they can be used to extract information once the task is specific. The evaluation metrics are the usual Recall, Precision and F-measure.

Table 3 shows the evaluation results using different constraints (i.e. no constraint, the phrase duplication constraint, the common word constraint and the combined constraints). In the combined constraints, phrases which satisfy either constraint are kept, rather than only those satisfying both constraints (i.e. disjunction, rather than conjunction). In the common word constraint, we select the phrases for which the sum of the relative frequencies for each common word was above 0.4. The recall is calculated relative to the case of no constraint (baseline), as we are comparing the phrase sets among the phrases in the baseline, so the recall is 100% for the baseline experiment. However, the precision for the baseline is low because the reference data included a lot of irrelevant phrases. The aim of the two constraints is to push the precision higher while keeping the recall high.

The best result is obtained with the common word constraint in the PER-GPE domain, and with the combined constraints in the COM-COM domain. In general, the common word constraint helps to improve the precision compared to the phrase duplication constraint. This means that the paraphrases in the clusters are not shared by different NE pair instances very much, even though the paraphrases share some words in common. The COM-COM domain contains a greater variety of phrases than the PER-GPE domain.
In the PER-GPE domain, there is a rather small number of typical phrases for the relation (e.g., "A is the President of B"). We believe that the PER-GPE domain contains more static relations, compared with the COM-COM domain, which contains more event relations. This assumption is also suggested by the result that the phrase duplication constraint works better in the PER-GPE domain.

Table 4 shows some examples of successfully acquired paraphrases for the M&A and Parent relations in the COM-COM domain using the combined constraints. These phrases are paraphrases and would be useful for applications like Information Retrieval, Question Answering or Information Extraction.

President:
A, the president of B
B's new President A
B's newly elected President, A
A becomes president of B
B under President A

M&A:
A bought B
A has agreed to buy B
A, which is buying B
A's proposed acquisition of B
A's acquisition of B
A's agreement to buy B
A's purchase of B
A bid for B
A's takeover of B
A merger with B
A succeeded in buy B
B, which was acquired by A
B would become a subsidiary of A
B agreed to be bought by A

Parent:
A, a unit of the B
A, owned by B
A' parent, B
B, the parent company of A
B, hold company for A
B, the company that own A

Table 4. Examples of discovered paraphrases

5 Discussion

In this section, we will discuss several issues regarding the proposed method.

Error Analysis

We analyze the errors which lower the precision (false alarms), a problem primarily in the PER-GPE domain. This analysis was done for the data using the combined constraints. We categorized the errors into the following four types. The distribution of the errors is shown in Table 5.
[Error 1:] Phrase contains two different phrases
[Error 2:] Relation discovery error
[Error 3:] Relations dependent on context
[Error 4:] Other errors

[Table 5. Error distribution. Header: Error, P-G, C-C.]

The most severe error type (Error 1) involves phrases which actually contain different phrases. An example of such a phrase is "visited France (GPE), when President Chirac (PERSON) invite the world leaders". Because France and Chirac are a frequently co-occurring NE pair and the phrase (actually a sequence of words) "GPE, when President PERSON" satisfies the common word constraint, it was taken as a paraphrase candidate. This kind of error lowered the precision, but we believe that if we can use a parser to find the boundaries of phrases, this error might be eliminated.

Error 2 involves an example like "U.S. (GPE) Vice President Al Gore (PERSON)". As its context contains the word "President", the NE pair is regarded as having the president relation. This should be solved by using frequent multi-word terms as keywords, but this remains future work.

An example of Error 3 is the phrase "Tommy Thompson (PERSON), a Republican from Wisconsin (GPE)". Actually, Mr. Tommy Thompson is not a senator or a representative, but a governor. When the sentence appears in a context that discusses the views of different governors (i.e. it is obvious from the context that he is a governor), it does not mention "Governor" explicitly. So the phrase can be a paraphrase of the governor relation in such a context, but not always. We don't have a good idea for solving this kind of error.

Limitation and Future Direction

Our method has some limitations. We set several frequency thresholds, so we can't find less frequent relations between NE pairs and can't find paraphrases for such relations. However, we think that we could possibly resolve the limitation by two approaches. One approach is to increase the amount of text. We used only a one-year corpus for this experiment, but much larger corpora exist, e.g. newspaper corpora covering more than 10 years, or much larger corpora of Web texts. If we can use such corpora, hopefully the sparseness problem will be diminished. The other approach is to combine bootstrapping methods (Brin 98; Agichtein and Gravano 00) with our relation discovery stage. We first find reliable paraphrases using frequent instances; then, using the obtained knowledge, less frequent instances will be found.

Unsupervised Methods

The proposed method is a fully unsupervised method. When we look back over the last decade, there have been great advances in many fields of NLP using supervised machine learning. These include corpus-based POS taggers, NE taggers and treebank-based parsers. We believe that this was possible because those tasks can be decomposed into simple categorization tasks, and the amount of training text required is small enough to be prepared with reasonable time and effort. However, most of the serious NLP applications require a higher level of knowledge, in particular semantic knowledge. We believe that this problem can't be solved as a small categorization task. So, recently we have observed an increasing focus on discovering semantic knowledge from untagged corpora, for example (Hearst 92; Riloff 96; Sudo et al. 03). The work in this paper aims at the same objective, which is to find useful semantic knowledge from untagged corpora using unsupervised methods. As we are fortunate to be able to use enormous corpora, which was not possible 10 years ago, we believe this will be a fruitful direction for investing our efforts to advance NLP technologies.

6 Conclusion

In this paper, we proposed an unsupervised method to discover paraphrases via relation discovery. The basic idea was, first, to discover the relation between named entities by clustering their contexts, and then to select phrases expressing a major relation of the cluster by using the phrase duplication constraint and the common word constraint.
Our experiments with one year of newspaper text reveal that we were able to discover a variety of paraphrases with high precision and high recall through the phrase selection constraints as well as the relation discovery process.

References

Agichtein, Eugene and Gravano, Luis. 2000. Snowball: Extracting relations from large plain-text collections. In Proc. of the 5th ACM International Conference on Digital Libraries (ACM DL00).

Barzilay, Regina and McKeown, Kathleen. 2001. Extracting paraphrases from a parallel corpus. In Proc. of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-EACL01).

Brin, Sergey. 1998. Extracting patterns and relations from the World Wide Web. In Proc. of the WebDB Workshop at the 6th International Conference on Extending Database Technology (WebDB98).

Hasegawa, Takaaki, Sekine, Satoshi and Grishman, Ralph. 2004. Discovering Relations among Named Entities from Large Corpora. In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL04).

Hearst, Marti A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proc. of the Fourteenth International Conference on Computational Linguistics (COLING92).

Lin, Dekang and Pantel, Patrick. 2001. DIRT: discovery of inference rules from text. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD01).

Ravichandran, Deepak and Hovy, Eduard. 2002. Learning Surface Text Patterns for a Question Answering System. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL02).

Riloff, E. 1996. Automatically Generating Extraction Patterns from Untagged Text. In Proc. of the 13th National Conference on Artificial Intelligence (AAAI96).

Sekine, Satoshi, Sudo, Kiyoshi and Nobata, Chikashi. 2002. Extended Named Entity Hierarchy. In Proc. of the Third International Conference on Language Resource and Evaluation (LREC02).

Shinyama, Yusuke and Sekine, Satoshi. 2003. Paraphrase acquisition for information extraction. In Proc. of the Second International Workshop on Paraphrasing (IWP03).

Sudo, Kiyoshi, Sekine, Satoshi and Grishman, Ralph. 2003. An improved extraction pattern representation model for automatic IE pattern acquisition. In Proc. of the 41st Annual Meeting of the Association for Computational Linguistics (ACL03).
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG
More informationResolving Common Analytical Tasks in Text Databases
Resolving Common Analytical Tasks in Text Databases The work is funded by the Federal Ministry of Economic Affairs and Energy (BMWi) under grant agreement 01MD15010B. Database Systems and Text-based Information
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationDomain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationEnhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects
Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com
More informationReal-Time Identification of MWE Candidates in Databases from the BNC and the Web
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text
More informationFacilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
More informationWikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
More informationChapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More information72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology Paulo.gottgtroy@aut.ac.nz Abstract This paper is
More informationHow To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationA Framework for Named Entity Recognition in the Open Domain
A Framework for Named Entity Recognition in the Open Domain Richard Evans Research Group in Computational Linguistics School of Humanities, Languages, and Social Sciences University of Wolverhampton Stafford
More informationSentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
More informationCross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
More informationGenerating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
More informationData Selection in Semi-supervised Learning for Name Tagging
Data Selection in Semi-supervised Learning for Name Tagging Abstract We present two semi-supervised learning techniques to improve a state-of-the-art multi-lingual name tagger. They improved F-measure
More informationDuplication in Corpora
Duplication in Corpora Nadjet Bouayad-Agha and Adam Kilgarriff Information Technology Research Institute University of Brighton Lewes Road Brighton BN2 4GJ, UK email: first-name.last-name@itri.bton.ac.uk
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationBrill s rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
More informationTREC 2003 Question Answering Track at CAS-ICT
TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China changyi@software.ict.ac.cn http://www.ict.ac.cn/
More informationCross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
More informationWhy is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
More informationPOSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationMovie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationCloud Storage-based Intelligent Document Archiving for the Management of Big Data
Cloud Storage-based Intelligent Document Archiving for the Management of Big Data Keedong Yoo Dept. of Management Information Systems Dankook University Cheonan, Republic of Korea Abstract : The cloud
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationdm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationNamed Entity Recognition in Broadcast News Using Similar Written Texts
Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium niraj.shrestha@cs.kuleuven.be ivan.vulic@@cs.kuleuven.be Abstract
More informationA Mutually Beneficial Integration of Data Mining and Information Extraction
In the Proceedings of the Seventeenth National Conference on Artificial Intelligence(AAAI-2000), pp.627-632, Austin, TX, 20001 A Mutually Beneficial Integration of Data Mining and Information Extraction
More informationBayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
More information3 Learning IE Patterns from a Fixed Training Set. 2 The MUC-4 IE Task and Data
Learning Domain-Specific Information Extraction Patterns from the Web Siddharth Patwardhan and Ellen Riloff School of Computing University of Utah Salt Lake City, UT 84112 {sidd,riloff}@cs.utah.edu Abstract
More informationTerminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
More informationMachine Learning Approach To Augmenting News Headline Generation
Machine Learning Approach To Augmenting News Headline Generation Ruichao Wang Dept. of Computer Science University College Dublin Ireland rachel@ucd.ie John Dunnion Dept. of Computer Science University
More informationHow To Cluster On A Search Engine
Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING
More informationSentiment-Oriented Contextual Advertising
Sentiment-Oriented Contextual Advertising Teng-Kai Fan, Chia-Hui Chang Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320, ROC tengkaifan@gmail.com,
More informationLarge-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
More informationArtificial Intelligence and Transactional Law: Automated M&A Due Diligence. By Ben Klaber
Artificial Intelligence and Transactional Law: Automated M&A Due Diligence By Ben Klaber Introduction Largely due to the pervasiveness of electronically stored information (ESI) and search and retrieval
More informationOn-Demand Information Extraction. Summer/Fall 07. New York University Satoshi Sekine
On-Demand Information Extraction Summer/Fall 07 New York University Satoshi Sekine Introduction (http://nlp.cs.nyu.edu/sekine) Research topics On-demand IE IE pattern Discovery Multi/Sing doc. sum. IE
More informationCustomer Intentions Analysis of Twitter Based on Semantic Patterns
Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun mohamed.hamrounn@gmail.com Mohamed Salah Gouider ms.gouider@yahoo.fr Lamjed Ben Said lamjed.bensaid@isg.rnu.tn ABSTRACT
More informationSemantic Class Induction and Coreference Resolution
Semantic Class Induction and Coreference Resolution Vincent Ng Human Language Technology Research Institute University of Texas at Dallas Richardson, TX 75083-0688 vince@hlt.utdallas.edu Abstract This
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationPersonalization of Web Search With Protected Privacy
Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information
More informationDomain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora
Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora Jing-Shin Chang Department of Computer Science& Information Engineering National
More informationAutomated News Item Categorization
Automated News Item Categorization Hrvoje Bacan, Igor S. Pandzic* Department of Telecommunications, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia {Hrvoje.Bacan,Igor.Pandzic}@fer.hr
More informationAn Empirical Study on Web Mining of Parallel Data
An Empirical Study on Web Mining of Parallel Data Gumwon Hong 1, Chi-Ho Li 2, Ming Zhou 2 and Hae-Chang Rim 1 1 Department of Computer Science & Engineering, Korea University {gwhong,rim}@nlp.korea.ac.kr
More informationAnalyzing survey text: a brief overview
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
More informationOptimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,
More informationAccelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
More informationExtracting Events from Web Documents for Social Media Monitoring using Structured SVM
IEICE TRANS. FUNDAMENTALS/COMMUN./ELECTRON./INF. & SYST., VOL. E85A/B/C/D, No. xx JANUARY 20xx Letter Extracting Events from Web Documents for Social Media Monitoring using Structured SVM Yoonjae Choi,
More informationUsing Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams
2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationA Comparative Study on Sentiment Classification and Ranking on Product Reviews
A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan
More informationA Survey of Text Mining Techniques and Applications
60 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 1, NO. 1, AUGUST 2009 A Survey of Text Mining Techniques and Applications Vishal Gupta Lecturer Computer Science & Engineering, University
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationOverview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationDeveloping a Collaborative MOOC Learning Environment utilizing Video Sharing with Discussion Summarization as Added-Value
, pp. 397-408 http://dx.doi.org/10.14257/ijmue.2014.9.11.38 Developing a Collaborative MOOC Learning Environment utilizing Video Sharing with Discussion Summarization as Added-Value Mohannad Al-Mousa 1
More informationBridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project
Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded
More informationFinding Advertising Keywords on Web Pages. Contextual Ads 101
Finding Advertising Keywords on Web Pages Scott Wen-tau Yih Joshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University Contextual Ads 101 Publisher s website Digital Camera Review The
More informationQualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1
Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Introduction Electronic Commerce 2 is accelerating dramatically changes in the business process. Electronic
More informationTerminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet, Mathieu Roche To cite this version: Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet,
More informationSustaining Privacy Protection in Personalized Web Search with Temporal Behavior
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,
More informationAn ontology-based approach for semantic ranking of the web search engines results
An ontology-based approach for semantic ranking of the web search engines results Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name
More informationSemantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
More informationONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
More information