Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual on-line dictionaries and WordNet
|
|
|
- Baldwin Black
- 9 years ago
- Views:
Transcription
1 Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual on-line dictionaries and WordNet Tuğba YILDIZ 1,*, Banu DİRİ, Savaş YILDIRIM 1 1 Department of Computer Engineering, Faculty of Engineering and Natural Science, İstanbul Bilgi University, İstanbul, Turkey Department of Computer Engineering, Faculty of Engineering, Yıldız Technical University, İstanbul, Turkey *Correspondence: [email protected] Abstract: In this study, a model is proposed to determine synonymy by incorporating several resources. The model extracts the features from monolingual on-line dictionaries, a bilingual on-line dictionary, WordNet and a monolingual Turkish corpus. Once it has built a candidate list, it determines the synonymy for a given word by means of those features. All these resources and the approaches are evaluated. Taking all features into account and applying machine learning algorithms, the model shows good performance of F-Measure with 1.%. The study contributes to the literature by integrating several resources and attempting the first corpus-driven synonym detection system for Turkish. Key words: Synonym, dependency relations, corpus-based statistics 1. Introduction As one of the most well-known semantic relations, synonymy has been subject to numerous studies. Synonym is defined as Expressions with the same meaning are synonymous [1]. The identification of synonym relations automatically helps to address various Natural Language Processing (NLP) applications, such as information retrieval and question answering [,], automatic thesaurus construction [,], automatic text summarization [], language generation [], and lexical entailment acquisition []. 1
2 A variety of methods have been proposed to automatically or semi-automatically detect synonyms from a text source, dictionaries, Wikipedia and the search engines. Among them, distributional similarity is the most popular method. It is based on the distributional hypothesis [] which assumes that semantically similar words share similar contexts. There have been many studies [,,] which use distributional similarity in the automatic extraction of semantically related words from large corpora. Distributional approaches are applied to monolingual corpora [,1], monolingual parallel corpora [1,1], bilingual corpora [1,1], multilingual parallel corpora [1], monolingual dictionaries [1,1] and bilingual dictionaries [1]. Some of the studies [0-] rely on multiple choice synonym questions such as the Scholastic Aptitude Test (SAT), the Test of English as a Foreign Language (TOEFL), the English as a Second Language (ESL). Although distributional similarity is the most popular method for extracting synonymy, this methodology itself can be ambiguous and insufficient. This is because it can cover other semantically related words and might not distinguish between synonyms and other semantic relations. Hence, recent studies have used different strategies such as integrating independent approaches such as distributional similarity and pattern-based approaches, utilizing external features or the ensemble method which uses multiple learning algorithms to obtain more accuracy. One study [] integrated the pattern-based and distributional similarity methods to acquire lexical entailment. Another study [] investigated the impact of contextual information selection for automatic synonym acquisition by extracting kinds of contextual information; dependency, sentence cooccurrence, and proximity from different corpora. The authors proposed that while dependency relations and proximity perform relatively well by themselves, combination of two or more kinds of contextual information gives more stable result. In another study
3 [], a synonym extraction method was proposed using supervised learning based on distributional and/or pattern-based features. Yet another study [] used vector-based models to detect semantically related nouns in Dutch and analyzed the impact of linguistic properties of the nouns. The authors compared the results from a dependencybased model with context features with first and second order bag-of-words model. They examined the effect of the nouns frequency, semantic specificity and semantic class. In one of the recent studies [], the authors introduced synonym discovery as a graded relevance ranking problem in which synonym candidates of a target term are ranked by their quality. The method used linear regression with contextual features and one string similarity feature. The method was compared with different methods [1,]. The proposed method outperformed the existing ones. For Turkish, a WordNet for Turkish was proposed as a part of the BalkaNet project, which is a multilingual lexical database comprising of individual WordNets for the Balkan languages []. Dictionary entries were parsed and patterns such as the following were used; hw: w1, w wn and hw: (wi)*,w, where, hw is a head word and wi is a single word. In these cases, the authors harvested synonyms from the dictionary definition that consists of a list of synonyms and that are separated by comma. K and.k sets of potential synonyms were extracted. Recent studies for Turkish [0,1] are based on dictionary definitions from Türk Dil Kurumu (TDK) Büyük Türkçe Sözlük (The Turkish Language Association Comprehensive Turkish Dictionary) and VikiSözlük (Wiktionary- Wiki) to find specific semantic relations. As a first step, the researchers defined some phrasal patterns that were observed in dictionary definitions to represent specific semantic relations. Then, reliable patterns were applied to the dictionary to find the semantic relations. In study [0], the researchers extracted.k relations with.% accuracy
4 rate on average. The same process and the same patterns were applied to TDK and also to Wiki [1]. Only 0 out of K synonymy relations were taken into consideration with % success rate. As another attempt to detect synonyms in Turkish [], a corpus-driven distributional similarity model was proposed based on only features; i.e. dependency relations and semantic features obtained by syntactic patterns and Lexico-Syntactic Patterns (LSPs), respectively. The linear regression algorithm was applied to all acquired features and it showed promising results with 0.% success rate. In the current study, the overall objective was to determine synonym nouns from a monolingual Turkish corpus. For each target word, the model automatically proposes the closest K candidate words that are built by means of dependency relations which refer to certain kinds of grammatical relations between words and phrases such as subjects and objects of verbs, and modifications of nouns. The similarities between the target and each candidate were computed using many different features ranging from dictionary definitions to corpus-based statistics. The class labels of the pairs were obtained from monolingual on-line synonym dictionaries. Finally, linear regression was successfully applied on the data including all these extracted features. The proposed model and the extracted features are presented in Figure. This study also incorporated semantic relations such as hypernymy, meronymy, etc. gathered from a corpus by LSPs which are string matching patterns based on tokens in the text and syntactic structure and used dependency relations for the problem of synonymy detection. The main difference and motivation of this framework is that it is the first major attempt for Turkish synonym identification based on a corpus-driven distributional similarity approach using multiple resources such as monolingual on-line dictionaries, a bilingual on-line dictionary and the WordNet.
5 Methodology.1. Resources The methodology employed here is to identify the synonym pairs with the aid of a monolingual Turkish corpus of 00M words. A finite-state implementation of a morphological parser and an averaged perceptron-based morphological disambiguator were also used []. The parser is based on a two-level morphology with accuracy of %. Corpus contains four sub-corpora; three of them are from major Turkish news portals and another corpus is a general sampling of web pages in the Turkish Language. In addition, monolingual on-line dictionaries (TDK and Wiki), a bilingual on-line dictionary (TURENG * ) and WordNet were utilized to build features... Features In this study, the target/candidate words are represented by a set of features which are compatible with machine learning algorithms. Features are extracted from different resources: a monolingual Turkish corpus, monolingual on-line dictionaries, a bilingual on-line dictionary and WordNet. For class labels, monolingual on-line synonym dictionaries are used to tag a given synonym pair. All the process and features are shown in Figure...1. Features from a monolingual Turkish corpus Our corpus-based feature extraction methodology relies on the assumption that synonym pairs mostly show similar dependency and semantic characteristics in a monolingual corpus. In terms of semantic relations, if words share the same meronym/holonym and the hyponym/hypernym relations, they are more likely to be synonymous. With a similar * Tureng:
6 approach, they can have the same particular list of governing verbs and a similar adjective modification profile. 1 different features are extracted from the corpus: co-occurrence, semantic relations based on LSPs and dependency relations based on syntactic patterns. Co-occurrence: The first feature is the co-occurrence of word pairs within a broad context where window size is from left and right. Synonym pairs are not likely to cooccur together in same discourse. Thus, a simple co-occurrence measure might not be used for synonymy but for non-synonymy. We experimentally selected the dice metric to measure the co-occurring feature. Meronym/Holonym: The meronymy/holonymy relation is used to detect a synonymy relation. For meronym relation extraction, some LSPs are analyzed into forms for the Turkish corpus; General Patterns, Dictionary-based Patterns and Bootstrapped Patterns as proposed in a previous work []. All the details and examples of pattern specifications can be found in []. After applying these LSPs, some elimination assumptions and measurement metrics such as χ or Pmi are utilized to acquire the meronym/holonym relation. A big matrix in which rows depict whole candidates, columns depict part candidates, is derived. Each cell in matrix represents the co-occurrence of the corresponding whole and part. Cosine similarity is used to measure the meronym/holonym profile of given words by applying it to the rows/columns in the matrix. Hyponym/Hypernym: The same procedure used for meronymy acquisition is applicable for the hyponym/hypernym relation. Likewise, a big matrix in which rows depict hyponym candidates, columns depict hypernym candidates, is derived by the hyponymy and hypernymy patterns that were explained in details [].
7 Dependency Relations: The dependency relation profile of the nouns can be utilized for the synonymy detection problem. For example, verb/direct object and modifying adjective/noun relations are easily captured by syntactic patterns. In the phrases, "driving car" and "driving auto", car and auto share the same governing verb "drive". Other verbs such as design, produce, manufacture, might also take place in the same relation with these two nouns and they can be easily captured by the patterns. Such cases are simply and technically recognized by regular expressions. The more two given nouns are governed by the same verb and modified by same adjective, the more likely it is that they are synonyms. The dependency relation profiles of all nouns are built in advance from the corpus, then the similarity between any given two nouns is computed over these profiles. The other technical details of dependency relation features are also shown in Table 1. Candidate Selection for Synonymy: We randomly selected 0 target words from the corpus to test the model. To harvest the candidates for a given target word, the system only incorporates dependency relations because of its production capacity and simplicity. We looked at the dependency relation features which are strong indicators or predictors of semantic relations. For this purpose, the first K, which was chosen to be after a number of tests and the nearest neighboring words to a given 0 target words were selected in terms of dependency relation features. Four semantic relations were considered for each target word and its nearest words: Hypernym/Hyponym, Meronym/Holonym, Synonym and Co-Hyponym. The objective is to find which dependency relation features are the most informative and have a tendency to indicate a synonym relation. The model uses them to produce candidate synonyms. For example, the G dependency feature proposed nearest words of car as vehicle, automobile, jeep,
8 engine, etc. These words are tagged as Hypernym, Synonym, Co-Hyponym and Meronym, respectively. All the nearest words are detected and manually evaluated for each target word. According to our results, features G1 and G are the most productive for the Meronym/Holonym relations with a performance of.%. Feature G has the highest tendency for a Hypernym/Hyponym relation with 1.% success rate. While features G and G have tendencies for a Co-Hyponym with a performance of 1.% and.%, features G and G are the most successful dependency relations for Synonym detection with.0% and 0.% success rate. Thus, for each target word, only the candidates proposed by features G and G were taken into consideration. The output of this process gives us up to 1 candidates for 0 target words.... Features from monolingual on-line dictionaries Dictionaries are the most popular source for acquiring a synonymy relation. In our study, dictionary definitions of target/candidate words are incorporated to compute the similarity of words. All the definitions of target/candidate words are accessed through the TDK and Wiki. If the target word and its potential synonym mutually appear in their definitions in TDK and Wiki, they are labeled as true, otherwise false, as a Boolean value.... Features from WordNet This phase is divided into steps: the translation of each pair from Turkish to English using a bilingual on-line dictionary and the features extracted from WordNet::Similarity modules for each target/candidate word pairs. Bilingual on-line dictionary: Using bilingual dictionaries is another method for extracting semantic relations, especially for extracting the synonymy relations. We also utilized the bilingual on-line dictionary, Tureng. Each target and candidate synonym was translated from Turkish to English to exploit the modules in WordNet resources.
9 WordNet Corporation: WordNet::Similarity [] is a freely available software package which contains a variety of semantic similarity and relatedness measures based on WordNet. It supports the measures of Hirst-St.Onge (HSO), Jiang-Conrath (JCN), Leacock-Chodorow (LCH), Lin (LIN), Banerjee-Pedersen (LESK), Patwardhan- Pedersen (PATH), Resnik (RES), Wu-Palmer (WUP), Vector and Vector Pair. While some similarity measures are based on path lengths between concepts, some are based on information content or vector modules. Translated target/candidate pairs are given to all modules. Their similarities are computed and normalized to be between 0-1. The sum of all normalized scores is also kept as an additional potential feature, namely TOTAL.... Synonym Classification The target class is labeled as SYN/NONSYN by using monolingual on-line synonym dictionaries (TDK and Wiki). Synonymy of all pairs is mutually checked and tagged. While most of the attributes contain real values between 0-1, a few of them contain Boolean data. Linear regression is an excellent and simple approach for such a classification.. Results and Discussions The similarities between the target and each candidate are computed by different features. The evaluation of attributes can be simply done by looking at their information gain (IG) scores in Table. All attributes but co-occurrence provide a positive indicator to detect synonymy. At first glance, the WordNet and dictionary-based similarities seem to show good performance. The most successful algorithms of WordNet are LCH and WUP. They are based on depth and shortest-path approaches. All attributes regarding corpus-based similarities are provided at the end of the list in Table. Among corpusbased attributes, the most important dependency relations are G and G. Among the
10 semantic relations, the meronym and holonym relations have better performance than the others. The co-occurrence measure shows a very slight indicator capacity. Error analysis is based on some observations and simple statistics. We applied univariate analysis, the simplest form of statistical analysis, where each individual feature is evaluated within the entire training set. This analysis gives a chance to explore the characteristics of the variables. For example, as a result of this analysis, we found that some semantic relations such as hypernymy suffer from sparse data and lack of corpus evidence for words. Taking all the attributes as features set of training data, the success rate is.%, and the F-Measure for synonym is 1.%, where False Positive Rate is % and False Negative Rate is 1.%, as shown in Table. When running the model only with WordNet similarity scores, linear regression algorithm performs an F-measure score of.% as given in Table. Although WordNet has a good performance, it is times more expensive than other dictionary-based approaches. This is because the words need to be translated into English, afterwards their similarities are measured through WordNet packages. The second weakest point is the selection of the translation of the first sense of the given word. The other translations and senses are ignored. WordNet needs to be used due to the incompleteness of Turkish WordNet []. That lack makes the model more expensive. Likewise, the success of the dictionary-based approach is % and the F-measure is.% as shown in Table. It is the most naïve approach; the definitions of words are automatically retrieved from the dictionary and their mutual presence are utilized. However, the approach can be considered costly due its nature, unless we dump the whole content of the dictionaries. As expected, the corpus-based model has a weak performance with an F-measure of 1.%. Table covers all the performance measurements for the corpus-based model. The main reason of the failure is that some word pairs cannot be
11 represented in a corpus-based framework. For example, the production capacity of the hypernym /hyponymy relation is limited due to the fact that not every pair is matched by LSPs designed for hypernymy. The same argument holds true for other semantic and dependency relations. In terms of time-cost, the proper features are corpus-based ones. Dependency relations are easily extracted from the corpus by syntactic patterns as shown in Table 1. Among the semantic relations, meronym/holonym and hyponym/hypernym have a high time complexity. The candidate selection phase exploits only dependency relation, since that these relations are easily captured by the system. Other corpus-based features are discarded for several reasons. The most important one is the production capacity of those relations. The hypernymy or meronymy relations do not guarantee returning a corresponding result for any given word. We applied a variety of dependency relations. The most effective relations found are G and G. Surprisingly, both relations are the alternations of adjective-modifier patterns. Here, we can conclude that the patterns of adjective modification are very useful to disclose important characteristics of words. There are no corpus-based synonym detection studies but dictionary-based ones for Turkish language to compare our results. The study [0] achieved.% accuracy rate on average with.k relations. In [1], only 0 out of K synonym relations are taken into consideration with an % success rate. In our previous study [], the F-Measure was 0.%. In the current study, we used variety of features which are obtained from multiple sources and the success rate is.%. F-Measure for synonymy is increased to 1.%. In future work, we are planning to extract antonym relation as a filter to improve the performance of synonym identification. References [1] Lyons J. Introduction to Theoretical Linguistics. University Press, Cambridge.
12 [] Kiyota Y, Kurohashi S, Kido F. Dialog navigator: A question answering system based on large text knowledge base. In: COLING 00 1th International Conference on Computational Linguistics; August-1 September 00; Taipei, Taiwan: ACL. pp. 1-. [] Riezler S, Liu Y, Vasserman A. Translating queries into snippets for improved query expansion. In: ACL 00 The nd International Conference on Computational Linguistics; 1- August 00; Manchester, UK: ACL pp. -. [] Inkpen DZ. A statistical model for near-synonym choice. ACM Transactions on Speech and Language Processing 00; : 1-1. [] Lin D. Automatic retrieval and clustering of similar words. In: COLING-ACL' th Annual Meeting of the Association for Computational Linguistics; -1 August 1; Université de Montréal, Montréal, Quebec, Canada: ACL. pp. -. [] Barzilay R, Elhadad M. Using lexical chains for text summarization. In: ACL/EACL Workshop on Intelligent Scalable Text Summarization; July 1; Madrid, Spain: ACL. pp. -1. [] Inkpen DZ, Hirst G. Near-synonym choice in natural language generation. In: Nicolov N, Bontcheva K, Angelova G, Mitkov R, editors. Recent Advances in Natural Language Processing. Amsterdam/Netherlands: John Benjamins, 00. pp. -1. [] Mirkin S, Dagan I, Geffet M. Integrating pattern-based and distributional similarity methods for lexical entailment acquisition. In: COLING ACL'0 The 1 st International Conference on Computational Linguistics and th Annual Meeting of the Association for Computational Linguistics; 1-1 July 00; Sydney, Australia: ACL pp. -. [] Harris Z. Distributional structure. Word 1; (): 1-1. [] Crouch CJ, Yang B. Experiments in automatic statistical thesaurus construction. In: ACM/SIGIR 1 The 1 th Annual International ACM/SIGIR Conference on Research 1
13 and Development in Information Retrieval; 1- June 1; Copenhagen, Denmark: ACM. pp. -. [] Grefenstette G. Explorations in Automatic Thesaurus Discovery. Norwell, MA, USA: Kluwer Academic Press, 1. [1] Van der Plas L, Bouma G. Syntactic contexts for finding semantically related words. In: Computational Linguistics in the Netherlands; 1 December 00; LOT, Utrecht. pp [1] Barzilay R, McKeown K. Extracting paraphrases from a parallel corpus. In: ACL 001 The th Annual Meeting and th Conference of the European Chapter, Proceedings of the Conference; - July 001; Toulouse, France: ACL. pp. 0-. [1] İbrahim A, Katz B, Lin J. Extracting structural paraphrases from aligned monolingual corpora. In: ACL 00 The nd International Workshop on Paraphrasing: Paraphrase Acquisition and Applications; July 00; Sapporo, Japan: ACL. pp. -. [1] Shimohata M, Sumita E. Automatic paraphrasing based on parallel corpus for normalization. In: The rd International Conference on Language Resources and Evaluation; 0-1 May 00; Las Palmas, Canary Island, Spain, pp. -. [1] Van der Plas L, Tiedemann J. Finding synonyms using automatic word alignment and measures of distributional similarity. In: ACL-COLING'0 The 1 st International Conference on Computational Linguistics and th Annual Meeting of the Association for Computational Linguistics; 1-1 July 00; Sydney, Australia: ACL. pp. -. [1] Blondel VD, Sennelart P. Automatic extraction of synonyms in a dictionary. In: SDM 00 The nd SIAM International Conference on Data Mining; 1 April 00; Arlington, VA. pp
14 [1] Wang T, Hirst G. Exploring patterns in dictionary definitions for synonym extraction. Natural Language Engineering 01; 1(): 1-. [1] Lin D, Zhao S, Qin L, Zhou M. Identifying synonyms among distributionally similar words. In: The 1 th International Joint Conference on Artificial Intelligence (IJCAI-0); -1 August 00; Acapulco, Mexico. pp [0] Freitag D, Blume M, Byrnes J, Chow E, Kapadia S, Rohwer R, Wang Z. New experiments in distributional representations of synonymy. In: ACL 00 The th Conference on Computational Natural Language Learning; -0 June 00; Ann Arbor, Michigan, USA: ACL. pp. -. [1] Terra E, Clarke CLA. Frequency estimates for statistical word similarity measures. In: HLT-NAACL 00, Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics; May-1 June 00; Edmonton, Canada: ACL. pp [] Turney PD. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In: The 1 th European Conference on Machine Learning (ECML'01) and the th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01); - September 001; Freiburg, Germany. pp [] Turney PD, Littman ML, Bigham J, Shnayder V. Combining independent modules in lexical multiple-choice problems. In: Nicolov N, Bontcheva K, Angelova G, Mitkov R, editors. Current Issues in Linguistic Theory. Amsterdam/Netherlands: John Benjamins, 00. pp [] Landauer TK, Dumanis ST. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological review 1; : -0. 1
15 [] Hagiwara M, Ogawa Y, Toyama K. Selection of effective contextual information for automatic synonym acquisition. In: ACL 00 The 1 st International Conference on Computational Linguistics and th Annual Meeting of the ACL; 1-1 July 00; Sydney, Australia: ACL. pp. -0. [] Hagiwara M. A supervised learning approach to automatic synonym identification based on distributional features. In: ACL 00 The th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Student Research Workshop; 1-0 June 00; Columbus, Ohio, USA: ACL. pp. 1-. [] Heylen K, Peirsman Y, Geeraerts D, Speelman D. Modelling word similarity: an evaluation of automatic synonymy extraction algorithms. In: The th International Language Resources and Evaluation; -0 May 00; Marrakech, Morocco. pp. -. [] Yates A, Goharian N, Frieder O. Graded relevance ranking for synonym discovery. In: WWW'1 Companion, The nd International Conference on World Wide Web Companion; 1-1 May 01; Rio de Janeiro, Brazil. pp. 1-. [] Bilgin O, Çetinoğlu Ö, Oflazer K. Building a WordNet for Turkish. Romanian J of Information, Science and Technology 00; : 1-1. [0] Şerbetçi A, Orhan Z, Pehlivan İ. Extraction of semantic word relations in Turkish from dictionary definitions. In: ACL 0 Workshop on Relational Models of Semantics; June 0; Portland, Oregon, USA: ACL. pp. -1. [1] Yazıcı E, Amasyalı MF. Automatic extraction of semantic relationships using Turkish dictionary definitions. EMO Bilimsel Dergi 0; 1:
16 [] Yıldız T, Yıldırım S, Diri B. An integrated approach to automatic synonym detection in Turkish corpus. In: Przepiórkowski A, Ogrodniczuk M, editors. LNCS. Germany: Springer-Verlag Berlin Heidelberg, 01. pp -1. [] Sak H, Güngör T, Saraçlar M. Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In: Nordström B, Ranta A, editors. LNCS 1. Germany: Springer-Verlag Berlin Heidelberg, 00. pp. 1-. [] Yıldız T, Yıldırım S, Diri B. Acquisition of Turkish meronym based on classification of patterns. J Pattern Analysis and Applications 01; doi:.0/s [] Yıldırım S, Yıldız T. Automatic extraction of Turkish hypernym/hyponym pairs from large corpus. In: ACL/COLING 01 th International Conference on Computational Linguistics; -1 December 01; Bombay, India: ACL. pp [] Pedersen T, Patwardhan S, Michelizzi J. Wordnet::Similarity-Measuring the relatedness of concepts. In: HLT-NAACL 00 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics; - May 00; Boston, Massachusetts, USA: ACL pp
17 0 Target words Patterns from Dependency Relations Extraction of Synonym Candidates (0*1 word pairs) Features from Monolingual Turkish Corpus Features from WordNet Features from Monolingual on-line Dictionary Definition CLASS from TDK and Wiki Semantic Features () Dependency Relation Features () Co-occurrence Features (1) Turkish-English bilingual on-line dictionary TDK WIKI TDK_WIKI Features() CLASS 1 1 WordNet Modules Features from HSO/LIN/LESK/LCH/JCN/ PATH/WUP/RESNIK/VECTOR/ VECTOR_PAIR/TOTAL () Attributes + CLASS Machine Learning Algorithm Figure Process of the model for synonym detection 1
18 F Dependency Relations #ofp Examples in English and Turkish G1 direct object of verb 1 I drive a car/araba sürüyorum G subject of verb waiting car/bekleyen araba G direct object/subject of verb - G modified by adj+(with/without) car with gasoline/benzinli araba G modified by infinitive 1 swimming pool/yüzme havuzu G modified by noun 1 toy car/oyuncak araba G modified by adj 1 red car/kırmızı araba G modified by location the car at park/parktaki araba 1 Table 1. Dependency Features (F: features, adj: adjectives, #ofp: number of patterns)
19 IG Features Types IG Features Types 0. TOTAL WordNet 0.0 G Corpus 0. LCH WordNet 0.0 G Corpus 0. WUP WordNet 0.0 Meronym Corpus 0. LIN WordNet 0.0 Holonym Corpus 0. PATH WordNet 0.0 G Corpus 0. VECTOR WordNet 0.0 G1 Corpus 0. HSO WordNet 0.0 G Corpus 0. TDK/WIKI Dictionary 0.00 Co-occur Corpus 0. RES WordNet 0 G Corpus 0. TDK Dictionary 0 G Corpus 0. VECTOR_PAIR WordNet 0 G Corpus 0. LES WordNet 0 Hypernym Corpus 0.1 JCN WordNet 0 Hyponym Corpus 0. WIKI Dictionary 1 Table. Information Gain (IG) of each features 1
20 Sources Precision Recall F-Measure WordNet NONSYN... SYN Weighted Avg Dict. Def. NONSYN... SYN... Weighted Avg...0. Corpus NONSYN... SYN Weighted Avg.... All NONSYN... SYN Weighted Avg Table. Precision, Recall and F-Measure of only features from different sources 0
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus Dr. Tuğba YILDIZ Assist. Prof. Dr. Savaş YILDIRIM Assoc. Prof. Dr. Banu DİRİ İSTANBUL BİLGİ UNIVERSITY YILDIZ TECHNICAL UNIVERSITY
Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 [email protected] Steffen STAAB Institute AIFB,
Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries
Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Patanakul Sathapornrungkij Department of Computer Science Faculty of Science, Mahidol University Rama6 Road, Ratchathewi
Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
Clustering of Polysemic Words
Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain [email protected] 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
Identifying free text plagiarism based on semantic similarity
Identifying free text plagiarism based on semantic similarity George Tsatsaronis Norwegian University of Science and Technology Department of Computer and Information Science Trondheim, Norway [email protected]
3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work
Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan [email protected]
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Improving Question Retrieval in Community Question Answering Using World Knowledge
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Word Sense Disambiguation as an Integer Linear Programming Problem
Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
Sentiment analysis for news articles
Prashant Raina Sentiment analysis for news articles Wide range of applications in business and public policy Especially relevant given the popularity of online media Previous work Machine learning based
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
What Is This, Anyway: Automatic Hypernym Discovery
What Is This, Anyway: Automatic Hypernym Discovery Alan Ritter and Stephen Soderland and Oren Etzioni Turing Center Department of Computer Science and Engineering University of Washington Box 352350 Seattle,
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
A Survey on Product Aspect Ranking Techniques
A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations Peter D. Turney National Research Council of Canada Institute for Information Technology M50 Montreal Road Ottawa, Ontario, Canada
EFFICIENTLY PROVIDE SENTIMENT ANALYSIS DATA SETS USING EXPRESSIONS SUPPORT METHOD
EFFICIENTLY PROVIDE SENTIMENT ANALYSIS DATA SETS USING EXPRESSIONS SUPPORT METHOD 1 Josephine Nancy.C, 2 K Raja. 1 PG scholar,department of Computer Science, Tagore Institute of Engineering and Technology,
The Role of Sentence Structure in Recognizing Textual Entailment
Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Effective Mentor Suggestion System for Collaborative Learning
Effective Mentor Suggestion System for Collaborative Learning Advait Raut 1 U pasana G 2 Ramakrishna Bairi 3 Ganesh Ramakrishnan 2 (1) IBM, Bangalore, India, 560045 (2) IITB, Mumbai, India, 400076 (3)
Web opinion mining: How to extract opinions from blogs?
Web opinion mining: How to extract opinions from blogs? Ali Harb [email protected] Mathieu Roche LIRMM CNRS 5506 UM II, 161 Rue Ada F-34392 Montpellier, France [email protected] Gerard Dray [email protected]
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets
Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells Computer Science Dep., Universidad Autonoma de Madrid, 28049 Madrid, Spain
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
Cross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
Intro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
End-to-End Sentiment Analysis of Twitter Data
End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India [email protected],
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
Classification of Natural Language Interfaces to Databases based on the Architectures
Volume 1, No. 11, ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Classification of Natural
Named Entity Recognition Experiments on Turkish Texts
Named Entity Recognition Experiments on Dilek Küçük 1 and Adnan Yazıcı 2 1 TÜBİTAK - Uzay Institute, Ankara - Turkey [email protected] 2 Dept. of Computer Engineering, METU, Ankara - Turkey
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
An Introduction to Random Indexing
MAGNUS SAHLGREN SICS, Swedish Institute of Computer Science Box 1263, SE-164 29 Kista, Sweden [email protected] Introduction Word space models enjoy considerable attention in current research on semantic indexing.
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France [email protected] Cyril Grouin LIMSI-CNRS rue John von Neumann 91400
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
Flattening Enterprise Knowledge
Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
