Semantic Clustering in Dutch

1 Alfa-informatica, Rijksuniversiteit Groningen Computational Linguistics in the Netherlands December 16, 2005

2 Outline 1 2 Clustering Additional remarks 3 Examples 4

3 Research carried out during an internship at the Centrum voor Nederlandse Taal & Spraak, University of Antwerp
- Goal: automatically clustering nouns (and adjectives) by applying machine learning techniques
- Basic approach: inducing semantic classes of nouns according to the adjectives those nouns collocate with (and vice versa)
- Hypothesis: syntactic context (e.g. adjectival modifiers of nouns) is a sufficient cue for semantic clustering

7 Semantic similarity
- Finding semantically similar words by looking at syntactic context (Distributional Hypothesis, Harris)
- Take a word and its contexts: verse sneup (fresh), gezouten sneup (salted), lekkere sneup (tasty), zoete sneup (sweet), taaie sneup (chewy)
- A speaker of Dutch can infer the meaning from the context: sneup must be some kind of FOOD
- In the same way, a computer might be able to discover similar words from similar contexts

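13 Vector space measures 1/2
- How to determine semantic similarity?
- Create vectors: one vector per noun (appel 'apple', wijn 'wine', auto 'car', vrachtwagen 'truck'), with one dimension per adjective (rood 'red', lekker 'tasty', snel 'fast', tweedehands 'second-hand')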
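As a rough illustration of the vector construction, the sketch below counts adjective-noun pairs and lays them out as one vector per noun. The function name `build_vectors` and the pair list are hypothetical; the pairs are made up for illustration and are not corpus counts from the talk.

```python
from collections import Counter

def build_vectors(pairs, nouns, adjectives):
    """Return one count vector per noun, with one dimension per adjective."""
    counts = Counter(pairs)  # (adjective, noun) -> co-occurrence frequency
    return {noun: [counts[(adj, noun)] for adj in adjectives] for noun in nouns}

# Hypothetical adjective-noun collocations (illustration only).
pairs = [
    ("rood", "appel"), ("rood", "appel"), ("lekker", "appel"),
    ("rood", "wijn"), ("lekker", "wijn"), ("lekker", "wijn"),
    ("rood", "auto"), ("snel", "auto"), ("snel", "auto"), ("tweedehands", "auto"),
    ("rood", "vrachtwagen"), ("snel", "vrachtwagen"), ("tweedehands", "vrachtwagen"),
]
adjectives = ["rood", "lekker", "snel", "tweedehands"]
nouns = ["appel", "wijn", "auto", "vrachtwagen"]

vectors = build_vectors(pairs, nouns, adjectives)
for noun, vector in vectors.items():
    print(noun, vector)
```

Swapping the roles of nouns and adjectives in the same function would give the adjective vectors used for the "vice versa" direction.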

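15 Vector space measures 2/2
- Apply the cosine similarity measure:
$$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}|\,|\vec{y}|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
- Examples: $\cos(\text{auto}, \text{vrachtwagen}) = \frac{4}{\sqrt{18}} \approx 0.94$ and $\cos(\text{appel}, \text{vrachtwagen}) = \frac{2}{\sqrt{15}} \approx 0.52$
- Problem: ambiguity. Compare een steengoed nummer (a great song) with een oneven nummer (an odd number): the meanings differ, but both uses feed the single vector for nummer, so they end up in the same cluster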
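The cosine measure can be sketched in a few lines of plain Python. The vectors below are hypothetical: they were picked so that they reproduce the two fractions on the slide (4/√18 and 2/√15) and should not be read as the counts used in the talk.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)

# Hypothetical vectors over (rood, lekker, snel, tweedehands).
vectors = {
    "appel": [2, 1, 0, 0],
    "auto": [1, 0, 2, 1],
    "vrachtwagen": [1, 0, 1, 1],
}

print(f"{cosine(vectors['auto'], vectors['vrachtwagen']):.3f}")   # 0.943
print(f"{cosine(vectors['appel'], vectors['vrachtwagen']):.3f}")  # 0.516
```

Because the cosine normalizes by vector length, a very frequent noun and a rare noun can still come out as similar as long as their adjective profiles have the same shape.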

23 Clustering
- Clustering = the unsupervised classification of patterns (observations, data items or feature vectors) into groups (clusters)
- Two kinds of clustering:
  - Partitional clustering: stand-alone clusters, not embedded in a structure
  - Hierarchical clustering: a complete branching structure is assigned, up to the root node
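The difference between the two kinds of clustering can be illustrated with a small bottom-up procedure. The slides do not say which algorithm was used, so the average-link agglomerative scheme below is an assumption, as are the function names and the toy vectors. Stopping at k clusters gives a flat partition; continuing to a single cluster records the full merge tree.

```python
import math
from itertools import combinations

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def average_link(c1, c2, vectors):
    """Mean pairwise cosine between the members of two clusters."""
    sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(sims) / len(sims)

def agglomerate(vectors, k=1):
    """Merge the two most similar clusters until only k clusters remain.

    Returns the remaining clusters and the list of merges in order.
    """
    clusters = [[word] for word in vectors]
    merges = []
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical vectors over (rood, lekker, snel, tweedehands).
vectors = {
    "appel": [2, 1, 0, 0],
    "wijn": [1, 2, 0, 0],
    "auto": [1, 0, 2, 1],
    "vrachtwagen": [1, 0, 1, 1],
}

flat, _ = agglomerate(vectors, k=2)   # stand-alone clusters
_, tree = agglomerate(vectors, k=1)   # merge history up to the root
print(flat)
print(tree)
```

This naive version recomputes all pairwise similarities at every step, which is fine for a handful of words but would be far too slow for thousands of nouns.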

27 Additional remarks
- Adjective-noun collocations have been extracted from the Twente News Corpus (>300M words)
- Lemmas have been used to get a better generalization
- Frequencies have been logarithmically smoothed
- For the n most frequent nouns, vectors have been created that contain the frequencies of the m most frequent adjectives (and vice versa)
- In most experiments, n = 5,000 and m = 20,000
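The preprocessing steps above might look roughly as follows. The slide only says that frequencies were "logarithmically smoothed", so the use of log(1 + f) is an assumption, and the function name `smoothed_vectors` and the toy pairs are hypothetical.

```python
import math
from collections import Counter

def smoothed_vectors(pairs, n, m):
    """Build log-smoothed vectors for the n most frequent nouns
    over the m most frequent adjectives."""
    pair_counts = Counter(pairs)  # (adjective lemma, noun lemma) -> frequency
    noun_freq, adj_freq = Counter(), Counter()
    for (adj, noun), count in pair_counts.items():
        noun_freq[noun] += count
        adj_freq[adj] += count

    nouns = [word for word, _ in noun_freq.most_common(n)]
    adjectives = [word for word, _ in adj_freq.most_common(m)]

    # Assumed smoothing: log(1 + f), so that unseen pairs stay at 0.
    vectors = {
        noun: [math.log(1 + pair_counts[(adj, noun)]) for adj in adjectives]
        for noun in nouns
    }
    return adjectives, vectors

# Hypothetical lemmatized collocations (illustration only).
pairs = (
    [("rood", "appel")] * 3
    + [("lekker", "appel")]
    + [("rood", "wijn")] * 2
    + [("snel", "auto")]
)
adjectives, vectors = smoothed_vectors(pairs, n=2, m=1)
print(adjectives, vectors)
```

The logarithm dampens the effect of a few extremely frequent collocations, so that they do not dominate the cosine.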

32 Examples of noun clustering
- (months) mei, februari, september, maart, december, augustus, oktober, januari, juli, april, november, juni
- (football players) aanvaller, speler, middenvelder, verdediger, linksbuiten, international, invaller, keeper, voetballer, doelman, spits
- (insurgents) guerrillabeweging, opstandeling, rebellenleider, guerrillastrijder, guerrilla, verzetsbeweging, rebel, bevrijdingsleger
- (units) minuut, millimeter, seconde, cent, ton, meter, centimeter, graad, kilo, kilometer

36 Examples of adjective clustering
- (colours) bruin, groen, rood, oranje, grijs, wit, geel, roze, zwart, paars, blauw
- (unrestrained, boundless) ongebreideld, mateloos, tomeloos, grenzeloos, ongeremd
- (colloquial evaluations) brutaal, cool, lelijk, dom, tof, stom

39 Example of hierarchical clustering
[Dendrogram over time expressions. Leaves, in order: januari, september, augustus, november, februari, juni, december, oktober, maart, april, juli, mei, donderdag, maandag, zaterdag, woensdag, dinsdag, zondag, vrijdag, nacht, zondagmiddag, weekend, herfst, middag, zomeravond, handelsdag, winter, avond, zomerdag, voorjaar, werkdag, weer, ochtend, zomer, najaar, morgen, dag, weekeinde]

40 WordNet comparison evaluation (1/3)
- Automatic evaluation by comparing clusters to WordNet relations
- The WordNet relations used for the evaluation are:
  - Hyponyms
  - Hypernyms
  - Hyponyms of the hypernyms (co-hyponyms, synonyms)

45 WordNet comparison evaluation (2/3)
- For each cluster:
  - Take the word with the most WordNet relations to the other words of the cluster (= most central word)
  - Get its hyponyms, hypernyms, co-hyponyms and synonyms in WordNet
  - Calculate precision: the share of words in the cluster that have an equivalent WordNet relation
  - (Calculate recall: the share of those WordNet relations that have an equivalent in the found cluster)
- General precision (recall): average of the precision (recall) of the various clusters
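The per-cluster evaluation can be sketched as below. The `relations` dictionary stands in for a WordNet lookup and its contents are invented for illustration; leaving the central word itself out of the precision denominator is also an assumption, since the slides do not specify it.

```python
def evaluate_cluster(cluster, relations):
    """Return (central word, precision, recall) for one cluster.

    relations maps a word to the set of its hyponyms, hypernyms,
    co-hyponyms and synonyms (hypothetical stand-in for WordNet).
    """
    members = set(cluster)
    # Most central word: the one related to the most other cluster members.
    central = max(cluster, key=lambda w: len(relations.get(w, set()) & members))
    gold = relations.get(central, set())
    others = [w for w in cluster if w != central]
    hits = [w for w in others if w in gold]
    precision = len(hits) / len(others) if others else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return central, precision, recall

def evaluate(clusters, relations):
    """Average precision and recall over all clusters."""
    scores = [evaluate_cluster(c, relations)[1:] for c in clusters]
    precision = sum(p for p, _ in scores) / len(scores)
    recall = sum(r for _, r in scores) / len(scores)
    return precision, recall

# Invented relations (illustration only, not taken from a real wordnet).
relations = {
    "aanvaller": {"spits", "keeper", "verdediger", "middenvelder"},
    "spits": {"aanvaller"},
    "keeper": {"aanvaller", "doelman"},
}
cluster = ["aanvaller", "spits", "keeper", "minuut"]
central, precision, recall = evaluate_cluster(cluster, relations)
print(central, precision, recall)
```

Here two of the three remaining cluster members are related to the central word, and two of its four related words were found in the cluster.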

51 WordNet comparison evaluation (3/3)
[Plot: precision, recall, random precision and random recall (percentage) against the number of clusters]

52 Share of each relation
[Plot: share of synonyms, hyponyms, hypernyms and co-hyponyms in the precision (percentage) against the number of clusters]

53 Wu & Palmer
- Calculate the similarity between two words according to their distance in the WordNet hierarchy
- Instead of having a fixed group of words to compare the clusters to, the cluster quality is calculated according to similarity in WordNet
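The Wu & Palmer measure is commonly written as 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor shared by the two words. The sketch below applies it to a tiny invented taxonomy. It assumes a single-parent tree and a root depth of 1, whereas a real wordnet has multiple senses per word and can have multiple parents per node.

```python
def path_to_root(node, parent):
    """List of nodes from the given node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def wu_palmer(a, b, parent):
    """Wu & Palmer similarity in a single-parent taxonomy (root depth = 1)."""
    ancestors_a = set(path_to_root(a, parent))
    # First node on b's path that is also an ancestor of a: the deepest shared one.
    lcs = next(node for node in path_to_root(b, parent) if node in ancestors_a)

    def depth(node):
        return len(path_to_root(node, parent))

    return 2 * depth(lcs) / (depth(a) + depth(b))

# Invented taxonomy (child -> parent), for illustration only.
parent = {
    "voertuig": "entiteit",
    "voedsel": "entiteit",
    "auto": "voertuig",
    "vrachtwagen": "voertuig",
    "appel": "voedsel",
}

print(wu_palmer("auto", "vrachtwagen", parent))  # shares "voertuig"
print(wu_palmer("auto", "appel", parent))        # shares only the root
```

A cluster-level score could then be taken as the average pairwise similarity of the words in a cluster, though how the talk aggregated the scores is not stated.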

55 Wu & Palmer
[Plot: similarity and random baseline (percentage) against the number of clusters]

56
- Significant similarity percentages when comparing the clusters to WordNet
- Syntactic context is a good cue for the automatic extraction of semantic classes
- Ambiguity remains a difficult problem for a computer to tackle

59 Future work
- Develop algorithms that disambiguate different senses of a word
- Develop algorithms that extract hierarchical wordnets instead of stand-alone clusters (algorithms that might discover is-a relations)
- Investigate verbs:
  - Improve noun clustering by taking into account subject-verb and verb-object relations
  - Cluster verbs with subject-verb and verb-object relations
- Deal with data sparseness and the curse of dimensionality by applying statistical analysis (LSA, principal component analysis)

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering

More information

Proceedings of the Sixteenth Computational Linguistics in the Netherlands

Proceedings of the Sixteenth Computational Linguistics in the Netherlands Proceedings of the Sixteenth Computational Linguistics in the Netherlands Edited by: Khalil Sima an, Maarten de Rijke, Remko Scha and Rob van Son Universiteit van Amsterdam / i The Sixteenth Computational

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

Proximity-based distributional similarity

Proximity-based distributional similarity Chapter 5 Proximity-based distributional similarity 5.1 Introduction Words that are distributionally similar are words that share a large number of contexts. We explained in Chapter 3 that there are two

More information

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems 1 Ciro Cattuto, 2 Dominik Benz, 2 Andreas Hotho, 2 Gerd Stumme 1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation,

More information

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction

More information

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

More information

Clustering of Polysemic Words

Clustering of Polysemic Words Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,

More information

An Integrated Approach to Automatic Synonym Detection in Turkish Corpus

An Integrated Approach to Automatic Synonym Detection in Turkish Corpus An Integrated Approach to Automatic Synonym Detection in Turkish Corpus Dr. Tuğba YILDIZ Assist. Prof. Dr. Savaş YILDIRIM Assoc. Prof. Dr. Banu DİRİ İSTANBUL BİLGİ UNIVERSITY YILDIZ TECHNICAL UNIVERSITY

More information

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

More information

Mining event log patterns in HPC systems

Mining event log patterns in HPC systems Mining event log patterns in HPC systems Ana Gainaru joint work with Franck Cappello and Bill Kramer HPC Resilience Summit 2010: Workshop on Resilience for Exascale HPC HPC Resilience Third Workshop Summit

More information

Tracking change in word meaning

Tracking change in word meaning Overview Intro DisSem Previous Case Visualisation Conclusion References Tracking change in word meaning A dynamic visualization of diachronic distributional semantics Kris Heylen, Thomas Wielfaert & Dirk

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it

More information

Extraction of Hypernymy Information from Text

Extraction of Hypernymy Information from Text Extraction of Hypernymy Information from Text Erik Tjong Kim Sang, Katja Hofmann and Maarten de Rijke Abstract We present the results of three different studies in extracting hypernymy information from

More information

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8. Final Results on Dutch Senseval-2 Test Data Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

More information

Domein-extensie: Registreerbaar op:

Domein-extensie: Registreerbaar op: dd. 07-07-2014 Domein-extensie: Registreerbaar op:.bike woensdag 5 februari 2014.CLOTHING woensdag 5 februari 2014.GURU woensdag 5 februari 2014.PLUMBING woensdag 5 februari 2014.SINGLES woensdag 5 februari

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D. Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

More information

How To Identify And Represent Multiword Expressions (Mwe) In A Multiword Expression (Irme)

How To Identify And Represent Multiword Expressions (Mwe) In A Multiword Expression (Irme) The STEVIN IRME Project Jan Odijk STEVIN Midterm Workshop Rotterdam, June 27, 2008 IRME Identification and lexical Representation of Multiword Expressions (MWEs) Participants: Uil-OTS, Utrecht Nicole Grégoire,

More information
