Tracking change in word meaning


1 Tracking change in word meaning: A dynamic visualization of diachronic distributional semantics. Kris Heylen, Thomas Wielfaert & Dirk Speelman, KU Leuven, Quantitative Lexicology and Variational Linguistics

2 Purpose of the talk: A lexicological study of how a set of near-synonymous adjectives has changed meaning through time, using a statistical, distributional approach to model lexical semantics in large corpora and a dynamic visualization to assist in interpreting these statistical patterns, with the ultimate goal of creating an exploratory tool for lexical semantic analysis.

3 Overview 1. Background: Lexical variation 2. Distributional semantics 3. Previous Visualisations 4. Case Study: positive evaluative adjectives 5. Dynamic Visualisation of semantic change 6. Conclusion


5 Background: Lexical Variation LEXICOLOGY:


7 Background: Lexical Variation LEXICOLOGY: SEMASIOLOGICAL PERSPECTIVE

8 Background: Lexical Variation LEXICOLOGY: ONOMASIOLOGICAL PERSPECTIVE

9 Background: Lexical Variation LEXICOLOGY: FINER-GRAINED ANALYSIS OF SEMANTIC FEATURES

11 Background: Lexical Variation LEXICOLOGY: LECTAL VARIATION

12 Background: Lexical Variation LEXICOLOGY: CHRONO-LECTAL (DIACHRONIC) VARIATION

13 Background: Lexical Variation LEXICOLOGY: QUANTITATIVE CORPUS ANALYSIS


15 Distributional models of lexical semantics. Linguistic origin: the Distributional Hypothesis, "You shall know a word by the company it keeps" (Firth): a word's meaning can be induced from its co-occurring words; long tradition of collocation studies in corpus linguistics. Semantic Vector Spaces in Computational Linguistics: standard technique in statistical NLP for the large-scale automatic modeling of (lexical) semantics, also known as Vector Space Models, Distributional Semantic Models, Word Spaces, ... (cf. Turney & Pantel 2010 for an overview); generalised, large-scale collocation analysis; mainly used for automatic thesaurus extraction: words occurring in the same contexts have similar meanings.

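16 Semantic Vector Spaces as models of word meaning. Practical: which two words out of a set of three have the same meaning? ongeval ("accident"), koffie ("coffee"), accident ("accident"). Occurrences in context from a corpus:
Op de Brusselse ring deed zich een ongeval met een vrachtwagen voor (An accident involving a truck occurred on the Brussels ring road)
's Morgens drinkt hij een kop koffie met melk en suiker (In the morning he drinks a cup of coffee with milk and sugar)
2 bestuurders raakten gekwetst bij een ongeval met een vrachtwagen (2 drivers were injured in an accident involving a truck)
in de avondspits veroorzaakte een accident een kilometerslange file (during the evening rush hour an accident caused a traffic jam several kilometres long)
als vieruurtje serveert het hotel koffie en gebak voor de gasten (as an afternoon snack the hotel serves coffee and pastries to the guests)
de auto was betrokken in een accident met een dodelijke afloop (the car was involved in an accident with a fatal outcome)
Met winterbanden is het risico op een ongeval bij vriesweer veel kleiner (With winter tyres the risk of an accident in freezing weather is much smaller)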

17-35 [Animated build-up of a target word by context word co-occurrence table; the counts are not preserved in the transcription.] Context words (columns): auto (car), slachtoffer (victim), vrachtwagen (truck), file (traffic jam), gekwetst (injured), suiker (sugar), melk (milk), kop (cup). Target words (rows), added one at a time: ongeval, accident, koffie. Example context fragments shown for ongeval:
vader raakte gekwetst bij een ongeval met een vrachtwagen op de (father was injured in an accident with a truck on the)
voor zeven uur veroorzaakte een ongeval een kilometerslange file richting Antwerpen (before seven o'clock an accident caused a traffic jam several kilometres long towards Antwerp)
vrachtwagens waren betrokken bij het ongeval, dat meer dan tien slachtoffers (trucks were involved in the accident, which more than ten victims)
Which words are similar?

36 Distributional models of lexical semantics: word-by-word similarity matrix with rows and columns ongeval, accident, koffie (cell values not preserved in the transcription)

37 Distributional models of lexical semantics. Geometrical metaphor: semantic distance. Frequencies weighted by collocational strength (PMI); vectors projected in context feature space (Word Space); cosine of the angle between vectors as semantic similarity measure.
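
The weighting and similarity steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the co-occurrence counts are invented toy values, the function names are hypothetical, and keeping only positive PMI values is an assumption (the slides only say "pmi").

```python
import math
from collections import Counter


def pmi_vectors(cooc):
    """Turn raw co-occurrence counts into PMI-weighted sparse vectors.

    cooc maps each target word to a Counter of context-word counts.
    Assumption: only positive PMI values are kept.
    """
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    target_totals = {t: sum(ctx.values()) for t, ctx in cooc.items()}
    context_totals = Counter()
    for ctx in cooc.values():
        context_totals.update(ctx)

    weighted = {}
    for target, ctx in cooc.items():
        vec = {}
        for word, n in ctx.items():
            # PMI = log2( P(target, word) / (P(target) * P(word)) )
            pmi = math.log2(n * total / (target_totals[target] * context_totals[word]))
            if pmi > 0:
                vec[word] = pmi
        weighted[target] = vec
    return weighted


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy counts (invented for illustration only).
counts = {
    "ongeval": Counter({"vrachtwagen": 3, "file": 2, "gekwetst": 2, "slachtoffer": 1}),
    "accident": Counter({"auto": 2, "file": 2, "slachtoffer": 1, "vrachtwagen": 1}),
    "koffie": Counter({"melk": 3, "suiker": 2, "kop": 3}),
}
vectors = pmi_vectors(counts)
sim_accident = cosine(vectors["ongeval"], vectors["accident"])
sim_koffie = cosine(vectors["ongeval"], vectors["koffie"])
```

With these toy counts, ongeval and accident share weighted contexts while koffie shares none with either, so the first similarity is higher than the second.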

38 Distributional semantics: lexical variation. Bilectal Word Spaces: extend the Word Space from one corpus to two corpora representative of different lects/varieties; 2 context vectors for each word, one for each variety; most words will have themselves (their own vector from the other variety) as most similar word, BUT words with diverging semantic structure will not.
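
The bilectal comparison can be sketched as a nearest-neighbour lookup across the two varieties. The snippet below is only an illustration under stated assumptions: the function name `diverging_words`, the word labels, and the context features `c1` to `c5` are all invented, and plain counts stand in for PMI-weighted values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def diverging_words(space_a, space_b):
    """Map each word in variety A to its nearest neighbour in variety B,
    keeping only the words whose nearest neighbour is not the word itself."""
    diverging = {}
    for word, vec in space_a.items():
        nearest = max(space_b, key=lambda other: cosine(vec, space_b[other]))
        if nearest != word:
            diverging[word] = nearest
    return diverging


# Invented toy vectors: one context vector per word and per variety.
space_a = {
    "stable": {"c1": 3, "c2": 1},
    "shifted": {"c3": 2, "c4": 2},
    "other": {"c5": 4},
}
space_b = {
    "stable": {"c1": 2, "c2": 2},
    "shifted": {"c5": 3},
    "other": {"c3": 1, "c4": 3},
}
result = diverging_words(space_a, space_b)
```

In this toy setup "stable" finds itself in the other variety, while "shifted" and "other" do not, which is the signal the slide describes for words with diverging semantic structure.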


40 Sagi, Kaufmann & Clark 2009

41 Rohrdantz, Hautli, Mayer et al. 2011

42 Hilpert 2011


44 Case study: positive evaluative adjectives

45 Case study: positive evaluative adjectives. Table: positive evaluative adjectives: brilliant, cool, delightful, excellent, fabulous, fantastic, good, great, impressive, lovely, magnificent, marvelous, perfect, splendid, superb, terrific, wonderful

46 Case study. Corpus: Corpus of Historical American English (COHA, Davies 2012); period from 1810 to 2009, 400M words, POS-tagged. Concept: positive evaluative adjectives; 1 vector per adjective, per decade ( ), modelled by a window of 5 words left & right; 5000 most frequent context words (minus top 100); PMI-weighting, cosine similarity.
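
A rough sketch of the counting step behind these settings follows. It is not the authors' pipeline: the function names are hypothetical, the token list is an invented toy sentence rather than COHA data, and reading "5000 most frequent context words (minus top 100)" as "take the 5000 most frequent words, then drop the 100 most frequent of those" is an assumption. In the setting on the slide, the counting would be run once per decade to obtain one vector per adjective and decade.

```python
from collections import Counter, defaultdict


def context_vocabulary(tokens, size=5000, skip_top=100):
    """Select context words: the `size` most frequent words,
    minus the `skip_top` most frequent ones (assumed reading of the slide)."""
    ranked = [word for word, _ in Counter(tokens).most_common(size)]
    return set(ranked[skip_top:])


def cooccurrences(tokens, targets, vocab, window=5):
    """Count context words within `window` tokens left and right of each target."""
    counts = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] in vocab:
                counts[token][tokens[j]] += 1
    return counts


# Invented toy text; small parameters so the effect is visible.
tokens = "it was a terrific explosion and a terrific crash of thunder".split()
vocab = context_vocabulary(tokens, size=50, skip_top=2)
counts = cooccurrences(tokens, {"terrific"}, vocab, window=2)
```

Dropping the very top of the frequency list removes function words such as "a" from the feature set, which is why it does not appear among the counted contexts here.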


48 HighD to 2D Visualisation. The word-decade by context matrix is high-dimensional; the first aim is NOT to find latent structure (as with LSA/LDA) but a general picture of distributional semantic structuring. Faithful rendering of the similarity matrix in 2D: Kruskal's non-metric Multidimensional Scaling; interpret dimensions with context-labeled clusters. Dynamic and interactive chart: Motion Charts from Google Chart Tools; panchronic view to interpret the semantic space; diachronic view to see meaning changes.
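
For reference, the quantity that Kruskal's non-metric MDS minimises is usually written as Stress-1. The notation below is the standard textbook form and is not taken from the slides; using 1 minus the cosine similarity as the input dissimilarity is an assumption.

```latex
S = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

Here $d_{ij}$ is the distance between items $i$ and $j$ in the 2D configuration, and $\hat{d}_{ij}$ is the disparity obtained by monotone regression of the $d_{ij}$ on the input dissimilarities $\delta_{ij}$. Only the rank order of the dissimilarities enters the fit, which is what makes the method non-metric.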

49 Panchronic view for interpretation of semantic space. Clusters with the most typical context words of the adjectives: cluster 2 (centre, light blue): positively evaluated things (colors, spectacle, performance), at the centre of the plot, expressing the core meaning of the adjectives; cluster 8 (red, lower left): loud and frightening things (explosion, thunder, crash), at the periphery of the plot, expressing an unrelated meaning.

50 Diachronic motion chart to see meaning change. Trajectory of terrific from 1860 to 2000, moving from the peripheral cluster of frightening things to the central cluster of positively evaluated things, indicative of its meaning change.


52 Summary: Conclusion and future work. Lexicological perspective: tool for exploring lexical semantics and variation in large amounts of corpus data; dynamic visualisation of evolving semantic structuring for a set of near-synonymous adjectives. Desiderata: integrate with latent dimension finding techniques (cf. Rohrdantz et al.) for easier interpretation of the semantic space; show individual occurrences of lexemes (tokens) to explore the semasiological structure of adjectives in each decade; show interpretative beacons in the dynamic plot; other types of context features (e.g. dependency relations).

53 For more information:

54 References I
Davies, Mark. 2012. Corpus of Historical American English (COHA): 400+ million words.
Heylen, Kris, Speelman, Dirk, & Geeraerts, Dirk. 2012. Looking at word meaning. An interactive visualization of Semantic Vector Spaces for Dutch synsets. In: Proceedings of the EACL-2012 joint workshop of LINGVIS & UNCLH: Visualization of Language Patterns and Uncovering Language History from Multilingual Resources.
Hilpert, Martin. 2011. Dynamic visualizations of language change: Motion charts on the basis of bivariate and multivariate data from diachronic corpora. International Journal of Corpus Linguistics, 16(4).

55 References II
Rohrdantz, Christian, Hautli, Annette, Mayer, Thomas, Butt, Miriam, Keim, Daniel A., & Plank, Frans. 2011. Towards Tracking Semantic Change by Visual Analytics. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA: Association for Computational Linguistics.
Sagi, Eyal, Kaufmann, Stefan, & Clark, Brady. 2009. Semantic Density Analysis: Comparing Word Meaning across Time and Phonetic Space. In: Proceedings of the Workshop on Geometrical Models of Natural Language Semantics. Athens, Greece: Association for Computational Linguistics.
Turney, Peter D., & Pantel, Patrick. 2010. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37(1).

Distributional Semantic Modelling in Cognitive Sociolinguistics: QLVL probes Semantic Space

Distributional Semantic Modelling in Cognitive Sociolinguistics: QLVL probes Semantic Space Overview CogSoLx Onomas Semas Conclusion Distributional Semantic Modelling in Cognitive Sociolinguistics: QLVL probes Semantic Space Kris Heylen, Dirk Geeraerts & Dirk Speelman KU Leuven Quantitative Lexicology

More information

Lexical convergence in the Dutch lexicon

Lexical convergence in the Dutch lexicon Overview Introduction Dutch Method Results Lexical convergence in the Dutch lexicon Jocelyne Daems Kris Heylen Dirk Geeraerts University of Leuven RU Quantitative Lexicology and Variational Linguistics

More information

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions. Natalia Levshina

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions. Natalia Levshina Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions Natalia Levshina RU Quantitative Lexicology and Variational Linguistics Faculteit Letteren Subfaculteit Taalkunde K.U.Leuven

More information

Comparing constructicons: A cluster analysis of the causative constructions with doen in Netherlandic and Belgian Dutch.

Comparing constructicons: A cluster analysis of the causative constructions with doen in Netherlandic and Belgian Dutch. Comparing constructicons: A cluster analysis of the causative constructions with doen in Netherlandic and Belgian Dutch Natalia Levshina Outline 1. Dutch causative Cx with doen 2. Data and method 3. Quantitative

More information

Semantic Clustering in Dutch

Semantic Clustering in Dutch t.van.de.cruys@rug.nl Alfa-informatica, Rijksuniversiteit Groningen Computational Linguistics in the Netherlands December 16, 2005 Outline 1 2 Clustering Additional remarks 3 Examples 4 Research carried

More information

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions FACULTEIT LETTEREN SUBFACULTEIT TAALKUNDE KATHOLIEKE UNIVERSITEIT LEUVEN Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions Proefschrift ingediend tot het behalen van de

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

University of Marburg, RC Deutscher Sprachatlas University of Leuven, RU Quantitative Lexicology and Variational Linguistics

University of Marburg, RC Deutscher Sprachatlas University of Leuven, RU Quantitative Lexicology and Variational Linguistics Construction Grammar meets Semantic Vector Spaces: A radically data-driven approach to semantic classification of slot fillers Natalia Levshina Kris Heylen University of Marburg, RC Deutscher Sprachatlas

More information

Linguistic Research with CLARIN. Jan Odijk MA Rotation Utrecht, 2015-11-10

Linguistic Research with CLARIN. Jan Odijk MA Rotation Utrecht, 2015-11-10 Linguistic Research with CLARIN Jan Odijk MA Rotation Utrecht, 2015-11-10 1 Overview Introduction Search in Corpora and Lexicons Search in PoS-tagged Corpus Search for grammatical relations Search for

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

$ 4XDQWLWDWLYH $SSURDFK WR WKH &RQWUDVW DQG 6WDELOLW\ RI 6RXQGV

$ 4XDQWLWDWLYH $SSURDFK WR WKH &RQWUDVW DQG 6WDELOLW\ RI 6RXQGV Thomas Mayer 1, Christian Rohrdantz 2, Frans Plank 1, Miriam Butt 1, Daniel A. Keim 2 Department of Linguistics 1, Department of Computer Science 2 University of Konstanz thomas.mayer@uni-konstanz.de,

More information

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions

Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions Doe wat je niet laten kan: A usage-based analysis of Dutch causative constructions Supervisors: Natalia Levshina Dirk Geeraerts Dirk Speelman University of Leuven RU Quantitative Lexicology and Variational

More information

How To Identify And Represent Multiword Expressions (Mwe) In A Multiword Expression (Irme)

How To Identify And Represent Multiword Expressions (Mwe) In A Multiword Expression (Irme) The STEVIN IRME Project Jan Odijk STEVIN Midterm Workshop Rotterdam, June 27, 2008 IRME Identification and lexical Representation of Multiword Expressions (MWEs) Participants: Uil-OTS, Utrecht Nicole Grégoire,

More information

The Value of Visualization 2

The Value of Visualization 2 The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate

More information

Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013

Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 Markus Dickinson Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 1 / 34 Basic text analysis Before any sophisticated analysis, we want ways to get a sense of text data

More information

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015 Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD

More information

Applying quantitative methods to dialect Dutch verb clusters

Applying quantitative methods to dialect Dutch verb clusters Applying quantitative methods to dialect Dutch verb clusters Jeroen van Craenenbroeck KU Leuven/CRISSP jeroen.vancraenenbroeck@kuleuven.be 1 Introduction Verb cluster ordering is a well-known area of microparametric

More information

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

More information

Get the most value from your surveys with text analysis

Get the most value from your surveys with text analysis PASW Text Analytics for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That

More information

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and

More information

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Big data, the future of statistics

Big data, the future of statistics Big data, the future of statistics Experiences from Statistics Netherlands Dr. Piet J.H. Daas Senior-Methodologist, Big Data research coordinator and Marco Puts, Martijn Tennekes, Alex Priem, Edwin de

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

TechWatch. Technology and Market Observation powered by SMILA

TechWatch. Technology and Market Observation powered by SMILA TechWatch Technology and Market Observation powered by SMILA PD Dr. Günter Neumann DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Juni 2011 Goal - Observation of Innovations and Trends»

More information

How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)

How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free) Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades

More information

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it

More information

Introduction. 1.1 Kinds and generalizations

Introduction. 1.1 Kinds and generalizations Chapter 1 Introduction 1.1 Kinds and generalizations Over the past decades, the study of genericity has occupied a central place in natural language semantics. The joint work of the Generic Group 1, which

More information

Cross-lingual Synonymy Overlap

Cross-lingual Synonymy Overlap Cross-lingual Synonymy Overlap Anca Dinu 1, Liviu P. Dinu 2, Ana Sabina Uban 2 1 Faculty of Foreign Languages and Literatures, University of Bucharest 2 Faculty of Mathematics and Computer Science, University

More information

An Introduction to Random Indexing

An Introduction to Random Indexing MAGNUS SAHLGREN SICS, Swedish Institute of Computer Science Box 1263, SE-164 29 Kista, Sweden mange@sics.se Introduction Word space models enjoy considerable attention in current research on semantic indexing.

More information

Varieties of lexical variation

Varieties of lexical variation Dirk Geeraerts University of Leuven Varieties of lexical Abstract This paper presents the theoretical backgr ound of a large-scale lexicological research project on lexical that was carried out at the

More information

Imputing Values to Missing Data

Imputing Values to Missing Data Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data

More information

Opinion Mining Issues and Agreement Identification in Forum Texts

Opinion Mining Issues and Agreement Identification in Forum Texts Opinion Mining Issues and Agreement Identification in Forum Texts Anna Stavrianou Jean-Hugues Chauchat Université de Lyon Laboratoire ERIC - Université Lumière Lyon 2 5 avenue Pierre Mendès-France 69676

More information

Computer-aided Document Indexing System

Computer-aided Document Indexing System Journal of Computing and Information Technology - CIT 13, 2005, 4, 299-305 299 Computer-aided Document Indexing System Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić and Jan Šnajder,, An enormous

More information

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Presented in the Twenty-third Annual Meeting of the Society for Text and Discourse, Valencia from 16 to 18, July 2013 Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Guillermo

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Crossing Corpora. Modelling Semantic Similarity across Languages and Lects.

Crossing Corpora. Modelling Semantic Similarity across Languages and Lects. Distributional Models Bilectal Bilingual Crossing Corpora. Modelling Semantic Similarity across Languages and Lects. Yves Peirsman Supervisors: Dirk Geeraerts & Dirk Speelman Quantitative Lexicology and

More information

Overview of SEO Recon Features and Benefits

Overview of SEO Recon Features and Benefits Michael Marshall, CEO Overview of SEO Recon Features and Benefits Data Collection (partial sample):... 2 Multivariate analysis: (Which Factors are Important?):... 3 Multivariate Analysis: (Which Competitors

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces

INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces Erik Velldal University of Oslo Sept. 4, 2012 Topics for today 2 More Common Lisp More data types: Arrays, sequences, hash tables, and

More information

Exploratory Data Analysis with MATLAB

Exploratory Data Analysis with MATLAB Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton

More information

What the Hell is Big Data?

What the Hell is Big Data? Presentation What the Hell is Big Data? Bernard Marr www.ap-institute.com 1 Background 2 Navigating to Success 3 Navigation Today 4 The Global Data Revolution 5 The Intelligent Company Model Strategic

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Introduction VOLKER GAST. 1. Central questions addressed in this issue

Introduction VOLKER GAST. 1. Central questions addressed in this issue ZAA 54.2 (2006): 113-120 VOLKER GAST Introduction 1. Central questions addressed in this issue Corpus linguistics has undoubtedly become one of the most important and most widely used empirical methods

More information

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08

9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08 9. Text & Documents Visualizing and Searching Documents Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08 Slide 1 / 37 Outline Characteristics of text data Detecting patterns SeeSoft

More information

Utilizing spatial information systems for non-spatial-data analysis

Utilizing spatial information systems for non-spatial-data analysis Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis

More information

Visualization methods for patent data
