Distributional Similarity

Similar documents

Selectional Preferences 1

Clustering Connectionist and Statistical Language Processing

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Social Media Mining. Data Mining Essentials

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

Probabilistic Latent Semantic Analysis (plsa)

A chart generator for the Dutch Alpino grammar

Building a Question Classifier for a TREC-Style Question Answering System

Movie Classification Using k-means and Hierarchical Clustering

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Search Engine Based Intelligent Help Desk System: iassist

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

Elementary Statistics

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

The Data Mining Process

The Role of Sentence Structure in Recognizing Textual Entailment

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization

Common factor analysis

2014/02/13 Sphinx Lunch

Machine Learning for Data Science (CS4786) Lecture 1

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

Data Mining Yelp Data - Predicting rating stars from review text

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

TF-IDF. David Kauchak cs160 Fall 2009 adapted from:

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Can shared mobility offer an answer to the problems with regard to transport poverty?

Protein Protein Interaction Networks

Making Sense of the Mayhem: Machine Learning and March Madness

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Alignment and Preprocessing for Data Analysis

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

Text Analytics. A business guide

Unsupervised learning: Clustering

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Linear Threshold Units

Matrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products

Interactive Dynamic Information Extraction

Data Mining: Algorithms and Applications Matrix Math Review

Domain Classification of Technical Terms Using the Web

A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection

Learning is a very general term denoting the way in which agents:

Data, Measurements, Features

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Identifying Focus, Techniques and Domain of Scientific Papers

A Toolbox for Bicluster Analysis in R

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

Object Recognition. Selim Aksoy. Bilkent University

Customer Intentions Analysis of Twitter Based on Semantic Patterns

Mining Text Data: An Introduction

Artificial Intelligence and Transactional Law: Automated M&A Due Diligence. By Ben Klaber

Data Mining - Evaluation of Classifiers

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Term extraction for user profiling: evaluation by the user

Supervised Learning (Big Data Analytics)

Automatically Constructing a Wordnet for Dutch

Statistical Feature Selection Techniques for Arabic Text Categorization

Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye MSRC

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Cognitive Abilities Test Practice Activities. Te ach e r G u i d e. Form 7. Verbal Tests. Level5/6. Cog

W6.B.1. FAQs CS535 BIG DATA W6.B If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Analytics on Big Data

Tackling Big Data with Tensor Methods

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

CLUSTER ANALYSIS FOR SEGMENTATION

Crossing Corpora. Modelling Semantic Similarity across Languages and Lects.

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

Rank one SVD: un algorithm pour la visualisation d une matrice non négative

Chapter 8. Final Results on Dutch Senseval-2 Test Data

TEACHING ADULTS TO REASON STATISTICALLY

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Statistics and Probability

Projektgruppe. Categorization of text documents via classification

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER

Machine Learning using MapReduce

Map-Reduce for Machine Learning on Multicore

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

Transcription:

Overview and alpage inria & paris vii Séminaire cental Vendredi 5 novembre 2010

Outline 1 2 Similarity Context Dimensionality reduction Latent semantic analysis Non-negative matrix factorization Tensors 3

Semantic similarity Most work on semantic similarity relies on the distributional hypothesis (Harris 1954) Take a word and its contexts: tasty tnassiorc greasy tnassiorc tnassiorc with butter tnassiorc for breakfast By looking at a word s context, one can infer its meaning

Semantic similarity Most work on semantic similarity relies on the distributional hypothesis (Harris 1954) Take a word and its contexts: tasty tnassiorc greasy tnassiorc tnassiorc with butter tnassiorc for breakfast FOOD By looking at a word s context, one can infer its meaning

Semantic similarity Most work on semantic similarity relies on the distributional hypothesis (Harris 1954) Take a word and its contexts: tasty tnassiorc greasy tnassiorc tnassiorc with butter tnassiorc for breakfast By looking at a word s context, one can infer its meaning

Matrix Similarity Context Dimensionality reduction Tensors Capture co-occurrence frequencies of two entities

Matrix Similarity Context Dimensionality reduction Tensors Capture co-occurrence frequencies of two entities rouge délicieux rapide d occasion pomme 2 1 0 0 vin 2 2 0 0 voiture 1 0 1 2 camion 1 0 1 1

Matrix Similarity Context Dimensionality reduction Tensors Capture co-occurrence frequencies of two entities rouge délicieux rapide d occasion pomme 7 9 0 0 vin 12 6 0 0 voiture 7 0 8 4 camion 2 0 3 4

Matrix Similarity Context Dimensionality reduction Tensors Capture co-occurrence frequencies of two entities rouge délicieux rapide d occasion pomme 56 98 0 0 vin 44 34 0 0 voiture 23 0 31 39 camion 4 0 18 29

Matrix Similarity Context Dimensionality reduction Tensors Capture co-occurrence frequencies of two entities rouge délicieux rapide d occasion pomme 728 592 1 0 vin 1035 437 0 2 voiture 392 0 487 370 camion 104 0 393 293

Similarity calculation Similarity Context Dimensionality reduction Tensors Cosine P x y = n i=1 x i y i Pn P n i=1 x2 i i=1 y i 2 Examples: cos(pomme, vin) =.96 cos(pomme, voiture) =.42 cos( x, y ) = x y Other possibilities: set-theoretic measures Dice Jaccard probabilistic measures Kullback-Leibler divergence Jensen-Shannon divergence

Different kinds of context Similarity Context Dimensionality reduction Tensors Three different word space models based on context: document-based model (nouns documents) window-based model (nouns context words) syntax-based model (nouns dependency relations) Each model with plethora of parameters! document size, window size, type of dependency relations weighting function ± dimensionality reduction

Similarity Context Dimensionality reduction Tensors Different kinds of semantic similarity tight, synonym-like similarity: (near-)synonymous or (co-)hyponymous loosely related, topical similarity: more loose relationships, such as association and meronymy

Similarity Context Dimensionality reduction Tensors Different kinds of semantic similarity tight, synonym-like similarity: (near-)synonymous or (co-)hyponymous loosely related, topical similarity: more loose relationships, such as association and meronymy Example médecin doctor : docteur doctor, médecin de famille family doctor, chirurgien surgeon, spécialiste specialist, dermatologue dermatologist, gynécologue gynaecologist médecin doctor : malade patient, maladie disease, diagnostic diagnosis, traitement treatment, hôpital hospital, stéthoscope stethoscope

Relation context similarity Similarity Context Dimensionality reduction Tensors Different context leads to different kind of similarity Syntax, small window large window, documents The former models induce tight, synonymous similarity The latter models induce topical relatedness

Relation context similarity Similarity Context Dimensionality reduction Tensors Different context leads to different kind of similarity Syntax, small window large window, documents The former models induce tight, synonymous similarity The latter models induce topical relatedness Evaluation Syntax-based model scores best when evaluated according to Wordnet similarity measures (cornetto) Large window and document-based do not score well on Wordnet similarity, but do score on Wordnet domain evaluation

Similarity Context Dimensionality reduction Tensors Two reasons for performing dimensionality reduction: Intractable computations When number of elements and number of features is too large, similarity computations may become intractable reduction of the number of features makes computation tractable again Generalization capacity the dimensionality reduction is able to describe the data better, or is able to capture intrinsic semantic features dimensionality reduction is able to improve the results (counter data sparseness and noise)

Similarity Context Dimensionality reduction Tensors Latent semantic analysis: introduction Application of a mathematical/statistical technique to simulate how humans learn the semantics of words LSA finds latent semantic dimensions according to which words and documents can be identified Goal: counter data sparseness and get rid of noise

Similarity Context Dimensionality reduction Tensors Latent semantic analysis: introduction What is latent semantic analysis technically speaking? The application of singular value decomposition to a term-document matrix to improve similarity calculations

Singular value decomposition Similarity Context Dimensionality reduction Tensors Find the dimensions that explain most variance by solving number of eigenvector problems Only keep the n most important dimensions (n = 50 300)

Similarity Context Dimensionality reduction Tensors SVD: three matrices

LSA: Example Similarity Context Dimensionality reduction Tensors LSA in (part of) Twente Nieuws Corpus: 10 years of Dutch newspaper texts (AD, NRC, TR, VK, PAR) terms = nouns documents = paragraphs 20,000 terms * 2,000,000 documents matrix reduced to 300 dimensions

LSA: criticism Similarity Context Dimensionality reduction Tensors LSA criticized for a number of reasons: dimensionality reduction is best fit in least-square sense, but this is only valid for normally distributed data; language data is not normally distributed Dimensions may contain negative values; it is not clear what negativity on a semantic scale should designate Shortcomings are remedied by subsequent techniques (PLSA, LDA, NMF,... )

Similarity Context Dimensionality reduction Tensors Non-negative matrix factorization: technique Given a non-negative matrix V, find non-negative matrix factors W and H such that: Choosing r n, m reduces data V nxm W nxr H rxm (1) Constraint on factorization: all values in three matrices need to be non-negative values ( 0) Constraint brings about a parts-based representation: only additive, no subtractive relations are allowed

Similarity Context Dimensionality reduction Tensors Non-negative matrix factorization: technique Different kinds of NMF s that minimize different cost functions: Square of Euclidean distance Kullback-Leibler divergence better suited for language phenomena To find NMF is to minimize D(V WH) with respect to W and H, subject to the constraints W, H 0 This can be done with update rules H aµ H aµ i W V iµ ia (WH) iµ k W ka W ia W ia µ H aµ V iµ (WH) iµ v H av (2) these update rules converge to a local minimum in the minimization of KL divergence

NMF: graphical representation Similarity Context Dimensionality reduction Tensors

NMF: results Similarity Context Dimensionality reduction Tensors Context vectors (5k nouns 2k co-occurring nouns) extracted from clef corpus nmf is able to capture clear semantic dimensions Examples: bus bus, taxi taxi, trein train, halte stop, reiziger traveler, perron platform, tram tram, station station, chauffeur driver, passagier passenger bouillon broth, slagroom cream, ui onion, eierdooier egg yolk, laurierblad bay leaf, zout salt, deciliter decilitre, boter butter, bleekselderij celery, saus sauce

Two-way vs. three-way Similarity Context Dimensionality reduction Tensors all methods use two way co-occurrence frequencies matrix suitable for two-way problems words documents nouns dependency relations not suitable for n-way problems words documents authors verbs subjects direct objects

Two-way vs. three-way Similarity Context Dimensionality reduction Tensors all methods use two way co-occurrence frequencies matrix suitable for two-way problems words documents nouns dependency relations not suitable for n-way problems tensor words documents authors verbs subjects direct objects

Similarity Context Dimensionality reduction Tensors Non-negative tensor factorization: technique Idea similar to non-negative matrix factorization Calculations are different min xi R D1 0,y i R D2 0,z i R D3 T k 0 i=1 x i y i z i 2 F

NTF: graphical representation Similarity Context Dimensionality reduction Tensors

Task: automatic extraction of multi-word expressions (mwes) from large corpora Starting point: many mwes are non-compositional, i.e. the meaning of the mwe is not the sum of the meaning of the individual words Intuition: a noun within a mwe cannot easily be replaced by a semantically similar noun Use of semantic clusters to determine whether a mwe-candidate is compositional or not

Intuition break the ice break the vase *snow cup *hail dish In the first expression, it is not possible to replace ice with semantically related nouns such as snow or hail; in the second expression, it is possible to replace vase with semantically related words such as cup or dish.

Overview of the method verb + prepositional complement instances are extracted from 500M word corpus (focus on mwe with PP) matrix of 5K verb-preposition combinations 10K nouns is created 10K nouns are automatically clustered using distributional similarity measures number of statistical measures is applied to determine unique associations given the cluster in which a noun appears

Measures inspired by selectional preferences (Resnik 1993), entropy-based Kullback-Leibler divergence between P(n) and P(n v) S v = n p(n v) p(n v) log p(n) (3)

Measures The preference of the verb for the noun [0, 1] A v n = p(n v) log p(n v) p(n) S v (4) Ratio of verb preference for a particular noun, compared to other nouns in the cluster [0, 1] R v n = A v n n ɛc A v n (5)

Measures The preference of the noun for the verb [0, 1] A n v = p(v n) log p(v n) p(v) S n (6) Ratio of noun preference for a particular verb, compared to other nouns in the cluster [0, 1] R n v = A n v n ɛc A n v (7)

An elaborated example in de smaak vallen in de put vallen *geur kuil *voorkeur krater *stijl greppel in the taste fall in the well fall to be appreciated to fall down the well *smell hole *preference crater *style trench

An elaborated example smaak: idioom, karakter, persoonlijkheid, stijl, temperament, thematiek, uiterlijk, uitstraling, voorkomen mwe candidate (1) (2) (3) (4) mwe? val#in smaak.12 1.00.07 1.00 yes val#in karakter.00.00.00.00 no val#in stijl.00.00.00.00 no

An elaborated example put: gaatje, gat, kloof, krater, kuil, lek, scheur, valkuil mwe candidate (1) (2) (3) (4) mwe? val#in put.00.04.10.05 no val#in kuil.00.11.10.38 no val#in kloof.00.01.10.04 no

Quantitative Evaluation Fully automated, compared to Referentie Bestand Nederlands Upper bound consists of all RBN mwe s present in the data Parameters Prec Rec F-Measure (1) (2) (3) (4) N (%) (%) (%).10.80 2916 24.66 16.64 19.87.10.80.01.80 1694 30.99 12.15 17.45 Fazly/Stevenson 3470 16.89 13.56 15.04 Random baseline 4332 1.02 1.02 1.02

Conclusion Non-compositionality based algorithm is able to rule out expression that are coined mwe s by traditional algorithms, improving on state-of-the-art Using measures (1) and (2) gives best results; using (3) and (4) increases precision but degrades recall

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar

Ambiguity Problem: ambiguity bar Main research question: can topical similarity and tight, synonym-like similarity be combined to differentiate between various senses of a word?

Goal: classification of nouns according to both window-based context (with large window) and syntactic context Construct three matrices capturing co-occurrence frequencies for each mode nouns cross-classified by dependency relations nouns cross-classified by (bag of words) context words dependency relations cross-classified by context words Apply nmf to matrices, but interleave the process Result of former factorization is used to initialize factorization of the next one

Graphical Representation 5k 80k A nouns x dependency relations 50 50 5k = x W 80k H 5k 2k B nouns x context words 50 50 5k = x V 2k G 2k 80k C context words x dependency relations 2k = x 50 80k U 50 F

Sense subtraction switch off dimension(s) of an ambiguous word to reveal other possible senses From matrix W, we know which dimensions are the most important for a certain word Matrix H gives the importance of each dependency relation given a dimension subtract dependency relations that are responsible for a given dimension from the original noun vector v new = v orig ( v 1 h dim ) each dependency relation is multiplied by a scaling factor, according to the load of the feature on the subtracted dimension

Combination with clustering A simple clustering algorithm (k-means) assigns ambiguous nouns to its predominant sense Centroid of the cluster is folded into nmf model The dimensions that define the centroid are subtracted from the ambiguous noun vector Adapted noun vector is fed to the clustering algorithm again

Experimental Design Approach applied to Dutch, using Twente Nieuws Corpus (± 500M words) Corpus parsed with Dutch dependency parser alpino three matrices constructed with: 5k nouns 80k dependency relations 5k nouns 2k context words 80k dependency relations 2k context words Factorization to 50 dimensions

Example dimension: transport 1 nouns: auto car, wagen car, tram tram, motor motorbike, bus bus, metro subway, automobilist driver, trein trein, stuur steering wheel, chauffeur driver 2 context words: auto car, trein train, motor motorbike, bus bus, rij drive, chauffeur driver, fiets bike, reiziger reiziger, passagier passenger, vervoer transport 3 dependency relations: viertraps adj four pedal, verplaats met obj move with, toeter adj honk, tank in houd obj [parsing error], tank subj refuel, tank obj refuel, rij voorbij subj pass by, rij voorbij adj pass by, rij af subj drive off, peperduur adj very expensive

Pop: most similar words pop music doll 1 pop, rock, jazz, meubilair furniture, popmuziek pop music, heks witch, speelgoed toy, kast cupboard, servies [tea] service, vraagteken question mark 2 pop, meubilair furniture, speelgoed toy, kast cupboard, servies [tea] service, heks witch, vraagteken question mark sieraad jewel, sculptuur sculpture, schoen shoe 3 pop, rock, jazz, popmuziek pop music, heks witch, danseres dancer, servies [tea] service, kopje cup, house house music, aap monkey

Barcelona: most similar words Spanish city Spanish football club 1 Barcelona, Arsenal, Inter, Juventus, Vitesse, Milaan Milan, Madrid, Parijs Paris, Wenen Vienna, München Munich 2 Barcelona, Milaan Milan, München Munich, Wenen Vienna, Madrid, Parijs Paris, Bonn, Praag Prague, Berlijn Berlin, Londen London 3 Barcelona, Arsenal, Inter, Juventus, Vitesse, Parma, Anderlecht, PSV, Feyenoord, Ajax

Clustering example: werk 1 werk work, beeld image, foto photo, schilderij painting, tekening drawing, doek canvas, installatie installation, afbeelding picture, sculptuur sculpture, prent picture, illustratie illustration, handschrift manuscript, grafiek print, aquarel aquarelle, maquette scale-model, collage collage, ets etching 2 werk work, boek book, titel title, roman novel, boekje booklet, debuut debut, biografie biography, bundel collection, toneelstuk play, bestseller bestseller, kinderboek child book, autobiografie autobiography, novelle short story, 3 werk work, voorziening service, arbeid labour, opvoeding education, kinderopvang child care, scholing education, huisvesting housing, faciliteit facility, accommodatie acommodation, arbeidsomstandigheid working condition

Evaluation: methodology Comparison to EuroWordNet senses using Wu & Palmer s Wordnet similarity measure a sense is assigned to a correct cluster if: for the top 4 words of the cluster the average similarity between the sense and the top 4 words exceeds a similarity threshold in EuroWordNet and the sense has not yet been considered correct when multiple senses exist in EuroWordNet, the one with maximum similarity is chosen

Evaluation: precision & recall Precision of a word: Percentage of correct clusters to which it is assigned overall: average precision of all words in test set Recall of a word: Percentage of senses in EuroWordnet that have a corresponding cluster overall: average recall of all words in test set Test set: words covered by algorithm that are also present in EuroWordNet (3683 words)

Evaluation: results threshold θ.40 (%).50 (%).60 (%) kmeans nmf prec. 78.97 69.18 55.16 rec. 63.90 55.95 44.77 cbc prec. 44.94 38.13 29.74 rec. 69.61 60.00 48.00 kmeans orig prec. 86.13 74.99 58.97 rec. 60.23 52.45 41.80

Evaluation: results kmeans nmf beats cbc with regard to precision cbc beats kmeans nmf with regard to recall kmeans nmf has higher recall than kmeans orig, so algorithm is able to find multiple senses with good precision

Conclusion Combining bag of words data and syntactic data is useful bag of words data (factorized with nmf) puts its finger on topical dimensions syntactic data is particularly good at finding similar words a clustering approach allows one to determine which topical dimension(s) are responsible for a certain sense and adapt the (syntactic) feature vector of the noun accordingly subtracting the more dominant sense to discover less dominant senses Algorithm scores better with regard to precision; lower with regard to recall

Standard selectional preference models: two-way co-occurrences Keeping track of single relationships But: two-way selectional preference models are not sufficiently rich Compare: The skyscraper is playing coffee. The turntable is playing the piano.

The skyscraper is playing coffee. (play, su, scyscraper) (play, obj, coffee) The turntable is playing the piano. (play, su, turntable) (play, obj, piano) (play, su, turntable, obj, piano)

Three-way extraction of selectional preferences Approach applied to Dutch, using twente nieuws corpus (500M words of newspaper texts) parsed with Dutch dependency parser alpino three-way co-occurrence of verbs with subjects and direct objects extracted adapted with extension of pointwise mutual information Resulting tensor 1k verbs 10k subjects 10k direct objects non-negative tensor factorization with k dimensions (k = 50, 100, 300)

Graphical representation

Example dimension: police action subjects su s verbs v s objects obj s politie police.99 houd aan arrest.64 verdachte suspect.16 agent policeman.07 arresteer arrest.63 man man.16 autoriteit authority.05 pak op run in.41 betoger demonstrator.14 Justitie Justice.05 schiet dood shoot.08 relschopper rioter.13 recherche detective force.04 verdenk suspect.07 raddraaier instigator.13 marechaussee military police.04 tref aan find.06 overvaller raider.13 justitie justice.04 achterhaal overtake.05 Roemeen Romanian.13 arrestatieteam special squad.03 verwijder remove.05 actievoerder campaigner.13 leger army.03 zoek search.04 hooligan hooligan.13 douane customs.02 spoor op track.03 Algerijn Algerian.13

Example dimension: legislation subjects su s verbs v s objects obj s meerderheid majority.33 steun support.83 motie motion.63 VVD.28 dien in submit.44 voorstel proposal.53 D66.25 neem aan pass.23 plan plan.28 Kamermeerderheid.25 wijs af reject.17 wetsvoorstel bill.19 fractie party.24 verwerp reject.14 hem him.18 PvdA.23 vind think.08 kabinet cabinet.16 CDA.23 aanvaard accepts.05 minister minister.16 Tweede Kamer.21 behandel treat.05 beleid policy.13 partij party.20 doe do.04 kandidatuur candidature.11 Kamer Chamber.20 keur goed pass.03 amendement amendment.09

Example dimension: exhibition subjects su s verbs v s objects obj s tentoonstelling exhibition.50 toon display.72 schilderij painting.47 expositie exposition.49 omvat cover.63 werk work.46 galerie gallery.36 bevat contain.18 tekening drawing.36 collectie collection.29 presenteer present.17 foto picture.33 museum museum.27 laat let.07 sculptuur sculpture.25 oeuvre oeuvre.22 koop buy.07 aquarel aquarelle.20 Kunsthal.19 bezit own.06 object object.19 kunstenaar artist.15 zie see.05 beeld statue.12 dat that.12 koop aan acquire.05 overzicht overview.12 hij he.10 in huis heb own.04 portret portrait.11

Quality count 44 dimensions contain similar, framelike semantics 43 dimensions contain less clear-cut semantics single verbs account for one dimension verb senses are mixed up 13 dimensions based on syntax rather than semantics fixed expressions pronomina

Evaluation: methodology pseudo-disambiguation task to test generalization capacity (standard automatic evaluation for selectional preferences) s v o s o jongere drink bier coalitie aandeel youngster drink beer coalition share werkgever riskeer boete doel kopzorg employer risk fine goal worry directeur zwaai scepter informateur vodka manager sway sceptre informer wodka 10-fold cross validation (± 300,000 co-occurrences)

Evaluation: models Evaluation of 4 different models 2 matrix models 1k verbs (10k subjects + 10k direct objects) singular value decomposition (R) non-negative matrix factorization (R 0 ) 2 tensor models 1k verbs 10k subjects 10k direct objects parallel factor analysis (R) non-negative tensor factorization (R 0 )

Evaluation: results dimensions 50 (%) 100 (%) 300 (%) svd 69.60 ± 0.41 62.84 ± 1.30 45.22 ± 1.01 nmf 81.79 ± 0.15 78.83 ± 0.40 75.74 ± 0.63 parafac 85.57 ± 0.25 83.58 ± 0.59 80.12 ± 0.76 ntf 89.52 ± 0.18 90.43 ± 0.14 90.89 ± 0.16

Conclusion novel method able to investigate three-way co-occurrences capable of automatically inducing selectional preferences three-way methods improve on two-way methods non-negativity constraint improves on unconstrained models non-negative tensor factorization outperforms other models

Similarity evaluation Wordnet-based similarity Compare results to Dutch cornetto database Two similarity measures: path length: Wu & Palmer similarity measure information theoretic: Lin s similarity measure Nouns close in the hierarchy are tightly similar Pairwise similarity for k similar words Test set of ± 5000 nouns

Similarity evaluation Wordnet-based similarity wu & palmer s similarity lin s similarity model k = 1 k = 3 k = 5 k = 1 k = 3 k = 5 document.379.331.309.354.320.304 window (w=par).442.377.349.404.357.336 window (w=2).633.561.526.541.485.456 syntax.648.584.554.555.504.480 baseline.128.126.126.164.163.163 syntax > window (w=2) window (w=par) > document

Similarity evaluation Wordnet-based similarity Syntax (with pmi) best Closely followed by small window (with pmi) Large window and document perform much worse dimensionality reduction only helps to improve document-based (a little)

Similarity evaluation Cluster quality Compare output of clustering algorithm to gold standard classification Two cluster tasks (esslli 2008 workshop s shared task) concrete noun categorization (44 nouns) 2-way: natural, artefact 3-way: animal, vegetable, artefact 6-way: bird, groundanimal, fruittree, green, tool, vehicle abstract/concrete noun discrimination (30 nouns) 2-way: hi, lo Evaluation measures: Entropy: distribution of classes within cluster (small = good) Purity: ratio of largest class present in cluster (large = good)

Similarity evaluation Cluster quality 2-way 3-way 6-way model ent pur ent pur ent pur document.930.614.179.932.292.682 window (w=par).911.659.541.705.377.591 window (w=2).000 1.000.213.909.206.773 syntax-based.000 1.000.000 1.000.153.864 syntax > window (w=2) document > window (w=par)

Similarity evaluation Cluster quality Same tendencies as wordnet-based similarity model with large window seems to extract topically related clusters: aardappel potato, ananas pineapple, banaan banana, champignon mushroom, fles bottle, kers cherry, ketel kettle, kip chicken, kom bowl, lepel spoon, peer pear, sla lettuce, ui onion Similar result for abstract/concrete noun discrimination

Similarity evaluation Domain coherence Coherence of semantic domain tags (available in cornetto) particular areas of human knowledge (politics, medicine, sports) topical similarity Ratio of most frequent domain tag (also in tagset of target word) over top 10 similar words Same test set of ± 5000 nouns

Similarity evaluation Domain coherence model sim topic document.394 window (w=par).399 window (w=2).414 syntax.441 baseline.048 syntax > window (w=2) = window(w=par) = document

Similarity evaluation Domain coherence Syntax still scores best Other models do not perform much worse No real difference between small window and large window large window and document do not extract tight similarity, but they do grasp topical similarity