Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems



Similar documents
TAGora. Semiotic Dynamics in Online Social Communities. V. Loreto.

Stop Thinking, Start Tagging: Tag Semantics Emerge from Collaborative Verbosity

LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together

Automatic Web Tagging and Person Tagging Using Language Models

int.ere.st: Building a Tag Sharing Service with the SCOT Ontology

Lecture Overview. Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Web 2.0 Definition. Web 2.

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Does it fit? KOS evaluation using the ICE-Map Visualization.

Identifying free text plagiarism based on semantic similarity

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Web Development News, Tips and Tutorials

1 o Semestre 2007/2008

Interactive Data Visualization for the Web Scott Murray

Mining User Profiles to Support Structure and Explanation in Open Social Networking

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

INTERESTED IN EXPANDING YOUR TECHNICAL SKILLS?

A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis

Interested in Expanding your Technical Skills?

Movie Classification Using k-means and Hierarchical Clustering

Semantic Concept Based Retrieval of Software Bug Report with Feedback

A typology of ontology-based semantic measures

SEO for Web 2.0 Enterprise Search Summit West September 2008

Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University

A SYSTEM FOR AUTOMATIC QUERY EXPANSION IN A BROWSER-BASED ENVIRONMENT

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari

Extracting user interests from search query logs: A clustering approach

Topological Tree Clustering of Social Network Search Results

Harvesting and Structuring Social Data in Music Information Retrieval

Mining Text Data: An Introduction

Building a Question Classifier for a TREC-Style Question Answering System

Web-based Learning with Non-linear Multimedia Stories

Clever Search: A WordNet Based Wrapper for Internet Search Engines

Selecting a Taxonomy Management Tool. Wendi Pohs InfoClear Consulting #SLATaxo

Friendship Prediction and Homophily in Social Media

Cross-lingual Synonymy Overlap

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Analyzing survey text: a brief overview

Personalizing Image Search from the Photo Sharing Websites

Controlled Vocabulary and Folksonomies. Louise Spiteri School of Information Management

Graph Analysis of Student Model Networks

An ontology-based approach for semantic ranking of the web search engines results

Remix Your Data: Visualizing Library Instruction Statistics

Secure semantic based search over cloud

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Clustering of Polysemic Words

Tags4Tags: Using Tagging to Consolidate Tags

Search and Information Retrieval

Information Retrieval, Information Extraction and Social Media Analytics

dustin caruso JavaScript / WordPress / UI developer 1230 Parkside Drive South, Reading, PA, USA dustin@dustincaruso.com

Liktap 2012 Search Engine

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-4, Issue-4) Abstract-

Realizing a Vision Interesting Student Projects

Friendship prediction and homophily in social media

Clustering Technique in Data Mining for Text Documents

Package wordnet. January 6, 2016

Why is Internal Audit so Hard?

The Open University s repository of research publications and other research outputs

CSCI 5417 Information Retrieval Systems Jim Martin!

Traping User Tag-Clouds

What Is This, Anyway: Automatic Hypernym Discovery

Department of Cognitive Sciences University of California, Irvine 1

EXPLORING THE VALUE OF FOLKSONOMIES FOR CREATING SEMANTIC METADATA

Politecnico di Torino. Porto Institutional Repository

Visualizing WordNet Structure

Learning to Identify Emotions in Text

Natural Language Processing. Part 4: lexical semantics

Semantic Clustering in Dutch

Fogbeam Vision Series - The Modern Intranet

Semantic tagging for crowd computing

Social Networking For Scientists

T HE I NFORMATION A RCHITECTURE G LOSSARY

A Semantic Metadirectory of Services Based on Web Mining Techniques

Ontology based ranking of documents using Graph Databases: a Big Data Approach

An Efficient Database Design for IndoWordNet Development Using Hybrid Approach

to Boost SEO Growth Services To learn more, go to: teletech.com

Generating Advertising Keywords from Video Content

Universiteit Leiden Opleiding Informatica

Knowledge Discovery from patents using KMX Text Analytics

WordTies - a web interface for browsing wordnets across languages. Bolette Sandford Pedersen University of Copenhagen

YouTube optimisation best practice guide

Collaborative process maturing support by mining activity streams. iknow 2015 Prof. Dr. René Peinl Graz,

Slide 7. Jashapara, Knowledge Management: An Integrated Approach, 2 nd Edition, Pearson Education Limited Nisan 14 Pazartesi

Concept based Personalized Web Search

Search Engines. Stephen Shaw 18th of February, Netsoc

SNS-Navigator: A Graphical Interface to Environmental Meta-Information

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS

MBARI Deep Sea Guide: Designing a web interface that represents information about the Monterey Bay deep-sea world.

Privacy-preserving Digital Identity Management for Cloud Computing

Persona: A Contextualized and Personalized Web Search

Jay Buckingham Dynamic Signal

Evaluating the impact of research online with Google Analytics

Prepared for Northwest Flower & Garden Show.

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation

Improving Tag Clouds with Ontologies and Semantics

Software Requirements Specification For Real Estate Web Site

A Framework for Ontology-Based Knowledge Management System

DISA at ImageCLEF 2014: The search-based solution for scalable image annotation

Transcription:

Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems 1 Ciro Cattuto, 2 Dominik Benz, 2 Andreas Hotho, 2 Gerd Stumme 1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation, Torino, Italy 2 Research Unit Knowledge and Data Engineering (KDE), University of Kassel, Germany

Everybody is tagging tag user So how to pick the best from the tag soup? -http://xkcd.com/ Extract concepts - Harvest semantics -resource Learn ontologies simple and intuitive way to organize resources, immediately useful uncontrolled vocabulary however: evidence for converging vocabulary / emergent semantics due to shared implicit knowledge mutual influence of users underlying social networks Dominik Benz 25.07.2008 2

Agenda Topic Definition Folksonomy Definition and Data Tag Similarity Measures Semantic Grounding Summary and Outlook Dominik Benz 25.07.2008 3

Topic Definition: Semantic Grounding of Tag similarity Final Goal: Understand tag semantics in a folksonomy, i.e., Which tags describe the same / a more specific / a more general concept? Two basic approaches: Look up tags in external thesaurus: + semantically grounded metrics - folksonomy jargon (misspellings, neologisms etc.) not present Semantic Grounding Apply measures directly to folksonomy structure (e.g. cooccurrence statistics, ) + inclusion of complete vocabulary - semantic interpretation of measures is not clear Understand characteristics of (distributional) measures assess their applicability for concept extraction, ontology learning, Dominik Benz 25.07.2008 4

Agenda Topic Definition Folksonomy Definition and Data Tag Similarity Measures Semantic Grounding Summary and Outlook Dominik Benz 25.07.2008 5

Folksonomy Model A folksonomy F = (U, T, R, Y) consists of the sets Users U Tags T Resources R Tag assignments Y (U T R) Can be seen as ternary relation tripartite hypergraph G = ((U T R), Y) User 1 User 3 User 2 Tag 2 Tag 3 Res 3 Res 1 Res 2 Dominik Benz 25.07.2008 6

Folksonomy Dataset Del.icio.us crawl 2006 U = 667,128 T = 2,454,546 R = 18,782,132 Y = 140,333,714 Excerpt: 10,000 most popular tags U = 476,378 T = 10,000 R = 12,660,470 Y = 101,491,722 In the following: tag rank = position in most-popular list: 1: design 2: software 3: blog 4: web Dominik Benz 25.07.2008 7

Agenda Topic Definition Folksonomy Definition and Data Tag Similarity Measures Semantic Grounding Summary and Outlook Dominik Benz 25.07.2008 8

Similarity Measures: Co-occurrence + Cosine Take Co-occurrence frequency as similarity measure (freq): freq(t 1,t 2 )= {(u,r) U R:(u,t 1,r) Y (u,t 2,r) Y} Describe each tag as a vector, whereby each dimension of the vector space corresponds to another tag. Compute similar tags by cosine similarity (cosine). (The same can be done in the user space or the resource space and with TF-IDF.) JAVA 5 30 1 10 50 design software blog web programming Dominik Benz 25.07.2008 9

Most related tags by cooccurrence / cosine simlarity art design photography illustration blog graphics web2.0 ajax web tools blog webdesign news blog technology politics media daily howto tutorial reference tips linux programming video music funny tv software media ajax javascript web2.0 web programming webdesign tutorial howto programming reference design css javascript ajax programming css web webdesign freq cosine art graphic creative print portfolios nice web2.0 web2 web-2.0 webapp web web_2.0 news blogs people weblog culture future howto how-to guide tutorials help how_to video entertainment awesome fun cool random ajax dhtml dom js ecmascript webdev tutorial tutorials tips coding code examples javascript webdevelopment webdev example examples webprogramming Dominik Benz 25.07.2008 10

Example for cosine measure Dominik Benz 25.07.2008 11

Similarity Measures: FolkRank Take Co-occurrence frequency as similarity measure (freq). Cosine Similarity between tag vectors Use FolkRank to find related tags (folkrank). Basic Idea: PageRank-like spreading of weights through folksonomy structure + high weights for a particular tag in the random surfer vector User 1 User 3 User 2 Tag 2 Tag 3 Res 3 Res 1 Res 2 Web graph Folksonomy graph Andreas Hotho and Robert Jäschke and Christoph Schmitz and Gerd Stumme. Information Retrieval in Folksonomies: Search and Ranking. Dominik Proceedings Benz of the 3rd European Semantic Web Conference, (4011):411-426, Springer,Budva, Montenegro,2006. 25.07.2008 12

Examples of most related tags Cosine FolkRank Freq Dominik Benz 25.07.2008 13

Qualitative insights: Overlap & average rank of related tags Overlap Average rank cosinefolkrank cosinefreq freqfolkrank 1.7 1.1 6,7 Dominik Benz 25.07.2008 14

First insights Freq / FolkRank show bias to high-frequency tags, i.e., to hyperonyms. Cosine seems to yield more synomyms and siblings. Now: grounding of these observations in WordNet. Dominik Benz 25.07.2008 15

Agenda Topic Definition Folksonomy Definition and Data Tag Similarity Measures Semantic Grounding Summary and Outlook Dominik Benz 25.07.2008 16

Semantic Grounding in WordNet WordNet is a large lexical database for English. Words with same meaning are grouped in synsets, which are ordered by an is-a hierarchy. Introduction of single artificial root node enables application of graph-based similarity metrics between pairs of nouns / pairs of verbs. Inclusion of top n del.icio.us tags in WordNet: 100: 82% 1,000: 79% 5,000: 69% 10,000: 61% Dominik Benz 25.07.2008 17

Example of Semantic Grounding Original tag: java Wordnet Synset Hierarchy: computers Most similar tag: programming Freq, folkrank: programming map design_patterns languages Cosine: python Grounded similarity java python Dominik Benz 25.07.2008 18

Shortest path between original tag and most closely related one 8 14 7 6 5 4 3 2 1 12 10 8 6 4 2 0 FREQ FolkRank Cosine 0 FREQ FolkRank Cosine shortest path Jiang-Conrath Shown to be the semantically most adequate measure for similarity within WordNet [Budanitsky, Hirst, 2006]. Dominik Benz 25.07.2008 19

shortest paths in WordNet random siblings length of shortest path to most related tag Dominik Benz 25.07.2008 20

Agenda Topic Definition Folksonomy Definition and Data Tag Similarity Measures Semantic Grounding Summary and Outlook Dominik Benz 25.07.2008 21

Summary Analysis of tag similarity measures by mapping to WordNet Exposed clearly different characteristics: freq measure and Folkrank tend to more general tags Synonyms and siblings are the result of the cosine measure Implications for ontology learning: Insights can inform the choice of an appropriate measure to extract semantic tag relations e.g, FolkRank to find Hyperonyms, Cosine measure for Synonyms Now: Embed these measures in an ontology learning procedure Dominik Benz 25.07.2008 22

Similar tags live on www.bibsonomy.org Thanks for your attention! contact: benz@cs.uni-kassel.de Dominik Benz 25.07.2008 23

Appendix: Music Genre Taxonomy learned from last.fm Music Genre Taxonomy learned from last.fm Dominik Benz 25.07.2008 24