Exploiting the Web of Data for cross-domain information retrieval and recommendation Ignacio Fernández-Tobías under the supervision of Iván Cantador Grupo de Recuperación de Información Universidad Autónoma de Madrid i.fernandez@uam.es VII Jornadas MAVIR Avances en Tecnologías de la Lengua y Acceso a la Información Multimedia Escuela Politécnica Superior, Universidad Carlos III de Madrid 26-27 November 2012
Contents
- Introduction: Cross-domain item recommendation
- Case study: Linking music with places of interest
- A semantic-based framework for linking domains
  - Cross-domain semantic networks from Wikipedia
  - Cross-domain semantic networks from Open Information Extraction
- A social tag-based emotion-oriented approach for linking domains
Introduction: Cross-domain item recommendation
- Recommender systems help users make choices by proactively finding relevant items or services, taking into account or predicting the users' tastes, priorities and goals
- The vast majority of currently available recommender systems predict the relevance of items for the user within a single, limited domain
Introduction: Cross-domain item recommendation
- In some applications, it could be useful to offer the user joint personalized recommendations of items belonging to multiple domains
  - In an e-commerce site, we may suggest movies or videogames based on a particular book bought by a customer
  - In a travel application, we may suggest cultural events that may interest a person who has booked a hotel in a particular place
  - In an e-learning system, we may suggest educational websites with topics related to a video documentary a student has seen
- Potential benefits
  - Offering diversity and serendipity
  - Addressing the cold-start problem (on a target domain)
  - Mitigating the sparsity problem
Fernández-Tobías, I., Cantador, I., Kaminskas, M., Ricci, F. 2012. Cross-domain Recommender Systems: A Survey of the State of the Art. 2nd Spanish Conference on Information Retrieval.
Introduction: Cross-domain item recommendation
- Some real applications (e.g. Amazon) already recommend items from different domains, but their recommendations either rely on statistical analysis of popular items, without any personalization strategy, or, in most cases, exploit only information about the user's preferences in the target domain
Introduction: Cross-domain item recommendation
- Context
  - User and item profiles are distributed across multiple systems
  - There are no, or only a few, user profiles with preferences on items in different domains
- Goal
  - Automatically establishing links or transferring knowledge between domains
Case study: Linking music with places of interest
- Case study: Suggesting music / musicians highly related to a particular point of interest (POI)
- Relations between music and places
  - Based on common emotions caused by listening to music and visiting POIs (via social tags)
Kaminskas, M., Ricci, F. 2011. Location-Adapted Music Recommendation Using Tags. 19th Intl. Conference on User Modeling, Adaptation and Personalization, 183-194.
Case study: Linking music with places of interest
- Case study: Suggesting music / musicians highly related to a particular point of interest (POI)
- Relations between music and places
  - Based on common emotions caused by listening to music and visiting POIs (via social tags)
  - Based on explicit semantic associations between musicians and POIs (via information available in the (Semantic) Web)
[Figure: example graph connecting the Vienna State Opera with Gustav Mahler, Arnold Schoenberg and Wolfgang Amadeus Mozart through the concepts Austrian musicians, Romanticism, Classical music, Opera composers and 19th century]
Case study: Linking music with places of interest
Semantic relations between musicians and POIs
- Location relations: Arnold Schoenberg was born in Vienna, the city where the Vienna State Opera is located
- Time relations: Gustav Mahler was born in 1860, a year in the decade in which the Vienna State Opera was built
- Architecture-History/Art-Music category relations: Wolfgang A. Mozart was a classical music composer, and classical compositions are played in opera houses, which is the building type of the Vienna State Opera
- Arbitrary relations: Gustav Mahler was the director of the Vienna State Opera; Ana Belén (a famous Spanish singer) recorded a song about La Puerta de Alcalá (a well-known POI in Madrid)
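The relations listed on this slide can be read as labelled edges in a graph, so that a POI and a musician are related when a path connects them. The following is a minimal sketch of that idea, not the authors' implementation: the triples paraphrase the slide's examples, and relation names such as `director_of` and `related_to` are illustrative assumptions.

```python
from collections import defaultdict, deque

# Toy (subject, relation, object) triples paraphrasing the slide's examples.
# Relation names are assumptions made for this sketch.
TRIPLES = [
    ("Vienna", "birth_place_of", "Arnold Schoenberg"),
    ("Vienna State Opera", "located_in", "Vienna"),
    ("Gustav Mahler", "director_of", "Vienna State Opera"),
    ("Opera houses", "type_of", "Vienna State Opera"),
    ("Classical music", "related_to", "Opera houses"),
    ("Classical music", "genre_of", "Wolfgang Amadeus Mozart"),
]


def build_graph(triples):
    """Index each triple in both directions; '^-1' marks a reversed edge."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((relation + "^-1", subject))
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search returning [node, relation, node, ...] or None."""
    queue = deque([(source, [source])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [relation, neighbour]))
    return None


graph = build_graph(TRIPLES)
schoenberg_path = shortest_path(graph, "Vienna State Opera", "Arnold Schoenberg")
mozart_path = shortest_path(graph, "Vienna State Opera", "Wolfgang Amadeus Mozart")
```

Here `schoenberg_path` follows the location relation through Vienna, while `mozart_path` goes through the building type and the music genre, mirroring the category relation described on the slide.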
Cross-domain semantic networks from Wikipedia
[Figure: annotated Wikipedia page highlighting the extracted information: building type, date, city and (architecture) categories]
Cross-domain semantic networks from Wikipedia
Linking Wikipedia's architecture and music categories
[Figure: category graph linking architecture and music categories. Categories shown: Architectural styles, 19th century architecture, Visitor attractions, Arts venues, Music venues, Opera houses, Historical eras, Modern history, 18th century, 19th century, Romanticism, Romantic composers, 19th century in music, Opera, Classical composers, Opera composers, 19th century musicians, 19th century composers, Classical music, Music people, Musicians, Composers, Music genres]
Kaminskas, M., Fernández-Tobías, I., Ricci, F., Cantador, I. 2013. Ontology-based Identification of Music for Places. 13th Intl. Conference on Information and Communication Technologies in Tourism.
Cross-domain semantic networks from Wikipedia
Cross-domain taxonomies from Wikipedia
- Architecture: Architectural styles, Visitor attractions, Centuries in architecture, 19th century architecture, Arts venues, Music venues, Opera houses
- History / Art: Historical eras, Centuries, Modern history, 18th century, 19th century, Romanticism
- Music: Music people, Musicians, Composers, Classical composers, Opera composers, Romantic composers, 19th century musicians, 19th century composers, Music genres, Opera, Classical music
[Figure: schema of the semantic network]
- Node classes: POI, City, Date, Year, Decade, Century, Musician, Architectural style, Architectural era, Historical era, Musical era, Musician type, Building type, Music genre
- Relations: located_in, building_start_date_of, building_end_date_of, opening_date_of, birth_place_of, death_place_of, residence_place_of, birth_date_of, death_date_of, activity_date_of, has_style, has_type, type_of, genre_of, subcategory_of
[Figure: instance of the semantic network linking the Vienna State Opera and Gustav Mahler]
- Class labels: City, Date, Architectural styles, Architectural eras, Historical eras, Musical eras, Building types, Music genres, Musician types
- Relations shown: death_place_of, birth_decade_of, activity_century_of
- Nodes: Vienna, Austria; 1869; 1860s; 19th century; 1869 architecture; 19th century architecture; Romanticism; Romantic music; 19th century in music; 19th century composers; Romantic composers; Classical composers; Opera houses in Vienna; Opera houses in Austria; Opera houses; Opera; Classical music
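Cross-domain semantic networks from Wikipedia
- Weight Spreading Activation, with score_i = S(i):
  $S(i) = (1 - d)\,\mathrm{rel}(i) + d \sum_{j \to i} w_{ji}\, S(j)$
- PageRank, with score_i = PR(i):
  $PR(i) = (1 - d)\,\frac{1}{N} + d \sum_{j \to i} \frac{PR(j)}{L(j)}$
- HITS, with score_i = A(i):
  $A(i) = \sum_{j \to i} H(j), \qquad H(i) = \sum_{i \to j} A(j)$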
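The weight spreading activation score can be computed by iterating its update rule until the values stop changing. The sketch below assumes the reading S(i) = (1 - d) rel(i) + d * sum over incoming edges j -> i of w_ji S(j), with d a damping-like factor and rel(i) an initial relevance. The node names, edge weights and the value of d are hypothetical, chosen only for illustration.

```python
def spreading_activation(weights, rel, d=0.5, iterations=20):
    """Iterate S(i) = (1 - d) * rel(i) + d * sum_{j -> i} w_ji * S(j).

    weights: dict mapping a directed edge (j, i) to its weight w_ji.
    rel:     dict mapping a node to its initial relevance (default 0).
    """
    nodes = set(rel)
    for source, target in weights:
        nodes.update((source, target))
    scores = {node: rel.get(node, 0.0) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for i in nodes:
            incoming = sum(
                w * scores[j] for (j, target), w in weights.items() if target == i
            )
            updated[i] = (1 - d) * rel.get(i, 0.0) + d * incoming
        scores = updated
    return scores


# Hypothetical toy network: a POI activates a category, which in turn
# activates two musicians with different edge weights.
weights = {
    ("poi", "category"): 1.0,
    ("category", "musician_a"): 0.8,
    ("category", "musician_b"): 0.2,
}
rel = {"poi": 1.0}

scores = spreading_activation(weights, rel)
ranking = sorted(["musician_a", "musician_b"], key=scores.get, reverse=True)
```

Because this toy network has no cycles, the scores settle after a few iterations; on a cyclic graph the number of iterations (or a convergence threshold) would need to be chosen with more care.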
Cross-domain semantic networks from Wikipedia
97 users, 17 cities, 25 POIs, 356 POI-musician pairs, 1155 assessments
Cross-domain semantic networks from Wikipedia
Average precision values for the top 5 ranked musicians for each POI
Algorithm   P@1     P@2     P@3     P@4     P@5
Random      0.355*  0.391*  0.363*  0.435*  0.413*
HITS        0.688   0.706   0.711*  0.700*  0.694
PageRank    0.753   0.728   0.707*  0.660*  0.646*
Spreading   0.810   0.804   0.828   0.847   0.837
Values marked with * differ significantly from those of the Spreading algorithm (Wilcoxon signed-rank test, p < 0.05)
Fernández-Tobías, I., Kaminskas, M., Cantador, I., Ricci, F. 2013. A semantic framework for supporting cross-domain recommendation: Suggesting music for places of interest. Submitted.
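For readers unfamiliar with the P@k columns, the sketch below shows how precision at k is commonly computed and averaged over POIs. It assumes the usual definition (fraction of the top k ranked musicians judged relevant); the slides do not detail the exact procedure, and the rankings and judgments here are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_precision_at_k(rankings, judgments, k):
    """Average P@k over all POIs that have a ranking."""
    values = [precision_at_k(rankings[poi], judgments[poi], k) for poi in rankings]
    return sum(values) / len(values)


# Hypothetical rankings and relevance judgments for two POIs.
rankings = {
    "poi_1": ["m1", "m2", "m3", "m4", "m5"],
    "poi_2": ["m6", "m7", "m8", "m9", "m10"],
}
judgments = {
    "poi_1": {"m1", "m3", "m4"},
    "poi_2": {"m7"},
}

p_at_1 = mean_precision_at_k(rankings, judgments, 1)
p_at_5 = mean_precision_at_k(rankings, judgments, 5)
```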
Cross-domain semantic networks from Wikipedia
[Figure: average number of semantic paths per POI]
Percentages of interesting and obvious musicians recommended by the Spreading algorithm
              Interesting  Non interesting  Non obvious  Obvious
Related       78.3%        21.7%            58.9%        41.1%
Non-related   8.2%         91.8%            84.2%        15.8%
Fernández-Tobías, I., Kaminskas, M., Cantador, I., Ricci, F. 2013. A semantic framework for supporting cross-domain recommendation: Suggesting music for places of interest. Submitted.
Cross-domain semantic networks from Open Information Extraction
- TextRunner (openie.cs.washington.edu) and ReVerb (reverb.cs.washington.edu): automatic identification and extraction of binary relationships from English sentences
- Linked to Freebase
Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam. 2011. Open Information Extraction: The Second Generation. 22nd International Joint Conference on Artificial Intelligence, pp. 3-10.
Cross-domain semantic networks from Open Information Extraction
- Filtering relations based on a TF-IDF heuristic
  $w(e_1, r, e_2) = \lambda \, \frac{c(e_1, e_2)}{\sum_{e_i, e_j} c(e_i, e_j)} + (1 - \lambda)\, \mathrm{tfidf}(r)$
  $\mathrm{tfidf}(r) = \frac{|\{(e_i, r, e_j) \in G\}|}{\max_s |\{(e_i, s, e_j) \in G\}|} \cdot \log \frac{N}{|\{(e_i, r, e_j) \in C\}|}$
- Ranking entities according to node categories and graph structure
  $w(e) = \alpha_1 w_T(e) + \alpha_2 w_P(e) + \alpha_3 w_D(e)$
  where $w_T(e)$ is computed from $T(e)$ and $D$, $w_P(e)$ from the paths between $s$ and $e$, and $w_D(e)$ from $\mathrm{dist}(s, e)$
Fernández-Tobías, I., Cantador, I. 2013. Open Cross-domain Semantic Networks: Application to Item-to-item Recommendation. To be submitted.
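To make the TF-IDF heuristic for relations more concrete, here is a minimal sketch under stated assumptions: the term frequency of a relation phrase is its count in the extracted graph G normalised by the most frequent relation, the inverse document frequency is the log of the corpus size N over the relation's count in a background corpus C, and the final edge weight mixes a normalised co-occurrence value with the TF-IDF score through λ. This is one plausible reading of the slide, not the authors' exact formulation, and all triples are invented for illustration.

```python
import math
from collections import Counter


def relation_tfidf(graph_triples, corpus_triples):
    """TF-IDF-style score per relation phrase (assumed formulation)."""
    tf = Counter(relation for _, relation, _ in graph_triples)
    df = Counter(relation for _, relation, _ in corpus_triples)
    max_tf = max(tf.values())
    n = len(corpus_triples)  # assumption: N is the number of corpus triples
    return {
        relation: (count / max_tf) * math.log(n / df[relation])
        for relation, count in tf.items()
        if df[relation] > 0
    }


def edge_weight(normalised_cooccurrence, tfidf, lam=0.5):
    """Linear mix: lam * co-occurrence term + (1 - lam) * tfidf term."""
    return lam * normalised_cooccurrence + (1 - lam) * tfidf


# Hypothetical extractions: a generic relation ("is") and more specific ones.
graph_triples = [
    ("e1", "is", "e2"),
    ("e3", "is", "e4"),
    ("e1", "was born in", "e5"),
    ("e1", "directed", "e6"),
]
corpus_triples = graph_triples + [
    ("e7", "is", "e8"),
    ("e9", "is", "e10"),
    ("e11", "is", "e12"),
    ("e13", "is", "e14"),
    ("e15", "was born in", "e16"),
    ("e17", "is located in", "e18"),
]

tfidf_scores = relation_tfidf(graph_triples, corpus_triples)
```

In this toy data the generic relation "is" receives the lowest score even though it is the most frequent in the graph, which is the filtering effect a TF-IDF heuristic is meant to achieve.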
A social tag-based emotion-oriented approach for linking domains
Mining social tagging systems to create linked emotion-oriented folksonomies
A social tag-based emotion-oriented approach for linking domains
Generic emotion lexicon
- Automatically created by mining online thesauri (e.g. thesaurus.com)
- 16 main emotions: alert, excited, elated, happy, content, serene, relaxed, calm, fatigued, bored, depressed, sad, upset, stressed, nervous, tense
- Emotion = synonym & antonym vector
  - Synonyms: positive weights
  - Antonyms: negative weights
- Example: happy = {happy: +66, cheerful: +21, merry: +19, felicitous: +17, unhappy: -11, sad: -10, depressed: -6, serious: -4, ...}
Fernández-Tobías, I., Plaza, L., Cantador, I. 2013. Cross-domain Emotion Folksonomies. To be submitted.
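An emotion represented as a weighted synonym and antonym vector can be compared with other emotions, or with a set of tags, using cosine similarity. The sketch below uses the listed entries of the "happy" vector from the slide, taking antonym weights as negative as the slide states. The "sad" vector and the choice of cosine similarity are assumptions made only for illustration.

```python
import math

# Entries listed on the slide (the original vector continues beyond these).
HAPPY = {
    "happy": 66, "cheerful": 21, "merry": 19, "felicitous": 17,
    "unhappy": -11, "sad": -10, "depressed": -6, "serious": -4,
}
# Hypothetical vector, invented for this example only.
SAD = {"sad": 50, "unhappy": 30, "depressed": 20, "happy": -15, "cheerful": -8}

LEXICON = {"happy": HAPPY, "sad": SAD}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_emotion(tags, lexicon):
    """Return the emotion whose vector is most similar to a bag of tags."""
    tag_vector = {tag: 1 for tag in tags}
    return max(lexicon, key=lambda emotion: cosine(tag_vector, lexicon[emotion]))


happy_vs_sad = cosine(HAPPY, SAD)
label = closest_emotion(["cheerful", "merry"], LEXICON)
```

With these vectors the similarity between "happy" and "sad" is negative, because each emotion's synonyms appear as the other's antonyms, and the tags "cheerful" and "merry" map to "happy".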
A social tag-based emotion-oriented approach for linking domains
Generic emotion lexicon
- In accordance with Russell's emotion model (1980)
- Emotion representation in 2 dimensions: pleasure & arousal
[Figure, left: Russell's emotion model, with axes PLEASURE and AROUSAL and regions DISTRESS, EXCITEMENT, MISERY, DEPRESSION, SLEEPINESS and CONTENTMENT, showing the 16 emotions]
[Figure, right: obtained emotion vectors projected into 2 dimensions (PCA), both axes ranging from -0.15 to 0.15]
Russell, J. A. 1980. A Circumplex Model of Affect. Journal of Personality and Social Psychology 39(6), pp. 1161-1178.
A social tag-based emotion-oriented approach for linking domains
Domain-dependent emotion folksonomies
- Particular emotional categories in each domain
- Each category is composed of a set of co-occurring tags in the domain folksonomy
  - Movies (MovieLens, Jinni, IMDb): bittersweet, emotional, feel good, scary, ...
  - Music (Last.fm, GEMS): wonder, tenderness, nostalgia, peacefulness, ...
  - Books (BookCrossing, LibraryThing, Whichbook): funny, unpredictable, disgusting, violent, ...
Case study: Linking music with places of interest
Vienna State Opera / Arnold Schoenberg
- Arnold Schoenberg was born in Vienna, where the Vienna State Opera is located
- Arnold Schoenberg was born in the 19th century, when the Vienna State Opera was built
- Arnold Schoenberg was a Classical music composer; the Classical music genre is related to Opera houses, which is the building type of the Vienna State Opera
Las Ventas / Antonio Flores
- Antonio Flores was born in Madrid, where Las Ventas is located
- Antonio Flores died in the 20th century, when Las Ventas was built
- Antonio Flores was a Flamenco singer; Flamenco is a Romanic music genre and is related to Moorish architecture, and Moorish Revival architecture is the architectural style of Las Ventas
Cross-domain semantic networks from Wikipedia
Average precision values obtained by the Spreading algorithm for the top 5 ranked musicians for each POI type
POI type (number of POIs)   P@1    P@2    P@3    P@4    P@5
Music venues (4)            0.838  0.688  0.838  0.829  0.870
Religious buildings (8)     0.721  0.965  0.844  0.795  0.781
Castles and palaces (6)     0.794  0.704  0.792  0.900  0.825
Other POIs (7)              0.908  0.772  0.836  0.872  0.893
Cross-domain semantic networks from Wikipedia
Evaluating whether tracks of the retrieved musicians are relevant to POIs