A Wikipedia-based Naive Bayes Approach for Obtaining Related Phrases from A Natural Language Query


DEIM Forum 2012 D7-2

A Wikipedia-based Naive Bayes Approach for Obtaining Related Phrases from A Natural Language Query

Masumi SHIRAKAWA, Kotaro NAKAYAMA, Takahiro HARA, and Shojiro NISHIO

Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, Japan
The Center for Knowledge Structuring, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
E-mail: {shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, nakayama@cks.u-tokyo.ac.jp

1. Web 2006 Web Wikipedia 1 Wikipedia 1 1) 2) 3) Wikipedia [2], [16]

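2. Wikipedia

Wikipedia has been used as a resource for thesaurus construction [10], for linking text to Wikipedia articles [9], and for coreference resolution [12]. Strube et al. [16] compute semantic relatedness using Wikipedia and compare it with WordNet. Gabrilovich et al. [2] proposed Explicit Semantic Analysis (ESA), which computes semantic relatedness from Wikipedia. Milne et al. [8] proposed a link-based measure of semantic relatedness, and related pages have also been derived from Wikipedia links [10], [11]. Ito et al. [4] build on [10] with link co-occurrence analysis. For Twitter, Meij et al. [6] and Ferragina et al. [1] relate short texts to Wikipedia. Song et al. [14] conceptualize short text with a probabilistic knowledge base. Services such as the Yahoo! Content Analysis API also relate text to Wikipedia.

3. Wikipedia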

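Table 1: Definition of symbols.

t            a term in the input text
T            the set of keyphrases in the input text
T'           a candidate value of T
E            the set of Wikipedia entities
e            an entity
c            a related term (an entity)
P(t ∈ T)     probability that term t is a keyphrase
P(e|t)       probability that term t is linked to entity e
P(c|e)       probability that c is a related term of entity e
P(c|t)       probability that c is a related term of term t
P(c)         prior probability of c
P(c|T)       probability that c is a related term of the keyphrase set T
P(T = T')    probability that the keyphrase set T equals T'

Table 2: An example of the probability that a term is a keyphrase, P(t ∈ T). Terms listed: Apple, Apple Inc., Steve Jobs, Japan, China, tree, black, house. (The numeric values are not preserved.)

P(t ∈ T) is estimated from Wikipedia in the same way as the keyphraseness used for wikification [7]. CountDocuments(t) is the number of Wikipedia articles that contain t, and CountDocuments(t ∈ Key) is the number of articles in which t appears as a keyphrase:

$$P(t \in T) \approx \frac{\mathrm{CountDocuments}(t \in \mathrm{Key})}{\mathrm{CountDocuments}(t)} \tag{1}$$

Table 2 covers both named entities (Apple Inc., Steve Jobs) and common words (black, house); TFIDF is mentioned as a point of comparison. Candidate terms are taken from the text itself: for example, from "... and New York Times said ..." the candidates include New York Times, New York, and York.

Table 3: The probability that a term Apple is linked to an entity, P(e|t). Entities listed: Apple Inc., Apple, Apple Records, Apple (album), Apple Corps, Apple Store, Apple (company), App Store. (The numeric values are not preserved.)

Table 4: Related terms of an entity Apple Inc. and their probability, P(c|e). Related terms listed: AppleInsider, Apple Store, Steve Jobs, IPhone OS, IPod Touch, FairPlay, Mac OS X, Macworld. (The numeric values are not preserved.)

P(e|t) is estimated from Wikipedia anchor texts [9], where CountAnchortexts(t, e) is the number of anchor texts t that link to entity e:

$$P(e \mid t) \approx \frac{\mathrm{CountAnchortexts}(t, e)}{\sum_{e_i \in E} \mathrm{CountAnchortexts}(t, e_i)} \tag{2}$$

Table 3 shows the top 8 entities for the term Apple, which include the IT company Apple Inc., Apple, and Apple Records.

P(c|e) is estimated either from Wikipedia links or from ESA [2]. With links, CountLinks(e, c) counts the links between e and c:

$$P(c \mid e) \approx \frac{\mathrm{CountLinks}(e, c)}{\sum_{c_j \in E} \mathrm{CountLinks}(e, c_j)} \tag{3}$$

With ESA, Sim(e, c) is the ESA similarity between e and c:

$$P(c \mid e) \approx \frac{\mathrm{Sim}(e, c)}{\sum_{c_j \in E} \mathrm{Sim}(e, c_j)} \tag{4}$$

Table 4 shows the top 8 related terms of Apple Inc. obtained with ESA.

From Equation (2) and P(c|e), the probability of a related term c given a term t is obtained by summing over the entities:

$$P(c \mid t) = \sum_{e_i \in E} P(c \mid e_i)\, P(e_i \mid t) \tag{5}$$

The prior P(c) is estimated from the link count of c:

$$P(c) \approx \frac{\mathrm{CountLinks}(c)}{\sum_{c_j \in E} \mathrm{CountLinks}(c_j)} \tag{6}$$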
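The count-based estimates in Equations (1), (2), (3), (5) and (6) can be sketched as below. This is a minimal illustration, not the authors' implementation: the helper names (`normalize`, `keyphraseness`, `related_given_term`) are hypothetical, and every count is invented for the example rather than taken from Wikipedia.

```python
from collections import defaultdict


def normalize(counts):
    """Turn a {key: count} mapping into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def keyphraseness(docs_with_term_as_key, docs_with_term):
    """Eq. (1): CountDocuments(t in Key) / CountDocuments(t)."""
    return docs_with_term_as_key / docs_with_term if docs_with_term else 0.0


def related_given_term(p_e_given_t, p_c_given_e):
    """Eq. (5): P(c|t) = sum over entities e_i of P(c|e_i) * P(e_i|t)."""
    p_c_given_t = defaultdict(float)
    for e, p_e in p_e_given_t.items():
        for c, p_c in p_c_given_e.get(e, {}).items():
            p_c_given_t[c] += p_c * p_e
    return dict(p_c_given_t)


# Hypothetical counts, invented for illustration only.
# CountAnchortexts("Apple", e) for three candidate entities:
anchor_counts = {"Apple Inc.": 80, "Apple": 15, "Apple Records": 5}
# CountLinks(e, c) for each entity e:
link_counts = {
    "Apple Inc.": {"Steve Jobs": 6, "Mac OS X": 4},
    "Apple": {"Fruit": 9, "Tree": 1},
    "Apple Records": {"The Beatles": 10},
}
# CountLinks(c) used for the prior:
prior_counts = {"Steve Jobs": 50, "Mac OS X": 30, "Fruit": 100,
                "Tree": 200, "The Beatles": 120}

p_e_given_t = normalize(anchor_counts)                               # Eq. (2)
p_c_given_e = {e: normalize(cs) for e, cs in link_counts.items()}    # Eq. (3)
p_c_given_t = related_given_term(p_e_given_t, p_c_given_e)           # Eq. (5)
p_c = normalize(prior_counts)                                        # Eq. (6)

for c, p in sorted(p_c_given_t.items(), key=lambda kv: -kv[1]):
    print(f"{c}: P(c|t)={p:.3f}  P(c)={p_c[c]:.3f}")
```

Because P(e|t) and each P(c|e) are normalized, the resulting P(c|t) also sums to one over the related terms. Swapping the link counts for similarity scores gives the ESA-based variant of Equation (4) with no other change.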

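When the set of keyphrases T = {t_1, ..., t_K} is given, P(c|T) follows from naive Bayes over the K terms [14], using P(c|t) from Equation (5):

$$P(c \mid T = \{t_1, \ldots, t_K\}) \propto P(c) \prod_{k=1}^{K} P(t_k \mid c) \propto \frac{\prod_{k=1}^{K} P(c \mid t_k)}{P(c)^{K-1}} \tag{7}$$

Equation (7) assumes that T is known. Here, however, it is not observable whether each term belongs to T (see [13]), so P(c|T) is computed over the candidate sets T'.

Fig. 1: Naive Bayes for a set of keyphrases in which members are unobservable. (The figure shows three terms t_1, t_2, t_3.)

The probability that the keyphrase set is exactly T' follows from Equation (1), treating the terms independently:

$$P(T = T') = \prod_{t_k \in T'} P(t_k \in T) \prod_{t_k \notin T'} P(t_k \notin T) = \prod_{t_k \in T'} P(t_k \in T) \prod_{t_k \notin T'} \bigl(1 - P(t_k \in T)\bigr) \tag{8}$$

Combining Equations (7) and (8) and summing over all candidate sets T' gives

$$P(c \mid T) \propto \sum_{T'} \left( P(T = T') \, \frac{\prod_{t_k \in T'} P(c \mid t_k)}{P(c)^{|T'|-1}} \right) \tag{9}$$

Multiplying and dividing each summand by P(c) once for every term outside T' moves the common factor P(c)^{K-1} out of the sum:

$$P(c \mid T) \propto \frac{\sum_{T'} \left( \prod_{t_k \in T'} P(t_k \in T)\, P(c \mid t_k) \prod_{t_k \notin T'} \bigl(1 - P(t_k \in T)\bigr) P(c) \right)}{P(c)^{K-1}} \tag{10}$$

Each term t_k contributes one of two factors depending on whether it is in T', so the sum over all subsets factorizes into a product over the terms [13]:

$$P(c \mid T) \propto \frac{\prod_{k=1}^{K} \left( P(t_k \in T)\, P(c \mid t_k) + \bigl(1 - P(t_k \in T)\bigr) P(c) \right)}{P(c)^{K-1}} \tag{11}$$

Equation (11) is Equation (7) with P(c|t_k) replaced by a mixture of P(c|t_k) and P(c), weighted by P(t_k ∈ T). When P(t_k ∈ T) is high, the factor for t_k is close to P(c|t_k); when it is low, the factor is close to P(c), so t_k has little influence on the result.

4.

Figure 2 shows related terms obtained for four Twitter messages. Messages (a) and (b) both contain Microsoft; (a) also contains brand, and (b) contains Xbox Live.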
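As a sanity check on the derivation, the sketch below computes the closed form of Equation (11) and compares it with the explicit sum over candidate keyphrase sets in Equation (9). The function names and all probability values are hypothetical; the result is an unnormalized score (the equations hold up to proportionality), so it is not itself a probability.

```python
from itertools import combinations
from math import prod


def related_score_closed_form(p_key, p_c_given_t, p_c):
    """Eq. (11): product over the K terms, divided by P(c)^(K-1)."""
    k = len(p_key)
    numerator = prod(
        pk * pct + (1.0 - pk) * p_c for pk, pct in zip(p_key, p_c_given_t)
    )
    return numerator / p_c ** (k - 1)


def related_score_enumerated(p_key, p_c_given_t, p_c):
    """Eq. (9): explicit sum over every candidate keyphrase set T'."""
    k = len(p_key)
    total = 0.0
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            chosen = set(subset)
            # Eq. (8): probability that the keyphrase set is exactly T'.
            p_subset = prod(
                p_key[i] if i in chosen else 1.0 - p_key[i] for i in range(k)
            )
            # Eq. (7) applied to T'; the empty set reduces to the prior P(c).
            score = prod(p_c_given_t[i] for i in chosen) / p_c ** (size - 1)
            total += p_subset * score
    return total


# Hypothetical inputs for one related term c and three terms t_1..t_3.
p_key = [0.9, 0.2, 0.6]          # P(t_k in T)
p_c_given_t = [0.3, 0.01, 0.2]   # P(c | t_k)
p_c = 0.05                       # P(c)

closed = related_score_closed_form(p_key, p_c_given_t, p_c)
enumerated = related_score_enumerated(p_key, p_c_given_t, p_c)
print(closed, enumerated)
```

The enumerated version visits 2^K subsets, while the closed form needs only K factors, which is the practical reason for preferring Equation (11).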

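Fig. 2: Related terms obtained by our method (value means probability). The probability values and most of the related terms are not preserved.

(a) "Did you know that Microsoft is the most influential brand in Canada?" Related terms include Microsoft.
(b) "Microsoft denies Xbox Live security breach" Related terms include Xbox and Microsoft.
(c) "Warriors beat the Heat... Happy face!" Related terms include NBA.
(d) "McClennan names Warriors lineup for first pre-season trial"

Messages (c) and (d) both contain Warriors, which is ambiguous between Golden State Warriors and New Zealand Warriors. Message (c) also contains Heat, which leads to NBA-related terms, while message (d) contains McClennan.

For evaluation, Twitter messages are clustered with K-means. Hashtags such as #Obama and #MacBook [5] define the categories. The three datasets U, IT, and S (cf. [14]) are summarized in Table 5.

Table 5: Three datasets for evaluation and their statistics. Numbers in parentheses are the counts per hashtag. The last two rows are incomplete and their labels are not preserved.

            U                    IT                    S
Hashtags    #Obama (779)         #MacBook (1,251)      #NFL (1,043)
            #Bones (949)         #Silverlight (221)    #NHL (1,045)
            #PGA (1,243)         #VMWare (890)         #NBA (1,085)
            #Microsoft (1,040)   #MySQL (1,241)        #MLB (752)
            #medicine (1,109)    #Ubuntu (988)         #MLS (969)
            #Christ (871)        #Chrome (1,018)       #UFC (984)
                                                       #NASCAR (857)
Total       5,991                5,609                 6,735
            83,748               82,608                91,
            ,636                 16,539                18,603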

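Table 6: The result of clustering. Columns: purity, NMI, and ARI, each reported for the datasets U, IT, and S. Rows: BOW; ESA (10), ESA (20), ESA (50), ESA (100), ESA (200), ESA (500), ESA (1,000), ESA (2,000), ESA (5,000); the proposed method with the same nine settings (10 to 5,000); and the proposed method with ESA-based P(c|e), again with the same nine settings (ESA 10 to ESA 5,000). (The numeric values are not preserved.)

The messages are preprocessed in five steps, 1) to 5); step 3) concerns RT and URL strings and step 4) concerns the # symbol.

The proposed method is compared with bag-of-words (BOW) and with ESA by Gabrilovich et al. [2]; for the proposed method, P(c|e) is computed either from links or from ESA. Clustering quality is measured with purity [17], normalized mutual information (NMI) [15], and adjusted Rand index (ARI) [3]. NMI and ARI also account for false-positive and false-negative decisions. For all three measures a value close to 1 indicates a clustering that agrees with the hashtag categories, and a value close to 0 indicates little agreement. K-means is run on each representation (BOW, ESA, and the related terms obtained from Wikipedia by the proposed method). See also Song et al. [14].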
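The three measures can be sketched in plain Python as follows. This is an illustrative sketch rather than the evaluation code behind Table 6: the function names are hypothetical, NMI is assumed to be normalized by the geometric mean of the two entropies as in [15], ARI follows the pair-counting form of [3], and the label lists are invented.

```python
from collections import Counter
from math import comb, log, sqrt


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority class of their cluster."""
    clusters = {}
    for y, k in zip(labels_true, labels_pred):
        clusters.setdefault(k, Counter())[y] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def nmi(labels_true, labels_pred):
    """Mutual information divided by sqrt(H(classes) * H(clusters))."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum(nij / n * log(n * nij / (a[y] * b[k]))
             for (y, k), nij in joint.items())
    h_a = -sum(v / n * log(v / n) for v in a.values())
    h_b = -sum(v / n * log(v / n) for v in b.values())
    return mi / sqrt(h_a * h_b) if h_a > 0 and h_b > 0 else 0.0


def ari(labels_true, labels_pred):
    """Adjusted Rand index from pair counts; assumes at least two items."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(v, 2) for v in joint.values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Invented labels: hashtag categories versus K-means cluster ids.
truth = ["#NBA", "#NBA", "#NFL", "#NFL", "#MLB", "#MLB"]
perfect = [1, 1, 0, 0, 2, 2]        # same grouping, different ids
one_cluster = [0, 0, 0, 0, 0, 0]    # everything in a single cluster

for name, pred in [("perfect", perfect), ("one cluster", one_cluster)]:
    print(name, purity(truth, pred), nmi(truth, pred), ari(truth, pred))
```

Purity alone rewards over-segmentation (one cluster per message scores 1.0), which is why it is usually read together with NMI and ARI.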

BOW ESA ESA ESA ESA purity ARI NMI (IT, S) (U) IT S ESA ESA

6. Wikipedia Wikipedia B( )

References

[1] P. Ferragina and U. Scaiella, "TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities)," Proc. of ACM Conference on Information and Knowledge Management (CIKM), Oct.
[2] E. Gabrilovich and S. Markovitch, "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis," Proc. of International Joint Conference on Artificial Intelligence (IJCAI), Jan.
[3] L. Hubert and P. Arabie, "Comparing Partitions," Journal of Classification, vol. 2, no. 1.
[4] M. Ito, K. Nakayama, T. Hara, and S. Nishio, "Association Thesaurus Construction Methods based on Link Co-occurrence Analysis for Wikipedia," Proc. of ACM Conference on Information and Knowledge Management (CIKM), Oct.
[5] D. Laniado and P. Mika, "Making Sense of Twitter," Proc. of International Semantic Web Conference (ISWC), Nov.
[6] E. Meij, W. Weerkamp, and M. de Rijke, "Adding Semantics to Microblog Posts," Proc. of ACM International Conference on Web Search and Data Mining (WSDM), Feb.
[7] R. Mihalcea and A. Csomai, "Wikify! Linking Documents to Encyclopedic Knowledge," Proc. of ACM Conference on Information and Knowledge Management (CIKM), Nov.
[8] D. Milne and I.H. Witten, "An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links," Proc. of AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI), pp. 25-30, July.
[9] D. Milne and I.H. Witten, "Learning to Link with Wikipedia," Proc. of ACM Conference on Information and Knowledge Management (CIKM), Oct.
[10] K. Nakayama, T. Hara, and S. Nishio, "Wikipedia Mining for An Association Web Thesaurus Construction," Proc. of International Conference on Web Information Systems Engineering (WISE), Dec.
[11] Y. Ollivier and P. Senellart, "Finding Related Pages Using Green Measures: An Illustration with Wikipedia," Proc. of National Conference on Artificial Intelligence (AAAI), July.
[12] S.P. Ponzetto and M. Strube, "Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution," Proc. of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), June.
[13] M. Shirakawa, H. Wang, Y. Song, Z. Wang, K. Nakayama, T. Hara, and S. Nishio, "Entity Disambiguation based on a Probabilistic Taxonomy," Tech. Rep. MSR-TR, Microsoft Research, Nov.
[14] Y. Song, H. Wang, Z. Wang, H. Li, and W. Chen, "Short Text Conceptualization Using a Probabilistic Knowledgebase," Proc. of International Joint Conference on Artificial Intelligence (IJCAI), July.
[15] A. Strehl and J. Ghosh, "Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions," Journal of Machine Learning Research, vol. 3, Dec.
[16] M. Strube and S.P. Ponzetto, "WikiRelate! Computing Semantic Relatedness using Wikipedia," Proc. of National Conference on Artificial Intelligence (AAAI), July.
[17] Y. Zhao and G. Karypis, "Criterion Functions for Document Clustering: Experiments and Analysis," Tech. Rep. #01-40, Department of Computer Science, University of Minnesota, Feb.

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities Masumi Shirakawa Kotaro Nakayama Takahiro Hara Shojiro Nishio Graduate School of Information Science and Technology,

More information

Clustering Documents with Active Learning using Wikipedia

Clustering Documents with Active Learning using Wikipedia Clustering Documents with Active Learning using Wikipedia Anna Huang David Milne Eibe Frank Ian H. Witten Department of Computer Science, University of Waikato Private Bag 3105, Hamilton, New Zealand {lh92,

More information

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Denis Turdakov, Pavel Velikhov ISP RAS turdakov@ispras.ru, pvelikhov@yahoo.com

More information

Local and Global Algorithms for Disambiguation to Wikipedia

Local and Global Algorithms for Disambiguation to Wikipedia ACL 11 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1 Dan Roth 1 Doug Downey 2 Mike Anderson 3 1 University of Illinois at Urbana-Champaign {ratinov2 danr}@uiuc.edu 2 Northwestern

More information

Local and Global Algorithms for Disambiguation to Wikipedia

Local and Global Algorithms for Disambiguation to Wikipedia ACL 11 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1 Dan Roth 1 Doug Downey 2 Mike Anderson 3 1 University of Illinois at Urbana-Champaign {ratinov2 danr}@uiuc.edu 2 Northwestern

More information

Improving Question Retrieval in Community Question Answering Using World Knowledge

Improving Question Retrieval in Community Question Answering Using World Knowledge Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,

More information

Semantic Relationship Discovery with Wikipedia Structure

Semantic Relationship Discovery with Wikipedia Structure Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Semantic Relationship Discovery with Wikipedia Structure Fan Bu, Yu Hao and Xiaoyan Zhu State Key Laboratory of

More information

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Twitter Stock Bot John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Hassaan Markhiani The University of Texas at Austin hassaan@cs.utexas.edu Abstract The stock market is influenced

More information

Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.

Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques. Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques. Akshay Amolik, Niketan Jivane, Mahavir Bhandari, Dr.M.Venkatesan School of Computer Science and Engineering, VIT University,

More information

Sheeba J.I1, Vivekanandan K2

Sheeba J.I1, Vivekanandan K2 IMPROVED UNSUPERVISED FRAMEWORK FOR SOLVING SYNONYM, HOMONYM, HYPONYMY & POLYSEMY PROBLEMS FROM EXTRACTED KEYWORDS AND IDENTIFY TOPICS IN MEETING TRANSCRIPTS Sheeba J.I1, Vivekanandan K2 1 Assistant Professor,sheeba@pec.edu

More information

COMPUTATION OF THE SEMANTIC RELATEDNESS BETWEEN WORDS USING CONCEPT CLOUDS

COMPUTATION OF THE SEMANTIC RELATEDNESS BETWEEN WORDS USING CONCEPT CLOUDS COMPUTATION OF THE SEMANTIC RELATEDNESS BETWEEN WORDS USING CONCEPT CLOUDS Swarnim Kulkarni and Doina Caragea Department of Computing and Information Sciences, Kansas State University, Manhattan, Kansas,

More information

The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study

The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study WCE 23, July 3-5, 23, London, U.K. The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study Nergis Yılmaz and Gülfem Işıklar Alptekin Abstract Many organizations collect and store data

More information

Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter

Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter M. Atif Qureshi 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group, National University

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

An Open-Source Toolkit for Mining Wikipedia

An Open-Source Toolkit for Mining Wikipedia An Open-Source Toolkit for Mining Wikipedia David Milne Department of Computer Science, University of Waikato Private Bag 3105, Hamilton, New Zealand +64 7 856 2889 (ext. 6038) d.n.milne@gmail.com ABSTRACT

More information

Analysis One Code Desc. Transaction Amount. Fiscal Period

Analysis One Code Desc. Transaction Amount. Fiscal Period Analysis One Code Desc Transaction Amount Fiscal Period 57.63 Oct-12 12.13 Oct-12-38.90 Oct-12-773.00 Oct-12-800.00 Oct-12-187.00 Oct-12-82.00 Oct-12-82.00 Oct-12-110.00 Oct-12-1115.25 Oct-12-71.00 Oct-12-41.00

More information

Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis

Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis Evgeniy Gabrilovich and Shaul Markovitch Department of Computer Science Technion Israel Institute of Technology, 32000 Haifa,

More information

INTERNATIONAL JOURNAL FOR TRENDS IN ENGINEERING & TECHNOLOGY VOLUME 3 ISSUE

INTERNATIONAL JOURNAL FOR TRENDS IN ENGINEERING & TECHNOLOGY VOLUME 3 ISSUE Enhancing Implicit Relations in Wikipedia Mining Using Object Relationship Technique G.Shanmugapriya 1 1 B.S Abdur Rahman University, Computer Science, sarushiya@gmail.com S.Raja shaik 2 2 B.S Abdur Rahman

More information

Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Emoticon Smoothed Language Models for Twitter Sentiment Analysis Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

More information

Wikipedia-based Semantic Interpretation for Natural Language Processing

Wikipedia-based Semantic Interpretation for Natural Language Processing Journal of Artificial Intelligence Research 34 (2009) 443-498 Submitted 08/08; published 03/09 Wikipedia-based Semantic Interpretation for Natural Language Processing Evgeniy Gabrilovich Shaul Markovitch

More information

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Sentiment Analysis and Topic Classification: Case study over Spanish tweets Sentiment Analysis and Topic Classification: Case study over Spanish tweets Fernando Batista, Ricardo Ribeiro Laboratório de Sistemas de Língua Falada, INESC- ID Lisboa R. Alves Redol, 9, 1000-029 Lisboa,

More information

CitationBase: A social tagging management portal for references

CitationBase: A social tagging management portal for references CitationBase: A social tagging management portal for references Martin Hofmann Department of Computer Science, University of Innsbruck, Austria m_ho@aon.at Ying Ding School of Library and Information Science,

More information

Impact of Feature Selection Technique on Email Classification

Impact of Feature Selection Technique on Email Classification Impact of Feature Selection Technique on Email Classification Aakanksha Sharaff, Naresh Kumar Nagwani, and Kunal Swami Abstract Being one of the most powerful and fastest way of communication, the popularity

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Understanding User s Query Intent with Wikipedia

Understanding User s Query Intent with Wikipedia Understanding User s Query Intent with Wikipedia Jian Hu 1, Gang Wang 1, Fred Lochovsky 2, Jian-Tao Sun 1, Zheng Chen 1 1 Microsoft Research Asia 2 The Hong Kong University of Science & Technology No.

More information

WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA. by Zareen Saba Syed

WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA. by Zareen Saba Syed WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA by Zareen Saba Syed Thesis submitted to the Faculty of the Graduate School of the University of Maryland in partial fulfillment of the requirements

More information

Identifying free text plagiarism based on semantic similarity

Identifying free text plagiarism based on semantic similarity Identifying free text plagiarism based on semantic similarity George Tsatsaronis Norwegian University of Science and Technology Department of Computer and Information Science Trondheim, Norway gbt@idi.ntnu.no

More information

DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis

DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis DBTechNet DBTech Pro Workshop Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining Dimitris A. Dervos dad@it.teithe.gr http://aetos.it.teithe.gr/~dad Georgios Evangelidis

More information

Sense and Reference Disambiguation in Wikipedia. Dezambiguizare de Sensuri si Referinte in Wikipedia

Sense and Reference Disambiguation in Wikipedia. Dezambiguizare de Sensuri si Referinte in Wikipedia Sense and Reference Disambiguation in Wikipedia Hui SH EN 1 Razvan BUN ESCU 1 Rada M IHALC EA 2 (1) School of Electrical Engineering and Computer Science, Ohio University, Athens, OH (2) Department of

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2 Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

More information

Harvesting and Structuring Social Data in Music Information Retrieval

Harvesting and Structuring Social Data in Music Information Retrieval Harvesting and Structuring Social Data in Music Information Retrieval Sergio Oramas Music Technology Group Universitat Pompeu Fabra, Barcelona, Spain sergio.oramas@upf.edu Abstract. An exponentially growing

More information

UNED Online Reputation Monitoring Team at RepLab 2013

UNED Online Reputation Monitoring Team at RepLab 2013 UNED Online Reputation Monitoring Team at RepLab 2013 Damiano Spina, Jorge Carrillo-de-Albornoz, Tamara Martín, Enrique Amigó, Julio Gonzalo, and Fernando Giner {damiano,jcalbornoz,tmartin,enrique,julio}@lsi.uned.es,

More information

Discovering and Querying Hybrid Linked Data

Discovering and Querying Hybrid Linked Data Discovering and Querying Hybrid Linked Data Zareen Syed 1, Tim Finin 1, Muhammad Rahman 1, James Kukla 2, Jeehye Yun 2 1 University of Maryland Baltimore County 1000 Hilltop Circle, MD, USA 21250 zsyed@umbc.edu,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 5, Sep-Oct 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 5, Sep-Oct 2015 RESEARCH ARTICLE Multi Document Utility Presentation Using Sentiment Analysis Mayur S. Dhote [1], Prof. S. S. Sonawane [2] Department of Computer Science and Engineering PICT, Savitribai Phule Pune University

More information

Effective Mentor Suggestion System for Collaborative Learning

Effective Mentor Suggestion System for Collaborative Learning Effective Mentor Suggestion System for Collaborative Learning Advait Raut 1 U pasana G 2 Ramakrishna Bairi 3 Ganesh Ramakrishnan 2 (1) IBM, Bangalore, India, 560045 (2) IITB, Mumbai, India, 400076 (3)

More information

Annotation for the Semantic Web during Website Development

Annotation for the Semantic Web during Website Development Annotation for the Semantic Web during Website Development Peter Plessers, Olga De Troyer Vrije Universiteit Brussel, Department of Computer Science, WISE, Pleinlaan 2, 1050 Brussel, Belgium {Peter.Plessers,

More information

Integrating Cyc and Wikipedia: Folksonomy Meets Rigorously Defined Common-Sense

Integrating Cyc and Wikipedia: Folksonomy Meets Rigorously Defined Common-Sense Integrating Cyc and Wikipedia: Folksonomy Meets Rigorously Defined Common-Sense Olena Medelyan Department of Computer Science University of Waikato, New Zealand olena@cs.waikato.ac.nz Catherine Legg Department

More information

Cloud Computing an introduction

Cloud Computing an introduction Prof. Dr. Claudia Müller-Birn Institute for Computer Science, Networked Information Systems Cloud Computing an introduction January 30, 2012 Netzprogrammierung (Algorithmen und Programmierung V) Our topics

More information

AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS

AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,

More information

Discovering Filter Keywords for Company Name Disambiguation in Twitter

Discovering Filter Keywords for Company Name Disambiguation in Twitter Discovering Filter Keywords for Company Name Disambiguation in Twitter Damiano Spina, Julio Gonzalo, Enrique Amigó UNED NLP & IR Group Juan del Rosal, 16 28040 Madrid, Spain http: // nlp. uned. es Abstract

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Word Sense Disambiguation as an Integer Linear Programming Problem

Word Sense Disambiguation as an Integer Linear Programming Problem Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University

More information

BT Lancashire Services

BT Lancashire Services In confidence BT Lancashire Services Remote Access to Corporate Desktop (RACD) Getting Started Guide Working in partnership Confidentiality Statement BT Lancashire Services Certain information given to

More information

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

More information

Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University

Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University Presented by Qiang Yang, Hong Kong Univ. of Science and Technology 1 In a Search Engine Company Advertisers

More information

On the Evolution of Wikipedia: Dynamics of Categories and Articles

On the Evolution of Wikipedia: Dynamics of Categories and Articles Wikipedia, a Social Pedia: Research Challenges and Opportunities: Papers from the 2015 ICWSM Workshop On the Evolution of Wikipedia: Dynamics of Categories and Articles Ramakrishna B. Bairi IITB-Monash

More information

Building Semantic Kernels for Text Classification using Wikipedia

Building Semantic Kernels for Text Classification using Wikipedia Building Semantic Kernels for Text Classification using Wikipedia Pu Wang and Carlotta Domeniconi Department of Computer Science George Mason University pwang7@gmuedu, carlotta@csgmuedu ABSTRACT Document

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

REUSING DISCUSSION FORUMS AS LEARNING RESOURCES IN WBT SYSTEMS

REUSING DISCUSSION FORUMS AS LEARNING RESOURCES IN WBT SYSTEMS REUSING DISCUSSION FORUMS AS LEARNING RESOURCES IN WBT SYSTEMS Denis Helic, Hermann Maurer, Nick Scerbakov IICM, University of Technology Graz Austria ABSTRACT Discussion forums are highly popular and

More information

On Analyzing Hashtags in Twitter

On Analyzing Hashtags in Twitter Proceedings of the Ninth International AAAI Conference on Web and Social Media On Analyzing Hashtags in Twitter Paolo Ferragina Francesco Piccinno Roberto Santoro Dipartimento di Informatica University

More information

Access Your Cisco Smart Storage Remotely Via WebDAV

Access Your Cisco Smart Storage Remotely Via WebDAV Application Note Access Your Cisco Smart Storage Remotely Via WebDAV WebDAV (Web-based Distributed Authoring and Versioning), is a set of extensions to the HTTP(S) protocol that allows a web server to

More information

Software Defect Prediction for Quality Improvement Using Hybrid Approach

Software Defect Prediction for Quality Improvement Using Hybrid Approach Software Defect Prediction for Quality Improvement Using Hybrid Approach 1 Pooja Paramshetti, 2 D. A. Phalke D.Y. Patil College of Engineering, Akurdi, Pune. Savitribai Phule Pune University ABSTRACT In

More information

Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information

Improving Classification of Multi-Lingual Web Documents using Domain Ontologies

Improving Classification of Multi-Lingual Web Documents using Domain Ontologies Improving Classification of Multi-Lingual Web Documents using Domain Ontologies Marina Litvak, Mark Last, and Slava Kisilevich Department of Information Systems Engineering, Ben-Gurion University of the

More information

COLINDA - Conference Linked Data

COLINDA - Conference Linked Data Undefined 1 (0) 1 5 1 IOS Press COLINDA - Conference Linked Data Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name Surname, University,

More information

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS Huina Mao School of Informatics and Computing Indiana University, Bloomington, USA ECB Workshop on Using Big Data for Forecasting

More information

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015 Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD

More information

University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task

University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School

More information

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College

More information

Keyword Optimization in Sponsored Search via Feature Selection

Keyword Optimization in Sponsored Search via Feature Selection JMLR: Workshop and Conference Proceedings 4: 122-134 New challenges for feature selection Keyword Optimization in Sponsored Search via Feature Selection Svetlana Kiritchenko Institute for Information Technology

More information

nfl picks week 15 espn

nfl picks week 15 espn Additional information >>> HERE

More information

How To Make Sense Of Data With Altilia

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY'S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

Using Data Mining for Mobile Communication Clustering and Characterization

A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

The 2006 IEEE / WIC / ACM International Conference on Web Intelligence Hong Kong, China

WISE: Hierarchical Soft Clustering of Web Page Search based on Web Content Mining Techniques Ricardo Campos 1, 2 Gaël Dias 2 Célia Nunes 2 1 Instituto Politécnico de Tomar Tomar, Portugal 2 Centre of Human

Automatic Annotation Wrapper Generation and Mining Web Database Search Result

V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India

Using Semantic Data Mining for Classification Improvement and Knowledge Extraction

Fernando Benites and Elena Sapozhnikova University of Konstanz, 78464 Konstanz, Germany. Abstract. The objective of this

Rule based Classification of BSE Stock Data with Data Mining

International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

Profile Based Personalized Web Search and Download Blocker

1 K.Sheeba, 2 G.Kalaiarasi Dhanalakshmi Srinivasan College of Engineering and Technology, Mamallapuram, Chennai, Tamil nadu, India Email: 1 sheebaoec@gmail.com,

Efficient Query Optimizing System for Searching Using Data Mining Technique

Vol.1, Issue.2, pp-347-351 ISSN: 2249-6645 Efficient Query Optimizing System for Searching Using Data Mining Technique Velmurugan.N Vijayaraj.A Assistant Professor, Department of MCA, Associate Professor,

Search Result Optimization using Annotators

Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS

Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department

A Novel Framework for Personalized Web Search

Aditi Sharan a, * Mayank Saini a a School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi-67, India Abstract One hundred users, one

An Adaptive Method for Organization Name Disambiguation with Feature Reinforcing

Shu Zhang 1, Jianwei Wu 2, Dequan Zheng 2, Yao Meng 1 and Hao Yu 1 1 Fujitsu Research and Development Center Dong Si Huan

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano

Analysis of Social Media Streams

Florian Weidner Dresden, 21.01.2014 Outline 1. Introduction 2. Social Media Streams Clustering Summarization

Role of Social Networking in Marketing using Data Mining

Mrs. Saroj Junghare Asst. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:

Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge

Evgeniy Gabrilovich and Shaul Markovitch Department of Computer Science Technion Israel

Social Media Mining. Data Mining Essentials

Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers

Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction

Proceedings of the Federated Conference on Computer Science and Information Systems pp. 1349-1354 DOI: 10.15439/2015F230 ACSIS, Vol. 5 Sentiment Analysis of Twitter Data within Big Data Distributed Environment

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

A Matrix Factorization Approach for Integrating Multiple Data Views

Derek Greene, Pádraig Cunningham School of Computer Science & Informatics, University College Dublin {derek.greene,padraig.cunningham}@ucd.ie

Sentiment analysis: towards a tool for analysing real-time students feedback

Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:

Self-adaptive e-learning Website for Mathematics

Akira Nakamura Abstract Keyword searching and browsing on learning website is ultimate self-adaptive learning. Our e-learning website KIT Mathematics Navigation

Bisecting K-Means for Clustering Web Log data

Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

Mining Domain-Specific Thesauri from Wikipedia: A case study

David Milne, Olena Medelyan and Ian H. Witten Department of Computer Science, University of Waikato {dnk2, olena, ihw}@cs.waikato.ac.nz Abstract

A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis

Yusuf Yaslan and Zehra Cataltepe Istanbul Technical University, Computer Engineering Department, Maslak 34469 Istanbul, Turkey

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental & Geomatic Engineering,

Discovering the Dynamics of Terms Semantic Relatedness through Twitter

Nikola Milikic 1, Jelena Jovanovic 1, Milan Stankovic 2 1 University of Belgrade, Jove Ilica 154, 11000 Belgrade, Serbia 2 STIH, Université

Predicting stocks returns correlations based on unstructured data sources

Mateusz Radzimski, José Luis Sánchez-Cervantes, José Luis López Cuadrado, Ángel García-Crespo Departamento de Informática Universidad

Keyphrase Extraction for Scholarly Big Data

Cornelia Caragea Computer Science and Engineering University of North Texas July 10, 2015 Scholarly Big Data Large number of scholarly documents on the Web PubMed