It Is Not Just What We Say, But How We Say Them: LDA-based Behavior-Topic Model


2.2 LDA-based Behavior-Topic Model

Table 2 summarizes the set of notations and descriptions of our model parameters.

Notation                               Description
$U$                                    the total number of users
$N_u$                                  the total number of tweets by user $u$
$L_{u,n}$                              the total number of words in $u$'s $n$-th tweet
$T$                                    the total number of topics
$V$                                    the vocabulary size
$b$                                    a behavior in $B$ = {post, retweet, reply, mention}
$y$                                    a switch
$z$                                    a topic label
$w$                                    a word label
$\phi_t$                               topic-specific word distribution
$\psi_t$                               topic-specific behavior distribution
$\phi'$                                background word distribution
$\theta_u$                             user-specific topic distribution
$\varphi$                              Bernoulli distribution
$\alpha, \eta, \beta', \beta, \gamma$  Dirichlet priors

Table 2: Notations and descriptions.

We now present our B-LDA model. First, we assume that there are $T$ hidden topics, where each topic $t$ has a multinomial word distribution $\phi_t$ and a multinomial behavior distribution $\psi_t$. Each tweet has a single hidden topic, which is sampled from the corresponding user's topic distribution $\theta_u$ ($1 \le u \le U$). We further assume that, given a tweet with hidden topic $t$ ($1 \le t \le T$), the words in this tweet are generated from two multinomial distributions, namely a background model and a topic-specific model. The background model $\phi'$ generates words commonly used in many tweets; they are similar to stop words. The topic-specific model $\phi_t$ generates words related to topic $t$. When we sample a word $w$ ($1 \le w \le V$), we use a switch $y \in \{0, 1\}$, which is sampled from a Bernoulli distribution $\varphi$, to decide which word distribution the word comes from. Specifically, if $y = 0$, the word $w$ is sampled from $\phi'$; otherwise, it is sampled from $\phi_t$. We also assume the behavior pattern $b$ ($b \in B$) is sampled from the behavior distribution $\psi_t$. Lastly, we assume $\theta_u$, $\psi_t$, $\phi'$, $\phi_t$ and $\varphi$ have Dirichlet priors $\alpha$, $\eta$, $\beta'$, $\beta$ and $\gamma$ respectively. Figure 2 shows the plate notation of the model. The generative process for all posts is in Figure 3.
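The generative story described above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function name `generate_corpus`, the fixed tweet count `N` and tweet length `L` (the model lets both vary per user and per tweet), the hyperparameter defaults, and the convention that switch value 0 selects the background model are all choices made for this sketch.

```python
import numpy as np

BEHAVIORS = ["post", "retweet", "reply", "mention"]

def generate_corpus(U=3, N=4, L=6, T=2, V=10, alpha=0.5, eta=0.5,
                    beta=0.1, beta_bg=0.1, gamma=1.0, seed=0):
    """Sample a toy corpus from a B-LDA-style generative process.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    B = len(BEHAVIORS)
    psi = rng.dirichlet(np.full(B, eta), size=T)    # topic-behavior distributions
    phi = rng.dirichlet(np.full(V, beta), size=T)   # topic-word distributions
    phi_bg = rng.dirichlet(np.full(V, beta_bg))     # background word distribution
    switch = rng.dirichlet(np.full(2, gamma))       # switch[0] = P(y = 0), assumed background
    tweets = []
    for u in range(U):
        theta_u = rng.dirichlet(np.full(T, alpha))  # user-specific topic distribution
        for _ in range(N):
            z = int(rng.choice(T, p=theta_u))       # one topic per tweet
            words = []
            for _ in range(L):
                y = rng.choice(2, p=switch)         # background or topical word?
                dist = phi_bg if y == 0 else phi[z]
                words.append(int(rng.choice(V, p=dist)))
            b = BEHAVIORS[rng.choice(B, p=psi[z])]  # one behavior per tweet
            tweets.append({"user": u, "topic": z, "words": words, "behavior": b})
    return tweets
```

Note that the topic and the behavior are drawn once per tweet, while the switch is drawn once per word; this is what distinguishes the single-topic-per-tweet assumption from standard LDA.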
In our model, we assume a universal behavior distribution instead of a personalized behavior distribution for each topic, as the former ensures that the behavior information is a property of the topic. In this case, both a user's personal behaviors and topic interests can be reflected by looking at her topic distribution. In other words, a topic in our model carries a behavior pattern, which is studied in Section 4.1. Note that our model is designed mainly for settings where the text is mostly short and about a single topic. It is not hard to modify our model to remove this special assumption, and we leave it to our future work to design a more general topic-behavior model.

Figure 2: LDA-based behavior-topic model (B-LDA)

For each topic $t = 1, \dots, T$
    Draw $\psi_t \sim \mathrm{Dir}(\eta)$, $\phi_t \sim \mathrm{Dir}(\beta)$
Draw $\phi' \sim \mathrm{Dir}(\beta')$, $\varphi \sim \mathrm{Dir}(\gamma)$
For each user $u = 1, \dots, U$
    Draw topic distribution $\theta_u \sim \mathrm{Dir}(\alpha)$
    For $u$'s $n$-th tweet, $n = 1, \dots, N_u$
        Draw a topic $z_{u,n}$ from $\theta_u$
        For each word $l = 1, \dots, L_{u,n}$
            - Draw $y_{u,n,l}$ from Bernoulli($\varphi$)
            - Draw $w_{u,n,l} \sim \phi'$ if $y_{u,n,l} = 0$, otherwise draw $w_{u,n,l} \sim \phi_{z_{u,n}}$
        Draw a posting behavior $b_{u,n} \sim \psi_{z_{u,n}}$

Figure 3: The generative process for all posts in B-LDA.

Learning and Parameter Estimation

We use collapsed Gibbs sampling to obtain samples of the hidden variable assignment and to estimate the model parameters from these samples. Due to the space limit, we leave the detailed derivation and running time analysis to the supplementary pages (papers/blda_supp.pdf). With Gibbs sampling, we can make the following estimation of the model parameters:

(2.1)  $\theta_{u,t} = \frac{n_u^t + \alpha}{\sum_{t'=1}^{T} n_u^{t'} + T\alpha}$  (user-topic distribution)

(2.2)  $\psi_{t,b} = \frac{n_t^b + \eta}{\sum_{b'=1}^{|B|} n_t^{b'} + |B|\eta}$  (topic-behavior distribution)

(2.3)  $\phi_{t,w} = \frac{n_{t,y=1}^{w} + \beta}{\sum_{w'=1}^{V} n_{t,y=1}^{w'} + V\beta}$  (topic-word distribution)

where $n_u^t$ is, given the user $u$, the number of times topic $t$ is sampled; $n_t^b$ is the number of times behavior $b$ co-occurs with topic $t$; and $n_{t,y=1}^{w}$ is, given the topic $t$, the number of times $w$ is sampled as a topical word.
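Equations (2.1) to (2.3) share one pattern: add the prior to each count and normalize over the row. A minimal sketch of that step is given below, assuming the Gibbs sampler has already produced the three count matrices; the function name `estimate_parameters` and the array layout are hypothetical, chosen only for this illustration.

```python
import numpy as np

def estimate_parameters(n_ut, n_tb, n_tw, alpha, eta, beta):
    """Smoothed estimates in the style of Equations (2.1)-(2.3).

    n_ut: (U, T) counts of user u's tweets assigned to topic t
    n_tb: (T, B) counts of behavior b co-occurring with topic t
    n_tw: (T, V) counts of word w sampled as a topical word under topic t
    """
    n_ut = np.asarray(n_ut, dtype=float)
    n_tb = np.asarray(n_tb, dtype=float)
    n_tw = np.asarray(n_tw, dtype=float)
    T, B, V = n_ut.shape[1], n_tb.shape[1], n_tw.shape[1]
    # Each estimate is (count + prior) / (row total + dimension * prior).
    theta = (n_ut + alpha) / (n_ut.sum(axis=1, keepdims=True) + T * alpha)
    psi = (n_tb + eta) / (n_tb.sum(axis=1, keepdims=True) + B * eta)
    phi = (n_tw + beta) / (n_tw.sum(axis=1, keepdims=True) + V * beta)
    return theta, psi, phi
```

For example, a user with 3 tweets in one topic and 1 in another, under $\alpha = 0.5$ and $T = 2$, gets $\theta_u = (3.5/5, 1.5/5) = (0.7, 0.3)$. Every row of each returned matrix sums to one.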

is given, while in the combined approach, $M$ is obtained by evaluating the given user's $\beta_K$ index. Specifically, if $\beta_K$ $\tau$, the follower is defined as behavior-driven, and we use $M_{\text{B-LDA}}$; otherwise, the follower is topic-driven, and we use $M_{\text{LDA}}$ or $M_{\text{T-LDA}}$.

4 Empirical Evaluations

In this section we present our empirical study of B-LDA for two application domains: (I) Topic Analysis and (II) Followee Recommendation.

Data setup

We use real Twitter data to evaluate our proposed model. Our base data set contains 5,55 Singapore-based Twitter users and their tweets, which are collected by starting from a seed set of active Singapore users and tracing their follower and followee links up to two hops. From this base set, 5 users are randomly selected, among whom are further randomly selected to obtain all their followees, which makes a total of 9688 users. A total of ,882,44 tweets of these 9688 users published between September and November 3, 2 are used in our experiments. For the application of followee recommendation, we provide recommendations for the randomly selected users.

We compare our B-LDA model with LDA and T-LDA in our study. For all the models, the number of topics $T$ is set as 8, $\alpha$ is 5/$T$ and $\beta$ is .. For B-LDA, $\gamma$ is set as, $\beta'$ is., $\eta$ is.. Each model runs for 4 iterations of Gibbs sampling. We take 4 samples with a gap of 5 iterations in the last 2 iterations to assign values to all the hidden variables. Below we first present a topic analysis on the behavior dimension, followed by a quantitative evaluation on followee recommendation.

4.1 Topic Analysis

To see how integrating behavioral information into topic modeling could make a difference, we show some empirical studies on topics obtained by B-LDA.

Topics grouped by dominant behavior

In contrast to LDA, B-LDA generates topics each enhanced by a behavior distribution, which is denoted as $\psi_{t,b}$ in the output.
Just as LDA is expected to generate topics each containing the words most relevant to a coherent topic, we would like B-LDA to generate topics that are identified with some dominant behavior. To measure whether a topic contains some dominant behavior, we use the idea of entropy and identify for each topic its associated behavior distribution. The lower the entropy score, the more dominant the behavior the topic is identified with. For a given topic $t$, we define its entropy $e(t)$ on the behavior distribution as:

$e(t) = -\sum_{b \in B} p(b \mid t) \log p(b \mid t)$,

where $B$ is the set of all possible types of behavior.

How do we identify the associated behavior distribution of a given topic? In B-LDA, $p(b \mid t)$ is equal to $\psi_{t,b}$ in Equation (2.2). For T-LDA and LDA, we compute $p(b \mid t)$ as:

$p(b \mid t) = \frac{C(t,b) + \delta}{\sum_{b' \in B} C(t,b') + |B|\delta}$.

Note that the ways of computing $C(t,b)$ are different in T-LDA and LDA. For T-LDA, $C(t,b)$ is computed by counting all tweets with topic $t$ and behavior $b$. For LDA, $C(t,b)$ is the number of times a word is assigned topic $t$ while its corresponding tweet has behavior $b$. A normalization factor $\delta$ is introduced, which is similar to the hyperparameter $\eta$ in Equation (2.2). To make a fair comparison, we set $\delta = \eta$ = .

Figure 4: Comparison of topics from B-LDA, LDA and T-LDA in terms of entropy on behavior distribution.

In Figure 4, we find that B-LDA has a lower entropy score than both LDA and T-LDA, which means topics generated by B-LDA tend to be characterized by a few dominant types of behavior. Note that in LDA, one tweet is associated with multiple topics but with one behavior. In this case, the chance of many topics sharing the same behavior is higher compared to the setting of one tweet having one topic in T-LDA and B-LDA. That is the reason why the entropy of topics in T-LDA and B-LDA has a higher variance than in LDA.
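The entropy measure and the smoothed behavior distribution used for the baselines can be sketched as follows. This is an illustration rather than the authors' code: the function names are hypothetical, and the natural logarithm is an assumption, since the text does not state the base.

```python
import numpy as np

def smoothed_behavior_dist(counts, delta):
    """p(b|t) from a (T, |B|) matrix of counts C(t, b), smoothed by delta."""
    counts = np.asarray(counts, dtype=float)
    num_behaviors = counts.shape[-1]
    return (counts + delta) / (counts.sum(axis=-1, keepdims=True) + num_behaviors * delta)

def behavior_entropy(p_b_given_t):
    """Entropy e(t) of each row of p(b|t); natural log is an assumption."""
    p = np.asarray(p_b_given_t, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(p > 0, p, 1.0)
    return -(p * np.log(safe)).sum(axis=-1)
```

A topic whose tweets are spread evenly over the four behaviors reaches the maximum entropy $\log 4$, while a topic seen with only one behavior has entropy 0 before smoothing; the smoothing factor pulls the latter slightly above 0.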
Topics of distinct behavioral pattern

Now we show the topics with distinct behavioral patterns, i.e., those ranked top for one type of behavior. Figure 5 shows the distribution of all 8 topics on the behavior dimensions of PO, RT and RE, together with the top topical words of the topics ranked highest along each behavior dimension. For the PO dimension, topic 6 is related to daily news, which is mainly contributed by news media accounts that mostly post original tweets. Topic 23 consists mostly of users' daily personal updates, which seldom interest others enough to retweet or reply. Topic 7 is also related to personal updates, but more on things related to cell phones, laptops, etc. Top-4 topics in the RT dimension are topics related to jokes, like topic 5, which is a mixture of jokes and funny things shared by popular

Model   ID   Top Topical Words   PO   RT   RE   ME
T-LDA   65   #singapore, #news, news, #business, china, #int'l, cna, minister, police, stocks
             police, obama, news, occupy, people, street, president, wall, man, video, cna
B-LDA   6    #singapore, #news, #business, lee, minister, #local, news, pm, police, s'pore
T-LDA   62   eat, food, dinner, hungry, chicken, #sosingaporean, curry, lunch, sauce, rice
             eat, food, hungry, dinner, chicken, lunch, rice, ice, cream, ate, nice, love, meal
B-LDA   6    curry, sauce, indian, #replacesongnameswithcurrysauce, #sosingaporen, #sgedu

Table 3: Sample topic groups and their topical words and behavioral patterns. ID is topic id.

4.2 Followee Recommendation

In this section we show how our proposed B-LDA model can improve followee recommendation results. The task is to recommend users to follow for a target user $u$. We randomly pick one followee from $u$'s current followee set, and then combine her with another $m$ ($m$ = ) randomly selected users who are not in $u$'s followee list. Any recommendation algorithm would generate a ranking of these $m + 1$ users according to Algorithm, where the higher the real followee is ranked, the better the recommendation algorithm performs. We repeat this process for $R$ ($R$ = ) runs, where in each run we pick a different followee, and obtain an average rank of the real followee. Note that in the algorithm, the distance measure between two users $w$ and $f$ is $\mathrm{sim}(\theta_w^M, \theta_f^M)$, where $M$ is a given model and $\mathrm{sim}()$ is computed by cosine similarity. Our task is a general recommendation evaluation task; if we set $R$ as, it is similar to the one studied in [2].

Comparison of Models

We consider LDA and T-LDA as our baselines. The model setting is the same as discussed at the beginning of Section 4.
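One run of the evaluation protocol above can be illustrated with a simplified stand-in. The paper's ranking algorithm is not reproduced in this excerpt, so the sketch below makes a loudly stated assumption: each candidate is scored directly by the cosine similarity between its topic distribution and the target user's. The function names and the pessimistic tie-breaking are also hypothetical choices for this illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two topic distributions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_of_real_followee(theta_target, theta_real, theta_others):
    """Rank (1 = best) of the real followee among the m + 1 candidates.

    Assumption: candidates are scored by similarity to the target user,
    which simplifies the paper's algorithm. Ties count against the real
    followee, so the reported rank is pessimistic.
    """
    real_score = cosine_sim(theta_target, theta_real)
    other_scores = [cosine_sim(theta_target, t) for t in theta_others]
    return 1 + sum(score >= real_score for score in other_scores)
```

Repeating this for $R$ runs with a different held-out followee each time, and averaging the returned ranks, gives the per-user average rank used in the comparison.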
We compare B-LDA with LDA in terms of two criteria: the average rank of the real followee, which is defined as

$\bar{r} = \frac{1}{|U|} \sum_{u \in U} r(u)$,

where $r(u)$ is the average rank of the randomly picked real followees from $u$'s followee list in $R$ runs; and the mean reciprocal rank:

$\mathrm{MRR} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{r(u)}$.

Table 4: Comparisons of models by average rank of real followee $\bar{r}$ and MRR (columns: $K$; $\bar{r}$ for B-LDA, LDA and T-LDA; MRR for B-LDA, LDA and T-LDA). A marked result is significantly better than T-LDA at the 5% significance level by the Wilcoxon signed-rank test.

The comparison among these models in Table 4 suggests these findings:
1) In general, all these models perform better with a smaller neighborhood size $K$.
2) B-LDA is a direct extension of T-LDA. The fact that it significantly outperforms T-LDA in terms of both the real followee's rank and MRR demonstrates the benefit of adding behavior information into the topic model for this task.
3) In terms of MRR, B-LDA and LDA report similar performance, which suggests that there exist both behavior-driven and topic-driven followers.

Evaluation on behavior-driven followers

We show in this part an empirical study on how to choose a suitable $K$ value for the $\beta_K$ index and use it to identify behavior-driven followers. We use Cohen's Kappa coefficient to measure the correlation between the ranking results and the $D_{knn}$ metric in Equation (3.5). We find that, by setting $K$ =, $D_{knn}$ has the highest correlation score with the ranking results in both B-LDA and LDA, .7 for B-LDA and .8 for LDA, and both B-LDA and LDA yield better recommendation results. This shows that the proposed $D_{knn}$ metric provides a good characterization of the modularity of a user's followee set in both the topic space and the joint behavior-topic space by setting $K$ =. We then use $\beta$ to judge whether a follower is topic-driven or behavior-driven. Specifically, if user $u$'s corresponding $\beta$ $\tau$ ($\tau$ = ), then $u$ is a behavior-driven follower; otherwise, $u$ is a topic-driven follower.
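The two criteria defined above, the average rank of the real followee and the mean reciprocal rank, reduce to a few lines once each user's ranks over the $R$ runs are available. The sketch below assumes those ranks are stored in a dictionary keyed by user; the function name `average_rank_and_mrr` and this data layout are hypothetical.

```python
def average_rank_and_mrr(ranks_per_user):
    """Return (average rank, MRR) from a mapping user -> list of ranks over R runs."""
    # r(u): average rank of the real followee for user u across the runs.
    r = {u: sum(ranks) / len(ranks) for u, ranks in ranks_per_user.items()}
    num_users = len(r)
    r_bar = sum(r.values()) / num_users
    # MRR averages the reciprocal of r(u) over users.
    mrr = sum(1.0 / value for value in r.values()) / num_users
    return r_bar, mrr
```

With two users whose ranks over two runs are [1, 3] and [4, 4], the per-user averages are 2 and 4, so the average rank is 3 and the MRR is (1/2 + 1/4) / 2 = 0.375. A lower average rank and a higher MRR both indicate better recommendations.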
Figure 6 shows the histogram of, target users' $\beta$ values, binned into intervals of., where we find 53% of all the target users are behavior-driven followers.

Table 5: Comparisons of B-LDA and LDA by average rank of real followee $\bar{r}$ and MRR on behavior-driven followers (columns: Model, B-LDA, LDA, p-value; rows: $\bar{r}$, with p-value E-33; MRR, with p-value N/A).

We report the performance of the two models on behavior-driven followers in Table 5. On this set of users, B-LDA outperforms LDA in terms of both real followee rank and MRR score. As the MRR score is a single numeric value, we cannot perform a significance test on it. For real followee rank, the significance test shows a


Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

Measuring Distribution Similarity in Context

Measuring Distribution Similarity in Context [Dinu & Lapata 2010] Computational Semantics Seminar: Summer 2012 Boris Kozlov 29 May 2012 boris.linguist@gmail.com Outline Vector-based models standard approach

Community-Aware Prediction of Virality Timing Using Big Data of Social Cascades

1 Community-Aware Prediction of Virality Timing Using Big Data of Social Cascades Alvin Junus, Ming Cheung, James She and Zhanming Jie HKUST-NIE Social Media Lab, Hong Kong University of Science and Technology

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

Joint Relevance and Answer Quality Learning for Question Routing in Community QA

Joint Relevance and Answer Quality Learning for Question Routing in Community QA Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

A UPS Framework for Providing Privacy Protection in Personalized Web Search

A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

Mimicking human fake review detection on Trustpilot

Mimicking human fake review detection on Trustpilot [DTU Compute, special course, 2015] Ulf Aslak Jensen Master student, DTU Copenhagen, Denmark Ole Winther Associate professor, DTU Copenhagen, Denmark

Predicting IMDB Movie Ratings Using Social Media

Predicting IMDB Movie Ratings Using Social Media Andrei Oghina, Mathias Breuss, Manos Tsagkias, and Maarten de Rijke ISLA, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

Which universities lead and lag? Toward university rankings based on scholarly output

Which universities lead and lag? Toward university rankings based on scholarly output Daniel Ramage and Christopher D. Manning Computer Science Department Stanford University Stanford, California 94305

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

CSE 450 Web Mining Seminar Spring 2008 MWF 11:10 12:00pm Maginnes 113 Instructor: Dr. Brian D. Davison Dept. of Computer Science & Engineering Lehigh University davison@cse.lehigh.edu http://www.cse.lehigh.edu/~brian/course/webmining/

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor

Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo Junting Chen and James She HKUST-NIE Social Media Lab Dept. of Electronic and Computer Engineering The Hong Kong University of

Cluster Analysis for Evaluating Trading Strategies 1

CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. Jeff.Bacidore@itg.com +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. Kathryn.Berkow@itg.com

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

Statistics for BIG data

Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,