Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Ricardo Campos 1, 4, 6, Alípio Jorge 3, 4, Gaël Dias 2, 6, Célia Nunes 5, 6
1 Tomar Polytechnic Institute, Tomar, Portugal
2 HULTEC/GREYC, University of Caen Basse-Normandie, France
3 Faculty of Sciences, University of Porto, Porto, Portugal
4 LIAAD-INESC Porto L.A, Porto, Portugal
5 Department of Mathematics, University of Beira Interior, Covilhã, Portugal
6 Center of Mathematics, University of Beira Interior, Covilhã, Portugal
2012 IEEE/WIC/ACM International Conference on Web Intelligence, Macau, China, December 04-07, 2012

INTRODUCTION / MOTIVATION: Clustering Search Engines
Several clustering search engines have been proposed over the years. They lack a time-oriented analysis, however, which makes it difficult to return results with a temporal perspective. In this work, we focus on disambiguating a text query with respect to its temporal purpose and propose an approach that temporally clusters the results of a text query.

INTRODUCTION / MOTIVATION: Disambiguating Implicit Temporal Queries
[Figure: example with the text "FIFA World Cup Germany"]

INTRODUCTION / MOTIVATION: Twofold Process
Combining the identification of relevant temporal expressions extracted from web snippets
Title: 2011 Haiti Earthquake Anniversary
Snippet: As of 2010 (see 1500 photos), the following major earthquakes have been recorded in Haiti. The 1st one occurred in [...] has been a tragic date, however in 2012 Haiti will organize the Carnival
with a clustering methodology. In essence, our method is a two-stage process in which documents are grouped into the same cluster if they share a common year.

INTRODUCTION / CONTRIBUTIONS
We present a temporal document representation model;
We introduce a novel approach to identify temporal expressions relevant to a query;
We propose a soft, flat, overlapping temporal clustering algorithm;
We conduct a user survey;
We present an evaluation of our approach using IR measures and a comparison against Carrot2;
We publicly provide a set of queries and ground-truth results.

APPROACH: System Architecture
(1) Web Search
(2) Web Snippet Representation
(3) Temporal Similarity Measure
(4) Relevant Date Classification
(5) Web Snippets Clustering

APPROACH / WEB SEARCH AND WEB SNIPPET MODELLING: Running Example (Haiti Earthquake)
Given a query q issued by a user, e.g., q = {haiti earthquake}, we obtain a collection of web snippets S = {S_1, S_2, ..., S_n}.

Title: 2011 Haiti Earthquake Anniversary
Snippet: As of 2010 (see 1500 photos here), the following major earthquakes have been recorded in Haiti. The first one occurred in [...]
Words: haiti earthquake; major earthquakes; haiti
Dates: 1500; 1564; 2010; 2011

Title: Haiti Earthquake Relief
Snippet: On January 12, 2010, a massive earthquake struck the nation of Haiti, causing catastrophic damage inside and around the capital city of Port-au-Prince.
Words: haiti earthquake; haiti; catastrophic damage; Port-au-Prince
Dates: 2010

Title: Haiti Earthquake
Snippet: The first great earthquake mentioned in histories of Haiti occurred in 1564 in what was still the Spanish colony. It destroyed Concepción de la Vega.
Words: haiti earthquake; haiti; Concepción de la Vega

APPROACH / WEB SEARCH AND WEB SNIPPET MODELLING: Select Best Words (W_s) and Candidate Dates (D_s)
Let W_s = {w_1, w_2, ..., w_k} be the set of k distinct best relevant words/multi-words extracted for the query q within the set of web snippets S:
W_s = {haiti earthquake; major earthquakes; haiti; catastrophic damage; Port-au-Prince; Concepción de la Vega}
Let D_s = {d_1, d_2, ..., d_t} be the set of t distinct candidate dates retrieved from the set of web snippets S returned for the query q:
D_s = {1500; 1564; 2010; 2011}
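
The slides do not say how candidate dates are pulled out of the snippet text. As a minimal sketch, assuming that a candidate date is any standalone four-digit year between 1000 and 2099, the set D_s of the running example could be collected as follows. The function name candidate_dates and the regular expression are illustrative assumptions, not the authors' extractor. Note that such a pattern also picks up "1500" from "1500 photos", which is exactly the kind of wrong date the later relevance step is meant to filter out.

```python
import re

# Assumption: a candidate date is a standalone year in the range 1000-2099.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def candidate_dates(snippets):
    """Return the sorted list of distinct candidate years found in the snippets."""
    dates = set()
    for text in snippets:
        dates.update(int(match) for match in YEAR_PATTERN.findall(text))
    return sorted(dates)


# Titles and snippet texts taken from the running example.
snippets = [
    "2011 Haiti Earthquake Anniversary. As of 2010 (see 1500 photos here), "
    "the following major earthquakes have been recorded in Haiti.",
    "Haiti Earthquake Relief. On January 12, 2010, a massive earthquake "
    "struck the nation of Haiti.",
    "The first great earthquake mentioned in histories of Haiti occurred "
    "in 1564 in what was still the Spanish colony.",
]

print(candidate_dates(snippets))  # [1500, 1564, 2010, 2011]
```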

APPROACH / TEMPORAL SIMILARITY MEASURE: Problem Definition
Given a query q and a date d_i, assign a degree of relevance to each (q, d_i) pair. To model this relevance, we define a temporal similarity value v given by a similarity measure sim. The proposed formulation aims to identify the dates d_i that are relevant for q and to minimize the errors that might arise from considering irrelevant or wrong dates.

APPROACH / TEMPORAL SIMILARITY MEASURE: Problem Definition
According to our investigation, the relevance of a (q, d_i) pair is better defined if, instead of just focusing on the self-similarity between q and d_i:

                      d_1  d_2  d_3  d_4
Haiti Earthquake       v    v    v    v

APPROACH / TEMPORAL SIMILARITY MEASURE: Problem Definition
According to our investigation, the relevance of a (q, d_i) pair is better defined if we compute the similarities between W* and d_i:

                        d_1  d_2  d_3  d_4
Haiti Earthquake         v    v    v    v
major earthquakes        v    v    v    v
haiti                    v    v    v    v
catastrophic damage      0    0    v    0
port-au-prince           0    0    v    0
concepción de la vega    0    v

APPROACH / TEMPORAL SIMILARITY MEASURE: GenTempEval
To compute all these values, we propose a new generic temporal similarity measure called GenTempEval (GTE), which combines a similarity measure sim with an aggregation function F. A wide range of combinations of different F's and sim's has been proposed in Campos et al. In this work we assume that F is the Median function and sim is InfoSimba (Dias et al. 2007), a semantic vector space model supported by corpus-based correlations.
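
The measure is described as an aggregation function F (here the Median) applied to the similarities between a date and a set of words. The following is a minimal sketch of that idea only. The function gte, the lookup helper, and the toy similarity values are assumptions made for illustration; in particular, a lookup table stands in for InfoSimba, which is not implemented here.

```python
from statistics import median


def gte(date, words, sim, aggregate=median):
    """Aggregate sim(word, date) over the given words (F = median by default)."""
    scores = [sim(word, date) for word in words]
    return aggregate(scores) if scores else 0.0


# Hypothetical similarity values; a real system would compute them with InfoSimba.
toy_sim = {
    ("haiti earthquake", 2010): 0.8,
    ("haiti", 2010): 0.8,
    ("catastrophic damage", 2010): 0.6,
}


def lookup(word, date):
    """Stand-in similarity: 0.0 for any (word, date) pair not in the toy table."""
    return toy_sim.get((word, date), 0.0)


words = ["haiti earthquake", "haiti", "catastrophic damage"]
print(gte(2010, words, lookup))  # median of [0.8, 0.8, 0.6]
print(gte(1500, words, lookup))  # every similarity is 0.0
```

Passing the aggregation function as a parameter mirrors the statement that different choices of F and sim can be combined.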

APPROACH / TEMPORAL SIMILARITY MEASURE: GenTempEval
Each word X and each date d_i is defined in terms of a context vector consisting of a combination of words and dates:
X: (Word, Date, Word)
d_i: (Date, Word, Word)

APPROACH / TEMPORAL SIMILARITY MEASURE: GenTempEval
InfoSimba is then defined in terms of V_x and V_y, the context vectors of X and d_i respectively. The similarity S(i, j) between each pair of elements i and j of the two context vectors is determined by the DICE similarity measure, which has shown better results than other measures. This requires the definition of M_ct.

APPROACH / TEMPORAL SIMILARITY MEASURE: M_ct, the Conceptual Temporal Matrix
[Figure: matrix whose rows and columns are both labelled Haiti Earthquake; major earthquakes; haiti; catastrophic damage; port-au-prince; concepción de la vega]
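
The cell values of the matrix are not available in this transcription. As a rough sketch of how such a matrix could be filled, the code below computes the DICE coefficient, 2 * f(x, y) / (f(x) + f(y)), from snippet-level co-occurrence counts of words and dates. Counting co-occurrences per snippet, the function name dice_matrix, and the dictionary representation are assumptions for illustration; the authors' exact construction of M_ct may differ.

```python
from itertools import combinations


def dice_matrix(snippet_terms):
    """Return {(x, y): DICE(x, y)} for all terms (words and dates) in the snippets."""
    # Map each term to the set of snippet indices in which it occurs.
    occurrences = {}
    for index, terms in enumerate(snippet_terms):
        for term in set(terms):
            occurrences.setdefault(term, set()).add(index)

    matrix = {}
    for x, y in combinations(sorted(occurrences), 2):
        shared = len(occurrences[x] & occurrences[y])
        score = 2 * shared / (len(occurrences[x]) + len(occurrences[y]))
        matrix[(x, y)] = matrix[(y, x)] = score  # DICE is symmetric
    for term in occurrences:
        matrix[(term, term)] = 1.0
    return matrix


# Words and years of the three running-example snippets (years kept as strings).
snippet_terms = [
    ["haiti earthquake", "major earthquakes", "haiti", "1500", "1564", "2010", "2011"],
    ["haiti earthquake", "haiti", "catastrophic damage", "port-au-prince", "2010"],
    ["haiti earthquake", "haiti", "concepción de la vega", "1564"],
]

m_ct = dice_matrix(snippet_terms)
print(m_ct[("haiti earthquake", "2010")])     # 2 * 2 / (3 + 2)
print(m_ct[("catastrophic damage", "2010")])  # 2 * 1 / (1 + 2)
print(m_ct[("catastrophic damage", "1564")])  # no shared snippet
```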

APPROACH / DATE CLASSIFICATION: Threshold Classification Strategy
To determine whether a date is relevant or not, we use a classical threshold-based strategy. Given a (q, d_i) pair, the system automatically classifies the date as follows:
Relevant, if GTE(q, d_i) >= threshold
Irrelevant, if GTE(q, d_i) < threshold
The final set of m relevant dates for the query is derived from the decomposition of D_s into D_s^rel:
D_s = {1500; 1564; 2010; 2011}
D_s^rel = {1500; 1564; 2010; 2011}
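
The threshold rule can be sketched in a few lines. The GTE scores and the threshold value below are invented for illustration (the value used in the talk is not preserved in this transcription), so the resulting split is not meant to reproduce the sets shown on the slide; classify_dates is a hypothetical helper name.

```python
def classify_dates(gte_scores, threshold):
    """Split candidate dates into (relevant, irrelevant) lists by GTE score."""
    relevant = sorted(d for d, score in gte_scores.items() if score >= threshold)
    irrelevant = sorted(d for d, score in gte_scores.items() if score < threshold)
    return relevant, irrelevant


# Hypothetical GTE(q, d_i) values for the four candidate dates.
scores = {1500: 0.1, 1564: 0.7, 2010: 0.9, 2011: 0.6}
relevant, irrelevant = classify_dates(scores, threshold=0.35)
print(relevant)    # [1564, 2010, 2011]
print(irrelevant)  # [1500]
```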

APPROACH / WEB SNIPPETS CLUSTERING: Defining the Similarity between Web Snippets
Based on this, each snippet S_i is no longer represented by a set of candidate temporal expressions, but by a set of relevant ones. The next step is to choose an appropriate measure of the similarity between snippets. In this work, instead of using a usual similarity measure, we cluster each snippet according to its associated years, based on the following principle: two snippets are temporally similar if they are highly related to the same set of dates.

APPROACH / WEB SNIPPETS CLUSTERING: Forming Clusters
Each web snippet S_i can be assigned to possibly many clusters C = {C_1, C_2, ..., C_m}, since its text can contain several different relevant temporal features. A single cluster C_j, for j = 1, ..., m, can be seen as a container of documents sharing the same year. The final set of clusters is ranked along the timeline and consists of m clusters, where m is the number of relevant dates found within D_s^rel. As such, each cluster C_j is labeled directly by the corresponding date in D_s^rel.
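
The cluster-forming step described above can be sketched as follows: one cluster per relevant year, ordered along the timeline, with a snippet placed in every cluster whose year it mentions. The function name temporal_clusters, the snippet identifiers, and the assumption that 1500 has been filtered out as irrelevant are illustrative choices, not taken from the slides.

```python
def temporal_clusters(snippet_dates, relevant_dates):
    """Group snippet ids into overlapping clusters keyed by relevant year.

    snippet_dates maps a snippet id to the years found in that snippet.
    Clusters are created in chronological order, so iterating over the
    returned dict follows the timeline.
    """
    clusters = {year: [] for year in sorted(relevant_dates)}
    for snippet_id, dates in snippet_dates.items():
        for year in sorted(set(dates)):
            if year in clusters:  # dates classified as irrelevant are ignored
                clusters[year].append(snippet_id)
    return clusters


# Years per snippet from the running example; relevant set is hypothetical.
snippet_dates = {"S1": [2011, 2010, 1500, 1564], "S2": [2010], "S3": [1564]}
clusters = temporal_clusters(snippet_dates, relevant_dates={1564, 2010, 2011})
print(clusters)  # {1564: ['S1', 'S3'], 2010: ['S1', 'S2'], 2011: ['S1']}
```

Because S1 mentions three relevant years, it appears in three clusters, which is what makes the clustering soft and overlapping.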

EVALUATION / EXPERIMENT SETUP
We conduct three experiments. First, we evaluate the ability of our clustering algorithm to correctly identify relevant temporal clusters C_j and snippets S_i for the query q. Second, we compare our clustering proposal with current web-snippet clustering engines, such as Carrot2. Finally, we test the performance of our approach in a real web user environment by conducting a user study.

EVALUATION / EXPERIMENT SETUP: Test Queries
42 representative clear-concept implicit temporal queries (e.g., BP Oil Spill; Waka Waka): non-ambiguous in concept; temporal in purpose.

EVALUATION / EXPERIMENT SETUP: Data Description
We queried the search engine for each of the 42 queries, collecting the top 50 relevant web results;
582 relevant web snippets with years;
Ground truth:
656 distinct (S_i, d_h,i) pairs, where S_i ranges over the 582 web snippets annotated with at least one candidate year, and d_h,i is the set of t candidate dates for the snippet S_i;
distinct (q, d_i) pairs.
[Tables: number of (S_i, d_h,i) pairs per score and number of (q, d_i) pairs per score]

EVALUATION / EXPERIMENTAL RESULTS: Correct Identification of Snippets with respect to the Cluster
We rely on the set of 656 distinct (S_i, d_h,i) pairs. The results point to 95.9% F1, 92.9% Accuracy, 84.9% Balanced Accuracy, 94.6% Precision and 97.1% Recall. We compare these results against a Non-GTE approach, measuring the effectiveness when all the dates are used.
[Table: F1, Precision and Recall for Non-GTE, GTE and Improvement]
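
For reference, the measures reported above follow the standard definitions over a binary confusion matrix. The sketch below uses made-up counts, not the experimental data, and assumes every denominator is non-zero; classification_metrics is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)   # recall of the negative class
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the recall of the two classes, which is why it can sit well below plain accuracy when irrelevant dates are rare, as in the figures reported on the slide.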

EVALUATION / EXPERIMENTAL RESULTS: Correct Identification of Clusters
We rely on the set of 235 distinct (q, d_i) pairs. The results point to 94.3% F1, 92.6% Balanced Accuracy, 94.5% Precision and 94.2% Recall. Similarly, we compare our results against the approach of Alonso et al. (2009), who consider all the temporal patterns found as relevant clusters.
[Table: F1, Precision and Recall for Non-GTE, GTE and Improvement]

EVALUATION / EXPERIMENTAL RESULTS: Correct Identification of Clusters
[Figure]

EVALUATION / EXPERIMENTAL RESULTS: GTE-Cluster against Carrot2
In the second set of experiments we compare our proposal against Carrot2;
For this purpose, we used the Carrot2 Document Clustering Workbench, which makes it possible to test Carrot2 with our dataset;
To obtain the Carrot2 results, we ran each of the 42 text queries on the Workbench over the WC_DS dataset;
We used Lingo, an overlapping clustering algorithm that is also used for the Carrot2 live demos;
We set the cluster count base parameter to 100 in order to obtain the highest possible number of temporal clusters;
This parameter was combined with the allow numeric labels parameter, so that labels can contain numbers.

EVALUATION / EXPERIMENTAL RESULTS: GTE-Cluster against Carrot2
Since we intend to assess the temporal nature of its output, we rely only on the set of clusters (and their corresponding snippets) labeled with a year: either a single numeric value, e.g., 2009, or a combination of years and text, e.g., "1955 October" or "Susan Magdalene Boyle Born 1 April [...]".
[Table, Clustering: F1, Precision and Recall for Carrot2, GTE-Cluster and Improvement]
[Table, Snippet: F1, Precision and Recall for Carrot2, GTE-Cluster and Improvement]

EVALUATION / EXPERIMENTAL RESULTS: Correct Identification of Clusters
[Figure]

EVALUATION / EXPERIMENTAL RESULTS: User Survey
In this experiment we aim to evaluate the ability of our clustering algorithm to correctly identify relevant clusters and snippets, and to filter out irrelevant ones, in a real web user environment. Users were shown the results and were then asked to rate each query on a 5-point scale: Excellent; Good; Fair; Not Relevant; I don't know.

EVALUATION / EXPERIMENTAL RESULTS: User Survey
Each query was evaluated by 6 workers. The most frequent response was Excellent, with an average of [...]

CONCLUSIONS
We propose a strategy for the temporal clustering of search engine query results, where snippets are clustered by year;
We rely on a novel temporal similarity measure named GTE, which makes it possible to detect the top relevant years and filter out irrelevant ones;

CONCLUSIONS
The results show that the introduction of GTE benefits the quality of the generated clusters by retrieving a high number of precise relevant dates;
Comparative experiments were also performed against the Carrot2 web-snippet clustering engine. The results show that our clustering approach is more effective than that of Carrot2 in temporally disambiguating a query;
These results were complemented by a user survey showing that users mostly agree with the set of temporal clusters retrieved by our system.

Thanks for your attention!
Both experimental datasets are available for download.


More information

RE-SEARCHING THE RESEARCH PROBLEMS IN CAAD

RE-SEARCHING THE RESEARCH PROBLEMS IN CAAD RE-SEARCHING THE RESEARCH PROBLEMS IN CAAD Data Mining in i-caadria MAO-LIN CHIU*, CHIEH-JEN LIN** *Department of Architecture, National Cheng Kung University No. 1, University Road, Tainan 701, Taiwan

More information

An Analysis of Factors Used in Search Engine Ranking

An Analysis of Factors Used in Search Engine Ranking An Analysis of Factors Used in Search Engine Ranking Albert Bifet 1 Carlos Castillo 2 Paul-Alexandru Chirita 3 Ingmar Weber 4 1 Technical University of Catalonia 2 University of Chile 3 L3S Research Center

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:

More information

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

More information

A Novel Framework for Personalized Web Search

A Novel Framework for Personalized Web Search A Novel Framework for Personalized Web Search Aditi Sharan a, * Mayank Saini a a School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi-67, India Abstract One hundred users, one

More information
