Enhanced Information Access to Social Streams. Enhanced Word Clouds with Entity Grouping
1 Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping. Martin Leginus (1), Leon Derczynski (2) and Peter Dolog (1). (1) Department of Computer Science, Aalborg University, Selma Lagerlofs Vej 300, 9200 Aalborg, Denmark. (2) Department of Computer Science, University of Sheffield, S1 4DP, United Kingdom.
2 Agenda: Word clouds for social streams; Entity redundancies and motivation; Grouping entities; Graph-based word cloud generation; Synthetic evaluation; User study; Discussion, conclusions and future work.
3 Word clouds generated from social streams. A word cloud is a visual retrieval interface that depicts the most important terms of a dataset. Word clouds help reduce information overload when browsing social media. Leginus, M., Zhai, C., and Dolog, P. (2015). Personalized generation of word clouds from tweets. Journal of the Association for Information Science and Technology.
4 Word clouds generated from social streams. Figure: The user interface of the FeedWinnower system. Hong, L., Convertino, G., Suh, B., Chi, E. H., and Kairam, S. (2010). FeedWinnower: layering structures over collections of information streams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM.
5 Word clouds generated from social streams. Figure: The TweetMotif user interface for exploratory search. O'Connor, B., Krieger, M., and Ahn, D. (2010). TweetMotif: Exploratory search and topic summarization for Twitter. In Proceedings of ICWSM, pages 2-3.
6 Word clouds generated from social streams. Figure: Eddi, a summarization interface for user timeline tweets. Bernstein, M. S., Suh, B., Hong, L., Chen, J., Kairam, S., and Chi, E. H. (2010). Eddi: interactive topic-based browsing of social status streams. In Proceedings of the 23rd annual ACM symposium on User interface software and technology. ACM.
7 Motivation. Word clouds generated from plain terms are often not meaningful, so we enrich them with named entities; clouds with entities are perceived as more useful (Finn et al., 2010). A single entity can appear under many names: the football team Manchester United can be referred to as MUFC, Man U, Red Devils or the Reds. Such redundancies reduce the quality of word clouds (decreased diversity, user confusion and a limited browsing experience).
8 Goals. 1 Condense divergent terms that refer to the same entity. 2 Maximize the diversity of word clouds to provide a broad overview of topics. Research hypothesis: word clouds with grouped named entities improve coverage, relevance and diversity.
9 The process of word cloud generation: 1 Data collection 2 Data preprocessing 3 Word cloud generation
10-15 Grouping named entities
1 Recognise named entities (NER) and disambiguate them (entity linking) using the TextRazor service.
2 Find alternative names for the recognised entity (we employ the Freebase KB).
3 Perform lemmatisation: group together all the inflected forms of a word so that only the base form of the term is used, e.g., fan, fans.
4 Using the aliases, build a term cluster for each entity, e.g., mufc, manchester united, man united, red devils, devils.
5 Find canonical names for entities, e.g., represent the cluster mufc, manchester united, man united, red devils, devils with Manchester United F.C.
Then proceed with word cloud generation.
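A minimal sketch of steps 3 to 5 above, assuming the alias lookup has already been done. The ALIASES table, the toy lemmatise function and the group_terms helper are hypothetical stand-ins for illustration; they are not the TextRazor or Freebase interfaces used by the authors.

```python
from collections import defaultdict

# Hypothetical alias table standing in for a knowledge-base lookup.
# Keys are canonical names, values are the known aliases.
ALIASES = {
    "Manchester United F.C.": {
        "mufc", "manchester united", "man united", "red devils", "devils",
    },
}


def build_alias_index(aliases):
    """Map every lowercase alias to its canonical entity name."""
    index = {}
    for canonical, names in aliases.items():
        for name in names:
            index[name.lower()] = canonical
    return index


def lemmatise(term):
    """Toy lemmatiser (assumption): strips a trailing plural 's' only."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def group_terms(terms, alias_index):
    """Count terms, folding aliases into canonical names and plurals into base forms."""
    counts = defaultdict(int)
    for term in terms:
        key = term.lower()
        if key in alias_index:
            counts[alias_index[key]] += 1
        else:
            counts[lemmatise(key)] += 1
    return dict(counts)


alias_index = build_alias_index(ALIASES)
grouped = group_terms(["MUFC", "red devils", "fans", "fan", "Man United"], alias_index)
```

Aliases are matched before lemmatisation so that a plural alias such as "red devils" is not reduced to a form missing from the index.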
16 Graph-based word cloud generation. Terms extracted from tweets are used to create a graph. When two terms (vertices) co-occur at least α times, two directed edges are introduced: $t_1 \to t_2$ and $t_2 \to t_1$. Tweets are short, hence the parameter α is set to 0.
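The graph construction can be sketched as follows. This is an illustration only: build_term_graph is a hypothetical helper, and the threshold is read as "more than α co-occurrences", an assumption under which α = 0 links every pair of terms that co-occurs in at least one tweet.

```python
from collections import Counter
from itertools import combinations


def build_term_graph(tweets, alpha=0):
    """Build a co-occurrence graph from tokenised tweets.

    tweets: list of lists of terms.
    Returns a dict: term -> {neighbour: co-occurrence count}.
    Each kept pair yields two directed edges (a -> b and b -> a).
    """
    pair_counts = Counter()
    for terms in tweets:
        # Count each unordered pair once per tweet.
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1

    graph = {}
    for (a, b), n in pair_counts.items():
        if n > alpha:  # assumption: strict threshold, see the note above
            graph.setdefault(a, {})[b] = n
            graph.setdefault(b, {})[a] = n
    return graph


tweets = [["bbc", "news"], ["bbc", "news", "world"]]
graph = build_term_graph(tweets)
```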
17 Graph-based ranking. A stochastic traversal of the term graph estimates the importance of a term t. The iterative stationary probability is defined as:
$$\pi^{(i+1)}(v) = (1-\beta)\left(\sum_{u=1}^{d_{in}(v)} p(v \mid u)\,\pi^{(i)}(u)\right) + \beta\, p_p \qquad (1)$$
User preferences can be encoded into the vector of prior probabilities $p_p$. The resulting global rank of a term t after convergence is:
$$I(t) = \pi(t) \qquad (2)$$
The top-k ranked terms are then used for word cloud generation.
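A small power-iteration sketch of Equation (1), assuming the graph is a dict of weighted adjacency dicts, that p(v|u) is the edge weight normalised over the outgoing edges of u, and that the prior defaults to uniform. The names rank_terms and top_k, the default β and the fixed iteration count are illustrative choices, not values from the slides.

```python
def rank_terms(graph, prior=None, beta=0.15, iterations=50):
    """Iterate pi(v) = (1 - beta) * sum_u p(v|u) * pi(u) + beta * prior(v).

    graph: term -> {neighbour: weight}; every neighbour must also be a key.
    prior: term -> prior probability (uniform when omitted).
    """
    nodes = sorted(graph)
    if prior is None:
        prior = {v: 1.0 / len(nodes) for v in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}

    pi = dict(prior)
    for _ in range(iterations):
        nxt = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # nodes without outgoing edges pass nothing on
            for v, w in graph[u].items():
                nxt[v] += (1 - beta) * (w / out_weight[u]) * pi[u]
        pi = nxt
    return pi


def top_k(scores, k):
    """Return the k highest-scoring terms, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Star-shaped toy graph: "a" co-occurs with both "b" and "c".
star = {"a": {"b": 1, "c": 1}, "b": {"a": 1}, "c": {"a": 1}}
scores = rank_terms(star)
```

Passing a non-uniform prior is one way to encode the user preferences mentioned on the slide.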
18 DivRank ranking. The intention is to increase the diversity of the ranking. Transition probabilities change over time, following the rich-get-richer principle. Mei, Q., Guo, J., and Radev, D. (2010). DivRank: the interplay of prestige and diversity in information networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.
19 DivRank ranking. The transition probability from a node u to a node v at time T is:
$$p_T(v \mid u) = (1-\beta)\left(\frac{p_0(v \mid u)\, p_T(v)}{\sum_{v \in V} p(v \mid u)\, p_T(v)}\right) + \beta\, p_p$$
The DivRank algorithm is useful for incorrectly disambiguated entities, e.g., BBC, BBC NEWS, BBC NEWS WORLD.
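A rough sketch of a time-variant walk in the spirit of this formula, under several assumptions: $p_T(v)$ is read as the current visit probability of v, the organic transition $p_0$ includes a self-loop (as in Mei et al.), the same $p_0$ is used in the numerator and the denominator, and the prior is uniform. The function name divrank and all default parameter values are illustrative.

```python
def divrank(graph, beta=0.15, self_loop=0.25, iterations=100):
    """Pointwise DivRank-style ranking on a weighted graph.

    graph: term -> {neighbour: weight}; every neighbour must also be a key.
    """
    nodes = sorted(graph)
    prior = {v: 1.0 / len(nodes) for v in nodes}

    # Organic transitions p0(v|u): stay with probability self_loop,
    # otherwise move proportionally to edge weight (assumption).
    p0 = {}
    for u in nodes:
        total = sum(graph[u].values())
        if total == 0:
            p0[u] = {u: 1.0}
            continue
        row = {u: self_loop}
        for v, w in graph[u].items():
            row[v] = row.get(v, 0.0) + (1 - self_loop) * w / total
        p0[u] = row

    pi = dict(prior)
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Normaliser: sum over v of p0(v|u) * pi(v).
            norm = sum(p * pi[v] for v, p in p0[u].items())
            for v in nodes:
                reinforced = p0[u].get(v, 0.0) * pi[v] / norm
                nxt[v] += ((1 - beta) * reinforced + beta * prior[v]) * pi[u]
        pi = nxt
    return pi


star = {"a": {"b": 1, "c": 1}, "b": {"a": 1}, "c": {"a": 1}}
div_scores = divrank(star)
```

Because transitions into already well-visited nodes are reinforced, a highly ranked node tends to absorb weight from its neighbours, which is how near-duplicate terms around one entity can be pushed down the ranking.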
20 Evaluation metrics. A term t links to a set of tweets $Tw_t$; $Tw_{t_q}$ is the set of all tweets that are associated with a query phrase $t_q$. Coverage indicates how many tweets are retrievable from the given word cloud $WC_k$:
$$Coverage(WC_k) = \frac{\left|\bigcup_{t \in WC_k} Tw_t\right|}{|Tw_{t_q}|} \qquad (3)$$
Overlap captures the extent of redundancies, i.e., how many terms link to the same tweet:
$$Overlap(WC_k) = \underset{t_i \neq t_j}{\mathrm{avg}}\ \frac{|Tw_{t_i} \cap Tw_{t_j}|}{\min\{|Tw_{t_i}|, |Tw_{t_j}|\}} \qquad (4)$$
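Equations (3) and (4) can be computed directly on sets of tweet ids. The sketch below assumes each term maps to a non-empty set of ids and that those ids all belong to the query's tweet set; the function names and the toy data are illustrative.

```python
from itertools import combinations


def coverage(cloud, term_tweets, query_tweets):
    """Fraction of the query's tweets reachable from at least one cloud term."""
    covered = set()
    for term in cloud:
        covered |= term_tweets[term]
    return len(covered) / len(query_tweets)


def overlap(cloud, term_tweets):
    """Average pairwise overlap of the tweet sets of distinct cloud terms."""
    ratios = []
    for ti, tj in combinations(cloud, 2):
        a, b = term_tweets[ti], term_tweets[tj]
        ratios.append(len(a & b) / min(len(a), len(b)))
    return sum(ratios) / len(ratios) if ratios else 0.0


term_tweets = {"bbc": {1, 2, 3}, "news": {2, 3}, "fan": {4}}
query_tweets = {1, 2, 3, 4, 5}
cov = coverage(["bbc", "news", "fan"], term_tweets, query_tweets)
ovl = overlap(["bbc", "news", "fan"], term_tweets)
```

In the toy data, "bbc" and "news" share every tweet of the smaller set, so that pair contributes an overlap of 1 while both pairs involving "fan" contribute 0.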
21 Evaluation metrics I. Mean Average Precision: 1 A word cloud is transformed into a query $Q_{WC_k}$. 2 Retrieve and rank the tweets matching the query. 3 Measure Mean Average Precision (MAP). The ranking function is Okapi BM25:
$$S(tw, Q_{WC_k}) = \sum_{q_i \in Q_{WC_k} \cap tw} c(q_i, Q_{WC_k}) \cdot TF(q_i, tw) \cdot IDF(q_i) \qquad (5)$$
The function $c(q_i, Q_{WC_k})$ returns the weight of the term $q_i$. The components $TF(q_i, tw)$ and $IDF(q_i)$ are calculated in the standard way.
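A compact sketch of Equation (5) and of average precision for one query. The slides only say that TF and IDF are computed "in the standard way", so the particular BM25 variant below (k1 = 1.2, b = 0.75, logarithmic IDF with +1 smoothing) is an assumption, as are the function names. MAP is the mean of average_precision over all queries.

```python
import math


def bm25_score(tweet_terms, query_weights, doc_freq, n_docs, avg_len,
               k1=1.2, b=0.75):
    """Weighted BM25 score of one tweet for a word-cloud query.

    tweet_terms: list of terms in the tweet.
    query_weights: query term -> weight c(q_i, Q).
    doc_freq: term -> number of tweets containing it.
    """
    score = 0.0
    for q, weight in query_weights.items():
        f = tweet_terms.count(q)
        if f == 0:
            continue  # only terms present in both query and tweet contribute
        idf = math.log((n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5) + 1.0)
        tf = f * (k1 + 1) / (f + k1 * (1 - b + b * len(tweet_terms) / avg_len))
        score += weight * tf * idf
    return score


def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at the ranks where relevant tweets appear."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
match = bm25_score(["bbc", "news"], {"bbc": 1.0}, {"bbc": 1}, n_docs=3, avg_len=2.0)
miss = bm25_score(["fan"], {"bbc": 1.0}, {"bbc": 1}, n_docs=3, avg_len=2.0)
```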
22 Evaluation dataset and word cloud generation methods. Dataset: the TREC2011 microblogging collection with relevance judgements; tweets rated as relevant or highly relevant are considered equally relevant. Methods: PgRankTerms (baseline) estimates the global importance of terms (extracted from tweets and transformed into a graph). MFE selects the top-k most popular recognized entities. MFEA selects the top-k most popular recognized entities grouped with their aliases. PgRankTermsEntities ranks the top-k phrases from a graph that contains extracted terms and grouped recognized entities.
23 Results. Figure: Coverage, Overlap and relevance (MAP) against the number of terms in the word cloud, for PgRankTerms, PgRankTermsEntities, MFE and MFEA. Word clouds with grouped entities attain higher Coverage and MAP.
24 Diversification. Figure: Overlap against the number of terms in the word cloud, for PgRankTerms, PgRankTermsEntities and DivRankTermEntities. DivRank decreases redundancies and also improves Coverage with respect to the baseline.
25 User study Are word clouds with named entities perceived as more relevant and diverse by the users? Do measured synthetic metrics correlate with the ratings of relevance and diversity by users?
26 Terms vs. entity-grouped clouds. Of 160 distinct relevance ratings, 89 favoured word clouds with named entities, 27 were neutral and 44 favoured the baseline word clouds. Of the diversity ratings, 73 favoured word clouds with named entities, 51 were neutral and 36 favoured the baseline word clouds. Figure: % of ratings, for relevance ratings and diversity ratings.
27 User ratings vs. synthetic metrics. Figure: % of relevance ratings and diversity ratings for three groups of word clouds: improved MAP, improved diversity (Overlap), and decreased MAP and Overlap. The MAP metric predicts extrinsic human evaluations of cloud quality.
28 Discussion. False positives from NER affect relevance ratings, e.g., a cloud for "Super Bowl, seats" contained "Super (2010 American film)". Imprecise named entity disambiguation (e.g., BBC, BBC News, BBC News World) increases redundancies. Due to the subjective nature of the crowdsourcing task, we omitted a user qualification phase.
29 Conclusions. We presented a technique that groups aliases of the same entity and represents them with a canonical term. It yields significantly decreased redundancy and significantly higher coverage than the baseline. The user study supports that word clouds with grouped named entities are significantly more relevant and diverse than the baseline. The MAP metric predicts extrinsic human evaluations of cloud quality.
