Big Data in Web Search. Claudio Lucchese hpc.isti.cnr.it

1 Big Data in Web Search
Claudio Lucchese (claudio.lucchese@isti.cnr.it), hpc.isti.cnr.it

2 High Performance Computing Lab
- 7 Researchers
- 3 Post-doc fellows
- 6 PhD students
- 3 Research Associates from U. of Venice and Pisa

3 Main Research Topics
Web Search & Scalable DM/ML
- Responsiveness of large-scale search systems; storage, analysis and indexing of large amounts of data
- Machine learning and Web mining techniques for Ranking, Prediction, Recommendation, Diversification, Social media analysis, Entity Linking and Semantic Enrichment
Cloud and Distributed computing
- Cloud federations, Resource Management
- Network overlays for P2P and Big Data
- Scalable data analysis with Hadoop MapReduce, Giraph, Spark, etc.

4 Outline
- Some recent Learning to Rank activities
- User Task Discovery in Query Logs
  Lucchese, C., Orlando, S., Perego, R., Silvestri, F., & Tolomei, G. Discovering tasks from search engine query logs. ACM Transactions on Information Systems (TOIS), 31(3), ACM Notable Article.
- Entity Linking
  D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, S. Trani. Learning relatedness measures for entity linking. In Proceedings of CIKM '13: ACM Int. Conference on Information and Knowledge Management, p , Oct
- News Recommendation
  G. De Francisci Morales, A. Gionis, C. Lucchese. From Chatter to Headlines: Harnessing the Real-Time Web for Personalized News Recommendation. In Proceedings of WSDM '12: ACM Int. Conference on Web Search and Data Mining, Seattle, Washington, USA, February
- Tour planning
  Brilhante, I., Macedo, J. A., Nardini, F. M., Perego, R., & Renso, C. Where shall we go today?: planning touristic tours with tripbuilder. In Proceedings of the CIKM '13: ACM Int. Conference on Information and Knowledge Management, pp Oct

5 Ranking
- Ranking is (one of) the most important challenges in Web Search.
- We define Ranking as the problem of sorting a set of documents according to their relevance to the user query.
- This is a typical Big Data task: user feedback is very relevant.

6 Learning to Rank is:
- Training set: tuples of query, document and relevance label, $(q_1, d_{11}, r_{11}), (q_1, d_{12}, r_{12}), \dots, (q_1, d_{1k}, r_{1k}), \dots, (q_m, d_{m1}, r_{m1}), \dots, (q_m, d_{mk}, r_{mk})$
- Learning: produces a scoring function $h$
- Document scoring: an unseen pair $(q^*, d^*, ?)$ is mapped to $(q^*, d^*, h(q^*, d^*))$
The goal is to learn the ranking, not the label!

7 Is it easy?
[Diagram repeated from the previous slide: training tuples $(q_i, d_{ij}, r_{ij})$, Learning, scoring function $h$, Document scoring of $(q^*, d^*, ?)$ into $(q^*, d^*, h(q^*, d^*))$]
- Not so easy when optimizing typical Information Retrieval measures.
- One simple reason is that they involve sorting the documents, which is not a differentiable function of the scores.
- Therefore we cannot directly apply gradient descent or similar methods.
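The point about sorting can be illustrated with a small sketch. It uses Average Precision as one example of an Information Retrieval measure; the scores and relevance flags below are made up for illustration and do not come from the talk.

```python
def average_precision(scores, relevant):
    """Average Precision of the ranking obtained by sorting on decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

relevant = [True, False, True]     # hypothetical relevance of three documents
base = average_precision([0.9, 0.50, 0.4], relevant)
nudged = average_precision([0.9, 0.45, 0.4], relevant)   # scores move, order does not
swapped = average_precision([0.9, 0.39, 0.4], relevant)  # documents 2 and 3 swap places
```

The measure stays identical while the order is unchanged and then jumps when two documents swap, so its gradient with respect to the scores is zero almost everywhere and gives gradient descent nothing to follow.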

8 QuickScore
- A Learning-to-Rank function is typically implemented as a forest of thousands of decision trees.
- QuickScore is a cache-aware algorithm that improves the scoring efficiency of tree-based ranking models.
- It is 2x to 6.5x faster than state-of-the-art implementations.
- It visits a tree by touching a smaller number of nodes.
- We also implemented an efficient multi-threaded learning toolkit, named QuickLearn, implementing a few variants of Gradient Boosted Regression Trees.
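For context, here is a minimal sketch of the plain root-to-leaf traversal that a tree-forest scorer has to perform. This is only the naive baseline, not QuickScore itself; the `Node` layout, the toy trees and the feature values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    feature: int = -1              # index of the tested feature (internal nodes)
    threshold: float = 0.0         # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0             # prediction stored in a leaf

def score_tree(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf, one comparison per visited node."""
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

def score_forest(trees, x):
    """Additive ensemble: the document score is the sum of the leaf values."""
    return sum(score_tree(t, x) for t in trees)

# Two hypothetical trees over a two-feature document vector.
t1 = Node(0, 0.5, Node(value=0.1), Node(value=0.7))
t2 = Node(1, 2.0, Node(value=-0.2), Node(1, 5.0, Node(value=0.3), Node(value=0.9)))
score = score_forest([t1, t2], [0.8, 3.0])
```

With thousands of trees and many documents per query, this pointer-chasing loop dominates scoring time, which is the cost a cache-aware scorer aims to reduce.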

9 Big Data in Web Search
- Europeana is the biggest European Cultural Heritage portal.
- Within the ASSETS EU CIP Project we designed a new ranking function, which was actually deployed on the portal!
- User feedback (result clicks) is exploited to learn document relevance.

10 Big Data in Web Search
- Ranking design: define a ranking architecture, learn a ranking function, feature tuning, efficient scoring, etc.
- Data cleaning: remove near-duplicate documents from a collection of 6 billion documents with ~1000 CPU cores.
- NoSQL storage, massive Hadoop MapReduce computations.

12 User Task Discovery (UTD)
- Users have interleaving multi-tasking behavior.
- Thanks to User Task Discovery, it would be possible to better understand users, to recommend tasks, etc.
Claudio Lucchese, 2nd HPC Workshop - Playing with Learning to Rank

13 User Task Discovery (UTD)
Easy process:
1. Put weights on the query similarity graph
2. Remove low-weighted edges
3. Extract connected components
How to find a good query similarity function?
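The three steps above can be sketched as follows. The similarity function (word-level Jaccard), the threshold and the sample session are placeholders chosen for illustration; the talk leaves the choice of similarity function open.

```python
def discover_tasks(queries, similarity, threshold):
    """Keep edges with weight >= threshold and return the connected components."""
    parent = list(range(len(queries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if similarity(queries[i], queries[j]) >= threshold:
                parent[find(i)] = find(j)   # merge the two components

    groups = {}
    for i, q in enumerate(queries):
        groups.setdefault(find(i), []).append(q)
    return sorted(groups.values())

def word_jaccard(a, b):
    """Hypothetical similarity: Jaccard overlap between the two word sets."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

session = ["cheap flights rome", "hotel rome", "python tutorial",
           "rome flights", "python list tutorial"]
tasks = discover_tasks(session, word_jaccard, threshold=0.2)
```

On this toy session the interleaved travel and programming queries end up in two separate components, even though they were not issued contiguously.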

14 User Task Discovery (UTD)
Binary classification approach:
- Given a set of task-annotated queries, build a training set of query pairs labeled as same task vs. different tasks.
- Define the feature set:
  - edlevgt2: edit distance between q_i and q_j
  - wordr: Jaccard distance between the sets of words of q_i and q_j
  - char_suf: number of common characters in q_i and q_j
  - nsubst_q_j_X: related to the probability of q_j being reformulated
  - time_diff: inter-query time gap between q_i and q_j
  - sequential: binary feature that is positive if q_i and q_j are issued sequentially
  - prisma: cosine between the two vectors of the top-50 pages returned by a search engine (SE) for q_i and q_j
  - entropy_q_i_X: measures the rewrite probabilities from q_i
  - σ_jaccard_url: Jaccard similarity between the top-20 URLs returned by the SE for q_i and q_j
  - σ_wikipedia: cosine based on Wikipedia articles containing q_i and q_j
- Learn a classifier optimizing classification accuracy:
  - Logistic Regression works sufficiently well
  - Decision trees are slightly better
- We improved over the Query-Flow-Graph approach.
Claudio Lucchese, 2nd HPC Workshop - Playing with Learning to Rank
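As a rough illustration, a few of the lexical and temporal features can be computed as below. These are simplified analogues only: the exact definitions behind names such as edlevgt2 or wordr are not given in the slides, and the function names, dictionary keys and sample queries here are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def word_jaccard_distance(a: str, b: str) -> float:
    """One minus the Jaccard overlap of the two word sets."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0

def pair_features(qi, ti, qj, tj, adjacent):
    """Feature dictionary for one query pair; ti and tj are timestamps in seconds."""
    return {
        "edit_distance": edit_distance(qi, qj),
        "word_jaccard_distance": word_jaccard_distance(qi, qj),
        "time_diff": abs(tj - ti),
        "sequential": 1 if adjacent else 0,
    }

features = pair_features("rome hotel", 0.0, "rome hotels", 42.0, adjacent=True)
```

Each labeled pair (same task or different task) would contribute one such feature vector to the training set of the binary classifier.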

15 Learning to Rank for UTD
Learning to rank setting:
- A query is a query q_i in the user session
- A document is any other query q_j in the same user session
- Objective: rank same-task queries higher than different-task queries
Clustering quality:
- Sample of the AOL 2006 query log: ~8,800 queries by 127 users, manually annotated into ~1350 tasks, ~6.5 queries per task
- [Table values missing. Columns: Algo, Rand Index, Jacc. Index, Avg. Fm. Rows: Log.Reg, Lambda-MART.]

17 Entity Linking
- The goal is to identify relevant entities mentioned by fragments of text.
- Entities are taken from a given catalogue, e.g. Wikipedia.

18 Entity Linking
State-of-the-art approaches run three steps:
1. Spotting: given a document, find fragments of text potentially referring to entities, a.k.a. spots.
   - A common approach is to match anchors in Wikipedia
   - Some spots are ambiguous, e.g. Michael Collins
2. Disambiguation: given a set of spots in a document, find the correct entity for each spot.
   - Steps 1 and 2 are sometimes referred to as Word Sense Disambiguation
3. Link Detection: given a document, its terms and their senses, decide where to put links, e.g. Bank of London.
The main research question is: how to improve the disambiguation step?

19 Saliency Driven Entity Linking
- Mentioned entities do not all have the same importance in the given document.
- We defined 3 levels of saliency.
- We trained a model to rank entities according to their expected saliency.
Table 2: Entity linking and saliency prediction performance. [Table values missing. Datasets: CoNLL, Wikinews. Measures: Rec, Prec, F1, NDCG, F top. Methods: GBDT-F, GBRT-F, LRC-F, Tagme, Wikiminer, Spotlight.]

20 What's Next
- The software we developed is open-source and available at [URL missing].
- Endless applications: news stream analysis for understanding, summarization, sentiment; annotation of Web queries and web documents.

22 News Recommendation
How to recommend the most relevant news to a given user?
- There are a lot of information sources
- Users are different
- Timeliness is crucial!

23 Time is Crucial
[Figure: delay between news publication and clicks]

24 Time is crucial: Earthquakes and Twitter
There are some arguments about Twitter waves being faster than seismic waves.

25 Time is crucial: News Agencies vs. Twitter
[Figure: tweets about Osama Bin Laden's death]
Twitter is sometimes faster than other media.

26 Twitter vs. News streams
The main research questions are:
- Streaming analysis of the Twitter stream
- Detect trending topics early in Twitter
- Model the information spreading process in Twitter
- Detect trending topics early in news
- Recommend news to users according to their tastes
Our research question is: can we use information both from the Twitter stream and from the news stream to provide personalized news recommendation?

27 When is a news article relevant?
Our assumptions. A news article is interesting if:
1. It discusses topics of interest to the user (e.g., computer science)
2. It discusses topics of interest to the social network of the user (e.g., computer science, art exhibitions in Tuscany)
3. It does not fall within the user's interests, but it is of general relevance (e.g., Ukraine crisis)
We need a way to detect topics in text:
- Topics provide a higher-level view
- Topics can bridge the gap between the Twitter and news streams
- We model the relation between news, tweets and users in terms of the topics/entities they discuss

28 Tweets and news relatedness
- We extracted topics from tweets and from news.
- Let $Z$ be the set of topics; in our case, $Z$ is the set of Wikipedia pages.
- Let $T(i, j)$ be the relevance of topic $z_j$ for tweet $t_i$.
- Let $N(j, k)$ be the relevance of topic $z_j$ for news $n_k$.
- The product $M = TN$ is used to estimate the relatedness between tweets and news: $M(i, k)$ is the relatedness between the tweet $t_i$ and the news $n_k$ based on the co-cited topics.
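A toy instance of the product $M = TN$ may help fix the indices. The matrices below are invented for illustration (2 tweets, 3 topics, 2 news articles) and use plain Python lists rather than any particular linear algebra library.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]

T = [[1.0, 0.0, 0.5],    # T[i][j]: relevance of topic z_j for tweet t_i
     [0.0, 1.0, 0.0]]
N = [[0.8, 0.0],         # N[j][k]: relevance of topic z_j for news n_k
     [0.0, 0.6],
     [0.4, 0.2]]

M = matmul(T, N)         # M[i][k]: relatedness of tweet t_i and news n_k
```

Here the second tweet mentions only the second topic, so it is related only to the second news article, which is exactly the co-citation effect the product captures.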

29 Content and Social relevance
- Content relatedness is based on the user's history of tweets.
- Let $A$ be the binary matrix with $A(u, i) = 1$ iff user $u$ tweeted tweet $t_i$.
- Content relatedness is defined by the matrix $\Gamma = AM$, where $\Gamma(u, k)$ is the relevance of news $n_k$ for user $u$ based on the entities mentioned in $u$'s tweets and in $n_k$.
- Social relatedness is a function of the tweets in the social network $S$ of the user:
  $$\Sigma = \left( \sum_{i=1}^{d} \sigma^{i} S^{i} \right) A M$$
- $\sigma$ is a damping factor, $d$ is the maximum distance in the social network.
- $\Sigma(u, k)$ is the relevance of news $n_k$ for users in the social circles of $u$.
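As a worked reading of these definitions, the content score can be written entry by entry and the social score expanded for a maximum distance of $d = 2$. The interpretation assumes that $S$ is the adjacency matrix of the social network, so that $S^{i}$ reaches users at distance $i$; the slides do not state this explicitly.

```latex
\begin{align*}
\Gamma(u, k) &= \sum_{i} A(u, i)\, M(i, k), \\
\Sigma &= \left( \sigma S + \sigma^{2} S^{2} \right) A M
        = \sigma\, S\, \Gamma + \sigma^{2} S^{2}\, \Gamma
        \qquad (d = 2).
\end{align*}
```

Under that assumption, the social score of a user is a damped sum of the content scores of direct contacts (weight $\sigma$) and of contacts of contacts (weight $\sigma^{2}$).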

30 Topic popularity over time
- Let $Z_\tau$ be a (row) vector where $Z_\tau(j)$ is the popularity of topic $z_j$ at time $\tau$:
  $$Z_\tau = Z_{\tau-1} + w_T H_T + w_N H_N$$
- Three components contribute to the popularity of a topic:
  - The popularity at the previous time-step $\tau - 1$ (exponential forgetting)
  - The estimated popularity in the stream of tweets (first-order derivative)
  - The estimated popularity in the stream of news (first-order derivative)
- Weights $w_T$ and $w_N$ are set equal in our experiments.
- News popularity is defined by $\Pi = Z_\tau N$, where $\Pi(k)$ is the popularity of news article $n_k$ based on the topics it discusses.
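A minimal sketch of one update step follows. The `decay` parameter is an assumption added here to make the exponential forgetting explicit (with `decay=1.0` the update matches the formula as transcribed); the weights and the toy vectors are likewise hypothetical.

```python
def update_popularity(z_prev, h_tweets, h_news, w_t=0.5, w_n=0.5, decay=1.0):
    """One time step of the topic popularity update, topic by topic."""
    return [decay * z + w_t * ht + w_n * hn
            for z, ht, hn in zip(z_prev, h_tweets, h_news)]

def news_popularity(z, N):
    """Pi(k) = sum_j Z(j) * N(j, k): popularity of each news article."""
    return [sum(z[j] * N[j][k] for j in range(len(z))) for k in range(len(N[0]))]

z_prev = [1.0, 0.0]          # hypothetical popularity of two topics at the previous step
h_tweets = [0.0, 2.0]        # hypothetical trend estimate from the tweet stream
h_news = [0.0, 1.0]          # hypothetical trend estimate from the news stream
N = [[1.0, 0.0],             # N[j][k]: relevance of topic z_j for news n_k
     [0.5, 1.0]]

z_now = update_popularity(z_prev, h_tweets, h_news, decay=0.5)
pi = news_popularity(z_now, N)
```

In this example the first topic fades because nothing new mentions it, while the second topic rises in both streams and lifts the popularity of the articles that discuss it.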

31 Learning to rank formulation
- We say that the relevance of a news article $n$ for a user $u$ is a linear combination of content relevance, social relevance and topic popularity:
  $$R_\tau(u, n) = \alpha\,\Gamma(u, n) + \beta\,\Sigma(u, n) + \gamma\,\Pi(n)$$
- This can be thought of as a ranking problem: given a set of news at time $\tau$, sort them according to their relevance $R_\tau(u, n)$ and propose the best to the user $u$.
- Learn a ranking/relevance function that promotes clicked news: find the best $\alpha, \beta, \gamma$ such that clicked news are ranked higher than non-clicked ones.
- This can be mapped into a Support Vector Machine formulation.
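One way to picture the pairwise formulation is the sketch below, which minimizes a regularized pairwise hinge loss with plain subgradient descent. It stands in for a ranking SVM solver and is not the authors' implementation; the learning rate, regularization strength and feature triples are invented for illustration.

```python
def learn_weights(pairs, epochs=100, lr=0.1, reg=0.01):
    """Pairwise hinge loss with L2 regularization, plain subgradient descent.

    Each pair is (clicked_features, skipped_features); every feature vector
    holds (content, social, popularity) for one news article.
    """
    w = [0.0, 0.0, 0.0]                      # (alpha, beta, gamma)
    for _ in range(epochs):
        for clicked, skipped in pairs:
            diff = [c - s for c, s in zip(clicked, skipped)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(3):
                # Subgradient: the hinge term is active only when margin < 1.
                grad = reg * w[i] - (diff[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w

def relevance(w, features):
    """R(u, n) as a linear combination of the three scores."""
    return sum(wi * fi for wi, fi in zip(w, features))

# Hypothetical training pairs: (clicked article, non-clicked article).
pairs = [((0.9, 0.2, 0.1), (0.1, 0.3, 0.2)),
         ((0.7, 0.6, 0.0), (0.2, 0.1, 0.3)),
         ((0.8, 0.1, 0.4), (0.3, 0.2, 0.5))]
w = learn_weights(pairs)
alpha, beta, gamma = w
```

Each constraint only asks that the clicked article outscore the non-clicked one, so the learned weights matter through the ordering they induce rather than through the absolute relevance values.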

32 Dataset used
- 1 million English tweets by 3,214 users (May 2011)
- 40,000 news articles from the Yahoo! News portal containing at least one topic mentioned in the tweet stream
- Yahoo! Toolbar data for clicks on news articles
- We also used toolbar data to link web users with Twitter users, assuming that a user's own Twitter account is the one they visit most (filtering out popular public persons, celebrities, etc.)
- The training and test sets are built under the assumption that a clicked news article must be the top ranked at the time of its publication.

33 T.Rex: Twitter-based News Recommendation System
Predicting clicked entities:
$$\mathrm{DCG}(N) = \sum_{j=1}^{N} \frac{G[j]}{\log(j + 1)}$$
where $G[j]$ is the entity intersection with the clicked news (rescaled in 0, ..., 4).
[Figure: average DCG vs. rank for T.Rex+, T.Rex, Popularity, Content, Social, Recency, Click count. A second figure shows the distribution of entities, panel (b) News.]
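The DCG formula can be evaluated directly, as in the sketch below. The base of the logarithm is not legible in the slide, so base 2 is assumed here (and exposed as a parameter); the gain values are made up.

```python
import math

def dcg(gains, n=None, log=math.log2):
    """DCG(N) = sum over ranks j = 1..N of G[j] / log(j + 1)."""
    top = gains if n is None else gains[:n]
    return sum(g / log(j + 1) for j, g in enumerate(top, start=1))

gains = [4, 0, 2, 1]          # hypothetical gains G[1..4], each in 0..4
full = dcg(gains)             # DCG over the whole list
top2 = dcg(gains, n=2)        # DCG(2): only the first two ranks
```

Because the discount grows with the rank, a high-gain article contributes most when it appears at the top of the recommendation list.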

34 What's Next
- The mining of Twitter data requires large-scale tools; everything was implemented with MapReduce.
- Streaming data mining tools should be adopted.
- More complex/interesting models instead of a linear combination.
- Correlation of Twitter streams with other streams, e.g. query logs.

36 (credits to David Crandall et al., Cornell University)

37 Research Challenges
The analysis of a large and noisy collection of social geo-tagged photos poses several challenges:
1. Clean and organize the collection in semantically coherent clusters
2. Associate relevant Points of Interest (PoIs) with these clusters
3. Devise routes of tourists through these PoIs and characterize as precisely as possible the behaviors of tourists
4. Extract and exploit such knowledge
[Figure: example photo tags: pontevecchio, trip, firenze, palazzo vecchio, canon, florence]
Our research question is: how to provide personalized PoI recommendations?

38 How much can we understand from photos?

39 Visual Clustering
Goal: to reduce the cost of computing similarity
- H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. Computer Vision - ECCV 2006.
- G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. ECCV, 2004.

40 Labeling with tags
Two key ideas:
- Using the spatial relevance of tags. Measure: ratio between the tag area and the overall geographical area analyzed
- Using the social relevance of tags. Measure: number of different users using a given tag
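The two measures might be computed along the lines of the sketch below. It assumes that "area" means the axis-aligned bounding box of the photo coordinates, treated as planar, which the slides do not specify; the photo records and field names are hypothetical.

```python
def bbox_area(points):
    """Area of the axis-aligned bounding box of (lat, lon) points."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (max(lats) - min(lats)) * (max(lons) - min(lons))

def tag_relevance(photos, tag):
    """Return (spatial ratio, number of distinct users) for one tag.

    Assumes at least one photo carries the tag and that the whole
    collection spans a non-zero area.
    """
    tagged = [p for p in photos if tag in p["tags"]]
    total_area = bbox_area([p["pos"] for p in photos])
    spatial = bbox_area([p["pos"] for p in tagged]) / total_area
    social = len({p["user"] for p in tagged})
    return spatial, social

photos = [
    {"user": "u1", "pos": (0.0, 0.0), "tags": {"italy", "duomo"}},
    {"user": "u2", "pos": (1.0, 1.0), "tags": {"italy", "duomo"}},
    {"user": "u2", "pos": (4.0, 2.0), "tags": {"italy"}},
    {"user": "u3", "pos": (10.0, 10.0), "tags": {"italy", "canon"}},
]
duomo = tag_relevance(photos, "duomo")
italy = tag_relevance(photos, "italy")
```

In this toy collection "duomo" covers a small fraction of the analyzed area and is used by two different users, whereas "italy" spans the whole area and therefore says little about any single place.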

41 Clustering and enriching Flickr photos
Example tags: Lucca, Toscana, Vacanze 2011, Tuscany, Italy, Summer 2011, Oval Square, Piazza dell'anfiteatro, Piazza del mercato, square, happy new year, windows, balcony, canon, nikon, ...

42 Enriching older photos

43 Mining Trajectories from Flickr
- Colosseum: 3 photos, 01/07/2013 9:00-12:00
- Ruins: 2 photos, 01/07/ :30-15:00
- Trevi Fountain: 2 photos, 01/07/ :42-16:00
Devise patterns of tourists' behavior...

44 Planning Sightseeing Tours with TripBuilder
What should I visit in San Francisco?
Given: Time: 2 days; my preferences.
[Figure: map of candidate PoIs: Golden Gate Bridge, Golden Gate Park, California Academy of Sciences, de Young Museum, San Francisco Museum of Modern Art, Aquarium of the Bay, Alcatraz; time labels 4 h, 4 h and 8 h]
- How do other tourists visit such places?
- How many of these trajectories can I enjoy?

45 The TripCover Problem
Given:
- A set of popular trajectories crossing a set of PoIs and their time cost
- The relevance of the trajectories w.r.t. the category set
- The Time Budget and Preferences of a user
- A measure of PoI-User interest
Find: the subset of trajectories that maximizes user interest and fits in the time budget.
TripCover is an instance of the Generalized Maximum Coverage (GMC) problem: NP-hard, with an (e/(e-1))-approximation algorithm.
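To make the problem statement concrete, here is a simple greedy heuristic for a budgeted coverage instance of this shape. It is a simplification for illustration, not the approximation algorithm referred to above and not TripBuilder's implementation; trajectory names, costs and interest values are invented.

```python
def greedy_trip_cover(trajectories, interest, budget):
    """Greedy heuristic: repeatedly add the affordable trajectory with the
    best ratio of newly covered interest to time cost (costs must be > 0)."""
    chosen, covered, spent = [], set(), 0.0

    def gain(name):
        pois, _cost = trajectories[name]
        return sum(interest[p] for p in pois - covered)   # only new PoIs count

    remaining = set(trajectories)
    while True:
        affordable = [n for n in remaining
                      if spent + trajectories[n][1] <= budget and gain(n) > 0]
        if not affordable:
            break
        best = max(affordable, key=lambda n: (gain(n) / trajectories[n][1], n))
        remaining.remove(best)
        pois, cost = trajectories[best]
        chosen.append(best)
        covered |= pois
        spent += cost
    return chosen, covered, spent

# Hypothetical instance: trajectory -> (set of PoIs, time cost in hours).
trajectories = {
    "A": ({"bridge", "park"}, 3.0),
    "B": ({"park", "museum"}, 2.0),
    "C": ({"alcatraz"}, 4.0),
}
interest = {"bridge": 0.9, "park": 0.5, "museum": 0.8, "alcatraz": 1.0}
chosen, covered, spent = greedy_trip_cover(trajectories, interest, budget=5.0)
```

A PoI already covered by an earlier trajectory adds no further interest, which is why trajectory A is valued only for the bridge once B has been selected.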

46 TrajSP: joining the trajectories
- A TripCover solution is a set of trajectories fitting user interest and time budget.
- Local search heuristics connect the solution into a single sightseeing tour.
[Figure: panels (a)-(f) illustrating the local search moves for l(i, k) = 4 and l(i, k) = 3 over the nodes i, k, n(i), n(k), e(i), e(k), n(e(i))]

48 The End!
"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."


More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Identifying SPAM with Predictive Models

Identifying SPAM with Predictive Models Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Fast Analytics on Big Data with H20

Fast Analytics on Big Data with H20 Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Spark and the Big Data Library

Spark and the Big Data Library Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Scalable Machine Learning - or what to do with all that Big Data infrastructure - or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

More information

Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng.

Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng. Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng. Paolo Nesi Dipartimento di Ingegneria dell Informazione, DINFO Università

More information

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify Big Data at Spotify Anders Arpteg, Ph D Analytics Machine Learning, Spotify Quickly about me Quickly about Spotify What is all the data used for? Quickly about Spark Hadoop MR vs Spark Need for (distributed)

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Optimization of Image Search from Photo Sharing Websites Using Personal Data

Optimization of Image Search from Photo Sharing Websites Using Personal Data Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data Case Study 2: Document Retrieval Parallel Programming Map-Reduce Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 31 st, 2013 Carlos Guestrin

More information

A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics

A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics contents A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics Abstract... 2 Need of Social Content Analytics... 3 Social Media Content Analytics... 4 Inferences

More information

PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA

PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA Harnessing the combined power of SAP HANA and PARC s HiperGraph graph analytics technology for real-time insights

More information

NetView 360 Product Description

NetView 360 Product Description NetView 360 Product Description Heterogeneous network (HetNet) planning is a specialized process that should not be thought of as adaptation of the traditional macro cell planning process. The new approach

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Teaching Scheme Credits Assigned Course Code Course Hrs./Week. BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics. Theory Marks

Teaching Scheme Credits Assigned Course Code Course Hrs./Week. BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics. Theory Marks Teaching Scheme Credits Assigned Course Code Course Hrs./Week Name Theory Practical Tutorial Theory Practical/Oral Tutorial Tota l BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics Examination Scheme

More information

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2

More information

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Big Data Mining Services and Knowledge Discovery Applications on Clouds Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

The Italian Hate Map:

The Italian Hate Map: I-CiTies 2015 2015 CINI Annual Workshop on ICT for Smart Cities and Communities Palermo (Italy) - October 29-30, 2015 The Italian Hate Map: semantic content analytics for social good (Università degli

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business

BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business Instructor: Kunpeng Zhang (kzhang@rmsmith.umd.edu) Lecture-Discussions:

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

IEEE JAVA Project 2012

IEEE JAVA Project 2012 IEEE JAVA Project 2012 Powered by Cloud Computing Cloud Computing Security from Single to Multi-Clouds. Reliable Re-encryption in Unreliable Clouds. Cloud Data Production for Masses. Costing of Cloud Computing

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Content-Based Image Retrieval

Content-Based Image Retrieval Content-Based Image Retrieval Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Image retrieval Searching a large database for images that match a query: What kind

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Predicting stocks returns correlations based on unstructured data sources

Predicting stocks returns correlations based on unstructured data sources Predicting stocks returns correlations based on unstructured data sources Mateusz Radzimski, José Luis Sánchez-Cervantes, José Luis López Cuadrado, Ángel García-Crespo Departamento de Informática Universidad

More information

Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

More information

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

More information

Data + Science Towards Organizational Excellence. eric Choo

Data + Science Towards Organizational Excellence. eric Choo Data + Science Towards Organizational Excellence eric Choo Business Intelligence & Big Data Analytics SoftSource Big Data Solutions Business Intelligence & Data EVA Services D a t a E x p l o r a t i o

More information

Big Data Integration: A Buyer's Guide

Big Data Integration: A Buyer's Guide SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information