Challenges and Opportunities in Data Mining: Personalization




Challenges and Opportunities in Data Mining: Big Data, Predictive User Modeling, and Personalization
Bamshad Mobasher, School of Computing, DePaul University
April 20, 2012

Google Trends: Data Mining vs. Analytics

The Big Question?

Will data mining remain relevant? If so, how?

Quick survey: Do you think the amount of data available in the digital world will decrease in the future? Will it become less complex?

"Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" -- T.S. Eliot, The Rock

How much data?
- Google: ~20-30 PB a day
- Wayback Machine: ~4 PB + 100-200 TB/month
- Facebook: ~3 PB of user data + 25 TB/day
- eBay: ~7 PB of user data + 50 TB/day
- CERN's Large Hadron Collider generates 15 PB a year
- In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB

"640K ought to be enough for anybody."

The Data Tsunami. McKinsey Global Institute Report: "Big Data: The next frontier for innovation, competition, and productivity"

Big Data Value. McKinsey Global Institute Report: "Big Data: The next frontier for innovation, competition, and productivity"


What's Seen the Most Growth in 2008-2011

Types of data:
- Location / geo / mobile data
- Music / audio
- Social media / social networks
- Time series
- Images / video
- User profile data
- Text feeds / micro-blog data

Types of activities/areas:
- Search / Web content mining
- Text mining / opinion analysis
- Personalization / recommendation
- Social network / social media analysis
- Topic modeling / micro-blog analysis
- Health informatics

Much of this growth is driven by end-user mobile or Web-based applications. Users are inundated with a huge volume of complex information, which creates a need for more personalized, intelligent applications.

Personalization

The problem: dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests.

Why do we need it?
- Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, ...)
- For businesses: need to grow customer loyalty and increase sales
- Industry research: successful online retailers are generating as much as 35% of their business from recommendations

Data Mining and Personalization

Killer app for data mining? Tangible successes in both research and industrial applications:
- recommender systems
- personalized Web agents
- user-adaptive systems
- Web marketing and eCRM
- personalized search

Sophisticated modeling approaches based on both predictive and unsupervised DM techniques.

Personalization: Common Approaches

- Collaborative filtering: give recommendations to a user based on the preferences of similar users
- Content-based filtering: give recommendations to a user based on items whose content is similar to the items in the user's profile
- Rule-based (knowledge-based) filtering: provide recommendations to users based on predefined (or learned) rules, e.g.
  age(x, 25-35) and income(x, 70-100K) and children(x, >=3) => recommend(x, Minivan)
- Combined or hybrid approaches
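The minivan rule above can be expressed as a predicate over user attributes. The following is only a minimal sketch: the `User` class, the dollar unit for income, and the `RULES` table are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income: int    # annual income in dollars (assumed unit)
    children: int


def minivan_rule(user: User) -> bool:
    """age(x, 25-35) and income(x, 70-100K) and children(x, >=3)."""
    return (
        25 <= user.age <= 35
        and 70_000 <= user.income <= 100_000
        and user.children >= 3
    )


# Each rule pairs a predicate with the item it recommends (hypothetical table).
RULES = [(minivan_rule, "Minivan")]


def recommend(user: User) -> list:
    """Return the items of every rule whose predicate matches the user."""
    return [item for predicate, item in RULES if predicate(user)]
```

Keeping rules in a table separates the rule base from the matching loop, so predefined and learned rules could be added without changing `recommend`.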

The Recommendation Task

Basic formulation as a prediction problem: given a profile P_u for a user u and a target item i_t, predict the preference score of user u on item i_t.

Typically, the profile P_u contains preference scores by u on some other items {i_1, ..., i_k} different from i_t. The preference scores on i_1, ..., i_k may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article).

The Recommendation Task

- Content-based recommendation: predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile
- Collaborative recommendation: predictions for unseen (target) items are computed based on the other users with similar interest scores on items in user u's profile, i.e., users with similar tastes (aka "nearest neighbors")
  - requires computing correlations between user u and other users according to interest scores or ratings
  - k-nearest-neighbor (kNN) strategy
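A rough sketch of the collaborative (kNN) strategy follows, assuming ratings are stored as nested dictionaries (user -> item -> score). The function names, the default k, and the choice of a plain similarity-weighted average over positively correlated neighbors are assumptions made for illustration; the slides do not specify the exact prediction formula.

```python
from math import sqrt
from typing import Optional


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over items rated by both users; 0.0 if undefined."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: dict, user: str, target: str, k: int = 2) -> Optional[float]:
    """Predict user's score on target from the k most similar users who rated it."""
    profile = ratings[user]
    neighbors = []
    for other, other_profile in ratings.items():
        if other == user or target not in other_profile:
            continue
        sim = pearson(profile, other_profile)
        if sim > 0:  # keep only positively correlated neighbors
            neighbors.append((sim, other_profile[target]))
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    if not top:
        return None  # no usable neighbor rated the target item
    return sum(sim * score for sim, score in top) / sum(sim for sim, _ in top)
```

For example, with `ratings = {"alice": {"A": 5, "B": 3, "C": 4}, "bob": {"A": 5, "B": 3, "C": 4, "D": 4}}`, calling `predict(ratings, "alice", "D")` uses bob as the only neighbor.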

Content-Based Recommender Systems

Content-Based Recommenders: Personalized Search

How can the search engine determine the user's intent?

Query: "Madonna and Child"

Need to learn the user profile:
- Is the user an art historian?
- Is the user a pop music fan?
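One way a learned profile could disambiguate such a query is to re-rank results by their content similarity to the profile. The sketch below uses cosine similarity over term-weight vectors. All result ids, terms, and weights are made up for illustration, and cosine similarity is an assumed choice of measure, not one named in the slides.

```python
from math import sqrt


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(profile: dict, results: dict) -> list:
    """Order result ids by similarity of their content to the user profile."""
    return sorted(results, key=lambda r: cosine(profile, results[r]), reverse=True)


# Hypothetical results for the query "Madonna and Child".
results = {
    "renaissance_painting": {"art": 1.0, "painting": 1.0, "renaissance": 1.0},
    "pop_album": {"music": 1.0, "pop": 1.0, "album": 1.0},
}

# Hypothetical learned profiles.
art_historian = {"art": 0.9, "renaissance": 0.7, "museum": 0.4}
pop_fan = {"music": 0.8, "pop": 0.9, "concert": 0.5}
```

Here `rerank(art_historian, results)` places the painting first, while `rerank(pop_fan, results)` places the album first.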

Content-Based Recommenders:: more examples
- Music recommendations
- Playlist generation
- Example: Pandora

Collaborative Recommender Systems


Personalization Based on User Behavior Data: Data Mining Approach

Data preparation / modeling phase (typically an offline process):
- Inputs: implicit or explicit user preference data (clickthroughs, ratings, purchases, reviews); content and structure data; domain knowledge
- Data preprocessing: data cleaning, data integration, data transformation, event model generation, sessionization
- Result: user transaction / preference database

Pattern discovery phase:
- Data mining: user segmentation, item clustering / similarity, user/item classification, correlation analysis, association rule mining, sequential pattern mining
- Pattern analysis: pattern filtering, aggregation, characterization
- Result: patterns, aggregate user models
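As an illustration of the sessionization step in data preprocessing, the sketch below splits each user's click events into sessions using an inactivity timeout. The 30-minute threshold, the `(user, timestamp, item)` event layout, and the function name are assumptions; the slides do not say how sessions are delimited.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed inactivity threshold


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, item) events into per-user sessions.

    A new session starts whenever the gap since the user's previous
    event exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, item in events:
        by_user[user].append((ts, item))

    sessions = []
    for user, visits in by_user.items():
        visits.sort()  # order each user's events by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, item) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(item)
        sessions.append((user, current))
    return sessions
```

Each resulting `(user, [items...])` pair is one candidate row for a user transaction database of the kind the pattern discovery phase consumes.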

Personalization Based on User Behavior Data: Data Mining Approach

Online process (diagram components):
- Client application and Web server
- Active session
- Stored user profile and integrated user profile
- Aggregate user models
- Records of the form <user, item1, item2, ...>
- Domain knowledge
- Recommendation engine
- Output: recommendations, predictions

New Challenges

Context-awareness
- Can systems understand the user's context, situation, and current intentions?
- Need to understand the task being performed; the user's environment; domain knowledge/characteristics; short-term and long-term preferences

Integrating domain knowledge
- Most current modeling approaches focus on the discovery of shallow patterns
- DM + domain knowledge (DM + AI): intelligent apps that can reason about / explain patterns

New Challenges

Security / trust / reputation
- Many user-adaptive systems are vulnerable to malicious manipulation (e.g., "shilling")
- Need more robust algorithms and ways to detect malicious profiles
- In social systems the notion of reputation becomes critical

Serendipity
- The most predictive models are not necessarily the best
- Need the ability to surprise or provide novelty

Big data challenges
- Questions of scale require new frameworks and algorithms
- Wide variation in user behaviors requires more sophisticated models (e.g., matrix factorization, hybrid / ensemble models)

Challenges:: Problems of Scale

New Opportunities:: Social Annotation Systems

Amazon Example: Tags describe the resource

Tags can describe:
- The resource (genre, actors, etc.)
- Organizational (toread)
- Subjective (awesome)
- Ownership (abc)
- etc.

Tag Recommendation

Example: Tags describe the user

These systems are collaborative. Recommendation / analytics based on the "wisdom of crowds."

(Screenshot: Rai Aren's profile, co-author of "Secret of the Sands")

New Opportunities:: Social Recommendation
- A form of collaborative filtering using social network data
- Users' profiles are represented as sets of links to other nodes (users or items) in the network
- Prediction problem: infer a currently non-existent link in the network
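The link-inference problem can be illustrated with a simple common-neighbors score: a node is a candidate link for a user if the two already share neighbors. This is just one well-known baseline, chosen here as an assumption. The slides do not name a specific link-prediction method, and the graph layout (a dict of neighbor sets) and function name are hypothetical.

```python
def common_neighbor_scores(graph: dict, user: str) -> list:
    """Rank nodes not yet linked to `user` by the number of shared neighbors.

    `graph` maps each node to the set of nodes it is linked to.
    """
    linked = graph[user]
    scores = {}
    for candidate, neighbors in graph.items():
        if candidate == user or candidate in linked:
            continue  # skip the user and links that already exist
        shared = len(linked & neighbors)
        if shared:
            scores[candidate] = shared
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because nodes may be users or items, the same scoring could suggest either new contacts or new items, depending on what the candidate node represents.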

Conclusions

Personalization and recommendation technologies
- The killer app for predictive data analytics
- Will drive the next generation of Web applications

Lots of new (and old) challenges
- New: social media and social networks provide new challenges and opportunities; big data challenges the scalability and effectiveness of old algorithms
- Old: scalability, sparsity, scrutability, serendipity

Promising new work
- New approaches to hybridization
- Social media analytics
- Context-aware recommendation / personalization