Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

Similar documents

Big Data Analytics in Mobile Environments

Data Analysis on Location-Based Social Networks

Content-Based Recommendation

GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory

The Need for Training in Big Data: Experiences and Case Studies

SPATIAL DATA CLASSIFICATION AND DATA MINING

MOBILITY DATA MODELING AND REPRESENTATION

10/14/11. Big data in science Application to large scale physical systems

Applying Data Science to Sales Pipelines for Fun and Profit

Introduction to Data Mining

Integration of GPS Traces with Road Map

How To Solve The Kd Cup 2010 Challenge

BUSINESS ANALYTICS. Overview. Lecture 0. Information Systems and Machine Learning Lab. University of Hildesheim. Germany

Machine Learning using MapReduce

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

Information Management course

Location-Based Social Networks: Users

ICT Perspectives on Big Data: Well Sorted Materials

An Overview of Knowledge Discovery Database and Data mining Techniques

Advanced Methods for Pedestrian and Bicyclist Sensing

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

IPTV Recommender Systems. Paolo Cremonesi

Collaborative Filtering. Radek Pelánek

Sensors talk and humans sense Part II

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

DATA MINING TECHNIQUES AND APPLICATIONS

CABM System - The Best of CAB Software Solutions

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Experiments in Web Page Classification for Semantic Web

Estimating Twitter User Location Using Social Interactions A Content Based Approach

Proposed Advance Taxi Recommender System Based On a Spatiotemporal Factor Analysis Model

GPS Enabled Taxi Probe s Big Data Processing for Traffic Evaluation of Bangkok Using Apache Hadoop Distributed System

Bayesian networks - Time-series models - Apache Spark & Scala

PhoCA: An extensible service-oriented tool for Photo Clustering Analysis

ANALYTICS IN BIG DATA ERA

life science data mining

Big Data Text Mining and Visualization. Anton Heijs

Sanjeev Kumar. contribute

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Data Mining Algorithms Part 1. Dejan Sarka

Big Data Analytics for United Security

Clustering Data Streams

Data Mining for Web Personalization

Business Intelligence, Predictive Analytics, and Geographic Information Systems

Marketing Mix Modelling and Big Data P. M Cain

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

The Scientific Data Mining Process

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

The primary goal of this thesis was to understand how the spatial dependence of

Load Balancing in Cellular Networks with User-in-the-loop: A Spatial Traffic Shaping Approach

Crowdsourcing mobile networks from experiment

Healthcare Measurement Analysis Using Data mining Techniques

A Time Efficient Algorithm for Web Log Analysis

Cloud Analytics for Capacity Planning and Instant VM Provisioning

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Unsupervised Data Mining (Clustering)

Big Data and Analytics: Challenges and Opportunities

Object Recognition. Selim Aksoy. Bilkent University

Data Analytics at NICTA. Stephen Hardy National ICT Australia (NICTA)

Deep Insights Smart Decisions Motionlogic

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

Inferential Statistics. Data Mining. ASC September Proving value in complex analytics. 2 Rivers. Information and Data Management

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Star rating driver traffic and safety behaviour through OBD and smartphone data collection

Overview. Introduction. Recommender Systems & Slope One Recommender. Distributed Slope One on Mahout and Hadoop. Experimental Setup and Analyses

Oracle Real Time Decisions

CHAPTER VII CONCLUSIONS

Solera Networks, A Blue Coat Company SOLERA NETWORKS BIG DATA SECURITY ANALYTICS

House Committee on the Judiciary. Subcommittee on the Constitution, Civil Rights, and Civil Liberties

Invited Applications Paper

Data Mining Solutions for the Business Environment

A Statistical Text Mining Method for Patent Analysis

DATA WAREHOUSING AND ITS POTENTIAL USING IN WEATHER FORECAST

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Synergistic Sensor Location for Cost-Effective Traffic Monitoring

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

USING INTELLIGENT TRANSPORTATION SYSTEMS DATA ARCHIVES FOR TRAFFIC SIMULATION APPLICATIONS

Asian Institute of Technology, Pathum Thani, Thailand

Optimization of patient transport dispatching in hospitals

Aggregation of Spatio-temporal and Event Log Databases for Stochastic Characterization of Process Activities

Introduction. A. Bellaachia Page: 1

MCOM VEHICLE TRACKING SYSTEM MANUAL

Interactive information visualization in a conference location

Discovery of User Groups within Mobile Data

2015 Workshops for Professors

Data Mining Analytics for Business Intelligence and Decision Support

So, how do you pronounce. Jilles Vreeken. Okay, now we can talk. So, what kind of data? binary. * multi-relational

Asynchronous Data Mining Tools at the GES-DISC

Data2Diamonds Turning Information into a Competitive Asset

TEXT ANALYTICS INTEGRATION

A Game Theoretical Framework for Adversarial Learning

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

The Attendance Management System Based on Micro-cellular and GIS technology

Text Analytics. A business guide

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

Recommendation Tool Using Collaborative Filtering

Transcription:

1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey

Big Data

3 Big Data Application Requirements Timely observation Timely analysis Timely solution

Big Data Talents 4 Technical Knowledge Domain Knowledge Big Data Applications

Data Driven Solutions 5 Theoretical top-down solutions Data driven bottomup solutions

Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation

10 Ways Mobile Tech Is Changing Our World 7 1. Elections Will Never Be The Same 1. Doing Good By Texting 2. Bye-Bye, Wallets 3. The Phone Knows All 4. Your Life Is Fully Mobile 5. The Grid Is Winning 6. A Camera Goes Anywhere 7. Toys Get Unplugged 8. Gadgets Go To Class 9. Disease Can't Hide

Human Mobility Human mobility is people s movement trajectories which can be Phone traces or trajectories of driving routes a sequences of posts (like geo-tweets, geo-tagged photos, or check-ins) Indoor Traces and Outdoor Traces.

Urban Geography Urban geography is a set of geographic characteristics of a city including road networks, public transportation places of interest (POIs), regional functions

Public transportation data

Point of Interests (POI)

Outdoor Location Traces Taxi GPS trajectories

Big Data Application Trends 13 Micro-scope Macro-scope Personalized Recommendation Urban Computing Micro-Scope Classic Macro-Scope Applications/Techniques

Big Data Experiences 14 Understanding Data Characteristics Data Distribution, Data Quality etc. Feature Engineering Feature engineering is one of the key strategy for the success of big data analytics. The goal is to explicitly reveal important information to the model by feature selection or feature generation Original features different encoding of the features combined features Instance Selection (particularly mobile environment) The goal is to select the right instances/objects for the underlying data analytics

Mobile Recommender Systems 15 Background Revolution in Mobile Devices GPS WiFi Mobile phone The Urgent Demand for Better Service Driving route suggestion Mobile tourist guides Definition Mobile pervasive recommendation is promised to provide mobile users access to personalized recommendations anytime, anywhere.

Recommender Systems 16 Definition Recommend the right content and products (items) to the right user at the right time so as to archive the most relevant experience. Dependence Most Popular What s most engaging overall? Personalized Recommendation What s most relevant to me based on my interests and attributes? Related items Behavioral Affinity: People who did X, did Y Similarity: Based on metadata

Traditional Recommender Systems 17

Traditional Recommender System 18 Problem Formalization With Rating Without Rating Item-1 Item-2 Item-3 Item-4 Item-5 User-1 4? 2 5? User-2 3 2 1?? User-3? 2? 3? User-4? 3 3 5 4 Item-1 Item-2 Item-3 Item-4 Item-5 User-1 1? 1 1? User-2 1 1 1?? User-3? 1? 1? User-4? 1 1 1 1

Classification of Algorithms 19 Content-based Method Thinking In C++ C++ Hello world Java Hello world Search papers Sequence pattern Social mining network Data mining Text mining Web mining Auto abstract Machine learning Text classify NLP Windows Linux Unix Write Good papers NLU

Classification of Algorithms 20 Collaborative Filtering Method User-based CF new

Classification of Algorithms 21 Collaborative Filtering Method Item-based CF new

Mobile Recommender Systems 22 Route Recommendation Travel Recommendation

Challenges for Mobile 23 Recommendation (I) Complexity of the Mobile Data Heterogeneous Spatial and temporal auto-correlation Noisy The Validation Problem No Ratings The Generality Problem Different application domains with different recommendation techniques

25 Challenges for Mobile Recommendation (II) The Cost Constraints Time Price The Life Cycle Problem The Transplantation Problem Difficult to apply traditional Recommendation techniques for mobile recommendation

26 The Characteristics of Mobile Data Two Cases Case 1. Location trace by taxi drivers Case2. The tourism data Why? A good coverage of unique characteristics of mobile data Can be naturally exploited for developing mobile recommender systems They are the real-world data

27 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers Data Description GPS traces Location information (Longitude, Latitude), timestamp The operation status (with or without passengers) Experienced drivers can usually have more driving hours and high occupancy rates Inexperienced drivers tend to have less driving hours and low occupancy rates

The Characteristics of Mobile Data 28 Case 1. Location trace by taxi drivers Driving pattern comparison The experienced drivers have a wider operation area. The experienced drivers know the roads as well the traffic patterns better.

29 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers (KDD-2009) Develop a mobile recommender system Users ~ Taxi drivers Items ~ Potential pick-up points What did we learn? The difference between Mobile RS and traditional RS The items are application-dependent There is some cost to extract items The items are not i.i.d while spatial auto-correlation

The Characteristics of Mobile Data 30 Case2. The travel data (ICDM-2011) Data Description Expense records Tourists: ID, travel time Package: ID, name, landscapes, price, travel days Duration: 2000 2010 Recommender System Users ~ Tourists Items ~ Packages

31 The Characteristics of Mobile Data Case2. The travel data Characteristics of Tourism Data (I) Spatial auto correlation of packages For example, the 1-day Niagara Falls Tour The Sparseness Much sparser than the Netflix data set.

The Characteristics of Mobile Data 32 Case2. The travel data Characteristics of Tourism Data(II) The time dependence Packages and tourists have seasonal tendency Packages have a life cycle Short-period packages are more popular

Problem Formulation 33 : all probabilities of all pick-up points contained in PTD: Potential travel distance

An Example 34 PoCab -> C1 -> C4 or PoCab -> C4 ->C3 -> C2?

Two Challenges 35 Mining reliable pick-up point with probability information Computation challenge to search the optimal route

MSR Problem with Constraints 36 Computational complexity:

37 Recommending Point Generation High-performance Drivers Sufficient Driving Hours High Occupancy Rate Clustering based on Driving Distance Clustering close pick-up points into one pick-up cluster Probability Calculation Ratio of Pick-up Events Happening when cab travels across pick-up clusters

38 Potential Travel Distance Function: Two Vectors: PoCab: the position of a cab C1: one pick-up point P(C1): the probability of pick-up event D1: the driving distance Distance Vector: Probability Vector: PTD Function:

Potential Travel Distance Function: 39 PoCab: the position of a cab C1: one pick-up point P(C1): the probability of pick-up event D1: the driving distance One Vector: Generally, for, PTD Function:

Important Property of PTD Function 40

41 Route Dominance By this definition, if a candidate route A is dominated by a candidate route B, A cannot be an optimal route

42 Constrained Sub-route Dominance Illustration: the Sub-route Dominance

43 If and dominates Then dominates

44 Enumerate all sub-routes with length of L Prune dominated constrained sub-routes with length of L Once effort to prune search space offline

Skyline Route 45 First find the skyline routes Search the optimal driving route from the set of skyline routes

Pruning for SkyRoute 46 Traditional Skyline computing algorithms time-consuming and large memory SkyRoute Prune some candidates including dominated subroutes, at a very early stage The search space will be significant reduced, since lots of candidates containing dominated sub-routes are discarded Gradual effort to continue prune the search space

47 First use LCP to prune some candidates Search Skyline Routes from the rest of candidates Gradually prune candidates containing dominated sub-routes Search the optimal route from the set of Skyline Routes

48 Travel Package Recommendation Observable cost of travel packages Unobserved user s cost preference Different financial and time affordability of different users Challenges How to incorporate explicit package cost information into traditional collaborative filtering models How to model and learn user s cost preference

49 Probabilistic Matrix Factorization (PMF) The log of the posterior distribution over user and item features is given by

50 Probabilistic Matrix Factorization (PMF) Models cpmf represents user cost with a vector GcPMF represents user cost with a 2-D Gaussian Distribution

Theoretical Abstraction 51 Given: a set of objects O={O 1, O 2,, O n } Find: An ordered subset S={S 1, S 2,, S k } O The order of S 1, S 2,, S k is optimized subject to certain constraints. For taxi driver recommendation, the set O is a set of potential pick-up points For travel package recommendation, the set O is a set of landscapes.

Data Driven Solutions 52 Theoretical top-down solutions Data driven bottomup solutions

53 Point-of-Interest (POI) Recommendations in Mobile Social Environments (KDD-2013)

POI Recommendation 54 Task: Recommending POIs based one users check-in history and other available information Existing Method: collaborative filtering Failed to consider the multiple factors for choosing a POI Lack of integrated analysis of the joint effect of the various factors Challenges Various factors can influence POI check-in: user preferences, geographical influences, and user mobility

What s Special about POI Recommendation? 55 User mobility Geographical Influence probability of a user visiting a POI is inversely proportional to the geographic distance Law of geography everything is related to everything else, but nearby things are more related than distant thing Implicit user feedback need to infer user preferences from implicit user feedback in terms of user check-in count data

Check-in Patterns 56 All POIs in different regions A user's check-ins in different regions User check-ins in San Francisco Need a model that jointly encodes the personalized preferences, spatial influence, user mobility into the user check-in decision process to learn geographical user preferences for effective POI recommendation

General Ideas 57 Users are most likely to check in a number of POIs and these POIs are usually limited to some geographical regions By Tobler's first law of geography, POIs with similar services are likely to be clustered into the same geographical area A user's propensity for a POI Personalized interest Satisfaction Distance Need region partition! Best personalization, maximum satisfaction, at lowest distance cost

Geographical Probabilistic Factor Model 58 red plate represents users, blue plate represents POIs, and purple plate represents latent regions

59 Model Components Personal Interest preferences: latent factor model User mobility model A user samples a region from all R regions following a multinomial distribution A POI is assigned a normal distribution Geographical influence Distance from region center to the POI the probability of a user choose a POI decays as the power-law of the distance between them

Poisson Geo-PFM 60 Why Poisson Poisson distribution is more suitable and effective for modeling the skewed user check-in count Rigorous probabilistic generative process for the model check-in counts dist. and Poisson approximation

Experimental Data 61 Foursquare, Gowalla, and Brightkite

Summary 62 A general framework to learn geographical preferences for POI recommendation Learned the geographical influence on a user's check-in behavior Effectively modeled the user mobility patterns Extended the latent factor in explicit rating recommendation to implicit feedback recommendation settings Proposed model is flexible and could be extended to incorporate different later factor models

Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation

Thank You! 64 My WEB site: http://datamining.rutgers.edu