1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey
Big Data
3 Big Data Application Requirements Timely observation Timely analysis Timely solution
Big Data Talents 4 Technical Knowledge Domain Knowledge Big Data Applications
Data Driven Solutions 5 Theoretical top-down solutions Data driven bottomup solutions
Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation
10 Ways Mobile Tech Is Changing Our World 7 1. Elections Will Never Be The Same 1. Doing Good By Texting 2. Bye-Bye, Wallets 3. The Phone Knows All 4. Your Life Is Fully Mobile 5. The Grid Is Winning 6. A Camera Goes Anywhere 7. Toys Get Unplugged 8. Gadgets Go To Class 9. Disease Can't Hide
Human Mobility Human mobility is people s movement trajectories which can be Phone traces or trajectories of driving routes a sequences of posts (like geo-tweets, geo-tagged photos, or check-ins) Indoor Traces and Outdoor Traces.
Urban Geography Urban geography is a set of geographic characteristics of a city including road networks, public transportation places of interest (POIs), regional functions
Public transportation data
Point of Interests (POI)
Outdoor Location Traces Taxi GPS trajectories
Big Data Application Trends 13 Micro-scope Macro-scope Personalized Recommendation Urban Computing Micro-Scope Classic Macro-Scope Applications/Techniques
Big Data Experiences 14 Understanding Data Characteristics Data Distribution, Data Quality etc. Feature Engineering Feature engineering is one of the key strategy for the success of big data analytics. The goal is to explicitly reveal important information to the model by feature selection or feature generation Original features different encoding of the features combined features Instance Selection (particularly mobile environment) The goal is to select the right instances/objects for the underlying data analytics
Mobile Recommender Systems 15 Background Revolution in Mobile Devices GPS WiFi Mobile phone The Urgent Demand for Better Service Driving route suggestion Mobile tourist guides Definition Mobile pervasive recommendation is promised to provide mobile users access to personalized recommendations anytime, anywhere.
Recommender Systems 16 Definition Recommend the right content and products (items) to the right user at the right time so as to archive the most relevant experience. Dependence Most Popular What s most engaging overall? Personalized Recommendation What s most relevant to me based on my interests and attributes? Related items Behavioral Affinity: People who did X, did Y Similarity: Based on metadata
Traditional Recommender Systems 17
Traditional Recommender System 18 Problem Formalization With Rating Without Rating Item-1 Item-2 Item-3 Item-4 Item-5 User-1 4? 2 5? User-2 3 2 1?? User-3? 2? 3? User-4? 3 3 5 4 Item-1 Item-2 Item-3 Item-4 Item-5 User-1 1? 1 1? User-2 1 1 1?? User-3? 1? 1? User-4? 1 1 1 1
Classification of Algorithms 19 Content-based Method Thinking In C++ C++ Hello world Java Hello world Search papers Sequence pattern Social mining network Data mining Text mining Web mining Auto abstract Machine learning Text classify NLP Windows Linux Unix Write Good papers NLU
Classification of Algorithms 20 Collaborative Filtering Method User-based CF new
Classification of Algorithms 21 Collaborative Filtering Method Item-based CF new
Mobile Recommender Systems 22 Route Recommendation Travel Recommendation
Challenges for Mobile 23 Recommendation (I) Complexity of the Mobile Data Heterogeneous Spatial and temporal auto-correlation Noisy The Validation Problem No Ratings The Generality Problem Different application domains with different recommendation techniques
25 Challenges for Mobile Recommendation (II) The Cost Constraints Time Price The Life Cycle Problem The Transplantation Problem Difficult to apply traditional Recommendation techniques for mobile recommendation
26 The Characteristics of Mobile Data Two Cases Case 1. Location trace by taxi drivers Case2. The tourism data Why? A good coverage of unique characteristics of mobile data Can be naturally exploited for developing mobile recommender systems They are the real-world data
27 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers Data Description GPS traces Location information (Longitude, Latitude), timestamp The operation status (with or without passengers) Experienced drivers can usually have more driving hours and high occupancy rates Inexperienced drivers tend to have less driving hours and low occupancy rates
The Characteristics of Mobile Data 28 Case 1. Location trace by taxi drivers Driving pattern comparison The experienced drivers have a wider operation area. The experienced drivers know the roads as well the traffic patterns better.
29 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers (KDD-2009) Develop a mobile recommender system Users ~ Taxi drivers Items ~ Potential pick-up points What did we learn? The difference between Mobile RS and traditional RS The items are application-dependent There is some cost to extract items The items are not i.i.d while spatial auto-correlation
The Characteristics of Mobile Data 30 Case2. The travel data (ICDM-2011) Data Description Expense records Tourists: ID, travel time Package: ID, name, landscapes, price, travel days Duration: 2000 2010 Recommender System Users ~ Tourists Items ~ Packages
31 The Characteristics of Mobile Data Case2. The travel data Characteristics of Tourism Data (I) Spatial auto correlation of packages For example, the 1-day Niagara Falls Tour The Sparseness Much sparser than the Netflix data set.
The Characteristics of Mobile Data 32 Case2. The travel data Characteristics of Tourism Data(II) The time dependence Packages and tourists have seasonal tendency Packages have a life cycle Short-period packages are more popular
Problem Formulation 33 : all probabilities of all pick-up points contained in PTD: Potential travel distance
An Example 34 PoCab -> C1 -> C4 or PoCab -> C4 ->C3 -> C2?
Two Challenges 35 Mining reliable pick-up point with probability information Computation challenge to search the optimal route
MSR Problem with Constraints 36 Computational complexity:
37 Recommending Point Generation High-performance Drivers Sufficient Driving Hours High Occupancy Rate Clustering based on Driving Distance Clustering close pick-up points into one pick-up cluster Probability Calculation Ratio of Pick-up Events Happening when cab travels across pick-up clusters
38 Potential Travel Distance Function: Two Vectors: PoCab: the position of a cab C1: one pick-up point P(C1): the probability of pick-up event D1: the driving distance Distance Vector: Probability Vector: PTD Function:
Potential Travel Distance Function: 39 PoCab: the position of a cab C1: one pick-up point P(C1): the probability of pick-up event D1: the driving distance One Vector: Generally, for, PTD Function:
Important Property of PTD Function 40
41 Route Dominance By this definition, if a candidate route A is dominated by a candidate route B, A cannot be an optimal route
42 Constrained Sub-route Dominance Illustration: the Sub-route Dominance
43 If and dominates Then dominates
44 Enumerate all sub-routes with length of L Prune dominated constrained sub-routes with length of L Once effort to prune search space offline
Skyline Route 45 First find the skyline routes Search the optimal driving route from the set of skyline routes
Pruning for SkyRoute 46 Traditional Skyline computing algorithms time-consuming and large memory SkyRoute Prune some candidates including dominated subroutes, at a very early stage The search space will be significant reduced, since lots of candidates containing dominated sub-routes are discarded Gradual effort to continue prune the search space
47 First use LCP to prune some candidates Search Skyline Routes from the rest of candidates Gradually prune candidates containing dominated sub-routes Search the optimal route from the set of Skyline Routes
48 Travel Package Recommendation Observable cost of travel packages Unobserved user s cost preference Different financial and time affordability of different users Challenges How to incorporate explicit package cost information into traditional collaborative filtering models How to model and learn user s cost preference
49 Probabilistic Matrix Factorization (PMF) The log of the posterior distribution over user and item features is given by
50 Probabilistic Matrix Factorization (PMF) Models cpmf represents user cost with a vector GcPMF represents user cost with a 2-D Gaussian Distribution
Theoretical Abstraction 51 Given: a set of objects O={O 1, O 2,, O n } Find: An ordered subset S={S 1, S 2,, S k } O The order of S 1, S 2,, S k is optimized subject to certain constraints. For taxi driver recommendation, the set O is a set of potential pick-up points For travel package recommendation, the set O is a set of landscapes.
Data Driven Solutions 52 Theoretical top-down solutions Data driven bottomup solutions
53 Point-of-Interest (POI) Recommendations in Mobile Social Environments (KDD-2013)
POI Recommendation 54 Task: Recommending POIs based one users check-in history and other available information Existing Method: collaborative filtering Failed to consider the multiple factors for choosing a POI Lack of integrated analysis of the joint effect of the various factors Challenges Various factors can influence POI check-in: user preferences, geographical influences, and user mobility
What s Special about POI Recommendation? 55 User mobility Geographical Influence probability of a user visiting a POI is inversely proportional to the geographic distance Law of geography everything is related to everything else, but nearby things are more related than distant thing Implicit user feedback need to infer user preferences from implicit user feedback in terms of user check-in count data
Check-in Patterns 56 All POIs in different regions A user's check-ins in different regions User check-ins in San Francisco Need a model that jointly encodes the personalized preferences, spatial influence, user mobility into the user check-in decision process to learn geographical user preferences for effective POI recommendation
General Ideas 57 Users are most likely to check in a number of POIs and these POIs are usually limited to some geographical regions By Tobler's first law of geography, POIs with similar services are likely to be clustered into the same geographical area A user's propensity for a POI Personalized interest Satisfaction Distance Need region partition! Best personalization, maximum satisfaction, at lowest distance cost
Geographical Probabilistic Factor Model 58 red plate represents users, blue plate represents POIs, and purple plate represents latent regions
59 Model Components Personal Interest preferences: latent factor model User mobility model A user samples a region from all R regions following a multinomial distribution A POI is assigned a normal distribution Geographical influence Distance from region center to the POI the probability of a user choose a POI decays as the power-law of the distance between them
Poisson Geo-PFM 60 Why Poisson Poisson distribution is more suitable and effective for modeling the skewed user check-in count Rigorous probabilistic generative process for the model check-in counts dist. and Poisson approximation
Experimental Data 61 Foursquare, Gowalla, and Brightkite
Summary 62 A general framework to learn geographical preferences for POI recommendation Learned the geographical influence on a user's check-in behavior Effectively modeled the user mobility patterns Extended the latent factor in explicit rating recommendation to implicit feedback recommendation settings Proposed model is flexible and could be extended to incorporate different later factor models
Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation
Thank You! 64 My WEB site: http://datamining.rutgers.edu