1 Big Data Analytics in Mobile Environments 熊 辉 教 授 罗 格 斯 - 新 泽 西 州 立 大 学 2012-10-2 Rutgers, the State University of New Jersey
Why big data: historical view? Productivity versus Complexity (interrelatedness, ambiguity) Complex versus Complicated While the complicated can be unfolded for analysis, the complex cannot. Big Machines Software Connectivity Knowledge for Business Operation Hardware Storage Big Data
Similarities Between Data Miners and Doctors Very Often, No Standardized Solutions Data Characteristics Data Mining Techniques Medical Devices
So What is Big Data? Big Data refers to datasets that grow so large that it is difficult to capture, store, manage, share, analyze and visualize with the typical database software tools.
What Makes it Big Data? SOCIAL BLOG SMART METER 101100101001 001001101010 101011100101 010100100101 VOLUME VELOCITY VARIETY VALUE Big is also a relative concept. Data Size / Solution-Time-Window >= Computing Capacity Per Time Unit
Big Data Use Cases Today s Challenge New Data What s Possible Healthcare Expensive office visits Hospital Dynamics Manufacturing In-person support Location-Based Services Based on home zip code Finance Fast-paced, Variety Retail One size fits all marketing Remote patient monitoring, Hospital Sensors Product sensors Real time location data Social Media, Highfrequency Trading Data Market basket data, user behavior logs Preventive care, reduced hospitalization, reduced human mistakes Automated diagnosis, customized support Geo-advertising, urban computing, mobile recommendation Sentiment analysis Finance engineering Personalized Recommendation, Segmentation
10 Ways Mobile Tech Is Changing Our World 7 1. Elections Will Never Be The Same 1. Doing Good By Texting 2. Bye-Bye, Wallets 3. The Phone Knows All 4. Your Life Is Fully Mobile 5. The Grid Is Winning 6. A Camera Goes Anywhere 7. Toys Get Unplugged 8. Gadgets Go To Class 9. Disease Can't Hide
Human Mobility Human mobility is people s movement trajectories which can be Phone traces or trajectories of driving routes a sequences of posts (like geo-tweets, geo-tagged photos, or check-ins) Indoor Traces and Outdoor Traces.
Urban Geography Urban geography is a set of geographic characteristics of a city including road networks, public transportation places of interest (POIs), regional functions
Public transportation data
Point of Interests (POI)
Outdoor Location Traces Taxi GPS trajectories
Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation
14 Big Data Application Requirements Timely observation Timely analysis Timely solution
Big Data Application Trends 15 Micro-scope Macro-scope Personalized Recommendation Urban Computing Micro-Scope Classic Macro-Scope Applications/Techniques
Data Driven Solutions 16 Theoretical top-down solutions Data driven bottomup solutions
Big Data Application Requirements 17 Technical Knowledge Domain Knowledge Big Data Application
Big Data Experiences 18 Understanding Data Characteristics Data Distribution, Data Quality etc. Feature Engineering Feature engineering is one of the key strategy for the success of big data analytics. The goal is to explicitly reveal important information to the model by feature selection or feature generation Original features different encoding of the features combined features Instance Selection (particularly mobile environment) The goal is to select the right instances/objects for the underlying data analytics
Mobile Recommender Systems 19 Background Revolution in Mobile Devices GPS WiFi Mobile phone The Urgent Demand for Better Service Driving route suggestion Mobile tourist guides Definition Mobile pervasive recommendation is promised to provide mobile users access to personalized recommendations anytime, anywhere.
Mobile Recommender Systems 20 Route Recommendation Travel Recommendation
Challenges for Mobile 21 Recommendation (I) Complexity of the Mobile Data Heterogeneous Spatial and temporal auto-correlation Noisy The Validation Problem No Ratings The Generality Problem Different application domains with different recommendation techniques
22 Challenges for Mobile Recommendation (II) The Cost Constraints Time Price The Life Cycle Problem The Transplantation Problem Difficult to apply traditional Recommendation techniques for mobile recommendation
23 The Characteristics of Mobile Data Two Cases Case 1. Location trace by taxi drivers Case2. The tourism data Why? A good coverage of unique characteristics of mobile data Can be naturally exploited for developing mobile recommender systems They are the real-world data
24 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers Data Description GPS traces Location information (Longitude, Latitude), timestamp The operation status (with or without passengers) Experienced drivers can usually have more driving hours and high occupancy rates Inexperienced drivers tend to have less driving hours and low occupancy rates
The Characteristics of Mobile Data 25 Case 1. Location trace by taxi drivers Driving pattern comparison The experienced drivers have a wider operation area. The experienced drivers know the roads as well the traffic patterns better.
26 The Characteristics of Mobile Data Case 1. Location trace by taxi drivers Develop a mobile recommender system Users ~ Taxi drivers Items ~ Potential pick-up points What did we learn? The difference between Mobile RS and traditional RS The items are application-dependent There is some cost to extract items The items are not i.i.d while spatial auto-correlation
The Characteristics of Mobile Data 27 Case2. The travel data Data Description Expense records Tourists: ID, travel time Package: ID, name, landscapes, price, travel days Duration: 2000 2010 Recommender System Users ~ Tourists Items ~ Packages
28 The Characteristics of Mobile Data Case2. The travel data Characteristics of Tourism Data (I) Spatial auto correlation of packages For example, the 1-day Niagara Falls Tour The Sparseness Much sparser than the Netflix data set.
The Characteristics of Mobile Data 29 Case2. The travel data Characteristics of Tourism Data(II) The time dependence Packages and tourists have seasonal tendency Packages have a life cycle Short-period packages are more popular
Theoretical Abstraction 30 Given: a set of objects O={O 1, O 2,, O n } Find: An ordered subset S={S 1, S 2,, S k } O The order of S 1, S 2,, S k is optimized subject to certain constraints. For taxi driver recommendation, the set O is a set of potential pick-up points For travel package recommendation, the set O is a set of landscapes.
Data Driven Solutions 31 Theoretical top-down solutions Data driven bottomup solutions
JOBS: Projected shortage of 140,000-190,000 people with deep analytical talent in the US by the year 2018. Source: Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011. Source: Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, June 2011.
Data Miners in Big Data Analytics Big Data Analytics Understand goals of business Collaborate in interdisciplinary teams Integrate large volumes of structured and unstructured data Formulate problems, develop solutions Blend statistical modeling, data mining, forecasting, optimization Develop/run integrated software solutions Gain higher visibility Change business operation
Thank You! 34 My WEB site: http://datamining.rutgers.edu
Reference 35 Yong Ge, Hui Xiong, Alexander Tuzhilin, Keli Xiao, Marco Gruteser, Michael J. Pazzani, An Energy-Efficient Mobile Recommender System, the 16th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2010), pp. 899-908, 2010. Yong Ge, Qi Liu, Hui Xiong, Alexander Tuzhilin, Jian Chen, Costaware Travel Tour Recommendation, the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011), to appear, 2011. Qi Liu, Yong Ge, Zhongmou Li, Enhong Chen, Hui Xiong, Personalized Travel Package Recommendation, the 11th IEEE International Conference on Data Mining (ICDM 2011) (ICDM 2011), Best Research Paper Award, 2011. Yong Ge, Chuanren Liu, Hui Xiong, A Taxi Business Intelligence System, the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2011), to appear,2011.
Reference 36 Chuanren Liu, Hui Xiong, Yong Ge, Wei Geng, Matt Perkins. A Stochastic Model for Context-Aware Anomaly Detection in Indoor Location Traces. the 12th IEEE Conference on Data Mining (ICDM 2012), to appear, 2012. Baik Hoh, Marco Gruteser, Hui Xiong, Ansaf Alrabady, Preserving Privacy in GPS Traces via Uncertainty-Aware Path Cloaking, the 14th ACM Conference on Computer and Communication Security, (ACM CCS), pp. 161-171, 2007. Jing Yuan, Yu Zheng, Xing Xie: Discovering regions of different functions in a city using human mobility and POIs. the 18th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining ( KDD 2012), pp. 186-194, 2012.