Traffic Driven Analysis of Cellular Data Networks




Samir R. Das, Computer Science Department, Stony Brook University. Joint work with Utpal Paul and Luis Ortiz (Stony Brook University), and Milind Buddhikot and Anand Prabhu Subramanian (Alcatel-Lucent Bell Labs).

Mobile Data Usage. Forecast mobile data traffic will be higher than the traffic volume of the entire global Internet in 2006. [Figure: Forecast of Global Mobile Data Traffic, growing from 0.6 EB/month to 10.8 EB/month. Source: Cisco VNI Mobile. 1 exabyte (EB) = 1 million terabytes.] There has been relatively little research on the nature of mobile data traffic.

[Figure: research areas covered in this talk: Traffic Analysis, Modeling and Forecasting, Traffic Management.]

Measurement Infrastructure. [Figure: measurement setup showing the Radio Access Network, the Mobility and Session Manager, packet flows to the Internet, and a Flow Monitoring Tool that writes flow records to an SQL database.]

Sample Results from Traffic Analysis. Data were collected from a nationwide 2G/3G network circa 2007: about 10K base stations (BSes) and 1M subscribers. Traffic is significantly imbalanced, both per subscriber and per BS: 1% of subscribers create more than 60% of the load, and 10% of BSes carry more than 50% of the load. Mobility is generally low: more than 50% of subscribers stick to just one BS daily, and the median radius of gyration is about 1 mile.
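The slide does not give the exact definitions behind these statistics. The sketch below assumes common ones. The top-share function computes the fraction of total load produced by the heaviest fraction of subscribers or BSes. The radius of gyration is taken as the root-mean-square distance of a subscriber's observed positions from their centroid, unweighted by dwell time. Function names are hypothetical, and positions are assumed to be (x, y) coordinates in miles on a local planar projection.

```python
import math

def top_share(loads, fraction):
    """Share of total load generated by the top `fraction` of entities (subscribers or BSes)."""
    ranked = sorted(loads, reverse=True)
    # Number of top entities, rounded to the nearest integer (at least one).
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else 0.0

def radius_of_gyration(points):
    """RMS distance of a subscriber's observed (x, y) positions from their centroid.

    Assumes planar coordinates in miles; latitude/longitude would need projecting first.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
```

For example, `top_share(per_subscriber_bytes, 0.01)` would correspond to the "1% of subscribers" statistic.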

Sample Results from Traffic Analysis (continued). Mobility is predictable: subscribers are almost always found in their two or three most visited locations, and they return to the same location at the same time of day with high probability. More mobile subscribers tend to generate more traffic. Radio resource usage efficiency is very poor, and it is much poorer for light users than for heavy users.
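The two predictability claims can be measured roughly as sketched below. The data layouts are hypothetical choices for illustration: a flat list of location ids with one per observation interval, and a list of days indexed by hour of day. The study's actual metrics may differ.

```python
from collections import Counter

def top_k_coverage(visits, k):
    """Fraction of observations that fall in the subscriber's k most visited locations."""
    counts = Counter(visits)
    return sum(c for _, c in counts.most_common(k)) / len(visits)

def same_time_regularity(days):
    """Mean over hours of the fraction of days spent at that hour's most common location.

    `days[d][h]` is the location id (e.g., serving BS) at hour h of day d (hypothetical layout).
    """
    hours = len(days[0])
    per_hour = []
    for h in range(hours):
        counts = Counter(day[h] for day in days)
        per_hour.append(counts.most_common(1)[0][1] / len(days))
    return sum(per_hour) / hours
```

Under these definitions, values close to 1 for `top_k_coverage(visits, 3)` and `same_time_regularity(days)` would match "almost always in the top locations" and "returns at the same time of day".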

Functional Influence Among BSes. Model each BS's load as a time series and explore causal relationships between pairs of time series. Granger causality determines whether one time series is useful in forecasting another when using an autoregressive model; it has been used in economics and neuroscience. Statistically significant causality exists among neighboring BSes (roughly half of the neighbors). Causality graph and causal paths: build a graph from the causal relationships. Long paths exist in this graph (median = 15 hops, 90th percentile = 37 hops).
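The pairwise test can be sketched as the classic bivariate Granger F-test. It fits two autoregressions of y: a restricted one on y's own p lags, and a full one that also includes p lags of x. It then compares their residual sums of squares. The lag order (p = 2 by default here) and the function names are assumptions, since the talk gives no test details. A p-value would come from the F distribution with (p, n − 2p − 1) degrees of freedom, for example `scipy.stats.f.sf`. That step is omitted to keep the sketch NumPy-only.

```python
import numpy as np

def lagged(series, p):
    """Columns series[t-1], ..., series[t-p] for t = p .. T-1."""
    T = len(series)
    return np.column_stack([series[p - k:T - k] for k in range(1, p + 1)])

def granger_f(y, x, p=2):
    """F statistic for 'x Granger-causes y' with p lags (larger = stronger evidence)."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])   # y's own past only
    full = np.hstack([restricted, lagged(x, p)])   # plus x's past

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = len(target) - full.shape[1]              # n - 2p - 1
    return ((rss_r - rss_f) / p) / (rss_f / dof)
```

Applied to every pair of neighboring BS load series, significant (x, y) pairs would become directed edges of the causality graph.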

Modeling Study. Model BS traffic loads, exploiting any interactions or dependencies among them, using tools from machine learning. Many directions are possible: purely static/spatial or dynamic/temporal. Goals: intellectual (a broad understanding of any underlying structure would help future network architectures) and utilitarian (models can help estimation and forecasting, which is useful for various resource-management tasks).

Spatial Modeling Approach: Probabilistic Graphical Modeling. Assume the loads on n base stations are multivariate Gaussian, $X = (X_1, \ldots, X_n) \sim \mathcal{N}(\mu, \Sigma)$, with mean vector $\mu$ and covariance matrix $\Sigma$. Learn the parameters, specifically the inverse covariance matrix $\Sigma^{-1}$, from a set of training data ($p$ observations). $\Sigma^{-1}$ is easier to estimate than $\Sigma$ and exposes interesting properties.

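Inverse Covariance Matrix: Properties. If $(\Sigma^{-1})_{ij} = 0$, then the load variables $X_i$ and $X_j$ are conditionally independent given the rest of the variables. Most problems produce a sparse model. This is related to probabilistic graphical models (e.g., the Gaussian Markov random field). In the undirected graphical model, $(\Sigma^{-1})_{ij} \neq 0$ means an edge between $i$ and $j$, and $(\Sigma^{-1})_{ij} = 0$ means no edge, so graph properties translate to probabilistic (in)dependencies. [Figure: example undirected graphical model on five nodes, numbered 1 to 5.]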
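A small numerical illustration of this property uses a precision matrix $\Theta = \Sigma^{-1}$ (notation assumed here). Graph edges are the nonzero off-diagonal entries. The partial correlation of $X_i$ and $X_j$ given all other variables is the standard quantity $-\Theta_{ij}/\sqrt{\Theta_{ii}\Theta_{jj}}$. The three-variable chain used below is a hypothetical example, not data from the talk. In it, variables 0 and 2 are correlated marginally but conditionally independent given variable 1.

```python
import numpy as np

def partial_correlations(theta):
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj): correlation of X_i, X_j given the rest."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def graph_edges(theta, tol=1e-8):
    """Edges (i, j) of the Gaussian Markov random field: nonzero off-diagonal precision entries."""
    n = theta.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(theta[i, j]) > tol]

# Hypothetical chain 0 - 1 - 2: theta[0, 2] == 0, yet inv(theta)[0, 2] != 0.
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
sigma = np.linalg.inv(theta)
```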

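Inference Problem. Estimate the load of BS $i$, given the loads of a subset $S$ of BSes, as the conditional mean $\hat{x}_i = E[X_i \mid X_S = x_S] = \mu_i + \Sigma_{iS}\,\Sigma_{SS}^{-1}(x_S - \mu_S)$. [Figure: example graph on five nodes, numbered 1 to 5.] Broad questions: How large should $S$ be (the effort vs. accuracy tradeoff)? How should $S$ be chosen? The goal is to measure only a subset of BSes and estimate the rest.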
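The conditional-mean estimate can be computed directly from $\mu$ and $\Sigma$, as in the sketch below. The function name and its return layout are hypothetical. It solves a linear system instead of forming $\Sigma_{SS}^{-1}$ explicitly. The test uses the covariance of a hypothetical three-BS chain.

```python
import numpy as np

def conditional_mean(mu, sigma, S, x_S):
    """E[X_U | X_S = x_S] for a multivariate Gaussian, with U = all indices not in S.

    Returns (U, estimates) so each estimate can be matched to its BS.
    """
    mu = np.asarray(mu, float)
    S = list(S)
    U = [i for i in range(len(mu)) if i not in S]
    sigma_US = sigma[np.ix_(U, S)]
    sigma_SS = sigma[np.ix_(S, S)]
    # mu_U + Sigma_US Sigma_SS^{-1} (x_S - mu_S), via a linear solve
    adj = sigma_US @ np.linalg.solve(sigma_SS, np.asarray(x_S, float) - mu[S])
    return U, mu[U] + adj
```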

First Solve the Learning Problem. Learn the inverse covariance matrix from training data. How? Exploit its relationship with linear regression modeling. Express the load of BS $i$ as a linear function of all other BS loads and regress: $X_i = \sum_{j \neq i} \beta^{(i)}_j X_j + \epsilon_i$. The regression coefficients $\beta^{(i)}_j$ can be shown to be directly related to the elements of the inverse covariance matrix: for centered loads, $\beta^{(i)}_j = -(\Sigma^{-1})_{ij} / (\Sigma^{-1})_{ii}$.
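The identity $\beta^{(i)}_j = -(\Sigma^{-1})_{ij}/(\Sigma^{-1})_{ii}$ can be checked numerically, as sketched below. It compares the population regression coefficients $\Sigma_{-i,-i}^{-1}\Sigma_{-i,i}$ with the entries read off the precision matrix. The function names and the random test matrix are illustrative assumptions. The identity itself is the standard basis of neighborhood-regression approaches.

```python
import numpy as np

def regression_coefs_from_cov(sigma, i):
    """Population coefficients of regressing X_i on all other X_j (zero-mean variables)."""
    others = [j for j in range(sigma.shape[0]) if j != i]
    return others, np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, i])

def regression_coefs_from_precision(theta, i):
    """The same coefficients read off the precision matrix: beta_j = -theta_ij / theta_ii."""
    others = [j for j in range(theta.shape[0]) if j != i]
    return others, -theta[others, i] / theta[i, i]
```

A zero coefficient, therefore, corresponds exactly to a zero in $\Sigma^{-1}$. This is why a sparse regression yields a sparse graphical model.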

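Sparse Models. In a sparse model, many regression coefficients $\beta^{(i)}_j$ are zero. This reduces the danger of overfitting (lowering variance) and is also computationally efficient. Introduce a regularization term in the regression; we used the Lasso: $\min_{\beta^{(i)}} \underbrace{\sum_{t=1}^{p} \big(x_{t,i} - \sum_{j \neq i} \beta^{(i)}_j x_{t,j}\big)^2}_{\text{empirical error}} + \underbrace{\lambda \sum_{j \neq i} |\beta^{(i)}_j|}_{\text{regularization term (modeling penalty)}}$, where $x_{t,j}$ is the load of BS $j$ in training observation $t$.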
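The talk says only that the Lasso was used; the solver is not specified. Below is a minimal cyclic coordinate-descent sketch, with hypothetical names. It uses the common $1/(2N)$ scaling of the squared error, which only rescales $\lambda$ relative to the objective above. It assumes centered columns and no intercept. In the per-BS regression, `X` would hold the loads of all BSes other than $i$, and `y` the load of BS $i$.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2N)) * ||y - X b||^2 + lam * ||b||_1 (centered data)."""
    N, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / N
    r = y - X @ b  # current residual
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual, adding back j's own contribution
            rho = X[:, j] @ r / N + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep the residual in sync
            b[j] = new_bj
    return b
```

Larger values of `lam` drive more coefficients exactly to zero, which gives the sparse model described on the slide.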

Regularization. Cross-validate using additional training samples that were not used for model creation. Use various values of $\lambda$ to create different models, and choose the one with the maximum likelihood on the held-out samples.
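The selection step can be sketched by scoring each candidate model with the average Gaussian log-likelihood of the held-out samples, $\tfrac12(\log\det\Theta - (x-\mu)^\top\Theta(x-\mu) - n\log 2\pi)$. The sketch keeps the $\lambda$ with the highest score. The `fit(X, lam)` interface is hypothetical. `shrinkage_fit`, used in the test, is a ridge-style stand-in, $\Theta = (S + \lambda I)^{-1}$, chosen only to keep the sketch short. It is NOT the Lasso-based model from the talk.

```python
import numpy as np

def heldout_loglik(X_val, mu, theta):
    """Average Gaussian log-likelihood of validation rows under mean mu and precision theta."""
    n = theta.shape[0]
    D = X_val - mu
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        return -np.inf  # not a valid (positive definite) precision matrix
    quad = np.einsum("ij,jk,ik->i", D, theta, D)  # (x - mu)^T theta (x - mu) per row
    return float(np.mean(0.5 * (logdet - quad - n * np.log(2 * np.pi))))

def select_lambda(X_train, X_val, lambdas, fit):
    """Fit one model per lambda via fit(X_train, lam) -> (mu, theta); keep the best on X_val."""
    scores = {lam: heldout_loglik(X_val, *fit(X_train, lam)) for lam in lambdas}
    best = max(scores, key=scores.get)
    return best, scores

def shrinkage_fit(X, lam):
    """Hypothetical stand-in estimator: theta = (S + lam * I)^-1 (not the Lasso model)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)
    return mu, np.linalg.inv(S + lam * np.eye(X.shape[1]))
```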

Data Processing. Hourly load of 400 BSes covering an area of 75 × 84 miles, which includes a busy downtown and surrounding suburbs. The model has no temporal dimension, so we create different models for different parts of the day (every 4 hours). To account for the diurnal variation of load, we use the residuals from a fitting function; the residuals pass a normality test.
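The slide does not say which fitting function was used. The sketch below assumes a per-BS least-squares fit of a constant plus 24-hour and 12-hour sinusoids. The residuals then serve as the load variables. Samples are grouped into the six 4-hour parts of the day, with one model fit per part. Both function names are hypothetical, and the normality test is not reproduced.

```python
import numpy as np

def diurnal_residuals(hourly_load, harmonics=2):
    """Residuals after fitting a constant plus daily sinusoids to one BS's hourly load.

    The harmonic fit is an assumed stand-in for the talk's unspecified fitting function.
    """
    y = np.asarray(hourly_load, float)
    t = np.arange(len(y))
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, harmonics + 1):
        w = 2 * np.pi * k * t / 24.0  # k-th harmonic of the 24-hour cycle
        cols += [np.cos(w), np.sin(w)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def four_hour_block(hour_index):
    """Which of the six 4-hour parts of the day (0..5) an hourly sample falls into."""
    return (hour_index % 24) // 4
```

The training set for block b would then consist of the residual vectors, one entry per BS, at all hours h with `four_hour_block(h) == b`.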

Average Edge Length in the Model Graph. [Figure: average edge length of the learned model graph, in miles and in hops in the Voronoi graph of BSes.] The edges show apparent spatial/regional significance.
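Both edge-length measures can be computed for a learned edge list as sketched below. The miles measure is the straight-line distance between the two BSes. The hops measure is the breadth-first hop count in the Voronoi neighbor graph, where two BSes are adjacent if their cells share a border. Several inputs are assumptions: BS coordinates in miles on a planar projection, a precomputed Voronoi adjacency (for example, from a Delaunay triangulation of BS positions, outside this sketch), and a connected neighbor graph. The function names are hypothetical.

```python
import math
from collections import deque

def hop_distance(adj, src, dst):
    """Breadth-first hop count between two BSes in the Voronoi neighbor graph."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == dst:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return None  # not connected

def average_edge_length(edges, coords, adj):
    """(mean length in miles, mean length in Voronoi hops) over model-graph edges."""
    miles = [math.dist(coords[i], coords[j]) for i, j in edges]
    hops = [hop_distance(adj, i, j) for i, j in edges]
    return sum(miles) / len(miles), sum(hops) / len(hops)
```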

Choosing the Measured Set S. Greedy strategy: each iteration adds the BS that minimizes the error estimate. Simply choosing higher-load BSes first achieves almost the same performance.
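The slide does not define the "error estimate". One natural choice, assumed here, is the total conditional variance of the unmeasured BSes under the Gaussian model, $\sum_{u \notin S} \mathrm{Var}(X_u \mid X_S)$. That quantity does not depend on the measured values, so the greedy selection can run on the learned covariance alone. The function names are hypothetical. `highest_load_first` is the baseline mentioned on the slide.

```python
import numpy as np

def residual_variance(sigma, S):
    """Sum of conditional variances Var(X_u | X_S) over unmeasured BSes u (error proxy)."""
    n = sigma.shape[0]
    U = [i for i in range(n) if i not in S]
    if not U:
        return 0.0
    if not S:
        return float(np.trace(sigma))
    cond = sigma[np.ix_(U, U)] - sigma[np.ix_(U, S)] @ np.linalg.solve(
        sigma[np.ix_(S, S)], sigma[np.ix_(S, U)])
    return float(np.trace(cond))

def greedy_select(sigma, k):
    """Each iteration adds the BS whose measurement minimizes the error proxy."""
    S = []
    for _ in range(k):
        candidates = [i for i in range(sigma.shape[0]) if i not in S]
        S.append(min(candidates, key=lambda i: residual_variance(sigma, S + [i])))
    return S

def highest_load_first(mean_load, k):
    """Baseline: measure the k BSes with the highest mean load."""
    return [int(i) for i in np.argsort(mean_load)[::-1][:k]]
```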

Impact of Estimation Accuracy on Applications. We understand the tradeoff between measurement complexity (the size of S) and error. But how much accuracy do we need? To answer that, we need to turn to applications. We studied two: energy management and opportunistic traffic scheduling.

Opportunistic Traffic Scheduling. As in the smart electric grid, the idea is to move non-urgent traffic from peak to off-peak periods. What is non-urgent? P2P, large downloads, sync, push, etc. Who decides? A user agent on the mobile device. Flows may have multiple priority levels or deadlines to aid scheduling, and carriers can incentivize such scheduling. This is similar to QoS scheduling, but at a higher layer and on a longer time scale. The system architecture has two components: a server (scheduler) in the core network, and a user agent on the mobile device that coordinates with the server.

[Figure: server (scheduler) in the core network; a low-priority flow is created with deadline = 2 hr, shown on a timeline from 2 PM to 3:30 PM.]

Solving the Scheduling Problem. Several approaches are possible, depending on how flows are prioritized. For any approach, however, the server needs to be able to estimate current and future loads at all BSes. It also needs to model and estimate subscriber mobility, which is a separate problem. Poor estimation leads to poor scheduler performance.
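One possible rule, sketched here under loud assumptions, is to send a deferrable flow in the epoch before its deadline that has the most estimated spare capacity. The sketch assumes the subscriber stays at one BS, so mobility is ignored, and that the BS has a fixed per-epoch capacity. The name `pick_epoch` and its parameters are hypothetical. The estimated loads would come from the load model, which is why estimation errors translate directly into poor scheduling decisions.

```python
def pick_epoch(est_load, capacity, now, deadline, demand):
    """Epoch in [now, deadline) with the most estimated spare capacity that still fits `demand`.

    est_load[t] is the estimated load at the flow's BS in epoch t. Returns None if no
    epoch before the deadline is expected to have room (the flow would then go out now).
    """
    best, best_spare = None, None
    for t in range(now, min(deadline, len(est_load))):
        spare = capacity - est_load[t]
        if spare >= demand and (best_spare is None or spare > best_spare):
            best, best_spare = t, spare
    return best
```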

Evaluation Approach. We use a trace-driven simulator based on a capacity model of BSes. Opportunistic scheduling is meant to admit more traffic with the same network capacity. We therefore always use the same traffic trace but reduce network capacity to demonstrate the impact. Impact: Do low-priority flows still finish within a reasonable time? Are high-priority flows impacted?

Results. Low-priority flows are a random subset of long-lived flows (over 25 minutes), about 8% of all flows, with randomly chosen deadlines of 1 to 4 hours. The rest are high priority. The scheduling epoch is one hour. Only a subset of the 400 BSes is measured, and the loads of the rest are estimated.
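Below is a toy, single-BS version of such a capacity-model simulation. It is a hypothetical sketch, not the authors' simulator. In each hourly epoch, high-priority demand is served first. Any leftover capacity goes to pending low-priority flows in earliest-deadline-first order. The sketch reports the high-priority demand that did not fit in each epoch and the number of low-priority flows that missed their deadlines. Lowering `capacity` while keeping the same trace mimics the evaluation described above. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LowPriorityFlow:
    arrival: int                    # epoch (hour) in which the flow is created
    deadline: int                   # last epoch in which it may still finish
    remaining: float                # volume left to send (same units as capacity)
    finished: Optional[int] = None  # epoch in which the last byte was sent

def simulate_bs(high_load: List[float], flows: List[LowPriorityFlow], capacity: float):
    """Simulate one BS over hourly epochs (mutates `flows`).

    Returns (per-epoch high-priority shortfall, number of low-priority deadline misses).
    """
    arrivals = sorted(flows, key=lambda f: f.arrival)
    pending: List[LowPriorityFlow] = []
    shortfall = []
    nxt = 0
    for t, high in enumerate(high_load):
        while nxt < len(arrivals) and arrivals[nxt].arrival <= t:
            pending.append(arrivals[nxt])
            nxt += 1
        spare = capacity - high
        shortfall.append(max(0.0, -spare))  # high-priority demand that did not fit
        spare = max(0.0, spare)
        pending.sort(key=lambda f: f.deadline)  # earliest deadline first
        for f in pending:
            sent = min(spare, f.remaining)
            f.remaining -= sent
            spare -= sent
            if f.remaining <= 0:
                f.finished = t
        pending = [f for f in pending if f.remaining > 0]
    missed = sum(1 for f in flows if f.finished is None or f.finished > f.deadline)
    return shortfall, missed
```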

Conclusions. Discovering structure in mobile traffic is a rich area of study, with applications in network and resource management.

Questions?