Reinforcement Learning for Resource Allocation and Time Series Tools for Mobility Prediction


Reinforcement Learning for Resource Allocation and Time Series Tools for Mobility Prediction. Baptiste Lefebvre 1,2, Stephane Senecal 2 and Jean-Marc Kelif 2. 1 École Normale Supérieure (ENS), Paris, France, baptiste.lefebvre@ens.fr. 2 Orange Labs, Issy-les-Moulineaux, France, stephane.senecal@orange.com, jeanmarc.kelif@orange.com. First GdR MaDICS Workshop on Big Data for the 5G RAN, 25 November 2015 @ Huawei FRC. 1/38

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 2/38

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 3/38

Wireless Networks. UE = User Equipment, BS = Base Station. 4/38

Radio Resource Management (RRM)
PRB (1): 12 subcarriers (180 kHz) x 1 slot (0.5 ms), made of PREs (2)
Allocation: joint sharing of timeslots and frequency bands
Load: ρ = (T_r / R) Σ_{c=1..C} n_c / D_c, with ρ̄ = min(ρ, 1)
Quality of Service (QoS): function of the target throughput T_r, the peak rates D_c, the numbers of users n_c and the residual capacity (1 - ρ̄)
Energy/Power Consumption: P = P_BS + r (P_RS + ρ̄ P_AP)
(1) Physical Resource Block (2) Physical Resource Element
5/38

Goal: optimization of the energy consumption under QoS constraints. Formal framework considered: reinforcement learning [SB98], more specifically Markov Decision Processes (MDP) [Put94]: A system state enumerates the UEs in each radio condition and the active resources. An action is either null, a deactivation, or an activation of a resource. A policy associates with every state an action to execute. In order to achieve energy savings, one needs to compute or estimate an optimal policy, i.e. a policy which implements a good trade-off between energy (electrical power) consumption and the targeted QoS level. 6/38
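To make the state and action spaces concrete, here is a minimal sketch of one possible representation (the class and function names are illustrative, not taken from the slides):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    n: Tuple[int, ...]  # number of UEs in each of the C radio conditions
    r: int              # number of active physical resources

# An action is null (0), a deactivation (-1) or an activation (+1) of one resource.
ACTIONS = (-1, 0, +1)

def apply_action(s: State, a: int, R: int) -> State:
    """Return the state reached when the action takes effect, keeping r in [1, R]."""
    return State(s.n, min(max(s.r + a, 1), R))
```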

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 7/38

MDP Controller. A controller executes a policy Π which, for a given traffic volume, aims at maximizing an objective function (QoS, power). Transition probability operator P(s, a, s'). Instantaneous reward function R(s, a). Searching for an optimal policy of a fully known MDP model can be performed by dynamic programming. 8/38

Controller for Geometric Criterion
max_Π E[ Σ_{t=0..+∞} φ^t R(s_t, Π(s_t)) | s_0 = s ]
Solving a system of equations by iterating until reaching a fixed point (geometric criterion):
Π(s) = argmax_{a ∈ A} Σ_{s' ∈ S} P(s, a, s') ( R(s, a) + φ V(s') )
V(s) = Σ_{s' ∈ S} P(s, Π(s), s') ( R(s, Π(s)) + φ V(s') )
Parameter φ ∈ [0, 1[
9/38
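A minimal sketch of this fixed-point computation (value iteration for the discounted criterion), assuming the known MDP is stored as NumPy arrays P[s, a, s'] and R[s, a]; the array layout and names are illustrative, not taken from the slides:

```python
import numpy as np

def value_iteration(P, R, phi=0.9, tol=1e-8):
    """Iterate the fixed-point equations above until convergence.
    P[s, a, s2]: transition probabilities, R[s, a]: instantaneous rewards.
    Returns the value function V and a greedy policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + phi * sum_{s'} P(s, a, s') V(s')
        Q = R + phi * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```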

Controller for Average Criterion
max_Π lim_{T→+∞} E[ (1/T) Σ_{t=0..T} R(s_t, Π(s_t)) | s_0 = s ]
Solving a system of equations (1) by iterating until reaching a fixed point (average criterion):
Π(s) = argmax_{a ∈ A} Σ_{s' ∈ S} P(s, a, s') ( R(s, a) + V(s') )
V(s) = Σ_{s' ∈ S} P(s, Π(s), s') ( R(s, Π(s)) + V(s') )
(1) Dynamic programming valid if V < +∞
10/38

States Transitions and Rewards. The system evolves in continuous time, not in discrete time. It is possible to turn a continuous-time MDP into a discrete-time MDP via uniformization and discretization schemes. P(s, a, s') is replaced by Q(s, a, s'), which denotes the transition rate (i.e. the Poisson process parameter). R(s, a) is replaced by C(s, a), which denotes the cost per time unit. 11/38
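A minimal sketch of such a uniformization step, assuming the rates and costs are stored as arrays Q[s, a, s'] (off-diagonal rates) and C[s, a], and that costs are turned into rewards by negation (array layout, names and sign convention are assumptions):

```python
import numpy as np

def uniformize(Q, C, Lambda=None):
    """Turn a continuous-time MDP (rates Q, costs per time unit C) into a
    discrete-time MDP (probabilities P, per-step rewards R)."""
    S, A, _ = Q.shape
    rates = Q.sum(axis=2)                    # total outgoing rate for each (s, a)
    if Lambda is None:
        Lambda = rates.max()                 # uniformization constant >= all rates
    P = Q / Lambda
    idx = np.arange(S)
    P[idx, :, idx] += 1.0 - rates / Lambda   # self-loops absorb the remaining mass
    R = -C / Lambda                          # cost per time unit -> reward per step
    return P, R
```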

States Transitions Modeling
Q((n, r), a, (n', r')) =
  λ_i                                      if B_{λ_i}(s, a, s')
  (1 / F_i) (r / R) (n_i / Σ_j n_j) D_i    if B_{μ_i}(s, a, s')
  0                                        else
Data traffic on an LTE network with Round Robin scheduling
12/38

States Transitions Modeling
Q((n, r), a, (n', r')) =
  λ_i                                      if B_{λ_i}(s, a, s')
  (1 / F_i) (r / R) (n_i / Σ_j n_j) D_i    if B_{μ_i}(s, a, s')
  0                                        else
B_{λ_i}(s, a, s') = (n' = n + e^(i)) ∧ (r' = r + a)
B_{μ_i}(s, a, s') = (n' = n - e^(i)) ∧ (r' = r + a)
e^(i) = C-tuple composed of 0s, except the i-th element, which equals 1
12/38

Rewards - Costs Functions
C(s, a) = [ γ E(n, r + a) + (1 - γ) F(n, r + a) ] / Σ_{s' ∈ S} Q(s, a, s')
Multiobjective optimization via scalarization
13/38

Rewards - Costs Functions
C(s, a) = [ γ E(n, r + a) + (1 - γ) F(n, r + a) ] / Σ_{s' ∈ S} Q(s, a, s')
E(n, r) = (P_BS + r P_RS) / (P_BS + R (P_RS + P_AP))            if n = 0
E(n, r) = (P_BS + r (P_RS + P_AP)) / (P_BS + R (P_RS + P_AP))   else
F(n, r) = 1 - exp( - log(2) (T_r R Σ_{i=1..C} n_i) / (r Σ_{i=1..C} n_i D_i) )
Multiobjective optimization via scalarization
13/38
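The scalarization itself can be sketched as a one-liner (a sketch of the reading above; the function and argument names are illustrative):

```python
def scalarized_cost(E_val: float, F_val: float, gamma: float, total_rate: float) -> float:
    """Blend the energy term E and the QoS term F with weight gamma,
    normalized by the total outgoing transition rate of (s, a)."""
    return (gamma * E_val + (1.0 - gamma) * F_val) / total_rate
```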

Current Results. The optimal policy is a threshold policy. The optimal policy depends on the traffic volume, on the target throughput and on the cell capacity. The execution of the optimal policy enables energy savings of the order of 4%. Proposal to take the activation time into account by adding a timer. Adaptation to traffic evolution through an ε-greedy strategy. 14/38

Optimization under Congestion. The controller does not activate all the resources in order to reduce congestion as fast as possible. 15/38

Unused Resources. The controller cannot activate all of its resources. 16/38

Excessive QoS. The controller can grant an effective QoS level much greater than the initially targeted QoS level (e.g. 5 Kbps vs. 4 Kbps). 17/38

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 18/38

States Transitions Modeling
Q(s, a, s') =
  λ_i                                            if B_{λ_i}(s, a, s')
  (1 / F_i) ((r + a) / R) (n_i / Σ_j n_j) D_i    if B_{μ_i}(s, a, s') and not B(s, a, s')
  (1 / F_i) (r / R) (n_i / Σ_j n_j) D_i          if B_{μ_i}(s, a, s') and B(s, a, s')
  0                                              else
Temporality difference for action execution
19/38

States Transitions Modeling
Q(s, a, s') =
  λ_i                                            if B_{λ_i}(s, a, s')
  (1 / F_i) ((r + a) / R) (n_i / Σ_j n_j) D_i    if B_{μ_i}(s, a, s') and not B(s, a, s')
  (1 / F_i) (r / R) (n_i / Σ_j n_j) D_i          if B_{μ_i}(s, a, s') and B(s, a, s')
  0                                              else
B_{λ_i}((n, r), a, (n', r')) = ( (n' = n + e^(i)) ∨ (n' = n ∧ n = N) ) ∧ ( (r' = r + a) ∨ B(r', a, r) )
B_{μ_i}((n, r), a, (n', r')) = (n' = n - e^(i)) ∧ ( (r' = r + a) ∨ B(r', a, r) )
B(r', a, r) = (r' = r = 1 ∧ a = -1) ∨ (r' = r = R ∧ a = +1)
Temporality difference for action execution
19/38

Ideal and Effective Power Consumption
Ideal Power Consumption:
P*(n) = P_BS + P_RS                          if α(n) = 0
P*(n) = P_BS + α(n) P_RS + α(n) P_AP         else
Ideal Number of Resources (2):
α(n) = min( R Σ_{i=1..C} n_i T / D_i , R )
Effective Power Consumption:
P̂(n, r) = P_BS + r P_RS                      if n = 0
P̂(n, r) = P_BS + r P_RS + r P_AP             else
(2) Obtained by solving the equation F(n, r) = 1/2 = β
20/38
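As a small illustration of these definitions, the quantities can be computed as follows (a sketch under the reading above; D[i] is taken as the peak rate in radio condition i, and all names are illustrative):

```python
from typing import Sequence

def alpha(n: Sequence[int], T: float, D: Sequence[float], R: int) -> float:
    """Ideal number of resources needed to serve all UEs at target throughput T."""
    return min(R * sum(n_i * T / D_i for n_i, D_i in zip(n, D)), R)

def ideal_power(n, T, D, R, P_BS, P_RS, P_AP):
    """Ideal power consumption P*(n) for the ideal number of resources alpha(n)."""
    a = alpha(n, T, D, R)
    return P_BS + P_RS if a == 0 else P_BS + a * (P_RS + P_AP)

def effective_power(n, r, P_BS, P_RS, P_AP):
    """Effective power consumption with r active resources; the amplifier term
    P_AP is only spent when at least one UE is present."""
    return P_BS + r * P_RS if sum(n) == 0 else P_BS + r * (P_RS + P_AP)
```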

Power Consumption Error Modeling
Normalized Regret:
E((n, r), a) = ( P̂(n, r) - P*(n) ) / ( R (P_RS + P_AP) )         if B(r, a)
E((n, r), a) = ( P̂(n, r + a) - P*(n) ) / ( R (P_RS + P_AP) )     else
B(r, a) = (r = 1 ∧ a = -1) ∨ (r = R ∧ a = +1)
21/38

Rewards - Costs Functions
Symmetrical Instantaneous Reward: R(s, a) = -E(s, a)
Asymmetrical Instantaneous Reward: R_θ(s, a) = -E(s, a) 1_{E(s,a) < 0} - θ E(s, a) 1_{E(s,a) ≥ 0}
22/38
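In code, one plausible reading of these two rewards is the following (the sign convention and the role of θ are assumptions on our part):

```python
def reward(E: float) -> float:
    """Symmetrical reward: the negated normalized regret."""
    return -E

def reward_asym(E: float, theta: float) -> float:
    """Asymmetrical reward: over-consumption (E >= 0) is weighted by theta,
    so power excess and power deficit can be penalized differently."""
    return -E if E < 0 else -theta * E
```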

Results 23/38

Overall Performance. Table: quantiles q̂ of the performance metric for the current controller (parametrized by γ) and the proposed controller (parametrized by θ), for β = 1/2, 3/4 and 9/10. 24/38

Overall Performance (plots for β = 3/4 and β = 9/10). 25/38

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 26/38

Mobility. Traffic due to arrivals and departures of UEs in the coverage zone of the BS, modeled by Poisson processes. Movements of UEs induce propagation losses, shadowing and fast fading. 27/38

Problem Statement. The activation/deactivation timeframe of a physical resource is not taken into account in the modeling. Idea: predict the states to be visited over the next few seconds. This approach makes it possible to take mobile users into account. Given the SINR traces of users who previously crossed the cell and the SINR trace of a user currently crossing the cell, we aim at estimating the SINR to be measured in the near future. 28/38

Problem Modeling
Let T = {T_1, ..., T_K} denote a set of time series
Let T_1 = t_{1,1}, ..., t_{1,N_1} denote a time series
...
Let T_K = t_{K,1}, ..., t_{K,N_K} denote a time series
Let T' = t_1, ..., t_N denote a time series to be completed
t̂_{N+1} = f(T')
t̂_{N+1} = g(T', T_k), T_k ∈ T
t̂_{N+1} = h(T', D_m), D_m ∈ D = {D_1, ..., D_M}
29/38

Dynamic Time Warping (DTW)
Let T = t_1, ..., t_N denote a time series
Let T' = t'_1, ..., t'_{N'} denote another time series
Let d denote a distance measure between elements of these time series
D(t_i, t'_j) = d(t_i, t'_j) + min( D(t_{i-1}, t'_j), D(t_i, t'_{j-1}), D(t_{i-1}, t'_{j-1}) )
DTW(T, T') = D(t_N, t'_{N'})
Computation via dynamic programming in O(N²), cf. [SC78]
30/38
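A direct implementation of this recurrence is short (a minimal sketch, not the authors' code; the element distance d defaults to the absolute difference):

```python
import numpy as np

def dtw(T, T2, d=lambda x, y: abs(x - y)):
    """Dynamic Time Warping distance between two series, O(N * N') time."""
    N, N2 = len(T), len(T2)
    D = np.full((N + 1, N2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, N2 + 1):
            # cost of matching t_i with t'_j plus the best predecessor cell
            D[i, j] = d(T[i - 1], T2[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, N2]
```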

Barycentric Averaging DTW
Let T = {T_1, ..., T_K} denote a set of time series
Let T_1 = t_{1,1}, ..., t_{1,N_1} denote a time series
...
Let T_K = t_{K,1}, ..., t_{K,N_K} denote a time series
The DTW barycentric average T̄ satisfies (cf. [PKG11]):
for all N' ∈ N and all T' = t'_1, ..., t'_{N'} :  Σ_{k=1..K} DTW(T̄, T_k)² ≤ Σ_{k=1..K} DTW(T', T_k)²
Computation via an iterative scheme in Θ(I K N²), where I (number of iterations) ≪ N
31/38
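One plausible sketch of this averaging scheme (DBA), reusing the DTW recurrence above to obtain an alignment path; all names are illustrative and this is not the authors' implementation:

```python
import numpy as np

def dtw_path(T, T2):
    """DTW alignment path between two 1-D series (pairs of aligned indices)."""
    N, N2 = len(T), len(T2)
    D = np.full((N + 1, N2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, N2 + 1):
            D[i, j] = abs(T[i - 1] - T2[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], N, N2
    while i > 0 and j > 0:               # backtrack from (N, N') to (1, 1)
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path

def dba(series, n_iter=10):
    """DTW barycentric averaging [PKG11]: start from one member of the set,
    then repeatedly replace each barycenter point by the mean of the points
    aligned onto it by DTW."""
    mean = np.array(series[0], dtype=float)
    for _ in range(n_iter):
        buckets = [[] for _ in mean]
        for s in series:
            for i, j in dtw_path(mean, s):
                buckets[i].append(s[j])
        mean = np.array([np.mean(b) if b else mean[i] for i, b in enumerate(buckets)])
    return mean
```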

Fast Dynamic Time Warping (FastDTW)
Multi-level approach for the computation of the dynamic time warping, cf. [SC04]
Linear spatial complexity
Linear temporal complexity
Approximation method enjoying a good precision (via the tuning parameter r)
Computation in Θ(I K r N)
32/38
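For experiments, the open-source fastdtw Python package provides such an approximation (usage sketch assuming that package is installed; the traces here are synthetic placeholders):

```python
import numpy as np
from fastdtw import fastdtw   # pip install fastdtw

# Two stand-in traces (e.g. SINR measurements in dB) of different lengths.
x = np.sin(np.linspace(0, 10, 1000))
y = np.sin(np.linspace(0, 10, 1200) + 0.3)

# radius corresponds to the tuning parameter r mentioned above.
distance, path = fastdtw(x, y, radius=5, dist=lambda a, b: abs(a - b))
print(distance, len(path))
```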

Preliminary Results. Estimations achieved with a precision of the order of a dB, for time horizons of the order of 1 s. 33/38

Agenda 1 Context 2 Current Controller 3 Proposed Controller 4 Mobility Prediction 5 Conclusion 34/38

Conclusion. Summary: Review of state-of-the-art controllers. Proposal of a modified and improved controller. Proposal of a mobility prediction mechanism (different from those proposed for inter-cell handover management). Work in progress/perspectives: Integration of the mobility prediction module into the controller. Enhancement of the mobility prediction mechanism. Design of a higher-level control system for several cells, or even an entire network. 35/38

References
[PKG11] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44(3):678-693, 2011.
[Put94] Martin Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 1994.
[SB98] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
[SC78] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43-49, 1978.
[SC04] Stan Salvador and Philip Chan. FastDTW: Toward accurate dynamic time warping in linear time and space. In KDD Workshop on Mining Temporal and Sequential Data. ACM, 2004.
36/38

Thank you! Thanks for your attention! Questions? This research work is funded by Orange and supported by the collaborative research project ANR NETLEARN (ANR-13-INFR-4). 37/38

Appendix: example of an MDP-based controller (figure: state-transition diagram over a grid of states). 38/38