Evaluation of dial-up behaviour of Internet users

Similar documents
Measurement and Modelling of Internet Traffic at Access Networks

Internet Dial-Up Traffic Modelling

The Dependence of Internet User Traffic Characteristics on Access Speed

Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover

Traffic Modelling for Fast Action Network Games

Internet Traffic Variability (Long Range Dependency Effects) Dheeraj Reddy CS8803 Fall 2003

A Statistically Customisable Web Benchmarking Tool

Issues of Variance of Extreme Values in a Heterogeneous Teletraffic Environment

Drop Call Probability in Established Cellular Networks: from data Analysis to Modelling

Observingtheeffectof TCP congestion controlon networktraffic

Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS Instructor: Michael Gelbart

Accelerated Simulation Method for Power-law Traffic and Non- FIFO Scheduling

Traffic Control by Influencing User Behavior

Basic Queuing Relationships

Internet and Services

Chapter 1. Introduction

z Department of Economics Institute of Information Systems lines, risk the caprices of shared resources èbesteæort

CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION

Traffic Forecasting Models for the Incumbent Based on New Drivers in the Market

Electrical Resonance

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

A Policy-Based Admission Control Scheme for Voice over IP Networks

Comparative Traffic Analysis Study of Popular Applications

Flow aware networking for effective quality of service control

Stochastic Processes and Queueing Theory used in Cloud Computer Performance Simulations

modeling Network Traffic

Traffic forecast models for the transport network

RESOURCE ALLOCATION FOR INTERACTIVE TRAFFIC CLASS OVER GPRS

Introduction to time series analysis

Traffic Analysis for Voice over IP

Maximizing the number of users in an interactive video-ondemand. Citation Ieee Transactions On Broadcasting, 2002, v. 48 n. 4, p.

A Passive Method for Estimating End-to-End TCP Packet Loss

Performance of networks containing both MaxNet and SumNet links

Network Traffic Invariant Characteristics:Metering Aspects

Aachen Summer Simulation Seminar 2014

Performance Analysis of Session-Level Load Balancing Algorithms

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

A NEW TRAFFIC MODEL FOR CURRENT USER WEB BROWSING BEHAVIOR

Cable Access Q&A - Part One

OPTIMIZING WEB SERVER'S DATA TRANSFER WITH HOTLINKS

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

VOICE OVER WI-FI CAPACITY PLANNING

Knowledge Discovery in Stock Market Data

VoIP Network Dimensioning using Delay and Loss Bounds for Voice and Data Applications

TRAFFIC control and bandwidth management in ATM

Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems*

Chapter 3 RANDOM VARIATE GENERATION

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

TRAFFIC ENGINEERING OF DISTRIBUTED CALL CENTERS: NOT AS STRAIGHT FORWARD AS IT MAY SEEM. M. J. Fischer D. A. Garbin A. Gharakhanian D. M.

A logistic approximation to the cumulative normal distribution

QoS of Internet Access with GPRS

A Complex Network Structure Design for Load Balancing and Redundant

Quality Matters: Some Remarks on Internet Service Provisioning and Tariff Design

Examining Self-Similarity Network Traffic intervals

Electronic Communications Committee (ECC) within the European Conference of Postal and Telecommunications Administrations (CEPT)

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

When to Refinance Mortgage Loans in a Stochastic Interest Rate Environment

Using Network Activity Data to Model the Utilization of a Trunked Radio System

What Does the Normal Distribution Sound Like?

Data Transforms: Natural Logarithms and Square Roots

Module 5: Measuring (step 3) Inequality Measures

VOIP TRAFFIC SHAPING ANALYSES IN METROPOLITAN AREA NETWORKS. Rossitza Goleva, Mariya Goleva, Dimitar Atamian, Tashko Nikolov, Kostadin Golev

Smart Queue Scheduling for QoS Spring 2001 Final Report

Deployment of express checkout lines at supermarkets

Characterizing Video Access Patterns in Mainstream Media Portals

Waiting Times Chapter 7

ANALYSIS OF LONG DISTANCE 3-WAY CONFERENCE CALLING WITH VOIP

Session 4.1. Network Planning Strategy for evolving Network Architectures Session Service matrix forecasting. Broadband

Twitter Analytics: Architecture, Tools and Analysis

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution

Multiobjective Cloud Capacity Planning for Time- Varying Customer Demand

Realizing quality of service guarantees in multiservice networks

Simple linear regression

Content Distribution Scheme for Efficient and Interactive Video Streaming Using Cloud

Performance of voice and video conferencing over ATM and Gigabit Ethernet backbone networks

Traffic Behavior Analysis with Poisson Sampling on High-speed Network 1

Capacity planning and.

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

ON THE FRACTAL CHARACTERISTICS OF NETWORK TRAFFIC AND ITS UTILIZATION IN COVERT COMMUNICATIONS

Disjoint Path Algorithm for Load Balancing in MPLS network

Acceleration levels of dropped objects

Fuzzy Active Queue Management for Assured Forwarding Traffic in Differentiated Services Network

Review of Fundamental Mathematics

Interactive Logging with FlukeView Forms

The Case for Quality of Service On Demand

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Journal of Engineering Science and Technology Review 2 (1) (2009) Lecture Note

Resource Provisioning and Network Traffic

THE EFFECT OF HOLDING TIMES ON LOSS SYSTEMS

DEFINING AND COMPUTING EQUIVALENT INDUCTANCES OF GAPPED IRON CORE REACTORS

CURRENT wireless personal communication systems are

INTERNET TRAFFIC ANALYSIS WITH GOODNESS OF FIT TEST ON CAMPUS NETWORK

LOGNORMAL MODEL FOR STOCK PRICES

IST 301. Class Exercise: Simulating Business Processes

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Measurement with Ratios

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS

Correlation key concepts:

Radio Resource Allocation in GSM/GPRS Networks

Transcription:

Evaluation of dial-up behaviour of Internet users Dipl.-Ing. Johannes F rber, Dipl.-Ing. Stefan Bodamer, Dipl.-Ing. Joachim Charzinski 2 University of Stuttgart, Institute of Communication Networks and Computer Engineering (IND), Germany 2 Siemens AG, Public Communication Networks, Germany Abstract In times of Internet access becoming a popular consumer application even for normal residential users, some telephone exchanges are congested by customers using modem or ISDN dial-up connections to their Internet Service Providers. In order to estimate the number of additional lines and switching capacity required in an exchange, the Internet access traffic must be characterised in terms of holding time and call interarrival time distributions. In this paper, we analyse six months worth of log files tracing the usage of the central modem pool and ISDN access lines at University of Stuttgart. Mathematical distributions are fitted to the measured data and the fit quality is evaluated with respect to the blocking probability observed in a multiple server loss system loaded by the described traffic. Introduction Internet access, especially for WWW based services is becoming a popular application with residential as well as business users. While business users are often connected to the Internet via leased lines, most residentials use the local telephone network to establish a dial-up connection to their Internet Service Provider (ISP). Apart from seizing a modem in the ISP s modem pool, each dial-up connection occupies one telephone line in a local exchange. In order to evaluate the additional traffic load imposed on a telephone exchange by Internet access traffic, the traffic characteristics must be well known. In this paper, we present an analysis of the Internet access traffic logged at a modem pool (56 modems shared by 3620 students) and an ISDN line access pool (30 B channels shared by 372 users) at the University of Stuttgart computing centre (RUS). Data mean holding time (at start of session) [min] 40 35 30 25 20 5 0 modem workdays modem non-workdays isdn workdays isdn non-workdays 5 0 4 8 2 6 20 24 hour of day Fig. : Session holding time during course of the day were collected during six months from May through October 997 with a resolution of one minute (modem pool) or one second (ISDN access). Section 2 describes the mean and distribution of holding times as well as access call interarrival times and the resulting daily traffic load. In Section 3, different mathematical distribution functions are fitted to the holding time and call interarrival time distributions. In Section 4, the fit quality is evaluated by comparing the blocking probability observed by loading different numbers of servers (i.e. phone lines) with random traffic generated according to the fitted distributions. 2 Session Behaviour The automatic monitoring of user login times at the dial-up access service of the University of Stuttgart allows the evaluation of characteristic measures on session level. The data allow to distinguish between modem users and ISDN users. Note that there is no way to specify the type of traffic the user has started. While in most cases it can be expected to be a World Wide Web session, it may also be a telnet session, an ftp retrieval,, an email transfer or a mix of those traffic types. In the following sections we describe the holding time of the sessions, the interarrival time between session starts and the mean daily traffic profile for traffic load. These measures allow to characterise the frequency and duration of a typical user session. 2. Holding Time By the holding time of a session we mean the duration of the seizure of an access line. The mean holding time for modems was 2 minutes while ISDN users

modem isdn 0:00-24:00 mean=0.60min CoV=2.2 0:00-:00 mean= 6.60min CoV=.88 22:00-23:00 mean=5.50min CoV=.9 CCDF = P { holding time > t } CCDF = P { holding time > t } 0 4 8 2 6 20 24 time t [hours] 0 60 20 80 240 300 360 420 480 Fig. 2: Session holding time ccdf were online for an average of minutes. The holding time shows a high variability, i.e. it varies strongly and reaches a maximum of as much as 4 days. Also, the average holding time varies during the course of the day. Fig. shows the average holding times of sessions of both user groups associated with the time of the session start for each hour of the day. Although this representation has to be regarded with caution (the average during the night is calculated from a relatively small number of sessions), it allows the conclusion that long sessions start mainly during the night and early morning hours. The mean session length at night is significantly larger than during day time. Sessions on non-workdays are longer in the average. Note that sessions of ISDN users seem to be much shorter than those of modem users. An explanation for this is given in Section 2.2. In [8] Morgan reports a significant peak in holding time at 4 a.m. If the holding time is associated with the session endings, a similar peak is observed in our data at 4 a.m. This means that among sessions ending in the early morning have lasted for a long time. The high variability of the holding time is visible in the complementary cumulative distribution function (ccdf) which is depicted in Fig. 2. The function shows the probability of the holding time being greater than the value on the horizontal axis. While there is a high probability for short sessions, the logarithmic presentation reveals that there is a small but not negligible probability for very long sessions of 20 hours and more. This so called heavy tail is an indication for high variability of large values [4]. The coefficient of variation (CoV) of the holding time data is around 2.5 for modem and 2.2 for ISDN based access. The shift of the average holding time during the course of the day shown in Fig. suggests that the instationarity of mean holding times contributes to the high variation of the overall holding time distribution. Fig. 3: Session holding time ccdf in different periods of the day (ISDN only) Therefore, the variability of the holding time might not be as high if a shorter period is regarded instead of the whole day. In Fig. 3 we present the ccdfs of the holding times of only several specific hours of a day (the most interesting busy hours, see section 2.3). The coefficients of variation for those periods are in fact smaller, but not much, i.e. the heavy tail is still visible. 2.2 Interarrival Time In our context, the session interarrival time is the time between two consecutive session beginnings of the aggregate traffic as seen by the access provider. This means that blocked calls are not detected and that session arrivals are not to be confused with call attempts. The mean session interarrival time was 44 seconds for modem sessions and 0 seconds for ISDN sessions. To allow a comparison, those absolute numbers have arrival rate per user [/day] 3.5 3 2.5 2.5 0.5 modem workdays modem non-workdays isdn workdays isdn non-workdays 0 0 4 8 2 6 20 24 hour of day Fig. 4: Session arrival rate during course of the day

to be put into context to the number of users producing the summary traffic. Therefore the values in the following two figures are related to the corresponding numbers of users. CCDF = P { interarrival time * number of users > t } 0 5 0 5 20 25 30 35 time t [days] modem isdn Fig. 5: Scaled ccdf of session interarrival time CCDF = P { interarrival time > t } 0:00-24:00 mean=2.24min CoV=.76 0:00-:00 mean=2.03min CoV=.3 22:00-23:00 mean=.44min CoV=.37 0 5 0 5 20 25 30 35 40 45 50 Fig. 6: Session interarrival time ccdfs in different periods of the day (ISDN only) Fig. 4 depicts the session arrival rate per user (the reciprocal of the interarrival time) during the course of the day. It shows that only a few sessions start in the early morning hours and that most sessions take place during the day and evening hours. The significant step at 6 p.m. on workdays is due to the cheaper telephone tariff starting at that hour in Germany. There is a remarkably high number of sessions starting in the late night. It can be seen that ISDN users cause a much higher call rate than modem users. When considering the arrival rates together with mean holding times (Fig. ), it is clear that the overall session lengths per user and per day are roughly identical for modem and ISDN users. The fast and easy setup of connections via ISDN seems to lead to a different user behaviour, i.e. more but shorter sessions are generated. Again we find a high variability for the interarrival time ranging from zero to several hours. The scaled ccdf in Fig. 5 also shows the typical heavy tail. From the above, it is obvious that modem sessions have a higher probability for longer interarrival times than ISDN sessions. Note that this presentation is only valid for the description of summary traffic of multiple users. As reported in [6], the ccdfs of the interarrival time of sessions of an individual user show significant periodic steps every 24 hours. Similar to the preceding section, we present the ccdfs of interarrival times during several specific hours only in Fig. 6. The ccdf for the period between 0 a.m. and a.m. is almost an exponential function (a straight line in the logarithmic presentation). The corresponding coefficient of variation is much smaller than e.g. between 0 p.m. and p.m. where we still observe a high variability. 2.3 Traffic Load To cope with the originated traffic, a telephone network must offer sufficient resources in terms of bandwidth (i.e. telephone lines) and connection setup capacity (i.e. processing power). Therefore we distinguish between traffic load related to user traffic (the seizure of the modem lines and ISDN channels) and signalling load, corresponding to the call arrival rate, to describe the actual load of the access network. Erlang per line 0.8 0.6 0.4 0.2 modem workdays modem non-workdays isdn workdays isdn non-workdays 0 0 2 4 6 8 0 2 4 6 8 20 22 hour of day Fig. 7: Mean daily traffic profile While the average signalling load can be seen in Fig. 4, the user traffic load is shown in Fig. 7. By depicting the traffic load per line, the utilisation of both services can be compared. While the 30 channels of the ISDN service are sufficient for 370 users, 56 modems are obviously not enough for the 3600 students. The flat

shape of the modem traffic load around midnight indicates that all modem lines were busy on most days and it can be assumed that many calls have been blocked. The general shape of the profiles is almost complementary to the typical telephone traffic profile, which has its busy hour from 0 a.m. to a.m. and only a small traffic load in the evening and during the night. The most striking characteristics of the traffic profile are the two steps at 6 p.m. and 9 p.m. on workdays. As mentioned above, these times mark the beginning of cheaper telephone tariffs during the observed period. The boundaries of the tariff periods are drawn as vertical lines. A small peak is also visible right before 9 a.m., when the expensive day tariff starts. On weekends and holidays this day tariff between 9 a.m. and 6 p.m. is does not exist and only a sharp increase in traffic load at 9 p.m. can be observed. The user behaviour follows the telephone tariffing scheme amazingly accurately. The time consistent busy hour for user traffic load is found at 0 p.m. but for the arrival rate it is found earlier at 6 p.m. In [3] Bolotin points out that these two busy hours are significantly shifted against each other for Internet traffic compared to telephone traffic. The phenomenon is caused by much longer holding times of around 20 minutes compared to 3 minutes for classical telephone calls. 3 Modelling Session Behaviour For performance evaluation of communication systems by performance analysis or simulation, source traffic has to be modelled to assess the system behaviour. Complex traffic is best described with the help of empirical data. Either a logged traffic trace is replayed into the system model, or a mathematical description can be found to generate stochastic traffic of similar characteristics. To describe traffic load on session level, it is important to know about the holding time and the session interarrival time. The complementary cumulative distribution functions of these measures capture their most important characteristics. If mathematical functions can be found that describe the ccdfs, they can be used for performance evaluation. While the classical telephone traffic is appropriately described with negative-exponentially distributed holding times and call interarrival times, this is no longer true for Internet access session traffic any more. The high variability of the measures described above is not captured by this distribution. For a more accurate description of the new traffic other distributions have been proposed like Pareto, hyperexponential, or lognormal, e.g. in [], [2], [5], [7]. In a previous attempt to describe the ccdfs, we used a least square fitting algorithm to fit some of the distributions mentioned above to the empirical ccdfs [6]. Although the results looked quite good, they were by far inferior to the exponential distribution if used for the simulation of a loss system. Exp. Hyperexp. f ( t) = exp m k f ( t) = p ( t) (order k) i i exp i i = t m ( ln t ) f ( t) = exp 2 t 2 f ( t) = t exp Tab. : Probability density functions for exponential, hyperexponential, lognormal and In this paper we fit the above mentioned distributions by calculating their parameters from the mean and the coefficient of variation given by the empirical ccdfs instead. Tab. shows the mathematical formulae for probability density functions of the most promising distributions, which can easily be determined by mean and CoV. Also, we evaluate the ccdfs for the most critical periods of the day, i.e. the busy hours. Tab. 2 shows the mean and CoV of the ccdfs for these cases, which are also depicted in Fig. 3 and in Fig. 6. For simplicity and because of a much better granularity of the raw data, the following sections deal only with the modelling of ISDN sessions. interarrival time: 0 a.m. - 2 p.m. 0 a.m. - a.m. 0 p.m. - p.m. holding time: 0 a.m. - 2 p.m. 0 a.m. - a.m. 0 p.m. - p.m. mean [min] 2.24 2.03.44 0.60 6.60 5.50 Tab. 2: Mean and coefficient of variation for interarrival time and holding time (ISDN) 2 t 2 CoV.76.3.37 2.2.88.9 Fig. 8 and Fig. 9 shows the resulting ccdfs for the holding time and the interarrival time, respectively, between 0 p.m. and p.m. in comparison to the empirical distribution. All functions have the same mean and variability (except the exponential distribution with a CoV of ). The corresponding figures

for the whole day and for 0 a.m. - a.m. are not depicted but look very similar. CCDF = P { holding time > t } Trace (mean=5.50min, CoV=.9) HyperExp 2 0 60 20 80 240 300 360 420 480 Fig. 8: Fitted functions to ccdf of the session holding time between 0 p.m. and p.m. In the case of an M/M/n system where interarrival and holding times are both assumed to be exponentially distributed, the loss probability B can be obtained analytically using the well-known Erlang loss formula: B = A n n! n A i i = 0 i! where A denotes the offered load defined as the ratio of mean holding time and mean interarrival time. If interarrival and holding times are described by distributions different from the exponential distribution, simulations are used to obtain the loss probabilities. This is the case for the empirical distributions obtained from the trace evaluation (see Section 2) as well as for the approximating distributions of hyperexponential, lognormal and type found in Section 3. CCDF = P { interarrival time > t } Trace (mean=.44min, CoV=.37) HyperExp 2 loss probability Measurement HyperExp2 0 5 0 5 20 25 30 35 40 45 50 0-6 4 6 8 0 2 4 6 8 20 number of servers Fig. 9: Fitted functions to ccdf of the session interarrival time between 0 p.m. and p.m. All fitted ccdfs in both diagrams are quite close to the emprical ccdf in the range of small values for holding and interarrival time, respectively. The distribution tails, however, can only be approximated reasonably well by the lognormal distribution, while the other fitted ccdfs do not capture the heavy tail effect. 4 Performance Evaluation The fitting results are validated for the case of a simple G/G/n loss system as it is often used to model communication systems. In this model, n represents the number of servers which could be, e.g., modems, ISDN cards, or access lines. Fig. 0: Loss probability for 0 a.m. -2 p.m. traffic loss probability 0-6 Measurement HyperExp2 2 4 6 8 0 2 4 6 number of servers Fig. : Loss probability for 0 a.m. - a.m. traffic

loss probability 0-6 Measurement HyperExp2 2 4 6 8 20 22 24 26 28 30 number of servers Fig. 2: Loss probability for 0 p.m. - p.m. traffic The results of the performance analysis and simulation for the empirical and the fitted distributions for the periods of the day presented in Section 3 are depicted in Fig. 0, Fig., and Fig. 2. The corresponding distribution type indicated for each curve in the diagrams is related to both interarrival and holding times. For all investigated periods of the day it becomes obvious that the M/M/n approximation using negativeexponential distributions underestimates the actual loss probability. An underestimation of the loss probability means that a dimensioning based on the M/M/n loss system would give a value for the necessary number of servers which is too small. The effect is most significant in the case referring to the whole day traffic, but also for the period from 0 p.m. to p.m. The approximation evolves to give the most exact results in all three cases. The hyperexponential distribution yields a conservative approximation, i. e. the loss probability curve is quite close to that associated to the empirical distribution but always a little bit above. The results obtained for the lognormal distribution on the other hand are also quite exact, but too optimistic. 5 Conclusion We have presented the dial-up behaviour of modem and ISDN users at the University of Stuttgart. Major results of the data evaluation are: the same long holding times as reported by other researchers have been observed, ISDN users generate more but shorter sessions (likely due to fast setup), the dial-up usage follows the telephone tariffing scheme with high accuracy (the Internet access itself was provided for free) and the session interarrival time and the session holding time show very high variability (heavy tailed distributions) The presented results are based on empirical data of a special user group, i.e. the students and members of staff of the University of Stuttgart. Although this group might not represent the general Internet user, it does represent a large group of users going online after work hours. The empirical ccdfs of interarrival time and holding time during the busy hours were approximated by fitting several distributions. The fit was based on the mean and the coefficient of variation instead of applying a least square fit as it was done in an earlier work. The quality of the fitted distributions has been proven by the performance evaluation of a simple loss system. The models with and the hyperexponentially distributed interarrival and holding times gave the best approximations of the loss probability compared to the empirical distributions. 6 References [] Bolotin, V.A.: Telephone Circuit Holding Time Distributions, Proceedings of the ITC 4, pp. 25-34, Antibes, France, 994. [2] Bolotin, V.A.: Modelling Call Holding Time Distributions for CCS Network Design and Performance Analysis, IEEE Journal on Selected Areas in Communications, pp. 433-438, April 994. [3] Bolotin, V.A.: New Subscriber Traffic Variability Patterns for Network Traffic Engineering, Proceedings of the 5th International Teletraffic Congress ITC 5, Vol. 2, pp. 867-878, Washington, 997. [4] Crovella, M., Bestavros, A.: Performance Characteristics of World Wide Web Information Systems, Tutorial at the SIGMETRICS 97, 997. [5] Crovella, M., Bestavros, A.: Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Proceedings of the ACM SIGME- TRICS`96, pp. 60-69, 996. [6] F rber, J., Bodamer, S., Charzinski, J.: Measurement and Modelling of Internet Traffic at Access Networks, EUNICE 98, Munich, 998. [7] Feldmann, A., Whitt, W.: Fitting mixtures of exponentials to long-tailed distributions to analyze network performance models, Performance Evaluation 3, pp. 245-79, 998. [8] Morgan, S.: The Internet and the Local Telephone Network: Conflicts and Opportunities, IEEE Communications Magazine, pp. 42-48, January 998.