Call Centers with Hyperexponential Patience Modeling

Alex Roubos (1), Oualid Jouini (2)

(1) Department of Mathematics, VU University Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
(2) Laboratoire Génie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290 Châtenay-Malabry, France

a.roubos@vu.nl, oualid.jouini@ecp.fr

International Journal of Production Economics, 141:307-315, 2013.

Abstract

An important feature in call center modeling is the presence of impatient customers. In this paper we show, using real data, that we can realistically model the patience distribution by the hyperexponential distribution. Since the hyperexponential distribution is a mixture of exponential distributions, an analytical Markov chain analysis is performed. A framework is developed in order to compute all kinds of practical service levels. This framework utilizes the recursive relation between the queue lengths at successive service completion epochs. Our approach shows overall better performance compared to current algorithms. Moreover, the computation times are short and our approach can therefore readily be applied in practice.

Keywords: call centers; impatient customers; hyperexponential distribution; stochastic modeling; continuous-time Markov chains; service levels.

1 Introduction

Call centers have been a fruitful research area for many researchers during the past couple of decades, as demonstrated by the extensive reference lists in [Gans et al., 2003] and [Akşin et al., 2007]. Yet, there are still many important challenges to overcome. One of these challenges is how to model customers' patience. In most call center systems in practice, customers are not infinitely patient. They are willing to wait for service for only a limited amount of time. If they are not served within that time, they abandon, i.e., leave the system.
We propose to model customers' patience by the hyperexponential distribution. As we will show in this paper, motivated by real call center data, the hyperexponential distribution turns out to be a very accurate representation of the patience distribution. The model we use is the $M/M/s + H_2$ queueing model. We show how we can model call centers in this way and derive standard service level performance measures. We develop a framework for computing these service levels based on the relation between the queue lengths at successive service completion epochs. Our approach leads to the complete characterization of the waiting time distribution.

Analysis of queueing systems with impatient customers has been done before. The earliest work mentioned in the literature is found in [Palm, 1937]. In [Stanford, 1979], single-server queues with general service time distributions are studied. For the fully general single-server GI/GI/1 + GI queue, stability conditions are derived in [Baccelli and Hebuterne, 1981], and for the M/GI/1 + GI queue, the distribution function of the waiting time is provided.

When focusing on impatient customers in a multi-server environment, there are ample resources available in the literature. In [Boxma and de Waal, 1994], insensitive bounds and several approximations for the abandonment probability in the M/GI/s + GI queue are developed. [Brandt and Brandt, 1999, Brandt and Brandt, 2002] consider the state-dependent M(n)/M(m)/s + GI queue in which the arrival rate depends on the number of customers in the system and in which the service rate depends on the number of busy servers. They derive the steady-state distribution of the number of customers in the system and various waiting time distributions. In [Mandelbaum and Zeltyn, 2004] the impact of the patience distribution on the performance is studied for the M/M/s + GI queue.
They observe an approximate linearity between the abandonment probability and the average waiting time, for many practical abandonment parameters. [Iravani and Balcıoğlu, 2008] propose two approximations to analyze the M/GI/s + GI queue. Both approximations are based on scaling the M/GI/1 + GI queue to obtain estimates for the waiting time distributions. In the context of call centers, [Jouini et al., 2009] study the impact of announcing delays in a setting of multiple
customer classes with Markovian abandonment.

Concerning the estimation of the patience distribution from real call center data, published resources are scarce. There is no general consensus on the distribution of patience times; it rather depends on the type of call center. [Baccelli and Hebuterne, 1981] show that an Erlang distribution with three phases could work well in some cases. However, [Kort, 1983] claims that the patience distribution is Weibull. In [Brown et al., 2005], it was observed that the patience distribution is not exponential, as is usually assumed for the call center models in the literature. Contrary to service systems, various studies of statistical process control have been conducted for manufacturing systems. We refer the reader to [Colledani and Tolio, 2009], [Wallström and Segerstedt, 2010], and references therein. In this paper, we show through various data sets that the hyperexponential distribution fits better than the distributions proposed earlier in the literature.

A paper of special interest to us is [Whitt, 2005]. There, an algorithm is developed to compute approximations for the standard steady-state performance measures of the M/GI/s/r + GI queue. Whitt approximates this queueing system by the M/M/s/r + M(n) queue, where M(n) denotes state-dependent abandonment rates. A positive feature is that the state-dependent abandonment rates can be obtained directly from historical data. In this paper, we compare our approach applied to real call center data with Whitt's model. One of the conclusions of Whitt is that the behavior of the patience distribution near the origin primarily affects the steady-state performance measures. Moreover, [Brown et al., 2005] observed that, although the patience is not exponential, after a while the hazard rate is approximately constant. This was confirmed in [?], and also [?].
They model the patience by a distribution that consists of a discrete mass at zero (balking) and a remaining exponential distribution. While their approach is analytically simpler, we believe that the hyperexponential model offers superior accuracy. Our objective in this paper is to model patience times as accurately as possible, while still
being able to derive exact results. This is possible with the $M/M/s + H_2$ queueing model. Of course, we can use the same model as an approximation for the $M/GI/s + H_2$ queue, the same approximation that Whitt makes. However, that is not in the scope of this paper. Because the analysis of queueing systems with general patience distributions is very difficult, one usually has to fall back on approximations. In this paper, we fill the gap between an exact analysis and the use of approximations for systems where the patience is not simply exponential. In particular, using real data sets, we compare the results of our approach with those of Whitt and show that the former are at least as good, if not better.

The remainder of the paper is organized as follows. In Section 2 we consider customer behavior obtained from four different data sets and show that the patience can be modeled by the hyperexponential distribution. In order to compute service level performance measures, we develop a framework for a queueing system with hyperexponentially distributed patience in Section 3. The performance of this model is illustrated in Section 4, which shows that our model has overall a very good performance. Some reflections on computational issues are presented in Section 5. Finally, we give concluding remarks and highlight some future research in Section 6.

2 Hyperexponential Patience Distribution in Call Centers

In this section, we conduct a statistical analysis on call center data in order to assess the fit of a hyperexponential distribution for patience times. The hyperexponential distribution considered here is a mixture of two exponential distributions such that with probability $p$ it is exponential with rate $\gamma_1$ (type 1) and with probability $1 - p$ it is exponential with rate $\gamma_2$ (type 2). Let $X_1$ and $X_2$ be the exponentially distributed random variables for types 1 and 2, respectively, with distribution functions $F_{X_i}(t) = 1 - e^{-\gamma_i t}$, $i = 1, 2$. If $X$ is hyperexponential, its cumulative distribution function (cdf) $F_X$ is given by
$$F_X(t) = p\,F_{X_1}(t) + (1 - p)\,F_{X_2}(t),$$
for $t \ge 0$. As it turns out, the hyperexponential distribution is a good model for customers' patience in call centers. We will show this by means of different data sets.

Data in call centers are usually very detailed. For our purpose we only need to know the time that customers have spent waiting and whether or not an abandonment occurred at the end of the waiting time. For the customers that have abandoned we know exactly what their patience is. However, for customers that did not abandon (but received service) we only know that their patience is greater than the time they have waited. To be more precise, we observe the minimum of the patience and the virtual waiting time, and we also know which one we observe. The virtual waiting time is defined as the waiting time of a tagged customer with infinite patience. The data on patience times in this situation are called right-censored data. Techniques exist to deal with censored data, one of which is the Kaplan-Meier estimator [see Kaplan and Meier, 1958]. The result of the Kaplan-Meier estimator is the empirical cdf $\hat{F}(t)$ of the patience. By taking the derivative we can obtain the probability density function $f(t)$. Afterwards, the hazard rate
$$h(t) = \frac{f(t)}{1 - \hat{F}(t)}$$
is easily obtained. In Figure 1 several empirical hazard rates are displayed, together with the hazard rates of the hyperexponential distribution. The parameters of the hyperexponential distribution are obtained by minimizing the mean squared error between $\hat{F}(t)$ and $F_X(t)$. Table 1 lists these parameters.

The figure suggests the following. Data set 4 is the perfect example of hyperexponential patience. The empirical hazard rate is approximately non-increasing, and the hazard rate of the hyperexponential distribution follows it very closely. Data sets 1 and 3 are somewhat different in the sense that up to one minute there are several peaks in the hazard rates. These peaks are caused by announcements played to waiting customers, each of which triggers another burst of abandonments.
As a consequence, the hazard rate is overestimated in the first minute on data set 3. This does not seem to be the case for data set 1. Data set 2 behaves differently at the start: its hazard rate is low for the first 0.25 minutes. Nevertheless, the fit of the hyperexponential distribution on this data set looks good enough.
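To make the fitted curves concrete, here is a minimal Python sketch that evaluates the cdf, density, and hazard rate of a two-phase hyperexponential distribution from a triple (p, γ1, γ2). The function names are illustrative and not taken from the paper. The parameter values are those listed for data set 4 in Table 1, and the time unit is assumed to be minutes, as on the axes of Figure 1.

```python
import math


def h2_survival(t, p, g1, g2):
    """P(X > t) for a mixture of Exp(g1) (weight p) and Exp(g2) (weight 1 - p)."""
    return p * math.exp(-g1 * t) + (1.0 - p) * math.exp(-g2 * t)


def h2_cdf(t, p, g1, g2):
    """F_X(t) = p * F_X1(t) + (1 - p) * F_X2(t)."""
    return 1.0 - h2_survival(t, p, g1, g2)


def h2_pdf(t, p, g1, g2):
    """Density: derivative of the cdf."""
    return p * g1 * math.exp(-g1 * t) + (1.0 - p) * g2 * math.exp(-g2 * t)


def h2_hazard(t, p, g1, g2):
    """Hazard rate h(t) = f(t) / (1 - F(t))."""
    return h2_pdf(t, p, g1, g2) / h2_survival(t, p, g1, g2)


# Parameters listed for data set 4 (assumed to be rates per minute).
P4, G1_4, G2_4 = 0.0583, 4.0780, 0.0742

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.5):
        print(f"t = {t:3.1f} min   hazard = {h2_hazard(t, P4, G1_4, G2_4):.4f}")
```

At t = 0 the hazard rate equals pγ1 + (1 − p)γ2, and it then decreases toward the smaller of the two rates, which is the non-increasing shape noted for data set 4.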
[Figure 1: Hazard rates of the patience of four different data sets. Panels: Data sets 1 to 4; each panel shows the empirical and the hyperexponential hazard rate against time (minutes).]

Earlier research [Baccelli and Hebuterne, 1981, Kort, 1983] mentioned that the patience distribution could be Erlang with three phases or Weibull. In Table 2 we compare these distributions with the hyperexponential distribution on two statistics. The first statistic is the mean squared error (MSE), which should be as low as possible for a good model. The second statistic is the p-value of the Kolmogorov-Smirnov test [Massey, 1951], which tests the null hypothesis that the empirical distribution and the tested distribution come from the same distribution. Values below the default significance level of $\alpha = 0.05$ reject this hypothesis. From the table it is clear that the hyperexponential distribution is the best model for customers' patience: it has the lowest MSE on every data set and a p-value at least as high as that of the alternatives. If we look at the p-values of the Kolmogorov-Smirnov test, we observe that the null hypothesis is actually rejected on data sets 2 and 3 at a significance level of 0.05. However, at a significance level of 0.01, the null hypothesis is not rejected for data set 2. In summary, even though the hazard rates can look totally different among different call centers, the hyperexponential distribution can realistically model customers' patience.

Data set   p        γ1       γ2
1          0.2222   2.3843   0.0603
2          0.6593   2.3986   0.0617
3          0.2734   1.3100   0.0735
4          0.0583   4.0780   0.0742

Table 1: The parameters of the hyperexponential distribution for the four data sets.

           Hyperexponential     Weibull              Erlang
Data set   MSE       p-value    MSE       p-value    MSE     p-value
1          7.13e-5   0.747      7.68e-4   0.002      0.031   1e-30
2          2.32e-4   0.018      2.38e-3   1e-11      0.052   5e-35
3          1.40e-4   0.006      1.88e-4   0.006      0.031   6e-28
4          2.67e-5   0.974      1.52e-4   0.424      0.014   9e-13

Table 2: Comparison of different patience distributions.

3 Markov Chain Model for Hyperexponential Patience

We consider a call center modeled as an $M/M/s + H_2$ queueing system, with arrival rate $\lambda$, service rate $\mu$, $s$ identical servers, and hyperexponentially distributed patience with parameters $p$, $\gamma_1$, and $\gamma_2$. This system can be described as follows. If there is at least one server available, an arriving customer is immediately taken into service. Otherwise, the arriving customer is placed at the end of an infinite-buffer queue. The customer's patience is exponentially distributed with rate $\gamma_1$ with probability $p$, and with probability $1 - p$ it is exponential with rate $\gamma_2$. We name these customers
type 1 and type 2 customers, respectively. Arriving customers are served in first-come first-served (FCFS) order. In service, both types of customers are identical (meaning they have the same service time distribution).

The $M/M/s + H_2$ queueing system can be modeled as a continuous-time Markov chain (CTMC). To simplify notation in the coming calculations, we let the state of the system be the scalar denoting the number of customers in service, if there are no customers waiting, and the two-dimensional vector denoting the number of queued customers of each type, if all servers are occupied. If we define $X(t)$ as the state of the system at time $t$, then the stochastic process $\{X(t), t \ge 0\}$ is a CTMC with state space $\mathcal{X} = \{0, 1, \dots, s\} \cup (\mathbb{N}_0 \times \mathbb{N}_0)$. Note that state $s$ is equal to the state $(0, 0)$. Figures 2 and 3 together show the complete transition diagram of the system.

The transition diagram warrants some explanation, which we give next. We consider state $(i, j)$. In this state there are $i + j$ customers waiting: $i$ of type 1 and $j$ of type 2. The transitions to states $(i + 1, j)$ and $(i, j + 1)$ are straightforward: customers arrive with rate $\lambda$, and an arrival is of type 1 with probability $p$ and of type 2 with probability $1 - p$, which gives the rates $p\lambda$ and $(1 - p)\lambda$. The $i$ type 1 customers together provide the transition rate $i\gamma_1$ to state $(i - 1, j)$. The same holds for the transition rate $j\gamma_2$ to state $(i, j - 1)$. These two transitions correspond to an abandonment of one of the customers in the queue. The final possible transition occurs if one of the servers completes its service. This happens with rate $s\mu$. In our model description, we stated that the first customer in line should then be taken into service. However, in the current CTMC formulation, the state $(i, j)$ does not provide information on the ordering of the customers waiting in the queue. We therefore choose to approximate the analysis using the so-called random order of service (ROS) scheduling discipline instead of FCFS.
Thus, with probability i/(i + j) a type 1 customer is first in line, and with probability j/(i+j) a type 2 customer is first in line. Later on when we compute the service level, an arriving customer that finds the system in state (i, j) has to wait for all of these i + j customers (who are served according to the ROS discipline) to have been removed from the
queue. The error made by approximating FCFS with ROS is therefore small. Also, the numerical results we provide later show that this modeling approach works well.

[Figure 2: Transition diagram of the first part of the queueing system (states $0, 1, \dots, s$, with service rates $\mu, 2\mu, \dots, s\mu$).]

[Figure 3: Transition diagram of the second part of the queueing system (states $(i, j)$).]

3.1 Steady-State Distribution

To compute the steady-state probability distribution, the balance equations can be used. However, the resulting distribution has no nice structure and it is not possible to give a closed-form expression for it. Therefore, we explain an alternative method to compute it.

First we compute the intermediate probabilities $p(i)$ for the transition diagram in Figure 2. Let $p(s) = 1$. For $i = s, \dots, 1$ we then obtain
$$p(i - 1) = \frac{i\mu\,p(i)}{\lambda}.$$
The intermediate probabilities $p(i)$ satisfy the local balance equations, but do not yet satisfy the normalizing condition. Next, we compute the other intermediate set of probabilities $p(i, j)$ for the transition diagram in Figure 3. We show later how to combine the two intermediate families of probabilities in order to derive the probabilities of our original system states. The way we compute the intermediate
probabilities $p(i, j)$ is to limit the size of this state space by $(M, N)$. See Section 5 for a discussion on how to choose $M$ and $N$. We then have a finite two-dimensional state space, which we can represent in one dimension. Let $f$ be the function that maps the two-dimensional state to the one-dimensional state, defined as
$$f(i, j) = j(M + 1) + i.$$
We proceed by constructing the generator matrix $Q$, which is the matrix formed from the transition rates. This matrix has the following entries:
$$
\begin{aligned}
Q(f(i, j), f(i + 1, j)) &= p\lambda, & i &= 0, \dots, M - 1, & j &= 0, \dots, N, \\
Q(f(i, j), f(i, j + 1)) &= (1 - p)\lambda, & i &= 0, \dots, M, & j &= 0, \dots, N - 1, \\
Q(f(i, j), f(i - 1, j)) &= \frac{i}{i + j}\,s\mu + i\gamma_1, & i &= 1, \dots, M, & j &= 0, \dots, N, \\
Q(f(i, j), f(i, j - 1)) &= \frac{j}{i + j}\,s\mu + j\gamma_2, & i &= 0, \dots, M, & j &= 1, \dots, N.
\end{aligned}
$$
Additionally, the diagonal of $Q$ consists of those entries such that each row sums up to zero. Finally, we obtain the intermediate probabilities $p(i, j)$ by solving $pQ = 0$ and $pe = 1$, where $e$ denotes the appropriately dimensioned vector of ones.

With both intermediate probabilities $p(i)$, $i = 0, \dots, s$, and $p(i, j)$, $i = 0, \dots, M$, $j = 0, \dots, N$, we can finally obtain the steady-state distribution $\pi$. Since the equality $p(s) = p(0, 0)$ must hold, we rescale each $p(i)$ by $p(0, 0)$, i.e., $p(i) \leftarrow p(i)\,p(0, 0)$, $i = 0, \dots, s$. To ensure that we get a probability distribution, all unique elements must sum up to one. The $p(i, j)$ already sum up to one, so the normalization constant becomes
$$C = 1 + \sum_{i=0}^{s-1} p(i).$$
In the end we obtain, for $i = 0, \dots, s$,
$$\pi(i) = \frac{p(i)}{C},$$
and for $i = 0, \dots, M$, $j = 0, \dots, N$,
$$\pi(i, j) = \frac{p(i, j)}{C}.$$
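The procedure of this subsection can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the linear system is solved with a textbook Gaussian elimination so that the block has no dependencies, and the truncation levels M and N are simply passed in.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            if factor != 0.0:
                for k in range(c, n + 1):
                    aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def steady_state(lam, mu, s, p, g1, g2, M, N):
    """Return (pi_s, pi_q): pi_s[i] for i = 0..s-1 and pi_q[i][j] for queue states."""
    size = (M + 1) * (N + 1)

    def f(i, j):
        return j * (M + 1) + i  # two-dimensional state -> one-dimensional index

    Q = [[0.0] * size for _ in range(size)]
    for j in range(N + 1):
        for i in range(M + 1):
            a = f(i, j)
            if i < M:
                Q[a][f(i + 1, j)] = p * lam
            if j < N:
                Q[a][f(i, j + 1)] = (1.0 - p) * lam
            if i > 0:
                Q[a][f(i - 1, j)] = i / (i + j) * s * mu + i * g1
            if j > 0:
                Q[a][f(i, j - 1)] = j / (i + j) * s * mu + j * g2
            Q[a][a] = -sum(Q[a])  # each row of a generator sums to zero

    # Solve p Q = 0, p e = 1: transpose Q and overwrite one (redundant)
    # balance equation with the normalization condition.
    A = [[Q[r][c] for r in range(size)] for c in range(size)]
    A[-1] = [1.0] * size
    pq = solve_linear(A, [0.0] * (size - 1) + [1.0])

    # First part: p(s) = 1, p(i-1) = i*mu*p(i)/lam, then rescale by p(0,0).
    ps = [0.0] * (s + 1)
    ps[s] = 1.0
    for i in range(s, 0, -1):
        ps[i - 1] = i * mu * ps[i] / lam
    scaled = [x * pq[f(0, 0)] for x in ps[:s]]

    C = 1.0 + sum(scaled)  # normalization constant
    pi_s = [x / C for x in scaled]
    pi_q = [[pq[f(i, j)] / C for j in range(N + 1)] for i in range(M + 1)]
    return pi_s, pi_q


if __name__ == "__main__":
    pi_s, pi_q = steady_state(lam=2.0, mu=1.0, s=3, p=0.1, g1=2.0, g2=1.0, M=10, N=10)
    print("P(immediate service) is approximately", sum(pi_s))
```

The dense elimination is cubic in the number of queue states, so it is only meant for small truncation levels; a sparse or iterative solver would be the natural substitute for larger ones.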
3.2 Performance Evaluation

Several performance measures can readily be obtained from the steady-state probability distribution. The probability that an arriving customer does not have to wait is given by
$$P(\text{Immediate service}) = \sum_{i=0}^{s-1} \pi(i).$$
The expected number of customers in the system, $\mathbb{E}L$, at an arbitrary moment in time is given by
$$\mathbb{E}L = \sum_{i=0}^{s-1} i\,\pi(i) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (s + i + j)\,\pi(i, j).$$
The expected number of customers in the queue, $\mathbb{E}L_Q$, at an arbitrary moment in time is
$$\mathbb{E}L_Q = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (i + j)\,\pi(i, j).$$
The expected waiting time in the queue, $\mathbb{E}W_Q$, of an arbitrary customer is
$$\mathbb{E}W_Q = \mathbb{E}L_Q / \lambda.$$
The probability that an arbitrary customer will abandon because of impatience is given by
$$P(A) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (i\gamma_1 + j\gamma_2)\,\pi(i, j) / \lambda.$$
Finally, the probability that an arbitrary customer will receive service is
$$P(S) = 1 - P(A).$$

Our performance measure of interest is the service level defined by $P(V_Q < \tau)$, where $V_Q$ denotes the virtual waiting time in the queue of an arbitrary tagged customer (with infinite patience). However, customers waiting in front of this tagged customer are still subject to abandonments. To compute the service level, we condition on the state as seen by an arriving customer. Thanks to the PASTA property (Poisson arrivals see time averages), the probabilities seen by this customer are identical to those seen by an external observer, i.e., the steady-state distribution $\pi$. Let us now denote by $L_Q$ the queue state when all servers are busy. For $i, j \ge 0$, $L_Q = (i, j)$ means that all servers are busy, and that $i$ and $j$ customers of types
1 and 2 are waiting in the queue upon the arrival of our tagged customer, respectively. We then may write
$$P(V_Q < \tau) = \sum_{i=0}^{s-1} \pi(i) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} P(V_Q < \tau \mid L_Q = (i, j))\,\pi(i, j).$$

To compute the probabilities $P(V_Q < \tau \mid L_Q = (i, j))$ we proceed as follows. A customer, finding $i$ type 1 and $j$ type 2 customers in the queue upon arrival, always has to wait for a service completion. The time it takes for one of the servers to complete its service is exponentially distributed with rate $s\mu$. Say this takes $t$ time units. During time $t$ each of the $i + j$ customers could have abandoned. The probability that $k$ out of the $i$ type 1 customers abandon during time $t$ is given by, for $i \in \mathbb{N}_0$, $k = 0, \dots, i$,
$$P(k \text{ out of } i \text{ abandonments during } t) = \binom{i}{k} (1 - e^{-\gamma_1 t})^k (e^{-\gamma_1 t})^{i-k}.$$
The same holds for the probability that $l$ out of the $j$ type 2 customers abandon during $t$. After the service completion epoch the first customer in line is immediately taken into service. Since $k + l$ customers have abandoned, with probability $(i - k)/(i + j - k - l)$ a type 1 customer is first in line and is taken into service. Hence, the queue length is reduced from $L_Q = (i, j)$ to $L_Q = (i - k - 1, j - l)$. On the other hand, with probability $(j - l)/(i + j - k - l)$ a type 2 customer is first in line and is taken into service. This decreases the queue length to $L_Q = (i - k, j - l - 1)$. Also, there is now only $\tau - t$ time left in order to reach the original service level. This process repeats itself until eventually the queue length becomes either $L_Q = (-1, 0)$ or $L_Q = (0, -1)$. When this happens the tagged customer is taken into service. Combining all of this we state that $P(V_Q < \tau \mid L_Q = (i, j))$
is recursively defined as, for $i \in \mathbb{N}_0$, $j \in \mathbb{N}_0$,
$$
\begin{aligned}
P(V_Q < \tau \mid L_Q = (i, j)) = \int_0^{\tau} s\mu e^{-s\mu t} \sum_{k=0}^{i} \sum_{l=0}^{j}
& \binom{i}{k} (1 - e^{-\gamma_1 t})^k (e^{-\gamma_1 t})^{i-k} \binom{j}{l} (1 - e^{-\gamma_2 t})^l (e^{-\gamma_2 t})^{j-l} \\
& \times \Big( \frac{i - k}{i + j - k - l}\, P(V_Q < \tau - t \mid L_Q = (i - k - 1, j - l)) \\
& \quad + \frac{j - l}{i + j - k - l}\, P(V_Q < \tau - t \mid L_Q = (i - k, j - l - 1)) \Big)\, dt, \qquad (1)
\end{aligned}
$$
with $P(V_Q < \tau \mid L_Q = (-1, 0)) = 1$ and $P(V_Q < \tau \mid L_Q = (0, -1)) = 1$. Note that if $k = i$ and $l = j$ (i.e., all customers in the queue have abandoned before a service completion) the tagged customer will reach the service level with probability one. In our expression, however, we divide by zero in this case ($i + j - k - l = 0$). Since the two terms inside the large parentheses in Equation (1) should sum up to one for $k = i$ and $l = j$, the issue is conveniently solved by defining $0/0 = 1/2$.

To solve the recursions, we start with $L_Q = (0, 0)$. It immediately follows that
$$P(V_Q < \tau \mid L_Q = (0, 0)) = 1 - e^{-s\mu\tau}.$$
Note that it remains a function of $\tau$. For $L_Q = (1, 0)$ we need to evaluate this function at $\tau - t$. After some algebra, it follows that
$$P(V_Q < \tau \mid L_Q = (1, 0)) = 1 - e^{-s\mu\tau} + \frac{s\mu}{\gamma_1} e^{-(s\mu + \gamma_1)\tau} - \frac{s\mu}{\gamma_1} e^{-s\mu\tau}.$$
This procedure can then be applied for all other $L_Q$ in the same way.

3.3 Numerical Validation

We validated our modeling approach using the numerical examples described as follows. As a base example we have an $M/M/s + H_2$ queueing system with the following parameters: $\lambda = 2$, $\mu = 1$, $s = 3$, $p = 0.1$, $\gamma_1 = 2$, and $\gamma_2 = 1$. From this base example we vary $\mu \in [0.5, 1.3]$ and $\gamma_2 \in (0, 12]$. We consider the service level defined by $\tau = 1/3$. We compare the service level in terms of the virtual waiting time obtained from our approach with the service level obtained by
means of simulations. The results are shown in Figure 4. As can be seen, our approach agrees with the simulations, thereby validating the modeling approach.

[Figure 4: Validation of our modeling approach with simulations. Two panels: service level against $\mu$ and service level against $\gamma_2$.]

3.4 Alternative Service Level Definitions

The service level is currently defined as the probability that the virtual waiting time is less than the acceptable waiting time $\tau$. Other definitions of the service level can also be derived. For instance, a service level of practical relevance is $P(V_Q < \tau, V_Q < X)$, where $X$ is distributed according to the hyperexponential patience distribution. This is the probability that the waiting time is less than the acceptable waiting time and that service occurred before abandonment. It is defined for all customers (abandoned or not). The derivation of this performance measure is straightforward. If we condition on the type $T$ of the tagged customer, everything is exponential again. We may write
$$
\begin{aligned}
P(V_Q < \tau, V_Q < X) = \sum_{i=0}^{s-1} \pi(i) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \Big( & p\, P(V_Q < \tau, V_Q < X \mid L_Q = (i, j), T = 1) \\
& + (1 - p)\, P(V_Q < \tau, V_Q < X \mid L_Q = (i, j), T = 2) \Big)\, \pi(i, j),
\end{aligned}
$$
where
$$
\begin{aligned}
P(V_Q < \tau, V_Q < X \mid L_Q = (i, j), T = z) = \int_0^{\tau} s\mu e^{-(s\mu + \gamma_z) t} \sum_{k=0}^{i} \sum_{l=0}^{j}
& \binom{i}{k} (1 - e^{-\gamma_1 t})^k (e^{-\gamma_1 t})^{i-k} \binom{j}{l} (1 - e^{-\gamma_2 t})^l (e^{-\gamma_2 t})^{j-l} \\
& \times \Big( \frac{i - k}{i + j - k - l}\, P(V_Q < \tau - t, V_Q < X \mid L_Q = (i - k - 1, j - l), T = z) \\
& \quad + \frac{j - l}{i + j - k - l}\, P(V_Q < \tau - t, V_Q < X \mid L_Q = (i - k, j - l - 1), T = z) \Big)\, dt,
\end{aligned}
$$
for $z = 1, 2$. The only difference with Equation (1) is that here with probability $e^{-\gamma_z t}$ the tagged type $z$ customer does not abandon before $t$ time units, the time it takes for the first service completion.

Using this service level and the previous virtual waiting time service level, we can immediately obtain other useful service level definitions. In [Jouini et al., 2011], the following definitions are derived:
$$\frac{\#\,\text{answered} < \tau}{\#\,\text{offered}} = P(V_Q < \tau, V_Q < X),$$
$$\frac{\#\,\text{answered} < \tau}{\#\,\text{offered} - \#\,\text{abandoned} < \zeta} = \frac{P(V_Q < \tau, V_Q < X)}{P(X > \zeta)\,P(V_Q > \zeta) + P(V_Q < \zeta, V_Q < X)},$$
$$\frac{\#\,\text{answered} < \tau}{\#\,\text{offered} - \#\,\text{abandoned} < \tau} = \frac{P(V_Q < \tau, V_Q < X)}{P(X > \tau)\,P(V_Q > \tau) + P(V_Q < \tau, V_Q < X)},$$
$$\frac{\#\,\text{answered} < \tau}{\#\,\text{answered}} = \frac{P(V_Q < \tau, V_Q < X)}{P(V_Q < X)},$$
$$\frac{\#\,\text{abandoned} < \tau}{\#\,\text{abandoned}} = \frac{P(X < \tau, V_Q > X)}{P(V_Q > X)},$$
$$P(\text{virtual waiting time} < \tau) = P(V_Q < \tau),$$
$$P(\text{actual waiting time} < \tau) = 1 - P(V_Q > \tau)\,P(X > \tau).$$
The counts (#) in the previous equations are defined on a time interval with infinite duration, in order to obtain the steady-state values of the performance measures. Here $\zeta \le \tau$ denotes the threshold for short abandonments. Short abandonments are, for example, calls that abandon within 5 seconds, while the acceptable waiting time is 20 seconds. The probability of service is simply $P(S) = P(V_Q < X)$, and that of abandonment is $P(A) = 1 - P(S) = P(V_Q > X)$. Note that
$\frac{\#\,\text{answered} < \tau}{\#\,\text{answered}}$ is the conditional cdf of the waiting time in the queue, given service, and $\frac{\#\,\text{abandoned} < \tau}{\#\,\text{abandoned}}$ is the conditional cdf of the waiting time in the queue, given abandonment. The actual waiting time is the time spent in the queue (denoted by $W_Q$ in Section 3.2), i.e., the minimum of the virtual waiting time and the patience, or the unconditional waiting time in the queue that ends with either an abandonment or a start of service. It can also be written as
$$P(\text{actual waiting time} < \tau) = \frac{\#\,\text{answered} < \tau}{\#\,\text{answered}}\, P(S) + \frac{\#\,\text{abandoned} < \tau}{\#\,\text{abandoned}}\, P(A).$$

4 Comparison with Whitt's Approach

In this section we illustrate the performance of our model and compare it with the approximate algorithm of [Whitt, 2005]. For both our model and Whitt's model we compute $P(V_Q < \tau, V_Q < X)$ and compare it with the real service level obtained by means of simulations. In the simulations we use the empirical cdf of the patience, obtained with the Kaplan-Meier estimator. In Whitt's method we use the hazard rate derived from the empirical patience distribution. Our model fits a hyperexponential distribution on the empirical data. For the four data sets under consideration, the parameters related to the patience distribution are shown in Table 1.

For the first example we consider a system with $\lambda \in (0, 3]$, $\mu = 1$, $s = 3$, and $\tau = 1/3$. The service level estimates for the simulation model, our model, and Whitt's model are depicted in Figure 5. All plots in the figure show that our model is very close to the simulations and that it outperforms Whitt's model, although the differences on data set 4 are small. The performance on data set 3 is noteworthy, since the hazard rate appeared to be overestimated as shown in Figure 1. Also, Whitt's model significantly underestimates the service level when the offered load is increased on this data set. On data sets 1 and 2 Whitt's model shows irregular behavior, which is probably caused by the small system size in this example.
For the second example we consider a larger system with $\lambda \in [8, 12]$, $\mu = 0.2$, $s = 54$, and $\tau = 1/3$. The results of the comparison are shown in Figure 6. The results on data set 1 indicate
that our model is a bit lacking in performance in the situation where the system is overloaded without abandonments ($\lambda \ge 10.8$). However, the differences are small. On data set 2 our model underestimates the service level by the same amount that Whitt's model overestimates it, so neither model is clearly preferable to the other. On data set 3 both our model and Whitt's model severely underestimate the service level. However, our model performs significantly better, since its error is less than half the error of Whitt's model. Because of the relatively large error on this data set, we have performed an additional validation that consists of simulating the $M/M/s + H_2$ model. This shows that there is a modest approximation error. On data set 4 the performances are equal.

[Figure 5: Comparison of the models, for a system with $\mu = 1$, $s = 3$, and $\tau = 1/3$. Panels: Data sets 1 to 4; vertical axis: service level.]
[Figure 6: Comparison of the models, for a system with $\mu = 0.2$, $s = 54$, and $\tau = 1/3$. Panels: Data sets 1 to 4; vertical axis: service level. The panel for data set 3 also shows the validation curve.]

Instead of only looking at the service level performance measure, we also consider the abandonment probability. When we perform experiments on the larger system, we find the results shown in Figure 7. These results show almost no difference between the models, which means that the errors caused by the modeling approaches are negligible. On data set 3, however, there is a noticeable difference in favor of our model, but the errors do not seem to propagate. All in all, we can conclude from the performance analysis on these examples that our model is at least as good as Whitt's, if not better.
[Figure 7: Comparison of the models, for a system with $\mu = 0.2$, $s = 54$, and $\tau = 1/3$. Panels: Data sets 1 to 4; vertical axis: abandonment probability. The panel for data set 3 also shows the validation curve.]

5 Computational Issues

In order to be useful in practice, our method for computing the service level should be fast. We are interested in the execution time on a realistic large-scale call center with the following parameters: $\lambda = 105$, $\mu = 0.2$, $s = 500$, $p = 0.05$, $\gamma_1 = 4$, and $\gamma_2 = 0.1$. Note that this system would not be stable without abandonments. The highly asymmetric parameters of the patience distribution require many more computations in one direction, and can therefore be seen as a worst case. For $\tau = 1/3$ the service level will be around 70%.

In order to compute the basic performance measures and the service level, the steady-state distribution is needed. Because the size of the state space is infinite in both dimensions, we have
truncated it to (M, N). We then get a blocking system. The probability that an arriving customer gets blocked (or equivalently the probability of loss) is given by N M p loss = p π(m, j) + (1 p) π(i, N). j=0 i=0 The stationary probabilities π(i, j) of the original system can be approximated by those of the blocking system with various desired precisions. For instance, if p loss < 10 6, the error on the stationary probabilities π(i, j) may be considered as negligible. On the considered large-scale call center (500 servers) this is already achieved by choosing (M, N) as small as (4, 20). It then takes about 0.013 seconds to compute the steady-state distribution on a computer with an Intel Core 2 Duo T9300 CPU and 2 GB RAM, which is almost negligible. For even larger systems (1000 servers) that require (M, N) equal to (20, 100) for instance, it still only takes 0.590 seconds. For Whitt s approach, under the same conditions, it takes on average 0.528 seconds to compute the service level. The final challenge is to compute P(V Q < τ L Q = (i, j)), for i = 0,..., M and j = 0,..., N. Fortunately, it is not necessary to compute them all. It is intuitively clear that these probabilities are decreasing in both i and j. Combining this with the fact that, at least for non-overloaded systems, the steady-state probabilities are also decreasing in i and j, the product of π(i, j) and P(V Q < τ L Q = (i, j)) quickly becomes negligible. However, the computation time is quite large. For instance, on the large-scale call center example, it takes a few minutes to compute the service level in this way. The problem is that the probabilities P(V Q < τ L Q = (i, j)) are created on the fly. A solution to this problem is to compute these probabilities only once using the symbolic representations of τ, γ 1, γ 2, and the product sµ, and to store the results for subsequent use. Then the service level can be obtained instantly. 
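The loss probability of the truncated system is a simple sum over two boundaries of the state space, which the following minimal sketch illustrates. It assumes the truncated steady-state probabilities are already available as a NumPy array `pi` with `pi[i, j]` approximating $\pi(i, j)$; the function name `loss_probability` and the toy distribution are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def loss_probability(pi, p):
    """Blocking probability of the system truncated to (M, N).

    pi : array of shape (M + 1, N + 1); pi[i, j] approximates pi(i, j).
    p  : probability that an arrival belongs to the first patience phase.
    """
    pi = np.asarray(pi, dtype=float)
    # Following the formula: arrivals of the first kind (probability p) are
    # blocked in states (M, j), arrivals of the second kind in states (i, N).
    blocked_first = pi[-1, :].sum()
    blocked_second = pi[:, -1].sum()
    return p * blocked_first + (1.0 - p) * blocked_second


# Hypothetical stand-in for a steady-state distribution (NOT the paper's):
# a normalized product of two geometric sequences on a (4, 20) truncation.
M, N = 4, 20
toy_pi = np.outer(0.05 ** np.arange(M + 1), 0.5 ** np.arange(N + 1))
toy_pi /= toy_pi.sum()

toy_loss = loss_probability(toy_pi, p=0.05)
print(f"p_loss = {toy_loss:.3e}; below 1e-6: {toy_loss < 1e-6}")
```

In practice one would increase $M$ and $N$ until the returned value drops below the chosen tolerance, such as the $10^{-6}$ mentioned above.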
A further optimization is possible, since the probability at $(i, j)$ is the same as the probability at $(j, i)$, only with $\gamma_1$ and $\gamma_2$ interchanged. An alternative solution to the problem of computing $P(V_Q < \tau \mid L_Q = (i, j))$ is to resort to approximations. We can, for instance, very easily approximate these probabilities by evaluating the integrals numerically. However, the gain in computation time is then offset by the possible loss of accuracy.

The question still remains when it is safe to stop. To answer this question we give a lower bound and an upper bound on the service level. The lower bound is as follows. If we stop at a certain $(i', j')$, the service level computed so far is our best estimate, but we know that the real service level is certainly not lower than this. When we stop at $(i', j')$, we set the remaining non-computed probabilities $P(V_Q < \tau \mid L_Q = (i, j))$ (for higher $i$ or $j$) to 0. So by stopping at $(i', j')$, we obtain a lower bound on the real service level (for which $i$ and $j$ go to infinity).

The upper bound is as follows. The conditional virtual waiting time given $L_Q = (i, j)$ is equal to the conditional one given $L_Q = (i', j')$ plus some positive passage times in the Markov chain in Figure 3, for any $i \ge i'$ and $j \ge j'$. Thus, the probabilities $P(V_Q < \tau \mid L_Q = (i, j))$ are decreasing in $i$ and also in $j$, and an easy upper bound is obtained by bounding the remaining probabilities as follows:

$$P(V_Q < \tau \mid L_Q = (i, j)) \le \begin{cases} P(V_Q < \tau \mid L_Q = (i', j')), & i > i',\ j > j', \\ P(V_Q < \tau \mid L_Q = (i', j)), & i > i',\ j \le j', \\ P(V_Q < \tau \mid L_Q = (i, j')), & i \le i',\ j > j'. \end{cases}$$

If we do not stop at $(i', j')$, but continue to either $(i' + 1, j')$ or $(i', j' + 1)$, then our lower bound increases and at the same time our upper bound decreases. It is safe to stop when the difference between the lower bound and the upper bound is small enough, for instance less than $10^{-4}$.

6 Concluding Remarks and Future Research

This paper has made several observations. First, we described the general behavior of customers' patience in call centers. The considered examples show a high hazard rate in the first thirty seconds of the waiting time. After that the hazard rate is constant. Due to delay announcements in the call handling system, however, there may be additional bumps in the hazard rate.
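The stopping rule of Section 5 can be sketched in a few lines. The sketch below bounds only the sum of $\pi(i, j)\, P(V_Q < \tau \mid L_Q = (i, j))$ over the truncated state space; any other terms of the service level are left out. The names `service_level_bounds`, `grow_until_tight`, and `toy_cond_prob` are hypothetical, the toy inputs are placeholders rather than the paper's formulas, and growing both indices at once is a simplification of the step-by-step extension described in Section 5.

```python
import numpy as np


def service_level_bounds(pi, cond_prob, i_stop, j_stop):
    """Bounds on sum_{i,j} pi[i, j] * cond_prob(i, j), evaluating cond_prob
    only for i <= i_stop and j <= j_stop. Requires cond_prob to be
    non-increasing in both arguments."""
    pi = np.asarray(pi, dtype=float)
    M, N = pi.shape[0] - 1, pi.shape[1] - 1
    # The expensive conditional probabilities, computed inside the rectangle.
    q = np.array([[cond_prob(i, j) for j in range(j_stop + 1)]
                  for i in range(i_stop + 1)])
    # Lower bound: probabilities outside the rectangle are set to 0.
    lower = float((pi[:i_stop + 1, :j_stop + 1] * q).sum())
    # Upper bound: outside the rectangle, reuse the nearest computed value,
    # i.e. the value at (min(i, i_stop), min(j, j_stop)).
    ii = np.minimum(np.arange(M + 1), i_stop)
    jj = np.minimum(np.arange(N + 1), j_stop)
    upper = float((pi * q[np.ix_(ii, jj)]).sum())
    return lower, upper


def grow_until_tight(pi, cond_prob, tol=1e-4):
    """Enlarge the rectangle until upper - lower < tol (or it is complete)."""
    pi = np.asarray(pi, dtype=float)
    M, N = pi.shape[0] - 1, pi.shape[1] - 1
    i_stop = j_stop = 0
    while True:
        lower, upper = service_level_bounds(pi, cond_prob, i_stop, j_stop)
        if upper - lower < tol or (i_stop == M and j_stop == N):
            return lower, upper, (i_stop, j_stop)
        i_stop, j_stop = min(i_stop + 1, M), min(j_stop + 1, N)


# Placeholder inputs (assumptions for illustration only).
toy_pi = np.outer(0.3 ** np.arange(6), 0.6 ** np.arange(16))
toy_pi /= toy_pi.sum()


def toy_cond_prob(i, j):
    # Any function that decreases in i and j works for this demonstration.
    return 0.9 ** i * 0.8 ** j


lower, upper, stop = grow_until_tight(toy_pi, toy_cond_prob)
print(f"lower={lower:.6f}, upper={upper:.6f}, stopped at {stop}")
```

This sketch recomputes the rectangle on every iteration for brevity; a real implementation would cache the conditional probabilities, in line with the remark in Section 5 about storing them for subsequent use.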
A second observation is that the hazard rate of the hyperexponential distribution has the same shape as the hazard rate of the patience. Therefore, it is natural to model the patience by the hyperexponential distribution. This implies the $M/M/s + H_2$ queueing model. The $M/M/s + H_2$ queue lends itself to an approximate analysis by means of a continuous-time Markov chain. The state space for the number of customers waiting in the queue is then two-dimensional. We compute the steady-state distribution using the recipe for a standard one-dimensional state space. This is possible since the state space can be made finite by truncation. The level of truncation can be chosen such that the loss of precision is arbitrarily small.

We have developed a framework for the computation of the service level. A tagged customer will reach the service level only if the first service completion occurs before the acceptable waiting time threshold. During this time, customers in front of the tagged customer could have abandoned. The service level follows from a recursive relation between the queue lengths at successive service completion epochs. This approach has been validated. We have shown how alternative practical service level definitions can easily be obtained using our framework.

Our approach is applied to real call centers, where the model parameters for the hyperexponential distribution are obtained from empirical data. When we compare our approach with Whitt's model, we obtain excellent performance in some cases and better performance overall. Finally, our approach is fast for most systems encountered in practice.

A promising direction for future research would be to consider a two-skill call center as follows. There are only generalists, i.e., agents that have both skills and can serve all customers. No priorities are given to any type of customer. The customers enter a single shared queue and are served according to the FCFS discipline. The different customer types can have different service requirements or different patience. An example could be a bilingual call center, where foreign customers are more patient but also require longer service. Our framework could be used for this situation. It would be interesting to compute performance measures for both customer types.
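The remark in the concluding section that the hyperexponential hazard rate starts high and then levels off can be checked numerically. The sketch below uses the standard hazard rate of a two-phase hyperexponential distribution, $h(t) = f(t) / (1 - F(t))$, with the example parameters $p = 0.05$, $\gamma_1 = 4$, and $\gamma_2 = 0.1$ from Section 5. The function name `hyperexp_hazard` is an illustrative assumption.

```python
import math


def hyperexp_hazard(t, p, gamma1, gamma2):
    """Hazard rate of a two-phase hyperexponential distribution at time t.

    With probability p the patience is exponential with rate gamma1,
    otherwise exponential with rate gamma2.
    """
    # Survival-function contributions of the two phases.
    a = p * math.exp(-gamma1 * t)
    b = (1.0 - p) * math.exp(-gamma2 * t)
    # Density divided by survival function.
    return (gamma1 * a + gamma2 * b) / (a + b)


# The hazard starts at p*gamma1 + (1-p)*gamma2 and decreases toward the
# smaller of the two rates as the impatient phase dies out.
for t in (0.0, 0.25, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:4.2f}  h(t) = {hyperexp_hazard(t, 0.05, 4.0, 0.1):.4f}")
```

For very large $t$ both exponentials underflow to zero in floating point, so a production version would rescale the two terms before dividing.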
References

[Akşin et al., 2007] Akşin, O., Armony, M., and Mehrotra, V. (2007). The modern call center: A multi-disciplinary perspective on operations management research. Production and Operations Management, 16(6):665–688.

[Baccelli and Hebuterne, 1981] Baccelli, F. and Hebuterne, G. (1981). On queues with impatient customers. In Performance '81, pages 159–179. North-Holland.

[Boxma and de Waal, 1994] Boxma, O. and de Waal, P. (1994). Multiserver queues with impatient customers. In Labetoulle, J. and Roberts, J., editors, Proceedings of the 14th International Teletraffic Congress, pages 743–756.

[Brandt and Brandt, 1999] Brandt, A. and Brandt, M. (1999). On the M(n)/M(m)/s queue with impatient calls. Performance Evaluation, 35(1):1–18.

[Brandt and Brandt, 2002] Brandt, A. and Brandt, M. (2002). Asymptotic results and a Markovian approximation for the M(n)/M(n)/s + GI system. Queueing Systems, 41(1/2):73–94.

[Brown et al., 2005] Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S., and Zhao, L. (2005). Statistical analysis of a telephone call center: A queueing-science perspective. Journal of the American Statistical Association, 100(469):36–50.

[Colledani and Tolio, 2009] Colledani, M. and Tolio, T. (2009). Performance evaluation of production systems monitored by statistical process control and off-line inspections. International Journal of Production Economics, 120(2):348–367.

[Gans et al., 2003] Gans, N., Koole, G., and Mandelbaum, A. (2003). Telephone call centers: Tutorial, review, and research prospects. Manufacturing & Service Operations Management, 5(2):79–141.

[Iravani and Balcıoğlu, 2008] Iravani, F. and Balcıoğlu, B. (2008). Approximations for the M/GI/N + GI type call center. Queueing Systems, 58(2):137–153.

[Jouini et al., 2009] Jouini, O., Dallery, Y., and Akşin, O. (2009). Queueing models for full-flexible multi-class call centers with real-time anticipated delays. International Journal of Production Economics, 120(2):389–399.

[Jouini et al., 2011] Jouini, O., Koole, G., and Roubos, A. (2011). Performance indicators for call centers with impatience. Submitted.

[Kort, 1983] Kort, B. (1983). Models and methods for evaluating customer acceptance of telephone connections. In GLOBECOM '83, pages 706–714. IEEE.

[Mandelbaum and Zeltyn, 2004] Mandelbaum, A. and Zeltyn, S. (2004). The impact of customers' patience on delay and abandonment: some empirically-driven experiments with the M/M/n+G queue. OR Spectrum, 26(3):377–411.

[Massey, 1951] Massey, F. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68–78.

[Palm, 1937] Palm, C. (1937). Étude des délais d'attente. Ericsson Technics, 5:37–56.

[Stanford, 1979] Stanford, R. (1979). Reneging phenomena in single channel queues. Mathematics of Operations Research, 4(2):162–178.

[Wallström and Segerstedt, 2010] Wallström, P. and Segerstedt, A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128(2):625–636.

[Whitt, 2005] Whitt, W. (2005). Engineering solution of a basic call-center model. Management Science, 51(2):221–235.