SmartDispatch: Enabling Efficient Ticket Dispatch in an IT Service Environment

Shivali Agarwal, IBM Research, Bengaluru, India, shivaaga@in.ibm.com
Renuka Sindhgatta, IBM Research, Bengaluru, India, renuka.sr@in.ibm.com
Bikram Sengupta, IBM Research, Bengaluru, India, bsengupt@in.ibm.com

ABSTRACT
In an IT service delivery environment, the speedy dispatch of a ticket to the correct resolution group is the crucial first step in the problem resolution process. The size and complexity of such environments make the dispatch decision challenging, and incorrect routing by a human dispatcher can lead to significant delays that degrade customer satisfaction and have adverse financial implications for both the customer and the IT vendor. In this paper, we present SmartDispatch, a learning-based tool that seeks to automate the process of ticket dispatch while maintaining high accuracy levels. SmartDispatch comes with two classification approaches: the well-known SVM method, and a discriminative term-based approach that we designed to address some of the issues we empirically observed in SVM classification. Using a combination of these approaches, SmartDispatch is able to automate the dispatch of a ticket to the correct resolution group for a large share of the tickets, while for the rest, it is able to suggest a short list of 3-5 groups that contains the correct resolution group with a high probability. Empirical evaluation of SmartDispatch on data from 3 large service engagement projects in IBM demonstrates the efficacy and practical utility of the approach.

Categories and Subject Descriptors
H.4.0 [Information Systems Applications]: General; I.5.4 [Pattern Recognition]: Applications - Text Processing; I.5.2 [Pattern Recognition]: Design Methodology - Feature Evaluation and Selection

General Terms
Design, Experimentation

Keywords
Ticket resolution group, SVM classification, Discriminative term weighting, Automated and advisory mode dispatch

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD '12, August 12-16, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6/12/08 ...$15.00.

1. INTRODUCTION
Information technology (IT) is now a major enabler for most businesses. The size, diversity and complexity of such systems have increased manifold over the years, prompting organizations to outsource the maintenance of their IT systems to specialized service providers (IT vendors). These vendors employ skilled practitioners and organize them into teams with the responsibility of maintaining different parts of the IT system. For example, a team may be responsible for a specific process area of a large packaged application, a set of custom applications that leverage a common technology, or the like. For systems in production use, customers submit service requests to the vendor in the form of tickets, which represent a specific IT problem or need experienced by the end users (e.g. a failed transaction, user authentication expiry, data formatting issues etc.), and are generally small, atomic tasks that can be handled by a single practitioner within a short duration (e.g. a few minutes to a few days).
The tickets are usually assigned to a practitioner in a two-step process. First, incoming tickets are received by a dispatcher, who reviews the problem description text (entered by the customer or end user) to try to understand which service team (henceforth called resolution group) is responsible for addressing the ticket, and then dispatches it to that group of practitioners. Next, within the group, the assignment of the ticket to a specific practitioner may be done by the group lead, or a practitioner may volunteer for the same, and this is usually based on criteria such as the complexity of the ticket, the expertise of the practitioner, the practitioner's availability and current workload, etc.

In this paper, we focus on the first step of the assignment process - namely the dispatching of a ticket to an appropriate resolution group based on the problem text description. Using text-based classification techniques applied on large-scale real-life data from IBM's services engagements, we study the extent to which this process may be automated and how practical tool support may be provided to aid the dispatcher when complete automation is not feasible.

The dispatch of a ticket - as soon as it arrives - to the correct group of practitioners is a critical step in the speedy resolution of a ticket. If the dispatcher misinterprets the problem and routes the ticket to an incorrect group, then significant time may be wasted before the group reviews the problem, determines it is not in their area of responsibility, and transfers it back to the dispatcher, to begin the process anew. This calls for timely and intelligent handling of tickets by the dispatcher. However, a number of factors make the dispatcher's job challenging. First, s/he needs to have a
reasonably broad knowledge of the entire IT portfolio being managed, along with the roles and responsibilities of the individual groups. Second, s/he needs to be able to quickly parse the ticket text describing the problem and map it to the right group, which is often not straightforward given the heterogeneous and informal nature of the problem descriptions reported by human users. While a senior practitioner with a good overview of the customer's IT system may be able to discharge these responsibilities well, such practitioners are usually preoccupied with helping their colleagues resolve the more complex tickets, and the routine dispatch task (which is also labor intensive when the ticket volume is high) is often assigned to one of the less experienced practitioners. High attrition in service delivery teams may further compound the problem, as a dispatcher who has developed a certain amount of expertise over a period of time may leave the organization, and a new practitioner filling in will take time to grow into the role. Incorrect dispatch decisions that result from such situations can significantly increase the total turnaround time for ticket resolution. For example, we observed in a study of an actual production support system that the average turnaround time for a ticket jumps by as much as 100% as the number of transfers increases from 2 to 4. When such delays occur, the customer's business may be severely impacted, customer satisfaction degrades, and the vendor may also need to compensate the customer by paying a penalty in case of a breach in a Service Level Agreement (SLA). Inefficiencies in dispatch may thus have serious business consequences that an IT vendor can ill afford.

1.1 SmartDispatch
The practical challenges associated with manual ticket dispatch motivated us to investigate how, and to what extent, we can automate the process of resolution group selection based on the ticket text description. This has led to the development of a tool called SmartDispatch, which we describe and evaluate in this paper. Our goal is to achieve a performance baseline that is comparable to that of an expert human dispatcher, so the error rate (incorrect routings) should be low (within 10% of tickets). As our subjects of study, we selected 3 large production systems maintained by IBM (from 3 different domains and customers), where 2 of the systems had 25 resolution groups each, while the largest one had 79 resolution groups. The total number of tickets across these 3 data sets was more than 82,000. To train the SmartDispatch tool in taking intelligent routing decisions, we decided to use a supervised learning approach, with 60% of the tickets in each set being used to train a classification engine, while the remaining 40% were used to test the accuracy of classification. We applied text processing on the ticket text to transform it into a weighted vector of terms, and then used the well-known Support Vector Machine (SVM) method to build the classification engine. Our experiments revealed that the accuracy of SmartDispatch with a fully automated approach ranged from 69% to 81% across the three systems, implying an incorrect dispatch of 19% to 31% of the tickets. Given the business consequences of incorrect routing as noted above, we concluded that blanket automation may not be a practical solution to the dispatch problem, as it may lead to unacceptably high error rates.
Next, we sought to determine the percentage of tickets we could automatically dispatch with a low error rate, by considering the confidence probabilities of classification as returned by the engine. The idea here is that if a reasonably high percentage of tickets can be correctly and automatically dispatched this way through the SmartDispatch system, it would significantly reduce the burden on an expert human dispatcher, who can process the remaining tickets. As a threshold, we considered a confidence probability of 90%, i.e. as long as the probability of a ticket belonging to a particular group was perceived to be >=90% by the classification engine, we would route the ticket to that group. We obtained mixed results using this approach. On the positive side, we found that when the confidence probability was >=90%, the error rate of SmartDispatch was only around 4%-6%. Unfortunately, we also found that while the percentage of tickets where the engine displayed such a high classification confidence was reasonably high for two data sets (55% to 62%), it was quite low (25%) for the other data set. We concluded that this was unlikely to be a useful generic approach when dispatching real-life service tickets, since the need for human involvement and judgement may continue to be very high.

Given this, our next goal was to see whether we could devise a classification approach that is able to consistently classify a significant percentage of tickets (at least 50%) with high confidence and a low error rate. The reason why we felt this may still be feasible was that, on closer review of ticket text and analysis of how SVM was handling the terms in the text, we observed that discriminative terms - words or phrases that seemed to characterize one or a small subset of resolution groups - appeared to be not particularly well leveraged by SVM, even when we experimented with the weights assigned to these terms. This motivated us to design a classification approach - the discriminative term approach (DTA) - that assigns weights to terms using a function called inverse group frequency, which is inspired by the notion of inverse document frequency (idf) used by search engines to score and rank a document's relevance given a user query. In this approach, we consider the classification engine to be sufficiently confident about a resolution group only if it assigns that group a strictly higher score than every other group, and in such cases, the ticket is automatically dispatched to that group. Using this new approach, we obtained strong results: across the 3 data sets, the percentage of tickets that could be automatically dispatched ranged from 59% to as high as 72%, with a 100% precision rate in all cases.

The final question to consider was how SmartDispatch could best handle tickets that did not have the necessary clarity for automatic dispatch to a specific group. Here again, the confidence scores returned by classification engines came in handy. We decided to adopt a dual mode dispatch strategy wherein SmartDispatch, when unable to automatically dispatch a ticket to a group, switches to an advisory mode and forwards the ticket to a human dispatcher, with suggestions on the top N groups the ticket most likely belongs to (the groups having the top N confidence scores). Alternatively, if a fully automated system is still desired, SmartDispatch could offer the tickets through a limited broadcast to the top N groups, with the expectation that the correct group will identify the ticket as its own and take ownership.
Here, interestingly, the results derived from SVM-based classification outperformed those derived from our discriminative term approach. In fact, very effective dispatch performance could be achieved by leveraging the
complementary strengths of the two approaches - using the discriminative term approach for fully automated dispatch of a large percentage of tickets to the correct resolution group, and using SVM to advise on potential group(s) in case the decision was ambiguous, with a combined error rate of well within 10%. Overall, our experience with SmartDispatch suggests that a combination of learning techniques, along with a dual mode dispatch strategy, can provide a satisfactory, near-automated solution to the problem of fast, accurate dispatch of service tickets to resolution groups.

The rest of the paper is structured as follows: Section 2 describes the SmartDispatch tool design, architecture and the learning methods used in the tool; Section 3 presents evaluation results; related work is discussed in Section 4, while Section 5 concludes the paper.

2. SmartDispatch TOOL DESIGN
The basic principle underlying the SmartDispatch tool is to build a prediction model for ticket dispatch using supervised learning on historical ticket data, and then use the model to guide subsequent dispatch decisions. Past ticket data is persisted in repositories, usually within the ticketing system itself. Two key fields recorded for each ticket are the Description, a text field that describes the problem/need that needs to be addressed, and the Resolution Group, the team to which the practitioner who resolved the ticket belongs. While a ticket has many other fields as well (e.g. severity, open/close timestamps, practitioner name etc.), the (Description, Resolution Group) attributes are what we use to build a prediction model for dispatch based on ticket text.

The prototypical text classification problem can be posed as done in [2]: given a set of labelled text documents L = {(x_i, c_i)}_{i=1}^{|L|}, where c_i ∈ C = {1, 2, ..., |C|} denotes the category of document x_i, and |C| and |L| are the total numbers of predefined categories and labelled documents, learn a classifier that assigns a category label from 1 to |C| to each document in the fresh set U = {x_i}_{i=1}^{|U|}. This is a supervised learning approach, wherein it is assumed that the joint probability distribution of documents and categories is identical in sets U and L (although this may not be guaranteed in practice).

Figure 1 depicts the architecture of SmartDispatch, and we outline the different components below.

Figure 1: SmartDispatch Architecture Diagram

2.1 Text processing
Text processing is necessary to generate a numeric form of the description that can be consumed by classification methods like Support Vector Machine (SVM). The ticket description text is transformed into a vector space model. This includes extracting nouns and verbs as terms from the ticket description and reducing the terms to their morphological root. The Stanford POS Tagger [11] is used to identify the nouns and verbs in the ticket description. Porter's stemming algorithm [7] is then applied to the terms. Once the terms are extracted, any appropriate term weighting scheme can be applied to the ticket description to obtain a vector representation. The text processing component of our tool is generic enough to handle data from different kinds of maintenance projects. The results presented in Section 3 have been arrived at by using this generic scheme of stop words and stemming. While account-specific dictionaries may help process ticket text more intelligently, such dictionaries may not always be available or up-to-date; hence the generic nature of text processing in SmartDispatch ensures wide applicability of the tool.
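To make the text-processing step concrete, the following is a minimal sketch, assuming NLTK (with its standard tokenizer and tagger resources already installed) as a stand-in for the Stanford POS Tagger and Porter stemmer used by the tool; the function name and the example ticket text are illustrative only.

# Minimal sketch of the text-processing step: keep nouns and verbs from a
# ticket description and reduce them to their morphological root.
# NLTK is used here as a stand-in for the Stanford POS Tagger; the standard
# NLTK tokenizer and tagger resources are assumed to be downloaded.
import nltk
from nltk.stem import PorterStemmer

_stemmer = PorterStemmer()

def extract_terms(description):
    """Return stemmed noun/verb terms from a ticket description."""
    tagged = nltk.pos_tag(nltk.word_tokenize(description))
    # Penn Treebank tags: nouns start with 'NN', verbs with 'VB'.
    terms = [token.lower() for token, tag in tagged if tag.startswith(("NN", "VB"))]
    return [_stemmer.stem(t) for t in terms]

# Example (hypothetical ticket text):
# extract_terms("User password expired and login to the billing portal fails")
# returns stemmed noun/verb terms such as 'password', 'login', 'portal'.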
2.2 Generating Supervised Learning Model
A number of supervised learning techniques may be used to learn a classification model that uses the ticket text description to predict the resolution group. The SmartDispatch tool comes with the well-known classification technique of Support Vector Machine, and also employs another learning technique, based on discriminative terms, that has been developed in-house. Both of these approaches are described in detail below with appropriate insights and motivation.

2.2.1 Standard Classifier Approach
We have chosen Support Vector Machine (SVM) as the standard classification engine in SmartDispatch based on existing literature [1] that puts it above the rest of the techniques for unstructured text. The choice was made after verifying this result with the sample data that we had. The tf*idf term weighting scheme was applied to each ticket description to obtain a vector representation for SVM. The SVM approach was found to be more robust than other techniques such as decision trees, Bayesian networks, K-means, etc., as it performed reasonably well for all data samples and produced the best results in most cases. We also found the notion of confidence intervals reliable in the case of SVMs, and such intervals form the basis for designing the advisory module of our tool. The terms standard classifier approach and SVM-based approach are used interchangeably in the remainder of the paper. We have used the SVM implementation in SPSS [10], an industry standard statistical analysis package, within SmartDispatch.

However, one issue we observed with the SVM approach was a tendency of biased prediction in favor of groups having a large volume of tickets. For example, if groups A and B have a significantly higher number of tickets as compared to the rest of the groups (which also have a sufficient number of tickets to learn the model), then the misclassified tickets of the remaining groups were often found to be labelled as A or B. We tried using different forms of term weight functions, but with little improvement. Upon case-based inspection of misclassified tickets, we found that in many cases SVM was not able to effectively exploit the discriminatory terms present in those tickets. This led to the design of a new approach based on discriminative terms, described below, which was envisioned to complement the SVM approach.
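To illustrate the standard classifier approach, here is a minimal sketch, assuming scikit-learn in place of the SPSS SVM implementation that the tool actually uses; it reuses the extract_terms sketch above and exposes the per-group confidence probabilities that the dispatch policies rely on. All function names are illustrative.

# Sketch of the standard classifier approach: tf*idf vectors over processed
# ticket terms and a linear-kernel SVM with per-group confidence probabilities.
# scikit-learn stands in here for the SPSS SVM implementation used by the tool.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def train_svm(descriptions, groups):
    """descriptions: list of ticket texts; groups: list of resolution group labels."""
    vectorizer = TfidfVectorizer(analyzer=extract_terms)  # reuse the term extractor above
    X = vectorizer.fit_transform(descriptions)
    # probability=True enables Platt-scaled confidence probabilities (predict_proba).
    model = SVC(kernel="linear", probability=True)
    model.fit(X, groups)
    return vectorizer, model

def predict_with_confidence(vectorizer, model, description):
    """Return (best group, its confidence) and the full group -> probability map."""
    probs = model.predict_proba(vectorizer.transform([description]))[0]
    by_group = dict(zip(model.classes_, probs))
    best = max(by_group, key=by_group.get)
    return best, by_group[best], by_group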
2.2.2 Discriminative Term Approach
While trying to use ticket descriptions for group prediction by exploiting the discriminative content that they possess, we developed a learning approach based on discriminative terms or keywords. The motivation for this approach stems from the fact that a resolution group typically has a set of discriminative terms, related to the IT sub-system it manages, which are likely to appear in the tickets that come to it for resolution. If we identify the terms that correspond to each group by mining the descriptions associated with that group, then we can assign appropriate weights to all the terms in the entire data set to signify their extent of uniqueness with respect to groups. Using this as a model, the terms in a new ticket can be analyzed to arrive at a group closeness score, thus predicting the group to which the ticket should be dispatched. For example, a term like error that, say, occurs in 8 out of 10 groups is not discriminative compared to a term like password that may occur in only 2 out of 10 groups. Any ticket that contains the term password will have a high closeness score with respect to those two groups. The steps involved in this approach consist of discriminative term weighting, term selection and association, and classifier function definition. We outline these below.

Discriminative term weighting: We have defined a weight function called inverse group frequency (igf). This function abstracts away the role of the number of documents (tickets) involved in arriving at the weight for a term and uses only the group cardinality. This ensures that the function performs equally well on small as well as large training datasets, as long as the distribution of the sample is the same as that of the actual data. The formal definition is given below. Let T be the set of terms in the dataset. Let w_t denote the discriminative weight of a term t ∈ T. Let N denote the set of all groups in the data, and let N_t denote the set of groups in which t occurs. The discriminative weight is given by the formula:

w_t = |N| / |N_t|

The value of w_t is what we refer to as the igf of the term, as it is analogous in concept to idf. The higher the igf of a term, the more discriminative the term is. The value of igf is computed for all the terms in the document space. If needed, a threshold can be used to restrict the number of terms in the space by allowing only those terms whose frequency of occurrence is above the threshold.

Term selection and association: Each resolution group needs to be associated with a set of terms that best describe it. To do this, we first find all the terms associated with a group through text processing of the ticket descriptions of all tickets belonging to this group. Next, we perform term selection for each group by discarding all terms that have an igf value below a certain predefined threshold value. The threshold value can easily be arrived at by setting some lenient value upfront and then refining it over a period of time. Each group is thereby associated with a set of terms that meet the threshold criteria. Formally, we use G to denote the set of groups and T_g to denote the terms that have been selected for group g ∈ G.

Classifier function: This is the step that pools all information about groups and their terms and uses it for prediction. The igf value of each term belonging to a group gives us an idea of how discriminating that term is for the group. This information is pooled to obtain a closeness score of a ticket with respect to each group, computed using the linear function given below. The group(s) with the highest score is (are) the group(s) to which the ticket is predicted to belong. More formally, let x denote a ticket description consisting of a set of terms T_x, and let Val_x(g) denote the score of ticket x with respect to group g ∈ G, computed using the linear discriminant function:

Val_x(g) = Σ_{t ∈ T_g ∩ T_x} w_t

The use of this discriminant function reduces the |T|-dimensional term space to a |G|-dimensional feature space, with the scores Val_x(g), g ∈ G, defining the coordinates in that space.
The classifier function for the ticket description x can then simply be written as:

ĝ(x) = arg max_{g ∈ G} Val_x(g)

where ĝ(x) denotes the predicted group, obtained by computing the score of x for each group and taking the maximum. Note that there can be multiple groups with the same score, so it is possible that for certain tickets there is no total order on the group ranks based on score.
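The following is a minimal sketch of the discriminative term approach as just described (igf weighting, threshold-based term selection, and the linear closeness score), assuming tickets are supplied as (terms, group) pairs produced by the text-processing step; the data layout and function names are illustrative assumptions, not the tool's actual implementation.

# Sketch of the discriminative term approach (DTA): inverse group frequency
# weights, per-group term selection, and the linear closeness score Val_x(g).
from collections import defaultdict

def train_dta(tickets, igf_threshold=1.0):
    """tickets: iterable of (terms, group) pairs, where terms is a set of
    processed terms. Returns (igf weights w_t, per-group selected term sets T_g)."""
    group_terms = defaultdict(set)          # terms seen in each group
    for terms, group in tickets:
        group_terms[group].update(terms)
    n_groups = len(group_terms)             # |N|
    groups_per_term = defaultdict(int)      # |N_t|
    for terms in group_terms.values():
        for t in terms:
            groups_per_term[t] += 1
    weights = {t: n_groups / n for t, n in groups_per_term.items()}  # w_t = |N| / |N_t|
    # Term selection: keep only terms whose igf meets the threshold (1.0 keeps all).
    selected = {g: {t for t in terms if weights[t] >= igf_threshold}
                for g, terms in group_terms.items()}
    return weights, selected

def dta_scores(weights, selected, ticket_terms):
    """Closeness score Val_x(g): sum of igf weights of terms shared with group g."""
    terms = set(ticket_terms)
    return {g: sum(weights[t] for t in t_g & terms) for g, t_g in selected.items()}

def dta_predict(weights, selected, ticket_terms):
    """Return all groups tied at the highest score (there may be more than one)."""
    scores = dta_scores(weights, selected, ticket_terms)
    best = max(scores.values())
    return [g for g, s in scores.items() if s == best]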
This approach works best if descriptions are discriminative in nature across distinct groups. The prediction model is very simple in this case, and once the discriminative terms enter the system, the model stabilizes very fast.

2.3 Dispatch Decision Maker
An incoming ticket for which a dispatch decision has to be made is first classified (mapped to resolution group(s)) using one of the supervised learning techniques described above. If there is sufficient confidence in the classification result, then the Dispatch Decision Maker will dispatch the ticket to that group. For SVM-based classification, we exercise this Auto Dispatch Policy if a ticket is classified to a group by SVM with a confidence probability of 0.9 or above. We chose this threshold based on the observation that the error rate of prediction by SVM is very low when the confidence probability is 0.9 or above, as we discuss in more detail during the evaluation of SmartDispatch in the next section. For the discriminative term approach, where multiple groups may hold the top position by having the same score through the discriminant function, we exercise the auto dispatch policy if and only if there is a clear winner, i.e. there is only a single group at the top position.

There will be cases, however, when neither classification approach returns a clear winner. In such situations, SmartDispatch resorts to an Advisory Dispatch Policy. Here, the tool selects a small subset of groups that may be expected to contain the correct resolution group with reasonably high confidence. If an expert human dispatcher is available, the tool forwards the ticket to him/her with an advisory that contains the identities of these selected groups. The expectation here is that the expert dispatcher will be able to take a well-judged dispatch decision, and if the advisory of the tool contains the correct group, it would have done its job well. If an expert dispatcher is not available, and/or if full automation is desired, then SmartDispatch may forward the ticket to each of the selected groups through a limited broadcast, with the assumption that the correct group (if present in the selected list) will take ownership of the ticket and resolve it, while the others will ignore the ticket. If the correct group is not included in the shortlist compiled by SmartDispatch, we consider it an error in its prediction (thus we slightly relax the notion of correct prediction in advisory mode by allowing a small number of other groups to appear in the shortlist, along with the correct resolution group). In the case of SVM-based classification, we exercise the advisory dispatch policy for predictions with a confidence probability less than 0.9, and generate a small ranked list of between 1 and 5 groups, in decreasing order of confidence probabilities. For the discriminative term based approach, when there are multiple groups having the highest score, all the groups that share the same highest score are suggested as possible groups in the advisory mode.
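Pulling the two policies together, here is a minimal sketch of the dispatch decision logic described in this section, assuming the SVM and DTA sketches from Section 2.2; the 0.9 threshold and the shortlist size of up to 5 groups follow the text, while everything else is illustrative.

# Sketch of the Dispatch Decision Maker: auto-dispatch when the classifier is
# sufficiently confident, otherwise fall back to an advisory shortlist.
CONFIDENCE_THRESHOLD = 0.9   # auto-dispatch threshold on the SVM confidence probability
ADVISORY_SIZE = 5            # maximum number of groups suggested in advisory mode

def dispatch_svm(vectorizer, model, description):
    best, conf, by_group = predict_with_confidence(vectorizer, model, description)
    if conf >= CONFIDENCE_THRESHOLD:
        return "auto", [best]
    ranked = sorted(by_group, key=by_group.get, reverse=True)
    return "advisory", ranked[:ADVISORY_SIZE]

def dispatch_dta(weights, selected, ticket_terms):
    top_groups = dta_predict(weights, selected, ticket_terms)
    if len(top_groups) == 1:                 # clear winner: auto-dispatch
        return "auto", top_groups
    return "advisory", top_groups            # tied groups form the advisory shortlist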
2.4 Feedback Mechanism
The new tickets that have been dispatched by the system may have their resolution group updated in case of an incorrect dispatch. Other details (e.g. closure timestamps, resolving practitioner etc.) would also get updated, and upon closure, the final ticket details are persisted within the ticket repository. The learning model should be regenerated at frequent intervals in order to leverage the new data available and improve the prediction capabilities.

3. EVALUATION OF THE TOOL
We have evaluated the SmartDispatch tool on three ticket data sets, derived from three different ongoing services engagements involving IBM. To preserve client confidentiality, these data sets will henceforth be denoted Dataset A, Dataset B and Dataset C. In order to test the wide applicability of the tool, the datasets were derived from three different domains: Media and Entertainment (A), Automotive (B) and Consumer Goods (C). The number of tickets in each data set, the number of resolution groups, and the duration of the dataset (derived from the timestamps of the constituent tickets) are shown in Figure 2. As can be seen, each data set covers a reasonably wide time window (with a minimum of two months) and has a significantly large number of tickets. Dataset C, of course, is by far the largest, although it spans the shortest duration, since it comes from an account with a very high ticket inflow. To build and test the prediction models, each data set was split into a training set and a testing set using a 60:40 ratio.

Figure 2: Dataset Overview

We will now discuss the evaluation results. The results will be presented in the sequence in which the tool was incrementally developed, as outlined in the Introduction. We will thus begin with SVM as the sole classification engine used for automated dispatch of all tickets (Section 3.1); next, we will consider automated dispatch of tickets having high prediction confidence, and incorporate the discriminative term approach to see if the volume of such tickets may be increased (Section 3.2); we will then evaluate the tool in the advisory dispatch mode for tickets that could not be automatically dispatched to a single group (Section 3.3); finally, we will consider a heterogeneous dispatch strategy leveraging DTA for automatic dispatch, and SVM for the advisory or limited broadcast mode (Section 3.4).

3.1 Fully automated approach using SVM
The very first set of experiments was carried out to evaluate the performance of fully automated dispatch of all tickets based on supervised learning using SVM. The results of running the SVM model on the test data are shown in Table 1. As can be seen, the percentage of misrouted tickets is quite high in the 3 data sets, particularly for Dataset B and Dataset C.

Table 1: Full Auto-dispatch with SVM
Dataset   SVM precision (%)   % Misrouted
A         81%                 19%
B         69%                 31%
C         73%                 27%
The error rates are considerably higher than our target performance baseline of 10% (as mentioned in Section 1.1), and this suggested to us that full automation may not be practical. This led us to consider dispatch based on confidence scores, as we discuss next.

3.2 Auto-dispatch based on confidence scores
The ticket classification results from SVM contain confidence probabilities associated with each resolution group. We wanted to test if we could improve the prediction accuracy of automated dispatch by only selecting tickets where there was a high confidence probability in a specific group, and if so, whether we could dispatch a significantly large share of tickets this way, so as to reduce the burden on a human dispatcher. Our experiments with SVM suggested that as the confidence probability increased, error rates went down significantly. This is depicted for all three data sets in Figures 3, 4 and 5. The left chart in each figure depicts the percentage of tickets that fall in the confidence intervals 0-0.1, 0.1-0.2, ..., 0.9-1.0, where each bar depicts an interval. The chart on the right side shows the split of correct and incorrect predictions for each of these confidence intervals. For example, Figure 3 for Dataset A shows that about 55% of the tickets occur in the range of 0.9 and above, out of which 4% are incorrect predictions and the rest are correct. This implies that 55% of the tickets can be dispatched automatically with 96% precision.

The results of the auto dispatch policy for a confidence level of 0.9 and above using SVM are shown in columns two and three of Table 2. We see that the error rates are low for all three data sets, ranging from only 4% to 6%. However, the percentage of tickets that can be routed this way with a high confidence probability is not consistently high, and is, in fact, as low as 25% for Dataset B. This suggested to us that confidence-based routing using SVM, with thresholds that lead to low error rates, carries the risk of passing on a significant share of tickets to a human dispatcher for manual routing. As mentioned in Section 2.2.1, we also observed that SVM was often unable to effectively exploit discriminatory terms in ticket text, and we hypothesized that if we were able to devise a classification approach that leverages discriminatory terms well, then it may also be able to classify a higher number of tickets with sufficient confidence. This led us to design and incorporate the Discriminative Term Approach (DTA) within SmartDispatch.

Table 2: Auto-dispatch mode based on confidence probability
Dataset   % tickets (SVM)   SVM precision   % tickets (DTA)   DTA precision
A         55%               96%             70%               100%
B         25%               94%             72%               100%
C         62%               96%             59%               100%

The results for DTA are presented in Table 2. The tickets for which only one group obtained the highest score were the ones eligible for auto-dispatch, and these percentage values are shown in column four of Table 2 for the three data sets. It is noteworthy that the accuracy of auto-dispatch was 100% for all datasets. The percentage of tickets that could be auto-dispatched ranges from a low of 59% to a high of 72%. The results are significantly better than SVM for Dataset A and Dataset B, and only marginally worse for Dataset C (though still better in terms of error rates). The threshold value for igf was set to 1 for all datasets. This is the most liberal setting, and thus the results are obtained for the most generic case.
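As an illustration of how the auto-dispatch coverage and precision figures in Table 2 can be computed, here is a minimal sketch, assuming each test ticket has been run through a dispatch function like the ones sketched in Section 2.3; the data layout is an assumption made for illustration.

# Sketch: coverage and precision of auto-dispatch on a test set, in the style
# of Table 2. Each decision pairs the true resolution group of a ticket with
# the dispatcher output, e.g. ("auto", ["GroupA"]) or ("advisory", [...]).
def auto_dispatch_stats(decisions):
    """decisions: list of (true_group, (mode, groups)) tuples."""
    auto = [(true, groups[0]) for true, (mode, groups) in decisions if mode == "auto"]
    coverage = len(auto) / len(decisions) if decisions else 0.0
    correct = sum(1 for true, predicted in auto if predicted == true)
    precision = correct / len(auto) if auto else 0.0
    return coverage, precision

# Example usage (hypothetical test_tickets of (terms, true_group) pairs):
# coverage, precision = auto_dispatch_stats(
#     [(true, dispatch_dta(weights, selected, terms)) for terms, true in test_tickets])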
The evaluation indicates that DTA is a very good choice for auto dispatch of tickets. The relatively poor performance on Dataset C was due to a large number of tickets having terms with low igf values. We believe that this problem can possibly be resolved by introducing a notion of a discriminating phrase (consisting of a sequence of terms) in the learning method, which we plan to do in future work. We now consider the handling of tickets that are ineligible for auto-dispatch; the advisory mode designed to tackle this set is evaluated next.

3.3 Advisory mode evaluation
Table 3 presents evaluation results when the tool operates in advisory mode for the SVM approach. These are obtained on the tickets that did not meet the criterion for auto-dispatch. The entry Top 3 in this table shows the percentage of times the correct group occurred in the top 3 predicted groups ranked on confidence probability values. Similarly, Top 5 indicates the percentage of tickets for which the correct group did not figure in the top 3 but was fourth or fifth. The other columns are analogously defined. Thus, for Dataset A in the table, the correct resolution group occurred in the top 3 of the ranked predicted groups 86.8% of the time (split as 58.2% at rank one and 28.6% at ranks two or three), and occurred in the top 5 (86.8 + 5.63)% or 92.43% of the time. This mode performs very well for Dataset B as well, although for Dataset C (which has 79 resolution groups) the results are less satisfactory in absolute terms, with the correct group occurring in the top 5 in 81% of the cases. Overall, it can be concluded that the SVM approach is reasonably reliable for advisory mode. Note that the correct group can be predicted at rank one by SVM, although with a low absolute confidence score.

As far as the advisory mode for DTA is concerned, all the tickets for which the top score is awarded to two or more groups are dispatched in advisory mode. Table 4 presents the results of this approach in advisory mode. Here again, only those tickets that were not eligible for auto-dispatch are considered in advisory mode. The label Top 3 denotes the percentage of tickets out of this set for which there were 2 or 3 groups having the highest score. Similarly, Top 5 denotes the set for which 4 or 5 groups have the top score. The rest of the labels are analogous. It can be seen that the precision falls at a very fast rate. For example, for Dataset C, the correct resolution group appears in the top 3 in only 31.7% of cases, thereby limiting the usefulness of the advisory mode in the DTA approach.

3.4 Combining SVM and DTA approaches
The results in the preceding sections suggest that the SVM and DTA approaches have complementary strengths. When it comes to the percentage of tickets that can be automatically dispatched to the correct group with high confidence (low error rates), DTA outperforms SVM in general, on the strength of being able to distinguish such groups better using discriminative terms, with a perfect precision record. On the other hand, for tickets where such a distinction cannot be made unambiguously, and the dispatch has to switch to an advisory mode, SVM's performance is clearly much more robust.
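For reference, the following is a minimal sketch of how the exclusive rank buckets reported in Tables 3 and 4 below can be computed from ranked group suggestions (Top 5, for instance, counts only tickets whose correct group was ranked fourth or fifth); the function and cutoff values are illustrative assumptions.

# Sketch: exclusive rank buckets for advisory-mode evaluation, in the style of
# Tables 3 and 4. A ticket is counted in the first bucket whose cutoff its
# correct group's rank does not exceed; anything past the last cutoff (or a
# missing correct group) is counted as "beyond".
def rank_buckets(ranked_lists, true_groups, cutoffs=(3, 5, 10, 15, 20, 25)):
    """ranked_lists[i]: suggested groups for ticket i, best first;
    true_groups[i]: the correct resolution group of ticket i."""
    counts = {c: 0 for c in cutoffs}
    beyond = 0
    for ranked, true in zip(ranked_lists, true_groups):
        rank = ranked.index(true) + 1 if true in ranked else float("inf")
        for prev, cut in zip((0,) + cutoffs, cutoffs):
            if prev < rank <= cut:
                counts[cut] += 1
                break
        else:
            beyond += 1
    total = len(true_groups)
    result = {"Top %d" % c: counts[c] / total for c in cutoffs}
    result["beyond"] = beyond / total
    return result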
Figure 3: Distribution of confidence probabilities for Dataset A and corresponding true vs. false predictions
Figure 4: Distribution of confidence probabilities for Dataset B and corresponding true vs. false predictions
Figure 5: Distribution of confidence probabilities for Dataset C and corresponding true vs. false predictions
Table 3: Performance of the SVM approach in advisory mode on the respective ticket set that it found ineligible for auto-dispatch
Dataset   Top 3    Top 5    Top 10   Top 15   Top 20   Top 25   >30
A         86.8%    5.63%    4.76%    1.5%     1.09%    0.25%    0%
B         88%      5.26%    3.98%    1.72%    0.74%    0.3%     0%
C         76.4%    4.68%    6.45%    3.6%     2.36%    1.48%    5.03%

Table 4: Performance of DTA in advisory mode on the respective ticket set that it found ineligible for auto-dispatch
Dataset   Top 3    Top 5    Top 10   Top 15   Top 20   Top 25   >30
A         50%      26%      20%      2%       2%       0%       0%
B         39.2%    21.4%    25%      3.6%     3.6%     7.2%     0%
C         31.7%    12.2%    17.1%    9.7%     7.4%     9.7%     12.2%

This motivated us to experiment with a heterogeneous approach, tapping into the respective strengths of both techniques within a single dispatch decision: that is, for a new ticket, we use DTA for auto-dispatch when a single resolution group receives the highest score; otherwise, we handle it using SVM's advisory dispatch mode. The results for this combined approach are presented in Table 5.

Table 5: Combined SVM and DTA evaluation
Dataset   % Automated (DTA)   % Top 1 Advisory (SVM)   % Top 1 Advisory misrouted   % Top 3 Advisory (SVM)   % Top 3 Advisory misrouted
A         70%                 26%                      4%                           28%                      2%
B         72%                 21%                      7%                           22%                      6%
C         59%                 32%                      9%                           37%                      6%

The percentage of tickets that are automatically dispatched through DTA is shown in column two, and is derived from similar results presented earlier in Table 2. For each data set, the remaining tickets are dispatched in advisory (or limited broadcast) mode using SVM. Even when the advice (or broadcast) is limited to a single group (the one with the highest confidence probability), the error rate is very low, varying between 4% and 9%, and this comes down further to 2% to 6% when the top 3 resolution groups (by confidence score) returned by SVM are used. When compared to Table 1, we see that the percentage of misrouted tickets has come down substantially, which shows the effectiveness of the dual mode heterogeneous SmartDispatch approach over blanket automation using a single learning technique like SVM.

The complementary capabilities of SVM and DTA thus make them a good package to have in the dispatch tool. While they can be used together in the manner outlined above, we can also run both techniques on historical data from an engagement and compare the precision values on test data. The technique that is able to automate a higher percentage of ticket dispatch with low error rates may then be chosen. This will provide more flexibility to handle the differing nature of ticket descriptions in different engagements; for example, while it would make more sense to adopt the DTA approach to decide on auto-dispatch for data sets A and B, the SVM based approach may also be adopted in the case of Dataset C (where the overall performance across the two approaches is almost at par). Besides the superior dispatch performance that results from this combination, the DTA option also provides scalability in learning, since it can build a model very fast even when given a huge training set. At the same time, it can also handle learning with limited training data under the identical distribution assumption between training and actual data; this makes it very useful in cases where a new resolution group is added but the ticket volume is not yet high enough to train an SVM-based model.

4. RELATED WORK
Several researchers have studied different aspects of the problem of routing tickets to practitioners [8], [9], [6]. The work in [9] approaches the problem by mining resolution sequence data and does not access the ticket description at all, and is thus completely different from our approach.
Its objective is to come up with ticket transfer recommendations given the initial assignment information. The work in [8] mines historical ticket data and develops a probabilistic model of an enterprise social network that represents the functional relationships among various expert groups in ticket routing. Based on this network, the system then provides routing recommendations for new tickets. This work also focuses on ticket transfers between groups (given an initial assignment), like [9], without looking at the ticket text content. The work in [6] is different and approaches the problem from a queue perspective. It is more related to the issue of service times and becomes particularly relevant when a ticket that has been dispatched to a group needs to be assigned to an agent.

There are some papers which apply text classification techniques to handle tickets. [4] is close to our work, with the objective of automatically classifying tickets based on their description in order to route them to the right group. However, the work was applied on a small ticket set with only 8 groups, and the best-case accuracy of 84% was also not acceptable for the kind of reliable automation that we sought in our tool. The work in [3] attempts to classify incoming change requests into one of the fine-grained activities in a catalog by leveraging aggregated information associated with the change, like past failure reasons or best implementation practices. They use information retrieval and machine learning techniques in order to match change tickets to the activity that is most suitable for them. They suggest the top 5 activity groups to the user as output and do not automate the process of assignment. Similar to our work, [2] shows the limitation of SVM-like techniques in terms of scalability and proposes a notion of a discriminative keyword approach. However, we differ substantially in the definition of our discriminative term weighting function and our ability to handle a higher dimensional feature space, as also in the application domain, with [2] being more focused on commonly used text classification data sets like personalized spam filtering and movie reviews, rather than service tickets.
There has also been a large body of work on comparative analysis of various machine learning algorithms like SVM, decision trees, etc. We relied on the conclusions of [1] in choosing SVM as the algorithm for our analysis of ticket descriptions. We did carry out a preliminary analysis comparing the precision and recall values of SVM, BayesNet and C&RT, and found SVM to be more robust and precise. We also referred to [5] to help decide what may be considered a quality score for descriptions.

5. CONCLUSIONS
Ticket dispatch plays an important role in determining the turnaround time of a ticket, because a misrouting can introduce significant delays. We have proposed a tool called SmartDispatch for efficient dispatch of tickets in an IT service environment, using supervised learning techniques to review ticket descriptions and predict the most appropriate resolution group. The tool uses a combination of the standard classification algorithm SVM and a new discriminative term based heuristic for carrying out the dispatch, and offers automated as well as advisory dispatch capabilities. Empirical evaluation of the tool on large ticket data sets from real-life services engagements at IBM demonstrates the efficacy of the approach. As part of future work, we plan to incorporate the notion of discriminative phrases to handle descriptions that do not have discriminative terms but where a combination of terms, that is, a phrase, is unique to a group.

6. REFERENCES
[1] Shantanu Godbole and Shourya Roy. Text classification, business intelligence, and interactivity: automating C-Sat analysis for services industry. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 911-919, New York, NY, USA, 2008. ACM.
[2] K. N. Junejo and A. Karim. A robust discriminative term weighting based linear discriminant method for text classification. In Proceedings of the Eighth IEEE International Conference on Data Mining, ICDM '08, pages 323-332. IEEE, 2008.
[3] Cristina Kadar, Dorothea Wiesmann, Jose Iria, Dirk Husemann, and Mario Lucic. Automatic classification of change requests for improved IT service quality. In Proceedings of the 2011 Annual SRII Global Conference, SRII '11, pages 430-439, Washington, DC, USA, 2011. IEEE Computer Society.
[4] G. di Lucca. An approach to classify software maintenance requests. In Proceedings of the International Conference on Software Maintenance, ICSM '02, page 93, Washington, DC, USA, 2002. IEEE Computer Society.
[5] Debapriyo Majumdar, Rose Catherine, Shajith Ikbal, and Karthik Visweswariah. Privacy protected knowledge management in services with emphasis on quality data. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM '11, pages 1889-1894, New York, NY, USA, 2011. ACM.
[6] Hoda Parvin, Abhijit Bose, and Mark P. Van Oyen. Priority-based routing with strict deadlines and server flexibility under uncertainty. In Proceedings of the Winter Simulation Conference, WSC '09, pages 3181-3188. Winter Simulation Conference, 2009.
[7] M. F. Porter. An algorithm for suffix stripping. In Readings in Information Retrieval, pages 313-316. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.
[8] Qihong Shao, Yi Chen, Shu Tao, et al. EasyTicket: a ticket routing recommendation engine for enterprise problem resolution. Proc. VLDB Endow., 1:1436-1439, August 2008.
[9] Qihong Shao, Yi Chen, Shu Tao, Xifeng Yan, and Nikos Anerousis. Efficient ticket routing by resolution sequence mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 605-613, New York, NY, USA, 2008. ACM.
[10] IBM SPSS. http://spss.co.in/.
[11] Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 173-180, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.