A Service-Oriented Management Framework for Telecom Operation Support Systems

Ing-Yi Chen 1, Guo-Kai Ni 1, Cheng-Hwa Kuo 2, Chau-Young Lin 3

1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan, China (ichen@ntut.edu.tw, t5599007@ntut.edu.tw)
2 Department of Commerce Automation and Management, Chihlee Institute of Technology, Taipei County, Taiwan, China (chkuo@mail.chihlee.edu.tw)
3 Telecommunication Laboratories ChungHwa Telecom Co., Ltd., Taiwan, China (ivanlin@cht.com.tw)

ABSTRACT

Information systems are among the tools most frequently used by businesses in delivering and maintaining services. Yet the challenges faced by enterprises using information technology are not exclusively linked to system development; they also derive from system operation and management. This paper describes a framework that was developed to address service operation management. The framework was implemented by the Chunghwa Telecom Company in an effort to improve information system effectiveness and to reduce the costs of system operation management. Since July 2008, the system has provided complete support for the daily operations of the company's billing system. As a consequence of the system's implementation, the company has experienced a large-scale improvement in efficiency and a reduction in costs. The average number of incidents has been reduced from 50 per month to 15. The average amount of time spent addressing an individual incident has also been reduced from approximately 20 hours to 26 minutes.

Keywords: Service-Oriented Architecture, IT Service Management (ITSM), Information Technology Infrastructure Library (ITIL), Next Generation Operations System and Software (NGOSS), Service Operation Management, Service Operation Management Framework, Monitoring Technology

1. INTRODUCTION

Information systems are widely used within telecommunications companies to provide business services and support daily operations. If information systems do not operate normally, the companies that depend on them can incur tremendous losses. System operation management is therefore so important that operators need to deal with problems as they occur at runtime. IT Service Management (ITSM) is the discipline that strives to better align IT efforts with business needs and to manage the efficient provision of IT services with guaranteed quality. The Information Technology Infrastructure Library (ITIL) is a widely used standard that provides guidelines to support operators in managing and using information systems [1]. Although ITIL provides guidance on how to perform incident and problem management, it does not provide implementation details [2].

Despite this, there have been numerous discussions of ways to implement the guidelines. For example, Keel et al. addressed the main challenges of implementing ITSM in four broad areas: processes, people, technologies, and data [3]. The paper also gave a simple scenario to show how an organization can implement ITSM by using IBM Service Management solutions through a top-down approach. Elsewhere, Brenner suggested that incident and problem management processes are important in making service operations run smoothly [4]. His work implies that the major challenge of service management is that operators need to spend a lot of time dealing with problems that occur.

This paper proposes a framework that uses monitoring technology to detect the occurrence of problems and provides an extendable issue handling sub-system that allows developers to create specific issue handlers. It enables proactive service operations management to prevent the repeated occurrence of issues.
In addition, the paper presents an empirical case study of the implementation of the described system operation management framework at the Chunghwa Telecom Corporation. Chunghwa Telecom Corporation is the largest telecommunications company in Taiwan. It provides many services, such as phone lines and asymmetric digital subscriber lines (ADSL). To deliver these services, the company has constructed numerous information systems. The case study discusses one of them, the billing system.

The billing system was developed using the Next Generation Operations System and Software (NGOSS) guidelines. NGOSS was proposed by the TeleManagement Forum (TMF) to help communication service providers (CSPs) manage their businesses. It also provides a development process to support developers in implementing business support systems (BSSs) and operation support systems (OSSs). The development process consists of a lifecycle model that defines four perspectives: a business view, a system view, an implementation view, and a deployment view. The business view is used to define business challenges and strategies. The system view defines business solutions and the system architecture of those solutions. The implementation view is used to build business solutions. Finally, the deployment view describes how to deploy and use business solutions. This paper focuses on the last perspective, the deployment view, and discusses the maintenance and operation of a built system. In addition, the paper presents a supporting framework to help CSPs achieve effectively functioning service operations.

978-1-4244-6486-9/10/$26.00 ©2010 IEEE

This paper presents a successfully implemented project. This success is measured in several ways: a reduction in incident occurrence and an improvement in staff service operation performance. These results were obtained from daily operations data collected from the billing system from August 2008 to December 2009.

The rest of this paper is organized as follows. The next section describes related work in the area of system monitoring research. The section that follows describes the proposed service operation management framework. The fourth section gives an empirical case study of Chunghwa Telecom and discusses the effectiveness of the project implementation, together with a summary of system performance as reflected in the performance data. The last section contains a brief summary of the material presented here.

2. LITERATURE REVIEW

Jantti and Kinnunen's paper on the application of ITIL points out that there are two different dimensions, proactive and reactive, in problem and incident management [5]. The proactive methodology seeks to prevent incidents and problems before they occur. In order to achieve this goal, a monitoring system is required. Monitoring systems are widely used in several important ways.
For example, the monitoring data can be used to address performance problems, system failures, and problems with business strategies. It can also be used to support service consumers in achieving higher quality service. Sidibé and Mehaoua proposed a quality of service (QoS) monitoring framework that aims to provide service performance verification with respect to the guarantees specified in contractual agreements between providers and users [6]. It provides monitoring information to service providers for use in quantified QoS-based services and service assurance. The Global Grid Forum (GGF) proposed the Grid Monitoring Architecture, which guides the implementation of monitoring solutions and is widely used in grid computing environments. It consists of three major components: producers, consumers, and directory services. Most current monitoring systems were implemented using this architecture.

Li and Gui proposed an on-demand sensor deployment and management mechanism in an agent-based monitoring system [7]. The mechanism addresses the major challenges associated with static sensor deployment mechanisms in currently available monitoring systems. At the time of their writing, the deployment of sensors was difficult and inefficient. Li and Gui proposed a dynamic sensor loading mechanism to markedly improve the situation.

Gupta et al. point out a number of current challenges to timely incident or problem diagnosis and resolution through monitoring technology [8]. They developed a problem determination platform that enabled efficient incident and problem management. The solution was applied using a service management approach that provides effective and efficient service operation. Despite these successes, the solution faces two key challenges. The first is that the problem determination platform is built on agent technology. The use of agent technology implies a high degree of complexity in managing those agents.
The second is that the solution does not provide an extendable problem solving framework to deal with the occurrence of problems automatically. In the Gupta approach, employees were required to spend time solving recurring problems.

In summarizing this literature, we can conclude that sound service management requires both the monitoring and the solving of problems as they occur. Yet all of these works present solutions that require the deployment of many agents to monitor applications, resulting in a range of costly complexities and inefficiencies. This paper proposes a new monitoring mechanism based on a service bus to retrieve the results of service execution. This solution avoids the deployment of agents to existing applications by retrieving service results through the service bus. This approach reduces the complexity of monitoring system deployment. In addition, the framework provides an extendable issue handling sub-system to deal with problems automatically, as they occur, in order to reduce demands on employee time.

3. SERVICE ORIENTED MANAGEMENT FRAMEWORK

Figure 1 illustrates the lifecycle for this service operation management framework. The framework consists of five major components: Application Services, a Result Verifier, a Validator, and both Recovery and Change processes. The model begins with an Application Service. Following the execution of a service, the application produces a service result that is reported, via an Adaptable Service Bus, to a Result Verifier. The Result Verifier component compares the result with metric data. If the execution result falls outside of the acceptable metric range, the Result Verifier will produce an alert to describe the issue to the Issue Handler. The Issue Handler, upon receiving the alert, is responsible for identifying the issue by means of an Issue Repository and for determining which course of action should be pursued in order to correct the problem.

While the problem management program is extendable, there are typically two pathways that can be pursued in response to an incident. If an incident meets a number of specific, pre-defined criteria, a recovery process can be initiated. Such a process is intended to identify non-specific, recurring types of problems, such as the inclusion of data from the wrong data source, and to initiate the necessary changes and a recalculation by the Application Service. If, on the other hand, an incident represents a significant failure of the billing process, or of an Application Service, a Change Management Process can be initiated. This entails active intervention in, and alteration of, the Application Service to ensure smooth and accurate results in future lifecycles.

Figure 2 represents the architecture of the Chunghwa Telecom billing system using the proposed service operation management framework. To the left in Figure 2 is the operation support system, which details the requirements for task scheduling, the processes for bill calculation, the monitoring of results, and the reporting of results. Next is the billing process, which dictates the method through which bills will be calculated. To the right of the Adaptable Service Bus (ASB) are the application services, which calculate basic charges, preferential pricing discounts, special offers, and the total bill.
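The verification step in the lifecycle above can be illustrated with a minimal Python sketch. It assumes a metric is a simple numeric range and an alert is a small record; the names `Metric`, `Alert`, and `verify_result`, as well as the example values, are hypothetical and are not taken from the Chunghwa Telecom implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Acceptable range for one measured value (hypothetical structure)."""
    name: str
    lower: float
    upper: float


@dataclass(frozen=True)
class Alert:
    """Description of an issue, to be passed on to the Issue Handler."""
    service: str
    metric: str
    observed: float
    message: str


def verify_result(service: str, observed: float, metric: Metric) -> Optional[Alert]:
    """Return an Alert when the observed value falls outside the metric range."""
    if metric.lower <= observed <= metric.upper:
        return None  # result is acceptable, nothing to report
    return Alert(
        service=service,
        metric=metric.name,
        observed=observed,
        message=f"{metric.name}={observed} outside [{metric.lower}, {metric.upper}]",
    )


# Example: a bill count far below the expected range raises an alert.
bills_metric = Metric(name="bills_calculated", lower=900, upper=1100)
print(verify_result("basic_charge", 1000, bills_metric))
print(verify_result("basic_charge", 40, bills_metric))
```

A result inside the range yields no alert, so only out-of-range executions reach the Issue Handler.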
The Monitoring Agent collects results that are passing through the ASB and sends a copy of those results to the Result Verifier. The Result Verifier compares the results to a predefined Metric. If a result falls outside of the acceptable metric range, an alert is forwarded to the Issue Handler. The Issue Handler makes use of an Issue Repository in determining which corrective course of action should be pursued. Having made this decision, the Issue Handler will initiate either a Recovery Process or a Change Management Process. The Recovery Process is composed of a set of services that can restore the billing process to a past state following the occurrence of several specific issues. Alternatively, the Change Management Process can initiate a notification service that adheres to the ITIL-compliant problem management process. Of the two options, the Change Management Process is the default for issues that cannot be recovered by the pre-defined recovery process pathways.

In general, the service operation management framework consists of two sub-systems: a monitoring sub-system and an issue handling sub-system. The section that follows describes the design details of the two sub-systems.

Figure 1. Lifecycle of the Service Management

Figure 2. Service Operation Management Framework
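The choice between a recovery pathway and the default change management pathway can be sketched as a small registry lookup. This is only an illustration under stated assumptions: an issue is reduced to a dictionary with a `category` key, the Issue Repository is modelled as an in-memory dictionary, and the handler functions and category name are hypothetical.

```python
from typing import Callable, Dict

# A handler takes an issue description and returns a short status string.
Handler = Callable[[dict], str]


def rerun_with_correct_source(issue: dict) -> str:
    """Hypothetical recovery service: restore a past state and recalculate."""
    return f"recovery: recalculating {issue['service']}"


def change_management(issue: dict) -> str:
    """Default path: notify operators via the problem management process."""
    return f"change management: notifying operators about {issue['service']}"


class IssueHandler:
    """Routes each issue to a specific handler, or to the default one."""

    def __init__(self, default: Handler) -> None:
        # Stands in for the Issue Repository (an assumption of this sketch).
        self._repository: Dict[str, Handler] = {}
        self._default = default

    def register(self, category: str, handler: Handler) -> None:
        """Extension point: add a specific handler for a known issue category."""
        self._repository[category] = handler

    def handle(self, issue: dict) -> str:
        """Use the registered handler if one exists, else fall back to the default."""
        handler = self._repository.get(issue["category"], self._default)
        return handler(issue)


issue_handler = IssueHandler(default=change_management)
issue_handler.register("wrong_data_source", rerun_with_correct_source)

print(issue_handler.handle({"category": "wrong_data_source", "service": "discounts"}))
print(issue_handler.handle({"category": "service_failure", "service": "total_bill"}))
```

Keeping the default handler separate from the registry means that new recovery pathways can be added without touching the fallback behaviour.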

3.1. Monitoring Sub-System

On the lower right of the Service Operation Management Framework depicted in Figure 2 is the monitoring sub-system. It consists of three major components: a monitoring agent located in the ASB, a Result Verifier, and a log containing the predefined metric. Figure 3 shows the monitoring event data structure that is delivered from the ASB by the monitoring agent. On the left of Figure 3 is the monitoring event, which is composed of two data objects: Property and Extended Data. The Property data object represents metadata of the executing process, including processid, workitemid, and parentid. The Extended Data object presents the business data of the Chunghwa Telecom billing system. This includes the processname, processversion, processstarttime, processendtime, UserName, and Shared Information Data. The processname and processversion are used to identify the Application Service that has been executed. The processstarttime and processendtime can be used to calculate the time taken in executing the service. The UserName field stores the identity of the user who enabled the process. The Shared Information Data (SID) adheres to a common data model in the telecom industry. It includes billym, billcyc, sbranchcode, sfilename, the number of bills calculated, and the total amount owed. The billym and billcyc objects are used to represent the billing year, month, and cycle. The sbranchcode is used to represent the branch office of the Chunghwa Telecom company in which the customer account is held. The sfilename is the file containing the billing information. The number of bills calculated and the total amount owed are the result of the billing calculation.

Figure 3. Monitor Event Data Structure

3.2. Issue Handling Sub-System

Figure 4 presents the design details of the issue handling sub-system. The issue handling sub-system copes with the issues that are detected by the monitoring sub-system. The sub-system consists of a Dispatcher, Issue Repository, Handler, Default Handler, and Specific Handler. On the right side of Figure 4, the Dispatcher component receives issues from the monitoring sub-system. These are then categorized by the Dispatcher after reference is made to the Issue Repository, which stores information on each available Recovery Process. The information stored includes the handler name, address, and operation. Next, the issue is forwarded to the Handler component, depicted on the left in Figure 4. The Handler draws upon either a Specific Handler, for issues bound for the Recovery Process, or a Default Handler, for issues requiring the Change Management Process. These handlers provide support for the initiation of either the recovery or the change management process.

Figure 4. Issue Handling Sub-System

4. EXPERIMENTAL RESULTS

The work presented here was conducted as part of a joint effort between academic researchers from the National Taipei University of Technology and a team of engineers from Chunghwa Telecom Co., Ltd. Chunghwa Telecom is the largest telecommunications company in Taiwan and ranks 14th in the world. Its annual revenue for 2008 exceeded 7 billion U.S. dollars. The company provides a wide range of services to customers and, as a result, the monthly billing process can be quite complex. Billing data is collected and then imported to a database for bill calculation. Bill calculation requires the completion of numerous sub-tasks. These include special offers, outstanding debts, credits from previous bills, and charges involving other telecom companies. These tasks are organized into a billing process, which is then integrated using a service-oriented architecture. Each of the tasks identified in the billing process is performed by a billing service.
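Each execution of such a billing service is observed through the monitoring event of Section 3.1. As a rough illustration only, the event could be modelled as follows. The field names follow the text; the Python types, the class layout, the sample values, and the `execution_time` helper are assumptions of this sketch and do not reproduce the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Property:
    """Metadata of the executing process."""
    processid: str
    workitemid: str
    parentid: str


@dataclass(frozen=True)
class SharedInformationData:
    """Billing-specific business data carried in the event."""
    billym: str                 # billing year and month
    billcyc: str                # billing cycle
    sbranchcode: str            # branch office holding the customer account
    sfilename: str              # file containing the billing information
    bills_calculated: int       # number of bills calculated
    total_amount_owed: Decimal  # total amount owed


@dataclass(frozen=True)
class ExtendedData:
    processname: str
    processversion: str
    processstarttime: datetime
    processendtime: datetime
    username: str
    sid: SharedInformationData


@dataclass(frozen=True)
class MonitorEvent:
    prop: Property
    extended_data: ExtendedData

    def execution_time(self) -> timedelta:
        """Time taken to execute the service, from the start and end times."""
        data = self.extended_data
        return data.processendtime - data.processstarttime


# Illustrative values only; none of them come from real billing data.
event = MonitorEvent(
    prop=Property(processid="p-1", workitemid="w-1", parentid="root"),
    extended_data=ExtendedData(
        processname="basic_charge",
        processversion="1.0",
        processstarttime=datetime(2009, 1, 5, 2, 0, 0),
        processendtime=datetime(2009, 1, 5, 2, 12, 30),
        username="operator01",
        sid=SharedInformationData(
            billym="200901", billcyc="1", sbranchcode="TPE",
            sfilename="bill_200901_1.dat",
            bills_calculated=1000, total_amount_owed=Decimal("523400"),
        ),
    ),
)
print(event.execution_time())
```

Values such as the number of bills calculated or the execution time are the kind of figures a Result Verifier could compare against a predefined metric.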
In the past, the calculation of bills frequently required that time be spent dealing with problems that interfered with the successful completion of the billing process. These problems arose, in large part, because errors occurring in individual service components disrupted the completion of the billing calculation process. Before the introduction of this system, the process experienced an average of 50 incidents during each monthly billing calculation period. This required a large number of man-hours, as operators had to address each issue individually as it arose. As a result, the previous system was prone to numerous inefficiencies and delays.

By introducing the service operation management framework, problem occurrences have been reduced, as has the time required of operators. In practical terms, the average number of monthly problem occurrences has been reduced to 15. In addition, the framework integrates the ITIL incident and problem management process, which helps operators reduce the time spent dealing with those problems that do occur. The average amount of time spent addressing each problem has been reduced from 20 hours to 26 minutes.

5. CONCLUSIONS

Effective service operation management of information technology is important. This paper presented an empirical study of the implementation of a service operation management system based on the proposed framework. Before the introduction of this system, Chunghwa Telecom did not have an extensive and systematic process of control over service operation procedures. The result was a time-consuming, labor-intensive method of maintaining the system whenever errors occurred. In order to optimize the service operation procedure, so that the quality of business service could be improved and the cost of system maintenance reduced, the company needed a well-defined control process to manage service operation. This paper described the service operation management framework that was introduced to meet this need.

The paper also discussed the effectiveness of the framework implementation. The benefits of implementation included a reduction in the incident occurrence rate, an improvement in the availability of services, and a reduction in the ongoing maintenance costs of the system. The number of incidents was reduced from an average of 50 to 15 per month. The average time spent dealing with each incident was also reduced from 20 hours to 26 minutes.

Since the billing process is just one of a number of target areas for Chunghwa Telecom, future work will be directed toward the service management methodology of other business areas, such as readiness, fulfillment, and assurance. It is hoped that this paper will serve as a reference point for further research on this subject.

6. ACKNOWLEDGEMENTS

The authors would like to thank the employees of Chunghwa Telecom Company who contributed to this project. We also gratefully acknowledge the support of the National Science Council, Taiwan, under Grant NSC 97-2622-E-027-002-CC1.

REFERENCES

[1] Office of Government Commerce (OGC), Ed., The Official Introduction to the ITIL Service Lifecycle, ser. IT Infrastructure Library (ITIL). The Stationery Office, 2007.
[2] M. Jantti and A. Eerola, "A Conceptual Model of IT Service Problem Management", the 2006 International Conference on Service Systems and Service Management, pp. 798-803, 2006.
[3] A. J. Keel, M. A. Orr, R. R. Hernandez, E. A. Pattrocinio, and J. Bouchard, "From a technology-oriented to a service-oriented approach to IT management", Vol. 46, No. 3, pp. 549-563, Aug. 2007.
[4] M. Brenner, "Classifying ITIL Processes; A Taxonomy under Tool Support Aspects", the 2006 IEEE International Workshop on Business-Driven IT Management, pp. 19-28, April 2006.
[5] M. Jantti and K. Kinnunen, "Improving the Software Problem Management Process: A Case Study", Lecture Notes in Computer Science, Vol. 4257, pp. 40-49, 2006.
[6] M. Sidibé and A. Mehaoua, "QoS Monitoring Framework for End-to-End Service Management in Wired and Wireless Networks", the 2008 IEEE International Conference on Computer Systems and Application, pp. 964-968, March 2008.
[7] Y. Li and X. Gui, "An Agent-based Grid Monitoring System Featuring Dynamic and On-demand Sensor Deployment and Management", the 2005 International Conference on Semantics, Knowledge, and Grid, pp. 56-56, 27-29 Nov. 2005.
[8] R. Gupta, K. H. Prasad, L. Luan, D. Rosu, and C. Ward, "Multi-dimensional Knowledge Integration for Efficient Incident Management in a Services Cloud", the 2009 IEEE International Conference on Service Computing, pp. 57-64, 21-25 Sept. 2009.
[9] Y. I. Chen, G. K. Ni, and C. Y. Lin, "A runtime-adaptable service bus design for telecom operations support systems", IBM Systems Journal, Vol. 47, No. 3, pp. 445-456, August 2008.