
Transaction-based Configuration Management for Mobile Networks

Henning Sanneck, Christoph Schmelz
Siemens AG, Communications Mobile Networks, D-81359 Munich, Germany

Alan Southall, Joachim Sokol, Christian Kleegrewe, Christoph Gerdes
Siemens AG, Corporate Technology Information & Communication, D-81739 Munich, Germany

Abstract

Autonomic computing & communication includes the vision of self-configuring systems which can adapt themselves to their environment. Adaptation essentially means establishing a working initial configuration and maintaining it over time in accordance with the configuration of other systems that have dependencies. In mobile networks, this is a complex task due to the degree of distribution of the network elements (NEs), the properties of the Operation, Administration and Maintenance (OAM) network, the specific configuration dependencies between the network elements and the hierarchical nature of the legacy management infrastructure. To cope with these characteristics, an approach is presented where configuration changes are communicated as transactions between the NEs and their element management system (EMS). A master-replica paradigm is adopted where the replicas (= NEs) are allowed to autonomously commit configuration data as "tentative" and the master (EMS) follows the state of the replicas, but may force rollbacks on them. Configuration dependencies between different NEs are expressed as transaction groups. The execution of these transaction groups is controlled by a centralized transaction manager located at the EMS. To provide for a defined duration of the transaction group execution in the presence of transaction failures, either a complete rollback or a partial commit of the transaction group (thus re-assessing the configuration dependencies) is possible based on operator policies. The presented scheme improves the level of configuration data consistency and the degree of automation in the configuration operations, with benefits for both the manufacturer and the mobile network operator. A proposed system architecture and details of an experimental implementation are presented.

A. INTRODUCTION

Today, most network management systems are still based on a hierarchical infrastructure (e.g., mobile networks according to 2G/3G standards, fixed networks based on ITU and IETF standards). The configuration work cycle in such a system is characterized by the rollout of configuration data from a central Element Management System (EMS) to numerous Network Elements (NEs) and the alignment of the actual configurations of the NEs towards the EMS. The described work cycle is triggered by updates of the network planning or by changes resulting from day-to-day operation. The configuration tasks in OAM, ranging from initial installation and commissioning of NEs up to reconfiguration of network domains in operation, still require significant manual intervention by a human operator. To reduce the amount of manual work and the associated significant costs, the vision of autonomic computing / communication [1] seems appealing. Figure 1 shows how advances in configuration management automation could be mapped to the different levels in autonomic computing / communication (note that each level always includes the functions of the previous one). A crucial boundary lies between the levels Predictive and Adaptive, where the scope of automation is extended to NE groups rather than just individual NEs.
[Figure 1: Steps in Network Configuration Management Automation; the levels Basic, Managed, Predictive, Adaptive and Autonomic mapped to configuration actions (from manual NE commissioning via automated commissioning and automated detection / reconfiguration of missing components up to closed-loop reconfiguration based on manager policies and PM/FM input, and finally reconfiguration based on co-operative NE policies) and to enabling technologies (hierarchical agent-manager NM; DHCP / PXE, JMX, OSGi, XML, UPnP, Jini; RDF / OWL, Web Services, service discovery protocols, policy frameworks / languages, distributed database concepts; P2P NM and P2P protocols)]

With regard to the characteristics of a mobile network ([2], [3]) as a potentially autonomic system, it can be observed that the majority of the individual sub-systems (i.e., network elements like radio base stations) are less complex than in the IT domain. However, the degree of distribution and the wide-area network environment result in more challenging issues of scale.

The relevant characteristics can be summarized as follows:

- Number: Thousands of NEs (base stations / access points) may be associated with one single EMS.
- Reliability: NEs (particularly in rural areas) are attached over relatively expensive and thus bandwidth-limited links. Furthermore, the connectivity on those links can be unreliable due to the employed network technology (microwave radio links) and the priority of user traffic over O&M traffic. Yet, if the O&M connectivity is not present for a certain period of time, the NE should still function correctly in the operational network.
- Duration: Configurations may change within short timescales at the NE and EMS.
- Concurrency: Multiple sources of configuration changes may be active concurrently (network planning, several human operators doing day-to-day configuration changes, local changes by the field service).
- Service availability: Users expect a high availability of services, i.e., service-affecting configuration changes can only be rolled out during defined short time windows (night hours, weekends). It may thus happen that parts of a network configuration change need to be cancelled / rolled back when a time limit is reached.

These characteristics require a configuration management system that is able to detect and automatically react to configuration data consistency errors between an NE and its EMS, as well as within groups of NEs that have dependencies. Reliability should be provided by means of rollbacks, to restore consistency even after logical or consistency errors that cannot easily be resolved. The accepted means to realize the described functionality are transaction-based systems [4], which are defined on the lowest level by the ACID (Atomicity, Consistency, Isolation and Durability) paradigm. We extend this paradigm towards distributed adaptive transactions to provide for scalable synchronization of configuration data across NE groups and the EMS. Based on our initial work on the topic [5], this paper gives an overview of the distributed adaptive transaction concept and the transaction generation process. Then, further results on transaction execution, integration into an EMS, an experimental implementation and its evaluation are presented.

B. DISTRIBUTED ADAPTIVE TRANSACTIONS

The binary accept or reject of delta and bulk configuration changes made to a group of NEs in current element management systems follows the strict ACID paradigm at the fine granularity of an atomic operation, resulting in a very coarse view of the dependencies between the changes on the individual NEs. Thus, when errors have occurred, significant and expensive manual intervention by the human operator is required to enforce a rollback across the entire group of NEs, to detect the sources of the errors, and to finally reach a consistent network state. The more the following attributes apply, the less feasible an atomic operation (two-phase commit / all-or-nothing strategy) becomes:

- High probability that a specific NE is not reachable at a given point in time
- Large number of NE groups with dependencies
- Large total number of NEs having dependencies
- Large number of different managed objects with dependencies

Therefore, to accommodate the distribution characteristic, a transaction hierarchy is introduced:
- NE transaction (NET): An NE transaction transforms the configuration data of one NE from a consistent state to another consistent state. Note that a single NET may consist of multiple atomic operations (e.g., insert, update, delete operations).
- Transaction group (TG): A transaction group contains a set of NETs that have interdependencies. Once the TG is committed, the group of NEs is transferred from one consistent state into another consistent state. TGs with no interdependencies may be executed in parallel.
- Network transaction (NT): A network transaction consists of one or more TGs applicable to an arbitrary number of NEs. NTs are independent of each other and are executed consecutively.

Furthermore, it is desirable to have an adaptive approach to provide for flexible control over the transaction execution process and for long-lasting execution periods that may extend over days. The flexibility refers to re-assessing (and possibly revoking some of) the dependencies based on the results of individual transactions. This means that instead of performing a complete rollback, the system should allow the rescheduling of transactions. For example, if a network operator wishes to update n NEs with dependencies in a distributed transaction, but only one single NE is offline during the update process, it is obvious that the distributed transaction shall not fail as a whole. Instead, the operator can often accept a partial success in that n-1 elements commit their updates and the remaining single update is rescheduled to a later point in time, when the characteristics described above (reliability, concurrency) may have changed.
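To make the hierarchy concrete, the following minimal sketch shows one way the three levels could be represented as data structures; it is written in Java (the language of the implementation described in Section D), and all class, field and method names are illustrative assumptions, not the types of the actual system.

import java.util.ArrayList;
import java.util.List;

// Illustrative data model for the transaction hierarchy (all names assumed).
// An NE transaction (NET) bundles the atomic operations for one network element
// and moves its configuration from one consistent state to another.
class NeTransaction {
    final String neId;            // target network element
    final List<String> atomicOps; // e.g., insert / update / delete operations
    NeTransaction(String neId, List<String> atomicOps) {
        this.neId = neId;
        this.atomicOps = List.copyOf(atomicOps);
    }
}

// A transaction group (TG) collects NETs with interdependencies; committing
// the TG transfers the whole NE group into a new consistent state.
class TransactionGroup {
    final List<NeTransaction> nets = new ArrayList<>();
    void add(NeTransaction net) { nets.add(net); }
}

// A network transaction (NT) consists of one or more TGs; TGs without
// interdependencies may run in parallel, while NTs execute consecutively.
class NetworkTransaction {
    final List<TransactionGroup> groups = new ArrayList<>();
    void add(TransactionGroup tg) { groups.add(tg); }
}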

To demonstrate the differences between the current and the envisaged workflow, Figure 2 shows the example of cell adjacency management, where a handover (HO) parameter set for a certain cell and the cell adjacency information (ADJC) of neighboring cells pointing to that cell need to be concurrently updated.

[Figure 2: Workflow comparison. In today's low-level workflow, network planning emits a CLI script (update HO parameters of cell #7, update ADJC of cells #1 and #2); the individual results (cell #7: success, cell #2: success, cell #1: failure) must be manually verified and correlated, and the problem is resolved by manually correcting cell #1 or creating a new rollback script (rollout, alignment, verification and post-processing phases). In the transaction-based high-level workflow, the same updates form a transaction group; upon the failure of cell #1, a policy-based or manual decision is taken between temporary revocation of the dependency with rescheduling of the few affected transactions, and an NE-group-wide rollback.]

Configuration Work Cycle

Based on our approach, the modified configuration work cycle (cf. Figure 3) consists of the phases of network planning, transaction generation and transaction execution. As soon as a network update is planned, appropriate transactions are generated and their execution is triggered. The result of the generation phase is an NT together with parallelization and execution optimizations. At the beginning of the execution phase, the parallelization plan is utilized to schedule the individual TGs. The execution plans for each TG fix the sequence of the NET execution.

[Figure 3: Configuration Work Cycle; network planning feeds transaction generation, transaction execution ends with commit or rollback, and a rescheduling loop feeds back into execution]

In the transaction generation phase, first a dataflow analysis is conducted: it is analyzed which data is written or read, and when. The result of this step is a graph-like representation where the program statements constitute the nodes and the data relationships constitute the edges. In a parallelization step, this graph is partitioned such that each partition represents an independent set of instructions that can thus be executed in parallel (a "transaction line", TL). Depending on the semantic tree, and hence implicitly on the type of the high-level input, the partitioning of the graph can become highly complex. With regard to the described mobile network environment, a restriction to recursion-free script languages seems appropriate. Even with knowledge of the data model, there might be dependencies that remain difficult to detect. One such example is parameter reuse, where values are assigned from a finite value set and two parameters must not have the same value. To detect these kinds of dependencies, the parallelizer can consult a policy database. Policies within this database have knowledge of specific and arbitrarily complex dependencies as well as of data models, and offer appropriate parallelization strategies.
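The partitioning itself can be pictured as a connected-components computation over the dependency graph. The following sketch assumes each TG has been reduced by the dataflow analysis to the set of data items it reads or writes; overlapping sets force two TGs into the same transaction line, and additional edges contributed by the policy database (e.g., for parameter reuse) could be injected in the same way. All names are illustrative.

import java.util.*;

// Sketch of the parallelization step: transaction groups whose read/write
// sets overlap are forced into the same partition ("transaction line", TL);
// disjoint partitions can be executed in parallel.
class Parallelizer {

    // dataSets.get(i) holds the data items (e.g., NE / parameter identifiers)
    // that TG number i reads or writes, as derived from the dataflow analysis.
    static List<List<Integer>> partitionIntoLines(List<Set<String>> dataSets) {
        int n = dataSets.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        // union-find: merge TGs with overlapping read/write sets
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (!Collections.disjoint(dataSets.get(i), dataSets.get(j)))
                    union(parent, i, j);

        // each connected component becomes one transaction line
        Map<Integer, List<Integer>> lines = new LinkedHashMap<>();
        for (int i = 0; i < n; i++)
            lines.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        return new ArrayList<>(lines.values());
    }

    private static int find(int[] p, int x) { return p[x] == x ? x : (p[x] = find(p, p[x])); }
    private static void union(int[] p, int a, int b) { p[find(p, a)] = find(p, b); }
}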
Before the actual execution can begin, an execution planning step is conducted. The purpose of this optimization step is to reduce the rollback complexity in case of failing transactions. Generally, the generated NETs may have differing degrees of dependency. For example, the modification of a cell's handover information (HO in Figure 2) requires updating the adjacency information of all referring cells; the degree of dependency is therefore high. On the other hand, a transaction that adjusts a single piece of adjacency information (ADJC) has a low degree of dependency. Transactions with high dependency degrees are typically more critical in their execution, i.e., they are more likely to fail. Instead of treating each NET equally and initiating their execution in parallel, an execution plan is generated: the planner evaluates the different dependencies and assigns weights which quantify a dependency's impact on the execution robustness. Since the nature of dependencies varies with the respective NEs, dependency weights are not globally defined. Rather, the creation of the plan is controlled through various policies, where a policy determines the dependency weights and the resulting plan structure.
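Under these assumptions, one way such weight-driven planning could look is sketched below, reusing the illustrative types from the earlier sketch; the weight policy interface is hypothetical. Ordering the most critical (highest-weight) transactions first lets likely failures surface while the set of already-committed transactions that would need a rollback is still small, in line with the execution planning policy described later in Section E.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of weight-driven execution planning (names are assumptions).
class ExecutionPlanner {

    interface WeightPolicy {
        double weight(NeTransaction net); // domain knowledge supplied per use case
    }

    static List<NeTransaction> plan(TransactionGroup tg, WeightPolicy policy) {
        List<NeTransaction> order = new ArrayList<>(tg.nets);
        // most critical first, e.g., an HO update before single ADJC adjustments
        order.sort(Comparator.<NeTransaction>comparingDouble(policy::weight).reversed());
        return order;
    }
}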

The transaction execution is the most complex phase in the work cycle. In the event of a failure, the execution process itself becomes a cyclic process in which new groups are generated by the re-scheduler and executed until all NETs are successfully committed (or passed to the operator and/or subjected to system policies). The result of the execution phase is a set of successfully executed and committed NETs (e.g., a partially updated network) and, if any, a set of NETs that did not yet commit and can be passed back to the operator or the planning application. The transaction execution is detailed in the next section.

C. DATA MANAGEMENT MODEL AND TRANSACTION EXECUTION

Figure 4 shows the generic data management model, which can easily be mapped to existing systems. The data management model and architecture are based on the X/Open DTP reference model [6] for transaction systems. A master-replica role mapping is chosen where the EMS is the master and the NEs are replicas with an appropriate degree of autonomy, acting on the basis of a tentative commit of configuration data.

[Figure 4: Generic data management model. The transaction generator derives transactions (T) and their dependencies; the Transaction Manager (TM) at the master runs a synchronization procedure over the replicas R_1..R_z, tracking each transaction of a TG in a synchronization table (transaction UUID, node UUID, state: INIT / EXEC / PRECOMM); replicas tentatively commit and send propagating transactions (PT) back to the master for the final commit; refresh transactions and local access at the replica are also shown]

The work sequence for configuration roll-out / alignment depicted in Figure 4 can be summarized as follows:

- Transactions (T) of one TG register with a synchronization procedure.
- Transactions are sent to the replicas.
- Execution / tentative commit takes place at the replicas.
- PT (Propagating Transactions): Changes made in the replicas must be propagated to the master to be finally committed. These propagating transactions are created in the replica and are sent to the master (via the TM) after they have been tentatively committed in the replica.
- Depending on the success / failure of the entire TG, changes are finally committed at the master, or the replicas are rolled back.

Modifications conducted either through the EMS or through a local terminal at the NE are introduced at the replicas only, as shown in Figure 4. The replicas then propagate the changes to the master database in the EMS. This mode of operation has been chosen because data in the replica may be changed by a source other than the master, and such changes should always be committed first (though only "tentatively") in the replica, until they are finally committed by the master. The EMS is therefore responsible for any consistency checks across the NEs and for any decisions as to which updates are to be finally committed, rolled back or postponed to a later time. Thus, the NEs remain the masters of their own data, i.e., modifications can be activated immediately at the NE, but the EMS has the final decision to roll back or keep the updates. The Transaction Manager (TM) at the master schedules the network configuration update process. It triggers its execution units (threads) to fetch a TG from their respective transaction line (TL) and to execute its NETs on the system according to the execution plan.
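The synchronization table at the master can be pictured as a small state machine per transaction. The sketch below assumes the states shown in Figure 4 (INIT, EXEC, PRECOMM) plus assumed terminal commit / rollback states; the method names and the commit rule are illustrative, not the actual interfaces.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch of the master-side TG synchronization table (names assumed).
class SyncTable {

    enum State { INIT, EXEC, PRECOMM, COMMITTED, ROLLED_BACK }

    // transaction UUID -> (replica node UUID, current state)
    private final Map<String, Map.Entry<String, State>> rows = new HashMap<>();

    void register(String txId, String nodeId) {        // T registers with the sync procedure
        rows.put(txId, Map.entry(nodeId, State.INIT));
    }

    void onSentToReplica(String txId)   { move(txId, State.EXEC); }

    void onTentativeCommit(String txId) { move(txId, State.PRECOMM); } // reported via a PT

    // Final commit at the master only if every member of the TG reached
    // PRECOMM; otherwise the master forces rollbacks on the replicas.
    boolean tryFinalCommit(Collection<String> tgMembers) {
        boolean allTentative = tgMembers.stream()
                .allMatch(id -> rows.get(id).getValue() == State.PRECOMM);
        State target = allTentative ? State.COMMITTED : State.ROLLED_BACK;
        tgMembers.forEach(id -> move(id, target));
        return allTentative;
    }

    private void move(String txId, State s) {
        rows.computeIfPresent(txId, (k, v) -> Map.entry(v.getKey(), s));
    }
}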
In the event of a failure during the execution of an individual NET, it must be decided whether the whole TG, and respectively the whole NT, is rolled back, or whether a rescheduling process is initiated that allows the problematic NET to be isolated and rescheduled for execution at a later point in time. The decision to roll back or to partially commit and reschedule is left to an operator or derived from a policy-based decision system. If it is decided not to roll back the whole NT but to partially commit it, the TG holding the failed transaction T_error must be analyzed to find the dependent transactions:

TG_rollback = { T ∈ TG : depends(T, T_error) }

The rest of the group, TG_committed = TG \ TG_rollback, will be committed. The TM spawns a new execution unit and schedules TG_rollback onto its TL for postponed execution. This isolation and rescheduling process is executed on every TG that is scheduled in the same TL as TG_rollback (i.e., all TGs that include NETs depending on T_error). Naturally, the isolation process, which may include the temporary revocation of dependencies, must comply with all consistency claims and therefore requires policy support.
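The rollback-set rule above translates almost literally into code. In the following sketch, the depends() predicate is a stand-in for the dependency knowledge provided by the policy system, and the types are reused from the earlier sketches.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Near-literal transcription of the partial-commit rule (a sketch):
// TG_rollback = { T in TG : depends(T, T_error) },  TG_committed = TG \ TG_rollback
class Rescheduler {

    static List<NeTransaction> isolateForReschedule(
            List<NeTransaction> tg,
            NeTransaction tError,
            BiPredicate<NeTransaction, NeTransaction> depends) {

        List<NeTransaction> rollback = new ArrayList<>();
        for (NeTransaction t : tg)
            if (t == tError || depends.test(t, tError))
                rollback.add(t);       // T_error and all transactions depending on it

        tg.removeAll(rollback);        // the remainder (TG_committed) commits now
        return rollback;               // scheduled onto a new TL for later execution
    }
}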

From an abstract point of view, the revocation of a dependency is backed by spawning an execution branch that develops independently of the parallel execution. If the branch leads to success, it can be merged with the global view; otherwise the branch is discarded and the global view remains unaffected. To demonstrate this procedure, a simple example is considered (see Figure 5): Given is a network update that contains just two transaction groups (TG 1 and TG 2). Amongst others, TG 1 contains a NET (NET 1) that resets a parameter from a finite value set S. Further, it is assumed that at any time parameters of arbitrary elements must not have the same value, i.e., there exists an implicit dependency between all NETs modifying parameters with values from that set.

[Figure 5: Branching in the presence of failed transactions; after NET_f fails, TG 2 continues on a spawned branch while TG 1 stays on the official branch; the branches are merged when both groups commit (TG 1 commit, TG 2 commit, NT commit)]

As it turns out, TG 2 has a similar NET (NET 2) that aims to set a parameter of another element to exactly the value currently held by the element to be modified by NET 1. During the execution of TG 1, one NET (NET_f) fails. It happens that NET_f and NET 1 have a strong dependency that cannot be revoked. For the sake of simplicity, it is assumed that the implicit dependency between NET 1 and NET 2 is the only dependency that exists between TG 1 and TG 2. A rescheduling policy might then propose to revoke that dependency, thus causing a separate execution branch to spawn. The branch contains a modified global state that marks points of conflict blindly, i.e., the parameter in question is assigned a value v ∈ S. While TG 1 is rescheduled to a separate TL, still utilizing the as yet consistent global state, TG 2 adopts the newly spawned branch with a conflict-free state. When TG 2 commits, the changes become visible and the network state becomes inconsistent. The network remains in this state until TG 1 can be committed. As soon as this happens, both execution branches are merged. If, on the contrary, the merge of the branches fails, the official branch (in this example the branch of TG 1) will overwrite the spawned branch and all changes of TG 2 are compensated. TG 2 will then, however, be rescheduled and might subsequently commit successfully on the official branch. In any case, consistency is restored either when the branches are merged or when both groups commit on the official branch. While the TM provides basic interfaces for execution branching and merging, it is the responsibility of the respective rescheduling policy to maintain global execution consistency. A general design concept of the TM is to keep the core functionality as lightweight as necessary and rather provide extension points for advanced features.
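The branching semantics can be illustrated on a single recoverable value: a spawned branch develops independently and is either merged into the official view or discarded without side effects. The following sketch captures only this spawn / merge / discard idea and is not the system's actual state implementation.

// Sketch of the spawn / merge / discard idea (names assumed).
class RecoverableValue<T> {
    private T official;          // state visible on the official branch
    private T branch;            // tentative state of a spawned branch
    private boolean branched = false;

    RecoverableValue(T initial) { this.official = initial; }

    void spawnBranch()    { branch = official; branched = true; }
    void setOnBranch(T v) { if (branched) branch = v; }
    T read()              { return branched ? branch : official; }

    void mergeBranch()    { if (branched) { official = branch; branched = false; } } // branch succeeded
    void discardBranch()  { branched = false; }  // official view remains unaffected
}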
D. PROOF-OF-CONCEPT IMPLEMENTATION AND EVALUATION

To show the capabilities of our novel approach, an implementation was successfully realized based on standard Java technologies. The database connection is provided through a JDBC driver, allowing any JDBC-compatible database to be employed. In addition to the implementation of the core concepts, a demonstrator has been developed that visualizes the internal methods of operation of the transaction processing and, furthermore, demonstrates the cell adjacency management use case.

The whole software package realizes all key concepts described in this paper and includes the following key building blocks:

- transaction manager, including support for failure handling and multiple policies
- transaction generator with a focus on optimization (parallelization and execution planning)
- high-level abstraction towards the database systems
- statistics and monitoring utilities
- console for operator interaction
- visualization of the transaction execution
- visualization of the cell adjacency management use case

The class diagram in Figure 6 shows a small excerpt of the transactionmanager package, containing the classes that provide the infrastructural support for transaction management. The singleton TransactionManager class constitutes the base implementation of the transaction manager. The class allocates the needed execution units and acts as the central event manager. Furthermore, it initializes the execution state by downloading a snapshot from the master database and converting the objects to transactional entities in the transactional state. Transactions are accepted through the add() method.

Although an arbitrary number of transactions can be added to the TransactionManager, it is always assured that only one top-level transaction (NT, network transaction) is executed at a time. In addition to the transaction itself, an execution plan generated by the optimizer can be set. Depending on the ExecutionPlan, ExecutionUnits are added to or removed from the manager; an ExecutionUnit object directly implements the concept described in Section C.

To retrieve information on the ongoing execution process, interested objects can register a listener with the TransactionManager. The respective events inform about the lifecycle of the different transaction types as well as about internal information, e.g., execution unit activity. Transaction lifecycle information is passed through the ExecutionReport object, which holds information on the transaction, its executing TL, its state (executing, failed, done) and, in the case of an error, information on the cause of the error. In the event of an error, the TransactionManager requests the Rescheduler to extract an applicable policy that analyzes the group in question and creates new groups. With the result of the Rescheduler, the TransactionManager spawns one or more new ExecutionUnits and starts their operation. In order to use resources efficiently, ExecutionUnits deactivate themselves once they reach an idle state.

To support the development of a database-independent transactional state, the execution package provides a transactional and a recoverable object class. These classes feature methods to set and retrieve values with the option to restore a previous state. Once a transaction enters its execution method, a separate branch for that transaction is allocated in the associated state instance (e.g., the global TransactionManager state or the unit state). On commit, an attempt is first made to merge the transaction branch with the official branch, whereby the official branch is the branch being synchronized with the persistent data store. If the merge completes, the official branch is pushed to the respective databases via the database abstractions.

Especially important are the adapter classes for the different policies: AbstractExecutionPolicy, AbstractParallelizerPolicy and AbstractReschedulingPolicy. New policies inherit from these abstract classes and can thus introduce specific domain knowledge for particular use cases.

[Figure 6: Class overview (excerpt); the singleton TransactionManager with Rescheduler, PolicyServer, SchemaServer, the policy adapter classes (AbstractExecutionPolicy, AbstractParallelizerPolicy, AbstractReschedulingPolicy), ExecutionPlan, ExecutionUnit, TransactionLine, ExecutionReport, the transaction classes (Network, NE, Group) and the StateAware interface]
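To illustrate the extension point, the following sketch shows how a domain policy might be derived from such an adapter class. Since the actual method signatures of AbstractReschedulingPolicy are not given here, a hypothetical stand-in with an assumed hook method is used.

import java.util.List;

// Hedged sketch of a domain policy plugging into the adapter-class scheme;
// the hook method and Decision enum are illustrative assumptions.
abstract class ReschedulingPolicyAdapter {
    enum Decision { ROLLBACK_GROUP, RESCHEDULE_FAILED, ESCALATE_TO_OPERATOR }
    // decide what to do with a transaction group after one of its NETs failed
    abstract Decision onFailure(List<NeTransaction> group, NeTransaction failed);
}

// Example: defer a failed NET (e.g., one for a currently unreachable base
// station) instead of rolling back the whole group, mirroring the deferral
// rescheduling policy described in Section E.
class DeferOnFailurePolicy extends ReschedulingPolicyAdapter {
    @Override
    Decision onFailure(List<NeTransaction> group, NeTransaction failed) {
        return Decision.RESCHEDULE_FAILED;
    }
}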
An extensive evaluation of the implemented system has been carried out [7] with regard to the following characteristics:

- Scalability to large numbers of replicas
- Resilience to connectivity interruptions
- Performance (network bandwidth consumption and transaction execution delays)
- Parallelization gains

As an example, Figure 7 shows the relative parallelization gain.

[Figure 7: Relative parallelization gain (time gain in per cent over the number of transaction lines, 2 to 8); curves for the theoretical optimum, perfect parallelization and the cell adjacency use case]

The upper curve gives the theoretical optimum of 1 - 1/n, i.e., the achievable parallelization gain under the assumption of full symmetry of n transaction lines. Perfect parallelization is the result of driving the experimental system with an NT of eight TGs, which can always be perfectly parallelized. "Perfectly" means here that the distribution of the eight TGs over the TLs is as even as possible, e.g., for four TLs, two TGs are associated with each line. Note that when employing five, six or seven TLs the result does not change fundamentally, as at least one of the TLs still needs to accommodate two consecutive TGs. The differences between the perfect parallelization curve and the theoretical optimum curve for two, four and eight TLs show the delay impact of executing the parallelization process itself in the experimental system.

For the cell adjacency use case, it can be observed that for only two TLs the parallelization gain is quite far from the perfect case. This is due to the often unequal distribution of TGs over the two TLs (e.g., the actual dependencies may require that six consecutive TGs are associated with TL 1 whereas only two TGs are in TL 2; it is thus the first TL which causes a 50% delay increase with regard to the perfect case). For an increasing number of available TLs, the parallelization gain moves closer towards the optimum, as the availability of more TLs can increasingly often be exploited by the actual distribution of TGs in the cell adjacency example.

E. POLICIES

To demonstrate the correct working of the policy mechanisms, one parallelization policy, one execution planning policy and several rescheduling policies have been implemented in our experimental environment. The parallelization policy parallelizes by first checking which transactions of a certain TG work on the same data sets, i.e., the same NE, and accordingly sorts the TGs into TLs. It is then checked whether cross-dependencies exist between different NEs; if there are, the two TLs are merged into one. The execution planning policy identifies the TG with the most critical transactions and schedules it to be executed first. The rescheduling policies include a policy which defers the execution of a failed transaction until the transaction can finally be committed. In another policy, the failed transaction is forwarded to the operator console and the execution is halted until the operator decides on the actions to be taken. Other, combined policies allow expressing full operator logic behind a decision to finish a distributed transaction (e.g., "commit at the master if 90% of the individual transactions of the TG have been committed at the replicas"). Thus, the decision process at the EMS can be automated, since the TM, as the controller component, is autonomously able to control and correlate the status of all individual transactions.
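As an illustration of such a combined policy, the following sketch encodes the quoted 90% rule, reusing the state enum from the synchronization-table sketch above; the interface is an assumption, since the actual policy API is not shown in this paper.

import java.util.Collection;

// Sketch of a combined operator-logic policy encoding the 90% rule.
class QuorumCommitPolicy {
    private final double quorum;                  // e.g., 0.9 for the 90% rule

    QuorumCommitPolicy(double quorum) { this.quorum = quorum; }

    // states of all individual transactions of one TG, as correlated by the TM
    boolean shouldCommitAtMaster(Collection<SyncTable.State> memberStates) {
        long tentative = memberStates.stream()
                .filter(s -> s == SyncTable.State.PRECOMM)
                .count();
        return tentative >= quorum * memberStates.size();
    }
}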
F. INTEGRATION

Figure 8 shows how the described data management subsystem could be integrated into an EMS. The transaction compiler / manager is driven by configuration delta scripts emitted by a configuration preparation tool. The TM interfaces on one side with the master DB at the EMS as well as with the human operator, and on the other side with the replica DBs at the NEs. The TM can employ any transaction-oriented protocol for communicating and transporting state between the master and the replicas. Applications can retrieve configuration state both from the master DB ("committed read") and directly from the NE via the legacy configuration management protocols ("dirty read").

[Figure 8: Integration into the element management architecture; network planning and a configuration preparation tool emit CLI scripts to the transaction compiler / manager in the element manager; the TM connects the master DB (recent view RV, planned view PV) via protocol adapters and a transaction-based protocol (T, PT) to the replica DBs in the network elements; applications access configuration data via the management protocol with GET ("committed read" from the master, "dirty read" from the NE agent) and SET operations]

G. CONCLUSIONS

A novel concept to enable distributed adaptive transactions for configuration management with an arbitrary desired level of control has been presented. Figure 9 shows a mapping of the major building blocks to the MAPE (Monitor-Analyze-Plan-Execute) model of autonomic computing.

By employing transaction groups covering several NEs, dependencies between NEs can be expressed in the plan phase, and the execution of configuration changes along these dependencies can be controlled ("distributed" transactions). The use of transactional semantics between an individual NE and the EMS ensures that the individual NE and the EMS always converge to a consistent state. While the NE can autonomously update and activate configuration data ("actuator", cf. Figure 9), the EMS retains the decision power to roll back any NE configuration. Moreover, while the EMS can roll out new configuration data, it is assured that data at the NE is never overwritten without a notification of the conflict to both the NE and the EMS ("sensor" function of the NE).

[Figure 9: Mapping to the MAPE model; monitor: the TG synchronization table and the master DB observe propagating transactions (PT) from the NE "sensor"; analyze: policy-based rescheduling (OK if the recent view equals the planned view, otherwise reschedule or escalate); plan: policy-based transaction generation (parallelization / execution plan) fed by network planning; execute: transactions (T) towards the replica DB "actuator"; knowledge: the policy repository]

The distributed transaction mechanism can be made adaptive in the sense that the original expression of dependencies may be revised during the execution of the distributed transaction. Depending on the overall state of the distributed transaction ("monitor", cf. Figure 9), it can be rolled back, or individual NE transactions can be committed at an operator-defined time to finish the configuration process. This considerably improves on an all-or-nothing approach (two-phase commit). Given that a significant probability of connectivity interruptions exists, two-phase commit becomes unsuitable with increasing size of a transaction group, due to the incurred overhead in network traffic and processing for rollbacks.

The capability to control the network configuration at the NE-group rather than the individual-NE level in the described way thus provides the basis for reaching a higher degree of automation. On the one hand, this directly relieves the human operator of the treatment of simple, recurring problems in the roll-out of network configurations. On the other hand, by correlating the states of transactions ("analyze", cf. Figure 9), more intelligent decisions in the configuration process become possible. Furthermore, it is feasible to include other available state information, such as alarms, in the decision process. With its policy-based, network-wide configuration management, a higher-level interface to the operator is provided, ultimately relieving him of low-level per-NE configuration tasks; at the level of the operator, a group-wide rollback may thus appear only as a single operation (depending on the sophistication of the policies and thus the number of necessary escalations to a human operator).

Thus, an automation level of "predictive" (cf. Figure 1), with some elements of the "adaptive" level, can be reached. However, this can only be achieved by encapsulating today's human operator knowledge in the policies which drive the transaction generator and manager. The process of how these policies are developed, and how complex (how intelligent) these policies ultimately can (or should) be, is clearly an area of future work. In principle, any degree between complete automation and purely manual operator control seems possible. We assume policy systems will evolve from atomic policies towards an integrated policy development system, consisting of an integrated programming environment for developing very complex policies in a high-level language. Nevertheless, to resolve these issues, operational experience from a large real system implementing the described transaction concept is clearly required.

REFERENCES

[1] A. Ganek, T. Corbi, "The dawning of the autonomic computing era", IBM Systems Journal, Vol. 42(1), 2003.
[2] O. Lazaro, J. Irvine, D. Girma, J. Dunlop, A. Liotta, N. Borselius, C. Mitchell, "Management System Requirements for Wireless Systems Beyond 3G", Proceedings IST Mobile & Wireless Communications Summit 2002, Thessaloniki, Greece, June 2002, pp. 240-244, http://www.isrc.rhul.ac.uk/nb/publications/ist2002.pdf
[3] Wireless World Research Forum (WWRF), "Book of Visions 2001", Version 1.0, 2001, http://www.wireless-world-research.org/
[4] J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
[5] H. Sanneck, J. Sokol, C. Gerdes, C. Schmelz, A. Southall, C. Kleegrewe, "Mobile network self-configuration based on distributed adaptive transactions", Praxis der Informationsverarbeitung und Kommunikation (PIK), 28(4):217-222, October 2005.
[6] The Open Group, "X/Open Distributed Transaction Processing: Reference Model", Version 3, 1996, www.opengroup.org
[7] A. Däubler, "Performance Evaluation of a Network-Wide Transaction Subsystem for Configuration Data Management", Master's thesis, University of Augsburg, October 2005.