
Unisys
ClearPath OS 2200
Integrated Recovery Conceptual Overview
ClearPath OS 2200 Release 12.1
June 2010
7830 8186-004

NO WARRANTIES OF ANY NATURE ARE EXTENDED BY THIS DOCUMENT. Any product or related information described herein is only furnished pursuant and subject to the terms and conditions of a duly executed agreement to purchase or lease equipment or to license software. The only warranties made by Unisys, if any, with respect to the products described in this document are set forth in such agreement. Unisys cannot accept any financial or other responsibility that may be the result of your use of the information in this document or software material, including direct, special, or consequential damages. You should be very careful to ensure that the use of this information and/or software material complies with the laws, rules, and regulations of the jurisdictions with respect to which it is used. The information contained herein is subject to change without notice. Revisions may be issued to advise of such changes and/or additions. Notice to U.S. Government End Users: This is commercial computer software or hardware documentation developed at private expense. Use, reproduction, or disclosure by the Government is subject to the terms of Unisys standard commercial license for the products, and where applicable, the restricted/limited rights provisions of the contract data rights clauses. Unisys and ClearPath are registered trademarks of Unisys Corporation in the United States and other countries. All other brands and products referenced in this document are acknowledged to be the trademarks or registered trademarks of their respective holders.

Contents

Section 1. Introduction
  Description
  Components
  Benefits
  Implementing Integrated Recovery
  Component and System Recovery
  Database Recovery
  Message Recovery
  Distributed Transactions
  Integrated Recovery Example

Section 2. Attributes and Application Groups
  Attributes for Integrated Recovery
  Application Groups
  Local Application Group
  Switchable Application Group
  Concurrent Application Group

Section 3. Components
  Logical Components of Integrated Recovery
  Products in Integrated Recovery
  Universal Data System (UDS)
  Exec Components
  Step Control
  Audit Control
  TIP File Control
  TIP Scheduling
  TIP Session Control (TSC)
  TIP File Security (TFS)
  Message Control Bank (MCB)
  Integrated Recovery Utility (IRU)
  Database Environment Examples

Section 4. Implementing Integrated Recovery
  Developing a Recovery Strategy
  Site Environment Characteristics
  Recovery and Availability Requirements
  Capacity or Processing Requirements
  Configuring Integrated Recovery Components

Section 5. How Integrated Recovery Works
  Integrated Recovery Process
  Steps and Step-IDs
  Database Recovery
  Message Recovery
  Two-Phase Commit Process
  Integrated Recovery Options
  Short Recovery
  Medium Recovery
  Long Recovery

Figures

1-1. Integrated Recovery in the Business Environment
2-1. Local Application Groups
2-2. Switchable Application Group
2-3. Concurrent Application Group
3-1. The Integrated Recovery Environment
3-2. Integrated Recovery Components and Files


Section 1
Introduction

Integrated Recovery is a combination of OS 2200 software products, features, and concepts that support database processing while providing protection from possible hardware and software failures. Integrated Recovery enables you to expedite the recovery process, thereby avoiding unnecessary downtime, ensuring data integrity, and maintaining system stability.

Documentation Updates

This document contains all the information that was available at the time of publication. Changes identified after release of this document are included in problem list entry (PLE) 18738708. To obtain a copy of the PLE, contact your Unisys representative or access the current PLE from the Unisys Product Support Web site:

http://www.support.unisys.com/all/ple/18738708

Note: If you are not logged into the Product Support site, you will be asked to do so.

About the Integrated Recovery Library

This document provides an overview of how Integrated Recovery works and describes its components, attributes, and processing considerations. For details of implementing Integrated Recovery, refer to the Integrated Recovery Reference and Administration Guide. This guide contains information that pertains to all environments.

For additional details of implementing Integrated Recovery in multihost environments, refer to the following documents:
• Integrated Recovery Reference and Administration Guide for Multihost Environments
• Partitioned Applications Conceptual Overview
• Partitioned Applications Planning, Installation, and Operations Guide

Description

Integrated Recovery is a set of system-supplied capabilities that coordinate database updates, transaction scheduling, and messages in user applications. The purpose of Integrated Recovery components and concepts is to safeguard the data that is used by applications to support the unique business needs of your organization, as shown in Figure 1-1.

Integrated Recovery
• Guarantees a consistent state within databases
• Ensures that updates to data are coordinated and consistent with received and generated message traffic

Figure 1-1. Integrated Recovery in the Business Environment

Integrated Recovery includes methods that support the following system activities:
• Recovery from system or hardware failures
• Maintenance of the database and messages
• Monitoring of transaction processing and scheduling

Integrated Recovery is available only for transactions and other batch and demand programs that are associated with an application group, a concept that provides the basis for Integrated Recovery. An application group is a software partition that includes its own database, audit trail, system files, and other support components. Refer to Section 2, Attributes and Application Groups, for more information about application groups.

These concepts are discussed throughout this document in relation to the components of Integrated Recovery.

Components

The following components are central to Integrated Recovery:
• Transaction processing
  Transactions are requests for data access or update that emphasize speed, integrity, all-or-nothing execution, and independence of one transaction from others.
• Database management
  A database is a collection of data and a mechanism for organizing, relating, and accessing specific data items. Database management is a method for defining and administering a database.
• Message handling
  Messages are input requests for processing and responses that contain the results. A response can be an output message to the caller or an input message to another transaction for additional processing.
• Scheduling
  Scheduling is the response to a request for a transaction execution or other processing in which program execution is initiated or delayed according to timing limitations, system operating rules, and other limitations.
• Logging (auditing)
  Logs are written records of activity and other information for tracking, history, and recovery purposes. Logging is the process of writing specific information to logs. In Integrated Recovery, this process is known as auditing and logs are audit trails.

Benefits

Integrated Recovery offers the following benefits for your environment:
• Enables and coordinates the use of multiple database formats with a single transaction or program
• Ensures consistency between database updates and message processing
• Synchronizes database recovery with message recovery
• Provides protection from possible hardware and software failures
• Preserves OS 2200 processing attributes (refer to Section 2, Attributes and Application Groups, for descriptions of these attributes):
  - Recoverability
  - Scalability
  - ACID database properties
  - Resiliency
  - Data isolation

  - Security
  - Ease of administration

Implementing Integrated Recovery

The concept of application groups is the coordinating factor in configuring and implementing Integrated Recovery across all possible components. When implementing Integrated Recovery for your site, you need to coordinate
• Configuration of Integrated Recovery components
• Application programs
• System administration procedures

Component and System Recovery

System software components, like the OS 2200 Exec, Transaction Processing (TIP), Universal Data System (UDS), and Message Control Bank (MCB), have their own procedures that can handle most failures. If these local procedures cannot recover from a failure, you can use the Integrated Recovery Utility (IRU) to perform other recovery procedures.

If a system failure occurs, the Exec reboots the system. The standard procedures recover any lost data automatically without requesting user input. If you want to control the recovery procedure, you can configure the Integrated Recovery components so that user input controls the recovery.

Component and system recovery activities always include the appropriate database and message recovery concepts that are described in the following topics.

Database Recovery

Database recovery refers to restoring data to a consistent state and ensuring database file integrity, particularly when system or program failures occur.

Depending on the environment that your site establishes and the failure that occurs, recovering data can mean one of the following:
• Permanently applying all pending updates to the database
• Reversing the effects of all unfinished database updates, rolling back programs to a stable and consistent point, and, optionally, restarting the programs
• Restoring a database by reloading data from a known point and possibly reapplying updates

Database recovery protects data against the following possible problems:
• Mass storage failures that could cause permanent loss of part or all of the database

• Host system failures or unplanned system stops that could cause database inconsistencies
• Database software aborts
• Abnormal failures in the user program that could result in inconsistent data

Message Recovery

Message recovery is the process of recovering input messages, output messages, and program-to-program pass-off messages that were not completely processed by the communications network or user programs.

When a transaction program specifies that the messages associated with the program should be recoverable and the transaction program terminates normally, the system forwards the output message to a workstation and discards the corresponding input message. If the transaction program does not terminate normally, the system retains the input message to reschedule the program.

Integrated Recovery components enable sites to synchronize database updates and recovery with message updates and recovery.

Distributed Transactions

Distributed transactions, also known as global transactions, require multiple applications to accomplish a task. Application processing is distributed among multiple program modules, which can execute on different hardware platforms and update different databases.

When multiple applications work together in a distributed transaction, the resulting updates to the databases must be either all committed or all rolled back. This process is known as two-phase commit. Refer to Section 5, How Integrated Recovery Works, for more information.

The ACID database properties, particularly atomicity, are crucial to distributed transactions. Refer to Section 2, Attributes and Application Groups, for descriptions of these properties.

Integrated Recovery Example

The following scenario from the banking industry is an example of the importance of synchronizing database recovery with message recovery.

Assume that a customer uses an online interface to transfer $100 from a savings account to a checking account. The database system successfully subtracts $100 from the savings account, adds $100 to the checking account, and queues a notification message to the customer. However, a system failure occurs before the customer receives the message that the accounts were updated. After the system failure is resolved, Integrated Recovery ensures that the system recovers the queued message and delivers it to the customer, so that the customer knows that the transfer is complete.

If the database system does not use Integrated Recovery, the customer does not receive the message stating that the transfer occurred and, therefore, might re-enter the transfer request. If the customer re-enters the request, $200 is transferred instead of $100, but the customer receives a notification message about the second transfer only.

Therefore, a system that is configured and programmed to use Integrated Recovery ensures consistency between database and message update processing and protects users against the effects of hardware and software failures.
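The banking scenario can be illustrated with a minimal Python sketch. It is a toy model only and does not use any OS 2200, TIP, UDS, or MCB interface; the Bank class, its outbox and delivered lists, and the method names are hypothetical names introduced for illustration. The sketch assumes that the two account updates and the queued notification become permanent together, so that a message queued before a failure can still be delivered afterward.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """Toy stand-in for a database plus a recoverable message queue (hypothetical)."""
    accounts: dict
    outbox: list = field(default_factory=list)     # committed, undelivered messages
    delivered: list = field(default_factory=list)  # messages the customer has seen

    def transfer(self, src, dst, amount):
        """Apply both account updates and queue the notification as one unit."""
        pending = dict(self.accounts)              # work on a copy until commit
        if pending[src] < amount:
            raise ValueError("insufficient funds")  # nothing has been applied
        pending[src] -= amount
        pending[dst] += amount
        # Commit point: the updates and the message become permanent together.
        self.accounts = pending
        self.outbox.append(f"Transferred {amount} from {src} to {dst}")

    def deliver(self):
        """Send every queued message; also used to redeliver after a failure."""
        while self.outbox:
            self.delivered.append(self.outbox.pop(0))


bank = Bank({"savings": 500, "checking": 200})
bank.transfer("savings", "checking", 100)
# Simulated system failure here: the message is queued but not yet delivered.
bank.deliver()  # recovery: the queued message survived and is now delivered
```

Because the message is retained until it is delivered, the customer in this sketch learns that the transfer completed and has no reason to re-enter it.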

Section 2
Attributes and Application Groups

Integrated Recovery provides many options to tailor systems to the needs of different applications. When configuring Integrated Recovery for your environment, you have to make choices like the following:
• Must the system be always available or are there periods of little activity, such as overnight?
• Does the processing load consist of tens, hundreds, or thousands of updates or accesses per second?
• Is your data public, private, or secret?
• Is minor data loss acceptable when system problems occur (that is, you choose not to recover all the data), or must you retain absolute integrity with no data loss?

Every environment is different, but you can configure Integrated Recovery to support a wide range of needs and balance the competing choices to suit your business.

Attributes for Integrated Recovery

Integrated Recovery provides the following attributes. You need to consider the benefits of these attributes and decide which best suit your needs.

Recoverability
Recovers from any single failure (software or hardware). Recovery can be to a certain time (such as program logic corruption) or event (such as a system crash or disk failure). Recoverability can be defined at various levels, including
• Application group
• Entire database or selected database files
• Messages or selected messages

Scalability
Concerns transaction rates, transaction volume, and database size, including
• Very large databases, potentially scaling to ever larger sizes
• Large number of concurrent database users
• Applications with specific response time needs
• Multihost configurations that access a single database simultaneously for load sharing and capacity

ACID database properties
Coordinate threads using a step-id to ensure the following properties:
• Atomicity
  All changes that a transaction makes to a database are made permanent, or all are nullified.
• Consistency
  A successful transaction transforms a database from a previous valid state to a new valid state.
• Isolation
  Changes that a transaction makes to a database are not visible to other operations until the transaction completes its work.
• Durability
  Changes that a transaction makes to a database survive future system or media failures.

Resiliency
Supports methods, such as the following, that provide multiple hosts, multiple file copies, and remote backup support:
• Extended Transaction Capacity (XTC) and concurrent application groups
• Partitioned Applications and switchable application groups
• Duplexed files
  Includes database, system files, audit files
• Duplexing methods
  Includes audit trail duplexing, TIP file duplexing, unit duplexing (host based mirroring)
• Disaster recovery

Data isolation
Uses application groups to isolate databases and processing (refer to the following discussion in this section)

Security
Supports techniques, such as the following, that ensure data security:
• Data access control
• File security
• User-ids

Ease of administration
Provides a variety of configurations and techniques for maintenance and monitoring that enable recovery, as follows:

• Recovery
  Coordinates components so that recovery from most failures can be done quickly with minimal intervention. A single IRU command coordinates recovery of all application group database files and messages and all Integrated Recovery system components, usually requiring only a minute or two.
• Maintenance
  Provides file backup, audit file backup, and file maintenance.
• Monitoring
  Provides transaction monitoring, scheduling monitoring and throttling, file status monitoring, program monitoring, and load balancing.
• Configuration specification
  Provides considerable flexibility in configuration.

Application Groups

An application group is a software partition that usually includes its own
• Application programs and data
• Database files
• Message retention files
• System files
• Audit trails
• Repository or other source for file and record descriptions
• Other support components

An application group provides a recoverable environment for logically related components, files, and user programs that is isolated from unrelated applications, processing, and data. User programs can access the database only of a particular application group. Integrated Recovery is provided only for user sessions, transactions, or programs that are affiliated with an application group.

A site performs recovery on an application group basis. Recovering an application group means restoring its components, files, messages, and programs to a consistent state.

An application group can be one of the following types:
• Local
• Switchable
• Concurrent

Local Application Group

A local application group is defined on only one system, called a host, as shown in Figure 2-1. A local application group can run on only one host and uses local recovery, which implies that the application group must be recovered by the host on which it is running. The database for a local application group typically resides on the local mass storage of the host.

Figure 2-1. Local Application Groups

Generally, you can configure an application group as local if it does not need to share files with other hosts. Even in a multihost environment, a local application group is defined on only one host and is known only to that host. Local application group files cannot be recovered or accessed by the other hosts in a multihost configuration.

Switchable Application Group

A switchable application group is a pair of identical application groups on two hosts in the Partitioned Applications environment, as shown in Figure 2-2. Both hosts can access the database for the application group, but they cannot access the database simultaneously. The switchable application group must be active on a host before that host can access the database for the application group.

Database files and other system files associated with switchable application groups reside on shared mass storage. A switchable application group has only one set of database files, one set of system files, and one set of Exec step control and audit control files.

Figure 2-2. Switchable Application Group

The Partitioned Applications environment extends the availability of OS 2200 hardware and software. With redundant hardware and software components, multiple hosts operate as one uninterruptible system. From the end user's viewpoint, the system never goes down. If a critical component fails, the Partitioned Applications system automatically recovers the component on the same host and continues processing. If one host fails, the backup host in the Partitioned Applications system automatically
1. Takes over the switchable workload from the failed host
2. Initiates IRU recovery on the backup host
3. Continues processing on the backup host

An application group whose database requires continuous availability to users is a candidate for being configured as a switchable application group.

Refer to the Partitioned Applications Conceptual Overview and the Partitioned Applications Planning, Installation, and Operations Guide for more information on using switchable application groups.

Concurrent Application Group

A concurrent application group exists on multiple hosts, as shown in Figure 2-3. The application group can be active simultaneously on all hosts. The database resides on shared mass storage for concurrent access by all users of that application group.

Figure 2-3. Concurrent Application Group

Integrated Recovery coordinates data access among user applications. Requests are queued and managed to avoid conflicting updates. While one request is processed, the data is locked. The next request for the same data waits until the first request finishes.

Concurrent application groups are supported only on systems where the Extended Transaction Capacity (XTC) feature is installed. In an XTC configuration, all locks are requested through hardware and software components that control access to the shared database by coordinating and retaining database locks. These components are known as the Extended Processing Complex-Locking (XPC-L). In contrast, for local and switchable application groups, locking occurs in internal local software memory.

Candidates for being configured as concurrent application groups include the following:
• An application group that requires the capacity of more than a one-host system.
• An application group that requires continuous access to the database. If a host fails, the application group is still accessible on the other hosts in the multihost system.

Refer to the XTC Planning, Migration, and Operations Guide for details on using concurrent application groups.
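The queuing behavior described for concurrent application groups (one request holds the lock while the next request for the same data waits) can be pictured with a small Python sketch. This is an illustration of the general idea only, not the XPC-L or UDS locking interface; the LockManager class, its methods, and the host and item names are hypothetical, and first come, first served ordering is an assumption of the sketch.

```python
from collections import deque


class LockManager:
    """Toy lock table: one holder per data item, waiters served in arrival order."""

    def __init__(self):
        self.holder = {}   # data item -> request that currently holds the lock
        self.waiting = {}  # data item -> deque of queued requests

    def request(self, item, who):
        """Grant the lock if the item is free; otherwise queue the request."""
        if item not in self.holder:
            self.holder[item] = who
            return True
        self.waiting.setdefault(item, deque()).append(who)
        return False

    def release(self, item):
        """Release the lock and pass it to the next waiter, if there is one."""
        queue = self.waiting.get(item)
        if queue:
            self.holder[item] = queue.popleft()
            return self.holder[item]
        self.holder.pop(item, None)  # no waiter: the item becomes free
        return None


locks = LockManager()
first = locks.request("account-42", "host-A")   # granted immediately
second = locks.request("account-42", "host-B")  # queued behind host-A
next_holder = locks.release("account-42")       # host-A finishes, host-B proceeds
```

In this sketch the lock table lives in one process. In an XTC configuration the equivalent coordination is performed by components shared among the hosts, as described above.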

Section 3
Components

User applications, transactions, and other programs access TIP or UDS to request data access or updates, as shown in Figure 3-1. System software coordinates all other components and actions. Integrated Recovery ensures that processing is complete and secure. The Integrated Recovery Utility (IRU) can restore the database, if necessary.

Figure 3-1. The Integrated Recovery Environment

Integrated Recovery encompasses the database recovery needs for the Exec, Transaction Processing (TIP) file control, and the Universal Data System (UDS), as well as the message recovery needs for Message Control Bank (MCB). Integrated Recovery also controls transaction scheduling and the storage and maintenance of transaction programs.

The main Exec components of Integrated Recovery are step control, audit control, TIP scheduling, and TIP file control. Integrated Recovery components that are not part of the Exec are the Integrated Recovery Utility (IRU), MCB, and UDS.

This section identifies the logical and product components in Integrated Recovery.

Logical Components of Integrated Recovery

Integrated Recovery involves numerous components in the following logical areas:
• Database management
  - Direct access, fixed record size: File control superstructure (FCSS), Shared File System (SFS 2200)
  - Indirect access, variable record size database: Freespace
  - Network database: Network Database Server (DMS)
  - Relational database: Relational Database Server (RDMS)
  The same application program can access combinations of these database types.
• Application programs, which access the database for updates and data retrieval
  - Batch and demand programs
  - Transactions
  - Distributed transactions
• System software
  - Communication software: MCB, message handling
  - Scheduling: TIP scheduling
  - Logging: Audit control
  - Commitment control: Step control
  - Recovery, maintenance, and monitoring: IRU, UDS Monitor (UDSMON), TIP Performance Monitor (TPM)
  - Security: TIP Session Control (TSC), TIP File Security (TFS)

Products in Integrated Recovery

Figure 3-2 shows the relationships of the main components in the Integrated Recovery environment.

Figure 3-2. Integrated Recovery Components and Files

where:
• Universal Data System (UDS)
  - Enterprise Network Database Server (DMS)
  - Enterprise Relational Database Server (RDMS)
  - Shared File System (SFS 2200)
  - Universal Data System Control (UDS Control), also known as UDSC
  - Repository for ClearPath OS 2200 (UREP)
• Exec
  - Step control
  - Audit control
  - Transaction Processing (TIP) file control
  - Freespace
  - TIP scheduling
  - TIP Session Control (TSC)

  - TIP File Security (TFS)
• Message Control Bank (MCB)
• Integrated Recovery Utility (IRU)

Refer to the documentation for individual software products for more information.

Minimum Components

The following minimum components are required for using Integrated Recovery:
• Either a TIP or UDS environment
• Exec step control
• Exec audit control
• Integrated Recovery Utility (IRU)

Universal Data System (UDS)

The Unisys expandable, modular suite of software products for data management, data processing, and database application development. The UDS suite provides an integrated environment for control, maintenance, and recovery of user databases in several database models.

Network Database Server (DMS)
The Unisys data management software product that conforms to the CODASYL (network) data model. DMS software enables data definition, manipulation, and maintenance in mass storage database files.

Relational Database Server (RDMS)
The Unisys data management software product that is based on the relational data model. RDMS software supports fourth-generation Structured Query Language (SQL) statements and native RDMS interfaces to create relational database structures and retrieve and manipulate relational data.

Shared File System (SFS 2200)
The Unisys data management software product that enables shared access to direct system data format (DSDF) and multi-indexed sequential access method (MSAM) data files (flat files).

Universal Data System Control (UDS Control)
The UDS online data manager that provides a common architecture and environment for the UDS product family. UDS Control software (also known as UDSC) is required for Integrated Recovery in a UDS environment.

UDS Control software
• Supports three data models (RDMS, DMS, and SFS 2200) and manages all their files through locking and queuing, database I/O control, and diagnostics
• Enables users to share files, controls access to those files, and automatically and uniformly resolves conflicts over access to the files

• Enables users to designate recoverable files, regardless of the data management method used, and provides consistent file recovery
• Supports multiple application groups, which provides for independent database environments, but also can run without interfering with other database management software on the system
• Handles main storage through its own cache manager banks, which improves overall system performance and increases data security

Repository for ClearPath OS 2200 (UREP)
The Unisys data management software that provides data dictionary functions by creating, maintaining, and reporting on data objects in a repository.

UREP software
• Maintains all data definitions for UDS products
• Provides commands for defining and reporting on RDMS, DMS, and SFS 2200 databases, as well as your corporate information resources
• Enables dynamic system configuration

Exec Components

The Exec Integrated Recovery components (step control, audit control, and Transaction Processing (TIP) components) work together during normal production and recovery to ensure data consistency.

Step Control

Controls and tracks program execution. User steps mark the beginning and end of each recoverable unit during program execution. Step control enables recovery to occur on a step-by-step basis. Step control maintains the state of each step and, optionally, can record information on the audit trail, making it possible to recover database updates and transaction messages if a system fails.

Audit Control

Focuses on data integrity at the application group level (as opposed to the run unit level) by recording events for the application group. Audit control maintains several types of audit trails, but step control audit trails that are associated with application groups are the only type that is pertinent to Integrated Recovery.

Step control audit trails contain
• Step control step changes
• Message and database updates
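The relationship between steps and the audit trail can be pictured with the following Python sketch. It is a conceptual toy, not the Exec step control or audit control interface: the StepControl class, its method names, the record layout, and the use of saved before-images for rollback are all assumptions made for illustration.

```python
class StepControl:
    """Toy model: each step is a recoverable unit recorded on an audit trail."""

    def __init__(self, database):
        self.database = database
        self.audit_trail = []    # ordered records of step changes and updates
        self.before_images = {}  # original values, kept until the step ends
        self.step_id = None

    def begin_step(self, step_id):
        self.step_id = step_id
        self.before_images = {}
        self.audit_trail.append((step_id, "BEGIN"))

    def update(self, key, value):
        # Save the original value once, then audit and apply the update.
        # (A missing key is stored as None, so None is not usable as data here.)
        self.before_images.setdefault(key, self.database.get(key))
        self.audit_trail.append((self.step_id, "UPDATE", key, value))
        self.database[key] = value

    def end_step(self):
        self.audit_trail.append((self.step_id, "END"))
        self.before_images = {}

    def rollback(self):
        """Reverse every update made since the current step began."""
        for key, old in self.before_images.items():
            if old is None:
                self.database.pop(key, None)
            else:
                self.database[key] = old
        self.audit_trail.append((self.step_id, "ROLLBACK"))
        self.before_images = {}


db = {"seat-12A": "free"}
steps = StepControl(db)
steps.begin_step(1)
steps.update("seat-12A", "booked")
steps.end_step()   # step 1 is complete and stays applied
steps.begin_step(2)
steps.update("seat-12A", "cancelled")
steps.rollback()   # step 2 failed, so its update is reversed
```

After the rollback, the database in this sketch reflects only the completed step, which is the step-by-step recovery behavior that the text attributes to step control.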

TIP File Control

The database management component that is supplied by the Exec. TIP file control provides the following types of database management:
• TIP file control superstructure (FCSS)
  A direct access, fixed record size database, also known as a flat file database
• Freespace
  An indirect (key) access, variable record size database

TIP file control provides recovery, locking, queuing, database I/O, and diagnostics for FCSS and Freespace databases. The FREIPS processor provides maintenance, file and record definition, and reporting capabilities for these databases.

TIP Scheduling

A modular extension of the OS 2200 Exec that provides a high-performance transaction monitor and scheduling system. Precompiled and prelinked user transaction programs are stored in program libraries maintained by the SUPUR and TPUR processors. These programs can be written in a variety of languages and can access one or several types of databases.

TIP scheduling uses user-provided scheduling information to schedule executions of the transaction programs. You can configure the priority, number of copies, and other variables of transaction programs. Various interfaces are supplied for monitoring and dynamically controlling the TIP scheduling.

TIP Session Control (TSC)

An optional security feature that controls access to the system on a caller or application basis.

TIP File Security (TFS)

An optional security feature that controls access to the database files based on the security privileges of the caller.

Message Control Bank (MCB)

Provides transaction message handling. MCB is a common bank that handles staging, auditing, queuing, recovery, and recall of input, output, pass-off, and checkpoint messages. MCB is the primary communications software product for Integrated Recovery.

MCB consists of a background run (batch program) and various common banks that
• Interface with communication programs, transaction programs, and IRU to provide transaction scheduling and message recovery
• Interface with Exec step control to coordinate message recovery and transaction scheduling information
• Maintain a separate MCB database for recoverable transaction message text

Note: Other communications software products that support Integrated Recovery include
• Communications Interface for Transaction Applications (CITA)
• Communications Platform (CPComm)
• Open Distributed Transaction Processing (Open DTP)
• System Interface for Legacy Applications (SILAS)
Refer to the documentation for these products for more information.

Integrated Recovery Utility (IRU)

Provides both preventive maintenance and recovery capabilities. The Integrated Recovery Utility (IRU) is a stand-alone, command-driven processor that
• Restores UDS files
• Restores TIP FCSS and Freespace files
• Helps recover Exec step control files
• Helps recover MCB message retention files
• Backs up database files and audit trails
• Enables you to monitor the condition of database and audit files
• Performs file maintenance tasks, such as allocation and copying

Database Environment Examples

The database environment can include any combination of data models, scheduling options, and language types. The environment can be as simple as a single FCSS file that is accessed by a COBOL program, or as complex as a multihost system with several file types and programs that use different languages to access the database, as in the following examples:

Example 1
An airline has a reservations database in FCSS and Freespace files with transaction programs accessing it using C and COBOL. The airline also has a cargo database in UDS DMS files with transaction programs accessing it using COBOL Data Manipulation Language (CDML).

Example 2
A bank uses UDS RDMS files with transaction programs accessing them using CDML. This system uses MCB for message recovery.

Example 3
An emergency support system (911) uses UDS DMS files with transaction programs accessing them using COBOL. This system uses a multihost Extended Transaction Capacity (XTC) system to provide access to the database in case of a host failure.

Example 4
A catalog order business uses UDS RDMS files with transaction programs accessing them using C. This system uses a multihost Extended Transaction Capacity (XTC) system to provide additional capacity to access the database.

Section 4. Implementing Integrated Recovery

Integrated Recovery is implemented through system configuration, application programs, and system administration procedures.

Developing a Recovery Strategy

You need to determine an overall recovery strategy for your environment that supports your business needs. All components in Integrated Recovery have default values and behaviors, but you should review these defaults and decide if they provide the best strategy for your environment.

To determine the values and behaviors that you need, consider the following factors:

- Characteristics of your site's environment
- Recovery and availability requirements
- Capacity or processing requirements

The following paragraphs explore the types of questions you need to ask about each factor when designing your recovery strategy.

Site Environment Characteristics

To evaluate your site's characteristics, you need to consider your processing environment, types of databases, and types of interfaces, as posed by the following questions:

- What environment is your site running?
  - Online transaction processing environment
  - Batch/demand environment
  - A combination of these processing environments
- What database are you using?
  - File control superstructure (FCSS) or a Freespace database
  - UDS DMS, RDMS, or SFS 2200 database
  - Any combination of these databases
- What interface are you using to access the database?
  - One of the UCS languages (such as UCS C or UCS COBOL)

  - ASCII COBOL (ACOB), FORTRAN (FTN), or another supported language
  - Java transactions or one of the Java resource adapters (such as DMS RA or RDMS JDBC)
  - Any combination of these interfaces
- Is your system running in a distributed transaction processing (Open DTP) environment? If so, do you need network access to different types of machines?

Recovery and Availability Requirements

To evaluate your processing availability and recovery requirements, you need to consider your tolerance for loss of data and system downtime, as posed by the following questions. Compare your answers to the configuration of Integrated Recovery components to ensure that behavior during processing and recovery suits your needs.

- Is the integrity of input and output messages important? Does your site need transaction message recovery?
  MCB supports various levels of message recovery.
- Is the database used for read access only? Or is the database both read and updated?
  You can configure application groups as non-recoverable for read-only databases.
- Is reducing the vulnerability to mass storage failures important?
  Duplexing helps reduce vulnerability to mass storage failures. Duplexing means that all write operations write data twice: to two file copies or two disks. Read operations can occur from either copy. Both TIP file duplexing and unit duplexing are available for files. Audit trail duplexing is available for audit trails.
  - TIP file duplexing
    Two mass storage file copies (legs) exist of the same TIP or UDS/TIP file data. You can configure either all TIP files or only selected TIP files to be duplexed.
  - Unit duplexing
    An associated backup disk exists for each disk. Unit duplexing is a separately packaged OS 2200 Exec feature.
  - Audit trail duplexing
    Two mass storage file copies or two tape file copies (legs) exist of the audit trail data. You can individually configure each application group's audit trail to be duplexed.
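The duplexing behavior described above (every write goes to two copies, a read can come from either copy) can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class DuplexedFile and its methods are hypothetical names invented for illustration and do not correspond to any Exec or TIP interface.

```python
class DuplexedFile:
    """Toy model of a duplexed file: two legs that receive every write.

    Hypothetical illustration only; not an OS 2200 interface.
    """

    def __init__(self):
        # Each leg maps a record number to its contents.
        self.legs = [{}, {}]
        # A leg is marked unavailable after a simulated storage failure.
        self.available = [True, True]

    def write(self, record_number, data):
        # Every write is applied to each leg that is still available.
        written = 0
        for leg, ok in zip(self.legs, self.available):
            if ok:
                leg[record_number] = data
                written += 1
        if written == 0:
            raise IOError("no leg available for writing")
        return written

    def read(self, record_number):
        # A read can be satisfied from either leg; use the first available.
        for leg, ok in zip(self.legs, self.available):
            if ok:
                return leg[record_number]
        raise IOError("no leg available for reading")

    def fail_leg(self, index):
        # Simulate the loss of one copy.
        self.available[index] = False


f = DuplexedFile()
f.write(1, "seat 12A reserved")
f.fail_leg(0)        # lose one copy
print(f.read(1))     # the record is still readable from the other leg
```

The point of the model is that a single storage failure does not lose data, because the surviving leg already holds every completed write.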

- Does your system meet your availability (uptime) requirements?
  If not, you can install either of the following separately packaged features. These features use redundant hardware and software components in multihost systems to extend the availability of OS 2200 systems.
  - OS 2200 Partitioned Applications feature, for automatic recovery of switchable application groups
  - Extended Transaction Capacity (XTC) feature, for availability of concurrent application groups
- Does your system need disaster recovery procedures?
  IRU provides a variety of commands and capabilities to support different styles of disaster recovery.

Capacity or Processing Requirements

To evaluate your capacity or processing requirements, you need to know your current and projected processing load.

- Does a single host provide sufficient processing power for accessing your database?
  If not, you can install the Extended Transaction Capacity (XTC) feature and define concurrent application groups across multiple hosts. This feature enables all the hosts in a multihost system to be active in the application group simultaneously and process transactions concurrently. Concurrent processing increases the number of transactions that can be active simultaneously and, therefore, increases capacity.

Configuring Integrated Recovery Components

All Integrated Recovery components have configurable portions. Most components also have default values. You should configure all components to suit your applications, transactions, processing requirements, data, and system operating requirements.

The default configuration values might not match your environment. You should evaluate your requirements and adjust the Integrated Recovery configuration to suit your environment.

The following table shows some of the main parameters you can configure for each Integrated Recovery component:

Component          Parameters to Configure
-----------------  ------------------------------------------------------------
Audit control      Number of audit trails
                   Type of audit trail (step control, log, TPM, COD)
                   Audit trail number and name
                   Audit trail media (mass storage, tape)
                   If mass storage, audit trail placement (fixed, removable)
                   Audit trail duplexing
                   Audit file size

Step control       Number of application groups
                   Application group name and number
                   Number of concurrent users
                   Recoverable or non-recoverable application group
                   Scheduling tree

TIP file control   Number of TIP files
                   Number of TIP/Exec files
                   Use of Freespace

TIP scheduling     Scheduling priority hierarchy
                   Number of transaction copies that can be used at one time
                   Application/transaction relationship

UDS                Application name and number
                   Number of concurrent threads

MCB                User message size

IRU                Number of unique file names that can appear on a command
                   Number of dump history entries to retain

Refer to the documents for each product to learn about the attributes and their possible settings.

Section 5. How Integrated Recovery Works

With Integrated Recovery software in place, the system

- Ensures consistency between database update processing and message processing
- Synchronizes database recovery with message recovery
- Protects users against the effects of hardware and software failures

Integrated Recovery Process

The general process for Integrated Recovery involves the following steps:

1. The application program starts a step. The step state of active is audited.
2. The application program reads or writes the database files, as needed. Database write operations are audited, but in most cases the files themselves are not yet updated.
3. The application program requests that the database updates, if any, be applied or discarded. The step state of commit-in-progress (CIP) or rollback is audited.
4. The database is updated, or the database updates are discarded.
5. The application program completes. The step state of terminate is audited.

Note: These steps do not show messages. Refer to the Exec System Software Administration Reference Manual for the entire Integrated Recovery process, including recoverable messages.

Steps and Step-IDs

Integrated Recovery processes are based on the concept of a step. When a transaction or application program request starts, it is given a unique identifier (called a step-id) that is maintained throughout the execution of the transaction or application program request. A variety of information is associated with each step and step-id, the most important of which is the step state.
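The five-step general process listed earlier in this section can be sketched as a small Python model in which each step has a step-id, every state change is audited, and writes are audited before they are applied. The sketch is illustrative only: the names Step, ApplicationGroup, start_step, and so on are hypothetical and are not Exec, UDS, or TIP interfaces. Only the state names (active, commit-in-progress, rollback, terminate) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: int
    state: str = "active"
    # Updates that have been audited but not yet applied to the database.
    pending: dict = field(default_factory=dict)


class ApplicationGroup:
    """Hypothetical model of one application group's database and audit trail."""

    def __init__(self):
        self.database = {}
        self.audit_trail = []
        self._next_id = 1

    def start_step(self):
        # 1. Start a step; the state "active" is audited.
        step = Step(self._next_id)
        self._next_id += 1
        self.audit_trail.append(("state", step.step_id, "active"))
        return step

    def write(self, step, key, value):
        # 2. The write is audited, but the database is not yet updated.
        step.pending[key] = value
        self.audit_trail.append(("update", step.step_id, key, value))

    def commit(self, step):
        # 3. Audit commit-in-progress, then 4. apply the updates.
        step.state = "commit-in-progress"
        self.audit_trail.append(("state", step.step_id, step.state))
        self.database.update(step.pending)
        step.pending.clear()

    def rollback(self, step):
        # 3. Audit rollback, then 4. discard the updates.
        step.state = "rollback"
        self.audit_trail.append(("state", step.step_id, step.state))
        step.pending.clear()

    def end_step(self, step):
        # 5. The program completes; the state "terminate" is audited.
        step.state = "terminate"
        self.audit_trail.append(("state", step.step_id, step.state))


group = ApplicationGroup()
step = group.start_step()
group.write(step, "seat-12A", "reserved")
before_commit = dict(group.database)   # still empty: update is deferred
group.commit(step)
group.end_step(step)
for record in group.audit_trail:
    print(record)
```

Because the audit trail records both the updates and the state changes in order, a recovery process that reads it can tell which steps reached the commit point and which did not.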

All Integrated Recovery components use the step-id and step state to determine the appropriate actions for the step. Auditing step states ensures that identical actions are taken during system and component recovery, and also when a previous copy of a database file is reloaded and the database updates are reapplied to it.

Database Recovery

For example, if UDS or TIP recovery determines that the step state is active (that is, the transaction or application program request was not complete when the failure occurred), the recovery process discards any database updates. If the step state is commit-in-progress (that is, database updates were being written to the database), the recovery process completes all database updates.

Message Recovery

Message recovery ensures, among other actions, that

- Input messages that are not yet scheduled are retained.
- Input messages for transactions that have not completed are retained.
- Output messages that are not yet sent are retained.
- Output messages for incomplete transactions are discarded.

Two-Phase Commit Process

The two-phase commit process is a procedure that ensures the atomicity (an ACID database property) of distributed (global) transactions. Two-phase commit is an integral part of the Integrated Recovery environment.

When the application program issues an instruction that the transaction is finished, the distributed transaction monitor makes sure that all updates are ready to be applied to the databases and then issues instructions to commit the updates. If any database update cannot be applied for some reason, the distributed transaction monitor issues instructions to roll back the changes to their previous state. The result of the two-phase commit protocol is that updates are either all committed or all rolled back.

The two-phase commit process contains an additional step state: ready to commit the updates but waiting for the results of the other portions of the distributed transaction before continuing.

Refer to the Open Distributed Transaction Processing TX Application Program Interface Programming Guide for more information.
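The two-phase commit behavior described above can be sketched in Python as a coordinator that first asks every participant whether it is ready, and then commits all of them or rolls all of them back. This is a generic illustration of the protocol, not the Open DTP implementation: the Participant class, the two_phase_commit function, and the can_commit flag are hypothetical names chosen for the example.

```python
class Participant:
    """Hypothetical resource taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # simulates whether updates can be applied
        self.state = "active"

    def prepare(self):
        # Phase 1: report whether this participant is ready to commit.
        if self.can_commit:
            self.state = "ready"       # ready, waiting for the global outcome
            return True
        return False

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every participant is ready; otherwise
    # roll every participant back so that no partial result remains.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"


ok = two_phase_commit([Participant("orders"), Participant("inventory")])
failed = two_phase_commit(
    [Participant("orders"), Participant("inventory", can_commit=False)]
)
print(ok, "/", failed)
```

The "ready" state in the sketch corresponds to the additional step state mentioned in the text: a participant that has voted to commit waits for the outcome of the other portions of the distributed transaction.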

Integrated Recovery Options

In the event of a failure, the Integrated Recovery components work together to recover databases and messages and to restore processing capabilities to the application group. Integrated Recovery has the following recovery options:

- Short recovery
- Medium recovery
- Long recovery

The option you choose depends on the type of failure from which you are recovering.

Short Recovery

Short recovery enables recovery after a system, application group, or component failure. The user initiates a short recovery through a simple IRU request.

Exec short recovery automatically restores the audit trail, step-id queues, and, sometimes, TIP database files. After Exec short recovery restores the audit trail, an IRU short recovery request uses information from the audit trail to

- Reconcile database updates for steps that were in progress at the time of the failure
- Request recovery actions for UDS, TIP, MCB, and step control

No user input is required beyond requesting the recovery to start.

At the end of a successful short recovery

- Steps that were in the process of committing at the time of failure are rolled forward (completed).
- Steps that had not yet reached the commit point are rolled back.
- Messages and step queues are reconciled.
- The application group is restarted.

This process usually takes very little time, hence the name short recovery.

Medium Recovery

Medium recovery is especially useful for concurrent application groups because a lengthy failure of one of the XTC hosts could retain locks and prevent the remaining XTC hosts from accessing portions of the database. If a host failure occurs, another host can execute medium recovery on behalf of the failed host to release the locks and roll steps forward or back, as appropriate.

Medium recovery is also useful when short recovery fails due to a system file problem.
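The idea of one host recovering on behalf of a failed host can be sketched in Python as follows. The sketch assumes a simplified data model that is not taken from the product: steps are dictionaries that record the owning host, the step state, and the audited updates, and locks map a record name to the step-id that holds it. The function name recover_failed_host is hypothetical.

```python
def recover_failed_host(failed_host, steps, locks, database):
    """Resolve the failed host's in-progress steps and release their locks.

    Hypothetical model: not an IRU command or interface.
    """
    outcome = {}
    for step_id, step in steps.items():
        if step["host"] != failed_host:
            continue                      # steps on surviving hosts are untouched
        if step["state"] == "commit-in-progress":
            # The step had reached the commit point: complete its updates.
            database.update(step["updates"])
            outcome[step_id] = "rolled forward"
        elif step["state"] == "active":
            # The step had not reached the commit point: discard its updates.
            outcome[step_id] = "rolled back"

    # Release every lock held by a step that was just resolved.
    for record in [r for r, owner in locks.items() if owner in outcome]:
        del locks[record]
    return outcome


steps = {
    101: {"host": "A", "state": "commit-in-progress",
          "updates": {"seat-12A": "reserved"}},
    102: {"host": "A", "state": "active",
          "updates": {"seat-14C": "reserved"}},
    103: {"host": "B", "state": "active",
          "updates": {"seat-20F": "reserved"}},
}
locks = {"seat-12A": 101, "seat-14C": 102, "seat-20F": 103}
database = {}

result = recover_failed_host("A", steps, locks, database)
print(result)
print(locks)
```

After the call, only the lock held by the step on the surviving host remains, so the other hosts can again reach the records that the failed host had locked.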

IRU medium recovery uses data from the audit trail and system files to

- Restore database updates for steps that were in progress at the time of the failure to a consistent state (rolling forward or back, as appropriate)
- Request recovery actions for UDS and TIP (releasing locks and rebuilding system files)

Long Recovery

Long recovery restores an inconsistent or lost database to a consistent state by recovering database updates, messages, and step queues. You can use IRU commands to specify what needs to be recovered, such as portions of a file or the entire database, including the start and end points.

Long recovery uses information from the audit trail and system files to

- Reapply database updates to reloaded files
- Rebuild message information in MCB system files
- Send step information to step control to rebuild step states
- Perform a combination of these actions

Long recovery usually requires in-depth knowledge of the database and the failure. Long recovery often takes much longer to execute because it requires more audit processing than the other recovery types.
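Reapplying audited updates to a reloaded file between chosen start and end points can be sketched in Python as a replay over an audit trail. The record layout (seq, step_id, kind, key, value), the function name reapply_updates, and the rule that only updates from steps with an audited commit-in-progress state are reapplied are all assumptions made for this illustration; they do not describe the actual audit trail format or IRU commands.

```python
def reapply_updates(reloaded, audit_trail, start, end):
    """Replay audited updates in [start, end] onto a reloaded database copy.

    Hypothetical model of the replay idea only.
    """
    # Restrict processing to the requested portion of the audit trail.
    window = [rec for rec in audit_trail if start <= rec["seq"] <= end]

    # Assumption: a step's updates are reapplied only if the window shows
    # that the step reached the commit point.
    committed = {
        rec["step_id"]
        for rec in window
        if rec["kind"] == "state" and rec["value"] == "commit-in-progress"
    }

    database = dict(reloaded)             # leave the reloaded copy unchanged
    for rec in window:                    # replay in audit order
        if rec["kind"] == "update" and rec["step_id"] in committed:
            database[rec["key"]] = rec["value"]
    return database


reloaded = {"balance": 100}
audit_trail = [
    {"seq": 1, "step_id": 1, "kind": "update", "key": "balance", "value": 120},
    {"seq": 2, "step_id": 1, "kind": "state", "value": "commit-in-progress"},
    {"seq": 3, "step_id": 2, "kind": "update", "key": "balance", "value": 90},
    {"seq": 4, "step_id": 3, "kind": "update", "key": "limit", "value": 500},
    {"seq": 5, "step_id": 3, "kind": "state", "value": "commit-in-progress"},
]

full = reapply_updates(reloaded, audit_trail, 1, 5)
partial = reapply_updates(reloaded, audit_trail, 1, 3)
print(full)
print(partial)
```

The sketch also suggests why long recovery takes longer than the other options: the whole selected portion of the audit trail is processed, not only the steps that were in progress at the time of the failure.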