A STORAGE MANAGEMENT SOLUTION from VERITAS Software Corporation

Advanced techniques for consistent database backups with minimal operational impact.
Contents
BACKUP: BECAUSE STUFF HAPPENS
ADVANCED BACKUP TECHNIQUES
VERITAS NETBACKUP AND ORACLE
SUMMARY

Figures
FIGURE 1: FULL AND INCREMENTAL BACKUPS
FIGURE 2: RESTORING A FILE SYSTEM FROM DIFFERENTIAL AND CUMULATIVE BACKUPS
FIGURE 3: USING A SNAPSHOT
FIGURE 4: MULTIPLE CHECKPOINTS AND DATABASE ROLLBACK
FIGURE 5: CONSISTENT DATABASE BACKUP USING STORAGE CHECKPOINTS

Tables
TABLE 1: SAMPLE WEEKLY BACKUP STRATEGY USING DIFFERENTIAL AND CUMULATIVE BACKUPS
Backup: Because Stuff Happens

In the best-designed data center in the world, things can go wrong. Hardware can fail. Operating systems and other environmental software can malfunction. Application errors can corrupt operating data. A disaster can incapacitate the data center. Humans can make errors.

Advanced techniques such as disk volume mirroring, clustering, and remote data replication significantly reduce exposure to most of these failures. As experienced system administrators know, however, mirrored volumes store incorrect data just as reliably as they store correct data, and clusters restart applications with bugs just as quickly as they restart applications that are working properly. For some failures, a data center has to be able to restore a stable point-in-time database image and recover from there. In other words, regular, reliable backup is as necessary as it ever has been.

Backup Seems Simple

Backup is simple. A system administrator:

- decides which data is critical and has to be backed up,
- determines an appropriate time to perform the backup,
- uses a utility program like tar to make the copy, and
- stores the copy in a safe place in case it is ever required for recovery.

Conceptually, this really is simple. The difficulties for system administrators lie in:

- Resources: It's obviously important to get backup done as quickly as possible. But to finish faster, data has to be copied faster, which means greater demands on disk and I/O channel bandwidth, and disks and channels that are busy with backup requests can't process transactions. Simply put, the more I/O resources a backup uses, the slower online operations become.
- Timing: To represent a consistent point-in-time image, backups should be started at a time when no other database activity is occurring. Thus, backups are constrained to start at times when the business impact of stopping all application access to the database is lowest.
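The copy step described above can be sketched in a few lines. The following Python example is a minimal illustration, assuming the data to protect lives under a single directory; it uses the standard library tarfile module in place of the tar command, and the function names full_backup and restore_file are hypothetical. Choosing the data, timing the backup, and storing the archive safely are left out.

```python
from __future__ import annotations

import os
import tarfile


def full_backup(source_dir: str, archive_path: str) -> list[str]:
    """Copy every file under source_dir into a gzip-compressed tar archive.

    Returns the archive names (paths relative to source_dir) of the files copied.
    """
    copied = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, source_dir)
                tar.add(path, arcname=arcname)
                copied.append(arcname)
    return copied


def restore_file(archive_path: str, arcname: str) -> bytes:
    """Read one file's contents back out of the archive (the recovery step)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        member = tar.extractfile(arcname)   # raises KeyError if the name is absent
        if member is None:
            raise ValueError(f"{arcname} is not a regular file")
        return member.read()
```

Every call copies every file, which is exactly why the resource and timing problems above grow with the amount of data.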
Advanced Backup Techniques

The bigger an enterprise, the more data needing backup it is likely to possess. Because backup is very resource-intensive, large enterprises invariably wish to minimize its impact on operations. Both storage system and software vendors (like EMC, VERITAS, and Oracle) have developed innovative techniques to minimize the impact of backup on online operations. Two of the most important are incremental backup and frozen image technology.

Incremental Backup

In most applications, only a small percentage of data changes between successive backups. Incremental backup techniques use this fact to shorten backup times and minimize resource requirements. For example, an incremental backup of a file system copies only files that have changed since the last backup. If only a few files have changed, an incremental backup completes much faster and has less impact on online operations. Figure 1 compares full and incremental backup.

Figure 1: Full and Incremental Backups (full backup: all files are copied; incremental backup: only files changed by application updates are copied)

Incremental backup does not replace full backup; it reduces the frequency with which full backups are required. An incremental backup contains data that has changed since the most recent previous backup, full or incremental. To restore a file system, one restores the newest full backup and then restores all newer incremental backups in order.

Data centers typically schedule relatively infrequent (e.g., weekly) full backups at times of low expected application activity (e.g., weekends), with more frequent (e.g., daily) incremental backups. This policy minimizes operational impact because only small amounts of data are copied during busy times. The disadvantage is that restores may involve more media handling and take longer to execute.

Differential and Cumulative Backup

A differential backup copies all data modified since the last backup of any kind. To restore from differential backups, one restores the newest full backup and all newer differential backups.

A cumulative backup copies all data modified since the last full backup. To restore from a cumulative backup, one restores the newest full backup and the newest cumulative backup. This is simpler and faster than restoring from differential backups. The disadvantage is that each cumulative backup takes longer to make than the previous one, since more data has changed since the last full backup.

Figure 2: Restoring a File System from Differential and Cumulative Backups (both restores start with the newest full backup; the differential restore then applies all differential backups made since it, the cumulative restore applies only the newest cumulative backup, and the resulting file systems are identical)

Full, cumulative, and differential backups can be combined to balance the impact of backup on operations against the time required to restore a full file system or database. Table 1 illustrates a scheduling strategy in which full, differential, and cumulative backups combine to balance backup time and restore complexity.
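The selection and restore rules above can be checked with a small simulation. The Python sketch below is a model under stated assumptions, not the logic of any backup product: each file carries a modification day, files are never deleted, and take_backup, restore_chain, and restore_state are hypothetical names. The incremental backups of Figure 1 behave like the differential case here.

```python
from dataclasses import dataclass, field

FULL, DIFFERENTIAL, CUMULATIVE = "full", "differential", "cumulative"


@dataclass
class Backup:
    time: int                                  # day on which the backup was taken
    kind: str                                  # FULL, DIFFERENTIAL, or CUMULATIVE
    data: dict = field(default_factory=dict)   # file name -> contents copied


def take_backup(kind, now, files, history):
    """files maps name -> (contents, modification_day); history is oldest first.

    Assumes files are never deleted (deletions are not tracked in this model).
    """
    if kind == FULL:
        since = float("-inf")                  # copy every file
    elif kind == DIFFERENTIAL:
        since = history[-1].time               # since the last backup of any kind
    elif kind == CUMULATIVE:
        since = max(b.time for b in history if b.kind == FULL)  # since the last full
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    copied = {name: body for name, (body, mtime) in files.items() if mtime > since}
    return Backup(now, kind, copied)


def restore_chain(history):
    """Backups needed to restore the newest state, in the order they are applied."""
    start = max(i for i, b in enumerate(history) if b.kind == FULL)
    chain = [history[start]]
    cumulative = [i for i in range(start + 1, len(history))
                  if history[i].kind == CUMULATIVE]
    if cumulative:                             # newest cumulative supersedes earlier differentials
        start = cumulative[-1]
        chain.append(history[start])
    chain += [b for b in history[start + 1:] if b.kind == DIFFERENTIAL]
    return chain


def restore_state(history):
    state = {}
    for backup in restore_chain(history):
        state.update(backup.data)              # newer copies overwrite older ones
    return state


# The weekly plan of Table 1: one new file each day, plus one file updated daily.
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
plan = [FULL, DIFFERENTIAL, DIFFERENTIAL, CUMULATIVE,
        DIFFERENTIAL, DIFFERENTIAL, DIFFERENTIAL]
files, history = {}, []
for day, kind in enumerate(plan):
    files[f"new-{day}"] = (f"created {day}", day)
    files["ledger"] = (f"ledger as of {day}", day)
    history.append(take_backup(kind, day, files, history))

print([days[b.time] for b in restore_chain(history)])
```

Run as written, the last line prints ['Sunday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], which matches the Saturday restore procedure in Table 1. In this run Wednesday's cumulative backup copies four files against Tuesday's two, illustrating how cumulative backups grow as the week goes on.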
Day | Type of Backup | Data Copied | Full Database Restore Procedure
Sunday | Full | Full database as it stood on Sunday | Restore Sunday's full backup
Monday | Differential | Changes since Sunday's full backup | Restore Sunday's full backup and Monday's differential
Tuesday | Differential | Changes since Monday's backup | Restore Sunday's full backup, then Monday's and Tuesday's differentials
Wednesday | Cumulative | Changes since Sunday's full backup | Restore Sunday's full backup and Wednesday's cumulative
Thursday | Differential | Changes since Wednesday's backup | Restore Sunday's full backup, Wednesday's cumulative, and Thursday's differential
Friday | Differential | Changes since Thursday's backup | Restore Sunday's full backup, Wednesday's cumulative, and Thursday's and Friday's differentials
Saturday | Differential | Changes since Friday's backup | Restore Sunday's full backup, Wednesday's cumulative, and Thursday's, Friday's, and Saturday's differentials

Table 1: Sample Weekly Backup Strategy Using Differential and Cumulative Backups

Frozen Image Technology

Today, IT organizations face the conflicting demands of:

- regular, reliable backup, required because of the value of data, and
- continuous application availability, to meet competitive pressures.

In other words, backup has to be done, but the database can't be taken down to do it. Both RAID and storage management software vendors have risen to this challenge with innovative technologies that enable an image of online data to be frozen at a user-determined point in time. Frozen images, sometimes called snapshots, can be the source for backup while applications continue to update the main data. Two popular frozen image techniques are breakaway mirrors and copy-on-write.

For breakaway mirrors, a volume manager maintains two or more identical copies of data on separate sets of disks. To initiate a backup, database and other volume activity is stopped. One set of disks is remounted as a read-only volume and backed up. Applications process against the other set (or sets) while the backup is occurring. Breakaway mirrors are very reliable, because they contain two or more complete copies of data. Their disadvantages are storage consumption and the I/O resources required to resynchronize the breakaway disks with operational data each time a backup completes.
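The breakaway mirror cycle (split, back up the frozen copy, resynchronize) can be modeled in a few lines. This Python sketch is a toy under assumptions: a volume is a list of blocks with exactly two mirrors, and the volume manager is assumed to track which blocks are written while the mirror is broken away, so that resynchronization copies only those blocks (without such tracking the whole mirror would have to be recopied). The class and method names are hypothetical.

```python
class MirroredVolume:
    """Toy volume with two mirrors (complete copies) of the same blocks."""

    def __init__(self, nblocks):
        self.primary = [b""] * nblocks
        self.mirror = [b""] * nblocks
        self.broken_away = False
        self.dirty = set()            # blocks written while the mirror is broken away

    def write(self, block, data):
        self.primary[block] = data
        if self.broken_away:
            self.dirty.add(block)     # frozen mirror is not updated; remember the block
        else:
            self.mirror[block] = data

    def read(self, block):
        return self.primary[block]    # applications always use the primary copy

    def break_away(self):
        """Freeze the mirror for backup; database activity must be stopped first."""
        self.broken_away = True

    def read_frozen(self, block):
        """Backup reads the frozen, read-only mirror."""
        if not self.broken_away:
            raise RuntimeError("mirror is not broken away")
        return self.mirror[block]

    def resync(self):
        """Rejoin the mirror after backup; returns how many blocks had to be copied."""
        for block in self.dirty:
            self.mirror[block] = self.primary[block]
        copied = len(self.dirty)
        self.dirty.clear()
        self.broken_away = False
        return copied
```

Both costs show up even in this toy: the mirror holds a second complete copy of every block, and resync() must copy back every block written during the backup.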
To overcome these disadvantages, the copy-on-write technique was developed. As the name suggests, a file system with a copy-on-write snapshot contains both new and old copies of modified data, but only one copy of unmodified data. To create a copy-on-write snapshot, application access is blocked and all cached data is flushed to disk. This makes the file system's disk image consistent. In database application terms, there are no debits without matching credits.
Next, a changed block map is created in which changes will be recorded. Additional space for before images of changed data may also be allocated at this time. When these steps (typically requiring a few seconds) are complete, application access can resume. The snapshot can be mounted immediately as a read-only file system.

Figure 3: Using a Snapshot (reads from the snapshot return the original location for unmodified blocks and the snapshot's before image for modified blocks; the first write to a block copies its old contents to the snapshot before overwriting)

Using Snapshots

Figure 3 illustrates reading and writing in a file system with a snapshot. Application reads from the file system are identical whether or not a snapshot is in effect. When an application (such as backup) reads from the snapshot, the data is delivered:

- from its original file system location if it has not been modified, or
- from the snapshot's before image if the file system image has been modified.

When an application writes data for the first time since snapshot initiation, the data being overwritten is first copied to the snapshot. In addition, the changed block map is updated to indicate the location of the before image in the snapshot area.

A snapshot is thus an instantaneous image of a set of data. Changes made after snapshot creation are visible in the data itself, but do not appear when the snapshot is read. The advantage for database backup is clear: a database can be used by applications while its snapshot is being backed up. From Oracle's standpoint, the backup is cold, because the database image represented by the snapshot is not in use during the backup.
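The snapshot read and write rules above can be sketched as a small in-memory model. This is a simplified illustration, not the VERITAS implementation: blocks are Python byte strings, the changed block map is a dictionary holding the before images directly (rather than pointing into a separate snapshot area), and quiescing and cache flushing are assumed to have happened before create_snapshot is called. Class and method names are hypothetical.

```python
class Snapshot:
    """Read-only point-in-time view kept consistent by copy-on-write."""

    def __init__(self, volume):
        self.volume = volume
        self.changed = {}                       # changed block map: block -> before image

    def preserve(self, block):
        # Only the first modification after snapshot creation copies the old data.
        if block not in self.changed:
            self.changed[block] = self.volume.blocks[block]

    def read(self, block):
        if block in self.changed:
            return self.changed[block]          # modified: read the before image
        return self.volume.blocks[block]        # unmodified: read the original location


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def create_snapshot(self):
        # Assumes application access is blocked and caches are flushed before this call.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def read(self, block):
        return self.blocks[block]               # identical whether or not a snapshot exists

    def write(self, block, data):
        for snap in self.snapshots:
            snap.preserve(block)                # copy-on-write before overwriting
        self.blocks[block] = data
```

Writing block 1 twice after create_snapshot stores only one before image, so the snapshot's storage cost tracks the number of distinct blocks changed, not the number of writes.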
VERITAS NetBackup and Oracle

If a snapshot is to be used for database backup, both the containing file system and the database must be quiescent when it is created. The VERITAS Database Edition for Oracle can request that Oracle quiesce a database momentarily for snapshot initiation. Snapshot initiation, normally a second or two, is the only time during which the database is unavailable to applications.

To further facilitate Oracle backup, the VERITAS file system implements a special type of snapshot called a Storage Checkpoint. Storage Checkpoints differ from other VERITAS snapshot implementations in that:

- They are persistent (i.e., they still exist after a system reboot).
- They use the file system free space pool rather than separately allocated space.
- They are only accessible to the NetBackup component of the VERITAS Database Edition for Oracle.

Figure 4: Multiple Checkpoints and Database Rollback (Storage Checkpoints T1, T2, and T3 each hold a changed block map and before images of the blocks changed since their creation; the database state at T1, T2, or T3 can be backed up, and the database can be rolled back to its state at T1, T2, or T3)

The VERITAS VxFS file system supports multiple concurrent Storage Checkpoints, as Figure 4 illustrates. This has three advantages for database administrators:

- Choice of backup times: A point-in-time backup can be made from any active Storage Checkpoint.
- Database rollback: The VERITAS Database Edition for Oracle can copy the contents of a Storage Checkpoint back to a database, effectively rolling back the database to its state at the instant of Storage Checkpoint creation.
This can be useful if an application error is discovered after the application has been active for some time.
- Point-in-time queries: A Storage Checkpoint can be used for systematic data mining or ad hoc queries to a database as it existed at a specific point in time.

A typical database consists of a small number of relatively large datafiles. While only a small fraction of its data may change between backups, changes to a database are usually distributed throughout its datafiles. Thus, an incremental backup that copies each changed file in its entirety is likely to include all of a database's datafiles, effectively becoming a full backup. As with a true full backup, the impact on online I/O resources can be substantial.

A Storage Checkpoint's changed block map identifies changed database blocks regardless of what files they reside in. VERITAS NetBackup can use this information to create backups that contain only changed database blocks. Figure 5 illustrates block level incremental backup.

Figure 5: Consistent Database Backup Using Storage Checkpoints (application updates U1 through U8 are spread across Table Spaces A, B, and C; using the Storage Checkpoint's changed block map and before images, NetBackup backs up only the changed blocks)

A block level incremental backup contains only before images of database blocks modified since the instant of Storage Checkpoint creation. If only a small percentage of a database is updated, its block level incremental image is correspondingly small. Compared to full database backup, block level incremental backup typically takes very little time and uses only small amounts of storage and I/O bandwidth.

By greatly reducing the resource impact of backup, block level incremental backup encourages database administrators to schedule more frequent backups. More frequent backups each capture fewer changes, which reduces the bandwidth and storage needed per backup still further, and they enable database restores to points in time closer to failure instants.
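Multiple Storage Checkpoints, rollback, and block level incremental backup can be modeled by extending the copy-on-write idea so that every checkpoint keeps its own changed block map, as in Figure 4. The Python sketch below is a toy under assumptions, not VxFS or NetBackup code: the database is a single list of blocks, each changed block map holds before images directly, and an incremental backup is assumed to be taken right after a new checkpoint is created, copying the blocks recorded in the previous checkpoint's map as they stood at the new checkpoint. All names (CheckpointedVolume, block_level_incremental, restore_blocks) are hypothetical.

```python
class CheckpointedVolume:
    """Toy file system supporting multiple concurrent Storage Checkpoints."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.checkpoints = {}   # name -> changed block map {block: before image}, oldest first

    def create_checkpoint(self, name):
        # The database is assumed to be quiesced at this instant.
        self.checkpoints[name] = {}

    def write(self, block, data):
        for changed in self.checkpoints.values():
            changed.setdefault(block, self.blocks[block])  # keep only the first before image
        self.blocks[block] = data

    def read_checkpoint(self, name, block):
        return self.checkpoints[name].get(block, self.blocks[block])

    def image_at(self, name):
        return [self.read_checkpoint(name, b) for b in range(len(self.blocks))]

    def rollback(self, name):
        """Copy checkpoint `name` back over the database, returning it to that instant."""
        for block, before in list(self.checkpoints[name].items()):
            self.write(block, before)     # older checkpoints already hold these before images
        names = list(self.checkpoints)
        for newer in names[names.index(name) + 1:]:
            del self.checkpoints[newer]   # states after `name` are discarded
        self.checkpoints[name] = {}       # the database matches the checkpoint again


def block_level_incremental(vol, prev_name, new_name):
    """Blocks changed since prev_name, read as they stood at new_name."""
    return {b: vol.read_checkpoint(new_name, b) for b in vol.checkpoints[prev_name]}


def restore_blocks(full_image, incrementals):
    """Full restore followed by every newer block level incremental, oldest first."""
    image = list(full_image)
    for backup in incrementals:
        for block, data in backup.items():
            image[block] = data
    return image
```

Because each incremental holds only the blocks recorded in a changed block map, its size depends on how many blocks changed, not on how many datafiles those blocks are spread across.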
Each block level incremental backup is relative to a previous backup. Restoring a database from block level incremental backups requires a full database restore, followed by restores of all newer block level incremental backups. VERITAS NetBackup keeps track of backup relationships and automatically applies the appropriate full and incremental backups necessary to restore a database to a given state.

Closer Integration with Oracle

The latest release of VERITAS NetBackup (Version 3.2) integrates even more closely with Oracle. A NetBackup Advanced Block-Level Agent can be invoked by Oracle's Recovery Manager (RMAN). A database administrator can use RMAN's proxy feature to request that NetBackup back up some or all of a database's datafiles (Oracle control files cannot use this technique and are backed up separately through RMAN). NetBackup will use the block level incremental technique if appropriate. Thus, the database administrator has both the cataloging and central control features of RMAN and the minimal operational impact of block level incremental backup, together with NetBackup's media management, data stream multiplexing, and automation facilities.

Summary

The conflicting IT imperatives of protecting enterprise data against failures of all kinds and 24x7 online operation make database backup a particularly difficult problem for database administrators. On the one hand, frequent, consistent database images need to be maintained in case database recovery is necessary. On the other hand, taking a database out of service for backup is no longer an option for many installations. Even if databases didn't have to be online all the time, the I/O resource impact would make frequent full backups impractical.

Software technology has developed techniques that enable incremental backups of frozen database images. By reducing the amount of data copied, incremental backup reduces resource requirements and makes it practical for database administrators to schedule more frequent backups. Because backups are made from a frozen database image mounted as a separate read-only file system, they are guaranteed to be consistent, with almost no application downtime.
File-based incremental backup techniques are not optimal for databases, since small amounts of data in most or all of a database's datafiles change from backup to backup. To achieve the resource economies of incremental backup for databases, VERITAS has developed the block level incremental backup technique. VERITAS NetBackup can retrieve before images of changed blocks from Storage Checkpoints in a VERITAS file system and use them to create incremental backups of database datafiles that include only blocks actually changed since the last backup.

VERITAS has gone further, integrating NetBackup's block level incremental backup capability with the Oracle Recovery Manager's proxy feature, allowing a database administrator to use RMAN to schedule or invoke incremental backups that are actually managed and executed by NetBackup. This gives the database administrator single-point control over Oracle database management, while providing all the media management, performance, and feature advantages of NetBackup-managed backups.

Business without Interruption