Adaptable System Recovery (ASR) for Linux
How to Restore Backups onto Hardware that May not be Identical to the Original System

Storix, Inc.
7801 Mission Center Ct. Ste. 250
San Diego, CA 92108
1-619-543-0200
sba@storix.com
SYSTEM FAILURE

A system failure renders a system inoperative. Causes can range from the failure of a single disk drive to a fire in the computer room. Catastrophic events can not only cause a system to fail but also destroy the system, and perhaps even an entire data center. Such events don't happen every day, but it only takes one catastrophe to destroy a business.

While a failed disk drive may be easy to plan for, the loss of system hardware is another story. Most businesses do not have a mock data center waiting on standby with duplicate idle hardware for quick replacement. You may find yourself needing to restore onto new hardware with minor differences, such as a single network adapter change, or onto a system with a completely different disk configuration. In either case, how will your backup and recovery solution handle these differences? Can your backup solution adapt?

This paper introduces the concept of Adaptable System Recovery (ASR), which makes it possible to adapt a system backup to fit new hardware by providing a means for reconfiguring drivers, resizing and relocating filesystems, and restoring storage configuration.

[Figure: Failure Rate of Computer Systems, showing Phase 1, Phase 2 and Phase 3]

1. An early failure period characterized by a decreasing failure rate (Phase 1). Failure occurrence during this period is not random in time but rather the result of substandard components with gross defects and the lack of adequate controls in the manufacturing process. Parts fail at a high but decreasing rate.
2. A normal operating period where electronics have a relatively constant failure rate caused by randomly occurring defects and stresses (Phase 2). This corresponds to a normal wear-and-tear period where failures are caused by unexpected and sudden overstress conditions.
3. A wear-out period where the failure rate increases due to critical parts wearing out (Phase 3). As parts wear out, it takes less stress to cause failure, and the overall system failure rate increases accordingly. Failures do not occur randomly in time.
(Source: Advisory Group for the Reliability of Electronic Equipment)

[Figure: OS Function Failure Rate (% of OS functions that failed against test data set)]

Some OSes are more stable than others, but they all have failures.
(Source: University of Wisconsin)
SYSTEM RECOVERY

System recovery, sometimes referred to as Bare-Metal Restore (BMR) or crash recovery, is the process of installing and configuring an operating system to match its state prior to a system failure. This recovery includes configuration of a wide variety of items, such as user accounts, printers and other devices, networks, storage devices, filesystems, and application and user data.

FILE-LEVEL VERSUS IMAGE BACKUP

Most businesses are highly dependent on rapid recovery and attempt to ready themselves for quick replacement of various components, whether it be a SCSI controller, a network card, a failed disk drive or even a motherboard. Of course, you have a backup and disaster recovery plan. You have regular system backups and keep recent copies off-site. Better yet, you take the time to regularly test the system recovery boot media and verify readability of the backup media. After first installing the system, you even went so far as to test your backup and recovery process by restoring your backup onto the same system. Well, that was a long time ago.

Because you don't always know exactly what system type, controllers, drives or other peripherals will be available to replace your system when disaster strikes, planning to perform an Adaptable System Recovery (ASR) is the easiest way to address your recovery challenges. ASR solutions use a file-level operating system backup that is more granular and records the prior system, controllers, disks and software storage configuration. Because the backup is taken at the file level, it can be adapted to fit new hardware by reconfiguring drivers, resizing and relocating filesystems, and restoring storage configuration.
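To make "resizing filesystems to fit new hardware" concrete, the following is a minimal Python sketch of one way a recovery tool could plan new filesystem sizes when the replacement disk is smaller than the original. It is not the method used by any particular product. The `Filesystem` class, the `fit_to_disk` function and the 10% headroom figure are all assumptions made for illustration. The point it shows is that a file-level backup knows how much real data each filesystem holds, so a filesystem can shrink as long as its data still fits.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    """Hypothetical record of one filesystem as saved with a file-level backup."""
    mount_point: str
    size_mb: int  # size on the original system
    used_mb: int  # real data stored in it


def fit_to_disk(filesystems, new_disk_mb, headroom_pct=10):
    """Plan filesystem sizes (in MB) for a disk of new_disk_mb.

    Each filesystem first receives its used data plus headroom_pct percent.
    Any space left over is shared out in proportion to the original sizes.
    Raises ValueError if the data cannot fit on the new disk at all.
    """
    minimums = {
        fs.mount_point: fs.used_mb + fs.used_mb * headroom_pct // 100
        for fs in filesystems
    }
    required = sum(minimums.values())
    if required > new_disk_mb:
        raise ValueError(
            f"need at least {required} MB, new disk has {new_disk_mb} MB"
        )

    spare = new_disk_mb - required
    total_original = sum(fs.size_mb for fs in filesystems)
    plan = {}
    for fs in filesystems:
        # Integer division keeps the total at or below new_disk_mb.
        share = spare * fs.size_mb // total_original if total_original else 0
        plan[fs.mount_point] = minimums[fs.mount_point] + share
    return plan


# Example: a 40 GB layout restored onto a 20 GB disk.
original = [
    Filesystem("/", size_mb=10000, used_mb=4000),
    Filesystem("/home", size_mb=30000, used_mb=10000),
]
print(fit_to_disk(original, new_disk_mb=20000))
# {'/': 5550, '/home': 14450}
```

A real tool would also have to account for partition alignment, boot and swap areas, and filesystem metadata overhead, all of which this sketch ignores.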
In contrast, an image backup, used in many system recovery products and tools for backup of disks or partitions, is typically unaware of the specific operating system, software or storage configuration. Image backups simply provide a higher-level, byte-by-byte snapshot of a device, which can therefore only be restored to an identical disk or partition. Attempting to restore to different hardware, whether it be a different processor type, a different hard disk size or type, or even a system with different SCSI or network controllers, could render the system inoperative.

Image backup tools are known for some advantages. First, manufacturers of image backup products often boast faster speeds than their file-level counterparts. But keep in mind that an image backup must include all bytes on a disk (or partition), even those that do not hold any real data. Even if the raw speed is faster, you may be forced to back up more (and oftentimes useless) data, so the backup could end up taking longer. A file-level backup may have to read non-sequential filesystem data, but it backs up only real data.

A second claim is that, although an image backup is inflexible, it usually has the advantage of simplicity. This claim is indeed true. In cases where an administrator wants to restore the entire system, disk or partition to an identical replacement, an image-based backup may be a fine choice. Image backup tools usually come with the ability to boot to a simple program that restores the image to disk. If the tool backs up partition images, it may come with a simple partition editor, providing some flexibility in where the partition data may be restored. However, moving data to a new partition usually requires altering the system configuration files after the operating system has been restored.

ASR, on the other hand, uses file-level backup and enables the user to easily move data to another location. This may include migration of partition-based filesystems to LVM or Software RAID, changing filesystem types, or even migrating, splitting and merging filesystems across different disks. Even when installing to the same hardware, ASR users can benefit from a new storage configuration for higher performance, expandability or availability.

Lastly, take care when using image backups with live data or mounted filesystems. Unless you can take a snapshot of your live data and back up that offline copy, an image backup will almost always require that the filesystems be unmounted before the backup. This is because, after a restore, the system will assume the filesystem was not previously unmounted properly, requiring a cleanup (fsck). Any filesystem transactions that were in progress at the time of the backup, such as file creation, deletion, relocation or expansion, may then be only partially performed, possibly resulting in filesystem corruption.
A file-level backup doesn't have the same live data issues because files are backed up and restored individually to a clean filesystem. (Relational database files are generally not part of the base operating system. If you have relational data that must be backed up live, address it separately. Look for file-level solutions that won't incur different timestamps on files after a restore, or use the tools provided with your database application.)

FILE-LEVEL/ASR VS IMAGE BACKUP: PROS AND CONS

Performance/Speed
  Image-based backup: Faster recovery speeds but higher data volume. May actually result in longer backup times.
  ASR/file-level: Non-sequential disk backup requires slightly more time than a disk image but only backs up real data. May result in less backup time.

Backup Media Quantity
  Image-based backup: Uses backup media equal to the entire size of the original disks or partitions.
  ASR/file-level: Uses only enough backup media for the real files, with little overhead.

Simplicity
  Image-based backup: Can back up any type of data or operating system. System recovery often uses fewer steps.
  ASR/file-level: Backup is specific to an OS. Recovery has few steps when restoring to the same hardware. Requires user intervention when restoring to different hardware.

Flexibility of Recovery
  Image-based backup: Can only be restored to the detail level of the backup. A disk backup can restore only to an identical disk. A partition backup can restore only to an identical partition (additional tools may provide the ability to move partitions between disks, but not without altering the system config files).
  ASR/file-level: More detailed backup tools provide the ability to alter most system, disk, partition, filesystem and other storage configuration. Files may be restored to identical filesystems or to those of different types, sizes and locations.

Hardware Migration
  Image-based backup: Since the operating system configuration is fixed on the backup, the system after restore must match the prior configuration. Changes to data location require modification of system config files after the OS is restored.
  ASR/file-level: Any hardware with device support included on the boot and backup media is compatible. The user can migrate filesystems to new disks, convert from partition-based to LVM-based filesystems, increase and decrease filesystem sizes, change filesystem types and attributes, and much more.
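The difference in backup media quantity described above can be estimated on a running system. The short Python sketch below is only an illustration. The function names `backup_media_estimate` and `estimate_for_mount` are made up for this example, compression is ignored, and the size of a mounted filesystem is used as a stand-in for the size of the underlying disk or partition, which is an approximation.

```python
import shutil


def backup_media_estimate(total_bytes, used_bytes):
    """Return (image_bytes, file_level_bytes) needed before compression.

    An image backup copies every byte of the device, used or not.
    A file-level backup copies only real file data; its small metadata
    overhead is ignored here.
    """
    if not 0 <= used_bytes <= total_bytes:
        raise ValueError("used_bytes must be between 0 and total_bytes")
    image_bytes = total_bytes
    file_level_bytes = used_bytes
    return image_bytes, file_level_bytes


def estimate_for_mount(mount_point="/"):
    """Estimate both backup sizes for one mounted filesystem."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    return backup_media_estimate(usage.total, usage.used)
```

For example, `backup_media_estimate(500 * 10**9, 120 * 10**9)` shows that a 500 GB partition holding 120 GB of files needs roughly 500 GB of media as an image but only about 120 GB as a file-level backup. Calling `estimate_for_mount("/")` applies the same estimate to the root filesystem of the machine it runs on.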
SYSTEM BACKUP SOFTWARE

System backups are usually performed separately from user data backups. Commercial data backup products generally focus on efficient backup of user data but often don't handle system recovery. Products that do supply system backup and recovery features usually do so as a completely disconnected function. In the event of a system failure, plan to first restore your operating system and applications from your system backup, then restore your user data once the system is up and running.

If your system backup software performs image backups, as previously discussed, be sure you can supply similar or identical hardware. If the hardware is not identical, plan for any possible differences and make sure the backup solution can adapt. If the system backup software provides ASR, or mentions support for dissimilar hardware, be sure you understand the support and its limitations, because not all system backup products are equal. For instance, if you are using Linux Software RAID devices (meta-disks) for disk striping or mirroring, be sure the solution understands your storage configuration and can restore it as it was before. Since most Linux distributions now support Logical Volume Management (LVM) as the default logical storage manager, be sure your recovery software can do the same.

STORIX'S SYSTEM BACKUP ADMINISTRATOR (SBADMIN)

Storix is the clear leader in ASR for Linux and other UNIX systems. There is no other commercial or open-source software with the same focus on system recovery as SBAdmin, whether it be to the same or dissimilar hardware. Since SBAdmin also provides various types of data backup, a single, reliable product for both system and user data now exists. SBAdmin also complements other commercial data backup products, such as IBM™ Tivoli Storage Manager (TSM), providing the system backup and recovery they often lack.

SBAdmin understands your operating system and storage configuration.
Every system backup includes all the information needed to rebuild your system from the ground up, supporting all Linux software storage options such as Logical Volume Manager (LVM), Software RAID (meta-disk) devices and all filesystem types. The SBAdmin system installation process can provide a no-prompt recovery to similar hardware, or menus for full customization of the disks, partitions, filesystems and other software storage for migration to different hardware.

Even if installing to the same hardware, SBAdmin's flexible install process allows the user to completely redesign the system for higher performance, expandability or availability. This includes migration of partition-based filesystems to LVM or Software RAID, changing filesystem types, or even migrating, splitting and merging filesystems across different disks.

DISASTER RECOVERY TESTING

There is no substitute for a full system backup and recovery test. This requires that you create the necessary boot/recovery media, then boot and actually restore the system. If you don't have a replacement system handy for testing, consider using a disaster recovery center that can supply the hardware you need. Always run a practice restore onto both identical and different hardware.

Storix is a registered trademark of Storix, Inc. in the USA. SBAdmin is a trademark of Storix, Inc. in the USA and other countries. Linux is a registered trademark of Linus Torvalds. IBM, AIX and Tivoli are registered trademarks of International Business Machines Corporation. All other company/product names and service marks may be trademarks or registered trademarks of their respective companies.