Ingres Backup and Recovery Bruno Bompar Senior Manager Customer Support 1
Abstract Proper backup is crucial in any production DBMS installation, and Ingres is no exception. Backups are useless, however, unless you can recover from them. This session explains how Ingres backup and recovery work. We will also cover some ideas on how best to do a regular backup and how to do a safe recovery. 2
Agenda Why backup and recovery? Disaster scenarios Ingres features Housekeeping Customisation Issues to Consider Tips and cautions 3
Why backup and recovery? Insurance What if? Cost to business Critical functionality One part of overall process 4
Scenarios to Consider System Crash Database Corruption Lost Table Accidental Transaction 5
System Crash Automated Recovery After a crash Ingres will Scan the transaction log file Roll back uncompleted transactions Apply completed transactions Databases will be consistent Depends on the crash 6
Database Corruption Databases can be recovered Only if valid Ingres backup is available! ckpdb command to backup rollforwarddb to recover 7
Backup Mechanisms OS backup invalid unless done with Ingres shut down cleanly important for backing up Ingres installation, journals, checkpoints, dumps useless for backing up databases unless you can guarantee a clean shutdown unloaddb an archiving or porting tool, not a backup tool no way to ensure a consistent snapshot without locking out all users (an "offline" archive) 8
Backup Mechanisms In order to get the most out of a backup mechanism, two things are needed: a way to take a static snapshot of the database without interfering too greatly with active users a way to record incremental changes since that static snapshot Ingres does both via checkpoints and journals a checkpoint is the static backup or snapshot the journals are the ongoing change records 9
Backup Mechanisms Terminology note! Ingres differs from other DBMSs in its use of the word "checkpoint" Ingres: a checkpoint is a backup snapshot a consistency point (CP) is a buffer and log flush Other DBMSs: a checkpoint means a buffer flush a backup is just called a backup 10
Database Checkpoints Backup the whole database Online or Offline Enable / Disable journaling Can be performed in parallel Written to Tape Disk Don't forget iidbdb!! 11
Online versus Offline Offline Requires exclusive access to database Online Users carry on working No DDL statements Slower than offline Can cause transaction log file to fill 12
Online Checkpointing An online checkpoint (the ckpdb command) has three phases: quiescing the database file copying with change logging completion recording 13
Online Checkpointing (diagram slides, no text recovered) 14-15
Online Checkpointing File copying is controlled by the checkpoint template (cktmpl.def) can be modified by Ingres administrator change copy command, add file compression, etc amazing things are possible DML allowed during file copying but not DDL - no file creation/deletion Changes during file copying are specially logged before-images sent to dump files 16
Checkpointing After copying is complete, the checkpoint success or failure is recorded in the database config file aaaaaaaa.cnf another copy left in cnnnnnnn.dmp in dump location note that the checkpoint itself does not contain a record of the checkpoint completion Config file records last N checkpoint attempts successful or not N = 99 for recent releases of Ingres N = 16 for older versions (2.0 and older) 17
Online Checkpointing When it's all over, you have one or more checkpoint files (one for each data location) in disk checkpoint area, or on tape zero or more dump files containing changes made while file-copying an updated database config file plus an updated copy in the dump location a new set of journal files a fresh journal file is started at the end of the database quiescent phase 18
Checkpointing What to save after the checkpoint completes: the checkpoint and dump locations you need both infodb output (human readable listing of the database config file) output of: select * from iifile_info for manual table level recovery and emergencies optional but recommended 19
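The "what to save" list above can be partly scripted. The following is a minimal sketch, not a definitive procedure: it captures the infodb listing and the iifile_info contents into timestamped files. The function name save_ckp_metadata, the destination directory, and the INFODB and SQL override variables are all hypothetical names introduced here so the sketch can be dry-run without an Ingres installation.

```shell
#!/bin/sh
# Sketch only: capture human-readable checkpoint metadata after ckpdb.
# INFODB and SQL default to the Ingres utilities; override them for a dry run.
INFODB="${INFODB:-infodb}"
SQL="${SQL:-sql}"

# save_ckp_metadata DBNAME DESTDIR   (hypothetical helper)
save_ckp_metadata() {
    db="$1"; dest="$2"
    stamp=$(date +%Y%m%d%H%M%S)
    mkdir -p "$dest" || return 1
    # Listing of the database config file
    $INFODB "$db" > "$dest/$db.infodb.$stamp" || return 1
    # File-to-table mapping, for manual table-level recovery and emergencies
    printf 'select * from iifile_info;\\g\n\\q\n' \
        | $SQL "$db" > "$dest/$db.iifile_info.$stamp" || return 1
}
```

Typical use would be save_ckp_metadata mydb /some/safe/dir straight after a successful checkpoint, with the destination included in the same OS backup as the checkpoint and dump locations.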
Journals Audit trail of all changes made to selected tables written in batches by the archiver (dmfacp) Default for tables is journaling ON journaling also needs to be enabled for the database using ckpdb +j this is an offline checkpoint; no users allowed Journal files grow to a target size, then a new one is started current expected size and sequence number is stored in the database config file each checkpoint starts a fresh set of journal files 20
Database Checkpoint - Examples Command line Online checkpoint ckpdb dbname Offline checkpoint enabling journaling ckpdb +j dbname #m3 Offline checkpoint disabling journaling ckpdb -j dbname 21
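The three command-line forms above can be wrapped so that a failed checkpoint is never silently ignored. This is a sketch under assumptions: the mode names (online, journal_on, journal_off) and the function name run_checkpoint are hypothetical, and it assumes ckpdb returns a non-zero exit status on failure, which should be verified on your release.

```shell
#!/bin/sh
# Sketch only: run a checkpoint and complain loudly if it fails.
# CKPDB can be overridden (for example with "echo") for a dry run.
CKPDB="${CKPDB:-ckpdb}"

# run_checkpoint MODE DBNAME   (hypothetical helper)
run_checkpoint() {
    mode="$1"; db="$2"
    case "$mode" in
        online)      set -- "$db" ;;       # ckpdb dbname
        journal_on)  set -- +j "$db" ;;    # ckpdb +j dbname
        journal_off) set -- -j "$db" ;;    # ckpdb -j dbname
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # Assumption: a non-zero exit status means the checkpoint failed.
    if ! $CKPDB "$@"; then
        echo "checkpoint of $db FAILED" >&2
        return 1
    fi
}
```

A scheduled job could call run_checkpoint online mydb and raise an alert on any non-zero return.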
Database Checkpoint - Examples Visual DBA 22
Recovery Recovery is a two step process one command (rollforwarddb) with two distinct phases First, restore the database to a point in time (a checkpoint) Second, replay journals optional all journals, or stop at a given time 23
Recovery (diagram slides, no text recovered) 24-26
Recovery The database must exist before it can be recovered All required data locations must exist A valid config file must be available recovery looks in the data location first, then the dump location config file is renamed to aaaaaaaa.rfc The last checkpoint must be valid can ask for an earlier checkpoint with #cn option 27
When Recovery Is Needed Stay calm! you have practiced recovery, right? haste makes mistakes turn off the mobile phone, pager, etc the database will be ready when it's ready Save your current database config ideally, make a copy of the dump location and the data location aaaaaaaa.cnf as a minimum save aaaaaaaa.cnf allows you to try again if something goes wrong if you have time, save everything in sight 28
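The minimum precaution named above, saving aaaaaaaa.cnf before touching anything, might be scripted as below. This is an illustrative sketch: the function name save_config is hypothetical, and the database directory is passed in as a parameter because its path depends on your installation.

```shell
#!/bin/sh
# Sketch only: keep a timestamped copy of the database config file
# before attempting rollforwarddb.

# save_config DBDIR SAVEDIR   (hypothetical helper)
#   DBDIR   directory holding aaaaaaaa.cnf (installation specific)
#   SAVEDIR where the copy should go
save_config() {
    dbdir="$1"; savedir="$2"
    cnf="$dbdir/aaaaaaaa.cnf"
    [ -f "$cnf" ] || { echo "no config file at $cnf" >&2; return 1; }
    mkdir -p "$savedir" || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    # -p preserves ownership and timestamps where permitted
    cp -p "$cnf" "$savedir/aaaaaaaa.cnf.$stamp" || return 1
    echo "$savedir/aaaaaaaa.cnf.$stamp"
}
```

The function prints the path of the saved copy so it can be noted in the recovery log.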
Database Recovery Point in time recovery Last checkpoint only Last checkpoint + 10 hours work 5 checkpoints ago Based on available files 29
Database Recovery - Examples Command Line Last checkpoint only, no journals rollforwarddb +c -j dbname Last checkpoint, journals to 12:32 on 10/05/02 rollforwarddb +c +j dbname -e10-may-2002:12:32:00 30
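Getting the end-time argument exactly right matters when stopping short of a mistake, so it can help to generate the command and read it before running it. The sketch below only prints a rollforwarddb command in the dd-mmm-yyyy:hh:mm:ss form used in the example above; it does not execute anything. The function name rfdb_until_cmd is hypothetical.

```shell
#!/bin/sh
# Sketch only: print (do not run) a point-in-time rollforwarddb command.

# rfdb_until_cmd DBNAME DAY MON YEAR HH:MM:SS   (hypothetical helper)
#   MON is the three-letter month, e.g. may
rfdb_until_cmd() {
    db="$1"; day="$2"; mon="$3"; year="$4"; time="$5"
    printf 'rollforwarddb +c +j %s -e%s-%s-%s:%s\n' \
        "$db" "$day" "$mon" "$year" "$time"
}
```

For instance, rfdb_until_cmd mydb 10 may 2002 12:32:00 prints the second command from the slide with mydb as the database name.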
Database Recovery - Examples Visual DBA Last checkpoint only, no journals 31
Database Recovery - Examples Visual DBA Last checkpoint, journals to 12:32 on 10/05/02 32
Recovery Scenarios Data area is lost shut down Ingres if it's not down restore data directories with db config file restart Ingres transaction log contents can be moved to journals only if a valid config file is available! rollforwarddb up-to-the-minute recovery should be possible 33
Recovery Scenarios Transaction log is lost wasn't it mirrored? recreate transaction log rollforwarddb most recent transactions not moved to journals will be lost 34
Recovery Scenarios Checkpoint or dump location is lost recreate location directories take fresh checkpoint loss of checkpoint area should not affect running database 35
Recovery Scenarios Journal location is lost installation will continue to run until transaction log fills up recreate journal directory alterdb -disable_journaling to halt journaling restart archiver which will have stopped due to inability to write journals ckpdb +j to restart journaling 36
Recovery Scenarios Software or human error is discovered If mistake is discovered immediately: crash/restart Ingres, or remove all user sessions rollforwarddb with -e option to replay journals, stopping short of the time of mistake If mistake isn't discovered until later, recovery is more complicated Ingres Journal Analyzer (IJA) can help 37
Accidental Transaction AuditDB Filter against Table Users Time Scan Journal files Generate SQL Execute 38
Accidental Transaction Ingres Journal Analyzer Auditdb with Knobs on Connect to remote servers Force Log Flush Point and Click 39
Accidental Transaction (screenshot slides, no text recovered) 40-41
Recovery Scenarios Disaster Use OS backups to restore Ingres system directories, all data, work, checkpoint, dump, journal directories rollforwarddb iidbdb you have been checkpointing iidbdb, right? restores users, locations, database privileges, etc rollforwarddb databases 43
Recovery Scenarios Rollforwarddb failure restore the config or dump info you saved before attempting rollforwarddb rename aaaaaaaa.rfc back to aaaaaaaa.cnf if it exists cure any other rollforwarddb complaints try again Last checkpoint didn't work use rollforwarddb #cn to restore an older one you do have more than one checkpoint around, right? 44
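The "rename aaaaaaaa.rfc back to aaaaaaaa.cnf" step above could look like the following sketch. The function name restore_rfc is hypothetical, the database directory is a parameter, and moving a leftover aaaaaaaa.cnf aside rather than overwriting it is a cautious choice made here, not something the slide prescribes.

```shell
#!/bin/sh
# Sketch only: put the renamed config file back after a failed rollforwarddb.

# restore_rfc DBDIR   (hypothetical helper)
restore_rfc() {
    dbdir="$1"
    rfc="$dbdir/aaaaaaaa.rfc"
    cnf="$dbdir/aaaaaaaa.cnf"
    [ -f "$rfc" ] || return 0          # nothing to rename
    if [ -f "$cnf" ]; then
        # Keep whatever the failed attempt left behind, just in case
        mv "$cnf" "$cnf.failed.$$" || return 1
    fi
    mv "$rfc" "$cnf"
}
```

After this, fix whatever rollforwarddb complained about and try again.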
Lost Table Table can be recovered From table checkpoint only Enforce logical consistency Journaling must be enabled 45
Table Checkpoints - Examples Command line Checkpoint table t1 ckpdb dbname -table=t1 Checkpoint table t1 and t2 ckpdb dbname -table=t1,t2 46
Table Recovery - Examples From table checkpoint only Command line Recover table t1 rollforwarddb dbname -table=t1 Recover table t1 and t2 rollforwarddb dbname -table=t1,t2 47
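When the table list comes from a script, the comma-separated argument can be assembled rather than typed. This sketch prints a table-level checkpoint command without running it; the function name ckp_tables_cmd is hypothetical, and it assumes the -table= spelling of the flag, so check the Command Reference Guide for your release.

```shell
#!/bin/sh
# Sketch only: print (do not run) a table-level ckpdb command.

# ckp_tables_cmd DBNAME TABLE [TABLE...]   (hypothetical helper)
ckp_tables_cmd() {
    db="$1"; shift
    [ "$#" -gt 0 ] || { echo "need at least one table" >&2; return 2; }
    # Join the remaining arguments with commas
    old_ifs="$IFS"; IFS=,
    tables="$*"
    IFS="$old_ifs"
    printf 'ckpdb %s -table=%s\n' "$db" "$tables"
}
```

The same joining approach would serve for the matching rollforwarddb command.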
Housekeeping Ingres Infodb Checkpoints Dumps Journals 48
Infodb / aaaaaaaa.cnf Shows meta-data about database Locations Checkpoint sequence Valid / Invalid Dump / Journal sequence Counters Last table id Last valid checkpoint 49
Infodb / aaaaaaaa.cnf Info stored in aaaaaaaa.cnf Three copies Primary database location Dump location as aaaaaaaa.cnf Dump location as cxxxx.dmp Infodb reads CNF file in database area Copy to dump area with every change II_DUMP database own dump area 50
Checkpoint files Stored in 1 location II_CHECKPOINT Database defined checkpoint area One file for each location Format depends on archiver used 51
Dump files Changes during ONLINE checkpoint Required for recovery Single location II_DUMP Database defined dump area 52
Journal Files Record of changes Table configuration Facilitates point in time recovery Files stored in single location II_JOURNAL Database defined journal area 53
Backing up the backup files OFFLINE Checkpoint Database aaaaaaaa.cnf Dump aaaaaaaa.cnf Output from infodb Checkpoint Journals ONLINE Checkpoint All above Dump files 54
Cleaning up ckpdb -d All but the last checkpoint Dump, journal files deleted as well alterdb -delete_oldest_ckp Oldest checkpoint only Maintain set of checkpoints Dump, journal files deleted as well 55
Customisation cktmpl.def $II_SYSTEM/ingres/files Defines actions Before / During / After Tape Disk II_CKTMPL_FILE ingsetenv only Most common entries to change: WSDD: work phase of regular checkpoint WRDD: work phase of regular rollforward Some things you can do: add compression/decompression use a different utility (eg star instead of tar) wild and crazy stuff Test both checkpoint and restore after modifying the template 56
Issues To Consider Files Ingres supports large files OS archiver utility may not POSIX standard tar cpio 57
Tips and Cautions Hardware "solutions" aren't solutions "I don't need to back up, I have the magic solution of the moment" RAID 5, mirroring, whatever you aren't protected against software failures you aren't protected against human failures you aren't protected against disasters you may not be protected against multiple hardware failures you are putting all your eggs in one basket 58
Tips and Cautions Backups are no good if they don't work make sure that ckpdb works automatic verification is better than manual verification not ensuring that checkpoints are working may be the #1 cause of recovery failure Automate as much as possible error checking disk space checking old-checkpoint deletion 59
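Of the automation items above, disk space checking is the easiest to script. The sketch below uses the portable df -Pk output to test whether a directory's file system has at least a given number of kilobytes free. The function name enough_space and the idea of passing the required size as a parameter are assumptions made for illustration; how much space a checkpoint needs depends on the database.

```shell
#!/bin/sh
# Sketch only: refuse to start a checkpoint when the target is short of space.

# enough_space DIR NEED_KB   (hypothetical helper)
#   returns 0 if DIR's file system has at least NEED_KB kilobytes available
enough_space() {
    dir="$1"; need_kb="$2"
    # df -P guarantees one line per file system; column 4 is available KB
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge "$need_kb" ]
}
```

A nightly job might call enough_space on the checkpoint directory first and skip straight to alerting if it returns non-zero.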
Tips and Cautions A choice of checkpoints is better than just one avoid ckpdb -d (delete all prior checkpoints) alterdb -delete_oldest_ckp is better manual (or scripted) deletion of old checkpoints is often best maintains checkpoint history in the config file Keep as many checkpoints as you can gives you more recovery options don't skimp on checkpoint disk space (disks are cheap!) you can delete checkpoints but keep journals it's all on OS backups, right?? 60
Tips and Cautions Be wary of checkpointing to tape nasty, unreliable devices they are "oops, there wasn't a tape in the drive" if you must use tape, verify your backups regularly tape drives have been known to write unreadable tapes Keep checkpoint and dump locations together on the same file system or drive keep them on the same OS backup schedule checkpoints are worthless without the dump info 61
Tips and Cautions Practice is essential not just once, but regularly practice on look-alike installation if production is not available practice on production at least occasionally clean Ingres shutdown OS backup everything in sight verify the OS backup, then run your recovery tests you need hardware resources to support your recovery practice 62
Tips and Cautions Document your recovery procedures let someone else do a trial recovery keep the procedures up to date make sure that more than one person knows how to do a recovery make sure that more than one person knows where to find the documentation keep a copy offsite or in a safe place 63
Tips and Cautions Backing up and archiving are different a backup has a short useful lifetime an archive (unload) is good indefinitely Backup planning and disaster recovery planning are different recoverable backups are just one aspect of a complete disaster recovery plan 64
More Information Ingres DBA guide Chapter 15 (2.6) Ingres Command Reference Guide Compressed Checkpoints Servicedesk Doc ID 409751 65
Summary Backups deserve more than lip service Ensuring 100% recoverable backups takes time, effort, and money Ingres checkpoint and rollforward capabilities are simple yet powerful and customisable With proper practice and procedures, a recovery is nothing to be afraid of 66
Questions & Answers? 67