Big SQL v3.0 Metadata Backup and Resiliency

Raanon Reutlinger
IBM Big SQL Development, WW Customer Engineering
February 2015
Contents

I. Introduction
II. First Steps
III. Performing Offline Backups
IV. Enabling ONLINE Backups
V. Setup Automatic Online Backups
VI. Restore the Databases from a Backup
VII. Purge Old Backups and Archive Logs
VIII. Redundancy of the Catalog with HADR
IX. Summary
X. [Addendum] Detailed Table of Contents
I. Introduction

Big SQL is the SQL-on-Hadoop component of IBM InfoSphere BigInsights v3.0. Big SQL relies on other BigInsights components, such as the Hadoop Distributed File System (HDFS) and Hive (HCatalog). While the BigInsights product supplies graphical tools for everything from installation to monitoring, exploring and developing, managing the resiliency and backup of Big SQL metadata requires different tools. This guide will help simplify the steps involved.

Although some Graphical User Interfaces (GUIs) can perform many of the steps in this guide (e.g., Eclipse, Data Studio), we will demonstrate the commands using the bash command line in order to minimize dependencies on outside tools. On the cluster head/management node, open a number of SSH sessions (for example, using PuTTY), so that you can stay logged in as the various users required to perform the commands. For the few steps which should be performed on a node other than the head/management node, this will be indicated and the commands will appear in a box with a black background.

In this section we'll identify where the Big SQL metadata resides, as well as develop a metadata resiliency strategy. Throughout this guide, look for the Best Practices icon.

A. Understanding the relationship between Data and Metadata

The data for your Big SQL database resides in the Hadoop infrastructure, on HDFS. Backup of HDFS data is not within the scope of this guide. Rather, this guide strictly covers Big SQL metadata. (The term "catalog" is sometimes used to refer to metadata.)

Big SQL uses the Hive catalog services (HCatalog) to register table metadata, such as:

- Schema ("database") names
- Table and column names and data types
- Location of the table data files in HDFS
- Table file storage format
- Encoding of input files, how many files there are, types of files
- Basic permissions

The Hive metadata is saved in what is called a Metastore, which is a collection of regular tables managed by the database of the BigInsights Catalog component.
Other Hadoop components, such as BigSheets and oozie, also store their metadata in the Catalog database.

Since the Big SQL tables are defined in the Hive Metastore, they can also be accessed directly through Hive, without even passing through Big SQL. The advantage of accessing the tables via the Big SQL service comes from the superior query optimizer and the efficient direct I/O mechanism which bypasses MapReduce.

When you create a Hadoop table with Big SQL, after creating it in the Hive Metastore with HCatalog, Big SQL also saves metadata about the table in its own database. Maintaining the metadata in Big SQL not only gives the query optimizer quicker access to it; Big SQL also maintains additional statistics about the tables which aren't collected by Hive. This helps the optimizer further by allowing it to reach more informed decisions. The Big SQL metadata also contains details on advanced features provided only in Big SQL, such as extended security role definitions (FGAC), workload management (WLM) classes, and stored procedures and functions, to name just a few.

The Big SQL metadata is located in a DB2 database, and the Catalog component also uses a DB2 database, although a different one. These databases reside in entirely separate DB2 instances on the head/management node. While the Catalog and Big SQL metadata are located only on the head/management node, the Big SQL instance is actually a powerful multi-node MPP configuration, with worker processes located on all or a subset of the Hadoop data nodes used by HDFS.

Since the metadata resides in a DB2 database in both cases, we will be using DB2 commands and tools.

B. Strategy for Resiliency

The strategy for achieving resiliency of the metadata in the Big SQL and Catalog databases which we'll cover in this guide revolves around taking regular backups, as well as setting up redundancy on another server.

Redundancy will be achieved by using DB2's High Availability Disaster Recovery (HADR) feature to maintain a duplicate of the database on another server. At any given time,
the primary database can be stopped, whether due to a planned or unplanned event, and the BigInsights system will still be able to continue by using the standby database. We will demonstrate this here only for the Catalog. The option to set up HADR for the Big SQL database will be available in an upcoming release of BigInsights, so it will be described more fully in a future or updated guide. (This guide was written for BigInsights 3.0.)

Backups can be used to return the contents of a database to a certain point-in-time, perhaps to return to a point before a human or technical error, or to set up a new system on a different server. If backups are made to a local filesystem (as demonstrated in this guide), then be sure to keep a copy of the backup at a remote location, in case the local storage becomes inaccessible.

Backups can be offline, which means that users will be unable to access BigInsights services during the backup, or online, which provides better service but introduces some additional considerations. For example, online backups require the maintenance of database log files (discussed further below).

When performing backups, offline or online, or when restoring a backup, care must be taken that the Catalog and Big SQL databases remain in sync, since Big SQL shares metadata with the Hive Metastore tables located in the Catalog. (Strictly speaking, since we're talking only about metadata, any definitions made in Big SQL, such as security roles, WLM classes, procedures, etc., can all be recreated without affecting the core of the database, which is its data. Even the schema and table metadata in Big SQL can be recreated from the Hive Metastore using the HCAT_SYNC_OBJECTS() procedure. However, the recreated table metadata will not contain any of the ANALYZE statistics, which can be time-consuming to rebuild, and care must be taken to use the same userid as the user who created the table. Likewise, to avoid the headache of tracking down all of your other definitions made to the Big SQL database, it's much easier to simply keep your backups up to date.)

Performing an offline backup requires the fewest changes to the system, so we will start with that. Performing an online backup requires a few more changes, and HADR requires setting up a new database on another server, so we've saved that for last. In addition, setting up HADR will require you to have taken an online backup, and preparing for online backups will require you to perform at least one offline backup.

Part of every recovery strategy should include periodic testing of the backups, before disaster strikes, so we'll demonstrate those steps as well.
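For reference, a call to that procedure might look like the sketch below. This is a hedged example: the schema name MYSCHEMA is hypothetical, and the argument list ('a' to sync all object types, REPLACE existing definitions, CONTINUE past errors) should be verified against the Big SQL documentation for your release.

[bigsql@testmg1 ~]$ db2 CONNECT TO bigsql
[bigsql@testmg1 ~]$ db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('MYSCHEMA', '.*', 'a', 'REPLACE', 'CONTINUE')"
[bigsql@testmg1 ~]$ db2 terminate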
II. First Steps

A. Find the instance and database names

Since the names of the DB2 instances used for the Catalog and Big SQL components can be entered by the user who first installed BigInsights, the first step will be to identify those names. We will look for the user name which was created to manage (own) each DB2 instance.

As a reminder, Figure 1 shows the Components panel of the BigInsights installation GUI where the names may have been entered. This part of the page refers to the Catalog component. The type of Database should have been left as "Use DB2 database", and the DB2 instance owner has a default value of catalog.

Figure 1: Panel from Installation GUI

Further down that same page, you'll find the instance name for Big SQL, with the default value of bigsql.

Figure 2: Panel from Installation GUI, cont'd
In order to determine the user/instance names which were chosen during installation, find the configuration file $BIGINSIGHTS_HOME/conf/install.xml. Look under the XML tag hierarchy shown below for the value of <username> for both the BigSQL and Catalog properties.

<security>
  <service-security>
    <BigSQL>
      <username>bigsql</username>
      <uid>222</uid>
    </BigSQL>
    <Catalog>
      <username>catalog</username>
      <uid>224</uid>
      <password>{xor}nj0ybjy9mg==</password>
    </Catalog>

To confirm that DB2 was chosen as the type of database for the Catalog, look for another block starting with <Catalog> and find the <catalog-type>, as seen below.

<Catalog>
  <configure>true</configure>
  <catalog-type>db2</catalog-type>
  <node>testmg1.iic.il.ibm.com</node>
  <port>50000</port>
</Catalog>

Another way to confirm the Catalog user name (DB2 instance) is by checking the value of the USER_CATALOG environment variable (this should be defined for every user, since the biginsights-env.sh script is placed in the /etc/profile.d directory).

[root@testmg1 ~]# echo $USER_CATALOG
catalog

Next, let's find the name of the Catalog database itself (within the instance). The name of the database is currently a constant, BIDB, but the next steps will show you where that value is stored. There are two methods to find the database name. The first is to open the Hive configuration file $BIGINSIGHTS_HOME/hive/conf/hive-site.xml and search for the property containing ConnectionURL in its name. You'll find the database name at the end of the JDBC connection URL value.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:db2://testmg1.iic.il.ibm.com:50000/BIDB</value>
</property>
Another method is to open a bash session as the DB2 instance owner, catalog, and use the DB2 command LIST DB DIRECTORY to list the available databases (there will be only one), as seen below. (Note: if you are not using su from the root user, you will need to know the password for the catalog user.)

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2 LIST DB DIRECTORY

System Database Directory

Number of entries in the directory = 1

Database 1 entry:

Database alias                    = BIDB
Database name                     = BIDB
Local database directory          = /var/ibm/biginsights/database/db2
Database release level            =
Comment                           =
Directory entry type              = Indirect
Catalog database partition number = 0
Alternate server hostname         =
Alternate server port number      =

Note: DB2 commands can be entered in upper or lower case, although the db2 prefix to the command must always be in lower case.

The name of the Big SQL database is also currently a constant, BIGSQL, but it too can be confirmed as shown.

[root@testmg1 ~]# su - bigsql
[bigsql@testmg1 ~]$ db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:

Database alias                    = BIGSQL
Database name                     = BIGSQL
Local database directory          = /media/data1/var/ibm/biginsights/database/bigsql
Database release level            =
Comment                           =
Directory entry type              = Indirect
Catalog database partition number = 0
Alternate server hostname         =
Alternate server port number      =

For the remainder of this document we will assume the following names:

Component   DB2 Instance (user)   Database name
Catalog     catalog               BIDB
Big SQL     bigsql                BIGSQL
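If you prefer to script this discovery step, hedged one-liners like the following can pull the same values out of the configuration files. They assume the file layouts shown above (the <username> tag on the line directly after the component tag), so treat them as a sketch rather than a robust XML parser.

[root@testmg1 ~]# grep -A1 '<BigSQL>' $BIGINSIGHTS_HOME/conf/install.xml | grep '<username>'
<username>bigsql</username>
[root@testmg1 ~]# grep -A1 '<Catalog>' $BIGINSIGHTS_HOME/conf/install.xml | grep '<username>'
<username>catalog</username>
[root@testmg1 ~]# grep -A1 'ConnectionURL' $BIGINSIGHTS_HOME/hive/conf/hive-site.xml | grep '<value>' | sed 's/.*\///; s/<.*//'
BIDB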
B. Find the size of the database

Before we take a backup of the database, it would be a good idea to know how much room we will need. (It might also be useful to know the size of the database in order to ensure that there is enough available space to grow on the storage path where it resides.)

A quick-and-dirty way to determine the size of the database might be to use the du (disk usage) unix command with the "Local database directory" path returned by the db2 LIST DB DIRECTORY command used above. However, this can give an inaccurate picture, as this path includes files which will not be included in a database backup and may be missing other paths which will be.

The more accurate way to determine the amount of storage used by the database (as well as the capacity remaining on the storage where it resides) is to use a built-in DB2 stored procedure called GET_DBSIZE_INFO(). You can use this procedure on both the BIDB and BIGSQL databases.

[catalog@testmg1 ~]$ db2 CONNECT TO bidb

Database Connection Information

Database server      = DB2/LINUXX
SQL authorization ID = CATALOG
Local database alias = BIDB

[catalog@testmg1 ~]$ db2 "CALL GET_DBSIZE_INFO(?,?,?, -1)"

Value of output parameters

Parameter Name  : SNAPSHOTTIMESTAMP
Parameter Value :

Parameter Name  : DATABASESIZE
Parameter Value :

Parameter Name  : DATABASECAPACITY
Parameter Value :

Return Status = 0

[catalog@testmg1 ~]$ db2 terminate
DB20000I The TERMINATE command completed successfully.

The value returned for the DATABASESIZE parameter is in bytes. You can read more about this stored procedure at 01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.rtn.doc/doc/r html?cp=ssepgg_10.5.0%2f &lang=en

Remember to issue the db2 TERMINATE command to close your connection to the database (otherwise, further steps below may not work as expected).
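If you just want the size as a single number (for example, inside a monitoring script), a hedged sketch like the following extracts the DATABASESIZE value and converts it to megabytes. It assumes the output format shown above, where the value appears on the line after the parameter name.

[catalog@testmg1 ~]$ db2 CONNECT TO bidb > /dev/null
[catalog@testmg1 ~]$ db2 "CALL GET_DBSIZE_INFO(?,?,?, -1)" | grep -A1 'DATABASESIZE' | awk '/Parameter Value/ {printf "%.1f MB\n", $4/1048576}'
[catalog@testmg1 ~]$ db2 terminate > /dev/null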
C. [Optional] Free up space by purging old oozie log records

The workflow component of BigInsights (Hadoop), called oozie, stores its logs inside the Catalog database. Oozie already provides a mechanism for automatic purges of this log, but the default is to keep 180 days' worth. Since we will be making multiple copies of this database in the form of regular backups, it would be prudent to keep the database as small as possible by keeping fewer log records. This is done by changing the oozie.service.PurgeService.older.than property in the $BIGINSIGHTS_HOME/hdm/components/oozie/conf/oozie-site.xml configuration file. For example:

<property>
  <name>oozie.service.PurgeService.older.than</name>
  <value>30</value>
  <description>
    Jobs older than this value, in days, will be purged by the PurgeService.
  </description>
</property>

Then propagate the change as user biadmin by running:

syncconf.sh oozie

You can check the current records in the oozie job log with the command:

$BIGINSIGHTS_HOME/oozie/bin/oozie jobs

For more details on this topic, see 01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/upgr_prep_upgrade_tasks.html (the cleanupjobs step).

To further save on space, you can use the DB2 REORG command to recover the space left over from the rows which were deleted (purged) from the tables (this is similar to defragmenting). Here is a script you can create, and an example of how to run it as user catalog.

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ vi oozie_reorg.sql
[catalog@testmg1 ~]$ cat oozie_reorg.sql
CONNECT TO bidb;
REORG TABLE oozie.coord_actions;
REORG TABLE oozie.coord_jobs;
REORG TABLE oozie.sla_events;
REORG TABLE oozie.wf_actions;
REORG TABLE oozie.wf_jobs;
TERMINATE;
[catalog@testmg1 ~]$ db2 -tvf oozie_reorg.sql
III. Performing Offline Backups

If you can allow yourself a maintenance window to bring the BigInsights system offline, then this is certainly the easiest method to perform backups. You will need to be sure to bring the system down on a regular basis in order to keep the backups up to date.

If you plan on implementing online backups and/or an HADR solution, then you can skip this section for now and move on to <IV-Enabling ONLINE Backups>. In any case, you will return to this section to perform an offline backup after making some configuration changes.

There are many options for performing a DB2 backup which will not be discussed here, such as choosing a target device for your backup (tape, TSM or third-party backup managers).

Since we need to keep the content of both metadata databases in sync, we will back up both of them while the BigInsights system is down.

A. Stop BigInsights, but restart the DB2 instances

Stopping all of the BigInsights components will disconnect any connections to the databases, whether made by users or made internally by other BigInsights components. Once that completes, we will need to restart the DB2 instances to allow us to perform some commands, but there shouldn't be any connections to the databases.

1. Stop BigInsights

Note that this is performed as the biadmin user.

[root@testmg1 ~]# su - biadmin
[biadmin@testmg1 ~]$ stop.sh all
...
[INFO] Progress - 100%
[INFO] DeployManager - Stop; SUCCEEDED components: [alert, httpfs, console, oozie, bigsql, hive, hbase, catalog, hadoop, zookeeper, hdm]; Consumes : 81863ms

2. Restart the catalog instance

This is done as the catalog user.

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2start
SQL1063N DB2START processing was successful.
3. Restart the bigsql instance

This is done as the bigsql user. Note that there will be a response from each of the Big SQL worker nodes (data nodes).

[root@testmg1 ~]# su - bigsql
[bigsql@testmg1 ~]$ db2start
01/28 :57:   SQL1063N DB2START processing was successful.
01/28 :57:   SQL1063N DB2START processing was successful.
01/28 :57:   SQL1063N DB2START processing was successful.
01/28 :57:   SQL1063N DB2START processing was successful.
01/28 :57:   SQL1063N DB2START processing was successful.
01/28 :57:   SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

B. Perform an OFFLINE database backup

In our example, we will back up the database to a path on a directly attached disk which is mounted at /media/data1. Be sure to choose a path on a storage device which has sufficient space for multiple database backups.

Be sure to perform these steps after <III.A-Stop BigInsights, but restart the DB2 instances>.

1. Create the backup target directory for BIDB

We will create the DB2Backups directory as the catalog user, but then give the biadmin group read-write permissions to it. This is so we can use the same directory for the bigsql user, who is in the same group.

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ mkdir -p /media/data1/DB2Backups/catalog
[catalog@testmg1 ~]$ chmod g+rw /media/data1/DB2Backups
[catalog@testmg1 ~]$ ls -ld /media/data1/DB2Backups/
drwxrwxr-x 3 catalog biadmin 4096 Jan 25 16:55 /media/data1/DB2Backups/

2. Backup the BIDB database

We are now ready to back up the database. This should only take a few seconds (8 seconds in our test).

[catalog@testmg1 ~]$ db2 BACKUP DB bidb TO /media/data1/DB2Backups/catalog

Backup successful. The timestamp for this backup image is :
3. [Optional] Use the COMPRESS argument

You might be interested in using another backup option to compress the backup image. Purely to demonstrate the compression, we'll take another backup with the COMPRESS option.

[catalog@testmg1 ~]$ db2 BACKUP DB bidb TO /media/data1/DB2Backups/catalog COMPRESS

Backup successful. The timestamp for this backup image is :

[catalog@testmg1 ~]$ ls -l /media/data1/DB2Backups/catalog
total
rw  catalog biadmin  Jan 25 17:22 BIDB.0.catalog.DBPART
rw  catalog biadmin  Jan 25 17:25 BIDB.0.catalog.DBPART

We see here that the compressed backup is about 9% of the size of the non-compressed backup.

4. Create the backup target directory for BIGSQL

Unlike the catalog instance, which resides on a single node, the bigsql instance is a multi-node implementation, residing on the worker/data nodes and the head/management node. So, we need to create a target backup directory on each node in the Big SQL cluster. DB2 provides the db2_all utility to execute a command on every node of the cluster.

[bigsql@testmg1 ~]$ db2_all "mkdir -p /media/data1/DB2Backups/bigsql; chmod g+rw /media/data1/DB2Backups; ls -ld /media/data1/DB2Backups/bigsql" | egrep -v "completed ok"
chmod: changing permissions of `/media/data1/DB2Backups': Operation not permitted
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql
drwxr-xr-x 2 bigsql biadmin 4096 Feb 4 16:20 /media/data1/DB2Backups/bigsql

The grep used here is just to shorten the output a bit. (You can ignore the chmod error, since that directory was created by catalog on the head node in the previous step.)
5. Backup the BIGSQL database

Backing up the BIGSQL database is almost the same as for the BIDB database, but the backup will actually take place in parallel on each node.

[bigsql@testmg1 ~]$ db2 BACKUP DB bigsql ON ALL DBPARTITIONNUMS TO /media/data1/DB2Backups/bigsql

Part Result
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is :

[bigsql@testmg1 ~]$ db2_all "ls -l /media/data1/DB2Backups/bigsql" | egrep -v "total|completed ok"
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART
-rw  bigsql biadmin  Jan 28 20:27 BIGSQL.0.bigsql.DBPART

Notice that the size of the backup on the first node is considerably larger than on the other nodes. This is because the head node is where all of the metadata resides, while no real data resides within DB2 on the other nodes (the data resides in HDFS). (The backup image size of 32 MB on the other nodes can be accounted for as the space which gets pre-allocated for a database.)

C. Restart BigInsights

If you will be sticking with an OFFLINE backup solution, then you can restart BigInsights now. If you plan on continuing with any of the further steps, then keep BigInsights in the down state and skip this step.

Restart all BigInsights components using the biadmin user.

[root@testmg1 ~]# su - biadmin
[biadmin@testmg1 ~]$ start.sh all
...
[INFO] Progress - 100%
[INFO] DeployManager - Start; SUCCEEDED components: [hdm, zookeeper, hadoop, catalog, hbase, hive, bigsql, oozie, console, httpfs, alert]; Consumes : ms
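To keep regular offline backups without repeating the steps above by hand, the whole sequence can be scripted. The sketch below is a hypothetical wrapper (the script name and the su approach are ours, not part of BigInsights); it assumes the paths and users from this guide and would be run as root on the head/management node.

[root@testmg1 ~]# cat backup_offline.sh
#!/bin/bash
# Hypothetical wrapper: offline backup of both metadata databases.
# Assumes the DB2Backups directories from section III.B already exist.
set -e
su - biadmin -c "stop.sh all"                     # disconnect all users and components
su - catalog -c "db2start; db2 BACKUP DB bidb TO /media/data1/DB2Backups/catalog"
su - bigsql  -c "db2start; db2 BACKUP DB bigsql ON ALL DBPARTITIONNUMS TO /media/data1/DB2Backups/bigsql"
su - biadmin -c "start.sh all"                    # bring BigInsights back online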
IV. Enabling ONLINE Backups

Enabling ONLINE backups will allow you to do any of the following:

- Perform online backups manually or with an automated script (e.g., using cron).
- Configure DB2 to maintain automatic online backups.
- Configure DB2 High Availability Disaster Recovery (HADR).
- Recover to a point-in-time AFTER the latest backup.

In order for DB2 to allow online backups, it must be configured to ensure that in-flight transactions occurring during a backup are always saved to an archive transaction log location. The default mechanism for transaction log files is called circular logging, which means that completed log files are reused (overwritten). With archive logging, completed log files are copied to a safe location in case they are needed later for recovery. Aside from allowing online backups, archive logging will also allow recovery (restore) to any point-in-time which can be found in the logs.

Remember that the actual data in a Big SQL and Hive database resides in Hadoop's HDFS, which doesn't log transactional changes to DB2's log files. So, the transactions written to DB2's log files will revolve entirely around changes to metadata. We can expect the volume of metadata changes to be quite small, certainly in relation to the entire Big Data environment.

Once we enable archive logging, we will be able to perform online backups, but only after completing an initial offline backup. So, if you haven't already done so, start by bringing down BigInsights by following the steps in <III.A-Stop BigInsights, but restart the DB2 instances>.

A. Enable ARCHIVE logging

Let's use the catalog user to inspect the current configuration parameters relevant to logging (you can do this for bigsql as well).

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2 get db cfg for bidb | egrep 'Path to log files|LOGARCHMETH1'
Path to log files = /var/ibm/biginsights/database/db2/catalog/NODE0000/SQL00001/LOGSTREAM0000/
First log archive method (LOGARCHMETH1) = OFF

Here we see that active logs are being written to the long path under /var (the output has wrapped to the next line) and that archive logging is currently OFF, which is the default state.

For this example, we've chosen to keep archived logs on a local filesystem (by using the DISK keyword, below). One of the implications of this is that we should consider how to regularly remove unneeded log files from that path. An automated method to do this is explained in <VII-Purge Old Backups and Archive Logs>.
1. Enable ARCHIVE logging for BIDB

After creating a directory for the archived logs on another filesystem (and giving it group read-write access, as before), we can activate archive logging to use it.

[catalog@testmg1 ~]$ mkdir /media/data1/DB2ArchiveLogs
[catalog@testmg1 ~]$ chmod g+rw /media/data1/DB2ArchiveLogs
[catalog@testmg1 ~]$ db2 UPDATE DB CFG FOR bidb USING LOGARCHMETH1 'DISK:/media/data1/DB2ArchiveLogs'
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
[catalog@testmg1 ~]$ db2 get db cfg for bidb | egrep 'Path to log files|LOGARCHMETH1'
Path to log files = /var/ibm/biginsights/database/db2/catalog/NODE0000/SQL00001/LOGSTREAM0000/
First log archive method (LOGARCHMETH1) = DISK:/media/data1/DB2ArchiveLogs/

2. Enable ARCHIVE logging for BIGSQL

Once again, we will create the log archive directory on each node of the cluster (ignoring the error that it already exists on the local node).

[bigsql@testmg1 ~]$ db2_all "mkdir /media/data1/DB2ArchiveLogs; chmod g+rw /media/data1/DB2ArchiveLogs" | egrep -v "completed ok"
mkdir: cannot create directory `/media/data1/DB2ArchiveLogs': File exists
chmod: changing permissions of `/media/data1/DB2ArchiveLogs': Operation not permitted
[bigsql@testmg1 ~]$ db2 UPDATE DB CFG FOR bigsql USING LOGARCHMETH1 'DISK:/media/data1/DB2ArchiveLogs'
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

The DB2 command applies the change to the database as a whole, despite the instance residing on multiple nodes. Although we used the same DB2ArchiveLogs path for both databases, BIDB and BIGSQL, the logging mechanism will keep them separate by creating subdirectories for each instance and database.

B. [Optional] Further changes to log files

Since we're making some changes to database log file parameters, there is another parameter which we recommend modifying while the system is already stopped. The modification is not related to backups, but rather to the ability to perform larger transactions (changes to the metadata database). It has been observed that some Big SQL commands which modify a large quantity of metadata in the Hive Metastore may report that the catalog BIDB database transaction log has been exhausted.

The default size of the transaction logs allows for approximately 100 MB worth of transaction log records. This is calculated by looking at the log file size and the number of primary and secondary log files allowed in a single transaction.

[catalog@testmg1 ~]$ db2 get db cfg for bidb | egrep 'LOGFILSIZ|LOGPRIMARY|LOGSECOND'
Log file size (4KB)                 (LOGFILSIZ) = 1024
Number of primary log files        (LOGPRIMARY) = 13
Number of secondary log files       (LOGSECOND) = 12
[catalog@testmg1 ~]$ echo "1024 * 4096 * (13 + 12) / 1024^2" | bc
100
By increasing the value of LOGSECOND, we can allow the size of a transaction to grow only when it's absolutely needed, without the need to pre-allocate that space (LOGPRIMARY indicates pre-allocated log files).

[catalog@testmg1 ~]$ db2 update db cfg for bidb using LOGSECOND 100
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

(We increased the capacity here by about 4.5 times.)

C. [Optional] Capping diagnostic log size (DIAGSIZE)

Another configuration change you might want to make while BigInsights is stopped, which can simplify DB2 maintenance, is to cap the size of the DB2 diagnostic logs. By default, all diagnostic messages are written to the files db2diag.log and <instance>.nfy, located in the sqllib/db2dump directory under the instance home directory. Unchecked, these files can grow indefinitely. To avoid the need to occasionally truncate the log files, you can set the DIAGSIZE configuration parameter to the number of megabytes you are willing to allocate for them. You can do this for both the catalog and bigsql instances (users).

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2 GET DBM CFG | grep DIAGSIZE
Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 0
[catalog@testmg1 ~]$ db2 UPDATE DBM CFG USING diagsize 1024
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully.

You can read more about this parameter at 01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.config.doc/doc/r html?cp=ssepgg_10.5.0%2f &lang=en.

D. Perform the initial OFFLINE backup

Once archive logging is configured, you are required to perform an initial offline backup in order to establish a starting point for your database recoverability. If you try to connect to the database now, you will see the following message:

[catalog@testmg1 ~]$ db2 CONNECT TO bidb
SQL1116N A connection to or activation of database "BIDB" failed because the database is in BACKUP PENDING state. SQLSTATE=57019

Follow the steps in <III.B-Perform an OFFLINE database backup> to perform the offline backup for both metadata databases, BIDB and BIGSQL.
After the offline backup, the CONNECT statement should succeed. Rather than wait for completed log files to be archived in due course, you can test the archive logging as seen below:

[catalog@testmg1 ~]$ db2 BACKUP DB bidb TO /media/data1/DB2Backups/catalog

Backup successful. The timestamp for this backup image is :

[catalog@testmg1 ~]$ db2 CONNECT TO bidb

Database Connection Information

Database server      = DB2/LINUXX
SQL authorization ID = CATALOG
Local database alias = BIDB

[catalog@testmg1 ~]$ db2 terminate
DB20000I The TERMINATE command completed successfully.
[catalog@testmg1 ~]$ db2 ARCHIVE LOG FOR DB bidb
DB20000I The ARCHIVE LOG command completed successfully.
[catalog@testmg1 ~]$ find /media/data1/DB2ArchiveLogs/
/media/data1/DB2ArchiveLogs/
/media/data1/DB2ArchiveLogs/catalog
/media/data1/DB2ArchiveLogs/catalog/BIDB
/media/data1/DB2ArchiveLogs/catalog/BIDB/NODE0000
/media/data1/DB2ArchiveLogs/catalog/BIDB/NODE0000/LOGSTREAM0000
/media/data1/DB2ArchiveLogs/catalog/BIDB/NODE0000/LOGSTREAM0000/C

Note the subdirectories for instance and database that were created under the archive log directory.

E. [Optional] Restart BigInsights

If you wish, you can follow the steps in <III.C-Restart BigInsights>. But if you plan on continuing with any of the further steps (we will soon try restoring from a backup), then keep BigInsights in the down state.

To verify that backups can now be performed online, you can activate the database, which is similar to establishing a connection to it.

[catalog@testmg1 ~]$ db2 ACTIVATE DB bidb
DB20000I The ACTIVATE DATABASE command completed successfully.

(The reason we don't simply establish a connection as before is that the backup command in the next step would simply close it if run from the same session. So this would not really be a valid test of our online backup.)

F. Perform an ONLINE backup

As the catalog user, perform an online backup. Note, this will be necessary for setting up HADR.

[catalog@testmg1 ~]$ db2 BACKUP DB bidb ONLINE TO /media/data1/DB2Backups/catalog

Backup successful. The timestamp for this backup image is :
And similarly, as the bigsql user:

[bigsql@testmg1 ~]$ db2 ACTIVATE DB bigsql
DB20000I The ACTIVATE DATABASE command completed successfully.
[bigsql@testmg1 ~]$ db2 BACKUP DB bigsql ON ALL DBPARTITIONNUMS ONLINE TO /media/data1/DB2Backups/bigsql

Part Result
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.
DB20000I The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is :
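If you prefer to drive these online backups yourself on a schedule, a minimal sketch like the following could be run from root's crontab. The script name, log path and schedule are hypothetical; it assumes archive logging is enabled and the backup directories from section III exist.

[root@testmg1 ~]# cat /usr/local/bin/backup_metadata_online.sh
#!/bin/bash
# Hypothetical wrapper: online backup of both metadata databases,
# taken back-to-back to keep BIDB and BIGSQL roughly in sync.
set -e
su - catalog -c "db2 BACKUP DB bidb ONLINE TO /media/data1/DB2Backups/catalog"
su - bigsql  -c "db2 BACKUP DB bigsql ON ALL DBPARTITIONNUMS ONLINE TO /media/data1/DB2Backups/bigsql"
[root@testmg1 ~]# crontab -l
# Hypothetical schedule: every Sunday at 02:00
0 2 * * 0 /usr/local/bin/backup_metadata_online.sh >> /var/log/backup_metadata.log 2>&1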
V. Setup Automatic Online Backups

At this point you can write your own scripts to perform an online backup and perhaps automate them using cron, as sketched above. But DB2 comes with a built-in mechanism for automating backups, which is part of the DB2 Health Monitor. This monitor wakes up approximately every two hours to check whether the criteria have been met to initiate an online backup of the database.

When set up for both the BIDB and BIGSQL databases, these automatic backups can be seen as a safety net assuring that you have a backup at least whenever the criteria have been met. However, since it's important to keep the content of both databases in sync, you might still want to perform a backup of both databases regularly, or whenever major changes are made to Big SQL metadata.

You can read more about the automatic online backup mechanism at 01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c html?cp=SSEPGG_10.5.0%2F &lang=en (and its sub-topics).

These instructions are for the BIDB database, but you can do the same for BIGSQL.

A. Define an automatic backup policy

Before enabling the automatic backup mechanism, we will establish a backup policy by providing an XML file to the AUTOMAINT_SET_POLICYFILE() stored procedure. It is recommended to use the provided sample XML file as a starting point. Note that the sample file has only read permissions enabled, so we will have to enable our copy for writing. We will work in the sqllib/tmp directory, as this is the default location where the procedure looks for the XML file.

[catalog@testmg1 ~]$ cd sqllib/tmp
[catalog@testmg1 tmp]$ cp ../samples/automaintcfg/DB2AutoBackupPolicySample.xml bidb_db2autobackuppolicy.xml
[catalog@testmg1 tmp]$ chmod +w bidb_db2autobackuppolicy.xml
[catalog@testmg1 tmp]$ vi bidb_db2autobackuppolicy.xml

The sample XML file has many comments, so here is a diff showing the only two changes which we've made.

[catalog@testmg1 tmp]$ diff ../samples/automaintcfg/DB2AutoBackupPolicySample.xml bidb_db2autobackuppolicy.xml
64c64
<    <PathName/>
---
>    <PathName>/media/data1/DB2Backups/catalog</PathName>
117c117
<  <BackupCriteria numberOfFullBackups="1" timeSinceLastBackup="168" logSpaceConsumedSinceLastBackup="6400" />
---
>  <BackupCriteria numberOfFullBackups="1" timeSinceLastBackup="168" logSpaceConsumedSinceLastBackup="6144" />

Here is the full block of the first change:

<BackupOptions mode="online">
  <BackupTarget>
    <DiskBackupTarget>
      <PathName>/media/data1/DB2Backups/catalog</PathName>
    </DiskBackupTarget>

This block indicates that we have chosen to perform online backups to disk, and rather than use the default target location (located on the same storage as the database itself),
we will use the path that we created earlier on a different filesystem (with plenty of room).

The second change (on the line with BackupCriteria) tells DB2 to perform the backup in any of the following conditions:

- There must be at least 1 full backup.
- Perform a backup at least once a week (168 hours).
- Perform a backup if at least six log files (6 x 1024 pages) have been used (indicating changes to the database). (As seen in an earlier section, each log file is 1024 pages of 4096 bytes.)

Now connect to the BIDB database. After using the AUTOMAINT_GET_POLICYFILE() stored procedure to save the current policy to a file prefixed with orig_, use AUTOMAINT_SET_POLICYFILE() to load the new policy. Note, the policy will be saved without all of the comments of the XML file. As mentioned, the default location of the XML file referenced by the stored procedures is the ~/sqllib/tmp directory.

[catalog@testmg1 ~]$ cd sqllib/tmp
[catalog@testmg1 tmp]$ db2 CONNECT TO bidb

Database Connection Information

Database server      = DB2/LINUXX
SQL authorization ID = CATALOG
Local database alias = BIDB

[catalog@testmg1 tmp]$ db2 "call sysproc.automaint_get_policyfile( 'AUTO_BACKUP', 'orig_db2autobackuppolicy.xml')"
Return Status = 0
[catalog@testmg1 tmp]$ db2 "call sysproc.automaint_set_policyfile( 'AUTO_BACKUP', 'bidb_db2autobackuppolicy.xml')"
Return Status = 0

B. Enable the automatic backup policy

Check the current value of the AUTO_DB_BACKUP configuration parameter, which is part of a series of automated maintenance parameters. Make sure both of the parameters shown below are turned ON.

[catalog@testmg1 ~]$ db2 GET DB CFG for bidb | egrep 'AUTO_MAINT|AUTO_DB_BACKUP'
Automatic maintenance      (AUTO_MAINT) = ON
Automatic database backup  (AUTO_DB_BACKUP) = OFF
[catalog@testmg1 ~]$ db2 UPDATE DB CFG FOR bidb USING AUTO_DB_BACKUP ON
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
C. Check for the automatic backups

Instead of just periodically checking the target backup path for the appearance of new backup images, you can use the db2diag command to search the DB2 diagnostic log for messages from the automatic backup facility. (This is just a sample of the filtering that can be done with the db2diag command. This method is preferred over simply using grep on the db2diag.log file, since the entire block/entry is displayed.)

[catalog@testmg1 ~]$ db2diag -g "PROC=db2acd,FUNCTION:=hmonBkpBackupDBOnline" | more

I E431 LEVEL: Event
PID : TID : PROC : db2acd
INSTANCE: catalog NODE : 000 DB : BIDB
APPID : *LOCAL.catalog
HOSTNAME: testmg1.iic.il.ibm.com
FUNCTION: DB2 UDB, Health Monitor, hmonBkpBackupDBOnline, probe:500
START : Automatic job "Backup database online" has started on database BIDB, alias BIDB

I E381 LEVEL: Event
PID : TID : PROC : db2acd
INSTANCE: catalog NODE : 000
HOSTNAME: testmg1.iic.il.ibm.com
FUNCTION: DB2 UDB, Health Monitor, hmonBkpBackupDBOnline, probe:530
STOP : Automatic job "Backup database online" has completed successfully on database BIDB, alias BIDB

I E431 LEVEL: Event
PID : TID : PROC : db2acd
INSTANCE: catalog NODE : 000 DB : BIDB
APPID : *LOCAL.catalog
HOSTNAME: testmg1.iic.il.ibm.com
FUNCTION: DB2 UDB, Health Monitor, hmonBkpBackupDBOnline, probe:500
START : Automatic job "Backup database online" has started on database BIDB, alias BIDB

I E381 LEVEL: Event
PID : TID : PROC : db2acd
INSTANCE: catalog NODE : 000
HOSTNAME: testmg1.iic.il.ibm.com
FUNCTION: DB2 UDB, Health Monitor, hmonBkpBackupDBOnline, probe:530
STOP : Automatic job "Backup database online" has completed successfully on database BIDB, alias BIDB

This output shows two backups, two hours apart, though that frequency might be unusual in your environment. (In order to demonstrate this feature, we purposely created a lot of bogus metadata to force a new backup.)
VI. Restore the Databases from a Backup

It's always a good idea to test your recovery preparedness before disaster strikes. So, let's restore our databases from the backup images which we've taken. This should be done after completing <IV.F-Perform an ONLINE backup>, or at least <III.B-Perform an OFFLINE database backup>. Verify that you have a current backup by looking in the DB2Backups sub-directories.

A. Stop BigInsights

If you've started BigInsights, refer to <III.A-Stop BigInsights, but restart the DB2 instances>.

B. Restore the BIDB database

If restoring to a new server which didn't have this database before, you would follow the preparation steps described in <VIII.A-[Optional] Install BigInsights on the standby server>, <VIII.B-Install DB2 on the standby server> (which also creates the catalog instance) and <VIII.C.1-Create directories needed for the database>. But the examples here assume that you are replacing an existing database with the one from the backup image.

Choose the latest backup image from the DB2Backups/catalog directory and use the timestamp found in the second-to-last portion of the name to identify it with the TAKEN AT option of the RESTORE command.

[catalog@testmg1 ~]$ ls -l /media/data1/DB2Backups/catalog
total
rw  catalog biadmin  Jan 29 17:56 BIDB.0.catalog.DBPART
rw  catalog biadmin  Feb 4 14:13 BIDB.0.catalog.DBPART
rw  catalog biadmin  Feb 4 14:14 BIDB.0.catalog.DBPART
[catalog@testmg1 ~]$ db2 RESTORE DB bidb FROM /media/data1/DB2Backups/catalog TAKEN AT REPLACE EXISTING
SQL2539W The specified name of the backup image to restore is the same as the name of the target database. Restoring to an existing database that is the same as the backup image database will cause the current database to be overwritten by the backup version.
DB20000I The RESTORE DATABASE command completed successfully.

(If not replacing an existing database, you would replace the REPLACE EXISTING option with LOGTARGET DEFAULT. This extracts the transaction log which was active during the backup and copies it to the default location for active logs.)
If the backup chosen was an OFFLINE backup, in other words one taken without archive logging enabled, then the database would now be available for connections. However, if the image was of an ONLINE backup, then the database is now in what is called a rollforward pending state. This allows you to roll forward through all of the archive logs taken since the backup, in order to bring the database fully up to date (to the latest point-in-time). Here is how to check for the rollforward pending state and perform the ROLLFORWARD.

[catalog@testmg1 ~]$ db2 GET DB CFG FOR bidb | grep Rollforward
Rollforward pending = DATABASE
[catalog@testmg1 ~]$ db2 ROLLFORWARD DB bidb TO END OF LOGS AND STOP

Rollforward Status

Input database alias                   = bidb
Number of members have returned status = 1

Member ID                  = 0
Rollforward status         = not pending
Next log file to be read   =
Log files processed        = S LOG - S LOG
Last committed transaction = UTC

DB20000I The ROLLFORWARD command completed successfully.
[catalog@testmg1 ~]$ db2 GET DB CFG FOR bidb | grep Rollforward
Rollforward pending = NO

You can investigate on your own how to use the ROLLFORWARD command to select a different point-in-time.
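As a hedged illustration of that option (the timestamp here is hypothetical), a point-in-time rollforward uses the same command with an explicit timestamp instead of END OF LOGS; the chosen time must be no earlier than the end of the backup being restored:

[catalog@testmg1 ~]$ db2 "ROLLFORWARD DB bidb TO 2015-02-04-14.00.00.000000 USING LOCAL TIME AND STOP"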
C. Restore the BIGSQL database

As with backing up, the restore procedure is slightly different for the BIGSQL database, since it exists on multiple nodes. With a database of this sort, it's necessary to first restore the data on the management node; then the rest of the nodes can be restored in parallel.

[bigsql@testmg1 ~]$ db2 RESTORE DB bigsql FROM /media/data1/DB2Backups/bigsql TAKEN AT REPLACE EXISTING
SQL2539W The specified name of the backup image to restore is the same as the name of the target database. Restoring to an existing database that is the same as the backup image database will cause the current database to be overwritten by the backup version.
DB20000I The RESTORE DATABASE command completed successfully.
[bigsql@testmg1 ~]$ db2_all "<<-0<; db2 RESTORE DB bigsql FROM /media/data1/DB2Backups/bigsql TAKEN AT REPLACE EXISTING" | grep 'The RESTORE DATABASE command'
testdn1.iic.il.ibm.com: DB20000I The RESTORE DATABASE command completed successfully.
testdn2.iic.il.ibm.com: DB20000I The RESTORE DATABASE command completed successfully.
testdn4.iic.il.ibm.com: DB20000I The RESTORE DATABASE command completed successfully.
testdn5.iic.il.ibm.com: DB20000I The RESTORE DATABASE command completed successfully.
testdn3.iic.il.ibm.com: DB20000I The RESTORE DATABASE command completed successfully.
[bigsql@testmg1 ~]$ db2 ROLLFORWARD DB bigsql TO END OF LOGS ON ALL DBPARTITIONNUMS AND STOP

Rollforward Status

Input database alias                   = bigsql
Number of members have returned status = 6

Member ID  Rollforward status  Next log to be read  Log files processed  Last committed transaction
0          not pending                              S LOG-S LOG          UTC
1          not pending                              S LOG-S LOG          UTC
2          not pending                              S LOG-S LOG          UTC
3          not pending                              S LOG-S LOG          UTC
4          not pending                              S LOG-S LOG          UTC
5          not pending                              S LOG-S LOG          UTC

DB20000I The ROLLFORWARD command completed successfully.

The db2_all utility was used here again, but with a new notation before the actual DB2 RESTORE command. The notation starts with <<-0<, which means to run on all nodes EXCEPT node 0 (the management node), and is followed by a semicolon, which means that the command should be run in parallel. (Once again, grep was used to shorten the output displayed.)

D. Restart BigInsights

If you will NOT be continuing to the setup of HADR, refer to <III.C-Restart BigInsights>.
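Before restarting BigInsights, a quick sanity check can confirm that the restored metadata is readable. This is a hedged sketch (the count itself is only illustrative); it simply counts the non-system table definitions in the restored Big SQL catalog:

[bigsql@testmg1 ~]$ db2 CONNECT TO bigsql
[bigsql@testmg1 ~]$ db2 "SELECT COUNT(*) FROM SYSCAT.TABLES WHERE TABSCHEMA NOT LIKE 'SYS%'"
[bigsql@testmg1 ~]$ db2 terminate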
VII. Purge Old Backups and Archive Logs

Hopefully you've decided to set up archive logging and enable automatic online backups (and/or established your own automated backup scripts). However, left unchecked, the logs and backups will continue to fill up your storage. Proper maintenance dictates that you should now set up an automated purging mechanism to delete unneeded logs and backups.

Care should be taken, however, not to delete archive logs which could still be used by ROLLFORWARD after a database RESTORE. On the other hand, there's no need to keep archive logs which are older than your oldest backup image. Fortunately, DB2 provides an automated purging mechanism which can make all these decisions for you (Best Practices). Before setting that up, we'll show you how to retrieve the relevant information on your backups and archive logs (should you decide to write your own purge mechanism).

A. Using the LIST HISTORY command

You can get a report of the BACKUP and RESTORE commands which you've issued using the LIST HISTORY command, as follows (this has been truncated):

[catalog@testmg1 ~]$ db2 LIST HISTORY BACKUP ALL FOR bidb

List History File for bidb

Number of matching file entries = 11

Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
B  D                      F    D   S LOG        S LOG

Contains 3 tablespace(s):
SYSCATSPACE
USERSPACE
SYSTOOLSPACE
Comment: DB2 BACKUP BIDB OFFLINE
Start Time:
End Time:
Status: A
EID: 124 Location: /media/data1/DB2Backups/catalog

Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
B  D                      N    D   S LOG        S LOG

Contains 3 tablespace(s):
SYSCATSPACE
USERSPACE
SYSTOOLSPACE
Comment: DB2 BACKUP BIDB ONLINE
Start Time:
End Time:
Status: A
EID: 218 Location: /media/data1/DB2Backups/catalog

(You might be interested in a similar command, LIST HISTORY ARCHIVE LOG.)
B. Using the DB_HISTORY view

Another way to display the history, which allows you to be more selective about how and what information is shown, is to connect to the database and query an administrative view called DB_HISTORY.

[catalog@testmg1 ~]$ db2 CONNECT TO bidb

Database Connection Information

Database server      = DB2/LINUXX
SQL authorization ID = CATALOG
Local database alias = BIDB

[catalog@testmg1 ~]$ db2 "SELECT CHAR( operation, 1) oper, CHAR( operationtype, 1) type, start_time, num_log_elems, VARCHAR( firstlog, 12) firstlog, VARCHAR( lastlog, 12) lastlog FROM sysibmadm.db_history WHERE objecttype = 'D' AND operation = 'B' ORDER BY start_time"

OPER TYPE START_TIME NUM_LOG_ELEMS FIRSTLOG LASTLOG
B    F                             S LOG    S LOG
B    F                             S LOG    S LOG
B    F                             S LOG    S LOG
B    N                             S LOG    S LOG
B    N                             S LOG    S LOG
B    F                             S LOG    S LOG
B    N                             S LOG    S LOG
B    N                             S LOG    S LOG
B    N                             S LOG    S LOG
B    N                             S LOG    S LOG
B    N                             S LOG    S LOG

11 record(s) selected.

C. Setup automatic purge

DB2 provides a mechanism which allows you to establish how many backup images you wish to retain. This mechanism is also aware of which archive logs are relevant to the retained backup images. When a new backup is taken, it automatically purges the oldest image, its relevant archive logs, and even cleans out the records from the history tracking file. Here are the relevant configuration parameters and their default values:

[catalog@testmg1 ~]$ db2 GET DB CFG FOR bidb | egrep 'AUTO_DEL_REC_OBJ|REC_HIS_RETENTN|NUM_DB_BACKUPS'
Number of database backups to retain   (NUM_DB_BACKUPS) = 12
Recovery history retention (days)     (REC_HIS_RETENTN) = 366
Auto deletion of recovery objects    (AUTO_DEL_REC_OBJ) = OFF
Let's change these parameters to allow automatic purges and retain only the last three backups (and their relevant archived logs).

[catalog@testmg1 ~]$ db2 UPDATE DB CFG FOR bidb USING num_db_backups 3 rec_his_retentn 0 auto_del_rec_obj ON
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W One or more of the parameters submitted for immediate modification were not changed dynamically. For these configuration parameters, the database must be shutdown and reactivated before the configuration parameter changes become effective.

(You will see this warning message if you had opted to start BigInsights. In that case, follow the steps in <III.A-Stop BigInsights, but restart the DB2 instances> in order for the changes to take effect.)

For a full description of this feature, see "Automatic database recovery object maintenance" at 01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t html?lang=en.

After performing a new full online database backup, we can see that the purge has taken place, retaining only the newest backup and the last 2 of the 11 backups taken previously, together with only 7 of the 86 previously archived log files.

[catalog@testmg1 ~]$ db2 LIST HISTORY BACKUP ALL FOR bidb | head -5

List History File for bidb

Number of matching file entries = 11

[catalog@testmg1 ~]$ ls -l /media/data1/DB2Backups/catalog/BIDB* | wc -l
11
[catalog@testmg1 ~]$ find /media/data1/DB2ArchiveLogs/catalog -name \*.LOG | wc -l
86
[catalog@testmg1 ~]$ db2 BACKUP DB bidb ONLINE TO /media/data1/DB2Backups/catalog

Backup successful. The timestamp for this backup image is :

[catalog@testmg1 ~]$ db2 LIST HISTORY BACKUP ALL FOR bidb | head -5

List History File for bidb

Number of matching file entries = 3

[catalog@testmg1 ~]$ ls -l /media/data1/DB2Backups/catalog/BIDB* | wc -l
3
[catalog@testmg1 ~]$ find /media/data1/DB2ArchiveLogs/catalog -name \*.LOG | wc -l
7
VIII. Redundancy of the Catalog with HADR

The DB2 High Availability Disaster Recovery (HADR) mechanism provides a redundancy solution where a standby database can be located on a remote server and kept up to date with all the changes made to the primary database. We'll demonstrate how this can be set up with just a few steps, but we won't discuss all the features (such as the synchronization modes: SYNC, NEARSYNC, ASYNC and SUPERASYNC).

At any time, the standby database can become the active database, while the primary takes over the role of standby. If applications try to connect to the standby, they will automatically be redirected to the primary database (using Automatic Client Reroute). While it is possible to configure a mechanism to automatically force the standby to take over control when a failure is detected at the primary server (using Tivoli System Automation (TSA), which is included with DB2), setting that up is beyond the scope of this guide.

As mentioned in the Introduction, we will only demonstrate implementing HADR for the Catalog database, BIDB (containing the Hive Metastore). This is because a special HA feature is currently being worked on for the BIGSQL database, which will be released in an upcoming version of BigInsights. (This guide was written for BigInsights v3.0.)

When planning your BigInsights Hadoop cluster, one or more nodes should be set aside for High Availability or standby purposes, whether for the Catalog or other BigInsights components. It is NOT recommended to place your HADR standby on one of the data nodes (a.k.a. compute or worker nodes), as the additional activity can introduce an undesirable skew in your distributed cluster environment.

In this section, commands which should be run on the standby server will be shown in a box with a black background.

You can read more about HADR at 01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t html?lang=en.

A. [Optional] Install BigInsights on the standby server

You can skip this step if BigInsights is already installed on your standby server, for example because it already serves another High Availability role in the cluster.

If BigInsights is not installed, one simple way to install it is by using addnode.sh to expand the existing cluster to the new server and then removenode.sh to decommission it. This way you don't need to find your original installation image or worry about installation parameters which are irrelevant here. Even after the removenode.sh, the installation files, users and groups will all remain.
Notice we are using the biadmin user and adding a node called testmg2. (We recommend first setting up password-less SSH for the root user in both directions.)

[root@testmg1 ~]# su - biadmin
[biadmin@testmg1 ~]$ addnode.sh hadoop testmg2.iic.il.ibm.com
...
[INFO] Progress - 67%
[INFO] Deployer - Adding hadoop slaves [testmg2.iic.il.ibm.com]...
[INFO] DeployManager - Add hadoop nodes; SUCCEEDED components: [hadoop]; Consumes : ms
[biadmin@testmg1 ~]$ removenode.sh hadoop testmg2.iic.il.ibm.com
...
[INFO] Progress - 100%
[INFO] Deployer - Removed hadoop slaves [testmg2.iic.il.ibm.com]...
[INFO] DeployManager - Remove hadoop nodes; SUCCEEDED components: [hadoop, monitoring]; Consumes : ms

B. Install DB2 on the standby server

Even if your HA server already had BigInsights installed on it, the DB2 installation files would not have been deployed, so we will copy them over from the primary management node. We can ensure that DB2 and the standby instance are set up exactly as on the primary by using the same response files which were used during the initial installation of BigInsights. A response file provides a set of automated responses to the db2setup utility.

These commands are done on the standby server as the root user.

1. Copy the DB2 installation files to the standby server

[root@testmg2 ~]# cd $BIGINSIGHTS_HOME/database
[root@testmg2 database]# mkdir install
[root@testmg2 database]# scp -qr testmg1:$BIGINSIGHTS_HOME/hdm/components/db2/binary install
[root@testmg2 database]# scp -qr testmg1:$BIGINSIGHTS_HOME/hdm/components/db2/conf install
[root@testmg2 database]# scp -qr testmg1:$BIGINSIGHTS_HOME/database/install/conf/catalogresponse.properties install/conf
[root@testmg2 database]# cd install/binary
[root@testmg2 binary]# tar -zxf db2.tar.gz
[root@testmg2 binary]# rm db2.tar.gz
rm: remove regular file `db2.tar.gz'? y
[root@testmg2 binary]# cd ..
[root@testmg2 install]# mkdir logs
[root@testmg2 install]# chmod 777 logs
[root@testmg2 install]# chown -R biadmin:biadmin .
2. Install DB2 using the response file

This should be run from the $BIGINSIGHTS_HOME/database/install directory as the root user. (You can ignore the prerequisite warnings shown below.)

[root@testmg2 install]# binary/db2setup -l logs/response.log -r conf/response.properties
Requirement not matched for DB2 database "Server". Version: " ".
Summary of prerequisites that are not met on the current system:
DBT3514W The db2prereqcheck utility failed to find the following 32-bit library file: "/lib/libpam.so*".
DBI1191I db2setup is installing and configuring DB2 according to the response file provided. Please wait.
The execution completed successfully.
For more information see the DB2 installation log at "/opt/ibm/biginsights/database/install/logs/response.log".

3. Create the catalog instance

We will use db2setup with another response file to create the catalog instance.

[root@testmg2 install]# binary/db2setup -l logs/catalogresponse.log -r conf/catalogresponse.properties
Requirement not matched for DB2 database "Server". Version: " ".
Summary of prerequisites that are not met on the current system:
DBT3514W The db2prereqcheck utility failed to find the following 32-bit library file: "/lib/libpam.so*".
DBI1191I db2setup is installing and configuring DB2 according to the response file provided. Please wait.
The execution completed successfully.
For more information see the DB2 installation log at "/opt/ibm/biginsights/database/install/logs/catalogresponse.log".
[root@testmg2 install]# ../db2/instance/db2ilist
catalog

The db2ilist command lists the available instances and confirms that the catalog instance has been created. Now let's add the DB2 licenses to the installed product.

[root@testmg2 install]# ../db2/adm/db2licm -a conf/db2aese_tb.lic
LIC1402I License added successfully.
LIC1426I This product is now licensed for use as outlined in your License Agreement. USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/opt/ibm/biginsights/database/db2/license/en_US.iso88591"
[root@testmg2 install]# ../db2/adm/db2licm -a conf/isfs.lic
LIC1402I License added successfully.
LIC1426I This product is now licensed for use as outlined in your License Agreement. USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/opt/ibm/biginsights/database/db2/license/en_US.iso88591"
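The primary and standby should run the same DB2 version and fix pack level, so it's worth confirming that the freshly installed standby matches the primary. A hedged check (db2level is a standard DB2 command; the ssh invocation assumes the password-less root SSH mentioned earlier):

[root@testmg2 ~]# su - catalog -c db2level
[root@testmg2 ~]# ssh testmg1 'su - catalog -c db2level'

Compare the "Informational tokens" lines from the two outputs; they should be identical.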
C. Restore a copy of the database to the standby server

This should be done after completing <IV.F-Perform an ONLINE backup> for the BIDB database on the primary server. (This procedure won't work with an OFFLINE backup.)

1. Create directories needed for the database

Before restoring the BIDB database on the standby, we need to create all of the top level directories which it will rely on.

[root@testmg2 install]# mkdir -p /var/ibm/biginsights/database/db2
[root@testmg2 install]# chown -R catalog:biadmin /var/ibm/biginsights/database
[root@testmg2 install]# su - catalog
[catalog@testmg2 ~]$ mkdir /media/data1/db2archivelogs
[catalog@testmg2 ~]$ mkdir -p /media/data1/db2backups/catalog
[catalog@testmg2 ~]$ chmod g+rw /media/data1/db2backups
[catalog@testmg2 ~]$ exit

The first directory created here should match the Local database directory as seen in the LIST DB DIRECTORY command described in <II.A-Find the instance and database names>. (You can ignore errors about directories that already exist.)

2. Restore the database

Choose the latest ONLINE backup image of the BIDB database on the primary server and copy it to the same directory on the standby server. We used the root user here because it was simpler, as password-less SSH is configured for it, but make sure the backup image file is owned by catalog:biadmin.

[root@testmg2 install]# cd /media/data1/db2backups/catalog
[root@testmg2 catalog]# ssh testmg1 "ls -l /media/data1/db2backups/catalog"
total ...
-rw------- ... catalog biadmin ... Jan 29 17:56 BIDB.0.catalog.DBPART...
-rw------- ... catalog biadmin ... Feb  4 14:13 BIDB.0.catalog.DBPART...
-rw------- ... catalog biadmin ... Feb  4 14:14 BIDB.0.catalog.DBPART...
[root@testmg2 catalog]# scp -qr testmg1:/media/data1/db2backups/catalog/BIDB.0.catalog.DBPART... .
[root@testmg2 catalog]# chown catalog:biadmin *

Now, as user catalog, start the instance and restore the database.

[root@testmg2 catalog]# su - catalog
[catalog@testmg2 ~]$ db2start
SQL1063N DB2START processing was successful.
[catalog@testmg2 ~]$ db2 RESTORE DB bidb FROM /media/data1/db2backups/catalog TO /var/ibm/biginsights/database/db2
DB20000I The RESTORE DATABASE command completed successfully.
[catalog@testmg2 ~]$ db2 GET DB CFG FOR bidb | grep Rollforward
 Rollforward pending                                     = DATABASE

(Remember, if you have more than one backup image in the FROM directory, choose one using the TAKEN AT option, as described in <VI.B-Restore the BIDB database>.)

Since we restored an ONLINE backup, the database is now in a rollforward pending state. Rather than use the ROLLFORWARD command as we demonstrated earlier, we will leave the database in this state so that HADR can feed it new transactions as they come in.
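If the RESTORE instead complains about the image, you can check whether the copy itself is intact with DB2's db2ckbkp utility, which reads and validates a backup image without restoring it. A minimal sketch, assuming the image just copied is the only one in the directory:

[catalog@testmg2 ~]$ cd /media/data1/db2backups/catalog
# Pass every part of the image; the glob matches the single copied file here.
[catalog@testmg2 catalog]$ db2ckbkp BIDB.0.catalog.DBPART*
...
Image Verification Complete - successful.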
D. Configure HADR

The configuration of HADR for the primary and standby roles is done by crisscrossing references of some parameters: each server names itself in the LOCAL parameters and its partner in the REMOTE parameters. We suggest creating the following two scripts, one to be run on each server.

hadr_setup_primary.sql:

UPDATE DB CFG FOR bidb USING
  LOGINDEXBUILD ON
  HADR_SYNCMODE SYNC
  HADR_LOCAL_HOST testmg1
  HADR_LOCAL_SVC 50050
  HADR_REMOTE_HOST testmg2
  HADR_REMOTE_SVC 50051
  HADR_TARGET_LIST testmg2:50051
  HADR_REMOTE_INST catalog
;
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME testmg2 PORT ...
;
TERMINATE;

hadr_setup_standby.sql:

UPDATE DB CFG FOR bidb USING
  LOGINDEXBUILD ON
  HADR_SYNCMODE SYNC
  HADR_LOCAL_HOST testmg2
  HADR_LOCAL_SVC 50051
  HADR_REMOTE_HOST testmg1
  HADR_REMOTE_SVC 50050
  HADR_TARGET_LIST testmg1:50050
  HADR_REMOTE_INST catalog
;
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME testmg1 PORT ...
;
TERMINATE;

The values used for HADR_LOCAL_SVC and HADR_REMOTE_SVC (50050 and 50051 here) are service port numbers. You should verify that those ports are available on your servers (start by looking in /etc/services), although these defaults are usually available.

The ALTERNATE SERVER which is defined in the script allows a feature called Automatic Client Reroute (ACR) to redirect failed connections made to the primary server over to the standby server. Its PORT value is the instance's regular client connection port (the SVCENAME of the catalog instance), not one of the HADR service ports. It's defined on the standby server as well, in case their roles get reversed.

1. Stop BigInsights

If you've started BigInsights (on the primary server), refer to <III.A-Stop BigInsights, but restart the DB2 instances>.
2. Setup HADR on the primary and standby servers

As user catalog on the primary server (head/management node), run the script as follows:

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2 -tvf hadr_setup_primary.sql
UPDATE DB CFG FOR bidb USING LOGINDEXBUILD ON HADR_SYNCMODE SYNC HADR_LOCAL_HOST testmg1 HADR_LOCAL_SVC 50050 HADR_REMOTE_HOST testmg2 HADR_REMOTE_SVC 50051 HADR_TARGET_LIST testmg2:50051 HADR_REMOTE_INST catalog
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME testmg2 PORT ...
DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is refreshed.
DB20000I The TERMINATE command completed successfully.

And likewise on the standby server.

[root@testmg2 ~]# su - catalog
[catalog@testmg2 ~]$ db2 -tvf hadr_setup_standby.sql
UPDATE DB CFG FOR bidb USING LOGINDEXBUILD ON HADR_SYNCMODE SYNC HADR_LOCAL_HOST testmg2 HADR_LOCAL_SVC 50051 HADR_REMOTE_HOST testmg1 HADR_REMOTE_SVC 50050 HADR_TARGET_LIST testmg1:50050 HADR_REMOTE_INST catalog
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME testmg1 PORT ...
DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is refreshed.
DB20000I The TERMINATE command completed successfully.

E. Start HADR

Always start HADR on the standby server first, so it will be ready and waiting when HADR is started on the primary.

You can use the db2pd command below to track the changes in HADR_STATE while HADR is started on the primary. In this example the output is printed every 2 seconds to show the change in state. After starting HADR on the primary server, return to this window and hit Ctrl-C to stop the repeat.

[catalog@testmg2 ~]$ db2 START HADR ON DB bidb AS STANDBY
DB20000I The START HADR ON DATABASE command completed successfully.
[catalog@testmg2 ~]$ db2pd -repeat 2 -alldbs -hadr | grep HADR_STATE
HADR_STATE = REMOTE_CATCHUP_PENDING
HADR_STATE = REMOTE_CATCHUP_PENDING

Now start HADR on the primary server.

[catalog@testmg1 ~]$ db2 START HADR ON DB bidb AS PRIMARY
DB20000I The START HADR ON DATABASE command completed successfully.

Returning to the window on the standby server, we see that the state has changed, so you can hit Ctrl-C.

HADR_STATE = REMOTE_CATCHUP_PENDING
HADR_STATE = REMOTE_CATCHUP_PENDING
HADR_STATE = PEER
HADR_STATE = PEER
^C
[catalog@testmg2 ~]$
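With HADR up, it's also worth confirming that the settings took effect on both sides. Two quick checks cover it: the HADR parameters in the database configuration, and the alternate server entry recorded in the database directory. A sketch from the primary, with output abridged (the port in the last line is whichever value you supplied in the UPDATE ALTERNATE SERVER command):

[catalog@testmg1 ~]$ db2 GET DB CFG FOR bidb | grep -i hadr
 HADR database role                                      = PRIMARY
 HADR local host name                  (HADR_LOCAL_HOST) = testmg1
 HADR local service name                (HADR_LOCAL_SVC) = 50050
 ...
[catalog@testmg1 ~]$ db2 LIST DB DIRECTORY | grep -i alternate
 Alternate server hostname            = testmg2
 Alternate server port number         = ...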
You have now set up HADR, and the PEER state means that your standby database is up-to-date with the latest changes made to the primary database.

1. Start BigInsights

You can now restart BigInsights. Refer to <III.C-Restart BigInsights>.

F. Monitor HADR

As demonstrated earlier, you can monitor the state of the HADR environment by using db2pd from the command line. Here is the full output showing all of the details.

[catalog@testmg1 ~]$ db2pd -alldbs -hadr

Database Member 0 -- Database BIDB -- Active -- Up 0 days 00:08:28 -- Date ...

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = testmg1
PRIMARY_INSTANCE = catalog
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = testmg2
STANDBY_INSTANCE = catalog
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 01/28/... ...:29:... (...)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 9
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = ...
LOG_HADR_WAIT_RECENT_AVG(seconds) = ...
LOG_HADR_WAIT_ACCUMULATED(seconds) = ...
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, ...
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, ...
PRIMARY_LOG_FILE,PAGE,POS = S...LOG, 0, ...
STANDBY_LOG_FILE,PAGE,POS = S...LOG, 0, ...
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S...LOG, 0, ...
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 01/27/... ...:56:... (...)
STANDBY_LOG_TIME = 01/27/... ...:56:... (...)
STANDBY_REPLAY_LOG_TIME = 01/27/... ...:56:... (...)
STANDBY_RECV_BUF_SIZE(pages) = 4304
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = ...
STANDBY_SPOOL_PERCENT = 3
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N

It is also possible to query the database to get all of these details (see full details in the Knowledge Center). However, since we haven't configured HADR to be in a Reads-on-Standby (ROS) mode, we can only get information from the active PRIMARY database, as that's the only one we can connect to. This makes the db2pd method more attractive as it can report on both servers without making any connections to a database.
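That property also makes db2pd a good basis for unattended monitoring. As a minimal sketch, the same check can be wrapped in a script that alerts whenever the pair leaves the PEER state; the mail command and recipient address are placeholders, so adapt them to whatever alerting your site uses.

#!/bin/bash
# check_hadr.sh - alert when the BIDB HADR pair leaves the PEER state.
# Source the instance environment so db2pd works from cron.
. ~/sqllib/db2profile
STATE=$(db2pd -db bidb -hadr | awk -F' = ' '/HADR_STATE/ {print $2}')
# An empty STATE (database down, db2pd error) also triggers the alert.
if [ "$STATE" != "PEER" ]; then
    echo "BIDB HADR_STATE is '${STATE:-unknown}' on $(hostname)" | mail -s "HADR alert" dba@example.com
fi

check_hadr.sh

Run it from the catalog user's crontab on both servers, for example every five minutes (a hypothetical path): */5 * * * * /home/catalog/check_hadr.sh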
1. List open connections to the databases

Before we test an HADR failover operation, which switches roles between the primary and standby databases, let's check if there are any open connections to the primary database.

[root@testmg1 ~]# su - catalog
[catalog@testmg1 ~]$ db2 LIST APPLICATIONS

Auth Id  Application    Appl.   Application Id    DB       # of
         Name           Handle                    Name     Agents
-------- -------------- ------- ----------------- -------- ------
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1

You should see that there are connections made to the BIDB database, most likely by the Hive application, but there may be others, as well. Now look for connections on the standby server.

[catalog@testmg2 ~]$ db2 list applications
SQL1611W No data was returned by Database System Monitor.
[catalog@testmg2 ~]$ db2 list applications show detail | cut -c...
Application Name
db2replay

You will see that there are no connections; however, using the SHOW DETAIL option we can see that there is an internal connection made by HADR in order to replay log transactions (we've used cut to display only a portion of the detailed output).

2. Monitor the change in HADR role and state

Let's use the session on the primary server to monitor the change in HADR role and state. Let this run for now and we'll return to this window after the TAKEOVER.

[catalog@testmg1 ~]$ db2pd -repeat 2 -alldbs -hadr | egrep 'HADR_STATE|HADR_ROLE'
HADR_ROLE = PRIMARY
HADR_STATE = PEER
HADR_ROLE = PRIMARY
HADR_STATE = PEER
HADR_ROLE = PRIMARY
HADR_STATE = PEER

G. Switch HADR roles with TAKEOVER

To test HADR we will use the TAKEOVER command to make the current standby server take the role of primary, and the current primary take the role of standby.

NOTE: By doing this, you will interrupt any services using the Catalog, for example, Hive, by momentarily disconnecting it from its Metastore.

This is run from the standby server.

[catalog@testmg2 ~]$ db2 TAKEOVER HADR ON DB bidb
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
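While the takeover completes, you can also watch the client connections drain from one server and reappear on the other. A small sketch, run as the catalog user on either side (Ctrl-C to stop); it simply counts BIDB rows in the LIST APPLICATIONS output every two seconds:

while true; do
    # SQL1611W (no connections) yields a count of 0.
    echo "$(date '+%H:%M:%S')  BIDB connections: $(db2 list applications 2>/dev/null | grep -c ' BIDB ')"
    sleep 2
done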
If we look over at our session monitoring the HADR role and state (<VIII.F.2>), we see that the roles have switched seamlessly (within the 2 second interval).

HADR_ROLE = PRIMARY
HADR_STATE = PEER
HADR_ROLE = PRIMARY
HADR_STATE = PEER
HADR_ROLE = STANDBY
HADR_STATE = PEER
HADR_ROLE = STANDBY
HADR_STATE = PEER
^C
[catalog@testmg1 ~]$

H. Have we achieved High Availability?

After a few seconds, we can see that some of the connections to the Catalog's database have been automatically reestablished on the standby server. Check for connections on the NEW primary instance.

[catalog@testmg2 ~]$ db2 list applications

Auth Id  Application    Appl.   Application Id    DB       # of
         Name           Handle                    Name     Agents
-------- -------------- ------- ----------------- -------- ------
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1
CATALOG  db2jcc_applica ...     ...               BIDB     1

To confirm that the Catalog is available to the stack of BigInsights components, let's create a table in Big SQL, which will need to interface with the Hive Metastore. This is run as the bigsql user on the head/management node.

[root@testmg1 ~]# su - bigsql
[bigsql@testmg1 ~]$ db2 CONNECT TO bigsql

   Database Connection Information

 Database server        = DB2/LINUXX8664 ...
 SQL authorization ID   = BIGSQL
 Local database alias   = BIGSQL

[bigsql@testmg1 ~]$ db2 "CREATE HADOOP TABLE test_table (a INTEGER)"
DB20000I The SQL command completed successfully.
[bigsql@testmg1 ~]$ db2 "DROP TABLE test_table"
DB20000I The SQL command completed successfully.

(By the way, if you check out the number of connections to the BIDB database again, you'll notice a lot more have been opened.)

Note that we have manually transferred the role of primary to our standby server. Some organizations prefer a manual failover procedure while others prefer this to be automated. DB2 provides tools to automate failovers (detect and act) using Tivoli System Automation (TSA) (also see the db2haicu utility).
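Whether manual or automated, be aware that an unplanned failover differs from the graceful TAKEOVER shown above: the graceful form needs both databases up and connected, so if the primary host is lost entirely, the standby must seize the role with a forced takeover instead. A sketch of the command, to be run on the standby only when the primary is truly unreachable (forcing while out of PEER state risks losing transactions that never shipped):

[catalog@testmg2 ~]$ db2 TAKEOVER HADR ON DB bidb BY FORCE

If you configure HADR_PEER_WINDOW to a nonzero value, adding the PEER WINDOW ONLY clause makes the forced takeover fail rather than proceed when the pair had already fallen out of peer, which is the safer default for scripted failovers.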
I. Implications of keeping the remote server in the Primary role

If, for any reason, you intend to keep the HA/standby server in the HADR primary role and the head/management server in the standby role, as we've done with the TAKEOVER command, you should consider the following implications:

a. If you decide to stop BigInsights on the management node (with stop.sh), this will not stop the DB2 instance on the HA/standby server (although all connections to the database will naturally be closed).

b. When you restart BigInsights, you will see that the catalog component (and all of its dependents) are started as normal. With the start of the catalog DB2 instance, HADR resumes the role of STANDBY for the BIDB database, and due to the Automatic Client Reroute (ACR) feature mentioned earlier, all new connections are automatically redirected to the remote BIDB database.

J. Stopping HADR (and [Optional] Removal)

When you wish to stop HADR, follow these steps. But first, let's return the primary role back to the instance on the head/management node. This is run on the original primary server (primary management node).

[catalog@testmg1 ~]$ db2 TAKEOVER HADR ON DB bidb
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

Now stop HADR on the new (original) primary server.

[catalog@testmg1 ~]$ db2 STOP HADR ON DB bidb
DB20000I The STOP HADR ON DATABASE command completed successfully.

On the standby server, you will get the error shown here if you try to stop HADR before deactivating the database.

[catalog@testmg2 ~]$ db2 STOP HADR ON DB bidb
SQL1769N Stop HADR cannot complete. Reason code = "2".
[catalog@testmg2 ~]$ db2 DEACTIVATE DB bidb
DB20000I The DEACTIVATE DATABASE command completed successfully.
[catalog@testmg2 ~]$ db2 STOP HADR ON DB bidb
DB20000I The STOP HADR ON DATABASE command completed successfully.
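Once HADR is stopped, the database role in the configuration reverts to STANDARD, which gives you a quick way to confirm the state on either server:

[catalog@testmg1 ~]$ db2 GET DB CFG FOR bidb | grep 'HADR database role'
 HADR database role                                      = STANDARD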
1. [Optional] Remove HADR definitions

If you want to stop using the HADR configuration entirely, simply remove the HADR definitions. You can use the following script.

UPDATE DB CFG FOR bidb USING
-- LOGINDEXBUILD ON
-- HADR_SYNCMODE SYNC
  HADR_LOCAL_HOST NULL
  HADR_LOCAL_SVC NULL
  HADR_REMOTE_HOST NULL
  HADR_REMOTE_SVC NULL
  HADR_TARGET_LIST NULL
  HADR_REMOTE_INST NULL
;
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME NULL PORT NULL
;
TERMINATE;

hadr_setup_remove.sql

Run the script as follows on both the primary and standby servers:

[catalog@testmg1 ~]$ db2 -tvf hadr_setup_remove.sql
UPDATE DB CFG FOR bidb USING HADR_LOCAL_HOST NULL HADR_LOCAL_SVC NULL HADR_REMOTE_HOST NULL HADR_REMOTE_SVC NULL HADR_TARGET_LIST NULL HADR_REMOTE_INST NULL
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W One or more of the parameters submitted for immediate modification were not changed dynamically. For these configuration parameters, the database must be shutdown and reactivated before the configuration parameter changes become effective.
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME NULL PORT NULL
DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is refreshed.
DB20000I The TERMINATE command completed successfully.
As you can see by the warning above, when you run this on the management node (primary server), the changes won't take effect until all connections to the database have been closed and it's been deactivated. This should be done as the biadmin user using the stop.sh/start.sh commands as demonstrated earlier, or you can limit the operation to only the catalog component, as shown here. This will interrupt any connections to the database (from Hive, for example).

[root@testmg1 ~]# su - biadmin
[biadmin@testmg1 ~]$ stop.sh catalog
[INFO] DeployCmdline - [ IBM InfoSphere BigInsights Enterprise Edition NonProductionEnvironment Version ... ]
[INFO] Progress - Stop catalog
[INFO] HdmUtil - Install configuration has changed in the system, reloading...
[INFO] DB2Operator - Stopping DB2 Instance catalog on node testmg1.iic.il.ibm.com
[INFO] DB2Operator - DB2 services stopped on node testmg1.iic.il.ibm.com
[INFO] Progress - 100%
[INFO] DeployManager - Stop; SUCCEEDED components: [catalog]; Consumes : 6693ms
[biadmin@testmg1 ~]$ start.sh catalog
[INFO] DeployCmdline - [ IBM InfoSphere BigInsights Enterprise Edition NonProductionEnvironment Version ... ]
[INFO] Progress - Start catalog
[INFO] HdmUtil - Install configuration has changed in the system, reloading...
[INFO] DB2Operator - Starting DB2 Instance catalog on node testmg1.iic.il.ibm.com. Database to be activated BIDB
[INFO] DB2Operator - DB2 node testmg1.iic.il.ibm.com is started with process ID ...
[INFO] DB2Operator - Database BIDB has already been activated
[INFO] Progress - 100%
[INFO] DeployManager - Start; SUCCEEDED components: [catalog]; Consumes : 8113ms

On the remote standby server, you can simply stop the instance after removing the HADR definitions.

[root@testmg2 ~]# su - catalog
[catalog@testmg2 ~]$ db2 -tvf hadr_setup_remove.sql
UPDATE DB CFG FOR bidb USING HADR_LOCAL_HOST NULL HADR_LOCAL_SVC NULL HADR_REMOTE_HOST NULL HADR_REMOTE_SVC NULL HADR_TARGET_LIST NULL HADR_REMOTE_INST NULL
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
UPDATE ALTERNATE SERVER FOR DB bidb USING HOSTNAME NULL PORT NULL
DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is refreshed.
TERMINATE
DB20000I The TERMINATE command completed successfully.
[catalog@testmg2 ~]$ db2stop
SQL1064N DB2STOP processing was successful.
IX. Summary

The goal of this guide has been to demonstrate how to achieve resiliency of your Big SQL metadata. Should your Big SQL metadata or Hive Metastore (Catalog) become damaged or inaccessible, you can use your backup images to rebuild these databases, either on the same server or on a replacement server. And by setting up HADR for the Catalog, you'll have a redundant database, making failover to the standby quick and seamless.

It's up to you to decide the level of resiliency that you wish to implement and then establish regular backups. Implement the best practices covered in this guide to manage the saved copies of database backups and archive logs.
Copyright IBM Corporation 2015.

The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.

IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
X. [Addendum] Detailed Table of Contents

I. Introduction
   A. Understanding the relationship between Data and Metadata
   B. Strategy for Resiliency
II. First Steps
   A. Find the instance and database names
   B. Find the size of the database
   C. [Optional] Free up space by purging old oozie log records
III. Performing Offline Backups
   A. Stop BigInsights, but restart the DB2 instances
      1. Stop BigInsights
      2. Restart the catalog instance
      3. Restart the bigsql instance
   B. Perform an OFFLINE database backup
      1. Create the backup target directory for BIDB
      2. Backup the BIDB database
      3. [Optional] Use the COMPRESS argument
      4. Create the backup target directory for BIGSQL
      5. Backup the BIGSQL database
   C. Restart BigInsights
IV. Enabling ONLINE Backups
   A. Enable ARCHIVE logging
      1. Enable ARCHIVE logging for BIDB
      2. Enable ARCHIVE logging for BIGSQL
   B. [Optional] Further changes to log files
   C. [Optional] Capping diagnostic log size (DIAGSIZE)
   D. Perform the initial OFFLINE backup
   E. [Optional] Restart BigInsights
   F. Perform an ONLINE backup
V. Setup Automatic Online Backups
   A. Define an automatic backup policy
   B. Enable the automatic backup policy
   C. Check for the automatic backups
VI. Restore the Databases from a Backup
   A. Stop BigInsights
   B. Restore the BIDB database
   C. Restore the BIGSQL database
   D. Restart BigInsights
VII. Purge Old Backups and Archive Logs
   A. Using the LIST HISTORY command
   B. Using the DB_HISTORY view
   C. Setup automatic purge
VIII. Redundancy of the Catalog with HADR
   A. [Optional] Install BigInsights on the standby server
   B. Install DB2 on the standby server
      1. Copy the DB2 installation files to the standby server
      2. Install DB2 using the response file
      3. Create the catalog instance
   C. Restore a copy of the database to the standby server
      1. Create directories needed for the database
      2. Restore the database
   D. Configure HADR
      1. Stop BigInsights
      2. Setup HADR on the primary and standby servers
   E. Start HADR
      1. Start BigInsights
   F. Monitor HADR
      1. List open connections to the databases
      2. Monitor the change in HADR role and state
   G. Switch HADR roles with TAKEOVER
   H. Have we achieved High Availability?
   I. Implications of keeping the remote server in the Primary role
   J. Stopping HADR (and [Optional] Removal)
      1. [Optional] Remove HADR definitions
IX. Summary
X. [Addendum] Detailed Table of Contents