1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
BIG DATA CONFERENCE 2015
Boston, August 10-13

Vertica Backup and Restore
Ramesh Narayanan, Vertica Professional Services
Aug 10, 2015

Module Overview
- Backup and Restore
- Copy Vertica Database
- Online Recovery

Backup and Restore

Backup - Overview
- Backup is the process of copying the actual data files to a specified location
- Vertica data and backup files are written once
  - Once a file is written, Vertica will not update it
  - The number of files increases with each backup
- The Tuple Mover keeps the number of files under control
  - The TM mergeout process consolidates smaller ROS containers into larger ones
- To back up, copy Vertica files to stable storage
  - Can be direct-attached storage, NFS mounts or SAN
  - Those files can then be moved to tape backup or integrated with other tools

Backup - When?
- Part of a regular disaster recovery strategy
  - Nightly or weekly, depending on business continuity requirements and resources
- After loading or altering a large volume of data
- Before maintenance tasks
  - Upgrading to another version of Vertica
  - Dropping a partition
- Before and after adding, removing or replacing nodes

Backup and Restore Options
There are several ways to take a Vertica backup
- Backup and Restore by Database
  - Most common backup process
  - Backs up the entire database, including all schemas and the objects within them
- Backup and Restore by Schema
  - Multi-tenant database with different backup frequencies
  - Multi-application cluster with different backup requirements / policies
- Backup and Restore by Table
  - Can be used to back up some critical tables
  - Restore certain tables for QA / testing
- Backup frequency depends on criticality and on tolerance for data loss / recovery

Vertica Backup Restore (VBR)
- vbr.py is a Python script located under /opt/vertica/bin
- Use vbr.py with various options to back up and restore data
- Create a configuration file
  - vbr.py --setupconfig
  - Goes into interactive mode, gathers all parameters and creates the configuration file
- VBR parameters
  - Database name, schema name, snapshot name, object names
  - Restore points, backup location, node names, temporary directories, etc.

vbr.py setupconfig options

vbrtest.ini

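A configuration file such as vbrtest.ini is plain INI text. As a minimal sketch only, the Python below writes and reads such a file with the standard configparser module. The section names (Misc, Database, Mapping), the key names (snapshotName, restorePointLimit, dbName) and the host:/path mapping value are assumptions for illustration; they may not match what vbr.py --setupconfig generates for a given Vertica version, so compare against a generated file before relying on them.

```python
import configparser
import io


def build_config(snapshot, restore_points, db_name, node_to_backup):
    """Return INI text holding a few VBR-style parameters.

    Section and key names are illustrative assumptions, not a verified
    vbr.py schema.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key case; the default lower-cases keys
    cfg["Misc"] = {
        "snapshotName": snapshot,
        "restorePointLimit": str(restore_points),
    }
    cfg["Database"] = {"dbName": db_name}
    # One entry per node: node name -> backup location (hypothetical format)
    cfg["Mapping"] = dict(node_to_backup)
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def read_config(text):
    """Parse INI text back into a ConfigParser, preserving key case."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str
    cfg.read_string(text)
    return cfg
```

Reading the file back this way is a cheap sanity check (for example, that every node has a mapping entry) before handing the file to vbr.py.
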
Vertica Backup Restore (VBR)
A few parameters explained
- Snapshot Name
  - VBR stores all the files under a directory with that name
- Restore Points
  - Number of incremental backups stored in addition to the full backup
- Node Names
  - Names of the nodes in the cluster
  - Data is backed up from each node of the cluster
- Backup Directory
  - Location where the backup files are stored
  - If it is an NFS mount, a separate directory for each node gets created under the backup directory

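The effect of the Restore Points setting can be pictured as a retention rule. The sketch below is a conceptual model, not vbr.py's implementation: it assumes a limit of N means the newest backup is kept together with the N backups before it, and that backups are identified by sortable timestamp strings. The function name is hypothetical.

```python
def prune_restore_points(backup_timestamps, restore_point_limit):
    """Split backups into (keep, drop) under an assumed retention rule.

    Assumption: the newest backup is always kept, plus
    `restore_point_limit` earlier ones. Timestamps must sort
    chronologically as strings (e.g. "20150810_0100").
    """
    if restore_point_limit < 0:
        raise ValueError("restore_point_limit must be >= 0")
    newest_first = sorted(backup_timestamps, reverse=True)
    keep = newest_first[: restore_point_limit + 1]
    drop = newest_first[restore_point_limit + 1 :]
    return keep, drop
```

With four backups and a limit of 2, this model keeps the three newest and drops the oldest.
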
VBR preparation
Steps and some prerequisites
- Backup location must be configured on all the nodes
- Verify the database is running
- Ensure backup hosts are running if data is backed up to those hosts
  - Backup can be done to the same cluster nodes
  - Backup can also be done to a dedicated host which has the SAN storage
- Backup directory permissions / contents
  - Ensure that the user who starts the backup process has write permissions
  - Backup directory contains sub-directories for each node (if NFS location)
  - Under the backup directory, VBR creates a sub-directory for each snapshot
  - The full backup and each incremental backup are stored in separate directories

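The directory permission check above can be scripted. A minimal sketch in Python, assuming it is run on the backup host as the same user who will start the backup; the function name is hypothetical and the check covers only the local directory, not database or host availability.

```python
import os


def check_backup_dir(path):
    """Return a list of problems with a backup directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # W_OK: can create files; X_OK: can traverse into the directory
        problems.append(f"{path} is not writable by the current user")
    return problems
```

An empty list means the directory exists and the current user can create files in it. Running the same check on every backup host covers the "configured on all the nodes" prerequisite for the directory itself.
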
Performing a Backup
- How to run the vbr.py script
  - vbr.py --task backup --config-file <myconfigfile>
  - The same command is used for full and incremental backups
- The first run does a full backup
  - All data files are copied to the sub-directory with the snapshot name
- Subsequent runs are incremental
  - Copies only the files added since the last backup
  - Files are only added or deleted, never modified
  - Each incremental backup goes into a separate sub-directory with a timestamp
  - Each incremental backup also adds those files to the full backup

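Because files are only added or deleted and never modified, an incremental backup reduces to comparing file names. The sketch below illustrates that idea as a set difference in Python; it is a conceptual model under that write-once assumption, not a description of how vbr.py is implemented, and the function and file names are hypothetical.

```python
def plan_incremental(previous_files, current_files):
    """Compare two file-name listings and plan an incremental backup.

    Write-once files mean a name seen before has unchanged content,
    so no content comparison is needed.
    """
    previous, current = set(previous_files), set(current_files)
    return {
        "copy": sorted(current - previous),  # new since the last backup
        "drop": sorted(previous - current),  # deleted since the last backup
    }
```

For example, if a mergeout consolidated a.ros and b.ros into c.ros since the last backup, the plan copies c.ros and marks a.ros and b.ros as no longer present.
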
VBR Process Infographics

Performing a Restore
- The same vbr.py script is used for restore
  - vbr.py --task restore --config-file <myconfigfile>
  - The configuration file is the same one used for the backup
- Restore can be specific
  - Entire database, specific schema or table, depending on the configuration file used
  - Vertica copies the files from the backup location to the data directory location
- Some key features
  - Vertica does not have the concept of transaction logging
  - There is no roll forward or roll back of transactions
  - Objects can be restored to the timestamp of the last snapshot

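Since backup and restore differ only in the --task value, a small wrapper can build the command line for either. A minimal sketch in Python, using only the options shown on these slides (--task, --config-file) and the /opt/vertica/bin location mentioned earlier; the function name is hypothetical, and the sketch only builds the argument list without running anything.

```python
VALID_TASKS = ("backup", "restore", "copycluster")


def vbr_command(task, config_file, vbr_path="/opt/vertica/bin/vbr.py"):
    """Build the argument list for a vbr.py invocation (does not run it)."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {VALID_TASKS}")
    return [vbr_path, "--task", task, "--config-file", config_file]
```

The resulting list can be passed to subprocess.run(..., check=True) on a node where vbr.py is installed. Passing a list rather than a shell string avoids quoting problems with config file paths.
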
Copy Vertica Database

Copy Vertica Database
- This option of VBR copies the entire database (cluster) to a target cluster
- When do we need copycluster?
  - Maintain a warm-standby cluster for disaster recovery
  - Provide an alternative cluster to a different set of users / applications
- Prerequisites
  - Source and target clusters must have the same number of nodes
  - Database, node names and dbadmin user have to be the same on both sides
  - Password-less ssh has to be established between all the nodes on both sides
  - Target database has to be shut down before starting the process
- vbr.py --task copycluster --config-file <cfgfile>
  - The task runs as one continuous transaction

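Some of the copycluster prerequisites can be checked before starting. The sketch below, in Python, covers three of them (node count, node names, target database shut down) given information the operator has already gathered; it does not query either cluster, it does not check ssh or the dbadmin user, and the function name and inputs are hypothetical.

```python
def copycluster_problems(source_nodes, target_nodes, target_db_running):
    """Return a list of unmet copycluster prerequisites (empty if none)."""
    problems = []
    if len(source_nodes) != len(target_nodes):
        problems.append(
            f"node count differs: source={len(source_nodes)}, "
            f"target={len(target_nodes)}"
        )
    elif sorted(source_nodes) != sorted(target_nodes):
        problems.append("node names differ between source and target")
    if target_db_running:
        problems.append("target database must be shut down first")
    return problems
```

An empty list means these three checks passed; the remaining prerequisites still need to be verified separately.
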
Online Recovery

Node Recovery
- Vertica is a highly available MPP architecture, but nodes may go down
- A node can recover from failure
  - A node can rebuild its data set from other nodes in the cluster if the cluster is K-safe
  - In a full recovery, the node rebuilds from scratch
- Incremental recovery
  - The node rebuilds from the current persisted state
  - To speed up a full recovery, use a prior backup for the given node and perform incremental recovery
- RAID 10 is best practice
  - RAID arrays (5, 6, 10) can be rebuilt without impact to other cluster nodes

Monitor Recovery
- Monitor disk space
  - df -h
  - SELECT * FROM v_monitor.disk_storage;
- Monitor recovery
  - tail vertica.log
  - SELECT * FROM v_monitor.recovery_status;

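The disk space check can also be done from a script. A minimal sketch in Python using the standard shutil.disk_usage call, with the two monitoring queries from this slide kept as plain strings to run through whatever SQL client is in use. The function name and the default path are assumptions; point the path at the file system holding the Vertica data or backup directory.

```python
import shutil

# The monitoring queries from the slide, to be run with a SQL client
RECOVERY_QUERIES = (
    "SELECT * FROM v_monitor.disk_storage;",
    "SELECT * FROM v_monitor.recovery_status;",
)


def disk_used_percent(path="/"):
    """Return the used share of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total
```

The figure is roughly what df -h reports as Use%, though df may compute it slightly differently on file systems with reserved blocks.
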
QUESTIONS?
Please attend our Q&A with HP Big Data experts today
Marina Ballroom, Lobby level
10:15 am - 10:30 am
12:00 pm - 1:00 pm
2:30 pm - 3:00 pm
4:30 pm - 5:00 pm