TCB No. 2012-006
March 2012

Technical Bulletin
GS FLX and GS FLX+ Systems

Configuration of Data Backup Using backupscript.sh

Summary

This document describes the intended use of the automation script backupscript.sh and gives a configuration example for it. The script is available as part of the 454 Sequencing System Software v2.0 and higher for the GS FLX and GS FLX+ Systems. backupscript.sh is intended to copy Run data, including raw images, to a remote location for backup or archival purposes.

backupscript.sh is also included with the GS Junior System, and the principles are similar there. However, this document is limited to a usage example on the GS FLX or GS FLX+ System.

For life science research only. Not for use in diagnostic procedures.
Table of Contents

Summary
Workflow Overview
Pre-configuration Activities
  Locate the backupscript.sh file
  Password-less SSH communication
  Login information for the remote computer
  /data directory
  A Text Editor
Configuring backupscript.sh
  Intended Use
  Triggering the Script
  Parameters
  Default Contents of backupscript.sh
  Customizing backupscript.sh
Workflow Overview

[Figure: Workflow overview. Data Acquisition (Instrument Camera) and Data Processing: Image Processing take place on the GS FLX or GS FLX+ Instrument; Data Processing: Signal Processing takes place on the GS FLX+ Computing Station.]

The data processing workflow for the GS FLX and GS FLX+ Systems can be divided into three main steps:

1) The Data Acquisition step captures images for each nucleotide flow on the GS FLX or GS FLX+ Instrument. After all the images have been captured, data processing proceeds to the next step. The script backupscript.sh is executed only once the fluidics have completed. This happens after the final images have been captured, and therefore after data processing has already moved on to the next step.

2) The Image Processing step takes the raw images as input and generates an intermediate set of data files (Composite Wells Format (CWF) files) that serve as the input for the Signal Processing step. Another automation script, postanalysisscript.sh, is executed once the Image Processing step has completed. The absolute timing of each automation script therefore depends not only on when the processing phases complete, but also on when the fluidics complete, as described for the Data Acquisition step. Because both the Data Acquisition and Image Processing steps can complete before the fluidics do, and therefore before backupscript.sh is executed, backupscript.sh and postanalysisscript.sh should make no assumptions about which script runs first.

3) The Signal Processing step is computationally intensive, so it is not recommended to perform this step on the GS FLX or GS FLX+ Instrument. The postanalysisscript.sh mechanism was developed as a convenient way to perform it on a remote computer instead. For usage guidance on postanalysisscript.sh on the GS FLX and GS FLX+ Systems, refer to TCB 2012-005, Configuration of Remote Data Processing Using postanalysisscript.sh.
Pre-configuration Activities

All of the following items or activities are required for the automation instructions in the rest of this document to work properly.

Locate the backupscript.sh file

backupscript.sh is included with the GS FLX and GS FLX+ Systems and is located in the /usr/local/rig/bin/ directory.

Password-less SSH communication

Automated remote data transfer requires unattended file transfer. Refer to TCB 2012-004, Configuration of Password-free SSH Access between a GS FLX or GS FLX+ Instrument and a Remote Datarig or Cluster, for instructions on setting this up.

Login information for the remote computer

You will need the username, password, and IP address of the remote computer. You should already have these, since they are also required for configuring password-less SSH communication. This document assumes the username is adminrig, although any username will work as long as it has the correct access permissions.

/data directory

This document assumes that the remote computer has a /data directory in which the data will reside. This is the target directory for the data transfer from the Instrument. If no such directory exists on the remote computer, create it by executing these two commands on the remote computer (root access may be required):

    mkdir /data
    chown adminrig /data

A Text Editor

The procedure for configuring automation requires manual editing of text files, so you need to be comfortable using a text editor. Any text editor will do; a convenient one included on the GS FLX and GS FLX+ Instruments is nedit. To open a file for editing in nedit, type the following command:

    nedit <path to file> <Enter>

This document uses angle brackets to mark placeholders in files and commands. For a command to execute properly, replace the placeholder with the information it describes; here, that is the path to the file you wish to edit.
The same convention is used throughout this document for file names, commands, and parts of commands: each placeholder must be replaced before use. After editing a file, be sure to save it before closing. In nedit, use File->Save, followed by File->Exit.
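Before enabling the backup, it can help to confirm from the Instrument that password-less SSH login works and that the remote /data directory exists and is writable. The sketch below defines a hypothetical helper, preflight_check, which is not part of the 454 Sequencing System Software. It assumes the remote computer runs a standard SSH server and POSIX shell, and it relies on the OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the 454 software), run on the
# Instrument before enabling backupscript.sh.

# Usage: preflight_check <user@host> [remote_dir]
# Returns 0 only if SSH login succeeds without a password prompt and
# remote_dir (default /data) exists and is writable by that user.
preflight_check() {
    local target="$1"
    local dir="${2:-/data}"
    # BatchMode=yes makes ssh fail instead of asking for a password, so a
    # failure here often means password-less SSH is not yet configured.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" \
        "test -d '$dir' && test -w '$dir'"
}
```

For example, `preflight_check adminrig@<w.x.y.z> && echo "Ready" || echo "Check SSH keys and /data permissions"`. The function does not distinguish between an SSH problem and a /data permission problem; both simply produce a nonzero status.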
Configuring backupscript.sh

Intended Use

The intended use of backupscript.sh is to copy or transfer the raw data to an off-instrument location for safekeeping. The raw data can only be created by the Instrument during the sequencing Run, and it is the key dataset from which all signal processing results can be derived.

backupscript.sh is provided as a template that the user fills in with their own copy or move operations. This includes choosing the destination (where to copy), the copy method or commands, whether and how the copy is verified, and how the backup status file is updated.

The Run data in the following files cannot be regenerated without performing another sequencing Run with the same sample, and should be considered the minimal set of files to back up:

- The contents of the rawimages directory
- datarunparams.parse
- runlog.parse
- imagelog.parse
- ptpimage.pif
- The sequencing script file (with the .icl extension)
- aalog.txt

Triggering the Script

backupscript.sh is called at the end of a sequencing Run on the Instrument. When setting up a sequencing Run with the Instrument Procedure Wizard, the user can activate the backup script by checking the corresponding box.

Parameters

After the fluidics have completed, the software calls backupscript.sh with the source path of the Run to be backed up as its only parameter. Within the script, this path is therefore available in the special variable $1, in the form /data/<yyyy_mm_dd>/r_<yyyy_mm_dd_hh_min_sec>_flx<serial>_<user>_<runname>.
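Because these files cannot be regenerated, a customized script may want to confirm that they are present in the Run directory. The following sketch defines a hypothetical helper, check_minimal_set, that takes the Run path (the same value the software passes to backupscript.sh as $1) and reports each missing item. It is not part of the 454 software. The file names are taken as written in this bulletin; if the names on your Instrument differ, for example in capitalization, adjust the list. For rawimages, it only checks that the directory exists, not what it contains.

```bash
#!/bin/bash
# Hypothetical helper (not part of the 454 software): report items of the
# minimal backup set that are missing from a Run directory.
# File names are as listed in this bulletin; adjust them if needed.

# Usage: check_minimal_set <run_dir>
# Prints "Missing: <item>" for each absent item; returns 1 if any is missing.
check_minimal_set() {
    local run_dir="$1" missing=0 found=0 item f
    local items=(rawimages datarunparams.parse runlog.parse imagelog.parse ptpimage.pif aalog.txt)

    if [ ! -d "$run_dir" ]; then
        echo "Not a directory: $run_dir" >&2
        return 1
    fi

    # Only existence is checked; the contents of rawimages are not inspected.
    for item in "${items[@]}"; do
        if [ ! -e "$run_dir/$item" ]; then
            echo "Missing: $item"
            missing=1
        fi
    done

    # The sequencing script file name varies, so accept any *.icl file.
    for f in "$run_dir"/*.icl; do
        if [ -e "$f" ]; then
            found=1
            break
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "Missing: sequencing script (*.icl)"
        missing=1
    fi

    return "$missing"
}
```

Inside backupscript.sh, a call such as `check_minimal_set "$1"` placed before the copy would list any gaps; whether a gap should abort the backup or only be reported is a policy choice left to the user.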
Default Contents of backupscript.sh

At the time of this writing, the current version of the 454 Sequencing System Software is version 2.6. The entire contents of backupscript.sh are presented below. If the contents change substantially in a future version of the software, this document will be updated.

```bash
#!/bin/bash
#
# Filename: backupscript.sh
# Programmer: Bernard Puc
# Description: Executes the backup of the given directory.
#
# $Id: backupscript.sh,v 1.7 2005-01-20 15:59:23 bpuc Exp $
#

if [ $# -ne 1 ]
then
    echo "Need to specify a directory on the command line."
    echo "Exiting."
    exit 1
fi

# Initialize the backup status variable as failed
RET_ERR=1

# Edit the backup status file
backuplog $1 "running" "permanent"

# Add custom backup code here...
# =======================================
# =======================================
# =======================================
# =======================================

# Edit the backup status file
if [ $RET_ERR -eq 0 ]
then
    backuplog $1 "complete" "permanent"
else
    backuplog $1 "failure" "permanent"
fi

# End of script
exit 0
```

Add your customized backup code in the region marked by the ======= lines.
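As a minimal sketch of what the marked region might contain, the following hypothetical function run_backup wraps the rsync approach described under Customizing backupscript.sh and returns 0 only when the copy succeeded. The second rsync call is an added verification step, not part of this bulletin: it repeats the transfer as a dry run (-n) with itemized changes (-i), which prints nothing when the destination already matches the source. The function name and the verification approach are assumptions.

```bash
#!/bin/bash
# Hypothetical backup body (not part of the 454 software) for the custom
# code region of backupscript.sh.

# Usage: run_backup <run_dir> <destination>
# <destination> is any rsync target, e.g. adminrig@<w.x.y.z>:/data/
# Returns 0 on success, nonzero on failure.
run_backup() {
    local src="$1" dest="$2" pending

    # Copy the Run directory, excluding analysis (D_*) directories.
    rsync -a --exclude='D_*' "$src" "$dest" || return $?

    # Verification (an added assumption): a dry run with itemized changes
    # should print nothing if the copy already matches the source.
    pending=$(rsync -a -n -i --exclude='D_*' "$src" "$dest") || return $?
    [ -z "$pending" ]
}
```

In backupscript.sh, the function definition would go above the custom code region, and the region itself would contain `run_backup "$1" adminrig@<w.x.y.z>:/data/` followed by `RET_ERR=$?`, so that RET_ERR is 0 only when both the copy and the check succeeded. The second pass transfers no file data, but it still walks both directory trees. If another process is still writing to the Run directory outside the D_* directories, the check may report a spurious failure.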
Customizing backupscript.sh

1. Open the backupscript.sh file for editing:

    nedit /usr/local/rig/bin/backupscript.sh <Enter>

2. Add an appropriate backup command or commands in the region marked for custom code in the listing in the previous section. A typical example would be:

    rsync -a --exclude='D_*' "$1" adminrig@<w.x.y.z>:/data/

Replace the <w.x.y.z> placeholder with the IP address of the remote computer. This example copies the Run directory, with all of its contents except the analysis (D_*) directories, directly into the /data directory on the remote computer. The quotes around D_* keep the shell from treating the pattern as a file name wildcard before rsync receives it.

3. The script template uses a variable called RET_ERR to indicate whether the backup completed successfully. As the listing in the previous section shows, the variable is initialized to 1, indicating failure. Therefore, a statement that sets RET_ERR to 0 (success) once the backup has completed successfully must be added to the custom backup code section. A convenient way to do this for the example command in Step 2 is to add the following line directly below the rsync command:

    RET_ERR=$?

This assigns the exit status of the rsync command to RET_ERR: 0 if rsync succeeded, and a nonzero value (failure) otherwise. If you implement a more complex backup strategy than the example in Step 2, you may need a more elaborate way of determining success or failure.

This step is important because the success or failure of the backup is reflected in the GS Sequencer Graphical User Interface (GUI). If the status is not correctly logged, you may encounter issues with managing your Runs on the Data tab of the GUI.

4. Save and close the file.

License disclaimer information is subject to change or amendment. For current license information on license disclaimers for a particular product, please refer to https://www.roche-applied-science.com/new/legal/index.jsp?id=legal_000000.
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, GS FLX TITANIUM, GS JUNIOR, EMPCR, PICOTITERPLATE, PTP, NEWBLER, REM, GS GTYPE, GTYPE, AMPLITAQ, AMPLITAQ GOLD, FASTSTART, NIMBLEGEN, SEQCAP, MAGNA PURE, and CASY are trademarks of Roche. All other product names and trademarks are the property of their respective owners. SYBR is a registered trademark of Molecular Probes, Inc.