
eSDO server backup version 1.1


Note: The current version of this document is accessible at SDOServerBackup.


This document describes the eSDO server (msslxx) backup procedure.

The eSDO server currently hosts a number of AstroGrid webapps and will eventually hold - under CVS control - the source code for the 11 algorithms being developed by the UK SDO team, as well as datasets used for testing purposes, so regular backups will become essential.

The eSDO server resides in a restricted access area, making the replacement of backup media inconvenient outside of office hours. As the backups will involve neither enormous datasets nor large bandwidth, the Computing Admin group has agreed to allow backups of the eSDO server across the network to a DVD read/writer installed on one of the office-based Linux machines.

A new DVD read/writer has been purchased out of the eSDO budget and installed on msslli (replacing the existing CD-ROM drive).

The DVD read/writer is an NEC device: model NEC ND-3540A.



After installation of the DVD writer, msslli was rebooted and the new hardware was detected automatically, without the need to install new device drivers or OS patches.

A bash shell script has been written and a cron job set up on the eSDO server to automate the backup. The script makes use of the 'growisofs' set of commands which allow data to be appended to various random access media, including DVD+RW disks which will be used for eSDO server backups. The growisofs commands build on the earlier mkisofs suite of commands and form the backbone of the popular Linux DVD burning application K3B.

SSH protocol 2

Communications between the eSDO server and msslli use ssh protocol 2. By default ssh requires password authentication. This introduces an interactive step into the backup process which is clearly undesirable. However, ssh also supports DSA authentication via the use of public/private keys, which allows host and remote server to communicate without the need for manual verification. This method, although slightly less secure than using passwords, has been deemed acceptable by the Computing Admin group as both hosts are internal to MSSL and therefore firewalled.

Set-up of DSA authentication is as follows:

  • Run ssh-keygen on msslxx, e.g. ssh-keygen -t dsa -f id_dsa, to generate public and private keys. The user will be prompted for an optional passphrase; leave it empty, as a passphrase would reintroduce an interactive step.

The '-t' option defines the type of key to be generated, DSA in this instance, and the '-f' option defines the name of the output files, id_dsa.pub and id_dsa in this case.

  • Copy the private id_dsa key file to the .ssh directory under the username of interest on the local host - 'griduser' on msslxx in this case.

  • Copy the public id_dsa.pub key file to the .ssh directory under the username of interest on the remote host - 'ms2' on msslli in this instance - and rename the key file as 'authorized_keys'.

  • From the local host (msslxx) ssh to the remote host (msslli) and confirm that the user is automatically logged onto the remote machine without being prompted for a password.
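
The public-key step above can be sketched as a small shell helper. This is an illustration only, not part of the eSDO script: the function name install_pubkey is hypothetical, and it appends the key to 'authorized_keys' rather than renaming the file, so that any keys already authorised for the account are kept. The permissions it sets are the ones sshd normally expects when its strict mode checks are enabled.

```shell
# Sketch: install a public key for an account without overwriting any
# keys that are already authorised there.
# install_pubkey is a hypothetical helper, not part of the backup script.
install_pubkey() {
    pubkey_file=$1    # e.g. id_dsa.pub copied over from msslxx
    ssh_dir=$2        # e.g. the .ssh directory of the remote user
    auth_file=$ssh_dir/authorized_keys

    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }

    # Private to the owner: directory 700, key list 600.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # Append only if the identical key line is not already present.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

Run on msslli as the remote user, a call such as install_pubkey id_dsa.pub "$HOME/.ssh" would leave the account ready for the login test described in the last step.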

Script details

The script is configured to perform daily, weekly and monthly backups. Directories/files selected for backup are referenced by pathname at the beginning of the file and assigned to one of the 3 backup categories.

The script features some simple error checking and will terminate gracefully if a fault is detected which would prevent the backup being completed successfully. All stages of the backup process, including errors, are reported to a log file held on msslxx.

Daily backups will commence at midnight each day, weekly backups at midnight every Sunday, and monthly backups will begin at midnight on the first day of each month.
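
As a rough sketch of this schedule, the helper below reports which backup categories fall due on a given date. The function name categories_for and the idea of deciding the categories inside one script run are assumptions made for illustration; the actual script and cron file may be organised differently.

```shell
# Crontab equivalents of the three schedules (standard five-field syntax):
#   0 0 * * *   daily at midnight
#   0 0 * * 0   weekly, at midnight on Sunday
#   0 0 1 * *   monthly, at midnight on the 1st

# categories_for is a hypothetical helper: given the day of the month
# (1-31) and the ISO day of the week (1 = Monday ... 7 = Sunday), print
# the backup categories due on that date.
categories_for() {
    day_of_month=$1
    day_of_week=$2
    due="daily"
    [ "$day_of_week" -eq 7 ] && due="$due weekly"
    # ${var#0} strips one leading zero so that "08" and "09" compare safely.
    [ "${day_of_month#0}" -eq 1 ] && due="$due monthly"
    echo "$due"
}
```

For today's date the call would be categories_for "$(date +%d)" "$(date +%u)", where %d is the day of the month and %u the ISO weekday.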

When the backup process begins the following sequence of events takes place:

  • A check is run to see if the DVD writer on msslli is mounted. If not, the drive is mounted.

  • A check is run to see how many blocks of 2048 bytes have been used on the DVD and the result subtracted from the total number of blocks available on a new disk. The difference is used in the later stages of the backup process to determine if the latest backup file can be accommodated on the DVD.

  • The selected directories/files are packed into tar files in their local directory. Each tar file is named according to the backup category of its contents. As there may be several tar files created within a given category, a 2-digit suffix is appended so that each is uniquely identified.

For example:

  • daily01.tar
  • daily02.tar
  • weekly01.tar
  • monthly01.tar

As each tar file is created it is moved to /disk/d4/griduser on msslxx. The '/home' directory on msslxx is approximately 75% full at the time of writing (16/11/05) and is unsuitable for holding large temporary backup files. 'd4' is one of four largely unused 50GB partitions on the eSDO server and is ideal for holding large backup files on a temporary basis.
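
The naming scheme can be sketched as follows. The helper next_tar_name is hypothetical and only shows one way of choosing the 2-digit suffix; it is not taken from the backup script.

```shell
# next_tar_name is a hypothetical helper: print the first unused name of
# the form <category>NN.tar in the given directory (daily01.tar, ...).
next_tar_name() {
    category=$1
    dir=$2
    n=1
    while [ "$n" -le 99 ]; do
        name=$(printf '%s%02d.tar' "$category" "$n")
        if [ ! -e "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
        n=$((n + 1))
    done
    echo "no free suffix left for $category" >&2
    return 1
}

# Example use (the source directory is a placeholder):
#   store=/disk/d4/griduser
#   tar -cf "$store/$(next_tar_name daily "$store")" -C /path/to/parent dirname
```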

  • When all tar files have been created and moved to /disk/d4/griduser they are formed into one tar file which is named according to the current date, eg 05Nov05.tar. This file is then compressed using the gzip utility.

  • The zipped file is then transferred to the temporary holding area on the remote host (msslli) under /disk/scratch/ms2 using the scp command. Usernames under the /home directory on the alpha cluster have been avoided because of the ease with which quotas are exceeded.

  • The script then uses the growisofs command to check the state of the DVD, i.e. whether the DVD is blank or contains data. This is an important step as the growisofs burn options have to be adjusted accordingly. For example, burning the first session on a DVD requires the '-Z' option, subsequent sessions the '-M' option.

  • A check is performed to determine if the DVD has enough space to accommodate the zipped backup file. If not, an error will be reported and the backup terminated, otherwise the backup file is appended to the DVD.

  • All tar and zip files created during the backup are removed as the final step of the process, irrespective of whether the backup was successful or not.

  • Finally, the cronjob e-mails selected users to inform them of the job status.
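
The block arithmetic and the choice of burn option in the steps above can be sketched as below. All function names are hypothetical, and the capacity figure is an assumed value for a blank single-layer DVD+RW that should be checked against the media in use. The sketch also treats "zero blocks used" as meaning a blank disk, which may not match how the script actually queries the drive.

```shell
BLOCK_SIZE=2048
# Assumed capacity of a blank single-layer DVD+RW in 2048-byte blocks;
# verify this figure against the media actually in use.
TOTAL_BLOCKS=2295104

# Blocks needed to hold a file of the given size in bytes, rounded up.
blocks_needed() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Succeeds if a file of $2 bytes fits on a disk with $1 blocks already used.
fits_on_dvd() {
    used_blocks=$1
    file_bytes=$2
    free_blocks=$((TOTAL_BLOCKS - used_blocks))
    [ "$(blocks_needed "$file_bytes")" -le "$free_blocks" ]
}

# growisofs takes -Z for the first session on a disk and -M to append.
burn_option() {
    if [ "$1" -eq 0 ]; then
        printf '%s\n' "-Z"
    else
        printf '%s\n' "-M"
    fi
}
```

The check is deliberately optimistic: it ignores the filesystem overhead that each new session adds, so a real script would want to leave some margin.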


Summary of backup host, directory and file details:

Local host (msslxx)

Backup script /home/griduser/BACKUP/backup.sh
Backup cronjob file /home/griduser/BACKUP/cronbackup
Backup log file /home/griduser/BACKUP/logs/backupscript.log
Temporary filestore /disk/d4/griduser

Remote host (msslli)

Remote host temporary filestore /disk/scratch/ms2

Workarounds/bugs/suggested improvements

  • There is currently a problem running scp from msslxx to send or retrieve files. The command appears to run without a hitch and returns zero, implying that the file transfer was successful, when in fact the transfer has failed. Fortunately, files can be sent to and retrieved from msslxx by running scp on the alpha cluster, hence the unusual combination of ssh and scp in a single command line in the backup script. However, this workaround has required ssh public/private keys to be created on msslli to allow it to communicate with msslxx non-interactively. These may be removed safely when the msslxx scp issue has been fixed.

  • A better alternative to the temporary filestore on msslli (which has an unknown quota of diskspace) would be the 12GB /mnt/zip partition on the same machine. Unfortunately to mount this partition requires root permission. I will ask the Computing Admin group to modify this to allow mounting of the partition from within the backup script.

  • The growisofs and mkisofs commands have a plethora of options, many of which haven't been explored. These may well provide a better method for burning DVDs and are worth investigating if the eSDO server backup DVDs start to fill up rapidly.

  • The present backup scheme works on the principle of archiving selected directories and producing a tar file for each. This is not the most elegant solution, but it is a simple way of ensuring that directory/sub-directory structures are preserved. The current set-up does not detect changed files and back up only those, i.e. it is not incremental.
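
Because the scp exit status described in the first item cannot be trusted, one possible safeguard is to compare checksums of the source and the copy after each transfer. The sketch below is a suggestion only and is not part of the current script; same_checksum is a hypothetical helper that compares two locally readable files.

```shell
# same_checksum is a hypothetical helper: succeed only if both files exist
# and have an identical CRC and byte count as reported by POSIX cksum.
same_checksum() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # Reading from stdin keeps the file name out of the cksum output.
    sum_a=$(cksum < "$1") || return 1
    sum_b=$(cksum < "$2") || return 1
    [ "$sum_a" = "$sum_b" ]
}
```

For a copy held on msslli, the second checksum would instead have to be obtained by running cksum there over ssh and comparing the two strings in the same way.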

Rotating catalina.out file

  • TOMCAT reports all activity, including errors, to a file called 'catalina.out', which resides in the $CATALINA_HOME/logs directory. Regular back-up of the file is essential so that faulty transactions can be tracked and de-bugged. However, TOMCAT's reporting is prodigious and 'catalina.out' can rapidly grow to an unmanageable size. Fortunately there is a Perl script available that rotates 'catalina.out', i.e. creates smaller, more manageable daily log text files suitable for backup.

The script and instructions on its use are available at http://doylestownpa.us/webadvisor/tomcat5_log_rotation.html

Installation steps are as follows:

  1. The Perl script, called 'spk.log.rotate', is downloaded and copied to $CATALINA_HOME/logs.
  2. In the file $CATALINA_HOME/bin/catalina.sh, find the 2 occurrences of the following line:

    >> "$CATALINA_BASE"/logs/catalina.out 2>&1 &

    and replace both with:

    2>&1 | "$CATALINA_BASE"/logs/spk.log.rotate \
    "$CATALINA_BASE"/logs/catalina.out_log.%Y-%M-%D.txt &

    Note: No space after the backslash

  3. Stop/start TOMCAT
  4. Check that a file of the form 'catalina.out_log.yyyy-mm-dd.txt' is produced.
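
To illustrate the idea behind piping TOMCAT's output through a rotation script, here is a minimal shell sketch of a date-based log splitter. It is a hypothetical stand-in written for this page, not the spk.log.rotate Perl script, and it makes no claim about how that script works internally.

```shell
# rotate_by_date is a hypothetical stand-in for illustration only; it is
# not the spk.log.rotate Perl script. It copies stdin, line by line, into
# <prefix>.YYYY-MM-DD.txt, re-evaluating the date for every line so that
# output moves to a new file after midnight.
rotate_by_date() {
    prefix=$1
    # The "|| [ -n ... ]" clause keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" >> "$prefix.$(date +%Y-%m-%d).txt"
    done
}
```

A long-running command would be attached with a pipe, for example some_command 2>&1 | rotate_by_date /path/to/logs/catalina.out_log &, where both the command and the path are placeholders. Calling date once per line is slow for a busy log, so this is a demonstration of the technique rather than a replacement for the Perl script.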

-- MikeSmith - 16 Nov 2005

Topic revision: r2 - 2005-11-18 - MikeSmith