Backup using rsync

Backing up data is normally an everyday task when running a computer system. Yet for various reasons there is often no backup at all. If setting up a backup were really easy, there would be no excuse not to have one.

Why do a backup

There are many reasons to do backups; here is a small selection:

hardware failure
software error
virus, malware, …
user error

It’s not a question of whether an HDD fails, but when. Of course there are SSDs now, but they are quite expensive, so you normally store only part of the data or just the operating system on them. Furthermore, even an SSD may fail.

My requirements for a backup solution

Backup vs. Archiving

Backing up data means storing a snapshot of the data with a defined timestamp. In case of data loss it is possible to restore this data. Archiving, in contrast, means keeping a number of versions of a file in order to be able to restore the file as it was at a particular point in time (e.g. the version from last Friday at 16:00).
When deciding to introduce a backup solution, it might be a good idea to additionally use archiving. An archiving solution takes more effort to implement, so I don’t cover archiving here; it would go beyond the scope of this post. But backing up is not simply copying data from A to B. Of course it is possible to do a full backup every time, but this increases the time required to back up all the data.

Required features

scriptable on shell
no need to maintain
easy to customize
simple to restore and copy (without a special tool)
a partial restoration must be possible
stable (being required to use exactly the right version in order to run is not stable)
keeping metadata like permissions, timestamps, …

The killer features of rsync

Syncing only changed data

Normally a backup is always written to the same destination. Rsync can synchronize just the data that has really changed. This enormously reduces the time, the amount of data transferred and the I/O. By default rsync first identifies all the files that must be handled; a file whose size and modification time are unchanged is simply skipped. So if maybe 2% of all the files change per week, only about a fiftieth of the data has to be copied, which can make the backup up to roughly 50 times faster than a full backup!
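
The effect is easy to observe on a throwaway directory. The following is only a minimal sketch, assuming rsync and mktemp are installed; the directory and file names are made up for illustration. The -i (--itemize-changes) option makes rsync list only the entries it actually updates.

```shell
#!/bin/sh
# Sketch: a second rsync run only touches changed files.
# All paths are throwaway temp directories (hypothetical names).
WORK=$(mktemp -d)
mkdir "$WORK/data" "$WORK/backup"
echo "one" > "$WORK/data/a.txt"
echo "two" > "$WORK/data/b.txt"

# first run: everything is copied
rsync -a "$WORK/data/" "$WORK/backup"

# change one file (appending changes its size, so rsync notices)
echo "more" >> "$WORK/data/b.txt"

# second run: -i lists only what is actually updated
CHANGED=$(rsync -ai "$WORK/data/" "$WORK/backup")
echo "$CHANGED"
```

On the second run only b.txt should show up in the list; a.txt is skipped because its size and modification time still match the copy in the backup.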

An enormous feature set

include/exclude by pattern
support for soft/hardlinks
dry run, e.g. for testing purposes
handling metadata like permissions, timestamps, …
scriptable via command line…
and a lot more …
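
Two of these features, exclude patterns and the dry run, combine nicely for trying things out. This is a small sketch under the assumption that rsync and mktemp are available; the file names and patterns are invented for the example.

```shell
#!/bin/sh
# Sketch: preview a backup with exclude patterns (hypothetical paths).
WORK=$(mktemp -d)
mkdir -p "$WORK/data/cache" "$WORK/backup"
echo "keep" > "$WORK/data/notes.txt"
echo "temp" > "$WORK/data/scratch.tmp"
echo "junk" > "$WORK/data/cache/blob"

# -n: dry run, nothing is written
# -i: list what would be transferred
PLAN=$(rsync -ain --exclude='*.tmp' --exclude='cache/' \
       "$WORK/data/" "$WORK/backup")
echo "$PLAN"
```

The listing should mention notes.txt but neither scratch.tmp nor the cache directory, and the backup folder stays empty because of -n.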

A look into the rsync manual

So, let’s find the most common options, IMHO. Having a look at the rsync man page helps quite a lot:

-a (archive) implies
-r (recursive)
-l (links)
-p (permissions)
-t (timestamps)
-g (groups)
-o (owner)
-D (devices and special files)

-v (verbose)
-E (preserve executability)
-H (preserve hard links)
-z (compress during transfer)
--compress-level=NUM (compression level during transfer)
--exclude=PATTERN
--exclude-from=FILE
--include=PATTERN
--include-from=FILE
-h (human readable)
--log-file=FILE
--log-file-format=FMT
--bwlimit=KBPS
--delete (delete extraneous files on destination)
--delete-excluded (deletes files that are excluded via --exclude from destination)
--progress (shows progress during file transfer)
--stats
--link-dest=DIR (makes hard links to files in DIR when they are unchanged)
-n (dry run)

Backing up

To turn rsync into a backup solution, some parameters are a must:

Options for a full-featured backup

Parameter	Description
-a	archive (implies -rlptgoD)
-r	recursive (folders and subfolders)
-l	links (copies symbolic links as symbolic links)
-p	permissions (keeps the permissions, e.g. 600)
-t	preserve modification timestamps
-g	preserve group
-o	preserve owner
-D	preserve devices and special files
-v	verbose (detailed log, e.g. each file that is handled)
-E	preserve executability
-H	preserve hard links (if there are hard links in the source, this is necessary in order to restore the backup correctly)
--exclude-from=FILE	excludes files and directories by the patterns listed in FILE
-h	prints numbers in a human-readable format
--log-file=FILE	additionally writes what rsync does to a log file
--delete	deletes files on the destination that no longer exist in the source
--delete-excluded	if a file on the destination is now excluded, it is deleted on the destination too
--progress	shows the progress of each file transfer during execution
--stats	prints some stats after execution
-n	dry run (especially interesting for simulating a backup before starting it the first time)
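
The difference between --delete and --delete-excluded is easiest to see in a small experiment. The following sketch assumes rsync and mktemp are installed; the paths, file names and the *.log pattern are purely illustrative.

```shell
#!/bin/sh
# Sketch: effect of --delete and --delete-excluded (hypothetical paths).
WORK=$(mktemp -d)
mkdir "$WORK/data" "$WORK/backup"
echo "a" > "$WORK/data/keep.txt"
echo "b" > "$WORK/data/old.txt"
echo "c" > "$WORK/data/build.log"

# first backup without any excludes: all three files are copied
rsync -a "$WORK/data/" "$WORK/backup"

# old.txt is removed from the source, and *.log is excluded from now on
rm "$WORK/data/old.txt"
printf '%s\n' '*.log' > "$WORK/excludes"

# --delete removes old.txt from the backup,
# --delete-excluded additionally removes the now excluded build.log
rsync -a --delete --delete-excluded \
      --exclude-from="$WORK/excludes" "$WORK/data/" "$WORK/backup"
ls "$WORK/backup"
```

After the second run only keep.txt should remain in the backup folder. Without --delete-excluded, build.log would stay on the destination even though it is no longer backed up.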

Starting rsync

#!/bin/sh
SOURCE="data"
TARGET="backup"
EXCLUDES="excludes"
LOGFILE="logfile"

# maybe reading command line parameters
# (do this before the "set --" lines below, which reuse "$@")

# collect the rsync arguments in "$@" so that paths with spaces survive
set -- -ahvHE --stats --progress --delete --delete-excluded
set -- "$@" --log-file="$LOGFILE"

# requires a file called excludes in the working directory
set -- "$@" --exclude-from="$EXCLUDES"

# the trailing slash copies the contents of $SOURCE into $TARGET
set -- "$@" "$SOURCE/" "$TARGET"
rsync "$@"
# maybe some exception handling here
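
The "exception handling" hinted at in the last comment could start with rsync's exit status. Below is a minimal sketch; describe_rsync_status is a hypothetical helper name, and only the exit values 0, 23 and 24 (as documented in the rsync man page) are treated specially.

```shell
#!/bin/sh
# Sketch: turn rsync's exit status into a log message.
# describe_rsync_status is a hypothetical helper, not part of rsync.
describe_rsync_status() {
    case "$1" in
        0)  echo "backup finished" ;;
        23) echo "partial backup: some files could not be transferred" ;;
        24) echo "partial backup: some source files vanished during the run" ;;
        *)  echo "backup failed with rsync exit status $1" ;;
    esac
}

# usage right after the rsync call of the backup script:
#   rsync "$@"; STATUS=$?
#   describe_rsync_status "$STATUS" >&2
#   exit "$STATUS"
describe_rsync_status 24
```

Passing the status on with exit lets cron or any other caller notice that the backup did not complete cleanly.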

In order to start this script regularly, simply add it to /etc/cron.daily or equivalent …

Restoring data

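To restore all the data, simply swap the source and destination folders in the script. Be careful with --delete and --delete-excluded in that case: files in the original location that are not part of the backup, including everything matched by the exclude patterns, would be removed, so it is safer to drop both options for a restore. Alternatively, single files or folders can simply be copied back.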
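
Both variants can be tried out safely in a temporary directory. This is just a sketch, assuming rsync and mktemp are installed; the folder layout and file names are invented.

```shell
#!/bin/sh
# Sketch: restore a single file and preview a full restore
# (hypothetical paths).
WORK=$(mktemp -d)
mkdir -p "$WORK/data/docs" "$WORK/backup"
echo "important" > "$WORK/data/docs/report.txt"
rsync -a "$WORK/data/" "$WORK/backup"

# simulate data loss
rm "$WORK/data/docs/report.txt"

# partial restore: copy one file back, -a keeps permissions and timestamps
rsync -a "$WORK/backup/docs/report.txt" "$WORK/data/docs/"

# full restore, previewed first: -n only lists what would be copied back
rsync -ain "$WORK/backup/" "$WORK/data"
```

Since the backup is a plain directory tree, a simple cp -p of the file would work as well; rsync is used here only to keep the same options as for the backup itself.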

Conclusion

It is quite easy to write a full-featured backup solution with rsync. To complete the solution, generating checksums for all the files would be great. Furthermore, using logrotate could be a nice extension. Email notification in case of a failure or a partial backup is another step. This is of course not a foolproof solution for inexperienced users, but it’s easy to use and easy to restore. As it is not based on a special file format, you can compress, encrypt or copy the backup without any problems.
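
The checksum idea could look roughly like this. It is only a sketch: it assumes the sha256sum tool from GNU coreutils is available, and BACKUP and MANIFEST are placeholder names filled with sample data.

```shell
#!/bin/sh
# Sketch: write a checksum manifest for a backup and verify it later.
# Assumes sha256sum is installed; BACKUP and MANIFEST are placeholders.
WORK=$(mktemp -d)
BACKUP="$WORK/backup"
MANIFEST="$WORK/backup.sha256"
mkdir -p "$BACKUP/docs"
echo "one" > "$BACKUP/a.txt"
echo "two" > "$BACKUP/docs/b.txt"

# create the manifest with paths relative to the backup root
( cd "$BACKUP" && find . -type f -exec sha256sum {} + ) > "$MANIFEST"

# verify: returns non-zero if any file no longer matches its checksum
( cd "$BACKUP" && sha256sum -c "$MANIFEST" >/dev/null ) \
    && echo "backup verified"
```

Keeping the manifest outside the backup folder avoids the manifest ending up in its own checksum list.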

Sources

rsync manual

block zero