When running a computer system, backing up data should be an everyday task. Yet, for various reasons, there often is no backup at all. If setting up a backup were really easy, there would be no excuse not to do it.
Why do a backup
There are various reasons to back up; here is a small selection:
- hardware failure
- software error
- virus, malware, …
- user error
It’s not a question of whether an HDD fails, but when. SSDs are an alternative, but they are quite expensive, so usually only part of the data or just the operating system is stored on them. Furthermore, even an SSD may fail.
Backup vs. Archiving
Backing up data means storing a snapshot of the data at a defined point in time, so that the data can be restored in case of data loss. Archiving, in contrast, means keeping several versions of a file so that a file can be restored as it was at a specific point in time (e.g. the version from last Friday at 16:00).
When introducing a backup solution, it might be a good idea to use archiving in addition. However, an archiving solution takes more effort to implement, so I don’t cover archiving here; it would be too much for this post. Still, backing up is not simply copying data from A to B. Of course it is possible to do a full backup every time, but this greatly increases the time required to back up all the data.
My requirements for a backup solution
- scriptable from the shell
- no maintenance required
- easy to customize
- simple to restore and copy (without a special tool)
- partial restores must be possible
- stable (a backup that can only be restored with exactly the right version of a tool is not stable)
- preserves metadata like permissions, timestamps, …
The killer features of rsync
Syncing only changed data
Normally a backup is always written to the same destination. Rsync only synchronizes the data that has actually changed, which greatly reduces the time, the amount of data transferred and the I/O required. By default rsync uses a "quick check": a file whose size and modification time are unchanged on the destination is simply skipped. So when perhaps 2% of all files change per week, only about 1/50 of the data has to be copied, and the backup can be up to about 50 times faster than a full copy (rsync still has to scan every file, so the real gain is somewhat smaller).
An enormous feature set
- include/exclude by pattern
- support for symbolic and hard links
- dry run, e.g. for testing purposes
- handling of metadata like permissions, timestamps, …
- scriptable via the command line
- and a lot more …
A look into the rsync manual
So, let’s find the most common options (in my opinion). A look into the rsync manpage helps quite a lot:
- -a (archive mode, implies -rlptgoD)
- -E (preserve executability)
- -H (preserve hard links)
- -z (compress file data during the transfer)
- --compress-level=NUM (set the compression level for the transfer)
- -h (output numbers in a human-readable format)
- --delete (delete extraneous files on the destination)
- --delete-excluded (also delete files excluded via --exclude from the destination)
- --progress (show progress during the transfer)
- --link-dest=DIR (hardlink to unchanged files in DIR instead of copying them)
- -n (dry run, if required)
Options for a full-featured backup
To build a backup solution with rsync, the following parameters are the most important:
| Option | Description |
|---|---|
| -a | archive mode (implies -rlptgoD) |
| -r | recursive (folders and subfolders) |
| -l | links (copy symbolic links as symbolic links) |
| -p | preserve permissions (e.g. 600) |
| -t | preserve modification times |
| -g | preserve group |
| -o | preserve owner (only effective when the receiving rsync runs as root) |
| -D | preserve device and special files |
| -v | verbose (detailed log, e.g. every file that is handled) |
| -H | preserve hard links (if the source contains hard links, this is necessary to restore the backup correctly) |
| --exclude-from=FILE | exclude files and directories matching the patterns listed in FILE |
| -h | print status information in a human-readable format |
| --log-file=FILE | write a log of everything rsync does to FILE |
| --delete | delete files on the destination that were deleted in the source |
| --delete-excluded | also delete excluded files from the destination (e.g. files in a directory that is now excluded) |
| --progress | show progress during the transfer |
| --stats | print transfer statistics at the end |
| -n | dry run (especially useful for simulating a backup before running it for the first time) |
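A minimal backup script combining these options:
#!/bin/sh
# maybe reading command line parameters
SOURCE="$1"
TARGET="$2"
RSYNC="rsync -ahvHE --stats --progress --delete --delete-excluded"
# requires a file called excludes in the working directory
RSYNC="$RSYNC --exclude-from=excludes"
RSYNC="$RSYNC $SOURCE/ $TARGET"
# maybe some exception handling here
$RSYNC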
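The script leaves error handling open. One possible sketch is to evaluate rsync’s exit status: according to the rsync man page, 0 means success, 23 a partial transfer due to an error, and 24 a partial transfer because source files vanished (common when backing up a live system). The function name rsync_verdict and the log file name are hypothetical.

```sh
#!/bin/sh
# Sketch: turn rsync's exit status into a short verdict for a log or a
# notification. Code meanings follow the "EXIT VALUES" section of rsync(1).
# Returns 0 only for a fully successful run.
rsync_verdict() {
  case "$1" in
    0)  echo "ok"; return 0 ;;
    23) echo "partial: some files could not be transferred" ;;
    24) echo "partial: some source files vanished during the transfer" ;;
    *)  echo "failed: rsync exit code $1" ;;
  esac
  return 1
}

# Possible use at the end of the backup script (hypothetical log name):
#   $RSYNC
#   status=$?
#   echo "$(date): $(rsync_verdict "$status")" >> backup-status.log
#   exit "$status"
```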
To run this script regularly, simply add it to /etc/cron.daily or an equivalent.
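As a sketch of the cron route, a small wrapper in /etc/cron.daily could look like this. The file name, the directory /root/backup and the source and target paths are placeholders, and it assumes the backup script takes source and target as its two arguments.

```sh
#!/bin/sh
# Hypothetical /etc/cron.daily/backup (note: no ".sh" suffix, see below).
# Paths are placeholders; backup.sh is assumed to take SOURCE and TARGET
# as arguments and to read "excludes" from its working directory.
cd /root/backup || exit 1
./backup.sh /home /mnt/backup/home >> /var/log/backup.log 2>&1
```

The wrapper must be executable (chmod +x). On Debian-based systems, run-parts skips files in /etc/cron.daily whose names contain a dot, which is why it is called backup rather than backup.sh.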
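To restore all the data, simply swap the source and destination folders in the script. Be careful, though: with --delete and --delete-excluded, rsync would then remove every file from the original location that is not in the backup, including everything matched by the excludes file, so a dry run with -n first is a good idea. Alternatively, single files or folders can simply be copied back to restore them.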
It is quite easy to write a full-featured backup solution with rsync. To complete the solution, generating checksums for all files would be great. Furthermore, logrotate could be used to rotate the log files. Email notification in case of a failure or a partial backup is another possible step. This is of course not a solution for inexperienced users, but it is easy to use and easy to restore. Since it does not rely on a special file format, the backup can be compressed, encrypted or copied without any problems.
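For the checksum idea, one possible sketch uses sha256sum from GNU coreutils (an assumption; on systems without coreutils the commands differ). The function names make_manifest and verify_manifest and the manifest location are hypothetical. The manifest is kept outside the backup directory, because a later rsync run with --delete would otherwise remove it.

```sh
#!/bin/sh
# Sketch (assumes GNU coreutils sha256sum): record checksums of every file
# in a backup and verify them later. Keep the manifest outside the backup
# directory so that rsync --delete does not remove it on the next run.

# $1: backup directory, $2: manifest file to write
make_manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# $1: backup directory, $2: manifest file to check against.
# Returns non-zero if any file is missing or its checksum differs.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}

# Example run on a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/backup/docs"
echo "hello" > "$work/backup/docs/a.txt"
make_manifest "$work/backup" "$work/backup.sha256"
verify_manifest "$work/backup" "$work/backup.sha256" && echo "backup verified"
echo "bit rot" > "$work/backup/docs/a.txt"
verify_manifest "$work/backup" "$work/backup.sha256" || echo "checksum mismatch detected"
rm -rf "$work"
```

Creating the manifest right after each backup and verifying it occasionally, or before a restore, would reveal silently corrupted files.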
- rsync manual