How my easy, home-made backup program saves time, space on the storage medium, and network bandwidth

Note: This is a long, in-depth article.

Nothing can ever go wrong with my computer and I will never lose my data. Riiiiight.

I’ve experienced data loss for a myriad of reasons, many of them my own fault. Keeping excellent backups has always enabled me to continue with minimal interruption. There are many ways in which data can be lost, compromised, or corrupted.

This article discusses the Bash backup program I created to prevent catastrophic data loss and facilitate easy recovery. Performing regular backups and ensuring that data can be restored from them is a critical part of any security plan. We’ll start by describing the problem and my solution, and look at two ways to download and install the program. We’ll then move on to running the setup program to prepare both the computer and the USB backup storage device. Finally, we’ll perform a simple backup of the root, home, and /usr/local directories.

The main configuration file works without change for what you’ll have done up to that point. However, you’ll undoubtedly need to back up more than just your own home directory. So we’ll then discuss the rsbu.conf file and how to configure it to back up other directories on your local host and selected directories on remote hosts as well.

The Problem

Data loss is the underlying problem, but the related problem is finding a backup system that meets all my requirements. Some of the systems I looked at were far more complex than I needed, while others required tape for backup. In my experience, tape seldom works well, for various reasons. Most other media specifically designed for backups have become obsolete.

My backup program must meet the following requirements:

  1. Be easy to use.
  2. Have a command line user interface that is easily understandable.
  3. Not have a graphical interface.
  4. Be easily configured.
  5. Maintain the configuration data separately from the executable code.
  6. Create backups that can be restored by any user.
  7. Be able to use internal hard drives or external USB drives for backup media.
  8. Use space on the backup medium efficiently.
  9. Minimize the amount of time needed to make a backup.
  10. Back up all hosts on the network to a single backup device.
  11. Have an integrated help function.

No open source backup system that I ever considered, whether free (as in beer) or not, met my requirements. Don’t get me wrong, there are some really good ones out there. But none had all the features I needed or met all my requirements. So I decided to write my own backup program.

Just a quick word about terminology. The local host is the one on which the rsbu program will run. The rsbu program can perform the backup of the local host. It can also back up all remote computers that are in the same network. In this case, remote means “not the local host” and distance is irrelevant. The remote hosts can be a meter or many kilometers away so long as they are on the same network. Of course there are ways to circumvent the same-network requirement, but that’s a network issue.

Why hard drives?

I chose spinning disk hard drives (HDD) rather than solid state devices (SSD) as my backup medium. The primary reason is that SSDs can begin to experience data degradation after about a year without power applied. That makes them unsuitable for long-term, off-line backup and archival storage. Those rust-covered spinning disk HDDs can store data for decades without degradation. They are also still less expensive than SSDs.

Testing has shown that spinning disk HDDs typically store data significantly longer than SSDs with no power applied. However, there’s no guarantee that data on any given device won’t begin to degrade sooner.

I’ve used many types of tape and removable cartridge formats, such as the Bernoulli Box and Zip drives, for backup in the past, and all had less than perfect results. One reason is that the storage media in all those devices are open to the air so that the read/write heads, which are mounted in the device into which the cartridge is loaded, can access the recording medium. This leaves them susceptible to contaminants; even something as small as a speck of dust or a hair can cause what IBM used to call Head/Disk Interference (HDI), which damages the disk and the heads. USB HDDs are sealed and are not susceptible to airborne contaminants.

Why write it in Bash?

I use Bash all the time, in part because it is the default shell for most Linux distributions, including Fedora, the one I normally use. However I also use Bash because it is an excellent shell programming language and it has a large number of features to make life easier for the Lazy SysAdmin.

Features like tab completion, command line recall and editing, shortcuts like aliases, and more all contribute to its value as a powerful shell. One of my favorite Bash features is that, although it uses Emacs mode for command line editing by default, that can be changed to Vi mode so that I can use editing commands that are already part of my muscle memory.

But if we think of Bash solely as a shell we miss much of its true power. While researching Bash for my three-volume Linux self-study course1, I learned things about Bash that I never knew in over 20 years of working with Linux. Many of these new bits of knowledge relate to its use as a programming language. Bash is a powerful programming language, one perfectly designed for use on the command line and in shell scripts.

About rsync

Before I wrote my backup program, I had been experimenting with the rsync command, which has some very interesting features that I have been able to use to good advantage. My primary objectives were to create backups from which users could locate and restore files quickly without having to extract data from a backup tarball, and to reduce the amount of time taken to create the backups.

I wrote about Using rsync for Backup in a previous article so you should read that for additional background about why rsync is an excellent choice for creating backups that meet my requirements. It also explains the rsync commands that will be used in the rsbu Bash program. I’ve repeated two of the most important attributes here.

The bottom line is that rsync can be used to synchronize two directories or directory trees whether they are on the same computer or on different computers. It can create or update the target directory to be identical to the source directory. The target directory is also freely accessible by all the usual Linux tools because it is not stored in a tarball or zip file or any other archival file type; it is just a regular directory structure with regular Linux files that can be navigated by regular users using basic Linux tools such as a desktop file manager. This meets one of my primary objectives.
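To make that concrete, here is a minimal sketch of the kind of rsync command that synchronizes one directory tree to another. It is not taken from the rsbu source. The build_sync_cmd helper and the /home and /media/MyBackups/home paths are assumptions chosen for illustration; -a, -H, and --delete are standard rsync options.

```bash
#!/usr/bin/env bash
# Sketch only: not the rsbu code. Helper name and paths are hypothetical.

# Build the rsync command in an array so it can be inspected before it runs.
build_sync_cmd() {
    local src="$1" dest="$2"
    # -a        archive mode: recurse and preserve permissions, times, owners
    # -H        preserve hard links
    # --delete  remove files from the target that no longer exist in the source
    # The trailing slash on the source means "the contents of this directory".
    SYNC_CMD=(rsync -aH --delete "${src%/}/" "${dest%/}/")
}

build_sync_cmd /home /media/MyBackups/home

# Print the command for review instead of running it.
printf '%s\n' "${SYNC_CMD[*]}"

# To perform the sync for real, run the array as a command:
#   "${SYNC_CMD[@]}"
```

Adding the -n (--dry-run) option to a command like this makes rsync report what it would change without touching the target, which is a safe way to try it first.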

One of the most important features of rsync is the method it uses to synchronize preexisting files that have changed in the source directory. Rather than copying the entire file from the source, it transfers only the blocks of the source file that have changed. This saves an immense amount of time, and for remote syncs a great deal of network bandwidth as well. For example, when I first used my rsync Bash script to back up all of my hosts to a large external USB hard drive, it took about 3 hours. That is because none of the data had been previously backed up, so all of it had to be transferred. Subsequent backups took between 3 and 8 minutes of real time, depending upon how many files had been changed or created since the previous backup. I used the time command to determine this so it is empirical data. Last night, for example, it took 3 minutes and 12 seconds to complete a backup of approximately 750GB of data from 6 remote systems and the local workstation. Of course, only a few hundred megabytes of data were actually altered during the day and needed to be backed up.

The rsync command has a very large number of options that you can use to customize the synchronization process. For the most part, the relatively simple commands that I have described in that previous article are perfect to make backups for my personal needs. Be sure to read the extensive man page for rsync to learn about more of its capabilities as well as details of the options discussed here.

Install the code

An RPM and a tarball are both available for download so you can install using the method of your choice. Regardless of which method you use, the files for rsbu are installed in the standard locations defined by the Linux Filesystem Hierarchy Standard (FHS).

From an RPM package

Download the RPM package for rsbu, rsbu-02.06-00.noarch.rpm, into /tmp or another directory of your choice. This RPM installs all of the files required for rsbu. This includes rsbu.conf, the configuration file for rsbu. All user configuration is performed in this file. It also includes the rsbu-setup script, which provides tools to prepare the host computer and to prepare external USB media to store the backups.

Install this RPM as the root user with the following command.

# dnf -y install rsbu-02.06-00.noarch.rpm

From a tarball

If you don’t have a Red Hat, Fedora, or other RPM-based system, download the tarball, rsbu-02.06-00.tar, into /tmp or another temporary location of your choice. This is the tarball equivalent of the rsbu RPM for those who don’t have an RPM-based distro. You must be root and in the root directory (/) to extract the files into their proper locations.

Make / (the root directory) the present working directory (PWD). Assuming that you’ve downloaded the tarball to /tmp, use the following command to extract the contents of the tarball into the correct directories. The PWD must be the root directory for this to work properly. I used the -v option so that the tar command will display the files it installs and their locations.

# tar -xvf /tmp/rsbu-02.06-00.tar
etc/
etc/systemd/
etc/systemd/system/
etc/systemd/system/backup.service
etc/systemd/system/backup.timer
etc/ssh/
etc/ssh/ssh_config.d/
etc/ssh/ssh_config.d/01-permitrootlogin.conf
usr/
usr/local/
usr/local/share/
usr/local/share/applications/
usr/local/share/applications/rsbu/
usr/local/share/applications/rsbu/gplv3.txt
usr/local/share/applications/rsbu/README.rsbu
usr/local/bin/
usr/local/bin/rsbu
usr/local/bin/rsbu-setup
usr/local/etc/
usr/local/etc/rsbu.conf
root@testvm1:/#
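The paths in the listing above are relative (etc/..., usr/...), which is why the present working directory matters. As a hedged sketch of two standard tar features that help here, the following builds a throwaway tarball, lists it with -t before extracting, and uses -C to name the extraction directory explicitly. The demo.tar and demo.conf names and the scratch directories are hypothetical; nothing on the real system is changed.

```bash
#!/usr/bin/env bash
# Sketch only: builds a throwaway tarball so nothing on the real system changes.
set -euo pipefail

stage=$(mktemp -d)      # where the demo tarball is built
target=$(mktemp -d)     # stands in for / in this demo

# Create a small tree with relative paths, like those in the listing above.
mkdir -p "$stage/usr/local/etc"
echo "# demo configuration" > "$stage/usr/local/etc/demo.conf"
tar -cf "$stage/demo.tar" -C "$stage" usr

# -t lists the archive contents without extracting anything.
tar -tvf "$stage/demo.tar"

# -C changes to the target directory before extracting, so the relative
# paths in the archive land beneath it regardless of the current directory.
tar -xvf "$stage/demo.tar" -C "$target"

ls -l "$target/usr/local/etc/demo.conf"
```

For the real tarball, the equivalent commands would be tar -tvf /tmp/rsbu-02.06-00.tar to review the contents first, and tar -xvf /tmp/rsbu-02.06-00.tar -C / to extract into the root directory from wherever you happen to be.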

Preparation

Regardless of which installation method you used, the rest of the instructions are the same. All tasks must be performed as the root user.

Read the README

I know — nobody reads the documentation, right? Well, I ask you to please do that.

Be sure to read the /usr/local/share/applications/rsbu/README.rsbu document before performing setup or backups. It’s as yet incomplete but I’m working on it. What’s there, along with the instructions here, will get you going.

Check the code

Before doing anything else, be sure to at least look at the Bash shell code for rsbu and rsbu-setup to get a feel for what they both do. It doesn’t matter if you’re not a programmer. You should do this anyway. The code is well commented and you should be able to understand what it does well enough as you read it.

Initial testing

Now do a quick check to ensure that the code has been installed correctly. Just view the Help.

root@testvm1:/# rsbu -h
rsbu - Performs backups of local and remote hosts using rsync.
It also uses the link capability of rsync to minimize storage usage
for unmodified files for series of daily backups.

Syntax: rsbu -[l|L|c|b|h|u]vd <Device number> s <host number> f <file name>
options:
b              Backup data to the selected Backup Media. Implies -u.
c              Check the contents of the Backup Media.
C              Clean the contents of the Backup Media leaving only the most recent
                backup which is renamed using todays date. Removing old Cruft.
                New option. Added 2022-09.
d              The number of the backup device. Use -l to list devices by number.
f <filename>   The fully qualified path to an alternate configuration file
                You are not likely to need the -f option.
h              Print this help.
l              List the backup devices and their respective numbers.
L              List the hosts to be backed up and their respective numbers.
p              Prepare the backup device. Requires device number and device name.
u              Unmount the backup device after completing backup or check.
v              Verbose mode.
V              Print version number and exit.


The configuration file is /usr/local/etc/rsbu.conf. Make all configuration
changes in that file.

Setting a 1 day retention in this file creates a single
backup of each host without dates. Additional backups simply update the
existing set of backup files and do not create a backup history.

root@testvm1:/#

If the Help is displayed as shown above, the code has installed correctly.
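The help text mentions that rsbu uses the link capability of rsync to minimize storage for unmodified files across a series of daily backups. The idea underneath is the ordinary Linux hard link, and rsync's --link-dest option applies it when building a new backup directory alongside an older one. The sketch below demonstrates only the hard-link idea, using ln in a scratch directory; the day1 and day2 directory names are hypothetical, and this is not the rsbu code.

```bash
#!/usr/bin/env bash
# Sketch only: demonstrates hard links with ln; this is not the rsbu code.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/day1" "$work/day2"

# "day1" holds yesterday's backup of a file.
echo "unchanged data" > "$work/day1/report.txt"

# The file did not change, so "day2" gets a hard link instead of a new copy.
ln "$work/day1/report.txt" "$work/day2/report.txt"

# Both names now refer to the same inode, so the data is stored only once.
links=$(stat -c %h "$work/day2/report.txt")
echo "Hard link count: $links"

if [[ "$work/day1/report.txt" -ef "$work/day2/report.txt" ]]; then
    same_file=yes
else
    same_file=no
fi
echo "Same underlying file: $same_file"

# Clean up the scratch directory.
rm -rf "$work"
```

Because each daily directory looks complete to the user while unchanged files share storage, a restore is still just a matter of copying files out of the directory for the day you want.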

Preparing for the rsbu setup

I’ve written a Bash program to perform much of the setup for you. You will need to use the rsbu-setup program once to configure your computer and the external USB disk drive you will use for the backups. You can use the rsbu-setup program as many times as needed to prepare additional disk devices.

You will need an external USB drive for this. You can use a USB thumb drive for testing but you should definitely use a USB hard drive with spinning disks for your production backups for the reasons described above. Be sure that the device you use for testing has enough space to hold the data from /root, /home, and /usr/local.

The rsbu-setup program performs the following tasks while providing the user with descriptions of what it’s doing and multiple opportunities to opt-out of the process.

  1. Notify the user to plug in the external USB backup drive.
  2. Create the /media/MyBackups mount point.
  3. Select the most recently plugged storage drive.
  4. Add the new mount point to /etc/fstab.
  5. Delete all existing partitions on the drive.
  6. Create a single Linux partition that fills the entire drive.
  7. Create an EXT4 filesystem on the new partition.
  8. Create the public/private key pair (PPKP) for SSH.
  9. Install the PPKP on the local host.
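The disk-related steps in the list above can be sketched with standard Linux tools. This is a guess at an equivalent sequence, not the rsbu-setup source: the choice of wipefs, sfdisk, and mkfs.ext4, the fstab options, the print_prep_plan helper, and the sdX placeholder are all assumptions. The sketch only prints the commands and never runs them, because the real ones destroy all data on the target drive. The SSH key steps are left out.

```bash
#!/usr/bin/env bash
# Sketch only: prints the commands; it never runs them.
# Tool choices, fstab options, and the device name are assumptions.

# Arguments: device name (without /dev/), filesystem label, mount point.
# The printed lines, in order:
#   1. create the mount point
#   2. add an fstab entry that mounts the filesystem by label
#   3. erase existing partition-table and filesystem signatures
#   4. create one Linux (type 83) partition spanning the drive
#   5. create a labeled EXT4 filesystem on that partition
print_prep_plan() {
    local dev="$1" label="$2" mnt="$3"
    cat <<EOF
mkdir -p ${mnt}
echo 'LABEL=${label} ${mnt} ext4 defaults,noauto 0 0' >> /etc/fstab
wipefs -a /dev/${dev}
echo 'type=83' | sfdisk /dev/${dev}
mkfs.ext4 -L ${label} /dev/${dev}1
EOF
}

print_prep_plan sdX MyBackups /media/MyBackups
```

Printing a plan like this before acting on it is in the same spirit as the multiple confirmation prompts that rsbu-setup presents before it touches the drive.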

This script can be used to prepare a new hard drive for use as a backup device without performing any of the tasks required to prepare the system itself. Use the -p option for this.

View the rsbu-setup help.

root@testvm1:~# rsbu-setup -h

################################################################################
# rsbu-setup - Performs setup tasks for the rsbu backup program.               #
#                                                                              #
# This script can be used to prepare a new hard drive for use as a backup      #
# device using the -p option. You can also use the -c option to prepare the    #
# host computer.                                                               #
#                                                                              #
#                                                                              #
# Syntax: rsbu-setup -[h|g|V] -cpqv                                            #
# options:                                                                     #
# c            Prepare the computer. Creates the /media/MyBackups mount point  #
#              and adds the mount point to the /etc/fstab. file. Doesn't       #
#              prepare the USB device.                                         #
# g            Print the GPL license statement.                                #
# h            Print this help.                                                #
# p            Prepare a new external storage device for use as a backup       #
#              medium. This option doesn't prepare the computer itself.        #
# q            Quick mode. Skips built-in wait for drive to be identified.     #
#              Use this option if the drive has already been plugged in        #
#              for more than 30 seconds or so.                                 #
# V            Print the software version number.                              #
# v            Print verbose status information.                               #
################################################################################

This will give you an idea of the capabilities of this program. It also tells us that it was installed properly.

Start the setup procedure. Insert the USB device in a USB slot. Let’s identify the device just inserted so we can be certain we have the correct one during the setup. The data about the USB device should be the last entries in the dmesg output data stream. Look for the serial number to identify this specific device. Further down in this data, you’ll find the ID assigned to this device, sdb in this instance.

[42218.246900] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[42218.403572] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
[42245.308870] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[42245.641903] usb 1-1: New USB device found, idVendor=abcd, idProduct=1234, bcdDevice= 1.00
[42245.641919] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[42245.641926] usb 1-1: Product: UDisk           
[42245.641931] usb 1-1: Manufacturer: General 
[42245.641935] usb 1-1: SerialNumber: 1404190300490149481200
[42245.688320] usb-storage 1-1:1.0: USB Mass Storage device detected
[42245.688685] scsi host6: usb-storage 1-1:1.0
[42245.688775] usbcore: registered new interface driver usb-storage
[42245.692777] usbcore: registered new interface driver uas
[42246.736852] scsi 6:0:0:0: Direct-Access     General  UDisk            5.00 PQ: 0 ANSI: 2
[42246.737871] sd 6:0:0:0: Attached scsi generic sg2 type 0
[42246.740778] sd 6:0:0:0: [sdb] 15435960 512-byte logical blocks: (7.90 GB/7.36 GiB)
[42246.741613] sd 6:0:0:0: [sdb] Write Protect is off
[42246.741622] sd 6:0:0:0: [sdb] Mode Sense: 0b 00 00 08
[42246.742372] sd 6:0:0:0: [sdb] No Caching mode page found
[42246.742383] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[42246.754919]  sdb: sdb1
[42246.756793] sd 6:0:0:0: [sdb] Attached SCSI removable disk
root@testvm1:~#
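If you would rather not scan the kernel messages by eye, a small filter can pull out the device name. The sketch below is one way to do that and is not necessarily how rsbu-setup identifies the drive; the latest_sd_device function name is hypothetical, and the sample lines are copied from the dmesg output above.

```bash
#!/usr/bin/env bash
# Sketch only: one way to pull the newest sdX name from kernel messages.
# This is not necessarily how rsbu-setup identifies the drive.

# Read kernel log text on stdin and print the last "[sdX]" device name seen.
latest_sd_device() {
    grep -oE '\[sd[a-z]+\]' | tail -n 1 | tr -d '[]'
}

# Demo using three lines copied from the dmesg output above.
sample='[42246.737871] sd 6:0:0:0: Attached scsi generic sg2 type 0
[42246.740778] sd 6:0:0:0: [sdb] 15435960 512-byte logical blocks: (7.90 GB/7.36 GiB)
[42246.756793] sd 6:0:0:0: [sdb] Attached SCSI removable disk'

printf '%s\n' "$sample" | latest_sd_device

# On a live system, as root:
#   dmesg | tail -n 30                          # most recent kernel messages
#   dmesg | latest_sd_device                    # just the device name
#   lsblk -o NAME,SIZE,MODEL,SERIAL /dev/sdb    # cross-check model and serial
```

Whatever method you use, the serial number is the value to compare against the label on the physical drive.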

Performing the rsbu-setup

Use the c and p options to prepare both the computer and the external storage device. Whether or not you’ve connected the USB storage device, you’ll get this reminder. You could also examine this code to see how it works. If you do so, you’ll notice that this code checks to see if various tasks have been performed before. If they have, it will skip those tasks. This is intended to prevent things like installing a PPKP that supersedes a previous one, thus breaking PPKP communication for other hosts.

root@testvm1:~# rsbu-setup -cp

##############################################################################
#                               ATTENTION!!!                                 #
#                                                                            #
#  Plug in the external USB hard drive you will be using for your backups.   #
#                                                                            #
##############################################################################

Have you plugged in the external USB hard drive? (ynq)

Type y and press Enter to continue. The rsbu-setup program then displays the following message while it extracts the data about the USB device from the dmesg data stream. Once it finds the device, it displays the information it has found about it and asks you to positively verify that the device it found is the one intended to be your backup storage. The term “General UDisk” is almost always used for the device model regardless of the vendor.

Have you plugged in the external USB hard drive? (ynq) y 

##############################################################################
#                                                                            #
#   Please be patient while we identify this new USB hard drive.             #
#                                                                            #
##############################################################################

##############################################################################
#                                                                            #
#                A new device has been plugged in.                           #
#                        New Device = sdb 
#                                                                            #
##############################################################################
##############################################################################
#                                                                            #
# Verify that the information on the external storage device, especially     #
# that the device serial number matches that printed below.                  #
#                                                                            #
##############################################################################
##############################################################################

##############################################################################
#                                                                            #
#    WARNING!!   Be sure that this information is correct!   WARNINIG!!      #
#                                                                            #
#----------------------------------------------------------------------------#
#                                                                            #
# The new storage device is:                                                 #
#                                                                            #
#      Vendor: usb 0xabcd "General"
#      Model: "General UDisk"
#      Serial ID: "1404190300490149481200"
#                                                                            #
#----------------------------------------------------------------------------#
#                                                                            #
# WARNING!!!    If the serial number printed above does not match that       #
#               printed on the physical label on the storage device, DO NOT  #
#               CONTINUE this procedure. Enter n or q below.                 #
#                                                                            #
##############################################################################
Are you POSITIVE that the device shown above is the correct one for your backups? (ynq)

Compare the serial number displayed here to that from the dmesg command. If it’s the same, you can proceed.

Press y and Enter to positively verify that this is the correct device. The program gives you one more chance to be absolutely certain that you’re going to let it delete all the data on the USB device.

Are you POSITIVE that the device shown above is the correct one for your backups? (ynq) y

##############################################################################
#   WARNING!!   WARNING!!   WARNING!!   WARNING!!   WARNING!!   WARNING!!    #
#                                                                            #
#                                                                            #
#    !!THIS PROCEDURE WILL DELETE ALL OF THE DATA ON THE USB HARD DRIVE!!    #
#                                                                            #
#                ARE YOU REALLY SURE YOU WANT TO CONTINUE?                   #
#                                                                            #
#                                                                            #
#   WARNING!!   WARNING!!   WARNING!!   WARNING!!   WARNING!!   WARNING!!    #
##############################################################################
ARE YOU REALLY SURE YOU WANT TO CONTINUE? (ynq)

If you are sure you want to continue, press y and Enter. This process can take several minutes for large devices so be patient. This will complete preparation of the disk and begin preparing the computer by generating a Public/Private keypair (PPKP)2 for SSH. This is required because SSH is used by rsync even on the local host. Without this PPKP you’d need to enter the root password even when performing only a local backup. You’ll need to install the public key for the root user on all other computers that will be part of your backup procedures.

ARE YOU REALLY SURE YOU WANT TO CONTINUE? (ynq) y

##############################################################################
#                                                                            #
# Please be patient.                                                         #
#                                                                            #
# This may take a few minutes depending upon the size of the backup          #
# device being prepared.                                                     #
#                                                                            #
##############################################################################
/dev/sdb: 2 bytes were erased at offset 0x000001fe (dos): 55 aa
/dev/sdb: calling ioctl to re-read partition table: Success
Creating new partition on sdb
Checking that no-one is using this disk right now ... OK

Disk /dev/sdb: 7.36 GiB, 7903211520 bytes, 15435960 sectors
Disk model: UDisk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

>>> Created a new DOS (MBR) disklabel with disk identifier 0x74631234.
/dev/sdb1: Created a new partition 1 of type 'Linux' and of size 7.4 GiB.
/dev/sdb2: Done.

New situation:
Disklabel type: dos
Disk identifier: 0x74631234

Device     Boot Start      End  Sectors  Size Id Type
/dev/sdb1        2048 15435959 15433912  7.4G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 1929239 4k blocks and 482384 inodes
Filesystem UUID: d9eda845-5855-4f1a-b2be-6ed3e9ba82e6
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

##############################################################################
#                  Backup partition space usage data                         #
##############################################################################
# Device label =   MyBackups
# Mount Point =    /media/MyBackups
df: /media/MyBackups: No such file or directory
df: /media/MyBackups: No such file or directory
##############################################################################

##############################################################################
#                                                                            #
# Your new backup device has been prepared for use with the rsbu program.    #
# The device has been unmounted so you can remove if from the USB slot.      #
#                                                                            #
##############################################################################


##############################################################################
#                                                                            #
# Preparing testvm1.both.org for rsbu backups.
#                                                                            #
##############################################################################
Creating fstab entry

##############################################################################
#                            !!!ATTENTION!!!                                 #
#                                                                            #
#  During this portion of configuration you need to create a Public/Private  #
#  KeyPair (PPKP) to provide encrypted communication during the backup       #
#  procedure.                                                                #
#                                                                            #
#  You will be asked to enter a file name in which to save the RSA key.      #
#  DO NOT enter a name for the file. Just press the Enter key to accept      #
#  the default file name.                                                    #
#                                                                            #
#  You will be asked twice to enter a passphrase. Do not enter a passphrase. #
#  Just press the Enter key for an empty passphrase.                         #
#                                                                            #
# Refer to Wikipedia for details of PPKP.                                    #
#  https://en.wikipedia.org/wiki/Public-key_cryptography                     #
#                                                                            #
##############################################################################

Generating public/private ed25519 key pair.
Enter file in which to save the key (/root/.ssh/id_ed25519):

The response to this prompt is to press Enter because the generated file name displayed in the prompt, /root/.ssh/id_ed25519, is fine. Then press Enter for both passphrase prompts. We’re not using passphrases because that would require the SysAdmin (you) to enter the passphrase every time rsbu begins the backup of any host, local or not. That would prevent performing scheduled, unattended backups.

Enter file in which to save the key (/root/.ssh/id_ed25519): <Enter>
Enter passphrase (empty for no passphrase): <Enter>
Enter same passphrase again: <Enter>

The program then reports where the public and private keys have been saved.

Your identification has been saved in /root/.ssh/id_ed25519
Your public key has been saved in /root/.ssh/id_ed25519.pub
The key fingerprint is:
SHA256:EVx8ujEcl9MQ76OnD+bLt9lJvOWXHwAmlr+VHzJMSLc root@testvm1.both.org
The key's randomart image is:
+--[ED25519 256]--+
|       ..o..o*   |
|        ..= B.o  |
|        .= O E.  |
|        ..O +..  |
|        S  = Bo. |
|          . o.*..|
|           .+ .=o|
|           + +o=*|
|            =+++*|
+----[SHA256]-----+

##############################################################################
#                                                                            #
# We now need to install the public key for this computer. You will first    #
# need to respond with 'yes' and press the Enter key when asked if you       #
# want to continue connecting to the host. Be sure to use the word 'yes'     #
# not just 'y'.                                                              #
#                                                                            #
# Next enter the root password when requested.                               #
#                                                                            #
##############################################################################

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_ed25519.pub"
The authenticity of host 'testvm1.both.org (192.168.0.101)' can't be established.
ED25519 key fingerprint is SHA256:i4XXP20X4d+5A8mIprOBq9oVNH2JGo5TGBu2vF7ICWM.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Be sure to type the word yes and press Enter to continue.

Are you sure you want to continue connecting (yes/no/[fingerprint])?yes

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_ed25519.pub"
The authenticity of host 'testvm1.both.org (192.168.0.101)' can't be established.
ED25519 key fingerprint is SHA256:i4XXP20X4d+5A8mIprOBq9oVNH2JGo5TGBu2vF7ICWM.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

############################### W A R N I N G #################################
# THIS COMPUTER SYSTEM IS PRIVATE PROPERTY. USERS HAVE NO EXPECTATION OF      #
# PRIVACY. USE OF THIS COMPUTER SYSTEM IS SUBJECT TO MONITORING OR OTHER      #
# REVIEW BY THE OPERATOR/OWNER OR OTHERS. UNAUTHORIZED OR IMPROPER USE OF     #
# THIS SYSTEM MAY RESULT IN LEGAL PROSECUTION AND CIVIL AND CRIMINAL          #
# PENALTIES.                                                                  #
#              USE OF THIS SYSTEM CONSTITUTES CONSENT TO MONITORING.          #
###############################################################################
#    Continuing the login process constitutes acceptance of this policy.      #
###############################################################################

root@testvm1.both.org's password: <Enter the root password>

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'testvm1.both.org'"
and check to make sure that only the key(s) you wanted were added.

root@testvm1:~#

Testing

Testing is an important aspect of installing this new software, so let’s start with some basics. As root on the local host, testvm1 in my case, log in to the local host using SSH. Be sure to enter yes and press Enter when asked if you want to continue connecting. You must perform this step or the backups of the local host won’t work.

root@testvm1:~# ssh testvm1
The authenticity of host 'testvm1 (192.168.0.101)' can't be established.
ED25519 key fingerprint is SHA256:i4XXP20X4d+5A8mIprOBq9oVNH2JGo5TGBu2vF7ICWM.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'testvm1' (ED25519) to the list of known hosts.

root@testvm1:~#

After verifying that this works, type exit and press Enter to close this child shell session.

There are a couple of additional things to check before performing the simple backup configured in /usr/local/etc/rsbu.conf. The first command below shows the revised /etc/fstab with the entry for /media/MyBackups. The second lists the /media directory to confirm that the MyBackups mount point exists.

root@testvm1:~# cat /etc/fstab 

#
# /etc/fstab
# Created by anaconda on Mon Jun 10 11:23:45 2024
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=8a1ca039-dace-4628-b4f8-4dd7a5d79647 /                       ext4    defaults        1 1
UUID=4dbfc467-4ea3-4e0b-8260-30a922319de1 /boot                   ext4    defaults        1 2
UUID=3bf9194e-54b0-4e66-ab37-b04b715c2a6f /home                   ext4    defaults        1 2
UUID=75bafb60-fa9b-4c51-9526-bf31b59ebd03 /tmp                    ext4    defaults        1 2
UUID=8a0d9c1e-c552-4aa5-84a8-76f82fbd07cf /usr                    ext4    defaults        1 2
UUID=ee939988-bd3f-4723-9d99-10e1e2baffcd /var                    ext4    defaults        1 2
LABEL=MyBackups   /media/MyBackups    ext4    noauto,defaults      0 0

root@testvm1:~# ll /media
total 4
drwxr-xr-x 2 root root 4096 Jun 11 09:43 MyBackups
root@testvm1:~#
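
The same two checks can also be scripted. The following is a minimal sketch rather than anything shipped with rsbu: the function name fstab_has_mountpoint is hypothetical, and it only reads the file it is given.

```shell
#!/usr/bin/env bash
# Sketch: report whether an fstab-style file has an entry for a mount point.
# fstab_has_mountpoint is a hypothetical helper, not part of rsbu.
fstab_has_mountpoint() {
    local fstab="$1" mountpoint="$2"
    # Skip comment lines; field 2 of each remaining entry is the mount point.
    awk -v mp="$mountpoint" '
        $1 !~ /^#/ && $2 == mp { found = 1 }
        END { exit !found }
    ' "$fstab"
}

# Example: check the real fstab and the mount point directory itself.
if fstab_has_mountpoint /etc/fstab /media/MyBackups && [ -d /media/MyBackups ]; then
    echo "fstab entry and mount point directory are both present"
else
    echo "missing fstab entry or mount point directory"
fi
```

Because the function takes the file name as an argument, it can be tried against a copy of /etc/fstab before trusting it on the real one.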

Now let’s verify that the rsbu program is reading the rsbu.conf file. We’ll list the backup devices, then the configured hosts and the directories to be backed up. The lowercase l (el) option lists the configured backup devices, and the uppercase L option lists the configured hosts. In this case we have only a single device, which is device number 1. We also have only a single host. The hostname for the local host is determined by the script, which obtains it from the $HOSTNAME environment variable. I have set it up this way so that during this initial installation and testing phase, it won’t be necessary for you to make any changes to the rsbu.conf file. We’ll get to that later.

root@testvm1:/usr/local/etc# rsbu -l
Device 1 = External USB hard drive

root@testvm1:/usr/local/etc# rsbu -L
Host 1 = testvm1;--exclude .gvfs --exclude Cache ;:/root :/home :/usr/local

This looks correct and, with the exception of the hostname, should be the same for you.

Performing a simple backup

It’s now time to perform an actual backup as a final test. The default configuration in /usr/local/etc/rsbu.conf will back up the /root, /home, and /usr/local directories on the local host to the USB drive that you prepared. The b option tells rsbu to perform a backup based on the details in the rsbu.conf file. The backup device is automatically unmounted at the end of the backup.

root@testvm1:~# rsbu -b

Mount the backup device again, and verify that the backup has been performed. I like the tree command for this, as in Figure 1.

root@testvm1:~# mount /media/MyBackups/
root@testvm1:~# tree /media
/media
└── MyBackups
    ├── Backups
    │   └── testvm1.both.org
    │       └── 2024-06-12-RSBackup
    │           ├── home
    │           │   ├── dboth
    │           │   │   ├── .bash_history
    │           │   │   ├── .bash_logout
    │           │   │   ├── .bash_profile
    │           │   │   ├── .bashrc
    │           │   │   ├── .cache
    │           │   │   │   ├── abrt
    │           │   │   │   │   └── applet_dirlist
    │           │   │   │   ├── gstreamer-1.0
    │           │   │   │   │   └── registry.x86_64.bin
    │           │   │   │   ├── imsettings
    │           │   │   │   │   ├── log
    │           │   │   │   │   └── log.bak
    │           │   │   │   ├── obexd
    │           │   │   │   ├── sessions
    │           │   │   │   ├── tracker3
    │           │   │   │   │   ├── files
    │           │   │   │   │   │   ├── errors
    │           │   │   │   │   │   ├── first-index.txt
    │           │   │   │   │   │   ├── http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Audio.db
<SNIP>
    │           │   │   │   │   │   ├── http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Video.db-wal
    │           │   │   │   │   │   ├── last-crawl.txt
    │           │   │   │   │   │   ├── meta.db
    │           │   │   │   │   │   ├── meta.db-shm
    │           │   │   │   │   │   ├── meta.db-wal
    │           │   │   │   │   │   └── ontologies.gvdb
    │           │   │   │   │   └── rss
    │           │   │   │   │       ├── meta.db
    │           │   │   │   │       ├── meta.db-shm
    │           │   │   │   │       ├── meta.db-wal
    │           │   │   │   │       └── ontologies.gvdb
    │           │   │   │   └── xfce4
    │           │   │   │       └── notifyd
    │           │   │   │           └── log.sqlite
    │           │   │   ├── .config
    │           │   │   │   ├── abrt
    │           │   │   │   ├── dconf
    │           │   │   │   │   └── user
    │           │   │   │   ├── imsettings
    │           │   │   │   ├── pulse
    │           │   │   │   │   └── cookie
    │           │   │   │   ├── Thunar
    │           │   │   │   ├── Thunar
    │           │   │   │   │   └── uca.xml
    │           │   │   │   ├── user-dirs.dirs
    │           │   │   │   ├── user-dirs.locale
    │           │   │   │   └── xfce4
    │           │   │   │       ├── desktop
    │           │   │   │       │   ├── icons.screen0-1256x957.rc
    │           │   │   │       │   └── icons.screen.latest.rc -> /home/dboth/.config/xfce4/desktop/icons.screen0-1256x957.rc
    │           │   │   │       ├── panel
    │           │   │   │       │   ├── launcher-17
    │           │   │   │       │   │   └── 17180401151.desktop
    │           │   │   │       │   ├── launcher-18
    │           │   │   │       │   │   └── 17180401152.desktop
    │           │   │   │       │   ├── launcher-19
    │           │   │   │       │   │   └── 17180401153.desktop
    │           │   │   │       │   └── launcher-20
    │           │   │   │       │       └── 17180401164.desktop
    │           │   │   │       ├── xfconf
    │           │   │   │       │   └── xfce-perchannel-xml
    │           │   │   │       │       ├── displays.xml
    │           │   │   │       │       ├── thunar.xml
    │           │   │   │       │       ├── xfce4-desktop.xml
    │           │   │   │       │       ├── xfce4-keyboard-shortcuts.xml
    │           │   │   │       │       ├── xfce4-notifyd.xml
    │           │   │   │       │       ├── xfce4-panel.xml
    │           │   │   │       │       ├── xfce4-power-manager.xml
    │           │   │   │       │       └── xfwm4.xml
    │           │   │   │       └── xfwm4
    │           │   │   ├── Desktop
    │           │   │   ├── development
    │           │   │   │   ├── .bash_logout
    │           │   │   │   └── .config
    │           │   │   │       └── mc
    │           │   │   │           ├── DavidsGoTar.ini
    │           │   │   │           ├── ini
    │           │   │   │           └── panels.ini
    │           │   │   ├── Documents
    │           │   │   ├── Downloads
    │           │   │   ├── .gnupg
    │           │   │   │   └── private-keys-v1.d
    │           │   │   ├── .ICEauthority
    │           │   │   ├── .local
    │           │   │   │   ├── share
    │           │   │   │   └── state
    │           │   │   │       └── wireplumber
    │           │   │   │           └── stream-properties
    │           │   │   ├── .mozilla
    │           │   │   │   ├── extensions
    │           │   │   │   └── plugins
    │           │   │   ├── Music
    │           │   │   ├── Pictures
<SNIP>
    │           │   ├── .ssh
    │           │   │   ├── authorized_keys
    │           │   │   ├── id_ed25519
    │           │   │   ├── id_ed25519.pub
    │           │   │   ├── known_hosts
    │           │   │   └── known_hosts.old
    │           │   ├── .tcshrc
    │           │   ├── toprc
    │           │   ├── updateGeoIPDatabases
    │           │   ├── updateVBExtPack.sh
    │           │   ├── UpgradeFedora.sh
    │           │   ├── .viminfo
    │           │   ├── xinitrc
    │           │   └── xinitrc.alt
    │           └── TimeStamp
    └── lost+found

120 directories, 130 files

Figure 1: The existence of these directories and files indicates that the backup was successfully created.

Since this backup was performed on a test VM, I have no personal data files in my home directory, just configuration files for the most part.
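
If tree isn’t installed, a quick existence check on the dated directory does the same job. This is a rough sketch that assumes the BasePath/host/YYYY-MM-DD-RSBackup layout visible in Figure 1; backup_dir_for is a hypothetical helper name, and the host and date are the ones from my test VM.

```shell
#!/usr/bin/env bash
# Sketch: check that the backup directory for a given date exists.
# The BasePath/host/YYYY-MM-DD-RSBackup layout is read off Figure 1;
# backup_dir_for is a hypothetical helper, not part of rsbu.
backup_dir_for() {
    local base_path="$1" host="$2" day="${3:-$(date +%Y-%m-%d)}"
    printf '%s/%s/%s-RSBackup\n' "$base_path" "$host" "$day"
}

# Replace the host name and date with your own values.
dir=$(backup_dir_for /media/MyBackups/Backups testvm1.both.org 2024-06-12)
if [ -d "$dir" ]; then
    echo "Found backup: $dir"
else
    echo "No backup found at: $dir"
fi
```

Leaving off the third argument makes the helper default to today’s date.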

Backup Strategy

I use a backup strategy that employs the capabilities of the rsync command in my script. I designed it to create a date-sequence of backups for each host in my network. The backup drives end up with a structure similar to the one shown in the output from the tree command above. You can see a directory-level sample of how this would look for multiple hosts over multiple days in Figure 2.

This makes it easy for both SysAdmins and users to locate specific files that might need to be restored.

/-
 |
 /-path to backup media
           |
            /Backups
               |
               |--/host1
               |   |--/2018-01-01
               |   |      |--/etc
               |   |      |--/home
               |   |      |--/var
               |   |      |--/usr/local
               |   V      V
               |   |--2018-01-02
               |   |      |--/etc
               |   |      |--/home
               |   |      |--/var
               |   |      |--/usr/local
               |   V      V
               |   |--2018-01-03
               |   |      |--/etc
               |   |      |--/home
               |   |      |--/var
               |   |      |--/usr/local
               |   V      V
               |--host2
               |   |--2018-01-01
               |   |      |--/etc
               |   |      |--/home
               |   |      |    |
               |   |      |    |--/student
               |   |      |    |     |
               |   |      |    |     |--/file1.txt (Unchanged) 
               |   |      |    |     |
               |   |      |    V     V
               |   |      |--/var
               |   |      |--/usr/local
               |   |      V
               |   |--2018-01-02
               |   |      |--/etc
               |   |      |--/home
               |   |      |    |
               |   |      |    |--/student
               |   |      |    |     |
               |   |      |    |     |--/file1.txt (Unchanged)
               |   |      |    V     V
               |   |      |--/var
               |   |      |--/usr/local
               |   |      |
               |   |      V
               |   |      
               |   |--2018-01-03
               |   |      |--/etc
               |   |      |--/home
               |   |      |    |
               |   |      |    |--/student
               |   |      |    |     |
               |   |      |    |     |--/file1.txt  (Changed)
               |   |      |    V     V
               |   |      |--/var
               |   |      |--/usr/local
               |   |      |
               V   V      V

Figure 2. The directory structure for my backup data storage disks.

Starting with an empty disk on January 1, the rsbu script makes a complete backup for each host of all the files and directories that I have specified in the configuration file. This first backup can take several hours if you have a lot of data like I do.

On January 2, the rsync command uses the --link-dest= option to create a complete new directory structure identical to that of January 1, then it looks for files that have changed in the source directories. If any have changed, a copy of the original file from January 1 is made in the January 2 directory and then the parts of the file that have been altered are updated from the original.

After the first backup onto an empty drive, the backups take very little time because the hard links are created first, and then only the files that have been changed since the previous backup need any further work. The resulting backups will look similar to that in Figure 1.

Figure 2 also shows a bit more detail for the host2 series of backups for one file, /home/student/file1.txt, on the dates January 1, 2, and 3. On January 2 the file has not changed since January 1. In this case, the rsync backup does not copy the original data from January 1. It simply creates a directory entry in the January 2 directory that is a hard link to the file in the January 1 directory, which is a very fast procedure. We now have two directory entries pointing to the same data on the hard drive. On January 3, the file has been changed. In this case, the data for ../2018-01-02/home/student/file1.txt is copied to the new directory, ../2018-01-03/home/student/file1.txt, and any data blocks that have changed are then copied to the backup file for January 3. These strategies, which are implemented using features of the rsync program, allow backing up huge amounts of data while saving disk space and much of the time that would otherwise be required to copy data files that are identical.
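
The hard-link behavior is easy to see on a pair of throwaway directories. The sketch below is not rsbu’s own code; it assumes only that rsync is installed, and every path it touches lives under a temporary directory created by mktemp.

```shell
#!/usr/bin/env bash
# Sketch: demonstrate rsync --link-dest on throwaway directories.
# Nothing here is taken from rsbu itself.
link_dest_demo() {
    local work src f
    work=$(mktemp -d) || return 1
    src="$work/src"
    mkdir -p "$src"
    echo "never changes" > "$src/file1.txt"
    echo "version 1"     > "$src/file2.txt"

    # "January 1": full copy into an empty backup directory.
    rsync -a "$src/" "$work/2018-01-01/"

    # Change one file; the new size guarantees rsync notices the change.
    echo "version 2, now longer" > "$src/file2.txt"

    # "January 2": unchanged files become hard links into the January 1 tree.
    rsync -a --link-dest="$work/2018-01-01" "$src/" "$work/2018-01-02/"

    for f in file1.txt file2.txt; do
        # -ef is true when both names refer to the same inode.
        if [ "$work/2018-01-01/$f" -ef "$work/2018-01-02/$f" ]; then
            echo "$f: hard link to previous backup"
        else
            echo "$f: new copy"
        fi
    done
    rm -rf "$work"
}

if command -v rsync >/dev/null 2>&1; then
    link_dest_demo
else
    echo "rsync is not installed; skipping demo"
fi
```

The unchanged file shares an inode across both dated directories, so it occupies disk space only once, while the changed file gets its own copy in the newer directory.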

One of my procedures is to run the backup script twice each day from a single systemd timer. The first iteration performs a backup to an internal 4TB hard drive. This is the backup that is always available and always at the most recent version of all my data. If something happens and I need to recover one file or all of them, the most I could possibly lose is a few hours’ worth of work.

The second backup is made to one of a rotating series of 4TB external USB hard drives. I take the most recent drive to my safe deposit box at the bank at least once per week. If my home office is destroyed and the backups I maintain there are destroyed along with it, I just have to get the external hard drive from the bank and I have lost at most a single week of data. That type of loss is easily recovered.

The drives I am using for backups, not just the internal hard drive but also the external USB storage devices that I rotate weekly, never fill up. This is because the rsbu script I wrote checks the ages in days of the backups on each drive before a new backup is made. If there are any backups on the drive that are older than the specified number of days, they are deleted. The script uses the find command to locate these backups. The number of days is specified in the rsbu.conf configuration file.
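
To give a feel for how find can pick out expired backups, here is a minimal sketch. The exact find options rsbu uses are not shown in this article, so the ones below are an assumption, and this version only lists directories rather than deleting anything. The touch -d syntax is GNU-specific.

```shell
#!/usr/bin/env bash
# Sketch: list backup directories older than a retention age in days.
# The find options are an assumption, not copied from rsbu, and this
# version only prints what a cleanup pass would remove.
list_expired_backups() {
    local host_dir="$1" age_days="$2"
    # -mindepth/-maxdepth 1 restricts the search to the dated directories
    # themselves; -mtime +N matches those last modified more than N days ago.
    find "$host_dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$age_days" | sort
}

# Demo on a throwaway directory tree.
demo=$(mktemp -d)
mkdir "$demo/2024-01-01-RSBackup" "$demo/2024-06-12-RSBackup"
touch -d "90 days ago" "$demo/2024-01-01-RSBackup"   # GNU touch syntax
list_expired_backups "$demo" 60
rm -rf "$demo"
```

Printing the candidates first is a sensible habit with any age-based cleanup, since a wrong path or age value is much cheaper to catch before anything is deleted.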

Of course, after a complete disaster, I would first have to find a new place to live with office space for my wife and me, purchase parts and build new computers, restore from the remaining backups, and then recreate any lost data.

Tip: I have looked at a number of expensive commercial backup programs over the years. None of them are as easy to use as my script using rsync and some of them are actually just commercial front-ends for an rsync back-end.

Recovery Testing

No backup regimen would be complete without testing. You should regularly test recovery of random files or entire directory structures to ensure not only that the backups are working, but that the data in the backups can be recovered for use after a disaster. I have seen too many instances where a backup could not be restored for one reason or another and valuable data was lost because the lack of testing prevented discovery of the problem.

Just select a file or directory to test and restore it to a test location such as /tmp so that you won’t overwrite a file that may have been updated since the backup was performed. Verify that the files’ contents are as you expect them to be. Restoring files from a backup made using the rsync commands above is simply a matter of finding the file you want to restore from the backup and then copying it to the location to which you want to restore it.
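
A restore test along these lines can be sketched in a few lines of Bash. The helper name restore_and_compare is hypothetical, and the demo uses throwaway files as stand-ins for a real backup and a real live file.

```shell
#!/usr/bin/env bash
# Sketch: restore one file from a backup into a scratch directory under
# /tmp and compare it with a reference copy. restore_and_compare is a
# hypothetical helper, not part of rsbu.
restore_and_compare() {
    local backup_file="$1" reference_file="$2" scratch
    scratch=$(mktemp -d /tmp/restore-test.XXXXXX) || return 1
    # -a preserves permissions and timestamps where possible.
    cp -a "$backup_file" "$scratch/" || return 1
    if cmp -s "$scratch/$(basename "$backup_file")" "$reference_file"; then
        echo "restored copy matches: $scratch"
    else
        echo "restored copy differs: $scratch"
        return 1
    fi
}

# Demo with stand-in files.
demo=$(mktemp -d)
echo "important data" > "$demo/live.txt"
mkdir "$demo/backup"
cp -a "$demo/live.txt" "$demo/backup/live.txt"
restore_and_compare "$demo/backup/live.txt" "$demo/live.txt"
rm -rf "$demo"
```

A "differs" result is not necessarily a failure, because the live file may simply have changed since the backup was made. The restored copy is left in the scratch directory so it can be inspected by hand.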

I have had a few circumstances where I have had to restore individual files and, occasionally, a complete directory structure. I have had to restore the entire contents of a hard drive on a couple occasions. Most of the time this has been self-inflicted when I accidentally deleted a file or directory. At least a few times it has been due to a crashed hard drive. So those backups do come in handy.

Configuring hosts and directories to back up

Now that you’ve completed a successful test backup, you need to look at the rsbu.conf file in order to understand how to configure it for the specifics of your environment. Although the rsbu.conf file shown in Figure 3 is fairly simple and, I would like to think, self-explanatory, these things rarely are. Get ready for a new revelation about how cool Bash is!

################################################################################
#                               rsbu.conf                                      #
################################################################################
################################################################################
################################################################################
#                                                                              #
#  Copyright (C) 2021 David Both                                               #
#  LinuxGeek46@both.org                                                        #
#                                                                              #
#  This program is free software; you can redistribute it and/or modify        #
#  it under the terms of the GNU General Public License as published by        #
#  the Free Software Foundation; either version 3 of the License, or           #
#  (at your option) any later version.                                         #
#                                                                              #
#  This program is distributed in the hope that it will be useful,             #
#  but WITHOUT ANY WARRANTY; without even the implied warranty of              #
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the               #
#  GNU General Public License for more details.                                #
#                                                                              #
#  You should have received a copy of the GNU General Public License along     #
#  with this program; if not, If not, see <https://www.gnu.org/licenses/>.     #
#                                                                              #
################################################################################
################################################################################
################################################################################
# NOTE: We now handle multiple backup devices using arrays for the various     #
# data variables.                                                              #
################################################################################
# Configuration file for rsbu, David's RSYNC BackUp program. 
################################################################################
# Always unmount the backup medium. This is for data safety 
unmount=1
#
################################################################################
# The Backup Section contains arrays for defining the backups retention ages   #
# and definitions of the backup media and their mount points.                  #
################################################################################

# Set the oldest age in days of the backup files to retain.
# Files older than this number of days will be deleted.
# Use a 1 day retention to only have one backup with no dates.
Age[1]=60
# Age[2]=90
#

################################################################################
# The path to the media mount point. The mount point must already be           #
# configured in /etc/fstab or this program will not work.                      #
################################################################################
MountPoint[1]="/media/MyBackups"
# MountPoint[2]="/media/MyBackups2"
################################################################################
# The path to the backup directory which may be different from the mount-point #
# especially in cases of NFS or SAMBA mounts                                   #
################################################################################
BasePath[1]="/media/MyBackups/Backups"
# BasePath[2]="/media/MyBackups2/Backups"
################################################################################
# The device name is human readable text name                                  #
################################################################################
DeviceName[1]="External USB hard drive"
# DeviceName[2]="External HDD"

################################################################################
# The hosts section contains the hostnames and a list of directories to be     #
# backed up. Each is a semicolon separated line with the hostname, a list      #
# of directories to back up and a list of exclusion patterns.                  #
# The pattern of these lines is as follows:                                    #
# hostname;excludes;list of directories to back up                             #
# Numbering must be sequential as this creates an array used by rsbu.          #
################################################################################

Backup[1]="$ThisHost;--exclude .gvfs --exclude Cache ;:/root :/home :/usr/local"
# Backup[2]="mycomputer2;--exclude Cache --exclude yumdb --exclude .gvfs  ;:/root :/etc :/home :/usr/local :/var"

Figure 3: The rsbu.conf file defines the hosts and directories on each that are to be backed up.

The rsbu.conf file contains five one-dimensional indexed arrays. I didn’t realize that Bash supported arrays until I wrote this code.[3] I can use them even if they’re unidimensional. Many languages I’ve used support multidimensional arrays, but we can do everything we need with Bash and its one-dimensional ones. Bash arrays are zero-based, so the first possible subscript is zero, as in Item[0]. The rsbu.conf file starts its numbering at 1 instead, so Item[1] refers to the first backup device or host.
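
As a quick illustration of how such arrays behave, the sketch below fills two of them starting at subscript 1 and walks the subscripts that are actually set. The values mirror Figure 3, with the commented-out second device enabled purely for the example.

```shell
#!/usr/bin/env bash
# Sketch: one-dimensional indexed arrays numbered from 1, as in rsbu.conf.
# The second device is enabled here only for illustration.
Age[1]=60
Age[2]=90
DeviceName[1]="External USB hard drive"
DeviceName[2]="External HDD"

# "${!Age[@]}" expands to the subscripts that are actually set (1 and 2),
# so the loop works even though element 0 was never assigned.
for i in "${!Age[@]}"; do
    echo "Device $i = ${DeviceName[$i]} (keep ${Age[$i]} days)"
done
```

Bash indexed arrays do not need contiguous subscripts, which is why leaving element 0 unset causes no problem.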

There are five arrays defined in this configuration file.

  1. Age[1]=60: The oldest age in days of the backup files to retain, 60 days in this case. Files older than this will be deleted. The [1] indicates that this is the first item used in the array and thus the first backup device. The second item is commented out, but would be used if a second type of backup were used, such as an internal hard drive or a different external hard drive. The age value can only be set in the rsbu.conf file.
  2. MountPoint[1]="/media/MyBackups": Defines the mountpoint for the first backup device.
  3. BasePath[1]="/media/MyBackups/Backups": Defines the fully qualified base path for the first backup device. You can see this in Figure 1.
  4. DeviceName[1]="External USB hard drive": The device name is a human-readable text name that has no effect on the actual backup process. It’s just a good way to identify the device for SysAdmins.
  5. Backup[1]="$ThisHost;--exclude .gvfs --exclude Cache ;:/root :/home :/usr/local": This array is used to define the list of hosts and the directories to be backed up for each host. It can also be used to define specific patterns for files and directories that are to be excluded from the backup. Usually, instead of using the variable $ThisHost, the actual hostname is given. Note that everything in this line that comes after the hostname is options and arguments for the rsync command, and is passed to rsync at the time each host is backed up. Multiple hosts can be defined as shown in Figure 3, with different directories specified to be backed up for each host.
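
To make the three-field format of a Backup entry concrete, the following sketch splits one entry on its semicolons. It illustrates the format only and is not the parsing code rsbu actually uses; parse_backup_entry is a hypothetical name, and ThisHost is set by hand here.

```shell
#!/usr/bin/env bash
# Sketch: split one Backup[] entry into its three semicolon-separated fields.
# This illustrates the format; it is not rsbu's actual parsing code.
ThisHost="testvm1"
Backup[1]="$ThisHost;--exclude .gvfs --exclude Cache ;:/root :/home :/usr/local"

parse_backup_entry() {
    local entry="$1" host excludes dirs
    # With IFS set to ';' only for this read, the line splits into
    # hostname, exclusion options, and directory list.
    IFS=';' read -r host excludes dirs <<< "$entry"
    echo "host:     $host"
    echo "excludes: $excludes"
    echo "dirs:     $dirs"
}

parse_backup_entry "${Backup[1]}"
```

Keeping the separator a semicolon means spaces inside the exclude options and the directory list survive intact within their fields.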

Preparing additional backup media

You can prepare additional backup media. The instructions are the same as in the first part of the preparation as shown above.

root@testvm1:~# rsbu-setup -p

Multiple backup devices, that is, more than a single external hard drive, can be used for a given mountpoint so long as each has the proper ext4 filesystem label applied to it (MyBackups in this example). The device preparation does this. This allows multiple devices to be used in a rotation so that the newest backup can be stored off-site for additional security in case the main site is destroyed.

Do It Yourself (DIY)

Since this backup tool is already working for you — or at least it should be if you’ve followed along — the next step is for you to make a copy of the existing, working rsbu.conf file. Then make any changes necessary to the Backup[1] array item to create the backups you need for the current host. Be sure to specify the hostname instead of using the variable. Test this before going any further.

Next, copy the public key from this host to the next host you want to perform backups for. The ssh-copy-id command is how you’ll do this.

# ssh-copy-id hostname

Then respond to the prompts as you did for this section of the initial preparation.

Add a new line to the Backup array. Specify the hostname and the directories to be backed up. Then test to ensure that both hosts are properly backed up.

Repeat for additional hosts.
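
Before relying on a newly added host, it can be worth confirming that key-based logins really work without a password prompt. The sketch below is one way to do that; check_key_login is a hypothetical helper, and the host names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: confirm that key-based root logins work for a backup host.
# check_key_login is a hypothetical helper, not part of rsbu.
check_key_login() {
    local host="$1"
    # BatchMode=yes makes ssh fail instead of prompting, so success means
    # the public key was accepted. It also fails if the host key has not
    # been accepted yet.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
        echo "$host: key login OK"
    else
        echo "$host: key login FAILED"
        return 1
    fi
}

# Usage (replace with your own host names):
#   for h in testvm1 mycomputer2; do check_key_login "$h"; done
```

A failure here usually means either that ssh-copy-id has not been run for that host or that the host key prompt has not yet been answered with yes.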

Automation

Running the rsbu program is easy. But what if you need to do it at oh-dark-thirty when no one is using the computers? Well — that’s another article.

Summary

Backups are an incredibly important part of our jobs as SysAdmins. I have experienced many instances where backups have enabled rapid operational recovery for places I have worked as well as for my own business and personal data.

We use the rsync command as the basis for making backups that save both time and storage space. These backups can also be directly accessible to regular users. The scripts I have provided for download can get you started using this advanced and powerful backup method.

Like everything else, backups are all about what you need. Whatever you do – do something! Figure out how much pain you would have if you lost everything – data, computers, hard copy records – everything. The pain includes the cost of replacing the hardware and the cost of the time required to restore data that was backed up and to recover data that was not backed up. Then plan and implement your backup systems and procedures accordingly.

There are many options for performing and maintaining data backups. I do what works for me and have never had a situation where I lost more than a few hours’ worth of data.


Footnotes

  1. Both, David, Using and Administering Linux: Zero to SysAdmin, Apress, 2023
  2. Wikipedia, Public Key Cryptography
  3. Bash also supports associative arrays that are sometimes called hash tables. Those are outside the scope of this article.