sync — The command you never heard of
Have you ever heard of the sync command? Have you heard of it but then forgotten all about it — like me? The sync command is one that I tend to ignore most of the time. I mean, I have added it to many of my scripts but then never think of it again.
But what does it do and why do we need it?
The basic problem
The root problem is that data isn’t written to storage devices at the instant we press the “Save” button in our LibreOffice applications, or in any other application. Our data is first stored in an area of RAM designated by Linux for cache.
What is cache?
Cache speeds up writes from the applications because RAM is faster than disk.
The fact is that the much slower spinning hard drives can’t always keep up with data writes from many system and user level programs all trying to save data at the same time. Without cache, the program currently writing to the hard drive would block all the other programs, so they could only proceed serially, one after another, until they all had their turn. By writing data to cache, application and system programs can return to their tasks immediately, so that there is only an imperceptible delay.
We need to know a little about buffers and the difference between them and cache. The bottom line is that there’s not much difference. They are both areas of RAM designated by the operating system to store data until it can be used by an application or placed in some form of storage like an HDD or SSD. They are so similar that memory management tools such as free and top report them as a single combined figure by default.
We can use the free command to display the combined amount of buffers and cache.
# free
               total        used        free      shared  buff/cache   available
Mem:        16367432      712616    14348668       11720     1627392    15654816
Swap:        8388604           0     8388604
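If you are curious about what sits behind that combined buff/cache figure, the kernel exposes separate counters in /proc/meminfo. The following is a minimal sketch, assuming a Linux host; the function name show_cache_fields is my own invention for illustration. Dirty is data in RAM that has not yet been written to storage, and Writeback is data being written right now.

```shell
#!/bin/sh
# Sketch: list the kernel counters related to buffers and cache.
# Assumes Linux, where /proc/meminfo is available.
# show_cache_fields is a hypothetical helper name, not a standard command.
show_cache_fields() {
    # Buffers:   RAM used for block-device buffers
    # Cached:    RAM used for cached file contents
    # Dirty:     cached data waiting to be written to storage
    # Writeback: cached data currently being written to storage
    grep -E '^(Buffers|Cached|Dirty|Writeback):' /proc/meminfo
}

show_cache_fields
```

The values are reported in kB and change from moment to moment, so expect different numbers every time you run it.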
The immediate problem
One side effect of caching is that it might not be clear to users when all of the cached data has been committed to disk. This leaves open the possibility that a forced power-off or reboot might cause data loss if the data residing in cache has not been flushed to persistent storage.
We tend to assume that our data is safe once the program terminates or we press the “Save” button, but at that point it may still exist only in cache.
sync to the rescue
The sync command is a simple one, but it resolves that little problem for us. The sync program is designed to flush all data held in cache to storage, thus making it safe for us to reboot or shut down the host. After editing files with Vim or other text editors, or after performing updates, installations, and similar tasks, experienced SysAdmins typically run sync twice in a row to ensure that all data has been flushed.
$ sync ; sync
That’s it. The sync program produces no output. It does have a couple of options you can use when you give it one or more file names as arguments. The -d (data) option flushes only the data of those files and not any unneeded metadata, which can save time. The -f (file-system) option syncs the entire filesystem containing those files.
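Here is a short sketch of those options in use. It assumes the GNU coreutils version of sync, which accepts file names as arguments; the scratch file created by mktemp is only there for illustration.

```shell
#!/bin/sh
# Sketch: the three ways of calling sync described above.
# Assumes GNU coreutils sync, which accepts file arguments.

# Create a throwaway scratch file so there is something to flush.
scratch=$(mktemp)
echo "some important data" > "$scratch"

# Flush only the data of this one file, skipping unneeded metadata.
sync -d "$scratch"

# Flush the entire filesystem that contains this file.
sync -f "$scratch"

# With no arguments, flush all cached writes on the host.
sync

# Clean up the scratch file.
rm -f "$scratch"
```

Each call returns silently when it succeeds, so the only visible sign that anything happened is the short pause while the data is written.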
Is sync still needed?
The short answer is “yes.”
The long answer is that the sync command was designed in a time when tape storage was in common use and disk drives were very expensive. It takes a relatively long time for cached data to be written to tape. The sync command was used to ensure that all data was flushed from the cache onto tape. Early disk drives were much slower than the ones we typically use on our personal computers today.
However, I’ve seen the Linux shutdown wait on a busy program for a minute or so. If the program doesn’t respond to a termination request before the timeout expires, the system just kills it, even if it hasn’t completed writing data to storage.
I try to perform a clean shutdown of all the programs running on my system, then run sync ; sync and then do the reboot or power-off. I use the double sync after performing many tasks, despite the fact that a graceful shutdown of a modern Linux computer is supposed to flush all the cache to storage.
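For anyone who wants some reassurance before hitting the power button, the following sketch reports how much unwritten data the kernel is holding before and after a sync. It assumes a Linux host with /proc/meminfo; the function names dirty_kb and flush_and_report are hypothetical helpers, not standard commands.

```shell
#!/bin/sh
# Sketch: show the amount of unwritten (dirty) data before and after sync.
# Assumes Linux; dirty_kb and flush_and_report are made-up helper names.

dirty_kb() {
    # Print the value of the Dirty field from /proc/meminfo, in kB.
    while read -r name value unit; do
        if [ "$name" = "Dirty:" ]; then
            echo "$value"
            return 0
        fi
    done < /proc/meminfo
    return 1
}

flush_and_report() {
    before=$(dirty_kb)
    sync    # flush cached writes to storage
    after=$(dirty_kb)
    echo "Dirty before sync: ${before} kB, after sync: ${after} kB"
}

flush_and_report
```

The second number will not necessarily be zero, because other programs keep writing while the script runs, but it should normally be small.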
That said, I’ve seldom encountered a problem that I could identify as being a result of uncommitted data being left in the cache. A few mangled files scattered over the 28 years I’ve been using Linux might have been caused by that, although there’s no way to prove it.
Today’s fast systems and hard drives, along with the advent of blazing fast SSDs, make these issues much less common. But there is still a slight chance that data loss might occur.
So my final, unequivocal answer is — maybe. Better safe than sorry.
One thought on “sync — The command you never heard of”
I loved reading today’s article about the sync command! I always use sync when running backups. To make a backup, I use rsync (unrelated, despite similar name) to copy my $HOME to an external USB drive, then run sync to force the system to flush the buffers. Using sync to flush the system buffers means I don’t have to wait as long when unmounting the USB drive, because I usually run it like this:
$ date ; rsync -a "$HOME"/ /path/to/backup/ ; date ; sync ; date
(Once I see the third timestamp printed, I know using umount won’t take too long.)
I noticed David uses two sync commands. From my RHEL days, I remembered when we learned about the sync command. Out of habit, I typed it three times. The trainer asked if I had learned Unix on Sun systems (I did; my first job was maintaining a network of Apollo and SunOS systems).
She said you can usually tell where a person learned their first Unix by how many times they ran the sync command: 2 times, it was probably AIX or HP-UX. 3 times, it was probably SunOS. But she said Linux does it better: you only need to run sync once.
In more recent years, I learned that the number of times you run sync is more likely based on whether your system was tape-based. Running sync once flushed the buffers; but 2 or 3 times (depending on the system) would also force the system to rewind the tape. And yes, the Apollo and SunOS systems I maintained in the mid 1990s were all tape-based, and you installed the OS and updates from QIC tape.