rsync is wonderful, but ….

rsync /datastore/61/E4 /newserver/61/E4

is wrong and will mess you up!

Imagine, if you will, that you have a whole bunch of data stored on an old server, and you need to copy it to a new server. The rsync utility would be an obvious way to go. There are things about the job and rsync that you might want to tweak, though, and that’s where things get ugly. Part of this is bash’s fault.

Imagine, if you will, that your data store is 120 million small files (emails) stored in 256**3 directories. 256 cubed is 16,777,216 subdirectories.

The programmer who created the data store needed subdirectories to put all these files in. Linux doesn’t really like 20,000+ files in one directory; it is better to have more subdirectories, with fewer files per subdirectory. So the programmer started with a loop:

for i in $(seq 0 255); do mkdir "$(printf '%02X' "$i")"; done

Then the programmer did a change directory into each of those directories he just made, and did the exact same thing.

cd 00; for i in $(seq 0 255); do mkdir "$(printf '%02X' "$i")"; done; cd ..
cd 01; for i in $(seq 0 255); do mkdir "$(printf '%02X' "$i")"; done; cd ..
cd 02; for i in $(seq 0 255); do mkdir "$(printf '%02X' "$i")"; done; cd ..
...
cd FF; for i in $(seq 0 255); do mkdir "$(printf '%02X' "$i")"; done; cd ..

That gets you to 256 squared, which is 65,536 subdirectories.

And then the programmer did a change directory into each of those directories he just made, and did the exact same thing. All 65,536 second-level subdirectories got a third level of another 256 subdirectories. That gets you to 16,777,216, which is 256 cubed.

So your file server directory structure might contain this:

/datastore/61/E4/7D

Inside good old 61/E4/7D there might be twenty to thirty files, each one holding the content of an email, or a metadata file about the email. The programmer was pretty good about filling all of the datastore subdirectories evenly: nineteen files each, then twenty files each, then twenty-one files each. No Linux system is going to have a problem with twenty-one files in a subdirectory.

The only real problem here is if you need to traverse everything in /datastore: that takes forever.

Back to the problem of copying everything from /datastore to /newserver. Let’s assume that /newserver is on a different machine, and we are using a remote file system mount to make the remote machine appear to be a local disk (mount point).
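
For example, a minimal sketch assuming NFS (the export name here is made up):

# Hypothetical NFS mount: make the new server's storage appear locally as /newserver.
mkdir -p /newserver
mount -t nfs newserver:/export/newserver /newserver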

You might think the rsync command ought to look like this:

rsync --archive /datastore /newserver

There are two things that make this sub-optimal. First, it is single-threaded. Second, there is no progress feedback.

The single-threaded part isn’t so bad; it just means that we are losing speed to rsync overhead. The server has twelve cores, the network is 10 Gbps Fibre Channel, the /datastore disk has multiple spindles, but rsync was designed for slow networks way back when in the bad old days.

At this point, you might ask “why not do a straight cp -r” (copy command, recursive)? It’s not a terrible idea, but what if there were a network glitch? The entire cp -r would have to be started over, and every bit already copied would be copied again. This is where rsync shines: if a file in the destination is the same as in the source, the copy is skipped. cp -r also suffers from the same lack of progress feedback.

Did I mention that the 120 million files are also 9.3 terabytes of data? I really don’t want to get to 98% done and then have a network glitch force me to copy the whole 9.3 TB over again, which would be the case with cp -r.

The tests I’ve done indicate that four rsync commands, running simultaneously, copied the most data in the shortest period of time in my environment*. More than four rsync commands at once, and I started to saturate the disk channel. Fewer than four rsync commands, and something is sitting around twiddling its thumbs, waiting for rsync to finish whatever overhead it is working on and get busy with the copying again.

The other problem is the lack of progress feedback. The copy is going to take multiple days. It would be nice to know whether we are at 8% complete, 41% complete, or 93% complete, and to be able to compute that percentage.

Well, how about 64K rsync commands, each with a print statement naming the directory it is processing? And if we could run four of them in parallel, we would get the multiple-job speedup too.

You might think the rsync commands ought to look like this:

rsync --archive /datastore/00/00 /newserver/00/00
rsync --archive /datastore/00/01 /newserver/00/01
rsync --archive /datastore/00/02 /newserver/00/02
rsync --archive /datastore/00/03 /newserver/00/03
rsync --archive /datastore/00/04 /newserver/00/04
...
rsync --archive /datastore/FF/FF /newserver/FF/FF

but WOW would you ever be wrong!

Remember old /datastore/61/E4/7D up there? This format for rsync would put E4 from the source under E4 in the destination! In other words, although the source looks like this: /datastore/61/E4/7D, the destination would end up looking like this: /newserver/61/E4/E4/7D

To be done right, the command needs to look like this:

rsync --archive /datastore/00/00/* /newserver/00/00/
rsync --archive /datastore/00/01/* /newserver/00/01/
rsync --archive /datastore/00/02/* /newserver/00/02/
rsync --archive /datastore/00/03/* /newserver/00/03/
rsync --archive /datastore/00/04/* /newserver/00/04/
...
rsync --archive /datastore/FF/FF/* /newserver/FF/FF/

The source needs the trailing slash and asterisk so that the stuff underneath the source (not the source directory itself) gets copied into the destination (which also ends with a slash). A trailing slash on the source alone, with no asterisk, is rsync’s own way of saying the same thing.

Enter the problem where bash is a pain in the ass.

Well, before I go there, let me mention that it wasn’t too bad to write a Perl script to write this bash script, which does three things per source and destination pair:

echo "rsync --archive /datastore/00/00/* /newserver/00/00/"
rsync --archive /datastore/00/00/* /newserver/00/00/
echo "/newserver/00/00/" > /tmp/tracking_report_file

The first line prints the current status to the screen. The second line launches the rsync. The third line overwrites a file, /tmp/tracking_report_file, with the path of the last finished rsync.
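
My generator was a Perl script; a bash equivalent sketch (paths as described above, output file name made up) would be something like this:

# Sketch of a generator for the big bash script: emit the echo/rsync/tracking
# triple for every second-level XX/YY pair, in sorted order.
for a in $(seq 0 255); do
  for b in $(seq 0 255); do
    p=$(printf '%02X/%02X' "$a" "$b")
    echo "echo \"rsync --archive /datastore/$p/* /newserver/$p/\""
    echo "rsync --archive /datastore/$p/* /newserver/$p/"
    echo "echo \"/newserver/$p/\" > /tmp/tracking_report_file"
  done
done > copy_datastore.sh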

So, crank up screen first, launch the bash script, and some number of days from now, the copying will be done.
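
Something along these lines (the session and script names are mine):

screen -S datastore-copy          # start a named screen session
bash copy_datastore.sh            # inside it, launch the generated script
# detach with Ctrl-a d; reattach later with: screen -r datastore-copy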

That /tmp/tracking_report_file gives me a pair of hexadecimal pairs, which I can then use to compute percentage complete. For example, when /newserver/7F/FF updates to /newserver/80/00, then we are going to be just over 50% done.
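
The arithmetic is easy to script, too. A sketch, assuming the tracking file holds a line like /newserver/80/00/ as written above:

# Turn the last-finished pair into a rough percentage of the 65,536
# second-level directories.
last=$(cat /tmp/tracking_report_file)    # e.g. /newserver/80/00/
pair=${last#/newserver/}                 # -> 80/00/
pair=${pair//[^0-9A-Fa-f]/}              # -> 8000
count=$(( 16#$pair + 1 ))                # second-level directories finished
awk -v n="$count" 'BEGIN { printf "%.1f%% complete\n", n * 100 / 65536 }'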

Heck, I can detach from screen, and I don’t even have to watch the rsyncs happen. I mean, I do need to check in on them, but I don’t have to sit and watch. Better yet, I can take the same routine that converts the pair of hexadecimal pairs into a percentage complete and wrap it inside a cron job that sends an email. Progress status tracking accomplished!
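
A hypothetical crontab entry for that (the script wraps the percentage snippet above; the name and address are made up):

# Mail the computed progress percentage every six hours.
0 */6 * * * /usr/local/bin/copy_progress.sh | mail -s "datastore copy progress" me@example.com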

But this does not solve the single-threaded rsync problem.

And ultimately, I could not get it done.

What looked to be an okay solution was using the find command to feed xargs, which can run shell commands in parallel. I even got as far as using bash shell variables to build the rsync --archive /datastore/00/00/00 /newserver/00/00/00 part.

Okay, that would be 16 million smaller rsyncs instead of 64 thousand larger ones, but I might even be able to bump up the parallelism to six or eight or nine.
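
The shape of that pipeline would be roughly this (a sketch, assuming GNU find and xargs; note that this is the naive form, and the problem with it comes next):

# Feed every third-level directory to xargs and run four rsyncs at a time.
cd /datastore &&
find . -mindepth 3 -maxdepth 3 -type d -printf '%P\n' |
  xargs -P 4 -I{} rsync --archive "/datastore/{}" "/newserver/{}"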

But the serious problem the rsync --archive /datastore/00/00/00 /newserver/00/00/00 command has is the naive one: without the trailing slash and asterisk, the source directory is going to end up nested underneath the destination. I need to put the trailing slash and asterisk on there.

And bash says “that’s a nope.”

The trailing slash and asterisk kept getting culled from the generated command by the time it actually ran, because (reasons).

Oh well. The find command also spits out the directories it finds in rather random order. My bash script, with its sequential rsyncs in sorted order, means that the last one completed really is some-percentage-of-the-total done. But if find chooses to spit out /datastore/b3/8e/76 before /datastore/00/00/00, then my status tracking doesn’t actually work. I would be forced to traverse all of /newserver/ and count which of the 17 million directories are complete, which would take freaking forever.

Yes, I said 17 million. Did you notice that the programmer who created the subdirectories made some of them in lowercase hexadecimal? That happened when we brought in another email system (Exchange). Lovely.

*From the last time I did this migration, although that was on a four-core box at the time.
