Most Internet forums consider "large" volumes to be a few hundred gigabytes, or a few terabytes, and the techniques commonly advised apply to those scenarios. At petabyte scale, however, most common-practice techniques no longer apply: the data sets are simply too vast to be manipulated in a reasonable time.

For example, it is commonly assumed that rsync can be used to move or copy data between two such filesystems. rsync is a great utility, but it is slow: it runs single-threaded, and for remote copies it carries its traffic over SSH. That is far too slow for 100+ TB migrations.

A tested method is to first copy the bulk of the dataset across with cp:

    cp -v -r -p -d /lustre/<filespec> /nlustre/<filespec>    /* "-p -d": preserve as many attributes as possible, so the rsync pass has less to re-do */

then follow with an rsync operation on the same paths to confirm completion:

    rsync -raH --progress /lustre/<filespec> /nlustre/<filespec>

NOTE TO SELF: use the --inplace switch with rsync to prevent temporary file creation.
NOTE TO SELF 2: run multiple streams simultaneously (for both steps) to saturate the IB fabric a bit more; see the sketch below.

The above will typically take less than half the time of rsync alone.

PS: Also see the Mutil and Mcp entries elsewhere for even better multithreaded performance.
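On the multiple-streams note: a minimal sketch of the idea, assuming the dataset splits cleanly into top-level subdirectories, using GNU find and xargs to fan the work out. The paths and stream count below are placeholders, not values from this setup:

    #!/bin/bash
    SRC=/lustre/dataset        # hypothetical source tree
    DST=/nlustre/dataset       # hypothetical destination tree
    STREAMS=8                  # parallel streams; tune to the IB fabric

    mkdir -p "$DST"

    # Step 1: bulk copy, one cp stream per top-level subdirectory.
    # -p -d preserve attributes and symlinks so rsync has little left to do.
    find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        xargs -P "$STREAMS" -I {} cp -r -p -d "$SRC/{}" "$DST/{}"

    # Step 2: confirmation pass. --inplace updates files in place rather
    # than creating temporary copies on the destination.
    find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        xargs -P "$STREAMS" -I {} rsync -raH --inplace "$SRC/{}/" "$DST/{}/"

Eight streams is only a starting point; the right number depends on how much of the IB link a single stream can drive, and on how much metadata load the MDS can absorb.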