Most Internet forums consider "large" volumes to be a few hundred gigabytes, or a few terabytes, and the techniques commonly advised apply to those scenarios. At petabyte scale, however, most common-practice techniques no longer apply: the data sets are simply too vast to be manipulated in a reasonable time.

For example, it is commonly assumed that rsync can be used to move or copy data between two such filesystems. rsync is a great utility, but it is slow: it runs single-threaded, and for remote copies it carries its traffic over SSH. That is far too slow for 100+ TB migrations.

A tested method is to first copy the bulk of the dataset across with cp:

    cp -v -r -p -d /lustre/<filespec> /nlustre/<filespec>    /* "-p -d": preserve as many attributes as possible, so the rsync pass has less to re-do */

then follow with an rsync operation on the same paths to confirm completion:

    rsync -raH --progress /lustre/<filespec> /nlustre/<filespec>

NOTE TO SELF: use the --inplace switch with rsync to prevent temporary file creation.
NOTE TO SELF 2: run multiple streams simultaneously (for both steps) to saturate the IB fabric a bit more; see the sketch below.

The above will typically take less than half the time of rsync alone.

PS: Also see the Mutil and Mcp entries elsewhere for even better multithreaded performance.
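On the multiple-streams note: a minimal sketch of the idea, assuming the dataset splits cleanly into top-level subdirectories, using GNU find and xargs to fan the work out. The paths and stream count below are placeholders, not values from this setup:

    #!/bin/bash
    SRC=/lustre/dataset        # hypothetical source tree
    DST=/nlustre/dataset       # hypothetical destination tree
    STREAMS=8                  # parallel streams; tune to the IB fabric

    mkdir -p "$DST"

    # Step 1: bulk copy, one cp stream per top-level subdirectory.
    # -p -d preserve attributes and symlinks so rsync has little left to do.
    find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        xargs -P "$STREAMS" -I {} cp -r -p -d "$SRC/{}" "$DST/{}"

    # Step 2: confirmation pass. --inplace updates files in place rather
    # than creating temporary copies on the destination.
    find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        xargs -P "$STREAMS" -I {} rsync -raH --inplace "$SRC/{}/" "$DST/{}/"

Eight streams is only a starting point; the right number depends on how much of the IB link a single stream can drive, and on how much metadata load the MDS can absorb.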