The neverending quest to improve performance (in this case to box.com)

In the "way overthinking this" department: large local volumes (15 TB+, many millions of files). Thankfully the initial copies are done, so now it’s about the deltas. I’ve been playing around with different alternatives; thoughts appreciated. Thanks.

  • I always use the usual flags that help: --no-update-modtime, --size-only, --fast-list, --ignore-checksum

  • doing a straight-up sync takes FOREVER since there is such a huge number of files (in some cases well over 24-36 hours)

Alternative 1:

just copy files newer than 3 days - runs very fast but obviously doesn’t deal with deletes/moves/renames


newfiles=$(find /Volumes/XXXX/ -mtime -3 -type f -print)
echo "$newfiles" | grep -v drw | awk -F"XXXX" '{print $2}' | grep -v .DS_Store | grep -v .Trashes | grep -v .afpDeleted | grep -v Thumbs.db | sed 's/\\//g' | sed 's/\/\//\//g' > XXXX.list
(or)
rclone lsf --files-only --max-age 3d /Volumes/XXXX/ > XXXX.list
sed -i 's/^/\//' XXXX.list

rclone --files-from XXXX.list copy "/Volumes/XXXX" "box:_backups/XXXX" --retries 10 --backup-dir="box:_backups/_rclone/XXXX/" --no-update-modtime --size-only --exclude " $" --max-size 14.9G --fast-list --exclude-from ./exclude.rclone --transfers 3 -vv --ignore-checksum --stats 360s --log-file "XXXX.copy"
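The find-based list could also be built in one step. This is only a sketch: `list_recent` is a made-up helper name, and it assumes the list should hold paths relative to the volume root so it can be passed to --files-from as is. Note that `-mtime -3` means "modified less than 3 days ago", while a bare `-mtime 3` matches only files that are exactly 3 days old.

```shell
# Hypothetical helper (not part of rclone): print files under $1 that were
# modified within the last $2 days, as paths relative to $1, skipping the
# OS junk files filtered out in the post.
list_recent() {
    root=$1
    days=$2
    (
        cd "$root" || exit 1
        find . -type f -mtime "-$days" \
            ! -name '.DS_Store' \
            ! -name 'Thumbs.db' \
            ! -name '.afpDeleted*' \
            ! -path '*/.Trashes/*' \
            | sed 's|^\./||'    # strip the leading "./" that find adds
    )
}

# Example (the path is the placeholder used in the post):
#   list_recent /Volumes/XXXX 3 > XXXX.list
```

Doing the exclusions inside find avoids the chain of grep -v calls, whose unescaped dots also match any character.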

Alternative 2: What I am now testing

rclone check /Volumes/XXXX cache:_backups/XXXX --size-only --exclude " $" --max-size 14.9G --fast-list --exclude-from ./exclude.rclone --transfers 3 --ignore-checksum --stats 360s --log-file "XXXXX.check"

cat XXXX.check | grep "File not in box root" | awk -F"ERROR : " '{print $2}' | awk -F": File" '{print $1}' > XXXX.missing

cat XXXX.check | grep "File not in Local file system" | awk -F"ERROR : " '{print $2}' | awk -F": File" '{print $1}' > XXXX.delete

./rclone --files-from XXXX.missing copy "/Volumes/XXXX" "cache:_backups/XXXX" --retries 10 --backup-dir="box:_backups/_rclone/XXXXX/" --no-update-modtime --size-only --exclude " $" --max-size 14.9G --fast-list --exclude-from ./exclude.rclone --transfers 3 -vv --ignore-checksum --stats 360s --log-file "xxxx.copy"

rclone --files-from XXXX.delete moveto box:_backups/XXXXX box:_backups/_rclone/XXXXX/ --no-update-modtime --size-only --fast-list --transfers 3 --ignore-checksum --stats 360s --log-file "XXXX.delta"
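The two grep/awk pipelines over the check log could be folded into a single pass. A rough sketch, assuming the log lines look like `<timestamp> ERROR : <path>: File not in ...` (which is what the awk field separators above imply); `split_check_log` and its argument order are made up for illustration.

```shell
# Hypothetical helper: split an rclone check log into two --files-from lists.
# The message texts are the ones grepped for in the post; adjust them if your
# rclone version or remote words them differently.
split_check_log() {
    log=$1
    missing_list=$2   # files present locally but not on the remote
    delete_list=$3    # files present on the remote but not locally
    : > "$missing_list"
    : > "$delete_list"
    awk -v missing_out="$missing_list" -v delete_out="$delete_list" '
        /ERROR : .*: File not in / {
            line = $0
            sub(/^.*ERROR : /, "", line)        # drop timestamp and log level
            path = line
            sub(/: File not in .*$/, "", path)  # keep only the file path
            if (line ~ /: File not in box root/)
                print path > missing_out
            else if (line ~ /: File not in Local file system/)
                print path > delete_out
        }
    ' "$log"
}

# Example:
#   split_check_log XXXX.check XXXX.missing XXXX.delete
```

Newer rclone releases can also write these lists directly with the check flags --missing-on-dst and --missing-on-src, which would avoid scraping the log; `rclone check --help` shows whether the installed version has them.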

With box (and drive) the directory listings take ages as each directory listed takes an https round trip. I don’t think --fast-list works with box unfortunately.

You can list lots in parallel by increasing --checkers - that will help speed things up. --checkers is basically the number of directories rclone is checking simultaneously. I don’t have that much experience with box, but I think that might help a lot.

You don’t need the sed, as --files-from is always rooted, and your lsf needs the -R flag so that it recurses into subdirectories.
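Put together, the list step might look something like this. It is a sketch only: `make_recent_list` is a made-up wrapper name, and the path and age in the example are the placeholders from the post.

```shell
# Sketch: build the --files-from list with rclone itself.
# -R recurses into subdirectories; lsf prints paths relative to the source
# root, so no sed post-processing is applied.
make_recent_list() {
    src=$1    # e.g. /Volumes/XXXX
    age=$2    # e.g. 3d
    rclone lsf -R --files-only --max-age "$age" "$src"
}

# Example:
#   make_recent_list /Volumes/XXXX 3d > XXXX.list
```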

This is great for quickie backups.

The rclone check does as much file system traversal as an rclone sync would… I see you are using the cache backend here, so why not just do the sync to the cache backend? That is what it is for.
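A single sync along those lines might look roughly like this. Treat it as a sketch under assumptions: `sync_volume` is a made-up wrapper, the remote names and paths are the placeholders from the post, 16 checkers is an arbitrary starting value to tune, and --backup-dir points at `cache:` rather than `box:` because rclone requires the backup directory to be on the same remote as the sync destination.

```shell
# Sketch of one sync replacing the check + copy + moveto sequence.
# Files deleted or changed locally are moved into the backup dir rather than
# being removed outright.
sync_volume() {
    name=$1    # volume name, e.g. XXXX
    shift
    rclone sync "/Volumes/$name" "cache:_backups/$name" \
        --backup-dir "cache:_backups/_rclone/$name" \
        --size-only --no-update-modtime \
        --exclude-from ./exclude.rclone \
        --max-size 14.9G \
        --checkers 16 --transfers 3 \
        --retries 10 --stats 360s \
        --log-file "$name.sync" \
        "$@"   # extra flags, e.g. --dry-run for a first pass
}

# Example: see what would change without touching anything
#   sync_volume XXXX --dry-run
```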

I had never used cache before today, so that’s where I’m going to focus. Thanks for confirming.

BT