Cache <-> Cache random thoughts

Hello, I’m working on optimizing a rather large daily one-way backup of a file server to box.com, maybe 20 TB and 5-6 million files in total. I’m using copy instead of sync because of the quirks of how Box deals with versions and deleted items.

What has helped immensely, since this is a one-way backup, is caching the Box side with a very long max age (I think I set it to 6 months) and comparing by size only. Since everything runs on one machine, I was wondering whether there would be any benefit to caching the local FS as well (with an obviously somewhat shorter max age). The daily delta isn’t too bad, but scanning the entire data set for changes takes a very long time. I have played around with, say, doing an rclone check and generating a specific list of files to copy, but I guess in practice it would be a wash?

EDIT: Perhaps a check based on the time since the last local change, so that stuff that hasn’t been touched in ages is skipped wholesale?

Thanks

It is pretty quick scanning the local FS. The only thing you could speed up significantly would be calculating checksums if you are doing a --checksum sync. I see you are doing a --size-only sync, so I wouldn’t have thought it would help much, but it would be interesting to measure!

Some people use rclone lsf -R --max-age 1d /path/to/local > files.txt to get a list of potentials (files modified within one day) then feed that into rclone copy --files-from files.txt /path/to/local remote: - that doesn’t do any remote directory scanning (which has been improved further in the latest beta) so that is quite a quick way of getting incremental updates.

Rclone should really have a sync mode which does this (It used to with --no-traverse but I had to take that out after re-arranging the internals!)
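For anyone who would rather script the listing step than rely on shell redirection, here is a rough Python sketch of the same idea as the lsf --max-age command above: walk the local tree, keep files modified within a cutoff, and write their relative paths to a list file. The function names (recent_files, write_list) are made up for illustration and are not part of rclone, and this sketch does not apply any exclude rules.

```python
import os
import time
from pathlib import Path


def recent_files(root, max_age_seconds, now=None):
    """Return '/'-separated paths, relative to root, of files modified
    within the last max_age_seconds. (Illustrative helper, not an rclone API.)"""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    root = Path(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                mtime = full.stat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                found.append(full.relative_to(root).as_posix())
    return sorted(found)


def write_list(paths, list_file):
    """Write one path per line as UTF-8 without a BOM, using LF line endings."""
    with open(list_file, "w", encoding="utf-8", newline="\n") as fh:
        for p in paths:
            fh.write(p + "\n")
```

The resulting file would then be handed to rclone copy --files-from in the same way as the output of lsf. Only files show up in the list, so directories are implied by the paths.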

As a quick follow-up: it’s been a while since I used --files-from, and I know I am just missing something incredibly simple, even after re-re-reading the various posts about this.

(running on windows)

rclone lsf -R --files-only --exclude-from exclude.rclone --max-age 2d "M:\PATH" > delta.txt

(PS: sanity check: does --max-age go by modification time??)

This generates a list of entries of the form folder/path/etc/file, all within M:\PATH.

now doing a copy:

rclone --files-from .\delta.txt copy "M:\PATH" "cache:topfolder/PATH/" --exclude-from exclude.rclone --max-size 14.9G --transfers 7 --ignore-checksum --stats-file-name-length 0 --stats 30s --checkers 12 --cache-db-path C:\Windows\Temp\rclone\cache-backend --config C:\rclone\rclone.conf -vvv --log-file "rclone.log"

It runs for a second and stops.

what the log shows is:

2018/11/18 15:04:37 DEBUG : FOLDER1: Excluded
2018/11/18 15:04:37 DEBUG : FOLDER3: Excluded

2018/11/18 15:04:37 DEBUG : FOLDER12: Excluded

These are all folders that have files listed in delta.txt.

- I tried / and \ in both the rclone command line and the delta file
- no quotes around the source path
- bypassing path and going directly to box

what obvious thing am I missing ?

UPDATE: I also ran the copy with --exclude-from, which contains the following, with no change:

.Trashes
.TemporaryItems
.*
.DS_Store
.afpDeleted*
.approject
~

Thumbs.db

thanks

Yes, --max-age goes by modification time.

You shouldn’t need the --exclude-from on this command line - it needs to be on the lsf one. Likewise the --max-size.

I have a feeling the --exclude-from may be messing with the filtering

Can you try with the latest beta?

Will test. I didn’t have a chance to figure out what I was doing wrong, but lsf was not seeing a brand new folder, and lsf with --max-age didn’t find it either. I’ll re-test from scratch with the latest beta and report back.

Oh, one other thing, and I feel like I read this in other threads: is there any weirdness with standard cmd vs PowerShell?

Note that the lsf command to find stuff above only works on files, so if you just create a new directory with no new content then it won’t be listed.
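If it matters that brand new empty folders reach the destination, one way to see what a file-only listing would miss is to look for directories that contain no files at any depth. A small Python sketch follows; the helper name dirs_without_files is invented for illustration and is not something rclone provides.

```python
import os


def dirs_without_files(root):
    """Return '/'-separated relative paths of directories under root that
    contain no files at any depth. (Illustrative helper, not an rclone API.)"""
    empty = set()
    # Walk bottom-up so every child directory is judged before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        children_empty = all(os.path.join(dirpath, d) in empty for d in dirnames)
        if not filenames and children_empty:
            empty.add(dirpath)
    rel = (os.path.relpath(p, root).replace(os.sep, "/") for p in empty)
    # The root itself is reported as "." by relpath; leave it out.
    return sorted(r for r in rel if r != ".")
```

Any directory this returns would never appear in a list of files, so it would have to be created on the destination some other way if it is needed there.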

Weird things in PowerShell are to do with the command line parsing, so they cause very obvious things to go wrong.

OK, a quick update: I tested v1.44-099-g26e2f1a9-beta

and it seems to work fine. However, going back, it also seems to work fine with 1.44.

I think where I went wrong is inconsistently using cmd or PowerShell without thinking. For example, lsf "M:\PATH" does not work in cmd at all but works in PowerShell. I think the > delta.txt output also comes out with a different encoding under PowerShell. I will do a full run and report back.
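On the encoding point: as far as I know, Windows PowerShell's > redirection writes UTF-16 LE with a BOM by default, whereas cmd writes plain bytes, so a list file produced one way may not be read as intended when it was expected the other way. A hedged Python sketch for normalizing a list file before passing it to --files-from is below. The function name normalize_list_file is made up for illustration, and the sketch assumes that a file without a UTF-16 BOM is valid UTF-8.

```python
import codecs


def normalize_list_file(src, dst):
    """Rewrite a file list as UTF-8 without a BOM, with LF line endings and
    no blank lines. Returns the list of paths. (Illustrative helper.)"""
    with open(src, "rb") as fh:
        raw = fh.read()
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # Assumption: anything else is UTF-8; "utf-8-sig" drops a UTF-8 BOM if present.
        text = raw.decode("utf-8-sig")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    with open(dst, "w", encoding="utf-8", newline="\n") as fh:
        fh.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

A file written by cmd in a legacy code page with non-ASCII names would fail the UTF-8 decode here, which is at least a loud failure rather than a silently empty transfer.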

thanks, happy holidays

No probs. And I had no idea powershell was so likely to mess up command use!

Basically, it appears that double quotes are what really screw up CMD, while they are generally helpful in PowerShell. I don’t have the error in front of me, but with quotes on the beta it explicitly said that the internal UNC translation it does was broken. I’ll try to capture it at some point. My two cents, and probably a feature request: maybe do the piping internally in the app? On a related note, wouldn’t just doing a copy with --max-age produce the same results and performance as lsf / --files-from?

If you could capture that error, that would be helpful :smile:

Yes!

A plain copy with --max-age will traverse the destination directory, which may be very much slower.

What I need to do is bring back the --no-traverse flag which did the same as the lsf and the copy with --files-from.

I’ve written a bit of code to bring back the --no-traverse flag, which should be the equivalent. So just make the copy you want and use the --no-traverse flag.

https://beta.rclone.org/branch/v1.45-003-g872b5e7f-no-traverse-beta/ (uploaded in 15-30 mins)
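For scripted runs, the single-command version could be driven from Python roughly as follows. This is only a sketch: it assumes an rclone build that has --no-traverse (such as the beta linked above), and build_copy_command and run_copy are names invented here for illustration. The default max_age of "2d" simply mirrors the value used earlier in the thread.

```python
import subprocess


def build_copy_command(source, dest, max_age="2d", extra_flags=()):
    """Build the argument list for a one-way copy of recently modified files.
    (Illustrative helper; assumes rclone supports --no-traverse.)"""
    cmd = ["rclone", "copy", source, dest, "--max-age", max_age, "--no-traverse"]
    cmd.extend(extra_flags)
    return cmd


def run_copy(source, dest, **kwargs):
    """Run the copy; raises CalledProcessError if rclone exits non-zero."""
    # Passing a list with no shell involved sidesteps cmd/PowerShell quoting differences.
    return subprocess.run(build_copy_command(source, dest, **kwargs), check=True)
```

Because the arguments go to rclone as a list rather than through a shell, a path such as M:\PATH needs no quoting at all in this form.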