Speed up --include with cache or similar?

https://pastebin.com/3FE4spFv

Edit: tried it with Dropbox now. The cache is very fast if I don’t use --fast-list.
Command: rclone.exe copy DriveCache:/ test --include {*17550*} --cache-dir "rclone cache" --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --config rclone.conf

So fast-list and the cache backend don’t really work well together, as they are trying to do different things. --fast-list tries to limit API transactions, so I wouldn’t use it on a cache backend; it simply doesn’t help there.

If you run just the basic ls command twice, it should return fast on the second one.

Okay. But it’s definitely not working very fast for me. When listing the whole Dropbox (not Google Drive, without --fast-list), it still takes several minutes after the cache has been populated. I’m not sure whether it’s working as it should and this is the speed I should expect, or whether it’s not working at all — and if not, why?

You’d need to share the command and log with -vv so we can look at the logs.

Should work. I just recreated what you did and cached an 8TB Google Drive remote (12,000 files), and the result returned in ~10 seconds. I think you have something else going on; debug logs might help.

xxxx@notebook2:~/.cache/rclone/cache-backend$ time rclone lsl robgsc-cryptp: --include=MVI_0017.avi
30693776 2015-07-17 16:36:01.970000000 Media/Pictures/2015/MVI_0017.avi

real 0m9.798s
user 0m1.168s
sys 0m0.064s

I am on Linux while you’re on Windows, however.

Hi,

This is the log from creating the cache using Dropbox:
rclone.exe ls DriveCache: --cache-dir "rclone cache" --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --config rclone.conf -vv --log-file cache.txt
https://drive.google.com/open?id=1X9uX38vUnnKQG2O-tIrKOicfm0zW83HE

This is the log after creating the cache using Dropbox:
rclone.exe ls DriveCache: --cache-dir "rclone cache" --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --config rclone.conf -vv --log-file aftercache.txt
https://drive.google.com/open?id=1jNeOmSjESCEpYvEgohcuBSqkjRgzG7kw

Your logs from the after-cache run are somewhat confusing, as I’m seeing entries not coming from the cache.

There are a number of lines of it going back to the source:

2019/03/26 13:55:33 DEBUG : Cache remote DriveCache:: list 'External - FTV/3.6.6.22497 Panasonic test version 2 NO LOGGING'
2019/03/26 13:55:33 DEBUG : External - FTV/3.6.6.22497 Panasonic test version 2 NO LOGGING: list: empty listing
2019/03/26 13:55:33 DEBUG : External - FOX International Singapore: list: read 0 from source
2019/03/26 13:55:33 DEBUG : External - FOX International Singapore: list: source entries: []
2019/03/26 13:55:33 DEBUG : External - FOX International Singapore: list: cached directories: 0
2019/03/26 13:55:33 DEBUG : External - FOX International Singapore: list: cached dir: 'External - FOX International Singapore', cache ts: 2019-03-26 13:55:33.8875635 +0100 CET m=+1.472418501

as an example. I can’t replicate that in my testing at all: if I purge the cache and run the same thing, those messages never appear after my first run.

Is there a particular reason you are using a beta build over the stable build? Can you test with just the standard 1.46?

It’s interesting. Although the cache stops the API hits, it’s slower than just listing from the remote directly.

time rclone ls robgs-cryptp:Media/Videos/Series -vv --log-file=direct.out --cache-info-age=99y --cache-workers 20 --checkers 20


real 0m13.181s
user 0m3.060s
sys 0m0.372s

time rclone ls robgsc-cryptp:Media/Videos/Series -vv --log-file=direct.out --cache-info-age=99y --cache-workers 20 --checkers 20


real 1m19.529s
user 0m3.720s
sys 0m0.284s

I looked at the entries that rclone still needed to go to the source and recheck (adding to the times) and it looks like if the directory is empty it checks the source rather than believing the cache.

2019/03/26 10:34:30 DEBUG : d057l9q8tiko9vbpf9rmcq9cntu6u0k0j6lo4ad8n3chbba0ibd0/bk5uildi65trven0k7k9keplik/bi26l61nr0n7u32atdd3t4u0fo: list: empty listing
2019/03/26 10:34:30 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2019/03/26 10:34:30 DEBUG : HTTP RESPONSE (req 0xc000390e00)
2019/03/26 10:34:30 DEBUG : HTTP/1.1 200 OK^M
Transfer-Encoding: chunked^M
Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"^M
Cache-Control: private, max-age=0, must-revalidate, no-transform^M
Content-Type: application/json; charset=UTF-8^M
Date: Tue, 26 Mar 2019 14:34:30 GMT^M
Expires: Tue, 26 Mar 2019 14:34:30 GMT^M
Server: GSE^M
Vary: Origin^M
Vary: X-Origin^M
X-Content-Type-Options: nosniff^M
X-Frame-Options: SAMEORIGIN^M
X-Xss-Protection: 1; mode=block^M
^M
176^M
{"files":[{"id":"1xPftMD8_Ys567kf3ESpRaDsyRr7bcdFS","name":"bi26l61nr0n7u32atdd3t4u0fo","mimeType":"application/vnd.google-apps.folder","trashed":false,"parents":["1e4ezyJZkbDymglyx7aAKh-n5JyIrgJjv"],"webViewLink":"https://drive.google.com/drive/folders/1xPftMD8_Ys567kf3ESpRaDsyRr7bcdFS","createdTime":"2019-03-04T00:33:43.552Z","modifiedTime":"2019-03-04T00:33:43.552Z"}]}^M
0^M
^M

I then removed the directories that were empty and repopulated the cache and it flew through.
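For anyone wanting to do the same, the directories flagged as empty can be pulled straight out of the -vv log before deleting them. A rough sketch, run here against a made-up two-line log (the real input would be your own log file, and the paths are invented):

```shell
# Hypothetical stand-in for a real -vv log; the sed pattern extracts
# just the directory names from "list: empty listing" lines.
printf '%s\n' \
  '2019/03/26 10:34:30 DEBUG : Media/Videos/Empty1: list: empty listing' \
  '2019/03/26 10:34:31 DEBUG : Media/Videos/Full: list: read 42 from source' \
  > sample.log
sed -n 's/.*DEBUG : \(.*\): list: empty listing$/\1/p' sample.log
# prints Media/Videos/Empty1
```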

The question I’d ask is: is this a bug? Does it need to go to the source if the directory is in fact empty? @ncw It would seem that if I’ve said my cache is good for 99 years, it shouldn’t go to the source even for empty directories.

@zotune Do you have a lot of empty directories? Can you validate that the ones it hung up on reading are those?

You can see it checked the source 551 times in the after-cache file, and those are probably all empty directories, as they returned 0 entries:

grep "read 0 from source" aftercache.txt  | wc -l
     551
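If it helps anyone reproduce the count, here is the same idea end-to-end on a tiny fabricated log (the real input is the aftercache.txt from the Drive link above; the directory names here are invented):

```shell
# Fabricated three-line log; grep -c counts the "read 0 from source"
# lines, one per directory the cache went back to the source for.
printf '%s\n' \
  'DEBUG : dirA: list: read 0 from source' \
  'DEBUG : dirB: list: read 5 from source' \
  'DEBUG : dirC: list: read 0 from source' \
  > aftercache-sample.txt
grep -c "read 0 from source" aftercache-sample.txt
# prints 2
```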

1.46 stable:
https://drive.google.com/open?id=1kYNtzEag64GSBj0mem4AYL7Mb6dJA-eY
https://drive.google.com/open?id=1pAVpFj0qUzQlRztNpM_8GB4ElFPAUTKm

@zotune if you are able to remove the empty directories from the source before populating your cache, it would be interesting to see how fast your list is afterwards. I personally think this is a bug, but we’ll see what Nick says, as there may be a reason it needs to do this.

rclone rmdirs remote:

Hmm, empty directories are treated as directories which haven’t been cached if I remember rightly, so yes going back to the source is to be expected and no, I don’t think this is the correct behaviour!

This is the bit of code I was thinking of: https://github.com/ncw/rclone/blob/6fb1c8f51cec3f5d00d2f3da027dae478bca4bf1/backend/cache/cache.go#L978

Just for statistics: I’ve deleted all 200 empty directories in my case, and here are the results.

cache:
notebook2:~/Downloads$ time rclone ls robgsc-cryptp:Media/Videos/Series -vv --log-file=before.out --cache-info-age=99y >/dev/null

real 0m2.561s
user 0m3.128s
sys 0m0.044s

direct:
notebook2:~/Downloads$ time rclone ls robgs-cryptp:Media/Videos/Series -vv --log-file=before.out --cache-info-age=99y >/dev/null

real 0m21.720s
user 0m2.812s
sys 0m0.184s

Original run, with empty directories and from the cache (this run also had more checkers and workers, btw):

time rclone ls robgsc-cryptp:Media/Videos/Series -vv --log-file=direct.out --cache-info-age=99y --cache-workers 20 --checkers 20

real 1m19.529s
user 0m3.720s
sys 0m0.284s


Seems to be a ‘bug’ or ‘needs improvement’ issue with empty folders then?
How likely is it that this might be fixed anytime soon?
I appreciate everyone’s help!

As a work around, can you use rclone rmdirs to delete the empty folders?

Also you may want to open an issue on GitHub so this can be tracked.


Made a ticket:

Do I have to run rclone rmdirs after rclone copy operations, or is there any way to combine the two?
Will try workaround now.


You need to do it before you create the cache so the cache isn’t full of them.

Log from latest attempt using rmdirs before cache creation:
https://drive.google.com/open?id=1MgIz8rTSKxkj9KmP7eauOaT_20qy5Szq

Seems to take 42 minutes.
Not really ideal.

Still quite a number of directories read from the source that returned 0 files.

grep "from source" test.log | wc -l
    4909

So the blank dirs were definitely not removed.
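One caveat, if I’m reading the log format right: grepping for just "from source" also matches "read N from source" lines where N is non-zero, so the raw count can overstate the empty-directory problem. A quick way to split it out, sketched here on a fabricated log (invented directory names):

```shell
# Fabricated log; split "from source" hits into empty (read 0) and
# all source reads, to see how much of the count is empty directories.
printf '%s\n' \
  'DEBUG : dirA: list: read 0 from source' \
  'DEBUG : dirB: list: read 7 from source' \
  'DEBUG : dirC: list: read 0 from source' \
  > test-sample.log
grep -c "read 0 from source" test-sample.log   # empty directories only
# prints 2
grep -c "from source" test-sample.log          # all reads from source
# prints 3
```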

Yes, I figured as much. It’s probably because I use rclone sync initially, which brings those empty folders back. I don’t think I have a good workaround for this, so I might just wait for a fix, unless rclone sync can be configured not to include empty directories. The question is, how fast will it be even if this works? Even 2 minutes is too slow compared to just searching Google Drive for that installer build and downloading it manually.

Appreciate the help!