Speed up --include with cache or similar?

I know there is --fast-list for Google Drive, but it’s still slower than I wish it was. I want a cache of sorts to make filtering almost instant. Right now it takes 10 minutes or more because the directory is so large. I have a bunch of installers with different builds, and I want it to download a specific build quickly without searching through the whole directory (searching through a cache of the directory tree instead).

Appreciate any help!

fast-list doesn’t work on a mount.

Sounds like a great use for the cache backend that already exists.

https://rclone.org/cache/

hi. I don’t need it for mount. I use it for --include, but it’s too slow. I want it to cache the file names, not the actual files. rclone copy --include * is used as a form of search as to what installer build to download. I was wondering if there’s a way to speed that up.

Example:

rclone.exe copy "Drive:/" "C:\Users\mel\Google Drive\Code\Pyjama\Downloads\test" -vv --progress --stats-one-line --include {27574} --ignore-case --fast-list --drive-acknowledge-abuse --transfers 10

Indexing this folder is too slow, any way to cache the index?
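One possible way to get the near-instant lookup asked for here is to save a recursive listing of the remote to a text file once and then search that file locally. This is an alternative sketch, not something proposed in the thread: it assumes rclone is on PATH, and the helper names `build_index` and `search_index` and the sample paths are hypothetical.

```python
import subprocess
from typing import Iterable, List


def build_index(remote: str, index_path: str) -> None:
    """Write one remote path per line to index_path using `rclone lsf`.

    This is the slow step; the idea is to run it rarely (e.g. nightly).
    """
    with open(index_path, "w", encoding="utf-8") as out:
        subprocess.run(
            ["rclone", "lsf", "-R", "--files-only", "--fast-list", remote],
            stdout=out,
            check=True,
        )


def search_index(lines: Iterable[str], needle: str) -> List[str]:
    """Return the paths that contain needle, ignoring case."""
    needle = needle.lower()
    paths = (line.rstrip("\n") for line in lines)
    return [path for path in paths if needle in path.lower()]


if __name__ == "__main__":
    # Hypothetical listing lines standing in for a real index file.
    sample = [
        "Builds/Installer_3.6.0.14446.exe\n",
        "Builds/Installer_3.6.0.25022.exe\n",
        "Docs/readme.txt\n",
    ]
    print(search_index(sample, "25022"))
```

Searching a text file of a few hundred thousand lines this way should take well under a second, since no API calls are involved; only the matching files would then need to be fetched from Drive.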

See post above and use the cache backend if you want to cache things.

ok, thanks. so cache might work for rclone copy --include?

It’s a remote so you’d use the cache remote instead, which ‘caches’ all the information.

thanks. I think I got cache to work now. but it’s still really slow using rclone copy with --include and --cache-db-path. it uses several minutes to find a build (installer). anything I am missing?

--checkers 1000000 really speeds things up, but it still takes more than 5 minutes. a google drive search for the same build number would take 1 second. rclone.exe is using 60% cpu so I’m not sure what else to increase.

Can you share more details: the exact command you are running and the debug log output?

Here are a few seconds of the log (the full log becomes too big):
https://drive.google.com/open?id=1adiY-CWlRSHbaA3mWAMEEupLVzcBNYtO

I basically download the cache first (it is created nightly using task scheduler on a different computer):
"C:\Users\mel\Google Drive\Code\Pyjama\Addons\Software\rclone\rclone.exe" copy "Drive:/Viz Mosart/Viz Mosart Project & Support/Installers/Cache" "C:\Users\mel\Google Drive\Code\Pyjama\Cache\DriveCache" -vv --progress --stats-one-line --transfers 5 --update --config "C:\Users\mel\Google Drive\Code\Pyjama\Addons\Software\rclone\rclone.conf"

I then search the cache using rclone copy --include:
"C:\Users\mel\Google Drive\Code\Pyjama\Addons\Software\rclone\rclone.exe" copy "DriveCache:/" "C:\Users\mel\Google Drive\Code\Pyjama\Downloads\Drive" -vv --progress --stats-one-line --include {25022} --ignore-case --transfers 10 --checkers 1000000 --log-file "C:\Testlog.txt" --cache-db-path "C:\Users\mel\Google Drive\Code\Pyjama\Cache\DriveCache" --config "C:\Users\mel\Google Drive\Code\Pyjama\Addons\Software\rclone\rclone.conf"

But it takes too long for my use case (even with cache). There’s 1 TB of installers. The cache file is about 250 MB. It takes several hours to create.

Your chunk size is 1G, which means each time it does anything, it has to transfer a 1 GB chunk. This should be something much smaller, like 32M or 64M.

If you are using Google Drive, those settings are going to be pretty bad as you can only do 10 transactions per second and create roughly 3 files per second.

You’d also want to use --fast-list if you have the memory for it, as that would help, and you need to ‘walk’ the remote so that a full cache listing gets built out.

I’d bump up the cache workers to 8 and make the transfers and checkers more realistic like 5 / 8.
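As a rough illustration of the values suggested above, the following sketch assembles an rclone copy command line with a smaller chunk size, 8 cache workers, 5 transfers and 8 checkers. The helper name `tuned_copy_args` is hypothetical, and 32M is just one of the two chunk sizes mentioned; the same chunk size could presumably be set as `chunk_size` in rclone.conf instead.

```python
from typing import List


def tuned_copy_args(src: str, dst: str, include: str) -> List[str]:
    """Build an rclone copy command using the more conservative values above."""
    return [
        "rclone", "copy", src, dst,
        "--include", include,
        "--ignore-case",
        "--fast-list",                # suggested above, if memory allows
        "--cache-chunk-size", "32M",  # instead of 1G
        "--cache-workers", "8",
        "--transfers", "5",           # instead of 10
        "--checkers", "8",            # instead of 1000000
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(" ".join(tuned_copy_args("DriveCache:", "Test", "*25022*")))
```

Passing the resulting list to `subprocess.run` would execute it; printing it first makes it easy to check the flags against your own setup.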

Thanks. I don’t think --fast-list works for reading cache, only writing it. I could be wrong, but it seems very slow. I assume by workers you mean --cache-workers 8?

Download seems to get stuck, complete log: https://drive.google.com/open?id=1qTD0rYRETmOClKZf4pif9nxYxh2_Ik7N

It is still far too slow for my use case. Searching for an installer build number in google drive takes 1 second, this takes several minutes.

Fast-list is for listing, not writing:

https://rclone.org/drive/#fast-list

You seem to be one rev back as well on 1.45. I’d move to 1.46. Looking through the log, it still was building the cache. You probably want to prime / warm it up with a rclone ls cacheremote: and let that baby run until done.

Once that completes, run another rclone ls on the remote and let’s see if the timing improves, as it should. If not, we have a different issue to sort out before we jump back to your full command.
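The warm-up test described here (run the listing once to prime the cache, then again to compare) can be timed with a small script. This is a minimal sketch; `time_command` and `compare_runs` are hypothetical helper names, and the placeholder command should be replaced with the real listing command for your cache remote.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str]) -> float:
    """Run cmd to completion, discard its output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def compare_runs(cmd: Sequence[str], runs: int = 2) -> List[float]:
    """Time the same command several times in a row."""
    return [time_command(cmd) for _ in range(runs)]


if __name__ == "__main__":
    # Placeholder command; replace with e.g. ["rclone", "ls", "cacheremote:"].
    cmd = [sys.executable, "-c", "pass"]
    for number, seconds in enumerate(compare_runs(cmd), start=1):
        print(f"run {number}: {seconds:.2f}s")
```

If the cache is working, the second run should be much shorter than the first; similar timings on both runs would suggest the listing is still going to the backing remote.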

tried 1.46:

  • 1st server machine nightly uploads latest installers from different network shares to google drive using rclone copy drive:/ --fast-list
  • that same machine uses rclone ls cacheremote: --fast-list --cache-db-path "C:\cachedir" each night (this cache is connected to drive:/ remote)
  • it then uploads the cache directory to google drive using rclone copy drive:/cachedir to "C:\cachedir"
  • 2nd client machine downloads the cache directory from google drive using rclone copy drive:/ --update (--update will ensure it skips downloading the cache/database if already up to date)
  • it then uses rclone copy cacheremote:/ --cache-db-path "C:\cachedir" to find a certain build (this is too slow, because it is probably still connecting to drive and not always using offline cache to filter include only) [ --fast-list seems to not use cache here]

the whole point is not having to create cache on 2nd client machine, but rather download the nightly created server cache/database so that search using --include is almost instant. is there no way to force it to be cache/offline only? having to create the cache on client machine defeats the purpose
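A different approach to this goal, not one suggested in the thread, would be to skip the cache database entirely: have the server publish a plain-text listing of paths, filter it offline on the client, and then hand only the matching paths to rclone with `--files-from`. The sketch below covers the client side and is hedged accordingly: the helper names and sample paths are hypothetical, and whether `--files-from` avoids a full directory scan depends on the rclone version, so that would need testing.

```python
import os
import tempfile
from typing import Iterable, List


def write_files_from(paths: Iterable[str]) -> str:
    """Write the matched paths, one per line, to a temp file; return its name."""
    fd, name = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return name


def copy_command(remote: str, dest: str, list_file: str) -> List[str]:
    """Build an rclone copy that transfers only the paths in list_file."""
    return ["rclone", "copy", remote, dest, "--files-from", list_file, "--progress"]


if __name__ == "__main__":
    # Hypothetical listing lines, as a nightly job might publish them.
    listing = ["Builds/Setup_3.6.0.14446.exe", "Builds/Setup_3.6.0.25022.exe"]
    matches = [path for path in listing if "25022" in path]
    list_file = write_files_from(matches)
    print(copy_command("Drive:", "Downloads", list_file))
    os.remove(list_file)
```

The filtering step never touches the network, so the only remote work left is transferring the files that matched.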

Not sure the cache can be copied around like you’re thinking.

If the delta isn’t much, just build the cache one time and let the updates roll through via the normal methods.

Are you able to test the scenario I asked to see if the cache is working properly for you?

Appreciate the help Animosity022. I tried reducing the cache to just a subfolder now (should be faster). But it is still too slow for my use case. I was hoping cache would be more or less instant. Takes several minutes, as can be seen in the log (I did not let rclone copy finish, but this was after having run a successful rclone ls)
https://drive.google.com/open?id=1DrHfTo3ZrFLJFICLB44oUFCrmxPqET6D

This is the rclone config I am using. I am testing this all on the same computer now (no copying of cache database):
[DriveCache]
type = cache
remote = Drive:Viz Mosart/Viz Mosart Project & Support/Installers/Dropbox
chunk_size = 64M
info_age = 99y
chunk_total_size = 6400M

This is the command I use to create the cache (I have tried this without --fast-list also):
rclone.exe ls DriveCache: --fast-list --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --cache-db-purge --config rclone.conf

This is the command I use to download a certain installer build number:
rclone.exe copy DriveCache: Test --include {*3.6.0.14446*} --ignore-case -vv --progress --stats-one-line --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --config rclone.conf --log-file log.txt

I don’t understand why it’s reading so much from Google Drive when using cache. I tried turning off my network adapter (internet) and it does not work at all then. Maybe I have an incorrect understanding of what rclone cache is.

I tested running rclone ls multiple times to see if cache creation speed improves:
I was unable to create a log with rclone ls, but it still seemed very slow on the second try:

rclone.exe ls DriveCache: --fast-list --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --cache-db-purge --config rclone.conf

You are using --cache-db-purge, and that removes the cache on every command, so it has to rebuild each time.

You should remove that and run it twice.

My first run took 1m 31s to build the cache and my second run was 1s.

[felix@gemini rclone]$ time rclone ls gmedia:

real    0m1.692s
user    0m4.884s
sys     0m0.228s

Yeah, I removed that now.
rclone.exe ls DriveCache: --fast-list --cache-dir "rclone cache" --cache-chunk-path "rclone cache" --cache-db-path "rclone cache" --config rclone.conf takes about 1-2 min first try.

second try takes about 30 seconds? definitely more than 1s. will try and time it

Edit: Stopwatch on first try 01:01:22
Edit2: Stopwatch on second try (with cache) 01:04:37

So I don’t think cache is working for me? It’s definitely creating it though:

Are you using Google Drive with API to test this?

The ‘DriveCache’ folder next to DriveCache.db is empty by the way:

I use Google Drive with my own API key.

If the cache isn’t instant the second time, you have something else going on.

What does your rclone.conf look like for that remote?

Appreciate the help. This is what my rclone.conf looks like.
Drive is actual drive with my own API
DriveCache is cache of Drive

Run something a bit shorter so we can take a look at the debug:

rclone -vv lsd DriveCache:

and share the output of that.

You should see some ‘warm’ entries in there:

2019/03/25 07:52:13 DEBUG : : list: warm 6 from cache for: , expiring on: 2019-03-30 06:55:45.300645117 -0400 EDT
2019/03/25 07:52:13 DEBUG : : list: cached entries: [1G.file 95tj3q4gj5ban13ppu0kisguco d93daonsmv845k5s40knr5mf4s s1n6gn87oo537vvls13nahl6co smu5ej34ujbdoip1cm3mlk92q4 tnvepu36qiohcun8v84ddhsam0]
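To check a longer -vv log for these entries without reading it by eye, a small script can pull out the 'warm' lines. This is a sketch based only on the log format shown above; the regular expression and the helper name `warm_hits` are assumptions, and other rclone versions may word the message differently.

```python
import re
from typing import Iterable, List, Tuple

# Matches debug lines shaped like the sample above:
#   ... list: warm <count> from cache for: <dir>, expiring on: ...
WARM_RE = re.compile(r"list: warm (\d+) from cache for: (.*?), expiring on:")


def warm_hits(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (directory, entry_count) for each 'warm' cache listing line."""
    hits = []
    for line in lines:
        match = WARM_RE.search(line)
        if match:
            hits.append((match.group(2), int(match.group(1))))
    return hits


if __name__ == "__main__":
    sample = [
        "2019/03/25 07:52:13 DEBUG : : list: warm 6 from cache for: , "
        "expiring on: 2019-03-30 06:55:45.300645117 -0400 EDT",
    ]
    print(warm_hits(sample))  # prints [('', 6)]
```

The empty directory name in the sample presumably stands for the root of the remote. A log with no hits at all after a second run would be consistent with the listings not being served from the cache.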