Slow listing of bucket with large number of files - swift

Hi –

In Swift, when I have a large number of files in a bucket, ls operations take ages to complete. I’m wondering if this is a known issue, and if so, is there a way to mitigate it?

swift list returns very quickly (3-4s), but rclone ls swift: will easily run for many minutes before returning. The bucket I have in front of me holds about 11k objects.

Of course, this also affects rclone copy operations with --include filters, which is what I’m really after.
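For concreteness, the kind of copy I’m ultimately trying to speed up looks roughly like this (the filter pattern and destination here are just placeholders):

rclone --config rclonerc copy --include "app-2017-10-01*.gz" swift:bucket /local/dest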

Any ideas?

First thing to try - use the --fast-list flag…

Thanks @ncw, I tried that. In fact, I should have pasted my full, untrimmed command line:

rclone --config rclonerc --fast-list ls swift:bucket

I don’t have timings right now, mainly because the commands take so long to run. I’ll try to get some and report back.

It would be worth running with -vv --dump-headers; that will show you the requests rclone is making.
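Something like this, reusing your command line (the extra flags only add logging, so the listing itself is unchanged):

rclone --config rclonerc -vv --dump-headers ls swift:bucket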

rclone has to do a HEAD request on every large file to find its size, so if you have a directory full of large files that may be the cause.
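If these are segmented uploads (dynamic large objects), the manifest shows up with a zero size in the container listing and the real size only comes back in the Content-Length of a per-object HEAD, which against the Swift API looks roughly like this (storage URL, token and object name are placeholders):

curl -I -H "X-Auth-Token: $OS_AUTH_TOKEN" \
  https://storage.example.com/v1/AUTH_account/bucket/logs/app-2017-10-01.gz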

If that is the case, then you can increase --checkers to do more of those requests in parallel.
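For example, something along these lines (64 is just a value to experiment with):

rclone --config rclonerc --checkers 64 --fast-list ls swift:bucket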

Indeed, it is a directory of many large files (compressed log files). They are not all huge, but there are likely thousands of them that are.

  1. Is there a way to turn off size checking until after the --include/--exclude filtering is done? At least in my case that would solve my problem, since I’m only copying a couple of named files.

  2. Is there a way to turn off size checking entirely?

Just an update:

  • swift list: 8s
  • rclone ls with --checkers 8 (the default) and --fast-list: 9m
  • rclone ls with --checkers 256 and --fast-list: 8m
  • rclone ls with --checkers 1024 and --fast-list: 7m45s
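Each rclone row was a run roughly like the following, with only the --checkers value changing:

time rclone --config rclonerc --checkers 256 --fast-list ls swift:bucket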

I just remembered we have discussed this before on the forum: Ls slow on large directories (500~ files) (Hubic)

The number of checkers makes no difference, as it is one per directory :frowning:

I made this issue about it: https://github.com/ncw/rclone/issues/1791

rclone could defer reading the size until it is actually used. That would work too, I expect.

You can try the --ignore-size flag - I’m not sure that will help though.
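For example, tacked onto the same command as before (I haven’t checked whether the flag applies to plain listings at all):

rclone --config rclonerc --ignore-size --fast-list ls swift:bucket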

I don’t think swift list reads the size, does it?

Thanks for the pointer to the GH issue.

swift list -l does, and it doesn’t appear to impact the command’s run time. But I see that many of what I would expect to be the large files show as zero bytes, probably signaling that they need another lookup, and the swift client doesn’t do that for you.
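For reference, this is the swift command I’m comparing against; -l adds the length column (the bucket name is a placeholder):

swift list -l bucket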

I just tried it and, as you suspected, it doesn’t appear to change the speed. Ah well.