Handling (or not) of large objects in Swift

eevans · October 4, 2022, 8:13pm

This is a follow-up to "Swift sync --checksum calls HEAD on every object so is very slow". We'd like to use rclone to sync containers between two Swift clusters, but the HEAD used to determine whether a file is a large object (either dynamic or static) makes doing so prohibitively expensive.

We do not make use of large objects in any of our clusters, so strictly speaking a hack like the one described in the thread above (i.e. commenting out the header checks in Object.Hash) would be enough. I believe something like that was proposed in the form of a no_head_hash option. Again, this would work for us, but I expect that will cause problems for anyone that used it and did in fact have large objects. Maybe this is OK if properly documented?

There was also some discussion of a no_large_files option, but it wasn't clear to me how that would differ.

Having read through the large object documentation, I'm not sure there is a way to gracefully handle large objects based only on what is returned in a listing. DLO manifests are said to be zero bytes in size, though not enforced, and SLO manifests are definitely non-zero in size. The only way to know for certain is to examine the headers. Perhaps handling could be somehow moved later in the process, and the headers gleaned from the GET?
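To make the dependence on headers concrete, here is a minimal Python sketch of the check that only a HEAD (or GET) response makes possible. It assumes the standard Swift markers, `X-Object-Manifest` for DLO manifests and `X-Static-Large-Object` for SLO manifests; the function name `classify_large_object` and the sample header values are made up for illustration and are not rclone code.

```python
from typing import Mapping, Optional


def classify_large_object(headers: Mapping[str, str]) -> Optional[str]:
    """Return "dlo", "slo", or None based on response headers alone."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # A DLO manifest points at its segments via X-Object-Manifest.
    if "x-object-manifest" in lowered:
        return "dlo"
    # An SLO manifest is flagged with X-Static-Large-Object: True.
    if lowered.get("x-static-large-object", "").strip().lower() == "true":
        return "slo"
    return None


# Hypothetical header sets, as they might come back from a HEAD request.
print(classify_large_object({"X-Object-Manifest": "segments/video/"}))
print(classify_large_object({"X-Static-Large-Object": "True"}))
print(classify_large_object({"Content-Length": "0"}))
```

Neither marker appears in a container listing, which is why the size of an entry on its own cannot settle the question.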

ncw · October 5, 2022, 8:19am

Yes, an option with documentation would be perfectly acceptable. no_head_hash is a little obscure though...

A no_large_files option could also skip the HEAD on zero-sized files that happens at the moment. It could also include the functionality of no_chunk. I think it would be easier to explain to users than no_head_hash.

I agree and it is a major problem for efficiency in the swift backend.

If we did have a no_large_files option, then rclone could produce a warning if it detected the SLO/DLO headers on a HEAD or a GET of an object.
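A rough sketch of what such a warning could look like, written in Python purely for illustration. It assumes the standard Swift headers `X-Object-Manifest` (DLO) and `X-Static-Large-Object` (SLO); the function `warn_if_large_object`, the logger name, and the message text are hypothetical and do not reflect how rclone implements this.

```python
import logging
from typing import Mapping

log = logging.getLogger("swift-sketch")  # hypothetical logger name


def warn_if_large_object(
    name: str, headers: Mapping[str, str], no_large_objects: bool
) -> bool:
    """Warn when large-object headers show up although support is disabled.

    Returns True if a warning was emitted.
    """
    if not no_large_objects:
        # Large objects are handled normally, so there is nothing to warn about.
        return False
    lowered = {key.lower(): value for key, value in headers.items()}
    is_dlo = "x-object-manifest" in lowered
    is_slo = lowered.get("x-static-large-object", "").strip().lower() == "true"
    if is_dlo or is_slo:
        kind = "dynamic" if is_dlo else "static"
        log.warning(
            "%s: looks like a %s large object but large object support "
            "is disabled; hashes may be wrong",
            name,
            kind,
        )
        return True
    return False
```

The check costs nothing extra, since it only inspects headers from a HEAD or GET that was going to happen anyway.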

eevans · October 5, 2022, 3:32pm

Any suggestions? disable_large_object_check maybe?

Could you expound on this? Are you saying it would avoid doing the HEAD on files it could infer were large objects (by virtue of being zero-length), or avoid redundantly doing a HEAD on files it could infer were large objects? If so, and assuming environments where LOs are exceptional (or non-existent), this doesn't sound like it would be of much (or any) benefit. Also, what does including the no_chunk functionality provide?

ncw · October 5, 2022, 4:50pm

I really think this should be part of disabling support for large objects, I don't think it makes sense on its own (except in your quite specific use case).

It would avoid doing any HEAD operations during the List function. At the moment all zero-sized objects are HEAD-ed, and depending on the environment you may have quite a few zero-sized objects. It's probably a small benefit though.
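As an illustration of the listing behaviour described here, the following Python sketch works out which entries would trigger a HEAD while listing. It assumes entries shaped like Swift's JSON container listing (with `name` and `bytes` fields); the function `heads_during_list` and the sample data are hypothetical, not taken from rclone.

```python
from typing import Dict, List


def heads_during_list(listing: List[Dict], no_large_objects: bool) -> List[str]:
    """Return the names of listed objects that would need a HEAD request."""
    if no_large_objects:
        # Large objects are assumed not to exist, so the listing is trusted as-is.
        return []
    # Otherwise every zero-sized entry might be a DLO manifest and gets checked.
    return [entry["name"] for entry in listing if entry["bytes"] == 0]


# Hypothetical listing with two zero-sized entries.
listing = [
    {"name": "photo.jpg", "bytes": 48213},
    {"name": "empty.txt", "bytes": 0},
    {"name": "dir-marker", "bytes": 0},
]
print(heads_during_list(listing, no_large_objects=False))
print(heads_during_list(listing, no_large_objects=True))
```

The saving scales with the number of zero-sized objects, so it is small for most containers and larger for ones full of empty marker objects.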

no_chunk basically disables creating large objects if the size of the input object is unknown. This is very common when using a swift backend in rclone mount (with `--vfs-cache-mode` < writes) and means every file is uploaded as a large object.

I had a go at implementing --swift-no-large-objects - please have a go and see if it fixes your problem.

v1.60.0-beta.6475.64d4bb513.fix-swift-no-large-objects on branch fix-swift-no-large-objects (uploaded in 15-30 mins)

--swift-no-large-objects

Disable support for static and dynamic large objects

Swift cannot transparently store files bigger than 5 GiB. There are
two schemes for doing that, static or dynamic large objects, and the
API does not allow rclone to determine whether a file is a static or
dynamic large object without doing a HEAD on the object. Since these
need to be treated differently, this means rclone has to issue HEAD
requests for objects for example when reading checksums.

When no_large_objects is set, rclone will assume that there are no
static or dynamic large objects stored. This means it can stop doing
the extra HEAD calls which in turn increases performance greatly
especially when doing a swift to swift transfer with --checksum set.

Setting this option implies no_chunk and also that no files will be
uploaded in chunks, so files bigger than 5 GiB will just fail on
upload.

If you set this option and there are static or dynamic large objects,
then this will give incorrect hashes for them. Downloads will succeed,
but other operations such as Remove and Copy will fail.

Properties:

Config: no_large_objects
Env Var: RCLONE_SWIFT_NO_LARGE_OBJECTS
Type: bool
Default: false
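For anyone scripting a swift to swift sync, here is a small Python sketch that turns the option on through the environment variable listed above rather than the config file. The remote names `swift-a:` and `swift-b:`, the container name, and the helper `build_sync` are placeholders invented for this example; it only builds the command and does not run rclone.

```python
import os
from typing import Dict, List, Tuple


def build_sync(src: str, dst: str) -> Tuple[List[str], Dict[str, str]]:
    """Build an rclone sync command and an environment with the option enabled."""
    cmd = ["rclone", "sync", "--checksum", src, dst]
    env = dict(os.environ)
    # Equivalent to passing --swift-no-large-objects or setting
    # no_large_objects = true in the remote's config section.
    env["RCLONE_SWIFT_NO_LARGE_OBJECTS"] = "true"
    return cmd, env


# Placeholder remotes; replace with remotes defined in your own rclone config.
cmd, env = build_sync("swift-a:container", "swift-b:container")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

This is only safe if neither cluster holds static or dynamic large objects, as the description above warns.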

eevans · October 5, 2022, 5:54pm

Oh, very nice; I will give this a go and report back!

eevans · October 13, 2022, 12:34am

I can confirm: this does exactly what we need. Thanks for this!

ncw · October 13, 2022, 10:58am

Thanks for testing. I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.60.
