I have a use case where we need to upload a large number of small files from a local file system to a Swift remote, and I am trying to optimize the file transfer. These files are new and will not already exist in the destination, so I would like to disable as much checking of the remote as possible. Is there any way I can disable listing, modtime and checksum checking and just do a dumb copy? Given the number of requests needed just to upload the files, I want to minimize all other requests to the object store.
@ncw I’ve been testing with the beta version, and while I can see that the GETs on subdirectories in containers stop with --no-traverse, I’m still seeing a HEAD after each PUT. Is there any way to stop it from doing that? I know you want to verify that the size, mtime, and checksum are correct, but on a PUT in Swift the MD5 is returned in the 2xx response. Is there any way to use that instead?
That is a good thought… I’ve adjusted the code to do that for single part uploads so it will use the hash in the response instead of doing another HEAD request.
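For anyone following along, here is a rough Python sketch of the idea, not rclone's actual code. It assumes the PUT response carries the object's MD5 in the ETag header, as Swift does for single part uploads; the function names and the plain dict standing in for the response headers are made up for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 of the bytes that were uploaded."""
    return hashlib.md5(data).hexdigest()


def verify_put_response(data: bytes, response_headers: dict) -> bool:
    """Check an upload using only the headers returned by the PUT.

    Hypothetical helper: `response_headers` stands in for the headers
    of the 2xx PUT response. Returns True when the ETag matches the
    local MD5, so no follow-up HEAD request is needed.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in response_headers.items()}
    etag = headers.get("etag")
    if etag is None:
        # No hash in the response: the caller would have to fall back
        # to a HEAD request to verify the object.
        return False
    # Some servers wrap the ETag in double quotes.
    return etag.strip('"').lower() == md5_hex(data)


if __name__ == "__main__":
    body = b"small file contents"
    fake_headers = {"ETag": md5_hex(body)}  # simulated PUT response
    print(verify_put_response(body, fake_headers))
```

This only holds for single part uploads. For segmented large objects the ETag is not a plain MD5 of the content, so a check like this would not apply there.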
One other question along these lines: what exactly is going on with the “checkers” after a sync? Is it going back to the destination again to check and make sure it matches the source? Or is it just checking locally to see if anything changed while it was syncing?
@ncw - I confirmed that the new code only does one PUT per file and doesn’t do a subsequent HEAD. This is a good improvement for dealing with very large numbers of small files.
I’m still not sure what the checkers do. Is there a description of the overall algorithm rclone uses during a sync, generalized for any backend?
Thanks @ncw! Any idea when 1.46 will be available?
Also, can you point me to the documentation describing the matching? I’m trying to understand exactly what is checked and what might trigger unnecessary extra steps when uploading millions of files from a local file system to Swift.