Optimizing copying a large number of small files to Swift


#1

Hello,

I have a use case where we need to upload a large number of small files from a local file system to a Swift remote, and I am trying to optimize the file transfer. These files are new and will not already exist at the destination, so I would like to disable as much checking of the remote as possible. Is there any way I can disable listing, modtime and checksum checking and just do a dumb copy? Given the number of requests needed for the files alone, I want to minimize all other requests to the object store.


#3

Try this beta with the --no-traverse flag:

https://beta.rclone.org/branch/v1.45-003-g872b5e7f-no-traverse-beta/

That will just do a dumb copy as you put it!
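
For anyone scripting this, here is a minimal sketch of how the copy could be invoked with the flag. The local path and the remote name `swift:mycontainer/incoming` are placeholders, and `build_copy_command` is a hypothetical helper rather than part of rclone; only `rclone copy` and `--no-traverse` come from this thread.

```python
import shlex


def build_copy_command(src, dst):
    """Build an rclone copy command that skips listing the destination.

    src and dst are placeholders supplied by the caller; --no-traverse
    is the flag suggested above.
    """
    return ["rclone", "copy", src, dst, "--no-traverse"]


# Hypothetical local directory and Swift remote, for illustration only.
cmd = build_copy_command("/data/small-files", "swift:mycontainer/incoming")

# Print a shell-safe version of the command instead of running it.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the beta build is installed as `rclone`.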


#4

Thanks Nick! I’ll try that out.


#5

@ncw I’ve been testing with the beta version, and while I can see that the GETs on subdirectories in containers stop with --no-traverse, I’m still seeing a HEAD after each PUT. Is there any way to stop it from doing that? I know you want to verify that the size, mtime, and checksum are correct, but on a PUT in Swift the MD5 is returned in the 2xx response. Is there any way to use that instead?


#6

That is a good thought… I’ve adjusted the code to do that for single-part uploads, so it will use the hash in the response instead of doing another HEAD request.

Have a go with this and tell me what you think!

https://beta.rclone.org/branch/v1.45-012-g6e000d26-no-traverse-beta/ (uploaded in 15-30 mins)
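
To illustrate the idea discussed in the last two posts, here is a rough sketch of checking an upload using the hash from the PUT response rather than a follow-up HEAD. It is not rclone's actual code. It assumes the response headers are available as a plain dict and that the object was uploaded in a single part, so that the `ETag` header holds the MD5 of the content; `etag_matches` is a hypothetical helper name.

```python
import hashlib


def etag_matches(body, response_headers):
    """Return True if the ETag from a PUT response equals the local MD5.

    Assumes a single-part upload, where the ETag is the MD5 of the
    object content. Segmented uploads are not covered by this sketch.
    """
    local_md5 = hashlib.md5(body).hexdigest()
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    # Some servers wrap the ETag in double quotes; strip them if present.
    etag = headers.get("etag", "").strip().strip('"').lower()
    return etag == local_md5


# Example with made-up data standing in for a real PUT response.
data = b"hello"
fake_response = {"ETag": hashlib.md5(data).hexdigest()}
print(etag_matches(data, fake_response))
```

If the check fails or the header is missing, a caller could fall back to a HEAD request, which keeps the extra round trip only for the unusual cases.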