We're trying to optimise syncing a large Swift container to S3 (16M objects, >10 TB). I'm trying to understand the rclone design but am struggling with the delimiter handling.
It seems to me that between two "flat" object storage platforms like Swift and S3, the delimiter ('/') only causes a lot of unnecessary listing queries when the object names have a lot of "delimiter depth". The only option to disable the delimiter appears to be --fast-list, which appears to require storing all the listing results in memory up front. Is there a design reason why rclone can't iterate through the listing without a delimiter and without storing the entire listing in memory?
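To illustrate the concern about "delimiter depth", here is a rough Python model of how many listing requests each approach would need. It is only a sketch under my own assumptions: one listing call per directory level when a delimiter is used, a page size of 1000, and hypothetical helper names (`flat_requests`, `delimited_requests`). It does not reflect rclone's actual internals.

```python
import math
from collections import defaultdict


def flat_requests(keys, page_size=1000):
    """Listing calls needed when paging through the whole container without a delimiter."""
    return max(1, math.ceil(len(keys) / page_size))


def delimited_requests(keys, page_size=1000, delimiter="/"):
    """Listing calls needed when every directory level is listed separately (assumed model)."""
    # Map each directory prefix to its direct children (objects and sub-prefixes).
    children = defaultdict(set)
    for key in keys:
        parts = key.split(delimiter)
        prefix = ""
        for i, part in enumerate(parts):
            is_leaf = i == len(parts) - 1
            children[prefix].add(part if is_leaf else part + delimiter)
            if not is_leaf:
                prefix += part + delimiter
    # Each prefix costs at least one call, more if its children span several pages.
    return sum(max(1, math.ceil(len(c) / page_size)) for c in children.values())


if __name__ == "__main__":
    # 100 objects spread over 10 x 10 nested "directories", one object each.
    keys = [f"d{i}/e{j}/obj" for i in range(10) for j in range(10)]
    print("flat:", flat_requests(keys))
    print("delimited:", delimited_requests(keys))
```

In this model the gap grows with the number of distinct prefixes rather than with the number of objects, which is why deep, sparse hierarchies are the worst case.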
What is your rclone version (output from rclone version)
rclone v1.55.1
os/type: linux
os/arch: amd64
go/version: go1.16.3
go/linking: static
go/tags: none
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Ubuntu 18.04.2 LTS, 64 bit
Which cloud storage system are you using? (eg Google Drive)
Swift + S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
As a workaround, I was looking into using rclone ls and building my own syncing logic on top of that. The listing query seems to return all the information that rclone needs to list the object, but rclone appears to do a separate HEAD request for each object it lists. Is this intentional, and if so, what is the reason?
with --dump-bodies, the listing query returns something like:
I think I can see why the full listing is required by design with --fast-list: otherwise rclone can't compare storageA with storageB deterministically?
It should arguably be possible using consistent listing markers on both storage accounts, though.
e.g.:
loop:
    list 1000 from storageA
    get the marker
    list from storageB up until marker
    do checking & syncing
or something similar. But I can see how this is difficult to make portable across backends, particularly because it requires an identical listing order on both sides.
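The loop above can be sketched in Python as a merge over two paged listings. This is a minimal sketch, not how rclone works: `make_lister`, `iter_keys` and `diff_listings` are hypothetical names, the in-memory lister stands in for a real marker-based listing call, and it assumes both backends return keys in the same ascending order (here Python's string ordering). Only one page per side is held at a time.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical interface: (marker, limit) -> up to `limit` keys sorting after `marker`.
PageLister = Callable[[Optional[str], int], List[str]]


def make_lister(keys: List[str]) -> PageLister:
    """In-memory stand-in for a backend's marker-based listing call."""
    ordered = sorted(keys)

    def list_page(marker: Optional[str], limit: int) -> List[str]:
        after = [k for k in ordered if marker is None or k > marker]
        return after[:limit]

    return list_page


def iter_keys(list_page: PageLister, page_size: int) -> Iterator[str]:
    """Stream keys page by page, using the last key of each page as the next marker."""
    marker = None
    while True:
        page = list_page(marker, page_size)
        if not page:
            return
        yield from page
        marker = page[-1]


def diff_listings(src: PageLister, dst: PageLister,
                  page_size: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield ('copy', key), ('delete', key) or ('check', key) by merging both streams."""
    a, b = iter_keys(src, page_size), iter_keys(dst, page_size)
    ka, kb = next(a, None), next(b, None)
    while ka is not None or kb is not None:
        if kb is None or (ka is not None and ka < kb):
            yield ("copy", ka)        # only on the source
            ka = next(a, None)
        elif ka is None or kb < ka:
            yield ("delete", kb)      # only on the destination
            kb = next(b, None)
        else:
            yield ("check", ka)       # on both: compare size/hash/modtime here
            ka, kb = next(a, None), next(b, None)


if __name__ == "__main__":
    storage_a = make_lister(["a/1", "a/2", "b/1", "c"])
    storage_b = make_lister(["a/2", "b/0", "c", "d"])
    for action, key in diff_listings(storage_a, storage_b, page_size=2):
        print(action, key)
```

The merge is only correct if both listings use the same sort order; if the two backends collate keys differently, keys can be misreported as missing on one side.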