I need to set up synchronization between two object storage buckets in different locations: about 7 million objects synced from one bucket to the other, with hash comparison. The sync needs to run every 4 hours, but I'm struggling to find options that let it finish within 2 hours. I've tried --transfers and --checkers set to 900, as well as --fast-list. On average a sync takes 4 hours, sometimes longer. Each run copies fewer than 10,000 objects, totalling under 5 GB. Which options could I add to speed this up?
Run the command 'rclone version' and share the full output of the command.
rclone v1.73.1
os/version: ubuntu 22.04 (64 bit)
os/kernel: 5.15.0-60-generic (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.25.7
go/linking: static
go/tags: none
The command you were trying to run (eg rclone copy /tmp remote:tmp)
The virtual machine is already in the same region. I have considered obtaining listings (e.g., using the lsf command), but I thought there might be some options for the sync command itself. The --s3-upload-cutoff option is not suitable because the files are small (less than 1 GB), and the most time-consuming part is building the file list. In one bucket, running lsf takes about 15–20 minutes; in another, it takes around an hour. The remaining time is spent comparing hash sums. I am specifically interested in speeding up the listing and hash comparison stages—the actual copying of new files is fine and does not take much time.
I'm not sure I understood the suggestion correctly. Is the idea to compare checksums only for files that are new or were modified within a certain time period, and to build the file list for the sync command from that same filter? Suppose we run the command:
Will it work as follows: it looks at the modification time of the object in src, looks at the modification time of the object in dst, and if the object's modification time is older than 1 day, it skips those objects? And for all others, it applies listing and checksum comparison. Is that correct?
Thank you for your help. I’m going to try the approach with --max-age and --use-server-modtime. Based on my tests, it looks like the best option.
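For readers following along, here is a minimal sketch of what a run with these two flags might look like. The exact command from this thread is not shown, so everything here is an assumption: the remote names `src:bucket` and `dst:bucket` are hypothetical, the 1-day window is only an example, and the sketch uses `copy` with `--no-traverse` instead of `sync` so that the time filter cannot trigger deletions on the destination. It only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical remote and bucket names; replace with your own.
SRC="src:bucket"
DST="dst:bucket"
# Assumed window: only consider objects changed within the last day.
MAX_AGE="1d"

# Build the rclone argument list as positional parameters.
#   --max-age             skip objects older than the window
#   --use-server-modtime  use the object's upload time, which on S3 avoids
#                         reading modtime metadata with an extra request
#   --no-traverse         do not list the whole destination bucket
#   --checksum            compare by size and hash instead of modtime
#   --dry-run             report what would be copied without copying
set -- copy "$SRC" "$DST" \
  --max-age "$MAX_AGE" \
  --use-server-modtime \
  --no-traverse \
  --checksum \
  --fast-list \
  --dry-run

# Print the command; execute it only when RUN=1 is set in the environment.
echo "rclone $*"
if [ "${RUN:-0}" = "1" ]; then
  rclone "$@"
fi
```

Because `copy` never deletes anything, objects removed from the source would still need an occasional full `sync` without the age filter. It is worth keeping `--dry-run` until the output matches what you expect.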
@allnelone this sounds like an enterprise problem. If you are working on behalf of a company, you might be interested in taking out a support contract, which can help you get answers more quickly and keeps the rclone project sustainable.