How to pick up a sync where it left off after closing the cmd (without listing all files)

I am using a simple cmd file to sync my local folder to Amazon S3 storage.

rclone sync "D:\heavy folder" "aws":"heavy folder" --log-file "%logpath%" --log-level INFO --delete-after --copy-links --stats-one-line -P --stats 5s

My folder has about 200,000 files and its size is more than 300 GB, so this sync is going to take many (many) weeks at my internet speed. As I need to switch off my computer at least once every 24 hours, this means I can't simply start my cmd and wait for the sync to finish.

The problem is, each time I double-click my cmd file after switching on my computer to restart the sync, rclone lists the files on the Amazon S3 storage to know which files it should upload, but (if I understood correctly) this first check generates many requests, which are costly.

So I am looking for a way to resume the rclone sync without rclone doing this listing. A bit like what rclone does when I put my computer to sleep: it resumes the sync as if nothing had happened (without relisting all the files on the server).

So is there a way to keep a log that rclone can access to know where it has to resume the sync, without listing all the files on the server?

What is your rclone version (output from rclone version)

rclone v1.50.0
- os/arch: windows/amd64
- go version: go1.13.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows, 64 bit

Which cloud storage system are you using? (eg Google Drive)

Amazon S3 deep glacier

I don't know the costs in detail, and especially not the best route through to Glacier, but for any large, bandwidth-limited file migration it has traditionally been normal practice to move blocks of files that are not expected to change ahead of time, before the final sync. Even if it is just a case of copying files selected by the first letter of the filename, that takes load off the final sync, and those initial transfers do not need to query what is already at the remote.
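For example, something along these lines would pre-copy files by first letter ahead of the final sync (reusing the remote name and paths from the command above; the letter ranges and --ignore-case are just an illustration, not a prescription):

rclone copy "D:\heavy folder" "aws":"heavy folder" --include "[a-m]*" --ignore-case --copy-links --stats-one-line -P --stats 5s
rclone copy "D:\heavy folder" "aws":"heavy folder" --include "[n-z]*" --ignore-case --copy-links --stats-one-line -P --stats 5s

Since rclone copy never deletes anything on the remote, these preliminary runs are safe to interrupt and re-run, and the final sync then only has to deal with whatever changed in the meantime.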

200,000 isn't a lot of files for a sync, and at 300 GB Glacier might not be appropriate compared with simply stopping at S3.


Thanks for your answer, Edward. So I will go with sending small blocks of files.

If you use --fast-list, listing 200,000 files should take only 200 transactions, which doesn't cost very much.

Note that using --checksum or --size-only will stop rclone from trying to read the modified time from S3. --checksum will md5sum the local file, so you probably want --size-only.

Using those two tricks you should find that resuming the sync is very quick and efficient.
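Putting that together, a resume-friendly version of the original command would look something like this (reusing the same paths and %logpath% variable from the first post):

rclone sync "D:\heavy folder" "aws":"heavy folder" --fast-list --size-only --log-file "%logpath%" --log-level INFO --delete-after --copy-links --stats-one-line -P --stats 5s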


Thanks a lot for these details, Nick, and also for replying on my other question!


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.