How to pick up a sync where it left off after closing the cmd (without listing all files)

I am using a simple cmd file to sync my local folder to Amazon S3 storage.

rclone sync "D:\heavy folder" "aws":"heavy folder" --log-file "%logpath%" --log-level INFO --delete-after --copy-links --stats-one-line -P --stats 5s

My folder has about 200,000 files and its size is more than 300 GB, so this sync is going to take many (many) weeks at my internet speed. As I need to switch off my computer at least once every 24 hours, this means I can't simply start my cmd and wait for the sync to finish.

The problem is, each time I double-click my cmd file after switching on my computer to restart the sync, rclone lists the files on the Amazon S3 storage to know which files it should upload, but (if I understood correctly) this first check generates many requests, which are costly.

So I am looking for a way to resume the rclone sync without rclone doing this listing. A bit like what rclone does when I put my computer to sleep: it resumes the sync as if nothing had happened (without relisting all the files on the server).

So is there a way to keep a log that rclone can access to know where it has to resume the sync, without listing all the files on the server?

What is your rclone version (output from rclone version)

rclone v1.50.0
- os/arch: windows/amd64
- go version: go1.13.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows, 64 bit

Which cloud storage system are you using? (eg Google Drive)

Amazon S3 deep glacier

I don't know the costs in detail, and especially not the best route through to Glacier, but for any large, bandwidth-limited file migration it has traditionally been normal practice to move blocks of files that are not expected to change ahead of time, before the final sync. Even if it is just a case of copying files selected by the first letter of the filename, that takes load off the final sync, and those initial transfers do not need to query what is already at the remote.
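For example, something along these lines would pre-copy files by first letter ahead of the final sync (reusing the remote name and paths from the command above; the letter ranges and --ignore-case are just an illustration, not a prescription):

rclone copy "D:\heavy folder" "aws":"heavy folder" --include "[a-m]*" --ignore-case --copy-links --stats-one-line -P --stats 5s
rclone copy "D:\heavy folder" "aws":"heavy folder" --include "[n-z]*" --ignore-case --copy-links --stats-one-line -P --stats 5s

Since rclone copy never deletes anything on the remote, these preliminary runs are safe to interrupt and re-run, and the final sync then only has to deal with whatever changed in the meantime.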

200,000 isn't a lot of files for a sync, and at 300 GB Glacier might not be appropriate compared with simply stopping at S3.


Thanks for your answer, Edward. So I will go with sending small blocks of files.

If you use --fast-list, listing 200,000 files should take only 200 transactions, which doesn't cost very much.

Note that using --checksum or --size-only will stop rclone from trying to read the modified time from S3. --checksum will md5sum the local file, so you probably want --size-only.

Using those two tricks you should find that resuming the sync is very quick and efficient.
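Putting that together, a resume-friendly version of the original command would look something like this (reusing the same paths and %logpath% variable from the first post):

rclone sync "D:\heavy folder" "aws":"heavy folder" --fast-list --size-only --log-file "%logpath%" --log-level INFO --delete-after --copy-links --stats-one-line -P --stats 5s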


Thanks a lot for these details, Nick, and also for replying on my other question!


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.