Question about the behavior of `rclone copy`

Hi, and welcome!

Let's say I have files I want to copy from S3 to Azure Blob, and I do this every day as a cron job. Is it true that rclone will not copy files that already exist in Azure Blob?

True.

How does rclone determine whether a file has already been copied over?

From the docs: "Doesn't transfer unchanged files, testing by size and modification time or MD5SUM."

What does this actually mean? I find it a bit confusing myself, but this post sums up which comparisons rclone performs to decide whether a copy/upload is needed, along with the options that change them:

There are 3 main syncing methods:

* no flags - (size, modtime)
* `--size-only` - (size)
* `--checksum` - (size, checksum)

Then there are the modifiers:

* `--ignore-size` - makes all of the above skip the size check
* `--ignore-times` - uploads unconditionally (no checks)
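
To make the list above concrete, here is a rough Python model of that decision table. It is only a sketch of the summary in this post, not rclone's actual implementation: `FileInfo`, `needs_transfer` and the `tolerance` parameter are names made up for illustration, and the fallback when a hash is missing is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Metadata as a backend might report it (hypothetical structure)."""
    size: int
    modtime: float              # seconds since the epoch
    md5: Optional[str] = None   # None when the backend reports no hash


def needs_transfer(src, dst, *, size_only=False, checksum=False,
                   ignore_size=False, ignore_times=False, tolerance=1.0):
    """Return True if this sketch would copy src over dst."""
    if dst is None:
        return True     # nothing at the destination yet
    if ignore_times:
        return True     # modifier: transfer unconditionally, no checks
    if not ignore_size and src.size != dst.size:
        return True     # sizes differ, so the file changed
    if size_only:
        return False    # size was the only criterion
    if checksum:
        if src.md5 is None or dst.md5 is None:
            # Assumption: with no hash to compare, rely on the size check alone.
            return False
        return src.md5 != dst.md5
    # Default mode: size already matched, so compare modification times.
    return abs(src.modtime - dst.modtime) > tolerance


if __name__ == "__main__":
    src = FileInfo(size=10, modtime=100.0, md5="aa")
    dst = FileInfo(size=10, modtime=500.0, md5="aa")
    print(needs_transfer(src, dst))                  # default: modtime differs
    print(needs_transfer(src, dst, checksum=True))   # same size and hash
```

In the usage at the bottom, the same pair of files is treated as changed in the default mode (different modtime) but as unchanged with `checksum=True`, which is the practical difference between the first and third method.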

Is this hash calculated on the client running rclone or do files in S3 come with a hash?

All of the information is retrieved from the source and destination backends, so in your case rclone uses the file size, timestamp and hash as reported by S3 and Azure Blob; nothing is hashed on the client. If both S3 and Azure keep file hashes of the same type, and judging from this both use MD5, then that hash will be used whenever rclone wants to compare hashes. If there is no common hash type, rclone will not be able to compare hashes during copy/sync (but see the note about the check command below).
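
The "common hash" idea can be pictured as a simple intersection. This is a minimal sketch, not rclone's code: `common_hash_types` is a made-up helper, and the hash lists are illustrative (MD5 for both S3 and Azure Blob as stated above, plus an invented SHA-1-only backend for contrast).

```python
def common_hash_types(src_hashes, dst_hashes):
    """Hash types reported by both backends, in the source's order."""
    return [h for h in src_hashes if h in dst_hashes]


if __name__ == "__main__":
    s3 = ["md5"]             # per the answer above
    azure_blob = ["md5"]     # per the answer above
    sha1_only = ["sha1"]     # hypothetical backend, for contrast

    print(common_hash_types(s3, azure_blob))   # a shared type: hashes can be compared
    print(common_hash_types(s3, sha1_only))    # empty: no hash comparison possible
```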

I want to know if rclone still downloads all the files onto the client machine before determining if they are to be copied over.

No, it does not. However, there is a `check` command that compares files without copying anything, and its `--download` option does exactly this: it downloads the files from both remotes and compares them on the client.
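
Since this runs from a cron job, one way to script the check is to build the command line in Python. This is a sketch only: `build_check_command` is a made-up helper, and the remote names `s3remote:my-bucket` and `azremote:my-container` are placeholders for whatever is set up in your rclone config.

```python
import shlex


def build_check_command(source, dest, download=False):
    """Build the argument list for `rclone check` between two remotes."""
    cmd = ["rclone", "check", source, dest]
    if download:
        # Download both sides and compare the contents on the client.
        cmd.append("--download")
    return cmd


if __name__ == "__main__":
    cmd = build_check_command("s3remote:my-bucket", "azremote:my-container",
                              download=True)
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would execute it, assuming rclone is installed and both remotes are configured.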
