Docs for server-side copy

What is the problem you are having with rclone?

This is documentation-related. The features I need are already present in rclone but were not clear until now :slight_smile:

I was struggling to understand why rclone was re-copying files that had not changed.

After a lot of testing I realised it was because my source and target remotes are the same, which enables server-side copy. Server-side copy sounded like a good idea, but in my case it actually performs worse than downloading/uploading the objects.

I think the documentation could be made clearer to indicate how server-side copy relates to metadata. Suggestion below.

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.2
- os/version: amazon 2 (64 bit)
- os/kernel: 4.14.322-246.539.amzn2.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.3
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

s3

I am copying data from an s3 bucket to an s3 bucket via an access point.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Example commands and stats below. The order is important: performance varies depending on whether the destination is empty or full. The source data, source bucket, and destination access point are identical. The destination prefix changes between commands 2 and 3.

# Command 1: Implicit server-side copy, empty destination.

time rclone copy afs1:<source bucket>/<source prefix> afs1:<dest access point>/test_same --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE

2023/11/08 09:34:06 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Transferred:       156795 / 156795, 100%
Server Side Copies:156795 @ 7.305 GiB
Elapsed time:      5m20.3s


real    5m20.442s
user    2m29.040s
sys     0m25.132s

# Command 2: Implicit server-side copy, full destination after #1.

time rclone copy afs1:<source bucket> afs1:<dest access point>/test_same --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE
2023/11/08 09:39:55 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            156795 / 156795, 100%
Transferred:       156799 / 156799, 100%
Server Side Copies:156799 @ 7.307 GiB
Elapsed time:       5m5.1s


real    5m5.244s
user    4m16.633s
sys     0m39.394s

Notice how the second command takes around the same time, and re-copies everything.

In command 1 & 2 no metadata is added to the destination objects.

# Command 3: No server-side copy, empty destination

time rclone copy afs1:<source bucket>/<source prefix> afs1-2:<dest access point>/test_diff --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE
2023/11/08 09:48:27 NOTICE:
Transferred:        7.307 GiB / 7.307 GiB, 100%, 33.643 MiB/s, ETA 0s
Transferred:       156800 / 156800, 100%
Elapsed time:      6m16.4s


real    6m16.502s
user    3m38.027s
sys     1m17.418s

# Command 4: No server-side copy, full destination after #3

time rclone copy afs1:<source bucket>/<source prefix> afs1-2:<dest access point>/test_diff --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE
2023/11/08 09:51:29 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            156800 / 156800, 100%
Elapsed time:      2m31.9s


real    2m32.020s
user    2m3.796s
sys     0m18.942s

Notice how command 3 is a bit slower than command 1, but command 4 is much faster than command 2.

In practice, if rclone is used to replicate an "incrementally changing" dataset, then the destination is likely to be empty only the first time, and subsequent runs would be much faster if the approach of commands 3 & 4 were used when only a few files have changed.

This is probably only true when there are many tiny files (per above), as I understand larger files would take longer to download and upload.

The rclone config contents with secrets removed.

 rclone config show

[afs1]
type = s3
provider = AWS
env_auth = true
region = af-south-1
location_constraint = af-south-1
no_check_bucket = true
server_side_encryption = aws:kms

[afs1-2]
type = s3
provider = AWS
env_auth = true
region = af-south-1
location_constraint = af-south-1
no_check_bucket = true
server_side_encryption = aws:kms

A log from the command with the -vv flag

(not included)

Suggestion

I think the docs regarding server-side copy could be improved as follows:

1

Explicitly state that when a server-side copy is done, the source object's metadata is copied as-is: rclone will not add further user metadata such as mtime or checksum to the destination objects. Subsequent copies might therefore not perform better, even if the objects have not changed. If server-side copy is disabled, then objects will be re-copied at least once by rclone (possibly more, if I understand correctly how metadata is maintained on objects). But assuming the remotes are unchanged when rclone runs again, server-side copy will happen, and at least for S3 it does not seem to detect matching objects as efficiently as rclone itself could.

There is probably a better way to word this, and server-side copy probably works fine when files are larger than in my case.

Worth mentioning: my buckets are encrypted with different KMS keys, causing different ETags, so maybe the s3 backend cannot determine a match in that case.

2

I think the docs on server-side copy should also call out how to disable it if you want to avoid the aforementioned issue:
--disable copy

At first I thought it wasn't possible because I could not find a global flag for it. The forum helped :slight_smile:

Below is an example showing the performance is better for my second run with this flag.

# Command 5: Server-side copy disabled, empty destination
time rclone copy afs1:<source bucket>/<source prefix> afs1:<dest access point>/test_same2 --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE --disable copy
2023/11/08 09:59:15 NOTICE:
Transferred:        7.307 GiB / 7.307 GiB, 100%, 33.033 MiB/s, ETA 0s
Transferred:       156800 / 156800, 100%
Elapsed time:       6m8.3s


real    6m8.445s
user    3m36.750s
sys     1m20.406s

# Command 6: Server-side copy disabled, full destination
time rclone copy afs1:<source bucket>/<source prefix> afs1:<dest access point>/test_same2 --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE --disable copy
2023/11/08 10:17:23 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            156800 / 156800, 100%
Elapsed time:      3m42.7s


real    3m42.775s
user    2m3.230s
sys     0m20.286s

As expected, similar performance to #3 and #4, but without needing a duplicate remote in the config.

That isn't right. A server-side copy should not copy the files again.

Here is an example...

$ rclone copy -vv s3:rclone/test s3:rclone/test.copy
[snip]

Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Transferred:           99 / 99, 100%
Server Side Copies:    99 @ 4.742 KiB
Elapsed time:         2.1s

And again

$ rclone copy -vv s3:rclone/test s3:rclone/test.copy
[snip]
2023/11/08 11:25:10 INFO  : There was nothing to transfer
2023/11/08 11:25:10 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                99 / 99, 100%
Elapsed time:         0.9s

Note the 0 transfers and the nothing to transfer message.

In your example above you've used a different rclone command each time, which would explain it, but that might just be a typo when writing this up.

time rclone copy afs1:<source bucket>/<source prefix> afs1:<dest access point>/test_same --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE
time rclone copy afs1:<source bucket> afs1:<dest access point>/test_same --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE

So server-side copy should not re-copy stuff; let's dig into that.

Can you make a small example which shows the problem and produce a log with -vv?

Thanks for the feedback.

Yes, that was a typo in my write-up, well spotted. Checking my raw copy of the commands shows the prefix I used was identical.

Here is a smaller test; there were 200 files involved.

I'm going to do another test to and from the exact same bucket, which will then rule out KMS.

rclone copy -vv afs1:<source bucket>/<source prefix>/ afs1:<dest access point>/test_small --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE --log-file 1.log
head 1.log
2023/11/08 11:41:03 DEBUG : rclone: Version "v1.64.2" starting with parameters ["rclone" "copy" "-vv" "afs1:<source bucket>/<source prefix>/" "afs1:<dest access point>/test_small" "--transfers" "96" "--checkers" "96" "--stats" "10000h" "--stats-log-level" "NOTICE" "--log-file" "1.log"]
2023/11/08 11:41:03 DEBUG : Creating backend with remote "afs1:<source bucket>/<source prefix>/"
2023/11/08 11:41:03 DEBUG : Using config file from "/home/ec2-user/.config/rclone/rclone.conf"
2023/11/08 11:41:03 DEBUG : fs cache: renaming cache item "afs1:<source bucket>/<source prefix>/" to be canonical "afs1:<source bucket>/<source prefix>"
2023/11/08 11:41:03 DEBUG : Creating backend with remote "afs1:<dest access point>/test_small"
2023/11/08 11:41:03 DEBUG : info_version=1/_INFO: Need to transfer - File not found at Destination
2023/11/08 11:41:03 DEBUG : info_version=1/_SUCCESS: Need to transfer - File not found at Destination
2023/11/08 11:41:03 DEBUG : info_version=1/part-00000-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Need to transfer - File not found at Destination
2023/11/08 11:41:03 DEBUG : info_version=1/part-00001-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Need to transfer - File not found at Destination
2023/11/08 11:41:03 DEBUG : info_version=1/part-00002-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Need to transfer - File not found at Destination
tail -n 20 1.log

2023/11/08 11:41:04 INFO  : info_version=1/part-00191-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00198-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00198-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00195-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00195-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00193-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00193-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00199-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00199-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00194-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00194-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 DEBUG : info_version=1/part-00197-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00197-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:04 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Transferred:          202 / 202, 100%
Server Side Copies:   202 @ 7.141 MiB
Elapsed time:         1.1s

2023/11/08 11:41:04 DEBUG : 201 go routines active

Then second command

rclone copy -vv afs1:<source bucket>/<source prefix>/ afs1:<dest access point>/test_small --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE --log-file 2.log
head -n20 2.log
2023/11/08 11:41:48 DEBUG : rclone: Version "v1.64.2" starting with parameters ["rclone" "copy" "-vv" "afs1:<source bucket>/<source prefix>/" "afs1:<dest access point>/test_small" "--transfers" "96" "--checkers" "96" "--stats" "10000h" "--stats-log-level" "NOTICE" "--log-file" "2.log"]
2023/11/08 11:41:48 DEBUG : Creating backend with remote "afs1:<source bucket>/<source prefix>/"
2023/11/08 11:41:48 DEBUG : Using config file from "/home/ec2-user/.config/rclone/rclone.conf"
2023/11/08 11:41:48 DEBUG : fs cache: renaming cache item "afs1:<source bucket>/<source prefix>/" to be canonical "afs1:<source bucket>/<source prefix>"
2023/11/08 11:41:48 DEBUG : Creating backend with remote "afs1:<dest access point>/test_small"
2023/11/08 11:41:49 DEBUG : S3 bucket <dest access point> path test_small: Waiting for checks to finish
2023/11/08 11:41:49 DEBUG : info_version=1/_INFO: Modification times differ by 7360h7m19s: 2023-01-05 19:33:45 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/_INFO: Dst hash empty - aborting Src hash check
2023/11/08 11:41:49 DEBUG : info_version=1/_INFO: Src hash empty - aborting Dst hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00064-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Modification times differ by 7360h10m16s: 2023-01-05 19:30:48 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/part-00064-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00064-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00060-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Modification times differ by 7360h7m43s: 2023-01-05 19:33:21 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/part-00060-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00060-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00081-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Modification times differ by 7360h9m17s: 2023-01-05 19:31:47 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/part-00081-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00081-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00030-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Modification times differ by 7360h8m26s: 2023-01-05 19:32:38 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/part-00030-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
tail -n20 2.log
2023/11/08 11:41:50 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 INFO  : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:50 DEBUG : info_version=1/part-00112-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00112-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 INFO  : info_version=1/part-00112-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:50 DEBUG : info_version=1/part-00138-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00138-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 INFO  : info_version=1/part-00138-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:50 DEBUG : info_version=1/part-00181-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00181-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 INFO  : info_version=1/part-00181-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)
2023/11/08 11:41:50 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:               202 / 202, 100%
Transferred:          202 / 202, 100%
Server Side Copies:   202 @ 7.141 MiB
Elapsed time:         1.4s

2023/11/08 11:41:50 DEBUG : 777 go routines active

And just to check a sample file:

cat 1.log | grep "part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet"

2023/11/08 11:41:03 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Need to transfer - File not found at Destination
2023/11/08 11:41:04 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:04 INFO  : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)

cat 2.log | grep "part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet"
2023/11/08 11:41:49 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Modification times differ by 7360h8m46s: 2023-01-05 19:32:18 +0000 UTC, 2023-11-08 11:41:04 +0000 UTC
2023/11/08 11:41:49 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:49 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Dst hash empty - aborting Src hash check
2023/11/08 11:41:50 DEBUG : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Src hash empty - aborting Dst hash check
2023/11/08 11:41:50 INFO  : info_version=1/part-00153-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Copied (server-side copy)

This might be the problem... The large files don't have MD5 hashes, so rclone assumes they are different because their modtimes differ. I suspect these files weren't uploaded with rclone.

Try adding --size-only to your copy command.

Yes, the source files were not copied with rclone but placed on S3 by another application, so there is no user metadata (e.g. for the modtime or checksum) on any of the source objects.
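
For reference, this can be seen with the AWS CLI: a head-object call shows any user metadata on an object (the bucket and key below are placeholders). As far as I understand, an object uploaded by rclone would show an mtime key under Metadata; mine show an empty map.

# Inspect the user metadata on a source object (illustrative; names are placeholders)
aws s3api head-object --bucket <source bucket> --key <source prefix>/<object key> --query Metadata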

If I add --size-only then rclone does not transfer anything and finishes very quickly.

rclone copy -vv afs1:<source bucket>/<source prefix>/ afs1:<dest access point>/test_small --transfers 96 --checkers 96 --stats 10000h --stats-log-level NOTICE --log-file 4.log --size-only
tail 4.log
2023/11/08 12:46:40 DEBUG : info_version=1/part-00081-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Unchanged skipping
2023/11/08 12:46:40 DEBUG : info_version=1/part-00019-b33d9cdc-3de9-4115-96b2-52f3c8535d94-c000.snappy.parquet: Unchanged skipping
2023/11/08 12:46:40 DEBUG : S3 bucket <dest access point> path test_small: Waiting for transfers to finish
2023/11/08 12:46:40 INFO  : There was nothing to transfer
2023/11/08 12:46:40 NOTICE:
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:               202 / 202, 100%
Elapsed time:         0.4s

2023/11/08 12:46:40 DEBUG : 8 go routines active

In your test, rclone did not re-copy the data even though it was doing a server-side copy.

Did your source data already have modtime / checksum metadata present?
I assume so, because on S3 the physical modification date can't be set on an object, so rclone would always find the source and dest to be different if not for metadata.

Good, that is what is supposed to happen :slight_smile:

Correct.

Yes it was uploaded with rclone.

Looking at your logs above

Rclone should be reading the Last-Modified date from the object (if rclone's metadata is not present) and setting it as metadata on the destination.

In the above log the times are

src: 2023-01-05 19:32:18 +0000 UTC
dst: 2023-11-08 11:41:04 +0000 UTC

So given that the dst is round about today, it looks like the src is being read correctly, but the modtime on the destination is not being set correctly, which brings us back to your comment above:

So yes, you are right: these objects are copied and retain their original metadata. Because they don't get the rclone special mtime metadata (and didn't have it originally), rclone sees them as changed and copies them again.

Perhaps rclone should be reading the metadata from the source, adding an mtime and setting that on the destination. What do you think?

Currently, exactly which metadata is preserved when doing server-side copies isn't specified in the rclone integration tests, so backends are free to do whatever is most efficient.

Anyway, back to your problem: adding either --size-only or --checksum should work for these copies, I think.
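
For example, something like (same remotes as above; illustrative):

rclone copy -vv afs1:<source bucket>/<source prefix> afs1:<dest access point>/test_small --checksum

With SSE-KMS the ETags aren't valid MD5s, so if the hashes come back empty I'd expect rclone to fall back to comparing sizes only.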

Yes. So on Amazon S3 it's unfortunately not possible for a user (us) to set the physical modtime of an object. There are some blog posts with workarounds involving object tags, even for AWS services.

For my use case, this would be perfect. Assuming you mean "reading the modtime from the source" (because I don't have it in the metadata) and then setting it on the destination as metadata there :wink:

This can be done for example with the S3 CopyObject API, see MetadataDirective and Metadata.
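
A rough sketch of the equivalent call with the AWS CLI (purely illustrative; the names and the mtime value are placeholders, and as far as I can tell rclone stores mtime as floating-point seconds since the epoch):

# Server-side copy that REPLACEs the user metadata in the same request (illustrative)
aws s3api copy-object \
    --bucket <dest bucket> \
    --key <dest key> \
    --copy-source "<source bucket>/<source key>" \
    --metadata-directive REPLACE \
    --metadata mtime=1672947225.000000000

Note that REPLACE discards any other user metadata on the source object unless it is passed again via --metadata.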

It would give a significant performance boost, without needing to compare based only on size.

It may not work for everyone though ... of course :slight_smile:

Some thoughts:

  • I have no existing metadata on the source objects, so for me the "REPLACE" MetadataDirective is fine. However, I'm not sure what other backends support, even "S3 API compatible" ones.

  • If there are any S3-compatible backends that do support setting the physical object mtime, then this feature isn't necessary for them. But I guess it is unlikely that this is supported.

  • If metadata is replaced by default, someone will need an option to disable that. Maybe they have some existing metadata and for whatever reason it needs to be copied. On the other hand, not having this as the default behavior means non-ideal performance when doing server-side copies of objects that don't already have the mtime metadata field rclone expects.

  • There is no additional HTTP request needed for S3 to replace the metadata along with copying the object, so there is no cost or performance impact. Not sure about other backends.

  • rclone has a --metadata flag, which preserves existing metadata and conflicts with what we are describing (replacing metadata). I feel like --metadata is implied when doing server-side copies at the moment, though I haven't tested with an object that has other user metadata.

  • I feel like, whatever we decide, the rclone docs on server-side copy and metadata should highlight rclone's behavior when the source files lack mtime metadata. This should be a common use case.

By "set the modtime", I meant set the mtime user-defined metadata, which rclone could be doing.

Yes, rclone would read the LastModified field from the source object (which is what it uses if there is no x-amz-meta-mtime) and write it as x-amz-meta-mtime on the destination object.
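
As an aside, you can see what metadata rclone reads for the objects with the --metadata / -M flag on lsjson, e.g. (illustrative):

rclone lsjson -M afs1:<source bucket>/<source prefix>

If rclone's mtime key were present, it would show up under "Metadata" in that listing.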

Currently I'm using the Copy metadata directive here.

I'd need to be sure I'm copying all the metadata, and there is quite a bit of it, which is why I haven't used Replace before.

I don't think there are.

I think rclone will have to read all the metadata from the source object and set the x-amz-meta-mtime if necessary before writing it to the destination object.

But there may be an additional HTTP request to read the metadata from the source object if rclone hasn't read it already.

This is a somewhat undefined area... Rclone does its best to preserve metadata when doing server-side copies at the moment.

I think noting this in the docs is a good idea!

Great, I'm following :slight_smile:

The scenarios regarding server-side copy:

  1. "Copy metadata" (current functionality)

No changes needed.

  2. "Replace metadata" (new functionality)

Useful when the source data was not produced with any user-defined metadata, and we want to add metadata.

mtime could easily be added to all objects, as it is available in object listings.
This avoids an extra HTTP request.

Use the REPLACE metadata directive?

Might be implemented as an s3-specific flag, if only s3/s3-compatible backends support this behavior?

  3. "Merge metadata" (new)?

Would also use the REPLACE metadata directive, but HEAD the object first to get the existing metadata. Copy/preserve existing metadata where possible, and update the rclone metadata fields if present. (I wonder: should we replace existing values, or ignore them?)

Would require an extra HTTP request.
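
Purely as an illustration of the shape of this with the AWS CLI (placeholders throughout):

# 1. The extra HTTP request: read the existing user metadata
aws s3api head-object --bucket <source bucket> --key <key> --query Metadata

# 2. Re-copy with REPLACE, passing the merged metadata back plus the new mtime
aws s3api copy-object \
    --bucket <dest access point> \
    --key <key> \
    --copy-source "<source bucket>/<key>" \
    --metadata-directive REPLACE \
    --metadata existing-key=existing-value,mtime=1672947225.000000000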

I can imagine this only in an edge case:

Only some of the rclone metadata is present and we want to add more.

Example: mtime is present on the source, but now we want to add a checksum at the destination.

Checksum is interesting, as it requires downloading larger objects... This starts to fall outside the scope of server-side copying.

Maybe there is some existing combination of commands that already covers this edge case, so no new functionality is needed here; just some clarifying docs, I suppose.

What do you think?

Hey @ncw

Just wondering if you've had a chance to think on this topic yet :slight_smile:
