Intended behavior of sync and --checksum for Swift

What is the problem you are having with rclone?

I'm in the process of migrating from one Swift installation to another, but I'm having a little trouble fully understanding rclone's behavior and which flags to use.

First I transferred everything from a fairly large container (~50k objects / ~10T) using command [1] below, with rclone v1.50.2 (the version distributed by Ubuntu).

  • Things looked OK for the most part. Although there were quite a few "Failed to copy: Object Corrupted" error messages, after retries (attempt 2/3) everything seemed to have transferred successfully.
  • The objects that failed initially are not compressed (e.g. .tar.gz), nor are they marked as such in the object metadata (i.e. Content-Encoding: gzip), so I don't believe --no-gzip-encoding would have any effect.
  • Running exactly the same command again after it finished yielded no errors or transfers.

Still curious about how --checksum would work, I then ran the same command with that flag included [2].

  • This time around, however, it again produced "Failed to copy: Object Corrupted" error messages.
  • Moreover, it started transferring objects again, even though, from what I can see, those objects were already present on the destination with the same reported checksum and exact byte size.
  • By looking at the destination Swift's logs, I verified that it had indeed received the objects during the first run. The Last Modified time at the destination, before rclone finished the second transfer, also matched the period in which the initial transfer took place.
    • After rclone finished transferring the same object, however, the reported Last Modified was updated, while Meta Mtime was retained and size/checksum were unchanged.
    • Although I did not use it, the documentation for --use-server-modtime states that using it without --update "would cause all files modified at any time other than the last upload time to be uploaded again". Is something similar possibly happening in this case?
  • The source container included segmented objects. However, I used the default --swift-chunk-size of 5Gi and none of the objects are larger than this, so the destination does not include a _segments container.
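
One way to cross-check that the MD5 sums really match on both sides is to diff the hash listings of the two containers. The following is only a minimal sketch and was not part of the original runs: the helper name compare_md5 is made up for illustration, and it assumes that rclone hashsum MD5 prints one "hash  path" line per object.

```shell
# compare_md5 SRC DST: hypothetical helper (the name is illustrative only).
# Lists MD5 sums on both remotes with `rclone hashsum MD5`, sorts them by
# path, and prints any lines that differ. Returns non-zero on a mismatch.
compare_md5() {
  src_list=$(mktemp) || return 1
  dst_list=$(mktemp) || { rm -f "$src_list"; return 1; }
  rclone hashsum MD5 "$1" | sort -k2 > "$src_list"
  rclone hashsum MD5 "$2" | sort -k2 > "$dst_list"
  diff "$src_list" "$dst_list"
  status=$?
  rm -f "$src_list" "$dst_list"
  return "$status"
}

# Example, using the remote names from this post:
# compare_md5 src:container dest:container
```

Because the two listings are compared as plain text, any difference in how a remote reports its hashes would also show up as a mismatch here.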

So the question then really is: is this expected behavior?
Should I have just used --checksum to begin with?

What is your rclone version (output from rclone version)

rclone v1.56.1
- os/version: ubuntu 20.04 (64 bit)
- os/kernel: 5.4.0-53-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.16.8
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

OpenStack Swift v2.7.1 (source) and v2.25.1 (destination)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

# [1] initial transfer
$ rclone sync --config rclone.conf src:container dest:container \
  --progress --transfers=40 --checkers=40

# [2] verification
$ rclone sync --config rclone.conf src:container dest:container \
  --progress --transfers=40 --checkers=40 --checksum

The rclone config contents with secrets removed.

[src]
type = swift
user = ***
key = ***
auth = https://domain:5000/v3
tenant = ***
region = ***
domain = ***
tenant_domain = ***
auth_version = 3

[dest]
type = swift
user = ***
key = ***
auth = https://domain:5000/v3
tenant = ***
region = ***
domain = ***
tenant_domain = ***
auth_version = 3

A log from the command with the -vv flag

The output of this seems too verbose or irrelevant to include in this scenario, but I'm happy to share parts of it if necessary.

Hi timss,

I have no experience with Swift, but it sounds like you may be interested in these rclone commands and options:

https://rclone.org/commands/rclone_check/
https://rclone.org/commands/rclone_lsl/
https://rclone.org/commands/rclone_hashsum/
https://rclone.org/swift/#limitations
https://rclone.org/filtering/#include-include-files-matching-pattern

The last option will allow you to test and debug most commands on a single file like this:

rclone … --include "myfile" -vv

Note: It will search all your files, which may take some time if you have many files. This can be avoided by pointing your remote(s) directly to the folder containing the file.
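
As a hedged sketch of that approach, the helper below wraps rclone check so that it only looks at one object. The function name check_one is invented here; --include, --one-way and -vv are existing rclone flags, and any further flags (for example --download) are passed through unchanged.

```shell
# check_one SRC DST NAME [extra rclone flags...]
# Hypothetical wrapper: compares a single object between two remotes.
check_one() {
  src=$1; dst=$2; name=$3
  shift 3
  # --one-way only checks that source files exist and match on the destination.
  rclone check "$src" "$dst" --include "$name" --one-way -vv "$@"
}

# Example:
# check_one src:container dest:container "myfile" --download
```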

Thanks for your reply, Ole. I will play around with some of those commands; --download and --one-way for the check command look interesting.

It seems I forgot to mention that the objects I looked at more closely, which were reported as "Failed to copy: Object Corrupted" or were retransferred the second time around with --checksum, weren't segmented. So, if I understand correctly, the checksum discrepancy reported for segmented files isn't necessarily the issue in this case. I did, however, only sample a few of the objects, and there were quite a few reported errors (thousands).

I should also mention that the second command in the OP (using --checksum) is still running, and it reports having transferred more data than there is in the source ("12.763Ti / 19.188 TiByte, 67") and having done more checks than there are objects ("58940 / 68981"). Perhaps this is due to multiple transfer attempts.

Good to hear, sounds like you are on the right track.

For copying swift -> swift, using --checksum is the best plan. This is because reading modtimes on swift takes an extra transaction, whereas checksums come back in the listing.

Using --fast-list is a good idea too if you have enough memory.
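
Put together, that advice could look like the sketch below. The wrapper name sync_by_checksum is hypothetical; --checksum (from the question) and --fast-list are existing rclone flags, and --fast-list trades memory for fewer listing requests.

```shell
# sync_by_checksum SRC DST [extra rclone flags...]
# Hypothetical wrapper: sync comparing size and checksum instead of modtime.
sync_by_checksum() {
  src=$1; dst=$2
  shift 2
  rclone sync "$src" "$dst" --checksum --fast-list "$@"
}

# Example:
# sync_by_checksum src:container dest:container --config rclone.conf --progress
```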

I'm not sure why you are seeing object corrupted though...

The inflated transfer and check counts will be to do with retries, you are right.

Thanks for having a look at my thread :bowing_man: First time using rclone and it's blowing my socks off!

I think I found something that could explain the trouble I had with using --checksum:

$ rclone sync --config rclone.conf src:container dest:container \
  --progress --checksum -vv --dump requests,responses
[...]
2021-09-30 11:38:18 DEBUG : myfile.gz: md5 = <hash> (Swift container mycontainer)
2021-09-30 11:38:18 DEBUG : myfile.gz: md5 = "<hash>" (Swift container mycontainer)

Apparently the old Swift installation returned the MD5 checksum in the ETag field without quotes, while the new one encloses it in double quotes. This is an explicit feature of a middleware included in Swift 2.24, which I had enabled without much more thought at the time.

According to Swift's interpretation, "clients should be aware of the fact that ETags may be quoted for RFC compliance", implying that perhaps rclone should detect (and strip?) the quotes when comparing checksums between source and destination?

I can choose not to enable double-quoted ETags by default on the server side, and with that setting I no longer get any errors when using --checksum. This would probably also work well for the other users of this Swift installation, but it does not resolve the issue when quotes are present.
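
To illustrate what stripping the quotes would mean, here is a small shell sketch. It is not rclone's or ncw/swift's actual code; the function names are made up, and it only handles a single pair of surrounding double quotes.

```shell
# normalize_etag ETAG: print ETAG without one pair of surrounding double quotes.
# Values that are not wrapped in quotes are printed unchanged.
normalize_etag() {
  etag=$1
  case $etag in
    \"*\") etag=${etag#\"}; etag=${etag%\"} ;;
  esac
  printf '%s\n' "$etag"
}

# etags_match A B: succeed if the two ETags are equal once quotes are removed.
etags_match() {
  [ "$(normalize_etag "$1")" = "$(normalize_etag "$2")" ]
}

# Example with the MD5 of an empty object, unquoted vs. quoted:
# etags_match 'd41d8cd98f00b204e9800998ecf8427e' '"d41d8cd98f00b204e9800998ecf8427e"' && echo same
```

With a comparison like this, the quoted and unquoted forms of the same hash would be treated as equal, while genuinely different hashes would still be reported as a mismatch.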


Thanks for clarifying the part about transferred data as well, makes sense :+1:

Ah, I was unaware of this.

Yes rclone should be stripping the quotes.

This really needs to be fixed in the upstream library, ncw/swift (Go language interface to Swift / Openstack Object Storage / Rackspace cloud files).

Can you make an issue there with the above details in please?

Even better, send a PR, as I don't think it will be hard to fix :slight_smile:

Sure thing, I'll look into creating a PR. I located the piece of code I believe is responsible for the comparison(s); I just need to get a build environment running, as this is the first time I've played around with Go :slight_smile: It might be delayed until the weekend.

I updated this in the spring (when I was a Go beginner too):
https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md

It also has some tips to get started with Go.

I hope it will give you a good start, otherwise complain to me :slight_smile:


I created a PR for ncw/swift which I believe resolves this issue :slight_smile:

