A number of "corrupted on transfer: sizes differ" errors over the past few weeks during sync from Swift (Rackspace) cloud to local

What is the problem you are having with rclone?

A number of "corrupted on transfer: sizes differ" error messages over the past few weeks or more during sync from cloud to local. The problem seems to happen sporadically, but I have a particular file that is recurring more frequently right now, so I am diving deeper into what is going on to see if I can resolve it. I've read other posts here that offer a workaround of --ignore-size, however I would prefer to see if the source of the problem can be determined and resolved so I don't have to give up that file size check.

In the example log output below, the file size is actually 422421 bytes when I inspect it server side (using either rclone ls or Cyberduck), and when it is successfully downloaded it is 422421 bytes locally. For some reason, when it fails, a size of 422438 is found at some point. I've noticed many other occurrences throughout the logs with different file sizes; sometimes the first number is a little smaller and sometimes a little larger (no consistent pattern of one or the other).

The file type is a zip file created by our software (a Windows .NET app) running at a different location, which uploads the file to Rackspace Cloud Files.

Rclone is syncing the Rackspace Cloud Files container to a local directory (cloud to local) and runs in a Docker container from an image built FROM rclone/rclone:latest (when the image was last built, "latest" was v1.57 - see below).

Any ideas of what is going on?
What sort of troubleshooting or next steps can I take to see what is causing this file size mismatch and flagging it as corrupt? As mentioned, I am aware I could just disable the check using --ignore-size, but I would like to solve the problem so I don't have to do that.

Run the command 'rclone version' and share the full output of the command.

inside a Dockerfile FROM rclone/rclone:latest

rclone v1.57.0
- os/version: alpine 3.14.2 (64 bit)
- os/kernel: 4.15.0-161-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.2
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Swift (Rackspace)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --log-file ${LOGS_DIR}/${CONFIG_CODE}.log -v --filter-from ${REMOTE_FILTERS_FILE} ${REMOTE_NAME}:${REMOTE_CONTAINER} ${LOCAL_REMOTE_SYNC_DIR}
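
With the environment variables expanded (the values can be seen in the debug line of the log below), this is roughly:

rclone sync --log-file /srv/logs/company_53.log -v --filter-from /srv/config/pnd.filters cloudfiles-dfw:PnD-Company-53-Content /srv/data/cloudfiles-dfw/PnD-Company-53-Content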

The rclone config contents with secrets removed.

[cloudfiles-dfw]
type = swift
auth = https://identity.api.rackspacecloud.com/v2.0
region = DFW

A log from the command with the -vv flag

This is an excerpt of a log (I removed dozens of successful transfer lines to shorten it for this paste) that shows the transfer failing and then succeeding on the next attempt; often, though, all 3 attempts fail.

2022/08/30 14:00:45 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "sync" "--log-file" "/srv/logs/company_53.log" "-vv" "--filter-from" "/srv/config/pnd.filters" "cloudfiles-dfw:PnD-Company-53-Content" "/srv/data/cloudfiles-dfw/PnD-Company-53-Content"]
2022/08/30 14:00:45 DEBUG : Creating backend with remote "cloudfiles-dfw:PnD-Company-53-Content"
2022/08/30 14:00:45 DEBUG : Using config file from "/config/rclone/rclone.conf"
2022/08/30 14:00:45 DEBUG : Creating backend with remote "/srv/data/cloudfiles-dfw/PnD-Company-53-Content"
2022/08/30 14:00:45 DEBUG : logs: Excluded
2022/08/30 14:00:45 DEBUG : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: Waiting for checks to finish
2022/08/30 14:00:45 DEBUG : master/sapb1_itt1.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_itt1.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_obpp.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_obpp.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_obin.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_obin.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_oibq.zip: Modification times differ by 4h35m8s: 2022-08-30 14:24:41 +0000 UTC, 2022-08-30 11:59:49 -0700 PDT
2022/08/30 14:00:45 DEBUG : master/sapb1_oitb.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_oitb.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_oitm.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_oitm.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_owhs.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:45 DEBUG : master/sapb1_owhs.zip: Unchanged skipping
2022/08/30 14:00:45 DEBUG : master/sapb1_oibq.zip: md5 = 50c7485b30eca6fa50ab216aeb861ce3 (Swift container PnD-Company-53-Content)
2022/08/30 14:00:45 DEBUG : master/sapb1_oibq.zip: md5 = 9012ad811fdece9023ee8fcf5aa778f6 (Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content)
2022/08/30 14:00:45 DEBUG : master/sapb1_oibq.zip: md5 differ
2022/08/30 14:00:45 ERROR : master/sapb1_oibq.zip: corrupted on transfer: sizes differ 422438 vs 422421
2022/08/30 14:00:45 INFO  : master/sapb1_oibq.zip: Removing failed copy
2022/08/30 14:00:46 DEBUG : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: Waiting for transfers to finish
2022/08/30 14:00:46 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting files as there were IO errors
2022/08/30 14:00:46 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting directories as there were IO errors
2022/08/30 14:00:46 ERROR : Attempt 1/3 failed with 1 errors and: corrupted on transfer: sizes differ 422438 vs 422421
2022/08/30 14:00:46 DEBUG : logs: Excluded
2022/08/30 14:00:46 DEBUG : master/sapb1_obin.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_obin.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : master/sapb1_obpp.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_obpp.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : master/sapb1_itt1.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_itt1.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : master/sapb1_oitm.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_oitm.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : master/sapb1_oitb.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_oitb.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : master/sapb1_oibq.zip: md5 = 9012ad811fdece9023ee8fcf5aa778f6 OK
2022/08/30 14:00:46 INFO  : master/sapb1_oibq.zip: Copied (new)
2022/08/30 14:00:46 DEBUG : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: Waiting for checks to finish
2022/08/30 14:00:46 DEBUG : master/sapb1_owhs.zip: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/08/30 14:00:46 DEBUG : master/sapb1_owhs.zip: Unchanged skipping
2022/08/30 14:00:46 DEBUG : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: Waiting for transfers to finish
2022/08/30 14:00:46 DEBUG : Waiting for deletions to finish
2022/08/30 14:00:46 ERROR : Attempt 2/3 succeeded
2022/08/30 14:00:46 INFO  : 
Transferred:   	  825.041 KiB / 825.041 KiB, 100%, 0 B/s, ETA -
Checks:               183 / 183, 100%
Transferred:            1 / 1, 100%
Elapsed time:         1.3s

2022/08/30 14:08:45 DEBUG : 26 go routines active

hi,

so cyberduck never fails and rclone will fail repeatedly.

Uh, no. I only referred to Cyberduck because I used it (inspecting the file attributes in the UI) to confirm the remote file size as part of my troubleshooting. I don't have an equivalent command-line sync using Cyberduck, so I wasn't comparing the two for syncing.

Before the problem started, for what time period did rclone work as expected?

And you might want to update rclone to the latest stable.

Hi @asdffdsa, I upgraded from v1.57 to v1.59 but it didn't appear to change results.

I don't know exactly how to answer your question about when it "worked as expected" because I am not sure what is actually causing the error messages, i.e. whether they are legitimate file size mismatches reported correctly or mismatches reported incorrectly.

This scripted rclone solution syncs every few minutes for each company container, and there are a couple dozen zip files per container. On the other side (the customer site), our software uploads freshly built versions of these files to the container every 5 to 10 minutes.

I see examples of failed attempts due to suspected corruption throughout the logs going back many months. I don't know whether a failed attempt followed by a successful 2nd or 3rd attempt means rclone is functioning as expected because the file really was a different size on the Rackspace Cloud Files server when the download started and then changed, or whether something else is going on. In 99% of the cases in the log, whether all 3 attempts fail or the 2nd or 3rd succeeds, the file size differences are very small and the problem typically shows up briefly and then goes away rather than repeating continuously.

What I do know right now is that, starting about a week ago, we have a new situation where this one particular file from one customer keeps failing over and over, across many attempts. The file sizes reported are curiously always off by a small amount, larger or smaller, and it seems to go in patterns.

So, since this one file gets flagged very often, I thought I would use it as a test case to figure out what might be causing it and then apply the fix to the other cases, which I can't reproduce because they don't happen while I am troubleshooting. But again, as I said earlier, I don't even know whether the cases in the logs going back many months show rclone working as expected or not. I won't know that until I learn more about this one file, whose repeating pattern only started about a week ago.

Since I have a file that keeps hitting this file size difference more often than the others, and it is a more recurring situation (even though it eventually succeeds), is there something I can do to see what it is about this file that triggers the size difference flag? Either a test on the file system or in the cloud when it occurs, or some debug path in the rclone code where it detects the size?

I would suspect clashes between the app uploading and rclone syncing/downloading. Such clashes could mean the file is modified during the sync, resulting in the type of issues you are seeing.

I would therefore start by checking file info before and after each sync to investigate this possibility.

Here is a conceptual script illustrating the idea:

echo "pre-checks" >> troubleshooting.log
rclone lsl source:folder/with/troublefile --include="your_trouble_file"  >> troubleshooting.log
rclone lsl target:folder/with/troublefile --include="your_trouble_file" >> troubleshooting.log
rclone sync --retries=1 source:folder/with/troublefile target:folder/with/troublefile --include="your_trouble_file" -vv >> troubleshooting.log
echo "post-checks" >> troubleshooting.log
rclone lsl source:folder/with/troublefile --include="your_trouble_file"  >> troubleshooting.log
rclone lsl target:folder/with/troublefile --include="your_trouble_file" >> troubleshooting.log

If the source file info differs between the pre and post checks, then the file is changing during the rclone sync, which would cause the type of errors you are seeing.

I see @ncw typing and stop here :wink:

Is that file being continuously updated? It is possible for this error to be caused by

  • rclone reads length of file
  • file gets updated with new length
  • rclone transfers file
  • rclone sees size is different to that expected and throws an error

That would seem the most likely scenario.
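
A quick way to see whether that is what is happening would be to note the remote size immediately before and after copying just that one object, something like this (a rough sketch, using the remote and path from your log):

# size/modtime of the object immediately before the copy
before=$(rclone lsl cloudfiles-dfw:PnD-Company-53-Content/master/sapb1_oibq.zip)
# copy just this one file with debug output
rclone copyto cloudfiles-dfw:PnD-Company-53-Content/master/sapb1_oibq.zip /tmp/sapb1_oibq.zip -vv
# size/modtime immediately after the copy - if it changed, the file was updated mid-transfer
after=$(rclone lsl cloudfiles-dfw:PnD-Company-53-Content/master/sapb1_oibq.zip)
echo "before: $before"
echo "after:  $after"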

Ha! I think you said just about the same thing as me, but with some useful tips :slight_smile:


@Ole, @ncw thanks for your replies and input. I have suspected a similar scenario where the file might actually be changing. What is strange, though, is that there are currently about 250 files going through this same update-and-sync process, and while I see examples of this error speckled throughout the logs as blips on the radar for various files, this one particular file stands out as failing throughout the day (even though it usually succeeds eventually, especially now that I bumped retries to 6 - but not always).

I will plan some testing by adding a troubleshooting process as you have outlined, @Ole, however when syncing the top-level container it will be syncing dozens of files rather than just one, so I wonder if the before/after size check will be affected by the duration of the full sync.

New info: I added --ignore-size to see if I would get different results, and now I see the same file getting multiple "corrupted on transfer: md5 hash differ" messages. So on one hand it seems to follow the scenario @ncw outlined, however it doesn't explain why this one particular file is more sensitive to this than the other 250 or so.

Question: Is there a way to tell rclone to pause between retry attempts? Perhaps if this scenario is occurring, a pause would help it resolve. Unless the log timestamps are deceiving, the retry attempts are nearly instantaneous.

Question: Are you aware whether Swift (Rackspace) object changes are supposed to be atomic or not? I would think any object storage system is atomic in the sense that you should not be able to read a partially updated object; the system should only present the new object once it has been completely uploaded. If it is atomic, shouldn't rclone be able to download the previously identified version in its entirety before the newly uploaded version replaces it?

--retries-sleep-time

@ncw only outlined the simple conceptual approach; things get more complicated when you consider checksums (md5) and concurrent multipart up/downloads.

You could add --ignore-checksum, but be careful; let me cite the docs (with my emphasis):

You should only use it if you have had the "corrupted on transfer" error message and you are sure you might want to transfer potentially corrupted data.

I don't know, but I would be wary of relying on it. Even if it is atomic, there is still a risk of data corruption due to software/hardware errors along the line (server, network, client, rclone) - we have seen this recently with a major cloud service.

I would instead try to eliminate/reduce the risk of clashes to keep all the transfer checks.

Great! So is it --retries-sleep or --retries-sleep-time? The linked documentation shows --retries-sleep, but the URL and your link show --retries-sleep-time. And are the units seconds, milliseconds? The documentation doesn't indicate what the units are.

My goal is not to have to ignore any checks; I was ignoring size temporarily only to test the effect, and I wouldn't plan to keep ignoring checksums. I want to understand and resolve what is happening without ignoring any checks, which is why I am trying to dig deeper with this repeatable test case. Later today or tomorrow I hope to try more scripted testing along the lines of the process you outlined earlier.

There should be a way to see what is actually causing these apparently changed files. I also wonder whether the cloud object or the spot on the local file system where this file's data lives is somehow problematic (sector, node, inode - not sure of the right terminology), and whether that is why just this file generates these errors so often compared with the others that are updated with similar frequency.

Yes, but...

For single part objects, swift should be perfectly atomic. For multipart objects, swift can be horribly non-atomic. The relevant flags are

  --swift-chunk-size SizeSuffix   Above this size files will be chunked into a _segments container (default 5Gi)
  --swift-no-chunk                Don't chunk files during streaming upload

Are you uploading files bigger than 5G?

Are you uploading files with rclone mount? If so (and you aren't uploading files > 5G) I recommend setting no_chunk = true in the config which will mean you never have chunked files.

IMHO swift chunked files should be avoided!
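
If you were uploading with rclone, that would just be an extra line on the swift remote in rclone.conf, e.g. using the remote from earlier in this thread (only relevant on the side doing the uploading):

[cloudfiles-dfw]
type = swift
auth = https://identity.api.rackspacecloud.com/v2.0
region = DFW
# don't chunk files during streaming upload
no_chunk = true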

You can query rclone for this info... rclone help flags sleep gives

  --retries-sleep duration   Interval between retrying operations if they fail, e.g. 500ms, 60s, 5m (0 to disable)
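
So on your sync command that would be something like this (keeping your environment variables as-is, e.g. with 6 retries and a 10 second pause):

rclone sync --retries 6 --retries-sleep 10s --log-file ${LOGS_DIR}/${CONFIG_CODE}.log -v --filter-from ${REMOTE_FILTERS_FILE} ${REMOTE_NAME}:${REMOTE_CONTAINER} ${LOCAL_REMOTE_SYNC_DIR}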

If this isn't caused by a file that is being continuously updated, or by a multipart file eventual consistency problem, then there is likely a problem in the swift cluster...

You can also see an overview of all the flags here: https://rclone.org/flags/

A free text search for "--retries" finds

      --retries int                          Retry operations this many times if they fail (default 3)
      --retries-sleep duration               Interval between retrying operations if they fail, e.g. 500ms, 60s, 5m (0 to disable)

If you find an interesting (non-backend) flag or need to understand a parameter, then you can search in the docs for more info: https://rclone.org/docs/

A free text search for "duration" has 8 hits; this is the 3rd hit:

Options which use TIME use the go time parser. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

Do you know how to use a web browser and free text searches on a web page to find technical information? :wink:

@ncw The files are not that big; the file showing the problem I am troubleshooting is less than 1 MB. They are not uploaded by rclone but by our C#/.NET app, which uses the CloudFilesProvider class from the openstacknet Rackspace Cloud Files provider package. I don't know offhand whether it is chunking - I can check later, but since the files are so small I wonder if it is even a factor.

@ncw, I added a 10s duration for --retries-sleep and still get multiple occurrences with this file (sapb1_oibq.zip) - see the log below (using -v), taken shortly after adding the 10s delay.

2022/09/06 09:54:45 ERROR : master/sapb1_oibq.zip: corrupted on transfer: sizes differ 424907 vs 424860
2022/09/06 09:54:45 INFO  : master/sapb1_oibq.zip: Removing failed copy
2022/09/06 09:54:45 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting files as there were IO errors
2022/09/06 09:54:45 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting directories as there were IO errors
2022/09/06 09:54:45 ERROR : Attempt 1/6 failed with 1 errors and: corrupted on transfer: sizes differ 424907 vs 424860
2022/09/06 09:54:55 ERROR : master/sapb1_oibq.zip: corrupted on transfer: sizes differ 424907 vs 424860
2022/09/06 09:54:55 INFO  : master/sapb1_oibq.zip: Removing failed copy
2022/09/06 09:54:55 INFO  : master/sapb1_obin.zip: Copied (replaced existing)
2022/09/06 09:54:55 INFO  : master/sapb1_itt1.zip: Copied (replaced existing)
2022/09/06 09:54:55 INFO  : master/sapb1_oitb.zip: Copied (replaced existing)
2022/09/06 09:54:55 INFO  : master/sapb1_owhs.zip: Copied (replaced existing)
2022/09/06 09:54:55 INFO  : master/sapb1_obpp.zip: Copied (replaced existing)
2022/09/06 09:54:55 INFO  : master/sapb1_oitm.zip: Copied (replaced existing)
2022/09/06 09:54:55 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting files as there were IO errors
2022/09/06 09:54:55 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting directories as there were IO errors
2022/09/06 09:54:55 ERROR : Attempt 2/6 failed with 1 errors and: corrupted on transfer: sizes differ 424907 vs 424860
2022/09/06 09:55:05 ERROR : master/sapb1_oibq.zip: corrupted on transfer: sizes differ 424907 vs 424835
2022/09/06 09:55:05 INFO  : master/sapb1_oibq.zip: Removing failed copy
2022/09/06 09:55:06 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting files as there were IO errors
2022/09/06 09:55:06 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting directories as there were IO errors
2022/09/06 09:55:06 ERROR : Attempt 3/6 failed with 1 errors and: corrupted on transfer: sizes differ 424907 vs 424835
2022/09/06 09:55:16 ERROR : master/sapb1_oibq.zip: corrupted on transfer: sizes differ 424835 vs 424907
2022/09/06 09:55:16 INFO  : master/sapb1_oibq.zip: Removing failed copy
2022/09/06 09:55:16 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting files as there were IO errors
2022/09/06 09:55:16 ERROR : Local file system at /srv/data/cloudfiles-dfw/PnD-Company-53-Content: not deleting directories as there were IO errors
2022/09/06 09:55:16 ERROR : Attempt 4/6 failed with 1 errors and: corrupted on transfer: sizes differ 424835 vs 424907
2022/09/06 09:55:26 INFO  : master/sapb1_oibq.zip: Copied (new)
2022/09/06 09:55:26 ERROR : Attempt 5/6 succeeded
2022/09/06 09:55:26 INFO  : 
Transferred:   	   12.296 MiB / 12.296 MiB, 100%, 172.743 KiB/s, ETA 0s
Checks:               461 / 461, 100%
Transferred:           15 / 15, 100%
Elapsed time:        41.1s

@ncw is there any chance there is something in or about the zip file itself, as created before it is uploaded to Swift, that would trigger a file size or checksum mismatch in rclone? Just trying to narrow down where the issue could be introduced.

@ncw just following back on the chunking point you mentioned: according to the CloudFilesProvider.CreateObjectFromFile Method documentation, the default chunkSize = 4096 and we are not overriding this in our usage. Based on your comments and experience, are you suggesting we try to find a way to modify this value somehow to achieve a no-chunk upload similar to your no_chunk = true? And/or, with this particular file being approximately 424907 bytes, does the topic of chunking become insignificant?

I would be surprised if that was uploading 4k chunks - I suspect that is just an internal buffer size.

If you do a HEAD request on that file then it will say whether it is chunked or not - you can do this with rclone with

rclone lsjson --stat remote:path/to/file -vv --dump bodies

The last HTTP response will have an X-Object-Manifest header if it is a chunked file, eg

2022/09/12 10:15:41 DEBUG : HTTP RESPONSE (req 0xc00023a300)
2022/09/12 10:15:41 DEBUG : HTTP/1.1 200 OK
Content-Length: 16
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
Date: Mon, 12 Sep 2022 10:15:41 GMT
Etag: "680e66be91de73f812d7b57017dc975a"
Last-Modified: Fri, 08 Jan 2021 09:45:16 GMT
X-Object-Manifest: test%2Ddir%5Fsegments/file2.txt/1610099116.083927673/%2D1
X-Object-Meta-Mtime: 1610099115.892724194
X-Timestamp: 1610099116.42814
X-Trans-Id: txd1eb2f7adfc640fbb94ff-00631f06cc
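
Applied to the file in question, that would be something like:

rclone lsjson --stat cloudfiles-dfw:PnD-Company-53-Content/master/sapb1_oibq.zip -vv --dump bodies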

@ncw, thanks - OK, so I ran that command and I see no X-Object-Manifest in any of the responses (no `X-Object-*` headers at all).

So that should confirm it is not a chunked file.

I guess it is on to investigating and testing other potential aspects of what it is about this particular file that causes this flag to be thrown so often. I even tried manually deleting the file on the cloud object side in case there was a problem with the object that updating it wasn't clearing.

It is strange that this is the only file among the 250 or so other zip files that has this particularly frequent flag (either size or checksum) issue...

Yes it does.

It is odd!

The file is being updated regularly isn't it?

I suspect this is some sort of eventual consistency problem in the swift cluster.

If it is then the problems will happen around the time the file is updated - you could see whether the problem times are correlated with the update times.
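
For example, something along these lines (a rough sketch, using the log file and object path from earlier in the thread) would let you compare the error timestamps with the object's modification time:

# timestamps of the recent "corrupted on transfer" errors for this file
grep "sapb1_oibq.zip: corrupted on transfer" /srv/logs/company_53.log | tail -n 10
# current modification time of the object on the swift side
rclone lsl cloudfiles-dfw:PnD-Company-53-Content/master/sapb1_oibq.zip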

Yes. There are about a dozen company accounts, and each "company" (container) has a dozen or two zip files (a similar set of data files for each company) that get updated every 5 to 10 minutes. So in this particular case, this one file in this one company account/container is updated every 5 to 10 minutes along with a dozen or two other files.

What is meant by "eventual consistency problem in the swift cluster"? Do you have a reference for something I can read up on or is this just your vernacular for a problem with their servers?

Perhaps I should run a battery of tests that check the timestamps of all the files that get updated in this particular container, to see if it raises anything suspicious when compared with the times the flag is thrown for the transfer?

Is there something you are aware of I can test to see if the problems on the swift cluster are occurring and when or is it something I would need to have Rackspace investigate and respond to?


Also, we have a longer-term potential plan to use a different backend file/object storage server, most likely an SFTP server but maybe another object store (Azure, AWS, etc). In your experience, does Swift have more or fewer issues like this than SFTP servers or the others, or are they all potentially finicky in one way or another and liable to cause rclone sync failures from time to time? The plan is not settled, just speculation, but rclone will continue to be part of the mix one way or another, and I would like to make a decision that takes this into account.

Take a look here: Eventual consistency - Wikipedia - swift clusters are known to be eventually consistent. This means that after an update it will take a while (unspecified) for all the reads to return the new data. After the update and before that time, reads may return the old data or the new data, and the metadata (the listing) can be out of sync with the data also.

Might be an idea.

Eventual consistency is a fact of life on most object storage platforms. That is why rclone does retries, and it looks like the retries clear up the problem so maybe you don't need to worry about it?

Swift is much worse at eventual consistency than, for example, S3. I think they've pretty much eliminated eventual consistency problems at AWS.

You won't see any eventual consistency problems with a non-distributed system like an sftp server.

That said, the internet is a messy place, and rclone does low and high level retries so you don't have to worry about syncs failing - it will retry them!