Rclone stuck copying objects with short object expiration

What is the problem you are having with rclone?

We are happily using rclone to back up our (on-site) object storage to an off-site (Ceph-based) location.

For most of our buckets we haven't had any issues; however, for one of our logging buckets we have a relatively short expiry rule:

❯ mc ilm rule ls prod/infra-acpt-logging-loki-303 --json
{
 "status": "success",
 "target": "prod/infra-acpt-logging-loki-303",
 "config": {
  "Rules": [
   {
    "Expiration": {
     "ExpiredObjectDeleteMarker": true
    },
    "ID": "expire-after-1day",
    "NoncurrentVersionExpiration": {
     "NoncurrentDays": 1
    },
    "Status": "Enabled"
   }
  ]
 },
 "updatedAt": "2025-01-28T08:38:48Z"
}
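
For reference, a rule like this can be created with something along these lines (flag names as in recent mc versions; the exact invocation may differ from how ours was originally set up):

❯ mc ilm rule add prod/infra-acpt-logging-loki-303 \
    --noncurrent-expire-days 1 \
    --expire-delete-marker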

The reason for this short expiry rule is that this is logging data and the system itself (in this case LokiStack) has its own retention rules for when data can be removed (for us, 90 days). So it does not make sense to keep these objects any longer than that.

However, because of this rule it can happen that rclone lists some files before the job starts transferring, and those files are then deleted/expired during the job itself. This results in a log entry like this:

2025/09/04 20:38:43 ERROR : infrastructure/1a6a923d9571e693/19744cc7ae7:197453bdba4:645b0269: Failed to copy: failed to open source object: operation error S3: GetObject, https response error StatusCode: 404, RequestID: 1862284EC7F6EE1E, HostID: 5d4e4d0f6fc859fe0f0c9ba35f218284c3f7dd583372659a5ce994e609e5dbc4, NoSuchKey:

When we check this particular object (via MinIO mc):

❯ mc ls prod/infra-acpt-logging-loki-303/infrastructure/1a6a923d9571e693/19744cc7ae7:197453bdba4:645b0269 --versions
[2025-09-04 20:37:45 CEST]     0B STANDARD 765269d4-a030-4727-b339-0573e65fbe75 v2 DEL 19744cc7ae7:197453bdba4:645b0269
[2025-06-06 14:34:05 CEST]  11KiB STANDARD 19260aa0-2d9c-47dc-ae57-3a85a9d981a8 v1 PUT 19744cc7ae7:197453bdba4:645b0269

It seems that just before rclone was supposed to transfer the object, it got deleted. This has happened multiple times during this rclone job. As the job is supposed to retry on error, it retries the sync but then finds other (newly) expired objects, resulting in a loop. Because of this the job was stuck for a long time, over the weekend.

Basically, our question is:

How can we make sure something like this won't happen? Are there any flags that should be enabled or disabled to prevent this behavior?

Run the command 'rclone version' and share the full output of the command.

rclone v1.70.2
- os/version: alpine 3.22.0 (64 bit)
- os/kernel: 6.8.0-79-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.24.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

MinIO to Ceph RGW

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --config /config/rclone.conf   source:"infra-acpt-logging-loki-303"/   target:"infra-acpt-logging-loki-303"/   --retries=3   --low-level-retries 10   --log-level=NOTICE   --use-mmap   --list-cutoff=100000   --progress   --stats 1m   --stats-log-level=ERROR   --metadata   --transfers=50   --checkers=8   --checksum   --s3-use-multipart-etag=true   --multi-thread-cutoff=256Mi   --s3-chunk-size=5Mi

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

We are using environment variables to set this, but it basically looks like:

[minio]
type = s3
provider = minio
access_key_id = xxx
secret_access_key = xxx
endpoint = xxx
region = ""

[ceph]
type = s3
provider = Ceph
access_key_id = xxx
secret_access_key = xxx
endpoint = xxx
sse_customer_algorithm = xxx
sse_customer_key_base64 = xxx
sse_customer_key_md5 = xxx
region = ""

A log from the command that you were trying to run with the -vv flag

[2025-09-04 20:10:03 CEST] INFO: START rclone sync from https://xxx.xxx.xxx.xxx/infra-acpt-logging-loki-303 to https://xxx.xxx.xxx/infra-acpt-logging-loki-303
[2025-09-04 20:10:03 CEST] INFO: Executing command: rclone sync --config /config/rclone.conf   source:"infra-acpt-logging-loki-303"/   target:"infra-acpt-logging-loki-303"/   --retries=3   --low-level-retries 10   --log-level=NOTICE   --use-mmap   --list-cutoff=100000   --progress   --stats 1m   --stats-log-level=ERROR   --metadata   --transfers=50   --checkers=8   --checksum   --s3-use-multipart-etag=true   --multi-thread-cutoff=256Mi   --s3-chunk-size=5Mi
...
[lots of listing]
...
2025/09/04 20:38:43 ERROR : infrastructure/1a6a923d9571e693/19744cc7ae7:197453bdba4:645b0269: Failed to copy: failed to open source object: operation error S3: GetObject, https response error StatusCode: 404, RequestID: 1862284EC7F6EE1E, HostID: 5d4e4d0f6fc859fe0f0c9ba35f218284c3f7dd583372659a5ce994e609e5dbc4, NoSuchKey:
...
[lots of transferring]
...
2025/09/06 10:21:38 NOTICE: Failed to sync with 13 errors: last error was: march failed with 12 error(s): first error: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 10, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://xxx.xxx.xxx.xxx/infra-acpt-logging-loki-303?delimiter=%2F&encoding-type=url&list-type=2&max-keys=1000&prefix=application%2F3364f26957d32e57%2F": net/http: timeout awaiting response headers
[2025-09-06 10:21:38 CEST] ERROR: rclone sync FAILED with return code 5. See https://rclone.org/docs/#exit-code
[2025-09-06 10:21:38 CEST] ERROR: FAILED rclone sync from https://xxx.xxx.xxx.xxx/infra-acpt-logging-loki-303 to https://xxx.xxx.xxx/infra-acpt-logging-loki-303
[2025-09-06 10:21:38 CEST] INFO: ZIPPING and UPLOADING report log file to https://xxx.xxx.xxx.xxx/infra-acpt-rclone-logging/infra-acpt-logging-loki-303

Indeed, the file is not synced to the target:

❯ aws s3api head-object \
    --profile infra-acpt \
    --bucket infra-acpt-logging-loki-303 \
    --key infrastructure/1a6a923d9571e693/19744cc7ae7:197453bdba4:645b0269 \
    --sse-customer-algorithm AES256 \
    --sse-customer-key "$KEY_BASE64" \
    --sse-customer-key-md5 "$MD5_DIGEST" \
    --output json;

An error occurred (404) when calling the HeadObject operation: Not Found

maybe --ignore-errors

I think the best approach would be to try to ignore these files, so first add

--use-server-modtime

to ensure rclone uses the same dates as your expiry process, then add this to ignore files over 89 days old:

--max-age 89d
--ignore-errors                   Delete even if there are I/O errors
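
Applied to the command from the first post, that would look something like this (just a sketch; the 89d cutoff is an assumption based on the 90-day LokiStack retention mentioned above):

rclone sync --config /config/rclone.conf \
    source:"infra-acpt-logging-loki-303"/ \
    target:"infra-acpt-logging-loki-303"/ \
    --use-server-modtime \
    --max-age 89d \
    --ignore-errors \
    --retries=3 --low-level-retries 10 --log-level=NOTICE \
    --use-mmap --list-cutoff=100000 --progress --stats 1m --stats-log-level=ERROR \
    --metadata --transfers=50 --checkers=8 --checksum \
    --s3-use-multipart-etag=true --multi-thread-cutoff=256Mi --s3-chunk-size=5Mi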

Does this mean it will be removed from the list of objects to be transferred?

This might be worth looking into just for logging buckets. At the moment we do not differentiate between types of buckets (e.g. logging), only by team and environment. I will do some testing on this.

