Upload to oracle S3 fails for large files

What is the problem you are having with rclone?

I'm testing rclone with Oracle Object Storage to sync some backup files (large files, 100GB and more). Rclone works fine with small files, multipart upload also works for small files, however large files do not get uploaded:

  1. There is a delay of 1 hour before anything happens on the network (or anything gets logged)
    I guess rclone reads the entire file before start; have tried to avoid that and disable checksum, but with no luck
    2.After the delay, there are multiple sessions seen on the firewall, either closed by client or closed by server. No session is able to transmit more than 10 MB of data. Relevant part of rclone log attached bellow (408 error code from Oracle object storage; context canceled).

I have started with example command from rclone documentation for Oracle S3, tried to experiment with chunk size, s3_disable_checksum parameters... I would like to sync entire folder, however the rclone command attached to this topic tries to upload one large file.

Any suggestion?

Run the command 'rclone version' and share the full output of the command.

rclone v1.66.0
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-105-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.1
- go/linking: static
- go/tags: none

Which cloud storage system are you using?

Oracle Object Storage

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone -vv  --progress   --max-stats-groups 10 --log-format date,time,UTC,longfile --fast-list --buffer-size 256Mi --oos-no-check-bucket --oos-upload-cutoff 10Mi --multi-thread-cutoff 16Mi --multi-thread-streams 3000 --transfers 10 --checkers 64 --retries 2 --oos-chunk-size 10Mi --oos-upload-concurrency 8 --oos-attempt-resume-upload --oos-leave-parts-on-error --s3-disable-checksum --log-file=rclone.txt sync /localstorage/backup/FileLargerThan100GB namespace:bucket/backup/ -vv

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[company]
type = oracleobjectstorage
provider = user_principal_auth
namespace = XXX
compartment = XXX
region = eu-frankfurt-1
config_file = /root/.oci/config
config_profile = Default

A log from the command that you were trying to run with the -vv flag

#Probably read of the file for 1 hour
2024/05/15 12:55:55 github.com/rclone/rclone/fs/operations/multithread.go:145: DEBUG : FileLargerThan100GB: multi-thread copy: disabling buffering because source is local disk
2024/05/15 13:45:11 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:408: DEBUG : FileLargerThan100GB: attempting to resume upload for FileLargerThan100GB (if any)
2024/05/15 13:45:11 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:443: DEBUG : FileLargerThan100GB: created new upload id: ...
2024/05/15 13:45:11 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:125: DEBUG : FileLargerThan100GB: open chunk writer: started multipart upload: ...
2024/05/15 13:45:11 github.com/rclone/rclone/fs/operations/multithread.go:217: DEBUG : FileLargerThan100GB: Starting multi-thread copy with 9615 chunks of size 10Mi with 3000 parallel streams
2024/05/15 13:45:11 github.com/rclone/rclone/fs/operations/multithread.go:81: DEBUG : FileLargerThan100GB: multi-thread copy: chunk 6/9615 (52428800-62914560) size 10Mi starting

#408 error codes
2024/05/15 13:50:20 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 10485760 to 0
2024/05/15 13:50:24 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error Put "https://objectstorage.eu-frankfurt-1.oraclecloud.com/...": use of closed network connection)
2024/05/15 13:50:24 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 10ms
2024/05/15 13:50:25 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 32768 to 0
2024/05/15 13:51:15 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error Put "https://objectstorage.eu-frankfurt-1.oraclecloud.com/...": use of closed network connection)
2024/05/15 13:51:15 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 20ms
2024/05/15 13:51:15 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 1048576 to 0
2024/05/15 13:51:23 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error Error returned by ObjectStorage Service. Http Status Code: 408. Error Code: RequestTimeout. Opc request id: fra-1:... Message:
Operation Name: UploadPart
Timestamp: 2024-05-15 13:51:23 +0000 GMT
Client Version: Oracle-GoSDK/65.55.1
Request Endpoint: PUT https://objectstorage.eu-frankfurt-1.oraclecloud.com/...
Troubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_408__408_requesttimeout for more information about resolving this error.
Also see https://docs.oracle.com/iaas/api/#/en/objectstorage/20160918/MultipartUpload/UploadPart for details on this operation's requirements.
To get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details.
If you are unable to resolve this ObjectStorage issue, please contact Oracle support and provide them this full error message.)
2024/05/15 13:51:23 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 40ms
2024/05/15 13:51:38 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 98304 to 0
2024/05/15 13:51:40 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error Error returned by ObjectStorage Service. Http Status Code: 408. Error Code: RequestTimeout. Opc request id: fra-1.... Message:
Operation Name: UploadPart
Timestamp: 2024-05-15 13:51:40 +0000 GMT
Client Version: Oracle-GoSDK/65.55.1
Request Endpoint: PUT https://objectstorage.eu-frankfurt-1.oraclecloud.com/...
Troubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_408__408_requesttimeout for more information about resolving this error.
Also see https://docs.oracle.com/iaas/api/#/en/objectstorage/20160918/MultipartUpload/UploadPart for details on this operation's requirements.
To get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details.
If you are unable to resolve this ObjectStorage issue, please contact Oracle support and provide them this full error message.)
2024/05/15 13:51:40 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 80ms
2024/05/15 13:52:07 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 10485760 to 0
2024/05/15 13:52:27 github.com/rclone/rclone/fs/operations/reopen.go:194: DEBUG : FileLargerThan100GB: Seek from 98304 to 0
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 2/2 (error Put "https://objectstorage.eu-frankfurt-1...": use of closed network connection)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 160ms
2024/05/15 13:52:32 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:193: ERROR : FileLargerThan100GB: multipart upload failed to upload part:439 err: Put "https://objectstorage.eu-frankfurt-1.oraclecloud.com/...uploadPartNum=439": use of closed network connection
2024/05/15 13:52:32 github.com/rclone/rclone/fs/operations/multithread.go:68: DEBUG : FileLargerThan100GB: multi-thread copy: chunk 439/9615 failed: multi-thread copy: failed to write chunk: multipart upload failed to upload part: Put "https://objectstorage.eu-frankfurt-1.oraclecloud.com/...PartNum=439": use of closed network connection
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 2/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/operations/multithread.go:81: DEBUG : FileLargerThan100GB: multi-thread copy: chunk 3001/9615 (31457280000-31467765760) size 10Mi starting
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 2/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 1/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 2/2 (error context canceled)
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 320ms
2024/05/15 13:52:32 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:193: ERROR : FileLargerThan100GB: multipart upload failed to upload part:1815 err: context canceled
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 640ms
2024/05/15 13:52:32 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:193: ERROR : FileLargerThan100GB: multipart upload failed to upload part:35 err: context canceled
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 1.28s
2024/05/15 13:52:32 github.com/rclone/rclone/fs/pacer.go:46: DEBUG : pacer: Rate limited, increasing sleep to 2s
2024/05/15 13:52:32 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:193: ERROR : FileLargerThan100GB: multipart upload failed to upload part:1781 err: context canceled
2024/05/15 13:52:41 github.com/rclone/rclone/fs/pacer.go:90: DEBUG : pacer: low level retry 2/2 (error context canceled)

#After 20 hours still running
Transferred:       56.188 MiB / 93.923 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:  19h18m49.0s
Transferring:

OCI has max parts limit of 10,000 chunks and you set --oos-chunk-size 10M - probably too small for such large files.

Try to set it to something much larger, e.g. 128M

Or let rclone manage it for you and leave all advanced options to defaults. Simply to don't tinker with them. Then as per docs:

Rclone will automatically increase the chunk size when uploading a large file of known size to stay below the 10,000 chunks limit.

This is not S3 you are using so this flag is useles.

Use --oos-disable-checksum

But maybe you can try S3 interface (if you have credentials) if OOC does not behave.

Thx for reply. I have previously tried with 100M chunk size, will try once more and report results...

What about the gap between 2024/05/15 12:55:55 and 13:45:11 in the log? What exactly is going on? Is there any way to speed up this?

Should --oos-disable-checksum solve this delay?

2024/05/15 12:55:55 github.com/rclone/rclone/fs/operations/multithread.go:145: DEBUG : FileLargerThan100GB: multi-thread copy: disabling buffering because source is local disk
2024/05/15 13:45:11 github.com/rclone/rclone/backend/oracleobjectstorage/multipart.go:408: DEBUG : FileLargerThan100GB: attempting to resume upload for FileLargerThan100GB (if any)

Have tried with this command (removed all oos parameters and added oos_disable_checksum).
Upload starts right away (no 1 hour delay), I can see 10 sessions on my firewall, where bytes got uploaded (before there were numerous closed sessions).

If I can trust rclone output, my speed is 13 MiB/s. I'm on a 2 GiB internet link. Is this expected speed? Could I try something to achieve better upload?

rclone -vv  --progress   --max-stats-groups 10 --log-format date,time,UTC,longfile --fast-list --buffer-size 256Mi --multi-thread-cutoff 16Mi  --transfers 10 --checkers 64 --retries 2  --oos-disable-checksum --log-file=rclone.txt sync /localstorage/backup/largefile namespace:bucket/backup/ -vv

hi, every setup is different, so there is not really an expected speed.
fwiw, you can always test using the official oracle copy tool.

based on the rclone docs
"Increasing --oos-upload-concurrency will increase throughput (8 would be a sensible value) and increasing --oos-chunk-size also increases throughput (16M would be sensible).

The default values are high enough to gain most of the possible performance without using too much memory."
"Multipart uploads will use --transfers * --oos-upload-concurrency * --oos-chunk-size extra memory"

so might test by increasing those values.

Hi,
I have added --oos-upload-concurrency 8 but the throughput did not change. I have tried vairous --oos-chunk-size values but the script became unreliable.

The main problem for me was the delay before upload, which caused some sort of a network problem either on corporate firewall (session timeout) or in Oracle Object storage (getting 408 errors from Oracle)... After adding oos_disable_checksum parameter (and removing oos-chunk-size), the script is reliable.

Regarding speed... Rclone will be used instead of Oracle Storage Gateway (official oracle solution to expose Object storage as NFS; deprecated now). We had there around 60 Mib/s throughput, so rclone in current setup is three times slower, but still good enough for me. It is possible that the reduce in speed is caused by some bottleneck in our infrastructure and cannot hold rclone 100% responsible for that.

Thanks for all your help

welcome

that seems to be the case.