Union with --copy-dest enabled is slower than expected

What is the problem you are having with rclone?

Union backend (onedrives) with --copy-dest enabled is slower than expected.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.0

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 5.15.0-40-generic (aarch64)
  • os/type: linux
  • os/arch: arm64
  • go/version: go1.18.3
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using?

Syncing from Google Drive to a union of OneDrive remotes

The rclone config contents with secrets removed.

[GoogleDrive_BackupSource]
type = drive
client_id = /
client_secret = /
scope = drive
token = /

[OneDrive_Union]
type = union
upstreams = OneDrive_01: OneDrive_02: OneDrive_03:
create_policy = epff
cache_time = 3600

[OneDrive_01]
type = onedrive
token = /
drive_id = b!
drive_type = business
client_id = /
client_secret = /
auth_url = /
token_url = /
chunk_size = 100Mi

[OneDrive_02] and [OneDrive_03] are similar to [OneDrive_01]

The command you were trying to run (eg rclone copy /tmp remote:tmp)

["rclone" "sync" "--config=./configs/rclone.conf" "GoogleDrive_BackupSource:/sample_folder_a" "OneDrive_Union:/sample_folder_b" "--transfers=3" "--checkers=6" "--max-backlog=2500" "--use-mmap" "--stats-log-level=NOTICE" "--stats-file-name-length=0" "--delete-during" "--fast-list" "--size-only" "-vv"]

and

["rclone" "sync" "--config=./configs/rclone.conf" "GoogleDrive_BackupSource:/sample_folder_a" "OneDrive_Union:/sample_folder_b" "--transfers=3" "--checkers=6" "--max-backlog=2500" "--use-mmap" "--copy-dest=OneDrive_Union:/sample_folder_c" "--stats-log-level=NOTICE" "--stats-file-name-length=0" "--delete-during" "--fast-list" "--size-only" "-vv"]

A log from the command with the -vv flag

I have omitted some file names and directories.

  • Without copy-dest
2022/07/31 22:23:29 DEBUG : a.md5: Sizes identical
2022/07/31 22:23:29 DEBUG : a.md5: Unchanged skipping
2022/07/31 22:23:29 DEBUG : a.json: Sizes identical
2022/07/31 22:23:29 DEBUG : a.json: Unchanged skipping
...
Transferred:        6.693 MiB / 7.711 MiB, 87%, 48.677 KiB/s, ETA 21s
Checks:              4989 / 4989, 100%
Transferred:           37 / 48, 77%
Elapsed time:      3m11.1s
  • With copy-dest enabled
2022/07/31 22:57:27 DEBUG : a12.txt: Sizes differ (src 160033 vs dst 159955)
2022/07/31 22:57:27 DEBUG : a12.txt: Destination not found in --copy-dest
2022/07/31 22:57:27 DEBUG : a12.txt: Sizes identical
2022/07/31 22:57:27 DEBUG : a12.txt: Unchanged skipping
2022/07/31 22:57:28 DEBUG : asset/[u.mp4: Sizes identical
2022/07/31 22:57:28 DEBUG : asset/[u.mp4: Sizes identical
2022/07/31 22:57:28 DEBUG : asset/[u.mp4: Unchanged skipping
2022/07/31 22:57:28 DEBUG : asset/test.md5: Sizes identical
2022/07/31 22:57:28 DEBUG : asset/test.md5: Sizes identical
2022/07/31 22:57:28 DEBUG : asset/test.md5: Unchanged skipping
2022/07/31 22:57:29 DEBUG : asset/test_index.mp4: Sizes differ (src 598248 vs dst 597764)
2022/07/31 22:57:29 DEBUG : asset/test_index.mp4: Destination not found in --copy-dest
2022/07/31 22:57:29 DEBUG : asset/test_index.mp4: Sizes identical
2022/07/31 22:57:29 DEBUG : asset/test_index.mp4: Unchanged skipping
...
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:               280 / 2792, 10%
Elapsed time:      3m11.1s

The overall result is that when copy-dest is enabled, checking files is significantly slower.

Notes

I think that rclone should check the existing destination files first and consult copy-dest only if they are not identical.

From the logs, I suspect that rclone is checking copy-dest first.

OneDrive API limits are also a possibility, especially for uploads and server-side copies, but I do not think they matter in this case.

The code for copy-dest is here

The reason it is slow is the call CopyDest.NewObject(ctx, remote), which asks OneDrive whether this file exists and, if it does, to return its details. This potentially involves quite a lot of API calls: looking up the parent directory IDs, then asking for that filename in that directory ID.

So what I think you are suggesting is that in this code

We calculate NeedTransfer(ctx, dstObj, srcObj) first as it is cheap. If the result of that is false then we never need to call CompareOrCopyDest(ctx, fdst, dstObj, srcObj, copyDestDir, backupDir)
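
As a rough illustration of that ordering, here is a minimal Go sketch. It is not rclone's actual code: the object and remoteFs types and the lowercase needTransfer / shouldTransfer helpers are hypothetical stand-ins, the lookup counter only models the API calls behind CopyDest.NewObject, and the comparison is by size only, as with --size-only.

```go
package main

import "fmt"

// object is a hypothetical stand-in for an rclone object. Only the size is
// compared, mirroring --size-only.
type object struct {
	remote string
	size   int64
}

// remoteFs is a hypothetical in-memory remote. It counts lookups to model
// the API calls made by CopyDest.NewObject on a real backend.
type remoteFs struct {
	objects map[string]object
	lookups int
}

// newObject simulates the expensive "does this file exist?" request.
func (f *remoteFs) newObject(remote string) (object, bool) {
	f.lookups++
	o, ok := f.objects[remote]
	return o, ok
}

// needTransfer is the cheap comparison: it uses metadata we already have
// and makes no remote calls.
func needTransfer(dst *object, src object) bool {
	return dst == nil || dst.size != src.size
}

// shouldTransfer runs the cheap check first and only consults copyDest
// when the destination is missing or differs from the source.
func shouldTransfer(copyDest *remoteFs, dst *object, src object) bool {
	if !needTransfer(dst, src) {
		return false // destination already matches, skip the copy-dest lookup
	}
	if cand, ok := copyDest.newObject(src.remote); ok && cand.size == src.size {
		// A real implementation would server-side copy cand into place here.
		return false
	}
	return true
}

func main() {
	copyDest := &remoteFs{objects: map[string]object{
		"b.txt": {remote: "b.txt", size: 20},
	}}
	unchanged := object{remote: "a.txt", size: 10}
	stale := object{remote: "b.txt", size: 19}

	cases := []struct {
		src object
		dst *object
	}{
		{object{remote: "a.txt", size: 10}, &unchanged}, // identical at destination
		{object{remote: "b.txt", size: 20}, &stale},     // differs, found in copy-dest
		{object{remote: "c.txt", size: 30}, nil},        // missing everywhere
	}
	for _, c := range cases {
		fmt.Printf("%s: transfer=%v\n", c.src.remote, shouldTransfer(copyDest, c.dst, c.src))
	}
	fmt.Printf("copy-dest lookups: %d for %d files\n", copyDest.lookups, len(cases))
}
```

In this sketch, files that are already identical at the destination never trigger a copy-dest lookup, which is where the saving comes from when most files are unchanged.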

I've had a go at that here

v1.60.0-beta.6377.c4bbd262f.fix-compare-dest on branch fix-compare-dest (uploaded in 15-30 mins)

Please give it a go and see if you think it fixes it :slight_smile:

Sorry, I believe that this doesn't fix it.

The log looks the same as before, i.e. each file is still checked twice.

Logs here rclone logs for fix-compare-dest

I see what I did: I only changed the single-file case.

Try this

v1.60.0-beta.6377.13c14178b.fix-compare-dest on branch fix-compare-dest (uploaded in 15-30 mins)

Here is my test

# Setup
mkdir src old
echo "test file" > src/file.txt 
echo "old test file" > old/file.txt 
rclone sync src s3:rclone/src
rclone sync old s3:rclone/old
rclone sync src s3:rclone/dst

# Test old then new rclone for directories / files
rclone-v1.59.0 -vv sync src s3:rclone/dst --copy-dest s3:rclone/old
rclone -vv sync src s3:rclone/dst --copy-dest s3:rclone/old
rclone-v1.59.0 -vv copyto src/file.txt s3:rclone/dst/file.txt --copy-dest s3:rclone/old
rclone -vv copyto src/file.txt s3:rclone/dst/file.txt --copy-dest s3:rclone/old

It works this time. Thank you for your support.

I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.60.

Thanks for pointing out this optimization - it makes a lot of difference.