Union with create_policy = all : copy stuck when first union fails

What is the problem you are having with rclone?

I'm trying to copy a file to a union of two upstreams. The first one may fail, for example due to lack of space or permissions.

When it fails, it seems that rclone is stuck trying to copy to the second upstream.
When using local upstreams specifically, rclone creates the empty destination file, but gets stuck trying to write to it

Run the command 'rclone version' and share the full output of the command.

2022/03/05 15:43:11 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "version"]
rclone v1.57.0
- os/version: ubuntu 18.04 (64 bit)
- os/kernel: 4.15.0-166-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.2
- go/linking: static
- go/tags: none
2022/03/05 15:43:11 DEBUG : rclone: Version "v1.57.0" finishing with parameters ["rclone" "version"]

Which cloud storage system are you using?

Union with create_policy = all (tested with local and remote backends)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy some-file u:

The rclone config contents with secrets removed.

[u]
type = union
upstreams = /notfound /tmp
create_policy = all

A log from the command with the -vv flag

2022/03/05 15:52:59 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "copy" "some-file" "u:"]
2022/03/05 15:52:59 DEBUG : Creating backend with remote "some-file"
2022/03/05 15:52:59 DEBUG : Using config file from "/home/jovyan/.config/rclone/rclone.conf"
2022/03/05 15:52:59 DEBUG : fs cache: adding new entry for parent of "some-file", "/home/jovyan"
2022/03/05 15:52:59 DEBUG : Creating backend with remote "u:"
2022/03/05 15:52:59 DEBUG : Creating backend with remote "/tmp"
2022/03/05 15:52:59 DEBUG : Creating backend with remote "/notfound"
2022/03/05 15:52:59 DEBUG : union root '': actionPolicy = *policy.EpAll, createPolicy = *policy.All, searchPolicy = *policy.FF
2022/03/05 15:52:59 DEBUG : some-file: Need to transfer - File not found at Destination
2022/03/05 15:53:59 INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       1m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

The command is stuck at this step. If I then run this command

rclone ls u:

I can see that some-file was created but is empty

        0 some-file

Now if I run the copy again, the file gets written properly

2022/03/05 15:56:21 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "copy" "some-file" "u:"]
2022/03/05 15:56:21 DEBUG : Creating backend with remote "some-file"
2022/03/05 15:56:21 DEBUG : Using config file from "/home/jovyan/.config/rclone/rclone.conf"
2022/03/05 15:56:21 DEBUG : fs cache: adding new entry for parent of "some-file", "/home/jovyan"
2022/03/05 15:56:21 DEBUG : Creating backend with remote "u:"
2022/03/05 15:56:21 DEBUG : Creating backend with remote "/tmp"
2022/03/05 15:56:21 DEBUG : Creating backend with remote "/notfound"
2022/03/05 15:56:21 DEBUG : union root '': actionPolicy = *policy.EpAll, createPolicy = *policy.All, searchPolicy = *policy.FF
2022/03/05 15:56:21 DEBUG : some-file: Sizes differ (src 1 vs dst 0)
2022/03/05 15:56:21 DEBUG : some-file: md5 = 68b329da9893e34099c7d8ad5cb9c940 OK
2022/03/05 15:56:21 INFO  : some-file: Copied (replaced existing)
2022/03/05 15:56:21 INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2022/03/05 15:56:21 DEBUG : 4 go routines active

How long did you wait? Rclone might be doing retries at this point which might take some time.

If it is stuck after 10 minutes (say) can you press CTRL-\ and post the output here. We can then work out where rclone is stuck from that.

I left it running for 20 minutes then ran CTRL-\. Here's the output

<7>DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "copy" "some-file" "u:"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "some-file"
<7>DEBUG : Using config file from "/home/owner/.config/rclone/rclone.conf"
<7>DEBUG : fs cache: adding new entry for parent of "some-file", "/home/owner/Desktop"
<7>DEBUG : Creating backend with remote "u:"
<7>DEBUG : Creating backend with remote "/tmp"
<7>DEBUG : Creating backend with remote "/notfound"
<7>DEBUG : union root '': actionPolicy = *policy.EpAll, createPolicy = *policy.All, searchPolicy = *policy.FF
<7>DEBUG : some-file: Need to transfer - File not found at Destination
<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       1m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       2m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       3m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       4m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       5m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       6m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       7m0.6s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       8m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:       9m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      10m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      11m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      12m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      13m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      14m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      15m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      16m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      17m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      18m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      19m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

<6>INFO  : 
Transferred:              1 B / 1 B, 100%, 0 B/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:      20m0.0s
Transferring:
 *                                     some-file:100% /1, 0/s, 0s

^\SIGQUIT: quit
PC=0x558cf505f201 m=0 sigcode=128

goroutine 0 [idle]:
runtime.futex()
        runtime/sys_linux_amd64.s:519 +0x21
runtime.futexsleep(0xc000175800, 0x0, 0x0)
        runtime/os_linux.go:44 +0x36
runtime.notesleep(0x558cf78c5d50)
        runtime/lock_futex.go:160 +0x87
runtime.mPark()
        runtime/proc.go:1441 +0x2a
runtime.stopm()
        runtime/proc.go:2408 +0x78
runtime.findrunnable()
        runtime/proc.go:2984 +0x865
runtime.schedule()
        runtime/proc.go:3367 +0x239
runtime.park_m(0xc00031a000)
        runtime/proc.go:3516 +0x14d
runtime.mcall()
        runtime/asm_amd64.s:307 +0x45

goroutine 1 [semacquire, 20 minutes]:
sync.runtime_Semacquire(0x0)
        runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0x558cf50004a7)
        sync/waitgroup.go:130 +0x71
github.com/rclone/rclone/backend/union.multithread(0x2, 0xc00017a000)
        github.com/rclone/rclone/backend/union/union.go:921 +0xd2
github.com/rclone/rclone/backend/union.(*Fs).put(0xc0006a80e0, {0x558cf6b01108, 0xc000052140}, {0x558cf6ac4bc0, 0xc00048a800}, {0x7f53b8fdb1c0, 0xc00012c240}, 0x0, {0xc0001ca5a0, 0x1, ...})
        github.com/rclone/rclone/backend/union/union.go:470 +0x6c5
github.com/rclone/rclone/backend/union.(*Fs).Put(0x558cf6ae0de8, {0x558cf6b01108, 0xc000052140}, {0x558cf6ac4bc0, 0xc00048a800}, {0x7f53b8fdb1c0, 0xc00012c240}, {0xc0001ca5a0, 0x1, 0x1})
        github.com/rclone/rclone/backend/union/union.go:505 +0x147
github.com/rclone/rclone/fs/operations.Copy({0x558cf6b01108, 0xc000052140}, {0x558cf6b36718, 0xc0006a80e0}, {0x0, 0x0}, {0x7fff9bf0441b, 0x0}, {0x558cf6b357c8, 0xc00012c240})
        github.com/rclone/rclone/fs/operations/operations.go:472 +0x1a95
github.com/rclone/rclone/fs/operations.moveOrCopyFile({0x558cf6b01108, 0xc000052140}, {0x558cf6b36718, 0xc0006a80e0}, {0x558cf6b35758, 0xc00026f9e0}, {0x7fff9bf0441b, 0xc0000b2360}, {0x7fff9bf0441b, 0x9}, ...)
        github.com/rclone/rclone/fs/operations/operations.go:1869 +0xa91
github.com/rclone/rclone/fs/operations.CopyFile(...)
        github.com/rclone/rclone/fs/operations/operations.go:1891
github.com/rclone/rclone/cmd/copy.glob..func1.1()
        github.com/rclone/rclone/cmd/copy/copy.go:88 +0x87
github.com/rclone/rclone/cmd.Run(0x1, 0x1, 0x2, 0xc000843d50)
        github.com/rclone/rclone/cmd/cmd.go:255 +0x13b
github.com/rclone/rclone/cmd/copy.glob..func1(0x558cf7815d40, {0xc0003a78e0, 0x2, 0x2})
        github.com/rclone/rclone/cmd/copy/copy.go:84 +0xcf
github.com/spf13/cobra.(*Command).execute(0x558cf7815d40, {0xc0003a7880, 0x2, 0x2})
        github.com/spf13/cobra@v1.2.1/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x558cf7811740)
        github.com/spf13/cobra@v1.2.1/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.2.1/command.go:902
github.com/rclone/rclone/cmd.Main()
        github.com/rclone/rclone/cmd/cmd.go:555 +0x76
main.main()
        github.com/rclone/rclone/rclone.go:14 +0x17

goroutine 8 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000122080)
        go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0xb9
created by go.opencensus.io/stats/view.init.0
        go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0x92

goroutine 66 [select, 2 minutes]:
github.com/rclone/rclone/cmd.StartStats.func2()
        github.com/rclone/rclone/cmd/cmd.go:364 +0xf1
created by github.com/rclone/rclone/cmd.StartStats
        github.com/rclone/rclone/cmd/cmd.go:360 +0xb2

goroutine 20 [select]:
github.com/rclone/rclone/fs/accounting.(*StatsInfo).averageLoop(0xc00038c1a0)
        github.com/rclone/rclone/fs/accounting/stats.go:310 +0x16a
created by github.com/rclone/rclone/fs/accounting.(*StatsInfo).startAverageLoop.func1
        github.com/rclone/rclone/fs/accounting/stats.go:339 +0x72

goroutine 21 [chan receive, 20 minutes]:
github.com/rclone/rclone/backend/local.(*fadvise).worker(0xc00004e240)
        github.com/rclone/rclone/backend/local/fadvise_unix.go:115 +0x4b
created by github.com/rclone/rclone/backend/local.newFadvise
        github.com/rclone/rclone/backend/local/fadvise_unix.go:75 +0x12f

goroutine 22 [select]:
github.com/rclone/rclone/fs/accounting.(*Account).averageLoop(0xc00048a800)
        github.com/rclone/rclone/fs/accounting/accounting.go:212 +0x10a
created by github.com/rclone/rclone/fs/accounting.newAccountSizeName
        github.com/rclone/rclone/fs/accounting/accounting.go:110 +0x44f

goroutine 25 [select, 20 minutes]:
io.(*pipe).Write(0xc0004326c0, {0xc000178000, 0x1, 0x1000})
        io/pipe.go:94 +0x21a
io.(*PipeWriter).Write(0x0, {0xc000178000, 0xc0001766f8, 0x558cf5040672})
        io/pipe.go:163 +0x25
bufio.(*Writer).Flush(0xc00004e440)
        bufio/bufio.go:607 +0x62
github.com/rclone/rclone/backend/union.multiReader.func1()
        github.com/rclone/rclone/backend/union/union.go:428 +0x2ca
created by github.com/rclone/rclone/backend/union.multiReader
        github.com/rclone/rclone/backend/union/union.go:421 +0x436

goroutine 27 [select, 20 minutes]:
io.(*pipe).Read(0xc000432720, {0xc000896000, 0x8000, 0x8000})
        io/pipe.go:57 +0xb7
io.(*PipeReader).Read(0x0, {0xc000896000, 0xc00069fad8, 0x558cf5040672})
        io/pipe.go:134 +0x25
io.(*teeReader).Read(0xc00029eb20, {0xc000896000, 0xc0001ca900, 0x8000})
        io/io.go:560 +0x37
io.copyBuffer({0x558cf6acb680, 0xc0001ca900}, {0x558cf6ac6680, 0xc00029eb20}, {0x0, 0x0, 0x0})
        io/io.go:423 +0x1b2
io.Copy(...)
        io/io.go:382
os.genericReadFrom(0x558cf5423f86, {0x558cf6ac6680, 0xc00029eb20})
        os/file.go:162 +0x5d
os.(*File).ReadFrom(0xc000152040, {0x558cf6ac6680, 0xc00029eb20})
        os/file.go:156 +0x1b0
io.copyBuffer({0x558cf6ac6d20, 0xc000152040}, {0x558cf6ac6680, 0xc00029eb20}, {0x0, 0x0, 0x0})
        io/io.go:409 +0x14b
io.Copy(...)
        io/io.go:382
github.com/rclone/rclone/backend/local.(*Object).Update(0xc000432a20, {0x558cf6b01108, 0xc000052140}, {0x558cf6ac65e0, 0xc000152030}, {0x7f53b8fdb1c0, 0xc00012c240}, {0xc0001ca5a0, 0x1, 0x1})
        github.com/rclone/rclone/backend/local/local.go:1176 +0x305
github.com/rclone/rclone/backend/local.(*Fs).Put(0x0, {0x558cf6b01108, 0xc000052140}, {0x558cf6ac65e0, 0xc000152030}, {0x7f53b8fdb1c0, 0xc00012c240}, {0xc0001ca5a0, 0x1, 0x1})
        github.com/rclone/rclone/backend/local/local.go:559 +0xcc
github.com/rclone/rclone/backend/union/upstream.(*Fs).Put(0xc000122b80, {0x558cf6b01108, 0xc000052140}, {0x558cf6ac65e0, 0xc000152030}, {0x7f53b8fdb1c0, 0xc00012c240}, {0xc0001ca5a0, 0x1, 0x1})
        github.com/rclone/rclone/backend/union/upstream/upstream.go:172 +0xc4
github.com/rclone/rclone/backend/union.(*Fs).put.func1(0x1)
        github.com/rclone/rclone/backend/union/union.go:477 +0x105
github.com/rclone/rclone/backend/union.multithread.func1()
        github.com/rclone/rclone/backend/union/union.go:918 +0x63
created by github.com/rclone/rclone/backend/union.multithread
        github.com/rclone/rclone/backend/union/union.go:916 +0x4a

rax    0xca
rbx    0x0
rcx    0x558cf505f203
rdx    0x0
rdi    0x558cf78c5d50
rsi    0x80
rbp    0x7fff9bf02e88
rsp    0x7fff9bf02e40
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x7fff9bf02f98
r13    0xc000454a80
r14    0x558cf78c54e0
r15    0x0
rip    0x558cf505f201
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

The union backend is stuck waiting for all the files to upload by the look of it.

I think what has happened is that it has failed to upload to the first upstream as you said, but the second upload isn't progressing because the first upload hasn't been reading data from the multi reader.

When rclone wants to upload a file twice, it tee's the data which splits it into two streams. One is sent to the first upstream and one to the second. However there isn't much buffering in that Tee so when the first upload fails and stops uploading, the second upload blocks waiting from data from the Tee which will never come because the first upload isn't reading it.

I had a go at fixing this here - can you give it a go?

v1.58.0-beta.6023.211884814.fix-union-deadlock on branch fix-union-deadlock (uploaded in 15-30 mins)

Looks good, thank you so much!

I've merged this to master now which means it will be in the latest beta in 15-30 minutes and released in v1.58

Thanks for testing :slight_smile:

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.