Sonarr / Radarr Hangs when using cache temp upload path


#21

The problem seems to be you can’t do ops on files in upload quote while the background uploader is doing work.

Sonarr / Radarr shouldn’t be the issue here because plenty of tools work fine using them in this way. Google Drive File Stream for example works perfectly with both tools, and there’s other tools too (webdrive, netdrive, expandrive, potatodrive etc).

I don’t dabble with the go language but it wouldn’t surprise me if a mutex being held for a long time period being the cause.


#22

I download a file via a torrent or nzb.
The download completes.
Sonarr/Radarr should pick up the file from the completed location and start to copy it to the destination.

I’ve noticed that Sonarr/Radarr hangs/stops during that file copy regardless of if it’s local or anything else. A bigger file can produce the error. When I used local storage, I used hard links to prevent this from happening.

Are you seeing that occur as well?

The way I get the cache-tmp-upload is that it has to copy to the destination since the fuse file system can’t tell it’s the same or not the same. So the larger the file, the longer the copy.

At that point, Sonarr/Radarr should be done with said file once it’s imported.

Once the cache-tmp-upload expires, it should do a rclone move of the file to the location and I’m not sure how that would interfere with Sonarr/Radarr.


#23

Dug in a little more and did testing with vfs-cache writes as that’s what I’m currently using.

My mount:

/home/felix/go/bin/rclone cmount gcrypt: /gmedia --allow-other --dir-cache-time 48h --vfs-read-chunk-size 32M --vfs-read-chunk-size-limit 2G --cache-dir /data/rclone --vfs-cache-mode writes --buffer-size 512M --syslog --umask 002 --bind 192.168.1.30 -o auto_cache -o sync_read --log-level INFO

I noticed that shows seem quite fine as they copy but I get some duplicates hanging around for larger movies that lock things up.

I tested by grabbing a bunch of movies before I went to bed last night and got a lot partials:

felix@gemini:/gmedia$ find . -name *partial*
./Radarr_Movies/Blade (1998)/Blade (1998).mkv.partial~
./Radarr_Movies/Blade II (2002)/Blade II (2002).mkv.partial~
./Radarr_Movies/Blade Trinity (2004)/Blade Trinity (2004).mkv.partial~
./Radarr_Movies/Ghost Rider (2007)/Ghost Rider (2007).mkv.partial~
./Radarr_Movies/Hulk (2003)/Hulk (2003).mkv.partial~
./Radarr_Movies/I Feel Pretty (2018)/I Feel Pretty (2018).mkv.partial~
./Radarr_Movies/Peter Rabbit (2018)/Peter Rabbit (2018).mkv.partial~
./Radarr_Movies/Punisher War Zone (2008)/Punisher War Zone (2008).mkv.partial~
./Radarr_Movies/Super Troopers 2 (2018)/Super Troopers 2 (2018).mkv.partial~
./Radarr_Movies/The Punisher (2004)/The Punisher (2004).mkv.partial~
./Radarr_Movies/Thor The Dark World (2013)/Thor The Dark World (2013).mkv.partial~
./Radarr_Movies/Traffik (2018)/Traffik (2018).mkv.partial~
./Radarr_Movies/Welcome to Curiosity (2018)/Welcome to Curiosity (2018).mkv.partial~

You can see the file gets copied twice as the partial goes along with the final file as well:

Jul 13 05:14:08 gemini rclone[2384]: Radarr_Movies/Welcome to Curiosity (2018)/Welcome to Curiosity (2018).mkv.partial~: Copied (new)
Jul 13 05:22:11 gemini rclone[2384]: Radarr_Movies/Welcome to Curiosity (2018)/Welcome to Curiosity (2018).mkv: Copied (new)
Jul 13 05:33:34 gemini rclone[2384]: TV/Little Big Shots/Little.Big.Shots.S03E12.mkv.partial~: Copied (new)
Jul 13 06:05:32 gemini rclone[2384]: Radarr_Movies/Peter Rabbit (2018)/Peter Rabbit (2018).mkv.partial~: Copied (new)
Jul 13 06:38:29 gemini rclone[2384]: TV/Nashville/Nashville.2012.S06E14.mkv.partial~: Copied (new)
Jul 13 06:47:36 gemini rclone[2384]: TV/Marvel's.Cloak.&.Dagger/Marvels.Cloak.and.Dagger.S01E07.mkv.partial~: Copied (new)
Jul 13 06:49:10 gemini rclone[2384]: Radarr_Movies/Peter Rabbit (2018)/Peter Rabbit (2018).mkv: Copied (new)
Jul 13 07:08:18 gemini rclone[2384]: Radarr_Movies/Spinning Man (2018)/Spinning Man (2018).mkv.partial~: Copied (new)
Jul 13 07:26:51 gemini rclone[2384]: Radarr_Movies/Spinning Man (2018)/Spinning Man (2018).mkv: Copied (new)

Lots of mono locks along the way as well:

felix@gemini:/gmedia$ dmesg | grep mono
[33954.525329] INFO: task mono:4611 blocked for more than 120 seconds.
[33954.525407] mono            D    0  4611      1 0x00000004
[34075.356973] INFO: task mono:4611 blocked for more than 120 seconds.
[34075.357131] mono            D    0  4611      1 0x00000004
[34558.683166] INFO: task mono:4611 blocked for more than 120 seconds.
[34558.683233] mono            D    0  4611      1 0x00000004
[35404.503861] INFO: task mono:4611 blocked for more than 120 seconds.
[35404.503931] mono            D    0  4611      1 0x00000004
[36854.481064] INFO: task mono:4611 blocked for more than 120 seconds.
[36854.481153] mono            D    0  4611      1 0x00000004
[36975.312587] INFO: task mono:4611 blocked for more than 120 seconds.
[36975.312765] mono            D    0  4611      1 0x00000004
[54737.539331] INFO: task mono:4604 blocked for more than 120 seconds.
[54737.539393] mono            D    0  4604      1 0x00000004
[54858.370758] INFO: task mono:4604 blocked for more than 120 seconds.
[54858.370816] mono            D    0  4604      1 0x00000004
[54979.202237] INFO: task mono:4604 blocked for more than 120 seconds.
[54979.202396] mono            D    0  4604      1 0x00000004
[55100.033636] INFO: task mono:4604 blocked for more than 120 seconds.
[55100.033689] mono            D    0  4604      1 0x00000004

It’s easy enough to clean up with:

find . -name *partial* -delete

So from my take, it seems like it copies it locally to the cache-write area, which I can see complete.

2018/07/13 09:28:54 DEBUG : /test.partial: >Write: n=2558
2018/07/13 09:28:54 DEBUG : /test.partial: Flush: fh=0x0
2018/07/13 09:28:54 DEBUG : test.partial(0xc4204ec240): close:

After that it goes to move/write the file to the GD.

2018/07/13 09:28:56 DEBUG : s1otfrmsl9lriifem1hcggi2f8: Sending chunk 8388608 length 8388608
2018/07/13 09:28:56 DEBUG : s1otfrmsl9lriifem1hcggi2f8: Sending chunk 16777216 length 8388608
2018/07/13 09:29:02 DEBUG : /: >Statfs: stat={Bsize:4096 Frsize:4096 Blocks:1125899906842624 Bfree:1125888060224748 Bavail:1125899906842624 Files:1000000000 Ffree:1000000000 Favail:0 Fsid:0 Flag:0 Namemax:255}, errc=0
2018/07/13 09:29:03 DEBUG : s1otfrmsl9lriifem1hcggi2f8: Sending chunk 117440512 length 8388608
2018/07/13 09:29:03 DEBUG : s1otfrmsl9lriifem1hcggi2f8: Sending chunk 125829120 length 8388608

I tried to replicate the mv that Radarr/Sonarr does and I just lock on the mv as the file is still copying:

felix@gemini:/test$ mv test.partial test.mkv

Once the file gets finished sending to the remote, it processes and lets me rename the file:

2018/07/13 09:32:10 INFO  : test.partial: Copied (new)
2018/07/13 09:32:10 DEBUG : test.partial: transferred to remote
2018/07/13 09:32:10 DEBUG : test.partial(0xc4204ec240): >close: err=<nil>
2018/07/13 09:32:10 DEBUG : /test.partial: >Flush: errc=0
2018/07/13 09:32:10 DEBUG : /test.partial: Rename: newPath="/test.mkv"
2018/07/13 09:32:11 DEBUG : test.partial: Updating file with test.mkv 0xc4204448c0
2018/07/13 09:32:11 DEBUG : /test.partial: >Rename: errc=0
2018/07/13 09:32:11 DEBUG : /test.mkv: Release: fh=0x0
2018/07/13 09:32:11 DEBUG : /test.mkv: Getattr: fh=0xFFFFFFFFFFFFFFFF
2018/07/13 09:32:11 DEBUG : /test.mkv: >Getattr: errc=0
2018/07/13 09:32:11 DEBUG : test.partial(0xc4204ec240): RWFileHandle.Release nothing to do
2018/07/13 09:32:11 DEBUG : /test.mkv: >Release: errc=0

@B4dM4n - any thoughts ?


#24

Man, that’s strange. So I did another test and I can see the first file finish:

Jul 13 10:03:34 gemini rclone[2384]: Radarr_Movies/Ghost Rider Spirit of Vengeance (2011)/Ghost Rider Spirit of Vengeance (2011).mkv.partial~: Copied (new)

and I can see the next file that was being uploaded and reuploaded again with a new file name after it was complete as the rename finally goes:

'./vfs/gcrypt/Radarr_Movies/Ghost Rider Spirit of Vengeance (2011)':
total 13919538
drwx------ 2 felix felix          4 Jul 13 10:03  .
drwx------ 3 felix felix          3 Jul 13 09:53  ..
-rw------- 1 felix felix 7290114048 Jul 13 10:05 'Ghost Rider Spirit of Vengeance (2011).mkv'
-rw------- 1 felix felix 8971693530 Jul 13 09:55 'Ghost Rider Spirit of Vengeance (2011).mkv.partial~'

#25

OK so I did some more testing:

  1. Took Sonarr out of the testing and used the command line to keep things simple.

  2. OSX High Sierra using the backend cache and cache-tmp-upload-path:

sudo /usr/local/bin/rclone mount Media: ~/rclone_dataedia_mount_point --allow-other --dir-cache-time=160h --cache-total-chunk-size=10G --cache-chunk-size=10M --cache-info-age=168h --cache-workers=6 --attr-timeout=1s --cache-tmp-upload-path ~/rclone_data/rclone_upload --modify-window 1s --umask 002 --drive-use-trash=false --cache-chunk-path ~/rclone_data/cache_dir/cache-backend --cache-db-path ~/rclone_data/cache_dir/cache-backend --cache-dir ~/rclone_data/cache_dir --config ~/.config/rclone/rclone.conf --buffer-size=0M --volname Media -vv --log-file ~/rclone_data/testing_logs/rclone_testing.log --cache-tmp-wait-time 1h

Goggle Drive -> Cache -> Crypt

  1. Copy three files from my local drive into the mounted drive. These files also appear in the cache-tmp-upload-path (as expected). I can delete and rename those files on the mounted drive without issue.

  2. Umount and remount with “–cache-tmp-wait-time 1s” to force the upload to commence. In the rclone log I can see the chunks uploading. The mount’s icon does not actually appear on my desktop until the first file has finished uploading, but then it appears and I can open the mounted cached drive in Finder.

  3. The 2nd file starts uploading and whilst it is doing so I delete the 3rd file from the mounted drive. The Finder delete status popup just hangs for a few minutes before finally making the “trash” noise and indicating successful deletion. Here’s where it goes wrong:

  • In the Finder window, the deleted file 3 is still showing in the mounted drive
  • In cache-tmp-upload-path, the deleted file has indeed been deleted
  • Meanwhile file 2 is still uploading.
  1. File 2 finishes uploading and then rclone shuts down and drops the mount:
  • The mounted drive icon still shows on the desktop
  • The open Finder window still shows the contents of the mounted drive (including the deleted 3rd file)
  • The rclone service has stopped and is no longer in Activity Monitor
  • Files 1, 2, and 3 are all deleted from cache-tmp-upload-path, but their parent folders have been left behind (I am using the latest rclone that normally cleans these temporary folders up).
  1. The logs just look normal right up until the end and even then don’t seem to be giving out any clues:

2018/07/14 22:20:02 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: Sending chunk 2231369728 length 6773280
2018/07/14 22:20:04 INFO : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: Copied (new)
2018/07/14 22:20:05 INFO : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: Deleted
2018/07/14 22:20:05 INFO : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: background upload: uploaded entry
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 4: forgetting directory cache
2018/07/14 22:20:05 INFO : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: finished background upload
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/hp16mm6nkdpupthlc1e5f71lkc: removing object
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired 106n94pc3io9prlg2p3gbqqrkk
2018/07/14 22:20:05 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0: cache: expired
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 2: forgetting directory cache
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 3: forgetting directory cache
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 4: forgetting directory cache
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 1: forgetting directory cache
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys: forgetting directory cache
2018/07/14 22:20:05 DEBUG : tv/12 Monkeys/Season 4/: >Remove: err=
2018/07/14 22:20:05 DEBUG : rclone: Version “v1.42” finishing with parameters ["/usr/local/bin/rclone" “mount” “Media:” “/Users/jivey/rclone_data/media_mount_point” “–allow-other” “–dir-cache-time=160h” “–cache-total-chunk-size=10G” “–cache-chunk-size=10M” “–cache-info-age=168h” “–cache-workers=6” “–attr-timeout=1s” “–cache-tmp-upload-path” “/Users/jivey/rclone_data/rclone_upload” “–modify-window” “1s” “–umask” “002” “–drive-use-trash=false” “–cache-chunk-path” “/Users/jivey/rclone_data/cache_dir/cache-backend” “–cache-db-path” “/Users/jivey/rclone_data/cache_dir/cache-backend” “–cache-dir” “/Users/jivey/rclone_data/cache_dir” “–config” “/Users/jivey/.config/rclone/rclone.conf” “–buffer-size=0M” “–volname” “Media” “-vv” “–log-file” “/Users/jivey/rclone_data/testing_logs/rclone_testing.log” “–cache-tmp-wait-time” “1s”]
2018/07/14 22:20:05 DEBUG : Cache remote GdriveCache:: Services stopped
2018/07/14 22:20:06 DEBUG : Cache remote GdriveCache:: Services stopped

I think file 3 wasn’t actually deleted until after file 2 had finished uploading (and this was long after the “trash” sound):

…/hp16mm6nkdpupthlc1e5f71lkc: removing object

I also checked in the system logs, nothing obvious there.

Any thoughts?


#26

I ran this test again with:

  • Copying/deleting just using the command line - no use of Finder
  • Added the xyz parameter to the mount.

Got some more info from this test:

  1. 2 files, both in cache-tmp-upload-path awaiting upload to commence
  2. Upload of File 1 commences, during upload issue rm command to delete File 2 from the mount.
  3. At the moment of issuing the rm command, the logs show:

2018/07/15 17:13:32 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: Sending chunk 1216348160 length 8388608
2018/07/15 17:13:33 DEBUG : fuse: <- Getattr [ID=0x5 Node=0x163 Uid=0 Gid=0 Pid=54155] 0x0 fl=0
2018/07/15 17:13:33 DEBUG : tv/12 Monkeys/Season 4/test3.nzb: Attr:
2018/07/15 17:13:33 DEBUG : tv/12 Monkeys/Season 4/test3.nzb: >Attr: a=valid=1s ino=0 size=552197 mode=-rw-rw-r–, err=
2018/07/15 17:13:33 DEBUG : fuse: -> [ID=0x5] Getattr valid=1s ino=14308011696149986330 size=552197 mode=-rw-rw-r–
2018/07/15 17:13:33 DEBUG : fuse: <- Access [ID=0x3 Node=0x163 Uid=0 Gid=0 Pid=54155] mask=0x0
2018/07/15 17:13:33 DEBUG : fuse: -> [ID=0x3] Access
2018/07/15 17:13:33 DEBUG : fuse: <- Access [ID=0x4 Node=0x163 Uid=0 Gid=0 Pid=54155] mask=0x2
2018/07/15 17:13:33 DEBUG : fuse: -> [ID=0x4] Access
2018/07/15 17:13:33 DEBUG : fuse: <- Access [ID=0x2 Node=0x163 Uid=0 Gid=0 Pid=54155] mask=0x0
2018/07/15 17:13:33 DEBUG : fuse: -> [ID=0x2] Access
2018/07/15 17:13:33 DEBUG : fuse: <- Remove [ID=0x5 Node=0x158 Uid=0 Gid=0 Pid=54155] “test3.nzb” dir=false
2018/07/15 17:13:33 DEBUG : tv/12 Monkeys/Season 4/: Remove: name=“test3.nzb”
2018/07/15 17:13:33 DEBUG : 106n94pc3io9prlg2p3gbqqrkk/329pc8jr21ppddv0faiqfcvfbo/3n5g7dl0avqum9rlbn2dcp94g0/6mslu1r2d9h1m4gjlkdim33k80: Sending chunk 1224736768 length 8388608

  1. Terminal command line then hangs for some time before finally coming back with:

rm: /Users/jivey/rclone_data/media_mount_point/tv/12 Monkeys/Season 4/test3.nzb: Socket is not connected

  1. The logs do not show this event until the first file has finished uploading some time later. Then they show:

2018/07/15 17:15:43 DEBUG : fuse: -> [ID=0x5] Remove
2018/07/15 17:15:43 DEBUG : fuse: kernel write error: error=“socket is not connected” stack=
goroutine 11193 [running]:
github.com/ncw/rclone/vendor/bazil.org/fuse.stack(0x1d8b8e0, 0xc42078b090)
/home/ncw/go/src/github.com/ncw/rclone/vendor/bazil.org/fuse/debug.go:9 +0x74
github.com/ncw/rclone/vendor/bazil.org/fuse.(*Conn).respond(0xc420340ea0, 0xc42078b080, 0x10, 0x10)
/home/ncw/go/src/github.com/ncw/rclone/vendor/bazil.org/fuse/fuse.go:1113 +0x96
github.com/ncw/rclone/vendor/bazil.org/fuse.(*Header).respond(0xc4247f0140, 0xc42078b080, 0x10, 0x10)
/home/ncw/go/src/github.com/ncw/rclone/vendor/bazil.org/fuse/fuse.go:313 +0x59
github.com/ncw/rclone/vendor/bazil.org/fuse.(*RemoveRequest).Respond(0xc4247f0140)
/home/ncw/go/src/github.com/ncw/rclone/vendor/bazil.org/fuse/fuse.go:2112 +0x6d
github.com/ncw/rclone/vendor/bazil.org/fuse/fs.(*Server).handleRequest(0xc42014a240, 0x1d94180, 0xc422eaa140, 0x1d88520, 0xc42000ca18, 0xc42477c2c0, 0x1d92640, 0xc4247f0140, 0xc423013ef8, 0x1, …)
/home/ncw/go/src/github.com/ncw/rclone/vendor/bazil.org/fuse/fs/serve.go:1029 +0x34c5
github.com/ncw/rclone/vendor/ba

  1. At this point rclone stops but the OS doesn’t realise it and still shows the mount.

I am no expert on how to read these logs, but is this saying that osxfuse is having some sort of permissions/access error?


#27

There don’t appear to be any rclone errors.

What appears to have happened is that OSXFUSE has given up the ghost and closed its socket. Rclone takes this to mean the filesystem has been unmounted and exits normally.

Can you find any logs as to why OSXFUSE is stopping? Probably kernel logs of some kind.


#28

I think that is a separate issue from the Sonarr/Radarr hangs with the unmounting.


#29

Perhaps but I suspect it is the same hang-if-delete/edit-whilst-uploading issue but compounded on osx because osxfuse has some timeout behaviour (daemon_timeout) after which the code needs to handle an interrupt (else socket gets dropped by design) that it doesn’t need to handle on Linux.

I am reviewing the kernel logs as requested to see if there is any support of that theory…after the World Cup final finishes!


#30

Sadly there were no hints in the kernel logs but with some further research I suspect there are two separate issues here:

1) osxfuse timeouts

When the kernel sends calls to a user-space file system, the latter must reply within some reasonably short time limit, otherwise those calls, and in turn the application that made those calls, will “hang”.

…default timeout 60 seconds…the volume will be automatically ejected (data in any open files may be lost)…in Leopard…There may be more fine-grained and more fault-tolerant behavior in future versions of osxfuse.

So I think the original issue of the modification-hanging-until-upload-finished is leading to this timeout in osxfuse when the uploading takes longer than daemon_timeout.

I am waaaaay out of my depth here, but I also think from this older post that there may be a way to manage this in the code (or is this a http://bazil.org/fuse thing?):

In the newly released osxfuse 3.2.0 the kernel extension sends an interrupt request to give the file system daemon a chance to handle the timeout. If the daemon does not respond to the interrupt request within daemon_timeout seconds, the file system will be marked dead. I think this is a good compromise. File systems that handle interrupts get a proper warning.

2) Original Issue: modification-hanging-until-upload-finished
I agree with @andy012345, it really does seem that an elongated mutex or file lock of some sort is causing this.

As soon as the queue starts uploading chunks to the remote, the mutex/file-lock/whatever seems to lock ALL files in the queue, rather than just the file currently sequentially uploading. I am guessing this is somewhere related to this area in the code:

I sooooo want to help beyond this but GO isn’t my language and I am just guessing a whole lot here.

@ncw - an aside - thx for all of the effort you put into this project and thx to all of the many forum and code contributors - rclone is turning into an incredible solution with utility waaaay beyond our various hobbies.


#31

I don’t think it’s cache related per se as the same thing happens without the cache as well.


#32

OK that makes sense… So does it take longer than 60s? It would be possible to set that option higher if necessary

From the above

I just tested this and it makes all well-behaved http://bazil.org/fuse -using filesystems do the right thing on slow requests. This means even aborting outgoing network requests and all. This is great, thank you very much!

(Well-behaved = respects Context cancellation: http://blog.golang.org/context)

Ah, rclone isn’t well behaved in this respect. It was written before context was invented and I haven’t threaded context through it.

This would still break something anyway.

I think the best solution for rclone would be to work out how to increase daemon_timeout.

Bazil/FUSE exposes a DaemonTimeout config option.

You’d jam it in here if anyone wants to have a go with it.

What is the simplest repro for this? Can it be done with standard unix commands?

Does it require the cache? @Animosity022 says not…


#33

Does the cache-tmp-upload and the vfs-cache-mode writes use the same process to move a file up? I produced the error repeatedly with larger files when the upload was going on.


#34

I have reproduced it already with standard Unix commands but have only tested with back-end cache.

When I get some time tonight I will test again with combinations of no backend/vfs cache and with/without tmp-upload when back-end cache is used.

For daemon_timeout, yes it would be great if this was configurable but we will need a GO developer to volunteer, that’s not me I am afraid.


#35

Great!

I made a test binary here with a --daemon-timeout flag. Let me know if it helps!

https://beta.rclone.org/branch/v1.42-052-g4928d2e3-fix-daemon-timeout/ (uploaded in 15-30 mins)


#36

I cross posted on the Sonarr reddit as I’m curious what the heck it is doing.

In my case, I got a double upload since it hung up on an upload and basically retried:

-rw-rw-r-- 1 felix felix 3820989249 Jul 18 13:52 Animal.Kingdom.2016.S03E01.mkv
-rw-rw-r-- 1 felix felix 3820989249 Jul 18 13:55 Animal.Kingdom.2016.S03E01.mkv.partial~

In my Sonarr logs, I can see it locked the DB and you get the mono ‘hung’ and you see this in the logs:

https://pastebin.com/1HFnLcW5

I’m feeling very strongly it’s tied back to the copy of the partial~ which leads to an immediate rename while the file is uploading.


#37

Wow! Testing the --daemon-timeout now!


#38

Is this test binary missing the --cache-total-chunk-size flag?

Fatal error: unknown flag: --cache-total-chunk-size


#39

OK daemon-timeout worked!

  1. Mount with --deamon-timeout 10m and --cache-tmp-upload-path
  2. cp large test file 1 to mount, cp small test file 2 to mount (both in upload queue)
  3. whilst test file 1 is uploading, rm test file 2.
  4. rm command hangs/waits until test file 1 has finished uploading, then the test file 2 deletion gets processed
  5. as long as test file 1 uploaded in less than 10m, rclone no longer exits.

So, the daemon_timeout problem on OSX is resolved but the original modification-hanging-until-upload-finished issue remains.

I still owe this thread some testing on that - stay tuned.


#40

You may have luck going on their IRC channel on freenode, #sonarr, and talking to one of the mods there. Sometimes they’re around on there and respond faster there than their forums or reddit.