Rclone to cause high IOWAIT on ERROR

Situation:
I use rclone v1.40-031-ge42cee5eβ with plex to perform plex scanning ( technically Plex media analysis) of a huge library of TV shows. It starts normally fast, but after 2 days running the scanning, it always starts generating errors with below pattern. My Library is huge so it is normal for Plex Media analysis to take few days ( as I just delete the Plex database so it is just the same as Plex initial scanning and media analysis)

I monitor the server with netdata dashboard, the symptom is always like this:
Plex media analysis looks stalled with no Plex error log.
Server IOWAIT spike to 80-90% all the time, but there is no significant amount of new data being downloaded or removed by RCLONE. IOWAIT is definitely caused by rclone threads.
On syslog, I can see errors below as an example, despite all files is accessible without permission error at all.
Checking on Google dashboard I can confirm the errors.
This is recurring pattern for scanning/media analysis that after few days let it running, it just stalled with errors like this. The only workaround now is to restart RCLONE cache mount service.
After restarting rclone-cache-mount service, IOWAIT will suddenly back to normal. same with GDrive console error as you can see in the screenshot. Plex scanning/media analysis would resume normally.
However after few hours or few days depending on my luck, the same thing happens again.
Enabling or disabling plex integration doesn’t change anything.
I can verify the same problem since 1.40 stable.

my rclone setup: GDrive > GCache > rclone

my rclone service:
ExecStart=/usr/sbin/rclone mount
–cache-workers 4
–cache-db-purge
–transfers 4
–cache-chunk-no-memory --config /home/aaa/.config/rclone/rclone.conf
–allow-other
–cache-writes
GDCache: /mnt/Plexdrive

EDIT: I don’t know if this matter, this problem only happens when scanning folders owned by other people with read-only access to the folders and files. Upon receiving the Shared Folders I did “Add to My Drive”.
This problem never happens when scanning files I own in GDrive.

GDrive console:

ERROR example:
Apr 3 16:07:31 jupiter rclone[14680]: 2018/04/03 16:07:31 ERROR : worker-3 <Honjitsu wa, Ohigara mo Yoku ep01 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:31 jupiter rclone[14680]: 2018/04/03 16:07:31 ERROR : worker-2 <170128 Tofu Pro Wrestling Ep02.mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:31 jupiter rclone[14680]: 2018/04/03 16:07:31 ERROR : worker-2 <Hello Harinezumi EP04 720p HDTV x264 AAC-DoA.mkv>: object open failed 41943040: bad response: 403: 403 Forbidden
Apr 3 16:07:32 jupiter rclone[14680]: 2018/04/03 16:07:32 ERROR : worker-3 <Shikaku Tantei Higurashi Tabito EP01 720P HDTV X264 AAC.mkv>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-3 <Tokyo Tarareba Musume ep10 finale (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-1 <Rental no Koi ep03 (848x480 x264).mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-3 <Honjitsu wa, Ohigara mo Yoku ep01 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-2 <170128 Tofu Pro Wrestling Ep02.mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-1 <quartet EP03 720p HDTV BAGIKUY.mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-1 <Honjitsu wa, Ohigara mo Yoku ep02 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-3 <Okaasan, Musume wo Yamete Ii desu ka ep02 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-0 <Okaasan, Musume wo Yamete Ii desu ka ep03 (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-2 <Shikaku Tantei Higurashi Tabito EP01 720P HDTV X264 AAC.mkv>: object open failed 41943040: bad response: 403: 403 Forbidden
Apr 3 16:07:33 jupiter rclone[14680]: 2018/04/03 16:07:33 ERROR : worker-0 <Okaasan, Musume wo Yamete Ii desu ka ep03 (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-1 <[Bagikuy] Ngebet Kawin VS Mager Kawin ep01 (Ciee Ketemuan) 480p JdramaCity.mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-0 <Okaasan, Musume wo Yamete Ii desu ka ep01 (848x480 x264).mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-3 <Honjitsu wa, Ohigara mo Yoku ep02 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-3 <Seirei no Moribito Season 3 Ep02 (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-0 <Tokyo Tarareba Musume ep10 finale (848x480 x264).mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:34 jupiter rclone[14680]: 2018/04/03 16:07:34 ERROR : worker-0 <Hello Harinezumi EP04 720p HDTV x264 AAC-DoA.mkv>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:35 jupiter rclone[14680]: 2018/04/03 16:07:35 ERROR : worker-1 <Rental no Koi ep10 finale (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-2 <Okaasan, Musume wo Yamete Ii desu ka ep02 (848x480 x264).mp4>: object open failed 41943040: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-2 <Shikaku Tantei Higurashi Tabito EP01 720P HDTV X264 AAC.mkv>: object open failed 41943040: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-1 <Honjitsu wa, Ohigara mo Yoku ep01 (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-0 <Okaasan, Musume wo Yamete Ii desu ka ep02 (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-3 <Okaasan, Musume wo Yamete Ii desu ka ep02 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-1 <Okaasan, Musume wo Yamete Ii desu ka ep02 (848x480 x264).mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:36 jupiter rclone[14680]: 2018/04/03 16:07:36 ERROR : worker-0 <Honjitsu wa, Ohigara mo Yoku ep01 (848x480 x264).mp4>: object open failed 0: bad response: 403: 403 Forbidden
Apr 3 16:07:37 jupiter rclone[14680]: 2018/04/03 16:07:37 ERROR : worker-1 <quartet EP03 720p HDTV BAGIKUY.mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:37 jupiter rclone[14680]: 2018/04/03 16:07:37 ERROR : worker-3 <Seirei no Moribito Season 3 Ep05 (848x480 x264).mp4>: object open failed 62914560: bad response: 403: 403 Forbidden
Apr 3 16:07:37 jupiter rclone[14680]: 2018/04/03 16:07:37 ERROR : worker-0 <Maji de Koukaishitemasu EP02 720p HDTV x264 AAC-DoA.mkv>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:37 jupiter rclone[14680]: 2018/04/03 16:07:37 ERROR : worker-0 <Super Salaryman Saenai-shi ep10 finale (848x480 x264).mp4>: object open failed 20971520: bad response: 403: 403 Forbidden
Apr 3 16:07:37 jupiter rclone[14680]: 2018/04/03 16:07:37 ERROR : J-Drama/Totsuzen desu ga, Ashita Kekkon Shimasu/[Bagikuy] Ngebet Kawin VS Mager Kawin ep01 (Ciee Ketemuan) 480p JdramaCity.mp4: (0/570341976) error (chunk not found 0) response

Plex is a bit nasty in how it opens files for scanning.

Are you using the plex integration? It seems like your 403s are way up since plex tends to open up multiple files at a time and if your cache-workers are 4 per file, you are going to hit a lot of 403s as you are seeing.

I’d make sure plex integration is configured so the scans only use 1 worker per file or you can tweak it down to even use 1 worker while it scans if you aren’t using the integration.

I rescanned a good chunk of my 40TB library in a few days with very limited 403s as I had to tweak things down a bit.

My scan is the big chunk a few days back and you can see my rate is just a bit over 2.5-3 at most.

Can you share your command option for rclone mount? and How much RAM do you use for typically?
I will try to re-enable again Plex integration, but last time I did it doesn’t change the result.

Note that this problem only happens when scanning or performing media analysis to files that I do not own in my Gdrive. Basically it is a bunch of shared folders that I add to My Drive with Read-Only access.
The biggest pain for me is that once I get a lot of 403 ERRORs, my server Iowait jumps and stay around 80-90% and both rclone and Plex seems to be idle other than spitting out 403 Errors.

Rclone can serve Plex scanning/media analysis for files that I own pretty fast, few TB of scans in under 1 hour.

If you are using a shared account, someone is generating a lot of rate limiting errors (403s), which is going to cause a timeout for file reads in rclone. You can lower the timeout, but that won’t fix the issue as you need to figure out how to get folks to all place nice together.

the problem is, it is not someone else or other program that generates 403 error, but it is a combination of Plex and rclone, and how they interact with GDrive API.

Here is what I find out:

  • Using Google Drive File Stream + Plex without rclone on Windows, the Plex scans and analysis works perfect, very fast, no error at all. Using Plexdrive + Plex on linux result the same but at much slower Plex media scanning and analysis.
    Using rclone + Plex on Linux, I get 403 errors above, with the speed of scan faster than PlexDrive.

  • I get 403 errors even when I am the only one accessing shared files for days. There was no other user accessing the shared files directly or via rclone.

  • Much older version of rclone never gets this 403 errors. I am long time user of rclone before they have rclone-cache feature. The shared folders in GDrive worked perfectly fine with PLEX+very old rclone that doesn’t support cache.

Hence… it is my suspicion that the blame is partly in the way rclone do file access, file cache/index, file read timeout, connection/thread limitation etc.

How can I change file read timeout as you suggest? This could be a temporary solution for people who access Shared Folders.

Enabling Plex integration still produce 403 errors and very high IOWAIT that ultimately stop Plex scanning from working.
On my original screenshot, the spike of 403 errors on the end part of the grapsh is like that because I tested by having Plex media analysis and at the same time manually choosing TV Shows for refresh-scanning/media analysis.

Regardless of the amount of 403 errors, it is alarming that this cause a very high IOWAIT and everything just seems to stop working until I restart rclone service.

EDIT: After some test, I suspect there is undocumented GDrive limit that limits the number of threads/ API requests for Shared Folders with read-only access.
I tested with 2 folders, one is Shared Folder with read-and write access, files and folders are owned by different user.
The other folder is Shared Folder read-only access owned by different user.
Plex scans ( with rclone cache) on these 2 folders shows that the one with read-only access is significantly slower to scan, to perform media analysis etc. But playback of the files have the same good result.

Normal disk IO happens usually in milliseconds so when you start to get high number of 403s and the retry cycle happens, you are going to get IOWAITs as the OS is waiting for the IO returns. If that starts to get too much, you are going to see exactly what you are seeing.

If you aren’t using or the plex integration isn’t working properly, you are going to spawn off all the cache workers and each time plex scans, it’s going to open up many files at a time.

the --transfers doesn’t do anything with mount so that can be removed.
if you purge your cache-db each time, it’s going to rebuild after each restart so you have to wait for your cache to build.

If all your Plex files are analyzed, the scan should go very fast. I purged my cache and restarted to validate as I have ~40TB in my GD and a new scan with all my files analyzed only took ~10 minutes to rebuild the cache.

I have no 403s and my peak only only tops at 5.

My mount:

/usr/bin/rclone mount gmedia: /gmedia \
   --allow-other \
   --dir-cache-time=160h \
   --cache-chunk-size=10M \
   --cache-info-age=168h \
   --cache-workers=5 \
   --cache-tmp-upload-path /data/rclone_upload \
   --cache-tmp-wait-time 60m \
   --buffer-size 0M \
   --attr-timeout=1s \
   --syslog \
   --umask 002 \
   --rc \
   --log-level INFO

If you are on Unix, you can use lfos /mountpoint to see what’s going on with the mount point during the scan and how many open files you have or you can turn up the logging in rclone or plex to see more information. You can see by your errors, you are trying to download quite a lot of files as they are all files.get errors.

Without seeing your plex library logs and some settings there, it’s hard to understand why it’s trying to download every file other than I’d guess it is trying to analyze the files the first time and your settings are allowing quite a number of files to be opened at a time. “lsof” during the scan would confirm this as well as the debug logs on rclone.

Why no memory for the chunks too? That would just slow it down as well.