Looking for a resource-saving and stable config to run rclone on a DiskStation for Hyper Backup to OneDrive

What is the problem you are having with rclone?

I recently started my journey with rclone and I'm still looking for a resource-saving but stable configuration for the following scenario:

  • rclone is running on a Synology DiskStation (the model is some years old and has only 1GB RAM)
  • I want to back up data using Hyper Backup
  • the backup destination is OneDrive, which is not directly supported by Hyper Backup
  • so I'm using the webdav capabilities of rclone to make it work

Initially I started with VFS File Caching (--vfs-cache-mode writes). But rclone consumed a lot of memory, which led to heavy swapping on the DiskStation and thus an I/O load that slowed down the NAS significantly. I could not find any set of options with VFS File Caching enabled that limited the memory consumption or the number of rclone threads to an acceptable level.
I tried:

  • reducing --transfers
  • reducing --checkers
  • reducing --vfs-cache-max-size
  • reducing --onedrive-chunk-size
  • reducing --tpslimit
  • setting --buffer-size 0
  • --use-mmap
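For reference, the options listed above could be combined into a single invocation roughly like the sketch below. The remote name and address are taken from the command given later in this post, and the 5M chunk size is the value mentioned further down the thread; all other numeric values are illustrative assumptions. As described next, this kind of tuning did not bring memory use down to an acceptable level in my case.

```shell
# Illustrative values only: apart from the 5M chunk size, none of these
# numbers come from the post. Adjust them to your own hardware.
RCLONE_FLAGS="--vfs-cache-mode writes \
--vfs-cache-max-size 1G \
--transfers 1 \
--checkers 1 \
--buffer-size 0 \
--use-mmap \
--onedrive-chunk-size 5M \
--tpslimit 4"

# Print the full command for review; remove the leading 'echo' to run it.
echo rclone serve webdav onedrive_data: --addr 127.0.0.1:12345 $RCLONE_FLAGS
```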

In the end I found that rclone's resource consumption was very acceptable with VFS File Caching disabled. So currently I'm running rclone with the options shown in the corresponding section below and it works fine most of the time.

System load is now absolutely fine but it is not yet as stable as required.

These are the stability issues I face:

SHA-1 hash differs
2021/05/03 11:54:43 ERROR : Backup/Data.hbk/Pool/0/0/63.bucket.2: corrupted on transfer: SHA-1 hash differ "8adcfa29ec45e30a04cf6ce818cde821a39acf86" vs "79d8ead431eb92aed2b40bd6d8bea9800de4808b"
2021/05/03 11:54:44 ERROR : Backup/Data.hbk/Pool/0/0/63.bucket.2: WriteFileHandle.New Rcat failed: corrupted on transfer: SHA-1 hash differ "8adcfa29ec45e30a04cf6ce818cde821a39acf86" vs "79d8ead431eb92aed2b40bd6d8bea9800de4808b"

Here I'm aware of this information:
Unexpected file size/hash differences on Sharepoint

But Hyper Backup chunks and compresses the data anyway. So there are no "office files" transferred to OneDrive that could be automatically opened and altered by OneDrive. (See the affected file from the log: 1521.index.2)
So I'm hesitating to use the options --ignore-checksum and --ignore-size as I believe these checks protect me from corrupt data. If not, meaning disabling the checks would not affect the data integrity in a bad way, please let me know.

InvalidAuthenticationToken: CompactToken validation failed
2021/05/04 02:20:31 ERROR : Backup/Data.hbk/Pool/0/0/1521.index.2: Failed to copy: InvalidAuthenticationToken: CompactToken validation failed with reason code: 80049228.
2021/05/04 02:20:31 ERROR : Backup/Data.hbk/Pool/0/0/1521.index.2: WriteFileHandle.New Rcat failed: InvalidAuthenticationToken: CompactToken validation failed with reason code: 80049228.

It looks like the token is refreshed every hour. So why does the refresh not disturb the running backup most of the time, yet sporadically result in InvalidAuthenticationToken: CompactToken validation failed with reason code: 80049228?

Both types of errors make the backups fail. This is especially disruptive for the initial backups, which take several days.

So I would appreciate any help to either

  • run rclone with VFS File Caching enabled but limiting the memory consumption
  • keep running rclone with VFS File Caching disabled but get the "stability issues" fixed.

What is your rclone version (output from rclone version)

rclone v1.56.0-beta.5475.2833941da

  • os/version: unknown
  • os/kernel: 3.2.40 (i686)
  • os/type: linux
  • os/arch: 386
  • go/version: go1.16.3
  • go/linking: static
  • go/tags: none

(I did not pay attention to the errors described above with rclone v1.55.0 (the first version I used) since "Unable to initialize RPS" was the main issue at this time.)

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Synology DSM 6.2 (linux), 32 bit

Which cloud storage system are you using? (eg Google Drive)

OneDrive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone serve webdav onedrive_data: --addr 127.0.0.1:12345 --use-mmap

The rclone config contents with secrets removed.

[onedrive_data]
type = onedrive
region = global
token = {"access_token":"[secret]","expiry":"2021-05-04T08:03:16.190094121+02:00"}
drive_id = [secret]
drive_type = personal
client_id = [secret]
client_secret = [secret]

A log from the command with the -vv flag

Currently I don't have detailed logs available, since the errors mostly happen after many hours and I am not using the -vv option, to keep system load low. I will enable -vv after the current backup jobs have finished or been interrupted.
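A minimal sketch of how debug logging could be switched on for the next run, writing to a file so the output survives a multi-day backup. The log path is a hypothetical example, not something from my setup.

```shell
# Hypothetical log location; pick a path on a volume with enough free space.
LOG_FILE="/volume1/rclone/rclone-debug.log"

# -vv enables debug output; --log-file writes it to a file instead of stderr.
# Remove the leading 'echo' to actually start the server.
echo rclone serve webdav onedrive_data: --addr 127.0.0.1:12345 --use-mmap -vv --log-file "$LOG_FILE"
```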


How much memory did rclone use? Was the VFS cache being stored on a RAM disk maybe?

These probably mean the file changed during the upload... If so this would be something that the --vfs-cache-mode writes would fix. I'm not 100% sure though.

Best not to use those options if you can avoid it!

These are probably the token expiring, as you suggested. If you aren't using --vfs-cache-mode writes rclone can't retry the upload as it doesn't have a cache of the data it has already sent :frowning:

If you were just using rclone copy rclone would retry the upload at this point.

I think this would be the best solution for you.

How much memory is rclone using?

Is the VFS cache stored on disk, not in RAM?

You can add the --rc parameter and then use these debug techniques to figure out what is using the RAM?

You might have found a bug if rclone memory keeps going up - the memory use should be fairly modest.
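As a rough sketch of the memory check suggested above: once the server has been started with --rc added, the remote control API can be queried for the Go runtime's memory statistics. This assumes the rc API is listening on its default address (localhost:5572); the variable name is just for illustration.

```shell
# Assumes the server was started with --rc added, for example:
#   rclone serve webdav onedrive_data: --addr 127.0.0.1:12345 --rc
# and that the rc API listens on its default address.
RC_URL="http://localhost:5572"

if command -v rclone >/dev/null 2>&1; then
    # Print the Go runtime memory statistics of the running rclone process.
    rclone rc --url "$RC_URL" core/memstats || echo "rc API not reachable at $RC_URL"
fi

# On a machine with the Go toolchain installed, a heap profile can be
# inspected as well (left commented out, since it opens a browser):
# go tool pprof -web "$RC_URL/debug/pprof/heap"
```

Running core/memstats a few times over the course of a backup should show whether memory use keeps growing or stays flat.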

Thanks a lot, I will try to analyze the RAM consumption but it will take some time.

Is rclone "closing" the webdav port if the VFS cache is full or data can't be uploaded to the cloud fast enough?

Last night I gave it another try with VFS enabled. But the backup failed because the destination (the rclone webdav server) was no longer reachable after some hours.
I checked netstat and, although rclone was running, the port was not open.
I killed rclone and restarted it. No webdav port was opened. I just saw many "queuing for upload in 5s" entries in the log. Only after about 5 minutes did the log show "WebDav Server started".

It shouldn't do, but see below...

I think you've probably got a backlog of things which need uploading - this causes rclone to start slowly.

This is a known issue

Meanwhile I have some new insights. It looks like the RAM consumption is indeed not that bad. With VFS enabled the consumption still seems to be higher (~110MB) than without VFS (~70MB). Top just gave me the wrong impression, because I assumed I had to sum up the individual memory values of all the rclone processes.

My current feeling is that the VFS adds I/O activity, which is not good for the fairly weak CPU in my NAS. Looking at the CPU consumption, the biggest problem is "I/O waiting time".

I assume with VFS it is:
reading original -> [send to rclone webdav server] -> write to VFS -> read from VFS -> [send to OneDrive]

Without VFS I would expect less I/O, since some of the write and read activity is avoided. But as far as I have seen, there still seems to be some buffering on hard disk (spool).

At the moment I'm playing with --onedrive-chunk-size.
Synology Hybrid Drive anyhow creates 50MB chunks by itself.
So I tried setting --onedrive-chunk-size to match, hoping that no buffering on disk would then be required. But then I remembered this note: "Note that the chunks will be buffered into memory." That means that with the default 4 transfers, 200MB would be buffered into memory, which would lead to heavy swapping.

Right now I'm using a --onedrive-chunk-size of 5MB, which works quite well. I'm hoping that RAM consumption is reduced.
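The arithmetic behind that estimate can be sketched as below. It assumes, as I did above, 4 concurrent uploads with one chunk buffered per upload; actual memory use includes other buffers as well. The helper names are made up for this example. The second helper reflects the OneDrive backend's requirement that the chunk size be a multiple of 320 KiB.

```shell
# Rough upper bound on chunk buffer memory: concurrent uploads x chunk size.
# $1 = number of concurrent uploads, $2 = chunk size in MiB
chunk_mem_mib() {
    echo $(( $1 * $2 ))
}

# OneDrive chunk sizes must be a multiple of 320 KiB.
# $1 = chunk size in KiB; returns success if the size is valid.
is_valid_chunk_kib() {
    [ $(( $1 % 320 )) -eq 0 ]
}

echo "50 MiB chunks: $(chunk_mem_mib 4 50) MiB buffered"
echo "5 MiB chunks:  $(chunk_mem_mib 4 5) MiB buffered"
is_valid_chunk_kib $(( 5 * 1024 )) && echo "5 MiB is a valid chunk size"
```

So dropping from 50MB to 5MB chunks cuts the estimated chunk buffering from 200MB to 20MB under these assumptions.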

That sounds ok.

It does; it is storing stuff on disk and reading it off disk.