S3 + crypt settings to play nicely with lots of scanning (syncthing)

What is the problem you are having with rclone?

Scanning speed on a mounted drive (crypt + s3) through syncthing is insanely slow. My few folders with maybe 30-60 GB in total take literal days until syncthing is able to finish scanning, so I want to optimize that a bit.

Syncthing has to scan all files on startup, so it's a lot of opening of files, traversing directories and repeating. A very bad setup for a remotely mounted block storage like s3.

I am trying to tweak the syncthing side as well of course, but because I'm using rclone to mount the drive, I wanted to see what I can do to help syncthing out a bit.

Run the command 'rclone version' and share the full output of the command.

1.59.2

Which cloud storage system are you using? (eg Google Drive)

Crypt + s3

The rclone config contents with secrets removed.

        [xxx]
        type = s3
        provider = Wasabi
        access_key_id = xxx
        secret_access_key = xxx
        endpoint = s3.ap-southeast-1.wasabisys.com
        acl = private
        bucket_acl = private

        [xxx]
        type = crypt
        remote = xxx/rclone-crypt
        password = xxx

There is a similar thread here: "Are there optimal Rclone settings to speed up the scanning of a drive with multiple small files?", and the config there adds these parameters that I wanted to try out:

   --allow-other \
   --buffer-size 256M \
   --dir-cache-time 72h \
   --drive-chunk-size 32M \
   --log-level INFO \
   --log-file /home/craftyclown/logs/rclone.log \
   --umask 002 \
   --vfs-read-chunk-size 128M \
   --vfs-read-chunk-size-limit off \
   --rc

That's for Google Drive.
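As a sketch only: `--drive-chunk-size` applies to the Google Drive backend and does nothing for S3, so an S3 + crypt adaptation of those flags might look like the following. The remote name `xxx-crypt`, mount point, and log path are placeholders, and the values are starting points to tune, not verified settings:

```shell
# Hypothetical mount tuned for heavy directory scanning on S3 + crypt.
# --poll-interval 0 disables change polling, which S3 doesn't support anyway;
# --dir-cache-time keeps directory listings cached so rescans avoid API calls.
rclone mount xxx-crypt: /mnt/sync \
    --allow-other \
    --dir-cache-time 72h \
    --poll-interval 0 \
    --vfs-cache-mode writes \
    --vfs-read-chunk-size 128M \
    --vfs-read-chunk-size-limit off \
    --buffer-size 64M \
    --log-level INFO \
    --log-file /path/to/rclone.log
```

The key trade-off is that a long `--dir-cache-time` speeds up scans but delays visibility of changes made outside the mount.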

Any reason you changed this from the default?

S3 is a non-polling remote I believe, so any changes made outside the mount won't be picked up until the 72h dir-cache-time expires.
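One way to work around that, as a sketch (it assumes the mount was started with `--rc` enabled, which is one of the flags in the list above), is to refresh the directory cache on demand through the remote control API instead of waiting for expiry:

```shell
# Assumes a running mount started with --rc on the default port.
# Recursively re-reads the directory tree into the VFS dir cache,
# picking up changes made outside the mount without waiting for expiry.
rclone rc vfs/refresh recursive=true
```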

Can you start with whatever your default mount is and run through it and share a log as that might shed some light. Also, what numbers are you talking about as lots can mean very many things.

I haven't actually used those settings yet, just found them in the linked post as potential things to try. Currently I don't use any additional settings and just have it at the defaults.

S3 is a non-polling remote I believe, so any changes made outside the mount won't be picked up until the 72h dir-cache-time expires.

Ah I see, I didn't know that mattered in rclone. Thanks for clarifying!

Can you start with whatever your default mount is and run through it and share a log as that might shed some light. Also, what numbers are you talking about as lots can mean very many things.

Sure, I can try that. Currently I have a few folders synced through syncthing, about ~60 GB total. Not much, but scanning the entire thing takes anywhere between 4 and 12 hours. Syncthing scanning is already not the fastest, and I guess combining it with rclone to remotely mount S3 is a very bad match.

Are there other syncthing users here? Would be curious how you set up rclone

hi,
what is the total amount of data that needs to be synced?

The amount of data isn't so big, just those 30-60 GB I mentioned, but I am using syncthing on top of rclone. It's a P2P syncing solution, so all machines in the syncthing network can connect to each other and share files with their peers.

I'm using rclone on the server side to have syncthing put a copy of those files into S3 (Wasabi), so syncthing is set to use the mounted rclone S3 volume as its folder backend. That means on restart, syncthing needs to rescan all the files on that storage to figure out if there have been any changes to the blocks, and to validate the store. That's the part that's painfully slow currently.

I asked over at the syncthing forum as well, and maybe mounted block storage is just not a good fit for syncthing. I have 'tweaking rclone' on my list of todos for the weekend to see if I can help speed things up a bit

Hi,

I suggest you first try/test with a small subfolder to get the basic settings right, and then slowly scale up from there.

One thing you may need to add is --vfs-cache-mode=writes (or full) according to this post:
https://forum.syncthing.net/t/issues-with-permissions-using-with-rclone-mount/15338


in case --vfs-cache-mode does not help, some suggestions:

--- can the two syncthing instances connect to each other directly?
that way each copy of syncthing scans its own local files into its own local database, and syncthing compares the two databases.

--- connect local syncthing to a hetzner storage box, as it offers smb/samba access, sort of a network drive.
and not use rclone at all.

note: wasabi has a 90/180 day retention policy,
so if you upload a 1GiB file today and delete it tomorrow, wasabi will pro-rata charge you for the full 90/180 days.
those costs could add up.

fwiw, instead of syncthing, you could use something like rclone sync --backup-dir

rclone sync /path/to/source remote:full --backup-dir=remote:incrementals/`date +%Y%m%d.%I%M%S` -vv

Thanks for the thoughts!

I looked into the config parameters of the CSI provider I'm using for rclone, and sadly it already has vfs-cache-mode set to writes by default - csi-rclone/nodeserver.go at ccf9a8af2a19487cb9caaa36aa6efa56976b1656 · wunderio/csi-rclone · GitHub

I'm using digitalocean and wasabi as cloud storage, which are supposedly directly connected for low latency, so it shouldn't be too different from something like hetzner connected to an SMB drive. Although of course being in the same data center probably gives better performance... it's worth giving it a shot

Sadly most block storage is very expensive, so outside of wasabi there aren't many options with large storage capacity at relatively competitive pricing

so you are running syncthing inside a vm at digital ocean, trying to sync with files in wasabi?

in your case, i would try --vfs-cache-mode=full, as the initial scan is reading all the data in the files in wasabi.
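To illustrate the suggestion: with full mode, file data read during a scan is kept in a local on-disk cache, so repeat scans can be served locally instead of re-fetching from Wasabi. A minimal sketch, where the remote name, mount point, and cache size/age values are placeholders to tune for your disk:

```shell
# Sketch: full VFS caching so file data read during a scan persists locally.
# --vfs-cache-max-size caps the local cache; --vfs-cache-max-age evicts
# entries untouched for that long; values here are arbitrary placeholders.
rclone mount xxx-crypt: /mnt/sync \
    --vfs-cache-mode full \
    --vfs-cache-max-size 50G \
    --vfs-cache-max-age 168h \
    --dir-cache-time 72h
```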

as for wasabi, which i am a heavy user of, in my testing nothing has lower-latency api calls.
i would not consider wasabi to be block storage, it is object storage.

i run the cheapest vm from hetzner connected to a hetzner storagebox, inside the same data center.
you should check out the pricing for the storagebox.

Just to update - I've made some changes that sped things up a bit, mainly moving to a different hosting provider. The connection from digitalocean to wasabi, even within the same country (singapore), wasn't that great. I'm getting much better performance now, so latency was probably an issue as well

More importantly, I've stopped using wasabi through rclone as the main storage volume directly. The match with syncthing, which relies heavily on relatively fast file lookups on disk, wasn't good. Maybe some other solution like resilio would have been better? Not sure

Instead, while my storage usage allows it, I ended up using a provider that offers HDD storage cheaper than SSD storage. Of course not as cheap as wasabi, but oh well; at least for now the issues are gone. I'll revisit when it becomes too expensive, but for now I'll sync a backup of my data into wasabi with rclone once a week.

I still have wasabi mounted through rclone for smaller folders that don't have something like syncthing scanning them (like webdav), and I'm happy with that currently.

The Hetzner stuff looks very nice as well, but I live far away from the datacenter locations so latency and speed isn't that great :innocent:
