I have quite a particular use case that is causing high memory usage in RClone (over 1.7 GiB of RAM) using mount with a Google Drive remote.
The use case is the following:
A program can either fetch a file and send it to the client, or receive a file from the client.
If the client needs the file, the path is known and the file is fetched from the mountpoint and sent to the user.
If the user makes an upload, a temporary file is created, preallocated, filled with the data and then moved to its final place - everything on the mountpoint.
Files are about 2-4 MiB in size
Mounted unit currently has ~600 GiB of files of that kind.
The issues I'm having are:
Slow path access due to large folders that need to be listed from the remote - already fixed using --fast-list and a periodic vfs/refresh rc command, especially right after the unit has been mounted.
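For reference, a sketch of that periodic refresh, assuming the mount was started with --rc enabled (the directory name is just an example):

```shell
# Pre-warm the whole directory cache after mounting
# (requires the mount to be running with --rc enabled):
rclone rc vfs/refresh recursive=true

# Or refresh only a subtree to limit how much ends up in memory:
rclone rc vfs/refresh dir=storage recursive=true
```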
Slow IO on writing - probably because of locks during the prealloc - fill - move operation. I have solved this with a local cache using mergerfs and an every-minute rclone move.
At the beginning I was pointing the server directly to the RClone unit, but that caused slowness on uploads and lots of IO-wait processes.
To speed up this part, I'm doing the rclone move to the mountpoint, so I can make use of the directory cache and speed up the move a lot (from 200 KiB/s moving directly to the remote vs 2 MiB/s when moving through the FUSE mount).
Excessive memory usage - after mounting the unit and refreshing the directory cache, the RClone process takes around 400 MiB of RAM. But once the file dancing starts, it easily hits 1.8 GiB. Also, stopping all related programs and processes - leaving the mountpoint with no traffic - doesn't help, as the memory is not returned.
Can I tweak the mount command to avoid that excessive memory usage?
$ find . -type d | wc -l
10408
$ find . -type f | wc -l
372690
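For context, the kind of mount invocation in question looks roughly like this (remote name, paths and values are illustrative, not my exact command); the flags most relevant to memory are --buffer-size, which is allocated per open file, and --dir-cache-time, which controls how long directory entries stay in RAM:

```shell
# Illustrative mount command (remote name and paths are examples):
#   --dir-cache-time  : how long directory entries stay cached in RAM
#   --buffer-size     : per-open-file read buffer; lowering it can cap
#                       memory when many files are open at once
#   --vfs-cache-mode  : "writes" buffers uploads on local disk
#   --rc              : enables the remote control API for vfs/refresh
rclone mount gdrive: /mnt/remote \
  --dir-cache-time 72h \
  --buffer-size 16M \
  --vfs-cache-mode writes \
  --rc
```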
However, if I start the mount and force a vfs/refresh rc command to populate the cache, it takes around 800 MiB. It's when it starts handling file uploads that it grows a lot.
Right now, with the service stopped and ~5 minutes of inactivity:
Update: just restarted the mount and refreshed the directory cache:
And after starting to move files, it went up to 1.4 GiB again.
To make use of the directory cache already in memory on the mountpoint. There are lots of folders and files, so it increases the throughput a lot when moving files.
The command used for moving is:
if mountpoint -q /mnt/remote
then
    rclone move "/mnt/local/" "/mnt/remote/" --exclude storage/temp/** --multi-thread-streams=5 --log-level=INFO
    echo "===================="
    echo "FINISHED SYNC CYCLE"
    echo "===================="
fi
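The script is driven by a scheduler; a minimal cron entry, assuming the script above is saved as /usr/local/bin/move-to-remote.sh (the script name and log path are hypothetical):

```shell
# Run the move cycle every 2 minutes; flock -n skips a run instead of
# overlapping if the previous cycle is still transferring files.
*/2 * * * * flock -n /tmp/rclone-move.lock /usr/local/bin/move-to-remote.sh >> /var/log/rclone-move.log 2>&1
```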
File reads are done in the local cache (if the file is there) or in the remote
File deletions are made where the file is located
The temporary files (the place where the file is preallocated and filled) never hit the remote
Every 2 minutes new files in local cache are transferred to the remote, making use of already filled directory cache to speed up the transfers (lots of small files)
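To make the layering concrete, a sketch of the mergerfs setup described above (mount paths and options are illustrative, not my exact configuration):

```shell
# Local writable branch first, rclone mount second. The "ff"
# (first found) create policy makes new files land on the local
# disk, while reads fall through to the remote branch when the
# file is not present locally.
mergerfs -o category.create=ff,cache.files=partial \
  /mnt/local:/mnt/remote /mnt/storage
```

The program then only ever sees /mnt/storage; the rclone move script drains /mnt/local into the remote behind the scenes.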
I know it's a very specific use case, but I cannot understand why the memory grows so badly (and is not released) when files start to be uploaded.
So the x2 increase in memory could be related to making the move directly to the mountpoint?
Because after doing the vfs/refresh (recursive=true), the memory usage is lower than when we started moving files.
Also, the memory doesn't decrease once the move is over. Is that the expected behaviour? Is there any other cache not related to the VFS directory cache?
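One likely factor (an assumption on my part, since rclone is a Go program): the Go runtime returns freed memory to the OS lazily, so a high resident size after traffic stops does not necessarily mean a leak. A more aggressive garbage collector can be requested through the GOGC environment variable:

```shell
# GOGC defaults to 100; lower values trigger garbage collection more
# often, trading CPU for a smaller heap. Illustrative invocation:
GOGC=20 rclone mount gdrive: /mnt/remote --dir-cache-time 72h
```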
Hmmmm... checking the issue, I got to the associated forum thread, and I will try to reduce the directory cache time, as it looks like it may be interesting to find a middle ground between performance and memory usage.
Regarding the throughput of moving through the mountpoint and doing it directly to the remote with --fast-list, in my particular case there are huge differences.
At the moment of writing I have just tested moving the same amount of files in both ways:
Moving through the mountpoint: files transferred in less than a minute, at ~2MiB/s
Moving them directly to the remote: more than 7 minutes just listing the directories, not even one file copied yet
It could work with a larger local cache and a less frequent sync (every hour or so), because without the already-cached dir entries the startup time of the move is too long for quick updates like before (I was doing syncs every 2 minutes).
Will investigate the dir cache expire time and will post results!
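Concretely, the knob I will experiment with (the values are starting guesses to be tuned, not tested recommendations):

```shell
# Shorter cache lifetime: directory entries are dropped from RAM
# sooner, at the cost of re-listing folders from the remote more
# often. --poll-interval picks up remote-side changes for Drive.
rclone mount gdrive: /mnt/remote --dir-cache-time 1h --poll-interval 1m
```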
You aren't writing to the remote though, you are just writing locally as you are using cache mode writes, so you can't compare. It still has to upload them to the remote.