Windows Rclone Script Recommendations?

Hi All,

I have a 25TB dataset made up of hundreds of thousands of files, where about 20% of the files by count make up 80% of the total disk usage. I’ve set up an encrypted Google Drive remote which works really well on the large files (completely saturating my 40Mbps uplink using default settings), yet the small files don’t come anywhere close. I’ve tried a number of different settings for the options below, but no combination has come close to saturating my link so far:

rclone.exe copy X:\Users\ GoogleCrypt:/ -P --buffer-size 8M --transfers 10 --drive-chunk-size 4M

Wondering if I am missing something or should be taking a different approach. Any suggestions?

I am currently using a Windows Server 2012 R2 virtual machine with 4GB RAM (16GB available on the host). Rather than reinvent the wheel, just wondering if there are any well-regarded community-developed scripts that can be scheduled to back up & verify?
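For reference, something along these lines is the rough shape I had in mind for a scheduled run. This is only a sketch: the paths, log location and the "GoogleCrypt:" remote name are placeholders, and rclone’s cryptcheck is what verifies checksums against an encrypted remote:

  # backup_and_verify.py - minimal sketch of a scheduled copy-and-verify run.
  # The paths and the "GoogleCrypt:" remote name are placeholders.
  import datetime
  import subprocess
  import sys

  SOURCE = r"X:\Users"       # local data to back up
  DEST = "GoogleCrypt:/"     # encrypted rclone remote
  LOG = r"C:\logs\rclone-{:%Y%m%d}.log".format(datetime.datetime.now())

  def rclone(*args):
      # Run an rclone subcommand and return its exit code.
      cmd = ["rclone.exe"] + list(args) + ["--log-file", LOG, "--log-level", "INFO"]
      return subprocess.call(cmd)

  # 1. Copy new/changed files to the remote.
  copy_rc = rclone("copy", SOURCE, DEST)

  # 2. Verify: cryptcheck compares checksums against an encrypted remote.
  check_rc = rclone("cryptcheck", SOURCE, DEST)

  # A non-zero exit code tells Task Scheduler the run failed.
  sys.exit(copy_rc or check_rc)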

You can only create about 3 files per second with Google, which makes small files perform really badly. At that rate, files would need to average roughly 1.7 MB each just to fill a 40 Mbps (about 5 MB/s) link.

Oh… bummer. Was trying to get my head around all the rclone parameters and was way off the mark. Was wondering why uploads seemed to shoot off like a rocket and then stall. The 3 files per second limit would explain it.

I guess about the only thing you could do would be to TAR / ZIP files up into batches prior to uploading. Not sure how you’d go about doing this in an automated way that still guarantees data integrity. Sure it could be done though.
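A rough sketch of what I mean, in case it helps someone. The batch size, staging path and remote here are all made up, and it’s untested; the SHA-256 manifest written alongside each archive is what would let you verify the data later:

  # batch_upload.py - rough sketch of the tar-batching idea (untested).
  # Paths, remote name and batch size below are made up for illustration.
  import hashlib
  import os
  import subprocess
  import tarfile

  SOURCE = r"X:\Users"             # where the small files live
  DEST = "GoogleCrypt:/archives"   # encrypted remote to receive the batches
  STAGING = r"C:\staging"          # local scratch space for the archives
  BATCH_SIZE = 1000                # files per tar; tune to taste

  def sha256(path):
      # Hash each file so the manifest can be used to verify integrity later.
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              h.update(chunk)
      return h.hexdigest()

  # Walk the tree once and collect every file path.
  files = [os.path.join(root, name)
           for root, _, names in os.walk(SOURCE) for name in names]

  for i in range(0, len(files), BATCH_SIZE):
      batch = files[i:i + BATCH_SIZE]
      archive = os.path.join(STAGING, "batch_{:05d}.tar".format(i // BATCH_SIZE))
      manifest = archive + ".sha256"
      with tarfile.open(archive, "w") as tar, open(manifest, "w") as m:
          for path in batch:
              tar.add(path)
              m.write("{}  {}\n".format(sha256(path), path))
      # One big object per batch instead of 1000 tiny ones sidesteps
      # the files-per-second limit.
      subprocess.call(["rclone.exe", "copy", archive, DEST])
      subprocess.call(["rclone.exe", "copy", manifest, DEST])
      os.remove(archive)
      os.remove(manifest)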

Or you could just wait… It shouldn’t take longer than 24 hours, and once you’ve got the files uploaded you won’t need to upload them again.

Fair point. I have about 340,000 files in 25,000 directories to back up; I’ll see how it goes over the next week or two.

I noticed that when I kicked off rclone to start copying all my data, the progress stats only list about 10,000 files (10TB in total). Would there be any reason it isn’t calculating based on all the files/data in the source path I specified?

I believe it only fills the queue so far. Once it hits the threshold, it waits for transfers to complete before checking further.

You can change this threshold:

  --max-backlog int           Maximum number of objects in sync or check backlog. (default 10000)

It is set to 10,000 so it doesn’t use too much memory.
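For example, you could raise it to cover the whole job so the stats count everything up front. The value below is arbitrary; a bigger backlog just means rclone holds more objects in memory:

rclone.exe copy X:\Users\ GoogleCrypt:/ -P --max-backlog 400000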

I use Wasabi for cloud storage and have 3TB there. My server is the free Windows Server 2019 Hyper-V edition.

I have my own Python script for backups.