Syncing many files (5M+)

What are good ways to sync many files (5M+) that are all within the same folder? Unfortunately, I cannot change the folder structure.

I am using a Hetzner Storage Box, but unfortunately the sync takes over an hour with little activity. It works great if I wait longer, or when transferring 1-3M files (same folder).

Command that I use:

taskset -c 20-31 timeout -v -k 30s "${timeout}" \
    rclone sync "server/${world}/" "bak:${world}/" \
      --retries "${retries}" \
      --retries-sleep 5s \
      --quiet \
      --stats-log-level ERROR \
      --stats=1h \
      --transfers=9 --checkers=1 --ignore-checksum

rclone config redacted

type = sftp
host = XXX
user = XXX
port = 23
pass = XXX
use_insecure_cipher = true
shell_type = unix
md5sum_command = md5 -r
sha1sum_command = sha1 -r

rclone version

rclone v1.65.2
- os/version: debian 12.5 (64 bit)
- os/kernel: 6.5.0-0.deb12.4-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.6
- go/linking: static
- go/tags: none

when you posted, there was a template of questions???


what info are you missing?


when you posted, there was a template of questions for you to answer.
please provide all the answers.

you are being super passive aggressive :) - just curious, what info are you missing?

sorry you think that, i am just a volunteer.


i see that you are updating your first post.
the remote is using sftp, a very old, slow protocol.

to reduce the number of checks, can use filters. for example, --max-age=3d
with storagebox, can use webdav. that might be quicker, but there are downsides.
sftp supports checksums; webdav does not.

yet another option, that i do, is run rclone inside a hetzner vm, in the same datacenter as the storagebox.


thanks, --max-age of 3d is worth a shot since I am doing daily syncs - might even sync more often.

I am not so sure about switching to webdav - I am already maxing out the NIC - but I might give it a try as well.

--ignore-checksum and use_insecure_cipher = true
just curious, why use that?

for the initial transfer, could use rclone copy and some flags to greatly reduce the number of checks.
as latency is the main issue with sftp.
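
a minimal sketch of such an initial copy (these specific flags are my suggestion, not from the thread - adjust to taste):

```sh
# one-time initial copy to an empty destination:
# --no-check-dest skips looking at the destination entirely,
# which removes almost all per-file round-trips over sftp
rclone copy "server/${world}/" "bak:${world}/" \
  --no-check-dest \
  --transfers=9
```

--no-check-dest is only safe when the destination is empty or you do not mind re-uploading existing files.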

I run the command twice with slightly different params, also because there is a max connection limit with hetzner.

First run is without checksum checks, second run is with checksums and more checker threads.

I enable the insecure cipher for speed improvements during transfer. The issue is not bandwidth or data-transfer rates, but that there is no activity for a long time.
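
a hedged sketch of the two-pass scheme described above (the exact flags of the second pass are my assumption, not the OP's actual script):

```sh
# pass 1: bulk transfer, skip per-file checksums to avoid the
# on-demand md5 latency of the storage box
rclone sync "server/${world}/" "bak:${world}/" \
  --transfers=9 --checkers=1 --ignore-checksum

# pass 2: integrity pass, compare checksums with more checker threads
rclone sync "server/${world}/" "bak:${world}/" \
  --checksum --checkers=16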

This is rclone listing that directory with 5 million files in it. It is probably the directory listing that takes that time.

It might be that different protocols are faster - the storage box supports lots of them (webdav/ftp/smb), so it would be worth trying those. However, I suspect it might be the storage box itself.
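
for reference, a Storage Box webdav remote might be configured roughly like this (url and credentials are placeholders - check Hetzner's docs for the exact endpoint for your account):

```ini
[bak-webdav]
type = webdav
url = https://XXX.your-storagebox.de
vendor = other
user = XXX
pass = XXX
```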


Is the max-age working correctly? It seems strange that a data transfer of 30 GB is slower than running md5 against 3.5M files:

Wed Mar 27 07:51:43 AM UTC 2024 - [world_nether] starting bulk backup...
2024/03/27 08:43:51 ERROR :
Transferred: 36.466 GiB / 36.466 GiB, 100%, 58.777 KiB/s, ETA 0s
Checks: 89219 / 89219, 100%
Deleted: 10 (files), 0 (dirs)
Transferred: 17160 / 17160, 100%
Elapsed time: 52m8.6s

Wed Mar 27 08:43:51 AM UTC 2024 - [world_nether] starting integrity backup...
2024-03-27 08:46:21 [INFO]: 2m30s silence achieved after 2m30.000155168s
2024/03/27 09:04:55 ERROR :
Transferred: 2.083 GiB / 2.083 GiB, 100%, 1.140 KiB/s, ETA 0s
Checks: 3529532 / 3529532, 100%
Transferred: 1131 / 1131, 100%
Elapsed time: 18m33.7s

hmm, maybe it is indeed worthwhile to switch protocols - or IO is the limit (not throughput)

Rclone still has to read the entire directory before filtering it by age. Of all the backends, only Google Drive can filter the listing by age on the server.

it should work well, assuming the source is local, "server/${world}/"

an easy way to test filters is to list the files affected by the filter.
rclone ls "server/${world}/" --max-age=3d -vv

and a safe way to test sync commands with --dry-run
rclone sync "server/${world}/" "bak:${world}/" --max-age=3d --dry-run -vv

Adding --no-traverse will stop rclone trying to list the 5M files on the destination with --max-age=3d.
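
combined, a cautious version of that sync might look like this (sketch only - keep --dry-run until the output looks right):

```sh
# sync only files changed in the last 3 days; --no-traverse avoids
# listing all 5M destination files; --dry-run previews the result
rclone sync "server/${world}/" "bak:${world}/" \
  --max-age=3d --no-traverse --dry-run -vv
```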

--no-traverse does not work with sync? And 99% of my transfers are sync-type.

Switching to webdav immensely lowered processing time:

Thu Mar 28 07:49:18 AM UTC 2024 - [world_nether] starting bulk backup...
2024/03/28 08:17:25 ERROR :
Transferred: 62.621 GiB / 62.621 GiB, 100%, 11 B/s, ETA 0s
Checks: 74228 / 74228, 100%
Transferred: 27636 / 27636, 100%
Elapsed time: 28m6.8s

Thu Mar 28 08:17:25 AM UTC 2024 - [world_nether] starting integrity backup...
2024-03-28 08:19:55 [INFO]: 2m30s silence achieved after 2m30.000140809s
timeout: sending signal TERM to command ‘rclone’
Thu Mar 28 08:39:55 AM UTC 2024 - [world_nether] integrity backup failed
Thu Mar 28 08:39:55 AM UTC 2024 - [world_nether] all backups done

However, now the secondary checksum-based run failed on a timeout. My guess is it's really the "ls" - sometimes cached on the host and sometimes not - that causes the longer times.

Unfortunately I need to do checksum check, so will look for alternatives.

--no-traverse should work with sync, but I thought you were using copy. It doesn't normally make sense to use --max-age with sync.

If you want to check integrity you could use rclone check --download and that will give you 100% assurance on any backend. Since you aren't going over the Internet to the storage box that is probably quite a reasonable thing to do.
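
a sketch of that check (it downloads every remote file for comparison, so it is only practical on a fast local link):

```sh
# compare file contents byte-for-byte by downloading from the remote;
# works on any backend, even ones without checksum support
rclone check "server/${world}/" "bak:${world}/" --download
```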

You could also try the smb backend - I think that works with the storage box too.

hi, the OP needs checksums, which the smb backend lacks.
indeed, storagebox does support smb, though not sure it is a good idea to expose samba over the internet.

@Netherwhal, here is my how-to guide about using smb with rclone

I'm actually thinking I might have outgrown Hetzner. The fact that md5 hashes are calculated on demand is not good for performance, nor is the fact that a file listing (with either webdav or sftp) is done on demand each time.

SeaweedFS is the only solution I've found that stores metadata, checksums and directory listings in a db, making remote ls/lookups fast.

So I will try that out for now.

Or maybe pay just a little more per TB and use B2 or Wasabi?

You could also set up MinIO, which works very well.