I’ve got a folder with a substantial number of files (upwards of 2.8M, approx 150GB).
It has two subdirectories: one holds approx 1.8M files and the rest are in the other. Inside each of those there are approx 13,000 folders.
In order to sync it in manageable chunks (each team drive caps out at approx 390k files), I’ve split the sync into sections using --include, each section going to a separate team drive and covering around 2,000 folders per sync operation.
I’m running into 3 issues with this. The only one that I think is a bug is #2; the rest are configuration related (e.g. there’s probably a better-suited command for what I’m doing).
- A standard sync transfers properly but does not actually finish. I suspect it’s scanning the remaining 1.7M files in the same subdirectory instead of just the 94k I asked it to: the initial scan only took 15 minutes, but it’s now past the 2 hour mark with no change. Is there a better way to make it only look at a section of folders than using --include? (One idea is sketched after the stats below.)
Transferred: 21.081M / 21.081 MBytes, 100%, 6.436 kBytes/s, ETA 0s
Checks: 93774 / 93774, 100%
Transferred: 248 / 248, 100%
Elapsed time: 2h6m23.1s
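
To expand on the idea mentioned above: one alternative I can think of is pointing each sync at a single Q-folder so rclone never has to walk the rest of Subdirectory1. Rough, untested sketch below; it assumes Q14000 through Q15999 all actually exist, and inside a .bat file the %i would need to be %%i:

REM sketch: one sync per top-level Q-folder instead of one big filtered sync
for /L %i in (14000,1,15999) do rclone sync "E:\MainDirectory\Subdirectory1\Q%i" "Drive1_14-16k:Subdirectory1/Q%i" --bwlimit 30k -P --transfers 20

The obvious downside is ~2,000 separate rclone invocations per chunk, and a Q-folder deleted locally wouldn’t get removed from the drive.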
Using --fast-list vastly improves the time it takes to do the initial directory scan of the 94k files (3 minutes instead of 15), but once the scan is done it just starts uploading, even if the file is already on the drive.
It doesn’t “check” any of the files against each other before doing so.
Performance. I’m getting nowhere close to the 2-3 transfers per second I would expect to cap out at. Bandwidth-wise I’m limited to around 120K/s peak, so I limited each sync operation to 30K to make it somewhat manageable. Any time a large file hits the queue it obviously slows the transfers per second down until it completes, but the vast majority of these files are tiny, under 50 bytes.
The best setup I’ve gotten so far is 3 sync operations running at once: the first on a folder with almost exclusively 500K+ files at the default 4 transfers at a time, the other two on the smaller files with 20 transfers each (roughly the setup sketched below). They go in bursts: I’ll see all 20 files in the queue at 0%, then 60 seconds later all 20 swap to 100% at once and finish within 1-2 seconds, then the other sync operation does the same thing. Overall I was able to transfer 400k files in just over a week, which works out to only 0.6 files per second, nowhere close to the expected rate limit on Google Drive’s end. Lowering the active transfer count, or running only a single sync operation without --bwlimit, was proportionally slower. I’m not getting any rate limit or throttle warnings currently, so I suspect this is purely on my end, not Google’s. Is my overall bandwidth too low to make hitting 3-4 tps feasible?
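
For concreteness, the three-at-once setup looks roughly like this, each sync in its own cmd window. Only the Drive1_14-16k line is the real command shown further down; BigFileFolder, Subdirectory2 and the DriveBig / DriveSmall2 remotes are placeholder names, not my real ones:

REM rough sketch of the 3 concurrent syncs (big-file sync left at the default 4 transfers)
start "big files"  rclone sync E:\MainDirectory\Subdirectory1\BigFileFolder DriveBig:BigFileFolder --bwlimit 30k -P
start "small set 1" rclone sync E:\MainDirectory\Subdirectory1 Drive1_14-16k:Subdirectory1 --include "Q1[45][0-9][0-9][0-9]/**" --bwlimit 30k -P --transfers 20
start "small set 2" rclone sync E:\MainDirectory\Subdirectory2 DriveSmall2:Subdirectory2 --bwlimit 30k -P --transfers 20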
Here’s an example of the command I’m using. For problem #2, the only change is adding --fast-list to the end.
rclone sync E:\MainDirectory\Subdirectory1 Drive1_14-16k:Subdirectory1 --include "Q1[45][0-9][0-9][0-9]/**" --bwlimit 30k -P --transfers 20
The intent of the --include line is that I want it to upload everything from Q14000 to Q15999 in this sync. I have another one for Q16000 to Q17999 and so on. Due to 3rd party/proprietary software I can’t easily mess with the existing folder structure to split them out at this time.
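
In other words, the bracket ranges map to the chunks roughly like this; the Drive2_16-18k remote name is just an example of the next team drive in the sequence, not the real one:

REM Q14000-Q15999 -> first team drive
rclone sync E:\MainDirectory\Subdirectory1 Drive1_14-16k:Subdirectory1 --include "Q1[45][0-9][0-9][0-9]/**" --bwlimit 30k -P --transfers 20
REM Q16000-Q17999 -> next team drive (placeholder remote name)
rclone sync E:\MainDirectory\Subdirectory1 Drive2_16-18k:Subdirectory1 --include "Q1[67][0-9][0-9][0-9]/**" --bwlimit 30k -P --transfers 20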