Google unencrypted copy to local mergerfs pauses at 100% a lot?

What is the problem you are having with rclone?

An rclone copy job from an unencrypted Google Drive remote to local ext4 HDDs pooled with mergerfs on Ubuntu "pauses" at 100% a lot.

I'm saturating gigabit downloading until files hit 100%, then I presume some checking process comparing gdrive with local files runs, but it takes so long that it slows the whole big copy job. It would be preferable to me if the copy job saturated my gigabit consistently, so I can get this big copy job over sooner rather than later.

Example: runs at 100+MB/s until files start hitting 100%, then:

Elapsed time:    1d33m2.2s
Transferring:
 * file1:100% /9.475Gi, 0/s, 0s
 * file2:100% /9.241Gi, 0/s, 0s
 * file3:100% /9.442Gi, 0/s, 0s
 * file4:100% /7.980Gi, 0/s, 0s
 * file5:100% /7.864Gi, 0/s, 0s
 * file6:100% /7.890Gi, 0/s, 0s
 * file7:100% /8.137Gi, 0/s, 0s

Once it has finished checking the files, which takes a good while, and most of them are done, it spins back up to 100+MB/s with 8 new active transfers.

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.2

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 6.2.0-36-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.3
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy tdrive: /mnt/localhddsunionfs/ --checkers=16 --drive-chunk-size=128M --low-level-retries=2 --retries=1 --stats=60s --transfers=8 -v

I will gladly take advice here on what I should change. More transfers? Disable checking? Or some other flag?

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[tdrive]
client_id = XXX
client_secret = XXX
type = drive
token = XXX
team_drive = XXX

A log from the command that you were trying to run with the -vv flag

I can't present this log, at this time.

More isn’t generally better. If you are IO bound, adding more transfers and checkers is going to make it far worse.

Use defaults and see how that works.

Thank you. I believe the rclone defaults are transfers=4 and checkers=8?

It seems to me to be checkers that are slowing the copy. Should I use a very low number like 1 perhaps?

The best approach is to test and find the sweet spot; no single setting is good for everyone. It depends on too many local factors.

After some testing: if I add the "--no-check-dest" flag, my transfer stays at 1 gigabit consistently with no drop in speed, as expected, because no checking is done.

However, is there a better way to do the checking that my mechanical HDD mergerfs file system can cope with?

The server is otherwise powerful: the CPU scores over 18000 on PassMark and more than 20 GB of RAM is available.

Hm, I said this a little too early. It seems even with --no-check-dest flag, files are hanging at 100% for a while before it moves on. Puzzling.

Edit: It's possible the other larger copy job is interfering with my tests and hanging up my --no-check transfer. When it eventually finishes I'll do some more substantial tests.

After a successful file transfer, the file's checksum is checked to ensure that there were no errors. For a local filesystem this means the whole file has to be read back and its checksum calculated. This can result in noticeable "pauses" when your disk is slow and/or the files are big.
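
To get a feel for how long that read-back can take, here is a rough back-of-the-envelope sketch. The function name `checksum_seconds` and the read speeds are assumptions for illustration, not measurements from this system; the file size is taken from the stats output earlier in the thread.

```shell
#!/bin/sh
# Estimate how long a post-copy checksum takes, given that the whole
# file must be read back from disk to hash it.
# read_mbps is an ASSUMED sequential read speed, not a measured one.
checksum_seconds() {
    size_gib=$1      # file size in GiB
    read_mbps=$2     # assumed disk read throughput in MB/s (10^6 bytes)
    awk -v g="$size_gib" -v r="$read_mbps" \
        'BEGIN { printf "%.0f\n", g * 1073741824 / (r * 1000000) }'
}

# One 9.475 GiB file read alone at an assumed 150 MB/s
checksum_seconds 9.475 150      # prints 68

# The same file when 8 files share one disk, assuming the
# throughput splits evenly (150 / 8 = 18.75 MB/s each)
checksum_seconds 9.475 18.75    # prints 543
```

On a mechanical disk the real figure is likely worse than an even split, because concurrent reads and writes add seek overhead.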

If you do not care about checking, use the flag below:

      --ignore-checksum   Skip post copy check of checksums

The less safety you apply, the faster things become, at the cost of an increased risk of undetected errors. The choice is yours.
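
One possible middle ground, sketched here as a suggestion rather than something tested in this thread, is to skip the check during the copy and verify in a separate pass with `rclone check` once the disks are no longer busy writing. The `run` helper is hypothetical and only prints the commands unless `RUN=1` is set; the remote and path are the ones from the original command.

```shell
#!/bin/sh
# Hypothetical helper: prints each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

SRC="tdrive:"                    # remote name from the thread
DST="/mnt/localhddsunionfs/"     # destination from the thread

# Pass 1: copy without the post-copy hash check.
run rclone copy "$SRC" "$DST" --ignore-checksum --stats=60s -v

# Pass 2: compare sizes and hashes afterwards. --one-way only checks
# that every source file is present and matching on the destination.
run rclone check "$SRC" "$DST" --one-way -v
```

The second pass still has to read every file from the HDDs, so the total I/O is not reduced; it only stops the hashing from competing with the downloads.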

Thank you. Aren't --ignore-checksum & --no-check-dest essentially doing the same thing?

I realise checking is quite useful when moving vast amounts of data; it would be annoying if many files got corrupted without my knowing about it. The transfer as it is runs at an average of 40MB/s WITH checking, so that's not terrible.

But good for me to know the options.

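No, they are not the same: --no-check-dest skips looking at the destination before copying, while --ignore-checksum skips the hash check after each copy. You can read all the details in the docs, which is always advisable, as using random flags will lead to random results.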

How right you are, I assume it's still checking with that flag. Could --check-first help me out in this case? I guess not, because the destination directory is going to be empty when the copy job starts.

The files aren't going to be there for long either, because they will be uploaded to a crypt Google remote. I'm not doing it directly because I need to encrypt, and a script cycles through several accounts to upload more than the 750GB-a-day limit.

Help with what? Have you read what this flag is doing?

If you are not sure and this is a one-off job, it is much safer to rely on the default settings. Does it really matter if it takes a bit longer?

Yes, it does the checking before the transfer begins. However there are no files in the destination before it begins, so I guess my assumption is it would not do much at all.

This is just 1 copy batch of many, so it's useful for me to know how best to optimize it. Thank you for your assistance.

--check-first runs all the checks to work out what has to be transferred before transferring anything. It can be useful if you want a more precise ETA up front or when checking impacts transfers (on some I/O-limited systems).

It has no impact on actual transfers and hash checking later.

Okay, thanks for clarifying. I think this flag you've provided me with

--ignore-checksum   Skip post copy check of checksums

Is really the only one that will impact my copy jobs considerably.

If I set --checkers=1, it would still do the checksum on every single file, right? And possibly slower with only 1 checker, am I understanding that right?

Yes, unless you specify --ignore-checksum. Checkers work out what needs to be transferred, not whether a transfer completed successfully.

Your bottleneck is your HDD - not rclone.

As for the optimal number of "checkers", it depends on your local system. The only way to find out is to test.

I am well aware that rclone is a fantastic tool and a champ here, and that the HDD is bottlenecking me. I am going to be conducting a lot of tests.
I know for a fact that the create policy for my mergerfs is mspmfs and 1 disk is larger than the 3 others, which means it uses only 1 of the 4 disks the majority of the time. I'm going to change it so the data gets distributed over the 4 HDDs, which should be very helpful for me.
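
The single-disk effect can be illustrated with a toy model of a "most free space" style create policy. This is only a sketch: the free-space figures are hypothetical, the path-preserving part of mspmfs is ignored, and the thread does not say which policy was chosen as the replacement, so the second call simply shows balanced free space for contrast.

```shell
#!/bin/sh
# Toy model of a "most free space" create policy: each new file goes to
# the branch that currently has the most free space (ties go to the
# first such branch). All free-space figures are HYPOTHETICAL.
# Usage: simulate_mfs FILE_GIB NFILES FREE1 FREE2 ...
# Prints how many files landed on each branch.
simulate_mfs() {
    file_gib=$1; nfiles=$2; shift 2
    echo "$@" | awk -v f="$file_gib" -v n="$nfiles" '{
        for (i = 1; i <= n; i++) {
            best = 1
            for (d = 2; d <= NF; d++) if ($d > $best) best = d
            $best -= f          # the chosen branch loses that much free space
            count[best]++
        }
        for (d = 1; d <= NF; d++) printf "%d%s", count[d], (d < NF ? " " : "\n")
    }'
}

# One larger disk (8000 GiB free) and three smaller ones (4000 GiB free):
simulate_mfs 9 8 8000 4000 4000 4000    # prints 8 0 0 0

# Four disks with equal free space, for contrast:
simulate_mfs 9 8 4000 4000 4000 4000    # prints 2 2 2 2
```

In the first case all 8 concurrent transfers, and their read-back for checksumming, hit the same spindle, which matches the stalls described at the start of the thread.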

Okay with 4 mechanical HDDs in play this is running much much better! Thanks for assistance y'all.