Transferring Terabytes from Google Drive, could I optimize anything in my commands?

What is the problem you are having with rclone?

I need to transfer up to 15 terabytes of data from Google Drive, and need to know what the most efficient commands would be for it.

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.0
- os/version: darwin 12.5.1 (64 bit)
- os/kernel: 21.6.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.21.1
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

I have 30 folders in my Google Drive root directory. One has been downloaded so far. This is the command I am using to get the rest:

rclone copy -P --ignore-checksum gd:/ "/Volumes/20TB-HardDrive/" \
--exclude "/First Folder that was Downloaded Already/**"

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

type = drive
client_id = XXX
client_secret = XXX
scope = drive
token = XXX
team_drive =

A log from the command that you were trying to run with the -vv flag

2023/09/13 01:50:59 DEBUG : rclone: Version "v1.64.0" starting with parameters ["rclone" "copy" "-P" "local:testvid.mov" "." "-vv"]
2023/09/13 01:50:59 DEBUG : Creating backend with remote "local:testvid.mov"
2023/09/13 01:50:59 DEBUG : Using config file from "/Users/local/.config/rclone/rclone.conf"
2023/09/13 01:50:59 DEBUG : Google drive root 'testvid.mov': 'root_folder_id = 0AJj8kfw9dikcGJQUk9PVA' - save this in the config to speed up startup
2023/09/13 01:51:00 DEBUG : fs cache: adding new entry for parent of "local:testvid.mov", "local:"
2023/09/13 01:51:00 DEBUG : Creating backend with remote "."
2023/09/13 01:51:00 DEBUG : fs cache: renaming cache item "." to be canonical "/Users/local/Desktop"
2023-09-13 01:51:00 DEBUG : testvid.mov: Need to transfer - File not found at Destination

Because I have so many folders and terabytes to transfer, I am wondering whether the approach I have been using,

rclone copy -P --ignore-checksum gd:/ "/Volumes/20TB-HardDrive/" \
--exclude "/First Folder that was Downloaded Already/**"

is the most efficient, or whether there is any way for me to optimize things. Thanks!

You do not have to exclude already copied directories. They will be ignored regardless.
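For example (only a sketch, reusing the remote name and destination path from your first post), the original command can simply be re-run without the exclude:

# files already present at the destination are skipped automatically
rclone copy -P gd:/ "/Volumes/20TB-HardDrive/"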

Now what exactly is the problem you are facing?

What is your copy command speed? Much lower than your internet connection?

Thanks, I did not know that; that is great to know. The issue is that while my copy speed is often on par with my connection speed, around 56.476 MiB/s, there are times when it drops down to a few kilobytes per second.

This makes me wonder whether there are parallelization options, or a flag I am missing, that would help sustain high speeds across a large transfer. Thanks!

Difficult to say why it slows down sometimes without any log files etc.
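If you want to capture one the next time it slows down, something like this would produce a debug log to share (paths reused from your earlier command; -vv and --log-file are standard rclone options):

# show progress on screen and write full debug output to a file
rclone copy -P -vv --log-file rclone.log gd:/ "/Volumes/20TB-HardDrive/"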

What is possible is that the default settings work well for big files but not for small ones.

You can split the whole copy process into two steps; a concrete version with your paths is sketched below the list.

  1. Big files only

rclone copy src dst --min-size 1M

  2. Small files only

rclone copy src dst --max-size 1M --transfers 10
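Applied to the remote and destination from your first post, that might look like this (only a sketch; the 1M boundary and transfer count are the example values from above):

# pass 1: big files only
rclone copy -P gd:/ "/Volumes/20TB-HardDrive/" --min-size 1M

# pass 2: small files only, with more parallel transfers
rclone copy -P gd:/ "/Volumes/20TB-HardDrive/" --max-size 1M --transfers 10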

The --order-by flag can be helpful here too.

In particular, the mixed mode, where you assign 75% of the transfers to big files but make sure small files are always being transferred with the other 25%, as they take ages. E.g.

--order-by size,mixed,25

This should stop the slowdowns when it is only transferring small files and keep your network pipe full.
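Put together with the paths from your first post, a single-pass version might look like this (the transfer count is just an illustrative assumption to tune):

# keep most transfers on big files while 25% work through the small ones
# --transfers 8 is only an example value
rclone copy -P gd:/ "/Volumes/20TB-HardDrive/" --order-by size,mixed,25 --transfers 8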


I have a single .tar file on Google Drive that is 4 TB. I am trying to maximize the download speed and was wondering if either

--multi-thread-streams or --multi-thread-cutoff flags would make a difference.

For example, I thought I could try:

rclone copy -P --ignore-checksum \
--multi-thread-streams 8 --multi-thread-cutoff 250M \
--transfers 1 \
gd:/"FolderTo/4TBFile.tar" "/Volumes/Large Hard Drive/"

But I am not sure whether the above would make any difference for downloads. Thanks so very much.

Experiment with:

  --multi-thread-chunk-size SizeSuffix   Chunk size for multi-thread downloads / uploads, if not set by filesystem (default 64Mi)
  --multi-thread-cutoff SizeSuffix       Use multi-thread downloads for files above this size (default 256Mi)
  --multi-thread-streams int             Number of streams to use for multi-thread downloads (default 4)

if the defaults do not saturate your internet connection.

There is no single recommended set of values here; it all depends on your connection. Usually the defaults are good enough.

For a large file like the one you mentioned, I would try a larger chunk size, maybe even 1G...

Please note that it all requires RAM, so 8 streams with a 1G chunk size will use 8 GB of RAM.
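As a starting point, applied to the 4 TB file from your command above (the chunk size and stream count are just assumed values to experiment from):

# 8 streams x 512M chunks uses roughly 4 GB of RAM
rclone copy -P --transfers 1 \
--multi-thread-streams 8 --multi-thread-chunk-size 512M \
gd:/"FolderTo/4TBFile.tar" "/Volumes/Large Hard Drive/"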

But then again, only testing a few values will tell you where the sweet spot is.