How to speed up moving many small files between Google Drive and a shared drive (team drive)

What is the problem you are having with rclone?

I'm trying to move many small files from my Google Drive to my shared drive (team drive), but the speed seems slow.

More details:
I have 2 G Suite accounts and I'd like to transfer files from A to B. I used copy and transferred ~720GiB every day. Yesterday I learned that moving to a shared drive is not subject to this 750GiB daily quota. I created a shared drive with B and assigned A as a content manager. It used to take about 1 hour to copy 720GiB (6-7k files) from A to B; now it takes 2 hours and 40 minutes to move the same amount. I've tried many flags (listed in the command section below) but the speed is unchanged. Sure, I can move more than 750GiB of files now, but I still wonder why move is slower than copy and how I can improve it.

What is your rclone version (output from rclone version)

1.54.0

Which OS you are using and how many bits (eg Windows 7, 64 bit)

macOS Mojave 10.14.6

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

# the copy command I used before
rclone -P copy gdriveA:data/csv/ gdriveB:data/csv/ --drive-server-side-across-configs --fast-list --transfers 8
# move commands I tried with various flags
# 27080 files in 11h48min
rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --fast-list --transfers 8
# 5013 files in 2h0min
rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --fast-list --no-check-dest --retries 1 --no-traverse
# 7273 files in 2h41min
rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --fast-list --no-check-dest --retries 1 --no-traverse --checksum --check-first
# 6877 files in 2h42min
rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --fast-list --no-check-dest --retries 1 --no-traverse --checksum --check-first --transfers 20 --checkers 40

The rclone config contents with secrets removed.

[gdriveA]
type = drive
client_id = 
scope = drive
token = 
root_folder_id = 

[gdrive-team]
type = drive
client_id = 
scope = drive
token = 
team_drive = 

A log from the command with the -vv flag

# logs of move, 6735 files 719 GiB in 2h33min
2021/03/23 15:56:31 DEBUG : rclone: Version "v1.54.0" starting with parameters ["rclone" "-P" "move" "gdriveA:data/csv" "gdrive-team:data/csv" "--drive-server-side-across-configs" "--fast-list" "--no-check-dest" "--retries" "1" "--no-traverse" "--checksum" "--check-first" "-vv" "--log-file=/Users/me/Desktop/rclone-1.log"]
2021/03/23 15:56:31 DEBUG : Using config file from "/Users/me/.config/rclone/rclone.conf"
2021/03/23 15:56:31 DEBUG : Creating backend with remote "gdriveA:data/csv"
2021/03/23 15:56:38 DEBUG : Creating backend with remote "gdrive-team:data/csv"
2021/03/23 15:56:40 INFO  : Google drive root 'data/csv': Running all checks before starting transfers
2021/03/23 15:57:50 DEBUG : Google drive root 'data/csv': Waiting for checks to finish
2021/03/23 15:57:50 INFO  : Google drive root 'data/csv': Checks finished, now starting transfers
2021/03/23 15:57:50 DEBUG : Google drive root 'data/csv': Waiting for transfers to finish
2021/03/23 15:57:52 INFO  : a.csv.gz: Moved (server-side)
2021/03/23 15:57:54 INFO  : b.csv.gz: Moved (server-side)
2021/03/23 15:57:55 INFO  : c.csv.gz: Moved (server-side)
2021/03/23 15:57:57 INFO  : d.csv.gz: Moved (server-side)
2021/03/23 15:57:58 INFO  : e.csv.gz: Moved (server-side)

# logs of copy, 3682 files 409 GiB in 32min
2021/03/23 19:26:38 DEBUG : rclone: Version "v1.54.0" starting with parameters ["rclone" "-P" "copy" "gdriveA:data/csv" "gdriveB:data/csv" "--drive-server-side-across-configs" "--fast-list" "--no-check-dest" "--retries" "1" "--no-traverse" "--checksum" "--check-first" "-vv" "--log-file=/Users/me/Desktop/rclone-2.log"]
2021/03/23 19:26:38 DEBUG : Using config file from "/Users/me/.config/rclone/rclone.conf"
2021/03/23 19:26:38 DEBUG : Creating backend with remote "gdriveA:data/csv"
2021/03/23 19:26:48 DEBUG : Creating backend with remote "gdriveB:data/csv"
2021/03/23 19:26:54 INFO  : Google drive root 'data/csv': Running all checks before starting transfers
2021/03/23 19:27:36 DEBUG : Google drive root 'data/csv': Waiting for checks to finish
2021/03/23 19:27:36 INFO  : Google drive root 'data/csv': Checks finished, now starting transfers
2021/03/23 19:27:36 DEBUG : Google drive root 'data/csv': Waiting for transfers to finish
2021/03/23 19:27:42 DEBUG : a.csv.gz: MD5 = ef11 OK
2021/03/23 19:27:42 INFO  : a.csv.gz: Copied (server-side copy)
2021/03/23 19:27:43 DEBUG : b.csv.gz: MD5 = eb43 OK
2021/03/23 19:27:43 INFO  : b.csv.gz: Copied (server-side copy)
2021/03/23 19:27:43 DEBUG : c.csv.gz: MD5 = ff78 OK
2021/03/23 19:27:43 INFO  : c.csv.gz: Copied (server-side copy)
2021/03/23 19:27:45 DEBUG : d.csv.gz: MD5 = fcdf OK
2021/03/23 19:27:45 INFO  : d.csv.gz: Copied (server-side copy)
2021/03/23 19:27:45 DEBUG : e.csv.gz: MD5 = 8516 OK
2021/03/23 19:27:45 INFO  : e.csv.gz: Copied (server-side copy)

For smaller files, it's always faster to do an upload/download than to move across drives or shared drives, due to the nature of how server-side moves work. There isn't much you can do about it: rclone only issues an API call per file, and the remaining time is taken by Google.
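
If you want to see how fast Google is actually completing the moves, one rough check is to count moved files per second straight from the debug log (a sketch, assuming the log file name from the post; needs grep/awk/uniq):

# count how many "Moved (server-side)" lines land in each second of the log
grep "Moved (server-side)" rclone-1.log | awk '{print $2}' | uniq -c

Each output line is a count followed by a timestamp; anything consistently below 2-3 per second suggests Google, not rclone, is pacing the moves.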

Thanks for the explanation! I've updated the post with new logs. It seems that move doesn't check MD5 hash values. So what is it checking, and can I speed that part up?

move -P looks like this (placeholder values):
Transferred:        0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:             i / 6735, p%
Renamed:            j
Elapsed time:       _h_m_s
Checking:
 * a.csv.gz: checking
 * b.csv.gz: checking
 * c.csv.gz: checking
 * d.csv.gz: checking

copy -P looks like this (placeholder values):
Transferred:        n G / 409.437 GBytes, p%, * MBytes/s, ETA -
Transferred:        i / 3682, p%
Elapsed time:       _m_s
Transferring:
 * a.csv.gz: p% /size-k, speed/s, -
 * b.csv.gz: p% /size-M, speed/s, -
 * c.csv.gz: p% /size-M, speed/s, -
 * d.csv.gz: p% /size-M, speed/s, -

Can you post a debug log?

I think it's already in there (the log section in the post), with the log level set by -vv; in other words, there is no DEBUG entry about an MD5 checksum.

If you move server-side, there is no checksum step because nothing changed: the file is moved/renamed and the metadata (including the checksum) goes along with it.

GDrive can do a server-side move.
So, like a local file system, GDrive is just updating the location of the file, not actually moving/copying any data.
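
To illustrate what that metadata update looks like (a sketch against the Drive v3 REST API with hypothetical IDs, not something you need to run; rclone does this for you):

# a server-side move is a single files.update call that swaps the parent folder;
# supportsAllDrives=true is needed when a shared drive is involved
curl -X PATCH -H "Authorization: Bearer $TOKEN" "https://www.googleapis.com/drive/v3/files/FILE_ID?addParents=DEST_FOLDER_ID&removeParents=SRC_FOLDER_ID&supportsAllDrives=true"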

With S3, rclone would have to simulate a move: copy and then delete.
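
In rclone terms, a simulated move boils down to something like this (sketch with hypothetical paths):

# copy the object, then delete the original
rclone copyto s3:bucket/old/file.csv.gz s3:bucket/new/file.csv.gz
rclone deletefile s3:bucket/old/file.csv.gz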

Thanks for the explanation! Can you elaborate a little further on the 'checking' in the move progress? copy shows the list with size, speed, etc., while move only says 'checking'. I believe this is the part that slows things down and makes move even slower than copy. What is it checking, and how can I speed it up?

Run the command with -vv and you can see everything. Share that log.

Thanks! So... I think my questions would be: why is server-side move slower than server-side copy, and what flags can I use to make the move faster?

I would assume some kind of rate limiting.

@Animosity022 would know for sure

Here is the log. I hope I've masked all the personal info. There's no log entry about the checking, and I don't know whether it provides more info than the log snippet in my original post.

https://privatebin.net/?fd429db439e02aa4#A1jAVpc31JxJSE1ZgPs39BPZyrshuYRXQ6L4bfLbqZQr

I don't see any issues in the logs.

Google rate limits file creation (it's not documented anywhere), and you normally get at most 2-3 files per second, which makes lots of small files bad for Google Drive in general.
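
If you want to experiment anyway, the drive backend's pacer can be tuned, though loosening it usually just moves where the throttling happens (a sketch; the defaults are 100ms minimum sleep and a burst of 100, and the hypothetical values below may just earn you 403s and retries):

# allow shorter gaps between API calls and a bigger burst
rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --drive-pacer-min-sleep 10ms --drive-pacer-burst 200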

I read the logs, and the timestamps show that move rarely reaches 1 file per second, while copy can sometimes even do 2 files per second. So I'm not sure either operation is hitting that rate limit.

Then I did some experiments, keeping --drive-server-side-across-configs --fast-list --no-check-dest --retries 1 --no-traverse --check-first constant and varying the rest:

  1. no other flags: 6843 files in 3h19m (34/min)
  2. --checksum --ignore-size: 1759 files in 35m (50/min)
  3. --checksum --ignore-size --ignore-times: 7163 files in 1h53m (63/min)
  4. --ignore-checksum --ignore-size --ignore-times: 5996 files in 1h28m (68/min)

It gets faster and faster, but still slower than copy. Experiment 1 uses mod time and file size checking, I assume, and that's why it's so slow. What else is move checking with --checksum or --ignore-checksum? The logs show that it's not computing checksums, and I deliberately disabled them in the last run. If I drag a file from my Google Drive to a shared drive in a browser, will Google check anything besides changing the file metadata?
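
For reference, the fastest variant (experiment 4) spelled out as a full command:

rclone -P move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --fast-list --no-check-dest --retries 1 --no-traverse --check-first --ignore-checksum --ignore-size --ignore-times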

Even when you drag something in the web interface, it's not immediate if it's a lot of files. It just fibs to you: it shows the files in the new place and does the work in the background. Unfortunately there really isn't much of a way to speed it up; the ball is in Google's court.

Interesting use case. I've been a 'victim' of GDrive's horrendous performance in an otherwise very similar many-small-files scenario (just search for my initial messages here in the forum, from ~5 years ago), but my problems were limited to the copy/sync command. I've never used move in this scenario.

Here's what I know:

  1. the GDrive limit impacting my scenario (and, very likely, @rnm's too) is the number of transactions per second: by default GDrive throttles anything over (IIRC) 2-3 transactions per second, which slowed things to a crawl.

  2. this limit is not undocumented: it shows up in Google's developer web console, and you can even request (via a support ticket) for it to be raised. I did this with my original account about 5 years ago, asking for 20 transactions/second, and at the time Google obliged by setting it to 6 instead; in a more recent case with another account and another project, I asked again, but this time Google simply refused to change anything. (A way to make rclone respect a known limit is sketched just after this list.)
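
If you do know your project's quota, capping rclone's transaction rate just below it avoids the retry/backoff cycles that 403s cause (a sketch with hypothetical remotes; 6 is the value Google granted me back then):

# cap total transactions per second across the whole run
rclone move gdriveA:data/csv/ gdrive-team:data/csv/ --drive-server-side-across-configs --tpslimit 6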

And here's what I suppose:

a) perhaps in @rnm's scenario, what's happening is that due to some idiosyncrasy of the GDrive API, or of the way rclone uses it, a move in this specific case could end up using more transactions per file than a copy; if I were to dig into this further, that's what I would investigate first (a sketch of how to count the calls is below).
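
One way to test that hypothesis: count the actual HTTP calls for a one-file move versus a one-file copy (a sketch with a hypothetical file name; --dump headers is very verbose, so do this on a single small file):

# count HTTP requests for a single server-side move, then repeat with copyto and compare
rclone moveto gdriveA:data/csv/a.csv.gz gdrive-team:data/csv/a.csv.gz --drive-server-side-across-configs --dump headers -vv 2>&1 | grep -c "HTTP REQUEST"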

Good luck to @rnm in diagnosing/fixing this, and please keep this thread updated; I'm curious to see how it will end.

Cheers,
-- Durval.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.