Copy or sync - performance for initial copy

Hi,

While I understand the behavioral differences between rclone's "copy" and "sync" commands, I'm wondering if one can count on "copy" performing a little faster than "sync" when transferring a large amount of data (> 500 GB) for the first time to an empty destination.

Thanks!

A copy duplicates a file or directory from a source to a destination.

So you copy things from A->B

If you want to keep things in sync, you make A sync to B and B becomes a replica of A.

If you delete something from A, it gets deleted from B and is in "sync".

A sync copies everything from A->B.

Does that help?

Hi,

Thanks for your reply. Yes, I understand the difference in behavior that you described. What I'm asking is whether or not there is a difference in performance when doing an initial copy to an empty target.

So, if I'm transferring 1TB of data from SOURCE:/dir to TARGET:/dir for the first time and when TARGET:/dir is empty, is it faster to use "rclone copy" than "rclone sync" that one time?

I wouldn't think so offhand, but it may depend on what's on the other side.

A sync of say 10 million files making up 1TB would be long/slow, but a copy or sync of a few thousand files wouldn't matter much imo.

What backend are you using?

The back end is a Google Team Drive. There is nothing there. The copy is fresh.

Right, I was asking about the source files.

Are they lots of small files or larger video files?

With Google Drive, you can only create about 2-3 files per second, so lots of small files are slow to upload.
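
To put that limit in perspective, here is a rough back-of-the-envelope sketch in Python. The 2-3 files per second rate is just the figure quoted above, the file counts are illustrative, and `min_creation_hours` is a made-up helper name. It only computes a floor set by file creation and ignores bandwidth entirely.

```python
def min_creation_hours(num_files: int, files_per_second: float) -> float:
    """Lower bound on wall-clock hours spent just creating files."""
    if files_per_second <= 0:
        raise ValueError("files_per_second must be positive")
    return num_files / files_per_second / 3600


# Illustrative file counts; the rate is the rough figure quoted in this thread.
for count in (5_000, 1_000_000, 10_000_000):
    slow = min_creation_hours(count, 2)  # pessimistic end of the range
    fast = min_creation_hours(count, 3)  # optimistic end of the range
    print(f"{count:>10,} files: {fast:,.1f} to {slow:,.1f} hours")
```

Under these assumptions a few thousand files cost well under an hour of creation overhead, while ten million files at 2 per second would need roughly 1,389 hours, no matter whether copy or sync is used.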

Got it. Unfortunately I don't know that much about the source data. Its max folder depth is 13. There are ~5TB in total. I think there's a relatively vanilla distribution of small and large files. "Large" here is roughly between 1GB-15GB.

Is the Google file-creation limit for G Suite only or GCP Storage too?

I can only speak for my use / what I've seen from Google Drive. I'd assume Google Storage is probably different and allows for more, but would need someone who uses it to confirm.

You can also use --fast-list to help speed things up.

Thanks. I was confused by --fast-list. I wasn't sure when it was appropriate to use it. I tried it once (for a much smaller data set--around 10-to-100 GB) and it completely hosed my Mac trashcan (32GB RAM).

I don't think that was --fast-list. It uses a little more memory, but it should not eat up 32GB.

I just listed out my GD and memory use didn't creep above a few hundred MB.

Hmm. OK, then I probably used too-high values for --transfers and --checkers. Thanks for the correction.

The default values are pretty good for most things. You can get some 403 rate limit errors, but those are nothing to worry about too much as rclone just retries. If you set the values too high, it wastes a lot of time retrying.
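
For reference, a minimal Python sketch of a command line that spells those defaults out. `rclone_command` is a hypothetical helper, not part of rclone. The values 4 and 8 are rclone's documented defaults for `--transfers` and `--checkers`, the remote paths are the placeholders used earlier in this thread, and `--dry-run` is added so that a pasted command only reports what it would do.

```python
import shlex


def rclone_command(action, source, dest, transfers=4, checkers=8,
                   fast_list=False, dry_run=True):
    """Build an rclone argument list without running it."""
    if action not in ("copy", "sync"):
        raise ValueError("action must be 'copy' or 'sync'")
    args = ["rclone", action, source, dest,
            "--transfers", str(transfers),
            "--checkers", str(checkers)]
    if fast_list:
        args.append("--fast-list")  # fewer listing calls, more memory
    if dry_run:
        args.append("--dry-run")    # report only, change nothing
    return args


cmd = rclone_command("copy", "SOURCE:/dir", "TARGET:/dir")
print(shlex.join(cmd))
```

Starting from the defaults and raising one value at a time while watching for 403 retries is probably a safer way to tune than jumping straight to large numbers.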

From the point of view of the code, copy and sync are identical, except that sync deletes excess files. Since you have no excess files, the performance will be identical, I think.

From a practical point of view, use copy - sync can delete stuff you didn't mean to delete :smiley:
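
A toy model of that point, in Python. This is not rclone's actual code (rclone is written in Go and compares more than file size); `plan` and the sample file names are invented purely for illustration. Both modes build the same transfer list, and the sync-like mode additionally lists destination files that are missing from the source.

```python
def plan(source: dict, dest: dict, delete_extras: bool):
    """Return (to_transfer, to_delete) as sorted lists of paths.

    source and dest map path -> size in bytes, a stand-in for real
    file metadata. delete_extras=False models "copy", True models "sync".
    """
    to_transfer = sorted(
        path for path, size in source.items()
        if dest.get(path) != size  # missing or different size
    )
    to_delete = sorted(set(dest) - set(source)) if delete_extras else []
    return to_transfer, to_delete


source = {"a.txt": 10, "videos/b.mkv": 2_000, "c.bin": 300}

# Empty destination: both modes produce exactly the same plan.
assert plan(source, {}, delete_extras=False) == plan(source, {}, delete_extras=True)

# Destination with an extra file: only the sync-like mode removes it.
dest = {"a.txt": 10, "old.log": 5}
print(plan(source, dest, delete_extras=False))
print(plan(source, dest, delete_extras=True))
```

With an empty destination there is nothing extra to delete, so in this model the two plans are identical and only the safety argument separates them.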
