How much egress does rclone sync use?

Sorry for the rudimentary question, but does anyone know how much egress rclone sync uses? I ask because I am seeing two very different behaviors.

I use rclone sync to back up the same local files to Backblaze B2 and Google Cloud Storage every 15 mins (using --fast-list). It is about 1 TB across 100,000 files. The vast majority of these files do not change, so most of the time rclone is not uploading anything. I am never downloading anything, aside from whatever sync needs to do its job.

On Backblaze B2, I am being charged $0 for egress; since B2 allows 1 GB of free egress per day, that means rclone is using 1 GB or less of egress there every day. On Google Cloud Storage, rclone is using 10 GB of egress per day.

What is up? 10 GB of egress seems excessive when I am not downloading anything from Google Cloud Storage, only uploading.

These are the two commands that I run every 15 mins:

rclone --fast-list sync "path\to\files" EncryptBackblaze:

rclone --fast-list sync "path\to\files" EncryptGoogleCloudStorage:

There's no one-stop answer to that, as it really depends.

The main factor is what is being uploaded, as the other calls are really not that intensive.

What version are you running?
What's your rclone.conf look like?
Can you share a log showing the 10 GB of egress?

Basically, the help template...

Sorry, I didn't think the help template was useful in this situation. I have stopped using Google Cloud Storage and switched to Scaleway, but egress is still quite high: about 3 GB of egress per day on Scaleway, even though I am not downloading or restoring anything from it. Same ~1 TB across 100,000 files.

What is the problem you are having with rclone?

sync is using a LOT of egress.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.2
- os/version: Microsoft Windows 10 Pro 21H2 (64 bit)
- os/kernel: 10.0.19044.2006 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.18.6
- go/linking: static
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

Scaleway

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --verbose --fast-list --update --use-server-modtime sync /path/to/files EncryptScaleway:

This command is run every 15 minutes.

The rclone config contents with secrets removed.

[Scaleway]
type = s3
provider = Scaleway
access_key_id = *redacted*
secret_access_key = *redacted*
region = fr-par
endpoint = s3.fr-par.scw.cloud
storage_class = GLACIER
chunk_size = 100Mi
max_upload_parts = 1000

[EncryptScaleway]
type = crypt
remote = Scaleway:*redacted*
password = *redacted*
password2 = *redacted*

A log from the command with the -vv flag

2022/09/21 23:38:32 DEBUG : rclone: Version "v1.59.2" starting with parameters ["rclone.exe" "--verbose" "--fast-list" "--update" "--use-server-modtime" "-vv" "sync" "/path/to/files" "EncryptScaleway:"]
2022/09/21 23:38:32 DEBUG : Creating backend with remote *redacted*
2022/09/21 23:38:32 DEBUG : Using config file from *redacted*
2022/09/21 23:38:32 DEBUG : fs cache: renaming cache item *redacted* to be canonical *redacted*
2022/09/21 23:38:32 DEBUG : Creating backend with remote "EncryptScaleway:"
2022/09/21 23:38:33 DEBUG : Creating backend with remote "Scaleway:*redacted*"
2022/09/21 23:39:33 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       1m0.0s

2022/09/21 23:40:33 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       2m0.0s

2022/09/21 23:41:32 DEBUG : *redacted*: Destination is newer than source, skipping
[similar repeating lines for different files]
2022/09/21 23:41:34 DEBUG : Encrypted drive 'EncryptScaleway:': Waiting for transfers to finish
2022/09/21 23:41:34 DEBUG : Waiting for deletions to finish
2022/09/21 23:41:34 INFO  : 
Transferred:   	    3.419 KiB / 3.419 KiB, 100%, 3.418 KiB/s, ETA 0s
Checks:            109019 / 109019, 100%
Transferred:            1 / 1, 100%
Elapsed time:       3m1.1s

2022/09/21 23:41:34 DEBUG : 4 go routines active

rclone needs to download a complete directory listing with metadata (date, size, hash, ...) every time you do a sync, to check the remote directory against a similar listing generated from your local folder.

You can get an impression of the size of this data by running this command:

rclone lsjson --recursive --hash Scaleway:  >  Dir_Scaleway.txt

and then checking the size and content of Dir_Scaleway.txt (using e.g. Windows Explorer and Notepad)
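
If you prefer the command line over Explorer, a quick way to see the size in MB is something like this (PowerShell, just a sketch using the file name above):

(Get-Item .\Dir_Scaleway.txt).Length / 1MB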

There will typically be some overhead to transfer data, so I would guess you will see a file with a size around 20 MB (3 GB / 4 / 24 / 1.5), equivalent to about 200 bytes (characters) per directory entry (20 MB / 100,000). That sounds as expected to me, especially because you are using (standard) encryption, which increases the length of paths and filenames (as you can see in Dir_Scaleway.txt).
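
Spelled out, the estimate is roughly:

3 GB per day / (4 syncs per hour × 24 hours) ≈ 31 MB per sync
31 MB / 1.5 (transfer overhead) ≈ 20 MB per directory listing
20 MB / 100,000 entries ≈ 200 bytes per entry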

The simplest way to decrease the downloads is to reduce the sync frequency; for example, you could cut the directory downloads by a factor of 4 by syncing only once every hour.

You may also want to consider the more advanced approach of using top-up syncs every 15 minutes and a full sync every 24 hours; the approach is described in these posts:
Copy to S3 incrementally - #2 by ncw
--no-traverse too slow with lot of files - #2 by ncw

Thank you for this awesome response. I ran the command you suggested, and now it is obvious what is going on. Rclone is using about 50 MB of egress every time I run the sync, which adds up to several GB of egress per day when running a sync every 15 mins.

Interestingly, Backblaze does not seem to charge for this egress: I am running the same sync command every 15 mins against Backblaze but have so far been charged $0 for it. Scaleway, Azure, GCS, etc. all do charge for it.

I will look into top-up syncs. Thanks!

hi,
to reduce the number of checks of dest files, you might try --max-age=15m

fwiw, Wasabi, an S3 clone, does not charge for API calls and the associated egress

Thanks a lot, happy to hear 🙂

Perhaps Backblaze only measures/charges for the egress of files, not directory listings. It really depends on where you do the measurement; you probably also noticed the difference between the raw file transfer rate displayed by rclone and the bandwidth usage (including directory listings and misc. overheads) displayed by your OS.

Thanks for the suggestion. This is a viable option. I am a bit hesitant because it assumes rclone runs successfully every 15 mins, but perhaps I could use --max-age=15m every 15 mins and then run without it every 24 hours to grab any missed files.

yes, that is the way to do it.
on the whole, it will greatly reduce the cost of API calls.

Sounds like you may not have fully understood the trick/details of the top-up approach, and that's perfectly OK: it is a hack cleverly combining the effect of 2 commands and 2 flags. Let me try to explain using your situation as an example:

I suggest you execute a copy command like this every 15 minutes:

rclone copy --max-age=1h --no-traverse --update --use-server-modtime --verbose /path/to/files EncryptScaleway:

--max-age=1h
tells rclone to only compare files with a "modified time" within the past hour. Used by itself it will make a full directory listing of both the source (/path/to/files) and the target (EncryptScaleway:) and then filter those based on modification time before comparing them, so there is no reduction in your egress from using this flag alone.

Important note: It will not detect renames and deletions of old files. It also will not detect if you copy an old file into the local folder (assuming an unchanged modification time older than --max-age). That's why we also need a daily job to catch and sync these; that job is at the end.

You can test the effect with this command: rclone lsl --max-age=1h /path/to/files
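
If you just want a count of how many files that picks up, a PowerShell one-liner like this works (just a sketch, using the same path):

(rclone lsl --max-age=1h /path/to/files | Measure-Object -Line).Lines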

I have included some overlap to compensate for scheduling variances and network glitches causing single job failures. Adjust as you see fit; it could probably be 24h unless you create a lot of new files during the day.

--no-traverse
tells rclone to compare the entries in the source directory listing one by one against the target directory. Used by itself it would cause a one-by-one lookup of (almost) all the entries in your target directory and be quite costly in time, egress and API calls. It is far more efficient to download/traverse the entire directory listing in one sweep; that's why that is the default.

--max-age=1h --no-traverse
tells rclone to first look in the source for any files modified in the past hour and then look up only these (few) entries in the target directory. Now this is fast and cost-effective if you have only a few new/modified files and a lot of files in the target.

Now you just need to run your original sync job periodically (e.g. every 24 hours) to catch the situations mentioned above:

rclone sync --fast-list --update --use-server-modtime --verbose /path/to/files EncryptScaleway:

Hope you now have a good understanding of the benefits and limitations of the top-up approach.
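
If it helps, here is a rough sketch of how the two jobs could be scheduled on Windows with Task Scheduler (the .bat file names and paths are just placeholders; each .bat file would contain the corresponding rclone command above):

schtasks /Create /SC MINUTE /MO 15 /TN "rclone top-up copy" /TR "C:\scripts\rclone-topup.bat"
schtasks /Create /SC DAILY /ST 03:00 /TN "rclone full sync" /TR "C:\scripts\rclone-fullsync.bat"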

Thanks again, Ole! One question: what effect would it have if I also used --fast-list in the copy command you suggest?

--fast-list has no effect on local folders and no effect on the target (S3) when using --no-traverse to read the directory entries one by one. So I removed it to make the command simpler.

Also note that --fast-list isn't always the fastest way; more info here:
https://rclone.org/docs/#fast-list

If your API calls are free, then I suggest you test whether it makes your sync command significantly faster and then use it only when it does. Note: the result may vary from command to command and also depends heavily on your data (including the files to be transferred).
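
If you want a rough way to compare before touching the sync itself, timing the listings with PowerShell is one option (just a sketch; EncryptScaleway: is your remote from above):

Measure-Command { rclone lsf --recursive EncryptScaleway: | Out-Null }
Measure-Command { rclone lsf --recursive --fast-list EncryptScaleway: | Out-Null }

Whichever finishes faster is roughly what the checking phase of your sync will see.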

Thank you for all your help. Makes total sense.
