Why so many transactions on S3?

I recently set up an automatic sync to S3 and B2 with a cron job; I’m using v1.36 on Linux. Last night there were no files to update because I hadn’t added or changed anything, and since it was the first day of the month it was a perfect time to compare what’s happening with B2 and S3.

First, the S3 sync takes 10 times longer than the B2 sync, no exaggeration. And my S3 billing information shows 32,117 GETs; that’s only $0.03/day, no big deal. Incidentally, the S3 GUI claims I have 32,118 files.

Now B2 claims 78 transactions TOTAL for that same sync. I’m not sure how rclone works in this regard, but this certainly explains some of the sync time differences.

I’m using the same command line for both with the only difference being the remote:

rclone sync --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3: >/dev/null
rclone sync --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive B2: >/dev/null

This is because rclone needs to do an extra HEAD request per object to read the modification time on S3, whereas on B2 the modification times arrive in the directory listings.

You can use either --size-only or --checksum to work around this.
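For example, taking the S3 command line from above, either of these should avoid the per-object HEAD requests (--size-only compares just sizes; --checksum compares sizes plus hashes where the remote provides them, and both come back in the listings):

rclone sync --size-only --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3: >/dev/null
rclone sync --checksum --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3: >/dev/null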


Awesome, thanks for the reply!

Hmmm, made the change:

rclone sync --checksum --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3: >/dev/null

And it ran last night; now it’s up to over 64K transactions for the month. It didn’t seem to make any difference.

Just curious, why use --old-sync-method?

Since it reduced the number of transactions on B2, I guess I assumed it would help with all systems. But I don’t know that for sure.


So, I finally looked at the log I create when this cron job runs, and something has changed. The number of transactions is the same, but instead of taking 4-5 minutes, it’s super fast. It runs in 14 seconds!

However, time isn’t so important. And now that I calculate it, transactions are peanuts at about 60 cents/month. If there were an easy way to reduce them I would, but otherwise the cost is insignificant.

Strange…

It will help with any bucket-based remote like S3, GCS, B2, or Swift.

That doesn’t make sense to me… The reason it got much faster is that it didn’t do 32,000 HEAD requests. So I’m really surprised to see that the number of requests didn’t go down.

If you do a sync with -vv and --dump-headers then you can count the transactions. With --checksum it should only be a dozen or so.
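For example, something along these lines should give a rough count (s3.log is just a placeholder name; rclone writes its log to stderr):

rclone sync --checksum -vv --dump-headers --transfers=3 --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3: 2> s3.log
grep -c "HTTP REQUEST" s3.log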

I tried running manually with this:

rclone sync --transfers=3 --checksum -vv --dump-headers --old-sync-method --bwlimit "05:00,400K 22:00,700K" /backupdrive S3:

and redirected the output to a file. The output started like this:

2017/06/05 06:36:13 INFO  : Starting bandwidth limiter at 400kBytes/s
2017/06/05 06:36:13 DEBUG : rclone: Version "v1.36" starting with parameters ["/usr/sbin/rclone" "sync" "--checksum" "-vv" "--dump-headers" "--transfers=3" "--old-sync-method" "--bwlimit" "05:00,400K 22:00,700K" "/backupdrive/" "S3:"]
2017/06/05 06:36:13 INFO  : Encrypted S3 bucket (redacted): Modify window is 1ns
2017/06/05 06:36:13 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/06/05 06:36:13 DEBUG : HTTP REQUEST (req (redacted))
2017/06/05 06:36:13 DEBUG : HEAD /(redacted) HTTP/1.1
Host: s3.amazonaws.com
User-Agent: rclone/v1.36
Authorization: XXXX
X-Amz-Content-Sha256: (redacted)
X-Amz-Date: 20170605T113613Z

2017/06/05 06:36:13 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/06/05 06:36:14 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2017/06/05 06:36:14 DEBUG : HTTP RESPONSE (req (redacted))
2017/06/05 06:36:14 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Mon, 05 Jun 2017 11:36:14 GMT
Server: AmazonS3
X-Amz-Bucket-Region: us-east-1
X-Amz-Id-2: (redacted)
X-Amz-Request-Id: (redacted)

followed by 33 more REQUEST/RESPONSE pairs that look like this (I may be paranoid with the redactions):

2017/06/05 06:36:14 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2017/06/05 06:36:14 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/06/05 06:36:14 DEBUG : HTTP REQUEST (req (redacted))
2017/06/05 06:36:14 DEBUG : GET /(redacted)?delimiter=&max-keys=1024&prefix= HTTP/1.1
Host: s3.amazonaws.com
User-Agent: rclone/v1.36
Authorization: XXXX
X-Amz-Content-Sha256: (redacted)
X-Amz-Date: 20170605T113614Z
Accept-Encoding: gzip

2017/06/05 06:36:14 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/06/05 06:36:14 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2017/06/05 06:36:14 DEBUG : HTTP RESPONSE (req (redacted))
2017/06/05 06:36:14 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Mon, 05 Jun 2017 11:36:15 GMT
Server: AmazonS3
X-Amz-Bucket-Region: us-east-1
X-Amz-Id-2: (redacted)
X-Amz-Request-Id: AB82807559DFAB28

followed by a little over 32K of these, one for each file apparently:

2017/06/05 06:36:27 DEBUG : some_file: Size of src and dst objects identical
2017/06/05 06:36:27 DEBUG : some_file: Unchanged skipping
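
(Counting those in the saved log, e.g. grep -c "Unchanged skipping" s3.log where s3.log is whatever file I redirected to, should roughly match the Checks figure in the summary below.)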

followed by:

2017/06/05 06:36:27 INFO  : Encrypted S3 bucket (redacted): Waiting for transfers to finish
2017/06/05 06:36:27 INFO  : Waiting for deletions to finish
2017/06/05 06:36:27 INFO  : 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:             32139
Transferred:            0
Elapsed time:       13.9s
2017/06/05 06:36:27 DEBUG : Go routines at exit 9
2017/06/05 06:36:27 DEBUG : rclone: Version "v1.36" finishing with parameters ["/usr/sbin/rclone" "sync" "--checksum" "-vv" "--dump-headers" "--transfers=3" "--old-sync-method" "--bwlimit" "05:00,400K 22:00,700K" "/backupdrive/" "S3:"]

Whew. Maybe more than I needed to post, but I am ignorant about how this works underneath.

So, a few things. First, if the initial 34 REQUEST/RESPONSE pairs are what gets counted as GETs, that seems quite low. Now, four days into the month, the number of GETs shown on the AWS console is sitting at a little over 64K. It is increasing, but very slowly. I’m not sure why it jumped from 32K to 64K after I first added --checksum; maybe I messed up somewhere.
Second, it seems to me all files are being checked, so despite my ignorance, I guess I can safely assume it’s working as it should?
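
Going back to the first point: if I’m understanding the listing pagination right, S3 returns at most 1,000 keys per listing request, so covering 32,139 objects works out to ceil(32139 / 1000) = 33 listing GETs, which matches the 33 GET pairs I counted after the initial HEAD.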