GET requests using --fast-list with S3 vs B2

ctaranto · December 6, 2018, 3:22am

I’m relatively new to rclone but love what it does with syncing to cloud storage.

My goal is to sync home security footage to cloud storage and I’m testing out both S3 and Backblaze B2. I use Zoneminder for camera control and motion detection and have a script that the detection calls to rclone the videos to the cloud.

The script has more before and after, but these are the two rclone calls (back to back in the script).
rclone sync --fast-list --skip-links --include $czone/** --log-file $LOGFILE --config /srv/http/rclone.conf /mnt/storage/zoneminder/ b2drive:my-bucket
rclone sync --fast-list --skip-links --include $czone/** --log-file $LOGFILE --config /srv/http/rclone.conf /mnt/storage/zoneminder/ s3-test:my-bucket

$czone is a number that refers to the directory of the camera that was triggered (1, 2, 3, etc). This helps reduce the number of invalid transfers due to other cameras being triggered at the same time.

This month, I’ve had a total of 657 events across all cameras (not unexpected as there are people usually home).

I am seeing far more transactions on S3 than on B2.

For S3, I used 180,000 GET requests for December (5 days)
For B2, I used 5,720 total transactions for December (5 days)

Is --fast-list no longer working with S3? Am I using it incorrectly?

Any information is appreciated!

rclone v1.45

os/arch: linux/amd64
go version: go1.11.2

ncw · December 6, 2018, 9:47am

I think what is happening here is that the extra transactions are caused by rclone reading the modification time off the objects in s3. Unlike b2, these don't come in the listing, so rclone needs to HEAD each object.

So if you switch to using --checksum or --size-only for the s3 sync, it will stop doing those extra transactions.

I keep meaning to put something about this in the docs... How about something like this?

Modified time

The modified time is stored as metadata on the object as X-Amz-Meta-Mtime as floating point since the epoch accurate to 1 ns.

Note that metadata is not returned in directory listings on s3 so rclone needs to do an extra transaction (a HEAD request) for each object to read it. If you wish to save transactions then use --size-only or --checksum in your sync - this will stop rclone reading the modification time for each object in the sync.
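To get a feel for the difference, here is a rough model of the requests a single sync might make. It is a sketch only: it assumes 1000 objects per listing request and one HEAD per object when modification times are read, and the function name sync_requests is made up for illustration. Real counts will depend on filters and on the rclone version.

```shell
#!/bin/sh
# Rough model only; not measured values.
# sync_requests OBJECTS MODE   where MODE is "modtime" or "size-only"
sync_requests() {
    objects=$1 mode=$2
    listing=$(( (objects + 999) / 1000 ))   # assumed: 1000 objects per listing request
    if [ "$mode" = "modtime" ]; then
        echo $(( listing + objects ))       # plus one HEAD per object to read the mtime
    else
        echo "$listing"                     # sizes come back in the listing itself
    fi
}

sync_requests 5000 modtime     # prints 5005
sync_requests 5000 size-only   # prints 5
```

With a few thousand objects in the bucket, the per-object HEAD requests dominate, which would be consistent with the gap between the S3 and B2 numbers reported above.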

ctaranto · December 6, 2018, 1:06pm

Thanks for the detailed reply! I switched my s3 call to use --size-only and will monitor how it changes the number of transactions.

Just to be clear, you are suggesting that I replace --fast-list with --size-only for the s3 connection, correct?

ncw · December 6, 2018, 1:40pm

You can have both --fast-list and --size-only. --fast-list will do the minimum number of transactions but it will use more memory and may or may not actually be faster!

Note that if you wanted just to copy the latest stuff quickly, then using the latest beta you could do

rclone copy --max-age 1h --no-traverse /path/to/src remote:

Then maybe sync once a day to tidy up.

ctaranto · December 6, 2018, 2:52pm

Thanks!

I’m using Manjaro and get all packages from the native repo or AUR. Because of the rolling release model of Arch/Manjaro, straying off of those repos can break updates pretty quickly.

Any thoughts on when this beta feature will make it into the mainline release?

Before getting these features from the beta, would copy save me transactions? Would I use --fast-list and/or --size-only with copy? I had thought about using copy and syncing once a day to remove purged files, but then I read about --fast-list and thought it would take care of the transaction count.

ncw · December 6, 2018, 9:03pm

The release is scheduled for the start of February.

If you don't want to use the beta then you could use this which will be as efficient with 1.45

rclone lsf --files-only --max-age 1h /path/to/local > new_files
rclone copy --files-from new_files /path/to/local remote:path

That should only use 1-3 transactions per file transferred.
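The two commands could be wrapped in a small script along these lines. This is a sketch, not a tested deployment: the function name copy_recent and the RCLONE override are illustrative, the paths are placeholders, and -R is added on the assumption that the files sit several directories deep.

```shell
#!/bin/sh
# Sketch: copy only recently modified files, without listing the destination.
# RCLONE can be overridden (for example RCLONE=echo) to preview the commands.
RCLONE=${RCLONE:-rclone}

# copy_recent SRC DEST MAX_AGE   (all three values are placeholders)
copy_recent() {
    src=$1 dest=$2 max_age=$3
    list=$(mktemp) || return 1
    # -R recurses into subdirectories; --files-only leaves out directory entries.
    "$RCLONE" lsf -R --files-only --max-age "$max_age" "$src" > "$list" || {
        rm -f "$list"
        return 1
    }
    # Only start a transfer when at least one file matched.
    if [ -s "$list" ]; then
        "$RCLONE" copy --files-from "$list" "$src" "$dest"
    fi
    status=$?
    rm -f "$list"
    return $status
}

# Example call (not run here):
#   copy_recent /path/to/local remote:path 1h
```

Skipping the copy when the list is empty avoids a pointless rclone run during quiet periods.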

Using --no-traverse or --files-from there is no advantage to using --fast-list as the idea of those is that rclone doesn't list the destination saving you transactions.

It all depends on how many files you have in the destination.... Assuming you are using --size-only then --fast-list lists 1000 files at once on s3, so you'll use no_of_files/1000 transactions for doing the listing. If you use copy --no-traverse or copy --files-from then rclone won't do the listing at all but will do one extra transaction per file. Which one wins depends on how many files in the destination, and how many files you expect to sync each time.

So if you had 50,000 files in the destination and copied over 10 files each time, that would take 50+2*10 = 70 transactions with --fast-list and 3*10 = 30 transactions for --no-traverse/--files-from.
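The same arithmetic as a small calculator, for plugging in other numbers. It assumes the estimates given here (1000 files per listing request, 2 transactions per uploaded file with --fast-list, 3 per file without a listing); the function names are made up for illustration.

```shell
#!/bin/sh
# Estimates based on the figures quoted in this thread; not measured values.

# fast_list_cost DEST_FILES NEW_FILES: listing pages plus 2 requests per upload
fast_list_cost() {
    echo $(( ($1 + 999) / 1000 + 2 * $2 ))
}

# no_traverse_cost NEW_FILES: no listing, 3 requests per upload
no_traverse_cost() {
    echo $(( 3 * $1 ))
}

fast_list_cost 50000 10    # prints 70
no_traverse_cost 10        # prints 30
```

With these assumptions, skipping the listing wins whenever the number of new files is smaller than the number of listing pages.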

ctaranto · December 6, 2018, 11:46pm

Thanks!

ncw:

If you don't want to use the beta then you could use this which will be as efficient with 1.45
rclone lsf --files-only --max-age 1h /path/to/local > new_files
rclone copy --files-from new_files /path/to/local remote:path
That should only use 1-3 transactions per file transferred.

Thanks! But this appears to not work for me. I ran the following command:
rclone lsf --skip-links --files-only --max-age 9h /mnt/storage/zoneminder/
and I received nothing back. There are definitely files newer than 9 hours old (though they are multiple directories deep since ZoneMinder buries them).

Update 1: I pointed this to a specific directory and it did return the right files. Is there a way to have lsf traverse deep into directories?

Update 2: Nevermind. -R will recurse into directories. Duh!

Update 3: I don't think this model will work for me. I can't predict how often the cameras write out a file, as they are motion activated. Overnight, there might not be a new file for 6 hours. During the day, maybe a new file every few minutes. But good information regardless.

Thanks.

ncw:

ctaranto:

I had thought about using copy and syncing once a day to remove purged files, but then I read about --fast-list and thought it would take care of the transaction count.

It all depends on how many files you have in the destination.... Assuming you are using --size-only then --fast-list lists 1000 files at once on s3, so you'll use no_of_files/1000 transactions for doing the listing. If you use copy --no-traverse or copy --files-from then rclone won't do the listing at all but will do one extra transaction per file. Which one wins depends on how many files in the destination, and how many files you expect to sync each time.

So if you had 50,000 files in the destination and copied over 10 files each time, that would take 50+210 = 70 transactions with --fast-list and 310 = 30 transactions for --no-traverse/--files-from.

I didn't understand what you wrote in the last paragraph. I think there are too many = symbols.

Update: Nevermind again! The "times" symbol doesn't show up in reading mode but does in quote mode.
So if you had 50,000 files in the destination and copied over 10 files each time, that would take 50+2*10 = 70 transactions with --fast-list and 3*10 = 30 transactions for --no-traverse/--files-from.

Thanks!

ncw · December 7, 2018, 7:58am

I should have added that - sorry!

I see. Could the cameras call a script when they write a file out?

Ah yes, that was a proofreading fail. Sorry! Markdown interpreted all the * as italic markers!

ctaranto · December 7, 2018, 12:59pm

I thought about this more and I think I can use --max-age.

ZoneMinder has "zones" per camera that are used to detect or ignore motion in various regions of the video. They can be layered and configured independently for sensitivity, color balance, etc. There is also a separate "filter" feature that does what you asked: It can use a myriad of conditions to trigger an action. I have a filter for each camera that looks for both a Start Time of up to an hour ago and a new event for the camera itself. This filter is configured to run in the background every 30 seconds, and if a new event for the camera is detected, a script is called that will push the video to S3 and B2 using rclone.

This filtering feature limits the number of rclone calls when there is very little activity (like overnight, hopefully!), while still letting captured video be pushed within 30 seconds when something is new. The best of both worlds, I suppose.

That said, since I have configured the filter to look for events an hour old or newer, that should align well with the --max-age 1h in rclone (or I could adjust both to be 30 mins, or 2 hours, or whatever else).

I am going to think about this more to make sure nothing would be missed, but I do believe this should work.

Orthogonal to this, I think the change to --size-only for S3 did help stop the bleeding of transactions at S3. I'll keep monitoring it.

ncw · December 7, 2018, 4:52pm

It sounds plausible to me

Great!

ctaranto · December 7, 2018, 6:26pm

Thought about this a bit more, and it probably won't work unless I dynamically set the --max-age based on the last call to the script.

Let's assume ZoneMinder (ZM) always creates video snippets of 1 minute each (they are usually between 30 seconds and 1 minute).

A ZM filter checks for new events every 30 seconds, and once an event triggers the filter, the events that were a part of that trigger will not be a part of a future trigger. (This is the important part).

Assume ZM creates new files at 1:05pm, 1:08pm, 1:10pm, and 1:30pm.

A filter, running every 30 seconds, will see the 1:05pm file at 1:06pm (since an event isn't complete until the file finishes writing). The filter then calls rclone. If I set --max-age to 1h, it will capture the file at 1:05pm.

The filter will catch the 1:08pm file at 1:09pm (and will ignore the 1:05pm file since it already caught it). The filter will call rclone, and --max-age will see both the 1:05pm and 1:08pm files. This will copy the 1:05pm file twice.

From here, you can see how it gets worse as time goes on during the hour.

rclone --max-age doesn't have knowledge of what's already been caught by the filter, but ZM does (but at the event level, not file level).

I suppose I could set the --max-age smaller, but how small? If I set it to 1 minute, it won't capture a file that is over a minute long. If I set it to two minutes, it won't capture a file that is more than two minutes long or could copy a 30-second file twice if there are two events close to each other.

Is my understanding correct?

ncw · December 7, 2018, 11:30pm

If you set --max-age too high then rclone will still check before copying the file, so at most you'll do a little bit more work than you needed to - you won't transfer a file that has already been transferred.
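A toy simulation of the 1:05pm / 1:08pm / 1:10pm / 1:30pm example makes this concrete. It makes no rclone calls; the "already at the destination" check is modelled as simple membership in a list, which is an assumption standing in for rclone's real comparison, and the run times (one minute after each file) are taken from the scenario described in the previous post.

```shell
#!/bin/sh
# Simulation only. Times are minutes after 1:00pm.
files="5 8 10 30"   # when each file was written
runs="6 9 11 31"    # when the filter fires and calls rclone
max_age=60          # stands in for --max-age 1h
dest=" "            # files already present at the destination
total_checked=0
total_copied=0

for now in $runs; do
    for f in $files; do
        # A file is a candidate if it exists by now and falls inside the window.
        if [ "$f" -le "$now" ] && [ $((now - f)) -le "$max_age" ]; then
            total_checked=$((total_checked + 1))
            case "$dest" in
                *" $f "*) ;;   # already at the destination: checked, not copied
                *) dest="$dest$f "; total_copied=$((total_copied + 1)) ;;
            esac
        fi
    done
done

echo "checked=$total_checked copied=$total_copied"   # prints checked=10 copied=4
```

Each file is copied once; the wider window only adds checks, and under this model each of those checks costs a transaction rather than a transfer.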

ctaranto · December 8, 2018, 3:52am

Ah.. I'll give that a shot.

I realized this evening that my rclone to S3 wasn't working correctly when ZoneMinder was calling it. I still am not sure why, but I fixed it.

Before:
rclone sync --fast-list --size-only --skip-links --include $czone/** --log-file $LOGFILE --config /srv/http/rclone.conf /mnt/storage/zoneminder/ s3-test:my-bucket

After:
rclone sync --fast-list --size-only --skip-links --include "$czone/**" --log-file $LOGFILE --config /srv/http/rclone.conf /mnt/storage/zoneminder/ s3-test:my-bucket

ZM calls a unique script for each camera when it's triggered, which calls a common script that executes rclone. The unique script passes in a single parameter, which I run through case in the common script, to derive a number. That number is $czone. The number represents the specific camera.

When czone is 1, it works fine. When it's 2 or 3, the ** after $czone became the first directory found in /mnt/storage/zoneminder/2 (or 3). I only found this by using eval echo of the statement.

What is odd is when I run the unique script at the command line, it works fine (with 1, 2, and 3).

Putting quotes around $czone/** makes it work when called from ZoneMinder. Odd.

I'm going to monitor the usage of S3 again since it wasn't copying two-thirds of the files.

ncw · December 8, 2018, 9:55am

Without quotes, the shell interprets the ** itself, and the result it gives depends on the current directory.
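A small demonstration of the effect, using a throwaway directory rather than the real ZoneMinder tree (the directory name 2018-12-07 is invented for the example). It assumes a plain sh or bash with default globbing options.

```shell
#!/bin/sh
# Shows why an unquoted --include pattern depends on the current directory.
czone=2
demo=$(mktemp -d)
mkdir -p "$demo/2/2018-12-07"
cd "$demo" || exit 1

unquoted=$(echo $czone/**)    # the shell expands the glob against the cwd
quoted=$(echo "$czone/**")    # the pattern is passed through untouched

echo "unquoted: $unquoted"    # prints unquoted: 2/2018-12-07
echo "quoted:   $quoted"      # prints quoted:   2/**
```

When the current directory has no matching 2/ subdirectory, the unquoted pattern is left as-is, which would explain why the script behaved differently at the command line than when called from ZoneMinder.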