Options for waiting between downloads (--wait / --random-wait)

So we have --retries-sleep, but could we possibly have some additional flags such as --sleep and --random-sleep?

The below is snipped from the wget man page on Linux.

I think --retries-sleep would be the equivalent of --waitretry in wget.

A --wait or --sleep flag like wget's would be very useful to me right now.

       -w seconds
       --wait=seconds
           Wait the specified number of seconds between the retrievals.  Use of this option is recommended, as it lightens the server load by making the requests less frequent.  Instead of in seconds, the time can be specified in
           minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.

           Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry.  The waiting
           interval specified by this function is influenced by "--random-wait", which see.

       --waitretry=seconds
           If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option.  Wget will use linear backoff, waiting 1 second after the first failure on a given file,
           then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

           By default, Wget will assume a value of 10 seconds.

       --random-wait
           Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to
           vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option, in order to mask Wget's presence from such analysis.

           A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly.  Its author suggested blocking at the class C address level to ensure automated
           retrieval programs were blocked despite changing DHCP-supplied addresses.

           The --random-wait option was inspired by this ill-advised recommendation to block many unrelated users from a web site due to the actions of one.
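
For concreteness, this is the kind of invocation I have in mind (--sleep and --random-sleep are the flags proposed here, not anything rclone currently implements, and httpsrc:/gdrive: are placeholder remote names):

    # Hypothetical: download one file at a time, waiting 5 minutes between
    # files, with the delay randomised between 0.5x and 1.5x as in wget
    rclone copy httpsrc:files/ gdrive:backup/files/ --transfers 1 --sleep 5m --random-sleep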

Why would it be useful? Rclone already implements a retry/backoff process based on the provider's requirements for this.

What would those flags provide?

I'm getting connection refused when using the HTTP backend. It would be helpful to be able to tell rclone to wait between retries like you can with wget (see the man page excerpt in my initial post).

I figured it would have been fairly obvious I was talking about the HTTP backend and HTTP requests, having mentioned wget, but I guess not.

Does it do that for the HTTP backend? This wasn't a question about the Google Drive backend.

The flags would do the same as they do in wget; see the man page excerpt in the initial post.

That already exists as --retries and --low-level-retries:

      --low-level-retries int                Number of low level retries to do (default 10)
      --retries int                          Retry operations this many times if they fail (default 3)
      --retries-sleep duration               Interval between retrying operations if they fail, e.g. 500ms, 60s, 5m (0 to disable)

No, it wasn't, as you shared an example from wget, which could apply to anything, so without more details I have to ask questions to make sure I understand what you are trying to do. The HTTP backend is a generic backend and would not have an exponential backoff/sleep; the other remotes are defined by the providers (Google / Dropbox / etc.), which specify how to deal with retries and have specific parameters in the configuration which rclone honors.

Without knowing the requirements for the HTTP site you are using, it's tough to guess what to set these to. I've never seen an HTTP site refuse connections unless you are hammering it or it has some other restriction in place, but that's all very specific to the website.

As noted above, I don't know what backend you were talking about as you didn't specify.

No, no, you are mistaken. I don't want it to wait on a retry; I want it to wait before it tries, not after it has tried and failed.

For example, say for argument's sake I wanted to download a directory of files and had --transfers set to 1. I'd like rclone to download the first file and then wait, say, 5 minutes before downloading the next file in that directory.

I believe rclone is hammering the website: it successfully downloads one file and then immediately moves on to the next without a wait, which also succeeds, then downloads the third, again without a wait, and so on until one fails, at which point it waits.

If you have --transfers 1, it'll download one file at a time before moving on. You can see that in the logs.

I don't think rclone would be hammering the website unless you are running some odd parameters.

Yes, it moves on without waiting, so it's downloading a huge number of files one after another with no wait in between.

But it's fine: I'll just ditch rclone, do the download with wget, save it locally, and then use rclone for the upload, as with wget I have the option of having it wait between downloads.

The website only allows 20 files to be downloaded every 5 or 10 minutes; I can't recall which off the top of my head.
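
For reference, a rough sketch of the workflow I mean (the URL, paths and remote name are placeholders; --wait=300 is the 5-minute gap):

    # Mirror the directory with wget, pausing 5 minutes between retrievals
    wget --wait=300 --random-wait -r -np -nH https://example.com/files/

    # Then upload the local copy with rclone
    rclone copy ./files/ gdrive:backup/files/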

Understood. That's a very niche case and you can always submit a feature request for this, but personally I'd agree with you and just use wget.

You can always grab a file list and script the waits yourself, as sketched below. That's a bit more cumbersome IMO, so I'd lean towards wget as well.
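
Something like this rough sketch, assuming an already-configured HTTP remote called httpsrc (the names, paths and 5-minute sleep are placeholders):

    # List the files once, then copy them one at a time with a pause in between
    rclone lsf --files-only httpsrc:files/ | while read -r f; do
        rclone copyto "httpsrc:files/$f" "./downloads/$f"
        sleep 300  # 5 minutes between files
    done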

If you want a wait between files, use --tpslimit:

  --tpslimit float       Limit HTTP transactions per second to this
  --tpslimit-burst int   Max burst of transactions for --tpslimit

If you want one file per second, use --tpslimit 1; if you want 10 per second, use --tpslimit 10.
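
For example (the remote and paths are placeholders; note that --tpslimit counts HTTP transactions, which for simple downloads is roughly one per file):

    # Download one file at a time, at most one HTTP transaction per second
    rclone copy httpsrc:files/ ./downloads/ --transfers 1 --tpslimit 1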

@ncw what if I wanted it to go really slow, like 1 file per 60 seconds?

Could we not get options equivalent to those in wget? These could be made to only apply to the HTTP backend, rather than being global flags.

Just use --tpslimit 0.01666 or whatever.

That's a possibility, but I think --tpslimit does pretty much what you need.

@ncw thanks for the reply. It seems --tpslimit-burst doesn't accept decimals, only --tpslimit does.

Could you possibly tell me the maths to get to that figure?

I've tried to figure it out in Excel but have been unable to work it out.

Anyway, I guess I will have to use that for now, but in the long term I think the proposed additional flags would be better from a usability point of view.

As in, it's much easier to understand that --wait 60m means wait 60 minutes between attempts than --tpslimit 0.01666, which doesn't tell someone else looking at the script that it means wait x minutes between attempts, without a comment in the script explaining how to convert the decimal value back into a human-readable one.

If we can get the extra options from wget that would be ideal.

Yes, this is an integer. It's the number of transactions rclone can have for free. The default means no burst is configured.

--tpslimit takes the number of transactions per second. If you want 60 seconds per transaction then you need to give --tpslimit 1/60, which is 0.0166666...
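
In other words, the arithmetic is just the reciprocal of the delay you want (figures rounded):

    # tps = 1 / seconds-between-transactions
    echo "1/60" | bc -l   # 0.01666... -> one transaction per 60 seconds
    echo "1/5"  | bc -l   # 0.2        -> one transaction per 5 seconds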

There could be a --tpswait 5s flag which is an alias for --tpslimit 0.2

Or I could make --tpslimit accept --tpslimit 1/60s as a syntax.

Of the two options, I think --tpswait would be the preferred one.

We could potentially also have a --tpswait-random flag; this would work the same way --random-wait does in wget:

--tpswait-random

Some web sites may perform log analysis to identify retrieval programs such as wget or rclone by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --tpswait option, in order to mask rclone's presence from such analysis.
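
If it were implemented, the behaviour could be sketched roughly like this (a hypothetical shell equivalent, not how rclone would actually do it internally):

    # Sleep for a random duration between 0.5*WAIT and 1.5*WAIT seconds,
    # mirroring wget's --random-wait (GNU sleep accepts fractional seconds)
    WAIT=60
    sleep "$(awk -v w="$WAIT" 'BEGIN { srand(); print w * (0.5 + rand()) }')"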

The pedant in me doesn't like the terminology here: you are saying "transactions per second wait" when you in reality mean "seconds of wait per transaction".

So from a pure terminology standpoint I prefer something indicating "wait/sleep per transaction", that is, something like --spt, --wpt, --tsleep, --twait, ... or alternatively, and probably better, something like --tpslimit-by-time to clearly show that it is an alternative to --tpslimit.

Or maybe --tpslimit-wait? The tpslimit prefix is good as it groups the flags in the user's mind and in the docs, and wait is what wget calls the option.

I thought of --sptlimit but I thought that was probably being too clever!

I can easily see the benefit for users knowing all the options of wget (such as @AeroMaxx), and it is so far the best proposal. I would just love to find a name that fits better within rclone's already established terminology, which favors something like --tpslimit-time or --tpslimit-duration - though these aren't perfect either - perhaps I am just too perfectionistic.

Clever and consistent: Yes - User friendly: NO :rofl:

@ncw Actually, I have just thought: what if I am using an HTTP remote as the source and a gdrive remote as the destination?

Won't restricting with --tpslimit also affect the gdrive remote?

I would like to backtrack a bit and ask for HTTP-specific config flags that only affect the source, such as those in wget (see the -w/--wait, --waitretry and --random-wait excerpts from the man page in my initial post).
