Use RClone to locally cache torrents fully before seeding. Possible?

Hello,

First time posting here, I'm sorry if I didn't set up the post correctly.

The title basically summarizes my question. I want to use RClone to seed torrents from the cloud. The problem is that lots of random access requests and latency would just wreck performance, and I'd probably hit API rate limits and get my account locked out or suspended.

So what I'd like to do instead is: whenever a peer connects, the entire contents of the torrent are downloaded and fully cached on my local machine before seeding begins; then, after no peer has connected for, let's say, 24 hrs, the cache is flushed. I have gigabit internet, so it would never take more than a few minutes to fully cache a torrent before seeding starts. And of course I'd need a large enough cache so I don't run out of space. This would only be used for old torrents on a private tracker that hardly ever get downloaded. Is this at all possible to do, and how would I go about setting it up? What flags would I have to use? Would it work with any torrent client?

Alternatively, what would be even better: as soon as a peer requests a torrent, RClone starts caching it locally, and the torrent client starts uploading immediately at a slower speed than the caching/download speed, only serving the peer(s) chunks that have already been cached locally. So it's real-time caching plus uploading. That way the peer wouldn't have to wait for the whole torrent to cache (although again, with gigabit internet it doesn't really matter). But I'm thinking that for this method to work, the torrent client would have to know how to implement it, and my guess is that right now none do.

This has to be possible somehow with RClone, right? At least the former option with full caching before seeding.

Any advice would be greatly appreciated!

Torrents are not a great use case for cloud storage IMO.

You can use --vfs-cache-mode full and set the cache age to meet your requirements, as that capability is already there.
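
For example, a minimal sketch (the remote name gdrive:, the mount point, and the cache path here are placeholders I'm assuming, not anything from your setup):

rclone mount gdrive: /mnt/torrents \
--vfs-cache-mode full \
--vfs-cache-max-age 24h \
--vfs-cache-max-size 500G \
--cache-dir /path/to/cache

Since the cache evicts files that haven't been accessed within --vfs-cache-max-age, the 24h value lines up with your "flush after 24 hrs with no peers" idea.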

I researched this a little since posting. Indeed, it seems that --vfs-cache-mode full does not serve any data from a file until the file is fully downloaded. Just curious: can you use --vfs-read-chunk-size in combination with --vfs-cache-mode full so that the whole file doesn't have to be downloaded before any data is served to peers? Even with my fast internet, I'm concerned that timeouts might happen. And if there is a freeleech day or something else that triggers increased demand and I have to pull 5-6 large files, then it wouldn't take a few minutes, more like 30 mins, and peers' clients would disconnect.

The more I think about this, the less I think it's a good idea.

That’s not how it works. It serves what it has downloaded and works fine for your use case.

I am looking to use VFS caching to solve 2 problems:

  1. Caching (obviously), so the data only has to be downloaded once.
  2. Reducing API queries, and this is where chunking comes in. Torrents are often built with 500K - 16M pieces, which results in many small random access requests, but with the VFS you can override that and fetch 100MB or more at a time, which with a fast connection will be downloaded very quickly.
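     (Rough arithmetic: reading a 10 GiB file in 4 MiB pieces can take up to 10 GiB / 4 MiB = 2,560 separate requests, versus about 103 requests if fetched in 100 MiB chunks.)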

So how do I accomplish this? Is "--vfs-cache-mode full" all I need (plus setting location, size, etc.)? Does it use chunking by default, or does it wait until the whole file is finished before making it available? Can I use/specify the chunk size with it? I ask because this thread:

Says the chunk flag does not apply to cached remotes, which confuses me.

Props for searching, but that's a post back from 2018, and it's in regards to a new feature that was introduced for range requests (chunked reading), as opposed to grabbing entire files each time.

Full mode is explained here:

https://rclone.org/commands/rclone_mount/#vfs-cache-mode-full

and that's what you want to use. Start with the defaults and go from there; I don't know what remote storage you are using, so it's hard to suggest other things.

Hi, thank you very much for that. I am using Google Drive.

So from the document it seems that --vfs-cache-mode full uses chunks by default. Another thread I'd read here said that full mode needs to cache entire files before serving the data to a torrent client for seeding; is that not the case anymore?

And what is the default chunk size? Can I still use --vfs-read-chunk-size with full mode and specify the chunk size? I just want to avoid querying the cloud server API too many times, as is inherent in torrenting. Most torrents have 4MB pieces. I have fast internet and can get away with larger chunks, 100MB or more.

Sorry for asking so many questions, the doc is just not super clear on this.

I don't know what thread you are referring to, as the docs I linked and what I've said both state that isn't the case.

Just use the defaults and start there.

My example service file is here and has some notes in each line as to why I use it:
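
Roughly, it looks like this (illustrative sketch only, not the exact file; the remote name, paths, and sizes are placeholders):

[Unit]
Description=rclone mount (illustrative sketch, placeholder values)
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount gdrive: /mnt/torrents \
  --vfs-cache-mode full \
  --vfs-cache-max-size 500G \
  --cache-dir /var/cache/rclone
ExecStop=/bin/fusermount -uz /mnt/torrents
Restart=on-failure

[Install]
WantedBy=multi-user.target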

It's this thread. I guess it is old though. Thank you for confirming!

I really want to specify chunk size though, because my use case is not typical. I want to limit API calls and yet I want to be able to serve random chunks fast enough to torrent peers. This is why I wanted to know if I could use these flags in vfs cache full mode:

--vfs-read-chunk-size 100M --vfs-read-chunk-size-limit 0

I think the size limit is key, because I don't want the fetched chunk size to double each time; for a torrent, that would quickly balloon to the entire size of the file, and some files within torrents can be huge.
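(With doubling from 100M, a handful of sequential reads would already have it fetching 100M, then 200M, 400M, 800M, 1.6G per request.)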

I also thought about setting --drive-chunk-size, but I don't think it's applicable here because the torrent client handles the uploading.

These are the flags I want to run. Could you tell me what you think:

--use-mmap \
--dir-cache-time 96h \
--poll-interval=1h \
--tpslimit 10 \
--timeout 12h \
--fast-list \
--vfs-cache-mode full \
--vfs-read-chunk-size 100M \
--vfs-read-chunk-size-limit 0 \
--vfs-cache-max-size 1000G \
--vfs-cache-max-age 96h \
--cache-dir <name> \
--buffer-size 1G \

--vfs-read-chunk-size is for the range requests used when getting the file; it isn't the chunk size stored on disk.

For --tpslimit, it's best to start with the defaults, as you have 1 billion API calls per day.

--fast-list does nothing on a mount.

This goes against your not wanting to grab more data: with --buffer-size 1G, it's going to try to grab more. Best to just use the default.

On --use-mmap: do you have a low-memory machine? Using mmap also doesn't work for torrents, so that's probably not great for your use case.

There's no reason to make --poll-interval an hour. Google Drive is a polling remote, so unless you want to wait an hour for changes made outside the mount to show up, there isn't much reason to set this.

No reason to set this as it's already in the defaults.

What's the reason for the 12h --timeout?

Thank you very much for the replies!

I just want to make sure that rclone doesn't fetch/cache more than 100MB at a time from the cloud. Torrents are random access, so I don't want it to start fetching enormous chunks and slowing down the random access. I don't know how many simultaneous download streams it runs by default, but it would quickly get overwhelmed with huge chunks. Wouldn't this limit the chunks to 100MB, with no subsequent increases?

Sounds good; I thought this would help serve peers better. As long as it's limited to a 100MB chunk per peer request, I figured it shouldn't matter how much it buffers.

I read that my torrent client requires mmap, and it was also set by default in my seedbox's rclone config, so I thought I'd leave it in. Would it hurt to leave it in?

I know, but for my use case, changes to the remote drive will be made very rarely, so I figured there's no reason to poll it more than once an hour. I guess it doesn't really matter.

So does this mean that I will never get locked out by Google since rclone's API polling rate is limited to 10 per second by default?
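(At 10 transactions per second that's at most 10 x 86,400 = 864,000 calls per day, which seems far below the 1 billion you mentioned.)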

I figured, since it's for torrenting and I will implement a max number of active torrents, it's possible a peer might connect, disconnect and not be able to connect again for a while, so I just want to make sure rclone doesn't think it's an I/O timeout.

Can I also ask you what the difference is between --vfs-cache-max-age and --dir-cache-time?
And whether it makes sense to use --drive-chunk-size for my use case.

Sorry, I had it backwards and you are 100% correct: you need it for torrents.

The goal is always to push the API as much as you can, as you have plenty of quota. If you go too fast, it just slows you down. No harm, no foul.

People get confused because the upload/download quota errors look like API messages, but they are different things.

--vfs-cache-max-age is the time things can remain in the cache directory with --vfs-cache-mode full.

--dir-cache-time is how long the directory and file listings stay in memory. With a polling remote, you can set this super high.

--drive-chunk-size is only used for uploading.
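
Side by side (the --drive-chunk-size value below is just an arbitrary example, since it wasn't in your list):

--dir-cache-time 96h # how long directory/file listings stay in memory
--vfs-cache-max-age 96h # how long downloaded data may sit in the on-disk VFS cache
--drive-chunk-size 64M # upload chunk size; irrelevant for a read-only seeding mount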

Thanks! Just so we're not going back and forth too much, how does this look for flags:

--use-mmap \
--dir-cache-time 96h \
--timeout 12h \
--vfs-cache-mode full \
--vfs-read-chunk-size 100M \
--vfs-read-chunk-size-limit 0 \
--vfs-cache-max-size 1000G \
--vfs-cache-max-age 96h \
--cache-dir <name> \

Again, does my justification for using --vfs-read-chunk-size and --vfs-read-chunk-size-limit make sense? If not, do you mind explaining why I don't need these flags?

It doesn't have anything to do with cache mode full; it's just the HTTP range request that is sent via the API for 'chunked' reading, which I'm not sure I like as a term, but that's the flag.
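
To illustrate (this is standard HTTP, nothing rclone-specific): with --vfs-read-chunk-size 100M, rclone requests the file from the server in 100 MiB slices using the Range header, e.g.

Range: bytes=0-104857599

and only asks for the next slice as the reader gets that far.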

If the file opens, reads a bit, closes, it really doesn't matter unless it's too small which would require more API hits.

Run it for a bit, look at the results, and see if it's doing what you want. If not, make some changes. It's not really a science, more of an art, so you might need to make some adjustments as it bakes.

Oh I see, I don't know what HTTP range requests are. :frowning: You're saying that for seeding torrents, the size it grabs will reset after each hit? So the chunked reading is only for sequential reads, then? I guess it doesn't matter whether I set a limit or not; it can't hurt.

I think the way I'm imagining the reading and caching is fundamentally wrong. Anyway, I'll try it with the last set of flags I posted, and if there are issues I will remove the read-chunk-size flags. It's just that there are so many variables that it'll be hard to optimize with trial and error, so I just wanted to know the best possible settings for this use case. I think I'm pretty close though.

Thanks for your help!

Setting limits on things that don't need them can make things slower and generate more API calls as well.

Normally, a torrent client will hash check at the end, and that's one long read. By turning things off / limiting things, you make that take longer and create more API calls.
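
Rough arithmetic using the documented defaults (128M initial chunk, doubling, no limit): a 50 GiB sequential read is covered in about 9 range requests, since 128M x (2^9 - 1) is roughly 64 GiB, whereas a fixed 100M chunk with --vfs-read-chunk-size-limit 0 needs roughly 50 GiB / 100 MiB = 512 requests for the same read.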

My rule of thumb is to only change something if I have a good reason why after some testing.

Hm, okay then, so just this?

--use-mmap \
--dir-cache-time 96h \
--timeout 12h \
--vfs-cache-mode full \
--vfs-cache-max-size 1000G \
--vfs-cache-max-age 96h \
--cache-dir <name> \

Peers will only hash check their download once it's in their local storage though.

Looks like a great starting point.
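
Assembled into a full command, that would be something like this (gdrive:, the mount point, and the cache path are placeholders):

rclone mount gdrive: /mnt/torrents \
--use-mmap \
--dir-cache-time 96h \
--timeout 12h \
--vfs-cache-mode full \
--vfs-cache-max-size 1000G \
--vfs-cache-max-age 96h \
--cache-dir /path/to/cache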

Thank you, I will try it now.

You think --vfs-read-ahead would be beneficial here?
And out of curiosity, what happens if the cache fills up while it's all still being used? Does it fall back to another vfs cache mode, and if so, which one?

If you run out of disk space? Things generally go bad if a disk fills up.

vfs-read-ahead was put there for streaming, so I don't think it has any use for you.