Help me set this up

Hello all:

I have a website with a very high number of daily downloads: thousands of files, adding up to many TB of bandwidth per day.
My data set is over 15 TB.
I currently host this data on several servers and load-balance traffic between them.
The server costs are killing my profit.
So I am thinking about moving to Google Drive (GD) and hosting the files, encrypted, across multiple GD accounts.
Here are my general points; I would love your help on how to accomplish them:

  1. GD will act as the storage backend, with low-end VPSes proxying the files to users.
  2. I will rotate downloads between multiple GD accounts to reduce the risk of hitting the daily download limit on any one account.
  3. I will cache files on the local hard drive to reduce the download rate between GD and the servers.
  4. I will put SSD boxes in front of GD to pass the files to visitors.
  5. Each GD account will be mounted locally.
  6. Each box will be linked to one GD account.
  7. I would prefer to sync my files from the current storage servers to the multiple GD accounts in one go.

This is my general idea, and I would like your feedback about the best config to use.

I am new to rclone, so please excuse all the questions.

Thank you all

What you are laying out here should be possible to do. I don't intend to lay out a whole setup for you, but I can give some general feedback and ideas about what to look at; then you can come back with more specific questions when you run into a problem.

First of all, keep in mind that there are some inherent performance considerations - for example, small files will perform very poorly. If that matters for your use-case, look into what those limitations are and how best to mitigate them before you commit to the setup.

Also, you should have realistic expectations when it comes to serving many concurrent downloads at the same time. This is not an issue on a local filesystem, but it is much more of a limitation with cloud storage, as there are limits on how frequently requests can be made. Cloud storage is great, but just because it can be mounted to appear as a local drive, it acts very differently under the hood - and this is something you will need to study and test to see if it is actually a plausible replacement for your existing solution. If it's just a relatively few very large files then it won't be so much of a problem, but if it's a lot of medium and smallish files being served to many concurrent users then I foresee you will run into some trouble. There could be ways of load-balancing around those limits with multiple accounts, however...

You will probably also want to use the Rclone cache backend. Especially if you have "hot" files that tend to get accessed much more than others in a given period. The more cache the better, as performance from the cache will be wildly better than running requests to the cloud frequently.
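A minimal sketch of what such a cache remote could look like in rclone.conf (the remote names `gdrive` and `gcache` are placeholders, and the values are just starting points to tune):

```ini
# Sketch only - a cache remote wrapped around a Drive remote.
[gdrive]
type = drive
# client_id, token etc. as generated by "rclone config"

[gcache]
type = cache
remote = gdrive:files
chunk_size = 64M          # size of each chunk fetched from Drive
chunk_total_size = 200G   # cap the on-disk cache so it fits the local SSD
info_age = 1d             # how long directory/file metadata is trusted
```

You would then point your mounts and serving processes at `gcache:` rather than directly at `gdrive:`.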

Could you elaborate on what you mean by a "multi GD account"? I don't know if this is a thing I just don't know about, or if you mean it in the sense that you will script something to load-balance across several accounts. If it's an actual type of Google account, it sounds interesting and I'd like to know more.

@thestigma, thanks for the reply.
My files are split into 500 MB parts, with a few small files as well.
The overall size is about 15 TB.
My general idea is to host those files across more than 20 GD accounts (this is what I mean by "multi GD account").
I will sync my current files to these accounts, then each account will be mounted on one server with about 250 GB of SSD storage.
Whenever a visitor requests a file, they will be routed to one server based on their origin and overall traffic.
The files will be encrypted on Google Drive and cached on each server for as long as possible.
My current daily bandwidth is between 50 and 100 TB.
I need to know the correct config to auto-sync my current files (including future ones) to these multiple Google Drive accounts at once.
Also the correct config to mount the Google Drive account on each server for maximum performance with a low chance of penalties.

Thank you for your feedback

That sounds doable then - if you can get the load-balancing aspect to work right, it should work in principle.
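As a trivial illustration of the rotation idea (this is not part of rclone - just a sketch of hash-based routing; the IP and box count below are made up):

```shell
# Sketch: route a visitor to one of N boxes by hashing their IP, so the same
# visitor keeps landing on the same box and benefits from its warm cache.
# Usage: pick_box <visitor-ip> <number-of-boxes>   -> prints e.g. "box7"
pick_box() {
  # cksum gives a stable 32-bit checksum of the IP string
  sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo "box$(( sum % $2 + 1 ))"
}

pick_box 203.0.113.7 20
```

A real setup would do this in the load balancer or DNS layer, and also weigh in per-box traffic as you describe - but sticky, hash-based routing is what makes per-box caching pay off.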

For the Gdrive remote, the only extra config parameters I think you will need are
upload_cutoff = 256M
chunk_size = 256M
(my current values, for illustration)
The chunk size is a pitiful 8MB by default, and increasing it greatly speeds up transfers of large files (with some diminishing returns). I find 64MB adequate, 128MB ideal and 256MB best. Beyond this there isn't a lot to be gained, and from looking at my network graphs I'm actually a bit uncertain whether there might be some sort of cap, as I don't seem to see larger segments transferring above this number (not well tested, just a casual observation).
This only affects upload performance, mind you. Also keep in mind that each transfer can then use that much memory, so be sure you don't overload the memory - set it to something reasonable.

Similarly, you will want to use large chunks in your cache, as this does the chunking on the download side. If you wanted media streaming and quick opening of streams you'd actually want to avoid overly large chunks, but for pure throughput, and if you don't need streaming, the larger the chunks the better - for much the same reason as with uploads. That said, you might want to avoid excessive sizes, as serving files from the cache can't start until at least one chunk has been downloaded, so visitors might find downloads "sluggish" to start if the chunks are too big. Try to set a size where you can expect the first parts to land in the cache within a handful of seconds. 64MB might be a good place to start, but it really depends on the bandwidth you have.
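For the mount itself, something along these lines could be a starting point. The flags and paths are assumptions to tune, and `gcache:` stands for whatever cache-wrapped remote you configure; the function only prints the command so you can review it first:

```shell
# Dry-run sketch: prints a candidate mount command for one box.
# Remove the "echo" to actually run it.
print_mount_cmd() {
  echo rclone mount gcache: /mnt/files \
    --allow-other \
    --buffer-size 64M \
    --dir-cache-time 72h \
    --daemon
}

print_mount_cmd
```

`--allow-other` lets your web server user read the mount, and a long `--dir-cache-time` cuts down on listing requests since your file tree rarely changes from the serving side.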

You will probably also want to raise the number of transfers in rclone generally, to handle several concurrent users. This increases the risk of bumping into the API request limits (1,000 requests per 100 seconds), but it should be much less of an issue when serving mostly large files.
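On the sync side of your question, the per-account pushes could be scripted along these lines. `gd1:` through `gd20:` are hypothetical remote names and `/srv/files` is a made-up source path; the script only prints the commands so you can check them - drop the `echo` to run for real:

```shell
# Dry-run sketch: print one "rclone sync" command per GD account remote.
# --transfers raises concurrency; --fast-list cuts down on API calls.
print_sync_cmds() {
  for i in $(seq 1 "$1"); do
    echo rclone sync /srv/files "gd$i:files" --transfers 8 --fast-list
  done
}

print_sync_cmds 20
```

Run from cron, this also covers future files, since rclone sync only transfers what has changed since the last run.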

I think that's most of the optimization stuff I can think of for the moment. Feel free to post your config when you have something up and running, and others here and I can comment on what you might consider changing at that point. Just make sure to redact your sensitive information (crypt keys and API keys - basically all lines that have those randomly generated codes in them :slight_smile: )


I can't thank you enough.
Your info is very helpful.

Very happy to help :slight_smile: Let me know when/if you need more guidance.

I am by no means an rclone expert myself, but I can probably at least help explain the more basic configuration stuff and save you a lot of time there. The rclone documentation is technically pretty robust, but it can also be a little confusing at first. It's a lot to take in, and it's not easy to tell which bits are the important ones for your use-case.
