Backward compatibility rules

A few months ago tardigrade was renamed to storj (see the introducing-storj-dcs-decentralized-cloud-storage-for-developers post on the Storj blog).

To avoid confusion, I would like to create a PR to rename tardigrade to storj, but I am wondering what the backward compatibility practices of rclone are:

  1. Is it fine to rename the full package and the CLI arguments? (It would break existing clients.)
  2. Or is it better to create a new storj provider, share the code, and eventually deprecate the old tardigrade connector?
  3. Or should we keep the existing (tardigrade) CLI parameters and update only the help texts/docs?

Please let me know what you suggest.

I think the first thing to do would be to create an issue so we can discuss on the issue tracker.

In general we don't like to break stuff for existing users, so whatever we decide we'd need a period of dual running.

Hi @elek,

I agree with Nick and see a 4th option that you may want to consider for your PR.

The 4th option is based on the information in this thread:
https://forum.rclone.org/t/tardigrade-storj-vs-s3-remote-for-the-same/26574
indicating that StorJ may fully support the rclone S3 backend, thereby eliminating the need for a specific StorJ backend.

Proposed steps for option 4:

Benefits:

  • StorJ users get a better rclone backend due to the higher number of users (and developers) on the S3 backend.
  • rclone complexity is slightly reduced by having fewer specialised backends. This eases maintenance which typically leads to increased overall robustness/stability.

Drawbacks:

  • None that I can see (with my very limited knowledge of Tardigrade/StorJ)

I think the storj backend is better than the s3 backend for some purposes (vague memory!).

However I can loop in the storj developers on an issue so we can ask them what they think.

Thanks, good points!

I missed a (small) performance test after the functional backend test to check if there are performance differences - both StorJ and rclone may have changed since the Tardigrade backend was made.

Do we have any (new or old) performance tests/data showing the benefits of using the specific backend instead of the S3 backend?

I can ask around, but based on the architecture, S3 is expected to be slower.

Using the s3 protocol requires a running storj/gateway-(mt|st) server, which receives the REST calls and transforms them into storj-specific rpc calls. The rclone backend uses the same rpc calls (using the same original client library), so it should be significantly faster.

BTW, thanks for the answers. I will open an issue and investigate how it can be done in the most backward-compatible way.

I guess it depends on the limiting component in the setup. If your speed is limited by your local system resources or network bandwidth, then it will be faster to use the S3 protocol than the StorJ protocol (assuming the S3 gateway has sufficient resources and bandwidth).

This is due to the StorJ storage architecture and illustrated by these examples:

If you upload a 100 MB file, then you only need to encrypt and upload approx. 101 MB using the S3 protocol, whereas you would need to encrypt and upload approx. 340 MB using the StorJ protocol.

If you download a 100 MB file, then you only need to download and decrypt approx. 101 MB using the S3 protocol, whereas you would need to download and decrypt approx. 130 MB using the StorJ protocol.

Also, the S3 protocol only uses one TCP connection per transfer (--transfers), whereas the StorJ protocol uses a minimum of 110 for each upload and 35 for each download, thus requiring significantly more system resources. Reference: https://rclone.org/tardigrade/#known-issues
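
To make the comparison above concrete, here is a minimal Python sketch that simply multiplies out the figures quoted in this post. The expansion factors (1.01, 3.40, 1.30) and the connection counts (1, 110, 35) are taken from the text above and are approximate; the function and table names are made up for illustration, and nothing here is measured.

```python
# Figures quoted in the post above; approximate and illustrative, not measured here.
EXPANSION = {
    "s3": {"upload": 1.01, "download": 1.01},
    "native": {"upload": 3.40, "download": 1.30},
}
CONNECTIONS_PER_TRANSFER = {
    "s3": {"upload": 1, "download": 1},
    "native": {"upload": 110, "download": 35},
}


def wire_megabytes(protocol, direction, file_mb):
    """Approximate MB the local machine sends or receives for one file."""
    return file_mb * EXPANSION[protocol][direction]


def tcp_connections(protocol, direction, transfers):
    """Approximate number of TCP connections for `transfers` parallel files."""
    return CONNECTIONS_PER_TRANSFER[protocol][direction] * transfers


if __name__ == "__main__":
    # 4 parallel transfers is used here only as an example value.
    for protocol in ("s3", "native"):
        for direction in ("upload", "download"):
            mb = wire_megabytes(protocol, direction, 100)
            conns = tcp_connections(protocol, direction, transfers=4)
            print(f"{protocol:6} {direction:8} {mb:6.1f} MB on the wire, {conns} TCP connections")
```

With several parallel transfers the connection count of the native protocol grows quickly, which is the resource concern raised above.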

Is my understanding correct or am I missing something?

Greetings, @dominick here with Storj. When using the storj (Native) backend, the encryption and erasure coding occur client side, which is more intensive on local compute and results in a 2.68x upload multiplier due to erasure coding.

Use our native integration pattern to take advantage of client-side encryption as well as to achieve the best possible download performance. Uploads will be erasure-coded locally, thus a 1 GB upload will result in 2.68 GB of data being uploaded to storage nodes across the network.

Use this pattern for

  • The strongest security
  • The best download speed

Using the S3 backend for uploading is faster, as the encryption and erasure coding occur on our edge services. The disadvantage is that you have to share the encryption key with us, as we encrypt for you.

Use our S3 compatible Hosted Gateway integration pattern to increase upload performance and reduce the load on your systems and network. Uploads will be encrypted and erasure-coded server-side, thus a 1 GB upload will result in only 1 GB of data being uploaded from your system.

Use this pattern for

  • Reduced upload time
  • Reduction in network load
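
As a rough illustration of the two patterns above, the following Python sketch estimates upload time when the local uplink is the only bottleneck. The 2.68x multiplier is the figure quoted above; the 1.0 multiplier for the hosted gateway, the 100 Mbit/s uplink, and the function name are assumptions for illustration, and latency, protocol overhead and CPU cost are ignored.

```python
NATIVE_UPLOAD_MULTIPLIER = 2.68     # erasure-coding expansion quoted above
HOSTED_S3_UPLOAD_MULTIPLIER = 1.0   # assumption: gateway expands the data server-side


def upload_seconds(file_gb, uplink_mbit_per_s, multiplier):
    """Seconds to push a file over an uplink-limited connection.

    Assumes 1 GB = 1000 MB and that the uplink is the only limiting factor.
    """
    wire_megabits = file_gb * 1000 * 8 * multiplier
    return wire_megabits / uplink_mbit_per_s


if __name__ == "__main__":
    for name, multiplier in (
        ("native", NATIVE_UPLOAD_MULTIPLIER),
        ("hosted s3", HOSTED_S3_UPLOAD_MULTIPLIER),
    ):
        seconds = upload_seconds(1, 100, multiplier)
        print(f"{name:10} 1 GB over a 100 Mbit/s uplink: about {seconds:.0f} s")
```

On a connection where bandwidth is not the limit, this estimate does not apply, and the native pattern's direct node connections may matter more.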

Ideally, users could choose between the "Native" and "Hosted S3" options. I can't post links here, but we have a recent write-up on performance. Search for "hotrodding decentralized storage", which should bring up my post on the Storj Community.

Thank you all!

Hi @dominik,

Perhaps, but at the expense of significantly increased usage of local computer and network resources - or speed.

The other possibility is to use the rclone crypt backend on top of the S3 backend to StorJ. It may well have enough security for the average rcloner.
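
A minimal sketch of what such a layered setup could look like, written as a small Python helper that prints an rclone config with a crypt remote wrapping an S3 remote. The option keys (`type`, `provider`, `access_key_id`, `secret_access_key`, `endpoint`, `remote`, `password`) are the usual rclone s3 and crypt options; the remote names, the endpoint, the bucket and the credentials are placeholders and assumptions, not values from this thread. The crypt password has to be stored in obscured form, which `rclone config` or `rclone obscure` produces.

```python
import configparser
import io


def build_rclone_config(endpoint, bucket):
    """Return rclone.conf text for a crypt remote layered on an S3 remote."""
    config = configparser.ConfigParser()
    # Plain S3 remote pointing at an S3-compatible gateway (placeholder values).
    config["storj-s3"] = {
        "type": "s3",
        "provider": "Other",
        "access_key_id": "YOUR_ACCESS_KEY",
        "secret_access_key": "YOUR_SECRET_KEY",
        "endpoint": endpoint,
    }
    # Crypt remote that encrypts client side before data reaches the S3 remote.
    config["storj-crypt"] = {
        "type": "crypt",
        "remote": f"storj-s3:{bucket}",
        "password": "OBSCURED_PASSWORD",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # The endpoint below is a hypothetical placeholder, not a real gateway address.
    print(build_rclone_config("https://gateway.example.invalid", "my-bucket"))
```

With a config like this, files copied to `storj-crypt:` would be encrypted locally by rclone, so the gateway only sees ciphertext from rclone's point of view.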

I read your Storj post and didn’t find any measurements to support this claim. I suspect you assume a local computer with resources and network connectivity like your S3 edge computers, e.g. being directly connected to the internet backbone.

I would expect the S3 protocol to be the fastest due to the multipliers of the StorJ protocol, and this is partly supported by this statement from your StorJ post: “Ultimately your compute, ram, or internet connection is likely to be the gating factor”

Do you have a performance comparison of rclone using the S3 protocol vs the StorJ protocol on an average computer connected to average ISP network connection?

You are absolutely correct. The only reason to upload via the storj backend is so you can be the custodian of your keys. It might not be most people's first choice, but it's a nice option to support for the more security conscious.

Over the weekend we did some testing and did find that the rclone backend outperformed ours, but it required the file to be split using a utility and transferred with --transfers values around 128. We saw around 3000Mb/s via our backend with parallelism of 192 and 6000Mb/s when using rclone and --transfers 128. The key is the load on our network: when downloading with our backend, it skirts our edge services (GatewayMT) and directly connects to the nodes holding the segments.

Really I'd love to see our product implemented like it is today, natively (currently referred to as tardigrade in rclone), and also under the s3 universal choices via the s3/rclone backend. Even better would be to set defaults like our 64MB block size when selecting us from the compatible s3 services.

Thank you for the time you have spent reviewing this!

-Dominick

Thanks for all the ideas and feedback.

I created a draft pull request ("backend/tardigrade: backend renamed to storj (tardigrade still works)", Pull Request #5613 in rclone/rclone), which shows how the backward-compatibility issue can be solved by using the same code for the "tardigrade" and "storj" backends.
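
To illustrate the general idea of one implementation reachable under both names, here is a small Python sketch of a backend registry with an alias. This is not rclone's code or API (rclone is written in Go, and the pull request may do this differently); every name below is hypothetical.

```python
import warnings

# Hypothetical registry: backend name -> (factory, name it is a legacy alias of or None).
REGISTRY = {}


def register(name, factory, alias_of=None):
    """Register a backend factory, optionally marked as a legacy alias."""
    REGISTRY[name] = (factory, alias_of)


def new_storj_fs(options):
    """Single shared implementation used by both names (placeholder body)."""
    return {"backend": "storj", "options": dict(options)}


def open_backend(name, options):
    """Look up a backend by name; warn when a legacy alias is used."""
    factory, alias_of = REGISTRY[name]
    if alias_of is not None:
        warnings.warn(f"backend '{name}' is now called '{alias_of}'", DeprecationWarning)
    return factory(options)


# New name and old name point at the same code, so existing configs keep working.
register("storj", new_storj_fs)
register("tardigrade", new_storj_fs, alias_of="storj")

if __name__ == "__main__":
    print(open_backend("storj", {"access_grant": "PLACEHOLDER"}))
    print(open_backend("tardigrade", {"access_grant": "PLACEHOLDER"}))
```

Keeping the old name as an alias gives the period of dual running mentioned earlier in the thread, and the warning can later be turned into a removal if that is ever decided.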

If I understand correctly, the only remaining question is whether to keep supporting the native tardigrade/storj backend or to focus on the s3 integration (4th option vs. the others).

To me, it seems better to keep both options:

  • They have different resource usage characteristics (with enough available resources, like a backup from a server, it can be better/faster to use storj/tardigrade)
  • They have different security guarantees (sharing the key or not)
  • As of today tardigrade is included as a backend, so I think it should be maintained for backward compatibility anyway. Renaming it just makes the usage less confusing

But I think it would be a good idea to improve the documentation of the tardigrade/storj backend by explaining the different guarantees and advantages/disadvantages discussed above, to help users choose. (TL;DR: on edge use s3; on a server OR if security is important use tardigrade/storj)
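
The TL;DR above can be written down as a tiny decision helper. This Python sketch only encodes the rule of thumb from this post; the function name and the two inputs are made up for illustration, and real choices will depend on measurements.

```python
def recommend_backend(limited_local_resources, must_hold_own_keys):
    """Rule of thumb from the post above, not an official recommendation.

    limited_local_resources: True for an "edge" machine with limited CPU or bandwidth.
    must_hold_own_keys: True if the encryption key must never be shared.
    """
    if must_hold_own_keys:
        return "native"  # client-side encryption, keys stay local
    if limited_local_resources:
        return "s3"      # encryption and erasure coding happen on the gateway
    return "native"      # server with enough resources


if __name__ == "__main__":
    print(recommend_backend(limited_local_resources=True, must_hold_own_keys=False))
    print(recommend_backend(limited_local_resources=False, must_hold_own_keys=False))
```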

What do you think?

I think we are taking activities and discussions in the wrong order.

Let me briefly recap the status as seen from my perspective:

The S3 protocol seems to be the best option for most uses of Storj, except for the very security-conscious users who are prepared to pay the price in equipment and/or speed.

I therefore propose the following sequence of your Storj activities:

  1. Make a GitHub issue to discuss the overall approach and plan
  2. Confirm full Storj S3 compatibility by running the automated S3 backend test against an S3 TestDrive having Storj as endpoint.
  3. Make a pull request to add Storj to the list of S3 providers on this page:
    https://github.com/rclone/rclone/blob/master/docs/content/s3.md
  4. Make a performance comparison of the S3 and tardigrade backends to collect information on the expected resource usage and speed for representative use cases using identical equipment.
  5. Make a pull request to update the tardigrade (and S3) backend documentation with this information.
  6. Make a pull request to rename the tardigrade backend to Storj, or to deprecate it in favor of Storj’s own client (https://www.storj.io/integrations/uplink-cli). The approach is to be based on the results and agreements obtained in the previous steps.

@ncw Please correct me if you see something missing/mistaken

I think that is an excellent idea. The notes in this post by @dominickmarino would go very well in the docs

Thank you for the suggestions @Ole.

I created an issue with the recommended action points: "Rename tardigrade->storj and clarify s3/native backend use cases" (Issue #5616 in rclone/rclone).
