DNS round robin to Dell ECS (S3)

Hi there, is there a setting where rclone will use all of the entries in a DNS round robin when connecting to a Dell ECS (an S3 object store)? So say my DNS record has 4 IPs in it, will rclone create 4 connections to the ECS and do multi-threaded copy?

Thanks!

I think you might be confusing two things.

DNS is just a lookup from a name to an IP. That happens in the networking layer of the OS, and rclone just uses the OS to do it.

If you want multiple connections, that happens via multi-threaded downloads.

Downloads support that but upload is still a single connection.

If I’m copying 100 files to a 4-node ECS it’s a lot quicker to do 4 x 25 than it is to do 1 x 100. Multi-part upload may also benefit from this (I haven’t been able to test yet).

That setting is --transfers, and you can make it even larger with S3 to get better performance.

Cool, so are you confirming that if I use --transfers=4 on my example above, that rclone will do 4 DNS lookups and (hopefully) get the 4 different IPs in the round-robin?

Thanks!

Sorry, I was not clear.

If you have --transfers 4 (which is the default), it does a DNS lookup and the OS picks the IP to connect to. It makes 4 connections to that IP and does 4 transfers at the same time.

Can I make a feature request (and if so, how) for an option (or the default) to do a DNS lookup per thread? Performance to multiple ECS nodes is significantly better than multiple threads to a single node.

Thanks.

There's nothing rclone would do there, as that is how the operating system handles name lookups.

What problem are you trying to solve, though, by routing to different IPs rather than just having more connections going out? I'd be fairly confident that with 4 connections to the same IP you are not exhausting Dell's bandwidth :slight_smile:

I’ve proven it with a product we use called Atempo Ada. When it comes to millions of small files, the CPU in each ECS node handling the transactions is much more of a bottleneck than the bandwidth.
As far as “that is how the OS does it” — that’s presumably only because that’s how you’ve written it. If you run the same nslookup from the command line 4 times in a row you will get 4 different IPs. So rclone would need to support this behaviour: give me all the IPs for the destination DNS record, and randomly choose n of them where n is the number of transfers requested.
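
As a rough illustration of the behaviour being requested here (not something rclone does), the idea can be sketched in a few lines of Python: ask the OS resolver for every address behind the name, then hand one address to each transfer. The function names `resolve_all` and `pick_endpoints` are hypothetical, and the 192.0.2.x addresses are documentation-only placeholders standing in for four ECS nodes.

```python
import random
import socket


def resolve_all(host, port=443):
    """Return every distinct IP address the OS resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def pick_endpoints(ips, transfers, rng=random):
    """Assign one IP to each transfer, spreading transfers evenly over the IPs."""
    if not ips:
        raise ValueError("no addresses to choose from")
    shuffled = list(ips)
    rng.shuffle(shuffled)
    # Cycle through the shuffled list so transfers > len(ips) still balances.
    return [shuffled[i % len(shuffled)] for i in range(transfers)]


if __name__ == "__main__":
    # Placeholder addresses; in practice these would come from resolve_all(name).
    nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    for slot, ip in enumerate(pick_endpoints(nodes, 4)):
        print(f"transfer {slot} -> {ip}")
```

With 4 addresses and 4 transfers each node gets exactly one transfer; with 8 transfers each node gets two. A real client would also have to keep using the original hostname for TLS and request signing while connecting to the chosen IP, which this sketch does not attempt.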

Does it make sense?

Generally, applications do a lookup to get an IP and the OS returns that.

If the OS returns a different IP back and it round robins, that would already work.

rclone doesn't do its own DNS resolution inside the application, as it leverages existing Go functionality to look up names:

https://golang.org/pkg/net/#hdr-Name_Resolution

It's definitely possible to write custom DNS resolution inside an application as you are suggesting but not something I would think most applications would do (just my opinion).

If you opened up new connections and the system gave back a new IP for that name, it would connect to it. That depends a bit on the system, though, as some do DNS caching and so would always return the same IP.
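
One way to check whether a particular machine actually rotates its answers, or caches and keeps returning the same one, is to resolve the name repeatedly and count which IP comes back first. This is only a minimal sketch: `first_address` and `count_first_addresses` are hypothetical helper names, and it goes through Python's resolver path, which may not behave exactly like the one rclone uses.

```python
import socket
import sys
from collections import Counter


def first_address(host, port=443):
    """Return the first IP the OS resolver hands back, as a simple client would use."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def count_first_addresses(host, attempts=20, lookup=first_address):
    """Resolve host repeatedly and count how often each IP comes back first."""
    return Counter(lookup(host) for _ in range(attempts))


if __name__ == "__main__":
    # Pass the round-robin name as the first argument; defaults to localhost.
    name = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for ip, hits in count_first_addresses(name).items():
        print(ip, hits)
```

If every lookup lands on a single IP, new connections from a client on that machine will most likely all go to one node; if the counts are spread across several IPs, the round robin is reaching the application.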

@ncw can chime in with any information, or if I've misstated something, as I've been wrong before and I'm sure I'll be wrong again :slight_smile:

Thanks Animosity for the replies and for guiding me in the right direction. I understand "generally" how applications do DNS resolution, however what I am asking for here is a very specific solution for an S3 object store that operates more quickly and efficiently when all nodes are being used in parallel, and was hoping that rclone could be expanded to support it.

My current use of Atempo Ada works great, but it is extremely complex software with a million settings, and is not very intuitive or efficient for quick and dirty jobs. rclone fills that gap nicely, where I just want a quick one-off copy from somewhere else to the ECS.

You mention what "most applications would do", and clearly you are right, but rclone is not "most applications", and certainly isn't in this very specific situation.

Rather than wait for @ncw to respond, does anyone have a suggestion of how to raise a formal feature request?

Thanks.

You'd open an issue on GitHub.