I've been using rclone to copy files from various users' cloud storage accounts (Dropbox, Box, Drive...) to my own AWS S3 bucket. So far I've tried doing it in two different ways:
Using a "master" serverless function (AWS Lambda) that uses a queue to spawn "worker" lambda functions; each worker function then uses the copyto command to asynchronously copy over (~50) files.
The problem here is that after spawning 20 worker functions (~1000 files), the following worker functions hang and just time out after a while. Is it Dropbox throttling the transfer, or is it something else I'm not thinking about? I don't get any errors, just a timeout once the Lambda time limit is reached...
I've also tried using the normal rclone copy command (because I assumed rclone took throttling into consideration) but it's kinda slow (~50min to transfer 9GB of data from a UK Dropbox account to our US East 1 S3 bucket). Is there any way to speed it up besides the --no-traverse and --fast-list flags?
Any advice would be immensely appreciated!
If you just want more concurrent transfers, then all you need to do is set the --transfers flag (to however many you want).
But while your S3 system can probably handle a lot of concurrent transfers, those other systems probably have API and file-access limits. I can't give you exact details for all of them (you'd have to look them up), but Google Drive, for example, can handle 1000 API calls per 100 seconds and lets you access about 2 files a second. That is, you can start about 2 new transfers every second. I don't think there is a maximum on concurrent transfers though. On larger files this is usually more than enough, but on small files it may not be.
50min for 9GB seems very slow even for Dropbox, but again it might make sense if it's a lot of small files and Dropbox has tight rate limits (I don't know the exact details, but I kind of assume so).
The chunk size (--s3-chunk-size) may also have a great impact on larger files uploaded by rclone to S3.
I recommend 64M if you can afford the memory, but keep in mind that EACH active transfer can potentially use this much RAM, so don't overload or rclone will crash. Higher would be even better theoretically, but each doubling brings reduced gains. Going from 5M to 64M would be a very noticeable gain on larger files for an affordable memory price.
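To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch. It only encodes the rule of thumb from the post (one chunk-sized buffer per active transfer); the function name is made up for illustration, and real usage may be higher depending on how rclone buffers uploads.

```python
def upload_buffer_mib(chunk_size_mib: int, transfers: int) -> int:
    """Rough estimate of chunk-buffer memory in MiB.

    Assumption: one chunk-sized buffer per active transfer, as described
    in the post. Actual rclone memory use may be higher.
    """
    return chunk_size_mib * transfers


# Compare a few chunk sizes at 4 and 16 concurrent transfers.
for transfers in (4, 16):
    for chunk in (5, 16, 32, 64):
        total = upload_buffer_mib(chunk, transfers)
        print(f"{transfers:>2} transfers x {chunk:>2} MiB chunks -> about {total} MiB")
```

On a memory-constrained runner (a Lambda function, for instance) this is worth checking before raising both values at once.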
This won't have any effect on a server-side transfer if that is what you are doing though.
--fast-list and --no-traverse only affect listings, so they are unlikely to be related to your problem.
Try running your command with -P to get a progress indicator where you can monitor the speed and progress of your transfers to get an idea of what is happening behind the scenes.
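As a sketch of how the suggestions so far could be combined when launching rclone from a script, the snippet below builds one copy command. The remote names (`dropbox:`, `s3:my-bucket/backup`), the helper name, and the default values are hypothetical, and `--transfers` / `--s3-chunk-size` are my reading of the flags being discussed, so check them against `rclone help flags` for your version.

```python
import shlex


def build_copy_command(src: str, dst: str, transfers: int = 8,
                       s3_chunk_size: str = "64M") -> list[str]:
    """Build an argv list for a single `rclone copy` run.

    The defaults are illustrative, not recommendations.
    """
    return [
        "rclone", "copy", src, dst,
        "--transfers", str(transfers),      # number of concurrent file transfers
        "--s3-chunk-size", s3_chunk_size,   # upload chunk size for the S3 side
        "-P",                               # live progress output
    ]


cmd = build_copy_command("dropbox:", "s3:my-bucket/backup")
print(shlex.join(cmd))
# To actually run it (requires rclone and configured remotes):
# import subprocess; subprocess.run(cmd, check=True)
```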
Are you using the API to do this? Running a master rclone rcd and using the API to run operations/copyto with the _async flag would work better. That may be what you are doing though.
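A minimal sketch of what such an async call might look like from Python, assuming `rclone rcd` is listening on its default address `localhost:5572`. I use the `operations/copyfile` endpoint here because it is the single-file rc call I am aware of; treat the endpoint name, the remote names, and the file paths as assumptions and check them against the rc documentation for your rclone version. Authentication is left out, and the request is only built, not sent.

```python
import json
import urllib.request

RC_URL = "http://localhost:5572"  # assumption: default rclone rcd address


def build_async_copy_request(src_fs: str, src_remote: str,
                             dst_fs: str, dst_remote: str) -> urllib.request.Request:
    """Build (but do not send) an rc request that copies one file asynchronously."""
    payload = {
        "srcFs": src_fs,          # e.g. "dropbox:"
        "srcRemote": src_remote,  # path of the file inside the source remote
        "dstFs": dst_fs,          # e.g. "s3:my-bucket"
        "dstRemote": dst_remote,  # path of the file inside the destination
        "_async": True,           # return a job id immediately instead of blocking
    }
    return urllib.request.Request(
        f"{RC_URL}/operations/copyfile",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_async_copy_request("dropbox:", "docs/report.pdf",
                               "s3:my-bucket", "docs/report.pdf")
print(req.full_url)
# Sending it needs a running `rclone rcd` with suitable auth settings:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # expected to contain a job id to poll later
```

The point of the single master process is that all copies share one set of connections and one pacer, rather than 20 independent rclone processes each hitting the provider on their own.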
If you are running 20 rclone instances simultaneously, then I expect you are being throttled by Dropbox for the maximum number of simultaneous connections.
@thestigma's advice is good here
There are about 5000 files total in the Dropbox account so on average ~1.8mb. Is that considered small? Thanks so much for your advice!
Thanks for your advice too! I wish I could mark two answers as "solutions"...
1.8MB (which I assume you meant) is fairly small yes.
But obviously that really depends on your bandwidth + the API limits in question, because what we are really talking about here is the time it takes for each file to transfer. If the transfer time for each is trivial (which it probably is for 1.8MB) then you will be much more limited by how quickly the API will let you start new transfers than you are in terms of bandwidth.
I'd call any file that would theoretically transfer in less than a second on your bandwidth "small" in this context. For my own 150Mbit connection, for example, 5MB or below is definitely small, and getting my full bandwidth used on that (on Google Drive at least, with its 2 files/sec) is not really possible. Transferring 1000 tiny text documents can be a hassle, as it takes a lot of time even if my bandwidth is hardly getting used.
On service providers with less restrictive APIs you can compensate by using a lot of transfers. On others you just have to make do (or else maybe archive your small files together into one big file, for example).
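To put numbers on this, here is a small sketch comparing the two limits. It plugs in figures mentioned in this thread (about 5000 files of roughly 1.8MB, a 150Mbit link, and Google Drive's roughly 2 file starts per second); the function names are made up, and Dropbox's real limits are not known here, so treat the result as an illustration only.

```python
def transfer_seconds(size_mb: float, bandwidth_mbit: float) -> float:
    """Time to move one file at full bandwidth (megabytes vs. megabits per second)."""
    return size_mb * 8 / bandwidth_mbit


def min_total_seconds(n_files: int, size_mb: float,
                      bandwidth_mbit: float, starts_per_sec: float) -> float:
    """Lower bound on total time: whichever limit is slower wins."""
    start_limited = n_files / starts_per_sec
    bandwidth_limited = n_files * transfer_seconds(size_mb, bandwidth_mbit)
    return max(start_limited, bandwidth_limited)


# A 5MB file on a 150Mbit link takes well under a second, so it counts as "small".
print(transfer_seconds(5, 150) < 1)

# 5000 files of ~1.8MB: bandwidth alone would need 480 s,
# but 2 file starts per second needs 2500 s (roughly 42 minutes).
print(min_total_seconds(5000, 1.8, 150, 2))
```

With these assumed numbers the start rate, not the bandwidth, sets the floor, which is the situation described above.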
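The archiving idea can be sketched with Python's standard `tarfile` module: bundle many small files into one archive so that only a single large object has to be transferred. The helper name and the in-memory approach are just for illustration; for real data you would more likely write the archive to disk.

```python
import io
import tarfile


def bundle_files(files: dict[str, bytes]) -> bytes:
    """Pack small in-memory files into a single gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = bundle_files({"a.txt": b"hello", "b.txt": b"world"})

# Read the archive back to confirm both files are inside.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    print(tar.getnames())
```

The trade-off is that individual files are no longer directly accessible at the destination until the archive is unpacked.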
I hope that was understandable