Memory + Request problem with Rclone Google Drive


Let me start by saying I love rclone, it's an amazing tool. I've been using it to transfer large numbers of files from my users' Dropbox, Drive, Box etc. to my own S3 bucket. The way I've been doing it is by having the user pass me an access token, calling "rclone ls" on their files (through a remote I create for them) and then using Promise.all to call "rclone copyto user-drive-remote:path/file1 my-s3-remote:file1-uniqueId" on each file in one go (I use Node).

This is where the problems start. It's been working great with very small quantities of files, but when a slightly larger folder (300 MB+) is involved it starts giving me a few problems:

  • Drive says that I submitted 735 requests and told me that I had surpassed my daily quota. There are no more than 50 files so this is very weird.

  • The memory usage gets maxed out. I'm using AWS Lambda to do this and the maximum memory I can allocate is 3GB. Should I have my Lambda function call other Lambda functions for each file?

  • I get a MaxListenersExceededWarning. So far I've just kind of hacked around it like this:
    require('events').EventEmitter.defaultMaxListeners = 100;

I've attached a gist below if you want to take a look at the code (minus the sensitive info):

Any help/tips/advice on this matter would be incredibly appreciated. Would love to hear your feedback. Thanks!!!

If you are invoking rclone many times then there is an overhead each time - it has to log in and find the directory you are working on. This may be what is causing the quota problems, the memory problems and the too many sockets, especially if you are running a great number of them in parallel.

The best solution to this would be to start one rclone with rclone rcd and then use the rclone API - this will be a lot more efficient and should hopefully solve your problems! You should find the API easy to use from Node - it is posting and retrieving JSON blobs.
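As a rough sketch of what that could look like from Node: the snippet below assumes a single `rclone rcd --rc-no-auth` process is already running on the default rc address (localhost:5572) and posts a JSON blob to the `operations/copyfile` endpoint. The helper names `buildCopyRequest` and `copyFile` are made up for illustration, the remote names are the ones from the question, and the error handling assumes the rc server reports failures as JSON with an `error` field. It also assumes Node 18+ for the global `fetch`.

```javascript
// Assumption: `rclone rcd --rc-no-auth` is running locally on the default address.
const RC_URL = 'http://localhost:5572';

// Build the HTTP request for one copy (hypothetical helper, kept separate
// from the network call so it is easy to inspect and test).
function buildCopyRequest(srcFs, srcRemote, dstFs, dstRemote, baseUrl = RC_URL) {
  return {
    url: `${baseUrl}/operations/copyfile`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ srcFs, srcRemote, dstFs, dstRemote }),
    },
  };
}

// Ask the long-running rclone process to copy a single file.
async function copyFile(srcFs, srcRemote, dstFs, dstRemote) {
  const { url, options } = buildCopyRequest(srcFs, srcRemote, dstFs, dstRemote);
  const res = await fetch(url, options); // global fetch, Node 18+
  const body = await res.json();
  if (!res.ok) {
    throw new Error(`rclone rc call failed (${res.status}): ${body.error}`);
  }
  return body;
}

// Example call, using the remotes from the question:
// copyFile('user-drive-remote:', 'path/file1', 'my-s3-remote:', 'file1-uniqueId');
```

Because every copy goes through the same rclone process, the login and directory lookup should not have to be repeated from scratch for each file.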

The other alternative would be to limit the number of rclones you run at once. I'm not sure how you do that in node but I'm sure there is a way!
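One possible way to do that in Node, as a minimal sketch: instead of handing every file to Promise.all at once, start a fixed number of "runners" that each pull the next file off a shared list. `mapWithLimit` and `copyOne` are hypothetical names, the `{ src, dst }` shape of each file entry is an assumption, and the limit of 4 is arbitrary.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

// Run `worker` over `items`, never more than `limit` at the same time.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners (at least one); each handles one item at a time.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}

// Hypothetical worker: one `rclone copyto` per file, as in the question.
function copyOne(file) {
  return execFileAsync('rclone', [
    'copyto',
    `user-drive-remote:${file.src}`,
    `my-s3-remote:${file.dst}`,
  ]);
}

// Usage: at most 4 rclone processes alive at once.
// await mapWithLimit(files, 4, copyOne);
```

If one copy fails, the returned promise rejects straight away, but runners that are already in flight carry on with their remaining items, so you may want to catch errors inside the worker and collect them instead.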

@ncw thanks for the reply! I created a new topic for a follow-up question:

I can definitely limit the number of rclones that run at once to reduce the memory problems. How does the rclone API use fewer requests than running the rclone ls command once and then running rclone copyto on each file, though? Is there a way to reduce the requests without switching to the API (besides --fast-list on the ls command)?

It seems 97% of my requests to drive are drive.files.list methods (1700 calls for ~50 files), 1.5% are drive.files.get and the last 1.5% are drive.about.get.

There is a certain overhead each time you run rclone: it needs to make sure it has a valid token and find the directory you've passed in. If you run rclone with -vv --dump headers you'll get an idea of the requests it makes.

You need rclone to be persistent in some way so you could use rclone mount or one of the rclone serve servers.
