Fine tuning list/tree command for RAM/speed

Hi all, I'm fairly new to rclone, but we have access to it at our company and are looking to use it for our purposes, so I do apologise if this is fairly simple. Eventually we want to use it to move from GDrive to S3 storage, but for the time being we are trying to pull a list of all the files (currently just folders) present on GDrive. The issue I'm facing is finding a balance between speed and memory usage: the tree command included below failed after using up all 8 GB of RAM on the server when run over a larger dataset. The RAM can be increased, but I'd like to understand the ideal usage first, so I've been looking at how the command can be altered. So far, I've come to the following conclusions:

  • --fast-list does not really matter here: from what I can gather, it is RAM intensive and mainly saves on API calls, which are not an issue for us.
  • --use-mmap is apparently a good way of managing memory, as Go seems to have poor memory management and this can help with returning memory to the available pool.
  • --checkers int seems to be about running things concurrently. I can't tell whether it only applies to moving or copying files and the like, or whether it would also work here. Assuming it is useful here, I imagine I can bump it up until memory use approaches the upper limit and leave it there. I'm assuming more parallel processing means a faster command.

With this in mind, if anybody could clarify the above points or suggest the best way to balance memory against speed, that would be great. For now I can just leave a command running, and any larger directories can be split up once I start listing the actual files within the folders, so long-running commands aren't an issue. I just don't want it to take longer than required because I'm doing the wrong thing, or have it fail partway through by running out of RAM.

Run the command 'rclone version' and share the full output of the command.

Currently running, but can come back when done with the version.

Which cloud storage system are you using? (eg Google Drive)

GDrive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

```
rclone tree -d -v --use-mmap --fast-list --log-file log.txt --checkers=6 GDrive:"/Filepath/" >output.txt
```

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

Can include once current command has finished

welcome to the forum,

rclone lsd

If you only need folders, I would avoid the `tree` command for the big run. It has to build the pretty hierarchy, so it can be heavier than a plain listing. For a recursive folder list I would try: `rclone lsf -R --dirs-only GDrive:/Filepath/ > output.txt`. I would also leave `--fast-list` off while RAM is the problem, since it trades fewer API calls for more memory. `--checkers` can help concurrency, but I would start low and raise it only after watching memory on a smaller subtree.

Thanks for the advice. Is there a way to easily get this into a somewhat user-friendly format? Currently with tree I can use the hierarchy to create delimiters that split the output into columns based on folder depth in Sheets, which is very handy for sharing, and the visuals of the tree help with readability. Is tree likely to require much more time/memory than a recursive list?

For a shareable sheet I would not use tree as the first pass. Use something like:

`rclone lsf -R --dirs-only GDrive:/Filepath/ > folders.txt`

Then import/split on `/` in Sheets. That gives you the folder depth as columns without rclone needing to build the visual tree. `tree` is convenient for humans, but on a very large remote it can cost more memory because it has to keep enough state to print the hierarchy. `--fast-list` is also the first thing I would leave off for this job; it saves API calls but usually increases RAM. Start with low `--checkers` too, since Google API latency is often the limit here rather than local CPU.
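
If you would rather do the split before it reaches Sheets, here is a rough sketch. It assumes `folders.txt` has one directory path per line, each ending in `/` (which I believe is how `lsf` prints directories by default), and that no folder name contains a tab. The function name `paths_to_tsv` and the sample folder names are just made up for illustration.

```shell
# Turn "a/b/c/" style directory paths into tab-separated columns,
# one column per folder level. Reads stdin, writes stdout.
paths_to_tsv() {
  # Drop the trailing slash, then turn the remaining slashes into tabs.
  sed 's:/$::' | tr '/' '\t'
}

# Quick check with three sample paths (hypothetical folder names).
printf '%s\n' 'Projects/' 'Projects/2023/' 'Projects/2023/Reports/' | paths_to_tsv
```

On the real listing that would be `paths_to_tsv < folders.txt > folders.tsv`, and the resulting file can be imported into Sheets as tab-separated, which should put each folder level in its own column.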