Copying Files Inside Directories That Have a Very Large Number of Files and Subdirectories

This is a bit of a monologue for anyone else who is doing their google-fu and trying to get rclone to work in this particular use case. Not really a question/issue. But if anyone thinks it's worth a PR to the documentation, please let me know! Also, any corrections on any of our misunderstandings would be deeply appreciated. And thanks ncw for the amazing software!

We have a single folder with somewhere between 20k and 50k files and directories on Google Drive. We are downloading from a remote to a local drive, not uploading. We use it to deploy files to servers on a low-risk, non-mission-critical basis with lots of time to fail and retry. Lastly, the disk is not encrypted, and the files range from a few KB to a few GB. We initially expected rclone to behave more like wget or curl, where we would have a bash script with hundreds of lines repeating “rclone copy src dst”. However, we ran into the following problems and reached a final resolution after trawling through the issues and docs.
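For context, this is roughly the kind of script we first had in mind; the remote name and paths below are placeholders, not our real layout:

```bash
#!/usr/bin/env bash
# Hypothetical wget/curl-style deploy: one rclone copy per file.
# "gdrive:" and the paths stand in for our actual remote and folders.
rclone copy gdrive:deploy/bigfolder/app-v2.tar /srv/deploy/
rclone copy gdrive:deploy/bigfolder/config-prod.yaml /srv/deploy/
# ...hundreds more lines like these
```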

copy: copy is a subset of “sync without delete” and will always attempt to mark all files within the given root folder as “excluded from sync and deletion”. So if we want to copy 100 files, it will perform the listing and exclusion 50k * 100 times. We sang hallelujah when we saw --no-traverse, but it seems this only skips traversing the dst and still traverses the src. Is --no-traverse on the src something that cannot be done, or is it antithetical to rclone?
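For reference, this is the shape of the command we were testing (remote and paths are hypothetical); as far as we can tell, each invocation still lists the whole source folder:

```bash
# --no-traverse skips listing the destination, but (as far as we can tell)
# the 50k-entry source folder is still listed on every invocation.
rclone copy --no-traverse gdrive:deploy/bigfolder/app-v2.tar /srv/deploy/
```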

copyto: copyto also has a very long “handshake” at the beginning, and even with the -vv flag it doesn’t show anything, so we are not sure whether it is also listing the whole directory. We see “file.tar: Couldn’t find file - need to transfer” almost immediately. Then, about 30 seconds later, it does manage to find the file and download it!
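The invocation that produces this behaviour looks roughly like this (remote and paths are again placeholders):

```bash
# Single-file copy with verbose logging; the long pause happens before
# the transfer starts, even though -vv prints very little in the meantime.
rclone copyto -vv gdrive:deploy/bigfolder/file.tar /srv/deploy/file.tar
```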

copy with --filter-from or --include-from won’t work for us because we would have to construct some pretty complicated regex and wildcard patterns to pick the correct files, which would hurt future maintainability and developer productivity. Our Python-based picker is easy to read and understand: it performs simple brute-force loops and just spews all the file names into a single text file. This also allows us to log and audit what we really downloaded in any deployment scenario. Regex patterns cannot do anything like this.
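For the curious, a minimal sketch of the kind of picker we mean; the selection rules, file names, and helper function are hypothetical stand-ins, not our actual code:

```python
#!/usr/bin/env python3
"""Brute-force picker: choose the files for a deployment and write them
to a plain text list that rclone's --files-from flag can consume."""

# Hypothetical selection rules; the real picker reads a deployment manifest.
WANTED_PREFIXES = ["app/v2/", "config/prod/"]
WANTED_SUFFIXES = [".tar", ".yaml"]

def pick(all_remote_paths):
    """Return the subset of remote paths we actually want to deploy."""
    selected = []
    for path in all_remote_paths:
        if any(path.startswith(p) for p in WANTED_PREFIXES) and \
           any(path.endswith(s) for s in WANTED_SUFFIXES):
            selected.append(path)
    return selected

if __name__ == "__main__":
    # In reality this listing comes from our own inventory of the folder.
    with open("remote_listing.txt") as f:
        listing = [line.strip() for line in f if line.strip()]
    files = pick(listing)
    with open("files-to-download.txt", "w") as out:
        out.write("\n".join(files) + "\n")  # this file doubles as our audit log
```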

copy with --files-from is our final choice. It still does the same list-and-exclude pass, but it only does it once at the beginning. However, without this issue post, we would have scratched our heads a little longer: https://github.com/ncw/rclone/issues/1620 - so an update to the documentation would be great!
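Concretely, the command we ended up with looks roughly like this; the remote name, list file, and destination are placeholders:

```bash
# files-to-download.txt holds one path per line, relative to the source
# root, exactly as written out by our picker script.
rclone copy --files-from files-to-download.txt gdrive:deploy/bigfolder /srv/deploy
```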

copyto with --files-from results in the same behavior as copy with --files-from.

copy with --fast-list: --fast-list doesn’t work on Google Drive, so that’s that. However, if S3 had 50k to 100k files in a single directory, would it be a lot faster?

API Token Refresh: a slight digression from the main topic. Upon doing rclone config and following the steps, it doesn’t save my Google Drive client ID and secret key. So every time the auth token expired, it didn’t renew automatically and my entire download failed. I had to open the config file (~/.config/rclone/rclone.conf) and type them in manually, and then it worked. Is this a feature or a bug?
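In case it helps anyone: the manual fix was just adding the client_id and client_secret lines to the remote’s section of ~/.config/rclone/rclone.conf. The remote name and values below are placeholders, and the token line is the one rclone config writes for you:

```ini
[gdrive]
type = drive
client_id = 1234567890-example.apps.googleusercontent.com
client_secret = your-client-secret-here
token = {"access_token":"...","token_type":"Bearer","refresh_token":"...","expiry":"..."}
```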

Thank you and PRs to clarify the docs are always appreciated.

If you want to see exactly what rclone is doing then use the -vv --dump-bodies flags (with --dry-run so you don’t copy any data).
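For example, with a hypothetical remote and destination:

```bash
# Dumps every HTTP request/response body at debug verbosity;
# --dry-run ensures no data is actually copied.
rclone copy -vv --dump-bodies --dry-run gdrive:deploy/bigfolder /srv/deploy
```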

copyto should be what you want for copying individual files. “We only see ‘file.tar: Couldn’t find file - need to transfer’ almost immediately” - that means it couldn’t find it locally.

That is probably what I would have suggested (eventually!)

I don’t think that is supported! It should probably give an error.

--fast-list is all about recursing through the directory - with your one flat directory it probably won’t make any difference.

That is a bug. I’ve had other reports of it. I’ve never managed to reproduce it though. Are you running multiple rclones by any chance? If you can make a reproduction for me then please make a new issue on github!