This is a bit of a monologue for anyone else who is doing their Google-fu and trying to get rclone to work in this particular use case. It is not really a question or an issue, but if anyone thinks it's worth a PR to the documentation, please let me know! Any corrections to our misunderstandings would also be deeply appreciated. And thanks ncw for the amazing software!
We have a single folder on Google Drive containing somewhere between 20k and 50k files and directories. We only download from the remote to a local drive and never upload. We use it to deploy files to servers on a low-risk, non-mission-critical basis with plenty of time to fail and retry. The disk is not encrypted, and the files range from a few KB to a few GB. We initially expected rclone to work more like wget or curl, so that we would have a bash script with hundreds of lines repeating “rclone copy src dst”. However, we ran into the following problems and only reached a final resolution after trawling through the issues and docs.
copy
copy behaves like “sync without delete”, and as far as we can tell it always lists every file under the given root folder and marks the ones it is not transferring as “excluded from sync and deletion”. So if we copy 100 files with 100 separate commands, it performs the listing and exclusion over all 50k entries 100 times. We sang hallelujah when we saw --no-traverse, but it seems this only skips traversing the dst and still traverses the src. Is no-traverse on the src something that cannot be done, or that is antithetical to rclone?
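To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes (this is our understanding, not something we verified in the rclone source) that every copy invocation lists the whole source folder once. The function name and the figures are only for illustration.

```python
def listed_entries(files_in_folder: int, invocations: int) -> int:
    """Total directory entries listed if each invocation lists the whole folder once."""
    return files_in_folder * invocations


# 100 separate "rclone copy" calls against a 50k-entry folder
per_file = listed_entries(50_000, 100)
# one call that handles all 100 files
single = listed_entries(50_000, 1)
print(per_file, single)  # prints: 5000000 50000
```

Under that assumption, the per-file approach lists 100 times as many entries as a single invocation, which matches the slowness we saw.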
copyto
copyto also has a very long “handshake” at the beginning, although even with the -vv flag it doesn’t show anything during that time, so we are not sure whether it is also listing the whole directory. We see “file.tar: Couldn’t find file - need to transfer” almost immediately. Then, about 30 seconds later, it does find the file and downloads it!
copy with --filter-from or --include-from won’t work for us because we would have to construct some pretty complicated regex and wildcard patterns to pick the correct files, which hurts future maintainability and developer productivity. Our Python-based picker is easy to read and understand: it performs simple brute-force loops and writes all the file names into a single text file. This also lets us log and audit exactly what we downloaded in any deployment scenario. Regex patterns cannot give us anything like that.
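For anyone curious, the picker is roughly this shape. This is a simplified sketch rather than our actual script: the pick_files and write_list names, the suffix rule, and the file names are all made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def pick_files(candidates: Iterable[str], wanted_suffixes: Iterable[str]) -> List[str]:
    """Brute-force loop: keep every candidate path that ends with a wanted suffix."""
    suffixes = tuple(wanted_suffixes)
    picked = []
    for path in candidates:
        if path.endswith(suffixes):
            picked.append(path)
    return picked


def write_list(paths: Iterable[str], list_file: Path) -> None:
    """Write one path per line; the file doubles as an audit log of the deployment."""
    list_file.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names, written to a throwaway directory for the demo.
    names = ["app/file.tar", "app/notes.txt", "db/dump.sql.gz"]
    chosen = pick_files(names, [".tar", ".gz"])
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "files-to-copy.txt"
        write_list(chosen, out)
        print(out.read_text(encoding="utf-8"), end="")
```

The selection logic stays plain Python that any of us can read, and the resulting text file is kept alongside the deployment logs.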
copy with --files-from is our final choice. It still does the same list-and-exclude pass, but only once at the beginning. However, without this issue we would have scratched our heads a little longer: https://github.com/ncw/rclone/issues/1620 - so an update to the documentation would be great!
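In our deployment script the call ends up looking something like the sketch below. The remote name gdrive:deploy, the destination path, and the list file name are placeholders, and build_copy_command and run_copy are just helper names for this example. Only rclone copy, --files-from and -vv come from what we described above.

```python
import subprocess
from typing import List


def build_copy_command(src: str, dst: str, list_file: str) -> List[str]:
    """Build one rclone invocation that copies every path named in list_file."""
    return ["rclone", "copy", "--files-from", list_file, "-vv", src, dst]


def run_copy(src: str, dst: str, list_file: str) -> int:
    """Run rclone once for the whole list and return its exit code."""
    return subprocess.run(build_copy_command(src, dst, list_file)).returncode


if __name__ == "__main__":
    # Placeholder remote and paths; replace with your own before calling run_copy.
    cmd = build_copy_command("gdrive:deploy", "/srv/deploy", "files-to-copy.txt")
    print(" ".join(cmd))
```

Because the whole list goes into one invocation, the slow listing at the start happens once per deployment instead of once per file.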
copyto with --files-from results in the same behavior as copy with --files-from.
copy with --fast-list
--fast-list doesn’t work on Google Drive for us, so that’s that. However, if an S3 bucket had 50k to 100k files in a single directory, would it be a lot faster?
API Token Refresh
A slight digression from the main topic. After running rclone config and following the steps, it doesn’t save my Google Drive client ID and client secret. So every time the auth token expired, it wasn’t renewed automatically and my entire download failed. I had to open ~/.config/rclone/configfile and type them in manually, and then it worked. Is this a feature or a bug?
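A quick way to check whether the credentials actually made it into the config is a few lines of Python. This is only a sketch: it assumes the config is an INI-style file, that the keys are named client_id and client_secret (that is how they look in our file), and the section name gdrive is a placeholder for your own remote.

```python
import configparser
from typing import List


def missing_credentials(config_text: str, remote: str) -> List[str]:
    """Return which of client_id / client_secret are absent or empty for a remote."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(remote):
        return ["client_id", "client_secret"]
    return [key for key in ("client_id", "client_secret")
            if not parser.get(remote, key, fallback="").strip()]


# Hypothetical config contents with both values left blank.
sample = """
[gdrive]
type = drive
client_id =
client_secret =
"""
print(missing_credentials(sample, "gdrive"))  # prints: ['client_id', 'client_secret']
```

Running something like this before a deployment would have told us up front that the token refresh was going to fail.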