When I attempt to list all the files in one of my Google Drives (I haven't tested others), about 5% of the files are missing from the listing. The set of files that are not returned seems to be consistent between runs. This happens with and without the --fast-list option.
My drive is a bunch of files stored by their MD5s. A file with MD5 'ffff124484cec49abc87e355d8de14c7' would be found at drive_name:MD5/ff/ff/ffff124484cec49abc87e355d8de14c7
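To make the layout concrete, here is a minimal sketch of how a path could be derived from an MD5 under the scheme described above. The function name `md5_to_path` and the default root `MD5` are only illustrative; they are not part of rclone or of my actual tooling.

```python
def md5_to_path(md5_hex, root="MD5"):
    """Return the sharded path for a file named by its MD5 hex digest.

    Layout assumed from the description above:
    <root>/<first two hex chars>/<next two hex chars>/<full digest>
    """
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in "0123456789abcdef" for c in md5_hex):
        raise ValueError("expected a 32-character hex MD5 digest")
    return f"{root}/{md5_hex[0:2]}/{md5_hex[2:4]}/{md5_hex}"


if __name__ == "__main__":
    print(md5_to_path("ffff124484cec49abc87e355d8de14c7"))
```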
There should be 962 files returned from the MD5/ff folder, but rclone only lists 955. The missing files are visible in the Google Drive web interface, in Google Drive File Stream, and through the Google Python library. I added '--dump responses' to the command and only see '200 OK' responses. If I add the second level of folders to the request (:MD5/ff/ff instead of :MD5/ff), the missing files appear.
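One way to pin down exactly which files are missing is to compare the shallow listing with a deeper one. The sketch below assumes the usual `rclone ls` output shape of a size followed by a path on each line; the helper name `parse_ls` and the sample listings are made up for illustration.

```python
def parse_ls(output, prefix=""):
    """Parse `rclone ls`-style text (assumed "<size> <path>" per line)
    into a set of paths, optionally prepending a prefix to each path."""
    paths = set()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once so that spaces inside the path are preserved.
        _size, path = line.split(" ", 1)
        paths.add(prefix + path)
    return paths


if __name__ == "__main__":
    # Hypothetical listings: one taken at :MD5/ff, one at :MD5/ff/ff.
    shallow = "        5 ff/aaa\n        7 fe/bbb\n"
    deep = "        5 aaa\n        9 ccc\n"
    # Paths seen in the deep listing but absent from the shallow one.
    print(sorted(parse_ls(deep, prefix="ff/") - parse_ls(shallow)))
```

The prefix is needed because paths in a listing are relative to the folder that was listed.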
What is your rclone version (output from rclone version)
rclone v1.52.3
- os/arch: linux/amd64
- go version: go1.13.15
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Linux (RHEL 8) 64 bit
Which cloud storage system are you using? (eg Google Drive)
Google Drive
The command you were trying to run (eg rclone copy /tmp remote:tmp)
And if that doesn't help, can you try the rclone ls with --disable ListR?
I suspect this is to do with the fast directory listings that rclone uses - we've had problems with this before. Rclone has quite a few workarounds for Google bugs here.
Nick, tested with --disable ListR and it works. Thanks!
A bit more information from testing. I took the request and auth from rclone, submitted it using cURL, and confirmed that the files are not returned by the Google API. rclone had combined many parent folders into the request, so I removed all but the folder containing one of the missing files and one other folder. With those two parents the file still does not show. When I removed the other parent folder (so I am only requesting the parent folder I know the missing file is in), it shows up again. This looks like an issue on Google's side, which is not returning all the expected results.
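For anyone wanting to repeat the test, the combined-parents request can be approximated with a Drive `files.list` query of the following shape. This is only a sketch: the exact query rclone sent is not reproduced in this thread, the folder IDs are placeholders, and `build_parents_query` is a hypothetical helper. The `'<id>' in parents` and `trashed = false` terms are standard Drive query syntax.

```python
def build_parents_query(parent_ids, include_trashed=False):
    """Build a Drive `q` string matching children of any of the given folders.

    Assumption: this mirrors the general shape of a multi-parent listing
    request, not necessarily the exact query rclone generates.
    """
    parent_ids = list(parent_ids)
    if not parent_ids:
        raise ValueError("at least one parent folder ID is required")
    clauses = " or ".join(f"'{pid}' in parents" for pid in parent_ids)
    query = f"({clauses})"
    if not include_trashed:
        query += " and trashed = false"
    return query


if __name__ == "__main__":
    # Placeholder IDs: two parents (file missing) versus one parent (file present).
    print(build_parents_query(["FOLDER_WITH_MISSING_FILE", "OTHER_FOLDER"]))
    print(build_parents_query(["FOLDER_WITH_MISSING_FILE"]))
```

Running the same request with one parent and then with two makes it easy to see whether the result set changes for the folder of interest.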
I will kick this over to them and see if they have any ideas why the results are incomplete.
One more update. Setting the team drive value to the root drive ID fixed it as well. I'm not sure why, but the results were incomplete unless that was set.
Nick, yeah realized that later. I found what I think is a related bug that had already been reported to Google and chimed in with my experience / information.
ListR is a backend primitive which lists all the files in the remote very quickly. Some backends can do that easily (eg S3) and some can't. The Google Drive backend does implement it, though in a complicated way.
When you use --fast-list for a sync, it uses the ListR primitive, but it has to build a tree of all the objects in memory first, which can use a lot of memory.
When you use rclone ls it is also doing a recursive list of all the files, and rclone will use ListR if it is available. It doesn't need to buffer the files in memory, as we don't guarantee any particular ordering for rclone ls. So we get the speed-up of --fast-list but without the extra memory use.
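The difference in memory use can be illustrated with a small sketch. This is not rclone's actual Go implementation; `list_r`, `ls_streaming`, and `build_tree` are hypothetical names, and the "pages" simply stand in for batches of paths returned by a backend.

```python
def list_r(pages):
    """Yield paths one at a time from an iterable of result pages."""
    for page in pages:
        yield from page


def ls_streaming(pages, emit):
    """Emit each path as soon as it arrives, in arrival order.
    Nothing is retained, so memory use does not grow with the listing."""
    count = 0
    for path in list_r(pages):
        emit(path)
        count += 1
    return count


def build_tree(pages):
    """Collect every path into a directory -> names mapping before any
    further work, as a sync would need. The whole listing is held in memory."""
    tree = {}
    for path in list_r(pages):
        parent, _, name = path.rpartition("/")
        tree.setdefault(parent, []).append(name)
    return tree


if __name__ == "__main__":
    pages = [["a/1", "b/2"], ["a/3"]]
    ls_streaming(pages, print)
    print(build_tree(pages))
```

The streaming version can only work because no ordering is promised; the tree version pays for its structure with memory proportional to the number of objects.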