Files missing from Google Drive 'ls' output

What is the problem you are having with rclone?

When I attempt to list all the files in one of my Google Drives (I haven't tested others), a decent number of the files end up missing, roughly 5%. The set of files that is not returned seems consistent between runs. This happens both with and without the --fast-list option.

My drive is a bunch of files stored by their MD5s. A file with MD5 'ffff124484cec49abc87e355d8de14c7' would be found at drive_name:MD5/ff/ff/ffff124484cec49abc87e355d8de14c7

There should be 962 files returned from the MD5/ff folder, but only 955 are listed. The missing files are visible in the Google Drive web interface, in Google Drive File Stream, and via the Google Python client library. I added '--dump responses' to the command and only see '200 OK' responses. If I list the second level of folders directly (:MD5/ff/ff rather than :MD5/ff), the missing files appear.

What is your rclone version (output from rclone version)

rclone v1.52.3
- os/arch: linux/amd64
- go version: go1.13.15

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux (RHEL 8) 64 bit

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone ls gd_backup_zdata_home:MD5/ff

The rclone config contents with secrets removed.

[gd_backup_zdata_home]
type = drive
client_id = 1...2.apps.googleusercontent.com
client_secret = n...V
scope = drive
root_folder_id = 0...A
service_account_file = /opt/.../zfsbackup.json

A log from the command with the -vv flag

2020/09/01 16:30:25 DEBUG : rclone: Version "v1.52.3" starting with parameters ["/opt/.../envs/zfs_backup/bin/rclone" "ls" "--log-file" "log.txt" "-vv" "gd_backup_zdata_home:MD5/ff"]
2020/09/01 16:30:25 DEBUG : Using config file from "/opt/.../etc/zfs_backup/rclone.conf"
2020/09/01 16:30:32 DEBUG : 21 go routines active

Additional example output

rclone ls gd_backup_zdata_home:MD5/ff/ff | grep 'ffff'
     1608 ffff124484cec49abc87e355d8de14c7
      321 ffffde984a43be4653c91a803a0e9be5
rclone ls gd_backup_zdata_home:MD5/ff | grep '\/ffff'
      321 ff/ffffde984a43be4653c91a803a0e9be5

Thanks, and let me know if you need any additional details or for me to test anything.

Can you try with v1.53, which was released today?

And if that doesn't help, can you try the rclone ls with --disable ListR?
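eg (using the remote from your report):

rclone ls --disable ListR gd_backup_zdata_home:MD5/ff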

I suspect this is probably to do with the fast directory listings that rclone uses - we've had problems with this before. Rclone has quite a few workarounds for Google bugs here :frowning:

Nick, tested with --disable ListR and it works. Thanks!

A bit more information from testing. I took the request and auth token from rclone and submitted the request using cURL, and confirmed the files are not returned by the Google API. rclone had combined many parent folders into the request, so I removed all but the folder containing one of the missing files plus one other folder. With that, the missing file still does not show. When I removed the other parent folder (so I was only requesting the parent folder I know the missing file is in), it does show again. This looks to be an issue on Google's side not returning all the expected results.
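For reference, the request I ended up testing looked roughly like this (the folder IDs and token here are placeholders, not my real values):

curl -H "Authorization: Bearer $ACCESS_TOKEN" \
  "https://www.googleapis.com/drive/v3/files?q='FOLDER_ID_1'%20in%20parents%20or%20'FOLDER_ID_2'%20in%20parents&fields=files(id,name,size)"

With both folder IDs in the q expression the missing file was absent from the results; with only the folder it actually lives in, it came back.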

I will kick this over to them and see if they have any ideas why the results are incomplete.

One more update: setting the team_drive value to the root drive ID fixed it as well. Not sure why, but the results were incomplete unless that was set.
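In config terms that's one extra line in the remote definition (value elided like the others above; if I understand the docs right, for a shared drive the root folder ID is the same as the drive ID, which would explain why the root_folder_id value worked here):

team_drive = 0...A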

Ah, you need to set team_drive, otherwise rclone doesn't set some things in the API which you need when accessing shared drives.

If you run with -vv --dump headers you can see the difference.
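eg

rclone ls -vv --dump headers gd_backup_zdata_home:MD5/ff

If I remember rightly, the difference shows up in the query parameters on the files.list requests - with team_drive set rclone adds things like supportsAllDrives=true, includeItemsFromAllDrives=true and a driveId, and without them the API only searches the "My Drive" corpus.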

I'm not 100% sure why you saw what you saw, but I know not using the team drive flags will cause problems.

Nick, yeah, I realized that later. I found what I think is a related bug that had already been reported to Google and chimed in with my experience/information.

Thanks for all the help debugging this.

Does rclone automatically use --fast-list now, such that you need to disable ListR?

Yes - back in this release:

ListR is a backend primitive which lists all the files in the remote very quickly. Some backends can do that easily (eg S3) and some can't. The Google Drive backend implements it too, in a complicated way.

When you use --fast-list for a sync, it uses the ListR primitive, but it has to build a tree of all the objects in memory first, which can use a lot of memory.

When you use rclone ls it is also doing a recursive listing of all the files, and rclone will use ListR if available, as it doesn't need to buffer the files in memory - we don't guarantee any particular ordering for rclone ls. So we get the speed-up of --fast-list but without the extra memory use.
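To make that concrete (remote name reused from this thread; the sync destination is just a placeholder):

rclone ls gd_backup_zdata_home:MD5                             # recursive list, uses ListR automatically if the backend has it
rclone ls --disable ListR gd_backup_zdata_home:MD5             # forces the directory-by-directory lister
rclone sync --fast-list gd_backup_zdata_home:MD5 /tmp/backup   # ListR, but builds the whole tree in memory first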
