Google Drive: More evidence of improper/out-of-date/delayed caching, possible leading to erroneous duplicated files?

Hello everyone,

First of all, this is most probably an issue with GDrive, and not a bug in rclone per se (unless @ncw says differently, of course), see below for more details.

First, let me quote from another topic:

I just remembered the above right now after encountering (during some totally unrelated, routine archiving) another evidence of improper/out-of-date/delayed caching on GoogleDrive’s part, to with:

  1. First, I checked with --dry-run a local directory where I just updated a dozen small text files against its remote copy (located on an encrypted GDrive remote), to be sure the correct files would be copied:
rclone --dry-run -v --checkers=8 copy . egd:HD_ARCHIVE_0002
2017/09/04 10:52:59 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Modify window is 1s
2017/09/04 10:53:00 NOTICE: 2010121002/2010121002.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010092501/2010092501.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010111501/2010111501.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010092601/2010092601.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010112902/2010112902.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010112901/2010112901.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010121001/2010121001.md5: Not copying as --dry-run
2017/09/04 10:53:00 NOTICE: 2010121201/2010121201.md5: Not copying as --dry-run
2017/09/04 10:53:01 NOTICE: 2010123001/2010123001.md5: Not copying as --dry-run
2017/09/04 10:53:01 NOTICE: 2011011301/2011011301.md5: Not copying as --dry-run
2017/09/04 10:53:01 NOTICE: 2011050101/2011050101.ls: Not copying as --dry-run
2017/09/04 10:53:01 NOTICE: 2011050101/2011050101.md5: Not copying as --dry-run
2017/09/04 10:53:26 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for checks to finish
2017/09/04 10:53:26 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for transfers to finish
2017/09/04 10:53:26 INFO  :  
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:              2519
Transferred:           12
Elapsed time:       34.1s
  1. Being the above correct, I then re-ran the command without --dry-run and sure enough, rclone copied the exact same 12 files reported above, as expected:
rclone -v --checkers=8 copy . egd:HD_ARCHIVE_0002
2017/09/04 10:53:39 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Modify window is 1s
2017/09/04 10:53:42 INFO  : 2010121002/2010121002.md5: Copied (replaced existing)
2017/09/04 10:53:42 INFO  : 2010121201/2010121201.md5: Copied (replaced existing)
2017/09/04 10:53:44 INFO  : 2010112902/2010112902.md5: Copied (replaced existing)
2017/09/04 10:53:45 INFO  : 2010121001/2010121001.md5: Copied (replaced existing)
2017/09/04 10:53:46 INFO  : 2010112901/2010112901.md5: Copied (replaced existing)
2017/09/04 10:53:47 INFO  : 2010111501/2010111501.md5: Copied (replaced existing)
2017/09/04 10:53:47 INFO  : 2010092601/2010092601.md5: Copied (replaced existing)
2017/09/04 10:53:48 INFO  : 2010123001/2010123001.md5: Copied (replaced existing)
2017/09/04 10:53:49 INFO  : 2010092501/2010092501.md5: Copied (replaced existing)
2017/09/04 10:53:50 INFO  : 2011011301/2011011301.md5: Copied (replaced existing)
2017/09/04 10:53:51 INFO  : 2011050101/2011050101.md5: Copied (new)
2017/09/04 10:53:52 INFO  : 2011050101/2011050101.ls: Copied (new)
2017/09/04 10:53:59 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for checks to finish
2017/09/04 10:53:59 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for transfers to finish
2017/09/04 10:53:59 INFO  :  
Transferred:   192.452 kBytes (8.770 kBytes/s)
Errors:                 0
Checks:              2519
Transferred:           12
Elapsed time:       21.9s
  1. Being the sick paranoid guy :wink: that I am, I always run “rclone copy” twice, to be sure that no more files remain to be copied, and to my great surprise, this time it copied again 6 of the files already reported as copied above:
rclone -v --checkers=8 copy . egd:HD_ARCHIVE_0002
2017/09/04 10:54:04 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Modify window is 1s
2017/09/04 10:54:06 INFO  : 2010121002/2010121002.md5: Copied (replaced existing)
2017/09/04 10:54:06 INFO  : 2010112902/2010112902.md5: Copied (replaced existing)
2017/09/04 10:54:07 INFO  : 2010121001/2010121001.md5: Copied (replaced existing)
2017/09/04 10:54:07 INFO  : 2010092501/2010092501.md5: Copied (replaced existing)
2017/09/04 10:54:07 INFO  : 2011050101/2011050101.ls: Copied (replaced existing)
2017/09/04 10:54:08 INFO  : 2010112901/2010112901.md5: Copied (replaced existing)
2017/09/04 10:54:28 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for checks to finish
2017/09/04 10:54:28 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for transfers to finish
2017/09/04 10:54:28 INFO  :  
Transferred:   15.894 kBytes (638 Bytes/s)
Errors:                 0
Checks:              2521
Transferred:            6
Elapsed time:       25.5s
  1. I then re-ran the copy again just to be sure, and sure enough, it reported no files copied, as I’ve been expecting since 3) above:
rclone -v --checkers=8 copy . egd:HD_ARCHIVE_0002
2017/09/04 10:54:52 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Modify window is 1s
2017/09/04 10:55:20 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for checks to finish
2017/09/04 10:55:20 INFO  : Encrypted drive 'egd:HD_ARCHIVE_0002': Waiting for transfers to finish
2017/09/04 10:55:20 INFO  :  
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:              2521
Transferred:            0
Elapsed time:       34.8s

So it seems the Google Drive is indeed working with more than one cache (perhaps per “shard” or whatever inside Google’s own “Cloud”), and that those caches can get out-of-sync among each other, and generate unneeded copying as above.

Perhaps it could also be generating the misterious duplicates I’ve reported on that other topic linked above? perhaps the 6 extra copies listed on 3) above would end up as duplicates? To try and verify this, I then ran:

rclone lsd --max-depth 999 egd:HD_ARCHIVE_0002 | cut -b 44- | sort | uniq -c | sort -n | grep -v ' *1 '`

But it produced no output whatsoever, so I understand rclone can detect no duplicates (at least under egd:HD_ARCHIVE_0002)

@ncw, would said duplicates end up being generated on the same “top-level” directory above (eg, HD_ARCHIVE_0002), or in order to be sure to detect them if duplicates were indeed generated, should I better check my entire remote (ie, egd:)?

PS: in case that’s relevant, all the above was done with rclone v1.37, on top of Springdale EL6 (RHEL6 clone) running on my x86_64 notebook.

Cheers,
– Durval.

Yes I’ve seen that happen. It seems more common on team drives for some reason. I uploaded 1000 files, then did rclone size on the upload directory. I see less than 1000 files, which gradually over the space of a minute or two creeps up to 1000.

If rclone uploads something and it doesn’t appear in the listing of the remote, when you run the sync again rclone will think it is missing and upload it again. This creates the duplicate files on drive as far as I can see. I think there is a similar process for duplicate directories.

This will only detect duplicate directories won’t it? I’m pretty sure that you’ll just have duplicate files in the above.

I think the duplicates should end up in the same top-level directory. You should be able to see them with rclone ls --max-depth 1 egd:HD_ARCHIVE_0002: I would have thought.