Rclone mount repeatedly uploading the same unchanged files

What is the problem you are having with rclone?

rclone mount on macOS (with --vfs-cache-mode full) is repeatedly uploading the same files to Google Drive, despite the fact that the files have not changed. When I compare the old and new versions of the files, the only difference I can find is the modtime; the checksums and all other metadata match exactly.

This happens only with a few specific files (and not many others in the same directory). There is nothing special about the problematic files, so far as I can tell. I have checked for potential duplicates and possible illegal characters in the filenames, but there are none that I can find. It happens a few times per day, which I can see by reviewing the Activity History in Google Drive (via the https://drive.google.com/ UI). It is a bit unsettling to see files being "edited" at random when I haven't touched them, even though no data seems to be changing.

I've done some debugging and found a sequence of steps that reliably reproduces it (see below). Note that this is just one way to trigger it, not the only way: I have also seen it happen while the mount has been running for a while, with no recent mount or unmount.

One theory I had (which could be totally wrong) is that some other process on my computer (perhaps system processes related to Finder or Quick Look) is accessing the files with read/write access but proceeds to only read (and not write) them, and rclone interprets this as a write, and accordingly re-uploads them with an updated modtime. But if that's the case, I can't explain why it would only affect a few specific files and not others. (Presumably the same process would be reading those files too?)

Also worth noting that I do not see this behavior in a normal (non-mounted) local folder with the same files in it. In other words: it seems to be rclone (or FUSE?), not Finder or something else, that is responsible for updating the modtime. It is not happening when we take rclone/FUSE out of the equation.

Repro steps (and corresponding lsl output):

  1. To make the simplest possible test, I created a new folder in Google Drive containing only 3 files: the 2 "problematic" files (PICT_20211210_130955.JPG and PXL_20220511_194921851.MP.jpg) and 1 normal file (PXL_20220717_044546991.MP.jpg). I will set this folder as the root_folder_id for my drive remote.

  2. Start the mount. At this point, if I run rclone lsl on my cache-dir, I see only the .DS_Store that Finder puts there by default (so far so good):

     6148 2023-02-21 05:36:12.476326737 .DS_Store
  3. Open the /path/to/local/mount directory in Finder. Now, rclone lsl reveals that the 2 problematic files (but not the 1 normal file) have been downloaded.
     6148 2023-02-21 05:36:12.476326737 .DS_Store
      276 2023-02-21 06:01:59.936642041 vfsMeta/gdrive_testfolder/PICT_20211210_130955.JPG
      278 2023-02-21 06:02:01.980363484 vfsMeta/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
   529165 2023-02-21 05:36:35.938000000 vfs/gdrive_testfolder/PICT_20211210_130955.JPG
  6455244 2023-02-21 05:36:34.259000000 vfs/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
  4. Wait for the cache to expire, so that rclone automatically removes the cached files. (I've set --vfs-cache-max-age to 0h1m0s to make testing easier.) Now, rclone lsl confirms the cache is cleaned:
     6148 2023-02-21 05:36:12.476326737 .DS_Store
  5. Unmount the mount. Immediately upon doing so (and before the terminal process fully terminates), rclone re-downloads the 2 problematic files (but not the 1 normal file) and updates their modtimes, as if they had been edited. Note the changed modtimes and the absence of the third file:
     6148 2023-02-21 05:36:12.476326737 .DS_Store
      275 2023-02-21 06:04:03.952351995 vfsMeta/gdrive_testfolder/PICT_20211210_130955.JPG
      278 2023-02-21 06:04:03.633825357 vfsMeta/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
   529165 2023-02-21 06:04:03.634066000 vfs/gdrive_testfolder/PICT_20211210_130955.JPG
  6455244 2023-02-21 06:04:01.935404000 vfs/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
  6. Start the mount again. Now, rclone will replace the remote files with the cache files, because they are newer.
     6148 2023-02-21 05:36:12.476326737 .DS_Store
      276 2023-02-21 06:05:48.024043094 vfsMeta/gdrive_testfolder/PICT_20211210_130955.JPG
      279 2023-02-21 06:05:49.105709685 vfsMeta/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
   529165 2023-02-21 06:04:03.634066000 vfs/gdrive_testfolder/PICT_20211210_130955.JPG
  6455244 2023-02-21 06:04:01.935404000 vfs/gdrive_testfolder/PXL_20220511_194921851.MP.jpg
  7. For completeness's sake, here's what I get when I run rclone lsl on the Google Drive remote:
   529165 2023-02-21 06:04:03.634000000 PICT_20211210_130955.JPG
  6455244 2023-02-21 06:04:01.935000000 PXL_20220511_194921851.MP.jpg
  5566678 2022-07-17 00:55:06.000000000 PXL_20220717_044546991.MP.jpg

(The absence of .DS_Store is expected due to --noappledouble, and I believe the rounded modtime is also expected per https://rclone.org/drive/#modified-time.)
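As a sanity check on that rounding, here is a minimal sketch that compares the cache timestamp from step 6 with the remote timestamp from step 7 at millisecond precision. The helper name trunc_ms is made up for illustration, and it truncates rather than rounds, which is an assumption on my part (both give the same result for these values).

```shell
# Reduce an "rclone lsl" timestamp to millisecond precision by dropping
# every fractional digit after the third (assumption: truncation, not rounding).
trunc_ms() {
  printf '%s\n' "$1" | sed -E 's/(\.[0-9]{3})[0-9]*$/\1/'
}

cache_time="2023-02-21 06:04:03.634066000"    # vfs/ copy of PICT_20211210_130955.JPG
remote_time="2023-02-21 06:04:03.634000000"   # same file on the Drive remote

if [ "$(trunc_ms "$cache_time")" = "$(trunc_ms "$remote_time")" ]; then
  echo "match at millisecond precision"
else
  echo "differ"
fi
```

If the two match, the modtime now on the remote is consistent with having come from the cached copy rather than from some other client.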

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1
- os/version: darwin 13.2.1 (64 bit)
- os/kernel: 22.3.0 (arm64)
- os/type: darwin
- os/arch: arm64
- go/version: go1.19.4
- go/linking: dynamic
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount gdrive_testfolder: /Users/redacted/rclone/gdrive_testfolder --vfs-cache-mode full --stats 1m --rc -vv --cache-dir /Users/redacted/rclone/cache/gdrive_testfolder --vfs-cache-max-age 0h1m0s

The rclone config contents with secrets removed.

[gdrive_testfolder]
type = drive
client_id = redacted
client_secret = redacted
scope = drive
token = redacted
team_drive = 
root_folder_id = redacted
export_formats = url

A log from the command with the -vv flag

Log corresponding to Steps 1-5 above: rclone log 2023-02-21 Steps 1-5 - Pastebin.com
Log corresponding to Step 6 above: rclone log 2023-02-21 Step 6 - Pastebin.com

The modtimes are not accurate in the cachedir - the actual mod times are held in the vfsMeta part of the cache as JSON files.
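If you want to look at those, something like the sketch below should pull out the interesting fields. The function name meta_fields is just for illustration, and the field names "ModTime" and "Dirty" are an assumption about the metadata layout, so open one of the files and check what is actually in there first.

```shell
# Print the ModTime and Dirty entries from one vfsMeta JSON file.
# Assumption: the metadata uses keys named "ModTime" and "Dirty";
# inspect a real file to confirm before relying on this.
meta_fields() {
  grep -E -o '"(ModTime|Dirty)"[[:space:]]*:[[:space:]]*("[^"]*"|true|false)' "$1"
}

# Example (path taken from the cache-dir and lsl listing in the first post):
# meta_fields /Users/redacted/rclone/cache/gdrive_testfolder/vfsMeta/gdrive_testfolder/PICT_20211210_130955.JPG
```

A file that rclone still intends to upload would be expected to show up as dirty here, which is a quicker check than waiting for the Drive activity log.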

Rclone doesn't randomly download files; it will be the OS or some application asking for those files.

At the end of your first log you can see

2023/02/21 06:04:03 DEBUG : /PICT_20211210_130955.JPG: Write: ofst=32768, fh=0x1
2023/02/21 06:04:03 DEBUG : PICT_20211210_130955.JPG(0x14000146200): _writeAt: size=16384, off=32768
2023/02/21 06:04:03 DEBUG : PICT_20211210_130955.JPG(0x14000146200): >_writeAt: n=16384, err=<nil>
2023/02/21 06:04:03 DEBUG : /PICT_20211210_130955.JPG: >Write: n=16384

It appears that something is modifying PICT_20211210_130955.JPG by writing 16 KiB at offset 32 KiB.

I don't know why that is, but it explains both why a) rclone downloads the file (so it can modify that chunk) and b) it wants to upload it again.
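To see which files are getting written to across a whole -vv log, a rough sketch like this counts the FUSE Write calls per path. The function name written_files is made up, and it assumes the log lines look like the ones quoted above.

```shell
# Count FUSE Write calls per path in an rclone -vv log.
# Only the call lines (": Write: ofst=") are matched; the ">Write:" return
# lines and the "_writeAt" lines are skipped so each write is counted once.
written_files() {
  awk '/DEBUG : \/.*: Write: ofst=/ {
         line = $0
         sub(/^.*DEBUG : /, "", line)        # drop timestamp and log level
         sub(/: Write: ofst=.*$/, "", line)  # drop the call details
         count[line]++
       }
       END { for (f in count) print count[f], f }' "$1"
}

# Example (the log file name is a placeholder):
# written_files rclone.log
```

Any path that shows up here has been modified through the mount, so it will be marked for upload.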

I wonder if this is something to do with resource forks and the OS thinking it is modifying a resource fork but actually modifying the file. I don't really understand how resource forks work in the modern macOS world or how they get translated by FUSE so this may be an area to investigate!

Thank you so much, this is very helpful!

I spent a while yesterday trying to figure out which process is asking for those files. I'm pretty sure the read in #3 (upon opening the parent directory in Finder) is caused by the OS's "Quick Look" attempting to generate thumbnail icons for each file. The write in #5 is harder to identify, but appears to also be related to Quick Look. I've been trying to spot what is different about the "problematic" files vs. "normal" files, and one thing I'm noticing is that the two images in my example contain visible text. Apple recently introduced a "Live Text" feature that performs OCR on images, so that the text can then be selected in Quick Look or searched for in Spotlight. So one possible theory is that the OS is attempting to cache the recognized text somewhere, like a resource fork, and (as you suspected) FUSE is writing to the file instead.

I'm not 100% sure this is the case (there are other unaffected images with text, which would seem to disprove this theory, and Apple makes it infuriatingly hard to disable these features to test), but supposing this is what's happening, is there a different mount configuration that might play nicer with this? A particular FUSE mount flag, perhaps? It seems like it must be possible, as other apps like BoxCryptor and Google Drive for Desktop (pre-FileProvider) have somehow managed to get around these issues. I wonder if they are simply blocking the preview generation process somehow, which would explain why opening a folder doesn't require downloading all its contents to disk, and previews don't work until the first time a file is opened.

Or, short of that, is there a way to tell rclone "ignore the write if nothing actually changed"? (I experimented with --no-modtime but it didn't do what I expected.)

I will also explore Bisync to see if that might be a better fit for my use case. Thanks again for your help, and for this amazing tool!

P.S. For anyone who might see this post and be inclined to pursue the "Live Text" theory further, here is some technical info I found helpful:

I also found this of interest, from the osxfuse wiki:

By default, macFUSE provides a flexible and adaptive mechanism to handle extended attributes (including things such as Finder Info, Resource Forks, and ACLs). It will initially forward extended attributes calls up to the user-space file system. If the latter does not implement extended attribute functions, macFUSE will remember this and will not forward subsequent calls. It will store extended attributes as Apple Double (._) files. If the user-space file system does implement extended attribute functions, it can choose to handle all or only some extended attributes. If there are certain extended attributes that the user-space file system wants macFUSE (the kernel) to handle through ._ files, it should return ENOTSUP for such attributes. The auto_xattr option tells macFUSE to not bother with sending any extended attributes calls up to user-space, regardless of whether the user-space file system implements the relevant functions or not. With auto_xattr, the kernel will always use ._ files.

That is interesting about extended attributes.

Rclone returns ENOTSUP for all reads and writes to extended attributes.

It might be worth having a go with -o noapplexattr to force macFUSE to disallow access to all resource stuff. nobrowse looks interesting too. You should be able to try any of the options on that page and see if they help. I suspect there is one that we are missing.
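Something along these lines would let you try the options one at a time. This is an untested sketch: it just prints your original mount command with one extra -o option appended, and the helper name mount_cmd is made up for illustration.

```shell
# Print the original mount command with one extra macFUSE option appended.
# $1 = the option to try, e.g. noapplexattr or nobrowse
mount_cmd() {
  echo rclone mount gdrive_testfolder: /Users/redacted/rclone/gdrive_testfolder \
    --vfs-cache-mode full --stats 1m --rc -vv \
    --cache-dir /Users/redacted/rclone/cache/gdrive_testfolder \
    --vfs-cache-max-age 0h1m0s -o "$1"
}

mount_cmd noapplexattr
```

Run the printed command, repeat your repro steps, then unmount and try the next option, so that any change in behaviour can be pinned on a single flag.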
