Files with multiple parents in Google Drive

There is a feature in Google Drive that lets you place a file or folder in multiple folders without actually copying it (the Shift+Z feature in the web interface). Internally it just adds one more ID to the parents collection of the file or folder.

The problem is that rclone doesn’t recognize a file’s or folder’s original location among its parent folders (and I suppose there’s no way to figure it out), so it just downloads it multiple times during a sync.

This leads to almost double the disk space consumption on the destination (the local file system). I’m trying to move my data between two Google Drives, so the second one ends up about twice as large as the original.

Has anyone come across the same problem? Are there any solutions?

Thanks in advance.

I didn’t realise you could do that on the web interface.

I’ve seen the feature in the API but I didn’t realise that people were using it!

There is no original location - it is all about the Object ID with Google Drive.

This is effectively the same as hard links on a Unix file system, with the inode taking the place of the Object ID.
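
To make the hard link analogy concrete, here is a small local sketch in Python. It has nothing to do with the Drive API; the function name and file names are made up for illustration, and it assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def same_file_two_names(directory):
    """Create one file, give it a second name via a hard link,
    and return the stat results for both names."""
    first = os.path.join(directory, "a.txt")
    second = os.path.join(directory, "b.txt")
    with open(first, "w") as f:
        f.write("hello")
    # A second directory entry for the same inode; no data is copied.
    os.link(first, second)
    return os.stat(first), os.stat(second)


with tempfile.TemporaryDirectory() as d:
    st_a, st_b = same_file_two_names(d)
    # Both names resolve to the same inode, much like two parent
    # folders pointing at one Object ID.
    print("same inode:", st_a.st_ino == st_b.st_ino)
    print("link count:", st_a.st_nlink)
```

Neither name is the "original" one: deleting either leaves the data reachable through the other, which is why asking for the original location doesn't have an answer.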

rclone doesn’t deal with hard links (in general it is quite hard) so I can’t think of an easy way of getting rclone to help with this other than with some major re-architecting.

It would be possible to hardlink all the identical copies back together again at the end with a bit of scripting - maybe there is a script out there which can help with that?
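
As a rough idea of what such a script could look like, here is a minimal sketch in Python. It is not part of rclone; the function names are made up for illustration. It assumes everything under the root is on one file system that supports hard links, and that nothing else is writing to the tree while it runs. It groups files by size, compares SHA-256 digests within each group, and replaces each duplicate with a hard link to the first copy.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical regular files under root with hard links
    to a single copy. Returns the number of files that were relinked."""
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    relinked = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # nothing to save
        keeper_by_digest = {}
        for path in sorted(paths):
            keeper = keeper_by_digest.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already linked
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears part-way through.
            tmp = path + ".hardlink-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            relinked += 1
    return relinked
```

Calling `hardlink_duplicates("/path/to/local/copy")` (a placeholder path) would then return how many duplicates were relinked. Note that all linked names share one set of metadata afterwards, including the modification time.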

If your main concern is the eventual disk usage, and you can tolerate temporarily increased disk usage, there are several post-processing options:

  • Use a file system with a dedup feature, like ZFS or btrfs. Note that ZFS dedup can be tricky to set up right, so be extra careful if going that way.
  • Use a CoW file system, and a post-processing script that uses “cp --reflink” to dedup it offline.
  • On file systems that support hard links, use a post-processing script to dedup it offline.

I’m not sure whether hard links are entirely safe for rclone though. If the remote file changes, when sync’ing it back, it could be problematic if rclone updates the file in-place. Reflink is a safer choice, and should work regardless of how rclone does it.
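
A rough sketch of the reflink step in Python, assuming GNU cp and a file system with reflink support (for example btrfs, or XFS with reflink enabled). The function name is made up for illustration, and it expects the caller to have already verified that the two files are identical, e.g. by comparing checksums. If cloning isn't possible it leaves the duplicate untouched.

```python
import os
import subprocess


def reflink_replace(original, duplicate):
    """Replace `duplicate` with a copy-on-write clone of `original`.

    Returns True if the duplicate was replaced, False if cloning was
    not possible (no reflink support, non-GNU cp, cp missing, ...).
    """
    tmp = duplicate + ".reflink-tmp"
    try:
        result = subprocess.run(
            ["cp", "--reflink=always", "--preserve=mode,timestamps",
             "--", original, tmp],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # no cp executable on this system
    if result.returncode != 0:
        # A failed clone may leave an empty temporary file behind.
        if os.path.exists(tmp):
            os.remove(tmp)
        return False
    # Rename the clone over the duplicate in one step.
    os.replace(tmp, duplicate)
    return True
```

Unlike hard links, the two paths stay separate files afterwards and only share data blocks, so a later in-place write to one of them doesn't change the other.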