Would it be possible to utilize reflinks, where available, when copying/moving data to a mount that uses the VFS cache, so that copies into the cache complete much faster and without requiring double the space when the cache directory is on the same filesystem as the source?
It's definitely possible, but the general use case for rclone is Cloud Storage so it's a very niche use case and doubtful it would ever get implemented.
Yes, it is cloud storage, but with mount a lot of backends actually require VFS cache writes for all filesystem operations to be fully supported, and on slower connections write caching also prevents blocking for the duration of the upload. I do not have metrics, but I assume that the majority of rclone mount users also use the VFS cache, at least in writes mode.
The thing is that when you move e.g. 500GB to a remote mount with VFS cache, the operation takes ages and requires 500GB of extra disk space for the duration, when neither should happen if the underlying filesystem is e.g. btrfs.
I seriously doubt this can be, even remotely, classified as a niche use case.
I bet you'd be surprised what most people do
I'm not sure what you mean takes ages. Just copying a new file? Generally if folks / me are doing big disk operations, I don't use a mount.
If it was popular/needed, odds are, it would already be done. Someone would sponsor it or someone would pick it up and do it. No reason to debate it as I'm sure I can be wrong as I have been before and will be again. Feel free to submit a PR as I'm sure @ncw would welcome it and/or help out if some of it was done.
I'm not sure what you mean takes ages. Just copying a new file? Generally if folks / me are doing big disk operations, I don't use a mount.
What I mean is that if you copy, say, 30-40 directory trees totalling 500GB, it takes a long time and, depending on how many files they contain, a very long time, even on non-rotational storage. Since we're using the VFS cache, we're not actually copying the files to the remote yet, so the operation should be close to instant given that the source and the cache are on the same filesystem, but it isn't, because the mount is treated as a remote filesystem. Since the data is really just being moved within the same filesystem, reflinks should fix this.
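To make the idea concrete, here is a rough, hypothetical sketch of what the cache write path could do on Linux, assuming golang.org/x/sys/unix; the function name and paths are made up and this is not rclone's actual code. It tries a reflink first and falls back to an ordinary copy when that fails.

```go
package main

import (
	"fmt"
	"io"
	"os"

	"golang.org/x/sys/unix"
)

// cloneIntoCache tries to reflink src into the VFS cache path (FICLONE ioctl,
// near-instant and space-free on btrfs/XFS when both paths are on the same
// filesystem) and falls back to an ordinary copy otherwise, which is
// effectively what happens today.
func cloneIntoCache(src, cachePath string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.OpenFile(cachePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600)
	if err != nil {
		return err
	}
	defer out.Close()

	// FICLONE shares the extents instead of duplicating the data; it only
	// succeeds when both files live on the same reflink-capable filesystem.
	if err := unix.IoctlFileClone(int(out.Fd()), int(in.Fd())); err == nil {
		return nil
	}

	// Fallback: byte-for-byte copy (different filesystem or no reflink support).
	_, err = io.Copy(out, in)
	return err
}

func main() {
	// Paths are made up for illustration.
	fmt.Println(cloneIntoCache("/data/media/file.mkv", "/data/.cache/rclone/vfs/remote/file.mkv"))
}
```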
As far as not using a mount goes, it is much easier to just use e.g. Midnight Commander in the shell and cherry-pick what you want to move on the spot, instead of manually running rclone move per directory for e.g. 50 directories that you first have to identify, note down and feed to a script. Yes, if all you need is to move a single directory tree you can always just run rclone move instead. But if you're selectively moving things to the cloud to free up local storage, a mount is a huge timesaver, and it would be even more so if you didn't have to wait for copy/move operations whose target is on the same filesystem as the source.
EDIT: I always assumed that this was not done because of some technical limitation that I could not think of.
If it was popular/needed, odds are, it would already be done.
Pretty sure that has more to do with people not knowing about things at that low a level, so they cannot imagine, let alone request, such a feature, than with people not "needing" it.
In any case, this would speed up VFS write cache performance to almost instant for any filesystem that supports reflinks (and hard links could be used for those that don't), not to mention dropping the double-space requirement for the duration of the operation. So this would actually benefit everyone using the VFS write cache, whatever their use case; they just don't know about it.
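For filesystems without reflink support, a similar hypothetical sketch using a hard link instead (again, not rclone code; the helper name and paths are made up). One caveat worth noting: a hard link shares the inode with the source, so this suits moves best, where the source is deleted after upload anyway.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// linkIntoCache hard-links src into the cache directory instead of copying.
// It reports whether the caller still needs to fall back to a real copy
// (different filesystem, or linking not permitted).
func linkIntoCache(src, cachePath string) (needCopy bool, err error) {
	if err := os.Link(src, cachePath); err != nil {
		if errors.Is(err, syscall.EXDEV) {
			// Cache and source are on different filesystems: copy instead.
			return true, nil
		}
		return false, err
	}
	return false, nil
}

func main() {
	needCopy, err := linkIntoCache("/data/file.bin", "/data/.cache/rclone/file.bin")
	fmt.Println(needCopy, err)
}
```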
That's exactly why I used mergerfs, as it's a FUSE-based filesystem that does support hard linking.
mount: symbolic and hard links needed · Issue #4980 · rclone/rclone
That's from 2021.
Hard linking only works when the cache and the source are on the same disk/filesystem. Otherwise, you end up copying.
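That constraint is easy to check up front. A quick illustrative sketch (my own, Unix-only, paths made up) that compares the device IDs from stat(2); links of any kind only help when the device IDs match:

```go
package main

import (
	"fmt"
	"syscall"
)

// sameFilesystem reports whether two paths live on the same filesystem by
// comparing the device ID returned by stat(2).
func sameFilesystem(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, err
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, err
	}
	return sa.Dev == sb.Dev, nil
}

func main() {
	ok, err := sameFilesystem("/data/source", "/data/.cache/rclone")
	fmt.Println(ok, err)
}
```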
I specifically used mergerfs to do this and uploaded overnight via rclone copy, keeping everything merged together so it was all seamless to the end user.
I think it's a pretty good idea to utilize links for the VFS cache write queue.
It should be clarified that this doesn't really depend on reflinks AFAICT (they are only supported by CoW filesystems); it could just use hard links.
Even symlinks, which would be more portable and work across mount points, could be used, provided that metadata/mtime checks happen on the link target.
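As a sketch of that idea (hypothetical, not rclone code; paths made up): symlink into the cache and read metadata through the link with os.Stat, which follows symlinks and therefore reports the target's mtime/size, rather than os.Lstat, which would only describe the link itself.

```go
package main

import (
	"fmt"
	"os"
)

// symlinkIntoCache places a symlink in the cache and returns the metadata of
// the link target, which is what the cache's change detection would compare.
func symlinkIntoCache(src, cachePath string) (os.FileInfo, error) {
	if err := os.Symlink(src, cachePath); err != nil {
		return nil, err
	}
	// os.Stat follows the symlink; os.Lstat would stat the link itself.
	return os.Stat(cachePath)
}

func main() {
	info, err := symlinkIntoCache("/mnt/other-disk/file.bin", "/home/user/.cache/rclone/file.bin")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(info.Size(), info.ModTime())
}
```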
The higher-level, fully portable solution would be to implement and use VFS cache limit checks for writes, i.e. address "VFS cache exceeds maximum size when caching writes".
@ncw I think some architectural guidance would facilitate reaching a consensus/implementation here.