Rclone does not sync directories when a ZFS snapshot is used as the source

What is the problem you are having with rclone?

I am attempting to sync the contents of a ZFS snapshot to Backblaze B2 using rclone. However, rclone seems to only "see" regular files at the top level of the snapshot, ignoring directories and not recursing into them for the sync. When I point rclone at the live file system instead of the snapshot, with an otherwise identical invocation, I get the desired behavior: directories and their children are picked up for the sync.
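
Spelled out with absolute paths, the two invocations look like this; only the source differs (the live path is where the same tree is mounted outside the snapshot):

# Syncing from the snapshot: only the top-level README.txt gets checked.
rclone sync -vv --dry-run /srv/.zfs/snapshot/20230615235901-nightly/remote/ backblaze_b2_recovery_crypt:

# Syncing from the live file system: directories and their children are picked up.
rclone sync -vv --dry-run /srv/remote/ backblaze_b2_recovery_crypt: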

Run the command 'rclone version' and share the full output of the command.

rclone v1.62.2-DEV

  • os/version: debian 11.7 (64 bit)
  • os/kernel: 6.1.0-0.deb11.7-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.20.4
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Backblaze B2 (although using the local file system as a destination produces the same behavior).

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync ./ backblaze_b2_recovery_crypt:

The rclone config contents with secrets removed.

[backblaze_b2_recovery]
type = b2
account = wouldntyouliketoknow
key = wouldntyouliketoknow

[backblaze_b2_recovery_crypt]
type = crypt
remote = backblaze_b2_recovery:mybucket/somewhere
password = wouldntyouliketoknow
password2 = wouldntyouliketoknow

[bb2_test]
type = b2
account = wouldntyouliketoknow
key = wouldntyouliketoknow

A log from the command with the -vv flag

2023/06/16 17:06:07 DEBUG : rclone: Version "v1.62.2-DEV" starting with parameters ["rclone" "sync" "-vv" "--dry-run" "./" "backblaze_b2_recovery_crypt:"]
2023/06/16 17:06:07 DEBUG : Creating backend with remote "./"
2023/06/16 17:06:07 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2023/06/16 17:06:07 DEBUG : fs cache: renaming cache item "./" to be canonical "/srv/.zfs/snapshot/20230615235901-nightly/remote"
2023/06/16 17:06:07 DEBUG : Creating backend with remote "backblaze_b2_recovery_crypt:"
2023/06/16 17:06:07 DEBUG : Creating backend with remote "backblaze_b2_recovery:mybucket/somewhere"
2023/06/16 17:06:08 DEBUG : Couldn't decode error response: EOF
2023/06/16 17:06:09 DEBUG : README.txt: Size and modification time the same (differ by -433.449µs, within tolerance 1ms)
2023/06/16 17:06:09 DEBUG : README.txt: Unchanged skipping
2023/06/16 17:06:09 DEBUG : Encrypted drive 'backblaze_b2_recovery_crypt:': Waiting for checks to finish
2023/06/16 17:06:09 DEBUG : Encrypted drive 'backblaze_b2_recovery_crypt:': Waiting for transfers to finish
2023/06/16 17:06:09 DEBUG : Waiting for deletions to finish
2023/06/16 17:06:09 INFO  : There was nothing to transfer
2023/06/16 17:06:09 NOTICE: 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         1.7s

2023/06/16 17:06:09 DEBUG : 8 go routines active

For some additional context when reading the logging information above, here is a listing of the snapshot I am trying to sync:

root@freyja:/srv/.zfs/snapshot/20230615235901-nightly/remote# ls -l
total 8
drwxr-xr-x 2 root root   2 May  4 22:30 aptly
drwxr-xr-x 2 root root   2 May  4 22:31 backups
drwxr-xr-x 2 root root   2 May  4 18:17 gitea
drwxr-xr-x 4 root root   4 May 13 10:16 localhost
drwxr-xr-x 2 root root   2 May  4 22:30 prometheous
-rw-r--r-- 1 root root 742 May  4 18:42 README.txt
drwxr-xr-x 2 root root   2 May  4 18:16 syncthing

I tried specifying a directory on the local file system as the destination, just to see if I could elicit different behavior, but it behaved the same as when Backblaze B2 was the destination.
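
For the record, that local-destination test was along these lines (the destination path is just a scratch directory I made up for the test):

rclone sync -vv --dry-run ./ /tmp/rclone-snapshot-test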

Then I ran rclone through strace and saw a pattern of calls that stood out to me; an excerpt is below. It would seem rclone gets snagged managing its epoll object. I'm not sure if this is a red herring, but epoll_ctl() did succeed when I ran rclone from the live file system rather than the snapshot.

futex(0xc000700148, FUTEX_WAKE_PRIVATE, 1) = 1 
newfstatat(AT_FDCWD, "/srv/.zfs/snapshot/20230611235901-nightly/remote/localhost/usrlocal", {st_mode=S_IFDIR|0755, st_size=2, ...}, 0) = 0 
openat(AT_FDCWD, "/srv/.zfs/snapshot/20230611235901-nightly/remote/localhost/usrlocal", O_RDONLY|O_CLOEXEC) = 10
fcntl(10, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2964979272, u64=140341021376072}}) = -1 EPERM (Operation not permitted)
fcntl(10, F_GETFL)                      = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
fcntl(10, F_SETFL, O_RDONLY|O_LARGEFILE) = 0
getdents64(10, 0xc000564000 /* 2 entries */, 8192) = 48
getdents64(10, 0xc000564000 /* 0 entries */, 8192) = 0
close(10)
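
In case anyone wants to reproduce the capture above, an invocation along these lines should do it (the exact syscall filter is my choice; any similar set works):

# Trace directory enumeration and epoll registration during a dry run.
# Note: EPERM from epoll_ctl() on a regular file or directory fd is normal;
# the Go runtime just falls back to blocking I/O, so this may well be a red herring.
strace -f -e trace=openat,newfstatat,getdents64,epoll_ctl \
    rclone sync -vv --dry-run ./ backblaze_b2_recovery_crypt: 2> strace.log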

Instead of using the .zfs snapshots folder, mount the snapshot first. I had tons of issues with other programs trying to use .zfs; mounting always worked.

Especially on Linux; on BSD it works. I never bothered to dig deeper into it. Mounting works.
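
Something like this, roughly (the pool/dataset name and mountpoint are examples; adjust to your layout):

# ZFS snapshots can be mounted read-only like any other filesystem:
mkdir -p /mnt/snap
mount -t zfs tank/srv@20230615235901-nightly /mnt/snap
rclone sync /mnt/snap/remote backblaze_b2_recovery_crypt:
umount /mnt/snap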

When you see v1.62.2-DEV it means you are running some v1.62.2 beta (the DEV suffix is the hint). I doubt it makes any difference here, but why not stick to the release, or the latest 1.63 beta?

A bit off-topic, but as you use ZFS on Linux:

maybe TrueNAS Scale will change the tide, but on Linux with ZFS you are more and more in uncharted waters. Which is weird, as there is no real alternative.

Hmm. No dice mounting the snapshot and running the latest 1.63.0 beta, built from commit 4f8dab8bcec40f34f02074549747437ac17b0bef.

As for why I'm running the 1.62.2-DEV version, I believe I installed it via: go install github.com/rclone/rclone@v1.62.2
Theoretically, the git tag should be on the same commit the release binary was cut from, so the code should match. I suppose there could still be Go toolchain differences to worry about.
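
One way to double-check what the tag points at upstream, just to rule out a mismatch:

# Show the commit the v1.62.2 tag resolves to:
git ls-remote --tags https://github.com/rclone/rclone v1.62.2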

I'll try to find a couple of free hours this weekend to get a debugger set up and see if I can get any more clarity on what is going on.

That is interesting to hear about Ubuntu. Hopefully it is licensing and not technical reasons driving this decision. After ~10 years I'd hate to uproot my zpool; btrfs wasn't checking all the boxes for me, and I already have a long TODO list of side projects.

So far Debian Bookworm, Trixie, and Sid all still offer the zfs-dkms package in contrib.

I doubt it is an rclone problem per se; it is more likely some bug in the ZFS integration on Debian...

But it can be a good clue to pinpoint the root cause and help the ZFS folks fix it.

If you end up filing a ticket with Debian/ZFS, let me know; I am actually interested in the state of ZFS on Linux myself.

OK, this is the dumbest thing ever, but I'm going to note it here for posterity in case someone else runs into something similar.

I did not want to sync my entire zpool up to Backblaze, mainly because the upload speed at my house is poo. So I created /srv/remote/ as the root of a hierarchy of bind mounts. That way I could use /srv/remote/ as the source for my offsite backups and capture just the most important stuff.
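
Roughly like this (the bind-mount sources below are illustrative, not my exact layout):

# /srv/remote/ collects bind mounts of just the data worth shipping offsite:
mkdir -p /srv/remote/gitea /srv/remote/backups
mount --bind /srv/gitea/data /srv/remote/gitea
mount --bind /srv/backups/data /srv/remote/backups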

Originally, when I first set all this up, I just used the live file system to get my backup script into production quicker. Most of the data was static, but I did have a few daemons updating files underneath the backup script, which would occasionally cause size/hash errors for rclone. I put this issue on the TODO list and sat on it for a while.

Months later, without thinking too much about it, I switched to using my nightly ZFS snapshot as the source rather than the live file system, since a snapshot would mean no more size/hash mismatch errors after a transfer. Well... the bind mounts aren't mounted onto their (empty) directories in the snapshot; they only exist on the live file system. So of course rclone isn't going to copy data that isn't there.
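
This is easy to confirm after the fact (snapshot path as in my logs above):

# The bind mounts exist only in the live mount table...
findmnt -rn -o TARGET,SOURCE | grep /srv/remote
# ...while in the snapshot the mount points are just empty directories:
ls /srv/.zfs/snapshot/20230615235901-nightly/remote/gitea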

The kicker is that the one file rclone did keep looking at, the README.txt in the logs in my OP, is where I documented how I set all this up with bind mounts. I just didn't read my own README.txt before updating my backup script.

The lesson here is:

  1. Don't try to tackle a software change early in the morning, just sleep on it.
  2. Read your own README.txt before changing anything.

sigh
