Quasi Feature Request

We are using rclone to back up storage servers on our university campus to S3, and it's working very well.

I've been looking at the --metadata-mapper feature and thinking about a specific use case. Here's what I'd like to be able to do:

  • run rclone so that its only effect is to invoke the --metadata-mapper as it scans the local (or perhaps remote) storage system.

Why? rclone has a terrific, progressive file system scanner with a rich feature set for filtering (including/excluding) files and directories. I'd like to reuse that, with the same filters I use for backups, to create a local database of the files and directories.

OK, so why do I want the local database? rclone only re-uploads changed metadata when a new copy of a file is uploaded. Sometimes owner/group or permissions are changed but the file itself isn't. Also, some backends won't let you update "just the metadata". Hence this is not a request to "please upload only changed metadata". However, a local DB can help me track that info over time without re-writing terabytes to petabytes of already-backed-up data.

I think basically I'm looking for a flag that tells rclone to scan and just call the --metadata-mapper for each file/directory it sees.

It's a quasi feature request because I know that my use case is not really the target of the cross-platform data movement that rclone does exceedingly well.

Thanks!
Phil

Glad to hear rclone is working well for you :slight_smile:

I think the most effective way of building a local database would be to use rclone lsjson - this takes all the same include/exclude flags and will dump metadata for you with --metadata. The format it dumps is pretty much the same as the input to the metadata mapper.

e.g.

$ rclone lsjson -R -M dir | jq .
[
  {
    "Path": "dir",
    "Name": "dir",
    "Size": -1,
    "MimeType": "inode/directory",
    "ModTime": "2024-02-05T18:26:22.142222171Z",
    "IsDir": true
  },
  {
    "Path": "z",
    "Name": "z",
    "Size": 0,
    "MimeType": "application/octet-stream",
    "ModTime": "2024-02-05T18:26:30.494167333Z",
    "IsDir": false,
    "Metadata": {
      "atime": "2024-02-10T10:32:41.598431129Z",
      "btime": "2024-02-05T18:26:04.790336126Z",
      "gid": "1000",
      "mode": "100664",
      "mtime": "2024-02-05T18:26:30.494167333Z",
      "uid": "1000"
    }
  }
]
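As a rough illustration of the "local database" idea, here is a minimal Python sketch that stores a listing like the one above in SQLite. The table layout and the names `store_listing`, `scan` and `scan_id` are assumptions made up for this example, not anything rclone provides; only the `rclone lsjson -R -M` invocation comes from the command shown above.

```python
import json
import sqlite3
import subprocess

# Hypothetical schema: one row per path per scan, so scans can be compared later.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    scan_id  TEXT NOT NULL,
    path     TEXT NOT NULL,
    is_dir   INTEGER NOT NULL,
    size     INTEGER,
    mod_time TEXT,
    uid      TEXT,
    gid      TEXT,
    mode     TEXT,
    PRIMARY KEY (scan_id, path)
)
"""


def store_listing(conn, scan_id, entries):
    """Insert one lsjson listing (a list of dicts) under the given scan_id."""
    conn.execute(SCHEMA)
    rows = []
    for e in entries:
        # "Metadata" may be missing (as for the directory in the output above).
        meta = e.get("Metadata") or {}
        rows.append((
            scan_id, e["Path"], int(e["IsDir"]), e.get("Size"), e.get("ModTime"),
            meta.get("uid"), meta.get("gid"), meta.get("mode"),
        ))
    conn.executemany(
        "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def scan(source, extra_flags=()):
    """Run rclone lsjson and parse its JSON output.

    extra_flags is where the same include/exclude flags used for the
    backup would be passed through.
    """
    cmd = ["rclone", "lsjson", "-R", "-M", *extra_flags, source]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

Usage might look like `store_listing(sqlite3.connect("scan.db"), "2024-02-10", scan("dir"))`, where `scan.db` and the scan label are placeholders. For very large trees, loading the whole JSON array in memory may not be practical, so treat this as a starting point only.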

Maybe rclone should do this? I had that in mind as an extension when making the metadata system. It would need some updates to the metadata primitives for the backends, and it would definitely use more transactions, as it would have to read the metadata from both sides to compare it, which isn't free.

PS Maybe you could persuade the University to take out a support contract which can help you get answers quicker and keeps the rclone project sustainable?


It's "interesting" that lsjson doesn't record permissions or uid/gid for directories. That's not the end of the world for us - we can generate a list of directories with the same filters and then post-process the listing.
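The post-processing step could look something like the sketch below, which fills in uid/gid/mode for directory entries by calling `os.lstat` on the local source. The function name `fill_dir_metadata` and the `root` parameter are hypothetical, and formatting the mode as an octal string is an assumption based on the `"100664"` value lsjson printed for a file.

```python
import os


def fill_dir_metadata(entries, root):
    """Add uid/gid/mode to directory entries that have no Metadata.

    entries: parsed lsjson output (list of dicts)
    root:    the local path that was listed, so root + Path is the directory
    """
    for e in entries:
        if e.get("IsDir") and not e.get("Metadata"):
            st = os.lstat(os.path.join(root, e["Path"]))
            e["Metadata"] = {
                "uid": str(st.st_uid),
                "gid": str(st.st_gid),
                # Octal string, assumed to match the style lsjson uses for files.
                "mode": format(st.st_mode, "o"),
            }
    return entries
```

This only works when the listed storage is a local filesystem; for a remote there is nothing to `lstat`.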

> Maybe rclone should do this? I had that in mind as an extension when making the metadata system. It would need some updates to the metadata primitives for the backends, and it would definitely use more transactions, as it would have to read the metadata from both sides to compare it, which isn't free.

That's an interesting thought --

Our current project backs up to S3 (Glacier via lifecycle). We do daily top-ups (only uploading files timestamped in the last 24 hours) and weekly full syncs (which catch anything the top-ups miss and delete files on the backup that no longer exist on the source). I'd probably do a "metadata" sync once a month (since uid/gid/perms rarely change).
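Until rclone can do that itself, a monthly check could be approximated by comparing two lsjson listings taken a month apart. The sketch below is only an illustration: `metadata_changes` is a made-up helper, and it assumes both listings were produced with `--metadata` so that entries carry `uid`, `gid` and `mode` as in the output earlier in the thread.

```python
def metadata_changes(old, new, keys=("uid", "gid", "mode")):
    """Return (path, diff) pairs for entries whose ModTime is unchanged
    but whose uid/gid/mode differ between two lsjson listings."""
    old_by_path = {e["Path"]: e for e in old}
    changed = []
    for e in new:
        prev = old_by_path.get(e["Path"])
        # New or modified files are already covered by the top-ups and full syncs.
        if prev is None or prev.get("ModTime") != e.get("ModTime"):
            continue
        a = prev.get("Metadata") or {}
        b = e.get("Metadata") or {}
        diff = {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
        if diff:
            changed.append((e["Path"], diff))
    return changed
```

Each returned diff maps a key to an `(old, new)` pair, which would be enough to log ownership and permission changes without touching the backed-up data.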

I'm going to send an email to sales@rclone.org - either support or sponsorship.

Rclone 1.66 will do this! Directories have been second-class citizens up to now, but 1.66 will rectify this, syncing modtime by default and metadata too when enabled.

:slightly_smiling_face: