Problem with three-way syncing (sftp/s3/files)

Dear Rclone users and developers,

I have a sync problem which might have a simple obvious solution which escapes me completely, however.

  • I had some data in an sftp + crypt backend.

  • I synced it with my local filesystem:

    rclone sync encrypted-sftp:folder ./folder
    
  • I now switched to an Amazon S3 AWS backend + crypt. I synced the two like this:

    rclone sync encrypted-sftp:folder encrypted-s3:folder

    They are reported as being in sync.

  • Now I am trying to sync encrypted-s3 with the local filesystem:

    rclone sync --dry-run encrypted-s3:folder ./folder
    
  • I expected it to be already in sync, but rclone tries to copy all the files from S3 to the filesystem again.

  • If I increase the verbosity, I see the following (the command I used is sketched after this list):

    Modification times differ by 9229h50m27.441916489s: 2018-09-23 23:12:03 +0200 CEST, 2019-10-13 13:02:30.441916489 +0200 CEST
    
  • It looks like it is comparing the original creation time of the file stored in encrypted-s3 with the time I synced the file to the local filesystem, and so it wants to copy it again. This does not happen with the encrypted-sftp backend. Does anyone know why, please? I do not understand it, even after reading rclone's S3 backend documentation.

  • The data is almost 2 TB. I would like to avoid copying it again if possible.
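
For completeness, this is roughly the verbose dry-run I used to get the message above (the -vv level is what I mean by increasing the verbosity; the folder names are the same placeholders as in the earlier commands):

    rclone sync --dry-run -vv encrypted-s3:folder ./folder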

Thank you in advance for your help.

Rclone version:

  • rclone v1.48.0
  • os/arch: linux/amd64
  • go version: go1.12.5

Valerio

EDIT: Style

It sounds like what may be happening is that the encrypted file's modtime is being compared rather than the real file's modtime (inside the encrypted file). A second complication is that there are three different filesystems in play here. It may not be obvious to most users, but each of these systems actually has different rules about how its timestamps work, so something might have gone sideways in that regard.

The exact how or why of what might trigger this is a little beyond me, though; we might need @ncw to chime in here and suggest how we can best troubleshoot this.

But what I can say is that we have many ways to work around this at least.
We could compare on hashes instead of the usual modtime+size. This is much more accurate anyway (at the cost of a little CPU); the only problem is I'm not sure whether you can use this directly on an encrypted S3 --> local sync.

I would ask you to just try it first of all: add --checksum to your rclone command and see if you get any "--checksum ignored because filesystems do not share a common hash" message. Rclone may be able to work around that with a manual calculation automatically (if so, no error) - but this is the part I can't say for sure off the top of my head.
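
To be concrete, here is a minimal sketch of the test I mean, reusing the placeholder folder names from your post:

    rclone sync --dry-run --checksum encrypted-s3:folder ./folder

Keeping --dry-run in there means nothing is transferred while you check the output for that message.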

There are also other ways to go about this, but give me some feedback on what you think may be appropriate, and test --checksum first.

Dear @thestigma, thanks for the quick answer.

So, I tried with the checksum, and it does not try to copy the files anymore, so that seems to work.

It seems that for some files the date was changed, but for others it was not. It is not clear why. I am now running a --dry-run on the full dataset to get an idea of how much would be copied.

Valerio

If --checksum works, then I'd just consider that your fix.
If you don't get the error I mentioned at the very start, then it means it is working.

But of course, if you want to dig further into this to see if there is a bug somewhere to correct, you can do that too. In that case I suspect we are going to need some -vv logging to track the modtimes and the comparison.
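
As a rough sketch of what that could look like (the log file name is just an example of mine):

    rclone sync --dry-run -vv --log-file modtime-debug.log encrypted-s3:folder ./folder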

So just to be clear - is rclone doing the comparison wrong but the file has a correct modtime?
Or does the file actually have a wrong modtime because it got changed/corrupted somewhere?
If it is the latter, can you maybe see where this happens? It's pretty likely it is happening in one of the specific transfer steps between remotes.

What should have happened is that rclone preserved the modification times for all the syncs.

However, I'm guessing one of those backends didn't preserve the modification times properly.

It could be the S3 server (are you using AWS or something else?) or the sftp backend - what server are you using here?

Can you perform a test to see on which backend the modification times went missing?

You can use rclone lsl backend:path/to/file to see the modtime for a given file.
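
For example, something along these lines (the file path is just a placeholder) would show the modtime each backend reports for the same file:

    rclone lsl encrypted-sftp:folder/path/to/file
    rclone lsl encrypted-s3:folder/path/to/file
    ls -l --time-style=full-iso ./folder/path/to/file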

So I just wanted to let all the kind people who tried to help me know that in the end I took a brute-force approach to reconcile all the timestamps:

  • I synced encrypted-s3 > local fully. It did not copy all the files, only about 30% of them.
  • Now rclone says that encrypted-s3 > local is fully in sync
  • It also says that encrypted-sftp > local is fully in sync (this is strange; I did not expect local to be in sync with both of them at the same time)
  • I kept a separate SHA checksum of the files to check their integrity, and the local files do not seem to be corrupted (a sketch of the check is after this list)
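
For reference, this is roughly the kind of check I did (the checksum file name is just mine; adapt the paths as needed):

    # before the final sync: record hashes of the local files
    find ./folder -type f -exec sha256sum {} + > checksums.sha256

    # after the sync: verify the local files against that list
    sha256sum -c checksums.sha256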

So thanks to everyone who helped!

Valerio

I'm glad you got it working. I'm still not clear on what happened, but I'll take it!
