Unexpected dangerous behaviour with: move --immutable and --ignore-existing flags

I've been trying to figure out a command to move files from a local filesystem to a remote destination server, but without ever overwriting or deleting files that have the same name but different content, i.e.:

  • If files have the same name, and the content is exactly the same, the source copy can just be deleted
  • If files have the same name, but the content is different, the source file is left on the source system and nothing changes on the destination either
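As a sketch only, the two rules above amount to a small decision table. The function name and the action strings are hypothetical, and the first branch (no file of that name on the destination) is an assumption that an ordinary move is wanted in that case; none of this is rclone's own code.

```python
from typing import Optional


def desired_move_action(src_hash: str, dst_hash: Optional[str]) -> str:
    """Return the action wanted for one file in a non-destructive move.

    dst_hash is None when no file of that name exists on the destination.
    """
    if dst_hash is None:
        # Assumed case: nothing to clash with, so do an ordinary move.
        return "transfer, then delete source"
    if src_hash == dst_hash:
        # Identical content is already on the destination.
        return "delete source"
    # Same name, different content: touch neither side.
    return "leave both untouched"


# Hashes here are stand-in strings, not real checksums.
for src, dst in [("aaa", None), ("aaa", "aaa"), ("aaa", "bbb")]:
    print(src, dst, "->", desired_move_action(src, dst))
```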

After a while, I figured out that I can achieve this with:

  • rclone move --immutable /tmp/sourcefolder/ destinationserver:destinationbucket

...but it only works when the source is a folder, not a specific file.

While figuring this out, a couple of earlier attempts surprised me along the way:

  • rclone move --immutable sourcefile destinationserver:destinationbucket
    • i.e. the same command as above, but on a specific filename... this overwrites the destination file, even when the content is unique/different
  • rclone move --ignore-existing sourcefile destinationserver:destinationbucket
    • this deletes original source files (which have unique content) entirely. It does the same thing if you specify a source folder.

In my experiments, when the files had different content, the modtimes were also different. But even adding --checksum to either of them didn't make any difference.

Both of the above behaviours seem very surprising and dangerous to me, given that both flags sound like they should avoid being destructive. Is either of these behaviours a bug?

I'm using: rclone v1.51.0 - which I just downloaded now using curl https://rclone.org/install.sh | sudo bash

Can you share a debug log?

For testing, I'm just creating files using Linux command:

date > sourcefile

On destination host:

cat sourcefile
Sun May 24 17:22:01 AEST 2020

--immutable demo on source host:

cat sourcefile
Mon May 25 11:46:16 AEST 2020
rclone move --immutable -vv sourcefile duobase:test-bucket-1

2020/05/25 11:49:53 DEBUG : rclone: Version "v1.51.0" starting with parameters ["rclone" "move" "--immutable" "-vv" "sourcefile" "duobase:test-bucket-1"]
2020/05/25 11:49:53 DEBUG : Using config file from "/root/.rclone.conf"
2020/05/25 11:49:53 DEBUG : sourcefile: Modification times differ by -18h24m14.919741001s: 2020-05-25 11:46:16.204163086 +1000 AEST, 2020-05-24 17:22:01.284422085 +1000 AEST
2020/05/25 11:49:53 DEBUG : sourcefile: MD5 = 1e5c7d0711fd82a5b4b98b153023b8f4 (Local file system at /tmp/rclone-push)
2020/05/25 11:49:53 DEBUG : sourcefile: MD5 = 720463eaab00d1fdd011edb3001ee954 (S3 bucket test-bucket-1)
2020/05/25 11:49:53 DEBUG : sourcefile: MD5 differ
2020/05/25 11:49:53 DEBUG : sourcefile: MD5 = 1e5c7d0711fd82a5b4b98b153023b8f4 OK
2020/05/25 11:49:53 INFO  : sourcefile: Copied (replaced existing)
2020/05/25 11:49:53 INFO  : sourcefile: Deleted
2020/05/25 11:49:53 INFO  :
Transferred:            30 / 30 Bytes, 100%, 426 Bytes/s, ETA 0s
Checks:                 2 / 2, 100%
Deleted:                1
Transferred:            1 / 1, 100%
Elapsed time:         0.0s
2020/05/25 11:49:53 DEBUG : 5 go routines active
2020/05/25 11:49:53 DEBUG : rclone: Version "v1.51.0" finishing with parameters ["rclone" "move" "--immutable" "-vv" "sourcefile" "duobase:test-bucket-1"]

...the result of above is that "sourcefile" overwrites the destination file of the same name, even though content + modtime are different (see differing MD5 in debug log)

--ignore-existing file demo on source host:

date > sourcefile
cat sourcefile
Mon May 25 11:52:33 AEST 2020
rclone move --ignore-existing -vv sourcefile duobase:test-bucket-1

2020/05/25 11:53:27 DEBUG : rclone: Version "v1.51.0" starting with parameters ["rclone" "move" "--ignore-existing" "-vv" "sourcefile" "duobase:test-bucket-1"]
2020/05/25 11:53:27 DEBUG : Using config file from "/root/.rclone.conf"
2020/05/25 11:53:27 DEBUG : sourcefile: Destination exists, skipping
2020/05/25 11:53:27 INFO  : sourcefile: Deleted
2020/05/25 11:53:27 INFO  :
Transferred:             0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1
Elapsed time:         0.0s
2020/05/25 11:53:27 DEBUG : 5 go routines active
2020/05/25 11:53:27 DEBUG : rclone: Version "v1.51.0" finishing with parameters ["rclone" "move" "--ignore-existing" "-vv" "sourcefile" "duobase:test-bucket-1"]

...result is that my local unique "sourcefile" is deleted. Nothing changes on destination server, it keeps its old file.

--ignore-existing folder demo on source host:

date > sourcefile
cat sourcefile
Mon May 25 12:05:00 AEST 2020
rclone move --ignore-existing -vv /tmp/rclone-push/ duobase:test-bucket-1
2020/05/25 12:06:04 DEBUG : rclone: Version "v1.51.0" starting with parameters ["rclone" "move" "--ignore-existing" "-vv" "/tmp/rclone-push/" "duobase:test-bucket-1"]
2020/05/25 12:06:04 DEBUG : Using config file from "/root/.rclone.conf"
2020/05/25 12:06:05 DEBUG : sourcefile: Destination exists, skipping
2020/05/25 12:06:05 INFO  : S3 bucket test-bucket-1: Waiting for checks to finish
2020/05/25 12:06:05 INFO  : sourcefile: Deleted
2020/05/25 12:06:05 INFO  : S3 bucket test-bucket-1: Waiting for transfers to finish
2020/05/25 12:06:05 INFO  :
Transferred:             0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1
Elapsed time:         0.0s

2020/05/25 12:06:05 DEBUG : 5 go routines active
2020/05/25 12:06:05 DEBUG : rclone: Version "v1.51.0" finishing with parameters ["rclone" "move" "--ignore-existing" "-vv" "/tmp/rclone-push/" "duobase:test-bucket-1"]

...result is same as with a single file... my local unique "sourcefile" is deleted. Nothing changes on destination server, it keeps its old file.

Are you using encrypted remotes as well or are they just regular remotes?

Also, from reading through the docs and the original request, I think it is only supposed to work on directory copies.

I think @ncw might be able to add some input, but I don't think the docs are clear, at least for --immutable.

That looks correct to me based on what was specified. What would you expect to happen instead?

I think this is a bug. --immutable should never allow destination files to be overwritten.

Can you please make a new issue on github about this?

It is happening because moving a single file shortcuts the full sync routine.

Hmm, that probably is working properly. When rclone compares the source and destination to see whether a file needs transferring, the --ignore-existing flag says that if the destination already exists then we don't need to transfer the source. After the copy phase, move will delete any source files which are deemed to be in the destination already. By adding --ignore-existing you've said that any files which exist in the destination, even if different, are OK, so the source can be deleted.

This behaviour is exactly as required for sync and copy but it seems pretty non-intuitive for move, I would agree.
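A rough Python model of the behaviour just described may make the data-loss path easier to see. It is a sketch, not rclone's actual code: the function names and the simplified compare/copy/delete structure are assumptions, hashes are stand-in strings, and transfers are assumed to always succeed.

```python
from typing import Optional, Tuple


def needs_transfer(src_hash: str, dst_hash: Optional[str],
                   ignore_existing: bool) -> bool:
    """Comparison phase: decide whether the source must be copied."""
    if dst_hash is None:
        return True   # missing on the destination
    if ignore_existing:
        return False  # any same-named file counts as "already there"
    return src_hash != dst_hash


def move_one(src_hash: str, dst_hash: Optional[str],
             ignore_existing: bool = False) -> Tuple[str, bool]:
    """Return (destination hash afterwards, source deleted?)."""
    if needs_transfer(src_hash, dst_hash, ignore_existing):
        dst_hash = src_hash  # copy phase creates or overwrites
    # Delete phase: the file is now deemed to be in the destination.
    return dst_hash, True


# Same name, different content, with --ignore-existing:
# the destination keeps "old" and the source is deleted anyway.
print(move_one("new", "old", ignore_existing=True))
```

In this model the only copy of the "new" content is lost in the last call, which corresponds to the "Destination exists, skipping" followed by "Deleted" lines in the debug logs above.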

So if we made --ignore-existing not delete files if they exist that would satisfy your requirement.

Another alternative is to error out if --ignore-existing is set for move as being counter-intuitive...

^^ To address this use-case I wrote difflist and difflist2 scripts. But it would be a great addition if we could control source-side delete behavior selectively with rclone itself, when running rclone move.

  • difflist uses rclone check (hash check) to create a list of files in source: that are missing or not identical in destination.

  • diffmove will do an rclone move of the differential list from difflist. The files in the source that are already in the destination are left untouched. (So it does not address the OP's specific case yet.)

  • difflist2 uses rclone tree (no hash check, names only) to create a list of files or folders that are in source but not in destination. I wrote difflist2 for a special use case (compare folder trees) but it can be easily modified to work with diffmove to allow moving only file names from source that do not exist in destination.

This is a short, hacky solution for an admittedly edge use-case :slight_smile: . But it works for now.

If rclone could omit source-side deletes with a flag (?) that would pretty much eliminate the need for these short scripts.

So instead of overriding --ignore-existing, make a move-specific flag which does something like "only delete source files if, after transferring, the destination was identical"?

Or change the behaviour of --ignore-existing so that it doesn't delete the source files if we've had a move.

I'm coming round to the idea that we should probably do that for --ignore-existing otherwise there is potential for data loss.

Do you want to please make a new issue on github about changing the behaviour of --ignore-existing?

There are a few possible use-cases. What you describe would address most of them.
Not sure if this would address the OP's use case, though.

Nice diagram :smiley:

I think it isn't quite right though. I think the change to the logic would be: if we didn't transfer a file then we would no longer delete it. A file counts as transferred if there was already an identical file in the destination or it was copied. This is the logic used without --ignore-existing.
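One literal reading of that rule, as a Python sketch: the source is deleted only when the file counts as transferred. The function name is hypothetical, hashes are stand-in strings, and this is an interpretation of the proposal rather than anything implemented in rclone.

```python
from typing import Optional, Tuple


def proposed_move_one(src_hash: str, dst_hash: Optional[str],
                      ignore_existing: bool = False) -> Tuple[str, bool]:
    """Return (destination hash afterwards, source deleted?)."""
    if dst_hash is None:
        return src_hash, True   # copied, so it counts as transferred
    if src_hash == dst_hash:
        return dst_hash, True   # identical file already there: transferred
    if ignore_existing:
        return dst_hash, False  # skipped, not transferred: keep the source
    return src_hash, True       # plain move overwrites, then deletes


# Same name, different content, with --ignore-existing: both sides survive.
print(proposed_move_one("new", "old", ignore_existing=True))
```

Compared with the current behaviour, only the third branch changes: a skipped file is no longer deleted from the source.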

I edited this on my phone so apologies!

Hi ncw. It's always hard to sort through multiple examples ... scratching our heads to see if we (meaning me) articulate things correctly.

In your edit above it looks like move with --ignore-existing would delete files in the source if they already exist in the destination (which is the current behavior). This is the same behavior that occurs without using the --ignore-existing flag (Base case).

I may have misunderstood, but I thought one objective was to have a flag (--ignore-existing or a new one) which would leave source files in place if an exact hash-matched file already existed in the destination (the second set, "Add --ignore-existing flag?"), essentially skipping the delete phase of a move operation.

Then in the fourth set where A' has the same name as A but a different hash, move --ignore-existing would move A over as it exists but is not identical.

===
It may just be that I was thinking --ignore-existing would leave any same-name-hash-identical file alone, whereas you were thinking that --ignore-existing would leave any same-name but hash-different file alone.

I think both cases are valid. Would it be possible to allow either/both behavior with appropriate flag(s)?

This is what I had in mind (which would imply a change in the behavior of --ignore-existing)

I'm not sure, but I think this is what you are describing (with or without the new flag I inserted at the bottom of the chart).

:slight_smile:

That was my initial thought but...

...I don't think the rules of --ignore-existing with move are well defined at the moment so we can define them however they make the most sense!

Sure! However, let's concentrate on getting the behaviour of --ignore-existing + move that we want first.

Looking at the second half of your diagram, I think that it is up for discussion what --ignore-existing does when there is an existing file. It would be consistent to say that if a file exists on the destination then we don't delete the source. Easy to explain. That is the equivalent of the last row in your diagram.

I'd be happy with that behaviour to be the default behaviour of --ignore-existing and move.

Maybe you could explain why you would find --ignore-same-hash useful on its own? What problem are you trying to solve? It doesn't cut down on the transfer as rclone will check the hashes and not transfer it.

It's a good question :slight_smile: Basically it is the complement of the use-case that OP describes.

  • If files have the same name, and the content is exactly the same, then nothing changes.
  • If files have the same name, but the content is different, the source file is copied to destination.

A simple example:
Say you have 3 remotes (rem1 , rem2 , rem3).
We want to keep the content for all three mirrored.
Copying from rem1 to rem2 is "cheap".
Copying from rem1 or rem2 to rem3 is "expensive".
Moving from rem2 to rem3 is "cheap".

In the above examples we want A to overwrite A'.
But ideally we want B to not be deleted from rem2 because it already exists on rem3. This will save one extra copy/transfer when rem1 is next synced to rem2.

I can see the use-case for both instances. Just happens that this is one I use :slight_smile:

No, I'm not using any kind of encryption. It's just a Minio server with default settings, and a simple/default S3-compatible rclone remote config.

Yes, I agree. I really did not expect data loss from an innocuous sounding flag like --ignore-existing

I would expect --ignore-existing to ignore it :slight_smile: ...in other words just skip it and take no action on that file.

I was very surprised that the verb "ignore" could be interpreted as "delete my one-and-only copy of unique files". Especially when not only is the content different, but the filesize and modtime are different too.

I can't imagine that wanting your unique files deleted like this is a very common use case. But even if some people do want to do that for some reason, I think there should be a specific additional flag required to do that... which has a sufficiently scary/dangerous sounding name, likely including the word "delete". Although I'm not sure that enough people would actually want this to warrant much effort being put into it. I think it's more important to just make the default functionality more sane to avoid unexpected data loss. If I really wanted unique files deleted for some reason, I'd likely be scripting that myself anyway... and very very carefully.

This is, alas, what rclone sync/copy/move do in their natural state: in the 99.99% use-case, newer/changed/updated files in the source overwrite unique older files in the destination.

I think the discussion here revolves around the definition of "existing".

  1. Existing = any file with the same name.
  2. Existing = a file with the same name and same hash.
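The two readings can be put side by side in a short Python sketch. The function names are hypothetical and the hashes are stand-in strings; this is not how rclone implements the flag.

```python
from typing import Optional


def exists_by_name(src_hash: str, dst_hash: Optional[str]) -> bool:
    """Definition 1: any file with the same name counts as existing."""
    return dst_hash is not None


def exists_by_name_and_hash(src_hash: str, dst_hash: Optional[str]) -> bool:
    """Definition 2: only a same-named file with the same hash counts."""
    return dst_hash is not None and dst_hash == src_hash


cases = {
    "missing on destination": ("aaa", None),
    "same name, same hash": ("aaa", "aaa"),
    "same name, different hash": ("aaa", "bbb"),
}
for label, (src, dst) in cases.items():
    print(label, exists_by_name(src, dst), exists_by_name_and_hash(src, dst))
```

The two definitions only disagree in the last case, same name with a different hash, which is exactly the case where the source copy is unique.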

Personally I can see a use-case for both interpretations.
