S3 server side move never deletes the original files

I’m restructuring some of my data and need to do a number of “rclone move” operations on my S3 crypt remote to match the local changes. This works and successfully moves the data; however, it leaves behind a delete marker and a version of the file (versioning is disabled on the S3 bucket). My expectation from the documentation is that, since S3 does not support server-side move, rclone will do a copy and then a delete, but it never deletes the original files (I found this out the hard way in the past when my bucket roughly doubled in size).

Here is the command I’m running.

rclone move --transfers 100 --verbose encryptedremote:original/path/to/files encryptedremote:new/path/to/files

I noticed that the timestamp is updated on the copied files. Is it possible to preserve the original timestamp?

What is a delete marker and how are you seeing it?

I attempted to replicate the problem like this:

$ echo hello | rclone rcat s3:rclone-test-ssm-1/hello.txt
$ rclone move s3:rclone-test-ssm-1 s3:rclone-test-ssm-2
$ rclone ls s3:rclone-test-ssm-2
        6 hello.txt
$ rclone ls s3:rclone-test-ssm-1
$ 

That is the correct behaviour. If you use -vv you can see rclone doing the copy and then the delete.

2018/06/26 16:18:27 DEBUG : rclone: Version "v1.42-022-g187e1e3a-config" starting with parameters ["rclone" "-vv" "move" "s3:rclone-test-ssm-1" "s3:rclone-test-ssm-2"]
2018/06/26 16:18:27 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/06/26 16:18:27 INFO  : S3 bucket rclone-test-ssm-2: Waiting for checks to finish
2018/06/26 16:18:27 INFO  : S3 bucket rclone-test-ssm-2: Waiting for transfers to finish
2018/06/26 16:18:27 INFO  : hello.txt: Copied (server side copy)
2018/06/26 16:18:27 INFO  : hello.txt: Deleted
2018/06/26 16:18:27 INFO  : 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 1
Transferred:            1
Elapsed time:       600ms

2018/06/26 16:18:27 DEBUG : 8 go routines active
2018/06/26 16:18:27 DEBUG : rclone: Version "v1.42-022-g187e1e3a-config" finishing with parameters ["rclone" "-vv" "move" "s3:rclone-test-ssm-1" "s3:rclone-test-ssm-2"]

I don’t understand what is going on here! Is it some setting on your bucket?

rclone’s timestamp should be preserved. Did you mean the S3 LastModified time? It isn’t possible to set that, and I think it is entirely up to S3 whether it is preserved or not (but I may be wrong!).

$ rclone lsl s3:rclone-test-ssm-1
        6 2018-06-26 16:19:32.628021177 hello.txt
$ rclone move s3:rclone-test-ssm-1 s3:rclone-test-ssm-2
$ rclone lsl s3:rclone-test-ssm-2
        6 2018-06-26 16:19:32.628021177 hello.txt

From my research, delete markers only apply to buckets with versioning enabled or suspended. When I created the bucket I enabled versioning before adding the files with rclone. I’ve since disabled versioning, but that means it is now “suspended”. To resolve this I may need to create a new bucket and move all of my files. Could we possibly add a flag to rclone so that a move or delete on a versioning-enabled/suspended bucket deletes the file by version ID, so no delete marker is created?
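A cleanup along those lines can be sketched against a boto3-style S3 client (hedged: purge_delete_markers is a hypothetical helper, not an existing rclone feature, and the bucket/prefix names are placeholders):

```python
# Sketch: remove delete markers left behind in a versioning-suspended
# bucket by deleting each marker with its explicit VersionId.
# Assumes a boto3-style client; this is not part of rclone.

def purge_delete_markers(s3, bucket, prefix=""):
    """Delete every delete marker under prefix by VersionId; return count."""
    removed = 0
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        resp = s3.list_object_versions(**kwargs)
        for marker in resp.get("DeleteMarkers", []):
            # Specifying VersionId removes the marker itself, instead of
            # stacking yet another delete marker on top of it.
            s3.delete_object(Bucket=bucket, Key=marker["Key"],
                             VersionId=marker["VersionId"])
            removed += 1
        if not resp.get("IsTruncated"):
            break
        # Continue paging through the version listing.
        kwargs["KeyMarker"] = resp["NextKeyMarker"]
        kwargs["VersionIdMarker"] = resp["NextVersionIdMarker"]
    return removed
```

It would be called as e.g. `purge_delete_markers(boto3.client("s3"), "my-bucket")`, but note this permanently removes the markers, so the "deleted" objects become current again only if older versions still exist underneath them.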

Here is S3 documentation on versioning and delete markers.

Using Versioning

Versioning-enabled buckets enable you to recover objects from accidental deletion or overwrite. For example:

* If you delete an object, instead of removing it permanently, Amazon S3 inserts a delete marker, which becomes the current object version. You can always restore the previous version. For more information, see Deleting Object Versions.

Managing Objects in a Versioning-Suspended Bucket

When you suspend versioning, existing objects in your bucket do not change. What changes is how Amazon S3 handles objects in future requests. The topics in this section explain various object operations in a versioning-suspended bucket.

Adding Objects to Versioning-Suspended Buckets

Once you suspend versioning on a bucket, Amazon S3 automatically adds a null version ID to every subsequent object stored thereafter (using PUT, POST, or COPY) in that bucket.

If a null version is already in the bucket and you add another object with the same key, the added object overwrites the original null version.

If there are versioned objects in the bucket, the version you PUT becomes the current version of the object. The following figure shows how adding an object to a bucket that contains versioned objects does not overwrite the object already in the bucket. In this case, version 111111 was already in the bucket. Amazon S3 attaches a version ID of null to the object being added and stores it in the bucket. Version 111111 is not overwritten.

Deleting Objects from Versioning-Suspended Buckets

If versioning is suspended, a DELETE request:

* Can only remove an object whose version ID is null. It doesn’t remove anything if there isn’t a null version of the object in the bucket.

* Inserts a delete marker into the bucket.

Working with Delete Markers

A delete marker is a placeholder (marker) for a versioned object that was named in a simple DELETE request. Because the object was in a versioning-enabled bucket, the object was not deleted. The delete marker, however, makes Amazon S3 behave as if it had been deleted.

Delete markers accrue a nominal charge for storage in Amazon S3. The storage size of a delete marker is equal to the size of the key name of the delete marker. A key name is a sequence of Unicode characters. The UTF-8 encoding adds from 1 to 4 bytes of storage to your bucket for each character in the name.

Removing Delete Markers

To delete a delete marker, you must specify its version ID in a DELETE Object versionId request. If you use a DELETE request to delete a delete marker (without specifying the version ID of the delete marker), Amazon S3 does not delete the delete marker, but instead, inserts another delete marker.

That is very helpful thank you.

I think the thing to do next would be to make a new issue on GitHub. If you could cut and paste the above into it, that would be very helpful (if you edit your post you can copy the raw markdown, which will work in GitHub too).

I suppose the question would be whether to do a full integration like the B2 versioning support or not.

I guess there is probably a way for rclone to detect whether versioning is enabled (or suspended) on a bucket, so it could all be automatic.
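That detection could be sketched like this (hedged: this uses the S3 GetBucketVersioning call via a boto3-style client; rclone would do the equivalent through its own S3 library, and needs_versioned_delete is a hypothetical name):

```python
# Sketch: decide whether a bucket needs version-aware deletes.
# Works with any boto3-style client exposing get_bucket_versioning.

def needs_versioned_delete(s3, bucket):
    """Return True if the bucket has versioning enabled or suspended."""
    resp = s3.get_bucket_versioning(Bucket=bucket)
    # 'Status' is absent from the response if versioning was never
    # enabled on the bucket, in which case plain deletes are fine.
    return resp.get("Status") in ("Enabled", "Suspended")
```

With something like this, rclone could automatically switch to deleting by version ID on buckets where a simple DELETE would otherwise leave a delete marker behind.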