Error about corrupted transfer

Ahmet_Kartal · September 15, 2019, 4:00pm

I am using the latest beta version of rclone (just downloaded).
At first it gave me a "file already closed" error. After some searching I got past that error by changing the command I use to transfer data.
This is my rclone command:
rclone sync -vv s3_source:folder s3_dest:folder_for_dst --s3-upload-cutoff 0 --ignore-checksum
When I remove the --s3-upload-cutoff 0 flag, it gives me an HTTP connection broken error.
This is the output I get when I run the command:
2019/09/15 18:44:59 ERROR : Folder/File: corrupted on transfer: sizes differ 179203 vs 2996172
2019/09/15 18:44:59 INFO : Folder/File: Removing failed copy
2019/09/15 18:45:12 ERROR : Folder/File: corrupted on transfer: sizes differ 216668 vs 3187384
2019/09/15 18:45:12 INFO : Folder/File: Removing failed copy
2019/09/15 18:45:27 ERROR : Folder/File: corrupted on transfer: sizes differ 223652 vs 3276200
2019/09/15 18:45:27 INFO : Folder/File: Removing failed copy
2019/09/15 18:45:27 ERROR : Folder/File: corrupted on transfer: sizes differ 201749 vs 3069735
2019/09/15 18:45:27 INFO : Folder/File: Removing failed copy

Also, 4 files were transferred from the source to the destination s3. But when I check the file sizes with rclone ls s3_source:folder and rclone ls s3_dst:folder_for_dst, two of the files are the same size while the other two differ (the destination files are bigger than the source files).

I don't know what is going on.

ncw · September 15, 2019, 5:47pm

I suspect the files might have Content-Encoding: gzip set. This means that the size in s3 is different from the size that rclone downloads which really confuses rclone!
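
To illustrate the mismatch described above, here is a minimal Python sketch. It is not rclone code, and the payload is made up for the example. It only shows that the size of the stored (compressed) bytes differs from the size of the body once it has been transparently decompressed, which is the kind of disagreement behind a "sizes differ" error.

```python
import gzip

# Hypothetical payload standing in for a file stored with Content-Encoding: gzip.
original = b"body { margin: 0; padding: 0; }\n" * 2000

# What the bucket stores, and therefore the size it lists: the compressed bytes.
stored = gzip.compress(original)

# What a client ends up with if the body is transparently decompressed in transit.
received = gzip.decompress(stored)

stored_size = len(stored)
received_size = len(received)

print(f"size listed by the bucket: {stored_size}")
print(f"size actually received:    {received_size}")
print(f"sizes differ: {stored_size != received_size}")
```

The content is intact in both cases; only the byte count that gets compared is different.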

Are s3_source and s3_dest different storage platforms or regions or something? If you could do a server side copy that would be quickest but you'd need to access them both through the same s3: remote.

Ahmet_Kartal · September 16, 2019, 6:31am

My files have the ".gz" extension, so I think your first comment is correct about my problem. Even if the sizes differ, can we tell whether the content of the files is different? Does rclone add something, or can something happen to my files during transfer? Can you suggest a way to transfer ".gz" files?

As for your second comment, can you clarify what you mean by doing a server side copy?

ncw · September 16, 2019, 10:29am

I think adding the --ignore-size flag should work. You may need --ignore-checksum also, I'm not sure.

This issue is described here: "Google Cloud Storage: Can't download files with Content-Encoding: gzip", issue #2658 in rclone/rclone on GitHub (this is about gcs but s3 has the same issue).

Can you show me the config (excluding secrets) for s3_source and s3_dest and I'll explain more with examples

Ahmet_Kartal · September 23, 2019, 8:49am

Sorry for the very late answer. I was busy transferring other files. Now this is the only file that still needs to be transferred.
This is my config file:

[s3_source]
type = s3
env_auth = false
access_key_id = XX
secret_access_key = XX
endpoint = ....digitaloceanspaces.com
acl = private

[s3_dst]
type = s3
env_auth = false
access_key_id = XX
secret_access_key = XX
endpoint = ....digitaloceanspaces.com
acl = public-read

I have seen the --no-gzip-encoding flag but I haven't tried it yet. Do you think it can solve my problem?
If I add the --ignore-size flag, would rclone miss some files that I need to transfer?

ncw · September 23, 2019, 2:33pm

I think it probably won't help, but give it a try! I think --ignore-size --ignore-checksum will get that last file transferred.

I presume the endpoints are in different regions? If so then you can't do a server side copy.

thestigma · September 23, 2019, 9:04pm

Relevant side-question: Would the same problem apply to regular old .zip files? (don't know exactly what tool was used to encode it).

Because I did note a problem a while back where a .zip file consistently got caught in an infinite upload-reupload-reupload loop when the VFS cache tried to move it. I didn't have time to dig into the logs at the time and kind of forgot about it. I very rarely use .zip format so I think it may have been the first time I tried to upload one.

Ahmet_Kartal · September 24, 2019, 7:03am

With this command:
rclone copy -vv source_s3:folder dest_s3:folder --ignore-checksum --ignore-size --s3-upload-cutoff 0 --no-gzip-encoding
rclone transferred my file correctly.
Without --no-gzip-encoding it also worked, but the total size was completely different. My source file is 91 MBytes, but when it was transferred without the --no-gzip-encoding flag my destination s3 reported 1 GB, which is totally wrong.
Thank you for your attention sir.

ncw · September 24, 2019, 11:10am

I'm struggling to work out what rclone should do here!

You have files which have been uploaded compressed, with Content-Encoding: gzip.

Should rclone

1. download these files as compressed
2. download these files as uncompressed

Note that the size and checksum of the file rclone reads in the cloud refer to the compressed file, so in some ways option 1. would be most useful so rclone can check the size and checksum. However that is probably surprising to the user (and is not what gsutil does - it will decompress the file).

You want 1. to achieve your transfer. You also want the Content-Encoding set on the upload which rclone won't be doing at the moment either.

At the moment rclone is doing 2. but failing to check size and checksum.
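
The trade-off between the two options can be sketched in a few lines of Python. This is an illustration only, not rclone's implementation: the helper `matches_listing` and the payload are hypothetical, and it assumes the listed checksum is an MD5 of the stored bytes.

```python
import gzip
import hashlib


def matches_listing(downloaded: bytes, listed_size: int, listed_md5: str) -> bool:
    """Return True if the downloaded bytes agree with the listed size and MD5."""
    return (
        len(downloaded) == listed_size
        and hashlib.md5(downloaded).hexdigest() == listed_md5
    )


# Hypothetical object: the bucket stores gzip-compressed bytes.
plain = b"console.log('hello');\n" * 5000
stored = gzip.compress(plain)

# The listing describes the stored (compressed) object.
listed_size = len(stored)
listed_md5 = hashlib.md5(stored).hexdigest()

# Option 1: download as compressed, so the bytes are exactly what is stored.
option1 = stored
# Option 2: download as uncompressed, so the body has been decompressed.
option2 = gzip.decompress(stored)

print("option 1 verifies:", matches_listing(option1, listed_size, listed_md5))
print("option 2 verifies:", matches_listing(option2, listed_size, listed_md5))
```

With option 1 the size and checksum can be verified; with option 2 they cannot, which is why --ignore-size and --ignore-checksum come up earlier in the thread.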

Ahmet_Kartal · September 26, 2019, 6:36am

As I mentioned before, 2 of my files were transferred. I am not sure whether this caused it, but I had to change my destination folder. If I do not change it, rclone again reports the wrong size (even though the 2 files are exactly the same size).
I think downloading the files as compressed is the best option. Rclone should not care whether a file is compressed or not. If a user wants to download a file, he needs to get exactly what he had at the source.
That goes for me as well. I had a compressed file and I expect rclone to download it without uncompressing it. If I want the uncompressed version, I should decompress the file in my source folder myself, not rclone.

ncw · September 26, 2019, 9:07am

Thanks for that. I will think about it further.

David_Western · November 25, 2019, 9:47pm

I have run into the same issue running rclone between S3 and Wasabi (an S3 API clone). The source bucket contains some files uploaded by a Django/Python app using boto3, where they are compressed automatically and the content-encoding is set to gzip.

rclone is downloading, uncompressing based on the content-encoding, then failing on the size mismatch compared to the original compressed version. In my opinion a "clone" should retrieve the exact bytes of the source file and place it on the destination. Maintaining the content-encoding and other meta tags would be necessary for that to work properly.

As it stands now, there is no work-around I've found. I can prevent errors with the --no-gzip-encoding flag but then the destination file no longer has the content-encoding set to gzip which causes browser clients to fail reading the new copy since they don't know the file is compressed without a content-encoding (these are .css / .js files).

ncw · November 26, 2019, 8:19am

Yes I haven't figured this out yet.

You should probably be able to upload uncompressed files with --ignore-size and --ignore-checksum.

Suppose rclone had a -Z flag (as some have suggested) which

- set the --no-gzip-encoding flag
- set the content-encoding to gzip (provided the input was gzip encoded), or gzipped the input if not

Do you think that is about the right behaviour?

Trouble is I'm not sure that is the correct behaviour when copying from s3 -> local disk.
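
The upload side of the proposed -Z behaviour could look roughly like the Python sketch below. No such flag exists in the thread beyond the proposal, so everything here is hypothetical: the function name `prepare_upload` is invented, and detecting gzip input by its two magic bytes is an assumption made for the example rather than something rclone is stated to do.

```python
import gzip

# The first two bytes of any gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def prepare_upload(data: bytes) -> tuple[bytes, dict[str, str]]:
    """Hypothetical -Z upload step: always send gzip bytes tagged as gzip."""
    if data[:2] == GZIP_MAGIC:
        # Input already looks gzip encoded: pass it through untouched.
        body = data
    else:
        # Input is not gzip encoded: compress it before upload.
        body = gzip.compress(data)
    return body, {"Content-Encoding": "gzip"}


css = b"a { color: red; }\n" * 100
already_compressed = gzip.compress(css)

body_a, headers_a = prepare_upload(css)
body_b, headers_b = prepare_upload(already_compressed)

print(headers_a, len(body_a))
print(headers_b, len(body_b), body_b == already_compressed)
```

This only covers uploading to a bucket. It says nothing about what should land on local disk, which is the open question raised above.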
