Chunker on Wasabi: removing aborted upload, orphaned chunks

What is the problem you are having with rclone?

I'm using rclone with Wasabi for backups and storage. I have one bucket which I only use for rclone. Now I have the problem that the size of my bucket shown by Wasabi is way larger than the total size of my files as reported by rclone size wbac: or by aws s3 ls --summarize --human-readable --recursive s3://<bucket> --endpoint-url=https://s3.eu-central-1.wasabisys.com.

Here is how I think I got into this problem. Every time I start my PC I run a script which uses copy and sync to back up some of my folders. One of my files is 36 GB. Since my internet connection is too slow and I didn't leave my PC running long enough, this file could never be uploaded in one go. This resulted in a total of about 300 GB of encrypted chunk files, since rclone started the upload process from scratch every time. I did not realize this until a few days ago, when I was wondering why my bucket had become so large. I then used Wasabi Explorer to delete these files manually, and I think that is where the current problem comes from: if I use rclone or the aws cli to show the size of my bucket it is now 300 GB smaller, but the size shown by Wasabi is the same as before.

I wrote to Wasabi support and got the following answer:

This is caused by the way the backup strategy/process of the application you are using interacts with our system.
Whenever the same object body is uploaded using certain backup applications, links and composing objects are created in the database.
Complete info of how the composing objects work can be found here: Wasabi API Guide

Some backup applications like Veeam, Commvault, MSP360, Altaro, etc. use a different backup strategy, and they do not use the exact same object body which circumvents creation of composed objects on Wasabi.
We would recommend using those applications to reupload data from the bucket(s) you are experiencing this issue with to new buckets.
We could have your old bucket(s) waived for any deleted charges once you successfully re-upload data to new buckets, so you do not get billed for that going forward.

You may decide to continue using your current application for your backups, but please keep in mind that due to its backup strategy, composing objects will be part of your bucket, and hence they would not appear exactly in the utilization stats that you see in that or other s3 applications, those become internal DB links and function of that as mentioned in our API guide (above).
Let us know how you would like to proceed.

Do you have any idea how to fix this problem without following the solution proposed by support, i.e. without re-uploading everything to a new bucket? Is it possible to delete or manipulate the wrongful links in the database using the Wasabi API or something else?

I'm currently still in contact with support and will let you know if I find another solution.

As for the original problem of uploading my 36 GB file, I managed to do this by setting up the same crypt and chunker configuration locally and then using it to encrypt and chunk the file. Afterwards I manually uploaded the resulting 400 MB chunk files step by step to the appropriate folder.
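
A sketch of that staging setup, with placeholder names and paths (the local crypt has to use the same passwords and settings as wbaccrypt so the encrypted names match):

# assumed extra remotes in rclone.conf (names are examples only):
#   [localcrypt]    type = crypt    remote = /data/staging/crypt   (same passwords as wbaccrypt)
#   [localchunker]  type = chunker  remote = localcrypt:chunker    chunk_size = 400Mi

# encrypt and chunk the big file into 400 MiB pieces on local disk
rclone copy /path/to/bigfile localchunker:<folder>

# then push the resulting chunk objects into the bucket's crypt prefix;
# re-running this skips chunks that already arrived, so an interrupted
# upload only loses the chunk that was in flight
rclone copy /data/staging/crypt wasabi:<bucket>/crypt --progress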

Run the command 'rclone version' and share the full output of the command.

rclone v1.56.1

Which cloud storage system are you using? (eg Google Drive)

Wasabi

The rclone config contents with secrets removed.

[wasabi]
type = s3
provider = Wasabi
access_key_id = ******************
secret_access_key = ******************
region = eu-central-1
endpoint = s3.eu-central-1.wasabisys.com

[wbaccrypt]
type = crypt
remote = wasabi:<bucket>/crypt
password = ******************
password2 = ******************

[wbacchunker]
type = chunker
remote = wbaccrypt:chunker
chunk_size = 400Mi

[wbac]
type = alias
remote = wbacchunker:.
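
For reference, the four remotes above resolve into each other like this, so the same data can be inspected at different layers (bucket name elided as in the config):

# wbac:        -> alias   -> wbacchunker:.
# wbacchunker: -> chunker -> wbaccrypt:chunker
# wbaccrypt:   -> crypt   -> wasabi:<bucket>/crypt
# wasabi:      -> s3      -> the raw objects in the Wasabi bucket

rclone lsf -R wbac:                    # decrypted, re-assembled composite files
rclone lsf -R wasabi:<bucket>/crypt    # raw encrypted chunk objects as stored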

hello and welcome to the forum,

i suggest that you rename the title of this topic, as it has little/nothing to do with wasabi and the size problem.

the issue is about an aborted upload to a chunker remote and how to remove orphaned chunks.

Done, thank you for the suggestion.

that is not possible with rclone, as rclone uses the official aws s3 go library.

another example is that wasabi has an api to move a file, which would not trigger the wasabi retention period cost.
rclone does not implement that.

imho, i do not think this is a wasabi issue:
--- orphaned chunks can occur with any provider, including local.
--- rclone does not support the wasabi composing api.

as per rclone docs
"If rclone gets killed during a long operation on a big composite file, hidden temporary chunks may stay in the directory. They will not be shown by the list command but will eat up your account quota"

I think what Wasabi support is saying is that these files, even though you've deleted them, still count towards your quota because of the Wasabi 90-day deleted storage policy, so you'll see that extra space until those files become 90 days old. Waiting for this to expire will cost you approx 3 months × 0.3 TB × ~$6/TB/month ≈ $5 (if I did my maths right!).

I think you've done the deletion now, so you should ask to "have your old bucket(s) waived for any deleted charges" now.

as a side note, the default retention period at wasabi is 90 days.

if you store veeam backup files, wasabi, if requested, will reduce that retention period to 30 days,
for all files, not just veeam backup files.

Thank you for the answers and tips. Here is the solution which works for me and maybe helps others.

After your second answer @asdffdsa I read the chunker docs again carefully and found

If rclone gets killed during a long operation on a big composite file, hidden temporary chunks may stay in the directory. They will not be shown by the list command but will eat up your account quota. Please note that the deletefile command deletes only active chunks of a file. As a workaround, you can use remote of the wrapped file system to see them. An easy way to get rid of hidden garbage is to copy littered directory somewhere using the chunker remote and purge the original directory. The copy command will copy only active chunks while the purge will remove everything including garbage.
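
In other words, the wrapped crypt remote shows every chunk object, including hidden leftovers that the chunker itself hides (the folder name is a placeholder):

# what the chunker exposes (hidden temporary chunks are not counted here)
rclone size wbac:<folder>
# the same directory seen through the wrapped remote, leftovers included
rclone size wbaccrypt:chunker/<folder>
rclone lsl wbaccrypt:chunker/<folder>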

So I thought I could use the suggested workaround. I copied my folder to a new one and purged the old one:

rclone copy wbac:<folder> wbac:<new_folder>
rclone purge wbac:<folder>

Then I waited a day to let wasabi update the bucket size, but it was the same as before :frowning:. So the suggested solution did not work for me!

The next thing I did turned out to be the actual solution. I used the aws cli to copy all data from my <bucket> to a new <bucket-new>:

aws s3 cp s3://<bucket>/crypt s3://<bucket-new>/crypt --recursive --endpoint-url=https://s3.eu-central-1.wasabisys.com

I changed my rclone config so that everything points to the new bucket. Now <bucket-new> has the correct size. I then asked Wasabi support to have the deleted storage charges of the old bucket waived, and they kindly did that.
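
Thanks to the layering, only the crypt remote's target needs to change; a non-interactive way to do that (bucket name is a placeholder, as above) would be for example:

rclone config update wbaccrypt remote "wasabi:<bucket-new>/crypt"
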
Then I had one last problem. I don't know why, but for some files (less than 10%) the modification date had been changed, I suspect by the aws s3 cp command. Therefore I used rclone sync path/to/local/folder wbac:<folder> --refresh-times to restore the correct modification times. This stops rclone from re-uploading all files whose modification date was falsely changed. And now everything is as I want it to be :slight_smile:.
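
A cautious way to apply that fix is to preview it first (paths as in the command above):

# show which files would only get their remote modtime refreshed instead of re-uploaded
rclone sync path/to/local/folder wbac:<folder> --refresh-times --dry-run
# then apply it for real
rclone sync path/to/local/folder wbac:<folder> --refresh-times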

@ivandeex maybe we should make a backend command for chunker which fsck's the file system, deleting any orphaned chunks?

@ncw
I believe we don't need to add a special command. We can just improve the rclone cleanup implementation in the chunker backend. Or what do you mean?

Note. Chunks can be orphaned or belong to ongoing uploads. We will need to triage them based on the modtime, I guess.
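
For illustration, a rough manual version of that triage is already possible via the chunker's wrapped remote (remote names are from the config above; the age threshold is arbitrary):

# raw chunk objects older than a week, as the chunker's wrapped remote sees them
rclone lsl wbaccrypt:chunker --min-age 7d
# the composite files the chunker actually exposes
rclone lsl wbac:
# old chunk objects without a matching composite file are orphan candidates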

Duh yeah, rclone cleanup is the right command!

Triage based on mod time is what the s3 backend does, though it uses upload start time which is even better.

I don't know if it is any help, but I forgot to mention that I also tried rclone cleanup; it didn't solve the problem either and just deleted two other unrelated orphaned files.

it's not clever enough atm
in a future release it will be... when someone (me?) improves it
the best workaround for now is to server-side copy the directory containing garbage, so orphaned chunks are skipped, then purge the original.

which becomes another candidate for inclusion into metadata, breaking backwards compatibility
i expected that, and it's the key reason i have kept the backend marked as beta for so long

Yes that would be a good idea.

Falling back to modtime would be OK for backwards compatibility.

:+1:

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.