Chunking a tarball

Question about the rclone chunker—

Basically, I want to roll the entire contents of my gdrive into a tarball, then use the fact that gdrive allows an upload or download already in progress when the quota is reached to run to completion, bypassing the 750GB limit. Once it's uploaded to gdrive, if I use rclone mount, is the rclone chunker able to address, chunk, and operate on the tarball without downloading the entire thing?

This is what I want to do:

Upload an 8TB tarball.

(This will be allowed, despite the 750GB limit, because gdrive will not terminate an active upload even when the quota surpasses 750GB)

Mount my gdrive with rclone mount

Perform any future operations, such as extractions, additions, or deletions, on the tarball itself. This will let the rclone chunker handle upload and download operations so that I can stay within bandwidth limits with reasonable reliability. I can adjust the chunk size to make this more efficient, rather than use lots of service accounts.

Bonus: If I ever need to pull the tarball off of gdrive, I can do so without worrying about the 750GB cap due to that download being a single operation (just as the upload).

Hopefully that was clear… let me know if I can restate that in a better way.


Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)


rclone mount [foo]

The chunker remote doesn't do that; it's documented here:

Chunker (rclone.org)

It's for chunking the uploads which then can be downloaded.

There isn't a remote that does block level items that you are asking about to my knowledge so not sure any of that would work.

Is it possible to do this with just the vfs layer or is there an architectural limitation somewhere?

My understanding is that when skipping around in, say, video files, rclone is not required to pull down the whole file, but is able to jump to the proper block and stream the file starting at a given point, yes?
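To make it concrete, here's the kind of access pattern I mean, sketched in Python with a local temp file standing in for a file on the mount (the path and sizes are made up):

```python
import os
import tempfile

# A local file stands in for one on an rclone mount; over a mount, the
# seek+read below should become a ranged download of only those bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 1_000_000)   # 1 MB of filler we never want
    f.write(b"frame-at-offset")    # the part we actually care about

with open(path, "rb") as f:
    f.seek(1_000_000)              # jump straight to the interesting bytes
    chunk = f.read()

print(chunk)                       # b'frame-at-offset'
os.remove(path)
```

So the question is whether the tarball case can work the same way, with the VFS layer fetching only the byte ranges that get seeked to.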

Sorry as I wasn't specific enough.

To add/delete, you have to change things at the block level, which won't work, as no cloud backend that I'm aware of supports that. Replacing part of a file without rewriting the whole file is usually done at the block level.

I was perusing this article for some data:
Block-Level File Copying and the Cloud in 2022

If you have to seek around a file, yes, that's correct, as that's 1 out of the 3 scenarios you've described. I believe a tar file, though, is read from start to finish to find something.

The chunker remote isn't for this use, though, to be clear.

Best thing I did was moving away from Google. No upload or download limits.

Okay thanks for clarifying/docs. Chunker is definitely not the solution.

It looks like modifying the file is not possible. However, what about mounting as read-only and extracting individual files as necessary? This may require a technical understanding of tar that I don’t have, because (I think) if the entire file must be read to create an index of its contents, then that will require a full download?

The limitations of gdrive are cumbersome but the price per GB really can not be beat. Glacier isn’t really an option for me…

I wonder if a deduped backup (e.g. Borg) would make this possible layered on top of the rclone vfs layer?

I feel pretty good that tar just reads until it finds what you are asking for.

That’s easy to test with a big file, as most old backup/tape tools in the Unix world used tar back in the day.
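Actually, you can test it without a big file using Python's tarfile module. This sketch (member names are made up) wraps the underlying file to track how far into the archive tarfile reads while looking up the last member:

```python
import io
import tarfile

# Build a small tar in memory: a 4 KiB member followed by a tiny one.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, payload in [("first.bin", b"a" * 4096), ("last.bin", b"b" * 10)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

class TrackingFile(io.BytesIO):
    """Record the furthest offset tarfile touches while scanning."""
    def __init__(self, data):
        super().__init__(data)
        self.max_pos = 0
    def read(self, n=-1):
        chunk = super().read(n)
        self.max_pos = max(self.max_pos, self.tell())
        return chunk
    def seek(self, pos, whence=0):
        result = super().seek(pos, whence)
        self.max_pos = max(self.max_pos, self.tell())
        return result

tracked = TrackingFile(buf.getvalue())
with tarfile.open(fileobj=tracked, mode="r") as tf:
    member = tf.getmember("last.bin")  # no index: scans every header in order

# Finding last.bin forced tarfile past first.bin's entire 4 KiB of data.
print(tracked.max_pos > 4096)          # True
```

There's no central index, so getmember() walks the headers from the start, which on a mounted multi-TB tar means a lot of reading just to locate one member.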

Chunker can't do this, but if you are willing to upload in a zip file then the experimental (not merged yet) zip backend can:

You'd upload a zip of all your existing files, and rclone can access the files within the zip without downloading the whole thing. It's on the zip-backend branch; more info: Streaming archive/unarchive capabilities · Issue #2815 · rclone/rclone · GitHub

However, it can't write to the zip file once written (very few cloud storage providers allow that), so it probably isn't exactly what you want.

This is what I want to do:
Upload an 8TB tarball.
(This will be allowed, despite the 750GB limit, because gdrive will not terminate an active upload even when the quota surpasses 750GB)

Google Drive's limit for a single file is 5TB.

Not exactly what I want but close enough! rclone can seriously do everything. lol

Will this work on “split” zips? I ask since I am going to hit the 5TB limit that another user posted above.

e.g. zip -r -s 4T ~/ ~/foo/

Not seeing that in the docs, but could be missing it.

If you've got 16TB of free disk space it is easy....

If you don't then it is going to be quite hard

Yes, rclone can stream only the needed parts of a zip file, even if they are parts of multiple files. You can also use any other file archiver that creates files with indexes; this includes 7z, rar, and zip, but not tar, because tar does not have indexing, and for that reason the entire tar would need to be downloaded to build an index. You can even use a container (vhd, vhdx, iso, or any file), but once uploaded it cannot be changed, so you cannot easily delete files from the archive, for example.
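You can see the zip-vs-tar difference with Python's zipfile, which jumps straight to the central directory at the end of the archive instead of scanning from the start. A sketch (file names made up) that logs which byte ranges actually get read:

```python
import io
import zipfile

# Build a zip in memory: 100 KB of filler, then a small text member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("filler.bin", b"x" * 100_000)
    zf.writestr("readme.txt", b"hello from inside the zip")

class RangeLog(io.BytesIO):
    """Log the byte ranges zipfile actually reads."""
    def __init__(self, data):
        super().__init__(data)
        self.ranges = []
    def read(self, n=-1):
        start = self.tell()
        chunk = super().read(n)
        self.ranges.append((start, self.tell()))
        return chunk

logged = RangeLog(buf.getvalue())
with zipfile.ZipFile(logged) as zf:
    data = zf.read("readme.txt")   # central directory + this member only

print(data)                        # b'hello from inside the zip'
# No read ever touches the middle of filler.bin's 100 KB of data:
print(any(s < 50_000 and e > 1_000 for s, e in logged.ranges))  # False
```

Over a mount that means only the central directory and the requested member's bytes need to be fetched; the filler is skipped entirely.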

Will rclone handle split zips?

E.g. the output of:

zip -r -s 4T ~/stuff

Once you create the split zip files and move them to a remote with move/copy/sync, you can mount the remote, point an archiver program at the files, and extract the files you want without needing to download the whole zip(s). There is still the big limitation of not being able to modify the zip files without a full reupload.

You can experiment yourself if you want

So:
Can you extract files from zip archives without a full download? Yes.
Can you remove files from a zip archive without a full upload? No.
Can you mount a remote and then create the zip on it? No (a zip file is not written sequentially, and remotes need to be written sequentially).

Ahh, I should’ve been more clear; I was wondering if the zip backend would concatenate a split archive.

This still seems like the answer; I’m just trying to learn what to expect, since the upload is substantial and I want to be pretty sure what I’m getting into here… haha. Thanks!

Not in the current version, no.
