Native multipart upload for Google Cloud Storage

First of all, thanks for a great tool. I'm especially impressed by the VFS caching features :smile:

When parallel multipart upload was implemented, the focus seemed to be on S3 and related services. Native support for Google Cloud Storage was discussed, but due to the extra complications of using the XML API it was not prioritized.
It was suggested that the Google Cloud Storage backend could perhaps use the S3 compatibility layer internally, or that the user could switch to that compatibility layer themselves.

Is this still the status, or are there any plans to implement a native GCS solution for multipart uploads? To me it sounds easier than trying to make the whole GCS implementation do things via the S3 layer, especially when it comes to handling authorization (I am aware of the efforts to provide a GCS provider for the S3 backend, but that still relies on generating HMAC keys). I assume it would require a few manual POST/GET/PUT HTTP requests dealing with XML payloads, rather than explicit calls to the GCS Go API? I can make a more formal feature request, but I just wanted to check here first.

best regards
-Morten

You are correct that GCS doesn't support chunked writing.

Here are the docs for multipart upload

There isn't too much to the API, so I'd probably make API structs for the XML gubbins using a tool like https://xml-to-go.github.io/ on the examples, and then use lib/rest and the CallXML method to submit them.
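As a rough illustration only (not tested, with field names taken from the documented XML examples and to be double-checked against the GCS docs), the generated structs might come out something like this:

```go
// Sketch only: shapes inferred from the documented XML examples, to be
// regenerated with xml-to-go and double-checked against the GCS docs.
package googlecloudstorage

import "encoding/xml"

// InitiateMultipartUploadResult is the response to POST /<object>?uploads
type InitiateMultipartUploadResult struct {
	XMLName  xml.Name `xml:"InitiateMultipartUploadResult"`
	Bucket   string   `xml:"Bucket"`
	Key      string   `xml:"Key"`
	UploadID string   `xml:"UploadId"`
}

// Part is one uploaded chunk, identified by its number and the ETag
// returned by the part upload.
type Part struct {
	PartNumber int    `xml:"PartNumber"`
	ETag       string `xml:"ETag"`
}

// CompleteMultipartUpload is the request body for
// POST /<object>?uploadId=<id> which stitches the parts together.
type CompleteMultipartUpload struct {
	XMLName xml.Name `xml:"CompleteMultipartUpload"`
	Parts   []Part   `xml:"Part"`
}
```

CallXML should then be able to marshal the request struct and unmarshal the response for you without any hand-written XML.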

So it sounds useful and not too difficult. Is that something you'd like to help with @morten? Or maybe your company would like to sponsor it?

Thanks, at least I was pretty much on top of the status. I have been reading up on the multipart XML API since yesterday, but I am more fluent in C++ than Go. For my own education I'm putting together a C++ program that will cut a file into chunks and upload them in parallel. When I'm happy with that, I will go and look at Go/rclone. I haven't even looked at building it, so it will take a round or two before I can promise to be up to speed on anything. Anyway, from what I can see the encoding/xml library should be able to create those XML messages from structs with xml.Marshal(), so I think that should be pretty straightforward.
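For example, here is a minimal standalone sketch (standard library only; element names are copied from the documented completion request, so still to be verified) showing xml.Marshal producing the completion body:

```go
// Minimal, self-contained check that encoding/xml can build the
// multipart-completion body from plain structs. Element names are copied
// from the documented XML API examples and should be double-checked.
package main

import (
	"encoding/xml"
	"fmt"
)

type Part struct {
	PartNumber int    `xml:"PartNumber"`
	ETag       string `xml:"ETag"`
}

type CompleteMultipartUpload struct {
	XMLName xml.Name `xml:"CompleteMultipartUpload"`
	Parts   []Part   `xml:"Part"`
}

func main() {
	body := CompleteMultipartUpload{
		Parts: []Part{
			{PartNumber: 1, ETag: `"etag-of-part-1"`},
			{PartNumber: 2, ETag: `"etag-of-part-2"`},
		},
	}
	out, err := xml.MarshalIndent(body, "", "  ")
	if err != nil {
		panic(err)
	}
	// Prints <CompleteMultipartUpload> with one <Part> element per chunk.
	fmt.Println(string(out))
}
```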

Rclone already has machinery to do the XML encoding in lib/rest. You'll find it a lot easier to do in Go than in C++!

Go is like C without the hard parts; as a C++ dev you won't find it hard.

If you take a look at the WebDAV backend you'll see how to do it.

I got confused by the mention of xml-to-go... but yes, I can see it can be used to work out how the structures need to be laid out... at first I thought it was yet another XML library :smiley: Thanks for pointing me to the rest library. I think I will still need to figure out how to propagate the authorization "key" into the Authorization header of the new XML requests...

I'm assuming that, in theory, whatever code changes are needed (the implementation of OpenChunkWriter and associated functions) will just go into backend/googlecloudstorage/?

I will still finish my C++ code first, so I can get an idea of what speed I can possibly achieve.

Yes I think that may be the hard part!

You'll need to find the auth key somehow and then set it in lib/rest.

Yes, that is correct. We might choose to put them in their own file if they are quite bulky, but most backends don't.
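On the auth point, one possibility (a sketch with illustrative names, not the backend's actual code) would be to wrap the oauth2-authenticated *http.Client that the GCS backend already builds in a lib/rest client, so every XML API request picks up the bearer token automatically:

```go
// Sketch: reuse the backend's existing oauth2-authenticated HTTP client so
// the Authorization header is added to the XML API calls for free.
// newXMLAPIClient is an illustrative helper name, not existing code.
package googlecloudstorage

import (
	"net/http"

	"github.com/rclone/rclone/lib/rest"
)

// newXMLAPIClient wraps an already-authenticated client and points it at
// the XML API endpoint.
func newXMLAPIClient(oAuthClient *http.Client) *rest.Client {
	return rest.NewClient(oAuthClient).SetRoot("https://storage.googleapis.com")
}
```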

Just a couple of observations:
The Google Cloud Python library has a transfer manager that can be called directly, without going through the REST API yourself: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.transfer_manager#google_cloud_storage_transfer_manager_upload_chunks_concurrently This uses the XML API under the hood. Because threading in Python is a bit problematic, it actually runs parallel processes, but that is a technicality.

It also looks like some work has started on a similar transfer manager for the google/storage Go package, but currently only downloads are supported (using the normal API for reading objects). The release notes for the initial commit mention that it is in preview, but the intention is for the transfer manager to handle both downloads and uploads: https://github.com/googleapis/google-cloud-go/blob/main/storage/CHANGES.md#1420-2024-06-10

Rclone does parallel chunked downloads already. Implementing the OpenChunkWriter method is all that's needed for uploads, and each of its methods will be a thin wrapper around one API call, so I don't think there is any point in using a library for this.
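To make the "thin wrapper per API call" shape concrete, here is a rough sketch (the ChunkWriter method set is paraphrased from the fs package and should be checked against the current interface definition; the initiate call, error handling, retries and the OpenChunkWriter wiring are omitted; it reuses the client and XML structs sketched earlier):

```go
// Sketch of how the remaining XML API calls could map onto the
// ChunkWriter methods. Not working code, just the intended shape.
package googlecloudstorage

import (
	"context"
	"fmt"
	"io"
	"net/url"
	"sort"
	"sync"

	"github.com/rclone/rclone/lib/rest"
)

// gcsChunkWriter holds the state of one in-progress multipart upload.
type gcsChunkWriter struct {
	srv      *rest.Client // e.g. the client from newXMLAPIClient above
	bucket   string
	object   string
	uploadID string // from InitiateMultipartUploadResult

	mu    sync.Mutex
	parts []Part // PartNumber/ETag pairs collected as chunks finish
}

// WriteChunk uploads one part: PUT /<bucket>/<object>?partNumber=N&uploadId=...
// and records the ETag returned in the response headers.
func (w *gcsChunkWriter) WriteChunk(ctx context.Context, chunkNumber int, reader io.ReadSeeker) (int64, error) {
	partNumber := chunkNumber + 1 // part numbers start at 1 in the XML API
	size, err := reader.Seek(0, io.SeekEnd)
	if err != nil {
		return 0, err
	}
	if _, err := reader.Seek(0, io.SeekStart); err != nil {
		return 0, err
	}
	opts := rest.Opts{
		Method: "PUT",
		Path:   "/" + w.bucket + "/" + w.object,
		Body:   reader,
		Parameters: url.Values{
			"partNumber": {fmt.Sprint(partNumber)},
			"uploadId":   {w.uploadID},
		},
	}
	resp, err := w.srv.Call(ctx, &opts)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	w.mu.Lock()
	w.parts = append(w.parts, Part{PartNumber: partNumber, ETag: resp.Header.Get("Etag")})
	w.mu.Unlock()
	return size, nil
}

// Close finishes the upload by POSTing the CompleteMultipartUpload body.
func (w *gcsChunkWriter) Close(ctx context.Context) error {
	// Parts must be listed in ascending order, and chunks finish out of order.
	sort.Slice(w.parts, func(i, j int) bool { return w.parts[i].PartNumber < w.parts[j].PartNumber })
	opts := rest.Opts{
		Method:     "POST",
		Path:       "/" + w.bucket + "/" + w.object,
		Parameters: url.Values{"uploadId": {w.uploadID}},
		NoResponse: true,
	}
	_, err := w.srv.CallXML(ctx, &opts, &CompleteMultipartUpload{Parts: w.parts}, nil)
	return err
}

// Abort cancels the upload so any parts already stored are discarded.
func (w *gcsChunkWriter) Abort(ctx context.Context) error {
	opts := rest.Opts{
		Method:     "DELETE",
		Path:       "/" + w.bucket + "/" + w.object,
		Parameters: url.Values{"uploadId": {w.uploadID}},
		NoResponse: true,
	}
	_, err := w.srv.Call(ctx, &opts)
	return err
}
```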
