Which remotes support the rcat function?

What is the problem you are having with rclone?

I tried to use rclone rcat to a OneDrive remote on a cloud instance. I was running out of space on the VM's storage, so I figured I could tar some files and upload them to cloud storage all in one go. I used a command like this:

tar -czf - /path/to/lots/of/files | rclone rcat my_onedrive:backups/from_vm.tar.gz

This failed - I don't have the exact error message in front of me, but it was clear that the VM didn't have enough space to hold both the data and a temporary file. It looks like rclone tried to create a temporary file in /tmp, which filled up the root filesystem completely, and rclone exited with an error without having uploaded any data. So it seems that with OneDrive, the rcat command doesn't actually stream and rclone falls back to creating a local file before uploading. Unfortunately this meant I had to purchase a temporary additional block storage volume to hold the file.

I don't remember the specific sizes, but for the sake of example, assume the VM has 100GB of storage with only 10GB free. There are 30GB worth of files that are safe to delete from the VM but need to be archived first, so rcat seemed to be the natural choice. The folder in question contained tens of thousands of small, highly compressible files, so simply copying/syncing the files as-is with rclone would have been inefficient: it would have required thousands of upload requests, likely hitting OneDrive API request limits, and the files would have been stored uncompressed on the remote, taking up extra space. Thus, the goal was to tar, gzip and stream the files to a single file on OneDrive in one pass.

My actual question is: which remotes actually support the rcat command and can stream the data up to the cloud provider without any intermediate temporary files?

The documentation for rcat simply states that the behavior will depend on the remote, and the OneDrive remote's page doesn't contain any specifics about rcat or any mention of it not being supported.

Run the command 'rclone version' and share the full output of the command.

rclone v1.58.1-DEV

  • os/version: alpine 3.16.2 (64 bit)
  • os/kernel: 5.15.69-0-lts (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.18.6
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

In this specific case, OneDrive. But I'm looking for more generic advice as to how to figure out which remotes support which features, in particular rcat.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

tar -czf - /path/to/lots/of/files | rclone rcat my_onedrive:backups/from_vm.tar.gz

The rclone config contents with secrets removed.

[my_onedrive]
type = onedrive
token = redacted
drive_id = redacted
drive_type = personal

A log from the command with the -vv flag

I would have to recreate the specific situation, i.e. trying to upload data with not enough free space on /tmp. If the exact error message is REALLY required, let me know and I'll start up another VM instance and generate some test data. But as I stated, I dealt with the issue by paying for additional storage on the VM, manually tar'ing the data, then uploading with rclone in the traditional fashion (rclone copy).

Hi, welcome to the forum!

The quick answer to this is that you can look at the StreamUpload column in the following table: Overview of cloud storage systems.
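
If you'd rather check from the command line, I believe you can also ask a configured remote which optional features it implements (a sketch from memory, using the remote name from your config; look for StreamUpload in the output):

# prints the remote's optional features; StreamUpload false means rcat will spool to a temporary file
rclone backend features my_onedrive: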

Regarding your use case in general, I'll leave further advice to some of our more experienced users.

Hi fmillion,

I can understand your surprise and recreated it on a small scale:

$ echo "Hello World" | ./rclone rcat OneDrive:testfolder/testfile --streaming-upload-cutoff 1b -vv
2022/10/26 09:22:34 DEBUG : rclone: Version "v1.56.1" starting with parameters ["./rclone" "rcat" "OneDrive:testfolder/testfile" "--streaming-upload-cutoff" "1b" "-vv"]
2022/10/26 09:22:34 DEBUG : Creating backend with remote "OneDrive:testfolder/"
2022/10/26 09:22:34 DEBUG : Using config file from "/home/..../rclone/rclone.conf"
2022/10/26 09:22:34 DEBUG : fs cache: renaming cache item "OneDrive:testfolder/" to be canonical "OneDrive:testfolder"
2022/10/26 09:22:34 DEBUG : One drive root 'testfolder': Target remote doesn't support streaming uploads, creating temporary local FS to spool file
2022/10/26 09:22:34 DEBUG : Creating backend with remote "/tmp/rclone-spool578502663"
2022/10/26 09:22:34 DEBUG : testfile: Size and modification time the same (differ by 0s, within tolerance 1s)
2022/10/26 09:22:34 DEBUG : testfile: Starting multipart upload
2022/10/26 09:22:35 DEBUG : testfile: Uploading segment 0/12 size 12
2022/10/26 09:22:35 DEBUG : testfile: sha1 = 648a6a6ffffdaa0badb23b8baf90b6168dd16b3a OK
2022/10/26 09:22:35 INFO  : testfile: Copied (new)
2022/10/26 09:22:35 DEBUG : 8 go routines active

Note the DEBUG message: "Target remote doesn't support streaming uploads, creating temporary local FS to spool file".

You would probably have liked it to be a NOTICE to make it more visible, but there may be good reasons not to do that in piped situations. I don't know.

Now that you know what it does, you can perhaps break your tar down into smaller pieces, so that each piece fits in the temporary storage you have available.
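
One way to do that, purely as a sketch (assuming GNU split and a chunk size small enough to fit in your free /tmp space; chunk size and names are illustrative):

# each chunk is piped to its own rcat, so rclone should only have to spool one 4GB chunk in /tmp at a time;
# $FILE (part_aa, part_ab, ...) is set by GNU split's --filter
tar -czf - /path/to/lots/of/files | split -b 4G --filter='rclone rcat my_onedrive:backups/from_vm.tar.gz.$FILE' - part_

To restore, concatenate the parts in order: cat from_vm.tar.gz.part_* | tar -xzf -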

That is a very useful table, thank you!

In the future it looks like I could maybe use Azure blob storage as a temporary drop point. Blob storage billing (IIRC) is very granular, like gigabyte-second billing, so as long as the blob doesn't stay in storage for very long (it wouldn't need to), it won't cost much - for that 30GB file, and assuming it has to stay in blob storage for say 15 minutes for the transfer to OneDrive, it'd probably be less than a cent. Bandwidth on my cloud VM isn't a big deal so the extra round trip wouldn't be a concern.

What I'll need to do some further testing on is whether rclone can move between Azure blobs and OneDrive without downloading and storing the file locally first, because that would end up defeating the whole purpose of this exercise.

I noticed that apparently you can provide the file size to rcat (if you know what it will be), and then rcat works more like a regular upload (I think?) and should work for any remote. Obviously I can't know the total size of the file after compression without first compressing it, so that wouldn't work for the initial upload. But since the file size will already be known once it has been pushed to Azure blobs, we should be able to move it over to OneDrive without caching the entire file locally?

OneDrive storage is definitely cheaper by the GB for long-term archival (this isn't a business use case, just some personal hosting and such), so just moving to blobs overall might not be the most economical option long-term, but hey.

Thanks!

Yeah, thanks for pointing out the debug message. I'll definitely remember to do a -vv next time I encounter weird behavior before running off to find some other solution :slight_smile:

You don't need local storage for

rclone copy Azure: OneDrive:

but it will download and upload.
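
For a single file, something like this should do it (the container and path names are placeholders):

# streams the object through the VM (download + upload) but shouldn't need to write it to local disk
rclone copyto Azure:somecontainer/from_vm.tar.gz OneDrive:backups/from_vm.tar.gz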

Now, 30GB isn't much, so perhaps you have another machine that you can set up as a (streaming) SFTP server, something like this:

rclone serve sftp yourSFTPserver:

and then do

tar -czf - /path/to/lots/of/files | rclone rcat yourSFTPserver:from_vm.tar.gz
rclone copy yourSFTPserver: OneDrive:
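
To flesh that out a little (my reading: the serve command runs on the spare machine; the path, port and credentials below are placeholders):

# on the spare machine with ~30GB free: expose a local spool directory over SFTP
rclone serve sftp /spool --addr :2022 --user spooluser --pass spoolpass

yourSFTPserver: would then be an sftp remote on the VM (created with rclone config) pointing at that machine's address and port with the same credentials. The sftp backend supports streaming uploads, so the rcat above shouldn't need any temporary file on the VM.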

If you don't mind doing the tar gz twice (and you are sure the data you are tarring won't change), then you could pipe the tar into wc -c first to figure out the size, then again into rcat with the --size flag. This will work without a temporary file on all backends.
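
A minimal sketch of that, reusing the paths and remote name from the original command:

# pass 1: measure the exact size of the compressed stream (nothing is uploaded here)
SIZE=$(tar -czf - /path/to/lots/of/files | wc -c)
# pass 2: repeat the tar, telling rcat the size so it can upload directly instead of spooling to /tmp
tar -czf - /path/to/lots/of/files | rclone rcat --size "$SIZE" my_onedrive:backups/from_vm.tar.gz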

Not a bad idea. That assumes there is absolutely no way the data will be different between the two tar operations - it shouldn't be if nothing is touching the data, but background indexer processes and the like could touch files and change modification dates, which could affect the actual size of the final tarfile. Making the tarfile uncompressed mitigates that even further. I'll do some experimenting with that idea though - the key point I'm taking from this is that if you specify the size to rclone, it can stream to any remote, which is still useful!

rclone mount is an interesting idea too. But that does explain why rclone recommends vfs-cache-mode writes for remotes like OneDrive.

The upload/download bandwidth isn't an issue on my cloud servers, but one thing I can't seem to get a clear answer on (I guess I'm not a high-enough-ranked business customer for Azure to actually answer my billing questions) is how Azure blob storage is billed. The pages all talk about GB per month, but what if you're storing a large amount of data but only for a very short time? Basically I'm trying to figure out the granularity of the billing system - does it look at gigabyte-seconds, gigabyte-minutes, etc? If it's gigabyte-months with rounding up, then uploading 50GB, deleting it immediately, and doing that 100 times would get you charged for 5TB of blob storage (!). Or perhaps it's "the maximum amount you had stored at any one time during the month" which would still mean 50GB for one month even if it only actually existed on their storage for a very small percentage of that time. I can't get a straight answer from anyone, and it's also something where I can't just "try it myself" because it has potential to be costly, and it would take a minimum of a month to even know the results. A little frustrating :slight_smile:

Agree, but you may have missed the first part of my point:

I proposed to set up an SFTP server on another machine with 30GB of free disk space and then stream your tar directly to the local disk of that machine. After that you can do a normal rclone copy to OneDrive. So no use of mount or vfs-cache, but you are right that you could also make the SFTP server send the data on to OneDrive using --vfs-cache-mode writes.

I think this quote answers your question very precisely:

Data storage and metadata are billed per GB on a monthly basis. For data and metadata stored for less than a month, you can estimate the impact on your monthly bill by calculating the cost of each GB per day. ... The number of days in any given month varies. Therefore, to obtain the best approximation of your costs in a given month, make sure to divide the monthly cost by the number of days that occur in that month.

Source: https://learn.microsoft.com/en-us/azure/storage/common/storage-plan-manage-costs

I read "GB per day" as "Max GB each day" in this context. Remember to add operations (API transactions), Data Transfer etc on top of the pure Storage Price.
