When is the temporary directory used exactly?

What is the problem you are having with rclone?

I am trying to understand when the temporary directory is used in rclone. More specifically, I am trying to figure out whether rclone uses streams exclusively when I am using the rc sync/copy API.

In addition, in my case, I am also using the crypt adapter with multiple combinations (e.g. attached to source only, attached to target only, attached to both). What happens then?

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.2

  • os/version: darwin 12.6 (64 bit)
  • os/kernel: 21.6.0 (arm64)
  • os/type: darwin
  • os/arch: arm64
  • go/version: go1.19.1
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

OneDrive, Dropbox, GoogleDrive, S3, HTTP and the Crypt backend.

Thank you for your time.

Not sure as that's a pretty generic question.

If you copy and sync, generally nothing is stored locally on a file system.

Are you doing something from remote to remote? local to remote?

For starters, thanks for the reply.

I understand that it's a bit generic, but it's crucial for me to know whether the temporary directory is used during copies, because there will not be a lot of local disk space available and I want to avoid using it.

Are you doing something from remote to remote? local to remote?

I am only doing copies and moves from remote to remote (e.g. OneDrive to Google Drive, HTTP to Google Drive, Google Drive to Dropbox, etc.), and usually a Crypt layer is attached on either side or both sides. I do not use the local filesystem at all.

If you copy and sync, generally nothing is stored locally on a file system.

Any case that comes into mind that might be an exception to this?

For example, from what I know, OneDrive needs to know the size you are uploading beforehand. If you have Google Drive as the source and OneDrive with a Crypt layer as the target, how does rclone solve the fact that the size is unknown, since the original data (from Google Drive) will be encrypted first (which means a new size) before ending up in OneDrive? Does the data get encrypted and cached somewhere first and then uploaded to OneDrive, or is it streamed from Google Drive to OneDrive with encryption and content-length detection happening somehow on the fly?

Feel free to go technical with me and point me to the source code if you have to!

Rclone supports way too many remotes for me to be 100% sure. I'm not aware of anything stored locally for the remotes you've mentioned.

I'm more of a visual/do-it type of person, so why not just run a copy from OneDrive to Google Drive and see?

File size is metadata stored on the remote so it knows the size of the file.

felix@gemini:~$ rclone lsl GD:hosts
      510 2022-10-03 11:06:48.412000000 hosts
felix@gemini:~$ rclone copy /etc/hosts gcrypt:
felix@gemini:~$ rclone lsl gcrypt:hosts
      510 2022-10-03 11:06:48.412000000 hosts

File size will generally look the same, plus some extra bytes for encryption.

I already did that before posting, and indeed nothing was being written to the temporary directory, but I wanted a more experienced opinion!

Indeed. But do you happen to know where in the source (for OneDrive specifically) this calculation happens? I am very curious to see how the ContentLength of the upload is calculated when there is also encryption in the middle. I have been looking around but I wasn't able to find it.

That's beyond me as I generally don't look at code.

Thank you for all your answers.

Hi arekanderu,

In these cases everything is handled in RAM; there are no temporary files, and RAM is primarily used for transfer buffers and directory listings.

This applies to all remote storage services.

There are a few exceptions where rclone uses local disk, and they are described in the docs. Examples are:
https://rclone.org/commands/rclone_mount/#vfs-file-caching
https://rclone.org/hasher/#cache-storage

rclone can calculate the sizes beforehand when needed in these situations - data transfer is streamed.

There are however situations where the downloaded size differs from the reported size, and that can probably be an issue if copying to a remote that needs the correct size upfront. Example: Copying iPhone Live Photos (.heic) from one OneDrive to another OneDrive.

Thank you Ole for getting back to me.

I am very curious to see how exactly the calculation is implemented when the size is unknown beforehand. Do you happen to know where in the code I can have a look?

How is the size unknown if it exists on the remote already?

The source remote will have the unencrypted size. After the file goes through encryption it will have a new size. As you have already mentioned you know the encryption algorithm used and therefore you might be able to calculate the new size on the fly since you know the padding bytes and everything. I am not sure how the whole thing happens if it's being streamed directly from source to target and wanted to check the code out of curiosity since I am missing something here.

Yes a bit larger, but predictable.

Right, but the source remote is 'known' and not 'unknown'. The crypt process happens on the fly and the extra data is what I shared in the link above.

The code is at rclone/crypt.go at master · rclone/rclone · GitHub.

Then I suggest you follow the contributing guide to build a local debuggable copy and then place a few good breakpoints to see what happens when you copy a single file into a crypt.

The crypt file format is described in the docs.

The calculation in the code is here

The crypt encoding was designed so that if you know the size of the input you know the size of the output and vice versa. This is very important for maintaining streaming without using local disk for all backends.
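To make the calculation concrete, here is a minimal sketch of that size mapping in Go, assuming the file format described in the crypt docs (an 8-byte magic string plus a 24-byte nonce as header, then the data in 64 KiB blocks, each carrying a 16-byte Poly1305 authenticator). The names are illustrative, not rclone's actual identifiers:

package main

import "fmt"

// Layout from the crypt file format docs:
// 8-byte magic + 24-byte nonce header, then 64 KiB data blocks,
// each with a 16-byte Poly1305 authenticator.
const (
	fileHeaderSize = 8 + 24
	blockDataSize  = 64 * 1024
	blockOverhead  = 16
)

// encryptedSize maps a plaintext size to the encrypted size.
// Every block except possibly the last is full, so the mapping is
// exact and invertible - which is what lets rclone report the upload
// size before the stream starts.
func encryptedSize(size int64) int64 {
	blocks, residue := size/blockDataSize, size%blockDataSize
	encrypted := int64(fileHeaderSize) + blocks*(blockDataSize+blockOverhead)
	if residue != 0 {
		encrypted += blockOverhead + residue
	}
	return encrypted
}

func main() {
	fmt.Println(encryptedSize(1))   // 49: 32-byte header + 17-byte block
	fmt.Println(encryptedSize(510)) // 558: the hosts file above, encrypted
}

Running the calculation in reverse (subtract the header, then the per-block overhead) is how a crypt remote can list the original 510 bytes even though the stored object is larger.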

The only time rclone copy will ever need to spool data to disk is

  • if the source size isn't known (e.g. from Google Photos, or a Google Doc) and the destination backend doesn't accept streaming uploads
  • the compress backend will use local disk if the backend it is wrapping can't stream uploads
  • the jottacloud backend will use local disk if the source can't provide MD5 hashes

To see whether a backend can stream uploads check out the optional features table and look for StreamUpload. In rclone speak StreamUpload means that the backend needs to know the size of the upload before the upload starts.

Of OneDrive, Dropbox, Google Drive, S3 and HTTP, I think only OneDrive can't do StreamUpload.

So for your use, if you copy Google Docs you'll need local disk, otherwise not.

Thank you all for your replies. Things are much clearer now.

Also, based on all the information above, the following two lines will be helpful to anyone looking for this:

Cheers!

That sentence confused me a bit, so to clarify (I hope) if it confuses others as well:

  • StreamUpload=No means that the backend needs to know the size of the upload before the upload starts

  • StreamUpload=Yes means the backend allows files to be uploaded without knowing the file size in advance, i.e. the backend supports what we call streaming uploads.
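For anyone who prefers to check this programmatically rather than in the features table, here is a rough sketch using rclone as a Go library; the StreamUpload column corresponds to the optional PutStream feature, and "onedrive:" below is just a placeholder for any remote in your config:

package main

import (
	"context"
	"fmt"
	"log"

	_ "github.com/rclone/rclone/backend/all" // register all backends
	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/config/configfile"
)

func main() {
	configfile.Install() // load the default rclone config file

	// "onedrive:" is a placeholder; use any remote from your config.
	f, err := fs.NewFs(context.Background(), "onedrive:")
	if err != nil {
		log.Fatal(err)
	}

	// PutStream is what the features table shows as StreamUpload:
	// non-nil means the backend accepts uploads of unknown size.
	fmt.Println("StreamUpload:", f.Features().PutStream != nil)
}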

