I am trying to understand when the temporary directory is used in rclone. More specifically I am trying to figure out if rclone uses streams exclusively when I am using the rc sync/copy API or not.
In addition, in my case, I am also using the crypt adapter with multiple combinations (e.g. attached to source only, attached to target only, attached to both). What happens then?
Run the command 'rclone version' and share the full output of the command.
rclone v1.59.2
os/version: darwin 12.6 (64 bit)
os/kernel: 21.6.0 (arm64)
os/type: darwin
os/arch: arm64
go/version: go1.19.1
go/linking: dynamic
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
OneDrive, Dropbox, GoogleDrive, S3, HTTP and the Crypt backend.
I understand that it's a bit generic, but it's crucial for me to understand whether the temporary directory is used during copies, because there will not be much local disk space available and I want to avoid using it.
Are you doing something from remote to remote? local to remote?
I am only doing copies and moves from remote to remote (e.g. OneDrive to Google Drive, HTTP to Google Drive, Google Drive to Dropbox, etc.), and usually a Crypt layer is attached on either side or both sides. I do not use the local filesystem at all.
If you copy and sync, generally nothing is stored locally on a file system.
Any case that comes into mind that might be an exception to this?
For example, from what I know, OneDrive needs to know the size you are uploading beforehand. If you have Google Drive as the source and OneDrive with a Crypt layer as the target, how does rclone solve the fact that the size is unknown, since the original data (from Google Drive) will be encrypted first (which means a new size) before ending up in OneDrive? Is the data encrypted and cached somewhere first and then uploaded to OneDrive, or is it streamed from Google Drive to OneDrive with encryption and content-length detection happening somehow on the fly?
Feel free to go technical with me and point me to the source code if you have to!
Already did before posting and indeed nothing was getting written in the temporary directory but I wanted a more experienced opinion!
Indeed. But do you happen to know where in the source (for OneDrive specifically) this calculation happens? I am very curious to see how the ContentLength of the upload is calculated when there is also encryption in the middle. I have been looking around but I wasn't able to find it.
In these cases everything is handled in RAM; there are no temporary files. RAM is used primarily for transfer buffers and directory listings.
rclone can calculate the sizes beforehand when needed in these situations - data transfer is streamed.
There are however situations where the downloaded size differs from the reported size, and that can probably be an issue if copying to a remote that needs the correct size upfront. Example: Copying iPhone Live Photos (.heic) from one OneDrive to another OneDrive.
I am very curious to see how exactly the calculation is implemented when the size is unknown beforehand. Do you happen to know where in the code I can have a look?
The source remote will have the unencrypted size. After the file goes through encryption it will have a new size. As you have already mentioned you know the encryption algorithm used and therefore you might be able to calculate the new size on the fly since you know the padding bytes and everything. I am not sure how the whole thing happens if it's being streamed directly from source to target and wanted to check the code out of curiosity since I am missing something here.
Then I suggest you follow the contributing guide to build a local debuggable copy and then place a few good breakpoints to see what happens when you copy a single file into a crypt.
The crypt encoding was designed so that if you know the size of the input you know the size of the output and vice versa. This is very important for maintaining streaming without using local disk for all backends.
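To make that concrete: based on the crypt file format documented for rclone (a 32-byte header consisting of an 8-byte magic string plus a 24-byte nonce, followed by 64 KiB plaintext blocks that each carry a 16-byte Poly1305 authenticator), the plaintext-to-ciphertext size mapping can be sketched as below. This is an illustrative reimplementation, not rclone's actual source:

```go
package main

import "fmt"

const (
	headerSize    = 32    // 8-byte magic "RCLONE\x00\x00" + 24-byte nonce
	blockData     = 65536 // 64 KiB of plaintext per encrypted block
	blockOverhead = 16    // Poly1305 authenticator added to each block
)

// encryptedSize returns the size a crypt file will have for a given
// plaintext size, so the length is known before any byte is uploaded.
func encryptedSize(n int64) int64 {
	fullBlocks := n / blockData
	remainder := n % blockData
	size := int64(headerSize) + fullBlocks*(blockData+blockOverhead)
	if remainder > 0 {
		size += remainder + blockOverhead
	}
	return size
}

func main() {
	for _, n := range []int64{0, 1, 65536, 65537} {
		fmt.Printf("plaintext %d bytes -> encrypted %d bytes\n", n, encryptedSize(n))
	}
}
```

Because the formula is a pure function of the input size, a crypt remote wrapping OneDrive can report the exact encrypted ContentLength up front and stream the data through, with no spooling.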
The only time rclone copy will ever need to spool data to disk is:
- if the source size isn't known (eg from Google Photos, or a Google Doc) and the destination backend doesn't accept streaming uploads
- the compress backend will use local disk if the backend it is wrapping can't stream uploads
- the jottacloud backend will use local disk if the source can't provide MD5 hashes
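The first rule reduces to a simple predicate. A hypothetical helper (the names are mine, not rclone's) that mirrors it:

```go
package main

import "fmt"

// needsSpool reports whether a remote-to-remote copy must buffer data on
// local disk: only when the source size is unknown AND the destination
// backend can't accept streaming uploads.
func needsSpool(srcSize int64, dstCanStreamUpload bool) bool {
	sizeUnknown := srcSize < 0 // rclone convention: unknown sizes are reported as -1
	return sizeUnknown && !dstCanStreamUpload
}

func main() {
	fmt.Println(needsSpool(-1, false))   // unknown size, no streaming support
	fmt.Println(needsSpool(-1, true))    // unknown size, but streaming works
	fmt.Println(needsSpool(4096, false)) // size known, so no spooling needed
}
```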
To see whether a backend can stream uploads check out the optional features table and look for StreamUpload. In rclone speak StreamUpload means that the backend needs to know the size of the upload before the upload starts.
Of OneDrive, Dropbox, GoogleDrive, S3 and HTTP, I think only OneDrive can't do StreamUpload.
So for your use, if you copy Google Docs you'll need local disk, otherwise not.
That sentence confused me a bit, so to clarify (I hope) if it confuses others as well:
StreamUpload=No means the backend needs to know the size of the upload before the upload starts
StreamUpload=Yes means the backend allows files to be uploaded without knowing the file size in advance, i.e. the backend supports what we call streaming uploads.
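Besides the features table in the docs, you can also ask rclone itself. If I remember correctly, the `backend features` command dumps a remote's feature flags as JSON (replace `onedrive:` with one of your configured remotes):

```shell
# Dump all optional feature flags for a configured remote
rclone backend features onedrive:

# With jq installed, pick out just the StreamUpload flag
rclone backend features onedrive: | jq '.Features.StreamUpload'
```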