I am trying to understand when the temporary directory is used in rclone. More specifically I am trying to figure out if rclone uses streams exclusively when I am using the rc sync/copy API or not.
In addition, in my case, I am also using the crypt adapter with multiple combinations (e.g. attached to source only, attached to target only, attached to both). What happens then?
Run the command 'rclone version' and share the full output of the command.
rclone v1.59.2
os/version: darwin 12.6 (64 bit)
os/kernel: 21.6.0 (arm64)
os/type: darwin
os/arch: arm64
go/version: go1.19.1
go/linking: dynamic
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
OneDrive, Dropbox, GoogleDrive, S3, HTTP and the Crypt backend.
I understand that it's a bit generic, but it's crucial for me to understand whether the temporary directory is used during copies, because there will not be much local disk space available and I want to avoid using it.
Are you doing something from remote to remote? local to remote?
I am only doing copies and moves from remote to remote (e.g. OneDrive to Google Drive, HTTP to Google Drive, Google Drive to Dropbox, etc.), and usually a Crypt layer is attached on either side or both sides. I do not use the local filesystem at all.
If you copy and sync, generally nothing is stored locally on a file system.
Any case that comes into mind that might be an exception to this?
For example, from what I know, OneDrive needs to know the size you are uploading beforehand. If you have Google Drive as the source and OneDrive with a Crypt layer as the target, how does rclone solve the fact that the size is unknown, since the original data (from Google Drive) will be encrypted first (which means a new size) before ending up in OneDrive? Is the data encrypted and cached somewhere first and then uploaded to OneDrive, or is it streamed from Google Drive to OneDrive with encryption and content-length detection happening somehow on the fly?
Feel free to go technical with me and point me to the source code if you have to!
Already did before posting and indeed nothing was getting written in the temporary directory but I wanted a more experienced opinion!
Indeed. But do you happen to know where in the source (for OneDrive specifically) this calculation happens? I am very curious to see how the ContentLength of the upload is calculated when there is also encryption in the middle. I have been looking around but I wasn't able to find it.
In these cases everything is handled in RAM; there are no temporary files. RAM is used primarily for transfer buffers and directory listings.
rclone can calculate the sizes beforehand when needed in these situations - data transfer is streamed.
There are however situations where the downloaded size differs from the reported size, and that can probably be an issue if copying to a remote that needs the correct size upfront. Example: Copying iPhone Live Photos (.heic) from one OneDrive to another OneDrive.
I am very curious to see how exactly the calculation is implemented when the size is unknown beforehand. Do you happen to know where in the code I can have a look?
The source remote will have the unencrypted size. After the file goes through encryption it will have a new size. As you have already mentioned you know the encryption algorithm used and therefore you might be able to calculate the new size on the fly since you know the padding bytes and everything. I am not sure how the whole thing happens if it's being streamed directly from source to target and wanted to check the code out of curiosity since I am missing something here.
Then I suggest you follow the contributing guide to build a local debuggable copy and then place a few good breakpoints to see what happens when you copy a single file into a crypt.
The crypt encoding was designed so that if you know the size of the input you know the size of the output and vice versa. This is very important for maintaining streaming without using local disk for all backends.
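To make that concrete: based on the crypt file format documented for rclone (a 32-byte header consisting of an 8-byte magic string plus a 24-byte nonce, followed by 64 KiB plaintext blocks that each carry a 16-byte Poly1305 authenticator), the plaintext-to-ciphertext size mapping can be sketched as below. This is an illustrative reimplementation, not rclone's actual source:

```go
package main

import "fmt"

const (
	headerSize    = 32    // 8-byte magic "RCLONE\x00\x00" + 24-byte nonce
	blockData     = 65536 // 64 KiB of plaintext per encrypted block
	blockOverhead = 16    // Poly1305 authenticator added to each block
)

// encryptedSize returns the size a crypt file will have for a given
// plaintext size, so the length is known before any byte is uploaded.
func encryptedSize(n int64) int64 {
	fullBlocks := n / blockData
	remainder := n % blockData
	size := int64(headerSize) + fullBlocks*(blockData+blockOverhead)
	if remainder > 0 {
		size += remainder + blockOverhead
	}
	return size
}

func main() {
	for _, n := range []int64{0, 1, 65536, 65537} {
		fmt.Printf("plaintext %d bytes -> encrypted %d bytes\n", n, encryptedSize(n))
	}
}
```

Because the formula is a pure function of the input size, a crypt remote wrapping OneDrive can report the exact encrypted ContentLength up front and stream the data through, with no spooling.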
The only time rclone copy will ever need to spool data to disk is:
- if the source size isn't known (eg from Google Photos, or a Google Doc) and the destination backend doesn't accept streaming uploads
- the compress backend will use local disk if the backend it is wrapping can't stream uploads
- the jottacloud backend will use local disk if the source can't provide MD5 hashes
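The first rule reduces to a simple predicate. A hypothetical helper (the names are mine, not rclone's) that mirrors it:

```go
package main

import "fmt"

// needsSpool reports whether a remote-to-remote copy must buffer data on
// local disk: only when the source size is unknown AND the destination
// backend can't accept streaming uploads.
func needsSpool(srcSize int64, dstCanStreamUpload bool) bool {
	sizeUnknown := srcSize < 0 // rclone convention: unknown sizes are reported as -1
	return sizeUnknown && !dstCanStreamUpload
}

func main() {
	fmt.Println(needsSpool(-1, false))   // unknown size, no streaming support
	fmt.Println(needsSpool(-1, true))    // unknown size, but streaming works
	fmt.Println(needsSpool(4096, false)) // size known, so no spooling needed
}
```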
To see whether a backend can stream uploads check out the optional features table and look for StreamUpload. In rclone speak StreamUpload means that the backend needs to know the size of the upload before the upload starts.
Of OneDrive, Dropbox, GoogleDrive, S3 and HTTP, I think only OneDrive can't do StreamUpload.
So for your use, if you copy Google Docs you'll need local disk, otherwise not.
That sentence confused me a bit, so to clarify (I hope) if it confuses others as well:
StreamUpload=No means the backend needs to know the size of the upload before the upload starts
StreamUpload=Yes means the backend allows files to be uploaded without knowing the file size in advance, i.e. the backend supports what we call streaming uploads.
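Besides the features table in the docs, you can also ask rclone itself. If I remember correctly, the `backend features` command dumps a remote's feature flags as JSON (replace `onedrive:` with one of your configured remotes):

```shell
# Dump all optional feature flags for a configured remote
rclone backend features onedrive:

# With jq installed, pick out just the StreamUpload flag
rclone backend features onedrive: | jq '.Features.StreamUpload'
```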