Chunked + encrypted space overhead is quite high

What is the problem you are having with rclone?

I am using a chunked + encrypted configuration which works well, except that there is a fairly significant difference between the space used on my cloud provider and the space used locally.

A folder of 11 GB (~15 files) that I recently uploaded takes up 21 GB of storage remotely, which seems pretty significant. Another, larger archive of 600 GB ended up using 1120 GB of storage after encryption + chunking.

Is this expected?

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.1
- os/version: arch (64 bit)
- os/kernel: 5.19.7-arch1-1 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Azure Blob Storage (azureblob)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync some_dir aa_chunk:some_dir

The rclone config contents with secrets removed.

[azure_archive]
type = azureblob
account = [secret]
key = [secret]

[aa_crypt]
type = crypt
remote = azure_archive:enc
filename_encryption = standard
directory_name_encryption = true
password = [secret]

[aa_chunk]
type = chunker
remote = aa_crypt:
chunk_size = 256M
hash_type = sha1

Any data / log file / examples to share?

Unfortunately, I didn't capture a log file. I don't think the issue is related to the content (any large file should do), but if I need to upload something again I'll try to capture a debug log.

Hi Ani,

I would expect the overhead to be negligible (less than 0.1%) with large files like those in your first example.
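
To put a number on "negligible": rclone crypt writes a 32-byte header per file (8-byte magic plus 24-byte nonce) and prefixes every 64 KiB block of data with a 16-byte Poly1305 authenticator. Since aa_chunk sits on top of aa_crypt, each 256 MiB chunk is encrypted as a separate file. The bash sketch below estimates the stored size from those facts; it ignores the small chunker metadata object and any provider-side accounting, and the function names are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: estimate stored bytes for chunker (256 MiB chunks) over crypt.
# Assumes the rclone crypt format: 32-byte file header, then 64 KiB data
# blocks each preceded by a 16-byte Poly1305 tag (the last block may be short).
# Chunker metadata objects are ignored. Function names are illustrative.

CRYPT_HEADER=32
CRYPT_TAG=16
CRYPT_BLOCK=65536
CHUNK_SIZE=$((256 * 1024 * 1024))   # chunk_size = 256M from the config

# Encrypted size of one plaintext object of $1 bytes.
crypt_size() {
  local size=$1
  local blocks=$(( size / CRYPT_BLOCK ))
  local residue=$(( size % CRYPT_BLOCK ))
  local enc=$(( CRYPT_HEADER + blocks * (CRYPT_TAG + CRYPT_BLOCK) ))
  if (( residue != 0 )); then
    enc=$(( enc + CRYPT_TAG + residue ))
  fi
  echo "$enc"
}

# Stored size of a $1-byte file after chunking, each chunk encrypted separately.
stored_size() {
  local remaining=$1 total=0 part
  while (( remaining > 0 )); do
    part=$(( remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE ))
    total=$(( total + $(crypt_size "$part") ))
    remaining=$(( remaining - part ))
  done
  echo "$total"
}

# Overhead of stored_size over the plaintext, as a percentage ($1 must be > 0).
overhead_pct() {
  local plain=$1 stored
  stored=$(stored_size "$plain")
  LC_ALL=C awk -v p="$plain" -v s="$stored" 'BEGIN { printf "%.4f\n", (s - p) * 100 / p }'
}
```

For example, `overhead_pct 1073741824` (a 1 GiB file split into four chunks) prints 0.0244, i.e. about 0.02%, far below the near doubling reported in the question.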

I therefore suggest you start by comparing the size of the folder with ~15 files at each rclone layer, with something like:

rclone size some_dir
rclone size aa_chunk:some_dir
rclone size aa_crypt:some_dir
rclone cryptdecode --reverse aa_crypt: some_dir
# Let's call the output from the above command: xxxxxxxxxx
rclone size azure_archive:enc/xxxxxxxxxx
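
The size checks above can be wrapped in a small bash function so the layers print side by side. This is a sketch: it assumes rclone is on PATH and the remote names from the config above, takes the encrypted folder name (the xxxxxxxxxx from cryptdecode) as an argument instead of parsing the cryptdecode output, and reads the byte count from the "bytes" field of `rclone size --json`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show total bytes for one folder at each layer of the
# local -> aa_chunk -> aa_crypt -> azure_archive stack.
# Assumes rclone is on PATH and the remote names from the config above.

# Read `rclone size --json` output on stdin and print its "bytes" field.
json_bytes() {
  grep -o '"bytes":[0-9]*' | head -n 1 | cut -d: -f2
}

# Print one aligned "<layer> <bytes>" line for an rclone path.
layer_size() {
  printf '%-8s %s\n' "$1" "$(rclone size --json "$2" | json_bytes)"
}

# $1: plain folder name, e.g. some_dir
# $2: its encrypted name from `rclone cryptdecode --reverse aa_crypt: some_dir`
compare_layers() {
  local dir=$1 encname=$2
  layer_size local "$dir"
  layer_size chunk "aa_chunk:$dir"
  layer_size crypt "aa_crypt:$dir"
  layer_size azure "azure_archive:enc/$encname"
}
```

Usage: `compare_layers some_dir xxxxxxxxxx`. The local and chunk lines should match, and each lower layer should be only slightly larger (chunker metadata, then crypt headers and tags); a large jump between two adjacent lines points at the layer to investigate.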

Next, you can compare the files (chunks) to see which file is taking too much space, or whether there are extra (unencrypted) files in the crypt folder:

rclone lsl some_dir
rclone lsl aa_chunk:some_dir
rclone lsl aa_crypt:some_dir
rclone cryptdecode --reverse aa_crypt: some_dir
# Let's call the output from the above command: xxxxxxxxxx
rclone lsl azure_archive:enc/xxxxxxxxxx

It will probably be somewhat tedious to match the encrypted chunks to files, but it should be possible based on the timestamps and sizes.
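
Matching is easier one layer up: `rclone lsl aa_crypt:some_dir` shows decrypted names, and with chunker's default name_format (`*.rclone_chunk.###`) every chunk is named after its file. The awk sketch below folds those chunk lines back into one total per file, so the result can be compared line by line with `rclone lsl some_dir`. It assumes the default name_format and the usual `size date time path` layout of lsl output; a small chunker metadata object stored under the file's own name is added to that file's total. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: total `rclone lsl` sizes per original file, folding chunker parts
# named <file>.rclone_chunk.<NNN> (default name_format) into <file>.
# Reads lsl output ("size date time path") on stdin.
sum_chunks() {
  LC_ALL=C awk '
    {
      size = $1
      path = $0
      # Drop the leading size, date and time columns to keep the full path.
      sub(/^ *[0-9]+ +[^ ]+ +[^ ]+ /, "", path)
      # Map a chunk name back to the name of the file it belongs to.
      sub(/\.rclone_chunk\.[0-9]+$/, "", path)
      total[path] += size
    }
    END { for (p in total) printf "%.0f %s\n", total[p], p }
  ' | LC_ALL=C sort -k2
}
```

Usage: `rclone lsl aa_crypt:some_dir | sum_chunks`, then compare against `rclone lsl some_dir`.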

We would probably need the entire output from the above commands to help in any further troubleshooting.

Hey Ole, thanks for the great debugging commands. After looking into this a bit more with them (cryptdecode --reverse in particular was super helpful), I think this was some kind of reporting issue with Azure. Both rclone's upload stats from --progress and Azure's initial stats in the portal reported the inflated numbers, but Azure's Storage Explorer tool and the rclone size output now both report what I expect to see. Thanks again for the help!

Thanks, you're welcome. Happy we found the reason and that your data was OK 😅

Well spotted: there is indeed a bug in the stats reported by rclone when using chunker.

These are the simplest steps I could find to reproduce it:

> rclone test makefile 10M ./testfolder/10Mfile
2022/09/20 15:55:33 NOTICE: Creating 1 files of size 10Mi.
2022/09/20 15:55:33 NOTICE: Written 10MiB in 12ms at 834.021MiB/s.
> rclone copy ./testfolder/10Mfile :chunker,remote='./testfolder/chunks',chunk_size=1M: --progress --ignore-times 
Transferred:           20 MiB / 20 MiB, 100%, 0 B/s, ETA -
Checks:                10 / 10, 100%
Renamed:               10
Transferred:            1 / 1, 100%
Elapsed time:         0.1s

I added --ignore-times to the last command to make it repeatable during debugging and testing.
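
One way to confirm that only the reported stats are inflated, not the stored data, is to follow the reproduction with size checks. A minimal bash sketch, assuming rclone is on PATH and reusing the paths from the commands above; the function names are hypothetical, and the raw chunk folder may also hold a small chunker metadata object alongside the 1 MiB chunks.

```shell
#!/usr/bin/env bash
# Sketch: rerun the reproduction above, then compare source bytes with what
# chunker actually wrote to ./testfolder/chunks. Assumes rclone is on PATH.

# Bytes in a local file, without the padding some wc implementations add.
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

check_chunker_repro() {
  rclone test makefile 10M ./testfolder/10Mfile
  rclone copy ./testfolder/10Mfile \
    :chunker,remote='./testfolder/chunks',chunk_size=1M: --ignore-times
  echo "source file: $(file_bytes ./testfolder/10Mfile) bytes"
  # Raw chunk files (plus any chunker metadata) as stored on disk.
  rclone size ./testfolder/chunks
  # The same data seen through chunker: should report one 10 MiB file.
  rclone size :chunker,remote='./testfolder/chunks',chunk_size=1M:
}
```

Run it with `check_chunker_repro`; the first `rclone size` should show roughly 10 MiB of chunk files and the second a single 10 MiB object, in contrast to the 20 MiB that --progress reports as transferred.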

It seems the same or a similar issue has been seen before:
Double data transferred when using chunk + crypt with --crypt-no-data-encryption=true
Errors uploading files when using -crypt-no-data-encryption=true · Issue #5498 · rclone/rclone · GitHub
and I suspect a regression from the introduction of this feature:
--stats-one-line-date does not increment transfers when doing a server-side move · Issue #5430 · rclone/rclone · GitHub

I suggest you create a GitHub bug report with a link to this thread, then you can easily add your own comments and track status.