Rclone dedupe on crypt remotes?

What is the problem you are having with rclone?

I want to use rclone dedupe on a crypt remote, but get Failed to dedupe: Encrypted drive '<crypt-remote>:' has no hashes

Doing the same on the underlying remote works, but gives encrypted filenames.

Is there any way to run the dedupe command, but get the results in plaintext? Alternatively, output it to a file, and run that through rclone cryptdecode?

Run the command 'rclone version' and share the full output of the command.

rclone v1.63.0
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-73-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Dropbox

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone dedupe --dedupe-mode list --by-hash <remote>

(Not providing config, since it's not relevant)

In this case, if you want to use hashes, I think the only option would be to run it in a non-interactive mode, e.g. --dedupe-mode first. Capture all details in a log file with --log-level DEBUG --log-file rclone.log. Then you could use bash to extract what was deduped and pass it through rclone cryptdecode.

You can always add --dry-run and test before running for real.
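
Something along these lines could serve as a rough sketch. The grep pattern and the awk field are placeholders only (inspect rclone.log first and adjust them to whatever the DEBUG lines actually look like), and <remote>/<crypt-remote> stand for your underlying remote and the crypt wrapper:

rclone dedupe --dedupe-mode first --by-hash --dry-run --log-level DEBUG --log-file rclone.log <remote>:

# placeholder extraction step: pick the encrypted names out of the log,
# then decode them back to plaintext
grep 'dupe' rclone.log | awk '{print $NF}' | xargs rclone cryptdecode <crypt-remote>: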

Thanks.

Doesn't "--dedupe-mode list" give a list output? I don't want to automatically delete anything – I just want a list of dupes.

Ideally rclone would be smart enough to automatically decrypt the output if you run dedupe on a crypt. I would imagine all the functions are already in the codebase, but I'm no dev, so I don't have the skills.

Indeed, --dedupe-mode list can give you a list even faster. But you still need to extract the encrypted names manually (or with a script) and pass them through rclone cryptdecode.
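
For example, you could capture the list output to a file and decode the interesting names afterwards (just a sketch; how you pick the encrypted names out of dupes.txt is up to you):

rclone dedupe --dedupe-mode list --by-hash <remote>: > dupes.txt
rclone cryptdecode <crypt-remote>: <encrypted-name-from-dupes.txt>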

There is no functionality like you described available at the moment.

There is another issue with your approach. The same file name always results in the same encrypted file name; however, that is not the case with file content, as the encryption uses a cryptographic salt to permute the encryption key. Effectively, every file is encrypted with a different key.

So you can only detect cases where a file was copied server-side, as only then is the encrypted content the same.
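
You can see this for yourself by uploading the same local file twice through the crypt remote and comparing the hashes of the two resulting objects on the underlying remote (the paths below are placeholders, and the hash type depends on the backend; for Dropbox it should be the "dropbox" hash):

rclone copyto ./same-file.txt <crypt-remote>:copy1.txt
rclone copyto ./same-file.txt <crypt-remote>:copy2.txt

# the two encrypted objects get different hashes even though the plaintext
# is identical, so dedupe --by-hash on the underlying remote will not group them
rclone hashsum dropbox <remote>:<encrypted-dir>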


I would mount the encrypted remote with --vfs-cache-mode full --vfs-read-chunk-size 1M and use something like fclones.

Then you could run:

fclones group . --max-prefix-size 1MB --max-suffix-size 1MB --skip-content-hash

It would only read and hash 2MB from every suspected duplicate (1MB from the beginning and 1MB from the end of each file whose length matches another), and thanks to the --vfs-read-chunk-size 1M mount option only about 2MB are read from the remote per file.
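
Putting it together, a rough sketch of the mount step (the mount point is a placeholder, --read-only is optional since fclones only reads, and --daemon just backgrounds the mount on Linux):

rclone mount <crypt-remote>: /mnt/crypt --read-only --vfs-cache-mode full --vfs-read-chunk-size 1M --daemon
cd /mnt/crypt
fclones group . --max-prefix-size 1MB --max-suffix-size 1MB --skip-content-hash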


Thank you! That seems to work fine. I did try rmlint earlier, but it does not seem to have the same prefix/suffix size features. Smart thinking. :)

I'm parsing through multiple directories, so for reference my final command is:

find . -maxdepth 1 -type d \( ! -name . \) -exec bash -c "cd '{}' && fclones group . --max-prefix-size 1MB --max-suffix-size 1MB --skip-content-hash" \; | tee <name_of_logfile>.log


Is it to pre-warm the cache?

No, not really, just to run the command inside of each subdirectory instead of recursively through it all.

Easier to have an overview, and it was something I did with rmlint too, since it produces a log and a script to remove the duplicates wherever it's run.


I see now :) I should look at your code more carefully. I used to use rmlint too, but since I discovered fclones I forgot about it.
