On S3, why does `rclone dedupe` complain about duplicate files in the same folder?

What is the problem you are having with rclone?

S3 reports `"CaseInsensitive": false`, but rclone is acting as if it were `"CaseInsensitive": true`.
`rclone ls` works as expected, but `rclone dedupe` does not.

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.0
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.90.4-microsoft-standard-WSL2 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone dedupe aws02:zork.source --dry-run -vv

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[aws02]
type = s3
provider = AWS
access_key_id = XXX
secret_access_key = XXX
region = us-east-1
storage_class = STANDARD

A log from the command that you were trying to run with the -vv flag

rclone ls aws02:zork.source
        2 AA.TXT
        1 aa.txt

rclone dedupe aws02:zork.source --dry-run -vv
2024/01/04 14:50:46 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "dedupe" "aws02:zork.source" "--dry-run" "-vv"]
2024/01/04 14:50:46 DEBUG : Creating backend with remote "aws02:zork.source"
2024/01/04 14:50:46 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2024/01/04 14:50:46 NOTICE: S3 bucket zork.source: Can't have duplicate names here. Perhaps you wanted --by-hash ? Continuing anyway.
2024/01/04 14:50:46 INFO  : S3 bucket zork.source: Looking for duplicate names using interactive mode.

What were you hoping would happen - that rclone would show AA.TXT and aa.txt as duplicates?

That means S3 is case sensitive so aa.txt and AA.TXT are considered to be different file names.
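As an aside, you can see the distinction locally with plain coreutils (this is just a sketch over the two names from the thread, not something rclone runs itself):

```shell
# Case-folded duplicate check: prints one line per group of names
# that differ only in case (here AA.TXT and aa.txt collide)
printf '%s\n' AA.TXT aa.txt | sort -f | uniq -di
```

With case folding (`-f`/`-i`) the two names collide; without it (plain `sort | uniq -d`) they do not, which is exactly how a case-sensitive backend like S3 sees them.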

If you want to get a bit experimental you can try forcing CaseInsensitive to true for S3 with the `--disable` flag:

rclone dedupe --disable '!CaseInsensitive' -vv s3:rclone

Unfortunately that didn't seem to work!

hi, hope all is well in the new year

No, the two files are not duplicates.

You and I agree, but `rclone dedupe` complains "Can't have duplicate names here",
yet `rclone ls` does not complain.

And BTW in the same situation for Google Drive rclone does not complain about anything:

$ rclone ls drive:test -vv
2024/01/04 19:19:46 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "ls" "drive:test" "-vv"]
2024/01/04 19:19:46 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:19:46 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:19:46 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
  4135822 aa.txt
  4135822 AA.txt
2024/01/04 19:19:47 DEBUG : 7 go routines active

$ rclone dedupe drive:test -vv --dry-run
2024/01/04 19:20:25 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "dedupe" "drive:test" "-vv" "--dry-run"]
2024/01/04 19:20:25 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:20:25 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:20:25 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
2024/01/04 19:20:26 INFO  : Google drive root 'test': Looking for duplicate names using interactive mode.
2024/01/04 19:20:26 DEBUG : 7 go routines active

So something is not right.

That is because the backend can't have duplicate names, so running `rclone dedupe` doesn't make sense - rclone is just warning you about that. This is controlled by the `DuplicateFiles` feature flag.

$ rclone backend features s3: | grep Duplicate
		"DuplicateFiles": false,

You don't have any duplicate files so ls isn't complaining about anything.
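You can confirm that yourself with an exact (case-sensitive) duplicate check over the listed names - a quick local sketch using the two names from the thread, where empty output means no true duplicates:

```shell
# Exact-name duplicate check: only repeated identical lines are printed,
# so two names differing in case produce no output
printf '%s\n' AA.TXT aa.txt | sort | uniq -d
```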

That is because Google Drive does have the `DuplicateFiles` feature flag set:

$ rclone backend features drive: | grep Duplicate
		"DuplicateFiles": true,

Both the outputs are as I would expect - what were you expecting?

Google Drive file names are case sensitive, so aa.txt and AA.txt are different files.


Thx for the explanation. I think I get it now.


yeah, all that makes sense now, thanks ncw.


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.