Duplicate file and folder names in S3 fail to copy to local drive

What is the problem you are having with rclone?

I am tasked with copying data from AWS S3 to a local hard drive.
I have files and folders named the same in an S3 bucket, and one of the two fails to copy. I have tried using the --suffix parameter with no success.

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.0

- os/version: Microsoft Windows 11 Pro 23H2 (64 bit)
- os/kernel: 10.0.22631.2861 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.21.4
- go/linking: static
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

.\rclone.exe copy s3-prod.external:bucket D:\AET-S3-Files\bucket --suffix *.bak --progress --transfers 32 --log-file log3.txt --log-level NOTICE

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[s3-prod.external]
type = s3
provider = AWS
access_key_id = XXX
secret_access_key = XXX
region = us-east-1
storage_class = STANDARD

A log from the command that you were trying to run with the -vv flag

2024/01/04 12:31:30 DEBUG : devops: multi-thread copy: chunk 6/6 (335544320-366635631) size 29.651Mi finished
2024/01/04 12:31:30 DEBUG : devops: Finished multi-thread copy with 6 parts of size 64Mi
2024/01/04 12:31:30 DEBUG : devops: Src hash empty - aborting Dst hash check
2024/01/04 12:31:30 DEBUG : devops.jukefob1.partial: Can't move: rename \\?\D:\AET-S3-Files\devops-iracentral-staging\devops.jukefob1.partial \\?\D:\AET-S3-Files\devops-iracentral-staging\devops: Cannot create a file when that file already exists.: trying copy
2024/01/04 12:31:30 ERROR : devops.jukefob1.partial: partial file rename failed: can't move object - incompatible remotes
2024/01/04 12:31:30 INFO  : devops.jukefob1.partial: Removing failed copy

welcome to the forum,

maybe try --inplace
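
for example, adapting your command (just a sketch, not tested; i left out --suffix here since it expects a literal suffix such as .bak, not a wildcard):

.\rclone.exe copy s3-prod.external:bucket D:\AET-S3-Files\bucket --inplace --progress --transfers 32 --log-file log3.txt --log-level NOTICE

--inplace makes rclone write each file directly to its final name instead of downloading to a .partial file and renaming it afterwards, which is the step that failed in your log.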

how did you create duplicate filenames in the same folder? rclone itself cannot do that.
if you run the rclone copy with -vv --dry-run, rclone should complain about duplicate filenames.

if you run rclone dedupe on the bucket, does rclone list duplicate filenames in the same folder?
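
for example, something like this against your remote (untested, adjust names as needed):

.\rclone.exe copy s3-prod.external:bucket D:\AET-S3-Files\bucket -vv --dry-run
.\rclone.exe dedupe s3-prod.external:bucket --dry-run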

What is possible is that your local filesystem (where you are trying to save the S3 data) is case insensitive, while S3 file/dir names are case sensitive.

For example, you can have two different objects in an S3 bucket called aa.txt and AA.txt, but your local filesystem does not allow two such files to exist in the same folder.

As you mentioned that this issue only applies to one or two objects, probably the easiest solution is to rename them manually. Or use a case-sensitive filesystem to store your S3 data.

I think for Windows NTFS it is something you can configure on a per-directory basis. Do some Googling.
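
If I remember correctly it is something like this, run from an elevated prompt (an untested sketch; it needs a fairly recent Windows 10/11 build and possibly the Windows Subsystem for Linux feature enabled):

fsutil.exe file setCaseSensitiveInfo D:\AET-S3-Files\bucket enable

As far as I know it only affects the directory it is applied to, not already existing subdirectories.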

correct, i wrote about that somewhere in the forum.

hmm, if that is correct, then rclone should overwrite the dest file, same as local to local, on windows


strange, why does rclone complain about dupes in this case

rclone ls aws02:zork.source -vv
2024/01/04 13:56:05 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "ls" "aws02:zork.source" "-vv"]
2024/01/04 13:56:05 DEBUG : Creating backend with remote "aws02:zork.source"
2024/01/04 13:56:05 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
        2 AA.TXT
        1 aa.txt

rclone dedupe aws02:zork.source --dry-run
2024/01/04 13:55:35 NOTICE: S3 bucket zork.source: Can't have duplicate names here. Perhaps you wanted --by-hash ? Continuing anyway.

Indeed, that rclone message is not logical at all.

Here is what happens when using Google Drive:

$ rclone ls drive:test -vv
2024/01/04 19:19:46 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "ls" "drive:test" "-vv"]
2024/01/04 19:19:46 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:19:46 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:19:46 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
  4135822 aa.txt
  4135822 AA.txt
2024/01/04 19:19:47 DEBUG : 7 go routines active

$ rclone dedupe drive:test -vv --dry-run
2024/01/04 19:20:25 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "dedupe" "drive:test" "-vv" "--dry-run"]
2024/01/04 19:20:25 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:20:25 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:20:25 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
2024/01/04 19:20:26 INFO  : Google drive root 'test': Looking for duplicate names using interactive mode.
2024/01/04 19:20:26 DEBUG : 7 go routines active

I think it might be a bug, as by my simple logic the results should be the same in both situations. It is like rclone is not aware that S3 buckets are case sensitive.

yeah, that is what i thought.
maybe i should start a new topic and not continue to hijack this one...

and anyway, the OP's issue is not the same as that issue.
and why "incompatible remotes"?

Maybe also related to "check that bug introduced in go 1.21.4 is fixed in go 1.21.5" · Issue #7468 · rclone/rclone · GitHub?

@IfYouNoahGuy would you mind trying the rclone beta, v1.66? v1.65 has known problems on Windows - already fixed in the beta.

or, to avoid using betas, just test with the last stable release that did not have the windows+local issue.
https://github.com/rclone/rclone/releases/download/v1.64.2/rclone-v1.64.2-windows-amd64.zip

Hey guys, thanks for the quick responses and suggestions.
some clarifications:

  • rclone did not create the duplicate file names in the same folder. The duplication is coming from the S3 bucket. Example image attached.
  • I wish this were the case. This is only a single bucket that has this issue, but I have 30+ more that potentially have it, as this is archival data.

I was hoping rclone had a function that would append a number/unique identifier/anything to files/folders it found to be duplicated.

Alternatively, is there a way for it to create a list-type log file of the error files it runs into while performing a long copy job?

I don't think the beta will fix this specific issue since it's about how NTFS handles same-named objects, but I can try it.

Instead of an image could you please run:

rclone ls s3-prod.external:bucket --max-depth 1

No such functionality atm.
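
As for the list of failed files - a possible workaround (not tested): run the copy with --log-level ERROR and a dedicated --log-file, so the log should contain only the objects that failed, e.g.:

.\rclone.exe copy s3-prod.external:bucket D:\AET-S3-Files\bucket --log-level ERROR --log-file errors.txt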

Your situation seems to be a bit special - these are not simple duplicates. S3 does not allow duplicates anyway.

Are you familiar with aws cli? Could you run aws s3 ls s3://bucket as well?

as far as i know, on dos/windows, you cannot have a dir and a file with the same name.

the same seems to apply to linux

touch 01
mkdir 01
mkdir: cannot create directory ‘01’: File exists

in S3 there are no duplicates... and there are no directories. Only objects.

What we see here is, I think, an S3 illusion :) but it breaks when transferring to the "normal world".

I can have three objects (files) named:

test
test/file1.txt
file.txt

Their names are perfectly unique. The problem arises when we try to move such objects to different storage, like a local filesystem, where the S3 naming scheme has to be translated into directory names - the object test then collides with the directory test/ implied by test/file1.txt. I have to admit I have never seen this issue, but it seems logical given how S3 works.

I also think that it is rare because in most cases tools (like rclone) prevent it from happening. But when going directly to the S3 API there is nothing stopping you from naming objects like that... Whatever the reason, this is not transferable to a local filesystem directly.
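
For example, such objects can be created straight through the aws cli (a sketch; the bucket and key names are only placeholders):

aws s3api put-object --bucket my-bucket --key test
aws s3api put-object --bucket my-bucket --key test/file1.txt

As mentioned, most tools avoid creating such a layout, but nothing at the S3 level forbids it.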

Somebody has to write a program/script to parse all the names and change them to comply with local filesystem limitations. It should not be extremely difficult.
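
Something along these lines, using rclone itself (an untested sketch; keys.txt, the .object suffix and the grep prefix check are only illustrative assumptions):

# list all object keys, then rename any key that is also used as a "directory" prefix
rclone lsf -R --files-only s3-prod.external:bucket | sort > keys.txt
while read -r key; do
  # if another key starts with "<key>/", this object would clash with a folder name locally
  # (keys containing regex metacharacters would need escaping for the grep)
  if grep -q "^${key}/" keys.txt; then
    rclone moveto "s3-prod.external:bucket/${key}" "s3-prod.external:bucket/${key}.object"
  fi
done < keys.txt

The rename happens server-side on S3, so afterwards a normal rclone copy to the local drive should not hit the name collision any more.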

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.