I am tasked with copying data from AWS S3 to a local hard drive.
I have files and folders with the same name in an S3 bucket, and one of the two fails to copy. I have tried using the --suffix parameter with no success.
Run the command 'rclone version' and share the full output of the command.
rclone v1.65.0
os/version: Microsoft Windows 11 Pro 23H2 (64 bit)
os/kernel: 10.0.22631.2861 (x86_64)
os/type: windows
os/arch: amd64
go/version: go1.21.4
go/linking: static
go/tags: cmount
Which cloud storage system are you using? (eg Google Drive)
AWS S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
How did you create duplicate filenames in the same folder? rclone itself cannot do that.
If you run the rclone copy with -vv --dry-run, rclone should complain about duplicate filenames.
If you run rclone dedupe on the bucket, does rclone list duplicate filenames in the same folder?
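For example (remote, bucket and destination names here are only placeholders, adjust them to your setup):

$ rclone copy s3remote:your-bucket D:\restore -vv --dry-run
$ rclone dedupe s3remote:your-bucket -vv --dry-run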
What is possible is that your local filesystem (where you are trying to save the S3 data) is case insensitive, while S3 object names are case sensitive.
For example, you can have two different objects in an S3 bucket called aa.txt and AA.txt, but your local filesystem does not allow two such files to exist in the same folder.
As you mentioned that the issue only applies to one or two objects, probably the easiest solution is to rename them manually. Or use a case sensitive filesystem to store your S3 data.
I think for Windows NTFS this is something you can configure on a per-directory basis. Do some Googling.
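If you want to experiment with that, I believe recent Windows 10/11 builds let you mark a directory as case sensitive with fsutil from an elevated prompt (the path below is just an example):

fsutil.exe file setCaseSensitiveInfo D:\s3-restore enable

As far as I know the flag is set per directory, and whether newly created subdirectories inherit it depends on the Windows build, so it may be awkward for a large copy job.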
$ rclone ls drive:test -vv
2024/01/04 19:19:46 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "ls" "drive:test" "-vv"]
2024/01/04 19:19:46 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:19:46 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:19:46 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
4135822 aa.txt
4135822 AA.txt
2024/01/04 19:19:47 DEBUG : 7 go routines active
$ rclone dedupe drive:test -vv --dry-run
2024/01/04 19:20:25 DEBUG : rclone: Version "v1.65.0" starting with parameters ["rclone" "dedupe" "drive:test" "-vv" "--dry-run"]
2024/01/04 19:20:25 DEBUG : Creating backend with remote "drive:test"
2024/01/04 19:20:25 DEBUG : Using config file from "/Users/kptsky/.config/rclone/rclone.conf"
2024/01/04 19:20:25 DEBUG : Google drive root 'test': 'root_folder_id = XXX' - save this in the config to speed up startup
2024/01/04 19:20:26 INFO : Google drive root 'test': Looking for duplicate names using interactive mode.
2024/01/04 19:20:26 DEBUG : 7 go routines active
I think it might be a bug, as by my simple logic the results in both situations should be the same. It is as if rclone is not aware that S3 buckets are case sensitive.
I wish it were that simple. This is only a single bucket that has this issue so far; I have 30+ more that potentially have it as well, as this is archival data.
I was hoping rclone had a function that would append a number/unique identifier/anything to files/folders it found to be duplicated.
Alternatively, is there a way for it to create a list-type log file of the error files it runs into while performing a long copy job?
I don't think the beta will fix this specific issue, since it is about how NTFS handles same-named objects, but I can try it.
In S3 there are no duplicates... and there are no directories, only objects.
What we see here is, I think, an S3 illusion :) but it breaks when transferring to the "normal world".
I can have three objects (files) named:
test
test/file1.txt
file.txt
Their names are perfectly unique. The problem arises when we try to move such objects to a different kind of storage, like a local filesystem, where the S3 naming scheme has to be translated into directory names. I have to admit I have never seen this issue myself, but it seems logical given how S3 works.
I also think it is rare, because in most cases tools (like rclone) prevent it from happening. But when going directly to the S3 API there might be nothing stopping you from naming objects like that... Whatever the reason, such names cannot be transferred to a local filesystem directly.
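For example, using the AWS CLI directly, nothing stops you from creating both a "test" object and objects under a "test/" prefix in the same bucket (the bucket name below is just a placeholder):

$ aws s3api put-object --bucket my-archive-bucket --key test
$ aws s3api put-object --bucket my-archive-bucket --key test/file1.txt

Both calls succeed, but on a local filesystem "test" cannot be a file and a directory at the same time.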
Somebody has to write a program/script to parse all the names and change them to comply with local filesystem limitations. It should not be extremely difficult; a very rough sketch of the idea is below.
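This is only a sketch (bash, with placeholder remote and bucket names); it handles just the upper/lower case collisions and keeps --dry-run so nothing is actually renamed until you remove it:

# List every object path in the bucket
rclone lsf -R --files-only s3remote:my-archive-bucket > all-objects.txt

# Keep only paths whose lowercased form has already been seen, i.e. the case collisions
awk '{ if (seen[tolower($0)]++) print }' all-objects.txt > collisions.txt

# Rename each colliding object server-side by appending a numeric suffix
n=0
while IFS= read -r obj; do
  n=$((n + 1))
  rclone moveto "s3remote:my-archive-bucket/$obj" "s3remote:my-archive-bucket/${obj}_dup$n" -v --dry-run
done < collisions.txt

Note the suffix lands after the file extension; you would want something smarter for real data, but it shows the idea.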