Problem when cloning one cloud storage to another cloud storage

Hi. I'm having a problem with rclone when I try to clone files from one cloud storage provider (S3) to another (GCP).
I'm running into a duplicate object issue when my source bucket contains a file whose name is also the prefix of other files, for example:

index
index/test
index/test2

When I run the rclone command I get the following output:
./rclone copy src:forgems-123 dest:forgems-123
2019/05/20 15:08:34 NOTICE: S3 bucket forgems-123: Switched region to "eu-central-1" from "eu-north-1"
2019/05/20 15:08:35 NOTICE: index: Duplicate object found in source - ignoring

When the command completes, the destination bucket is missing the "index" file.
I think the source of the problem is that rclone assumes all backends are hierarchical file systems (which is not true for S3, GCP, OpenStack Swift, etc.).
What do you think about adding a flag (--files-only) that only synchronizes files without traversing the whole directory structure?
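The suggested --files-only behaviour amounts to a key-for-key copy. The Go sketch below is a minimal illustration that assumes both buckets can be treated as flat key/value stores. The `bucket` type and `copyFlat` function are hypothetical and are not rclone code.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is a hypothetical in-memory stand-in for an object store:
// a flat map from full object key to object contents.
type bucket map[string][]byte

// copyFlat copies every key from src to dst unchanged and returns the
// copied keys in sorted order. Keys are never split into directories,
// so "index" and "index/test" are just two unrelated keys.
func copyFlat(src, dst bucket) []string {
	keys := make([]string, 0, len(src))
	for k := range src {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		// Copy the bytes so dst does not share backing arrays with src.
		dst[k] = append([]byte(nil), src[k]...)
	}
	return keys
}

func main() {
	src := bucket{
		"index":       []byte("application data"),
		"index/test":  []byte("a"),
		"index/test2": []byte("b"),
	}
	dst := bucket{}
	fmt.Println(copyFlat(src, dst))   // [index index/test index/test2]
	fmt.Println(string(dst["index"])) // application data
}
```

Because nothing is built into a tree, there is no point at which "index" can clash with the "index" prefix. This only works when the destination also accepts arbitrary keys.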

What is happening is that rclone sees the "index" file in the source, also notices that the source has a "directory" called "index", and so doesn't transfer the file.
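A simplified model shows where the duplicate comes from. The Go sketch below assumes a delimiter-style listing of one level, where keys are split at the first "/" into objects and "directories". `listLevel` and `duplicates` are hypothetical helpers and do not reproduce rclone's internals. Listing the root of the example bucket yields "index" once as an object and once as a directory.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listLevel mimics a delimiter-based bucket listing at one level: keys
// under prefix are split into direct objects and "directories" (the part
// of the key before the next "/"). Illustration only, not rclone's code.
func listLevel(keys []string, prefix string) (objects, dirs []string) {
	seenDir := map[string]bool{}
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		rest := strings.TrimPrefix(k, prefix)
		if i := strings.Index(rest, "/"); i >= 0 {
			d := rest[:i]
			if !seenDir[d] {
				seenDir[d] = true
				dirs = append(dirs, d)
			}
		} else {
			objects = append(objects, rest)
		}
	}
	sort.Strings(objects)
	sort.Strings(dirs)
	return objects, dirs
}

// duplicates returns names that appear both as an object and as a
// directory at the same level, which a file-system view cannot represent.
func duplicates(objects, dirs []string) []string {
	isDir := map[string]bool{}
	for _, d := range dirs {
		isDir[d] = true
	}
	var dup []string
	for _, o := range objects {
		if isDir[o] {
			dup = append(dup, o)
		}
	}
	return dup
}

func main() {
	keys := []string{"index", "index/test", "index/test2"}
	objects, dirs := listLevel(keys, "")
	fmt.Println("objects:", objects)
	fmt.Println("dirs:", dirs)
	fmt.Println("name clashes:", duplicates(objects, dirs))
}
```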

Rclone never creates directory index files, as they aren't necessary and waste resources. However, they are the only way of having an empty "directory" on S3.

Do your index files have anything in, or are they just 0 length directory markers? If rclone just re-created them would that be sufficient?

Rclone knows the difference internally between the "bucket based" backends and the other "file system" sort of backends.

Since rclone copies between different sorts of backends, it has to use the lowest common denominator. So if you create an object called "toomanyslashes////", rclone can't copy it to your local disk under exactly that name.

I was thinking of a flag or a backend option to make rclone create the 0 length index files. That is the most common request.
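Here is a minimal Go sketch of which 0 length index files such an option would need to create. It assumes a marker is a zero-length object whose key is the directory prefix ending in "/". `markerKeys` is a hypothetical helper and not an existing rclone function or flag.

```go
package main

import (
	"fmt"
	"sort"
)

// markerKeys returns the zero-length "directory marker" keys (ending in
// "/") needed for every parent prefix of the given keys.
// Hypothetical helper: it illustrates the idea, not an rclone option.
func markerKeys(keys []string) []string {
	seen := map[string]bool{}
	for _, k := range keys {
		// Every "/" in a key ends a parent prefix; keep it with the slash.
		for i := 0; i < len(k); i++ {
			if k[i] == '/' {
				seen[k[:i+1]] = true
			}
		}
	}
	markers := make([]string, 0, len(seen))
	for m := range seen {
		markers = append(markers, m)
	}
	sort.Strings(markers)
	return markers
}

func main() {
	fmt.Println(markerKeys([]string{"index", "index/test", "index/test2"})) // [index/]
}
```

For the bucket in this thread the only marker is "index/". That key is distinct from the "index" object itself.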

I'd not thought of just doing a bucket based remote -> bucket based remote sync. Rclone could detect that you are doing that and do something a bit different...

The index file has content in it. It's not a directory marker. That's how the application stores its data.

I think that's not enough. Some bucket-based backends don't allow this kind of hierarchy (for example MinIO).

Sorry, that's not exactly what I want :slight_smile: I don't want a directory marker. I just want the copied files to have the same structure as the source (if the destination backend allows it).
I think I'm capable of implementing such a feature; I just wanted to ask for opinions on how to approach it.

I've started implementing a fix for this issue. Could you take a look at it, @ncw? https://github.com/forgems/rclone/commit/bc03f45d92b938c5246c60f005b46655db6c769d

I see what you are getting at and at a first glance the code looks OK.

Probably best if you stick it in a pull request along with some tests to exercise the new code.

Hi @ncw, I've opened the PR. Could you take a look at it? https://github.com/ncw/rclone/pull/3220

