mpcarl
(Pierre Carlson)
February 20, 2024, 7:13pm
1
What is the problem you are having with rclone?
I would like to force rclone to do an in-place copy of an existing object in s3.
Run the command 'rclone version' and share the full output of the command.
❯ rclone version
rclone v1.65.2
os/version: darwin 14.3.1 (64 bit)
os/kernel: 23.3.0 (arm64)
os/type: darwin
os/arch: arm64 (ARMv8 compatible)
go/version: go1.21.6
go/linking: dynamic
go/tags: cmount
Which cloud storage system are you using? (eg Google Drive)
S3 (IBM COS)
The command you were trying to run (eg rclone copy /tmp remote:tmp)
I am trying to do a simple 1 object copy like this:
rclone copy "INSTANCE:bucket-a/9014" "INSTANCE:bucket-a/"
I have tried different combinations of these parameters:
--no-check-dest --no-traverse --ignore-existing --ignore-times -vvv
which always results in
don't need to copy/move 9014, it is already at target location
Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.
[INSTANCE]
type = s3
env_auth = false
provider = IBMCOS
access_key_id = XXX
secret_access_key = XXX
endpoint = s3.us-south.cloud-object-storage.appdomain.cloud
location_constraint =
server_side_encryption =
storage_class =
A log from the command that you were trying to run with the -vv flag
rclone copy "INSTANCE:bucket-a/9014" "INSTANCE:bucket-a/" --no-check-dest --no-traverse --ignore-existing --ignore-times -vv
2024/02/20 13:10:35 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "copy" "INSTANCE:bucket-a/9014" "INSTANCE:bucket-a/" "--no-check-dest" "--no-traverse" "--ignore-existing" "--ignore-times" "-vv"]
2024/02/20 13:10:35 DEBUG : Creating backend with remote "INSTANCE:bucket-a/9014"
2024/02/20 13:10:35 DEBUG : Using config file from "/Users/mpcarl/.config/rclone/rclone.conf"
2024/02/20 13:10:35 DEBUG : Resolving service "s3" region "us-east-1"
2024/02/20 13:10:35 DEBUG : fs cache: adding new entry for parent of "INSTANCE:bucket-a/9014", "INSTANCE:bucket-a"
2024/02/20 13:10:35 DEBUG : Creating backend with remote "INSTANCE:bucket-a/"
2024/02/20 13:10:35 DEBUG : Resolving service "s3" region "us-east-1"
2024/02/20 13:10:35 DEBUG : fs cache: renaming cache item "INSTANCE:bucket-a/" to be canonical "INSTANCE:bucket-a"
2024/02/20 13:10:35 DEBUG : S3 bucket bucket-a: don't need to copy/move 9014, it is already at target location
2024/02/20 13:10:35 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 0.1s
2024/02/20 13:10:35 DEBUG : 7 go routines active
asdffdsa
(jojothehumanmonkey)
February 20, 2024, 7:53pm
2
hi, I tried to replicate your issue but could not; maybe my test is not correct.
rclone tree INSTANCE:bucket-a
/
└── 9014
└── file.txt
1 directories, 1 files
rclone copy INSTANCE:bucket-a/9014 INSTANCE:bucket-a -vv
2024/02/20 14:52:05 DEBUG : rclone: Version "v1.65.2" starting with parameters ["c:\\data\\rclone\\rclone.exe" "copy" "INSTANCE:bucket-a/9014" "INSTANCE:bucket-a" "-vv"]
2024/02/20 14:52:05 DEBUG : Creating backend with remote "INSTANCE:bucket-a/9014"
2024/02/20 14:52:05 DEBUG : Using config file from "c:\\data\\rclone\\rclone.conf"
2024/02/20 14:52:05 DEBUG : Resolving service "s3" region "us-east-2"
2024/02/20 14:52:06 DEBUG : Creating backend with remote "INSTANCE:bucket-a"
2024/02/20 14:52:06 DEBUG : Resolving service "s3" region "us-east-2"
2024/02/20 14:52:06 DEBUG : file.txt: Need to transfer - File not found at Destination
2024/02/20 14:52:06 DEBUG : S3 bucket bucket-a: Waiting for checks to finish
2024/02/20 14:52:06 DEBUG : S3 bucket bucket-a: Waiting for transfers to finish
2024/02/20 14:52:06 DEBUG : file.txt: md5 = 9afcb2d16863f2df14342a4143c7e45d OK
2024/02/20 14:52:06 INFO : file.txt: Copied (server-side copy)
2024/02/20 14:52:06 INFO :
Transferred: 4 B / 4 B, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Server Side Copies: 1 @ 4 B
Elapsed time: 0.3s
rclone tree INSTANCE:bucket-a
/
├── 9014
│ └── file.txt
└── file.txt
1 directories, 2 files
mpcarl
(Pierre Carlson)
February 20, 2024, 10:05pm
3
asdffdsa:
└── file.txt
In my case there is no file.txt. The object I am trying to copy is named 9014. Your test also shows that rclone creates a new copy of the object rather than overwriting the existing one.
This is the equivalent awscli command, which works as expected:
aws s3api [...] copy-object --bucket bucket-a --key 9014 --copy-source bucket-a/9014 --metadata-directive REPLACE
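A minimal sketch of that command wrapped in a shell function (assumptions: the bucket name bucket-a and the IBM COS endpoint are taken from the config earlier in the thread, and the elided [...] is taken to stand for --endpoint-url; adjust to your setup):

```shell
# Hypothetical sketch: re-upload one object in place via CopyObject,
# so bucket-level policies are re-applied without moving data over
# the network. Bucket and endpoint are assumptions from this thread.
copy_in_place() {
    key="$1"
    bucket="bucket-a"
    endpoint="https://s3.us-south.cloud-object-storage.appdomain.cloud"
    aws s3api copy-object \
        --endpoint-url "$endpoint" \
        --bucket "$bucket" \
        --key "$key" \
        --copy-source "$bucket/$key" \
        --metadata-directive REPLACE
}
```

Called as copy_in_place 9014, this issues a single CopyObject request whose source and destination are the same key.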
asdffdsa
(jojothehumanmonkey)
February 20, 2024, 10:17pm
4
ok, now I can reproduce the issue.
Can you explain the reason for running that command? Perhaps there are workarounds.
Here is a possible workaround: use two remotes, each with the exact same config as INSTANCE.
Let's call them INSTANCE01 and INSTANCE02.
rclone copy INSTANCE01:bucket-a/9014 INSTANCE02:bucket-a -vv --ignore-times --server-side-across-configs
2024/02/20 17:32:38 DEBUG : rclone: Version "v1.65.2" starting with parameters ["c:\\data\\rclone\\rclone.exe" "copy" "INSTANCE01:bucket-a/9014" "INSTANCE02:bucket-a" "-vv" "--ignore-times" "--server-side-across-configs"]
2024/02/20 17:32:38 DEBUG : Creating backend with remote "INSTANCE01:bucket-a/9014"
2024/02/20 17:32:38 DEBUG : Using config file from "c:\\data\\rclone\\rclone.conf"
2024/02/20 17:32:38 DEBUG : fs cache: adding new entry for parent of "INSTANCE01:bucket-a/9014", "INSTANCE01:bucket-a"
2024/02/20 17:32:38 DEBUG : Creating backend with remote "INSTANCE02:bucket-a"
2024/02/20 17:32:38 DEBUG : 9014: Transferring unconditionally as --ignore-times is in use
2024/02/20 17:32:39 DEBUG : 9014: md5 = 9afcb2d16863f2df14342a4143c7e45d OK
2024/02/20 17:32:39 INFO : 9014: Copied (server-side copy)
mpcarl
(Pierre Carlson)
February 21, 2024, 2:38pm
5
The reason for the in-place copy is that you can set bucket-level policies that are only applied when an object is uploaded. If I want to apply a new bucket-level policy to a large number of existing objects, I need to upload them again. To avoid all of the network transfer time, doing an in-place copy has the same effect on the objects as uploading them again.
I will try this action with two remotes and report back.
mpcarl
(Pierre Carlson)
February 21, 2024, 9:28pm
6
I was able to try this today. While it did replace the existing object, it did not do a server-side copy; it downloaded and re-uploaded the object. That is OK for a few small objects, but with millions of objects, or a set of very large objects, it would not work.
So it is still not an equivalent of the awscli command.
asdffdsa
(jojothehumanmonkey)
February 21, 2024, 9:48pm
7
I posted a full debug log that showed a server-side copy.
Without a full debug log from your run, it is hard to know exactly what is going on.
For a deeper look, post a debug log using --dump=headers.
It might be that rclone does not do what you want (though @asdffdsa's test shows that a server-side in-place copy is possible), but then you can use aws cli and rclone together.
I do something like this sometimes:
rclone lsf S3remote:bucket_name -R --files-only | xargs -P 16 -n 1 aws_script.sh
where aws_script.sh:
aws ... [your specific command] ... --key $1 ...
This will run 16 parallel aws cli jobs fed by rclone's output. Depending on your system you can scale it up or down.
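To make the fan-out pattern concrete, here is a self-contained sketch with echo standing in for the real aws_script.sh invocation (the key names are made up):

```shell
# Same shape as the rclone | xargs pipeline above: each input line
# becomes one argument to a command, with up to 16 running in parallel.
printf '%s\n' 9014 9015 9016 \
    | xargs -P 16 -n 1 echo "would copy key:"
```

Swapping echo "would copy key:" for the real per-key script runs the actual copies; output order is not guaranteed under -P.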
Or, purely server-side, use something like AWS Batch jobs; you would have to investigate whether that is possible with IBM COS, though they might have a similar option.
If that is not fast enough, then in the worst case you can write your own small program using a native S3 access library, in Rust, Go, or whatever you prefer. rclone is fantastic, but sometimes it is not enough to solve a specific problem.
mpcarl
(Pierre Carlson)
February 23, 2024, 9:30pm
9
Thanks for the assistance on this topic.
Also, aws cli is really slow (it is written in Python and takes ages to initialise on every invocation). When you have more files to process, I find it much better to use the minio mc cli, which is 10x+ faster.
system
(system)
Closed
March 24, 2024, 9:34pm
11
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.