Cluster rclone with k8s

Hello everyone,
I want to ask a question about parallel syncing using k8s.

I have a big bucket on Ceph which I want to transfer to another cluster of Ceph.
Using only one computer, I'm really limited on how many objects I can transfer per minute.

I was wondering - is there any way to create an rclone cluster? For example - defining 6 pods on k8s, each pod on a different host, that will together migrate my bucket?
It would probably require some coordination between them... Is that even possible with rclone?

hello and welcome to the forum,

is this a one-time migration?

--- folder01
--- folder02
--- folder03

if you have a structure like that, then you can use the following. no coordination is required.
rclone copy source:folder01 dest:folder01
rclone copy source:folder02 dest:folder02

Did you investigate increasing --transfers and --checkers? That can usually get rclone up to line speed.
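
For example (the numbers are only illustrative - the right values depend on your bandwidth and object sizes):

rclone copy source: dest: --transfers 32 --checkers 64 --progress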

@ncw made a good point.

  • can you post the current command?
  • with the current command, are you maxing out the network connection?

@asdffdsa Thank you :slight_smile:
Umm, it is a one-time migration (after that I will start to work with the second Ceph cluster).
Unfortunately, my bucket is not divided into folders at all - just a lot of files in one flat namespace - and running multiple instances of rclone against the same bucket would just make them copy the same files...

@ncw I have, but I have really limited bandwidth on my computer, so it takes forever (I'm using all of it). That also answers your questions, @asdffdsa.

However, I have a k8s cluster I can use, with much more bandwidth than my computer. I could deploy one pod to do the copy, but I wanted to know: is there any way to divide the work across multiple pods?

  • perhaps each pod can use some sort of rclone filter - using something about the filename and/or extension
    rclone supports globs and has limited support for regex.

  • each pod copies files within some range, by combining two flags (see the sketch after this list)
    --min-age and --max-age to copy files within a date range
    --min-size and --max-size to copy files within a size range

  • take the output of rclone lsf remote:
    split that file into 6 files - file01, file02....file06
    feed file01 to pod01 and so on....
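
for example, a sketch of the size-range idea across three pods (the boundaries are made up - pick ones that split your data roughly evenly, and double check the boundary semantics so edge-size files are not skipped or copied twice):

# run on pod01 - files up to 10M
rclone copy source: dest: --max-size 10M
# run on pod02 - files between 10M and 100M
rclone copy source: dest: --min-size 10M --max-size 100M
# run on pod03 - files over 100M
rclone copy source: dest: --min-size 100M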

I also thought about filtering, but I can't categorize the object names (they are pretty much random strings), and the data has been there for a really long time...

About feeding the file names - that sounds great. But how can I feed the file names so that I cover the whole bucket?

here is a simple script.
in this case, the source bucket has 6 files in it.

# get a list of all files in the source bucket and save it to a file named files.txt
rclone lsf source: > files.txt

# split the list into six files, one filename each (works here because the example bucket has exactly 6 files) - creates outputaa .. outputaf
split -l 1 files.txt output

#run on pod01
rclone copy source: dest: --files-from=outputaa -v
#run on pod02
rclone copy source: dest: --files-from=outputab -v
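
for a real bucket with many files, split the list into six equal chunks instead (assuming gnu split):

# split into six chunks without breaking lines - creates outputaa .. outputaf
split -n l/6 files.txt output

and here is a rough, untested sketch of driving one pod per chunk from a machine that has kubectl access and the list files. it assumes the official rclone/rclone image, that the remotes are configured inside the pods (e.g. via a mounted secret or RCLONE_CONFIG_* env vars), and an rclone version that accepts --files-from - to read the list from stdin:

# start six pods in the background, one per list file, then wait for all of them
for f in outputa?; do
  # -i streams the local list file into the pod's stdin
  kubectl run "rclone-$f" --image=rclone/rclone --restart=Never -i -- \
    copy source: dest: --files-from - -v < "$f" &
done
wait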

@asdffdsa OK, that really helps... It means I can deliver the file names to the pods in some way...

Is there any way to check if rclone finished copying other than looking at the CLI?

i use:

  • exit codes as defined in the documentation (see the sketch below this list).
  • parse the rclone log file using regex patterns.
    i have a python script that orchestrates rclone, 7zip, fastcopy, veeam and VSS.
    the rclone log can get very large, so the script, using regex patterns, filters out certain text and creates a shortened summary log.
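
for example, a minimal per-pod wrapper (exit code 0 means rclone finished without errors; the grep is just one way to pull errors out of the log):

rclone copy source: dest: --files-from=outputaa --log-file=pod01.log --log-level INFO
rc=$?
if [ $rc -eq 0 ]; then
  echo "pod01: finished ok"
else
  echo "pod01: rclone exited with code $rc - check the log"
  grep -i error pod01.log | tail -n 20
fi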

and depending on your specific use case, there might be some flags that can speed up the copy operation, such as --fast-list, --size-only, or --checksum.
