Cluster rclone with k8s

Hello everyone,
I want to ask a question about parallel syncing using k8s.

I have a big bucket on Ceph which I want to transfer to another Ceph cluster.
Using only one computer, I'm really limited in how many objects I can transfer per minute.

I was wondering - is there any way to create an rclone cluster? For example, defining 6 pods on k8s, each pod on a different host, that together migrate my bucket?
It would probably require some coordination between them... Is that even possible with rclone?

hello and welcome to the forum,

is this a one-time migration?

source
--- folder01
--- folder02
--- folder03

if you have a structure like that, then you can use the following. no coordination is required.
rclone copy source:folder01 dest:folder01
rclone copy source:folder02 dest:folder02

Did you investigate increasing --transfers and --checkers? That can usually get rclone up to line speed.
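
For example, something like this (the values here are just a starting point to experiment with, tune them for your link):

rclone copy source: dest: --transfers=32 --checkers=64 --progress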

@ncw made a good point.

  • can you post the current command?
  • with the current command, are you maxing out the network connection?

Hey,
@asdffdsa Thank you :slight_smile:
Umm, it is a one-time migration (after that I will start to work with the second Ceph cluster).
Unfortunately, my bucket is not divided into folders at all, just a lot of files in one flat namespace, and running multiple instances of rclone on the same bucket would just make them copy the same files...

@ncw I have, but I have really limited bandwidth on my computer, so it takes forever (I'm already using all the bandwidth). That also answers your questions, @asdffdsa.

However, I have a k8s cluster I can use, with much more bandwidth than my computer. I can deploy one pod to do that, but I wanted to know if there is any way to divide the work across multiple pods?

  • perhaps each pod can use some sort of rclone filter - using something about the filename and/or extension
    rclone supports globs and has limited support for regex.
    --include=*.ext
    --include=2021*

  • each pod copies files within some limit - combine two flags (see the sketch after this list)
    --min-age and --max-age to copy files within a date range
    --min-size and --max-size to copy files within a size range

  • take the output of rclone lsf remote:
    split that file into 6 files - file01, file02....file06
    feed file01 to pod01 and so on....
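
as a sketch of the date-range idea - the 30/60 day cutoffs are made up, pick values that divide your data evenly:

# pod01 - files modified within the last 30 days
rclone copy source: dest: --max-age=30d
# pod02 - files between 30 and 60 days old
rclone copy source: dest: --min-age=30d --max-age=60d
# pod03 - everything older than 60 days
rclone copy source: dest: --min-age=60d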

I also thought about filtering, but I can't categorize the object names (they're pretty much random strings) and the data has been accumulating for a really long time...

About feeding the file names - that sounds great. But how can I feed the file names so that I cover the whole bucket?

here is a simple script.
in this case, the source folder has 6 files in it.

# get a list of all files in the source bucket and save it as a file named files.txt
rclone lsf source: > files.txt

# split the list into six files, one line per file (since files.txt has six lines)
split -l 1 files.txt output

# run on pod01
rclone copy source: dest: --files-from=outputaa -v
# run on pod02
rclone copy source: dest: --files-from=outputab -v
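
for a real bucket with many files, the same idea - just split the listing into 6 roughly equal pieces without breaking lines (assuming GNU split):

# list every object in the flat bucket
rclone lsf source: --files-only > files.txt
# split into 6 chunks: partaa, partab ... partaf
split -n l/6 files.txt part
# each pod gets its own chunk, e.g. on pod01
rclone copy source: dest: --files-from=partaa -v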

@asdffdsa OK, that really helps... It means I can deliver the file names to the pods in some way...

Is there any way to check if rclone has finished copying, other than looking at the CLI?

i use:

  • exit codes, as defined in the documentation (see the sketch after this list).
  • parse the rclone log file using regex patterns.
    i have a python script that orchestrates rclone, 7zip, fastcopy, veeam and VSS.
    the rclone log can get very large. the script, using regex patterns, filters out certain text and creates a shortened summary log.
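
a minimal sketch of using the exit code from a shell wrapper - 0 means success, anything else is an error:

rclone copy source: dest: --files-from=outputaa --log-file=rclone.log -v
rc=$?
if [ "$rc" -eq 0 ]; then
    echo "copy finished ok"
else
    echo "copy failed with exit code $rc - check rclone.log"
fi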

and depending on your specific use case, there might be some flags that can speed up the copy operation, such as:
https://rclone.org/docs/#no-traverse
https://rclone.org/docs/#no-check-dest
--fast-list
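
with the --files-from approach, a hedged example combining these - note that --no-check-dest skips all destination checks, so it is only safe when the destination is empty or re-copying is acceptable:

rclone copy source: dest: --files-from=outputaa --no-traverse --no-check-dest -v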
