Cluster rclone with k8s

Hello everyone,
I want to ask a question about parallel syncing using k8s.

I have a big bucket on Ceph which I want to transfer to another cluster of Ceph.
Using only one computer, I'm really limited on how many objects I can transfer per minute.

I was wondering - is there any way to create an rclone cluster? For example - defining 6 pods on k8s, each pod on a different host, that will together migrate my bucket?
It would probably require some coordination between them... Is that even possible with rclone?

hello and welcome to the forum,

is this a one-time migration?

--- folder01
--- folder02
--- folder03

if you have a structure like that, then you can use the following. no coordination is required.
rclone copy source:folder01 dest:folder01
rclone copy source:folder02 dest:folder02

Did you investigate increasing --transfers and --checkers? That can usually get rclone up to line speed.
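
For example (the numbers are only illustrative - the right values depend on your bandwidth and object sizes):

rclone copy source: dest: --transfers 32 --checkers 64 --progress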

@ncw made a good point.

  • can you post the current command?
  • with the current command, are you maxing out the network connection?

@asdffdsa Thank you :slight_smile:
Umm, it is a one-time migration (after that I will start to work with the second Ceph cluster).
Unfortunately, my bucket is not divided into folders at all - just a lot of files in one flat namespace - and running multiple instances of rclone against the same bucket would just make them copy the same files...

@ncw I have, but I have really limited bandwidth on my computer, so it takes forever (I'm using all of it). That also answers your questions, @asdffdsa.

However, I have a k8s cluster I can use, with much more bandwidth than my computer. I could deploy one pod to do the copy, but I wanted to know: is there any way to divide the work across multiple pods?

  • perhaps each pod can use some sort of rclone filter - using something about the filename and/or extension
    rclone supports globs and has limited support for regex.

  • each pod copies files within some range, by combining two flags (see the sketch after this list)
    --min-age and --max-age to copy files within a date range
    --min-size and --max-size to copy files within a size range

  • take the output of rclone lsf remote:
    split that file into 6 files - file01, file02....file06
    feed file01 to pod01 and so on....
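
for example, a sketch of the size-range idea across three pods (the boundaries are made up - pick ones that split your data roughly evenly, and double check the boundary semantics so edge-size files are not skipped or copied twice):

# run on pod01 - files up to 10M
rclone copy source: dest: --max-size 10M
# run on pod02 - files between 10M and 100M
rclone copy source: dest: --min-size 10M --max-size 100M
# run on pod03 - files over 100M
rclone copy source: dest: --min-size 100M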

I also thought about filtering, but I can't categorize the object names (they are pretty much random strings), and the data has been there for a really long time...

About feeding the file names - that sounds great. But how can I feed the file names so that I cover the whole bucket?

here is a simple script.
in this case, the source bucket has 6 files in it.

# get a list of all files in the source bucket and save it to a file named files.txt
rclone lsf source: > files.txt

# split the list into six files, one filename each (works here because the example bucket has exactly 6 files) - creates outputaa .. outputaf
split -l 1 files.txt output

#run on pod01
rclone copy source: dest: --files-from=outputaa -v
#run on pod02
rclone copy source: dest: --files-from=outputab -v
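
for a real bucket with many files, split the list into six equal chunks instead (assuming gnu split):

# split into six chunks without breaking lines - creates outputaa .. outputaf
split -n l/6 files.txt output

and here is a rough, untested sketch of driving one pod per chunk from a machine that has kubectl access and the list files. it assumes the official rclone/rclone image, that the remotes are configured inside the pods (e.g. via a mounted secret or RCLONE_CONFIG_* env vars), and an rclone version that accepts --files-from - to read the list from stdin:

# start six pods in the background, one per list file, then wait for all of them
for f in outputa?; do
  # -i streams the local list file into the pod's stdin
  kubectl run "rclone-$f" --image=rclone/rclone --restart=Never -i -- \
    copy source: dest: --files-from - -v < "$f" &
done
wait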

@asdffdsa OK, that really helps... It means I can deliver the file names to the pods in some way...

Is there any way to check if rclone finished copying other than looking at the CLI?

i use:

  • exit codes as defined in the documentation (see the sketch below this list).
  • parse the rclone log file using regex patterns.
    i have a python script that orchestrates rclone, 7zip, fastcopy, veeam and VSS.
    the rclone log can get very large, so the script, using regex patterns, filters out certain text and creates a shortened summary log.
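
for example, a minimal per-pod wrapper (exit code 0 means rclone finished without errors; the grep is just one way to pull errors out of the log):

rclone copy source: dest: --files-from=outputaa --log-file=pod01.log --log-level INFO
rc=$?
if [ $rc -eq 0 ]; then
  echo "pod01: finished ok"
else
  echo "pod01: rclone exited with code $rc - check the log"
  grep -i error pod01.log | tail -n 20
fi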

and depending on your specific use case, there might be some flags that can speed up the copy operation, such as --fast-list, --size-only, or --checksum.
