Hi, I am the research lead of the Wayback Machine at the Internet Archive. We at the Internet Archive love rclone for its built-in support of the Internet Archive as a storage system, and we use it in our internal tooling.
Recently I was trying to copy a large list of files from the Internet Archive's Petabox to AWS S3, where the source and destination directory structures and file names were slightly different. For that, I created a TAB-separated, two-column file mapping source paths to destination paths, like this:
$ cat src_dst_path_map.txt
ia:/item1/file1 aws:/bucket1/part1/item1_file1
ia:/item1/file2 aws:/bucket1/part1/item1_file2
ia:/item2/file3 aws:/bucket1/part1/item2_file3
ia:/item2/file4 aws:/bucket1/part2/item2_file4
Then I wrote a small script to consume this mapping file and run rclone copyto for each line:
$ cat rclone_map_copier.sh
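#!/usr/bin/env bash
# Read "source<TAB>destination" pairs from the given file(s), or from stdin if none are given.
# Splitting on TAB only lets paths contain spaces.
while IFS=$'\t' read -r src dst
do
  # Quote both paths so the shell does not word-split or glob them.
  rclone copyto --progress "$src" "$dst"
done < <(cat "$@")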
Finally, I ran it as follows:
$ ./rclone_map_copier.sh src_dst_path_map.txt
This approach works, but it spawns a new rclone process for each line in the mapping file, which adds a small startup and teardown overhead per file.
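As a partial mitigation rather than a fix, the per-process overhead can be overlapped by running several copies concurrently. The following is a minimal sketch, assuming an xargs that supports the non-POSIX -0 and -P options (GNU and BSD versions do). The names copy_map_parallel, RCLONE, and JOBS are hypothetical and introduced only for this example. It still starts one rclone process per line, and --progress is dropped because concurrent progress output would interleave.

```shell
#!/usr/bin/env bash
# Hypothetical knobs for this sketch: the command to run and the parallelism level.
RCLONE="${RCLONE:-rclone}"
JOBS="${JOBS:-4}"

copy_map_parallel() {
  # Keep only lines with exactly two TAB-separated fields and emit each field
  # on its own line, then convert newlines to NULs so xargs receives every
  # path as a single argument (spaces inside paths are preserved).
  # xargs then runs "$RCLONE copyto <src> <dst>", up to $JOBS at a time.
  # Note: on GNU xargs an empty input still runs the command once; add -r to avoid that.
  awk -F'\t' 'NF == 2 { print $1; print $2 }' "$@" |
    tr '\n' '\0' |
    xargs -0 -n 2 -P "$JOBS" "$RCLONE" copyto
}
```

After sourcing the snippet, it would be called as copy_map_parallel src_dst_path_map.txt, optionally with JOBS set to a different value. Malformed lines are skipped silently in this sketch; a real script would probably want to report them.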
I did see that rclone supports the --files-from and --files-from-raw options, but they only accept a list of source files, not their corresponding destinations.
It would be nice to either allow an optional second column in the input file of the --files-from/--files-from-raw options or introduce a new pair of flags, such as --src-dst-map/--src-dst-map-raw, that accept such a two-column input file.
PS: If a feature or workaround already exists that achieves this more efficiently, I would be glad to learn about it.