I usually recommend Rclone to users who want to archive data, but I frequently see that users in an HPC-centric research community struggle with the process, e.g.
- finding the data they should archive, because huge files often sit in a forgotten folder seven levels deep
- remembering where the data was archived to and retrieving it quickly when needed
- shepherding archiving jobs of hundreds of TiB that are sometimes interrupted and must then be remembered and resumed
- fewer than 3% of the observed data copy jobs are currently checksummed, because it takes extra effort or users do not know what a checksum is
- deciding whether to delete local data is hard, and comparing source and target takes extra time
- users find working with AWS Glacier cumbersome
This tool is mostly a wrapper around Rclone: it keeps track of some metadata in CSV files and interacts with Glacier and S3-compatible storage. It is designed to be easily replaceable if something better comes along in the future. Excuse the coding style; most of it was generated by ChatGPT-4.
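To illustrate the CSV-metadata idea mentioned above, here is a minimal sketch of how an archive manifest could be kept: one row per archived item, recording the local path, the remote destination, a checksum, and a timestamp. The function and column names (`record_archive`, `archive_manifest.csv`, etc.) are my own assumptions for illustration, not this tool's actual schema.

```python
import csv
import hashlib
import os
import tempfile
import time

def sha256sum(path, bufsize=1 << 20):
    """Stream a file through SHA-256 so huge files are never loaded into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def record_archive(manifest_csv, local_path, remote_url):
    """Append one row (path, destination, checksum, timestamp) to the manifest.

    A header row is written only when the manifest does not exist yet.
    """
    new_file = not os.path.exists(manifest_csv)
    with open(manifest_csv, "a", newline="") as f:
        w = csv.writer(f)
        if new_file:
            w.writerow(["local_path", "remote_url", "sha256", "archived_at"])
        w.writerow([
            local_path,
            remote_url,
            sha256sum(local_path),
            time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        ])

# Demo with a throwaway file (hypothetical bucket name)
with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "results.bin")
    with open(data, "wb") as f:
        f.write(b"some archived bytes")
    manifest = os.path.join(d, "archive_manifest.csv")
    record_archive(manifest, data, "s3:my-bucket/project-x/results.bin")
    print(open(manifest).read())
```

Keeping this manifest in plain CSV is what makes the tool easy to replace later: any future system (or a plain spreadsheet) can read where the data went and verify it against the stored checksum.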