What is the problem you are having with rclone?
I'm trying to copy only directories that have a certain file in them.
For example, if a directory is expected to contain three files, I would like to copy the directory only once its .gpg.md5 file is present, and otherwise leave it alone.
An ideal parameter to have would be --include-if-present. That way I could specify something like --include-if-present="*.tar.gz.gpg.md5". If that isn't possible with the pattern, the client uploading to the bucket could instead mark each directory as completed by placing a marker file such as ".completed" (the inverse of ".ignore"), and I would then specify --include-if-present=".completed".
The scenario is: an S3 bucket is continuously being populated with directories and files.
A client retrieves these files continuously, and should avoid moving (copying, then deleting) files from a directory that has not yet been fully uploaded.
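For reference, the receiver-side guard being described — only touch directories that contain the marker file — can be sketched locally like this (a sketch, assuming GNU find; the batch1/batch2 names and the throwaway tree are made up for illustration):

```shell
# Build a throwaway tree standing in for the synced bucket contents:
# batch1 has finished uploading (marker present), batch2 has not.
root=$(mktemp -d)
mkdir -p "$root/batch1" "$root/batch2"
touch "$root/batch1/.completed"

# Print only the directories that contain the ".completed" marker.
find "$root" -name .completed -printf '%h\n' | sed "s|^$root/||"

rm -rf "$root"
# prints: batch1
```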
Am I maybe missing something with the filters that makes this possible already?
What is your rclone version (output from `rclone version`)
Which OS you are using and how many bits (eg Windows 7, 64 bit)
RHEL 7 x64
Which cloud storage system are you using? (eg Google Drive)
The command you were trying to run (eg `rclone copy /tmp remote:tmp`)
The rclone config contents with secrets removed.
A log from the command with the `-vv` flag
I don't think you can do this with filters directly.
So I take it the .tar.gz.gpg.md5 is the last one written in the process?
So what you could do is a two-pass copy: in the first pass, find all the directories you could copy, then massage that into an include file for the copy, something like
rclone lsf -R --absolute --files-only /path/to/source --include "*.tar.gz.gpg.md5" | sed 's/\/[^\/]*$/\/*/g' > include-dirs
Then do the copy with that
rclone copy /path/to/source remote:dest --include-from include-dirs
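As a sanity check on the sed rewrite above, here is what it does to the lsf output: it replaces each trailing filename with `*`, turning a list of marker files into per-directory include globs (the sample paths are made up):

```shell
# Each path to a found marker file becomes a glob covering its directory.
printf '%s\n' '/batch1/data.tar.gz.gpg.md5' '/batch2/sub/data.tar.gz.gpg.md5' \
  | sed 's/\/[^\/]*$/\/*/g'
# /batch1/*
# /batch2/sub/*
```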
Depending on exactly how many files that is, that might make a really long filter, so you might be better off making a list of directories to exclude.
You could also use the --min-age flag: if you set that to 5m, say, then assuming the process that generates a directory never takes longer than 5 minutes, that might be a solution.
Thanks for the quick reply!
> So I take it the .tar.gz.gpg.md5 is the last one written in the process?
It probably will be in my case, but it might as well be the .completed file marker that would be uploaded after the entire batch of data is done uploading.
> So what you could do is do a two pass copy, so first pass, find all the directories you could copy then massage this into an include file for the copy
Great idea! I'll give this a try. The number of files going through this pipeline will be in the hundreds of thousands, a few PB in total, but if I pull the data away quickly enough the filter size will stay manageable.
> [...] you might be better off making a list of directories to exclude.
This won't work, since the bucket is being continuously uploaded to.
> You could also use the --min-age flag
I discussed this with the client uploading the data, and it might not be the best idea: if the upload process fails on a Friday and isn't fixed until Monday, --min-age would not help.
I opened an issue on GitHub a while ago about file transfer order: https://github.com/rclone/rclone/issues/3975
In the GitHub issue I was trying to solve:
- upload files for each batch in a certain order so the receiver knows when the batch is done
Now I'm on the receiving end and need to solve:
- only download data batches that have been uploaded completely
A combination of these two features would make large scale continuous file transfers much easier with rclone, without the need for scripting around it and maintaining file / directory lists.
Thank you for your work on rclone. I use it a lot!
Have a go with the filter approach and see if that works.
That probably isn't too difficult; the most difficult thing will be working out the command-line interface!
An --include-if-present flag is probably quite hard, though, as it would subvert the directory scanning.