What is the problem you are having with rclone?
I'm trying to copy only directories that have a certain file in them.
For example, if a directory is expected to contain three files, I would like to copy the directory only once its .gpg.md5 file is present, and otherwise leave it alone.
An ideal parameter to have would be --include-if-present. That way I could specify something like --include-if-present="*.tar.gz.gpg.md5". If that isn't possible with the pattern, the client uploading to the bucket could instead mark each directory as completed by placing a marker file such as ".completed" (the inverse of ".ignore"), and I would then specify --include-if-present=".completed".
The scenario is: an S3 bucket is continuously being populated with directories and files.
A client retrieves these files continuously, and should avoid moving (copying, then deleting) files from a directory that has not yet been fully uploaded.
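For reference, the receiver-side guard being described — only touch directories that contain the marker file — can be sketched locally like this (a sketch, assuming GNU find; the batch1/batch2 names and the throwaway tree are made up for illustration):

```shell
# Build a throwaway tree standing in for the synced bucket contents:
# batch1 has finished uploading (marker present), batch2 has not.
root=$(mktemp -d)
mkdir -p "$root/batch1" "$root/batch2"
touch "$root/batch1/.completed"

# Print only the directories that contain the ".completed" marker.
find "$root" -name .completed -printf '%h\n' | sed "s|^$root/||"

rm -rf "$root"
# prints: batch1
```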
Am I maybe missing something with the filters that makes this possible already?
What is your rclone version (output from `rclone version`)
Which OS you are using and how many bits (eg Windows 7, 64 bit)
RHEL 7 x64
Which cloud storage system are you using? (eg Google Drive)
The command you were trying to run (eg `rclone copy /tmp remote:tmp`)
The rclone config contents with secrets removed.
A log from the command with the `-vv` flag
I don't think you can do this with filters directly.
So I take it the .tar.gz.gpg.md5 is the last one written in the process?
So what you could do is a two-pass copy: in the first pass, find all the directories you could copy, then massage that into an include file for the copy, something like
rclone lsf -R --absolute --files-only /path/to/source --include "*.tar.gz.gpg.md5" | sed 's/\/[^\/]*$/\/*/g' > include-dirs
Then do the copy with that
rclone copy /path/to/source remote:dest --include-from include-dirs
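As a sanity check on the sed rewrite above, here is what it does to the lsf output: it replaces each trailing filename with `*`, turning a list of marker files into per-directory include globs (the sample paths are made up):

```shell
# Each path to a found marker file becomes a glob covering its directory.
printf '%s\n' '/batch1/data.tar.gz.gpg.md5' '/batch2/sub/data.tar.gz.gpg.md5' \
  | sed 's/\/[^\/]*$/\/*/g'
# /batch1/*
# /batch2/sub/*
```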
Depending on exactly how many files that is, that might make a really long filter, so you might be better off making a list of directories to exclude.
You could also use the --min-age flag: if you set that to 5m, say, then assuming the process that generates a directory never takes longer than 5 minutes, that might be a solution.
Thanks for the quick reply!
> So I take it the .tar.gz.gpg.md5 is the last one written in the process?
It probably will be in my case, but it might as well be the .completed file marker that would be uploaded after the entire batch of data is done uploading.
> So what you could do is do a two pass copy, so first pass, find all the directories you could copy then massage this into an include file for the copy
Great idea! I'll give this a try. The number of files going through this pipeline will be in the hundreds of thousands, a few PB in total, but if I pull the data away quickly enough the filter size will stay manageable.
> [...] you might be better off making a list of directories to exclude.
This won't work, since the bucket is being continuously uploaded to.
> You could also use the --min-age flag
I discussed this with the client uploading the data, and it might not be the best idea: if the upload process fails on a Friday and isn't fixed until Monday, --min-age would not help.
I opened an issue on GitHub a while ago about file transfer order: https://github.com/rclone/rclone/issues/3975
In the GitHub issue I was trying to solve:
- upload files for each batch in a certain order so the receiver knows when the batch is done
Now I'm on the receiving end and need to solve:
- only download data batches that have been uploaded completely
A combination of these two features would make large scale continuous file transfers much easier with rclone, without the need for scripting around it and maintaining file / directory lists.
Thank you for your work on rclone. I use it a lot!
Have a go with the filter approach and see if that works.
That probably isn't too difficult; the most difficult thing will be working out the command-line interface!
An --include-if-present flag is probably quite hard, though, as it would subvert the directory scanning.