Rclone copy using multiple --include expressions for file names

yes, i do the exact same thing; using python for regex.
i create the rules using https://pythex.org/

hopefully, rclone regex support will include --include and not exclude --exclude

please see my topic about this here

I tried this option as well, but nothing changed in the rclone log file.

"2021/10/08 10:46:51 DEBUG : rclone: Version "v1.57.0-beta.5681.3df0b2fe1.fix-sftp-timeout" starting with parameters ["rclone" "copy" "--max-depth=1" "--include=anon_PRODUCT.????????.A901" "--sftp-host=remote_server" ":sftp:/demo_out/" "--sftp-user=minio" "--sftp-pass=" "/opt/SP/minio/datasync/demo1" "--log-file=/var/log/rclone/rclone_pll_no_file.log" "-vvv" "-P" "--error-on-no-transfer"]
2021/10/08 10:46:51 DEBUG : Creating backend with remote ":sftp:/demo_out/"
2021/10/08 10:46:51 DEBUG : :sftp: detected overridden config - adding "{11qEL}" suffix to name
2021/10/08 10:46:51 DEBUG : Using config file from "/opt/SP/minio/.config/rclone/rclone.conf"
2021/10/08 10:46:51 DEBUG : sftp://minio@remote_server:22//demo_out/: New connection Local_server:35984->remote_server:22 to "SSH-2.0-OpenSSH_5.3"
2021/10/08 10:46:52 DEBUG : fs cache: renaming cache item ":sftp:/demo_out/" to be canonical ":sftp{11qEL}:/demo_out/"
2021/10/08 10:46:52 DEBUG : Creating backend with remote "/opt/SP/minio/datasync/demo1"
2021/10/08 10:46:52 DEBUG : sftp://minio@remote_server:22//demo_out/: checking "md5sum" command: "d41d8cd98f00b204e9800998ecf8427e -"
2021/10/08 10:46:52 DEBUG : sftp://minio@remote_server:22//demo_out/: checking "sha1sum" command: "da39a3ee5e6b4b0d3255bfef95601890afd80709 -"
2021/10/08 10:46:52 DEBUG : Saving config "md5sum_command" in section ":sftp" of the config file
2021/10/08 10:46:52 NOTICE: Can't save config "md5sum_command" for on the fly backend ":sftp"
2021/10/08 10:46:52 DEBUG : Saving config "sha1sum_command" in section ":sftp" of the config file
2021/10/08 10:46:52 NOTICE: Can't save config "sha1sum_command" for on the fly backend ":sftp"
2021/10/08 10:46:52 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: Excluded
2021/10/08 10:46:52 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: Excluded
2021/10/08 10:46:52 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901_DND: Excluded
2021/10/08 10:46:52 DEBUG : Local file system at /opt/SP/minio/datasync/demo1: Waiting for checks to finish
2021/10/08 10:46:52 DEBUG : Local file system at /opt/SP/minio/datasync/demo1: Waiting for transfers to finish
2021/10/08 10:46:52 INFO : There was nothing to transfer
2021/10/08 10:46:52 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 0.5s

2021/10/08 10:46:52 DEBUG : 11 go routines active"

To add more clarity: I am now copying files with two patterns. A file matching one pattern is present in the source directory, but no file matches the other pattern. In the log we can see that the file matching the existing pattern was copied successfully, but for the other pattern we have no clue whether the file is simply not present at the source, or whether rclone tried to copy it and it was not there. How can I tell that a file is missing for a given pattern when using rclone --include?

2021/10/08 11:02:24 DEBUG : rclone: Version "v1.57.0-beta.5681.3df0b2fe1.fix-sftp-timeout" starting with parameters ["rclone" "copy" "--max-depth=1" "--include=anon_PRODUCT.????????.A901" "--include=DWH.2MMO.SERVICE_AGREEMENT.????????.A901" "--sftp-host=remote_server" ":sftp:/demo_out/" "--sftp-user=minio" "--sftp-pass=" "/opt/SP/minio/datasync/demo1" "--log-file=/var/log/rclone/rclone_pll_no_file.log" "-vvv" "-P"]
2021/10/08 11:02:24 DEBUG : Creating backend with remote ":sftp:/demo_out/"
2021/10/08 11:02:24 DEBUG : :sftp: detected overridden config - adding "{11qEL}" suffix to name
2021/10/08 11:02:24 DEBUG : Using config file from "/opt/SP/minio/.config/rclone/rclone.conf"
2021/10/08 11:02:24 DEBUG : sftp://minio@remote_server:22//demo_out/: New connection local_server:36670->remote_server:22 to "SSH-2.0-OpenSSH_5.3"
2021/10/08 11:02:25 DEBUG : fs cache: renaming cache item ":sftp:/demo_out/" to be canonical ":sftp{11qEL}:/demo_out/"
2021/10/08 11:02:25 DEBUG : Creating backend with remote "/opt/SP/minio/datasync/demo1"
2021/10/08 11:02:25 DEBUG : sftp://minio@remote_server:22//demo_out/: checking "md5sum" command: "d41d8cd98f00b204e9800998ecf8427e -"
2021/10/08 11:02:25 DEBUG : sftp://minio@remote_server:22//demo_out/: checking "sha1sum" command: "da39a3ee5e6b4b0d3255bfef95601890afd80709 -"
2021/10/08 11:02:25 DEBUG : Saving config "md5sum_command" in section ":sftp" of the config file
2021/10/08 11:02:25 NOTICE: Can't save config "md5sum_command" for on the fly backend ":sftp"
2021/10/08 11:02:25 DEBUG : Saving config "sha1sum_command" in section ":sftp" of the config file
2021/10/08 11:02:25 NOTICE: Can't save config "sha1sum_command" for on the fly backend ":sftp"
2021/10/08 11:02:25 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901_DND: Excluded
2021/10/08 11:02:25 DEBUG : Local file system at /opt/SP/minio/datasync/demo1: Waiting for checks to finish
2021/10/08 11:02:25 DEBUG : Local file system at /opt/SP/minio/datasync/demo1: Waiting for transfers to finish
2021/10/08 11:02:25 DEBUG : sftp cmd = /demo_out/DWH.2MMO.SERVICE_AGREEMENT.20210706.A901
2021/10/08 11:02:25 DEBUG : sftp output = "f888fd81cb9426a1aac0bb69919a12a1 /demo_out/DWH.2MMO.SERVICE_AGREEMENT.20210706.A901\n"
2021/10/08 11:02:25 DEBUG : sftp hash = "f888fd81cb9426a1aac0bb69919a12a1"
2021/10/08 11:02:25 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: md5 = f888fd81cb9426a1aac0bb69919a12a1 OK
2021/10/08 11:02:25 INFO : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: Copied (new)
2021/10/08 11:02:25 INFO :
Transferred: 34.130 MiB / 34.130 MiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 1.1s

2021/10/08 11:02:25 DEBUG : 14 go routines active

if you want to know if a specific file exists then try something like this

rclone lsf ./source/ --include=DWH.2MMO.SERVICE_AGREEMENT.20210706.A901 > matches.txt

if [ -s matches.txt ]; then
	echo files matched
	rclone copy ./source ./dest --files-from=matches.txt  -vv
else
	echo files NOT matched
fi

and the output would be

files matched
2021/10/08 09:12:36 DEBUG : rclone: Version "v1.56.2" starting with parameters ["rclone" "copy" "./source" "./dest" "--files-from=matches.txt" "-vv"]
2021/10/08 09:12:36 DEBUG : DWH.2MMO.BILLING_PARAMETERS.20170425.A901: Excluded
2021/10/08 09:12:36 DEBUG : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: md5 = 822fc3c8096b254459da67446d33437a OK
2021/10/08 09:12:36 INFO  : DWH.2MMO.SERVICE_AGREEMENT.20210706.A901: Copied (new)

if that is not what you want, then post a detailed example
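
For the case with several patterns, the same idea can be applied once per pattern. The following is a minimal sketch, not a tested recipe: `report_unmatched` is a made-up helper name, the source path is the one from the example above, and it assumes `rclone lsf` with `--include`, `--max-depth` and `--files-only` behaves as documented.

```shell
#!/bin/sh
# Print every pattern that matches no file directly under the source.
# report_unmatched is a hypothetical helper name, not an rclone feature.
report_unmatched() {
    src=$1
    shift
    missing=0
    for pattern in "$@"; do
        # lsf prints one matching file name per line; empty output means no match.
        matches=$(rclone lsf "$src" --max-depth 1 --files-only --include "$pattern")
        if [ -z "$matches" ]; then
            echo "no file matches: $pattern"
            missing=1
        fi
    done
    # Return 1 if at least one pattern had no match, 0 otherwise.
    return "$missing"
}

# Example call (patterns taken from the logs above):
# report_unmatched ./source \
#     'anon_PRODUCT.????????.A901' \
#     'DWH.2MMO.SERVICE_AGREEMENT.????????.A901'
```

Note that an rclone error (for example an unreachable remote) also produces empty output here, so a reported pattern means "nothing listed", not necessarily "file absent".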

Let me explain what I want, in case anyone can help with this.

I have a remote server A and a local server B. I pull files from server A to server B into Dir AB.
Once the files are copied to Dir AB, they are uploaded to an S3 bucket using rclone. The whole process is automated with a cron schedule.

Once the data is copied to the S3 bucket, the application processes it and removes it from the bucket.

Then the next cron run copies the same data from Dir AB to the S3 bucket again, even though it was already processed and deleted by the application, because the data still exists in Dir AB, which has a data retention of 7 days. Is there any way to keep a list of the data already copied to the bucket, so that the same data is not copied again from Dir AB even after it has been deleted from the S3 bucket?

Please let me know if you have any questions. I may not be good at explaining the requirements :frowning:

Is there also a way to copy the source path as well, I mean the full directory structure?
For example, if the source path is /tmp/source-file and the destination is /datasync, rclone would copy the file together with its directory structure, as /datasync/tmp/source-file. Is that possible with any flag in rclone?
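
One way to get this layout without a dedicated flag is to build the destination directory from the source path. The sketch below is only an illustration under assumptions: `mirror_dest` is a hypothetical helper, and it relies on `rclone copy` placing a single source file inside the given destination directory.

```shell
#!/bin/sh
# Build the destination directory that mirrors the source file's absolute path.
# mirror_dest is a hypothetical helper name, not part of rclone.
mirror_dest() {
    src_file=$1
    dest_root=${2%/}                      # drop a trailing slash, if any
    rel_dir=$(dirname "${src_file#/}")    # /tmp/source-file -> tmp
    printf '%s/%s\n' "$dest_root" "$rel_dir"
}

# Example: intended to place /tmp/source-file at /datasync/tmp/source-file
# rclone copy /tmp/source-file "$(mirror_dest /tmp/source-file /datasync)"
```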

sorry, not understanding your example and grammar.
a good detailed example with real paths would be helpful.

Let me try to explain in a bit more detail.

  1. Files are copied into the folder "/opt/SP/minio/datasync/demo_skip".
  2. From the folder "/opt/SP/minio/datasync/demo_skip" I copy the data to a minio bucket using rclone:
    rclone copy /opt/SP/minio/datasync/demo_skip s3_bucket:/demo_skip/
  3. The folder "/opt/SP/minio/datasync/demo_skip" has a file retention period of 30-60 days.
  4. The application consumes the files from s3_bucket:/demo_skip/ and deletes them, so "s3_bucket:/demo_skip/" is empty afterwards.
  5. A cron job running in the background copies the files again from /opt/SP/minio/datasync/demo_skip to s3_bucket:/demo_skip/, so the application finds the same data again and processes it :frowning:
    Is there any way to stop rclone from copying files to the bucket again and again once they have already been copied?

I tried to export the list of copied files to a file, like below.

rclone md5sum /opt/SP/minio/datasync/demo_skip >> /tmp/filescopied

#cat /tmp/filescopied
d41d8cd98f00b204e9800998ecf8427e test1
d41d8cd98f00b204e9800998ecf8427e test4
d41d8cd98f00b204e9800998ecf8427e test8
c625c9b8ac50605c1da62b6e2cf23268 test3
d41d8cd98f00b204e9800998ecf8427e test9
d41d8cd98f00b204e9800998ecf8427e opt/SP/edwdata/ab_data_mount/main/serial/VFDE/public/DWH_PUB/main/mozart/incoming/DWH.2MMO.SERVICE_AGREEMENT.20210704.A901
60b8b876699360cba8fe70ae0e1ab909 test2
f888fd81cb9426a1aac0bb69919a12a1 opt/SP/edwdata/ab_data_mount/main/serial/VFDE/public/DWH_PUB/main/mozart/incoming/DWH.2MMO.SERVICE_AGREEMENT.20210702.A901
f888fd81cb9426a1aac0bb69919a12a1 DWH.2MMO.SERVICE_AGREEMENT.20210702.A901

Now I am using rclone copy with the --exclude-from flag:
rclone copy --exclude-from=/tmp/filescopied /opt/SP/minio/datasync/demo_skip s3_bucket:/demo_skip -vvv -P
but it does not work. When I redirect only the list of file names to the filescopied file, it works, but I want a delta check, so that a file is copied to the bucket again only if the source file has changed.

Can you help me set up logic that uses the checksum and file name with --exclude-from=/tmp/filescopied, so that rclone decides whether it needs to copy a file or ignore the already transferred ones?

Let me know if this is clear or if you have any questions.
Thanks in advance for your help :slight_smile:
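
One likely reason the `--exclude-from` attempt fails is that rclone reads that file as one filter pattern per line, so a line such as `d41d8cd98f00b204e9800998ecf8427e test1` is treated as a pattern and does not match the file `test1`. A possible workaround is to keep the `rclone md5sum` output as a ledger and pass only new or changed files to `--files-from`. The sketch below uses hypothetical helper names (`pending_lines`, `paths_only`, `copy_pending`) and has not been run against the setup in this thread.

```shell
#!/bin/sh
# Print lines of the current "hash  path" listing ($2) that are absent from
# the ledger ($1), i.e. files that are new or whose checksum has changed.
pending_lines() {
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}

# Strip the leading checksum so only the path remains on each line.
paths_only() {
    sed -E 's/^[0-9a-f]+ +//'
}

# Copy only pending files, then record them in the ledger.
# Arguments: source, destination, ledger file (all chosen by the caller).
copy_pending() {
    src=$1
    dst=$2
    ledger=$3
    work=$(mktemp -d)
    touch "$ledger"
    rclone md5sum "$src" > "$work/current"
    pending_lines "$ledger" "$work/current" > "$work/pending"
    if [ -s "$work/pending" ]; then
        paths_only < "$work/pending" > "$work/files"
        # Append to the ledger only if the copy succeeded.
        rclone copy "$src" "$dst" --files-from "$work/files" -v &&
            cat "$work/pending" >> "$ledger"
    fi
    rm -rf "$work"
}

# Example call with the paths from this post:
# copy_pending /opt/SP/minio/datasync/demo_skip s3_bucket:/demo_skip /tmp/filescopied
```

Because the ledger stores the checksum together with the path, a file whose content changes produces a new line and is copied again, while an unchanged file is skipped even after it has been deleted from the bucket.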

once the files are copied from local to minio, and within the retention period, are the source files used for any purpose?

Yes, other applications also use those source files for their own purposes. That is the reason we don't have any control over the source directory path and files.

Please help if there is anything we can do using rclone itself, or any script we can put in between to check this :frowning:

after the application has processed the files, if you can move them to a new folder, you might use
https://rclone.org/docs/#compare-dest-dir

But our job runs via rclone, so it would do the same thing again: it copies the files from the remote server to the local intermediate directory, and then copies the same data to the bucket again if the application has deleted the files from the bucket after processing them :frowning:

you have a complex use-case.

as i understand it, the issue is

  1. the source will copy files to dest
  2. the application in the dest will process the dest files and then delete the dest files.
  3. goto step 1

the goal is to prevent the source from being copied for a second time to the dest.

is that correct?

Yes, correct. In addition, if a source file changes in size or is updated, it should be copied to the dest.

then i think that my suggestion should work.

  1. on source: rclone copy sftp:/incoming /datasync --compare-dest=/alreadyprocessed
  2. on dest: application processes files in /datasync. do not delete the processed files.
  3. on dest: mv /datasync/* /alreadyprocessed/
  4. goto 1.
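
The cycle above could be scripted roughly as follows. This is a sketch under assumptions: `pull_new` and `archive_processed` are hypothetical helper names, the paths are the ones from the list, and `--compare-dest` is expected to skip any source file that has an identical copy in /alreadyprocessed. Moving the contents of /datasync (rather than the directory itself) keeps the relative paths that the comparison relies on.

```shell
#!/bin/sh
# Step 1: copy only files that have no identical copy in the compare directory.
# Arguments: source, destination, compare directory.
pull_new() {
    rclone copy "$1" "$2" --compare-dest="$3" -v
}

# Step 3: move processed files into the archive, keeping their relative paths.
# Arguments: processed directory, archive directory.
archive_processed() {
    src=${1%/}
    dst=${2%/}
    find "$src" -type f | while IFS= read -r file; do
        rel=${file#"$src"/}                  # path relative to the source dir
        mkdir -p "$dst/$(dirname "$rel")"
        mv -f "$file" "$dst/$rel"
    done
}

# One pass of the loop (step 2, the application run, happens in between):
# pull_new sftp:/incoming /datasync /alreadyprocessed
# archive_processed /datasync /alreadyprocessed
```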

Let me test this and update you
Thanks for your help so far

i see that you marked this as solved, great!

curious as to how my suggestion worked out for you?

I followed the same process you mentioned:

  1. rclone copy /datasync/ minio:bucket/ --compare-dest=/alreadyprocessed
  2. rclone move minio:bucket /alreadyprocessed
  3. Then step 1 again.

By comparing /alreadyprocessed with sftp:/incoming, rclone does not copy files that are already present in /alreadyprocessed, and copies only when there are new files or a file has changed.
The only downside is that, because of the comparison, we need double the space: one copy in the local directory and one in the intermediate directory.
