Copying changed files from source into sub-folders of a destination parent folder on S3

caberson · February 25, 2019, 6:59pm

Use case:
We currently have a job that uses rclone copy to copy changed files to a S3 bucket. We now want to put only the updated files into a sub folder/bucket within the parent bucket. Something like:
s3://parent-bucket/2019-01-01/changed-files-on-2019-01-01-go-here-after-job
s3://parent-bucket/2019-01-02/changed-files-on 2019-01-02-go-here-after-job

Question:
Is there a way to accomplish this on S3 using rclone, where all the files, regardless of sub-folders (buckets), are considered under a main bucket? If not, does anyone have suggestion on doing this?

ncw · February 26, 2019, 9:43am

I’m not 100% clear what you want to achieve, but I think using –backup-dir might be helpful. This will put the files that would be overwritten in a sync into a backup directory.

caberson · March 1, 2019, 12:12am

@ncw,
Thanks for the reply. I don’t believe the backup-dir option suits our need as we want the newer (and not the older) files in the dated directory. Reason for doing this is that we process the updated files and we would like to cut down on the time needed to look up changed files.

After searching for a while, I don’t think this is possible so I probably need a different approach. Like parsing the dry-run logs from a “copy” operation and use that list instead.

ncw · March 1, 2019, 8:07am

You can use rclone lsf --files-only --min-age or --max-age to generate lists of objects. You can then feed these back into rclone with the --files-from flag - that might help.

caberson · March 1, 2019, 4:20pm

@ncw, when using the --files-from flag for rclone copy, will it download unchanged files if the file is in the file list? If not, then what you suggested will definitely be easier than parsing the dry run logs from a copy command.

calisro · March 1, 2019, 4:27pm

notebook2:~$ echo go1.12.linux-amd64.tar.gz > filefrom
notebook2:~$ rclone copy --files-from filefrom . robgd: -v 
2019/03/01 11:26:03 INFO  : Google drive root '': Waiting for checks to finish
2019/03/01 11:26:03 INFO  : Google drive root '': Waiting for transfers to finish
2019/03/01 11:26:12 INFO  : go1.12.linux-amd64.tar.gz: Copied (new)
2019/03/01 11:26:12 INFO  : 
Transferred:   	  121.334M / 121.334 MBytes, 100%, 12.762 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:        9.5s

notebook2:~$ rclone copy --files-from filefrom . robgd: -v 
2019/03/01 11:26:17 INFO  : Google drive root '': Waiting for checks to finish
2019/03/01 11:26:17 INFO  : Google drive root '': Waiting for transfers to finish
2019/03/01 11:26:17 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 1 / 1, 100%
Transferred:            0 / 0, -
Elapsed time:       900ms

It won’t force the file to be reupload/downloaded. It is essentially just a filter for the sync/copy.