GCS: auto-create subfolders based on file creation time?

What is the problem you are having with rclone?

I am regularly running 'rclone copy' to copy files from local folders to a Google Cloud Storage bucket.
Each local folder contains 1 week worth of specific log files (rolling delete after 1 week).

In the GCS bucket, I have a hierarchical folder structure similar to:
bucket/log_type/owner/date/

I supply the /log_type/owner prefixes manually as shown in the rclone command below. The date sub-folder should be computed automatically from the file creation date, and added as a prefix to each file encountered in the source directory. Is there such an option in rclone?

What is your rclone version (output from rclone version)

rclone v1.51.0

  • os/arch: linux/amd64
  • go version: go1.13.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18, 64 bit

Which cloud storage system are you using? (eg Google Drive)

GCP Storage buckets

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone: Version "v1.51.0" starting with parameters ["rclone" "copy" "/my_source_dir" "my_remote:my_bucket/log_type/owner/" "-vv"]

hi, not sure if i understand what you want.

if you copy the entire directory into a new folder on the remote each time, you will end up with duplicate files in each new folder.

perhaps you can use sync and --backup-dir.
rclone sync /my_source_dir my_remote:my_bucket/pictures/current --backup-dir=my_remote:my_bucket/pictures/current_date/

and you can use --dry-run when testing.
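
for example, something like this (just a sketch, assuming a bash shell; $(date +%F) is the date the sync runs, not the creation date of each file):

# changed/deleted files get moved into a backup dir named after today's date
rclone sync /my_source_dir my_remote:my_bucket/pictures/current --backup-dir=my_remote:my_bucket/pictures/$(date +%F) --dry-run -vv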

Thank you for your input. My initial post was unclear, so I have rewritten it :slight_smile:

Not yet!

I'd probably use some variant of $(date) in the script that does the transfer. That doesn't quite get the creation date into the file name, but it is close...
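
Something like this, perhaps (just a sketch using the bucket layout from your post; $(date +%F) expands to e.g. 2020-03-15 at the time the script runs, not to each file's creation date):

# copy the source files into a date-named prefix under log_type/owner
rclone copy /my_source_dir my_remote:my_bucket/log_type/owner/$(date +%F)/ -vv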

sync and --backup-dir combined with a current date+time could be helpful to newbies.

and then there would be the issue of how to format date+time.
perhaps just use the format from the linux date command, or Go's time formatting.
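
a few possibilities from the linux date command, just as illustration:

date +%F          # 2020-03-15
date +%Y%m%d      # 20200315
date +%F_%H%M     # 2020-03-15_2359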

Thanks for your response!

My current hack is to sync every midnight with the --max-age 24h flag, to get the correct binning by date.
However, if a sync job fails, the failed data then needs to be resynced manually.
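
For reference, the hack is roughly this as a crontab entry (a sketch only, using copy as in the original command; it assumes files copied just after midnight belong in the previous day's folder, and % must be escaped as \% inside crontab):

# 00:05 every night: copy files modified in the last 24h into yesterday's date folder
5 0 * * * rclone copy /my_source_dir my_remote:my_bucket/log_type/owner/$(date -d yesterday +\%F)/ --max-age 24h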

As the perfect solution does not yet exist, I could envision having 7 sync jobs run one after the other, each filtering the source folder for one specific date and prefixing the matched files with that date. Since I have at most 1 week of data, that still seems doable.
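
A sketch of what I have in mind, assuming GNU date and that modification time is close enough to creation time (rclone's --min-age/--max-age filter on modification time):

# one copy per day of age: files roughly between i and i+1 days old go into that day's folder
for i in $(seq 0 6); do
  day=$(date -d "-${i} days" +%F)
  rclone copy /my_source_dir my_remote:my_bucket/log_type/owner/${day}/ --min-age ${i}d --max-age $((i+1))d -vv
done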

If I were doing this, I would rename the log files to include dates at the source - that will make syncing them easier if you miss a sync job, etc.

Good point, I think that's what I'll do.

Update for precision: you can't rename files so their names contain forward slashes, so I recreated the lowest level of the desired folder structure (the date folder) locally.
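
In case it helps someone else, the local restructuring is roughly this (a sketch, assuming bash and GNU date, using each file's modification time as a proxy for creation time; /my_staging_dir is a made-up name):

# sort files into per-date subfolders based on mtime, then one copy keeps the structure
for f in /my_source_dir/*; do
  [ -f "$f" ] || continue
  day=$(date -r "$f" +%F)
  mkdir -p "/my_staging_dir/${day}"
  cp "$f" "/my_staging_dir/${day}/"
done
rclone copy /my_staging_dir my_remote:my_bucket/log_type/owner/ -vv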

