Capture the S3 URLs when migrating from GCS to AWS

What is the problem you are having with rclone?

Hi, I’ve been tasked with migrating files from Google Cloud Storage to AWS S3. As part of the migration, we need to capture the S3 URLs of the files (so we can update our database with the new S3 paths instead of the old GCS ones). Ideally, as each file is migrated, its S3 URL should be written to a log file that we can later process to update the database.

From my research, it seems there isn’t a built-in way in rclone to output the destination URL directly. ChatGPT suggested using --use-json-log with --log-file=mapping.json and then parsing the output with a Python script, but I’m wondering if there’s a more standard or efficient way to achieve this.

This feels like a common migration scenario, so I’d expect there to be some best practice around it. What’s the recommended way to use rclone to accomplish this?
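For reference, a parsing script along the lines ChatGPT suggested might look like the sketch below. This is an assumption-heavy sketch, not a confirmed rclone log schema: it assumes each `--use-json-log` line is a JSON object whose `object` field holds the file path and whose `msg` field starts with "Copied" for completed transfers, and the bucket/prefix names are placeholders.

```python
import json

# Placeholder destination details - substitute your own.
BUCKET = "my-aws-bucket"
PREFIX = "my-folder"
BASE = f"https://{BUCKET}.s3.amazonaws.com"

def copied_urls(log_lines):
    """Collect S3 URLs for objects the log reports as copied."""
    urls = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines in the log file
        # Assumed field names: "msg" and "object" (verify against your log).
        if entry.get("msg", "").startswith("Copied") and "object" in entry:
            urls.append(f"{BASE}/{PREFIX}/{entry['object']}")
    return urls

sample = [
    '{"level":"info","msg":"Copied (new)","object":"images/cat.png"}',
    '{"level":"debug","msg":"Waiting for checks to finish"}',
]
print(copied_urls(sample))
```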

Run the command 'rclone version' and share the full output of the command.

Which cloud storage system are you using? (eg Google Drive)

Google Cloud Storage, about to migrate to AWS S3.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy gcp:my-gcp-bucket s3:my-aws-bucket/my-folder --use-json-log --log-file=mapping.json

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

Paste config here

A log from the command that you were trying to run with the -vv flag

Paste log here

Why do you want to capture something which is always there in plain sight? Unless you use some custom URL mappings (but then rclone won't help you anyway), your S3 URL corresponds to the file location: “base URL/bucket/path/filename”.

A simple rclone ls should be all you need.

But rclone ls only returns the file size and name (e.g. 554944 image.png), not the full S3 URL as you suggested (“base URL/bucket/path/filename”).

Is there a way to have rclone output the full S3 URL directly using commands or flags, or do I need to construct the URLs based on the filename, the bucket, and the base URL after migration?

What exactly are you running?


$ rclone ls myS3storage:
 10486543 test-uk/data/0d/0defa7875a071b7
      721 test-uk/data/a4/a4716484524240184d1cf10b
      607 test-uk/data/d3/d3ee7bd2dda7cf0d94b228ec2d1f
 10486794 test-uk/data/fd/fdeaf48374136bdf9396d088cd21
      575 test-uk/index/06d44b47939df5b19e6fd465
      719 test-uk/index/f224911525e9948ef522325844e3cd6a0a5d
...
 21084941 kopia-lock/p1bf12b0f207915b35c2e0cf9a4ad4c72-sbe9285447367381b135
 21018056 kopia-lock/p1c981fff55e1aca542da05cb17dd58c9-sb3ba5a5a8f78cf26136
 27117211 kopia-lock/p1e76ae62c74568e03597dea2dbfee3ba-scb575ed2197c3032136
 21964032 kopia-lock/p20bc31833b019f2aa41c0dd17895210f-s68d2a8f6516acf97135

where test-uk and kopia-lock are two different buckets I am using.

Anyway, rclone ls might not be the best option. rclone lsf is more flexible.

Bottom line: all the information you need is available, IMO. No need to plan on capturing anything from logs, as you won't find anything extra there anyway.

Oh nice, I ran rclone ls on my local machine (not on S3 yet), so it returns output like 554944 image.png. Just to confirm: are the strings like 0defa7875a071b7 at the end of each line in your output file names or folder names? If rclone ls returns both the bucket name and the file name, then it satisfies my goal.

Yeah, those are file names.

To make sure that only files with their path are displayed:

$ rclone lsf myS3storage: --format p --files-only --recursive

Check the docs for more options. Using lsf you can customise the output to display only what you need, which is probably easier for any later processing.
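Building the full S3 URLs from that lsf listing can then be a few lines of post-processing. A minimal sketch, with assumed placeholder bucket/region names and an assumed virtual-hosted-style URL (adjust for your own endpoint), and assuming the remote points at the bucket itself so the listed paths don't repeat the bucket name:

```python
from urllib.parse import quote

# Placeholders - substitute your own bucket and region.
BUCKET = "my-aws-bucket"
REGION = "eu-west-1"

def to_url(path):
    # Percent-encode each path segment but keep "/" separators intact.
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{quote(path)}"

# Example lines as produced by: rclone lsf dest: --format p --files-only --recursive
lsf_output = """\
my-folder/images/cat.png
my-folder/data/report 2024.csv
"""

urls = [to_url(p) for p in lsf_output.splitlines() if p]
print("\n".join(urls))
```

The resulting list can be written straight to the mapping file for the database update.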


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.