Cannot copy all objects in Azure blobs to S3 bucket at one time

What is the problem you are having with rclone?

I can list all containers in Azure Blob storage:
[root@ip-10-111-110-41 ec2-user]# rclone lsd AZStorageAccount:
-1 2020-03-19 11:54:23 -1 bootdiagnostics-azrftp-9e96a6b8-b1bf-4cf7-8***************
-1 2020-03-17 16:30:16 -1 storage-engage
-1 2020-03-17 16:30:30 -1 storage-engage-archive
-1 2020-03-17 16:30:46 -1 storage-engage54
-1 2020-03-17 16:31:12 -1 storage-engage54-archive

I can list all S3 buckets:
[root@ip-10-111-110-41 ec2-user]# rclone lsd s3:
-1 2022-10-25 14:19:03 -1 blob-migrated-s3-bucket-for-backup
-1 2022-11-09 14:53:35 -1 bucket-for-azuremigration-eu
-1 2022-11-14 14:25:35 -1 s3-bucket-for-engage-arciheve

I can copy files to S3 one by one, both from the instance and from Azure, by using their object names:

(From instance to Bucket)
[root@ip-10-111-110-41 ec2-user]# rclone copy ~/.config/rclone/rclone.conf s3:s3-bucket-for-engage-arciheve -vv
2022/11/15 09:25:20 DEBUG : rclone: Version "v1.60.0" starting with parameters ["rclone" "copy" "/root/.config/rclone/rclone.conf" "s3:s3-bucket-for-engage-arciheve" "-vv"]
2022/11/15 09:25:20 DEBUG : Creating backend with remote "/root/.config/rclone/rclone.conf"
2022/11/15 09:25:20 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/11/15 09:25:20 DEBUG : fs cache: adding new entry for parent of "/root/.config/rclone/rclone.conf", "/root/.config/rclone"
2022/11/15 09:25:20 DEBUG : Creating backend with remote "s3:s3-bucket-for-engage-arciheve"
2022/11/15 09:25:20 DEBUG : rclone.conf: Need to transfer - File not found at Destination
2022/11/15 09:25:21 DEBUG : rclone.conf: md5 = 3e3a6e60c37f26dd2a10e4c6a4336ece OK
2022/11/15 09:25:21 INFO : rclone.conf: Copied (new)
2022/11/15 09:25:21 INFO :
Transferred: 315 B / 315 B, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 0.4s

2022/11/15 09:25:21 DEBUG : 6 go routines active

(From Azure Blob to S3 Bucket)
[root@ip-10-111-110-41 ec2-user]# rclone copy AZStorageAccount:storage-engage54/Cihan.JPG s3:s3-bucket-for-engage-arciheve -vv
2022/11/15 09:36:26 DEBUG : rclone: Version "v1.60.0" starting with parameters ["rclone" "copy" "AZStorageAccount:storage-engage54/Cihan.JPG" "s3:s3-bucket-for-engage-arciheve" "-vv"]
2022/11/15 09:36:26 DEBUG : Creating backend with remote "AZStorageAccount:storage-engage54/Cihan.JPG"
2022/11/15 09:36:26 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/11/15 09:36:26 DEBUG : fs cache: adding new entry for parent of "AZStorageAccount:storage-engage54/Cihan.JPG", "AZStorageAccount:storage-engage54"
2022/11/15 09:36:26 DEBUG : Creating backend with remote "s3:s3-bucket-for-engage-arciheve"
2022/11/15 09:36:27 DEBUG : Cihan.JPG: Need to transfer - File not found at Destination
2022/11/15 09:36:27 DEBUG : Cihan.JPG: md5 = 0a233b2df8cb4568dd75adbd1f5b3a3e OK
2022/11/15 09:36:27 INFO : Cihan.JPG: Copied (new)
2022/11/15 09:36:27 INFO :
Transferred: 42.748 KiB / 42.748 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 0.6s

However, I am not able to copy all objects in a Blob container to the S3 bucket at one time with the command "rclone copy AZStorageAccount:storage-engage s3:s3-bucket-for-engage-arciheve"

Run the command 'rclone version' and share the full output of the command.

rclone v1.60.0

Which cloud storage system are you using? (eg Google Drive)

S3 Bucket / Azure Blob

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy  AZStorageAccount:storage-engage s3:s3-bucket-for-engage-arciheve -vv

The rclone config contents with secrets removed.

[AZStorageAccount]
type = azureblob
account = azureblockblobstorageapi
service_principal_file = ~/azure-principal.json
region = eu-west-1

[s3]
type = s3
provider = AWS
env_auth = true
region = eu-west-1
access_key_id = 
secret_access_key =
acl = private

A log from the command with the -vv flag

[root@ip-10-111-110-41 ec2-user]# rclone copy  AZStorageAccount:storage-engage s3:s3-bucket-for-engage-arciheve -vv
2022/11/15 10:41:43 DEBUG : rclone: Version "v1.60.0" starting with parameters ["rclone" "copy" "AZStorageAccount:storage-engage" "s3:s3-bucket-for-engage-arciheve" "-vv"]
2022/11/15 10:41:43 DEBUG : Creating backend with remote "AZStorageAccount:storage-engage"
2022/11/15 10:41:43 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/11/15 10:41:43 DEBUG : Creating backend with remote "s3:s3-bucket-for-engage-arciheve"
2022/11/15 10:42:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       1m0.1s

2022/11/15 10:43:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       2m0.1s

2022/11/15 10:44:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       3m0.1s

2022/11/15 10:45:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       4m0.1s

2022/11/15 10:46:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       5m0.1s

2022/11/15 10:47:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       6m0.1s

2022/11/15 10:48:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       7m0.1s

2022/11/15 10:49:43 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:       8m0.1s

Killed
[root@ip-10-111-110-41 ec2-user]#

Did you kill rclone or did the OS kill it? Rclone needs enough memory to hold all the objects in the largest directory. Each object uses roughly 1k of RAM, so if you have 1 million objects in a single directory then rclone will need 1 GB of RAM.

Add the --checksum flag for more efficiency. Both S3 and Azure support MD5 checksums, which speeds up the process since rclone doesn't have to do HEAD requests to read the modified times.
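
A sketch of that, using the remote and bucket names from this thread (the command is printed here rather than executed):

```shell
# The bulk copy with --checksum added: rclone compares the MD5s both
# backends store instead of doing one HEAD request per object to read
# modification times. Drop the echo to actually run it.
cmd="rclone copy AZStorageAccount:storage-engage s3:s3-bucket-for-engage-arciheve --checksum -vv"
echo "$cmd"
```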

Hi, thank you for your fast response.
The rclone process was killed (maybe by the OS, but not by me). I installed the instance with minimal capacity. I will increase the resources and retry, then update this thread soon.


I just checked the Azure Blob environment. There are 873 million objects in 3 directories; the largest one has 500 million objects. How much RAM will be needed to transfer 873 million objects (8.3 TB in total) with rclone? Omg, it sounds like it is not going to work. I tried file-by-file migration by object name: 100 objects transferred in 75 seconds, so based on my calculation it will take 464 days to transfer. Even if a direct rclone copy command doubles the transfer rate, it still sounds like more than 230 days. Do you have a metric for transfer rate per 1,000,000 objects, or per MB or TiB?
Is there any parameter to limit copying, e.g. to 1,000,000 objects, to test the transfer time?
Thanks in advance.

The largest directory will be the limiting factor... 500 million objects will take approx 500G of memory :frowning:
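
A quick check of that estimate, assuming the ~1 KiB-per-object rule of thumb from earlier in the thread:

```shell
# 500 million objects at ~1 KiB of RAM each:
objects=500000000                        # largest directory
echo "$(( objects / 1024 / 1024 )) GiB"  # prints "476 GiB", i.e. roughly 500G
```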

After rclone lists the directory, with enough --transfers rclone will fill your bandwidth.

I have a sketched-out plan for doing large directory syncs on disk - I haven't implemented it yet though.

Maybe your company would like to sponsor me to implement this feature?

....

In the meantime, use rclone lsf -R > filez to get a listing of the directory into a file. This may take a while but it won't use very much memory.

Use head to pick off 1,000,000 objects

Use rclone copy with --files-from-raw and --no-traverse and --no-check-dest to transfer.

This isn't a sync. If you want to make it a sync you'd list the destination too and use comm to find common and differing files and just sync those. That is essentially how my proposed on disk sync will work, except it will be able to respect sync flags.
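
The comm step could look like this - a sketch with tiny stand-in listings, since the real ones would come from rclone lsf -R on each side (note comm needs sorted input):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Stand-ins for the real sorted listings:
#   rclone lsf -R AZStorageAccount:storage-engage  | sort > src.list
#   rclone lsf -R s3:s3-bucket-for-engage-arciheve | sort > dst.list
printf 'a.JPG\nb.JPG\nc.JPG\n' > src.list
printf 'b.JPG\n'               > dst.list

# -23 keeps lines unique to src.list: objects missing at the
# destination, i.e. the ones to feed to rclone copy --files-from-raw.
comm -23 src.list dst.list > to-copy.list
cat to-copy.list
```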

You could automate this and do chunks of 1M files at once.
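
A sketch of that automation, using split for the chunking (the chunk size and file names are illustrative; a tiny fake listing stands in for filez so the loop runs without rclone, and the rclone command is echoed rather than run):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Stand-in for: rclone lsf -R AZStorageAccount:storage-engage > filez
seq 1 25 | sed 's/^/object-/' > filez

# Cut the listing into fixed-size chunks (1000000 lines in real use):
# chunk_aa, chunk_ab, ...
split -l 10 filez chunk_

# One copy per chunk. Drop the echo to actually transfer.
for f in chunk_*; do
    echo rclone copy AZStorageAccount:storage-engage \
        s3:s3-bucket-for-engage-arciheve \
        --files-from-raw "$f" --no-traverse --no-check-dest
done
```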

Thanks Nick,

Let me know what you need to implement this feature and I may tell my boss. The messenger will not fail :slight_smile:
rclone copy failed as expected, even on the enlarged instance. Now I am using rclone lsf -R > filez to get the list of the objects. I did not understand how to use head to pick off 1,000,000 objects. If you share a sample, I would really appreciate it.
Thanks

It would be a few days of consultancy work. If you want me to make a quote then drop me an email nick@craig-wood.com

You can use the unix tools head and tail to pick lines anywhere out of a file.

First 100 lines

head -n 100 <filez

Next 100 lines

head -n 200 <filez | tail -n 100

Next 100 lines

head -n 300 <filez | tail -n 100

etc.
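
The pattern above generalises to a loop (the chunk size is illustrative, and filez is faked here so the slicing is runnable as-is):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
seq 1 25 > filez                 # stand-in for the real listing

chunk=10                         # 1000000 in real use
total=$(wc -l < filez)
offset=0
while [ "$offset" -lt "$total" ]; do
    end=$(( offset + chunk ))
    # Lines offset+1 .. end of the listing. tail -n +K (print from
    # line K) handles the final short chunk correctly, where a plain
    # "tail -n $chunk" would grab the wrong lines.
    head -n "$end" filez | tail -n +"$(( offset + 1 ))" > slice
    wc -l < slice                # feed slice to rclone copy --files-from-raw
    offset=$end
done
```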