I don't understand the documentation regarding Filter Options, because I want to filter the files I'm copying based on their S3 metadata (Glacier Deep Archive) and the documentation only has flags for filtering directory listings. I just want to know if I can filter out the files in S3 that are archived. Is that possible, or do I have to create a text file with filtered results before I can copy what I want from S3? I have never used rclone before; I always used AzCopy since I'm migrating from AWS to Azure, but that doesn't allow filtering on S3 metadata either, and it fails as soon as it hits an archived file with a 403 error because the object is inaccessible. It seems most utilities are stupid in the same way: they can't bypass or filter archived files during the copy process.
Run the command 'rclone version' and share the full output of the command.
I'm sorry, but I don't understand that answer. You can probably tell I'm not really a Linux user. I want to copy the files, but isn't lsf a switch to list the files (like ls)? I pasted my command in the topic, and the dry run showed a list of files, so I ran the command, and it's sitting there doing something, but with no output. I'm assuming it's doing something, but you know what happens when you assume...
I ran those commands on Windows, not Linux.
The same exact command works on Windows, Linux, macOS, etc.
From the rclone docs: "To test filters without risk of damage to data, apply them to rclone ls."
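For example, to see which objects are archived before writing any filter at all, `rclone lsf` can print the storage tier next to each path. The remote and bucket names below are placeholders:

```sh
# Print "path;tier" for every object. In lsf's --format string,
# p = path and T = tier (the S3 storage class, e.g. DEEP_ARCHIVE).
rclone lsf s3remote:mybucket --format "pT" --separator ";" --recursive
```

That shows the exact tier strings the backend reports, which is worth checking before building a filter around them.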
imho, once the filter is safely tested using rclone ls, move on to rclone copy --dry-run -vv, then rclone copy -vv.
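Sketched out with placeholder remote/bucket/container names, and assuming a recent rclone (the metadata filter flags such as `--metadata-exclude` arrived in v1.62) with the S3 storage class exposed under the `tier` metadata key, that progression might look like:

```sh
# 1. Read-only test: list what the filter lets through, excluding
#    objects whose tier metadata is DEEP_ARCHIVE.
rclone ls s3remote:mybucket --metadata-exclude "tier=DEEP_ARCHIVE"

# 2. Dry-run the copy with the same filter and verbose output.
rclone copy s3remote:mybucket azremote:mycontainer \
    --metadata-exclude "tier=DEEP_ARCHIVE" --dry-run -vv

# 3. Run the real copy once the dry-run listing looks right.
rclone copy s3remote:mybucket azremote:mycontainer \
    --metadata-exclude "tier=DEEP_ARCHIVE" -vv
```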
Although I did get it working, performance has been somewhat erratic. The copy process starts well, then degrades to unacceptable levels. I have restarted the process several times with different settings, and I am making some progress. It is odd, though. It seems to run in spurts: a set of files or a folder will copy, then it stalls, and after a while it starts again. It isn't CPU- or memory-bound, and at times I get throughput on the network interface approaching 1 GB/s, then nothing. Very weird. My latest attempt is using this command:
Just to update you on this: I did successfully copy the data over the weekend for one environment. If I understand this correctly, rclone reads the file list from S3 (applying the filter) and then starts the copy process. I believe the output paused for a great deal of time because, with the metadata filter I used to ignore Glacier Deep Archive, building the list of files to be copied took a considerable amount of time: 95% of the files in one specific folder were archived, and there were a LOT of files in that 2 TB folder. Everything else copied with no issues, although I need to detune the performance a bit since the VM network interface was saturated at times. Other than that, success! I can't thank you guys enough for the help, because AzCopy and Azure Data Factory were really not going to solve the problem. I've become a big fan of rclone basically overnight.
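On detuning: if the goal is just to stop saturating the NIC, rclone's `--bwlimit` flag caps bandwidth. A sketch with placeholder remote names and an arbitrary 500 MiB/s cap:

```sh
# Cap rclone's total bandwidth at 500 MiB/s so the VM's network
# interface keeps some headroom. --bwlimit also accepts a timetable
# such as "08:00,256M 19:00,off" for different limits by time of day.
rclone copy s3remote:mybucket azremote:mycontainer --bwlimit 500M -vv
```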
And for anyone else copying from S3 to Azure Blob using Private Link/Private Endpoints for Azure Blob Storage accounts: put your VM on the same VNet/subnet as the Storage Account private endpoint. I was getting up to 1 GB/s network throughput for the entire copy process between private S3 buckets and a private Storage Account. The next one is 9.5 TB, so I'll find out how long that takes with the same setup.