Can Rclone do this?

Hi all
Here is the problem I'm trying to solve, so please tell me if you think Rclone can solve it.

  • I need to copy many (millions of) files from a NAS to S3
  • I need to preserve the file/folder layout from the NAS in S3 (yes, the \ will be replaced by /)
    Now the challenging part
  • The files are not immediately available/online on the NAS
  • I need to first attempt to copy or read them (this will fail)
  • The failed access triggers the system to bring the files from deep archive back to the NAS (it can take seconds to minutes)
  • Once the files are available (online) on the NAS, I can resume the above workflow

So will Rclone wait for the files to become available, or will it treat this as a failed copy?
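
To make the path requirement concrete, here is a minimal Python sketch of the mapping from a Windows-style NAS path to an S3-style key. The function name `nas_path_to_s3_key` and the example share are made up for illustration. rclone copy keeps the directory structure on its own, so this only shows what the result should look like.

```python
from pathlib import PureWindowsPath


def nas_path_to_s3_key(nas_path: str, share_root: str) -> str:
    """Return the path of nas_path relative to share_root, using forward slashes."""
    relative = PureWindowsPath(nas_path).relative_to(PureWindowsPath(share_root))
    return relative.as_posix()


# Hypothetical example paths, not taken from a real system.
print(nas_path_to_s3_key(r"\\server\folder\sub\file.cad", r"\\server\folder"))
# prints sub/file.cad
```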

welcome to the forum,

yes, that is what rclone does, nothing special about that.

why not run a quick test, copy a single file?
and if rclone errors out the first time, the next time you run rclone, the file will be available for transfer.

should I use any of the following flags, or others?

--retries int
--retries-sleep Duration
--oos-copy-timeout Duration

imho, no, as a test, just copy a single file using default values and see what happens.
for example,
rclone copy file.ext name.of.your.remote: -vv

need a full debug log to be posted.
based on that debug log, we can see what else needs to be done.

Any reason not to do this in two steps, the same way it is done for deep storage solutions in AWS, Azure or GCS?

  1. Initialize your storage by bringing files back from what you call "deep archive". You say nothing about the details, so it is difficult to say how. Maybe a simple shell script attempting to copy every file?

  2. rclone copy/sync/move

thank you, will post this here once i get access to the source

yes, that can work. since there are millions of files, I wanted to see if there is a solution that basically retries the copy a few times, and then, once the copy (to cache) succeeds, uploads the file to the destination.
with the two-batch approach I will have to constantly create new batch files and scan each batch's logs for errors.

I mean move ALL files to hot storage before running rclone.

With rclone retries you will choke the system to death if you run at full speed, since you said you do not know how long a restoration takes - seconds or minutes. Or it will be mega slow if you wait minutes between retries - it will never finish for millions of files.

That is the challenge: "move ALL files to hot storage before running rclone".
once the files are in hot storage it's basic rclone usage.
I might need to code a solution that copies files to hot storage (copy, wait, verify the file exists); once a file is copied (downloaded) I can run rclone.
it just doesn't give me much advantage to use Rclone: if I copy (download) the file to local hot storage/local disk, then I can just write it to S3 myself.

Can you explain what deep storage NAS solution you are using? Personally I have never heard of anything like this. But obviously I do not know everything.

it's called DiskExtender (the data is stored on tapes; on Windows it looks like \\server\folder\file.cad). the file is a pointer/link.
once you try to copy/open it, the system will go and fetch the file.
I think it's the same as Dropbox/OneDrive etc. when a file is offline.

Ok - clear now.

I think you have to find a way to move files from deep to hot storage sequentially, in the order they are stored on the tapes. So get all data from tape1 to hot storage, then rclone it to your preferred destination. Then move on to tape2.

Doing it randomly is not going to work for millions of files given tape latency. And it will most likely kill the tapes/drive.

So it requires some detailed knowledge of how DiskExtender works. IMO you cannot achieve all of this using rclone only.

It is an interesting project - good luck.

yes, you are correct. I do have a list of the files and can sort it in any order I want (i.e. by tape),
so I can build a batch file per tape.
I was just looking for a tool that I can run for this step 1 (deep to hot storage) and wait for step 1 to complete;
then I can run step 2, the rclone upload to S3.
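
A rough Python sketch of the per-tape batching idea, assuming the file list is available as (tape id, path) pairs. That input shape, the function names and the `tape_<id>.txt` naming are all assumptions for illustration, not anything DiskExtender is known to provide.

```python
from collections import defaultdict
from pathlib import Path


def group_by_tape(entries):
    """Group (tape_id, path) pairs into {tape_id: [paths]}, keeping input order."""
    batches = defaultdict(list)
    for tape_id, path in entries:
        batches[tape_id].append(path)
    return dict(batches)


def write_batch_files(batches, out_dir):
    """Write one text file per tape (one path per line) and return the files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for tape_id, paths in batches.items():
        target = out / f"tape_{tape_id}.txt"  # hypothetical naming scheme
        target.write_text("\n".join(paths) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Lists like these could later be given to rclone through its `--files-from` flag, which expects paths relative to the source root.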

step 1:
for each file {

  • try to copy
  • wait x seconds
  • try again
  • try again
  • next file

}
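
The loop above could look roughly like this in Python. It is only a sketch under two assumptions that have not been verified against DiskExtender: that reading the first byte of an offline file is enough to trigger the recall, and that a failed access shows up as an `OSError`. The names `recall_file` and `recall_all` are made up.

```python
import time


def recall_file(path, attempts=3, wait_seconds=10.0, sleep=time.sleep):
    """Try to read one byte of path, sleeping between failed attempts.

    Returns True as soon as a read succeeds, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as fh:
                fh.read(1)  # assumption: this access triggers the recall
            return True
        except OSError:
            if attempt < attempts:
                sleep(wait_seconds)  # give the archive time to bring the file back
    return False


def recall_all(paths, **kwargs):
    """Run recall_file on every path and return those that never became readable."""
    return [p for p in paths if not recall_file(p, **kwargs)]
```

The paths returned by `recall_all` are the ones to retry in a later pass or check by hand.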

I think it requires really deep knowledge of DiskExtender. I think your pseudo code will work in theory.

But you probably need something like -

restore all tape1 to hot

I have a feeling that file-by-file restore is a solution designed only for rare situations when something has to be restored from deep storage, not for bulk transfer.

I do not know anything about this software, so it is difficult to give meaningful comments here.

and no, it won't work....

try to copy
wait x seconds

if you have 1 million files and your wait is 10s, then that is about 115 days. with the tape running all the time, it will be dead after one month.

Talk to the company behind this tape solution - they are the only people who know the best way to do what you want. You might have to pay them to provide a customised solution. No idea what your contract with them is.

The rclone cloud storage transfer is the least of the problems here.
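
The estimate is simple arithmetic, shown here as a small Python check. It assumes files are handled strictly one at a time and counts only the fixed wait, not the actual recall or copy time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def serial_wait_days(file_count, wait_seconds):
    """Total waiting time in days when every file costs one fixed wait, one file at a time."""
    return file_count * wait_seconds / SECONDS_PER_DAY


print(round(serial_wait_days(1_000_000, 10), 1))  # prints 115.7
```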

can Rclone do this, the try copy + wait?
I will not run it on millions of files at once, but will split the work into many batches.
once a batch has finished downloading,
I will run a batch to upload it,
so the tape drive can rest in between.
they said I can try to copy/read 30 files in parallel without issue.
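
A minimal Python sketch of touching one batch with at most 30 reads in flight, using the limit quoted above. It assumes that reading the first byte of a file is enough to trigger the recall, which has not been confirmed for DiskExtender. The names `touch_file` and `touch_batch` are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def touch_file(path):
    """Try to read one byte of path. Returns (path, True) on success, (path, False) on OSError."""
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        return path, True
    except OSError:
        return path, False


def touch_batch(paths, max_parallel=30, reader=touch_file):
    """Touch all paths with at most max_parallel concurrent reads.

    Returns the paths that were not readable, in their original order.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(reader, paths))
    return [path for path, ok in results if not ok]
```

Paths returned by `touch_batch` were still offline and can be fed into a later pass. Once the list comes back empty, the batch is ready for the rclone upload.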

The best answer to this is to try a small batch. You will see, and you can then calculate how long it would take for all the data.

i will be back to report results
waiting for NAS access