Possible to let rclone save/resume some task status?

I am using AutoRclone, which uses Rclone to copy files from a source folder to my Team Drive. There are 390,000+ small files (most of them are .jpg and .txt) in the source folder. For now I have more than 1,000 service accounts to do this automatically (once the quota limit, i.e., 750 GB every 24 hours, is reached for the current service account, it switches to the next service account automatically).

But the problem is that after switching to a new service account it takes a very long time for Rclone to read the source/destination folders just to skip files that have already been copied.

So I am wondering if it is possible to save some status for the current service account in one task (for example the list of files the service account has copied, or the list of files it has not copied yet). Then the next service account, which continues copying files in a new task, could resume from the saved status and skip many redundant listing and comparing operations. I mean, is it possible to add some flags like --save_status and --resume?

rclone --config rclone.conf copy src001: dst001: --save_status status_file.txt
rclone --config rclone.conf copy src002: dst002: --resume status_file.txt

The rclone.conf is as follows:

[src001]
type = drive
scope = drive
service_account_file = ./1.json
root_folder_id = my_source_folder_id

[dst001]
type = drive
scope = drive
service_account_file = ./1.json
team_drive = my_destination_folder_id

[src002]
type = drive
scope = drive
service_account_file = ./2.json
root_folder_id = my_source_folder_id

[dst002]
type = drive
scope = drive
service_account_file = ./2.json
team_drive = my_destination_folder_id

I think this would save a lot of time. Many thanks for this cool tool.

This is the kind of thing the cache backend was invented for. It can cache the metadata and not the file data.

Have you tried that?
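
As a rough sketch of what that could look like: you would add a cache remote wrapping the source in rclone.conf and copy from that instead. The cached-src name and the info_age value below are just illustrative, not a tested recommendation:

[cached-src]
type = cache
remote = src001:
info_age = 48h

rclone --config rclone.conf copy cached-src: dst001: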


Thanks. I have not used that; I will try it. The documentation for rclone copy says:

Copy files from source to dest, skipping already copied

Does it compare checksum & size & mod-time all at the same time to skip files that are already copied? I have looked up the global flags:

--checksum                             Skip based on checksum (if available) & size, not mod-time & size
--ignore-checksum                      Skip post copy check of checksums.
--ignore-size                          Ignore size when skipping use mod-time or checksum.
--ignore-existing                      Skip all files that exist on destination
--ignore-times                         Don't skip files that match size and time - transfer all files
--size-only                            Skip based on size only, not mod-time or checksum

And any suggestion for my situation?

Can it cache only the metadata? Does this happen if you just set chunk size to 0?
Does the cache, by the way, also cache the non-standard attributes that aren't normally included in a listing? I never thought about that, but I suppose it would be plausible, which could be useful to know about.

I'm not sure that fixes the main problem though, which is having to re-list several times if you run rclone several times.
What would be really nice is if there was a function to dump the remaining transfer list to a file when the operation ends (for whatever reason). That would allow pretty seamless handovers - and also ultra-fast resumes. The user would have to be careful about not using very old listings of course - but in the short term it would be a great tool.

That's just another random idea though.

@xyou365 I think the closest you can get to this currently is to run rclone lsf source:, save that to a file, and then use --files-from during the transfer (see the sketch below). Then you will not have to re-list the source again. The problem is the destination will still have to be re-listed each time, so it probably does not end up saving very much time in a full sync. It may save a decent amount of time in a limited copy however...
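
Roughly like this, using the remotes from your config (the source_files.txt name is just illustrative):

rclone --config rclone.conf lsf src001: -R --files-only > source_files.txt
rclone --config rclone.conf copy src001: dst001: --files-from source_files.txt
rclone --config rclone.conf copy src002: dst002: --files-from source_files.txt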

Yes, it runs the same comparison again each time.
By default, if the size is identical and the modtime is "identical" (i.e. within the margins of error) then it just doesn't copy it because it would be redundant. Checks should normally be very fast to request - and the comparison on the CPU is trivial (seconds for tens of thousands of files). On backends that support it, --fast-list will make listing even faster (as much as 15x). However, full syncs on very large collections of data can still take a couple of minutes.

It is also possible to force --checksum comparison, but this typically happens automatically when it is easy to do (i.e. when both sides already have precalculated checksums).
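
If you did want to force it, it would look something like this (both sides here are Google Drive, which already stores MD5 hashes):

rclone --config rclone.conf copy src001: dst001: --checksum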

But it does need to re-list each time to compare, because rclone does not remember having already done that last time, so the listings are technically redundant (after the first run). As I mentioned above, if there was a way to save the transfer list then we would not have to list at all and could just start copying the files we already know we checked and compared.

@ncw Or does there exist some way to use rclone check/cryptcheck for this and dump a pre-compared list to a --files-from compatible format? maybe? ...

The way to import such a list already exists (--files-from), but there is currently no way to dump a transfer list in progress. I only know of making a list by using rclone lsf, but that will not be a compared list, just a list of all files.

If you wanted to make a compared list (i.e. the same as the transfer list would be) you could do that with scripting: list both locations, then compare them yourself. It is very possible - some people have already shared scripts that do this - but it would certainly be a nice feature to have built-in. I do agree with that :slight_smile:
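
A minimal sketch of that idea, comparing path listings only (it ignores size/modtime differences, and the file names are just illustrative):

rclone --config rclone.conf lsf src001: -R --files-only | sort > src_list.txt
rclone --config rclone.conf lsf dst001: -R --files-only | sort > dst_list.txt
comm -23 src_list.txt dst_list.txt > missing.txt
rclone --config rclone.conf copy src001: dst001: --files-from missing.txt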


You can do this with a bit of scripting and the -v flag.
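
For example, something along these lines (the exact log text may vary between rclone versions, so treat this as a sketch):

rclone --config rclone.conf copy src001: dst001: -v --log-file=copy.log
grep "Copied (new)" copy.log > copied_files.txt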

It is a nice idea though, so nice I think there is already an issue about it: https://github.com/rclone/rclone/issues/1572

What you want is an output like the comm command, where you can say whether you want files unique to the source, common files, or files unique to the destination. I guess there is another column too, which is common files that differ.
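
For reference, comm's column-suppression flags map onto those categories, assuming you have sorted path listings of source and destination (e.g. the src_list.txt and dst_list.txt from rclone lsf above):

comm -23 src_list.txt dst_list.txt   # unique to source
comm -13 src_list.txt dst_list.txt   # unique to destination
comm -12 src_list.txt dst_list.txt   # common to both (contents may still differ)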

I'm sure there is an issue about that idea too, but I couldn't find it!
