The number of files checked exceeds the source file count, and the task does not finish

What is the problem you are having with rclone?

I am migrating a file system from Windows to MinIO. Windows File Explorer reports 16 million files, but rclone has now been running for several days and the checks counter shows that 40 million files have been scanned. It has not stopped and is still running.

What is your rclone version (output from rclone version)

1.51.0

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows Server 2012 -> CentOS 7 (MinIO)

Which cloud storage system are you using? (eg Google Drive)

local->minio

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy E:/xxx minio1:data1 --bwlimit 90M --transfers 8 --checkers 60 --config xxx -P --log-file xxx -v --dump-headers

The rclone config contents with secrets removed.


A log from the command with the -vv flag

I'm not sure what the problem is?

You'd have to share the log. Is there a reason you dumped headers? That's really going to slow things down and produce a huge log.

I'd guess you mean the checks? Those are probably retries, but with no log it's just an educated guess.

By mistake I used -v, so it did not generate much log output. I have been monitoring the task continuously, and only a few thousand files have been retried.

You can't see the retries I'm talking about without debug logging, as they happen behind the scenes, and they're why the check count is higher.

Is this related to the large number of source files, and to new files still being added to the source?

Without a log, it's all a guessing game and not a fun one to play.

If you want to share the log, happy to look at it.

This is the log link: https://github.com/shouldnotappearcalm/kubernetes-study-yaml/blob/master/rclone.log

In the log I can see that some of the files were newly added today.

This would definitely be problematic:

2020/05/13 19:58:35 NOTICE: Time may be set wrong - time from "155.1.193.162:9000" is 5m1.364s different from this computer

So things like that would have some retries:

grep AD29065D81129990B7BE3A1A740DCC9F_file.jpg rclone.log
2020/05/13 23:29:39 INFO  : 3815/201901/20100/275E04F4AAB81BE1AB9B9F9FE3540FCF/archives/AD29065D81129990B7BE3A1A740DCC9F_file.jpg.thumb.jpg: Copied (new)
2020/05/13 23:29:39 ERROR : 3815/201901/20100/275E04F4AAB81BE1AB9B9F9FE3540FCF/archives/AD29065D81129990B7BE3A1A740DCC9F_file.jpg: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>IncompleteBody</Code><Message>You did not provide the number of bytes specified by the Content-Length HTTP header.</Message><Key>3815/201901/20100/275E04F4AAB81BE1AB9B9F9FE3540FCF/archives/AD29065D81129990B7BE3A1A740DCC9F_file.jpg</Key><BucketName>lnzy</BucketName><Resource>/lnzy/3815/201901/20100/275E04F4AAB81BE1AB9B9F9FE3540FCF/archives/AD29065D81129990B7BE3A1A740DCC9F_file.jpg</Resource><RequestId>160E9FDA8703265F</RequestId><HostId>4bd6a9b8-d78d-4825-a8f2-0f95ee81c0d2</HostId></Error>
2020/05/17 02:21:43 INFO  : 3815/201901/20100/275E04F4AAB81BE1AB9B9F9FE3540FCF/archives/AD29065D81129990B7BE3A1A740DCC9F_file.jpg: Copied (new)

If you run with -vv, the log will show them.

Could this be the reason it is not stopping? But the retry succeeded.

Since 40 million files have already been checked, I can't stop the task and run it again with -vv; that would take too long.

Instead of one massive rclone command, you could break it into a number of smaller ones:

rclone E:/xxx/folder1
rclone E:/xxx/folder2
rclone E:/xxx/folder3

But that way the original directory structure will be lost when the files are transferred to MinIO.

I also have a set of folders with a large number of files. Naming the matching folder in the destination keeps the structure:

rclone copy e:\xxx\folder1 minio1:data1/folder1
rclone copy e:\xxx\folder2 minio1:data1/folder2

Yeah, if it's still going, it still has files left to copy over. Without -vv you can't be certain what the retries are, but retries are most likely the cause.

If you have memory to spare, you can always increase --max-backlog to help speed it up as well.

If it was my huge copy going on, I'd compare the sizes on the source and dest and validate how much is 'done'. If it's a good portion done, I'd wait it out.
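One rough way to put a number on "how much is done" is to run `rclone size` against the source and the destination and compare the object counts it reports. The sketch below only does the arithmetic. `percent_done` is a hypothetical helper, and the 12 million destination count is a made-up example value; only the 16 million source count comes from this thread.

```shell
#!/bin/sh
# percent_done DONE TOTAL
# Prints the integer percentage (rounded down) of objects already copied.
# The two counts would be read by hand from the output of
#   rclone size E:/xxx        (source)
#   rclone size minio1:data1  (destination)
percent_done() {
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical destination count (12 million) against the 16 million
# source files mentioned in this thread.
percent_done 12000000 16000000
```

If new files are still being added to the source while the copy runs, the total keeps moving, so the percentage is only an estimate.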

If it's going poorly, I'd stop it, fix the time, and fix the logging to remove the header dump, then let it run at INFO level for now with a few tweaks rather than debug, if things generally seem to be OK.
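As a sketch of what such a restart could look like, here is the original command with the header dump removed, `-v` (INFO) kept, and `--max-backlog` raised. The value 100000 is an assumed example, not a recommendation from this thread, and the `xxx` placeholders are kept as posted. The script only prints the command so it can be checked before running; fixing the clock on the MinIO host is a separate step it does not cover.

```shell
#!/bin/sh
# Build the restart command as a list of arguments, then print it.
# --max-backlog 100000 is an assumed example value; raise or lower it
# depending on available memory.
set -- rclone copy E:/xxx minio1:data1 \
  --bwlimit 90M --transfers 8 --checkers 60 \
  --max-backlog 100000 \
  --config xxx --log-file xxx -P -v

restart_cmd="$*"
echo "$restart_cmd"
```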

Does rclone support resuming an interrupted transfer (breakpoint resume)?

I'm not sure what you mean by that.

If you stop the copy and restart, anything on the other side should not be recopied.

I say "should" because you have a time difference, and that worries me a bit. You might want to test a few files first and validate, as rclone uses size and modification time to compare files for a copy/sync.

I understand what you mean. Thank you very much for your answer.

At a guess, you had some errors, so rclone did a complete retry. I guess rclone is going through its 3 retries, each of which checks all 16 million files again, so it should stop at about 48 million checks.
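The 48 million figure is just the source file count multiplied by the retry count. A minimal sketch of that arithmetic, assuming the 16 million count from File Explorer is accurate and that `--retries` was left at its default of 3:

```shell
#!/bin/sh
# Upper bound on checks if every full pass re-checks every source file.
source_files=16000000   # count reported by Windows File Explorer
retries=3               # rclone's default for --retries

max_checks=$(( source_files * retries ))
echo "$max_checks"
```

Files added to the source during the run would push the real number somewhat higher than this estimate.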

There will be some files which didn't get uploaded and there will be error messages about that.
