Rclone sync: Stuck?

Agree…

I checked the logs of other jobs, including those that are working, and I also identified the problem with some small files:

 *   ...2018.02.18_20.13.50.7z: 100% /317.400k, 319/s, 0s
 *   ...2018.02.19_11.06.18.7z: 100% /148.916k, 907/s, 0s
 *   ...2018.02.22_10.23.27.7z: 100% /9.478k, 1.219k/s, 0s
 *   ...2018.02.22_19.11.00.7z: 101% /3.618k, 28.922k/s, 0s

I would rather fix things properly by generating the hashes on Wasabi. I don’t want to propagate the Dropbox error … I read issue #2213.

But I don’t see how to isolate only the troublesome files. There are thousands of files, several tens of GB, spread across various folders, etc.

I’m not sure I understood … how do I do this local -> wasabi?

Is there any parameter to disable chunked uploads?

I’m running with the latest beta (rclone v1.40-132-g0daced29), let’s see what happens.

Stuck in the same video folder … :slightly_frowning_face:

I moved the folder out of the fileset and I’m running sync again. Let’s see …

Same results… everything goes fine until it reaches a big (?) file, then it gets stuck, and after a while it “unlocks” (see “file4”):

Note: before that it checked dozens of files (unchanged) and copied 37 files without problems, most of them small.

Transferred:   75.641 MBytes (1.426 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           34
Elapsed time:      7m4.2s
Transferring:
 *   ...[file1    ]: 100% /8.241M, 0/s, 0s
 *   ...[file2    ]: 91% /27.192M, 47.605M/s, 0s
 *   ...[file3    ]: 100% /8.122M, 4/s, 0s
 *   ...[file4    ]: 100% /8.122M, 994.978k/s, 0s

2018/04/27 13:10:52 INFO  : [file3    ]: Copied (new)
2018/04/27 13:10:57 DEBUG : 
Transferred:   100.641 MBytes (1.833 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           35
Elapsed time:     7m19.2s
Transferring:
 *   ...[file1    ]: 100% /8.241M, 0/s, 0s
 *   ...[file5 ***], 41.767M/s, 0s
 *   ...[file2    ]: 91% /27.192M, 17.507M/s, 1s
 *   ...[file4    ]: 100% /8.122M, 365.896k/s, 0s

2018/04/27 13:10:59 INFO  : [file1    ]: Copied (new)
2018/04/27 13:11:12 DEBUG : 
Transferred:   125.641 MBytes (2.213 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     7m34.2s
Transferring:
 *   ...[file5 ***], 15.359M/s, 1s
 *   ...[file2    ]: 91% /27.192M, 6.438M/s, 2s
 *   ...[file6 ***]: 91% /27.347M, 24.502M/s, 0s
 *   ...[file4    ]: 100% /8.122M, 134.556k/s, 0s

2018/04/27 13:11:27 DEBUG : 
Transferred:   125.641 MBytes (2.142 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     7m49.2s
Transferring:
 *   ...[file5 ***], 5.648M/s, 3s
 *   ...[file2    ]: 91% /27.192M, 2.368M/s, 7s
 *   ...[file6 ***]: 91% /27.347M, 9.010M/s, 2s
 *   ...[file4    ]: 100% /8.122M, 49.481k/s, 0s

2018/04/27 13:11:42 DEBUG : 
Transferred:   125.641 MBytes (2.076 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:      8m4.2s
Transferring:
 *   ...[file5 ***], 2.077M/s, 8s
 *   ...[file2    ]: 91% /27.192M, 891.527k/s, 20s
 *   ...[file6 ***]: 91% /27.347M, 3.314M/s, 5s
 *   ...[file4    ]: 100% /8.122M, 18.196k/s, 0s

2018/04/27 13:11:57 DEBUG : 
Transferred:   125.641 MBytes (2.013 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     8m19.2s
Transferring:
 *   ...[file5 ***], 782.184k/s, 23s
 *   ...[file2    ]: 91% /27.192M, 327.853k/s, 54s
 *   ...[file6 ***]: 91% /27.347M, 1.219M/s, 15s
 *   ...[file4    ]: 100% /8.122M, 6.691k/s, 0s

2018/04/27 13:12:12 DEBUG : 
Transferred:   125.641 MBytes (1.955 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     8m34.2s
Transferring:
 *   ...[file5 ***], 287.643k/s, 1m3s
 *   ...[file2    ]: 91% /27.192M, 120.565k/s, 2m28s
 *   ...[file6 ***]: 91% /27.347M, 458.856k/s, 41s
 *   ...[file4    ]: 100% /8.122M, 2.460k/s, 0s

2018/04/27 13:12:27 DEBUG : 
Transferred:   125.641 MBytes (1.899 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     8m49.2s
Transferring:
 *   ...[file5 ***], 105.778k/s, 2m51s
 *   ...[file2    ]: 91% /27.192M, 44.337k/s, 6m45s
 *   ...[file6 ***]: 91% /27.347M, 168.741k/s, 1m53s
 *   ...[file4    ]: 100% /8.122M, 926/s, 0s

2018/04/27 13:12:42 DEBUG : 
Transferred:   125.641 MBytes (1.847 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:      9m4.2s
Transferring:
 *   ...[file5 ***], 38.898k/s, 7m47s
 *   ...[file2    ]: 91% /27.192M, 16.305k/s, 18m21s
 *   ...[file6 ***]: 91% /27.347M, 62.053k/s, 5m9s
 *   ...[file4    ]: 100% /8.122M, 340/s, 0s

2018/04/27 13:12:57 DEBUG : 
Transferred:   125.641 MBytes (1.797 MBits/s)
Errors:                 0
Checks:              2060
Transferred:           36
Elapsed time:     9m19.2s
Transferring:
 *   ...[file5 ***], 14.305k/s, 21m11s
 *   ...[file2    ]: 91% /27.192M, 5.995k/s, 49m54s
 *   ...[file6 ***]: 91% /27.347M, 22.819k/s, 14m2s
 *   ...[file4    ]: 100% /8.122M, 125/s, 0s

2018/04/27 13:13:04 INFO  : [file4    ]: Copied (new)

The successfully copied files (rclone lsl output from Wasabi; note the sizes):

    12789 2016-06-12 13:24:22.000000000
  1071479 2015-09-04 15:45:52.735229600
  8641420 2015-09-04 15:45:30.311217300
  1070917 2015-09-11 10:15:02.400814900
  8641252 2015-09-10 14:46:54.660551300
   184624 2015-09-09 19:32:35.000000000
   184199 2015-09-04 15:44:08.000000000
    71015 2015-09-04 10:24:42.000000000
   651189 2014-07-16 12:10:09.000000000
   799065 2014-07-16 12:08:04.000000000
  1518382 2014-07-16 12:19:03.000000000
  1443284 2014-07-16 12:18:38.000000000
    21630 2014-07-16 12:31:30.000000000
  1027427 2014-03-12 15:32:02.000000000
  1027245 2014-03-14 11:54:45.000000000
  1029305 2014-03-28 21:14:31.000000000
   269034 2015-11-05 20:46:50.306462000
   452777 2015-11-05 20:46:50.320473200
   669696 2015-06-10 15:29:00.000000000
   409937 2016-05-19 19:50:00.000000000
  8516382 2014-11-13 15:14:34.000000000
   929041 2014-11-13 15:06:27.000000000
  8516288 2014-11-13 15:06:13.000000000
    75598 2015-09-04 14:12:47.910018200
    78163 2015-09-04 12:05:55.200638100
   136391 2015-09-11 14:58:56.710346200
    31552 2015-09-11 18:47:35.000000000
   244518 2015-09-15 22:57:33.000000000
    27393 2014-05-15 21:54:02.000000000
    29501 2014-05-15 21:54:40.000000000
    29464 2014-05-15 21:54:59.000000000
    27441 2014-05-15 21:55:25.000000000
    29443 2014-05-15 21:55:57.000000000
    27472 2014-05-15 21:51:13.000000000
  1172057 2014-11-03 20:58:35.000000000
  2430706 2014-11-03 20:59:52.000000000
  1602499 2014-11-03 21:01:18.000000000

But it looks better with the --s3-disable-checksum option…

Some files look like a car stuck in the mud :upside_down_face:, but end up going. I’ll let it finish and see the result. And then sync again to see what happens.

I really don’t understand why it is getting stuck, since the --s3-disable-checksum option is being used, i.e. it’s not calculating the hashes… :thinking:

It’s interesting that in moments when it stops, like this:

Transferred:   78.099 MBytes (1.062 MBits/s)
Errors:                 0
Checks:              3250
Transferred:           16
Elapsed time:     9m48.1s
Transferring:
 *   ...a.pdf: 100% /16.312M, 0/s, 0s
 *   ....pptx: 100% /19.293M, 0/s, 0s
 *   ....pptx: 100% /4.622M, 2.232k/s, 0s
 *   ....pptx: 100% /24.995M, 0/s, 0s

the disk shows no activity, but the upload link is fully used (according to the network monitor).

But what’s strange is: if the transfers are stopped (0/s) or practically stopped (2.232k/s), then what is it uploading?

Well, after several hours, and a few more files moved out of the set, the sync finished:

Transferred:   1.095 GBytes (1.020 MBits/s)
Errors:                 0
Checks:             11525
Transferred:          508
Elapsed time:  5h36m18.1s

It was very time-consuming because every time it stalled, it would stay locked for several minutes and then send the stuck files one by one.

Then I ran sync again and:

2018/04/27 22:20:00 DEBUG : S3 bucket [bkt_name] path [path]/: copied 2665 directories
2018/04/27 22:20:00 INFO  : Waiting for deletions to finish
2018/04/27 22:20:00 DEBUG : 
Transferred:      0 Bytes (0 Bits/s)
Errors:                 0
Checks:             11525
Transferred:            0
Elapsed time:     4m48.2s

Everything looks OK, but what does this “copied” mean?

copied 2665 directories

Then I ran sync again, and the same “copied” message appeared:

2018/04/27 22:29:15 DEBUG : S3 bucket [bkt_name] path [path]/: copied 2665 directories
2018/04/27 22:29:15 INFO  : Waiting for deletions to finish
2018/04/27 22:29:15 DEBUG : 
Transferred:      0 Bytes (0 Bits/s)
Errors:                 0
Checks:             11525
Transferred:            0
Elapsed time:     5m14.6s

And finally I executed sync without the --s3-disable-checksum option, and everything was OK:

2018/04/27 22:35:10 DEBUG : S3 bucket [bkt_name] path [path]/: copied 2665 directories
2018/04/27 22:35:10 INFO  : Waiting for deletions to finish
2018/04/27 22:35:10 DEBUG : 
Transferred:      0 Bytes (0 Bits/s)
Errors:                 0
Checks:             11525
Transferred:            0
Elapsed time:     5m23.9s

Now I’ll let the Dropbox client sync with my local files, and then I will run rclone sync from local to Wasabi again. I hope Dropbox doesn’t mess things up again …

Done. Everything is OK.

After all that, the question that remains is: why were the files getting stuck if the --s3-disable-checksum option was enabled?

Hmm, that is a bug! I’ve reverted that commit for the 1.41 release - we’ll fix it properly after.

It is inefficient rather than causing a problem so no need to worry.

Are you running this on a very low-powered system? That might explain some of the oddities.

Ok! Could you point me to the commit? I’m trying to “learn” Go a little…

Nope. It’s an Intel i5 7th gen. with 8GB of RAM (one of my home notebooks). All resource monitors show very low usage.

And it seems that the lack of the --s3-disable-checksum option only makes a difference on the first upload to a storage that came from Dropbox, for the reasons you explained above. After running the first sync with the option enabled, I can remove it and everything is OK.

I’m doing a test with another folder and exactly the same thing is happening: if I don’t use the option, when it reaches a large file, it hangs. It seems to be doing something, as if it were calculating the hash, but the disk and CPU show almost zero activity.

*   ...: 61% /701.884M, 108.812k/s, 5h34m52s
*   ...: 100% /6.954M, 20.985M/s, 0s
*   ...: 100% /28.745M, 0/s, 0s
*   ...: 60% /618.304M, 533.655k/s, 1h2m14s

This is the commit that got reverted: https://github.com/ncw/rclone/commit/0daced29db16ea0a76b4f9b0365716e7bbe1f252

I explained the problem with it in the PR.

OK

Sounds like we might be getting somewhere…

So it seems things work OK with the --s3-disable-checksum flag, but without it the upload seems very slow.

One difference here is that rclone has to generate the MD5SUM of the source file before uploading it, so it has to read the file twice: once to sum it and once to send it. rclone md5sum should be doing the same thing, but that isn’t very slow, is it?
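That extra read pass can be sketched in shell — a toy demo (file path and contents are made up), not rclone’s actual code:

```shell
# Sketch of the extra read pass done before an S3-style upload:
# the whole file is read once just to compute the MD5, then read
# again for the actual transfer.
demo=/tmp/md5demo.bin
printf 'hello world' > "$demo"

# First pass: hash only (what `rclone md5sum` reports).
md5sum "$demo"
# -> 5eb63bbbe01eeed093cb22bb8f5acdc3  /tmp/md5demo.bin
# Second pass: the upload itself reads the file again.
```

On a fast disk the extra pass is cheap, which is why the 8-second md5sum run below doesn’t explain multi-minute stalls.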

Rclone then turns that md5sum into a header which it sends to Wasabi with the initial upload. I wonder if the header is confusing Wasabi somehow…
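S3-style uploads typically carry the checksum as a Content-MD5 header: the base64 of the raw 16-byte digest, not of the hex string. A small sketch of deriving such a value (whether Wasabi receives exactly this header from rclone is an assumption here):

```shell
# Content-MD5-style value: base64 of the *binary* MD5 digest.
printf 'hello world' > /tmp/hdrdemo.bin
openssl md5 -binary /tmp/hdrdemo.bin | base64
# -> XrY7u+Ae7tCTyyK7j1rNww==
```

If a server mishandles that header, the symptom would appear on the upload side even though the hash itself was computed quickly.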

If it was calculating the hash then I’d expect the percentage to be 0%. Also, in the verbose log you sent me, rclone appeared to have got past the checksum calculation and had started to send the first two segments when it got stuck…

I’m going to run a little test to see if I can generate some random data which I can’t upload to wasabi…

Would it be possible for you to share one of the troublesome files with me?

I would say “better” but not “OK” :wink:. It still gets stuck, but after some time the file is sent, as in the “file4” example some posts above.

It gets stuck only on the first sync from local to Wasabi. Once that’s done I can execute sync with or without the flag and it runs quickly.

rclone md5sum computes the local hashes in 8 seconds:

72b29419deea15ab101d34e379290a74  Estilos de liderança.avi
1c7fe14c68bc3cc6c5b53380a5487c2e  Liderança e Equipes - Entrevista Steve Jobs - Trabalho MBA FGV Management.wmv.avi
ae32cb7152882b0123f018f9465c12d3  Marco Fabossi - Lição de Liderança e Trabalho em Equipe - Fomos Heróis.avi
1b53d609e9540d81515fd629ef5d2aa3  O Verdadeiro Lider.avi
9e5808de7b274de57b895e197359b29a  O melhor vídeo sobre liderança.avi
b8dfe49b83a6d2ca23dc3bd019827afd  Os estilos de liderança de Steve Jobs.avi
f7b1433d30464f9776367feb9d59e334  Vídeo sobre Liderança.avi

Elapsed time:        8.7s

I created a bucket on Wasabi for you to work with the files (with full access); feel free to run any tests. I put some copies of the problematic files there, and sent you an email with the links and descriptions.

See if you can access them, because I’m still a bit of a newbie with Wasabi. E.g.: is it possible to share a bucket or folder on Wasabi/S3? I couldn’t find out how, only how to share files directly.

Thank you. I managed to download the two files you linked in your email.

I had a go at uploading them with and without --s3-disable-checksum

So firstly without --s3-disable-checksum

Transferred:   1.289 GBytes (33.792 MBytes/s)
Errors:                 0
Checks:                 0
Transferred:            2
Elapsed time:         39s

Then with

Transferred:   1.289 GBytes (43.358 MBytes/s)
Errors:                 0
Checks:                 0
Transferred:            2
Elapsed time:       30.4s

Then a simulation of what happened with the dropbox -> wasabi transfer, then a sync from local -> wasabi

Transferred:   1.289 GBytes (34.291 MBytes/s)
Errors:                 0
Checks:                 2
Transferred:            2
Elapsed time:       38.4s

So the difference in time is ~8s which is entirely the md5sum time according to my tests…

Testing the MD5SUM times made me realise that rclone is calculating them in parallel. If you have lots of big files on a non-SSD disk then that can be hard work for your disk, with lots of seeking.
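That parallel hashing can be mimicked in shell by running several md5sum jobs at once, which is roughly what multiple checkers do to a spinning disk (demo files invented for illustration):

```shell
# Rough shell analogue of several checkers hashing concurrently:
# each md5sum runs as a background job, so a spinning disk must
# seek back and forth between the files.
mkdir -p /tmp/pardemo
printf 'hello'       > /tmp/pardemo/a
printf 'hello world' > /tmp/pardemo/b
for f in /tmp/pardemo/a /tmp/pardemo/b; do
  md5sum "$f" &   # hash in parallel, like --checkers N
done
wait               # join all hashing jobs
```

With large files the concurrent reads compete for the disk head, which is the motivation for the --transfers 1 --checkers 1 experiment below.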

Can you try just using --transfers 1 --checkers 1 and see how the sync goes with that?
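Put together, such a test run might look like this (remote name and paths are placeholders, not from this thread):

```shell
# Serialise everything: one transfer and one checker at a time,
# so the disk only ever streams a single file.
# "wasabi:" and the paths below are placeholder names.
rclone sync /path/to/local wasabi:bucket/path \
  --transfers 1 --checkers 1 -vv
```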

Running with --transfers 1 --checkers 1 seems more “consistent,” with better use of the upload connection, but still with delays:

With --transfers 1 --checkers 1 and WITHOUT --s3-disable-checksum:

2018/04/29 09:57:12 DEBUG : 
Transferred:    25 MBytes (454.450 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:     7m30.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 125.262M/s, 0s
 
2018/04/29 09:57:27 DEBUG : 
Transferred:    25 MBytes (439.811 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:     7m45.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 46.064M/s, 0s

2018/04/29 09:57:42 DEBUG : 
Transferred:    25 MBytes (426.086 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:      8m0.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 16.940M/s, 2s

2018/04/29 09:57:57 DEBUG : 
Transferred:    25 MBytes (413.191 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:     8m15.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 6.229M/s, 6s

2018/04/29 09:58:12 DEBUG : 
Transferred:    25 MBytes (401.050 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:     8m30.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 2.291M/s, 16s

2018/04/29 09:58:27 DEBUG : 
Transferred:   29.701 MBytes (462.876 kBits/s)
Errors:                 0
Checks:              1894
Transferred:            0
Elapsed time:     8m45.6s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 100% /29.701M, 2.269M/s, 0s

With --transfers 1 --checkers 1 --s3-disable-checksum:

Transferred:    25 MBytes (544.645 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:       6m16s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 163.632M/s, 0s

2018/04/29 10:05:37 DEBUG : 
Transferred:    25 MBytes (523.750 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:       6m31s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 60.175M/s, 0s

2018/04/29 10:05:52 DEBUG : 
Transferred:    25 MBytes (504.401 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:       6m46s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 22.129M/s, 1s

2018/04/29 10:06:07 DEBUG : 
Transferred:    25 MBytes (486.431 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:        7m1s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 8.138M/s, 4s

2018/04/29 10:06:22 DEBUG : 
Transferred:    25 MBytes (469.696 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:       7m16s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 84% /29.701M, 2.993M/s, 12s

2018/04/29 10:06:37 DEBUG : 
Transferred:   29.701 MBytes (539.464 kBits/s)
Errors:                 0
Checks:              2134
Transferred:            0
Elapsed time:       7m31s
Transferring:
 *   ...ndre_Experimetnacao-eletroeletronica.mpg: 100% /29.701M, 2.726M/s, 0s

From what I understand of the latest tests (mainly yours, plus my hash-calculation times), the calculation of the local hash (when enabled) doesn’t make much difference; it is fast. Since the results (the delays) are similar with and without the hash calculation, it really looks like something in the upload itself (the file, or the file + hash). So the hash calculation doesn’t seem to be the cause…

I decided to enable --dump responses and found something interesting: sometimes this error appears:

2018/04/29 10:40:59 DEBUG : Error: dial tcp 38.27.106.14:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

And it’s not an internet connection problem; the route looks fine:

  1     3 ms     3 ms     3 ms  192.168.25.1
  2    25 ms    43 ms    24 ms  179......
  3    23 ms    24 ms    23 ms  201......
  4    32 ms    33 ms    32 ms  152......
  5    38 ms    37 ms    36 ms  152......
  6    33 ms    37 ms    31 ms  te-0-0-0-6-0-4-grtriotw3.net.telefonicaglobalsolutions.com [216.184.112.28]
  7   144 ms   136 ms   138 ms  94.142.98.179
  8   141 ms   142 ms   140 ms  84.16.14.189
  9   138 ms   137 ms   136 ms  be3017.ccr21.mia03.atlas.cogentco.com [154.54.11.157]
 10   138 ms   135 ms   135 ms  be3400.ccr21.mia01.atlas.cogentco.com [154.54.47.17]
 11   159 ms   161 ms   158 ms  be3482.ccr41.atl01.atlas.cogentco.com [154.54.24.145]
 12   172 ms   172 ms   170 ms  be2112.ccr41.dca01.atlas.cogentco.com [154.54.7.157]
 13   160 ms   164 ms   175 ms  be3083.ccr41.iad02.atlas.cogentco.com [154.54.30.54]
 14   173 ms   170 ms   171 ms  te0-0-0-0.agr13.iad02.atlas.cogentco.com [154.54.43.61]
 15   159 ms   162 ms   164 ms  te0-0-1-0.nr13.b039343-0.iad02.atlas.cogentco.com [154.24.15.2]
 16   160 ms   160 ms   160 ms  38.142.220.242
 17   158 ms   160 ms   160 ms  38.27.106.14

Maybe a response problem on port 443?

Is this “connectex” similar to the windows ConnectEx function?

An additional simple question: what exactly does this “100%” mean? 100% transferred? 100% ready to transfer? Or something else?

* ...[file]: 100% /10.025M, 79/s, 0s

And what about when this percentage exceeds 100%? I guess it’s some rounding, since I think it only happens with very small files (< 3 MB).

* ...[file]: 107% /665, 0/s, 0s

OK, another thing crossed off the list :slight_smile:

Hmm

So what it looks like is that the wasabi servers didn’t accept your connection so rclone had to retry.

Do you see that every time?

BTW you can see that log with just -vv; you don’t need --dump responses as well.

That can be caused by

  • firewalling (either local or remote)
  • rate limiting at your ISP
  • problems at wasabi

It might be worth experimenting with the --timeout and --contimeout flags
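For example (values and remote names here are illustrative, not recommendations from this thread):

```shell
# Shorter timeouts make a dead connection fail and retry sooner.
# "wasabi:" and the paths below are placeholder names.
rclone sync /path/to/local wasabi:bucket/path \
  --contimeout 15s --timeout 1m -vv
```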

I haven’t checked the source, but I expect that is the Go runtime function which calls the ConnectEx function.

It means 100% read from the source. There may be buffering which means it isn’t 100% delivered to the destination yet.

That is probably a retry so rclone has had a problem sending a file and retried it.

No, just a few log entries, not many.

But the default for --contimeout is 1m. If it doesn’t connect in a minute, there is some other problem, and accepting a wait longer than 1m would be unproductive. Anyway, these connection errors seem to resolve themselves in 5~10s.

Well, I think we can end with these conclusions:

  1. When dealing with files that came from Dropbox, watch out for the hashes problem;
  2. The calculation of the local hash does not impact the synchronization time;
  3. The use of --transfers 1 can help if there are intermittent connection problems.

Do you think there is anything here that needs fixing in rclone?