Rclone sync: Stuck?

I am also seeing sync getting stuck using rclone v1.34-75-gcbfec0dβ copying up to acd. I get 20MBs/sec for a while then they start getting stuck or going very slow after 10 minutes

rclone -v sync --transfers 10 --checkers 20 amazon:/Plex/TVShows amazon_enc:/TVShows --stats 15s --timeout 1m

Transferred: 6.530 GBytes (8.555 MBytes/s)
Errors: 0
Checks: 446
Transferred: 30
Elapsed time: 13m1.7s
Transferring:

  •             Code Black/s01e01 - Pilot.mkv: 49% done, 276.967 kBytes/s, ETA: 43m55s
    
  •     Code Black/s01e02 - We Plug Holes.mkv: 55% done, 264.314 kBytes/s, ETA: 36m18s
    
  • Fear the Walking Dead/s01e03 - The Dog.mkv: 62% done, 474.868 kBytes/s, ETA: 15m20s
  • Code Black/s01e05 - Doctors with Borders.mkv: 16% done, 540.773 kBytes/s, ETA: 26m31s
  • Fear the Walking Dead/s02e08 - Grotesque.mkv: 21% done, 293.697 kBytes/s, ETA: 1h6m17s
  • … the Walking Dead/s02e09 - Los Muertos.mkv: 21% done, 262.024 kBytes/s, ETA: 1h2m33s
  • …/s01e06 - Take Her Out of the Ballgame.mkv: 31% done, 432.589 kBytes/s, ETA: 14m22s
  • …Bourdain- Parts Unknown/s05e02 - Miami.mkv: 17% done, 283.350 kBytes/s, ETA: 45m11s
  • …Daddy/s01e07 - May the Best Friend Win.mkv: 0% done, 0 Bytes/s, ETA: 0s
  • …ing Dead/s01e02 - So Close, Yet So Far.mkv: 66% done, 273.016 kBytes/s, ETA: 22m21s

vs

Transferred: 2.885 GBytes (24.269 MBytes/s)
Errors: 0
Checks: 411
Transferred: 19
Elapsed time: 2m1.7s
Transferring:

  •             Code Black/s01e01 - Pilot.mkv: 16% done, 3.449 MBytes/s, ETA: 5m40s
    
  •         Alone/s02e12 - Into the Abyss.mkv: 52% done, 3.444 MBytes/s, ETA: 2m29s
    
  •     Code Black/s01e02 - We Plug Holes.mkv: 23% done, 1.858 MBytes/s, ETA: 8m47s
    
  • Fear the Walking Dead/s01e03 - The Dog.mkv: 21% done, 1.189 MBytes/s, ETA: 12m35s
  • Baby Daddy/s01e08 - The Daddy Whisperer.mkv: 56% done, 2.496 MBytes/s, ETA: 1m44s
  • … the Walking Dead/s02e09 - Los Muertos.mkv: 0% done, 829.192 kBytes/s, ETA: 25m9s
  • …/s01e06 - Take Her Out of the Ballgame.mkv: 85% done, 3.047 MBytes/s, ETA: 25s
  • …Daddy/s01e07 - May the Best Friend Win.mkv: 0% done, 3.564 kBytes/s, ETA: 48h11m6s
  • …ing Dead/s01e02 - So Close, Yet So Far.mkv: 24% done, 2.455 MBytes/s, ETA: 5m32s
  • …rts Unknown/s01e05 - Morocco (Tangier).mkv: 99% done, 2.777 MBytes/s, ETA: 0s

Hmmm, that isn’t right. I just noticed you said maildir directories, so that is likely a huge number of very small files.

rclone has an idle IO timeout which should have kicked in (see --timeout parameter). Though your files seem to have been 100% uploaded…

Just a thought - did you compile rclone yourself with go 1.6 or earlier?

Thanks :slight_smile:

Your transfers all appear to be proceeding, just a bit slower. That might be network throttling by your ISP or Amazon - I don’t think it is something that rclone can do anything about though.

This is on a dedicated server with no throtletting so I doubt OVH is throttling me… Maybe Amazon is though… Here is the latest output. You will notice many jobs at 0b/s which is what happens when I leave it running for long periods. Maybe add an option to kill the job after x amount of time with no data???

2016/12/31 13:48:27
Transferred: 206.704 GBytes (5.456 MBytes/s)
Errors: 0
Checks: 0
Transferred: 659
Elapsed time: 10h46m32.9s
Transferring:

  •              Blindspot/s01e01 - Pilot.mp4: 52% done, 994.863 kBytes/s, ETA: 2m4s
    
  •         Alone/s01e06 - Rain of Terror.mkv: 36% done, 0 Bytes/s, ETA: 0s
    
  •       Code Black/s01e06 - In Extremis.mkv: 53% done, 1.075 MBytes/s, ETA: 7m32s
    
  •    Bull (2016)/s01e09 - Light My Fire.mkv:  0% done, 0 Bytes/s, ETA: 0s
    
  • Preacher/s01e05 - South Will Rise Again.mkv: 0% done, 0 Bytes/s, ETA: 0s
  • … the Walking Dead/s02e09 - Los Muertos.mkv: 65% done, 1.071 MBytes/s, ETA: 6m40s
  • …dspot/s02e03 - Hero Fears Imminent Rot.mkv: 80% done, 1.078 MBytes/s, ETA: 7m5s
  • …radise- After Paradise/s01e03 - Week 3.mp4: 1% done, 0 Bytes/s, ETA: 0s
  • …radise- After Paradise/s02e01 - Week 1.mkv: 0% done, 0 Bytes/s, ETA: 0s
  • American Horror Story/s03e12 - Go to Hell.mkv: 30% done, 0 Bytes/s, ETA: 0s

Yes. All totaled, there are 1565 directories and 528091 files. Most of them are in the 10s of kB range in size, but a few are much larger.

I downloaded it. I believe from the downloads page a day or so before I posted the initial issue. (http://downloads.rclone.org/rclone-v1.34-linux-386.zip)

Thanks.

rclone will close the connections after the time set in --timeout and try again if there is no traffic.

It might be that this is happening and the file is being retries - try with the -v flag - this will produce some logs which should shed some light.

I am also seeing this on the latest rclone from github. files are not tiny, but in the 20-40MB range.

edit: i have been watching it. two of the transfers stalled and just sit 0 Bytes/s. then a third stalled.

Transferred: 15.223 GBytes (6.031 MBytes/s)
Errors: 0
Checks: 522
Transferred: 347
Elapsed time: 43m4.7s
Transferring:

  • … #####: 30% done, 0 Bytes/s, ETA: 0s
  • … #####: 11% done, 0 Bytes/s, ETA: 0s
  • … #####: 7% done, 0 Bytes/s, ETA: 1776724h23m22.311929344s
  • … #####: 1% done, 1.746 MBytes/s, ETA: 26m17s

with $ rclone sync -v amazon:/ crypt:amazon1/ enabled I do see some errors, but not related to the specific stuck file:

2017/01/05 19:01:59 pacer: Rate limited, sleeping for 208.641749ms (1 consecutive low level retries)
2017/01/05 19:01:59 pacer: low level retry 1/1 (error HTTP code 429: “429 Too Many Requests”: response body: “{“logref”:“5509####0e6”,“message”:“Rate exceeded”,“code”:”"}")

and

Error detected after finished upload - waiting to see if object was uploaded correctly: HTTP code 500: “500 Internal Server Error”: response body: “{“message”:“Internal failure”}” (“500 Internal Server Error”)

then after a while, it stalls and they all just show as 0 Bytes/s, ETA: 0s with no errors.

Reviving the topic…

Syncing with Wasabi, thousands of files without problems:

  • Unchanged skipping
  • Updated modification time in destination
  • Copied (new) (few)

but it always stops in a folder with some videos (small ones, 10 to 100 Mb):

log file

I have already uploaded the files directly via web interface and rclone copy, but the sync command keeps trying to do something that I can’t identify.

Update: another job, another files, same problem, and with small files in separated folders.

Edit: seems similar to this: github.com/ncw/rclone/issues/1482

That is very strange…

Can you paste an rclone ls of the troublesome local directory 14.Treinamentos/14.1.Curso_Aberto_2014-11/. There is some strange encoding in those file names which is either cryptad making a mess of unicode or a possible clue.

I have a wasabi account so I can see if I can replicate the problem.

Ok, here they are:

rclone ls local

rclone ls wasabi

Yes, the files have accentuation in name, but a lot of others have too, and sync without problems.

I noticed that in sync log the characters appear oddly (mapped), but in ls they are correct.

sync log:

Transferring:
 *   ...iderança/Outros/Estilos de liderança.avi: 100% /22.872M, 2.168M/s, 0s

wasabi ls:

23982646 Material Didatico/Videos/1.Liderança/Outros/Estilos de liderança.avi

those rclone ls are identical (when I sorted them) so it looks like all the files are uploaded.

So why is rclone trying to upload them again?

I tried to replicate the problem with some files named the same as yours but I didn’t succeed.

I suspect a bug in rclone or wasabi is causing this!

Can you do the transfer with -vv --dump responses please? And send me the complete log. It will be quite big! If you want, zip it and email it to me nick@craig-wood.com

Thanks

Nick, I just sent you the log file.

I noticed that the difference between how the accent was displayed was - obviously - related to the encoding used:

Notepad++ - UTF-8:

Transferring:
 *   ...derança/Outros/Vídeo sobre Liderança.avi: 63% /39.642M, 1.806k/s, 18h26m50s

Baretail - ANSI:

 Transferring:
 *   ...derança/Outros/Vídeo sobre Liderança.avi: 63% /39.642M, 1.806k/s, 18h26m50s

And the active code page of windows is 850.

Thank you! Have received it and will analyse shortly :slight_smile:

Looking at the log I see that those files are found to have the wrong modification time locally vs remote.

Taking a specifc example 14.Treinamentos/14.1.Curso_Aberto_2014-11/Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi

  • src = local = 2018-01-25 06:18:56.580206 -0200 -02
  • dst = wasabi = 2014-10-20 09:57:24 -0200 -02

So what appears to have happened is that the local modtime has changed since the upload which is very strange.

What time does your OS think that file has? and if you rclone lsl it?

Looking at the transfers, rclone makes 2 upload chunks and tries to send them but it never receives a response to either PUT request indicating it didn’t finish.

This makes me wonder if there is something wrong on your local disk? Can you try copying that troublesome folder to another disk and see whether your OS can read all the files? Or try rclone md5sum on that directory and see if it finishes?

This seems to be related to the other date and hash issues I had recently with dropbox…

This day (January 25, 2018) was the “M” day, the day I tried to move a lot of folders using dropbox client and web interface, and I had several problems. Then I started to use only Rclone for this type of operation, but it was too late.

This sample (video) file that you show was placed in this folder in 2014 and has never been edited.

Windows (10) shows it as created in October 2014 and “modified” in January 2018:
file

All files in this folder (and others in the same group) have the same modification date (the date of the disastrous move in dropbox):

lsl local:

 79023404 2018-01-25 06:18:56.580206000 Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi

lsl wasabi:

79023404 2014-10-20 09:57:24.000000000 Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi

I noticed that several files have the truncated seconds (as you commented, a “feature” of dropbox) and others have ns. I’m guessing that these with the ns defined are those that Rclone was able to run the sync from the local machine:

   110340 2014-11-04 16:31:34.440382000 
   146733 2014-11-04 16:22:55.841501300 
   147924 2014-11-04 16:30:31.917920100 
   112145 2014-12-04 20:50:49.000000000 
   207440 2014-11-04 16:31:05.070549300 
   167505 2014-11-04 16:27:51.224881500 
 21691298 2014-11-11 11:25:06.124621200 
 27083854 2014-11-11 11:19:44.899320100 
 26652678 2014-11-11 11:16:35.000000000 
   136980 2014-11-04 14:57:28.000000000 
  1017344 2014-11-04 14:55:59.000000000 
   126361 2014-11-06 15:12:20.677059400 

md5sum local:

1b53d609e9540d81515fd629ef5d2aa3  Material Didatico/Videos/1.Liderança/Outros/O Verdadeiro Lider.avi	
9e5808de7b274de57b895e197359b29a  Material Didatico/Videos/1.Liderança/Outros/O melhor vídeo sobre liderança.avi	
b8dfe49b83a6d2ca23dc3bd019827afd  Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi
f7b1433d30464f9776367feb9d59e334  Material Didatico/Videos/1.Liderança/Outros/Vídeo sobre Liderança.avi

md5sum wasabi (!!!):

1b53d609e9540d81515fd629ef5d2aa3  Material Didatico/Videos/1.Liderança/Outros/O Verdadeiro Lider.avi
                                  Material Didatico/Videos/1.Liderança/Outros/O melhor vídeo sobre liderança.avi
                                  Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi
                                  Material Didatico/Videos/1.Liderança/Outros/Vídeo sobre Liderança.avi

Remembering that all these files were copied from dropbox to wasabi with Rclone in a cloud VM, and as dropbox is synchronized with my machine (via dropbox client), I don’t understand how the dates are different.

I ran some quick tests on disk and it didnt show any errors (it’s a new machine, just from January 2018). But I’ll do the sync test from an external hard drive to wasabi and post here.

From what I understand Rclone thinks the local file is “newer” (although the date has been changed by dropbox), and is trying to send it to wasabi, which for me ok, I don’t care about the dates changed for January 2018.

From what you described the problem seems to occur soon after the files are “broken” into chunks to be sent.

I think the problem is somehow related to the lack of md5sum…

See:

Some of the problematic files in the sync log:

Transferring:
 *   ...derança/Outros/Vídeo sobre Liderança.avi: 63% /39.642M, 1.806k/s, 18h26m50s
 *   ...e Sistema/Sistema aberto organização.avi: 80% /30.969M, 1.917k/s, 7h5m2s
 *   ...s estilos de liderança de Steve Jobs.avi: 33% /75.363M, 8.104k/s, 14h8m26s
 *   ...utros/O melhor vídeo sobre liderança.avi: 100% /19.287M, 5.095k/s, 0s

lsl wasabi (note the “0000000”, were not synced from local):

41567708 2014-10-13 15:56:40.000000000 Material Didatico/Videos/1.Liderança/Outros/Vídeo sobre Liderança.avi
32473812 2014-10-21 14:13:42.000000000 Material Didatico/Videos/3.Conceito de Sistema/Sistema aberto organização.avi 
79023404 2014-10-20 09:57:24.000000000 Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi
20223562 2014-10-13 15:12:13.000000000 Material Didatico/Videos/1.Liderança/Outros/O melhor vídeo sobre liderança.avi

md5sum local:

f7b1433d30464f9776367feb9d59e334  Material Didatico/Videos/1.Liderança/Outros/Vídeo sobre Liderança.avi
8f406ed23a672ffad09824925a18d329  Material Didatico/Videos/3.Conceito de Sistema/Sistema aberto organização.avi
9e5808de7b274de57b895e197359b29a  Material Didatico/Videos/1.Liderança/Outros/O melhor vídeo sobre liderança.avi
b8dfe49b83a6d2ca23dc3bd019827afd  Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi

md5sum wasabi:


                                  Material Didatico/Videos/1.Liderança/Outros/Vídeo sobre Liderança.avi
                                  Material Didatico/Videos/3.Conceito de Sistema/Sistema aberto organização.avi
								  Material Didatico/Videos/1.Liderança/Outros/Os estilos de liderança de Steve Jobs.avi
								  Material Didatico/Videos/1.Liderança/Outros/O melhor vídeo sobre liderança.avi

And there are a lot of other files without md5sum / hash…

On the other hand, see the sync log of the first file of the previous post, which has the wasabi hash:

sync log (note the log line numbers, the “100%” and the “M/s”):

Line 1894: 2018/04/26 15:32:32 DEBUG : 14.Treinamentos/14.1.Curso_Aberto_2014-11/Material Didatico/Videos/1.Liderança/Outros/O Verdadeiro Lider.avi: Modification times differ by -28453h17m47.2514224s: 2018-01-25 06:18:56.2514224 -0200 -02, 2014-10-27 17:01:09 -0200 -02
Line 2577:  *   ....Liderança/Outros/O Verdadeiro Lider.avi: 100% /8.913M, 54.571M/s, 0s
Line 2595:  *   ....Liderança/Outros/O Verdadeiro Lider.avi: 100% /8.913M, 20.068M/s, 0s
Line 2613:  *   ....Liderança/Outros/O Verdadeiro Lider.avi: 100% /8.913M, 7.380M/s, 0s
Line 2631:  *   ....Liderança/Outros/O Verdadeiro Lider.avi: 100% /8.913M, 2.714M/s, 0s
Line 2649:  *   ....Liderança/Outros/O Verdadeiro Lider.avi: 100% /8.913M, 1021.975k/s, 0s
Line 2651: 2018/04/26 15:39:25 INFO  : 14.Treinamentos/14.1.Curso_Aberto_2014-11/Material Didatico/Videos/1.Liderança/Outros/O Verdadeiro Lider.avi: Copied (replaced existing)

These files won’t have an md5 checksum because they were uploaded from dropbox which doesn’t support md5 checksums…

Large files are chunked for upload to S3 however chunked files don’t have an MD5SUM supplied by the S3 protocol. Instead rclone uploads a bit of metadata with the checksum in it. However rclone needs to know that at the start of the transfer so it needs to be supplied by the source - in this case dropbox.

So I think you’ll find any large file won’t have a checksum in your wasabi storage, where I think large is > 5MB.

So if those files got re-uploaded from your local disk then they will gain an md5sum at that point.

I think this explains your findings, but I don’t think it explains the problem :frowning:

If you could try to find something which reproduces the problem with a smaller dataset that would be great!

If you could try copying the troublesome subset of files to somewhere else then rclone sync them to a new bucket. You can then update the modification time and try syncing again. If you use the latest beta then you can use the --s3-disable-checksum which will upload them without an md5 checksum as if you had transferred them from dropbox.

I tried to replicate the problem locally like this but I didn’t succeed…

$ ls -l
total 77172
-rw-rw-r-- 1 ncw ncw 79023404 Apr 27 15:16 file # file the same size as one of the troublesome ones

$ rclone -vv --s3-disable-checksum copy . wasabi:test2 2>&1 | tee /tmp/wasabi1.log
2018/04/27 15:17:39 DEBUG : rclone: Version "v1.40-DEV" starting with parameters ["rclone" "-vv" "--s3-disable-checksum" "copy" "." "wasabi:test2"]
2018/04/27 15:17:39 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/04/27 15:17:39 INFO  : S3 bucket test2: Modify window is 1ns
2018/04/27 15:17:40 INFO  : S3 bucket test2: Waiting for checks to finish
2018/04/27 15:17:40 INFO  : S3 bucket test2: Waiting for transfers to finish
2018/04/27 15:17:44 INFO  : file: Copied (new)
2018/04/27 15:17:44 INFO  : 
Transferred:   75.363 MBytes (15.554 MBytes/s)
Errors:                 0
Checks:                 0
Transferred:            1
Elapsed time:        4.8s

2018/04/27 15:17:44 DEBUG : 8 go routines active
2018/04/27 15:17:44 DEBUG : rclone: Version "v1.40-DEV" finishing with parameters ["rclone" "-vv" "--s3-disable-checksum" "copy" "." "wasabi:test2"]
$ ls -l
total 77172
-rw-rw-r-- 1 ncw ncw 79023404 Apr 27 15:16 file

$ touch file 

$ rclone -vv --s3-disable-checksum copy . wasabi:test2 2>&1 | tee /tmp/wasabi2.log
2018/04/27 15:17:57 DEBUG : rclone: Version "v1.40-DEV" starting with parameters ["rclone" "-vv" "--s3-disable-checksum" "copy" "." "wasabi:test2"]
2018/04/27 15:17:57 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/04/27 15:17:57 INFO  : S3 bucket test2: Modify window is 1ns
2018/04/27 15:17:58 INFO  : S3 bucket test2: Waiting for checks to finish
2018/04/27 15:17:58 DEBUG : file: Modification times differ by -1m14.04817095s: 2018-04-27 15:17:51.442203976 +0100 BST, 2018-04-27 15:16:37.394033026 +0100 BST
2018/04/27 15:17:58 INFO  : S3 bucket test2: Waiting for transfers to finish
2018/04/27 15:18:03 INFO  : file: Copied (replaced existing)
2018/04/27 15:18:03 INFO  : 
Transferred:   75.363 MBytes (12.087 MBytes/s)
Errors:                 0
Checks:                 1
Transferred:            1
Elapsed time:        6.2s

2018/04/27 15:18:03 DEBUG : 7 go routines active
2018/04/27 15:18:03 DEBUG : rclone: Version "v1.40-DEV" finishing with parameters ["rclone" "-vv" "--s3-disable-checksum" "copy" "." "wasabi:test2"]

Agree…

I checked the logs of other jobs, including those that are working, and I also identified the problem with some small files:

 *   ...2018.02.18_20.13.50.7z: 100% /317.400k, 319/s, 0s
 *   ...2018.02.19_11.06.18.7z: 100% /148.916k, 907/s, 0s
 *   ...2018.02.22_10.23.27.7z: 100% /9.478k, 1.219k/s, 0s
 *   ...2018.02.22_19.11.00.7z: 101% /3.618k, 28.922k/s, 0s

I would rather fix the things, generating hash on wasabi. I don’t want to propagate the dropbox error … I read the #2213 issue.

But I don’t see how to separate only the troublesome files. There are thousands of files, several tens of Gb, various folders, etc.

I don’t know if I understood … how to do this local -> wasabi?

Is there any parameter to disable sending using chunks?

I’m running with the latest beta (rclone v1.40-132-g0daced29), let’s see what happens.

Stuck in the same video folder … :slightly_frowning_face:

I moved the folder out of the fileset and I’m running sync again. Let’s see …