Rclone sync sometimes hangs

What is the problem you are having with rclone?

Hello,

I use rclone to sync my HDD (which has multiple filesystems, if that matters) and upload my files.

I use megacmd to expose Mega as WebDAV, so the chain for rclone is: encryption -> WebDAV -> Mega.

But sometimes the sync hangs. Neither the transfer count nor the check count moves. I checked the rclone and megacmd logs, but it looks like it's just waiting.

# it just stays like that (I have already waited about 9h without any change)
Transferred:             0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:            249297 / 249297, 100%
Elapsed time:      4m49.7s

I pass the following options to my rclone sync command as environment variables.

# ~/.bash_profile
export RCLONE_PROGRESS=1
export RCLONE_TIMEOUT=0
export RCLONE_TRANSFERS=1
export RCLONE_IMMUTABLE=1

I disable the timeout because I sometimes upload very big files, and if I leave it enabled rclone can time out before megacmd has finished the upload.
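If I understand rclone's RCLONE_* environment variable mapping correctly, those variables should be equivalent to passing the flags directly, roughly like this (just a sketch of my setup):

# equivalent command-line flags (assuming the usual RCLONE_* -> --flag mapping)
rclone sync '/media/HDD/' 'crypt-with-salt-webdav:/HDD/' \
    --progress --timeout 0 --transfers 1 --immutable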

If you need more info or tests, let me know.

Thanks for your help.
Cheers

What is your rclone version (output from rclone version)

$ rclone version
rclone v1.53.4
- os/arch: linux/amd64
- go version: go1.15.6

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Debian GNU/Linux 9.9 (stretch), 64 bit (OpenMediaVault)

Which cloud storage system are you using? (eg Google Drive)

WebDAV via megacmd, which then uploads to Mega

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync '/media/HDD/' 'crypt-with-salt-webdav:/HDD/' -vvv --log-file 'HDD.log'

The rclone config contents with secrets removed.

[mega-webdav]
type = webdav
url = http://127.0.0.1:4443/xxx/directory
vendor = other

[crypt-with-salt-webdav]
type = crypt
remote = mega-webdav:/
password = x
password2 = x

A log from the command with the -vv flag

The log itself just shows that files are skipped or that a few files are immutable. I don't think it's very useful, but I saw that a kill -QUIT PID is usually requested for these issues, so I will provide that instead.
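For reference, this is roughly how I captured it while the sync was hung (the PID lookup via pidof is just one way of doing it):

# ask the hung rclone process to dump its goroutine stacks (printed to stderr) and exit
kill -QUIT "$(pidof rclone)"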

https://gist.github.com/kadogo/10f013267120adf0f6b8756db8e43f58

kill -QUIT is excellent - thanks :slight_smile:

I studied the log. It looks like it has locked up while traversing the directory structure of the mega webdav.

In particular it has locked up while doing IO on the directory listing.

It might be that setting timeout to non-zero would help here.

If you run this repeatedly - can you get it to lock up in the same way?

rclone size 'crypt-with-salt-webdav:/HDD/'
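Something like this would do it (just a sketch; the loop count and the log file names are arbitrary):

# run the listing a few times in a row, keeping a separate debug log per attempt
# (the "size-run-*.log" names are just examples)
for i in 1 2 3; do
    rclone size 'crypt-with-salt-webdav:/HDD/' -vv --log-file "size-run-$i.log"
done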

I don't know if you've tried the mega backend?

Hello,

You're welcome for the -QUIT; I had read that it's usually asked for, so I thought it would be easier to include it right away ^^

I can comment out the timeout if needed, but I think that if it's not set to 0 I will just get a timeout error.

I have started the size command. Do you want me to try it with the default timeout?
I see multiple lines like the ones below, but now there are no new lines in the log, so I guess it's locked.

2021/01/27 14:35:22 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2021/01/27 14:35:22 DEBUG : pacer: Reducing sleep to 15ms
2021/01/27 14:35:22 DEBUG : pacer: Reducing sleep to 11.25ms
2021/01/27 14:35:22 DEBUG : pacer: Reducing sleep to 10ms
2021/01/27 14:35:26 DEBUG : pacer: low level retry 1/10 (error Propfind "http://127.0.0.1:4443/xxx/directory/encrypted-path/": http: server closed idle connection)
2021/01/27 14:35:26 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2021/01/27 14:35:26 DEBUG : pacer: Reducing sleep to 15ms
2021/01/27 14:35:26 DEBUG : pacer: Reducing sleep to 11.25ms
2021/01/27 14:35:26 DEBUG : pacer: Reducing sleep to 10ms

I started with the "normal" mega backend in rclone, but I began having connection issues and saw on GitHub that it was better to use megacmd to avoid them. I can give the mega backend another try if that can help find the issue.

OK

Yes if it didn't finish then it has locked up.

This is suspicious and makes me think megacmd is having trouble with concurrent connections.

Can you try the rclone size command with --checkers 1 - maybe it doesn't like the concurrency rclone uses.

If that works, then try your copy with --transfers 1 --checkers 1 --check-first which will minimise the concurrency rclone uses.
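For example, based on your original sync command, something like:

# minimise concurrency: one transfer, one checker, all checks done before any transfers
rclone sync '/media/HDD/' 'crypt-with-salt-webdav:/HDD/' \
    --transfers 1 --checkers 1 --check-first -vvv --log-file 'HDD.log'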

Concurrency is hard and C++ (the language megacmd is written in) doesn't make it easy!

I think the issue is probably in the megacmd or in the way rclone is talking to it, so using the mega backend probably won't help to find the problem.

I have disabled my timeout and run the command with --checkers 1.

It looks like it's not moving; it hasn't done much, I think, but I see the message below for one file, and it looks like it only processed 1 file as well. I'll let it run for now; tell me if you want me to do a kill -QUIT.

Bad size for decrypt: file is too short to be encrypted

Edit:

About transfers: I already use only 1, because megacmd uploads files one at a time, so I do that so rclone doesn't go too fast. It makes the transfer slower, but it's not a big deal.

Hmm, sounds like it has locked up again.

Try adding --tpslimit 1 which will slow everything right down but we'll see if megacmd can take it.

If that works try --tpslimit 10

It's similar: I get 2 "fs cache" lines and 2 "Bad size for decrypt: file is too short to be encrypted" messages for the same file.

I did the quit / start of mega-cmd with mega-log set to debug and I don't see anything in the logs. I think that if I launch the real sync command I will see more info.

I couldn't find the edit button again, but I ran the command again yesterday and let it run.

$ rclone size --tpslimit 1 --checkers 1 'crypt-with-salt-webdav:/HDD/' -vvv
2021/01/28 01:42:26 DEBUG : rclone: Version "v1.53.4" starting with parameters ["rclone" "size" "--tpslimit" "1" "--checkers" "1" "crypt-with-salt-webdav:/HDD/" "-vvv"]
2021/01/28 01:42:26 DEBUG : Using config file from "/home/user/.config/rclone/rclone.conf"     
2021/01/28 01:42:26 INFO  : Starting HTTP transaction limiter: max 1 transactions/s with burst 1                                   
2021/01/28 01:42:26 DEBUG : Creating backend with remote "crypt-with-salt-webdav:/HDD/"                             
2021/01/28 01:42:27 DEBUG : Creating backend with remote "mega-webdav:/encrypted-path"
2021/01/28 01:42:27 DEBUG : fs cache: renaming cache item "mega-webdav:/encrypted-path" to be canonical "mega-webdav:encrypted-path"                                           
2021/01/28 01:42:27 DEBUG : fs cache: switching user supplied name "mega-webdav:/encrypted-path" for canonical name "mega-webdav:encrypted-path"                                    
2021-01-28 01:42:28 DEBUG : BOOTSECT.BAK: Bad size for decrypt: file is too short to be encrypted                                 
2021-01-28 01:42:28 DEBUG : BOOTSECT.BAK: Bad size for decrypt: file is too short to be encrypted                             
2021-01-28 02:00:16 DEBUG : pacer: low level retry 1/10 (error read tcp 127.0.0.1:52830->127.0.0.1:4443: i/o timeout)                 
2021-01-28 02:00:16 DEBUG : pacer: Rate limited, increasing sleep to 20ms                                                      
2021-01-28 02:05:17 DEBUG : pacer: low level retry 2/10 (error read tcp 127.0.0.1:52840->127.0.0.1:4443: i/o timeout)          
2021-01-28 02:05:17 DEBUG : pacer: Rate limited, increasing sleep to 40ms                                                           
2021-01-28 02:10:18 DEBUG : pacer: low level retry 3/10 (error read tcp 127.0.0.1:52852->127.0.0.1:4443: i/o timeout)           
2021-01-28 02:10:18 DEBUG : pacer: Rate limited, increasing sleep to 80ms                                                                                            
2021-01-28 02:15:18 DEBUG : pacer: low level retry 4/10 (error read tcp 127.0.0.1:52864->127.0.0.1:4443: i/o timeout)          
2021-01-28 02:15:18 DEBUG : pacer: Rate limited, increasing sleep to 160ms                              
2021-01-28 06:12:17 DEBUG : pacer: low level retry 1/10 (error read tcp 127.0.0.1:49372->127.0.0.1:4443: i/o timeout)
2021-01-28 06:12:17 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2021-01-28 06:17:17 DEBUG : pacer: low level retry 2/10 (error read tcp 127.0.0.1:49384->127.0.0.1:4443: i/o timeout)
2021-01-28 06:17:17 DEBUG : pacer: Rate limited, increasing sleep to 40ms
2021-01-28 06:22:18 DEBUG : pacer: low level retry 3/10 (error read tcp 127.0.0.1:49396->127.0.0.1:4443: i/o timeout)
2021-01-28 06:22:18 DEBUG : pacer: Rate limited, increasing sleep to 80ms
2021-01-28 06:27:18 DEBUG : pacer: low level retry 4/10 (error read tcp 127.0.0.1:49408->127.0.0.1:4443: i/o timeout)
2021-01-28 06:27:18 DEBUG : pacer: Rate limited, increasing sleep to 160ms
2021-01-28 06:32:19 DEBUG : pacer: low level retry 5/10 (error read tcp 127.0.0.1:49420->127.0.0.1:4443: i/o timeout)
2021-01-28 06:32:19 DEBUG : pacer: Rate limited, increasing sleep to 320ms
2021-01-28 06:37:19 DEBUG : pacer: low level retry 6/10 (error read tcp 127.0.0.1:49432->127.0.0.1:4443: i/o timeout)
2021-01-28 06:37:19 DEBUG : pacer: Rate limited, increasing sleep to 640ms
2021-01-28 06:42:20 DEBUG : pacer: low level retry 7/10 (error read tcp 127.0.0.1:49444->127.0.0.1:4443: i/o timeout)
2021-01-28 06:42:20 DEBUG : pacer: Rate limited, increasing sleep to 1.28s
2021-01-28 06:47:20 DEBUG : pacer: low level retry 8/10 (error read tcp 127.0.0.1:49456->127.0.0.1:4443: i/o timeout)
2021-01-28 06:47:20 DEBUG : pacer: Rate limited, increasing sleep to 2s
2021-01-28 06:52:21 DEBUG : pacer: low level retry 9/10 (error read tcp 127.0.0.1:49468->127.0.0.1:4443: i/o timeout)
2021-01-28 06:57:21 DEBUG : pacer: low level retry 10/10 (error read tcp 127.0.0.1:49480->127.0.0.1:4443: i/o timeout)
2021-01-28 06:57:21 ERROR : path/in/clear: error listing: couldn't list files: read tcp 127.0.0.1:49480->127.0.0.1:4443: i/o timeout
2021-01-28 06:57:21 DEBUG : pacer: Reducing sleep to 1.5s
2021-01-28 06:57:23 DEBUG : pacer: Reducing sleep to 1.125s
2021-01-28 06:57:25 DEBUG : pacer: Reducing sleep to 843.75ms
2021-01-28 06:57:26 DEBUG : pacer: Reducing sleep to 632.8125ms
2021-01-28 06:57:27 DEBUG : pacer: Reducing sleep to 474.609375ms
2021-01-28 06:57:28 DEBUG : pacer: Reducing sleep to 355.957031ms
2021-01-28 06:57:29 DEBUG : pacer: Reducing sleep to 266.967773ms
2021-01-28 06:57:30 DEBUG : pacer: Reducing sleep to 200.225829ms
2021-01-28 06:57:31 DEBUG : pacer: Reducing sleep to 150.169371ms
2021-01-28 06:57:32 DEBUG : pacer: Reducing sleep to 112.627028ms
2021-01-28 06:57:33 DEBUG : pacer: Reducing sleep to 84.470271ms
2021-01-28 06:57:34 DEBUG : pacer: Reducing sleep to 63.352703ms
2021-01-28 06:57:35 DEBUG : pacer: Reducing sleep to 47.514527ms
2021-01-28 06:57:36 DEBUG : pacer: Reducing sleep to 35.635895ms
2021-01-28 06:57:37 DEBUG : pacer: Reducing sleep to 26.726921ms
2021-01-28 06:57:38 DEBUG : pacer: Reducing sleep to 20.04519ms
2021-01-28 06:57:39 DEBUG : pacer: Reducing sleep to 15.033892ms
2021-01-28 06:57:40 DEBUG : pacer: Reducing sleep to 11.275419ms
2021-01-28 06:57:41 DEBUG : pacer: Reducing sleep to 10ms
Transferred:             0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 3 (retrying may help)
Elapsed time:   6h24m16.6s
2021/01/28 08:06:43 DEBUG : 2 go routines active
2021/01/28 08:06:43 Failed to size with 3 errors: last error was: couldn't list files: read tcp 127.0.0.1:49480->127.0.0.1:4443: i/o timeout

This looks like it is the megacmd not responding to requests, or possibly the upstream mega not responding to megacmd...

Did you try upgrading megacmd?

It looks like I have the latest version:

apt update
# apt-cache policy megacmd
megacmd:
  Installed: 1.4.0-3.1
  Candidate: 1.4.0-3.1
 Version table:
 *** 1.4.0-3.1 500
        500 https://mega.nz/linux/MEGAsync/Debian_9.0 ./ Packages
        100 /var/lib/dpkg/status

And there is no log in megacmd; maybe I can try again and set the SDK to debug too

$ mega-log
MEGAcmd log level = INFO
SDK log level = ERROR

There is only one commit to master since then. I think they must be doing development somewhere other than GitHub, or in a branch I haven't found!

Might be worth a try

I just saw it was set to info, not debug.

I changed the 2 levels and ran the command again; I will post an update when it times out or finishes.
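For reference, the way I raised them was roughly as follows (if I read mega-log's help correctly; the exact syntax is my best guess):

# raise both the MEGAcmd and SDK log levels to DEBUG (syntax is my reading of the mega-log help)
mega-log -c DEBUG
mega-log -s DEBUG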

Edit:

It hasn't timed out or finished yet, but I see something that can maybe help

2021/01/28 17:30:29 DEBUG : pacer: low level retry 3/10 (error read tcp 127.0.0.1:60634->127.0.0.1:4443: i/o timeout)
2021/01/28 17:30:29 DEBUG : pacer: Rate limited, increasing sleep to 80ms
[API:debug: 16:30:30] Not enough available space

I guess Mega maybe uses another "timezone", but that was the last line in my log ^^

Now that's a bit weird, because the HDD where Mega's data lives has 31G left. It looks like this will be a question for megacmd, or maybe you have already seen this message somewhere ^^


I haven't checked the command again yet, but I checked my Mega account and I still have free space, so I don't really know the meaning of "Not enough available space".

That is rclone getting errors, and retrying.

Hmm, I wonder if it is using the temporary directory? Could that have got full?

That's possible. I should say it's on my NAS, which runs OpenMediaVault from a USB stick, so everything related to log/tmp/etc. lives in RAM, and I have something like 2 GB for tmp.
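A quick way to check whether that RAM-backed tmp is filling up while the command runs (just plain df, nothing fancy):

# check free space on the RAM-backed tmp while the sync/size command is running
df -h /tmp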

I have sent a message to mega about the API error in case they can help.

Edit:

I tried to bind my tmp to another folder where I have space, and when I saw the API error I took a look at df and nothing was really short. Maybe the root has only 200 MB left, but I don't think that's related.

I'm still waiting on Mega's answer; their first reply was the classic one about not using third-party tools, so I explained again that the error comes from megacmd.


Mega's first real answer was that maybe it was related to local storage, so just in case I tested on my desktop, which has about 100G of free space, and got the same problem; now I'm waiting to see if they find something.

For the follow-up they advised me to create an issue, so it's here: https://github.com/meganz/MEGAcmd/issues/494

It might be worth setting the environment variable TMPDIR, which is the Unix standard way of pointing to your temporary directory, and seeing if that helps.
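Something like this before starting megacmd / the webdav server (the path is just an example; pick one with plenty of free space):

# point temporary files at a directory with plenty of space (example path)
export TMPDIR=/media/HDD/tmp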

I tried with export TMPDIR=path before launching mega-webdav and got the same issue; I really think it's something in their API.


While waiting for megacmd to answer (and I think they can be slow)...

Is there maybe a way to use the mega backend but have it work like a daemon, to avoid the multiple-connection issue?