Transport connection broken

Hmm, if using that flag then we could potentially limit the read from that file to the initial size which would fix the body transfer mis-match.

If we disable checksums, the http connection errors disappear and it acts like other remotes.

rclone copy auth.log b2:xx/ -vv --b2-disable-checksum

I want to try to get a summary of this issue. I found by using --b2-disable-checksum I'm still seeing transport connection broken errors. In fact, my last run (which unfortunately included only 1x -v) ran around 90 minutes --the majority of that time appear to be retries on six files (see below).

I cannot exclude /var/log --it's a requirement in this use case. Also, this solution must involve as few moving parts as possible, so I can't use rsnapshot here. The absence of repository and database, the very simple configuration is what made rclone the winner in a very long evaluation.

So I'd like to get some closure on this. Is this the expected behavior of rclone? Is there anything else I can try? And just to add a few more possibilities, I found this to be similiar https://github.com/rclone/rclone/issues/1733 And here is the results of my last run:

rclone -v --b2-disable-checksum --local-no-check-updated --bwlimit 10M --exclude-from /usr/local/etc/rclone.excludes --fast-list --transfers 32 sync /var crypt-sys-var:

https://pastebin.com/gJ1HQfFF

I'm curious. You're looking to backup files being written to with rclone? Or are you looking to let rclone continue by skipping them? I.e. /var/log/messages for example.

Ideally I'd like to see the files backed up.

Ideally.. Or is that a requirement.

I'm asking because that will drive your solution. Rclone won't make consistent backups. It's not a system backup tool. If a file is being written to it would either need to retry to get a consistent file or skip and move on in it's current state.

You said you can't exclude /var/log so I'm trying to understand your requirement.

Rclone is currently skipping those files despite the errors your seeing so it's doing what it's supposed to. The only thing I don't see happening correctly are the ugly errors and the flag local-no-check-updated might not be working correctly.

You could reduce the retries and let it fall faster. You could exclude those files manually. The flag could be fixed to upload those files dirty/inconsistent (only read up to the initial length) which might be the best option if that's possible.

These would help you key it fail faster.
--low-level-retries
--retries

I'll just explain exactly what is causing the error

2020/05/19 15:40:40 ERROR : named/data/named.run: Failed to copy: Post https://pod-000-1096-11.backblaze.com/b2api/v1/b2_upload_file/04d2559e20292587782d0913/c002_v0001096_t0017: net/http: HTTP/1.x transport connection broken: http: ContentLength=1242493 with Body length 1246054

What is happening here is that the local backend opens the source file. At the time of opening it has a size of 1242493 bytes.

The b2 backend then has to upload this object. It reads the size from the object and tells b2 that it is about to upload an object of that size in the Content-Length header.

Meanwhile the underlying file has grown - having had more data appended.

The b2 uploader then reads to the end of the file which is now 1246054 bytes long.

The Go HTTP library spots this and says - "hey you said you were going to give me 1242493 bytes but you actually gave me 1246054 bytes - I'm killing this connection as something is obviously wrong".

This only happens when using the --local-no-check-updated flag. Without that flag the upload will fail part way through with a different error (or it may fail as above because the checking isn't perfect).

So to fix this we could do one of

  1. In the local backend if --local-no-check-updated is in effect then only allow as many bytes as were initially specified to be read
  2. In the b2 backend, only read the initial size worth of bytes, not to the end of the file

My preference is probably option 1.

I tried this... It works but gives the next error

2020/05/20 08:57:33 ERROR : updating: Failed to copy: Checksum did not match data received (400 bad_request)

This happens because we read the cheksum before we start the transfer and it has changed after.

I tried various ways of working around this but b2 is just too fussy about needing the correct size of data with the correct checksum (that is a good thing really :slight_smile:

So I guess what we need is for --local-no-check-updated to freeze the size of files once they have been accessed at all. I had a go at this here

https://beta.rclone.org/branch/v1.51.0-369-g095dffd8-fix-local-updated-beta/ (uploaded in 15-30 mins)

Can you give it a go? It worked for me :slight_smile:

2 Likes

This is looking fantastic so far. I've done two runs now with the update you've just provided and I'm no longer having the trouble reported earlier in this thread. Also, thanks for the excellent explanation of the issue.

Just for the record, this is how I did my most recent test:

# rclone purge crypt-sys-var:log
# ./rclone-linux-386 -vv --local-no-check-updated --bwlimit 10M \
   --exclude-from /usr/local/etc/rclone.excludes --fast-list --transfers 32 \
   sync /var crypt-sys-var: > /tmp/rclone.debug 2>&1

A minor issue which got in the way is the 64 bit binary was unhappy:


# /bin/rclone -V
rclone v1.51.0
- os/arch: linux/amd64
- go version: go1.13.7

# ./rclone-linux-386 -V
rclone v1.51.0-DEV
- os/arch: linux/386
- go version: go1.13.4

# ./rclone-linux-amd64 -V
./rclone-linux-amd64: error while loading shared libraries: libfuse.so.2: 
 cannot open shared object file: No such file or directory

# ls -l /usr/lib/libfuse.so.2
lrwxrwxrwx. 1 root root 16 May 20 07:03 /usr/lib/libfuse.so.2 -> libfuse.so.2.9.2*

# export LD_LIBRARY_PATH=/usr/lib
# ./rclone-linux-amd64 -V
./rclone-linux-amd64: error while loading shared libraries: libfuse.so.2: wrong 
 ELF class: ELFCLASS32

Also, this is most likely the result of me using the 32-bit version (as noted in my previous post). After ~30minutes of processing the v1.51.0-DEV 386 version crashes. Here's a snippet of the error printed at that time: https://pastebin.com/DDYA3K5H

Is there something I can do to get the 64 bit version working or is this a build issue?

Do you see a linux 64 bit build from the URL above? I was going to see but I don't see a linux build there at all.

Ah. I had to go up the ladder and over the roof to here: https://beta.rclone.org/branch/v1.51.0-369-g095dffd8-fix-local-updated-beta/testbuilds/

So specifically, that build works for me in testing:

https://beta.rclone.org/branch/v1.51.0-369-g095dffd8-fix-local-updated-beta/testbuilds/rclone-linux-amd64.gz

and

felix@gemini:~$ ./rclone-linux-amd64 -V
rclone v1.51.0-DEV
- os/arch: linux/amd64
- go version: go1.13.4

What do you see from this?

So my ldd on it:

felix@gemini:~$ ldd rclone-linux-amd64
	linux-vdso.so.1 (0x00007ffd499fa000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f410a1bb000)
	libfuse.so.2 => /lib/x86_64-linux-gnu/libfuse.so.2 (0x00007f410a17c000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f410a159000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4109f67000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f410a1d0000)

as I'm on Ubuntu.

felix@gemini:~$ ls -al /lib/x86_64-linux-gnu/libfuse.so.2
lrwxrwxrwx 1 root root 16 Mar  7 06:09 /lib/x86_64-linux-gnu/libfuse.so.2 -> libfuse.so.2.9.9
felix@gemini:~$

Oh thanks! I totally forgot about ldd This sheds some light on the issue. I won't be able to dig deeper for a couple more hours.

ldd rclone-linux-amd64
	linux-vdso.so.1 =>  (0x00007ffd4833a000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c1e218000)
	libfuse.so.2 => not found
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7c1dffc000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f7c1dc2f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7c1e41c000)

OK for Centos7 this is the one: yum install fuse-libs.x86_64

Since getting the 64-bit build working my most recent run has been going for 90 minutes without issue.

You should be using this binary: https://beta.rclone.org/branch/v1.51.0-369-g095dffd8-fix-local-updated-beta/testbuilds/rclone-linux-amd64.gz

The ones in testbuilds are indeed test builds!

Let me know how your tests go.

Hello! I found the branch relevant to this issue but unsure if it made it into a recent release. Can someone enlighten me as to how to confirm? I'm still a git rookie.

https://github.com/rclone/rclone/tree/fix-local-updated

And thanks again for the attention on the issue I raised. I've been using the test binary and haven't had any issues (https://beta.rclone.org/branch/v1.51.0-369-g095dffd8-fix-local-updated-beta/testbuilds/rclone-linux-amd64.gz)

It didn't!

I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.53

Thanks for pointing this out!