Recommended options for "rclone mount" on encrypted GDrive for read-only, fully sequential access?

Hello folks,

I’ve successfully finished copying my first 8TB or so of local data to Google Drive (on my unlimited academic account) and now I’m in the process of verifying it.

This data is divided into volumes (locally, entire hard disks; remotely, top-level directories on GDrive). For each volume I have generated a local .md5 file (via the traditional “find … -type f -print0 | xargs md5sum” method), and I’m using it to check the corresponding remote GDrive volume by accessing it locally over an “rclone mount” and then running “md5sum -c” on it, like this:

mkdir egd
rclone --low-level-retries=100 mount --no-modtime --no-seek --read-only egd: egd &
(cd egd/VOLUME; md5sum -c /tmp/VOLUME.md5)
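
(For completeness, the local .md5 files themselves were generated roughly like the sketch below; /mnt/VOLUME is just a placeholder for the source disk:)

cd /mnt/VOLUME
find . -type f -print0 | xargs -0 md5sum > /tmp/VOLUME.md5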

This mostly works, but for about 0.01% of the files I get “Input/output error” failures, like these:

md5sum: ./REDACTED/REDACTED.bni: Input/output error
./REDACTED/REDACTED.bni: FAILED open or read

If I simultaneously check (on another interactive shell) that same file that gave the error above, using “rclone cat”, it always succeeds:

rclone cat egd:VOLUME/REDACTED/REDACTED.bni | md5sum -
     fc2751e43fb21aa12156322643c0491a  -

And, the checksum is correct:

grep REDACTED/REDACTED.bni /tmp/VOLUME.md5
     fc2751e43fb21aa12156322643c0491a  ./REDACTED/REDACTED.bni

So, it seems that whatever’s happening is some sort of “transient” problem with the rclone mount…

Are any of you seeing similar issues? What are the recommended options for “rclone mount” for such a use case, in order to minimize those errors? (I’m not especially concerned about speed here.)

Thanks,

Durval.

Did you use crypt when copying to Google Drive?

Hi Stokkes,

Did you use crypt when copying to Google Drive?

Yep. As stated in the topic’s title: “Recommended options for “rclone mount” on encrypted GDrive for read-only, fully sequential access?”

Cheers,

Durval.

@ncw would need to confirm but I don’t believe you can md5 a file on crypt.

You’d have to copy it to a local folder and run it there.
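
Something along these lines, for example (remote and paths are just placeholders):

rclone copy egd:VOLUME/path/to/file.bni /tmp/md5check/
cd /tmp/md5check && md5sum file.bni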

Hi Stokkes,

What you can’t do (yet) with encrypted remotes is use rclone’s built-in md5 support. But that’s not what I’m doing: I’m running the standard, non-rclone md5sum program on top of an “rclone mount”, so the whole thing is totally transparent both to md5sum (which, as far as it’s concerned, is reading “normal” files) and to “rclone mount” (which, as far as it’s concerned, is just serving file contents to a “normal” program).

See above; I definitely don’t think that’s the case.

To focus my question better: what I need to know is which “rclone mount” options I should be using to get the best possible stability for this kind of access (as stated: read-only, sequential access to large directory trees on encrypted remotes running on top of Google Drive).

Cheers,

Durval.

@durval

Just did a few checks… I often get the input/output error when doing an md5sum -c for files hosted on ACD or G-Drive

If I copy these files to the local drive first and then do an md5sum -c, they check out.

So I’m not sure if it’s crypt, rclone, the cloud providers or what, but here’s how I tested it.

find . -name '*.md5' -execdir md5sum -c {} \;

It then went through my folders and checked the md5s. I’d often see the I/O errors.

@durval

What would help is doing an rclone mount with the --verbose --log-file <file> flags and looking at the log when the error occurs.
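
For example, reusing your mount command (with /tmp/rclone-mount.log as an arbitrary log path):

rclone --low-level-retries=100 mount --no-modtime --no-seek --read-only --verbose --log-file /tmp/rclone-mount.log egd: egd &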

You can also try --debug-fuse and --dump-headers and may get more data that way too.

I’ve noticed that the files that generate the error still give me an error if I retry it.

Transient problems with remote storage systems are very common! I’d say you are doing very well with a 0.01% error rate.

Open should be retried low-level-retries times though.

Do you see low level retries in the log if you run with -v?
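
For example, something like this should show them (assuming the mount’s output is going to a log file such as /tmp/rclone-mount.log, as suggested above):

grep -c 'low level retry' /tmp/rclone-mount.log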

Hello Nick,

Just finished re-checking one of my volumes and there were 28 of these errors among 220424 files, so that’s about 0.0127% unreadable files.

That’s what’s bugging me: with a total of 100 low-level retries, I would have thought that all errors should have been corrected by the retry procedure…

Alas, that’s exactly the point of my original question. If “--low-level-retries=100” isn’t enough to stop all “high-level” transient errors of this nature (i.e., temporary errors affecting a program like “md5sum -c” running on top of an “rclone mount”), which option(s) should I use? It would be really nice to have a “fire-and-forget” procedure to periodically check my volumes, one that would report only serious, “permanent” errors. @ncw, can you offer some guidance in this regard?
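
For concreteness, the kind of “fire-and-forget” wrapper I have in mind would be roughly the sketch below (remote name and paths are just placeholders, and the sleep is a crude stand-in for properly waiting on the mount to come up):

#!/bin/sh
# check one volume: mount read-only, verify against its .md5 list, unmount
mkdir -p egd
rclone --low-level-retries=100 mount --no-modtime --no-seek --read-only egd: egd &
sleep 10
( cd egd/VOLUME && md5sum -c /tmp/VOLUME.md5 ) > /tmp/VOLUME.check 2>&1
fusermount -u egd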

Even without “-v”, “rclone mount” is dumping some (very few) errors to stderr (or is it stdout?):

2017/01/25 00:39:32 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.DAT: ReadFileHandle.Read error: low level retry 1/100: unexpected EOF

2017/01/25 00:39:32 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.DAT: ReadFileHandle.Read error: low level retry 2/100: unexpected EOF

These don’t seem to be coming from an open(), though… and they are followed immediately by an “md5sum: Input/output error” and then a “md5sum: FAILED open or read” message. Of the aforementioned 28 errors, 6 follow exactly this pattern, and always with exactly 2 “low level retry” and “unexpected EOF” messages as above.

The other 22 errors do not have the “low level retry” and “unexpected EOF” messages from “rclone mount”; they only have the “md5sum: Input/output error” and “md5sum: FAILED open or read” messages.

That said, I have avoided running “rclone mount” with the “-v” option during these verifications because of the sheer volume of output it would generate (this volume, for example, holds around 2TB in over 220K files).

But if you think it would be useful, I think I can run it again until the first error happens; beyond “-v”, would the “--debug-fuse” and “--dump-headers” options that @Stokkes suggested also be useful, or would they just clutter things up in that particular case?

@ncw, please give me some guidance on these options so I can proceed with a debug run.

Thanks,

Durval.

Yes that is my thought too.

However, I may have forgotten to retry something, which I think is more likely than 100 retries not clearing the problem up.

Let’s see if we can see those low level retries in the log.

Those are rclone retrying after the connection has dropped and re-opening the stream part-way through.

OK that is interesting. That must mean that the retry code isn’t working properly :frowning:

I’ll test some more…

Input/output error is what you’ll get when stuff goes wrong in a non-specific way…

Yes, that would be very useful, thank you - running with the -v flag and seeing what is happening with the files which give errors.

We may need --debug-fuse and --dump-headers depending on what we find, but I suggest you log without them to start with, as they make the logs unmanageable.

Hello @ncw,

You are most welcome; I’m starting it right away. It will probably be a couple of days (last time I checked that volume it took the best part of three days!), but I will get back here when I have a logfile with those errors.

Cheers,

Durval.

Hello @ncw,

I just checked it after about 12 hours, and it has already hit three of the “md5sum: […] Input/output” errors; I’ve checked the log file, and all three errors show only two low-level retry attempts. Moreover, the first two of those errors are logged with “unexpected EOF” messages from rclone, while the third has “read: connection reset by peer” messages.

Here’s a selected copy/paste from the lines around these errors:

1st error:

2017/01/26 03:03:10 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read OK
2017/01/26 03:03:10 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read size 131072 offset 4128768
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read error: low level retry 1/100: unexpected EOF
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.seek from 4128768 to 4128768 (io.Seeker)
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read io.Seeker failed: unexpected EOF
(... a lot of lines just like the previous two, with varying numbers on the "from (...) to (...)" part ...)
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read io.Seeker failed: unexpected EOF
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read size 4096 offset 4128768
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read error: low level retry 2/100: unexpected EOF
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.seek from 4128768 to 4128768 (io.Seeker)
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Read io.Seeker failed: unexpected EOF
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Flush
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Flush OK
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Release closing
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/READACTED.JPG: ReadFileHandle.Release OK
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED: Dir.Lookup
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED: Dir.Lookup OK
2017/01/26 03:03:29 REDACTED/REDACTED/REDACTED/REDACTED/REDACTED: Dir.Attr

2nd error:

The second error is very similar to the one above, so I will refrain from copying/pasting it here.

3rd error:

2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 131072 offset 32360693760
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read OK
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 131072 offset 32360824832
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read error: low level retry 1/100: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.seek from 32360824832 to 32360824832 (io.Seeker)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read io.Seeker failed: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 131072 offset 32360955904
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.seek from 32360824832 to 32360955904 (io.Seeker)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read io.Seeker failed: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 131072 offset 32361086976
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.seek from 32360824832 to 32361086976 (io.Seeker)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read io.Seeker failed: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 131072 offset 32361218048
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.seek from 32360824832 to 32361218048 (io.Seeker)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read io.Seeker failed: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read size 4096 offset 32360824832
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read error: low level retry 2/100: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.seek from 32360824832 to 32360824832 (io.Seeker)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Read io.Seeker failed: read tcp 172.17.0.2:58099->172.217.16.161:443: read: connection reset by peer
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Flush
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Flush OK
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Release closing
2017/01/26 06:57:36 HD_ARCHIVE_0009/2016050601: Dir.Lookup
2017/01/26 06:57:36 HD_ARCHIVE_0009: Re-reading directory (43m30.129110743s old)
2017/01/26 06:57:36 REDACTED/REDACTED/REDACTED.aes256: ReadFileHandle.Release OK
2017/01/26 06:57:36 Google drive root '': Reading "REDACTED/"
2017/01/26 06:57:37 gd: Saved new token in config file

So the above seems to indicate why “--low-level-retries=100” isn’t enough to fix these issues: in both cases, rclone seems to quit retrying the read after just two attempts.
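
(As a quick sanity check across the whole log, assuming it went to a file such as /tmp/rclone-mount.log, something like this should show how many retry attempts each failure actually got:)

grep 'low level retry' /tmp/rclone-mount.log | sed 's/.*low level retry \([0-9]*\)\/.*/\1/' | sort -n | uniq -c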

@ncw, I left the “md5sum -c” running, but the log file is already over 1.3GB… if you don’t need anything more from this run, please let me know so I can interrupt it and avoid shoving more than the necessary amount of entropy into the Universe :wink:

Cheers,

Durval.

Hello everyone,

Coming back after a long interval (lots of stuff happening in RL left me with no free time), I’ve returned to my effort to copy all my local data to my encrypted GDrive account and verify it.

As reported here, I’m running the same check as above (i.e., a complete “md5sum -c” over an “rclone mount” of my largest encrypted GDrive directory, with ~8M files and ~3.8TB), and so far (about 10% done) I have had no errors. Even better, the “rclone mount” process has reported a few “unexpected EOF” errors, but these are no longer causing any errors in md5sum.

This is very encouraging, and a strong indication that @ncw has managed to fix the original issue that the retry code had under these circumstances; for the record, I’m running v1.36 now.

I will come back and report further when that “md5sum -c” finishes running (should be a couple of days more).

Cheers,
Durval.