Hello everyone,
This is a continuation of my “md5sum -c on a very large tree” tribulations which I initially reported here; I decided to open a new topic as it seems this is a new (and potentially much more serious) problem.
That last “md5sum -c” never finished because I tried to update my remote copy with an “rclone sync”, which then erroneously deleted some files (see here and here if you are curious about the whole story).
After the sync issue was resolved and I had the updated copy of that tree on the remote, I started the “md5sum -c” again on an rclone mount of it, with the following options:
rclone mount egd: ~/egd --low-level-retries=1000 --dir-cache-time 10m --max-read-ahead 256k
I’m having two different kinds of issues:
- It is running very slowly: I did some projections and, at the current speed, it would take about 204 days(!) to finish. This seems really excessive, not only compared with my previous experience (when it took about a week), but also bandwidth-wise: the machine where I’m running this has about 5-6 MB/s (50-60 Mbps) of sustained bandwidth to Google Drive, as reported by single “rclone copy” commands, for example.
- I’m back to getting “read: connection reset by peer” errors reported by “rclone mount” (v1.37), and back to getting errors in my “md5sum -c” verification. These are not I/O errors, though, but rather silent (from the reading process’s point of view) data corruption, as follows:
Output from the above “rclone mount” process:
2017/08/21 17:45:07 ERROR : REDACTED/REDACTED_20170816Z163008.md5: ReadFileHandle.Read error: low level retry 1/1000: read tcp 172.17.0.2:52728->216.58.210.1:443: read: connection reset by peer
2017/08/22 01:09:13 ERROR : REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.tar.gz: ReadFileHandle.Read error: low level retry 1/1000: unexpected EOF
2017/08/22 02:44:43 ERROR : REDACTED/REDACTED_20170816Z163008.md5: ReadFileHandle.Read error: low level retry 1/1000: read tcp 172.17.0.2:52901->216.58.210.1:443: read: connection reset by peer
Interestingly enough, the first and last lines refer to the file containing the MD5 sums (which is being passed as an argument to “md5sum -c”); the second line refers to a completely unrelated file.
On the “md5sum -c” side, I have the following:
./REDACTED1/REDACTED2/REDACTED3/REDACTED4/REDACTED6/REDACTED7/REDACTED8/REDACTED9: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED14/REDACTED15: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED14/REDACTED16: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED13/REDACTED17: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED13/REDACTED18: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED13/REDACTED19: FAILED
./REDACTED1/REDACTED2/REDACTED3/REDACTED5/REDACTED10/REDACTED11/REDACTED12/REDACTED13/REDACTED20: FAILED
Curiously enough, none of the seven files reported above as FAILED (i.e., with wrong MD5 sums) is the file named in the second “rclone mount” error line (the “unexpected EOF” one), so these may well be separate, unrelated issues.
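If the lists get longer, the manual comparison above (does any FAILED file also appear in an “rclone mount” error line?) can be automated. The following is a minimal Python sketch, not an rclone tool: it assumes the mount log lines look like the ones quoted above (`... ERROR : <path>: ReadFileHandle.Read error: ...`), that md5sum prints `<path>: FAILED` (or `<path>: FAILED open or read`), and that “md5sum -c” was run from the root of the mount so both tools print paths relative to the same directory. The function names are hypothetical.

```python
import re

# Matches rclone mount read-error lines such as:
# 2017/08/22 01:09:13 ERROR : some/dir/file.tar.gz: ReadFileHandle.Read error: ...
RCLONE_ERR = re.compile(r"ERROR : (.+?): ReadFileHandle\.Read error:")

# Matches md5sum -c failure lines: "./path: FAILED" or "./path: FAILED open or read"
MD5_FAILED = re.compile(r"^(.*): FAILED(?: open or read)?$")


def rclone_error_paths(log_lines):
    """Return the set of paths named in rclone mount read-error lines."""
    paths = set()
    for line in log_lines:
        m = RCLONE_ERR.search(line)
        if m:
            paths.add(m.group(1))
    return paths


def md5sum_failed_paths(output_lines):
    """Return the set of paths md5sum -c reported as FAILED, minus any leading './'."""
    failed = set()
    for line in output_lines:
        m = MD5_FAILED.match(line.rstrip("\n"))
        if m:
            path = m.group(1)
            if path.startswith("./"):
                path = path[2:]
            failed.add(path)
    return failed


def overlap(log_lines, md5_lines):
    """Paths that both failed verification and hit an rclone read error."""
    return rclone_error_paths(log_lines) & md5sum_failed_paths(md5_lines)
```

Both arguments can be open file objects (e.g. the mount’s log and a saved copy of the md5sum output; those file names would be whatever you used). An empty result would support the reading that the FAILED files and the logged read errors are unrelated.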
Anyway, it seems that I’m (or rather, “md5sum -c” is) now getting corrupted data from the “rclone mount” reads, with no indication of I/O errors from it (which I think would be a must whenever rclone mount hits an uncorrectable situation). In other words, if another process were doing the reading instead of md5sum, I would be getting totally silent data corruption…
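One way to check whether the corruption happens in the read path rather than in the stored copy is to hash the same FAILED file several times through the mount and compare: more than one distinct digest means the mount returned different bytes on different reads. Below is a minimal Python sketch using only `hashlib`; the function names, the number of passes and the 1 MiB chunk size are arbitrary choices for illustration. One caveat (an assumption about the setup, not something verified here): repeated reads may be served from the kernel page cache, in which case identical digests prove less than differing ones.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so memory use stays flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, passes=3):
    """Hash the same file `passes` times and return every digest, in order."""
    return [md5_of(path) for _ in range(passes)]


def reads_are_consistent(path, passes=3):
    """True if every pass produced the same digest."""
    return len(set(repeated_digests(path, passes))) == 1
```

Comparing the resulting digest(s) with the entry for that file in the .md5 list then separates “reads are inconsistent” from “reads are consistent but wrong”.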
I could run “rclone mount” with “-v -v”, but that would generate an enormous amount of output (potentially more than the free disk space on the machine running it, and it would certainly make the whole thing even slower), so I would like to put that off as long as possible.
Thanks in advance for any hints,
– Durval.