I'm glad to hear it, and I hope it continues to work well for you. Just be aware that the ox in the slaughterhouse line, watching another ox two or three places ahead of it get felled by the hammer, could be thinking just about the same thing, i.e. "nothing has happened to me so far, so no reason to worry!"
The crypt format is broken up into 64k chunks and I have a version of rclone which will write zeroes for the chunks with errors, but otherwise carry on, so we could investigate how much of the file is corrupted if you want.
If it is something like a video file then a 64k chunk lost will cause a minor visual artifact. If it is something less resilient then a 64k chunk missing might make the whole file useless.
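Once you have such a zero-filled copy, something like this could give a rough idea of how much was lost, by counting the 64k blocks that came back entirely zeroed (just a sketch -- recovered.bin is a placeholder name, GNU split is assumed, and any block that was genuinely all zeroes in the original will also be counted):
# md5 of 64 KiB of zeroes, i.e. the signature of a blanked-out block
zerohash=$(head -c 65536 /dev/zero | md5sum | cut -d' ' -f1)
# hash the file in 64 KiB chunks and count how many are all zeroes
split -b 65536 --filter='md5sum' recovered.bin | grep -c "$zerohash"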
So they are sized from ~9.5MB all the way to ~10GB, with about half of them over 1GB.
Yeah, it would be nice to have a look at these files and try to understand what happened. For all of them so far I have local copies, so I can cmp -b the F'ed version against the good one and have an idea as to the extent/location of the damage.
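Something along these lines is what I have in mind, e.g. with cmp -l, which lists every differing byte (file names below are just examples):
# one line per differing byte: offset plus the two byte values (in octal)
cmp -l good/FILE.EXT corrupted/FILE.EXT > diff.txt
wc -l < diff.txt                          # total number of differing bytes
head -n1 diff.txt ; tail -n1 diff.txt     # first and last differing offsets, i.e. the damaged span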
If it is something like a video file then a 64k chunk lost will cause a minor visual artifact. If it is something less resilient then a 64k chunk missing might make the whole file useless.
Here's a | sort | uniq -c list of their extensions:
1 avi
2 gz
1 JPG
3 mkv
5 MOV
1 mp4
1 pdf
6 slob
1 vmdk
So, 10 of them are video files which should be mostly viewable (as long as the error areas don't include any critical headers or markers within the files), and the others are less resilient formats which would be completely unusable if I didn't have local copies for them.
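(For reference, the tally above can be reproduced with something like the following, where corrupted-files.txt is a hypothetical file holding one affected path per line:)
# keep only what follows the last dot, then count extensions
sed 's/.*\.//' corrupted-files.txt | sort | uniq -c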
I will be sure to keep this topic posted!
And BTW, thanks again for all your great help, and for making rclone available in the first place.
I just ^C'ed stressapptest after a little over 67h running, and the result was:
^CLog: User exiting early (1409823927 seconds remaining)
Stats: Found 0 hardware incidents
Stats: Completed: 5766200320.00M in 241479.67s 23878.62MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 5766200320.00M at 23878.65MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s
Status: PASS - please verify no corrected errors
So, I think it's reasonably safe to assume the current machine isn't experiencing any sort of memory corruption.
This has a new flag --crypt-pass-bad-blocks which will output blocks which couldn't be authenticated as 64k of 0s.
It's possible, if bytes have been added to or removed from the file, that it will output an endless stream of errors, but hopefully it will just be a small number of blocks.
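For example, something like this should pull down a damaged file with the bad blocks written out as zeroes (a sketch only -- the remote, path and destination are placeholders):
rclone copyto --crypt-pass-bad-blocks secret:videos/damaged.mkv /tmp/damaged.mkv
The zero-block counting trick from earlier in the thread will then give a rough idea of how much was blanked out.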
My 1st run, on the smallest of those files, doesn't look very promising: I got the "crypt: ignoring: failed to authenticate decrypted block - bad password?" message 49 times in a row -- which IIUC means 49 * 64K = ~3.1MB of the file is lost... and on a ~9.5MB file that doesn't look like the "small number of blocks" hypothesis is going to hold.
Will do more tests when able (treading carefully because I'm still waiting for that month-long rclone md5sum to finish running on this same remote) and will post the results here.
My thoughts exactly. This doesn't bode well for Google Drive's data integrity...
As I have local copies of all the original files (at least so far, fingers crossed), I can do a comparison after all that's over with; my plan for each of those files (sketched in commands after the list) is:
Rename the corrupt PATH/FILE.EXT to PATH/FILE.EXT_-_CORRUPTED_YYYYMMDD;
Use rclone (...) copyto to copy my known-good local file to PATH/FILE.EXT;
Use rclone cryptdecode --reverse to find the encrypted names of both files;
Download both files from the base (unencrypted) remote, and then compare them bit by bit to see exactly what the corruption is.
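In commands, the plan would look roughly like this (secret: and gdrive: are placeholder names for the crypt and base remotes, and the ENCRYPTED_NAME_* values stand for whatever cryptdecode reports):
# 1. rename the corrupted file out of the way on the crypt remote
rclone moveto secret:PATH/FILE.EXT secret:PATH/FILE.EXT_-_CORRUPTED_YYYYMMDD
# 2. re-upload the known-good local copy under the original name
rclone copyto /local/PATH/FILE.EXT secret:PATH/FILE.EXT
# 3. find the encrypted names of both versions
rclone cryptdecode --reverse secret: PATH/FILE.EXT PATH/FILE.EXT_-_CORRUPTED_YYYYMMDD
# 4. download both from the base remote and compare them bit by bit
rclone copyto gdrive:CRYPT_BASE_DIR/ENCRYPTED_NAME_GOOD /tmp/good.bin
rclone copyto gdrive:CRYPT_BASE_DIR/ENCRYPTED_NAME_CORRUPTED /tmp/corrupted.bin
cmp -l /tmp/good.bin /tmp/corrupted.bin | head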
@ncw, what do you think? Would that work? The above depends on the same file being uploaded again to the same encrypted remote and generating the same encrypted content on the base (unencrypted) remote. Is that the case?
This isn't the case normally. Rclone will generate a random nonce (as it is known in cryptography) for each encryption, so each time a file is encrypted it is different. In fact it weakens the crypto if you re-use the nonce.
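The effect is easy to see: uploading the same file twice through a crypt remote produces different ciphertext on the base remote each time (remote and file names below are placeholders):
rclone copyto plain.bin secret:nonce-test/one.bin
rclone copyto plain.bin secret:nonce-test/two.bin
# find the encrypted names, then compare the underlying objects' checksums -- they will differ
rclone cryptdecode --reverse secret: nonce-test/one.bin nonce-test/two.bin
rclone md5sum gdrive:CRYPT_BASE_DIR/ENCRYPTED_DIR/    # the two hashes will not match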
It is, however, possible to do exactly what you want; it would require a modified rclone to which you could say: read the nonce from this file before encrypting this other file with it. That could be a backend command quite easily though.
Thanks for the explanation. Of course, the right thing to do is to use a random nonce.
That could be a backend command quite easily though
Or perhaps some option like --crypt-use-nonce-from CRYPT_REMOTE:PATH/FILE.
But it's a very specific case, and I think I've already abused my privilege as an rclone user too much to ask that of any developer, much less of you, who have helped me so greatly in all of this.
But if the itch to dig deeper into this is bad enough, and I have the leisure for it, I might try implementing it and submitting a PR. Not sure this will be the case, though -- I have my plate quite full at the moment, and my knowledge of Golang is still extremely superficial.
Thanks for the heads-up. A few thoughts have crossed my mind after reading this:
The Big G moves one step further in the enshittification of its services; this will end up making Drive completely unusable for many of us.
Case in point, I now have WELL OVER that (more than 13M files). So this new limit completely effs it up for my use case.
Drive was one of the few services where Google was actually competitive (besides Search). And now Google is about to become non-competitive in Drive too, at least for us 'heavy users', who are more likely to pay for the service instead of just enjoying the free plan. I don't think Google will sell many more Workspace plans, and it will probably end up actually losing more than just a few of its current customers.
With ChatGPTized Bing moving in on Google Search (the only really competitive Google product so far), I fear that Google is not long for this world... therefore I would be wary of remaining with Google even if my use case fit within this new limit.
As per 2) and 3), I think we will end up seeing a wave of users migrating from Drive to the alternatives, so the ones that don't have a solid business plan to withstand it will either have to follow Google and enshittify their services accordingly, close up shop, or raise their prices significantly.
As a consequence of all the above, the idea of spending the money on the big HDDs needed to start 'self-hosting' my own cloud backups at a friend's house looks more and more compelling...