Google only keeps MD5SUMs on the objects, so unless we download the files we are stuck with MD5SUMs.
Note that these collisions are not as useful as you might think. You can generate two documents A and B which have the same hash, but if I give you a doc C it is computationally infeasible for you to generate a doc D which has the same hash as C. So that means that you can’t substitute arbitrary files in my Google Drive and keep the hashes the same.
Oh right, I see. I wasn’t aware of that; I thought rclone must have been making the hashes itself via a stream. How difficult and time-consuming would this be to implement, maybe also with the option of choosing the hashing algorithm?
On a different note, how easy or difficult is Go to learn for someone (me) who has a loose background in programming, but for the web, such as PHP? Do you have any recommendations for books or other content to learn it?
If you want to check the integrity of the upload without using hashes, you can use rclone check with the --download flag.
It would be possible to add a --download flag to rclone dedupe too; however, I don’t think it would get much use.