This is speculation on my part - but while most simple containers are just audio and video tracks muxed together, some containers, like MKV, also hold lots of other stuff. They can have all sorts of inner metadata, chapter info, even DVD-like menus in some cases. How that info is laid out in the file I just don't know. I would assume it all goes at the front of the file - but if it doesn't, then rclone might be forced to jump around a lot.
While I have no problem playing the vast majority of videos via VLC from Gdrive, I do on occasion see a handful of files that just refuse to play well: they take a long time to start and stutter consistently. They are usually not even high bitrate, and that is of course using the exact same VLC player. So this leads me to believe that the formatting (or, more likely, the container type, to be specific) plays some sort of role in this.
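
If anyone wants to check this guess on one of their own problem files, here is a rough sketch that lists the order of the top-level elements inside an MKV's Segment, so you can see whether things like Cues (the seek index) or Chapters sit before or after the Clusters (the actual audio/video data). The element IDs are the ones from the Matroska spec; the function names are just ones I made up for illustration. It only reads element headers, stops at the first element with an unknown size, and says nothing about what VLC or rclone actually do with the file.

```python
import io

# Top-level Matroska element IDs (from the Matroska specification).
SEGMENT_ID = 0x18538067
NAMES = {
    0x114D9B74: "SeekHead",
    0x1549A966: "Info",
    0x1654AE6B: "Tracks",
    0x1F43B675: "Cluster",
    0x1C53BB6B: "Cues",
    0x1043A770: "Chapters",
    0x1254C367: "Tags",
    0x1941A469: "Attachments",
}
UNKNOWN = -1  # sentinel for an EBML "unknown size" (all value bits set)


def _vint_length(first_byte):
    """Byte length of an EBML variable-length integer, from its first byte."""
    length, mask = 1, 0x80
    while mask and not (first_byte & mask):
        mask >>= 1
        length += 1
    if not mask:
        raise ValueError("invalid EBML length marker")
    return length


def read_id(f):
    """Read an element ID (marker bits kept). Returns None at end of data."""
    first = f.read(1)
    if not first:
        return None
    n = _vint_length(first[0])
    rest = f.read(n - 1)
    if len(rest) != n - 1:
        return None
    return int.from_bytes(first + rest, "big")


def read_size(f):
    """Read an element size (marker bit removed). Returns None at end of data."""
    first = f.read(1)
    if not first:
        return None
    n = _vint_length(first[0])
    rest = f.read(n - 1)
    if len(rest) != n - 1:
        return None
    value = first[0] & (0xFF >> n)
    for b in rest:
        value = (value << 8) | b
    return UNKNOWN if value == (1 << (7 * n)) - 1 else value


def segment_layout(f):
    """Return [(name, offset, size)] for the top-level elements of the Segment."""
    layout = []
    # Skip the EBML header (and anything else) until the Segment starts.
    while True:
        elem_id = read_id(f)
        if elem_id is None:
            return layout
        size = read_size(f)
        if elem_id == SEGMENT_ID:
            break
        if size is None or size == UNKNOWN:
            return layout
        f.seek(size, io.SEEK_CUR)
    # Walk the Segment's children, skipping over each payload.
    while True:
        offset = f.tell()
        elem_id = read_id(f)
        if elem_id is None:
            break
        size = read_size(f)
        if size is None or size == UNKNOWN:
            break
        layout.append((NAMES.get(elem_id, hex(elem_id)), offset, size))
        f.seek(size, io.SEEK_CUR)
    return layout
```

To try it on a local copy (the filename is a placeholder), open the file in binary mode and print the result, e.g. `with open("video.mkv", "rb") as f: print(segment_layout(f))`. If Cues or Chapters only show up after a long run of Cluster entries, that would at least be consistent with the player having to seek to the end of the file before playback starts.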