Hi,
I need some advice to speed things up.
I upload very big files to Google Drive and I would like to calculate the md5sum of the local file while uploading, not prior to it. Right now I do it in two steps:
rclone md5sum test.txt > test.txt.md5
rclone copy test.txt REMOTE:
And then I can compare the local and remote checksums.
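For the comparison step I use standard tools, something like this (a sketch; REMOTE: stands in for my Google Drive remote):

```shell
# Fetch the remote checksum and compare only the hash fields,
# since the names/paths in the two listings may differ.
rclone md5sum REMOTE:test.txt > test.txt.remote.md5
diff <(cut -d' ' -f1 test.txt.md5) <(cut -d' ' -f1 test.txt.remote.md5) \
  && echo "checksums match"
```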
But I would like to save time by doing it concurrently and reading just once, because the files are big.
I mostly use Linux, in case there is a better shell alternative, though I could not find one.
It is my understanding that this already happens in rclone, so it shouldn't be necessary to do the extra md5 calculation and comparison in your script.
I hadn't realized that the debug output shows what I need, but I oversimplified my example: for most of my transfers I don't use "copy" but the newer "rcat", and that's where I need it the most. Maybe the checksum isn't implemented there?
Thanks again
Manuel F.
cat /LOG/test.txt | rclone rcat -vvv SAR:kk/test.txt
2022/03/18 16:38:56 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "rcat" "-vvv" "-P" "SAR:kk/test.txt"]
2022/03/18 16:38:56 DEBUG : Creating backend with remote "SAR:kk/"
2022/03/18 16:38:56 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/03/18 16:38:57 DEBUG : fs cache: renaming cache item "SAR:kk/" to be canonical "SAR:kk"
2022-03-18 16:38:58 DEBUG : test.txt: Sending chunk 0 length 100000000
2022-03-18 16:39:01 DEBUG : test.txt: Size and modification time the same (differ by -506.926µs, within tolerance 1ms)
Transferred: 95.367 MiB / 95.367 MiB, 100%, 23.838 MiB/s, ETA 0s
Transferred: 1 / 1, 100%
Elapsed time: 4.6s
2022/03/18 16:39:01 DEBUG : 6 go routines active
You can't checksum on the fly when piping, which is what you are doing. You had only shared rclone copy, and that command does checksum, since it has a file input.
If you had a source file, you'd use copy/sync/move/etc.
If you are using a pipe, there generally isn't a single source file, as you'd be piping multiple inputs/files into it, which makes an md5sum useless.
For the one-file use case it wouldn't matter, as you'd just use rclone copy and not pipe anything.
Usually cat into rcat is used to pipe multiple things together. You can't MD5 that, or rather it won't match, because multiple inputs are merged into a single file; the remote computes the MD5SUM of that single file, not of the pieces making it up.
felix@gemini:~/test$ touch one
felix@gemini:~/test$ touch two
felix@gemini:~/test$ touch three
felix@gemini:~/test$ cat * | rclone rcat GD:test.tar -vvv
2022/03/18 16:09:22 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2022/03/18 16:09:22 DEBUG : rclone: Version "v1.58.0" starting with parameters ["rclone" "rcat" "GD:test.tar" "-vvv"]
2022/03/18 16:09:22 DEBUG : Creating backend with remote "GD:"
2022/03/18 16:09:22 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2022/03/18 16:09:22 DEBUG : GD: Loaded invalid token from config file - ignoring
2022/03/18 16:09:22 DEBUG : Saving config "token" in section "GD" of the config file
2022/03/18 16:09:22 DEBUG : GD: Saved new token in config file
2022/03/18 16:09:22 DEBUG : Google drive root '': 'root_folder_id = 0AGoj85v3xeadUk9PVA' - save this in the config to speed up startup
2022/03/18 16:09:22 DEBUG : Google drive root '': File to upload is small (0 bytes), uploading instead of streaming
2022/03/18 16:09:23 DEBUG : test.tar: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/03/18 16:09:23 INFO : test.tar: Copied (new)
2022/03/18 16:09:23 DEBUG : 9 go routines active
If the OP is using cat with one file, I dunno, as that doesn't make sense to me from a use-case perspective.
Pipes are normally used to combine/stream many into one, in my experience.
Hi again,
As I said, some of my files are a few TB, and in the future maybe even bigger, so I'm testing "cat/rcat" as an alternative after "copy" failures or losses.
Thanks jojo... for the hint, but I think that for small files/streams rcat automatically reverts to copy. Anyway, I will test it further, so any advice like yours is welcome.
I don't blame you: it's mostly because of GDrive limits: first the max of 400K files, and second the 5 TB max file size (I'm close to it, but that's less important).
My contents are NTFS images (VHDX with ACLs). Also, yes, I worry about failed connections and "resumability", but I'm happy with how it works now using cat/rcat. Only checksums are missing to verify what I'm doing.
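One shell-level idea I still want to test is computing the md5 in the same pipe as the upload, so the file is read from disk only once: tee duplicates the stream, sending one copy through a named pipe to md5sum while the other goes to rclone rcat. An untested sketch, with REMOTE: as a placeholder for my remote:

```shell
# tee duplicates the stream: one copy goes through the FIFO to md5sum,
# the other is streamed to rclone rcat. The file is read only once.
mkfifo md5.fifo
md5sum < md5.fifo > test.txt.md5 &        # checksum runs in the background
cat /LOG/test.txt | tee md5.fifo | rclone rcat REMOTE:kk/test.txt
wait                                      # wait for the background md5sum
rm md5.fifo
```

The resulting test.txt.md5 could then be compared against the output of rclone md5sum on the remote file.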
I'm still not understanding why you are using cat/rcat for a single file, as you can't resume in that fashion and you lose checksumming since you are piping; pipes are meant for merging files in general.
If it's working for you, good luck. I wouldn't back up my data that way, but your data, your choice, as they say.
I wouldn't expect the feature to get much traction unless you'd like to code it and submit a PR, as the use case doesn't make much sense.
I always add --size with rcat; if you do that, you never see the checksum in the debug output. There must be a reason. Now I have to compare speeds in both cases.