Rclone not transferring larger files (Jottacloud)

I'm having a problem copying files to Jottacloud. After giving the command to start copying, rclone just won't transfer the files - the status reads "transferring" but nothing happens.
I noticed this only occurs with larger files (~100 GB); smaller files seem to transfer fine. You can see from the log below that a couple of smaller files transferred without issue. I also tried the --checksum and --ignore-checksum flags, but those didn't seem to help.
I have tested copying on both Windows and Linux, and they behave the same way.

Additional info:

Command used
D:\Rclone>rclone copy Y:\Photos jotta:Photos -P

Log file
2019/11/23 22:10:47 DEBUG : rclone: Version "v1.50.2" starting with parameters ["rclone" "copy" "Y:\Photos" "jotta:Photos" "-P" "--log-file=D:\Rclone\log.txt" "-vv"]
2019/11/23 22:10:47 DEBUG : Using config file from "C:\Users\xx\.config\rclone\rclone.conf"
2019/11/23 22:10:48 INFO : jottacloud root 'Photos': Waiting for checks to finish
2019/11/23 22:10:48 INFO : jottacloud root 'Photos': Waiting for transfers to finish
2019/11/23 22:10:49 DEBUG : Käsitellyt-1_inc_b1_s2_v1.tib: MD5 = 7776ffccca51b991f05b0564aef37d61 OK
2019/11/23 22:10:49 INFO : Käsitellyt-1_inc_b1_s2_v1.tib: Copied (new)
2019/11/23 22:10:49 DEBUG : Käsitellyt-2_inc_b1_s2_v1.tib: MD5 = 79d410df68fc9a088d0ebbee9d391392 OK
2019/11/23 22:10:49 INFO : Käsitellyt-2_inc_b1_s2_v1.tib: Copied (new)
2019/11/23 22:10:49 DEBUG : Käsitellyt-3_inc_b1_s2_v1.tib: MD5 = 4819da6eadbf7e6b54465aa1f2d628ff OK
2019/11/23 22:10:49 INFO : Käsitellyt-3_inc_b1_s2_v1.tib: Copied (new)

Config
[jotta]
type = jottacloud
client_id = **
client_secret = **
token = {"access_token":"","token_type":"bearer","refresh_token":"","expiry":"2019-11-23T22:33:46.8735618+02:00"}
device = Jotta
mountpoint = Archive

According to Jotta there should be no maximum file size limit like some backends have.
From the log it looks to me like the file is simply checked against its hash and skipped because it already exists.

Are you sure you aren't trying to transfer a file that already exists and is identical? Skipping identical files is the default behaviour.
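
If in doubt, a quick way to verify is rclone's check command, which compares source and destination (using the paths from your command above):

rclone check Y:\Photos jotta:Photos

It should report any files that are missing on the destination or that differ.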

One other thing to check is that you have enough space in %temp% on your system for a 100GB file.
Jotta requires a local hash-calculation before upload, so large files have to be cached locally to do this. I would expect to see some kind of error in the debug log if you didn't have enough space - but it's something I'd double-check at least.
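
If I remember right, you can also point this cache somewhere else with rclone's global --temp-dir flag instead of relying on the system default - the D:\rclonetemp path here is just a made-up example:

rclone copy Y:\Photos jotta:Photos -P --temp-dir D:\rclonetemp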

Yeah, I'm sure these files don't exist on Jottacloud. And there is enough free space on my drives, so it should be able to cache those files.
How long do you think this local hash-calculation takes? I've been waiting about 20 minutes with a ~100GB file but it won't start transferring.
Is there a more detailed option for the log output than -vv? It would be nice to see what rclone is actually doing at this stage.

Ok, looks like it was just doing the hash-calculation. The transfer started after about an hour.
Is there any way to speed up this process, or do I just have to wait it out? I have several files over 100GB, so that'll take a lot of time if this is the case.

The hashing itself doesn't take much effort - it just uses some CPU and is fairly trivial on any desktop-level CPU from the last 10 years. The main problem is that you inevitably have to read the entire file to generate the hash. The disk IO is no doubt what takes all the time, and it is surely much slower than what your CPU could calculate. Since the file also gets cached (because Jotta requires the hash to be delivered along with the file before upload), you are therefore doing:

  • A 100GB read from HDD (reading original for cache)
  • A 100GB write to HDD (writing the cache)
  • Another 100GB read from HDD (reading the cached file to calculate the hash)

And 300GB of file operations just takes a while - especially on a mechanical HDD, and doubly so if it is a standard-fare cheap mass-storage drive with unexceptional performance numbers. Usually it doesn't matter so much with smaller files, because the HDD can work on the next files while the previous ones upload - but when a single file is that large, the upload can't start until the whole file has been processed. The good news is that if you have more than one such big file, the hashing of the subsequent files should already be complete by the time their turn comes. It's really the first one that you lose a lot of time on.
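
To put some rough numbers on it (assuming a ballpark ~100 MB/s sustained transfer rate for a typical consumer HDD - your drive may differ):

300 GB / 100 MB/s ≈ 3000 seconds ≈ 50 minutes

and that's close to a best case, since reads and writes hitting the same disk will fight each other for throughput. The roughly one hour you saw fits that estimate pretty well.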

The whole problem can be circumvented if you have enough RAM to fit the file in memory, but obviously that isn't going to happen with 100GB.
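
On that note, I believe the threshold at which files spill to disk for hashing is controlled by a backend flag, so something like this could keep moderately large files in memory if you have RAM to spare (the 1G value is just an example):

rclone copy Y:\Photos jotta:Photos -P --jottacloud-md5-memory-limit 1G

That obviously still won't help with the 100GB files though.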

I don't see a very easy fix for this, but here are a few ways the problem could be alleviated.

  • You could move your system's %temp% directory to a faster SSD if that is possible (or redirect the cache with the --temp-dir flag mentioned above).
  • You could set up the chunker remote to split the extremely large files into more manageable chunks so that the file can start uploading much sooner (see the example config after this list). This absolutely will work if you feel this is a big problem for you, but it also feels like a bit of an overkill solution for what should be a relatively simple problem. It is probably the most practical option on this list, however.
  • You could go to rclone's GitHub page and make a feature request, asking for this process to be optimized. I strongly suspect this can be done in a smarter way with a bit more effort in the code - reducing the disk IO by 66%, or at least by 33%, by calculating the hash on the fly during the first read. The downside is that it's likely to take some time before the issue gets picked up, as there are always plenty of other high-priority issues to work on.
  • Convince Jotta to implement server-side hashing. This is how most backends do it; Jotta is the odd one out, and this quirk is the real cause of the whole workaround in the first place. This would be the BEST solution hands down, as the current system is just very inefficient and also less data-secure. Getting a company to listen to your feedback and actually make a change like this is pretty likely to be frustrating and ultimately fruitless, however...
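
For reference, a minimal chunker setup could look something like this in the config file - the remote name jotta-chunked and the 5G chunk size are just example values, and https://rclone.org/chunker/ has the full details:

[jotta-chunked]
type = chunker
remote = jotta:Photos
chunk_size = 5G
hash_type = md5

You would then copy to jotta-chunked: instead of jotta:Photos, and the chunker splits and reassembles the files transparently:

rclone copy Y:\Photos jotta-chunked: -P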

TLDR: If this is a regular problem for you then I'd go with the chunker and split into 5-10GB chunks to cut that 20min waiting time down to 1-2 minutes. Let me know if you need help with that.

Oh yeah, and I almost forgot. To do debug logging, add these to your command:
--log-file=MyRcloneLog.txt
--log-level DEBUG
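
With your command from above, that would look something like:

rclone copy Y:\Photos jotta:Photos -P --log-file=MyRcloneLog.txt --log-level DEBUG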

Debug is best for technical troubleshooting as it gives the most info by far, but for more general daily use you'd want another level. You can see the available levels here if you want to learn more:
https://rclone.org/docs/#log-level-level

Thank you for taking the time to write all of this. :slightly_smiling_face:
I think I'll just split the backup files into smaller ones for now so that the whole process goes a bit faster. Weird indeed that Jotta doesn't do server-side hashing - I think I may have to consider some other cloud service...

Yeah, it is odd. Note that I'm not an expert on Jotta - but the documentation seems fairly straightforward when it explains that the hash needs to be provided from your side on upload.
