Sync and modtime issue with hubic


#1

Hey there,
Since the official hubic client is close to dead, I recently decided to switch to rclone, which seems to be a great piece of software, but I have run into issues. I have spent hours and days testing the whole thing.

Here is my configuration:

  • remote: hubic (about 150 Gbytes)
  • rclone version: rclone v1.43.1
  • OS: Ubuntu 18.04 LTS
  • command:
  • /usr/bin/rclone sync -vvv $SRC $DEST -L --min-age 15m --exclude-from $EXCLUDEFILE --checkers 128 --retries 1 --delete-during -n |& tee $LOGFILE

I first tried without the “-n” (dry-run) flag but I got some inconsistencies so I switched to dry run to perform extended tests.
If I run the same command several times, some large files (> 1Gbyte) are synced over and over.
In the log there is:

  • 2018/10/04 11:06:53 DEBUG : path_to_the_file/DSC_3197.MOV: Modification times differ by 1h25m31.7153182s: 2017-01-09 05:09:17.2846818 +1300 NZDT, 2017-01-08 17:34:49 +0000 UTC
  • […]
  • 2018/10/04 11:06:54 DEBUG : path_to_the_file/DSC_3197.MOV: Returning empty Md5sum for swift large object
  • 2018/10/04 11:06:54 NOTICE: path_to_the_file/DSC_3197.MOV: Not copying as --dry-run
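
For reference, the two timestamps in the first log line are not the same instant written in two time zones; they really are about 1h25m apart. A quick way to check this in Python (just a sketch: the local time is truncated to microseconds because that is the finest resolution `datetime` supports, so the last digit differs from rclone's output):

```python
from datetime import datetime, timedelta, timezone

# Local modtime from the log line (+1300 NZDT), truncated to microseconds.
local = datetime(2017, 1, 9, 5, 9, 17, 284681,
                 tzinfo=timezone(timedelta(hours=13)))

# Remote modtime from the same log line (UTC, whole seconds only).
remote = datetime(2017, 1, 8, 17, 34, 49, tzinfo=timezone.utc)

# Aware datetimes are compared as instants, so the zone offset cancels out.
diff = remote - local
print(diff)
```

The remote value has no fractional seconds and is later than the local modtime, which would fit the suspicion that it is an upload time rather than the stored modtime.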

I then checked the modification time on the remote and locally with the following commands:

  • rclone lsl hubic:path_to_the_file_directory
  • rclone lsl /home/path_to_the_file_directory
    Both outputs regarding the file are exactly the same:
  • 124974528 2017-01-09 05:09:17.284681800 DSC_3197.MOV
  • 124974528 2017-01-09 05:09:17.284681800 DSC_3197.MOV
    I do not understand why the modtime is detected as different by the sync command above but not by the lsl command.
    The consequence of the modtime being wrongly read for large files is that they are treated as “dirty”, since the md5sum is returned empty. The file is synced over and over, and at the end of the day this uses a lot of data.

What is more, I have lots of files (< 1 GByte) whose modtime is detected as not up to date, which is wrong. These files are not synced again, but their modtime is updated.
I have about 34000 files and around 200 are detected as not up to date at each run, and they are not the same ones each run.
I suspect an issue with reading the modtime, as if, at random, the upload time were used instead of the modtime.
I don’t know if it is a bug in rclone or in hubic.

Anyhow, I found a command that works well and is superfast. It uses --update and --use-server-modtime. In dry-run mode it takes only a couple of seconds to run. I am pasting it here for anyone who is interested:

/usr/bin/rclone sync -vvv $SRC $DEST -L --min-age 15m --exclude-from $EXCLUDEFILE --checkers 128 --delete-during --update --use-server-modtime |& tee $LOGFILE


#2

Hubic is bad; they made me lose all the files of an old account.


#3

Are the troublesome files bigger than 5 GB? I suspect this is something to do with chunked files which are by default chunked at 5GB.

How were these file uploaded? With rclone or a different tool? Rclone uses a special bit of metadata to store the modified time.

Server modtime is good for use with hubic as it saves an HTTP transaction per file.

You can also use --checksum or --size-only. I sync lots of images regularly to a swift backend and I use --size-only for that.

You can also use --fast-list which may speed things up at the cost of extra memory.


#4

Do you know any good alternatives at a decent price? I would need 1 TByte.


#5

Thank you ncw for your answer. Here are my replies to your questions:

Are the troublesome files bigger than 5 GB? I suspect this is something to do with chunked files which are by default chunked at 5GB.

No, they aren’t. They are mainly between 1 and 2 GB.
Modtime reading errors also occur on small files. But since the md5sum is checked when the modtime doesn’t match, those files are not synced again; only their modtime is updated.

How were these file uploaded? With rclone or a different tool? Rclone uses a special bit of metadata to store the modified time.

Files were mainly uploaded with the hubic client but some were uploaded with previous rclone attempts.

Server modtime is good for use with hubic as it saves an HTTP transaction per file.

You can also use --checksum or --size-only. I sync lots of images regularly to a swift backend and I use --size-only for that.

You can also use --fast-list which may speed things up at the cost of extra memory.

--checksum works fine but takes a while: around 5 hours (dry run with 128 checkers) compared to 50 minutes for the default and 20 seconds for -u --use-server-modtime.
--size-only takes around 25 minutes but is ruled out, since I have encrypted containers whose size stays constant even when the files inside are changed.
I tried --fast-list but it was slower (I stopped it after 2 hours).
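
To illustrate why a size-only comparison cannot work for fixed-size containers, here is a small sketch. `make_container` and `CONTAINER_SIZE` are made up for the example and only stand in for a pre-allocated container file; this is not how any particular encryption tool works.

```python
import hashlib

CONTAINER_SIZE = 1024  # hypothetical fixed container size, in bytes


def make_container(payload: bytes) -> bytes:
    """Pad the payload with zero bytes to a fixed size (stand-in for a container)."""
    return payload.ljust(CONTAINER_SIZE, b"\0")


before = make_container(b"secret v1")
after = make_container(b"secret v2 with more data")

# A size-only check sees no difference between the two versions...
same_size = len(before) == len(after)

# ...while a checksum comparison does notice the change.
same_md5 = hashlib.md5(before).hexdigest() == hashlib.md5(after).hexdigest()

print(same_size, same_md5)
```

So for this kind of file only a modtime or checksum based comparison can notice that the content has changed.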


#6

As mentioned earlier, hubic is bad. Try downloading a large folder you’ve synced previously and compare it with the original. By large I mean a large number of files; if you have just a few encrypted containers the situation might be different. Many files were missing in my case.


#7

Files which were uploaded with the hubic client won’t have their modtime stored. Rclone should just read the checksums, see that they are identical, and set the modtime.

However if they are chunked files, they won’t have a checksum, so rclone will upload them again in this case.

However once uploaded they will have the correct modtime and shouldn’t need to be uploaded again.

The above explains your symptoms quite well I think - the chunked file of < 5 GB with a missing hash.
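
Roughly, the behaviour described in this thread can be modelled like this. It is a simplified sketch and not rclone's actual code: the `decide` function, its return strings and the example values are made up for illustration, and it assumes the sizes already match.

```python
from typing import Optional


def decide(local_mtime: float, remote_mtime: float,
           local_md5: str, remote_md5: Optional[str]) -> str:
    """Return the action for one file whose size is the same on both sides."""
    if local_mtime == remote_mtime:
        return "skip"
    if not remote_md5:
        # Chunked (large) objects report an empty MD5, so there is
        # nothing to compare against and the file is uploaded again.
        return "upload"
    if local_md5 == remote_md5:
        # Same content: only the stored modtime needs fixing.
        return "set-modtime"
    return "upload"


# Placeholder values: modtimes differ, content is identical.
small_file = decide(100.0, 200.0, "abc123", "abc123")
large_file = decide(100.0, 200.0, "abc123", None)
print(small_file, large_file)
```

In this model a small file with a mismatched modtime only gets its modtime set, while a chunked file with no hash is uploaded again, which matches what the logs above show.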

-u --use-server-modtime isn’t perfect so I’d recommend running a full sync every now and again. It is the --min-age 15m which causes the biggest speed up I think.
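
As a toy model of why this combination can miss changes: as far as I understand it, --update skips a file when the copy on the destination looks newer than the source, and with --use-server-modtime the destination time is the upload time. The sketch below only models that one rule; `update_skips` and the timestamps are invented for the example, and the real logic has more cases.

```python
def update_skips(src_mtime: float, dest_upload_time: float) -> bool:
    """True if the file is skipped because the destination looks newer."""
    return dest_upload_time > src_mtime


uploaded_at = 1000.0  # hypothetical time of the last upload

# Edited locally after the upload: the source is newer, so it is transferred.
edited_later = update_skips(src_mtime=1500.0, dest_upload_time=uploaded_at)

# Replaced locally by a different file that kept an older modtime
# (for example one restored from a backup): it is skipped.
older_replacement = update_skips(src_mtime=500.0, dest_upload_time=uploaded_at)

print(edited_later, older_replacement)
```

The second case is the kind of change an occasional full sync would catch.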


#8

Jottacloud and pcloud


#9

Thanks Alfred. pCloud seems to be the perfect client.