Sync and modtime issue with hubic


#1

Hey there,
Due to the official hubic client being near to dead, I decided recently to switch recently to rclone that seems to be a great piece of software, but I encounter issues. I spend hours and days testing the whole thing.

Here is my configuration :

  • remote: hubic (about 150 Gbytes)
  • rclone version: rclone v1.43.1
  • OS: Ubuntu 18.04 LTS
  • command:
  • /usr/bin/rclone sync -vvv $SRC $DEST -L --min-age 15m --exclude-from $EXCLUDEFILE --checkers 128 --retries 1 --delete-during -n |& tee $LOGFILE

I first tried without the “-n” (dry-run) flag but I got some inconsistencies so I switched to dry run to perform extended tests.
If I run the same command several times, some large files (> 1Gbyte) are synced over and over.
In the log there is:

  • 2018/10/04 11:06:53 DEBUG : path_to_the_file/DSC_3197.MOV: Modification times differ by 1h25m31.7153182s: 2017-01-09 05:09:17.2846818 +1300 NZDT, 2017-01-08 17:34:49 +0000 UTC
  • […]
  • 2018/10/04 11:06:54 DEBUG : path_to_the_file/DSC_3197.MOV: Returning empty Md5sum for swift large object
  • 2018/10/04 11:06:54 NOTICE: path_to_the_file/DSC_3197.MOV: Not copying as --dry-run

I then checked the modification time on the remote and locally with the following commands:

  • rclone lsl hubic:path_to_the_file_directory
  • rclone lsl /home/path_to_the_file_directory
    Both outputs regarding the file are exactly the same:
  • 124974528 2017-01-09 05:09:17.284681800 DSC_3197.MOV
  • 124974528 2017-01-09 05:09:17.284681800 DSC_3197.MOV
    I do not understand why the modtime is detected as different in the first command above and not with the lsd command.
    The consequences of modtime being wrongly read for large files is that they are assumed as “dirty” since the md5sum is read to null. The file is synced over and over and at the end of the day it uses a lot of data.

What is more, I have lots of files (< 1 Gbytes) where the modtime is detected as not up to date. What is wrong. These files are not synced again but the modtime is updated.
I have about 34000 files and around 200 are detected as not up to date at each run, and they are not the same each run.
I suspect an issue on reading the modtime, as if randomly, not the modtime but the upload time was considered instead.
I don’t know if it is a bug of rclone or hubic.

Anyhow, I found a command that work well and is superfast. It uses –-update and --use-server-modtime. in dry run mode it takes only a couple of seconds to perform. I copy past it for anyone that would be interested :

/usr/bin/rclone sync -vvv $SRC $DEST -L --min-age 15m --exclude-from $EXCLUDEFILE --checkers 128 --delete-during --update --use-server-modtime |& tee $LOGFILE


#2

Hubic is bad, they made me loss all files of one old account.


#3

Are the troublesome files bigger than 5 GB? I suspect this is something to do with chunked files which are by default chunked at 5GB.

How were these file uploaded? With rclone or a different tool? Rclone uses a special bit of metadata to store the modified time.

server mod time is good for use with hubic as it saves an http transaction per file.

You can also use --checksum or --size-only. I sync lots of images regularly to a swift backend and I use --size-only for that.

You can also use --fast-list which will may speed things up at the cost of extra memory.


#4

Do you know good alternatives at decent price? I would need 1 TByte


#5

Thank you ncw for your answer. Here my replies to your questions:

Are the troublesome files bigger than 5 GB? I suspect this is something to do with chunked files which are by default chunked at 5GB.

No they aren’t. They are mainly between 1 and 2 GB.
Modtime reading errors also occur on small files. But as md5 sum is checked if modtime doesn’t fit, files are not synced but their modtime is updated.

How were these file uploaded? With rclone or a different tool? Rclone uses a special bit of metadata to store the modified time.

Files were mainly uploaded with the hubic client but some were uploaded with previous rclone attempts.

server mod time is good for use with hubic as it saves an http transaction per file.

You can also use --checksum or --size-only. I sync lots of images regularly to a swift backend and I use --size-only for that.

You can also use --fast-list which will may speed things up at the cost of extra memory.

Checksum works fine but takes a while: around 5 hours (dry run with 128 checkers) compared to 50 minutes for the default and 20 seconds for -u --use-server-modtime
–Size-only takes around 25 minutes but is disqualified since I have encrypted containers which size keeps constant even if files inside are changed.
I tried --fast-list but it was slower (I stopped it after 2 hours).


#6

As mentioned earlier hubic is bad. Try to download a large (at least as number of files, if you have just a few encrypted containers the situation might be different) folder you’ve synced previously (and compare with the original). Many files were missing in my case.


#7

Files where were uploaded with the hubic client won’t have their mod-time. rclone should just read the checksums - see they are identical and set the modtime.

However if they are chunked files, they won’t have a checksum, so rclone will upload them again in this case.

However once uploaded they will have the correct modtime and shouldn’t need to be uploaded again.

The above explains your symptoms quite well I think - the chunked file of < 5 GB with a missing hash.

-u --use-server-modtime isn’t perfect so I’d recommend running a full sync every now and again. It is the --min-age 15m which causes the biggest speed up I think.


#8

Jottacloud and pcloud


#9

Thanks Alfred. pCloud seems to be the perfect client.


#10

Hi,
Here is my experience after having run my rclone sync script daily with cron for over one month.
Before starting the script I distant copied my files to another directory so that all files on the remote shall now have their metadata fixed by rclone.
My script runs the sync with the --backup-dir flag in order to keep a copy of deleted or moved files. But I encounter many errors, mainly on large files (> 1Gb) but not only. What is really problematic is that some files are not synced at all. I have the feeling that if there is an error during the copy of the deleted/moved file, than the sync is cancelled. This is really annoying because some files are no updated.
Here is the main command I run in the script:

rclone sync $SRC $DEST --config $CONFIG -L --min-age $MINAGE --exclude-from $EXCLUDEFILE --backup-dir $BCKDAI -vvv --retries 1 --checkers 128 --transfers 128 --delete-during|& tee -a $LOGFILE

If I remove the --backup-dir flag there is no or few errors (far less than with the flag on).
What are the options used for the copy part of the --backup-dir option? Is it also size + modtime and md5sum in case of discrepancy?

Regards.


#11

Hmm, the swift backend doesn’t do low level retries (which I’ve made an issue about here https://github.com/ncw/rclone/issues/2740 ) given that and that you’ve got --retries 1 failing files will not be retried. You’ll need to increase --retries

Ah, I suspect you are falling into this issue where server side copies fail for large files on the swift backend

Do you have lots of files > 5GB? These are the ones this will fail for.

Would you like to help fixing up either of those issue?


#12

I will try again with 3 retries. But from what I have seen until now retries don’t help much in my case. If the sync fails on the first try it will also fail on the following tries. I run the script every day so that in case there is really an error it can be solved the following day. Retries consume a lot of data (my data per month is capped)

No, the largest file is 2 GB

Of course I would like to help to solve the issues. Tell me what to do.


#13

I see. Syncing once per day with --retries 1 is a pretty good solution.

I guess the first thing to do is can you dig up the ERROR logs for the sync so we can see how it is failing. If you aren’t hitting the server side copy bug then fixing the low level retries will probably fix it.


#14

Hi ncw,

I copy hereunder an extract of the log file.
As far as I understand, here is what happens:
for some unknown (from me) reason, server side copy on hubic triggers a lot of

HTTP Error: 504: 504 Gateway Time-out

I experienced this a few weeks ago when I moved all my files from one directory to another one on the remote. On the first run about 30% of files triggered an error. After trying again and again during a few days some went well but still around 10% of files weren’t copied. I completed the job by doing a sync from my local directory.

In the present case, since the server side copy requested by the --backup-dir flag fails, the file is not synced.

I see 2 ways to solve the problem:
-improve the server side copy robustness but I don’t know if it is possible.
-add a flag to the --backup-dir option to force the sync even if copied/deleted files can’t be backed up.
What do you think about this?

the log:

2018/11/21 00:12:12 DEBUG : --min-age 15m0s to 2018-11-20 23:57:12.484774603 +1300 NZDT m=-899.883354761
2018/11/21 00:12:12 DEBUG : rclone: Version “v1.43.1” starting with parameters [“rclone” “sync” “/home/fy/Hubic/” “hubic:default/Ubuntu/” “–config” “/home/fy/.config/rclone/rclone.conf” “-L” “–min-age” “15m” “–exclude-from” “/eglantines/balma/Informatique/scripts/RClone/exclusions.txt” “–backup-dir” “hubic:default/Backup/Daily/3” “-vvv” “–retries” “3” “–checkers” “128” “–transfers” “128” “–delete-during”]

2018/11/21 00:13:32 DEBUG : Documents/fy/LaGrotte/.data: Modification times differ by -67h30m19.8951493s: 2018-11-17 16:52:18.843694 +1300 NZDT, 2018-11-14 21:21:58.9485447 +1300 NZDT

2018/11/21 00:16:09 DEBUG : Documents/fy/LaGrotte/.data: MD5 = c8e7d375ac25741eb5a0edfaf01edca1 (Local file system at /home/fy/Hubic)
2018/11/21 00:16:09 DEBUG : Documents/fy/LaGrotte/.data: MD5 = 84d61ebb98660c639219744a140c8ac4 (Swift container default path Ubuntu/)
2018/11/21 00:16:09 DEBUG : Documents/fy/LaGrotte/.data: MD5 differ

2018/11/21 00:19:11 ERROR : Documents/fy/LaGrotte/.data: Failed to copy: HTTP Error: 504: 504 Gateway Time-out
2018/11/21 00:19:11 ERROR : Documents/fy/LaGrotte/.data: Not deleting source as copy failed: HTTP Error: 504: 504 Gateway Time-out

2018/11/21 00:21:02 ERROR : Attempt 1/3 failed with 12 errors and: failed to delete 4 files

2018/11/21 00:23:11 DEBUG : Documents/fy/LaGrotte/.data: Modification times differ by -67h30m19.8951493s: 2018-11-17 16:52:18.843694 +1300 NZDT, 2018-11-14 21:21:58.9485447 +1300 NZDT

2018/11/21 00:25:18 DEBUG : Documents/fy/LaGrotte/.data: MD5 = c8e7d375ac25741eb5a0edfaf01edca1 (Local file system at /home/fy/Hubic)
2018/11/21 00:25:18 DEBUG : Documents/fy/LaGrotte/.data: MD5 = 84d61ebb98660c639219744a140c8ac4 (Swift container default path Ubuntu/)
2018/11/21 00:25:18 DEBUG : Documents/fy/LaGrotte/.data: MD5 differ

2018/11/21 00:28:20 ERROR : Documents/fy/LaGrotte/.data: Failed to copy: HTTP Error: 504: 504 Gateway Time-out
2018/11/21 00:28:20 ERROR : Documents/fy/LaGrotte/.data: Not deleting source as copy failed: HTTP Error: 504: 504 Gateway Time-out

2018/11/21 00:29:39 ERROR : Attempt 2/3 failed with 4 errors and: HTTP Error: 504: 504 Gateway Time-out

2018/11/21 00:31:00 DEBUG : Documents/fy/LaGrotte/.data: Modification times differ by -67h30m19.8951493s: 2018-11-17 16:52:18.843694 +1300 NZDT, 2018-11-14 21:21:58.9485447 +1300 NZDT

2018/11/21 00:33:03 DEBUG : Documents/fy/LaGrotte/.data: MD5 = c8e7d375ac25741eb5a0edfaf01edca1 (Local file system at /home/fy/Hubic)
2018/11/21 00:33:03 DEBUG : Documents/fy/LaGrotte/.data: MD5 = 84d61ebb98660c639219744a140c8ac4 (Swift container default path Ubuntu/)
2018/11/21 00:33:03 DEBUG : Documents/fy/LaGrotte/.data: MD5 differ

2018/11/21 00:36:04 ERROR : Documents/fy/LaGrotte/.data: Failed to copy: HTTP Error: 504: 504 Gateway Time-out
2018/11/21 00:36:04 ERROR : Documents/fy/LaGrotte/.data: Not deleting source as copy failed: HTTP Error: 504: 504 Gateway Time-out

2018/11/21 00:37:38 ERROR : Attempt 3/3 failed with 12 errors and: failed to delete 4 files
2018/11/21 00:37:38 Failed to sync: failed to delete 4 files


#15

Oh, dear… That is the hubic end giving up on the server side copy and returning a timeout. There isn’t anything rclone can do about that I don’t think.

I don’t think we can do anything but retry.

This issue: https://github.com/ncw/rclone/issues/2740 would fix the problem I think.

It isn’t too tricky, it just requires every operation to be wrapped in the pacer.

That is possible of course, but I’m not that keen on the idea as people use backup-dir to save old copies so having a flag saying you don’t really care about saving old copies seems a bit odd!

I think the first solution is the best.

I had a go at it here:

https://beta.rclone.org/branch/v1.44-106-ge84790ef-fix-2740-swift-pacer-beta/ (uploaded in 15-30 mins)

Can you check it out?

It will retry the server side copies --low-level-retries times (10 by default) so hopefully that will get them to go through!


#16

Hi ncw,
Thanks for the new version. I have installed it and get tested for few days. It’s not easy for me to understand how the “pacer” works so I can’t really tell. But it seems it solves some sync errors but not all. For instance my 2 GB files still don’t get synced.
I wondering if the pacer isn’t trying more than 10 retries (like 2 x 10 or 3 x 10) but I’m not sure.
Would you like me to send you my complete logs to have a look? I would send them privately as they are big and I’m not keen to expose them publicly.


#17

Sure! Send them (or a link) by private message here, or to nick@craig-wood.com. Put a link to this forum page in for some context (I get a lot of email!). Thanks


#18

Thanks for the logs :smiley: It looks like hubic is really overloaded.

I see this in your config "--checkers" "128" "--transfers" "128" - this looks very high. I think reducing those down to say 16 and 8 or 8/4 will stop hubic throwing so many errors.

Using --fast-list may help. It will use more memory.

You may find using --checksum or --size-only makes your syncs a lot faster.

I couldn’t see which files were the 2GB ones - can you give me a hint?


#19

Yes, hubic is really unusable. Some files are out of reach and trigger errors when we try to download them or copy them. I will soon move to pcloud.

I set the checkers / transfer to a high value to increase the speed and I didn’t observe any improvement (i.e less errors) by decreasing the checkers and transfers.

I tried --fast-list several times: in doesn’t work for me. Rclone keeps running hours and hours but neither any check nor any transfer occurs, even after a few hours.

The size of files doesn’t appear in the logs. The 2 GB biles are “.data”, “.data.weekly”, “.data.monthly.tar.bz2”


#20

You could try reducing --swift-chunk-size which may help with big files. This is set so that filed don’t normally get chunked (as there are various problems with chunked files) but you could try setting it to 500M say to see if that helps.