Rclone hash check through ftp

What is the problem you are having with rclone?

Not a problem, just a question

What is your rclone version (output from rclone version)

irrelevant

Which OS you are using and how many bits (eg Windows 7, 64 bit)

linux

Which cloud storage system are you using? (eg Google Drive)

ftp server

The command you were trying to run (eg rclone copy /tmp remote:tmp)

None tried as yet

The rclone config contents with secrets removed.

irrelevant

A log from the command with the -vv flag

irrelevant


Hi,

This is just a question.
We are offloading content to an FTP server using the lftp client.
The contents are usually quite large (about 150 GB generally).

We have been facing some issues with certain contents ending up corrupted, often because of load on the respective systems/drives.

Here is my question:

Is it possible to use an rclone command over FTP to do a hash check, without downloading the content from the FTP server?
I think not, but I would be happy for any suggestion as to the appropriate rclone command if you think it is possible.

Thank you.

hi,

to do that transfer, you could run rclone serve sftp on the destination server, which can do checksums.
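
a rough sketch, assuming you can run rclone on the destination box (paths, port, user and password are just placeholders):

    # on the destination machine: serve the upload directory over sftp.
    # rclone serve sftp can answer checksum requests, so hashes work end to end.
    rclone serve sftp /srv/uploads --addr :2022 --user someuser --pass somepass

    # on the source machine: point an sftp remote at it (same host/port/user/pass),
    # then copy and verify
    rclone copy /data/content sftpdest:content
    rclone check /data/content sftpdest:content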

Hi,
Thank you for your answer.
Are you suggesting that if the content is sent to the FTP server with rclone, it can then perform a checksum of the content that is on the FTP server after it has been transferred?

You are mentioning sftp; would that also work with plain FTP? (not SSH on port 22, but an FTP destination on port 21)

Thank you.

it depends on what you can control.

do you have control over the router?
if yes, then it should be easy to run any server.

if no, then i would try the following

  • kill the ftp server
  • run the sftp server on the same port as ftp.

No, I am pushing content to an FTP server, which is on another machine on the same network.
This ftp server is the only available destination (no sftp).
My question is only about this use case.

Can a hash check be done with rclone on the destination in this specific case?
(I'm very doubtful, but I thought I'd ask anyway :slight_smile: )

you can run a hash check on the source and save it to a file.
run rclone to do the transfer.
then copy that checksum file to the dest and do a compare there.
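
something like this, as a sketch — the ftpdest: remote name and paths are made up, and the last step needs some way to run a command on the dest machine:

    # on the source: hash the content into an md5sum-style file
    rclone hashsum md5 /data/content --output-file content.md5

    # transfer the content plus the checksum file
    rclone copy /data/content ftpdest:content
    rclone copyto content.md5 ftpdest:content/content.md5

    # on the dest machine, from the directory the ftp server writes into:
    md5sum -c content.md5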

Thank you, yes, that's a very interesting idea.
But how would you do the compare through FTP?
I only have FTP access to the destination.

(Thinking aloud, I know that I could retrieve the files from the destination through FTP and do a hash comparison on the source machine. The only problem with this solution is that the data could also get corrupted in the return transfer, so the hash wouldn't really prove anything except in the case where it does match...)

if you cannot make any changes to the dest server on the same network, not sure what to tell you.

what is the ftp server software?
some ftp servers do support checksums.
some ftp clients do support checksums.

Thank you. Yes, I'm very aware this is a tricky question. That's why I asked it here, thinking that maybe rclone had found a smart solution to this age-old issue...

The FTP client is always lftp.
The FTP server varies; they are not my machines but our customers'.

With lftp there is a wrapper command called "mirror" that allows comparing file sizes.
This tends to work in certain cases.
Unfortunately, I have found evidence of corrupted contents with identical file sizes, so the mirror command is insufficient for those cases...
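
For reference, our upload is roughly like this (host and paths are placeholders):

    # reverse mirror (upload) the local directory to the FTP server;
    # mirror only compares names, sizes and timestamps, not file contents
    lftp -e "mirror -R --only-newer /data/content /content; quit" ftp://user:pass@ftp.example.com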

Thank you for your insight anyhow. :slight_smile:

Rclone will compare sizes for you...

As far as I know there is no hash command for FTP so it isn't possible for rclone to compare hashes.
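
For example (remote name is just an illustration):

    # compare names and sizes between the local source and the FTP remote
    rclone check /data/content ftpremote:content --size-only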

Actually I'm wrong about that, here is a draft HASH command: https://tools.ietf.org/html/draft-bryan-ftpext-hash-02

Which I got from: https://security.stackexchange.com/a/110177/10462

Does your FTP server support HASH or XCRC XMD5 etc?

If it does you'll see them in the response to the FEAT command when you run with -vv --dump bodies.
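
Something along these lines should show it (remote name is an example, and the grep is just to narrow down the log):

    # dump the FTP protocol exchange and look for FEAT/HASH/XMD5/XCRC
    rclone lsd ftpremote: -vv --dump bodies 2>&1 | grep -iE "feat|hash|xmd5|xcrc"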

You can also do rclone check --download which will stream (not taking local disk space) the files to compare them. That will give you a definitive answer.
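
For example (again, the remote name is a placeholder):

    # stream the remote files back and compare them against the local source,
    # without writing them to local disk
    rclone check /data/content ftpremote:content --download -vv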

Wow, Nick Craig-Wood, this is a great find!!

We are facing a large number of different FTP servers to which we send files, so some of them might accept these HASH commands. Our offload script should at least attempt it in case the server does.

The rclone streaming check sounds interesting; in our case, however, the files are around 150 GB and these transfers usually take hours, so I'm worried corruption could sneak its way even into the streaming process.

I will nevertheless pursue investigations with your insights, and report back here for the community if any significant breakthrough is achieved.

Thank you!

Honestly, I would try to switch to a protocol that supports hash checks; this will make things way more robust if this is any kind of important data. I know multiple providers that offer FTP backup (for example OVH) where you can also turn on other methods like SSH to do just that.

If this is something casual and FTP is the only choice, I would seriously consider breaking up the files so they are easier to check than a single 150GB file. I would also recommend using some error-correction mechanism that can help if the corruption is low.

Hi Jose,

Thanks for your input.

Of course I agree with you. I can't really go into too much detail, but unfortunately using another protocol is not possible at this time. The receiving side is a manufactured product, and a change of protocol would be a long and tedious process for which they would have to identify their own ROI.

It is not something casual.

Breaking up files poses the difficulty of rebuilding them on the receiving side. I am not sure FTP allows this seamlessly?

What kind of error-correction mechanism are you thinking of that would work with FTP?

Yeah, I can totally relate to how difficult getting budget for even the slightest change can be.

Since you are generating the zip files, you could break them into multiple files (spanning) and upload those instead of one huge file. In my experience, smaller files have less chance of running into transfer problems, so you could make 5GB files and upload 30 of them instead of a single 150GB file.
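
Just as a sketch with standard tools (names and sizes are examples):

    # split the big archive into 5GB chunks before uploading
    split -b 5G content.zip content.zip.part-

    # ...upload the chunks over FTP as usual...

    # whoever restores it joins the chunks back together before unzipping
    cat content.zip.part-* > content.zip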

You could also generate error-correcting files that could help fix the corruption if it's not terrible. Check out https://www.thanassis.space/rsbep.html or https://github.com/Parchive/par2cmdline. I've never used them, but I've read about them before. This would of course take additional space, but it would be the easiest way to reconstruct the data in case a file becomes corrupt. Add it to Nick's suggestion to use rclone check --download to confirm the file was uploaded correctly, for peace of mind.
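
A rough sketch with par2cmdline (the 10% redundancy is just an example value):

    # create recovery files with ~10% redundancy next to the archive
    par2 create -r10 content.zip.par2 content.zip

    # after fetching the (possibly damaged) archive and its .par2 files back:
    par2 verify content.zip.par2
    par2 repair content.zip.par2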

hi,

there is no solution to the problem as posted.

based on my suggestions and the OP's responses concerning the ftp server,
he does not seem to have remote access, a command prompt, or the ability to install software.

i have used winrar, which can add recovery records and can split large files.
someone would need local access to the ftp server to unrar the files, check the log file for errors, and so on, for each and every transfer.

splitting the files adds another layer of potential errors in splitting and rejoining them, and you would still have to transfer the chunks over unreliable ftp.

so in the end, go with rclone check --download.
the unreliable nature of ftp might give some false positives, but then you would just have to upload those file(s) again.

Correct, the OP doesn't have remote access, but he does have access to the local system. My suggestion is to break the files up before uploading and pair them with error correction files, which are uploaded along with the real data.

It's pretty similar to the winrar recovery records, but I didn't offer that as it's commercial/non-free software, which makes implementation even more complicated in corporate environments.

Just to be clear, I also recommend the rclone check --download option; I added the error correction files idea alongside it as another fail-safe. It's common for network maintenance to be scheduled during the same window as backups, so that command might fail, and anything that makes backups more resilient is always good practice. The error correction files would be used on the local system when a restore is needed and it turns out that the remote data was somehow corrupted, not to confirm that the transfer was successful.

not sure what you mean about access to the local system?
how would the OP get that access on each and every customer's dest server?

the OP would need to purchase just one copy of winrar for the source machine.
winrar does not charge to decompress compressed files; it has free tools for that.
winrar, like most compression programs, can create a self-extracting .exe that will decompress and combine the chunks and do checksum hashing.
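
a quick sketch with the command-line rar/unrar tools, names and sizes are just examples:

    # create 5GB volumes with a 5% recovery record
    # (adding -sfx would make the first volume a self-extracting .exe)
    rar a -v5g -rr5% backup.rar content.zip

    # the free unrar tool can test and extract the volumes on the other side
    unrar t backup.part1.rar
    unrar x backup.part1.rar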

I've never been able to get my employer to buy any licensed software; if this is a big company, it won't be an easy or one-person decision. If he is having trouble turning on sftp, he will have nightmares getting approval to buy software. Yet, I might be wrong.

Again, my proposal doesn't involve the OP getting access to each and every dest server; it is simply to upload the backup files along with some error correction files. When a restore is needed, he fetches the file, and if there is a problem, he can download the error correction files, which are smaller, and hope both didn't get corrupted.

Either way, if backups are getting corrupted that often over LAN, there is a serious problem with their network equipment that needs to be checked out.

Hi to both of you,
I have just seen your replies.
I am totally out of time/bandwidth to provide any answers or details at the moment.
I will reply in due time, as soon as possible.
Many thanks for your insights and thoughtful propositions!
Kind regards,
