It's working fine - except that it keeps re-uploading files. Which isn't really a surprise, as the files are being regenerated. So while the hashes stay the same, the modification (and even creation) time changes.
I am now searching for a way to optimize the upload.
Since it's an sftp-only target, there is no way to check the hash on the target (unless there is an rclone ssh subsystem that I don't know of). That's why I have the hash check disabled. I was hoping there is some mode that uploads an index that could be used to check against the source?
What is your rclone version (output from rclone version)
rclone v1.52.2
- os/arch: darwin/amd64
- go version: go1.14.4
Which OS you are using and how many bits (eg Windows 7, 64 bit)
macOS 10.15.3 transferring to Ubuntu 19.10
Which cloud storage system are you using? (eg Google Drive)
sftp with key auth and nologin shell.
The command you were trying to run (eg rclone copy /tmp remote:tmp)
@Animosity022 Please also see the second part of the log with hashing on. There is no md5sum or sha1sum available on the server. I assume those are only available with full ssh access.
Yes, the user on the server needs to be able to execute the command in the path. Normally, you don't have to configure anything, as on most distributions it isn't locked down.
If you locked it down, you'd need to make it accessible to execute for the user if you want to use checksums.
My fresh-install Ubuntu VM with a normal install and SSH setup with no configuration changes works without issue on checksums. If you turn off checksums, you have other options, as by default it'll look at size and modification time.
You can use --size-only and only check size if you are unable to use checksums.
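For example (source path and remote name are made up here):

```
rclone copy /local/photos sftp-remote:backup --size-only
```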
In the end, you need some way to compare files on the source and destination to decide whether they are the same or not, so checksum and size are the options you have to work with.
Of course it will work when there is full ssh access - which is the default. But in this scenario I want to avoid giving full ssh access.
Just checking the size is too weak an indicator.
I was hoping there was a mode where rclone could upload and compare a manifest. But I guess the only option is to fiddle with the ssh config to allow the execution of the hash commands, or live with the re-uploads.
Assuming the manifest is always updated whenever files change, there could be e.g. a single .index file that lists all files and their hashes. This could all be done on the client. The file would have to be downloaded first to check what has changed.
That would be a naive implementation for it.
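A minimal sketch of that naive implementation, done with plain coreutils here (in practice, index.old would be the copy previously uploaded to, and downloaded back from, the server):

```shell
#!/bin/sh
# Naive client-side sketch of the ".index" idea, using coreutils md5sum.
# (rclone would be used to upload/download the index file itself.)
set -e
src=$(mktemp -d)
printf 'v1' > "$src/a.txt"
printf 'v1' > "$src/b.txt"

# Build the index: one "hash  filename" line per file.
(cd "$src" && md5sum a.txt b.txt) > index.old

# Simulate a regeneration that changes one file's content but not the other.
printf 'v2' > "$src/b.txt"
(cd "$src" && md5sum a.txt b.txt) > index.new

# Only files whose hash line changed need re-uploading.
diff index.old index.new | grep '^>' | awk '{print $3}'
```

Here only b.txt shows up as changed and would be re-uploaded; the unchanged a.txt is skipped regardless of its timestamps.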
Are you doing a restricted shell? How are you restricting access?
See above. I provided the sshd snippet.
But thanks for the link to the article.
I think this is going to be hard to work around without hashes. As you've noted --size-only is a pretty weak check - it is better than nothing though.
By this I think you mean could rclone upload the hash to the sftp server then download it again to check it?
You can use the chunker backend to do this. You'd set the chunk_size very large (assuming you don't want chunked files) and then set the hash_type to md5all or sha1all meaning you want a hash of all files. This will store a bit of metadata per file with the hash in.
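A sketch of what the config could look like (remote names, host, and paths are made up; the chunker options are per the rclone docs):

```
[mysftp]
type = sftp
host = example.com
user = backup
key_file = ~/.ssh/id_ed25519

[hashed]
type = chunker
remote = mysftp:backup
chunk_size = 1P
hash_type = md5all
```

You'd then upload through `hashed:` instead of `mysftp:`, so the hashes get stored alongside the files on the server.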
You could also do it manually by running rclone md5sum locally, uploading that to the server then downloading it and diffing it.
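The manual version could look roughly like this (paths and remote name are placeholders):

```
# build a manifest of local hashes
rclone md5sum /local/dir > manifest.local.md5
# keep a copy on the server next to the data
rclone copyto manifest.local.md5 remote:backup/manifest.md5
# later: fetch it back and compare against a fresh local manifest
rclone copyto remote:backup/manifest.md5 manifest.remote.md5
rclone md5sum /local/dir > manifest.local.md5
diff manifest.remote.md5 manifest.local.md5
```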