And I make sure any service that has a dependency on rclone stops as well. I'd imagine it gives you a bad exit code because it could not stop cleanly while there was still access going on.
Killing after unmounting didn't work: once the mount is unmounted, fuser no longer knows which mount is being talked about (even if the mount still has writes happening to it)
And then, if I'm killing writing processes off, then I'm losing data, so not great
SIGINT gives "ERROR : zz: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position" on the next run, if I'm actively writing
SIGTERM doesn't cause cache issues as far as I can tell, so it seems to be graceful to a degree? If it's a graceful shutdown, then it shouldn't show an error and a non-zero exit code unless the shutdown itself had an error. I might be a bit obtuse about this, but it does result in systemd/journalctl telling me everything is on fire, which is a bit ugh
(and if I did tell systemd to ignore that exit-code, would that then hide actual errors in the future?)
I'm not quite sure what you are expecting to happen.
For my fuse/rclone mounts, I have requirements on all my services so that I specifically do not have to kill the mount, as you'll get into situations like the one you are talking about.
You have to stop all the other services before stopping rclone so that the IO is clean.
It's much like having a regular mount as the OS won't let you unmount it if there is an active process against it, but with a normal mount, you can't kill it.
You have to quiet it by stopping all IO and then stop it. As I shared, I use requirements in my systemd units.
I'm wanting a graceful shutdown, not a halt-the-planet shutdown
do not allow new opens, wait until all currently open files are closed, then cleanup and exit.
Noting that the lines you provided didn't solve it: fuser -Mm returned nothing, so there were no procs interacting with the mount, and the error and exit code still occurred.
fusermount -uz prevents new writes but doesn't stop pre-existing writes. The systemd docs say "After the commands configured in this option are run, it is implied that the service is stopped", which is not the case here: fusermount -uz will almost always terminate before rclone does. So I'd have to maintain a separate rclone-mount-stop script to actually get it to close gracefully, which... I just wanna use sigterm, please
What would be expected:
sigterm/sigint once: stops new opens, waits for currently opened files to be closed (if under systemd, requests additional closing time from systemd during this period, see EXTEND_TIMEOUT_USEC), then cleans up and closes
sigterm/sigint twice: grabs what is given from any open files, cleans up and closes
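For anyone following along, here is a minimal sketch of the two-stage behaviour described above. It is Python rather than rclone's Go, and it is not how rclone currently behaves. `ShutdownState`, its methods and the `sd_notify` helper are hypothetical names made up for illustration; only the `NOTIFY_SOCKET` environment variable and the `EXTEND_TIMEOUT_USEC=` message come from systemd's notify protocol, and the sketch assumes the unit is configured to accept notifications.

```python
import os
import signal
import socket


def sd_notify(message: str) -> bool:
    """Send one datagram to systemd's notify socket.

    Returns False when NOTIFY_SOCKET is unset (not running under systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" means a socket in the Linux abstract namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


class ShutdownState:
    """Hypothetical two-stage shutdown: first signal drains, second forces."""

    def __init__(self):
        self.signals_seen = 0

    def handle(self, signum, frame=None):
        # Only count here; the main loop decides what to do with the state.
        self.signals_seen += 1

    @property
    def accepting_opens(self) -> bool:
        # New opens are refused as soon as the first signal arrives.
        return self.signals_seen == 0

    @property
    def force_exit(self) -> bool:
        # A second signal means: stop waiting for open files.
        return self.signals_seen >= 2

    def request_more_time(self, seconds: int) -> bool:
        # Ask systemd to extend the stop timeout while files are draining.
        return sd_notify(f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}")


def install(state: ShutdownState) -> None:
    """Route SIGTERM and SIGINT to the same two-stage handler."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, state.handle)
```

A main loop using this would refuse opens while `accepting_opens` is false, call `request_more_time()` periodically while files are still open, and exit with status 0 once everything is closed or `force_exit` becomes true.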
If you are using systemd, you'd have requirements to ensure things do not stop out of order and you quiesce the mount point before stopping it.
The only time I'm using SIGTERM/SIGINT is when something bad happened, and you still have to clean up the IO at some point, be it a reboot or some other process cleanup.
If there isn't any IO going on, the fusermount will unmount cleanly and nothing else is required.
SIGTERM really isn't the answer imo, as you are trying to take a hammer to solve a problem in a clean fashion.
If SIGTERM is received, shut down the daemon and exit cleanly.
Doing this otherwise, with your method but gracefully, would require opening a pidfd (to avoid any PID reuse issues), calling fusermount -uz, then polling the pidfd.
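The pidfd approach could look roughly like the sketch below. This is an illustration under assumptions, not a tested stop script: `unmount_and_wait` is a hypothetical helper, the mount path in the usage note is made up, and `os.pidfd_open` needs Linux 5.3+ and Python 3.9+.

```python
import os
import select
import subprocess


def unmount_and_wait(pid: int, unmount_cmd: list, timeout_s: float) -> bool:
    """Run the unmount command, then wait for `pid` to exit.

    Returns True if the process exited within timeout_s, False otherwise.
    """
    # Take the pidfd first, so a recycled PID cannot be mistaken for rclone.
    pidfd = os.pidfd_open(pid)
    try:
        subprocess.run(unmount_cmd, check=True)
        poller = select.poll()
        # A pidfd becomes readable once the process has terminated.
        poller.register(pidfd, select.POLLIN)
        events = poller.poll(int(timeout_s * 1000))  # timeout in milliseconds
        return bool(events)
    finally:
        os.close(pidfd)
```

Something like `unmount_and_wait(rclone_pid, ["fusermount", "-uz", "/mnt/remote"], 90)` would then block until rclone has actually finished, instead of returning as soon as the lazy unmount does.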
I just want to use the gently-close-the-program-button to gently close the program
I'm trying to assist here so the sarcasm is not called for.
It's always amazing to me when I volunteer time and try to assist someone who is rude/sarcastic in their responses, as my goal here is to help you by giving up my personal time to go through your question/problem.
That's the goal right? We have to understand the problem by having a conversation and teaching each other things and learning from each other. That's part of the process.
I'm not sure, as that probably requires a consensus on what 'clean' means there: the mount is stopped properly, and most documentation I've seen shows exit code 143 for an OS killing a process, which is what we've done. The mount can be remounted and it does seem to be clean, just not with an exit code of 0. That could be a documentation update and/or be stated more clearly in the documentation.
If you'd like to submit a PR to update it to be more clear, please feel free. The PRs are another spot that discussion can happen as well.
@ncw - I did read a few other things and I'm not sure if that's a fuse item or an rclone item causing the 143, as I found a few examples where that was adjusted via code changes.
It felt like you were telling me repeatedly that didn't know what I wanted. I just wanted to make a bug report that "it shows an error when you close the program according to the documentation". Being told repeatedly to do something else, as if what I brought up was not valid, felt insulting and demeaning.
Perhaps I could have titled the thread better or something.
Generally for services under Linux, start with the LSB:
For all other init-script actions, the init script shall return an exit status of zero if the action was successful.
hence why systemd expects 0 on SIGTERM, as that is the equivalent of LSB's service stop
(and then after LSB, daemon for info specific to systemd)
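As a small aside on where 143 comes from: it is 128 + 15, with 15 being SIGTERM's number on Linux. The journal line quoted in this thread says `code=exited, status=143`, which means the process caught the signal and then called exit with 143 itself, rather than dying from the signal. The sketch below shows the two cases side by side using throwaway Python children. The child programs and `terminate_and_report` are hypothetical and say nothing about rclone's internals.

```python
import signal
import subprocess
import sys

# Child that traps SIGTERM and exits with the shell-style 128+N status.
TRAPPING_CHILD = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(128 + signum))\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)

# Child that leaves SIGTERM at its default action (terminate the process).
PLAIN_CHILD = (
    "import time\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def terminate_and_report(child_source: str) -> int:
    """Start a child, send it SIGTERM, and return Python's view of its exit."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait until the child has printed 'ready'
    proc.terminate()        # sends SIGTERM
    code = proc.wait()
    proc.stdout.close()
    return code


if __name__ == "__main__":
    # Positive value: the child exited on its own with that status.
    print(terminate_and_report(TRAPPING_CHILD))
    # Negative value: the child was killed by that signal number.
    print(terminate_and_report(PLAIN_CHILD))
```

By default systemd counts a service killed by SIGTERM as a clean stop, but an explicit exit status of 143 is just a non-zero exit, which is why the unit ends up marked as failed.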
SIGTERM does seem to do what is needed, but the fact that it produces an ERROR line worries me. If that error is just mis-leveled logging, then IMO the fix is changing it to INFO or DEBUG and having the exit code be 0
If the ERROR is an error though, then that is a bit of a worry
SIGINT seems to not leave the cache in a safe state for my current binary. It could be a problem for both signals, and by chance it's only happened when I used SIGINT, dunno. I'll swap to a new binary in the future and re-test
I have to ask questions to get to my own understanding of the issue. I always assume good intent unless someone gives me a reason not to (i.e. pardon my sarcasm / pardon my annoyance), as my goal still was/is to help.
So by asking those questions, I think we have a good understanding of the specifics / clear instructions on reproducing it and if ncw gets a chance, he can chime in and see if it is a fix.
none of the questions seemed relevant other than "What would you expect to happen in response to sigterm and why?" which was phrased as "I'm not quite sure what you are expecting to happen." Which is not a question.
Actually, reading through, you asked zero questions.
a) there is no such thing as ExecStopPre for service files (it exists for socket files)
b) ExecStop runs and finishes, but rclone is still running because fusermount -uz is lazy, hence we get
rclone@Dropbox.service: Main process exited, code=exited, status=143/n/a
showing that, even with you doing the fusermount -uz, rclone still gets SIGTERM'd
so, all in all, you are quite probably hitting the exact same closing condition as I am in your service files, which is why I cannot use any of your suggestions: they don't do anything different.
You ignored me saying it wasn't a solution, treating me as.. I don't know what. Not worth hearing. I think the overall term for it is *splaining