osmo_fsm: osmo_fsm_inst_state_chg() using timeout_secs=0 instead of -1 as "no-timeout"
<pespin_> just found a big bug in osmo_fsm. Calling osmo_fsm_inst_state_chg() with timeout_secs=0 actually sets no timer...
<pespin_> That should be handled with negative values, like -1
<fixeria> pespin_: timeout_secs=0? is it expected to expire immediately?
<pespin_> fixeria, that's the idea... but doesn't happen
<fixeria> pespin_: but what's the point of doing so?
<pespin_> this has some use cases, like doing some stuff immediately but asynchronously
<fixeria> as a quick hack, you could use osmo_fsm_inst_state_chg_ms() with a small ms value
<pespin_> or I can imagine some FSM setting a timeout based on calculations which may end up in 0
<pespin_> yeah I know, but that still doesn't fix the fact that some timeout may end up being 0ms too
<fixeria> handling timeout_secs=0 as immediate expiration still looks weird to me
<pespin_> why? that's how any kind of timeout works
<fixeria> AFAIR, tnt (or somebody else?) raised the topic of async event handling in osmo_fsm
<fixeria> IMO, it would be a lot more readable if we had an API for async event dispatching
<pespin_> I don't think we can really fix the existing API though, since it's used in lots of places with timeout_secs=0 meaning no timeout...
<pespin_> we'd probably need to add a new osmo_fsm_inst_state_chg2 or alike
no. osmo_fsm_inst_state_chg(timeout_secs == 0) is exactly intended to change the state and not set any timeout.
There is no use case for changing the state and immediately timing out.
Immediate actions are either in the onenter() cb or directly osmo_fsm_inst_term().
Is there anything here I am misunderstanding? Otherwise, please reject this issue.
There's a difference between "immediately", aka "synchronously" doing an action, and delaying an action 0 seconds, aka "asynchronously" doing an action.
Think of a function calculating a timer timeout value dynamically and finding out it is already late, so it sets timeout=0. In this case, the timeout is unexpectedly disabled instead of firing.
Ah, you mean the "defer" use case -- take an action in the next (or so) select loop.
I see that as a new feature...
I can understand the notion that extrapolating a smaller and smaller delay down to zero would end up becoming an "immediate" timeout,
but the state change was, from the start, intended to have no timeout on zero.
I think that is fine, because IMHO a defer is orthogonal to a state change; I think I would design it as an event dispatch instead?
"deferring" an event is one use case, but the other one which is not really deferring is setting up a dynamically-calculated timeout value when changing state.
It's totally common, when calculating time diffs, to end up with a timeout value of 0 when it's already too late. In that case, the FSM just silently breaks with unexpected behavior.
Sorry for being slow to respond. I've been distracted by various emergencies in recent days.
As far as I recall: in general, I have to say that timeout == 0 meaning "no timeout" was a conscious choice when originally writing osmo_fsm. The initial code had no timeout support at all, and then I thought it would be useful to have an optional timeout, to avoid having to call two functions for state change + start of timeout.
At that time, I would never have thought of a dynamic timeout, but only about static, compiled-in timeout values. Those can never go to 0 by accident.
I understand the failure mode pespin is describing, but I still think it is not a valid use case to "abuse" the timeout=0 for some deferred execution.
So IMHO, if we should fix anything, then it is whatever code generates dynamic (or user-modifiable) timeout values: it should never return 0.