Bug #5638


osmo_fsm: osmo_fsm_inst_state_chg() using timeout_secs=0 instead of -1 as "no-timeout"

Added by pespin 16 days ago. Updated 10 days ago.

Target version:
Start date:
Due date:
% Done:


Spec Reference:


<pespin_> just found a big bug in osmo_fsm. Calling osmo_fsm_inst_state_chg() with timeout_secs=0 actually sets no timer...
<pespin_> That should be handled with negative values, like -1
<fixeria> pespin_: timeout_secs=0?  is it expected to expire immediately?
<pespin_> fixeria, that's the idea... but doesn't happen
<fixeria> pespin_: but what's the point of doing so?
<pespin_> this has some use cases, like doing some stuff immediatelly but asynchronously
<fixeria> as a quick hack, you could use osmo_fsm_inst_state_chg_ms() with a small ms value
<pespin_> or I can imagine some FSM setting a timeout based on calculations which may end up in 0
<pespin_> yeah I know, but that still doens't fix the fact that some timeout may end up being 0ms too
<fixeria> handling timeout_secs=0 as immediate expiration still looks weird to me
<pespin_> why? is how any kind of timeout works
<fixeria> AFAIR, tnt (or somebody else?) raised the topic of async event handling in osmo_fsm
<fixeria> IMO, it would be a lot more readable if we had API for async event dispatching
<pespin_> I don't think we can really fix the existing API though, since it's used in lots of places with timeou_secs=0 meaning no timeout...
<pespin_> we'd probably need to add a new osmo_fsm_inst_state_chg2 or alike
Actions #1

Updated by neels 13 days ago

no. osmo_fsm_inst_state_chg(timeout_secs == 0) is exactly intended to change the state and not set any timeout.
There is no use case for changing the state and immediately timing out.
Immediate actions are either in the onenter() cb or directly osmo_fsm_inst_term().

Anything here i am misunderstanding? otherwise please reject this issue.

Actions #2

Updated by pespin 13 days ago

There's a difference between "immediately", aka "synchronously" doing an action, and delaying an action 0 seconds, aka "asynchronously" doing an action.

Think of a function calculating a timer timeout value dynamically, and finding out it is late, so setting timeout=0. In this case, the timeout will be unexpectedly disabled.

Actions #3

Updated by neels 13 days ago

ah you mean the use case "defer" -- take an action in the next (or so) select loop.
i see that as a new feature...

i can understand the notion that extrapolating a smaller and smaller delay down to zero would end up becoming "immediate" timeout,
but the state change from the start was intended to not have any timeout on zero.
i think that is fine, because IMHO a defer is orthogonal to state change; i think i would design it as an event dispatch instead?

osmo_fsm_inst_dispatch_deferred(fi, EVENT);
Actions #4

Updated by pespin 13 days ago

"deferring" an event is one use case, but the other one which is not really deferring is setting up a dynamically-calculated timeout value when changing state.
It's totally common to, when calculating time diffs, set a timeout value to 0 when it's too late. In this case, the FSM will just silently break with an unexpected behavior.

Actions #5

Updated by neels 13 days ago

i find the behaviour expected.
special case when you hit zero and call the desired action immediately?

Actions #6

Updated by pespin 13 days ago

neels we are talking about changing state, not dispatching an event here.
The state change is always instantaneous/synchonous, what we talk is about the timeout being disabled if 0 is set.

Actions #7

Updated by neels 11 days ago

I'm sure you will find a solution.

API wise my humble opinion is that state_chg(timeout_secs=0) meaning "never
time out" is definitely not a bug.

Actions #8

Updated by laforge 10 days ago

Sorry for being slow to respond. I've been distracted by various emergencies in recent days.

As far as I recall: In general, I have to say that the timeout == 0 for "no timeout" was a conscious choice when originally wrigin osmo_fsm. As far as I recalled, the initial code had no timeout support at all, and then I thought it's useful to have an optional timeout to avoid having to call two functions for state change + start of timeout.

At that time, I would never have thought of a dynamic timeout, but only about static, compiled-in timeout values. Those can never go to 0 by accident.

I understand the failure mode pespin is describing, but I still think it is not a valid use case to "abuse" the timeout=0 for some deferred execution.

So IMHO, if we should fix something, then it is whatever code that generates dynamic (or user-modifiable) timeout values: It should never return "0".

Actions #9

Updated by pespin 10 days ago

  • Status changed from New to Rejected

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)