stats code: remove buffer of values
The stats API is currently more complex than it needs to be.
Stats are currently structured as follows:
* osmo_stat_item_groups * / \ * group 1 group 2 * / \ * item 1 item 2 * / | \ * / | \ * / | \ * / | \ * values stats_next_id last_offs * / \ * 1 2 * |-id |-id * '-value '-value
The stats are periodically reported to a log or statsd compatible server.Reporting works like this (stats.c):
- the maximum of all values from stats_next_id until last_offs is calculated
- this maximum (one value!) is sent to all reporters
- stats_next_id is set to last_offs + 1
"values skipped" problem¶
This approach has the following problem: the reporting interval is user-configurable, but it needs to be short enough so the values are reported before the buffers (of any osmo_stat_item_group -> item -> values) overruns. Otherwise, the first values that were added to the osmo_stat_item are not considered for the max that gets reported to the reporters. The max value is potentially wrong.
Users get notified of this with the following error message:
DLSTATS ERROR num_trx:total: 4 stats values skipped (stat_item.c:285)
It would be better if users didn't have to deal with this problem, and just have each interval they set working as expected.
Why the buffer was added¶
When looking at the git log / talking to daniel, the intention of the values buffer was to:
a) support multiple reporters reading asynchronously from the values buffer
b) potentially provide a min/max/avg of the values in the buffer
a) is not useful: In practice, instead of multiple reporters reading asynchronously from osmo_stat_item, we have stats.c being the only user of stat_item, and reading synchronously for all attached reporters. As explained above, all reporters share the same stats_next_id. (This is a layer break since the stats API has a private stats_next_id inside osmo_stat_item, but before that it was even more wrong with a global current_stat_item_index that would be shared between all items and groups, see #5088.)
b) is not useful: we can't use it for an average, because we don't have timestamps associated with each individual value. It makes much more sense to set a low reporting interval and calculate an average in the receiving component, which then also has timestamps available.
* osmo_stat_item_groups * / \ * group 1 group 2 * / \ * item 1 item 2 * |- count * |- current * |- min * '- max
We can get rid of the values, stats_next_id and last_offs members of struct osmo_stat_item, if we replace them with:
Whenever updating the stat item, count gets increased, current is the new current value, and min / max get updated accordingly.
The count is needed so we know whether min/max should be set to the current value, or be calculated from the min/max of the previous and current value. (Instead of count, we could also use a boolean.)
Whenever reading from osmo_stat_item, the count must be set to 0.
This means, we need to officially get rid of reading asynchronously from osmo_stat_item (which, again, was not really used anyway, but only available in theory by bringing your own "next_id" parameter to various osmo_stat_item functions instead of having it point to osmo_stat_item->stats_next_id).
Note that currently there is no way to report the minimum or current of the collected values, this is for potential future usage. Right now we always report the maximum.
Due to fixing of stats bug #5215, we already have breakage in libosmocore related to stats when we make the next release (see TODO-RELEASE). So as discussed with Daniel, it seems like a good time to fix the stats API before the next release then. Assigning to Daniel, as discussed.
Updated by osmith about 1 year ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
the stats FIFO has been replaced some months ago in
libosmocore e90c7176be0f627610b9c28f44551ad19f114672 I137992a5479fc39bbceb6c6c2af9c227bd33b39b