Project

General

Profile

Feature #5225

stats code: remove buffer of values

Added by osmith 19 days ago. Updated 5 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
08/31/2021
Due date:
% Done:

0%

Spec Reference:

Description

The stats API is currently more complex than it needs to be.

Current structure

Stats are currently structured as follows:

 *                                     osmo_stat_item_groups
 *                                       /           \
 *                                   group 1        group 2
 *                                   /      \
 *                                item 1     item 2
 *                              /    |  \
 *                            /      |    \
 *                          /        |      \
 *                        /          |        \
 *                   values   stats_next_id  last_offs
 *                  /      \
 *                 1        2
 *                 |-id     |-id
 *                 '-value  '-value

Reporting

The stats are periodically reported to a log or statsd compatible server.

Reporting works like this (stats.c):
  • the maximum of all values from stats_next_id until last_offs is calculated
  • this maximum (one value!) is sent to all reporters
  • stats_next_id is set to last_offs + 1

"values skipped" problem

This approach has the following problem: the reporting interval is user-configurable, but it needs to be short enough so the values are reported before the buffers (of any osmo_stat_item_group -> item -> values) overruns. Otherwise, the first values that were added to the osmo_stat_item are not considered for the max that gets reported to the reporters. The max value is potentially wrong.

Users get notified of this with the following error message:

DLSTATS ERROR num_trx:total: 4 stats values skipped (stat_item.c:285)  

It would be better if users didn't have to deal with this problem, and just have each interval they set working as expected.

Why the buffer was added

When looking at the git log / talking to daniel, the intention of the values buffer was to:
a) support multiple reporters reading asynchronously from the values buffer
b) potentially provide a min/max/avg of the values in the buffer

a) is not useful: In practice, instead of multiple reporters reading asynchronously from osmo_stat_item, we have stats.c being the only user of stat_item, and reading synchronously for all attached reporters. As explained above, all reporters share the same stats_next_id. (This is a layer break since the stats API has a private stats_next_id inside osmo_stat_item, but before that it was even more wrong with a global current_stat_item_index that would be shared between all items and groups, see #5088.)

b) is not useful: we can't use it for an average, because we don't have timestamps associated with each individual value. It makes much more sense to set a low reporting interval and calculate an average in the receiving component, which then also has timestamps available.

New structure

 *                                     osmo_stat_item_groups
 *                                       /           \
 *                                   group 1        group 2
 *                                   /      \
 *                                item 1     item 2
 *                                  |- count
 *                                  |- current
 *                                  |- min
 *                                  '- max

We can get rid of the values, stats_next_id and last_offs members of struct osmo_stat_item, if we replace them with:

  • count
  • current
  • min
  • max

Whenever updating the stat item, count gets increased, current is the new current value, and min / max get updated accordingly.

The count is needed so we know whether min/max should be set to the current value, or be calculated from the min/max of the previous and current value. (Instead of count, we could also use a boolean.)

Whenever reading from osmo_stat_item, the count must be set to 0.

This means, we need to officially get rid of reading asynchronously from osmo_stat_item (which, again, was not really used anyway, but only available in theory by bringing your own "next_id" parameter to various osmo_stat_item functions instead of having it point to osmo_stat_item->stats_next_id).

Note that currently there is no way to report the minimum or current of the collected values, this is for potential future usage. Right now we always report the maximum.

API/ABI breakage

Due to fixing of stats bug #5215, we already have breakage in libosmocore related to stats when we make the next release (see TODO-RELEASE). So as discussed with Daniel, it seems like a good time to fix the stats API before the next release then. Assigning to Daniel, as discussed.

History

#1 Updated by osmith 19 days ago

  • Description updated (diff)

#2 Updated by osmith 19 days ago

  • Description updated (diff)

#3 Updated by osmith 11 days ago

As pointed out while discussing with neels and laforge, it would be possible to store avg by storing the amount of values and an accumulator for the sum of values.

However since it's currently not supported to retrieve the average, it's not clear if it is worth adding that.

#4 Updated by osmith 5 days ago

  • Assignee changed from daniel to neels

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)