Feature #5225

Updated by osmith over 2 years ago

The stats API is currently more complex than it needs to be. 

 h3. Current structure 

 Stats are currently structured as follows: 

 <pre> 
  *                                       osmo_stat_item_groups 
  *                                         /             \ 
  *                                     group 1          group 2 
  *                                     /        \ 
  *                                  item 1       item 2 
  *                                /      |    \ 
  *                              /        |      \ 
  *                            /          |        \ 
  *                          /            |          \ 
  *                     values     stats_next_id    last_offs 
  *                    /        \ 
  *                   1          2 
  *                   |-id       |-id 
  *                   '-value    '-value 
 </pre> 

 h3. Reporting 

 The stats are periodically reported to a log or to a statsd-compatible server. 

 Reporting works like this (stats.c): 
 * the maximum of all values from stats_next_id until last_offs is calculated 
 * this maximum (one value!) is sent to all reporters 
 * stats_next_id is set to last_offs + 1 
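
 The steps above can be sketched roughly as follows. This is a simplified model and not the actual stats.c / stat_item.c code: the @sketch_*@ names and the buffer size are hypothetical, and ids starting at 1 (with 0 marking an unused slot) are an assumption made for this example. "last_offs + 1" is modeled here as "id of the newest value + 1".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_VALUES 8 /* hypothetical buffer size */

struct sketch_value {
	int32_t id;    /* increasing id of this value; 0 = unused slot (assumption) */
	int32_t value;
};

struct sketch_item {
	int32_t stats_next_id; /* id of the first value that was not reported yet */
	int last_offs;         /* buffer index of the most recently written value */
	struct sketch_value values[SKETCH_NUM_VALUES];
};

static void sketch_item_init(struct sketch_item *item)
{
	assert(item != NULL);
	memset(item, 0, sizeof(*item));
	item->stats_next_id = 1;
	item->last_offs = SKETCH_NUM_VALUES - 1; /* first write lands in slot 0 */
}

/* Append a value to the ring buffer, overwriting the oldest slot if full. */
static void sketch_item_set(struct sketch_item *item, int32_t value)
{
	int32_t next_id = item->values[item->last_offs].id + 1;

	item->last_offs = (item->last_offs + 1) % SKETCH_NUM_VALUES;
	item->values[item->last_offs].id = next_id;
	item->values[item->last_offs].value = value;
}

/* Calculate the maximum of all values that were not reported yet and
 * advance stats_next_id past the newest value.
 * Returns 1 and stores the maximum in *max_out, or 0 if nothing is pending. */
static int sketch_item_flush_max(struct sketch_item *item, int32_t *max_out)
{
	int found = 0;
	int32_t max = 0;
	size_t i;

	assert(item != NULL && max_out != NULL);

	for (i = 0; i < SKETCH_NUM_VALUES; i++) {
		const struct sketch_value *v = &item->values[i];

		if (v->id < item->stats_next_id)
			continue; /* unused slot or already reported */
		if (!found || v->value > max)
			max = v->value;
		found = 1;
	}
	if (!found)
		return 0;

	/* This single value is what every reporter would receive. */
	*max_out = max;
	item->stats_next_id = item->values[item->last_offs].id + 1;
	return 1;
}
```

 In this model, setting the values 3, 7 and 5 and then flushing yields 7 and moves stats_next_id to 4; a second flush without new values reports nothing.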

 h3. "values skipped" problem 

 This approach has the following problem: the reporting interval is user-configurable, but it must be short enough that the values are reported before the values buffer (of any osmo_stat_item_group -> item -> values) overruns. Otherwise, the first values that were added to the osmo_stat_item are no longer considered for the maximum that gets sent to the reporters, so the reported maximum is potentially wrong. 

 Users get notified of this with the following error message: 
 <pre> 
 DLSTATS ERROR num_trx:total: 4 stats values skipped (stat_item.c:285)   
 </pre> 
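
 To illustrate how the reported maximum can become wrong, here is a small stand-alone sketch. It does not use the real stat_item.c code: the @sketch_*@ functions are hypothetical, and the buffer is modeled simply as "only the last buf_size values written since the previous report are still available".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many of the n values written since the last report are no longer
 * available in a buffer that holds buf_size values. */
static size_t sketch_values_skipped(size_t n, size_t buf_size)
{
	return n > buf_size ? n - buf_size : 0;
}

/* Maximum over the values that are still in the buffer, i.e. the last
 * buf_size of the n samples. Requires n > 0 and buf_size > 0. */
static int32_t sketch_reported_max(const int32_t *samples, size_t n,
				   size_t buf_size)
{
	size_t first = sketch_values_skipped(n, buf_size);
	int32_t max;
	size_t i;

	assert(samples != NULL && n > 0 && buf_size > 0);

	max = samples[first];
	for (i = first + 1; i < n; i++) {
		if (samples[i] > max)
			max = samples[i];
	}
	return max;
}
```

 With the six samples 9, 1, 2, 1, 1, 1 and a buffer of four values, two values are skipped and the sketch reports 2 although the real maximum of the interval was 9. With a buffer of eight values nothing is skipped and 9 is reported.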

 It would be better if users didn't have to deal with this problem, and any interval they set simply worked as expected. 

 h3. Why the buffer was added 

 According to the git log and a conversation with @daniel, the values buffer was intended to: 
 a) support multiple reporters reading asynchronously from the values buffer 
 b) potentially provide a min/max/avg of the values in the buffer 

 a) is not useful: In practice, instead of multiple reporters reading asynchronously from osmo_stat_item, we have stats.c being the only user of stat_item, and reading _synchronously_ for all attached reporters. As explained above, all reporters share the same stats_next_id. (This is a layer break since the stats API has a private stats_next_id inside osmo_stat_item, but before that it was even more wrong with a global current_stat_item_index that would be shared between all items and groups, see #5088.) 

 b) is not useful: we can't use it for an average, because we don't have timestamps associated with each individual value. It makes much more sense to set a low reporting interval and calculate an average in the receiving component, which then also has timestamps available. 

 h3. New structure 

 <pre> 
  *                                       osmo_stat_item_groups 
  *                                         /             \ 
  *                                     group 1          group 2 
  *                                     /        \ 
  *                                  item 1       item 2 
  *                                    |- count 
  *                                    |- current 
  *                                    |- min 
  *                                    '- max 
 </pre> 

 We can get rid of the values, stats_next_id and last_offs members of struct osmo_stat_item, if we replace them with: 

 * count 
 * current 
 * min 
 * max 

 Whenever the stat item is updated, count is incremented, current is set to the new value, and min / max are updated accordingly. 

 The count is needed so we know whether min/max should be set to the current value, or be calculated from the min/max of the previous and current value. (Instead of count, we could also use a boolean.) 

 Whenever reading from osmo_stat_item, the count must be set to 0. 
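
 A minimal sketch of the proposed update and read logic could look like this. The struct and function names (@sketch_item@, @sketch_item_set@, @sketch_item_flush_max@) are hypothetical and not the final libosmocore API. Returning current when there was no update since the last read is an assumption of this sketch; the proposal above does not specify that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replacement for the values / stats_next_id / last_offs members. */
struct sketch_item {
	uint32_t count;  /* number of updates since the last read */
	int32_t current; /* most recently set value */
	int32_t min;     /* minimum since the last read, valid if count > 0 */
	int32_t max;     /* maximum since the last read, valid if count > 0 */
};

/* Update the item with a new value. */
static void sketch_item_set(struct sketch_item *item, int32_t value)
{
	assert(item != NULL);

	item->current = value;
	if (item->count == 0) {
		/* First update of this interval: min/max start at the value. */
		item->min = value;
		item->max = value;
	} else {
		if (value < item->min)
			item->min = value;
		if (value > item->max)
			item->max = value;
	}
	item->count++;
}

/* Read the maximum for reporting and start a new interval. */
static int32_t sketch_item_flush_max(struct sketch_item *item)
{
	int32_t max;

	assert(item != NULL);

	/* Assumption: without any update, fall back to the current value. */
	max = item->count ? item->max : item->current;
	item->count = 0;
	return max;
}
```

 Because only one reader resets count, there is no buffer that can overrun, and no value set during an interval can be missed when calculating the maximum, regardless of how long the interval is.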

 This means we need to officially drop support for reading asynchronously from osmo_stat_item (which, again, was not really used anyway, but only available in theory by passing your own "next_id" parameter to various osmo_stat_item functions instead of having it point to osmo_stat_item->stats_next_id). 

 Note that there is currently no way to report the minimum or the current value of the collected values; these members are kept for potential future use. Right now we always report the maximum. 

 h3. API/ABI breakage 

 Because of the fix for stats bug #5215, the next libosmocore release already contains stats-related breakage (see TODO-RELEASE). As discussed with Daniel, this makes it a good time to fix the stats API before that release. Assigning to Daniel.
