https://osmocom.org/https://osmocom.org/favicon.ico?16647414092022-08-30T19:54:51ZOpen Source Mobile Communicationslibosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248122022-08-30T19:54:51Zlaforge
<ul></ul><p>this is the vty output of the libosmocore rate_ctr API. so it may very well be a bug in libosmocore affecting all our counters.</p>
<p>There's one global timer in libosmocore which iterates over all rate_ctr and computes the per-minute/hour/day aggregate stats.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248242022-08-31T10:18:09Zlaforge
<ul></ul><p>Looking at the output of some of my long-running osmo-e1d instances:<br /><pre>
tx:frames_muxed: 80257523645 (8011/s 512011/m 49440128/h 748802000/d) E1 Tx Frames multiplexed
rx:frames_demuxed: 80257529819 (8008/s 512008/m 49440136/h 748802000/d) E1 Rx Frames demultiplexed
</pre></p>
that's also odd:
<ul>
<li>512011/60 = 8533/s</li>
<li>49440128/3600 = 13733/s</li>
<li>748802000/86400 = 8667/s</li>
</ul>
<p>So clearly something odd is going on here.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248252022-08-31T10:27:05Zlaforge
<ul></ul>So the libosmocore/rate_ctr works like this:
<ul>
<li>schedule a libosmocore timer 1s in the future</li>
<li>timer call-back is rate_ctr_group_intv()
<ul>
<li>increment timer_ticks</li>
<li>update the per-second interval stats</li>
<li>if timer_ticks % 60 == 0, update the per-minute stats</li>
<li>etc. for the higher intervals</li>
</ul>
</li>
<li>re-arm the timer 1s in the future</li>
</ul>
So this has the following problems:
<ul>
<li>the timer expiration might be delayed by the scheduler</li>
<li>the processing of the timer takes some time</li>
<li>thus, in reality, the timer expiration is likely always "some amount more" than 1s, where "some amount" is the sum of the scheduler delay and the sum of the processing time of the timer callback function</li>
</ul>
<p>so ideally, we would use a periodic timer expiring regularly at 1s intervals rather than multiple consecutive relative 1s timers. That way, the error doesn't accumulate over time.</p>
<p>When rate_ctr was written, we didn't yet have support for periodic timerfd in libosmocore yet, so we had to use what was available (osmo_timer, which are not repeating).</p>
<p>So this explains why there is <strong>some</strong> amount of error. However, it does not explain the magnitude of error we're seeing (13733/s is <strong>way more</strong> than 8533/s).</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248282022-08-31T14:47:48Zcquirin
<ul></ul><p>How about doing a gettimeofday() each time the counter get actually updated? Some math should then allow for the correct values to be deduced. That can be done with or without FP.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248342022-09-01T06:10:10Zlaforge
<ul></ul><p>On Wed, Aug 31, 2022 at 02:47:48PM +0000, cquirin wrote:</p>
<blockquote>
<p>How about doing a gettimeofday() each time the counter get actually updated? Some math should then allow for the correct values to be deduced. That can be done with or without FP.</p>
</blockquote>
<p>using a timerfd avoids the need for any of this. Even if the system is<br />stalled for several seconds, it will tell you how many timer intervals<br />(1s in our case) have expired meanwhile. We use this<br />in all the critical timing paths of other osmocom software, just not<br />for the rate_ctr.</p>
<p>However, I still don't think this is the root cause of what we're<br />seeing.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248402022-09-01T07:47:13Zlaforge
<ul><li><strong>Project</strong> changed from <i>osmo-e1d</i> to <i>libosmocore</i></li><li><strong>Subject</strong> changed from <i>strange per-minute/hour/day e1 tx/rx frame counts</i> to <i>rate_ctr: strange per-minute/hour/day e1 tx/rx frame counts</i></li></ul> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=248422022-09-01T14:59:16Zcquirin
<ul></ul><p>rate_ctl.c:307 contains the following code</p>
<pre><code>/* update the rate of the next bigger interval. This will
* be overwritten when that next larger interval expires */<br /> if (intv + 1 < ARRAY_SIZE(ctr->intv)) <br /> ctr->intv[intv+1].rate += ctr->intv[intv].rate;</code></pre>
<p>This code is reasonable when the very first interval has not yet elapsed, as it incrementally fills the higher values with some kind of estimates.</p>
<p>However, as soon as the rate has been computed the regular way, by differentiation, the code should no longer run.</p>
<p>I commented the code to check. I have rates of 0 for the first interval, of course, (meaning that the per day value stays 0 for a whole day) but I do have reasonable values as soon as the first interval has elapsed.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=255252022-11-22T09:29:17Zlaforge
<ul><li><strong>Assignee</strong> changed from <i>laforge</i> to <i>msuraev</i></li></ul><p>I really didn't find the time so far, maybe something Max could look into?</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=255492022-11-23T11:10:59Zmsuraev
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>Let's convert this to timerfd first and than troubleshoot and fix remaining inaccuracies.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=255562022-11-23T12:15:11Zlaforge
<ul></ul><p>On Wed, Nov 23, 2022 at 11:10:59AM +0000, msuraev wrote:</p>
<blockquote>
<p>Let's convert this to timerfd first and than troubleshoot and fix remaining inaccuracies.</p>
</blockquote>
<p>those are actually two separate topics. The last update from cquirin handles a generic<br />logic bug int he code, which is IMHO higher priority than any "inaccoracies.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=255862022-11-24T18:07:48Zmsuraev
<ul></ul><p>laforge wrote in <a href="#note-10">#note-10</a>:</p>
<blockquote>
<p>those are actually two separate topics. The last update from cquirin handles a generic<br />logic bug int he code, which is IMHO higher priority than any "inaccoracies.</p>
</blockquote>
<p>I completely agree regarding priorities. From the implementation point of view it's simply easier to fix the code which won't be affected by rewrite right away.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=255872022-11-24T18:08:35Zmsuraev
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>50</i></li></ul><p>The preliminary fix is available in <a class="external" href="https://gerrit.osmocom.org/q/topic:ratectr">https://gerrit.osmocom.org/q/topic:ratectr</a><br />N. B: it hasn't been thoroughly tested yet.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=256352022-11-27T18:54:01Zmsuraev
<ul><li><strong>% Done</strong> changed from <i>50</i> to <i>60</i></li></ul><p>What would be better semantics for long-running rate counters?</p>
<p>For instance, day counter should be:<br />- 0 until 24 hours has passed<br />- updated once with the value from hourly counter<br />- updated every time hourly counter fires</p>
<p>The current code is kinda confusing so I wonder how the rate estimation code should be changed. Should I adjust the estimation to run only for 0 (never updated) counters? Or should I try to implement estimation to run in-between counter updates?</p>
<p>Which variant is more convenient and less confusing?</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=258292022-12-17T15:30:26Zmsuraev
<ul><li><strong>% Done</strong> changed from <i>60</i> to <i>80</i></li></ul><p>After checking rate estimation in other monitoring soft it seems like suggestion by <a class="user active" href="https://osmocom.org/users/739605">cquirin</a> makes perfect sense: dropping the errorneous rate estimator and getting 0 for day counter until 24 hours passed.</p>
<p>The fix is under review in <a class="external" href="https://gerrit.osmocom.org/c/libosmocore/+/30304">https://gerrit.osmocom.org/c/libosmocore/+/30304</a></p>
<p>Additionally, one can check <a class="external" href="https://gerrit.osmocom.org/c/libosmocore/+/30301/">https://gerrit.osmocom.org/c/libosmocore/+/30301/</a> for example setup which uses statsd and netdata to obtain rate counter values independently from libosmocore's rate counter/vty code.</p> libosmocore - Bug #5671: rate_ctr: strange per-minute/hour/day e1 tx/rx frame countshttps://osmocom.org/issues/5671?journal_id=258422022-12-20T11:56:25Zmsuraev
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>80</i> to <i>100</i></li></ul><p>The fix is merged, related doc update is available via <a class="external" href="https://gerrit.osmocom.org/c/osmo-gsm-manuals/+/30715">https://gerrit.osmocom.org/c/osmo-gsm-manuals/+/30715</a></p>