Bug #4067

recent failures of HLR_Tests.ttcn for both master and latest

Added by laforge 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Urgent
Assignee:
Target version: -
Start date: 06/20/2019
Due date:
% Done: 10%


Description

Hi Oliver,

I'd like to ask you to look into the sudden instability of the HLR_Tests.ttcn
test suite during the last five builds. Of course it can be a pure coincidence,
but it looks like nothing else was changed in the HLR (or its tests) beyond
the changes you made related to "subscriber create on demand".

In the last 5 builds (since 491) we're seeing failures related to "timeout waiting
for VTY prompt" or "unexpected VTY response", see
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-hlr-test/test_results_analyzer/

Interestingly, the "latest" tests also started to fail around the same time,
hinting that it's not the HLR that has a regression, but some changes in the tests?

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-hlr-test-latest/test_results_analyzer/

Thanks for looking into this.

History

#1 Updated by osmith 4 months ago

  • Status changed from New to In Progress

#2 Updated by osmith 4 months ago

  • % Done changed from 0 to 10

I looked through the check IMEI test / create subscriber on demand test patches again, and did not notice anything that could cause other tests to fail. The new tests get executed after the existing tests. No existing code was modified, with the exception of Check IMEI related IEs.

That said, it is strange that the TTCN3 HLR tests passed at least 25 times in a row, and then, shortly after the new tests were added, random tests started failing. So maybe it is related and I'm overlooking something.

Here's an overview of what failed:

## master
491: TC_gsup_check_imei_invalid_len => (expected failure, because related fix was not merged yet)
492: TC_mo_sss_reject               => g_Tguard timeout
493: -
494: TC_gsup_purge_cs               => VTY timeout for prompt
495: TC_gsup_ul                     => VTY timeout for prompt
496: -

## latest
241: TC_gsup_purge_ps               => VTY timeout for prompt
242: -
243: -
244: TC_gsup_purge_ps, TC_gsup_ul   => VTY timeout for prompt (both)
245: -
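
For a quick tally of the overview above, a small Python sketch can count failures per reason and per test case. This is only an illustration and not part of the test suite: the data is copied by hand from the list, build 491 is left out because its failure was expected, and the names `FAILURES` and `tally` are made up for this example.

```python
from collections import Counter

# Failures per build, copied by hand from the overview above.
# Build 491 is omitted because its failure was expected.
FAILURES = {
    "master": {
        492: [("TC_mo_sss_reject", "g_Tguard timeout")],
        493: [],
        494: [("TC_gsup_purge_cs", "VTY timeout for prompt")],
        495: [("TC_gsup_ul", "VTY timeout for prompt")],
        496: [],
    },
    "latest": {
        241: [("TC_gsup_purge_ps", "VTY timeout for prompt")],
        242: [],
        243: [],
        244: [("TC_gsup_purge_ps", "VTY timeout for prompt"),
              ("TC_gsup_ul", "VTY timeout for prompt")],
        245: [],
    },
}


def tally(failures):
    """Count failures by reason and by test case across all jobs."""
    by_reason = Counter()
    by_test = Counter()
    for builds in failures.values():
        for entries in builds.values():
            for test, reason in entries:
                by_reason[reason] += 1
                by_test[test] += 1
    return by_reason, by_test


by_reason, by_test = tally(FAILURES)
for reason, count in by_reason.most_common():
    print(f"{count}x {reason}")
```

Counted this way, the VTY prompt timeout accounts for all but one of the unexpected failures, and no single test case stands out.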

I was not able to reproduce any of the failures locally. Whenever I run the tests, all of them pass (with and without docker). Then I took an in-depth look at two recent ones, TC_gsup_ul and TC_gsup_purge_cs: both ran into a 2s VTY timeout after sending these commands:

TC_gsup_purge_cs (494):

subscriber imsi 262420176541756 create

TC_gsup_ul (495):

subscriber imsi 262428655547458 update msisdn 491613408534

There was nothing else in the TTCN3 logs that hinted at why this was failing. I suspect that OsmoHLR fails to notify the connected client ("Osmocom TTCN-3 GSUP Simulator") in time in hlr.c:osmo_hlr_subscriber_update_notify(). Unfortunately we don't have the logs of osmo-hlr from the failed test runs, so I'm not sure.
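
To illustrate the failure mode, here is a minimal Python sketch of "wait for the VTY prompt, give up after 2s". It is not the TTCN-3 implementation used by the test suite; the helper name `read_until_prompt` is hypothetical, and the prompt string is passed in by the caller because the exact prompt is an assumption here.

```python
import socket
import time


def read_until_prompt(sock, prompt, timeout=2.0):
    """Read from sock until the received data ends with prompt.

    Raises TimeoutError if the prompt does not arrive within `timeout`
    seconds, which corresponds to the "VTY timeout for prompt" verdict.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while not buf.endswith(prompt):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no prompt within {timeout}s, got {buf!r}")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError(f"no prompt within {timeout}s, got {buf!r}") from None
        if not chunk:
            raise ConnectionError("peer closed the connection before the prompt")
        buf += chunk
    return buf
```

If the server is busy with something else after accepting the command and only prints the prompt after the deadline, the caller sees a timeout even though the command itself may have succeeded.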

Here is a patch to add logging, in case this happens again (in the last jenkins run, all tests passed):
https://gerrit.osmocom.org/c/docker-playground/+/14563

#3 Updated by osmith 4 months ago

  • Status changed from In Progress to Closed

Tests have been passing 6 days in a row, closing.
