Bug #4151

Updated by pespin over 4 years ago

It has been spotted several times that all osmo-trx-lms tests in osmo-gsm-tester fail with the message: 
 <pre> 
 socket.c:367 unable to bind socket:10.42.42.117:4237: Address already in use 
 </pre> 
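 A quick way to confirm that a leftover process still holds the address before the next test starts is to attempt the same bind() yourself. The sketch below is a hypothetical pre-flight check, not part of osmo-gsm-tester. It assumes the conflicting socket is UDP, which the error message does not state, and @udp_port_in_use@ is an illustrative name.

```python
import errno
import socket


def udp_port_in_use(host: str, port: int) -> bool:
    """Return True if bind() on (host, port) fails with EADDRINUSE.

    Hypothetical helper; assumes the conflicting socket is UDP.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True  # another process (e.g. a stale osmo-trx-lms) owns it
        raise  # other errors (e.g. address not configured) are real failures
    finally:
        sock.close()
    return False
```

 For the failure above this would be called as @udp_port_in_use("10.42.42.117", 4237)@ on the host running osmo-trx-lms.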

 A closer look shows osmo-trx-lms still running but idle (not consuming CPU): 
 <pre> 
 # ps -ef | grep osmo-trx-lms 
 root       14643 14604    0 11:47 pts/1      00:00:00 grep osmo-trx-lms 
 jenkins    55210       1    0 Aug06 ?          00:00:53 /osmo-gsm-tester-trx/last_run/osmo-trx/bin/osmo-trx-lms -C /osmo-gsm-tester-trx/last_run/osmo-trx.cfg 
 </pre> 

 To get the process creation time (to see which test caused the issue and whether the logs around it provide more information): 
 <pre> 
 # ls -ld --time-style=full-iso    /proc/$(pidof osmo-trx-lms) 
 dr-xr-xr-x 9 jenkins jenkins 0 2019-08-06 10:37:14.274166715 +0200 /proc/55210 
 </pre> 
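 The same information can be obtained programmatically from procfs. This is a minimal sketch assuming Linux, with hypothetical helper names. It matches processes by @/proc/<pid>/comm@ (which the kernel truncates to 15 characters) and computes the start time from the @starttime@ field of @/proc/<pid>/stat@ plus @btime@ from @/proc/stat@. The timestamp of the @/proc/<pid>@ directory used above reflects when the procfs inode was created and can therefore be later than the real process start, whereas @starttime@ is the kernel's own record.

```python
import os
from datetime import datetime


def boot_time() -> float:
    """System boot time in seconds since the epoch (btime in /proc/stat)."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return float(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def pids_by_name(name: str) -> list:
    """PIDs whose /proc/<pid>/comm matches name (comm is truncated to 15 chars)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().rstrip("\n") == name[:15]:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def start_time(pid: int) -> datetime:
    """Local start time of pid, from field 22 (starttime) of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so parse after it.
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[19])  # fields[0] is field 3, so field 22 is index 19
    return datetime.fromtimestamp(boot_time() + start_ticks / os.sysconf("SC_CLK_TCK"))


if __name__ == "__main__":
    for pid in pids_by_name("osmo-trx-lms"):
        print(pid, start_time(pid).isoformat(sep=" "))
```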

 At that time, the following run was in progress: 
 https://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run-prod/1926/ 

 And the test: @cs_paging_gprs_active.py@ in suite @gprs:trx-lms+mod-bts0-numtrx2+mod-bts0-chanallocdescend@ (trial-1926) 

 The test runs and at some point fails (expected, since multi-trx is not yet supported in osmo-trx-lms), and then osmo-gsm-tester goes through its regular procedure to kill all processes (in the case of osmo-trx-lms, it kills the ssh client, which should in turn kill its child through the script handler): 
 <pre> 
 10:38:00.558825 ---        ParallelTerminationStrategy: DBG: Scheduled to terminate 22 processes.    [process.py:108] 
 10:38:00.560001 ---        ParallelTerminationStrategy: DBG: Starting to kill with SIGTERM    [process.py:116] 
 ... 
 10:38:00.669914 run             osmo-trx-lms(pid=1883): Terminating (SIGTERM)    [trial-1926↪gprs:trx-lms+mod-bts0-numtrx2+mod-bts0-chanallocdescend↪osmo-bts-trx↪osmo-trx-lms↪osmo-trx-lms(pid=1883)]    [process.py:236] 
 ... 
 10:38:00.773158 ---        ParallelTerminationStrategy: PID 1883 died...    [process.py:75] 
 10:38:00.773706 run             osmo-trx-lms(pid=1883): DBG: Cleanup    [trial-1926↪gprs:trx-lms+mod-bts0-numtrx2+mod-bts0-chanallocdescend↪osmo-bts-trx↪osmo-trx-lms↪osmo-trx-lms(pid=1883)]    [process.py:265] 
 10:38:00.776101 run             osmo-trx-lms(pid=1883): Terminated {rc=36608}    [trial-1926↪gprs:trx-lms+mod-bts0-numtrx2+mod-bts0-chanallocdescend↪osmo-bts-trx↪osmo-trx-lms↪osmo-trx-lms(pid=1883)]    [process.py:270] 
 </pre> 
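 The @rc=36608@ in the last line is easier to read once decoded. Assuming it is a raw wait(2) status word, as returned by Python's @os.waitpid()@ (the log does not say so explicitly), it encodes a normal exit with code 143, i.e. 36608 = 143 × 256. By the usual shell convention, 143 = 128 + 15 is how a child killed by SIGTERM is reported. A small decoding sketch, with hypothetical helper names:

```python
import os
import signal


def signal_name(num: int) -> str:
    """Symbolic name of a signal number, e.g. 15 -> 'SIGTERM'."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def describe_wait_status(status: int) -> str:
    """Decode a raw wait(2) status word such as the one returned by os.waitpid()."""
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        text = f"exited with code {code}"
        if code > 128:
            # Shell convention: a child killed by signal N is reported as 128 + N.
            sig = code - 128
            text += f" (= 128 + {sig}, which shells use for {signal_name(sig)})"
        return text
    if os.WIFSIGNALED(status):
        return f"killed by {signal_name(os.WTERMSIG(status))}"
    return f"stopped or other status {status:#x}"


if __name__ == "__main__":
    print(describe_wait_status(36608))
    # exited with code 143 (= 128 + 15, which shells use for SIGTERM)
```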

 So my guess is not that ssh fails to kill its child process, but rather that when running with multi-trx we may end up in some race condition that somehow blocks osmo-trx-lms and prevents it from exiting. 
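 For a process that stays alive after SIGTERM, a common pattern is to escalate to SIGKILL after a grace period. The sketch below illustrates that pattern for local @subprocess.Popen@ objects. It is not osmo-gsm-tester's @ParallelTerminationStrategy@; @terminate_all@ and @grace@ are hypothetical names. It only acts on local processes, so, as in this bug, killing a local ssh client does not by itself guarantee that the remote osmo-trx-lms exits. Also, a process blocked in an uninterruptible kernel call only dies from SIGKILL once that call returns, and the final @wait()@ would block until then.

```python
import subprocess
import time


def terminate_all(procs, grace: float = 5.0, poll: float = 0.05) -> dict:
    """SIGTERM every process at once, wait up to `grace` seconds for all of them,
    then SIGKILL any that are still alive. Returns {pid: returncode}.

    Hypothetical helper for local subprocess.Popen objects only.
    """
    for p in procs:
        if p.poll() is None:
            p.terminate()  # SIGTERM on POSIX; the process may catch or ignore it
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll)
    for p in procs:
        if p.poll() is None:
            p.kill()  # SIGKILL cannot be caught, blocked or ignored
    # A negative returncode -N means "terminated by signal N".
    return {p.pid: p.wait() for p in procs}
```

 A process that exits on SIGTERM ends up with returncode -15, while one that ignores SIGTERM is reaped with -9 after the grace period.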
