https://osmocom.org/
2018-04-19T09:05:27Z
Open Source Mobile Communications
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8950
2018-04-19T09:05:27Z
pespin
<ul></ul><p>I'd say lchan pointer (0x7ffff7fb3290) is not correct in rsl_rx_conn_fail, and it is obtained in caller abis_rsl_rx_dchan(). It is also the only thing which I think can fail in the line causing the segfault.</p>
<p>Interestingly, though, lchan is accessed once in that code path without any issue when calling gsm_lchan_name, where the pointer is dereferenced:<br /><pre>
msg->lchan = lchan_lookup(sign_link->trx, rslh->chan_nr,
                          "Abis RSL rx DCHAN: ");
if (!msg->lchan)
    return -1;
ts_name = gsm_lchan_name(msg->lchan);
</pre><br /><pre>
static inline char *gsm_lchan_name(const struct gsm_lchan *lchan)
{
    return lchan->name;
}
</pre></p>
<p>It is also used in rsl_rx_conn_fail previous to the crash without any issue:<br /><pre>
LOGP(DRSL, LOGL_NOTICE, "%s CONNECTION FAIL in state %s ",
     gsm_lchan_name(msg->lchan),
     gsm_lchans_name(msg->lchan->state));
</pre></p>
<p>Outputting in the log:<br /><pre>
<0004> abis_rsl.c:1367 (bts=0,trx=0,ts=0,ss=0) CONNECTION FAIL in state ACTIVE CAUSE=0x01(Radio Link Failure)
</pre></p>
<p>So it seems what is wrong is not lchan pointer, but <strong>lchan->conn</strong>, which is used first in that code path.</p>
<p>It would be interesting to know the value of lchan->conn when getting the segfault, to see if it's NULL or it contains garbage. If you run again into the crash with gdb, can you print the value of the pointer? In gdb cmd line: "print lchan->conn". You can also print the full lchan info: "print *lchan".</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8962
2018-04-19T13:27:50Z
neels
nhofmeyr@sysmocom.de
<ul></ul><p>Just to mention it -- you have tried completely uninstalling all osmo libraries, cleaning all source trees and rebuilding everything from scratch?<br />If you e.g. install a newer version of a library (with an ABI change) and a dependent program is not rebuilt subsequently, that may cause stack corruption issues.<br />I hope that's not it and we can uncover a bug here.</p>
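<p>To illustrate the ABI concern above, here is a minimal sketch. The two struct layouts are hypothetical and are not taken from any osmo library. The sketch only shows that inserting a field moves a later member to a different offset, so a binary compiled against the old layout would read unrelated bytes where it expects the pointer.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "old" layout that a stale application was compiled against. */
struct chan_v1 {
    int   nr;
    void *conn;
};

/* Hypothetical "new" layout: a field was inserted in front of conn. */
struct chan_v2 {
    int   nr;
    char  name[32];
    void *conn;
};

/* Returns 1 if the two layouts disagree about where conn lives. */
static int conn_offset_changed(void)
{
    return offsetof(struct chan_v1, conn) != offsetof(struct chan_v2, conn);
}

/* Emulates a stale binary: compares the bytes found at the OLD conn offset
 * inside a new-layout object with the real conn member.
 * Returns 1 if they happen to match, 0 if the stale read sees other data. */
static int stale_read_matches(const struct chan_v2 *obj)
{
    assert(obj != NULL);
    return memcmp((const char *)obj + offsetof(struct chan_v1, conn),
                  &obj->conn, sizeof(obj->conn)) == 0;
}
```

<p>With these assumed layouts, the old offset of conn falls inside name in the new struct, so a stale reader would interpret name bytes as a pointer. Rebuilding every dependent program after a library update rules this class of problem out.</p>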
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8963
2018-04-19T13:31:06Z
neels
nhofmeyr@sysmocom.de
<ul></ul><p>BTW, unrelated: note <a class="external" href="http://git.osmocom.org/libosmo-crypt-a53/tree/README.md">http://git.osmocom.org/libosmo-crypt-a53/tree/README.md</a><br />i.e. you shouldn't need libosmo-crypt-a53</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8970
2018-04-20T03:01:40Z
ron.menez@entropysolution.com
<ul><li><strong>File</strong> <a href="/attachments/3083">gdb_segfault_04202018.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3083/gdb_segfault_04202018.log">gdb_segfault_04202018.log</a> added</li></ul><p>pespin wrote:</p>
<blockquote>
<p>I'd say lchan pointer (0x7ffff7fb3290) is not correct in rsl_rx_conn_fail, and it is obtained in caller abis_rsl_rx_dchan(). It is also the only thing which I think can fail in the line causing the segfault.</p>
<p>Interestingly, though, lchan is accessed once in that code path without any issue when calling gsm_lchan_name, where the pointer is dereferenced:<br />[...]<br />[...]</p>
<p>It is also used in rsl_rx_conn_fail previous to the crash without any issue:<br />[...]</p>
<p>Outputting in the log:<br />[...]</p>
<p>So it seems what is wrong is not lchan pointer, but <strong>lchan->conn</strong>, which is used first in that code path.</p>
<p>It would be interesting to know the value of lchan->conn when getting the segfault, to see if it's NULL or it contains garbage. If you run again into the crash with gdb, can you print the value of the pointer? In gdb cmd line: "print lchan->conn". You can also print the full lchan info: "print *lchan".</p>
</blockquote>
<p>I ran the requested commands:</p>
<pre>
(gdb) print lchan->conn
$1 = (struct gsm_subscriber_connection *) 0x0
</pre>
<pre>
(gdb) print *lchan
$2 = {ts = 0x7ffff7fb2168, nr = 1 '\001', type = GSM_LCHAN_SDCCH, rsl_cmode = RSL_CMOD_SPD_SIGN,
tch_mode = GSM48_CMODE_SIGN, csd_mode = LCHAN_CSD_M_NT, state = LCHAN_S_ACTIVE, broken_reason = 0x45a5a5 "",
bs_power = 0 '\000', ms_power = 14 '\016', encr = {alg_id = 1 '\001', key_len = 0 '\000',
key = '\000' <repeats 15 times>}, mr_ms_lv = "\000\000\000\000\000\000",
mr_bts_lv = "\000\000\000\000\000\000", sapis = "\000\000\000\000\000\000\000", abis_ip = {bound_ip = 0,
connect_ip = 0, bound_port = 0, connect_port = 0, conn_id = 0, rtp_payload = 0 '\000',
rtp_payload2 = 0 '\000', speech_mode = 0 '\000', rtp_socket = 0x0, ass_compl = {rr_cause = 0 '\000',
valid = false}}, rqd_ta = 0 '\000', name = 0x87ab00 "(bts=0,trx=0,ts=0,ss=1)", T3101 = {node = {
rb_parent_color = 7067681, rb_right = 0x0, rb_left = 0x0}, list = {next = 0x7ffff7fb3b38,
prev = 0x7ffff7fb3b38}, timeout = {tv_sec = 1524195320, tv_usec = 830425}, active = 1,
cb = 0x4205d0 <t3101_expired>, data = 0x7ffff7fb3a90}, T3109 = {node = {rb_parent_color = 0, rb_right = 0x0,
rb_left = 0x0}, list = {next = 0x0, prev = 0x0}, timeout = {tv_sec = 0, tv_usec = 0}, active = 0, cb = 0x0,
data = 0x0}, T3111 = {node = {rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, list = {next = 0x0,
prev = 0x0}, timeout = {tv_sec = 0, tv_usec = 0}, active = 0, cb = 0x0, data = 0x0}, error_timer = {node = {
rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, list = {next = 0x0, prev = 0x0}, timeout = {tv_sec = 0,
tv_usec = 0}, active = 0, cb = 0x0, data = 0x0}, act_timer = {node = {rb_parent_color = 7068001,
rb_right = 0x0, rb_left = 0x0}, list = {next = 0x7ffff7fb3c78, prev = 0x7ffff7fb3c78}, timeout = {
tv_sec = 1524192324, tv_usec = 829670}, active = 0, cb = 0x41e010 <lchan_act_tmr_cb>,
data = 0x7ffff7fb3a90}, rel_work = {node = {rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, list = {
next = 0x0, prev = 0x0}, timeout = {tv_sec = 0, tv_usec = 0}, active = 0, cb = 0x0, data = 0x0},
error_cause = 0 '\000', neigh_meas = {{arfcn = 0, bsic = 0 '\000',
rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {arfcn = 0,
bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {
arfcn = 0, bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0,
last_seen_nr = 0 '\000'}, {arfcn = 0, bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000",
rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {arfcn = 0, bsic = 0 '\000',
rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {arfcn = 0,
bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {
arfcn = 0, bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0,
last_seen_nr = 0 '\000'}, {arfcn = 0, bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000",
rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {arfcn = 0, bsic = 0 '\000',
rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}, {arfcn = 0,
bsic = 0 '\000', rxlev = "\000\000\000\000\000\000\000\000\000", rxlev_cnt = 0, last_seen_nr = 0 '\000'}},
meas_rep = {{lchan = 0x0, nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {
rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {
rx_lev = 0 '\000', rx_qual = 0 '\000'}}, bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {
pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0, ul = {full = {
rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {full = {
rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}},
bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0,
nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000',
ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0,
flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, bs_power = 0 '\000',
ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0,
ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {
full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}},
bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0,
nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000',
ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0,
flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, bs_power = 0 '\000',
ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0,
ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {
full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}},
bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}, {lchan = 0x0,
nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000', rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}}, bs_power = 0 '\000', ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000',
ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0,
flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}}}, {lchan = 0x0, nr = 0 '\000', flags = 0, ul = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, dl = {full = {rx_lev = 0 '\000',
rx_qual = 0 '\000'}, sub = {rx_lev = 0 '\000', rx_qual = 0 '\000'}}, bs_power = 0 '\000',
ms_timing_offset = 0, ms_l1 = {pwr = 0 '\000', ta = 0 '\000'}, num_cell = 0, cell = {{rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000',
neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000',
arfcn = 0, flags = 0}, {rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {
rxlev = 0 '\000', bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}, {rxlev = 0 '\000',
bsic = 0 '\000', neigh_idx = 0 '\000', arfcn = 0, flags = 0}}}}, meas_rep_idx = 0, meas_rep_count = 0,
meas_rep_last_seen_nr = 255 '\377', rqd_ref = 0x0, conn = 0x0, dyn = {act_type = 0 '\000', ho_ref = 0 '\000',
rqd_ref = 0x0, rqd_ta = 0 '\000'}}
</pre>
<p>The complete gdb log is also attached.</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8971
2018-04-20T03:07:45Z
ron.menez@entropysolution.com
<ul></ul><p>neels wrote:</p>
<blockquote>
<p>Just to mention it -- you have tried completely uninstalling all osmo libraries, cleaning all source trees and rebuilding everything from scratch?<br />If you e.g. install a newer version of a library (with an ABI change) and a dependent program is not rebuilt subsequently, that may cause stack corruption issues.<br />I hope that's not it and we can uncover a bug here.</p>
</blockquote>
<p>Hi Neels,</p>
<p>We installed all the osmo components from scratch on a freshly installed and updated Ubuntu 16.04 on April 14, 2018, using the latest git version at that time.</p>
<p>We will reinstall all of the osmo components again today, using the latest versions from git and leaving "libosmo-crypt-a53" out of the installation.</p>
<p>We'll let you know if we experience the segfault again.</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=8972
2018-04-20T06:32:10Z
ron.menez@entropysolution.com
<ul></ul>
<p>Hi Neels,</p>
<p>We installed the latest version today and still experience the segfault.</p>
<p>It seems that a segfault is triggered every time a "Radio Link Failure" occurs. Please see the log below:</p>
<pre>
<0004> abis_rsl.c:1367 (bts=0,trx=0,ts=0,ss=0) CONNECTION FAIL in state ACTIVE CAUSE=0x01(Radio Link Failure)
Segmentation fault (core dumped)
</pre>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9000
2018-04-23T22:04:07Z
neels
nhofmeyr@sysmocom.de
<ul></ul><p>@Ron, thanks for the clarification.</p>
<p>The next thing to do to fix this issue is to try reproducing the failure with ttcn3 tests:<br />Trigger the CONNECTION FAIL with cause Radio Link Failure as seen in the logs and ensure graceful handling.</p>
<p>So far we haven't assigned or prioritized this issue.<br />A segfault is inherently important, but we are currently not sure when we'll get a chance to investigate in detail.</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9911
2018-06-14T07:40:56Z
laforge
<ul><li><strong>Assignee</strong> set to <i>laforge</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9912
2018-06-14T07:47:16Z
laforge
<ul></ul><p>Looking at this in more detail.</p>
<ul>
<li>rsl_rx_conn_fail() blindly dereferences lchan->conn</li>
<li>lchan->conn is set in gsm0408_rcvmsg() when the first DATA IND (i.e. Layer 3 data) arrives</li>
</ul>
<p>However, what if a CONN FAIL is received even before the first DATA IND is received?</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9913
2018-06-14T07:55:57Z
laforge
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>20</i></li></ul><p>Or, even more so: what if the RLL connection (i.e. the LAPDm link) has already been closed by means of RLL REL IND / RLL REL REQ? <code>handle_release()</code>, for example, sets lchan->conn to NULL, as does <code>lchan_release()</code>.</p>
<p>I think it's fundamentally wrong to assume that an lchan will always have a "conn" associated.</p>
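<p>The scenario described above can be sketched as follows. This is a simplified illustration and not the actual fix. The struct definitions and the on_data_ind(), on_release() and on_conn_fail() helpers are hypothetical stand-ins, not osmo-bsc code. The point is only that a handler which may run before the first DATA IND, or after a release, has to check lchan->conn before using it.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real osmo-bsc types. */
struct conn {
    int failures;           /* number of failures forwarded to this conn */
};

struct lchan {
    const char  *name;
    struct conn *conn;      /* NULL until Layer 3 data has been seen */
};

/* The first Layer 3 data associates a conn with the lchan. */
static void on_data_ind(struct lchan *lchan, struct conn *conn)
{
    assert(lchan != NULL);
    lchan->conn = conn;
}

/* A release drops the association again. */
static void on_release(struct lchan *lchan)
{
    assert(lchan != NULL);
    lchan->conn = NULL;
}

/* Connection-failure handler that tolerates a missing conn.
 * Returns 1 if the failure was forwarded to a conn, 0 if there was none. */
static int on_conn_fail(struct lchan *lchan)
{
    assert(lchan != NULL);
    if (!lchan->conn)
        return 0;           /* no conn yet, or already released */
    lchan->conn->failures++;
    return 1;
}
```

<p>Calling on_conn_fail() on a fresh lchan, or on one that has been through on_release(), returns 0 instead of dereferencing a NULL pointer. It only touches the conn in between, after on_data_ind() has set it.</p>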
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9914
2018-06-14T07:58:04Z
laforge
<ul><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' disabled> <i>do a thorough audit of all lchan->conn dereferences</i> added</li><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' disabled> <i>fix the actual bug reported</i> added</li><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' disabled> <i>implement TTCN3 tests to trigger the problem</i> added</li></ul>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9915
2018-06-14T11:46:20Z
laforge
<ul><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' checked disabled> <i>implement TTCN3 tests to trigger the problem</i> set to Done</li><li><strong>% Done</strong> changed from <i>20</i> to <i>40</i></li></ul><p>We have two new test cases in <a class="external" href="https://gerrit.osmocom.org/9630">https://gerrit.osmocom.org/9630</a> which reproduce the problem on current OsmoBSC master.</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9916
2018-06-14T12:34:02Z
laforge
<ul><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' checked disabled> <i>fix the actual bug reported</i> set to Done</li><li><strong>% Done</strong> changed from <i>40</i> to <i>80</i></li></ul><p>The fix has been committed to osmo-bsc master in <a class="external" href="http://git.osmocom.org/osmo-bsc/commit/?id=cc2fb61a1639b5237d2271f2789cfbe951471d78">http://git.osmocom.org/osmo-bsc/commit/?id=cc2fb61a1639b5237d2271f2789cfbe951471d78</a></p>
<p>Feel free to either upgrade to latest osmo-bsc master, or to back-port the fix using</p>
<pre><code>git cherry-pick cc2fb61a1639b5237d2271f2789cfbe951471d78</code></pre>
<p>to your currently used version of osmo-bsc.</p>
<p>The bug was originally introduced [by me] in git commit 3561bd48976dbee8dbd4659dad15be85a3e79ace in January 2018</p>
<p>As we now have automatic tests for this bug, it should never be re-introduced unnoticed.</p>
<p>I'll also continue to audit the code for other similar bugs.</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9917
2018-06-14T12:34:32Z
laforge
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=9918
2018-06-14T13:56:47Z
laforge
<ul><li><b>Checklist item</b> <input type='checkbox' class='checklist-checkbox' checked disabled> <i>do a thorough audit of all lchan->conn dereferences</i> set to Done</li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>A manual audit of the code concludes that there appear to be no other bugs that dereference lchan->conn or even lchan->conn->fi without proper checks. It might be useful to re-audit the upcoming lchan_fsm changes by <a class="user active" href="https://osmocom.org/users/91">neels</a></p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=10476
2018-07-26T06:41:13Z
ron.menez@entropysolution.com
<ul></ul><p>Hi Neels / Support,</p>
<p>The segfault is now resolved, but we are encountering another issue: the SDCCH channel is not released immediately when the "CONNECTION FAIL in state ACTIVE CAUSE=0x01(Radio Link Failure)" error occurs.</p>
<p>How long does OSMO-BSC hold the SDCCH channel when this error occurs? Does it actually release the SDCCH channel at all?</p>
OsmoBSC - Bug #3182: OSMO-BSC: Intermittent Segmentation fault (core dumped)
https://osmocom.org/issues/3182?journal_id=10478
2018-07-26T12:35:44Z
neels
nhofmeyr@sysmocom.de
<ul></ul><p><a class="email" href="mailto:ron.menez@entropysolution.com">ron.menez@entropysolution.com</a> wrote:</p>
<blockquote>
<p>Hi Neels / Support,</p>
</blockquote>
<p>Hi there -- osmocom.org is an open community; if you need dedicated support, please contact a commercial support vendor. Also, please direct questions and issues to the general community rather than to individual members. Thanks!</p>
<blockquote>
<p>The segfault is now resolved, but we are encountering another issue: the SDCCH channel is not released immediately when the "CONNECTION FAIL in state ACTIVE CAUSE=0x01(Radio Link Failure)" error occurs.</p>
</blockquote>
<p>Please try to avoid mixing separate issues in the same ticket. Feel free to create a new ticket for any problem you encounter.</p>
<blockquote>
<p>How long does OSMO-BSC hold the SDCCH channel when this error occurs? Does it actually release the SDCCH channel at all?</p>
</blockquote>
<p>While I'm here, even though it does not belong here, let me mention that osmo-bsc is currently being refactored (almost done), to use well-defined FSMs and catch a few holes of missing cleanup/release. You may try to run osmo-bsc from branch neels/inter_bsc_ho to see whether the failure to release still occurs on that branch, most likely it is fixed there. The proper course of action would be to find a ticket that describes your error or create a new one, then report there whether the branch fixes it or not. Thanks!</p>