https://osmocom.org/https://osmocom.org/favicon.ico?16647414092017-10-09T14:05:26ZOpen Source Mobile CommunicationsOsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=56312017-10-09T14:05:26Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li><li><strong>Assignee</strong> changed from <i>neels</i> to <i>lynxis</i></li></ul><p>The decisive output is:</p>
<pre>
[ 105s] Regression tests.
[ 105s]
[ 105s] 1: legacy_mgcp ok
[ 105s] 2: legacy_mgcp-trans skipped (testsuite.at:12)
[ 105s] 3: mgcp_client ok
[ 105s] 4: mgcp FAILED (testsuite.at:27)
</pre>
<p>But I cannot reproduce the mgcp test failure.</p>
<p>The OBS builds (still) don't output the test logs and I am not sure how we can obtain them.<br />Something like the cat-testlogs.sh our jenkins builds are doing would be very helpful.<br /><a class="external" href="https://git.osmocom.org/osmo-ci/tree/scripts/cat-testlogs.sh">https://git.osmocom.org/osmo-ci/tree/scripts/cat-testlogs.sh</a></p>
<p>I would like to redirect to lynxis to probe whether we can get the test logs, and to dexter for an mgcp_test.c fix.</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=57462017-10-11T05:51:45Zlynxis
<ul><li><strong>File</strong> <a href="/attachments/2795">log.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2795/log.txt">log.txt</a> added</li><li><strong>Assignee</strong> changed from <i>lynxis</i> to <i>dexter</i></li></ul><p><a class="user active" href="https://osmocom.org/users/91">neels</a>: I've extended the log similiar to libosmocore [1]. I've manual deployed this patch to OBS. [1] or attached log.txt now shows the exact problem.</p>
<p><a class="user active" href="https://osmocom.org/users/15">dexter</a>: can you take a look on the test output?</p>
<p>[1] <a class="external" href="https://gerrit.osmocom.org/#/c/4212/">https://gerrit.osmocom.org/#/c/4212/</a><br />[2] <a class="external" href="https://build.opensuse.org/build/network:osmocom:nitb-split:nightly/Debian_9.0/x86_64/osmo-mgw/_log">https://build.opensuse.org/build/network:osmocom:nitb-split:nightly/Debian_9.0/x86_64/osmo-mgw/_log</a></p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=57472017-10-11T05:51:56Zlynxis
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>In Progress</i></li></ul> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58002017-10-13T14:22:03Zdexter
<ul><li><strong>Assignee</strong> changed from <i>dexter</i> to <i>lynxis</i></li></ul><p>I am sorry, without further information such as log outputs I can not help much.</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58012017-10-13T14:28:14Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Assignee</strong> changed from <i>lynxis</i> to <i>neels</i></li></ul><p>dexter, there is a link right there with log outputs, you should have been able to investigate.</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58022017-10-13T14:30:10Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>anyway, here is one again <a class="external" href="https://build.opensuse.org/package/live_build_log/network:osmocom:nitb-split:nightly/osmo-mgw/xUbuntu_16.10/x86_64">https://build.opensuse.org/package/live_build_log/network:osmocom:nitb-split:nightly/osmo-mgw/xUbuntu_16.10/x86_64</a><br />which was also posted on the chat. So it seems like you have other reasons to not look into this.</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58032017-10-13T14:36:07Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Assignee</strong> changed from <i>neels</i> to <i>dexter</i></li></ul><p>thx</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58042017-10-13T14:37:19Zdexter
<ul></ul> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58092017-10-13T16:43:21Zdexter
<ul><li><strong>Assignee</strong> changed from <i>dexter</i> to <i>neels</i></li></ul><p>It seems that it is a problem with the override of sendto(). I also wonder why<br />the legacy test does not fail since it is very similar to the mgcp test that<br />fails. If the following solution works out we might want to apply it there<br />as well.</p>
<p>In mgcp_test.c one can find an override of sendto:</p>
<pre>
#define MGCP_DUMMY_LOAD 0x23
static int dummy_packets = 0;
/* override and forward */
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
const struct sockaddr *dest_addr, socklen_t addrlen)
{
typedef ssize_t (*sendto_t)(int, const void *, size_t, int,
const struct sockaddr *, socklen_t);
static sendto_t real_sendto = NULL;
uint32_t dest_host = htonl(((struct sockaddr_in *)dest_addr)->sin_addr.s_addr);
int dest_port = htons(((struct sockaddr_in *)dest_addr)->sin_port);
if (!real_sendto)
real_sendto = dlsym(RTLD_NEXT, "sendto");
if (len == 1 && ((const char *)buf)[0] == MGCP_DUMMY_LOAD ) {
fprintf(stderr, "Dummy packet to 0x%08x:%d, msg length %zu\n%s\n\n",
dest_host, dest_port,
len, osmo_hexdump(buf, len));
dummy_packets += 1;
}
return real_sendto(sockfd, buf, len, flags, dest_addr, addrlen);
}
</pre>
<p>I have problems to understand this. There is a function pointer real_sendto<br />that is set to NULL, only to detect a few lines later that it is really NULL<br />and populate it then with the symbol of the sendto (the real sendto). However,<br />this happens only once and real_sendto is static. Its either a lengthy process<br />or it must not happen twice - however.</p>
<p>The interesting part is that the override sendo is counting the transmitted<br />dummy packets.</p>
<p>After the counting the packets are forwared using the real_sendto. This is<br />why we see the packets in wireshark. Probably we want to remove this so that<br />no actual packets leave. At least when I comment the real_sendto out the<br />test still passes. So that would be a way to get rid of the packets that<br />OBS probably won't like.</p>
<p>Lets have a look at the log output:</p>
<pre>
[ 80s] <0010> mgcp_network.c:136 endpoint:5 Failed to send dummy RTP packet.
[ 80s] [0;m<0010> mgcp_protocol.c:747 CRCX: endpoint:5 connection successfully created
[ 80s] [0;m<0010> mgcp_protocol.c:522 CRCX: creating new connection ...
[ 80s] [0;m<0010> mgcp_sdp.c:300 Got media info via SDP: port 5904, payload 97 (GSM-EFR), duration 40, addr 123.12.12.123
[ 80s] [0;mPolicy CB got state 1 on endpoint 1
[ 80s] Dummy packet to 0x7b0c0c7b:5904, msg length 1
[ 80s] 23
[ 80s]
[ 80s] <0010> mgcp_network.c:136 endpoint:1 Failed to send dummy RTP packet.
[ 80s] [0;m<0010> mgcp_protocol.c:747 CRCX: endpoint:1 connection successfully created
[ 80s] [0;m<0010> mgcp_osmux.c:682 All Osmux circuits are in use!
[ 80s] [0;m--- expout 2017-10-13 13:15:05.388398199 +0000
[ 80s] +++ /usr/src/packages/BUILD/tests/testsuite.dir/at-groups/4/stdout 2017-10-13 13:15:05.436398197 +0000
[ 80s] @@ -16,12 +16,12 @@
[ 80s] Testing MDCX1
[ 80s] Testing MDCX2
[ 80s] Testing CRCX
[ 80s] -Dummy packets: 2
[ 80s] +Dummy packets: 1
</pre>
<p>We can see that the sendto() override is executed. Then we spot a message<br />from mgcp_network.c. This could mean that the dummy sendto returned with<br />an error code. Probably because the real_sendto returned an eror.</p>
<p>The dummy packet transmission terminates on this function:</p>
<pre>
int mgcp_send_dummy(struct mgcp_endpoint *endp, struct mgcp_conn_rtp *conn)
{
static char buf[] = { MGCP_DUMMY_LOAD };
int rc;
int was_rtcp = 0;
OSMO_ASSERT(endp);
OSMO_ASSERT(conn);
LOGP(DLMGCP, LOGL_DEBUG,
"endpoint:%x sending dummy packet...\n", ENDPOINT_NUMBER(endp));
LOGP(DLMGCP, LOGL_DEBUG, "endpoint:%x conn:%s\n",
ENDPOINT_NUMBER(endp), mgcp_conn_dump(conn->conn));
rc = mgcp_udp_send(conn->end.rtp.fd, &conn->end.addr,
conn->end.rtp_port, buf, 1);
if (rc == -1)
goto failed;
if (endp->tcfg->omit_rtcp)
return rc;
was_rtcp = 1;
rc = mgcp_udp_send(conn->end.rtcp.fd, &conn->end.addr,
conn->end.rtcp_port, buf, 1);
if (rc >= 0)
return rc;
failed:
LOGP(DLMGCP, LOGL_ERROR,
"endpoint:%x Failed to send dummy %s packet.\n",
ENDPOINT_NUMBER(endp), was_rtcp ? "RTCP" : "RTP");
return -1;
}
</pre>
<p>Here we can see that the mgcp_udp_send() should normally execute twice. This<br />matches the observed behaviour. We see the output from the sendto() override,<br />wich executes ok until the point where it outputs. We also count the<br />dummy_packets one up.</p>
<p>Probably the real_sendto at the end of the sendto() override fails. This<br />eventually leads into a bad return code of mgcp_udp_send() and we will send<br />to failed: This is why we see the error message immediately after. By this<br />we omit the second mgcp_udp_send(), which would normally give us the second<br />dummy dummy packet.</p>
<p>Since the test still passes when the real_sendto is removed we possibly may<br />try this. A patch is available for review: <a class="external" href="https://gerrit.osmocom.org/4260">https://gerrit.osmocom.org/4260</a></p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58102017-10-13T18:39:00Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Assignee</strong> changed from <i>neels</i> to <i>dexter</i></li></ul><p>dexter wrote:</p>
<blockquote>
<p>I have problems to understand this. There is a function pointer real_sendto<br />that is set to NULL, only to detect a few lines later that it is really NULL<br />and populate it then with the symbol of the sendto (the real sendto). However,<br />this happens only once and real_sendto is static. Its either a lengthy process<br />or it must not happen twice - however.</p>
</blockquote>
<p>The code doesn't want to load the symbol using the dynamic linker more than once.<br />What I, too, fail to understand is why the test goes to all this trouble when<br />removing this doesn't seem to affect the test in the slightest.</p>
<blockquote>
<p>The interesting part is that the override sendo is counting the transmitted<br />dummy packets. [...] we see the error message [...] would normally give us the second<br />dummy dummy packet.</p>
</blockquote>
<p>Indeed, this should be the actual cause.<br />Let's try to clarify why the test was even sending out packets in the first place.<br />Ask zecke??</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58192017-10-16T07:29:41Zdexter
<ul></ul><p>I will drop the dummy packets completely as demanded in <a class="issue tracker-2 status-5 priority-1 priority-lowest closed" title="Feature: drop the concept of dummy packets from libosmo-mgcp (Closed)" href="https://osmocom.org/issues/2574">#2574</a>. I wrote Holger an Email since we potentially have the same problem in legacy_mgcp as well.</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=58222017-10-16T08:49:59Zzecke
<ul></ul><p>neels wrote:</p>
<blockquote>
<p>The code doesn't want to load the symbol using the dynamic linker more than once.<br />What I, too, fail to understand is why the test goes to all this trouble when<br />removing this doesn't seem to affect the test in the slightest.</p>
</blockquote>
<p>We want to observe the side-effect (and we don't want to have a "network interface" indirection in the code).</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=59172017-10-23T21:07:42Zdexter
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Assignee</strong> changed from <i>dexter</i> to <i>neels</i></li></ul><p><a class="user active" href="https://osmocom.org/users/91">neels</a> Is the problem still present?</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=59212017-10-24T10:14:37Zlaforge
<ul></ul><p>Philipp,</p>
<p>On Mon, Oct 23, 2017 at 09:07:42PM +0000, dexter [REDMINE] wrote:</p>
<blockquote>
<p><a class="user active" href="https://osmocom.org/users/91">neels</a> Is the problem still present?</p>
</blockquote>
<p>would you please simply look at the OBS status/output yourself rathern<br />than assigning this rather mundane task to a colleague?</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=59222017-10-24T10:16:18Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Assignee</strong> changed from <i>neels</i> to <i>zecke</i></li><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Low</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>dexter wrote:</p>
<blockquote>
<p><a class="user active" href="https://osmocom.org/users/91">neels</a> Is the problem still present?</p>
</blockquote>
<p>Patch is merged, and osmo-mgw builds successfully, as can be seen at <a class="external" href="https://build.opensuse.org/project/show/network:osmocom:nitb-split:nightly">https://build.opensuse.org/project/show/network:osmocom:nitb-split:nightly</a></p>
<p>Do we have a conclusion yet on whether we should re-add sending of packets in the test?<br />Personally, I would keep them disabled, and refer to Harald's TTCN3 setup to test for network operation related leaks and side effects.<br /><a class="user active" href="https://osmocom.org/users/5">zecke</a>, would that be ok from your viewpoint?</p> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=59232017-10-24T10:16:28Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>% Done</strong> changed from <i>100</i> to <i>90</i></li></ul> OsmoMGW - Bug #2561: osmo-mgw OBS nightly builds failing for 3 consecutive dayshttps://osmocom.org/issues/2561?journal_id=85442018-03-27T13:43:39Zlaforge
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>Assignee</strong> changed from <i>zecke</i> to <i>neels</i></li><li><strong>% Done</strong> changed from <i>90</i> to <i>100</i></li></ul><p>no feedback about re-enabling the sendign in the unit tests, let's close this.</p>