Bug #4375
closedRLCMACTest.cpp fails when changing logging levels
100%
Description
I was debugging other issues in a new test I'm adding, and I dropped the empty .err file and increased debug level of DCSN1 to debug the issue, but then existing tests fail:
Changes to reproduce the issue:
diff --git a/tests/rlcmac/RLCMACTest.cpp b/tests/rlcmac/RLCMACTest.cpp index 61026ed..673bbed 100644 --- a/tests/rlcmac/RLCMACTest.cpp +++ b/tests/rlcmac/RLCMACTest.cpp @@ -238,7 +238,7 @@ int main(int argc, char *argv[]) { void *ctx = talloc_named_const(NULL, 1, "RLCMACTest"); osmo_init_logging2(ctx, &gprs_log_info); - log_parse_category_mask(osmo_stderr_target, "DPCU,3:DLGLOBAL,1:"); + log_parse_category_mask(osmo_stderr_target, "DPCU,3:DLGLOBAL,1:DCSN1,2:"); //printSizeofRLCMAC(); testRlcMacDownlink(ctx); diff --git a/tests/testsuite.at b/tests/testsuite.at index 8a319bd..fe87ff9 100644 --- a/tests/testsuite.at +++ b/tests/testsuite.at @@ -5,8 +5,7 @@ AT_BANNER([Regression tests]) AT_SETUP([rlcmac]) AT_KEYWORDS([rlcmac]) cat $abs_srcdir/rlcmac/RLCMACTest.ok > expout -cat $abs_srcdir/rlcmac/RLCMACTest.err > experr -AT_CHECK([$OSMO_QEMU $abs_top_builddir/tests/rlcmac/RLCMACTest], [0], [expout], [experr]) +AT_CHECK([$OSMO_QEMU $abs_top_builddir/tests/rlcmac/RLCMACTest], [0], [expout], [ignore]) AT_CLEANUP AT_SETUP([multi_slot])
Test output failing:
=========Start ENCODE============= +++++++++Finish ENCODE+++++++++++ vector1 = 40284f00010981c826f446809dcecb2b2b2b2b2b -vector2 = 40284f00010981c826f446809dcecb2b2b2b2b2b -vector1 == vector2 : TRUE +vector2 = 40284f00010982194de88d13b9dab2b2b2b2b2b +vector1 == vector2 : FALSE vector1 = 40243f2f000087b042b2b2b2b2b2b2b2b2b2b2b =========Start DECODE=========== +++++++++Finish DECODE++++++++++
So it can be seen there's a missing 2b (or/and len is 1 byte less). Which means the change in logging makes the program behave differently, which shouldn't happen.
Updated by pespin about 4 years ago
The issue appears during decode time, in
decode_gsm_rlcmac_downlink() of 40284f0000001009810c826f4406809dcecb2b2b2b2b2b (Packet Uplink Assignment (EGPRS)).
Decoding issue may be related to #1640, but in any case it shouldn't change its output based on log level....
Updated by pespin about 4 years ago
I attach gdb output of the decoded structure for both current test (old.txt) as well as for test with the logging level changed (new.txt).
Unfortunately, gdb reports quite a lot of differences in the decoded structure....
How I got the output:
b RLCMACTest.cpp:161 ----- set max-value-size unlimited set print array on set print pretty on set pagination off set logging on set logging file /tmp/orig.txt print data
Updated by pespin about 4 years ago
The logging output (failing situation) of the packet which has its encoding/decoding situation changed by logging enabled is:
=========Start ENCODE============= +++++++++Finish ENCODE (0)+++++++++++ vector1 = 40 28 4f 00 00 00 10 09 81 0c 82 6f 44 06 80 9d ce cb 2b 2b 2b 2b 2b -vector2 = 40 28 4f 00 00 00 10 09 81 0c 82 6f 44 06 80 9d ce cb 2b 2b 2b 2b 2b -vector1 == vector2 : TRUE +vector2 = 40 28 4f 00 00 00 10 09 82 19 04 de 88 0d 01 3b 9d ab 2b 2b 2b 2b 2b +vector1 == vector2 : FALSE vector1 = 4024030f2f0000000087b0042b2b2b2b2b2b2b2b2b2b2b =========Start DECODE=========== +++++++++Finish DECODE (0)++++++++++
<0000> /git/osmo-pcu/src/gsm_rlcmac.cpp:5310 csnStreamEncoder (type=10): | MESSAGE_TYPE = 10 | PAGE_MODE = 0 | Exist_PERSISTENCE_LEVEL = 0 | : ID | u.TLLI = 2 | u.TLLI = 2013265920 | : End ID | u.PUA_EGPRS_Struct = 1 | : u.PUA_EGPRS_Struct | u.PUA_EGPRS_00 = 0 | : u.PUA_EGPRS_00 | Exist_CONTENTION_RESOLUTION_TLLI = 0 | Exist_COMPACT_ReducedMA = 0 | EGPRS_CHANNEL_CODING_COMMAND = 0 | RESEGMENT = 1 | EGPRS_WindowSize = 6 | AccessTechnologyType = 0 | ARAC_RETRANSMISSION_REQUEST = 0 | TLLI_BLOCK_CHANNEL_CODING = 0 | Exist_BEP_PERIOD2 = 0 | : PacketTimingAdvance | Exist_TIMING_ADVANCE_VALUE = 1 | TIMING_ADVANCE_VALUE = 3 | Exist_IndexAndtimeSlot = 0 | : End PacketTimingAdvance | Exist_Packet_Extended_Timing_Advance = 0 | Exist_Frequency_Parameters = 1 | : Frequency_Parameters | TSC = 0 | u.ARFCN = 0 | u.ARFCN = 623 | : End Frequency_Parameters | u.Dynamic_Allocation = 1 | : u.Dynamic_Allocation | Extended_Dynamic_Allocation = 0 | Exist_P0 = 0 | USF_GRANULARITY = 0 | Exist_UPLINK_TFI_ASSIGNMENT = 1 | UPLINK_TFI_ASSIGNMENT = 0 | Exist_RLC_DATA_BLOCKS_GRANTED = 0 | Exist_TBF_Starting_Time = 0 | u.Timeslot_Allocation_Power_Ctrl_Param = 1 | : u.Timeslot_Allocation_Power_Ctrl_Param | ALPHA = 10 | Slot[0].Exist = 0 | Slot[1].Exist = 0 | Slot[2].Exist = 0 | Slot[3].Exist = 0 | Slot[4].Exist = 0 | Slot[5].Exist = 0 | Slot[6].Exist = 1 | Slot[6].USF_TN = 1 | Slot[6].GAMMA_TN = 27 | Slot[7].Exist = 1 | Slot[7].USF_TN = 1 | Slot[7].GAMMA_TN = 27 | : End u.Timeslot_Allocation_Power_Ctrl_Param | : End u.Dynamic_Allocation | : End u.PUA_EGPRS_00 | : End u.PUA_EGPRS_Struct | Padding = 43|43|43|43|43|43|
Updated by pespin about 4 years ago
Interestingly the issue doesn't happen if I enable the category with another level (only error).
Updated by pespin about 4 years ago
I used gcc's -finstrument-functions feature to print the functions being entered and the sequence of functions seems to be the same both for decoding and encoding independently of whether logging is enabled or not (what's making the result change).
Updated by fixeria about 4 years ago
- % Done changed from 0 to 40
I did I quick investigation, and these logging statements seem to cause the different results:
@@ -1153,13 +1153,13 @@ csnStreamDecoder(csnStream_t* ar, const CSN_DESCR* pDescr, bitvec *vector, unsig
guint bits_to_handle = remaining_bits_len%8;
if (bits_to_handle > 0)
{
- LOGPC(DCSN1, LOGL_NOTICE, "%" PRIu64 "|", bitvec_read_field(vector, &readIndex, bits_to_handle));
+ LOGPC(DCSN1, LOGL_NOTICE, "%" PRIu64 "|", bitvec_read_field(vector, &readIndex, bits_to_handle)); // FIXME
remaining_bits_len -= bits_to_handle;
bit_offset += bits_to_handle;
}
else if (bits_to_handle == 0)
{
- LOGPC(DCSN1, LOGL_NOTICE, "%" PRIu64 "|", bitvec_read_field(vector, &readIndex, 8));
+ LOGPC(DCSN1, LOGL_NOTICE, "%" PRIu64 "|", bitvec_read_field(vector, &readIndex, 8)); // FIXME
remaining_bits_len -= 8;
bit_offset += 8;
}
@@ -1247,7 +1247,7 @@ csnStreamDecoder(csnStream_t* ar, const CSN_DESCR* pDescr, bitvec *vector, unsig
bit_offset += no_of_bits;
}
- LOGPC(DCSN1, LOGL_NOTICE, "%s = %u | ", pDescr->sz , (unsigned)bitvec_read_field(vector, &readIndex, 1));
+ LOGPC(DCSN1, LOGL_NOTICE, "%s = %u | ", pDescr->sz , (unsigned)bitvec_read_field(vector, &readIndex, 1)); // FIXME
/* existNextElement() returned FALSE, 1 bit consumed */
bit_offset++;
remaining_bits_len--;
@@ -1300,7 +1300,7 @@ csnStreamDecoder(csnStream_t* ar, const CSN_DESCR* pDescr, bitvec *vector, unsig
}
}
- LOGPC(DCSN1, LOGL_NOTICE, "%s = %u | ", pDescr->sz , (unsigned)bitvec_read_field(vector, &readIndex, 1));
+ LOGPC(DCSN1, LOGL_NOTICE, "%s = %u | ", pDescr->sz , (unsigned)bitvec_read_field(vector, &readIndex, 1)); // FIXME
/* existNextElement() returned FALSE, 1 bit consumed */
bit_offset++;
By definition bitvec_read_field() moves the internal pointer (current bit position) of a given vector, and increments readIndex by a given amount of bits. The problem is that LOGPC would not evaluate its format string if the logging message is not going to be printed, e.g. if a given logging level is lower than the current one, or in case if logging is not enabled at all.
The first two conditional calls to bitvec_read_field() are related to CSN_PADDING_BITS, so that's not critical because padding is always in the end of messages. The later two are related to CSN_RECURSIVE_ARRAY and CSN_RECURSIVE_TARRAY respectively.
Updated by pespin about 4 years ago
Hi fixeria , thanks for spending some time into this too!
Something you forgot to mention (or perhaps you didn't check it at all) is whether the "expected" or "correct" result happens when calling bitvec_read_field() (hence advancing the readIndex counter), or if the correct behavior is to not advance the index there.
At first glance, it looks like the expected behavior is to always increment the readIndex through bitvec_read_field(), since for instance I see same amount modified on other counters, but we really need to check if that's already being done previously or afterwards somewhere else:
LOGPC(DCSN1, LOGL_NOTICE, "%" PRIu64 "|", bitvec_read_field(vector, &readIndex, 8)); remaining_bits_len -= 8; bit_offset += 8;
Updated by fixeria about 4 years ago
- Status changed from New to Feedback
- % Done changed from 40 to 100
[...] or if the correct behavior is to not advance the index there.
That's the correct behavior. Please see https://gerrit.osmocom.org/c/osmo-pcu/+/17091.
Updated by fixeria about 4 years ago
- Status changed from Feedback to Resolved
- Assignee changed from pespin to fixeria
Merged, "works for me" (C). Closing.