Bug #3394

closed

CTRL iface bsc<->bsc_nat causes infinite ping-pong ERROR message type loop

Added by pespin almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 07/12/2018
Due date:
% Done: 100%
Spec Reference:

Description

Today I found the following line showing up constantly in the logs of a running bsc-nat, roughly once per second:

<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 

I then took a pcap trace and saw what seems to be an endless exchange of CTRL messages between the bsc-nat and one specific BSC:
BSC->BSCNAT: "ERROR err Failed to parse control message."
BSCNAT->BSC: "ERROR err Failed to parse command."

The error text on the wire doesn't match the errors printed in the log. It seems both sides craft their own error string instead of using the one from the parser. That part is fixed in:
https://gerrit.osmocom.org/#/c/openbsc/+/9973 "nat: ctrl: Use ctrl_cmd_parse2 to obtain detailed error"
https://gerrit.osmocom.org/#/c/openbsc/+/9974 "bsc: ctrl: Use ctrl_cmd_parse2 to obtain detailed error"

However, the infinite loop is still there.
The cause: when ctrl_cmd_parse2 cannot parse a CTRL message, it returns a ctrl_cmd structure of type ERROR, which is then sent back to the sender. However, when an ERROR message is received, parsing it fails too (because it carries "err" instead of a valid ID), so a new ERROR message is generated and sent back to the sender, which creates the loop.
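
The loop can be reproduced with a toy model. The sketch below is not libosmocore or openbsc code: `toy_parse`, `toy_handle`, `toy_ping_pong` and the simplified `<TYPE> <id> <text>` wire format are hypothetical stand-ins. It assumes only what is described above: an unparsable message triggers an ERROR reply, and an ERROR message carrying the ID "err" is itself unparsable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy parser (hypothetical): accepts only "<TYPE> <numeric-id> ...".
 * Returns true if the message carries a valid numeric ID. */
static bool toy_parse(const char *msg)
{
	const char *id;

	assert(msg != NULL);
	id = strchr(msg, ' ');
	if (!id || !isdigit((unsigned char)id[1]))
		return false;
	for (id++; *id && *id != ' '; id++)
		if (!isdigit((unsigned char)*id))
			return false;
	return true;
}

/* Buggy handler: on any parse failure, reply with an ERROR message.
 * Writes the reply into out and returns true if a reply must be sent. */
static bool toy_handle(const char *msg, char *out, size_t out_len)
{
	if (toy_parse(msg))
		return false;	/* valid command: no reply in this toy model */
	snprintf(out, out_len, "ERROR err Failed to parse control message.");
	return true;
}

/* Bounce a message between two peers that both run toy_handle().
 * Returns the number of replies exchanged, capped at max_rounds. */
static int toy_ping_pong(const char *first, int max_rounds)
{
	char a[128], b[128];
	char *in = a, *out = b, *tmp;
	int rounds = 0;

	snprintf(in, sizeof(a), "%s", first);
	while (rounds < max_rounds && toy_handle(in, out, sizeof(b))) {
		/* The reply becomes the other peer's input. */
		tmp = in;
		in = out;
		out = tmp;
		rounds++;
	}
	return rounds;
}
```

Calling `toy_ping_pong("garbage", 10)` uses up all 10 rounds, because every ERROR reply is rejected by the peer and answered with another ERROR. Only the artificial cap stops it; a well-formed first message such as `"GET 1 foo"` produces no replies at all.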

What we should do:
1- Fix ctrl_cmd_parse2 to accept the "err" token as the ID of ERROR messages.
2- Create a new API, ctrl_cmd_parse3, with an extra bool out parameter that tells whether parsing the message failed. This lets callers distinguish a received ERROR message from a parsing error. In the first case they should drop the ERROR message and act on it locally (e.g. print a log line); in the second they should send the ERROR message back to the sender.
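
A minimal sketch of how the two steps could fit together on the caller side. It does not reproduce the real ctrl_cmd_parse3 signature or the real ctrl_cmd structure: `toy_cmd`, `toy_cmd_parse3`, `toy_dispatch` and the simplified `<TYPE> <id> <text>` format are hypothetical, chosen only to illustrate the extra out parameter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum toy_type { TOY_GET, TOY_SET, TOY_ERROR };
enum toy_action { TOY_PROCESS, TOY_DROP_AND_LOG, TOY_SEND_ERROR };

struct toy_cmd {
	enum toy_type type;
	char id[16];
};

/* True if s is a non-empty string of decimal digits. */
static bool all_digits(const char *s)
{
	if (!*s)
		return false;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return false;
	return true;
}

/* Hypothetical parse3-style parser. Step 1: "err" is a valid ID for ERROR.
 * Step 2: *parse_failed tells a received ERROR from a local parse error. */
static void toy_cmd_parse3(const char *msg, struct toy_cmd *cmd, bool *parse_failed)
{
	char type[8] = "", id[16] = "";
	int n;

	assert(msg && cmd && parse_failed);
	n = sscanf(msg, "%7s %15s", type, id);
	*parse_failed = false;

	if (n == 2 && !strcmp(type, "ERROR") && (!strcmp(id, "err") || all_digits(id))) {
		cmd->type = TOY_ERROR;	/* the peer sent us an ERROR */
	} else if (n == 2 && !strcmp(type, "GET") && all_digits(id)) {
		cmd->type = TOY_GET;
	} else if (n == 2 && !strcmp(type, "SET") && all_digits(id)) {
		cmd->type = TOY_SET;
	} else {
		/* Unparsable: craft a local ERROR and flag it as ours. */
		cmd->type = TOY_ERROR;
		snprintf(id, sizeof(id), "err");
		*parse_failed = true;
	}
	snprintf(cmd->id, sizeof(cmd->id), "%s", id);
}

/* Caller-side decision: only locally generated errors are sent back. */
static enum toy_action toy_dispatch(const char *msg)
{
	struct toy_cmd cmd;
	bool parse_failed;

	toy_cmd_parse3(msg, &cmd, &parse_failed);
	if (cmd.type != TOY_ERROR)
		return TOY_PROCESS;
	return parse_failed ? TOY_SEND_ERROR : TOY_DROP_AND_LOG;
}
```

With this split, a received `"ERROR err ..."` message maps to `TOY_DROP_AND_LOG`, so it is never answered, while genuinely unparsable input still maps to `TOY_SEND_ERROR`. That breaks the ping-pong without silencing error reporting.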
