Bug #5803

closed

ttcn3-gbproxy-test-fr hangs in endless loop waiting for hdlc8 net-device

Added by laforge 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 11/30/2022
Due date:
% Done: 100%
Spec Reference:
Tags:

Description

We've observed this at least twice now:

Waiting for hdlc8 to become operational

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
Waiting for hdlc8 to become operational
+ sleep 1

+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate

Actions #1

Updated by laforge 2 months ago

  • Tags set to TTCN3

I've logged onto the deb10fr VM and can report:
  • the hdlcX and hdlcnetX devices are not in the root/host netns anymore (good)
  • the test container has hdlc1..8, all with operstate unknown
  • the gbproxy container has hdlcnet1..8, all with operstate unknown
  • dahdi_tool shows all spans as OK

I can't really investigate much as we're lacking basic tools like ip or ifconfig in the containers :/

Actions #2

Updated by laforge 2 months ago

  • % Done changed from 0 to 20

laforge wrote in #note-1:

I can't really investigate much as we're lacking basic tools like ip or ifconfig in the containers :/

Using the same hacks as netdev-to-docker.sh, I could execute a shell in the docker netns.

for the ttcn3 test side container:

root@gtp0deb10fr:/proc/dahdi# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: hdlc1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
5: hdlc2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
7: hdlc3: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
9: hdlc4: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
11: hdlc5: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
13: hdlc6: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
15: hdlc7: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
17: hdlc8: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
33: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:12:19:67 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Running ip link set hdlcX up did not change the state. However, the following worked:

root@gtp0deb10fr:/proc/dahdi# ip link set hdlc1 down
root@gtp0deb10fr:/proc/dahdi# ip link set hdlc1 up
root@gtp0deb10fr:/proc/dahdi# ip link
...
3: hdlc1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UP mode DEFAULT group default qlen 50
    link/frad 
...

So taking the link down and up again works. Looks like some kind of stuck state in the kernel itself.
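
The down/up cycle shown above could be wrapped in a small helper if one wanted to apply it as a stop-gap. This is only a sketch of the manual workaround, not the fix that was applied to the test setup; bounce_link and the IP_CMD override are hypothetical names introduced here so the helper can be dry-run without touching real devices.

```shell
#!/bin/sh
# Hypothetical helper: take a net-device down and bring it up again,
# as was done by hand for hdlc1 above.
# IP_CMD is an assumed override (defaults to "ip") that allows a dry run,
# e.g. IP_CMD=echo prints the commands instead of executing them.
bounce_link() {
    ${IP_CMD:-ip} link set "$1" down &&
        ${IP_CMD:-ip} link set "$1" up
}
```

For example, with IP_CMD=echo set, bounce_link hdlc1 only prints the two ip arguments lines; without the override it needs to run as root inside the container's netns.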

Actions #3

Updated by laforge 2 months ago

Interestingly, this does not change the state on the other side of the FR link, so it is purely a "local" property of the net-device:

root@gtp0deb10fr:/proc/dahdi# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: hdlcnet1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
6: hdlcnet2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
8: hdlcnet3: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
10: hdlcnet4: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
12: hdlcnet5: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
14: hdlcnet6: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
16: hdlcnet7: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
18: hdlcnet8: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad 
31: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:12:19:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0

Actions #4

Updated by laforge 2 months ago

So the "operstate" sysfs attribute can be unknown, notpresent, down, lowerlayerdown, testing, dormant or up. It reflects the netdev->operstate property, which is represented by the respective IF_OPER_* constants. drivers/net/wan/hdlc* doesn't ever touch those directly.

What's interesting is the following kernel inline function:

/**     
 *      netif_oper_up - test if device is operational
 *      @dev: network device
 *      
 * Check if carrier is operational      
 */
static inline bool netif_oper_up(const struct net_device *dev)
{
        return (dev->operstate == IF_OPER_UP ||
                dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
}

which seems to suggest that the UNKNOWN state should be treated like the UP state, and both are equal.
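
To illustrate what that compatibility rule would look like on the shell side, here is a minimal sketch that accepts both values of the operstate attribute. The function name operstate_usable is hypothetical, and this only mirrors the kernel helper quoted above; it is not taken from the actual test scripts.

```shell
#!/bin/sh
# Hypothetical helper mirroring netif_oper_up(): an operstate string of
# "up" or "unknown" counts as operational, anything else does not.
operstate_usable() {
    case "$1" in
        up|unknown) return 0 ;;
        *) return 1 ;;
    esac
}
```

It would be called with the contents of /sys/class/net/<dev>/operstate, for example operstate_usable "$(cat /sys/class/net/hdlc8/operstate)".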

Note also that the loopback device is in UNKNOWN state and working perfectly fine:

root@gtp0deb10fr:/proc/dahdi# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I guess we're simply looking at the wrong property. If the link is UP, we should be able to use it.

The property can be set from userspace via netlink using the IFLA_OPERSTATE attribute, which iproute2 exposes as "ip link set ... state OPERSTATE" (this appears to be undocumented):

laforge@nataraja%pts/22 (12:20) ~/projects/git/iproute2 > /sbin/ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
laforge@nataraja%pts/22 (12:22) ~/projects/git/iproute2 > sudo ip link set dev lo state up
laforge@nataraja%pts/22 (12:23) ~/projects/git/iproute2 > sudo ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

But I really think there's no point in our userspace setting a flag in the kernel just so some shell script proceeds. We're looking at the wrong property here.

Actions #5

Updated by laforge 2 months ago

  • % Done changed from 20 to 50

Currently testing a patch checking for the "UP" flag (0x01) in the "flags" attribute instead. Guess this is the first time I'm doing bit-wise arithmetic in bash, after almost 30 years of working with Linux and shells like bash...
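
A check along those lines might look roughly like the following. This is a sketch under assumptions and not the actual patch: the function names and the SYSFS_NET override are hypothetical, and it assumes the sysfs "flags" attribute holds a 0x-prefixed hex value.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the UP bit (0x01) is set in a flags value
# as read from /sys/class/net/<dev>/flags (assumed to be hex, e.g. 0x91).
flags_has_up() {
    # Shell arithmetic expansion accepts 0x-prefixed hex constants.
    [ $(( $1 & 0x1 )) -ne 0 ]
}

# Hypothetical helper: read the flags attribute of a device and test the UP bit.
# SYSFS_NET is an assumed override so this can be tried without real devices.
netdev_is_up() {
    flags_file="${SYSFS_NET:-/sys/class/net}/$1/flags"
    [ -r "$flags_file" ] || return 1
    flags_has_up "$(cat "$flags_file")"
}
```

A wait loop could then poll netdev_is_up hdlc8 instead of comparing operstate against "up", so a device that is administratively up but reports operstate unknown no longer blocks the script.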

Actions #6

Updated by laforge 2 months ago

  • % Done changed from 50 to 90

Actions #7

Updated by laforge 2 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100