Bug #5803
closed
ttcn3-gbproxy-test-fr hangs in endless loop waiting for hdlc8 net-device
100%
Description
We've observed this at least twice now:
Waiting for hdlc8 to become operational
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
Waiting for hdlc8 to become operational
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Waiting for hdlc8 to become operational
+ OPSTATE=unknown
+ [ unknown = up ]
+ echo Waiting for hdlc8 to become operational
+ sleep 1
+ true
+ [ ! -f /sys/class/net/hdlc8/operstate ]
+ cat /sys/class/net/hdlc8/operstate
Updated by laforge over 1 year ago
- Tags set to TTCN3
- the hdlcX and hdlcnetX devices are not in the root/host netns anymore (good)
- the test container has hdlc1..8, all with operstate unknown
- the gbproxy container has hdlcnet1..8, all with operstate unknown
- dahdi_tool shows all spans as OK
I can't really investigate much as we're lacking basic tools like ip or ifconfig in the containers :/
Updated by laforge over 1 year ago
- % Done changed from 0 to 20
laforge wrote in #note-1:
I can't really investigate much as we're lacking basic tools like ip or ifconfig in the containers :/
Using the same hacks as netdev-to-docker.sh, I could execute a shell in the docker netns.
For the ttcn3 test side container:
root@gtp0deb10fr:/proc/dahdi# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: hdlc1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
5: hdlc2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
7: hdlc3: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
9: hdlc4: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
11: hdlc5: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
13: hdlc6: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
15: hdlc7: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
17: hdlc8: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
33: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:12:19:67 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Doing an ip link set hdlcX up did not change the state. However, the following worked:
root@gtp0deb10fr:/proc/dahdi# ip link set hdlc1 down
root@gtp0deb10fr:/proc/dahdi# ip link set hdlc1 up
root@gtp0deb10fr:/proc/dahdi# ip link
...
3: hdlc1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UP mode DEFAULT group default qlen 50
    link/frad
...
So taking the link down and up again works. Looks like some kind of stuck state in the kernel itself.
Updated by laforge over 1 year ago
Interestingly, this does not change the state on the other side of the FR link, so it is purely a "local" property of the net-device:
root@gtp0deb10fr:/proc/dahdi# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: hdlcnet1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
6: hdlcnet2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
8: hdlcnet3: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
10: hdlcnet4: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
12: hdlcnet5: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
14: hdlcnet6: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
16: hdlcnet7: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
18: hdlcnet8: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1700 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 50
    link/frad
31: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:12:19:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0
Updated by laforge over 1 year ago
So the "operstate" sysfs attribute can be unknown, notpresent, down, lowerlayerdown, testing, dormant or up. It reflects the netdev->operstate property, which is represented by the respective IF_OPER_* constants. drivers/net/wan/hdlc* doesn't ever touch those directly.
What's interesting is the following kernel inline function:
/**
 * netif_oper_up - test if device is operational
 * @dev: network device
 *
 * Check if carrier is operational
 */
static inline bool netif_oper_up(const struct net_device *dev)
{
	return (dev->operstate == IF_OPER_UP ||
		dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
}
which seems to suggest that the UNKNOWN state should be treated like the UP state, and both are equal.
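For illustration, the same "up or unknown" rule could be mirrored in a shell helper. This is only a sketch and not the fix that was eventually applied: the function name oper_usable and the SYSFS_NET override are hypothetical and exist only so the logic can be exercised against a fake sysfs tree.

```shell
#!/bin/sh
# Sketch only: mirrors the kernel's netif_oper_up() check in shell.
# SYSFS_NET is a hypothetical override for testing; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

oper_usable() {
    # $1: interface name, e.g. hdlc8
    state_file="$SYSFS_NET/$1/operstate"
    # No operstate file means the device is not (yet) in this netns.
    [ -f "$state_file" ] || return 1
    case "$(cat "$state_file")" in
        up|unknown) return 0 ;;   # "unknown" accepted, as in netif_oper_up()
        *)          return 1 ;;
    esac
}
```

With such a helper, a wait loop would stop on a device reporting unknown instead of spinning forever.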
Note also that the loopback device is in UNKNOWN state and perfectly working fine:
root@gtp0deb10fr:/proc/dahdi# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
I guess we're simply looking at the wrong property. If the link is UP, we should be able to use it.
The property can be set from userspace via netlink using the IFLA_OPERSTATE attribute, which in turn can be set with the state OPERSTATE argument of iproute2 (which appears undocumented):
laforge@nataraja%pts/22 (12:20) ~/projects/git/iproute2 > /sbin/ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
laforge@nataraja%pts/22 (12:22) ~/projects/git/iproute2 > sudo ip link set dev lo state up
laforge@nataraja%pts/22 (12:23) ~/projects/git/iproute2 > sudo ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
But I really think there's no point in our userspace setting a flag in the kernel just so some shell script proceeds. We're looking at the wrong property here.
Updated by laforge over 1 year ago
- % Done changed from 20 to 50
Currently testing a patch checking for the "UP" flag (0x01) in the "flags" attribute instead. Guess this is the first time I'm doing bit-wise arithmetic in bash, after almost 30 years of working with Linux and shells like bash...
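A minimal sketch of what such a flags check might look like follows. It is not the actual patch: the function names link_is_up and wait_for_link, the SYSFS_NET override and the timeout argument are all assumptions made for illustration. It relies on the sysfs "flags" attribute being a hex string and on shell arithmetic accepting hex constants.

```shell
#!/bin/sh
# Sketch only, not the actual patch. SYSFS_NET is a hypothetical override
# for testing; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"
IFF_UP=0x1   # the "UP" bit in the net-device flags

link_is_up() {
    # $1: interface name, e.g. hdlc8
    flags_file="$SYSFS_NET/$1/flags"
    [ -f "$flags_file" ] || return 1
    flags="$(cat "$flags_file")"          # hex string read from sysfs
    # Bit-wise AND against IFF_UP; non-zero means the UP flag is set.
    [ $(( $flags & $IFF_UP )) -ne 0 ]
}

wait_for_link() {
    # $1: interface name, $2: maximum number of one-second attempts
    # (the bound is an assumption; the loop in the trace above has none)
    tries=0
    until link_is_up "$1"; do
        tries=$(( $tries + 1 ))
        [ "$tries" -ge "$2" ] && return 1
        echo "Waiting for $1 to become operational"
        sleep 1
    done
    return 0
}
```

Bounding the number of attempts means a device that never comes up fails the job instead of hanging it.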
Updated by laforge over 1 year ago
- % Done changed from 50 to 90
Updated by laforge over 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100