Bug #5657
opendia2gsup not (re-)connecting to HLR if osmo-hlr is down
100%
Description
It seems sometimes osmo_dia2gsup doesn't manage to connect to osmo-hlr:
root@sysmonitb:~# systemctl status osmo_dia2gsup
● osmo_dia2gsup.service - Osmocom DIAMETER to GSUP translator
     Loaded: loaded (/lib/systemd/system/osmo_dia2gsup.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-08-22 14:57:42 UTC; 39min ago
   Main PID: 254 (beam.smp)
      Tasks: 23 (limit: 4642)
     Memory: 55.7M
        CPU: 4.710s
     CGroup: /system.slice/osmo_dia2gsup.service
             ├─254 /usr/bin/osmo-dia2gsup -B -- -root /usr/lib/erlang -progname erl -- -home /var/lib/osmo_dia2gsup -- -boot no_dot_erlang -noshell -escript main osmo_dia2gsup -pz osm>
             ├─271 erl_child_setup 1024
             ├─454 inet_gethost 4
             └─455 inet_gethost 4

Aug 22 14:57:42 sysmonitb systemd[1]: Starting Osmocom DIAMETER to GSUP translator...
Aug 22 14:57:42 sysmonitb systemd[1]: Started Osmocom DIAMETER to GSUP translator.
Aug 22 14:57:47 sysmonitb osmo-dia2gsup[254]: *DBG* ss7_routes got call flush_routes from <0.132.0>
Aug 22 14:57:47 sysmonitb osmo-dia2gsup[254]: *DBG* ss7_routes sent ok to <0.132.0>, new state {sr_state,ss7_routes}
Aug 22 14:57:48 sysmonitb osmo-dia2gsup[254]: 14:57:48.180 [info] Diameter HSS Application started on IP 127.0.0.8, sctp port 3868
Aug 22 14:57:48 sysmonitb osmo-dia2gsup[254]: 14:57:48.239 [info] Connecting to GSUP HLR on IP 127.0.0.1 port 4222

root@sysmonitb:~# systemctl status osmo-hlr
● osmo-hlr.service - Osmocom Home Location Register (OsmoHLR)
     Loaded: loaded (/lib/systemd/system/osmo-hlr.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-08-22 15:02:28 UTC; 34min ago
       Docs: https://osmocom.org/projects/osmo-hlr/wiki/OsmoHLR
   Main PID: 532 (osmo-hlr)
      Tasks: 1 (limit: 4642)
     Memory: 5.1M
        CPU: 176ms
     CGroup: /system.slice/osmo-hlr.service
             └─532 /usr/bin/osmo-hlr -c /etc/osmocom/osmo-hlr.cfg -l /var/lib/osmocom/hlr.db

Aug 22 15:02:28 sysmonitb systemd[1]: Started Osmocom Home Location Register (OsmoHLR).
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229034 DMAIN NOTICE hlr starting (hlr.c:791)
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229034 DDB NOTICE using database: /var/lib/osmocom/hlr.db (db.c:558)
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229053 DDB NOTICE Missing database tables detected; Bootstrapping database '/var/lib/osmocom/hlr.db' (db.c:626)
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229097 DDB NOTICE Database '/var/lib/osmocom/hlr.db' has HLR DB schema version 6 (db.c:636)
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229102 DLGLOBAL NOTICE Available via telnet 127.0.0.1 4258 (telnet_interface.c:100)
Aug 22 15:02:29 sysmonitb osmo-hlr[532]: 20220822150229102 DLCTRL NOTICE CTRL at 127.0.0.1 4259 (control_if.c:1013)
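You can see that osmo-hlr was started (15:02:28) after osmo_dia2gsup was already running (since 14:57:42), so the HLR was not yet up when osmo_dia2gsup logged its connection attempt at 14:57:48.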
After restarting dia2gsup manually the connection succeeds:
root@sysmonitb:~# systemctl restart osmo_dia2gsup
root@sysmonitb:~# systemctl status osmo_dia2gsup
● osmo_dia2gsup.service - Osmocom DIAMETER to GSUP translator
     Loaded: loaded (/lib/systemd/system/osmo_dia2gsup.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-08-22 15:37:21 UTC; 3s ago
    Process: 4301 ExecStartPre=/usr/bin/mkdir -p /var/lib/osmo_dia2gsup (code=exited, status=0/SUCCESS)
   Main PID: 4302 (beam.smp)
      Tasks: 23 (limit: 4642)
     Memory: 36.1M
        CPU: 4.608s
     CGroup: /system.slice/osmo_dia2gsup.service
             ├─4302 /usr/bin/osmo-dia2gsup -B -- -root /usr/lib/erlang -progname erl -- -home /var/lib/osmo_dia2gsup -- -boot no_dot_erlang -noshell -escript main osmo_dia2gsup -pz os>
             ├─4309 erl_child_setup 1024
             ├─4330 inet_gethost 4
             └─4331 inet_gethost 4

Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: "00:00:00:00:00:00",
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: "00:00:00:00:00:00",
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: "HSS-00-00-00-00-00-00",false}
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: 15:37:24.947 [info] connected!
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: Registering handler {process_id,<0.147.0>} for socket #Port<0.7> Stream {osmo,
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: 5}
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: Unblocking socket #Port<0.7>
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: Unblocking socket #Port<0.7>:ok
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: Stream 254, 17 bytes
Aug 22 15:37:24 sysmonitb osmo-dia2gsup[4302]: Socket #Port<0.7> Stream 254: ID_GET -> ID_RESP
Aug 22 15:37:50 sysmonitb osmo-dia2gsup[4302]: 15:37:50.392 [info] Peer up <0.146.0> - #diameter_caps{origin_host={"hss.localdomain","mme.localdomain"},origin_realm={"localdomain","lo>
Updated by pespin 4 months ago
This patch helps osmo_dia2gsup connect properly to the HLR if the HLR gets started after osmo_dia2gsup: https://gerrit.osmocom.org/c/erlang/osmo_dia2gsup/+/34199 (gsup: Attempt reconnecting if connect fails)
However, I haven't yet tested what happens if the connection goes down once it has been established. That scenario probably needs fixing too.
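For readers unfamiliar with the pattern, the "retry if connect fails" idea can be sketched roughly as follows. This is only an illustration in Python, not the code from the Gerrit change (osmo_dia2gsup itself is written in Erlang). The function name `connect_with_retry` and its parameters `retry_interval`, `max_attempts`, `connect` and `sleep` are all hypothetical, chosen for this example.

```python
import socket
import time


def connect_with_retry(host, port, retry_interval=1.0, max_attempts=None,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection, retrying after each failure.

    Hypothetical helper for illustration only. Returns the connected
    socket. If max_attempts is given and exhausted, the last OSError
    is re-raised; with max_attempts=None it retries forever.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            # Raises OSError (e.g. ConnectionRefusedError) while the
            # server is not listening yet.
            return connect((host, port))
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Wait before the next attempt instead of giving up.
            sleep(retry_interval)
```

With the address from the log above, a call might look like `connect_with_retry("127.0.0.1", 4222, retry_interval=5.0)`. Note that this only covers the initial connect. Detecting that an established connection has gone down and reconnecting is a separate problem.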
Updated by pespin 3 months ago
- Assignee changed from fixeria to lynxis
pespin wrote in #note-2:
This patch helps osmo_dia2gsup connect properly to the HLR if the HLR gets started after osmo_dia2gsup: https://gerrit.osmocom.org/c/erlang/osmo_dia2gsup/+/34199 (gsup: Attempt reconnecting if connect fails)
However, I haven't yet tested what happens if the connection goes down once it has been established. That scenario probably needs fixing too.
This scenario has been confirmed NOT to work. It can easily be reproduced by running the docker-playground TTCN3 DIA2GSUP_Tests.
Assigning to lynxis since he's looking into it.
Updated by lynxis 3 months ago
The GSUP side is now reconnecting. IMHO this ticket is done as soon as https://gerrit.osmocom.org/c/erlang/osmo_dia2gsup/+/34285 is merged.
However, the Diameter side also does not allow reconnects from the open5gs-mmed side.