Bug #4378
closed
Having 'sctp-role client' in configuration causes high CPU load
100%
Description
How to reproduce?
Just add 'sctp-role client' to your configuration file, e.g. to osmo-msc.cfg:
cs7 instance 0
 point-code 0.23.1
 asp asp-clnt-OsmoMSC-A 2905 0 m3ua
  remote-ip 127.0.0.1
  sctp-role client ! <-- this line causes OsmoMSC to use 100% CPU
 as as-clnt-OsmoMSC-A m3ua
  asp asp-clnt-OsmoMSC-A
  routing-key 1 0.23.1
What happens?
Looking at the output of strace, it seems that one file descriptor (fd 4) is reported as both readable and writable on every call, so select() returns immediately and the main loop never blocks:
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519755}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519751})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519456}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519450})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519371}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519367})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519280}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519277})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519191}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519187})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519112}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519109})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519038}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519035})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518958}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518955})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518877}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518872})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518813}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518810})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518747}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518744})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518686}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518683})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518632}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518619})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518553}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518550})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518501}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518498})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518423}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518420})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518353}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518350})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518282}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518278})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518132}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518128})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=518048}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=518045})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517967}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517963})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517888}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517885})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517809}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517806})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517730}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517727})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517651}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517647})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517576}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517573})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517519}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517517})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517465}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517461})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517395}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517392})
select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=517317}) = 2 (in [4], out [4], left {tv_sec=0, tv_usec=517314})
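For longer captures it may help to tally which descriptors select() reports as ready instead of reading the trace by eye. The following is only a rough sketch: the helper name `tally_ready_fds` is made up for illustration, and it assumes strace prints one select() call per line in the format shown above.

```python
import re
from collections import Counter

# Matches the "in [..]" and "out [..]" parts of strace's decoded select() result.
READY_RE = re.compile(r"\b(in|out) \[([\d ]*)\]")

def tally_ready_fds(strace_text):
    """Count how often each fd shows up as readable ("in") or writable ("out").

    Hypothetical helper; assumes one select() call per line, as printed above.
    """
    counts = {"in": Counter(), "out": Counter()}
    for line in strace_text.splitlines():
        if not line.startswith("select("):
            continue
        # Everything after ") = " is the return value plus the ready sets.
        _, _, result = line.partition(") = ")
        for kind, fds in READY_RE.findall(result):
            counts[kind].update(int(fd) for fd in fds.split())
    return counts

if __name__ == "__main__":
    sample = (
        "select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519755})"
        " = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519751})\n"
        "select(11, [3 4 5 6 9 10], [4 10], [], {tv_sec=0, tv_usec=519456})"
        " = 2 (in [4], out [4], left {tv_sec=0, tv_usec=519450})\n"
    )
    print(tally_ready_fds(sample))
```

A descriptor that dominates both counters, as fd 4 does here, is the one keeping the loop awake.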
lsof shows that fd 4 is an SCTP socket:
$ lsof -p 387341 -ad 4
COMMAND  PID    USER    FD TYPE DEVICE SIZE/OFF NODE      NAME
osmo-msc 387341 fixeria 4u sock 0,9    0t0      459782671 protocol: SCTP
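The general mechanism can be shown in isolation: as long as a socket that is ready stays in the set passed to select(), every call returns at once and the timeout is never used. The sketch below is not the libosmo-sccp code path and makes no claim about the actual root cause; it uses a local socket pair instead of SCTP, and `count_wakeups` is a hypothetical name.

```python
import select
import socket
import time

def count_wakeups(write_interest, duration=0.2):
    """Count how often select() reports activity within `duration` seconds.

    Illustration only: a local socket pair stands in for the SCTP socket.
    """
    a, b = socket.socketpair()
    try:
        wakeups = 0
        deadline = time.monotonic() + duration
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # An idle socket with free send-buffer space is always writable,
            # so keeping it in the write set makes select() return immediately.
            wlist = [a] if write_interest else []
            readable, writable, _ = select.select([a], wlist, [], remaining)
            if readable or writable:
                wakeups += 1
        return wakeups
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    print("read interest only:", count_wakeups(False))
    print("read + write interest:", count_wakeups(True))
```

With read interest only, the loop sleeps for the whole timeout; with write interest left set, it spins, which is the same shape as the strace output above.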
OsmoBSC fails to establish a connection to the MSC:
Jan 28 16:45:50 DELL osmo-bsc[389147]: DMSC NOTICE a_reset.c:106 A-RESET(msc-0)[0x55b321ad6a70]{DISC}: (re)sending BSSMAP RESET message...
Jan 28 16:45:50 DELL osmo-bsc[389147]: DMSC NOTICE osmo_bsc_sigtran.c:102 Sending RESET to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP
Jan 28 16:45:50 DELL osmo-stp[389002]: DLSS7 ERROR osmo_ss7_hmrt.c:257 MTP-TRANSFER.req for DPC 185: no route!
Jan 28 16:45:52 DELL osmo-bsc[389147]: DMSC NOTICE a_reset.c:106 A-RESET(msc-0)[0x55b321ad6a70]{DISC}: (re)sending BSSMAP RESET message...
Jan 28 16:45:52 DELL osmo-bsc[389147]: DMSC NOTICE osmo_bsc_sigtran.c:102 Sending RESET to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP
Jan 28 16:45:52 DELL osmo-stp[389002]: DLSS7 ERROR osmo_ss7_hmrt.c:257 MTP-TRANSFER.req for DPC 185: no route!
Jan 28 16:45:54 DELL osmo-bsc[389147]: DMSC NOTICE a_reset.c:106 A-RESET(msc-0)[0x55b321ad6a70]{DISC}: (re)sending BSSMAP RESET message...
Jan 28 16:45:54 DELL osmo-bsc[389147]: DMSC NOTICE osmo_bsc_sigtran.c:102 Sending RESET to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP
Jan 28 16:45:54 DELL osmo-stp[389002]: DLSS7 ERROR osmo_ss7_hmrt.c:257 MTP-TRANSFER.req for DPC 185: no route!
Jan 28 16:45:56 DELL osmo-bsc[389147]: DMSC NOTICE a_reset.c:106 A-RESET(msc-0)[0x55b321ad6a70]{DISC}: (re)sending BSSMAP RESET message...
Jan 28 16:45:56 DELL osmo-bsc[389147]: DMSC NOTICE osmo_bsc_sigtran.c:102 Sending RESET to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP
Jan 28 16:45:56 DELL osmo-stp[389002]: DLSS7 ERROR osmo_ss7_hmrt.c:257 MTP-TRANSFER.req for DPC 185: no route!
OsmoSTP has the following routing table:
OsmoSTP# show cs7 instance 0 route
Routing table = system
C=Cong Q=QoS P=Prio
Destination            C Q P Linkset Name        Linkset Non-adj Route
---------------------- - - - ------------------- ------- ------- -------
0.23.3/14                  0 as-rkm-2            ?       ?       ?
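To relate the "DPC 185" in the OsmoSTP error to this table, the dotted point codes can be converted to their integer form. The sketch below assumes the 14-bit ITU 3-8-3 point-code layout (no other format is configured in the snippet above); the function names are illustrative and not part of any Osmocom API.

```python
def pc_to_int(pc, widths=(3, 8, 3)):
    """Convert a dotted point code such as '0.23.1' to its integer value.

    Assumes the 3-8-3 bit layout unless other widths are passed in.
    """
    parts = [int(p) for p in pc.split(".")]
    if len(parts) != len(widths):
        raise ValueError("expected %d components in %r" % (len(widths), pc))
    value = 0
    for part, width in zip(parts, widths):
        if not 0 <= part < (1 << width):
            raise ValueError("component %d does not fit in %d bits" % (part, width))
        value = (value << width) | part
    return value

def int_to_pc(value, widths=(3, 8, 3)):
    """Convert an integer point code back to dotted notation."""
    parts = []
    for width in reversed(widths):
        parts.append(value & ((1 << width) - 1))
        value >>= width
    return ".".join(str(p) for p in reversed(parts))

if __name__ == "__main__":
    print(pc_to_int("0.23.1"))  # the MSC point code from the configuration
    print(pc_to_int("0.23.3"))  # the only destination in the routing table
    print(int_to_pc(185))
```

Under that assumption 0.23.1 is 185 and 0.23.3 is 187, so the only route present (0.23.3 with a full 14-bit mask) does not cover the MSC's point code, which matches the "no route" errors.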
Some more observations
I don't know which SCTP role OsmoMSC is supposed to play. Adding this line to osmo-bsc.cfg reproduces the same behaviour. Changing sctp-role from 'client' to 'server', or omitting the line, makes OsmoMSC and OsmoBSC talk to each other just fine again.
I have had this line in my osmo-msc.cfg since the last breakage of libosmo-sccp, and everything was working fine until the recent upgrade. Even if this is an incorrect configuration, we should not keep OsmoMSC running and eating CPU. Aborting the process makes more sense in this case; otherwise the problem is not easy to notice.
pespin any ideas?