Feature #1980

Review osmo-gsm-tester code

Added by neels 4 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
-
Start date:
03/13/2017
Due date:
% Done:

100%

Spec Reference:

Description

The current unreliability of the osmo-gsm-tester is partly due to the internal code structures.
My plan for the osmo-gsm-tester would be to pick the interesting bits of the current
implementation and basically wrap those in a rewrite of the "outer" structures.


Related issues

Related to OsmoBTS - Feature #1849: osmo-bts-trx integration to osmo-gsm-tester In Progress 11/18/2016

History

#1 Updated by neels 4 months ago

The interesting pickable cherries of working code are:
- Jenkins builds of binaries for BTS models and dispatch to the tester box
- ofono interaction (pending ofono stability improvements?)
- BTS updating and interaction
- BTS/NITB config templates
- Jenkins XML reporting

Parts in particular need of rethink/rewrite:
- User-config of available hardware and resource management (should be simpler)
- How to write (new) tests for the system (have a clear API)
- Error handling and overall stability (sane exception handling)
- Logging (log far less, but more of the interesting information)
- Content value of Jenkins XML reports

#2 Updated by neels 3 months ago

  • Status changed from New to In Progress

#3 Updated by neels 3 months ago

  • % Done changed from 0 to 20

Started off by implementing a useful logging framework, choosing a config parser, and implementing basic config file validation.
Now heading towards the test scripts API.

#4 Updated by neels 3 months ago

  • Related to Feature #1849: osmo-bts-trx integration to osmo-gsm-tester added

#5 Updated by neels 3 months ago

  • % Done changed from 20 to 30

Implemented file system based state, and resource allocation logic that uses it. (Resource allocation: which BTS, modems, etc. are reserved for which test suite.)

The resource allocation allows requesting either "any" BTS/modem/etc., or requesting specific traits (model, ...).

The file system based state for resource allocation allows running a number of test suites from separate processes without hardware usage conflicts.

Each test suite advertises what hardware it wants reserved, but it is up to each test case script to actually set up and use it. This theoretically allows scheduling test suites intelligently by required resources, while staying flexible on the things that tests can check. The API available to tests will prevent using hardware not reserved by the respective test suite.
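
To illustrate the idea of file system based reservation, here is a minimal sketch. It is not the tester's actual code: the class name ResourcePool, the JSON state file and the resource dicts ('id', 'kind', 'model') are all made up for this example. It assumes a POSIX system, since it uses fcntl.flock() so that separate processes cannot reserve the same hardware at the same time.

```python
import fcntl
import json
import os


class ResourcePool:
    """Hypothetical sketch: reserve hardware via a lock-protected JSON state file."""

    def __init__(self, state_path, available):
        self.state_path = state_path
        # e.g. [{'id': 'bts0', 'kind': 'bts', 'model': 'sysmo'}, ...] (made-up layout)
        self.available = available

    def _locked_update(self, update):
        # Open (or create) the state file and hold an exclusive lock while
        # reading, modifying and writing back the reservations.
        fd = os.open(self.state_path, os.O_RDWR | os.O_CREAT, 0o644)
        with os.fdopen(fd, 'r+') as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            raw = f.read()
            reserved = json.loads(raw) if raw else {}
            result = update(reserved)
            f.seek(0)
            f.truncate()
            json.dump(reserved, f)
            return result  # closing the file flushes it and releases the lock

    def reserve(self, owner, kind, **traits):
        """Reserve any free resource of `kind` matching all given traits, or return None."""
        def update(reserved):
            for res in self.available:
                if res['id'] in reserved or res['kind'] != kind:
                    continue
                if any(res.get(k) != v for k, v in traits.items()):
                    continue
                reserved[res['id']] = owner
                return res
            return None
        return self._locked_update(update)

    def release(self, owner):
        """Free everything reserved by `owner`."""
        def update(reserved):
            for res_id in [r for r, o in reserved.items() if o == owner]:
                del reserved[res_id]
        self._locked_update(update)
```

A suite could then ask for "any" BTS with pool.reserve('suite_a', 'bts'), or for a specific trait with pool.reserve('suite_b', 'bts', model='trx'), and call pool.release('suite_a') when it is through.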

(It may make sense to also allow a way to set up BTS/modems for an entire test suite, with several separate tests running on the same configuration, but so far each test is expected to launch a BTS explicitly. My plan is to guarantee running test scripts in alphabetical order, so a 00_prep test could set up a BTS, be used by 01_*, 02_*, 03_*, and tear down could be automatic after the suite is through. A test suite's config can say whether used hardware should be torn down after each test or not, or a test script could ask the API to tear down all used hardware...)

General overview: the estimate was about 25 man-days for this task, of which I have worked ~7.1 and progress is looking good.

Next up: (re-)implement updating/launching of BTS processes, taking the good bits of the previous code, but making it available as API to be called from within test scripts.

#6 Updated by neels 3 months ago

The main remaining issue, DBus integration, seems to be resolved -- I'm quite happy about this news; it was the single problem/unknown in the refactoring that I've had in mind all the while:

We need DBus to talk to ofono, to benefit from modem management abstraction. There have been issues with the DBus integration: a 'main loop' has to run to manage DBus messaging, essentially passing control of the main event flow to DBus. In the tester, however, we would prefer to maintain our own sequential call flow without passing control over to another event loop and adhering to its event driven model. The natural idea to solve this is to run our sequential flow in a separate thread from the 'main loop' that manages DBus.

In our previous osmo-gsm-tester implementation, we were using the libdbus based 'dbus' library, which disrupted threading in such a way that this paradigm was not possible. We were forced to run basically all actions from a timeout callback, using complex re-entrant code with lots of state. Instead, using pydbus -- named similarly to 'dbus' but a completely different implementation -- we can have a sequence of expectations and/or wait instructions in our main program flow and keep the DBus 'main loop' in a thread. This will make writing tests simpler by far.

The pydbus library is basically a nicer way of using GLib's GIO GDBus.

In a test program, I was able to pass messages to ofono modems without running a 'main loop' at all: setting a Modem to 'Powered' = True via DBus blocks for some seconds and then returns. After that, fetching the properties of the modem reflects the change (Powered == True). Using polling, one could interact with ofono completely without a DBus 'main loop'...
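
The polling approach could be sketched roughly as follows. This is an illustration under assumptions, not the test program mentioned above: the helper names and the modem path '/wavecom_0' are made up, while SetProperty and GetProperties are the methods of ofono's org.ofono.Modem interface. pydbus and GLib are imported lazily, so the polling helper itself can be tried out without a DBus connection.

```python
import time

OFONO_BUS_NAME = 'org.ofono'


def get_modem(path):
    """Return a pydbus proxy for the ofono modem at `path` (needs pydbus and a running ofono)."""
    from pydbus import SystemBus  # lazy import: only needed for real DBus access
    return SystemBus().get(OFONO_BUS_NAME, path)


def power_on(modem):
    """Ask ofono to power the modem on; the call blocks until ofono replies."""
    from gi.repository import GLib
    # The DBus signature of SetProperty is (sv), so the value is wrapped in a Variant.
    modem.SetProperty('Powered', GLib.Variant('b', True))


def poll_until_powered(get_properties, timeout=10.0, interval=0.5):
    """Poll a properties getter until 'Powered' is true; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if get_properties().get('Powered'):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a real modem this would be used like: modem = get_modem('/wavecom_0'); power_on(modem); poll_until_powered(modem.GetProperties).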

But to benefit from events passed down DBus from ofono, we need this main loop. Using the pydbus library, I was able to "hide" the DBus main loop (actually the GLib.MainLoop()) in a separate thread and maintain normal sequentiality in the remaining program. i.e., if I send the Powered=True request to ofono, I can now receive the modem's state change from the DBus signal without actively polling, actually while my controlling program is idling in a time.sleep() (or doing anything else, for that matter). Of course we'll need cross-thread communication with one or two mutexes or semaphores, but it's definitely worth it.

pydbus is currently at version 0.6.0. Unfortunately it is not available as a Debian package, but it can be installed comfortably with pip3.
(We also have the python smpplib depending on a pip installation -- actually not used in the new osmo-gsm-tester implementation yet).

I have various unit tests to self test the osmo-gsm-tester code, so a nice-to-have would be a fake DBus service to write such unit tests for the DBus client. However, the pydbus example code for a DBus 'server' throws:

Exception: GLib 2.46 is required to publish objects; it is impossible in older versions.

Debian stable currently seems to ship 2.42. An upgrade to Debian testing (2.50) could resolve this, but instead I'll postpone writing DBus related regression tests for now, to avoid potential issues from mixing package versions in a non-standard way. Also, such a DBus server would probably need to run as root and/or need systemd config changes on the system tested on, so it won't be easy for an uninformed user to just run such tests during a 'make check' anywhere.

I plan to reach a state where the new code runs a first actual complete GSM test on one of the BTS models (osmo-bts-trx or nanoBTS) within a week, after which I will be able to document preliminarily how the new system will work -- I've been asked for that a number of times recently.

(btw, I will technically be on easter vacation next week, but will probably do some work every now and then anyway.)

#7 Updated by neels 3 months ago

pespin (new colleague at sysmocom) has shared some of his DBus / glib experience with me and pointed out that interacting with DBus from a thread alongside the glib main loop should rather be avoided. In my test program it works, but as usual threading issues are hidden until they hit some race.

It's not an option to write the GSM tests event based. I don't want a state machine (like in the previous gsm tester implementation), I want to be able to write something like:

nitb.start()
modem.power_on()
modem.connect(nitb)
wait(nitb.sees(modem))

According to glib docs the main loop is poll() based. That link also claims Glib to be completely (TM) thread safe, but I'm not familiar with it, the statement might imply that threads are invoked via Glib or something in that line.

My initial test program runs the glib main loop in a thread alongside the gsm test code, so it poll()s in a separate thread and invokes the signal functions registered from my own thread. It could be safe with Python's way of threading and so forth. But to definitely be on the safe side, a better way to do this is to invoke single GLib main loop iterations with MainContext.iteration():

from gi.repository import GLib
# use the default context: GLib.MainContext() would create a new, empty one
glib_main_ctx = GLib.MainContext.default()

def wait(condition):
    while not condition():
        glib_main_ctx.iteration()

This runs single poll() iterations of GLib sequentially without the need of a separate thread. The GSM test API has the responsibility of calling iteration() in appropriate places, with the benefit of being sure that no races arising from separate threads can occur, because all poll()s happen at well-defined times.

Will test the single iteration() calls now...

#8 Updated by neels 3 months ago

The single iteration() invocations are verified to work, like this:

import time
from gi.repository import GLib

glib_main_loop = GLib.MainLoop()
glib_main_ctx = glib_main_loop.get_context()

def pump():
    # dispatch all GLib events that are already pending, without blocking
    print('pump?')
    while glib_main_ctx.pending():
        print('* pump')
        glib_main_ctx.iteration()

# and then like (foo() and bar() stand in for actual test steps)
def main():
    foo()
    pump()
    while not bar():
        pump()
        time.sleep(.1)
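
The pump-and-poll loop in main() above could be folded into a reusable wait() helper, so that test scripts stay sequential. The following is only a sketch of that idea: the signature, the timeout handling and the default values are assumptions, not the tester's actual API. The pump function is passed in, so the helper does not depend on GLib itself.

```python
import time


def wait(condition, pump, timeout=30.0, interval=0.1):
    """Call pump() and then condition() repeatedly until condition() is true.

    Raises TimeoutError if the condition does not become true within
    `timeout` seconds. All names and defaults here are illustrative.
    """
    deadline = time.monotonic() + timeout
    while True:
        pump()  # let pending main loop events (e.g. DBus signals) be handled
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f seconds' % timeout)
        time.sleep(interval)
```

With the pump() function from above and the hypothetical test objects from comment #7, a test step could then read: wait(lambda: nitb.sees(modem), pump).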

#9 Updated by neels 3 months ago

  • % Done changed from 30 to 70

The new GSM tester is on the brink of being useful.

Binaries are built on (the sysmocom internal) jenkins, test runs are launched and a first SMS test is run.

  • The osmo-bts-trx is started, but osmo-trx doesn't find any UHD device. Apparently the B200 is not connected at present.
  • The osmo-bts-sysmo is started and connects to the osmo-nitb.
  • TODO: The osmo-bts-octphy is not yet implemented (not much effort)
  • TODO: The ip.access nanoBTS is not yet implemented (not much effort)
  • The SMS test so far fails because the modem does not attach to the BTS (sysmoBTS). ofono has a NetworkRegistration interface, but the wavecom modems don't seem to publish such an interface. I believe this is not implemented for the wavecom modems. TODO: I guess the modem test should start working once I use the same MCC+MNC as the previous gsm tester setup, so the SIM card will select the same network it knew before? Alternatively connect a quadmodem board with modems that are better supported by ofono in the hope that they provide a NetworkRegistration interface.
  • TODO: Documentation -- describe in an easy way how the various configuration files converge to form a test trial.
  • TODO: refine the configuration structures. Some config files could be converged to one, and the scenario's config can be refined.

#10 Updated by laforge 3 months ago

On Mon, Apr 10, 2017 at 01:27:14PM +0000, neels [REDMINE] wrote:

The new GSM tester is on the brink of being useful.

great.

  • The osmo-bts-trx is started, but osmo-trx doesn't find any UHD
    device. Apparently the B200 is not connected at present.

I did not make any modifications to the development setup. We should
investigate tomorrow.

  • TODO: The osmo-bts-octphy is not yet implemented (not much effort)

if we do this, octasic would have to cover the hours. given that there
is plenty of other work pending, I think we should not be proactive here
but rather "rub it under their nose" that they should contract us
specifically for this integration work.

  • TODO: The ip.access nanoBTS is not yet implemented (not much effort)

This is also not a particularly high priority. Yes, it is part of our
heritage and so on, but presently I don't think any of our paying
customers is using nanoBTSs, so continuous testing+integration is not
something that sysmocom should invest in.

TODO: I guess the modem test should start working once I use the
same MCC+MNC as the previous gsm tester setup, so the SIM card will
select the same network it knew before? Alternatively connect a
quadmodem board with modems that are better supported by ofono in the
hope that they provide a NetworkRegistration interface.

Do not spend any time on investigating any issues that might be related
to the lack of good ofono support for the old modem banks. the
quad-modem boards with QMI based modems such as sierra wireless, EC20 or
gobi2000, gobi3000 is the way to go. Part of the work would be to
figure out which modems are good candidates, and how well they are
supported by ofono. Maybe that kind of testing is something for Pau?

#11 Updated by neels 2 months ago

laforge wrote:

the
quad-modem boards with QMI based modems such as sierra wireless, EC20 or
gobi2000, gobi3000 is the way to go. Part of the work would be to
figure out which modems are good candidates, and how well they are
supported by ofono. Maybe that kind of testing is something for Pau?

+1

if we do this, octasic would have to cover the hours.

I've already implemented a basic octasic stub, but will not spend any more time on getting it to actually work.

How about the litecell1.5? Same story?

Update: a preliminary manual for the new osmo-gsm-tester is found at
http://jenkins.osmocom.org/jenkins/job/osmo-gsm-manuals-gerrit/label=linux_amd64_debian8/ws/Osmo-GSM-Tester/osmo-gsm-tester-manual.pdf
https://gerrit.osmocom.org/2325

First test runs (with complete failures so far for described reasons) can be seen on the sysmocom internal jenkins
http://10.9.1.103/view/osmo-gsm-tester/job/osmo-gsm-tester_run/17/console

However, the wavecom modems now seem to have disappeared without substitute from the gsm tester R&D setup, ofono no longer lists any modems (even after restarting ofono for good measure).
lsusb does show some stuff that looks like modems:

Bus 001 Device 018: ID 05c6:9215 Qualcomm, Inc. Acer Gobi 2000 Wireless Modem
Bus 001 Device 017: ID 05c6:9215 Qualcomm, Inc. Acer Gobi 2000 Wireless Modem
Bus 001 Device 016: ID 1d50:4004 OpenMoko, Inc. 
Bus 001 Device 019: ID 1199:68c0 Sierra Wireless, Inc. 
Bus 001 Device 015: ID 05c6:9204 Qualcomm, Inc. 
Bus 001 Device 020: ID 1d50:4004 OpenMoko, Inc. 
Bus 001 Device 013: ID 1d50:4002 OpenMoko, Inc. 

But ofono does not list any of them:

# mdbus2 -s -i org.ofono / org.ofono.Manager.GetModems
([],)

#13 Updated by laforge 2 months ago

On Fri, Apr 14, 2017 at 02:52:39AM +0000, neels [REDMINE] wrote:

https://git.kernel.org/pub/scm/network/ofono/ofono.git/tree/doc/hardware-support.txt

Sorry, but this is a file that was introduced in 201 and has never been
touched since 2012. What is the reference to this document supposed to
tell us?

#14 Updated by laforge 2 months ago

On Fri, Apr 14, 2017 at 02:49:04AM +0000, neels [REDMINE] wrote:

How about the litecell1.5? Same story?

yes.

However, the wavecom modems now seem to have disappeared without
substitute from the gsm tester R&D setup, ofono no longer lists any
modems (even after restarting ofono for good measure).

You don't expect this to "just work"? That would be quite a bit naive,
sorry ;)

The task at this point (not sure if there's a separate/better ticket)
is to

1) get the modems to expose their interfaces to userspace,
particularly/preferably a QMI based interface
2) evaluate which of the modem[s] we can source will work to what level
of satisfaction with ofono.
3) if there are still serious issues with ofono even on "well supported modems",
maybe look at other directions like directly interfacing libqmi (which gives
us a relatively detailed/rich interface to Qualcomm based modems)

For the Quectel modems (at least with old kernels) you will probably
have to start with a kernel patch to get their USB IDs recognized by the
qcserial driver in the first place.

For the sierra wireless modem, I would generally expect better "out of
the box" experience, as it is quite old.

For the Qualcomm Gobi 2000 you will need to install the right firmware
on the Linux system. They don't have flash, and the firmware needs to be
loaded into their RAM after every power cycle/reset of the modem.

As most people in the sysmocom team have worked with the various modems
already before, I presumed the above information was known.

I originally assumed you would want to do the modem evaluation on a PC
where you can sit in front, and test various different modems locally.
However, I understood you recently in a way that you would like to have
the quad-modem board with modems connected to the osmo-gsm-tester
development system ASAP, so roh, martin and I put in some effort to get
this done (soldering the required splitter/combiner board, mechanical
assembly, ...) still ahead of the easter holidays.

This is all really off-topic here, feel free to move to a more appropriate place.

#15 Updated by neels 2 months ago

Ok, I was indeed naive about the modems just working right away.

Will first try to get the Gobi to run, then the Sierra Wireless.
Will ask acouzens in case of problems.

#16 Updated by lynxis 2 months ago

The latest ofono can be found on https://code.fe80.eu/lynxis/ofono
I'm working on getting my commits upstream. The commits only touch the detection.

#17 Updated by neels 2 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 70 to 100

While the code has not reached its "final" structure, the overall code rewrite is done.
I am now fanning remaining changes out into various individual issues, so that it is easier to collaborate on it, hence closing this.

#18 Updated by laforge about 1 month ago

  • Status changed from Resolved to Closed
