
Feature #4311

blocking stderr output inhibits the program?

Added by neels about 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 12/06/2019
Due date: -
% Done: 0%
Spec Reference: -

Description

I'm currently using urxvt (rxvt-unicode) as my terminal program (not sure why, but...)
and every now and then I see odd behavior when the terminal is scrolled up into the scrollback, so that new output lines are not shown.

It seems to me that after a while urxvt stops reading from stdout/stderr. That is weird enough in itself, but here is the Osmocom-related problem:
apparently the server program stops responding when its log output is no longer being read?

Even when stderr (or any other log target) is not writable, IMO we should still continue to serve all protocols as usual.
A blocking log target should not affect server operation.

Investigate whether my vague impression is correct and find ways to fix that in logging.c.
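
For illustration only, a minimal sketch of one possible direction (hypothetical, not the actual logging.c code): mark the stderr file descriptor non-blocking and drop a log line on EAGAIN instead of stalling the whole process. Note that O_NONBLOCK is a property of the open file description, so this would also affect other processes sharing the same terminal.

    /* Hypothetical sketch, not actual logging.c code. */
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static void log_stderr_set_nonblock(void)
    {
            int flags = fcntl(STDERR_FILENO, F_GETFL, 0);
            if (flags >= 0)
                    fcntl(STDERR_FILENO, F_SETFL, flags | O_NONBLOCK);
    }

    static void log_write_or_drop(const char *line)
    {
            ssize_t rc = write(STDERR_FILENO, line, strlen(line));
            if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                    /* Reader side is stalled: drop this line so the server
                     * keeps serving its protocols instead of blocking. */
            }
    }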

History

#1 Updated by fixeria about 2 months ago

ACK. I observed the described behaviour in tmux while being in "scroll" (Ctrl + Mod + [) mode. Maybe we can solve this problem by introducing write queues?
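
A rough sketch of what such a write queue could look like. This is hedged: it assumes libosmocore's write_queue.h API (osmo_wqueue_init / osmo_wqueue_enqueue), and the glue into logging.c is invented. Log lines are queued as msgbs and flushed from the select loop whenever the target fd becomes writable again; once the bounded queue is full, further lines are dropped instead of blocking.

    /* Hedged sketch: assumes libosmocore's osmo_wqueue API;
     * the integration into logging.c is hypothetical. */
    #include <string.h>
    #include <unistd.h>
    #include <osmocom/core/msgb.h>
    #include <osmocom/core/select.h>
    #include <osmocom/core/write_queue.h>

    static struct osmo_wqueue log_wq;

    static int log_wq_write_cb(struct osmo_fd *ofd, struct msgb *msg)
    {
            return write(ofd->fd, msgb_data(msg), msgb_length(msg));
    }

    static void log_wq_setup(int fd)
    {
            osmo_wqueue_init(&log_wq, 1024); /* cap: 1024 queued lines */
            log_wq.bfd.fd = fd;
            log_wq.write_cb = log_wq_write_cb;
            osmo_fd_register(&log_wq.bfd);
    }

    static void log_wq_puts(const char *line)
    {
            size_t len = strlen(line);
            struct msgb *msg = msgb_alloc(len, "log line");
            if (!msg)
                    return;
            memcpy(msgb_put(msg, len), line, len);
            if (osmo_wqueue_enqueue(&log_wq, msg) != 0)
                    msgb_free(msg); /* queue full: drop, don't block */
    }

The 1024-line cap and the drop-on-full policy are exactly the kind of arbitrary sizing choice questioned in the comments below.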

#2 Updated by laforge about 2 months ago

This is pretty much the expected behavior for virtually any program, I would say.

If a program logs to stdout/stderr, then it relies on those file descriptors being
writable every so often. If they are blocked on the reader side, then (after some glibc buffering fills up)
things will stall.

I don't really see this as an Osmocom-specific problem, or as a problem at all. I mean, what are
we supposed to do? Allocate ever-growing buffers until we are OOM? Throw away output messages
even though file descriptor / pipe / stream semantics don't normally allow that?

If you don't disable stdout/stderr or redirect it to /dev/null, you had better keep reading from it.
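
To make the stall concrete, here is a small self-contained demo (plain POSIX C, not Osmocom code): with nobody reading the other end of a pipe, write() simply stops returning once the kernel pipe buffer (typically 64 KiB on Linux) is full.

    /* Demo: a blocking write() stalls once the unread pipe fills up. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            int pipefd[2];
            char chunk[4096];
            size_t total = 0;

            if (pipe(pipefd) < 0)
                    return 1;
            memset(chunk, 'x', sizeof(chunk));
            for (;;) {
                    /* No reader on pipefd[0]: this write() blocks forever
                     * once the pipe buffer is full, just like a daemon
                     * writing to a stalled terminal. */
                    write(pipefd[1], chunk, sizeof(chunk));
                    total += sizeof(chunk);
                    fprintf(stderr, "wrote %zu bytes so far\n", total);
            }
    }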

#3 Updated by laforge about 2 months ago

On Fri, Dec 06, 2019 at 01:40:46PM +0000, fixeria [REDMINE] wrote:

ACK. I observed the described behaviour in tmux while being in "scroll" (Ctrl + Mod + [) mode. Maybe we can solve this problem by introducing write queues?

but then how do you size those queues and guard against OOM? Without
knowing what is stopping stdout/stderr from being writable, you cannot
really know what the best solution is. If somebody wants to discard the
output, let's have them do that discarding based on their own
preference/policy outside of osmo-*?

#4 Updated by neels about 2 months ago

well, if we "crash" the entire core network just because we can't log to stderr at the moment, then I'd rather drop logging to stderr (and keep logging to GSMTAP anyway, in that sense).
It's also a dangerous, non-recoverable situation, in that osmo-x doesn't actually crash and get restarted; it just sits there hogging resources and doing nothing.
Granted, in a systemd service environment that's probably not going to happen (is it?).

It is also often an inconvenience when testing. I would rather lose stderr logging than have a test fail because I forgot to scroll the terminal window back down.
The current behavior keeps a sword hanging over your head: "how long am I allowed to read this scrollback before everything breaks down?"

Todo: Is there a simple shell-magic workaround that could prevent blocking? Like '| tee' or something?
(I still always want stderr, both to see the MNCC domain socket communication
and to have a "live" pipe, not redirecting to a file and tailing that file, to see whether the program is still running.)

#5 Updated by laforge about 2 months ago

Hi Neels,

On Tue, Dec 10, 2019 at 12:42:07PM +0000, neels [REDMINE] wrote:

well, if we "crash" the entire core network just because we can't log to stderr at the moment, then I'd rather drop logging to stderr (and keep logging to GSMTAP anyway, in that sense).

I think this is flawed, sorry. If you configure logging to any type of output, and that output then
stalls for some reason, then the process stalls. I would consider this the expected behavior of virtually
any 'daemon' style of software.

A logging subsystem that loses events under high system I/O load or otherwise is not worth anything.

How would you ever want to debug anything if you don't know if your debug log contains everything?

Or what if you want to use the log postprocessing for accounting? Or for investigating security incidents?

It's also a dangerous, non-recoverable situation, in that osmo-x doesn't actually crash and get restarted; it just sits there hogging resources and doing nothing.

I would like to invite you to run the same test on other 'daemon'-style software that is configured to write to stdout (or any other file descriptor, for that matter). I would consider this to have been perfectly normal behavior for multiple decades.

If a user doesn't want to log to a file descriptor (with its blocking semantics), he should simply not
configure one, and instead use a non-blocking mechanism such as syslog (or, in our case, GSMTAP).
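
For instance (a hedged sketch: I believe libosmocore provides log_target_create_syslog() and log_add_target(), but treat the exact signatures as assumptions), an application would register a syslog target and simply never create a stderr one:

    /* Hedged sketch: assumes libosmocore's logging API. */
    #include <syslog.h>
    #include <osmocom/core/logging.h>

    static void log_to_syslog_not_stderr(void)
    {
            /* "osmo-x" as syslog ident is a placeholder. */
            struct log_target *tgt =
                    log_target_create_syslog("osmo-x", 0, LOG_DAEMON);
            if (!tgt)
                    return;
            log_add_target(tgt);
    }

In a real deployment this choice would more typically be made in the VTY logging configuration than in code.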

Granted, in a systemd service environment that's probably not going to happen (is it?).

I don't know systemd in detail, but I would suppose at some point there could be back-pressure if journald
cannot write sufficiently fast to disk?

It is also often an inconvenience when testing. I would rather lose stderr logging than have a test fail because I forgot to scroll the terminal window back down.

Once again, I think this argument is (extremely) flawed. If you don't want logging on stdout/stderr, simply disable it.

If you ever want to test or debug or analyze anything, and you cannot know whether your log is complete, how would you ever get to any kind of reliable result?

The current behavior keeps a sword hanging over your head: "how long am I allowed to read this scrollback before everything breaks down?"

I'm not really following you here. Why would scrolling back in a terminal window block the ability of the
program to write to stdout/stderr? And if so, why would you ever blame anything but said terminal emulator for the results of those actions?

Todo: Is there a simple shell-magic workaround that could prevent blocking?

I still don't really understand where the problem is. Why would you (or the tools you use) intentionally block stdout/stderr in the first place? I use 'uxterm' here, and I don't recall any situation where I would have accidentally blocked stdout/stderr.

#6 Updated by laforge about 2 months ago

On Tue, Dec 10, 2019 at 12:42:07PM +0000, neels [REDMINE] wrote:

I still believe your argument is in favor of fundamentally breaking Unix stream / file semantics.

Todo: Is there a simple shell-magic workaround that could prevent blocking? Like '| tee' or something?

'buffer' is likely what you want; also see
https://stackoverflow.com/questions/8554568/add-a-big-buffer-to-a-pipe-between-two-commands
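
For example (hedged: the buffer(1) flags here are from memory, and 'osmo-x' stands in for any Osmocom daemon), something like 'osmo-x 2>&1 | buffer -m 1m' should absorb roughly a megabyte of stalled log output before the writer blocks, while still behaving as a "live" pipe.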

#7 Updated by neels about 2 months ago

On Tue, Dec 10, 2019 at 03:33:36PM +0000, laforge [REDMINE] wrote:

I'm not really following you here. Why would scrolling back in a terminal window block the ability of the
program to write to stdout/stderr? And if so, why would you ever blame anything but said terminal emulator for the results of those actions?

I think that indeed this ends up being the cause, which is why I mentioned
urxvt at the start. Probably this terminal fails to buffer and forward output properly,
and all I need to fix is... using a proper terminal.

I still don't really understand where the problem is. Why would you (or the tools you use) intentionally block stdout/stderr in the first place? I use 'uxterm' here, and I don't recall any situation where I would have accidentally blocked stdout/stderr.

Will try uxterm...
