
Scheduled task deadlocks due to stderr buffer fill

It appears PA uses a limited buffer to which it redirects stderr from scheduled tasks (I don't know whether this also happens with stdout). I have a task that writes a lot of information to stderr, and this causes the task to deadlock when it is run from the PA scheduler. I think this behaviour should be rectified so we don't have to change our scripts when they run on PA.

We'll have a look at that. Thanks for reporting it.

If the scheduler is using something like subprocess then it can be a bit fiddly to deal with output from the process in a safe fashion (i.e. which doesn't risk blocking indefinitely and doesn't deadlock on lots of output). I've always thought it was disappointing that Python doesn't provide a version of communicate() which doesn't buffer everything in memory - for example, you could specify an optional size limit and it would return early, allow the application to handle the buffered output and then call it once again.
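A minimal sketch of that idea, assuming the caller only cares about stderr: rather than calling communicate(), read the pipe in bounded chunks and hand each chunk to a callback, so the pipe is always being drained and nothing accumulates in memory. The names `run_streaming` and `handle_chunk` are made up for illustration; they are not part of subprocess, and nothing here is known to match how the PA scheduler works.

```python
import os
import subprocess


def run_streaming(cmd, handle_chunk, chunk_size=4096):
    """Run cmd and pass its stderr to handle_chunk in pieces of at most
    chunk_size bytes. Returns the exit code of the process.

    Hypothetical helper for illustration only.
    """
    # stdout is discarded so that only one pipe needs draining; piping both
    # and reading them one after the other could deadlock again.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        bufsize=0,  # unbuffered, so os.read sees data as soon as it arrives
    )
    fd = proc.stderr.fileno()
    while True:
        # os.read returns whatever is available, up to chunk_size bytes,
        # and returns b"" once the child has closed its end of the pipe.
        chunk = os.read(fd, chunk_size)
        if not chunk:
            break
        handle_chunk(chunk)
    proc.stderr.close()
    return proc.wait()
```

The callback could, for example, append each chunk to a log file, so a task that writes a lot to stderr never has to wait for the reader.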

I note that the latest version at least has a timeout, but no size limit. Why oh why oh why do the core Python devs not add timeouts to every single operation in the standard library that might block when they're first written? Time and again we've had to wait for later releases to add timeout parameters to various functions. It's IO API design 101! You always provide a timeout: a zero timeout should always mean "return current status, do not block at all" and a timeout of None means "block indefinitely". It's a shame because by and large the Python library is pretty good, but networking and asynchronous IO seem to be real blind spots among the core team. Sorry, rant over.

In any case, if it's any help I wrote a ProcPoller class which attempts to deal with multiple subprocess invocations in a non-blocking fashion. You create an instance of it, call a method to invoke as many subprocess commands as you wish and then call poll() to watch them all for output. By default it buffers output infinitely (not very safe), but the expectation is that you'll derive the class and handle the output in a more sensible fashion. The poll() method has a timeout so you can perform other background tasks, like terminating jobs after a certain amount of time, without requiring the hassle of threads.
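To give a rough idea of the approach (this is not the ProcPoller class itself, only a cut-down sketch with hypothetical names), something along these lines can be built on the standard selectors module. It assumes a Unix-like system, since select-style polling does not work on pipes under Windows, and it only watches stderr.

```python
import os
import selectors
import subprocess


class SimplePoller:
    """Hypothetical, simplified poller for the stderr of several processes."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self._active = 0  # number of stderr pipes still being watched

    def spawn(self, cmd):
        """Start cmd and begin watching its stderr. Returns the Popen object."""
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, bufsize=0
        )
        self._sel.register(proc.stderr, selectors.EVENT_READ, proc)
        self._active += 1
        return proc

    def handle_output(self, proc, data):
        """Override in a subclass; the default simply discards the data."""

    def poll(self, timeout=None):
        """Wait up to timeout seconds for output (None blocks until some
        pipe is readable, 0 does not block). Returns how many pipes are
        still open."""
        for key, _events in self._sel.select(timeout):
            data = os.read(key.fd, 4096)
            if data:
                self.handle_output(key.data, data)
            else:
                # EOF: the child closed stderr, which normally means it has
                # exited. Reaping it with wait() is left to the caller so
                # that poll() itself never blocks on a process.
                self._sel.unregister(key.fileobj)
                key.fileobj.close()
                self._active -= 1
        return self._active
```

A caller would typically loop on `while poller.poll(1.0): ...`, doing any housekeeping (such as killing jobs that have run too long) between calls, and then call wait() on each process once the loop ends.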

@giorgostzampanakis -- we've added this to our to-do list, but it may be a while before we can find the time for it. In the meantime, you could investigate using the logging module and logging to a file, instead of using stderr?
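A minimal sketch of that workaround, using only the standard logging module. The function name, logger name and log format are placeholders chosen for illustration; adjust them to suit the task.

```python
import logging


def make_file_logger(path, name="scheduled_task"):
    """Return a logger that writes to the file at path instead of stderr.

    Hypothetical helper; the name and format are illustrative choices.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeat calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Stop records propagating to the root logger, whose handlers may
    # still write to stderr.
    logger.propagate = False
    return logger
```

In the task itself, calls such as `log = make_file_logger("task.log")` followed by `log.info("processed %d rows", n)` would then replace writes to sys.stderr.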

@harry -- logging to a file works fine, I've been using that for days with no problems.