
Strange errors when running console

Since late last night, I have been getting strange errors when starting a console (listed below). In addition, when I try to run any command in my bash shell, I get similar messages. Any idea what is going on?

bash: fork: retry: Resource temporarily unavailable
bash: fork: Interrupted system call
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: Resource temporarily unavailable

Hi Beison, the problem should be fixed for you now.

What caused it? Well, we limit the number of processes a user can start, to stop people fork-bombing the servers. You had hundreds of Xvfb processes open. When you're writing scripts that open a process or other resource, you should always make sure you close it.

If you wrap your entire code in something like

try:
    ...  # normal code
except:
    raise
finally:
    ...  # close my Xvfb processes if they exist, no matter what

then you can guarantee that the cleanup will happen, mostly :)
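
For example, here's a minimal sketch with an Xvfb process started via subprocess -- the ":1" display number and the bare command line are just placeholders for however your script actually launches it:

import subprocess

# start Xvfb -- the display number ":1" is only an example
xvfb = subprocess.Popen(["Xvfb", ":1"])
try:
    ...  # normal code: whatever uses the display goes here
finally:
    # close the Xvfb process no matter what happened above
    xvfb.terminate()
    xvfb.wait()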

Thanks! I will be more careful with this going forward.

One technique I've used in the past to guard against "hard" failures (i.e. segfaults and other crashes in the Python interpreter itself) is to create a "child-pids" directory where the name of the directory includes the PID of the parent. Every time I fork a child process, I create an empty file in the directory whose name is the PID of the child. When I reap a child process, I unlink the file. A single text file or database table would do just as well as the directory approach, but creating and removing files is just more convenient and I was feeling lazy at the time.
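
In case it helps, here's a rough sketch of that bookkeeping. The ~/child-pids location and the spawn/reap helper names are just made up for illustration:

import os
import subprocess

# hypothetical location for the bookkeeping: one directory per parent,
# named after the parent's PID
BASE = os.path.expanduser("~/child-pids")
PARENT_DIR = os.path.join(BASE, str(os.getpid()))
os.makedirs(PARENT_DIR, exist_ok=True)

def spawn(args):
    # start a child and record its PID as an empty file
    child = subprocess.Popen(args)
    open(os.path.join(PARENT_DIR, str(child.pid)), "w").close()
    return child

def reap(child):
    # wait for the child and unlink its PID file again
    child.wait()
    os.unlink(os.path.join(PARENT_DIR, str(child.pid)))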

Anyway, the point is that you can then have a scheduled task which periodically scans these directories. For each parent PID directory it sends SIGKILL to the process named by each PID file within that directory and then removes the file.
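
A sketch of what that scheduled task might look like, using the same hypothetical ~/child-pids layout as above. The check that the parent has actually gone away is my own addition -- without it you'd be killing the children of a perfectly healthy run:

import os
import signal

BASE = os.path.expanduser("~/child-pids")

def pid_exists(pid):
    # signal 0 doesn't kill anything, it just checks whether the PID exists
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True

for name in os.listdir(BASE):
    parent_dir = os.path.join(BASE, name)
    if pid_exists(int(name)):
        continue  # parent still running, leave its children alone
    for child_name in os.listdir(parent_dir):
        try:
            os.kill(int(child_name), signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or the PID has been reused by another user
        os.unlink(os.path.join(parent_dir, child_name))
    os.rmdir(parent_dir)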

You have to remember to remove the directory and files at the point your application terminates cleanly, and ideally you'd check process names before killing them, although I'm not aware of a way to do this on PA currently. Also it's worth noting that you only tend to need to do this in the case of processes which are designed to run as a daemon. If you're using subprocess with something which reads from its standard input, for example, closing the parent will close the input filehandle and hence the child process will get a read() of size 0 and generally terminate. It all depends on the behaviour of the applications concerned.
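
For example (purely an illustration -- "cat" is standing in for any program that reads its standard input):

import subprocess

child = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                         stdout=subprocess.DEVNULL)

# ... write to child.stdin while the parent does its work ...

# once the pipe is closed (explicitly, or because the parent has exited),
# the child sees EOF on its stdin and terminates by itself
child.stdin.close()
child.wait()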

The risk is that you might end up killing the wrong process if, say, one of your children has silently exited and another process has reused the same PID. On a shared system like PA, however, the chances of that PID being reused by one of your processes (you won't have permission to terminate anybody else's) are slim.

Anyway, you might consider an approach like this if you find you run into this problem regularly.

EDIT: It occurs to me that while you can't check the process name or command-line on PA, you can use Python's os.getpgid() to get the process's group ID. If you also record the group ID of the parent in the directory name, you can use this as an additional check that you've got the correct process. This assumes that child processes don't change their process group (i.e. don't call os.setpgid()).
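
Something like this, assuming you've recorded the parent's process group ID somewhere -- exactly how you store it is up to you, so I'm just passing it in as an argument here:

import os
import signal

def kill_if_same_group(child_pid, recorded_pgid):
    # only kill the PID if it still belongs to the process group we recorded
    try:
        if os.getpgid(child_pid) == recorded_pgid:
            os.kill(child_pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process is already gone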

That's a nice idea, but it's worth noting (in case it's not obvious) that the scheduled task-based reaping of rogue child processes won't work for children of things you've started from a console, because consoles and scheduled tasks run on different machines. But right now all of your scheduled tasks will run on the same machine, so it will definitely work for children of scheduled tasks.