
subprocess

With regard to running subprocesses, how many can we run at once?

Also, do I have to terminate them on PA?

So running...

import subprocess

for i in range(5):
    subprocess.Popen(['python', 'file.py'])

It looks like some of them hang around for a while in the 'fetch processes' section on the Consoles page.

On console servers, you can have 128 total processes per server.

You can terminate them from the process list on the Consoles page.

So they don't just clear once they're done? Sounds like I need to close them once I'm done with them.

Every time you refresh the list, the process list is retrieved from the various machines. If the process is still running, it will be in the list.

I want to clear them from my code, not manually.

I've read that they can spawn onto new servers and are essentially lost. So if I did something like this:

import subprocess

procs = []
for i in range(5):
    procs.append(subprocess.Popen(['python', 'file.py']))

# ...later, tidy them up
for proc in procs:
    proc.kill()

It won't always work? How can I get round this?

To give some more info on what I'm trying to achieve:

When a user logs in, my db is updated, and then my script (similar to the above) opens a new process for each active user. The number of processes would be limited, so each one would need to be cleaned up to allow new processes to run.

This is why I was asking whether they automatically close once they have finished running: you've said there is a limit, and I don't want to hit it because of finished processes just hanging around.

Any process that finishes should close, yes. The problem occurs if, say, your processes never finish/close and end up piling up.

As for automatically killing processes: that might not be easy, again, because your proc.kill() might not be called on the server that your proc is on.

Hey Conrad,

That's perfect then as they are just one process scripts which do a job and then close.

What strategies might I employ to not hit limits?

For example, Glenn mentions 128 processes per server; does that mean that if I hit 128, extra ones get pushed over to a new server, or am I stuck at 128 until others finish?

I guess my problem is that if I can't check them (proc.poll()) because they're 'lost', then I can't keep track of how many are open.

Hmm, the easiest is probably just to delegate instead of actually running it yourself.

E.g. have a "task queue" in your db, and your webapp just appends each new task to it. Then have a scheduled task that pulls from the queue and runs each task separately.
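Something along these lines, assuming (just for illustration) an SQLite tasks table with id and account_id columns, and a made-up your_task_script.py standing in for whatever you actually want to run:

import sqlite3
import subprocess

db = sqlite3.connect('/home/yourusername/tasks.db')   # illustrative path

# In the web app: just record that work needs doing.
def enqueue(account_id):
    db.execute("INSERT INTO tasks (account_id) VALUES (?)", (account_id,))
    db.commit()

# In a scheduled task: pull queued work and run it here, one task at a time.
def run_queued():
    rows = db.execute("SELECT id, account_id FROM tasks").fetchall()
    for task_id, account_id in rows:
        subprocess.run(['python', 'your_task_script.py', str(account_id)])
        db.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
        db.commit()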

I still wouldn't know when it was finished though... unless I'm missing something?

Hmm, perhaps it's best to take a step back; I worry that we might be helping you optimise a specific bit of code that isn't the right way to achieve your goal. Why are you running the scripts? Is it (for example) a way of kicking off a background process to do some kind of calculations when the user logs in, the results of which will be presented to them later? Or something else?

Hey Giles,

I have a script which collects data from the Twitter API. This is heavily rate-limited, so I only want to request the data for active users. This makes sense from a db performance point of view too, as I'm not filling in data for users who might not be using it.

Each route in my flask app has a @last_active decorator, which basically adds a timestamp to the db so I know when each user was 'last active'.
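Roughly speaking it looks like this (simplified, with a made-up update_last_active helper standing in for my actual db code):

from datetime import datetime
from functools import wraps

from flask import session

def last_active(view):
    @wraps(view)
    def wrapped(*args, **kwargs):
        account_id = session.get('account_id')
        if account_id is not None:
            # stamp the account's last_active column so the fetcher
            # knows who is currently using the app
            update_last_active(account_id, datetime.utcnow())
        return view(*args, **kwargs)
    return wrapped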

My script, currently sat in the Tasks tab, is a long-running task which looks for anyone 'active' and opens a subprocess. I'm doing the following:

subprocess.Popen(['python', 'twitter_fetch_subprocess.py', account_id])

This subprocess script just grabs all the search terms the user is monitoring, based on the account_id, polls the Twitter search API, and updates my db. This takes anywhere from 5 seconds to 2 minutes (depending on how much they are tracking).
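The script itself is basically this (with get_search_terms and fetch_and_store as stand-ins for my db and Twitter code):

import sys

def main():
    account_id = sys.argv[1]
    # every search term this account is monitoring
    for term in get_search_terms(account_id):
        # hit the Twitter search API and write the results to the db
        fetch_and_store(account_id, term)

if __name__ == '__main__':
    main()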

I wanted to implement it this way so that a new process is spawned for each user, which lessens the number of people sat waiting in a queue. My original question was about the limits so I could write some limit checks into the script...

import time

while len(running_procs) >= limit:
    time.sleep(5)
    # check again whether any have finished...

That type of thing.
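Fleshed out a bit, this is what I had in mind, using proc.poll() to drop finished processes before starting new ones (LIMIT is just an illustrative number):

import subprocess
import time

LIMIT = 10
running = []

def start_fetch(account_id):
    # wait for a free slot, reaping anything that has already finished
    while True:
        running[:] = [p for p in running if p.poll() is None]
        if len(running) < LIMIT:
            break
        time.sleep(5)
    running.append(subprocess.Popen(
        ['python', 'twitter_fetch_subprocess.py', str(account_id)]))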

Ok. After all the back-and-forth in the thread I'm going to try to answer some of the queries I think I've spotted:

  1. 128 is per machine and if you spawn a process, it will be on the same machine as the process that spawned it. Also, all of your scheduled tasks run on the same machine, so the limit is 128 between all of your tasks.
  2. I'm not sure what you're talking about when you're talking about "lost" processes above. What do you mean, there?

Ah ok, I'd read on another thread that processes can spawn onto different machines, but I may have read it wrong. That thinking, however, seemed to be backed up by me calling .kill() and receiving a 'process does not exist' type error.

Perhaps the problem was that I was storing the process objects and then trying to kill them after they had already ended; I then read through the forums and got the wrong end of the stick.

If that's the case then I'm sorry for wasting your time.

Ah, sorry, I wasn't sure whether you were running multiple processes from a console, from the web server, etc.

Task stuff is all on one server, and I guess your task finished before you called .kill() on it.
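If you do ever need to kill processes that might already have exited, checking poll() first (and then wait()ing to reap them) avoids that error, something like:

for proc in procs:
    if proc.poll() is None:   # still running
        proc.kill()
    proc.wait()               # reap it so it doesn't linger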

When I try to run PDF creation with wkhtmltopdf from a free account, it runs an os.Popen which tries to fetch some files, and it fails with an "Exit with code 1 error due network error: ContentAccessDenied" error. Will this work in a paid account, or will I get the same error? Do the commands executed by os.Popen have internet access in a paid account?

It may work if the resources are available on the internet. All processes in a paid account have direct access to the internet.
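If you want to see exactly what's failing, something like this (the URL and output path are just examples) will capture wkhtmltopdf's error output:

import subprocess

result = subprocess.run(
    ['wkhtmltopdf', 'https://www.example.com/', '/tmp/page.pdf'],
    capture_output=True, text=True)
if result.returncode != 0:
    # wkhtmltopdf reports network problems (e.g. ContentAccessDenied) here
    print(result.stderr)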