
So...cron...

I know every system requires limits. I know every service requires limits. However...please consider this. Since we can circumvent the limits to cron here on PA, why not just open up the restrictions? I mean, what is the point of PA staff spending resources to develop and support restrictions that users can essentially code around by rolling their own? All this does is create extra effort for both PA and its users, and it likely introduces bugs and other unexpected features as well.

I know there are other reasons why you could consider this a bad idea, but I just figured I'd ask you to consider it. Then again, you have likely already considered it, but I haven't seen your reasons for deciding against the idea stated publicly.

I would like to add ps to the list of utilities as well, unless there is another way of getting the process list that I am not aware of. I had a whole bunch of runaway scheduled processes overnight, and I am not sure how I could have gotten rid of them without knowing their process ids. I would like to write a task that runs at least every hour and culls runaway processes. None of my scheduled tasks should take more than an hour. In fact they shouldn't take more than 30 seconds.
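Until a process list is available, one possible workaround is to have each scheduled task enforce its own time limit, so there is nothing left to cull. This is only a minimal sketch: run_with_timeout is a made-up helper name, the 30 second default comes from the figure above, and it assumes the real work can be launched as a child command.

```python
import subprocess
import sys


def run_with_timeout(cmd, limit_seconds=30):
    """Run cmd as a child process and kill it if it overruns.

    Returns the child's exit code, or None if it had to be killed.
    Note: only the direct child is killed; anything it forked itself
    is not tracked by this sketch.
    """
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None
    return completed.returncode


# Example (hypothetical script name):
# run_with_timeout([sys.executable, "my_job.py"], limit_seconds=30)
```

The scheduled task would then point at this wrapper rather than at the job itself.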

The limited scheduled task abstraction that we provide is not really to restrict what you can do so much as it is to provide a cron-like service in an environment where cron isn't available. That might need a bit more explanation...

On PythonAnywhere, you're not operating in your own virtual server -- our virtualisation layer is much simpler. This has the huge advantage that we can let you "burst" and use many more resources than we could guarantee to provide day-in, day-out at our current prices. But it does have the disadvantage that we have to reimplement certain things.

The situation with processes is similar, though it's made more complicated by the fact that we abstract away the location where your processes are running, so even if we did provide ps so that you could see what was running on a specific server, there's no guarantee that the thing you were looking for (especially if it was started from a scheduled task) would be running on the server you were running ps on.

So, what we need to do is either change our virtualisation system (which I think would be a mistake) or provide more cron-like (and ps-like) features of our own. We're planning to do the latter.

Which leads me on to the question... which features of cron should we look at emulating? I see that someone was asking for days-of-the-week scheduling in the web2py cron thread, so would that be a good one? Or are there other things that would be more useful?
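In the meantime, days-of-the-week scheduling can be approximated from inside a daily task by checking the date at the top of the script and exiting early. A minimal sketch, assuming the task is scheduled daily; RUN_DAYS and should_run_today are illustrative names, not anything PythonAnywhere provides.

```python
import datetime

# Python's date.weekday(): Monday is 0 ... Sunday is 6.
RUN_DAYS = {0, 2, 4}  # Monday, Wednesday, Friday (example choice)


def should_run_today(today=None, run_days=RUN_DAYS):
    """Return True if the task should do its work on the given date."""
    if today is None:
        today = datetime.date.today()
    return today.weekday() in run_days


# At the top of a daily scheduled script:
# if not should_run_today():
#     raise SystemExit(0)
```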

For a ps equivalent, I think it's pretty clear that we need to provide a page inside PythonAnywhere where you can see a definitive list of everything you have running, with the option to kill them. This could aggregate across all of the servers in our cluster. Is there anything else we'd need in that?
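To make the aggregation idea concrete, here is a rough sketch of merging per-server process lists into one view. Everything in it is hypothetical: the ProcessInfo fields, the server names in the usage example, and the shape of the input are assumptions for illustration, not a description of how PythonAnywhere stores this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    server: str           # which machine the process lives on
    pid: int              # pid is only unique per server
    command: str
    running_seconds: int


def aggregate(per_server):
    """Flatten {server: [(pid, command, running_seconds), ...]}
    into a single list, longest-running processes first."""
    merged = [
        ProcessInfo(server, pid, command, seconds)
        for server, rows in per_server.items()
        for pid, command, seconds in rows
    ]
    return sorted(merged, key=lambda p: p.running_seconds, reverse=True)
```

Because a pid means nothing without its server, a kill button on such a page would need to carry the (server, pid) pair.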

Suggestions, as ever, very much welcome :-)

The definitive list of processes would be a good addition I think. Sometimes I'm not sure if the things I have started have properly cleaned up after themselves so I'm wary of doing any kind of multiprocessing and forking when I have no way of shutting things down.

I like cron a lot but I almost never use the full power of its scheduling facilities so the current functionality PA provides has been enough for whatever I've tried to do.

The explanation you have provided has cleared up a lot of questions I have had about why some of the restrictions exist. I was pretending that I had my own virtual instance running at all times which is clearly not the case so I'll be a little more careful about resource assumptions.

That's a fair point about control and visibility of processes you've started. I think we should bump the priority of that if it's making you shy away from certain applications.

Out of curiosity I noticed that psutil doesn't import successfully due to a lack of /proc/stat to determine system uptime. This is probably understandable if you're hiding processes, but maybe it makes sense to uninstall the module if it can't be successfully used (though I guess that might be effort for little gain).

It would be an interesting project to implement a version of /proc which represented both local and remote processes, but without a fancy way of packaging up signals and sending them across the network I suppose it's of limited value. But certainly a page which allows sending of signals to any running processes (or their children!) would seem to be important.

As an aside, I do hope the instances have limits on the number of processes which can be forked before terminating other ones? It wouldn't be hard for one user to fork bomb someone else, either inadvertently or maliciously.

That's a fair point about psutil. We have it installed so PythonAnywhere can track scheduled tasks. We're planning on providing a nice UI for managing processes.

We already limit processes that users can create to prevent one from taking over, but it's not perfect at the moment and there are times when it goes a bit wrong. We're planning on making it more water-tight and giving users a way to control the processes that they own.

Thanks for the reply. I didn't think access to psutil was a particularly big deal in itself, but really just wanted to make you aware it doesn't seem to import successfully right now.

Sounds like you've got the fork bomb angle on your radar, irrespective of whether the current solution is water-tight, which is good news. I guess if the CPU allocation is limited externally then the impact of a fork bomb may be much reduced anyway.

Thanks for that.

Re: fork bombs -- like Glenn said, there's a hard limit on the number of processes per user, and our continuous integration loop contains a test that actively tries to start a forkbomb and confirms that other users aren't affected -- so we're pretty comfortable that we're not going to inadvertently allow that kind of DoS, at least...
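For anyone curious what a per-user process cap looks like on plain Linux, here is a minimal sketch using the standard resource module. It is not a description of how PythonAnywhere enforces its limit; run_with_process_cap is an invented name, and the approach assumes a Unix system where RLIMIT_NPROC is supported.

```python
import resource
import subprocess
import sys


def run_with_process_cap(cmd, max_procs):
    """Run cmd with its soft RLIMIT_NPROC lowered to max_procs.

    The limit counts all processes owned by the same user, so a fork
    bomb started by cmd hits the cap instead of growing unbounded.
    """
    def apply_cap():
        # Runs in the child between fork and exec.
        _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if hard == resource.RLIM_INFINITY:
            new_soft = max_procs
        else:
            new_soft = min(max_procs, hard)  # soft may not exceed hard
        resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_cap,
                          capture_output=True, text=True)
```

Once the cap is reached, further fork calls in that process tree fail rather than spawning more processes (privileged users such as root are typically exempt).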

Thank you for the information. It does clear up some questions that I didn't realize I had...☺

Perhaps it would be good to go with a model like some search boxes have: a simple version, represented by the current cron page and whatever it develops into, plus an advanced view that could parse a traditional cron command line, with whatever switches are allowed, as well as * * * * * scheduling.
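As a rough idea of what the advanced view would have to parse, here is a minimal sketch of a parser for the five time fields of a traditional crontab line (minute, hour, day of month, month, day of week). It handles only *, single numbers, comma lists, ranges and /step. It is a simplification: names like JAN or MON are not supported, and all five fields must match, whereas real cron treats day of month and day of week as either/or when both are restricted. The function names are made up for illustration.

```python
from datetime import datetime

# (low, high) for minute, hour, day of month, month, day of week (0 = Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field, low, high):
    """Expand one cron field into the set of values it allows."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"bad cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expression):
    """Parse 'min hour dom month dow' into a list of five value sets."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected exactly five fields")
    return [parse_field(f, lo, hi)
            for f, (lo, hi) in zip(fields, FIELD_RANGES)]


def matches(schedule, moment):
    """True if the datetime falls on a minute the schedule allows."""
    cron_weekday = (moment.weekday() + 1) % 7  # Python Monday=0 -> cron Monday=1
    actual = (moment.minute, moment.hour, moment.day,
              moment.month, cron_weekday)
    return all(value in allowed for value, allowed in zip(actual, schedule))
```

For example, parse_cron("*/15 9-17 * * 1-5") describes every 15 minutes during working hours on weekdays, and matches() can be called once a minute to decide whether to fire.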