Bash console keeps stopping

[PythonAnywhere dev update: please see the wiki help topic on long-running tasks.]

I'm trying to run a Python script continually through Bash, but every day or two I log on and find that the script has stopped running. The console displays only a "~" line, none of the script's terminal output appears, and the shell has returned to the root directory. Is there any way I can use PythonAnywhere to run a script continuously?

Thanks for your help.


My understanding is that the software that underpins PythonAnywhere is updated twice a week or so. When that happens, all consoles, etc., get reset.

If you want a console app that runs pretty much all the time, what you can do is set up a scheduled task every hour or so that checks if the script is running, and starts it if it is not. Then your maximum downtime will be 50 minutes or so.
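Something like this could serve as the hourly checker -- a rough sketch only, where long_running.py is a made-up name for your script and pgrep is assumed to be available:

```python
import subprocess
import sys

SCRIPT = "long_running.py"  # made-up name of the script to keep alive

def is_running(pattern):
    # pgrep -f matches the pattern against each process's full command line;
    # an exit status of 0 means at least one process matched
    return subprocess.call(["pgrep", "-f", pattern]) == 0

if not is_running(SCRIPT):
    # not there any more, so start a fresh copy in the background
    subprocess.Popen([sys.executable, SCRIPT])
```

Save that as its own file and point the scheduled task at it.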

Yes, I'm pretty sure the intention of the online bash consoles is not to run persistent scripts through them; that's what the scheduled jobs are for.

Not to say that you can't run persistent scripts, but the virtualisation is quite lightweight so you can't rely on things running continuously and in the same place.

Are you doing something that needs constant CPU time? If you're doing something that only needs sporadic access (e.g. polling an RSS feed) and you've just written it as a continuous script for convenience, I'm sure it can fairly easily be re-written to persist some of its state and be run from a scheduled job instead.

Okay, I see. I think scheduled tasks might help; I'll look into that.

I'll just add the official imprimatur to the above:

Consoles do persist but there's no guarantee on how long -- normally they should be fine for a few days, but when we're updating the system frequently it can be less. We're planning to increase the time they last, but I don't think we'll ever have a guarantee of precisely how long they'll last.

Scheduled tasks are definitely the best way to ensure that a process keeps running. Here's a snippet of code you can use at the start of your process to make sure that you have only one at a time:

import logging
import socket
import sys

lock_id = "MYPROCESS"
lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
try:
    lock_socket.bind('\0' + lock_id)
    logging.debug("Acquired lock %r" % (lock_id,))
except socket.error:
    logging.info("FAILED to acquire lock %r" % (lock_id,))
    sys.exit()

You'll need to change MYPROCESS to something like johnsmithy2.processname
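Two locks with different names happily coexist, which is why a per-task name works; a quick check (with made-up task names):

```python
import socket

def hold(name):
    # bind a socket in the Linux abstract namespace; the bound socket
    # must stay referenced for as long as the lock should be held
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    s.bind('\0' + name)
    return s

lock_one = hold("johnsmithy2.taskone")
lock_two = hold("johnsmithy2.tasktwo")  # different name, so this succeeds too
```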

Will that locking code still work if the two instances are started on different machines?

I would have thought that unix domain sockets in the abstract space were implicitly machine-specific - is there something in the virtualisation which shares them across all instances?

I feel I should also point out for anybody's reference that the abstract socket namespace (i.e. names with the leading nul (\0) byte) are a Linux-specific feature - you may wish to consider them carefully if you want your code to be portable to other platforms. That said, the main script invoked by a scheduled task is most likely quite PAW-specific, so most people probably don't care too much about portability in that.
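For anyone curious, the machine-local uniqueness is easy to demonstrate on Linux: a second bind to the same abstract name fails while the first socket is still open. (The name demo.lock below is arbitrary.)

```python
import socket

def try_lock(name):
    # Bind a datagram socket to an abstract-namespace address (leading NUL byte).
    # The kernel enforces uniqueness, so only one socket can hold a given name.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        s.bind('\0' + name)
        return s  # keep a reference so the socket stays open and the lock held
    except socket.error:
        return None

first = try_lock("demo.lock")
second = try_lock("demo.lock")  # fails: the name is already bound
```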

Hi Cartroo -- yes, you're right on both points -- the code only stops two copies from running on the same machine, and is Linux specific. We do currently guarantee that a given user's tasks will always run on the same machine, though -- and as you say, we're running on Linux.

On the other hand, perhaps I should be more careful here -- while we don't currently have plans to have a given user's tasks running on a different machines at different times, we might shard things differently in the future -- so maybe I can't say that that will always work... it will work for the time being, however.

@giles: Thanks for clarifying, it's useful to know they'll always be on the same machine for the time being.

If anyone does need a slightly more platform-agnostic approach, there's always fcntl locking. I know the reliability of this has come up before, but it's the variety of file locking that seems to work on the widest array of platforms (even where the atomicity of creat() isn't guaranteed, for example).

To adapt the example above:

import errno
import fcntl
import logging
import os
import sys

lock_filename = os.path.expanduser("~/scheduled-script.lock")

try:
    lock_fd = open(lock_filename, "w")
    fcntl.lockf(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    logging.debug("Acquired lock on %r", lock_filename)
except IOError, e:
    if e.errno in (errno.EAGAIN, errno.EACCES):
        logging.info("FAILED to acquire lock on %r", lock_filename)
    else:
        logging.error("Error acquiring lock on %r: %s", lock_filename, e, exc_info=True)
    sys.exit(1)

A bit more verbose - of course, you can get away without all that logging and differentiated error codes if you don't fancy it. One benefit of this is that you can write things like the PID and start time of the instance into the lock file once you've acquired the lock, which can be useful for debugging.

Note that the lack of lock_fd.close() (or equivalent use of with) is quite intentional - the lock file is only closed when the script exits and Linux closes the file descriptor. This works even if the process terminates ungracefully, so there should be zero chance of a dead process leaving the file locked.
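If you want to convince yourself of that, here's a small self-contained check that forks a child and has it attempt the same lock while the parent still holds it (the lock file path is just a throwaway temp file for the demonstration):

```python
import errno
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.gettempdir(), "lock-demo.%d" % os.getpid())

def acquire(path):
    fd = open(path, "w")
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # must stay referenced, or the lock is dropped
    except (IOError, OSError) as e:
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return None  # somebody else holds the lock
        raise

parent_fd = acquire(lock_path)

pid = os.fork()
if pid == 0:
    # Child process: the parent holds the lock, so this attempt must fail.
    os._exit(0 if acquire(lock_path) is None else 1)

_, status = os.waitpid(pid, 0)
child_was_refused = (os.WEXITSTATUS(status) == 0)
os.unlink(lock_path)  # tidy up the demo file
```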

You could equally well put this in, say, a main() function, but for obvious reasons it needs to be near the top-level scope of the script - as soon as lock_fd goes out of scope, your script is no longer protected by the lock.

Thanks for the code snippet, giles. Sorry if it's a stupid question, but: What do I put in for "MYPROCESS"? Is it just the name of the function I want to call? E.g.,

import logging
import socket
import sys

lock_id = "helloworld"
lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
try:
    lock_socket.bind('\0' + lock_id)
    logging.debug("Acquired lock %r" % (lock_id,))
except socket.error:
    logging.info("FAILED to acquire lock %r" % (lock_id,))
    sys.exit()

import time
def helloworld():
    while True:
        print "Hello world!"

Or do I have to store the actual process code (from "while True" on) in a separate file called "" in the same directory as the first file?

@Cartroo -- thanks! I agree that that definitely should work -- but I do worry that filesystem locking might not work 100% reliably in our environment thanks to EC2, as discussed in the thread you link to. It would be nice if there were a way to do something that will work when/if we have multiple possible task servers for a given user, though. Perhaps we should be thinking in terms of a PythonAnywhere-specific "keep one and only one instance of this job running" option?

@johnsmithy2 it's basically a machine-wide name for your lock, so just use johnsmithy2.anythingyouwant -- then use a different anythingyouwant if you have another task you want to do the same thing for.
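So for your example it could all live in one file, something like this sketch (the loop is bounded here purely so the snippet terminates; your real task would use "while True"):

```python
import socket
import sys

lock_id = "johnsmithy2.helloworld"  # username.taskname, as above

lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
try:
    lock_socket.bind('\0' + lock_id)
except socket.error:
    sys.exit()  # another copy already holds the lock, so quietly bow out

count = 0

def helloworld():
    global count
    # a real task would loop forever here
    for _ in range(3):
        print("Hello world!")
        count += 1

helloworld()
```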

@giles: It's always worked in my experiments on PAW, but I fully accept that's a pretty weak assertion when it comes to the correctness of locking primitives.

The "keep one instance running" option could be useful, but I can imagine the locking issue also affecting web applications at some point, and that's a case where running on multiple machines is presumably going to be a lot more likely.

I guess it would be possible to use MySQL - for example, take out a table lock on a connection with a short timeout and rely on the loss of the connection to release the table lock. It would be a bit of a pain for the application to keep checking it's still got the table lock, though (in case the connection drops due to networking issues or similar). Also, you'd need to write code which coped with losing the lock midway through its processing (for anything which takes more than a few seconds), although some might say that crash-only software is good practice anyway.

Before wasting time on all that fuss, though, it might be worth checking out memcached_lock, which purports to be a distributed locking primitive built on top of memcached. Not sure how well it works, but if it's a drop-in solution then it might be worth a brief look.

Web applications are definitely something where we're planning to scale things up, but I don't think it matters quite so much -- there shouldn't be an issue with multiple instances of someone's web app running at one time -- in fact, hopefully it'll be an advantage.

But for tasks, we'll definitely need to find a solid solution -- thanks for the link to memcached_lock!

@giles: It wouldn't typically matter for web applications as the only shared resource that most of them have is a third-party database engine with its own locking arrangements. I was thinking of the odd user who might wish to do their own locking for other types of shared resources, should that ever crop up - for example, someone implements a wiki backed by a git repository instead of a database. Quite possibly this isn't worth worrying about too much until someone actually tries to do it, of course!

I agree -- if people want it, then we'll add it, but we won't plan for it for the time being.