Best practice/service (or how-to) for close-to-seamless task continuation

After reading the Long running tasks page, I understand that 6 hours is the current limit for a task. This shouldn’t be a problem as long as the task can restart immediately.

Could someone verify that the following is a valid initial understanding and a workable approach, or would it be recommended that another service be used for this sort of long-running process?

Plan: One script contains the long-running code (so should be set to run every 24 hours [meaning it should stop after 6 hours, since it is always running]). Another script is set to run every hour, its purpose being to continually monitor that the main script is running, and restart the main script if it has stopped. The result would then be a main script that, barring PA server maintenance, runs seamlessly for 6 hours at a time and continually restarts.

Given this is a valid approach, are there any other resources that could help me start this process? I’m not sure how to check a script’s status from another script and how to remotely restart it if it stopped.

The code on the page should work fine without that -- let me try to explain it in a different way, and see if it's clearer (if it is, we can update the help page :-)

Let's say you have a module called my_module, which contains a function called my_long_running_process. What you want is for the function my_long_running_process to be kept running. If it exits -- either due to the 6-hour task run limit, or a server reboot (hopefully rare), or even if it crashes -- you want it to be started up as soon as possible.

If you put the code that's on the Long running tasks page into a separate script, let's call it long_running.py, and run it, what it will do is try to acquire a lock. If the lock is already held by another process on the same machine, then it will exit immediately. If the lock is not held by another process, it will run your my_long_running_process function, and then exit when that function completes. The lock will be automatically released if the long_running.py script exits for any reason -- whether it's due to the process being killed, a server reboot, or a crash.
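To make the lock behaviour concrete, here is a minimal sketch. It is not the code from the Long running tasks page, just one way to get a lock with the properties described above. It assumes a Linux machine and uses an advisory file lock via fcntl.flock; the lock path and the helper names try_acquire_lock and run_exclusively are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path: any path that every run of the script agrees on.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "my-username.my-task-name.lock")


def try_acquire_lock(path=LOCK_PATH):
    """Return an open file that holds the lock, or None if it is already held."""
    lock_file = open(path, "a")
    try:
        # LOCK_NB makes this fail straight away instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


def run_exclusively(task, path=LOCK_PATH):
    """Run task() only if no other holder has the lock; report whether it ran."""
    lock_file = try_acquire_lock(path)
    if lock_file is None:
        return False  # another copy is already running, so do nothing
    try:
        task()
    finally:
        lock_file.close()  # the OS also drops the lock if the process dies
    return True


# In long_running.py this might be used as:
#     from my_module import my_long_running_process
#     run_exclusively(my_long_running_process)
```

The lock is tied to the open file, so it goes away when the process ends for any reason, which is the property the approach relies on.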

Now, all scheduled tasks -- not processes run in consoles -- on PythonAnywhere for a particular user account are currently run on the same machine. (This might change in the future, but not before we have a better solution for long-running tasks.)

What that means is that if you set up the script python3.6 long_running.py as a scheduled task -- scheduled every five minutes by setting up 12 hourly tasks that just run it -- then the first time it was run, it would try to acquire the lock, and succeed, so it would run your my_long_running_process function. The next time it was run, five minutes later, it would try to acquire the lock, but it would see that another process already has it, so it would immediately exit. The next time, the same thing. The lock would stop any extra copies of the process from running.
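As a small illustration of "12 hourly tasks", the minutes past the hour at which each task would be scheduled can be listed like this. The function name is hypothetical, and the offsets assume the first task is set to run on the hour.

```python
def hourly_task_minutes(interval_minutes=5):
    """Minutes past the hour for a set of hourly tasks spaced interval_minutes apart."""
    return list(range(0, 60, interval_minutes))


minutes = hourly_task_minutes()
print(len(minutes), minutes)
```

Each of those hourly tasks would run the same command, so between them the script is attempted once every five minutes.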

Now, after 6 hours, the process that had the lock would be killed. (Or perhaps before then, if the server was rebooted -- again, unlikely -- or the script crashed.) The lock would automatically be released. And within a maximum of five minutes, one of the hourly tasks would kick off again. This time it would find that no other process had the lock, so it would acquire it, and would start running the code.

The net effect would therefore be that the function my_long_running_process would be kept running forever, with a maximum outage time of five minutes.
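The five-minute figure can be checked with a little arithmetic. The sketch below assumes runs happen at exact multiples of the interval and ignores the time the task takes to start up; the function name is made up for illustration.

```python
def minutes_until_next_run(exit_minute, interval=5):
    """Minutes between the process exiting and the next scheduled attempt.

    exit_minute is measured from any scheduled run. An exit that lands exactly
    on a scheduled minute is treated as just missing that run (worst case).
    """
    return interval - (exit_minute % interval)


# A process killed 12 minutes after a scheduled run waits 3 minutes.
assert minutes_until_next_run(12) == 3
# The worst case over a whole 6-hour window is one full interval.
assert max(minutes_until_next_run(m) for m in range(360)) == 5
```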

Is that any clearer?

Good to know that a five-minute interval monitor is possible and conventional. The explanation is probably as clear as it can be for me, given that I have not yet attempted it. I think this will help get something started, so I'll post back here regarding any unresolved roadblocks. Thank you.

OK -- do let us know if you have any problems.

It appears I need a way to manually stop (or restart) the long-running process, as there could be a logic error, or error in another thread that doesn’t stop the main, long-running, thread (or otherwise just an immediate update needed). If this occurs, the monitoring tasks would not know to restart the long-running process. I haven’t yet found a method to view and kill processes manually, if that's possible. Could you provide a method or resource for this?

Update: Perhaps it would be best to just add a database table that could be updated from the console to signal whether it should keep running...monitored by the long-running process. I'm presently trying this and will post back if there are issues. In any case, it would be useful to know a simple method of doing this directly.

I'm not entirely sure if you're using "thread" to mean an actual thread, or if you're just using it as a placeholder for process. If you mean actual thread, then it's generally not advisable to kill threads from other threads and most programming languages don't permit it. The best way to do that is to have some sort of indicator - a variable value or, as you suggested, an external store - to indicate to the thread that it should terminate. Then you need a little extra logic in the thread to check for that value and terminate if it needs to.
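A minimal sketch of that indicator pattern, under some assumptions: the work can be split into short steps, and the check is passed in as a function called should_stop (a hypothetical name). Here it is backed by a threading.Event, but the same function could instead read a flag from a database table or a file.

```python
import threading
import time


def run_until_stopped(should_stop, pause_seconds=0.1):
    """Do work in small steps, checking the stop indicator before each step."""
    steps = 0
    while not should_stop():
        steps += 1                 # placeholder for one unit of real work
        time.sleep(pause_seconds)  # placeholder for the time that work takes
    return steps


# Usage: the worker thread watches an Event that any other thread can set.
stop_event = threading.Event()
worker = threading.Thread(target=run_until_stopped, args=(stop_event.is_set, 0.01))
worker.start()
stop_event.set()         # ask the worker to finish
worker.join(timeout=2)   # it exits on its own at the next check
```

The worker is never killed from outside; it notices the indicator at its next check and returns, which is why the check needs to happen often enough for your purposes.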