Flask - sending interim data to the front end during a route

I've made an app using Flask, which is currently hosted on PythonAnywhere. The app has a backend process that is initiated by the user: when the user starts it, some data is sent to Flask via jQuery AJAX, which is then processed, and the results are returned. This process can take anywhere from a few seconds up to around a minute, so I have a 'please wait' modal on the front end while waiting for the AJAX response from the backend.

Is there a way I can send interim data to the front end, to update the 'please wait' modal, while the backend process is doing its thing? The backend process performs iterations until it is satisfied. So ideally I would like to be able to display to the user how many iterations it has performed.

I asked this question on Stack Overflow, which led me to find Socket.IO for Flask. So I went ahead and implemented it (which worked in my local dev environment), only to find that it's not supported on PythonAnywhere.

Are there any other methods I could use to send interim data from the backend that are supported by PythonAnywhere?

Thanks, Hugh.

Hi Hugh,

There are ways you could do something like that while keeping the long-running code in your Flask views, but I wouldn't recommend it. The reason is that while a view is running, the process it's running in is not available to handle other requests. With your account as it's configured right now, you have two worker processes to handle all requests to your site. So if you had a view that took one minute to complete, and a request came in to that view, one of those workers would be busy handling it. If another request came in at the same time, the other worker would also be busy, and then all other requests to your site would be queued up until one of the workers finished the work it was doing. You could add more workers by customizing and upgrading your plan, but you would always be in a situation where repeated requests to the slow view could lock up your site for everyone else viewing it.

So what I would suggest is that you take the long-running background process out of the Flask views and instead run it in some other way, like a scheduled or an always-on task. The way that would work is that your Flask view that initiated the job would write the details of what needs processing into a database table, and then the background job would poll that table for new jobs, process them, and put the results into the database. You could then have a view in Flask that allowed you to check the current status of the job, which you could hit from your JavaScript AJAX code to report progress to the user.
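To make that concrete, here's a minimal sketch of the Flask side of that pattern, assuming SQLite and a hypothetical 'tasks' table with id, payload, status, iterations and result columns -- adapt the names and schema to your own setup:

```python
# A sketch of the "write job to a table, poll for status" pattern,
# assuming a hypothetical 'tasks' table in SQLite.
import sqlite3
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "tasks.db"


def get_db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn


@app.route("/jobs", methods=["POST"])
def start_job():
    # Record the job in the table; the background task will pick it up.
    job_id = str(uuid.uuid4())
    with get_db() as conn:
        conn.execute(
            "INSERT INTO tasks (id, payload, status, iterations)"
            " VALUES (?, ?, 'pending', 0)",
            (job_id, request.get_data(as_text=True)),
        )
    return jsonify(job_id=job_id)


@app.route("/jobs/<job_id>")
def job_status(job_id):
    # The front end polls this endpoint to update the 'please wait' modal.
    with get_db() as conn:
        row = conn.execute(
            "SELECT status, iterations, result FROM tasks WHERE id = ?",
            (job_id,),
        ).fetchone()
    if row is None:
        return jsonify(error="unknown job"), 404
    return jsonify(status=row["status"], iterations=row["iterations"],
                   result=row["result"])
```

The background job would then pick up 'pending' rows, update 'iterations' as it works, and set 'status' and 'result' when it's done.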

There's some more information on how you can manage that here.

Hi giles,

Thanks for the detailed response. I can see how my current setup could quickly become overwhelmed if multiple users request this long-running process at the same time, and how the 'async work' suggestion would allow my front end to query the status of the backend process. If I were to change it so that the heavy processing is done by the 'always-on' task, does that still mean that only one 'worker' is working on that 'always-on' task queue, or does the 'always-on' process start additional threads to meet demand? If the 'always-on' engine is only single-threaded, then it seems to me that it would be better to add multiple workers to my existing setup to better handle multiple users triggering this long-running process at the same time (even though that wouldn't solve my initial question about getting interim data). Is my understanding correct?

Thanks, Hugh.

If you implement threads in your always-on task, they will work there -- you can write them however you like. If you don't implement threads, then the always-on task won't use them.
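To illustrate: an always-on task is just an ordinary Python script, so standard-library threads behave there as they would anywhere else. A trivial sketch:

```python
# Threads started inside an always-on task run just as they would in
# any other Python script.
import threading
import time


def worker(n):
    print(f"worker {n} starting")
    time.sleep(2)  # stand-in for real work
    print(f"worker {n} done")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```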

Thanks for your previous suggestions on this issue. I've spent some time in my local dev environment making changes to my app, adding a 'background process' that constantly polls my 'tasks' database table and then processes any new tasks that appear in that table. I've set up this 'background process' as a 'while True' loop, so that it just keeps running. This is working ok in my local development environment.

If I were to use this script as my 'always-on' task, what does that mean for my CPU usage on that task? E.g. even if no new tasks appear in my db table, it will constantly be looping through the 'check table for new tasks' step. If this is a problem, I was thinking I could introduce a one-second pause, so that it only checks the database every second rather than constantly. That way, my app would still respond relatively quickly to any new task requests (within one second), but would use significantly less CPU time. Is my understanding correct? Thanks, Hugh.

Yes, that's exactly right -- a busy-wait (one without a sleep) could wind up unnecessarily burning lots of CPU, but a one-second sleep should fix that.
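For reference, a minimal sketch of that loop, using the same hypothetical 'tasks' table as in the earlier sketch:

```python
# Poll the 'tasks' table for pending work, sleeping when idle so the
# loop doesn't busy-wait and burn CPU.
import sqlite3
import time

DB_PATH = "tasks.db"


def process(task):
    # Placeholder for the real iterative work; it would update the row's
    # 'iterations' column as it goes, and 'status'/'result' when done.
    ...


def main():
    while True:
        with sqlite3.connect(DB_PATH) as conn:
            conn.row_factory = sqlite3.Row
            task = conn.execute(
                "SELECT * FROM tasks WHERE status = 'pending' LIMIT 1"
            ).fetchone()
        if task is not None:
            process(task)
        else:
            time.sleep(1)  # nothing to do: wait a second before polling again


if __name__ == "__main__":
    main()
```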

I've now implemented my 'always-on' task, and it seems to be functioning ok.

Now I have a couple of questions:

1. I'm interested in CPU usage because my always-on task is repeatedly polling the 'tasks' database table to see if there are any tasks to process. I've set this polling at one-second intervals so that it doesn't chew up my CPU allowance, but I'm interested in how fast I can make it without chewing through the usage. I'm a bit confused by the difference between the 'CPU Used' figures in the 'Running Tasks' section of the 'Tasks' page versus the 'CPU Usage' shown on my dashboard. Are these two metrics comparable? The 'CPU Used' on the Tasks page seems to increase much more quickly than my overall 'CPU Usage' figure on the dashboard.

2. My always-on task uses multiprocessing so that it can process task(s) while also checking for more tasks. In my local dev environment, I tested queueing up around 10 concurrent tasks, and they all seemed to be processed pretty quickly. On PA, when I run the same test, it slows down significantly, which I guess is because my machine has more CPU cores than the always-on task has access to. Would I see better performance if I used 'threading' for the parallel work instead of 'multiprocessing'?
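Roughly, the structure I mean is this (a sketch, with fetch_pending_tasks() and process() standing in for the real code):

```python
# Poll for new tasks and hand them to a small process pool, so the loop
# can keep polling while earlier tasks are still running.
import multiprocessing
import time


def process(task):
    ...  # placeholder for the long-running work


def fetch_pending_tasks():
    return []  # placeholder: would read new 'pending' rows from the table


def main():
    with multiprocessing.Pool(processes=2) as pool:
        while True:
            for task in fetch_pending_tasks():
                pool.apply_async(process, (task,))
            time.sleep(1)


if __name__ == "__main__":
    main()
```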

Thanks.

Thanks for the heads-up regarding CPU usage! It looks like one of the machines in the always-on cluster was not correctly monitoring it, so always-on tasks running there were not contributing to people's used CPU numbers. I've fixed that, so the numbers should match up now.

Threading might give you a small performance enhancement over multiprocessing, but probably not enough to be worth the effort of changing it. You probably will see faster performance on your local machine than you will on PythonAnywhere, though. If you consider the costs: your own machine might only last a year or two if it were running at 100% CPU all the time -- consumer-grade CPUs aren't designed for continuous 100% usage. A server-grade chip would perhaps last twice as long, but would cost twice as much. So a $1,200 machine running at 100% CPU would work out to $100/month, even ignoring electricity and cooling costs.