
File write and timeout errors

Hello. I have built a Flask application based on a Python project of mine. The application basically performs a few REST API queries, runs some logic over the results, and then writes them out to an HTML file. It cycles through this for several queries, depending on the data available for the user.

As I originally wrote this without Flask in mind, I simply present the HTML file to the user after the processing has completed. This works most of the time.

However, I have a few strange instances from some users who tested the app for me. For two users I can see that their HTML file was written out with a size of 0 KB. My code basically looks like this:

def initialize_report(obj_report, user):
    ''' Create opening HTML/CSS for report '''
    f = obj_report
    html = f"""
    <HTML>
    <title>
        CCSSR - {user}
    </title>
    """
    write_line(f, html)   # helper defined elsewhere that writes a string to the open file

filename = f"{user.name}_{date_time}.html"
obj_report = open(filename, 'a+', encoding='utf-8')
initialize_report(obj_report, user.name)

So, you can see that opening the report is immediately followed by writing a few lines of HTML to the file. This means the script must have been able to create the file, but was unable to append any data to it.

I realize I don't have a lot of error handling here (mostly because I am not yet proficient enough with Flask to pass these errors back to the user). However, I need to know how this is possible. I do have several entries in my error.log that read:

OSError: write error

However, they don't appear to correspond with the file creation times (I'm not sure whether the file timestamps in the Files area use the same time format as the logs). I have about four separate instances of this error; sometimes it is logged just once, sometimes there are several in succession.

My other issue, which may be related, is that one user ran a report that had many entries. On the first iteration for the user, over 400 items were queried from the REST API and written out to the HTML file. The second iteration returned more (I don't know how many, as I can only see what was recorded). The file appears to have stopped being written roughly in the middle of item #1,038. I know this because I can actually see the HTML table truncated halfway through that item.

The user doesn't remember the exact runtime, but it is possible that this write operation was running for several minutes. Originally I was opening and closing the file on every line written, but that was obviously slow. So, I modified my code to open the file at the beginning of the report and not close it until the entire report was written. Is there some timeout that could have caused the OS to close the file in the middle of writing? How can this be prevented?

It sounds like you're being affected by the way Python buffers file writes. Essentially, internally it has a buffer, and when you write to a file it just writes to that buffer in memory -- only when the buffer fills up does it flush the data to the disk. So if (for example) your view takes more than a few minutes to run, and is timed out by the system (which assumes that views that take a long time to return have crashed and kills them), then you would lose all of the data that was pending in the buffer.

So the best answer here is, "don't do things that take a long time in website code" -- we have suggestions on how to offload that to other processes on this help page, though most of them are only really effective in paid accounts, where you can use always-on tasks.

But you can at least work around the buffering -- when you create a file-like object with open, like this:

obj_report = open(filename, 'a+', encoding='utf-8')

...then you have a method obj_report.flush() available, which will write the current contents of the buffer to disk.
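For example, a minimal sketch reusing the variables from your code above (write_line, filename and html are your own names):

obj_report = open(filename, 'a+', encoding='utf-8')
try:
    write_line(obj_report, html)   # write exactly as before...
    obj_report.flush()             # ...then push the buffered data out to disk straight away
finally:
    obj_report.close()             # closing also flushes anything still buffered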

So I found this forum post:

https://www.pythonanywhere.com/forums/topic/13047/

...and indeed I do see the harakiri messages in my server logs. I believe that in at least a couple of these occurrences I am coming up against the 5-minute timeout. I don't believe, however, that this can account for the 0-byte files I am seeing on some runs. My code initializes the report file and writes some data to it even before it starts any of the queries that might cause a timeout. There is literally nothing that can go wrong between opening the file and writing a line, unless something is happening on the underlying file system.

Unless Python's file flushing works differently on your servers than on my workstation? In my testing the file is always written to immediately, as soon as it is created.

To clarify about the 5-minute timeout: is this an artificial limit enforced specifically by PA?

What specifically contributes to this period? Is it the time my API call would run, or the entire time it takes to return a page to the user?

For instance, say my code does this:

  • flask_app.py presents a form for the script parameters
  • it calls a function which enumerates courses from a REST API
  • each course, once retrieved, is written to a file
  • after it is written to the file, the next course is retrieved -- rinse, repeat -- all written to the same file
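Roughly, the loop looks like this (simplified, with placeholder names rather than my real functions):

for course in enumerate_courses(api):      # each call here is a REST API request
    rows = build_course_rows(course)       # the logic on the results
    for row in rows:
        write_line(obj_report, row)        # everything appended to the same open file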

So I understand that if any given API call takes more than 5 minutes, that is a deal breaker... we need to do some type of queuing (which is beyond my skill at this point, but I'll get there eventually). But let's say any single API call takes no longer than 3 minutes (per course), and I might need to go through 3 courses per run.

Is there a way to break this up so that each course somehow breaks the chain (maybe by presenting a status page to the user and then automatically continuing with the next task)?

I'm new to this, so I don't want to try to redesign around something I don't fully understand.

A write to a file only reaches the disk when there is a flush. Depending on the configuration of the machine, that flush might happen immediately or only when some buffer is full. Your disk probably flushes on fairly small writes because it is local and so there is little cost to flushing often. On PythonAnywhere, the disk is across a network (that is how we can allow you to access your files from a number of different machines in our cluster), so a flush can be expensive. If you need your data written immediately, you can flush after writing it, but there will be a performance hit for doing that.
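If you don't want to call flush() after every write, another option (just a sketch, with the same trade-off of more network round trips) is to open the file with line buffering, so each completed line of text is flushed as it is written:

# buffering=1 selects line buffering for text-mode files
obj_report = open(filename, 'a+', encoding='utf-8', buffering=1)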

The 5 minute limit is imposed to ensure that broken web apps get restarted in a reasonably timely manner, and so that poorly written web apps that may never return a response cannot just sit there consuming resources. It covers the time between a request arriving at the server and the response being sent. There is not no opportunity for an intermediate partial response to delay the timeout. No user of a web page is going to sit watching the loading animation for 5 minutes, so doing the work in a scheduled task or an always-on task, and letting the user know in some way that the work is in progress and then when it has completed, is really the best way to provide results that take that long to generate. Have a look at our help page that has a brief outline of how to go about it.

No user of a web page is going to sit watching the loading animation for 5 minutes

Of course not, but that doesn't mean the user can't run a report and come back after 8 minutes. I think a user is aware that processing a large data set can take some time, and if your app lets them know that, then there isn't a problem (from a user-expectation perspective).

It would of course be nice to show some sort of progress - since a user just seeing the form submit page for 5 minutes may not know if something is amiss or not.

I realize you're incentivized to provide options which require a paid plan, but I'm not sure I understand your statement:

There is not no opportunity for an intermediate partial response to delay the timeout

Firstly, it is a double negative - so I'm not sure if you meant to imply that there is an opportunity? If not, I'm not sure I understand why the following wouldn't be possible:

  • Start processing each item in turn, which involves an API call to retrieve the item and then a series of logic and file writes.
  • After each item is written (or even between the API retrieval and the writing), a page is presented to the user which tells them which item is currently being processed.
  • Once that page has been sent to the user, the next item in the list is processed.

What are the limitations of creating a flow like this, and if it is possible, why wouldn't it be a way to break up a single uninterrupted process? I realize that if any single one of these items took over 5 minutes there would still be an issue. However, I: a) don't believe that any single one will take over 5 minutes, and b) would like to be able to return a status page to my user about the processing. Personally, I believe a user would rather have their report status showing up on a screen and updated every 30 seconds or so than click to run a report and have to check back at some future point.
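Something like this rough sketch is what I have in mind (the route and helper names are just placeholders to illustrate the idea, not working code):

from flask import Flask, redirect, render_template, url_for

app = Flask(__name__)

@app.route('/report/<int:item_index>')
def process_item(item_index):
    # fetch_item, append_item_to_report and total_items are placeholders
    item = fetch_item(item_index)              # single API call, well under 5 minutes
    append_item_to_report(item)                # append this item's HTML to the report file
    if item_index + 1 < total_items():
        # the status page would refresh itself to /report/<item_index + 1>
        return render_template('status.html', done=item_index + 1, total=total_items())
    return redirect(url_for('finished_report'))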

thanks

With a paid account on PythonAnywhere, you can use an always-on task to do the processing in the background and build an API so the frontend can trigger the processing, query for the result, and retrieve it when it's ready. See https://help.pythonanywhere.com/pages/AlwaysOnTasks/
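A very rough sketch of that pattern, with illustrative paths and a placeholder build_report standing in for the existing report-generation code:

# flask_app.py -- the web side only queues work and polls for results
import json, os, uuid
from flask import Flask, jsonify

app = Flask(__name__)
QUEUE_DIR = '/home/yourusername/report_queue'   # illustrative paths
DONE_DIR = '/home/yourusername/report_done'

@app.route('/reports/start/<user>')
def start_report(user):
    job_id = uuid.uuid4().hex
    with open(os.path.join(QUEUE_DIR, job_id + '.json'), 'w') as f:
        json.dump({'user': user, 'job_id': job_id}, f)
    return jsonify({'job_id': job_id})          # the page keeps this id and polls the status URL

@app.route('/reports/status/<job_id>')
def report_status(job_id):
    ready = os.path.exists(os.path.join(DONE_DIR, job_id + '.html'))
    return jsonify({'ready': ready})


# run_reports.py -- runs as an always-on task and does the slow work
# (it would need its own copies of QUEUE_DIR and DONE_DIR)
import glob, json, os, time

while True:
    for job_file in glob.glob(os.path.join(QUEUE_DIR, '*.json')):
        with open(job_file) as f:
            job = json.load(f)
        # build_report is a placeholder for the existing report-generation code;
        # it should write its HTML into DONE_DIR as <job_id>.html
        build_report(job['user'], os.path.join(DONE_DIR, job['job_id'] + '.html'))
        os.remove(job_file)
    time.sleep(10)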