502 errors because of insufficient web workers or something else?

Hi PA,

I started receiving emails from a client a few days ago about the app feeling "extra slow" compared to a week ago. I went through the logs and noticed the app (mcdconstruc) was responding with 502 errors. I haven't made any major changes, which leads me to believe I may not have enough web workers (I currently have 9). Could you tell me if that's the case, or if the cause was something unrelated (e.g., someone else's app on the server)?

For reference:

[07/Mar/2023:20:38:29 +0000] "GET / HTTP/1.1" 502 1661 "-" " ... response-time=0.000
[07/Mar/2023:20:38:30 +0000] "GET / HTTP/1.1" 502 1661 "-" ... response-time=5.454
[07/Mar/2023:20:38:30 +0000] "GET / HTTP/1.1" 502 1661 "-" ... response-time=0.000
[07/Mar/2023:20:38:30 +0000] "GET /favicon.ico HTTP/1.1" 502 1661 "/" ... response-time=0.000
[07/Mar/2023:20:38:31 +0000] "GET /serviceworker.js HTTP/1.1" 502 1661 "/serviceworker.js" ... response-time=0.000
.
.
.
[07/Mar/2023:20:39:38 +0000] "GET /communication/notifications/center/unread/count HTTP/1.1" 200 0 "/subcontractors/pay-periods/90/" " ... response-time=131.788

Thank you for your time!

Also, I tried to include an emoji in the forum post, and it looks like doing so causes all text afterwards to be dropped when posting (just in case you weren't aware).

I see only two occurrences of there being no idle workers on 2023-03-07: at 06:07:45 and 20:39:44. So the second timestamp matches what you see in your log. There are 5 more occurrences like this in our logs in March (all before the 7th). So it may be related.

Hi pafk,

Thanks for the response. Just to clarify, you think it’s directly related to the lack of web workers (not someone else’s app)? If that’s the case, what would you recommend I increase the web workers to? Or if that’s too broad a question, are you able to give me a rough idea of how many idle web workers there are near those occurrences (e.g., an hour before)? I’d be happy to go through the logs myself too if I could somehow get access!

Thanks for your time.

We have a help page on how you can work out how many workers you might need here: https://help.pythonanywhere.com/pages/HowManyHitsCanMySiteHandle/
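As a rough illustration of that kind of estimate (a back-of-the-envelope sketch, not a quote from the help page): if each worker handles one request at a time, the sustainable request rate is roughly the number of workers divided by the average response time. The function names and the example numbers below are hypothetical.

```python
import math


def max_requests_per_second(workers: int, avg_response_seconds: float) -> float:
    """Rough upper bound on throughput, assuming one request per worker at a time."""
    return workers / avg_response_seconds


def workers_needed(peak_requests_per_second: float, avg_response_seconds: float) -> int:
    """Workers required to keep up with a given peak rate under the same assumption."""
    return math.ceil(peak_requests_per_second * avg_response_seconds)


# Hypothetical numbers: 9 workers, requests averaging half a second each.
capacity = max_requests_per_second(9, 0.5)
# Hypothetical peak of 20 requests per second at the same average response time.
needed = workers_needed(20, 0.5)
print(capacity, needed)
```

The estimate is only as good as the average response time fed into it, so a view whose timings swing widely makes the result correspondingly uncertain.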

Hi glenn,

I've gone through that article before, and I understand the concept behind web workers as I've had this issue in the past. The estimation is just difficult when the same view's response time varies by an order of magnitude:

... [27/Feb/2023:23:51:10 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.650
... [27/Feb/2023:23:51:54 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.055
... [27/Feb/2023:23:52:41 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=1.100
... [27/Feb/2023:23:53:25 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.234
... [27/Feb/2023:23:54:10 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.117
... [27/Feb/2023:23:54:55 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.226
... [27/Feb/2023:23:55:40 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.118
... [27/Feb/2023:23:56:25 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.215
... [27/Feb/2023:23:57:10 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.199
... [27/Feb/2023:23:57:57 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=2.006
... [27/Feb/2023:23:58:40 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.373
... [27/Feb/2023:23:59:26 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.991

I am going to increase the number from 9 to 12 and see if it resolves the client's issue. Would it be possible for me to check back in a week to see if there are still occurrences?

You might learn something if you work out what makes that view's response times vary so much. And yes, you can check back in a week about your worker usage.
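One way to start on that is to summarize the response-time field per URL from the access log. The following is a minimal sketch, assuming log lines shaped like the ones quoted in this thread (a quoted request line followed by a status code, with a trailing response-time=... field); the regular expression and the summarize helper are illustrative, not part of any PythonAnywhere tooling.

```python
import re
import statistics
from collections import defaultdict

# Assumed line shape: ... "GET <path> HTTP/1.1" <status> ... response-time=<seconds>
LINE_RE = re.compile(
    r'"(?:GET|POST|PUT|DELETE|PATCH|HEAD|OPTIONS) (?P<path>\S+) HTTP/[\d.]+" '
    r"(?P<status>\d{3}) .*response-time=(?P<seconds>\d+\.\d+)"
)


def summarize(lines):
    """Group response times by request path and report count, min, median and max."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not fit the assumed shape are skipped
            times[match.group("path")].append(float(match.group("seconds")))
    return {
        path: {
            "count": len(values),
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
        }
        for path, values in times.items()
    }


# Three of the (already truncated) lines quoted earlier in the thread.
sample = [
    '... [27/Feb/2023:23:51:10 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.650',
    '... [27/Feb/2023:23:51:54 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=0.055',
    '... [27/Feb/2023:23:52:41 +0000] "GET ...unread/count HTTP/1.1" 200 0 "/" "Mozilla/5.0 x64) ..." response-time=1.100',
]
summary = summarize(sample)
for path, stats in summary.items():
    print(path, stats)
```

A large gap between the median and the max for a single path suggests occasional slow requests rather than a uniformly slow view, which is worth cross-checking against whatever the view does (database queries, external calls) at those times.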

42.110.233.33 - - [13/Mar/2023:07:02:21 +0000] "GET /favicon.ico HTTP/1.1" 502 1661 "https://thinknxtmedia.pythonanywhere.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "42.110.233.33" response-time=0.000

Sir, I'm getting this 502 error for favicon.ico. It rendered properly on localhost, but it doesn't work on the hosted site. Changing the file path and redeploying from scratch didn't help; the same error shows again. My 502 error handling in Flask isn't working either.

There are many things that can cause 502 errors, so we have a help page to help you to debug it here: https://help.pythonanywhere.com/pages/502BadGateway/

Hi PA,

Following up on the previous posts, can you provide any insights into the worker usage?

Also @glenn, I haven't been able to figure out what is causing the response times to vary. That particular view's responses are very small (roughly 550 B to 1 kB), and at that moment in the logs the content of the response was exactly the same, which led me to believe it was unrelated to the app and more likely something to do with someone else's app running on the server.

Thanks for your help!

I only see 2 times in the last few days where all your workers are busy. Unless your traffic is very bursty (that is, there are a large number of requests hitting your site in a very short time (under a minute)), I don't think that you are running out of workers. When were the 502s that you are asking about?
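If you want to check for bursts yourself, counting requests per minute in your access log is a reasonable first pass. This is a minimal sketch, assuming the bracketed timestamp format shown in the log lines quoted in this thread; the helper names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed timestamp shape: [07/Mar/2023:20:38:29 +0000]
TS_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def requests_per_minute(lines):
    """Count log lines per calendar minute, keyed by a timezone-aware datetime."""
    counts = Counter()
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.replace(second=0)] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` minutes with the most requests as (minute, count) pairs."""
    return requests_per_minute(lines).most_common(top)


# The (truncated) lines from the first post in this thread.
sample = [
    '[07/Mar/2023:20:38:29 +0000] "GET / HTTP/1.1" 502 1661 "-" ... response-time=0.000',
    '[07/Mar/2023:20:38:30 +0000] "GET / HTTP/1.1" 502 1661 "-" ... response-time=5.454',
    '[07/Mar/2023:20:38:30 +0000] "GET / HTTP/1.1" 502 1661 "-" ... response-time=0.000',
    '[07/Mar/2023:20:38:30 +0000] "GET /favicon.ico HTTP/1.1" 502 1661 "/" ... response-time=0.000',
    '[07/Mar/2023:20:38:31 +0000] "GET /serviceworker.js HTTP/1.1" 502 1661 "/serviceworker.js" ... response-time=0.000',
    '[07/Mar/2023:20:39:38 +0000] "GET /communication/notifications/center/unread/count HTTP/1.1" 200 0 "/subcontractors/pay-periods/90/" ... response-time=131.788',
]
for minute, count in busiest_minutes(sample):
    print(minute.isoformat(), count)
```

Minutes whose request count is well above the number of workers, especially when the requests are slow ones, are the likeliest places for all workers to be busy at once.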

The original 502s were from the first week of March (the first post in this topic has the timestamps). Also, this may be unrelated, but I have been seeing an increase in "OSError: write error" messages in the error logs.

The 'OSError: write error' message is logged when a client disconnects from the server before the server has finished sending the response.