Forums

My webapp runs fine until the app tries to load the tensorflow model and create a session. The error generated is "502: backend", but the error log is empty. The server log shows that the workers died!

https://ibb.co/g3VeaR

[edit by admin: formatting]

It looks like Tensorflow crashed. Which version are you using? How did you install it?

I am using tensorflow 1.4. I installed it using the "pip2.7 install --upgrade --user tensorflow" command in a bash console.

The model is trained. I tried running the model in a bash console with a custom input; it worked fine and gave the result. I recently limited the number of inter- and intra-op parallelism threads to 1 in the tf.ConfigProto settings and tried to create a session. The error log still says that resources are not available and the worker dies.
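(For reference, the inter/intra-op limiting described above looks roughly like this in the TF 1.x API -- a sketch, with `session_conf` as an illustrative variable name:)

```python
import tensorflow as tf

# Sketch: limit TensorFlow's internal thread pools (TF 1.x API).
# session_conf is an illustrative name for the config object.
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,  # threads used within a single op
    inter_op_parallelism_threads=1,  # threads used across independent ops
)
sess = tf.Session(config=session_conf)
```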

Hmm, interesting. I just checked the server where your code was running, and it looks like there were some processes that had been left running, probably detached from earlier instances of your website (eg. before your most recent reloads).

A couple of thoughts:

  • Is your model using lots of memory?
  • Are you opening/closing your Tensorflow sessions carefully, for example using with tf.Session() as session: around the code to run stuff?

My model is only 15MB on disk, so it's not that big a deal, I guess. If it were a memory issue, then it would not work in a bash console either, right? Here is my tensorflow code - https://ibb.co/gsuiKm

[edit by admin: formatting]

Hmm. I'm not a Tensorflow expert, but it looks like you might not be closing your tf.Session. (Note also that as_default is a method, so it needs parentheses.) Try replacing

sess = tf.Session(config=session_conf)
with sess.as_default():
    # ....

with:

with tf.Session(config=session_conf) as sess:
    with sess.as_default():
        # rest of your code here, indented one extra step

That should make sure that everything is properly shut down when your function exits.

Hey, tried it just now. Still no progress. :(

out of curiosity- does your code work from a bash console instead of a webapp?

--- nvm- realized that you already tried running it from a bash console and it worked.

how long did it take to run in the bash console?

It took around 5-10 seconds to run from bash console.

oh just to double check- you don't do anything like post back to your webapp etc / access anything external in your code right?

Can you please elaborate? What do you mean by "access anything external"? I access the model stored in another file, that's it! I didn't quite follow you.

By "anything external", Conrad meant something like hitting an external API via requests or urllib or something like that -- or, indeed, using those libraries to hit your own site from within the view. Are you doing anything like that?

No I am not. My app is not hitting any external API.

Did you take a look at your server log? there seems to be a recurring error:

2017-12-22 12:07:18 terminate called after throwing an instance of '
2017-12-22 12:07:18 std::system_error
2017-12-22 12:07:18 '
2017-12-22 12:07:18   what():
2017-12-22 12:07:18 Resource temporarily unavailable

I know some other PythonAnywhere users are running tensorflow happily though... I'm wondering if it's because you are a free user and are limited in the number of threads you can start from a webapp.

Maybe try upgrading, reloading your webapp, and running it again? If it doesn't work, just downgrade (eg. within the hour and you won't be charged, or just let us know and we will refund your payment).

Okay, I am upgrading my account. Can you please tell me the number of workers I need to run it without errors? I am signing up for a Hacker account and it has only 2 workers. Open to recommendations.

Hmm, I would say just try the Hacker plan and see. Alternatively, maybe there is a particular tensorflow configuration setting that you could change (eg. to not start any threads).

Just wondering if you ever found a solution to this problem, I am running into exactly that same error when running tensorflow from the flask app. Similarly it has no problem running from the console.

Just to clarify, your code runs fine from a PythonAnywhere console, but when run as a webapp, does not produce any error messages in the PythonAnywhere logs?

The problem occurs when calling a method from another class from my main flask_app.py. If I run the method directly in the class, it works perfectly, but when I trigger that method call from the main app it causes the system error above.

can you give us a quick code snippet of what works vs what doesn't work?

Whenever I run a file I have called test with this code

from ChatbotFramework import ChatbotFramework

chatbot = ChatbotFramework()
testString = chatbot.handleIncoming("What rooms are available?")
print(testString)

I can get the response from the tensorflow model, but in my flask_app.py file whenever I run

chatbot = ChatbotFramework()
testString = chatbot.handleIncoming("What rooms are available?")
print(testString)

which is imported in the same way I get the system error.

[edited by admin: formatting]

Is ChatbotFramework a Flask app? Does it do something similar to app.run() in normal Flask?

It isn't a flask app, just a python class that has the model.predict call for tensorflow

And just to make sure I understand -- the specific error you're seeing is the "Resource temporarily unavailable" one...?

2017-12-22 12:07:18 terminate called after throwing an instance of '
2017-12-22 12:07:18 std::system_error
2017-12-22 12:07:18 '
2017-12-22 12:07:18   what():
2017-12-22 12:07:18 Resource temporarily unavailable

Yeah that is exactly the one

Hmm. Do you know if the ChatbotFramework is spinning off threads of its own?

That is something I am worried about. I don't believe that it is, though I could just be misunderstanding it. I can't see anywhere that it would be doing that.

Where are you seeing the error? Is it in the error or server log on the "Web" tab?

Sorry about being so slow to give you the information! The error is in the server log.

Hmm, very odd. There seem to be a bunch of different things that can make Tensorflow crash with that error message. Perhaps a good start would be to find out which line of Python code is triggering it. Could you put print statements before each line in your code where you use ChatbotFramework (including the import) so that we can find out? The output of the prints will go to the server log too.
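(A small helper along these lines would narrow down the failing statement -- a sketch; the explicit flush is there because output from a web worker can be buffered, so without it a marker printed just before a crash might never reach the log:)

```python
import sys

def log(msg):
    # Flush immediately so the marker reaches the server log
    # even if the worker process crashes right afterwards
    print(msg)
    sys.stdout.flush()

log("before import")
# from ChatbotFramework import ChatbotFramework  # your module here
log("after import")
```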

I'm facing the exact same problem and it crashes at the point where the line

with tf.Session() as sess:

is being run.

I found this out by having the code

print("rt4")

with tf.Session() as sess:
    print("rt5")

and the log shows that it only reaches rt4.

The log is as follows:

2018-05-30 07:24:05 rt4
2018-05-30 07:24:05 W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-30 07:24:05 W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-30 07:24:05 W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-30 07:24:05 W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-05-30 07:24:05 terminate called after throwing an instance of '
2018-05-30 07:24:05 std::system_error
2018-05-30 07:24:05 '
2018-05-30 07:24:05   what():  
2018-05-30 07:24:05 Resource temporarily unavailable
2018-05-30 07:24:05 
2018-05-30 07:24:06 DAMN ! worker 1 (pid: 16) died, killed by signal 6 :( trying respawn ...
2018-05-30 07:24:06 Respawned uWSGI worker 1 (new pid: 27)
2018-05-30 07:24:06 spawned 2 offload threads for uWSGI worker 1

[edit by admin: formatting]

OK -- that sounds like Tensorflow is trying to spin up new threads, and crashing (!) when it can't. If there's some way to configure it to not use extra threads, then it should work -- but if not, it won't work in a PythonAnywhere web app :-(
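(For what it's worth, "Resource temporarily unavailable" is simply the OS error string for EAGAIN, which is the errno pthread_create reports when a process has hit its thread limit -- hence the C++ std::system_error above. You can confirm the mapping from a console:)

```python
import errno
import os

# "Resource temporarily unavailable" is the strerror text for EAGAIN,
# the errno returned by pthread_create when no more threads can be made
print(errno.EAGAIN)               # 11 on Linux
print(os.strerror(errno.EAGAIN))
```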

So guys, how did you solve this problem? It works perfectly on my computer, but it fails like yours on tf.Session, so what should I do?

The previous post by giles pretty much covers the current state of tensorflow in a web app on PythonAnywhere.