Selenium and Flask error

Hi,

I have a working Selenium program that successfully completes a task for me. It works fine when run from the console.

I am now trying to incorporate this exact same program into a Flask app and I am running into some issues.

the issue

I may have found a bug where Selenium cannot load web pages when it is run from a Flask app.

replicating it

I made the test really simple to check what the issue could be:

It will just try to load https://www.google.co.uk/ and then save a screenshot of what is there. Every time it saves a blank white screenshot (no matter what the web address is).

When I run exactly the same test code from a console (still on PythonAnywhere) it works fine.

somebody else already encountered it

Here we see a thread about somebody likely running into the same issue; however, they just ended up solving their issue by avoiding using selenium (something which isn't an option for me).

I would much appreciate it if somebody could take a look and see what's going on. I'm new to using Selenium and Flask as of this weekend, so it may just be an issue I'm causing.

Many thanks, Tom

Are you sure that you're not just taking the screenshot while the page is still loading?

Thanks Glenn

I tried adding in a delay of 100 seconds between loading the webpage and taking the screenshot to be safe. The error still occurred.

Any other ideas please?

What code are you using to start and run Selenium? I just tested with the following code, and it saved the screenshot as expected:

from flask import Flask
from pyvirtualdisplay import Display
from selenium import webdriver

app = Flask(__name__)

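@app.route('/')
def hello_world():
    # Start a virtual display so Firefox can run on a server with no screen.
    with Display():
        browser = webdriver.Firefox()

        try:
            browser.get('http://www.google.com')
            browser.get_screenshot_as_file("/tmp/screenshot.png")
            return browser.title
        finally:
            # Always shut the browser down, even if the page load fails.
            browser.quit()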

Just in case, I tried the same code but using www.google.co.uk in case it was something to do with the specific regional version of Google involved, but I still got a screenshot.

Hi Giles,

I tried using exactly that code and was unable to get it working... you can see a video of what happened here.

Basically after a few "Internal Server Error"s it ran but the screenshot was still just blank white.

Thanks, Tom

EDIT: I did change your code slightly so I could see if it was returning anything at all:

return("returned: " + browser.title)

That is very strange. I see you're using Python 3.6, and I was using 3.7, but that shouldn't affect anything. Is there any chance you might have your own version of Firefox installed somewhere in your PythonAnywhere account?

That's very unlikely; I only really started using my account for this app this last weekend.

I tried using Python 3.7 and nothing changed, unfortunately.

Can we take a look at your files? We can see them from our admin interface, but we always ask for permission first.

Of course you can, thanks guys!

Interesting -- I couldn't see anything in your account that might cause problems like this, so I created a fresh account, upgraded it to the same plan as the one you're on, and ran the code there. I got the same behaviour as you are seeing, which suggests that my own account is the unusual one for the code to work on, rather than there being anything unusual about yours. I'll keep investigating and post back here when I have more of a handle on the cause.

Great to hear the progress, thanks so much for looking into this. I'll keep an eye out for any replies.

Phew! Well, that was a debugging session and a half :-) After numerous false starts trying to work out what differed between my own account and yours (tested and rejected hypotheses ranged from "it only works for sysadmins" to "it only works for people whose numerical user IDs are less than 65536"), we finally worked out that it was related to the number of websites associated with your account.

When we configure your setup on a web server, we limit the number of processes you can start -- the limit is pretty high, it's really just to stop people from crashing a server by creating a website that tries to start millions of subprocesses very quickly -- eg. a fork-bomb.

The process limit is based on the number of worker processes you have, and -- because you could have multiple websites running on the same server -- the total number of websites that you can create with your account.

My own account has a larger number of possible websites associated with it, so the process limit was higher. When I bumped up the number of possible websites for my fresh test account (which was configured with exactly the same account settings as yours), it suddenly started working.

I think what must be happening is that when Firefox starts up, it spins up a bunch of threads. From the operating system perspective, a thread is just a different kind of process, so the same limits apply. If it fails to start the threads, sometimes it starts up, but when Selenium talks to it, it just says "OK, that's done" without actually doing what it was asked to do -- and other times, it just crashes when starting up, so Selenium errors out. That explains the two situations you were seeing, where sometimes you got an empty title string and a blank screenshot, and other times you got an internal server error.
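For anyone who wants to check this kind of limit from inside their own code, here is a minimal sketch. It assumes the limit is enforced as the standard per-user rlimit (RLIMIT_NPROC), which this thread does not confirm, so treat the numbers as a hint rather than the definitive value. The function name is made up for illustration. On Linux, that limit counts threads as well as processes, which fits the explanation above.

```python
import resource


def describe_process_limit():
    """Return the soft and hard RLIMIT_NPROC values as strings.

    Assumption: the hosting limit shows up as RLIMIT_NPROC. If it is
    enforced some other way, these values will not reflect it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(value):
        # RLIM_INFINITY means no limit is set at this level.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    print(describe_process_limit())
```

Calling this from a Flask view and from a console would show whether the two environments run under different limits.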

Anyway, I've pushed a change live for the specific web server where your website runs to bump up the number of allowable processes. I reloaded your website (required to make that take effect) and it now looks like it works. Could you check and confirm? It would also be useful to know if this fixes the problems that you were originally having when you started this forum thread.

Amazing!

Thank you so much for all the work you've put into fixing that for me! I can confirm it's working now: sites load properly through Selenium and Flask!

more amazing news

My program now works flawlessly!

While I've got you here: basically it scrapes the uni website, gets other students' timetable details, and turns them into a Google calendar which it emails them a link to... it could be about to receive a load of traffic... might I need to upgrade my account?

I'll see how it goes - it might not be very popular, who knows

Overall just thanks a bunch, I've been hugely impressed with the way you guys have dealt with this :D

Excellent! Many thanks for confirming that, and indeed for making us aware of the problem originally :-) We'll roll out the change across all of our web servers so that no-one else gets caught in the same trap.

Re: upgrading your account -- the thing to look out for is the number of worker processes (which is something you can customize on the "Account" page). Basically, as requests come in for pages on your site, they're put into a queue. Worker processes pull requests off the queue one at a time, process them, return the result, and then move on to the next request (or wait for a request to appear if the queue is empty).

What that means is that if your site takes, say, 0.5 seconds to process a request, and you have two workers, you can handle 4 hits a second. If, instead, it takes 0.1 seconds to process a request, then it can handle 20 hits/second. If in the second case, you were to add on an extra worker, then you'd be able to handle 30 hits/second -- and so on.

If your site is doing Selenium stuff on every hit, then on average it will take quite a while to process requests -- so you might need quite a lot of workers to handle them. When I was using the simplified version of the site that we were talking about earlier, it took about 3 seconds to handle each request -- which would mean that if all views were like that, then with two workers, you could only handle a request every 1.5 seconds -- that is, roughly 0.67 requests/second. To handle lots of traffic, you'd need lots of workers.
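The arithmetic above can be sketched as a tiny helper. This is just the back-of-envelope model from this post (each worker handles one request at a time, and every request takes the same time); the function name is made up for illustration, and real traffic will be burstier than this.

```python
def max_requests_per_second(workers, seconds_per_request):
    """Upper bound on throughput if every request takes the same time."""
    if workers < 1 or seconds_per_request <= 0:
        raise ValueError("need at least one worker and a positive request time")
    # Each worker completes 1 / seconds_per_request requests every second.
    return workers / seconds_per_request


# The cases discussed above: (workers, seconds per request).
for workers, seconds in [(2, 0.5), (2, 0.1), (3, 0.1), (2, 3)]:
    rate = max_requests_per_second(workers, seconds)
    print(f"{workers} workers at {seconds}s/request: {rate:.2f} requests/second")
```

This prints 4.00, 20.00, 30.00 and 0.67 requests/second for the four cases.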

Of course, if most of your views are super-simple and there are only one or two that need to do Selenium stuff, then while those specific views might be slow, the average might be much lower than 3 seconds/request, so it wouldn't be a problem.

One thing to consider -- maybe there's some way you could move some or all of the Selenium stuff out of the website and into some other kind of code? For example, if students' timetables only need to be scraped once a day, you could do that in a scheduled task, and then put the results in a database or something like that -- then the website itself could just use the data from the DB.
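A rough sketch of that split, assuming SQLite as the database and a made-up table layout (the table, column and function names here are all hypothetical). The scheduled task would do the slow Selenium scrape and call save_timetable; the website's view would only call load_timetable, which is fast.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per student, events stored as JSON text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timetables ("
        "student_id TEXT PRIMARY KEY, "
        "events_json TEXT NOT NULL, "
        "scraped_at TEXT NOT NULL)"
    )
    conn.commit()


def save_timetable(conn, student_id, events):
    """Called from the scheduled task after the Selenium scrape finishes."""
    conn.execute(
        "INSERT OR REPLACE INTO timetables (student_id, events_json, scraped_at) "
        "VALUES (?, ?, datetime('now'))",
        (student_id, json.dumps(events)),
    )
    conn.commit()


def load_timetable(conn, student_id):
    """Called from the website's view; returns None if nothing is stored yet."""
    row = conn.execute(
        "SELECT events_json FROM timetables WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path in real use
    init_db(conn)
    save_timetable(conn, "s123", [{"title": "Maths", "start": "09:00"}])
    print(load_timetable(conn, "s123"))
```

With this shape, a worker is only tied up for a quick database read, so the 3 seconds/request cost moves out of the website entirely.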

We have the same problem. Can I solve it somehow?

Is your Selenium code running in your website's views? If so, it may be the same problem, but if it's elsewhere -- say in a console, or in a scheduled/always-on task -- then it's likely to be something different.

Yes, the Selenium code is executed in the web application. It does very little, but the very first launch returns nothing, just a blank page.

OK -- I've rolled the same fix out to the web server where your site runs. Could you try again now and see if it's fixed?

Yes, all good. Problem solved. What should I do if this problem appears again? Or should it not come back?

Excellent! Thanks for confirming that.

We're in the process of writing a full fix for this -- the one that's live is experimental -- and our next system update will contain that. In the meantime, if you see the problem again, just let us know.

Thanks for your work. I haven't had a reply to my feedback message yet, but here on the forum you solve problems quickly.

It looks like our reply to your feedback bounced because the email associated with your account is invalid.