Nothing to display using pyvirtualdisplay

I upgraded my account to a Hacker account (a paid account).

I followed the steps on the help page below, but I still get no result. What can I do?

https://help.pythonanywhere.com/pages/selenium/

from pyvirtualdisplay import Display
from selenium import webdriver

with Display():
    # we can now start Firefox and it will run inside the virtual display
    browser = webdriver.Firefox()

    # put the rest of our selenium code in a try/finally
    # to make sure we always clean up at the end
    try:
        browser.get('http://www.google.com')
        print(browser.title) #this should print "Google"

    finally:
        browser.quit()

I'm using Python 3.6 with the packages below.

beautifulsoup4 (4.6.0)
bs4 (0.0.1)
certifi (2017.11.5)
chardet (3.0.4)
Django (2.0)
EasyProcess (0.2.3)
idna (2.6)
lxml (4.1.1)
numpy (1.13.3)
pandas (0.21.0)
pip (9.0.1)
python-dateutil (2.6.1)
pytz (2017.3)
PyVirtualDisplay (0.2.1)
requests (2.18.4)
selenium (2.53.6)
setuptools (38.2.4)
six (1.11.0)
urllib3 (1.22)
wheel (0.30.0)

I think the virtual display may not be starting. How can I activate the virtual display for Selenium?

How can I check the virtual display driver?

Thank you.

And you are sure that the script ran without any errors?

I'm sure.

If I change the script as shown below, it prints the text "Hello World" to the log file.

print(browser.title) --> print("Hello World")

Out of curiosity, what happens if you try to get a page other than google.com?

The virtual display is working fine if the code has not errored by then (if there were no display, Selenium wouldn't be able to start).

Also, what Python version, Selenium version, etc. are you using?

If I click RUN directly, it shows the error message below in the console.

ModuleNotFoundError: No module named 'pyvirtualdisplay'

I changed the script as follows:

print(browser.title) --> print(str(browser.page_source))

It prints the text below:

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>

So, I can't get the web page title.

What can I do? I already installed pyvirtualdisplay using pip install.

I'm using Python 3.6, PyVirtualDisplay (0.2.1) and selenium (2.53.6).

Thank you.
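The page source printed above is an HTML document with an empty body, so the script ran but no page content came back. A small guard can flag that case explicitly instead of silently printing an empty title. This is only a sketch: the helper name is_blank_page is made up for illustration and is not part of Selenium or pyvirtualdisplay.

```python
from html.parser import HTMLParser


class _BodyContentProbe(HTMLParser):
    """Records whether <body> holds any element or non-whitespace text."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # Any tag nested inside <body> counts as content.
            self.has_content = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.has_content = True


def is_blank_page(page_source):
    """Return True if the document has no body content (hypothetical helper)."""
    probe = _BodyContentProbe()
    probe.feed(page_source)
    probe.close()
    return not probe.has_content
```

In the script it could be called as is_blank_page(browser.page_source) right after browser.get(...), raising or logging an error when it returns True.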

When I run your code above with

python 3.6. PyVirtualDisplay (0.2.1) selenium (2.53.6)

installed in a python3.6 virtualenv, the browser.title prints as Google.

If you are getting the error

ModuleNotFoundError: No module named 'pyvirtualdisplay'

It is because pyvirtualdisplay is not installed for the Python that is running your script. How are you installing the packages? And are you using the correct Python when running your script? E.g. if you installed into a virtualenv, are you using the virtualenv's Python to run the script? Or if you installed using the --user flag into, say, python3.6, are you running the script in python3.6?

And just to clarify: how are you running your script? Is it from the file editor? Or from the bash console by calling python myscript? Or as a scheduled task?
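One way to answer those questions is to have the script itself report which interpreter is running it and whether that interpreter can see the package. The sketch below uses only the standard library; describe_environment is a name made up for this example.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report the running interpreter and where (if anywhere) a module is found."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        "module": module_name,
        "found": spec is not None,
        # origin is the file the module would be loaded from, when known
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment("pyvirtualdisplay").items():
        print("{}: {}".format(key, value))
```

Running this from the file editor, from a bash console and from the web app's environment, then comparing the "executable" lines, shows whether they all use the Python that pip installed into.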

If I put it in a standalone file, it works. I now get the text "Google".

However, I can't run it with Django.

I wrote the line below for output:

print(str(browser.page_source))

It prints the text below:

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>

I need to do some web scraping and then display a simple HTML page with Django. I need to pass some parameters in the web address, scrape the web with Selenium based on them, and display the resulting HTML with Django.

What can I do?

Thank you.

That sounds strange; the code you gave earlier (with the replacement of print(browser.title) with print(str(browser.page_source))) should work fine. Can we take a look at your code? We can see it from our admin interface, but we always ask for permission first.

Just as an aside: my initial thought was that the server where your web code runs might be blocked by Google, but I was able to run equivalent code myself there -- so that's not the problem.

Hi giles,

How can I grant you permission to check?

My code is shown below:

urls.py

from django.conf.urls import url
from django.contrib import admin
import WebScraping.views

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^TestDataView/$', WebScraping.views.TestDataView.as_view(), name='TestDataView'),
]

WebScraping/views.py

from django.shortcuts import render
from django.views.generic.list import ListView
from selenium import webdriver
from pyvirtualdisplay import Display

def test():

    DataTable = ""
    data = ""

    with Display():
        # we can now start Firefox and it will run inside the virtual display
        browser = webdriver.Firefox()

        # put the rest of our selenium code in a try/finally
        # to make sure we always clean up at the end
        try:
            browser.get('http://www.google.com')
            print(browser.title) #this should print "Google"
            data = str(browser.title)

        finally:
            browser.save_screenshot('./screen.png')
            browser.quit()
            #display.stop()


        DataTable = "<table border=1>"
        DataTable += "<tr>"
        DataTable += "<td>" + data + "</td>"
        DataTable += "</tr>"
        DataTable += "</table>"


    return DataTable

class TestDataView(ListView):
    def dispatch(self, request, *args, **kwargs):

        DataTable = test()

        return render(request, 'DataTable.html',locals())

WebScraping/templates/DataTable.html

<html>
<head>
</head>
<body>
{{DataTable|safe}}
</body>
</html>

It does not show "Google".

Please help.

Thank you.

Hi there,

when I visit your site at view-source:http://kenmine.pythonanywhere.com/TestDataView/

I see some HTML that includes:

<table border=1 id='CurrectPriceData'><tr><td></td></tr></table>

but I don't see that id=CurrectPriceData anywhere in your code -- so I think maybe you're not running the code you think you are, or you're not using the template you think you are?

The second thing is: we strongly recommend against doing any kind of web scraping directly in your web app. It will be too slow. Instead, you should use some sort of async task queue. More info here: http://help.pythonanywhere.com/pages/AsyncInWebApps

Hi,

Sorry, I hadn't clicked the "Reload kenmine.pythonanywhere.com" button, so the updated source code wasn't live yet. That is why you saw "id='CurrectPriceData'".

How can I pass a parameter when using the AsyncInWebApps approach with a scheduled task?

E.g. I need to browse to http://kenmine.pythonanywhere.com/TestDataView/parameter1

Then Django will use the parameter value "parameter1" to do the web scraping and then use the template to display a reformatted web page.

E.g. if I put "parameter1" into the web address, Django will do the web scraping for "https://www.google.com.hk/search?q=parameter1"

How can a scheduled task do this?

What can I do?

Thank you.
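For the parameter part of the question: Django's url() passes named regex groups to the view as keyword arguments, so a pattern with a named group after TestDataView/ would hand "parameter1" to the view (in a class-based view it also appears in self.kwargs). The sketch below shows the same matching with the standard library only, so it runs without Django. The group name query and the function search_url_for_path are assumptions made for illustration.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern, written the way it could appear in urls.py:
#     url(r'^TestDataView/(?P<query>[^/]+)/?$', WebScraping.views.TestDataView.as_view())
ROUTE = re.compile(r'^TestDataView/(?P<query>[^/]+)/?$')

SEARCH_BASE = 'https://www.google.com.hk/search'


def search_url_for_path(path):
    """Return the search URL for a request path, or None if the path does not match."""
    match = ROUTE.match(path)
    if match is None:
        return None
    # urlencode escapes spaces and other special characters in the parameter
    return SEARCH_BASE + '?' + urlencode({'q': match.group('query')})
```

The resulting URL is what would be stored with the queued job, so the scraping itself can happen outside the web request.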

Here is the relevant section from the Async Web App help page that Harry referenced above:

  1. register the user's request for work somewhere by storing the details of the request somewhere, eg on the filesystem, or in a table in your database
  2. respond immediately to the user and let them know the request is now in state "pending"
  3. set up a Scheduled task whose job it is to monitor your task queue (eg the database), and pick jobs off one by one. Include some code to update the job status (eg, pending, under way, complete...)
  4. give the user a way of checking on the progress of the job, either by asking them to refresh the page, or perhaps setting up an Ajax polling system.
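The four steps above can be sketched with a single database table. This is a minimal illustration, not code from the help page: it assumes SQLite via the standard library, a single worker process, and made-up names (the jobs table, open_queue, add_job, claim_next_job, finish_job, job_status). In a Django project the same idea would more likely be a model.

```python
import sqlite3


def open_queue(path):
    """Open (and create if needed) the hypothetical jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " query TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    conn.commit()
    return conn


def add_job(conn, query):
    """Step 1: record the request. The view can then answer 'pending' (step 2)."""
    cur = conn.execute("INSERT INTO jobs (query) VALUES (?)", (query,))
    conn.commit()
    return cur.lastrowid


def claim_next_job(conn):
    """Step 3: take the oldest pending job and mark it 'under way'.

    Assumes one worker; several workers would need an atomic claim.
    """
    row = conn.execute(
        "SELECT id, query FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'under way' WHERE id = ?", (row[0],))
    conn.commit()
    return row


def finish_job(conn, job_id, result):
    """Step 3, continued: store the scraped result and mark the job complete."""
    conn.execute(
        "UPDATE jobs SET status = 'complete', result = ? WHERE id = ?",
        (result, job_id),
    )
    conn.commit()


def job_status(conn, job_id):
    """Step 4: what a status page or an Ajax poll would read."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

The web view would call add_job and job_status only; the Selenium code would run in the separate task, between claim_next_job and finish_job.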

Hi,

How can I set up the Async Web App approach, a scheduled task and a task queue on PythonAnywhere?

Is there an example?

I saw that scheduled tasks on PythonAnywhere can only run hourly or daily. How can I run a scheduled task every minute?

Please help.

Thank you.

We don't have any examples, but the help page that Conrad linked to above has some good guidelines about how to code it. The best way to do the processing is to use an always-on task rather than a scheduled task; they're a beta feature, but I've switched them on for your account. There's more detail about how they work in this blog post.
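Because an always-on task is a script that keeps running, the script can decide for itself how often to check for work, which covers the "every minute" question. Below is a rough sketch of such a loop. The names run_worker, fetch_job and handle_job are invented for this example, and the max_polls and sleep parameters exist only so the loop can be exercised in a test; a real task would leave them at their defaults.

```python
import time


def run_worker(fetch_job, handle_job, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll for jobs and handle each one found.

    fetch_job() returns the next job or None; handle_job(job) does the work.
    Runs forever unless max_polls is given. Returns the number of jobs
    handled without an exception.
    """
    handled = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()
        if job is None:
            # Nothing queued: wait before checking again.
            sleep(poll_seconds)
            continue
        try:
            handle_job(job)
        except Exception as exc:
            # Keep the worker alive if one job fails.
            print("job {!r} failed: {}".format(job, exc))
        else:
            handled += 1
    return handled
```

Here fetch_job would read the next pending request from wherever the web app stored it, and handle_job would run the Selenium scraping and save the result.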