How to run scrapy with splash on PythonAnywhere

I would like to run Scrapy with Splash on PythonAnywhere. I have successfully installed Scrapy itself.

Is it possible to install Docker on PythonAnywhere? That is how I installed Splash on my own machine, and it is the recommended way to install and run Splash. I haven't been able to find any information about running Docker on PythonAnywhere, so I haven't succeeded in installing Splash that way.

Instead, I tried installing Splash manually, but it doesn't work. The installation itself went fine (pip install splash), but I cannot start Splash. See the error message below.

(scrapy36) 18:58 ~/gds/gds $ python3 -m splash.server
Traceback (most recent call last):
  File "/usr/lib/python3.6/", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/", line 85, in _run_code
    exec(code, run_globals)
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/splash/", line 11, in <module>
    from splash.qtutils import init_qt_app
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/splash/", line 15, in <module>
    from PyQt5.QtWebKit import QWebSettings
ModuleNotFoundError: No module named 'PyQt5.QtWebKit'
(scrapy36) 18:58 ~/gds/gds $

No, unfortunately you cannot run your own Docker images on PythonAnywhere. What does Splash do? My impression of PyQt is that it is a GUI toolkit, which would be meaningless in a headless server environment.

From the Splash documentation:

Splash is a javascript rendering service. It’s a lightweight web browser with an HTTP API, implemented in Python 3 using Twisted and QT5. The (twisted) QT reactor is used to make the service fully asynchronous allowing to take advantage of webkit concurrency via QT main loop.

The websites I need to scrape are heavily JavaScript-based, so vanilla Scrapy won't do the job. I need Splash to render the sites into HTML the same way my Chrome browser does.

I am not very experienced at this, so I cannot explain why Splash needs PyQt.
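For context on how Splash is normally used: it exposes its renderer over an HTTP API, so a scraper simply requests a rendered page from the Splash server. A minimal sketch of building such a request URL (the localhost:8050 address is Splash's default port, and the `wait` parameter is the standard "let JavaScript run for N seconds" option; neither comes from this thread):

```python
from urllib.parse import urlencode

def splash_render_url(splash_base, target_url, wait=2.0):
    """Build the URL for Splash's render.html endpoint, which returns
    the page's HTML after JavaScript has executed."""
    params = urlencode({"url": target_url, "wait": wait})
    return "{}/render.html?{}".format(splash_base, params)

# Example, assuming a Splash instance on its default port:
print(splash_render_url("http://localhost:8050", "https://example.com"))
```

Fetching that URL with any HTTP client would return the rendered HTML, which is why a running Splash server (and hence Docker, in the recommended setup) matters.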

You can try using Scrapy with the Firefox that is already installed on PythonAnywhere.

Ok, I gave up on Splash even though it seems really nice. Instead, I tried Selenium with the Firefox driver. The example code worked fine:

from pyvirtualdisplay import Display
from selenium import webdriver

with Display():
    # we can now start Firefox and it will run inside the virtual display
    browser = webdriver.Firefox()

    # put the rest of our selenium code in a try/finally
    # to make sure we always clean up at the end
    try:
        browser.get("https://www.google.com")
        print(browser.title)  # this should print "Google"
    finally:
        browser.quit()


When I run it, I get:

(scrapy36) 19:17 ~/cbb $ python 
(scrapy36) 19:21 ~/cbb $

However, if I change the browser.get line to load the website I would like to scrape instead, it raises an exception:

(scrapy36) 19:21 ~/cbb $ python 
Traceback (most recent call last):
  File "", line 12, in <module>
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/selenium/webdriver/remote/", line 248, in get
    self.execute(Command.GET, {'url': url})
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/selenium/webdriver/remote/", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/selenium/webdriver/remote/", line 401, in execute
    return self._request(command_info[0], url, body=data)
  File "/home/robinhat/.virtualenvs/scrapy36/lib/python3.6/site-packages/selenium/webdriver/remote/", line 433, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python3.6/http/", line 1331, in getresponse
  File "/usr/lib/python3.6/http/", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
(scrapy36) 19:23 ~/cbb $

I can't figure out why this happens on this particular site. Other sites (including https:// sites) work fine. Is it because PythonAnywhere has an old version of Firefox that cannot handle the site? What can I do to solve my problem?

It does look a bit like the browser is crashing, or is not able to respond to Selenium, when it visits that site.
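One way to tell a transient hiccup apart from a reproducible crash: a RemoteDisconnected from Selenium's HTTP client can be either. A small retry wrapper is a quick diagnostic; this is only a sketch, where `fetch` stands in for something like `browser.get` (an assumption for illustration, not code from this thread):

```python
import http.client

def get_with_retry(fetch, url, retries=3):
    """Call fetch(url), retrying when the browser's remote end drops
    the connection. Re-raises after the last attempt, so a
    reproducible crash still surfaces as an error."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except http.client.RemoteDisconnected:
            if attempt == retries - 1:
                raise
```

If every attempt fails on the same site, the browser itself is most likely crashing on that page, rather than the connection being flaky.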

Was there ever a resolution to this topic?

In general, the Scrapy + Selenium solution works fine. I don't think there was a resolution for the problems specific to that site.
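For anyone landing here later, the usual shape of the Scrapy + Selenium combination is: let Selenium render the page, then extract data from the resulting HTML. A minimal sketch of the extraction half using only the standard library (the TitleExtractor class and the sample HTML are illustrative, not from this thread; a real spider would use Scrapy's selectors on `browser.page_source`):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of every <h2> element from rendered HTML --
    the kind of extraction a spider would do with CSS/XPath selectors."""
    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.titles.append(data.strip())

# In the real setup this HTML would come from browser.page_source
# after Selenium has executed the page's JavaScript.
html = "<div><h2>First story</h2><p>...</p><h2>Second story</h2></div>"
parser = TitleExtractor()
parser.feed(html)
print(parser.titles)  # -> ['First story', 'Second story']
```

The key point is that once Selenium has done the rendering, the parsing side is ordinary static-HTML scraping.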