Wrong "Scrapy" when running python file via flask and server.py

Hi, I googled and googled but couldn't come up with a solution, so I wanted to ask you for help. I think it's related to the Scrapy module, which I am not allowed to pip uninstall.

Background information

I am running a flask web app at kulturdata.pythonanywhere.com/audit where I use the great pip package "advertools" to run some web scraping tasks.

If I run the web scraping file on its own in my virtualenv, it works just fine. But when I import that same module from my Flask server.py file and run a scrape, it raises an error.

import flask
from flask import request, render_template, redirect, url_for

app = flask.Flask(__name__, instance_relative_config=True,
                  template_folder='../frontend', static_folder='../frontend')

@app.route('/audit', methods=['GET', 'POST'])
def audit():
    import advertools as adv
    # Crawl the site and write the results to try.jl
    adv.crawl("https://www.muenchenmusik.de", "try.jl")
    return "Crawl finished"  # a Flask view must return a response

The traceback says:

2021-05-03 11:40:34 2021-05-03 11:40:34 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: scrapybot)
2021-05-03 11:40:34 2021-05-03 11:40:34 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2 2.9.8, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 2.7.12 (default, Oct  8 2019, 14:14:10) - [GCC 5.4.0 20160609], pyOpenSSL 19.0.0 (OpenSSL 1.1.1d  10 Sep 2019), cryptography 2.8, Platform Linux-5.4.0-1029-aws-x86_64-with-Ubuntu-16.04-xenial

2021-05-03 11:40:34 Traceback (most recent call last):
2021-05-03 11:40:34   File "/usr/local/bin/scrapy", line 8, in <module>
2021-05-03 11:40:34     sys.exit(execute())
2021-05-03 11:40:34   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 146, in execute
2021-05-03 11:40:34     _run_print_help(parser, _run_command, cmd, args, opts)
2021-05-03 11:40:34   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 100, in _run_print_help
2021-05-03 11:40:34     func(*a, **kw)
2021-05-03 11:40:34   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 154, in _run_command
2021-05-03 11:40:34     cmd.run(args, opts)
2021-05-03 11:40:34   File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/runspider.py", line 79, in run
2021-05-03 11:40:34     module = _import_file(filename)
2021-05-03 11:40:34   File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/runspider.py", line 21, in _import_file
2021-05-03 11:40:34     module = import_module(fname)
2021-05-03 11:40:34   File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
2021-05-03 11:40:34     __import__(name)
2021-05-03 11:40:34   File "/home/KulturData/.virtualenvs/myvirtualenv/lib/python3.8/site-packages/advertools/spider.py", line 4
2021-05-03 11:40:34 SyntaxError: Non-ASCII character '\xf0' in file /home/KulturData/.virtualenvs/myvirtualenv/lib/python3.8/site-packages/advertools/spider.py on line 5, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I already fixed this non-ASCII (emoji) problem, but it's not the underlying problem: it can't read f-strings either (because of Python 2.7, I think).

I think it's running the wrong version of Scrapy (/usr/local/bin/scrapy), although I have installed it via pip at

I tried to uninstall it, but that wasn't allowed. I don't know why it only picks the wrong scrapy path when I run it via the server file. This is the path in my virtualenv:

(myvirtualenv) 11:50 ~/website (main)$ which scrapy
/home/KulturData/.virtualenvs/myvirtualenv/bin/scrapy
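
The same lookup can be done from Python, which makes it possible to compare what the console and the web app each resolve. This is only a minimal sketch, assuming the crawl is started as an external `scrapy` command found via `PATH` (which the traceback above suggests). `find_command` is a hypothetical helper name, not part of Flask or advertools.

```python
import os
import shutil
import sys


def find_command(name, search_path=None):
    """Return the full path that `name` would resolve to, or None.

    `search_path` defaults to the PATH of the current process.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Run this once in the console and once from inside the web app
    # (for example by logging the values) and compare the results.
    print("interpreter:", sys.executable)
    print("scrapy:", find_command("scrapy"))
```

If the two environments report different paths for `scrapy`, the difference most likely comes from their `PATH` values rather than from the installed packages.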

And that is my WSGI configuration file:

import sys
path = '/home/KulturData/website'
if path not in sys.path:
    sys.path.append(path)
#
from backend.server import app as application

I would appreciate any help you can give me.

I think you're right: your website is picking up the Python 2.7 version of scrapy. I'd try changing the PATH environment variable inside your WSGI file by adding code like this before you import app:

os.environ["PATH"] += "/home/KulturData/.virtualenvs/myvirtualenv/bin/scrapy" + os.pathsep

(You may need to import os as well, of course.)

Hi Giles, thanks for your input! Unfortunately, it returns the same error. But I'm not that experienced with changing a system path. My new code looks like this:

import os
import sys

path = '/home/KulturData/website'
if path not in sys.path:
    sys.path.append(path)

os.environ["PATH"] += "/home/KulturData/.virtualenvs/myvirtualenv/bin/scrapy" + os.pathsep
from backend.server import app as application

I tested it and if I print(os.environ["PATH"]) it outputs:

/home/KulturData/.local/bin:/usr/local/bin:/usr/bin:/bin/home/KulturData/.virtualenvs/myvirtualenv/bin/scrapy

Is that what it should look like :) ?
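
Splitting that value on the separator shows what a command lookup actually sees. A small sketch: the string is copied from the output above, and `:` is the PATH separator on Linux.

```python
# PATH value from the print() output above, copied verbatim.
path_value = (
    "/home/KulturData/.local/bin:/usr/local/bin:/usr/bin:"
    "/bin/home/KulturData/.virtualenvs/myvirtualenv/bin/scrapy"
)

# Each entry between separators is treated as one directory to search.
entries = path_value.split(":")
for entry in entries:
    print(entry)
```

The last entry is `/bin` fused with the appended text, because no separator was placed between them, and `/usr/local/bin` is still searched before it.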

Ah, sorry! I made a mistake in my last post. It should have been this:

os.environ["PATH"] = "/home/KulturData/.virtualenvs/myvirtualenv/bin/" + os.pathsep + os.environ["PATH"]

-- I've switched things around so that the directory in your virtualenv comes first, and so that it doesn't have the "scrapy" in it, both of which were wrong.
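
For reuse, the corrected line can be wrapped in a small helper that also avoids adding the directory twice if the WSGI file is reloaded. This is a minimal sketch, not the only way to do it. `prepend_to_path` is a hypothetical helper name, and the virtualenv path is the one from this thread.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Put `directory` at the front of PATH so its executables are found first."""
    current = environ.get("PATH", "")
    # Remove an existing copy of the directory (exact string match only).
    entries = [e for e in current.split(os.pathsep) if e != directory] if current else []
    environ["PATH"] = os.pathsep.join([directory] + entries)
    return environ["PATH"]


if __name__ == "__main__":
    # A copy of the environment is used here so the demo has no side effects.
    env = dict(os.environ)
    print(prepend_to_path("/home/KulturData/.virtualenvs/myvirtualenv/bin", env))
```

In a WSGI file, the call would go before the `from backend.server import app as application` line, so the new `PATH` is in place before any crawl is started.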

OMG, it works! I can't thank you enough Giles! That's so great. This bug ruined my last two days :D

Excellent, really glad to hear it :-)