Errno socket while running a Spider with urllib

Hi.

I've developed a scraper using urllib and BeautifulSoup.

If I run my script on my local machine it runs with no problem, but if I run it from a PythonAnywhere Bash console it throws [Errno socket error] [Errno 111] at different steps (never at the same one).

Is there any issue with running this kind of script on PythonAnywhere?

Thanks in advance

See http://help.pythonanywhere.com/pages/403ForbiddenError/

Thanks a lot for the response.

I am indeed a paid user, and if I understood the link you provided, paid users shouldn't see that Errno socket error because we have unrestricted access.

Here is a brief extract of my urllib and BeautifulSoup calls.

    for prod in producto:
        busca=urlopen(sitio + prod)
        bsObj=BeautifulSoup(busca.read(), "lxml")
        nombres = bsObj.findAll("a", {"class": "product__list--name"})
        precios = bsObj.findAll("div", {"class": "product__listing--price price-colour-final"})

Am I doing something wrong?

You're right. The connection error is not because of the whitelist.

There's no real way that I can debug your spider. The issue is most likely between your code and the sites you're accessing. Perhaps they're detecting your crawling as abuse and blocking you, or you're not closing your connections cleanly, or something like that.
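On the connection cleanup point, here is a minimal sketch of what closing each response explicitly could look like. It assumes Python 3 (`urllib.request`); `fetch_html` and its `opener` parameter are hypothetical names used only for illustration, and I can't say whether this is what's actually causing your Errno 111 (connection refused).

```python
from contextlib import closing
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=30):
    """Return the response body as bytes and always close the connection.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    # closing() calls .close() on the response even if read() raises,
    # so sockets are not left open between loop iterations.
    with closing(opener(url, timeout=timeout)) as response:
        return response.read()
```

In your loop this would replace the bare `urlopen(sitio + prod)` call, for example `bsObj = BeautifulSoup(fetch_html(sitio + prod), "lxml")`.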

Ummmm, yes, I've noticed you're right.

It's just one pharmacy, "Farmacias del Ahorro", that is throwing that error (I'm collecting public prices for some drugs from several pharmacies in Mexico).

Just 3 final questions:

1) When I run the script on my local machine everything is OK. What's the difference between running it on PythonAnywhere and running it locally?

2) The "Errno socket" error is thrown randomly. I mean, it never stops at the same place (i.e. the same drug price). Why is that?

3) Is there a way to use another proxy to get around firewalls when running from PythonAnywhere?

Thanks a lot

The first two questions have pretty much the same answer: I have no way of knowing what their criteria are or what's actually going on. Maybe you're making requests faster on PythonAnywhere, maybe they only care about IPs that are in the AWS range, maybe you don't have that pharmacy in your list when you're running it locally.

The third: Yes, but we're not going to help you to abuse a site. Try to be more polite and see if that helps.

Yeah, I understand.

It's weird, because if I run the same commands from the PythonAnywhere Bash console (step by step) there's no trouble at all :S

Probably because doing it step by step slows it down to the point where they don't block you.
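If slowing down is what makes the difference, a rough sketch of a more polite fetch might look like the following. It assumes Python 3; `fetch_politely`, the 2-second default delay and the retry count are all made-up illustrative values, not anything I know about that site's limits.

```python
import time
from urllib.request import urlopen


def fetch_politely(url, opener=urlopen, delay=2.0, retries=3, sleep=time.sleep):
    """Fetch url, pausing before each attempt and waiting longer after errors.

    `opener` and `sleep` are hypothetical hooks so the function can be
    tried out without real network calls or real waiting.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        # Pause before every request; the pause grows after each failure.
        sleep(delay * (attempt + 1))
        try:
            response = opener(url, timeout=30)
            try:
                return response.read()
            finally:
                # Always release the socket, even if read() fails.
                response.close()
        except OSError as error:
            # In Python 3, URLError and ConnectionRefusedError (Errno 111)
            # are both subclasses of OSError.
            last_error = error
    raise last_error
```

Calling `fetch_politely(sitio + prod)` inside the loop spaces the requests out, which is roughly what typing the commands step by step does by hand.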