urllib2 web2py timeout problem with some websites

I have a script that I can run from my pythonanywhere console and it executes this code perfectly and I'm able to process the website. However, whenever I try to execute this code from my web2py default.py controller it times out. If I don't add the timeout then it hangs forever. If I change the website to www.google.com it works. Any ideas why it works in a script and not in web2py?

html = urllib2.urlopen('http://www.grocerysmarts.com/utah/lists/indexg84cac.php?m84ac2', timeout=10).read()

Thanks!

That's weird. Have you tried reloading your app from the "Web" tab?

Yes. I've tried that about 20 times.

That's very odd. You have a paid account so you shouldn't be going through the proxy (unless you added specific http_proxy environment variables etc. to your code at some stage?). And it seems unlikely that the site you're going to has blacklisted the IP of the web server you're running on specifically. What happens if you try to get the front page of the same site, i.e. http://www.grocerysmarts.com/? And if that doesn't work, how about a different non-whitelisted site -- say, http://www.wikipedia.com/?
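One way to rule out stray proxy settings is to list any proxy-related environment variables from inside the web app itself. The following is only a minimal sketch: the helper name proxy_settings is hypothetical, and it assumes that "proxy settings" means variables whose names end in _proxy (http_proxy, HTTPS_PROXY, no_proxy and so on).

```python
import os


def proxy_settings(environ=None):
    """Return the proxy-related variables found in an environment mapping.

    Hypothetical helper for illustration: it matches any variable whose
    name ends in "_proxy", ignoring case.
    """
    if environ is None:
        environ = os.environ
    return {name: value for name, value in environ.items()
            if name.lower().endswith('_proxy')}
```

Calling proxy_settings() from the controller and returning the result in the page should give an empty dict if nothing has been set. The standard library's own view of the proxies is available from urllib.getproxies() on Python 2 (urllib.request.getproxies() on Python 3).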

It does not work if I use http://www.grocerysmarts.com/, but it works fine if I use http://www.wikipedia.com/. I ran my script again from the pythonanywhere console and it works fine using http://www.grocerysmarts.com/. Is there a different whitelist for the webserver? I have not added any http_proxy environment variables. All I'm doing is trying to run the exact same code from my script (that works) in my controller. I've dumbed it down to a minimal amount of code.

This works fine:

import urllib2
def index():
    html = urllib2.urlopen('http://www.wikipedia.com/', timeout=10).read()

This works when run as a Python script in the console (with the def index() removed), but it does not work in the web2py controller:

import urllib2
def index():
    html = urllib2.urlopen('http://www.grocerysmarts.com/', timeout=10).read()
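To make the comparison between sites easier to read, the fetch can be wrapped so that a timeout and other failures come back as short status strings instead of exceptions. This is a rough sketch, not the only way to do it: check_url and its opener parameter are hypothetical names, and the import falls back to the Python 3 module names when urllib2 is not available.

```python
import socket

try:  # Python 2, as used in this thread
    from urllib2 import urlopen, URLError
except ImportError:  # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import URLError


def check_url(url, timeout=10, opener=urlopen):
    """Fetch url and describe the outcome as a short string.

    opener is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urlopen.
    """
    try:
        body = opener(url, timeout=timeout).read()
        return 'ok (%d bytes)' % len(body)
    except socket.timeout:
        # Raised directly when the timeout hits while reading the response.
        return 'timed out'
    except URLError as exc:
        # A timeout while connecting is usually wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return 'timed out'
        return 'failed: %s' % exc.reason
```

In the controller, something like return dict(status=check_url('http://www.grocerysmarts.com/')) would then show whether the request timed out or failed for another reason, and the same call can be run from the console for comparison.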

That's really weird. I've just checked all of the web and console servers in our cluster, and on each one I ran wget http://www.grocerysmarts.com/. On every single one apart from one web server, it worked fine. On the one web server where it failed, it just hung. Which sounds exactly like the problem you're seeing.

My best guess is that they've blocked the IP address on that web server using some kind of security software that just leaves connections open. Did this suddenly start going wrong after you'd had your web app running for a while? Or has it never worked from your web app?

It has never worked from my web2py app. The first time I tried to use the code in web2py was last night. Anyway at least I'm not going crazy. It's possible that someone else had a service constantly pinging their site and maybe that's why it's blocked. I'm sure they watch for that kind of activity. I was only planning on pinging their site once each day in order to look for good food deals.

That would make sense. In fact, it might not even have been another PythonAnywhere user -- as we're ultimately hosted on Amazon AWS, it could be that someone who previously used the IP address was the culprit, and we were just unlucky enough to inherit it in this particular cluster. So that will change next time we deploy a new cluster, but that might be a week or more away.

In the meantime, perhaps one of us could get in touch with the site and see if they're willing to unblock the IP? I don't know whether it would be more effective coming from us at PythonAnywhere or from you. I'm happy to ask if you prefer, or if you want to ask I can give you the external IP address of the web server. (It's not the same as the internal one you'd get by pinging it, due to the load-balancer.)

Let me know.

It's not a big deal and I wouldn't worry about it. Thanks for all your help.

OK. Thanks for getting in touch.