Error 503 when getting Amazon with Requests

I'm trying to use the Requests module to do web scraping on Amazon, but I get a 503 error with the code below. Why can't I access Amazon? Thanks in advance!

>>> import requests
>>> page = requests.get('https://www.amazon.es/dp/B077YLDC5N')
>>> page
<Response [503]>

A 503 status code normally means "Service unavailable" -- it's returned by a server when it's overloaded. But that would be a pretty weird thing to happen from Amazon -- I'm sure they have enough server capacity to handle pretty much everything!

I tried running the code from my own account just now and got a <Response [200]> -- is the problem you're seeing intermittent?
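One way to check whether the failure is intermittent is to repeat the request a few times and tally the status codes. Below is a minimal sketch, assuming a hypothetical helper called `tally_statuses` (it is not part of Requests) that accepts any zero-argument callable returning a status code:

```python
import time
from collections import Counter


def tally_statuses(fetch_status, attempts=5, delay=0.0):
    """Call fetch_status() `attempts` times and count each status code seen.

    fetch_status is any zero-argument callable returning an int status code.
    The helper name and its parameters are illustrative, not a Requests API.
    """
    counts = Counter()
    for i in range(attempts):
        counts[fetch_status()] += 1
        if delay and i < attempts - 1:
            time.sleep(delay)  # pause between attempts so the server is not hammered
    return counts


# Example use with Requests (not run here):
#   import requests
#   url = "https://www.amazon.es/dp/B077YLDC5N"
#   print(tally_statuses(lambda: requests.get(url).status_code, attempts=5, delay=10))
```

A mix of 200 and 503 in the result would suggest the problem comes and goes rather than being a permanent block.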

I get this error too, but only from PythonAnywhere, and only some of the time within an hour.

Just to double-check: it's a 503 error code? Could you print out the response body to get more details on that error?
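Printing the response details could look something like the sketch below. The helper name `describe_response` is made up for illustration. It only assumes the object has `.status_code`, `.headers` and `.text`, which a `requests.Response` does:

```python
def describe_response(response, max_chars=500):
    """Return a short text summary of a Requests-style response.

    Works with any object exposing .status_code, .headers and .text.
    The helper itself is illustrative, not part of Requests.
    """
    lines = [f"status: {response.status_code}"]
    for name, value in response.headers.items():
        lines.append(f"{name}: {value}")
    lines.append("")  # blank line between headers and body
    lines.append(response.text[:max_chars])  # only the start of the body
    return "\n".join(lines)


# Example use with Requests (not run here):
#   import requests
#   page = requests.get("https://www.amazon.es/dp/B077YLDC5N")
#   print(describe_response(page))
```

The headers and the first part of the body often say more than the bare status code, for example whether the page is a generic error page or a message aimed at automated clients.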

Yes, it's a 503 error code: Service Unavailable.

Hmm, my only guess is that Amazon does not like others scraping their site, so they return that error code when they detect it.

Could you trace it? I have no issues from another site.

Do you mean that you have been running the same code on other cloud platforms? Or do you mean that you were able to access other websites?

From my PC.

Not at the same time...

That is possibly because Amazon only blocks IPs that it recognizes as belonging to cloud platforms and that are systematically scraping its site.

Try the code below

import requests

# Send a browser-like User-Agent instead of the default python-requests one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
}

page = requests.get("https://www.amazon.es/dp/B077YLDC5N", headers=headers)

print(page)

[edited by admin: code formatting]

Is there a way to use headers in Selenium? Is SeleniumWire supported?

I was just trying to learn Selenium to programmatically search Amazon, and I think all my requests are now getting blocked.

Yes, there's an option to set the user-agent header with Chrome in Selenium:

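from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--user-agent=something")
# Pass `options=`; the older `chrome_options=` keyword was deprecated and later removed
browser = webdriver.Chrome(options=options)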

Does anyone have an update on how to scrape Amazon from cloud platforms like PythonAnywhere, or any other?

Amazon does not want to be scraped from known headless machines, like the Amazon Web Services machines that PythonAnywhere runs on.