Error 503 when getting Amazon with Requests : Forums : PythonAnywhere

I'm trying to use the Requests module to do web scraping on Amazon, but I get a 503 error with the code below. Why can't I access Amazon? Thanks in advance!

>>> import requests
>>> page = requests.get('https://www.amazon.es/dp/B077YLDC5N')
>>> page
<Response [503]>

deleted-user-4590381 | 1 post | Oct. 10, 2018, 5:56 p.m. | permalink

A 503 status code normally means "Service unavailable" -- it's returned by a server when it's overloaded. But that would be a pretty weird thing to happen from Amazon -- I'm sure they have enough server capacity to handle pretty much everything!

I tried running the code from my own account just now and got a <Response [200]> -- is the problem you're seeing intermittent?
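One way to check whether it is intermittent is to repeat the request a few times with a pause in between and count the status codes that come back. This is only a rough sketch: `tally_status_codes` and `amazon_status` are hypothetical helper names, and the attempt count, delay, and timeout are arbitrary assumptions.

```python
import time
from collections import Counter


def tally_status_codes(fetch_status, attempts=5, delay_seconds=60.0):
    """Call fetch_status() several times and count the status codes returned."""
    counts = Counter()
    for attempt in range(attempts):
        counts[fetch_status()] += 1
        if attempt < attempts - 1:
            # Pause between tries so the check does not hammer the site.
            time.sleep(delay_seconds)
    return counts


def amazon_status():
    """Hypothetical fetcher: status code for the product page from this thread."""
    # Third-party import kept local so tally_status_codes has no dependencies.
    import requests

    response = requests.get("https://www.amazon.es/dp/B077YLDC5N", timeout=10)
    return response.status_code
```

Calling `tally_status_codes(amazon_status)` returns a `Counter` keyed by status code; a mix of 200 and 503 entries would suggest the problem comes and goes rather than being a permanent block.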

giles | 12095 posts | PythonAnywhere staff | Oct. 10, 2018, 6:42 p.m. | permalink

I get this error too, but only from PythonAnywhere, and only sometimes within an hour.

deleted-user-7268271 | 21 posts | May 22, 2020, 10:03 p.m. | permalink

Just to double-check: it's a 503 error code? Could you perhaps print out the response body to get more details on that error?
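Something along these lines might help with that. It is a minimal sketch, assuming a Requests-style response object; `summarize_response` and `fetch_and_summarize` are hypothetical helper names, and the headers picked out and the 500-character limit are arbitrary choices.

```python
def summarize_response(response, max_body_chars=500):
    """Return a short text summary of a Requests-style response object."""
    lines = [f"status: {response.status_code} {response.reason}"]
    # Headers that sometimes explain a 503; absent ones are simply skipped.
    for name in ("Server", "Retry-After", "Content-Type"):
        value = response.headers.get(name)
        if value is not None:
            lines.append(f"{name}: {value}")
    lines.append("body (truncated):")
    lines.append(response.text[:max_body_chars])
    return "\n".join(lines)


def fetch_and_summarize(url):
    """Fetch url with Requests and return the summary text."""
    # Third-party import kept local so summarize_response has no dependencies.
    import requests

    return summarize_response(requests.get(url, timeout=10))
```

For example, `print(fetch_and_summarize("https://www.amazon.es/dp/B077YLDC5N"))` prints the status line, any of the listed headers that are present, and the start of the body, which often says more than the status code alone.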

conrad | 4232 posts | PythonAnywhere staff | May 24, 2020, 4:51 a.m. | permalink

Yes, it's a 503 error code (Service Unavailable).

deleted-user-7268271 | 21 posts | May 24, 2020, 6:53 a.m. | permalink

Hmm, my only guess is that Amazon does not like others scraping their site, so they return that error code when they detect it.

conrad | 4232 posts | PythonAnywhere staff | May 24, 2020, 7:41 a.m. | permalink

Could you trace it? I have no issues from other sites.

deleted-user-7268271 | 21 posts | May 24, 2020, 11:28 a.m. | permalink

Do you mean that you have been running the same code on other cloud platforms? Or do you mean that you were able to access other websites?

conrad | 4232 posts | PythonAnywhere staff | May 25, 2020, 11:45 a.m. | permalink

From my PC.

deleted-user-7268271 | 21 posts | May 25, 2020, 12:29 p.m. | permalink

Not at the same time...

deleted-user-7268271 | 21 posts | May 25, 2020, 12:30 p.m. | permalink

That is possibly because Amazon only blocks IPs that it recognizes as belonging to cloud platforms and that are systematically scraping their site.

conrad | 4232 posts | PythonAnywhere staff | May 25, 2020, 2:04 p.m. | permalink

Try the code below:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
}

page = requests.get("https://www.amazon.es/dp/B077YLDC5N", headers=headers)

print(page)

[edited by admin: code formatting]

deleted-user-8125758 | 1 post | Nov. 15, 2020, 5:59 a.m. | permalink

Is there a way to use headers in Selenium? Is SeleniumWire supported?

I was just trying to learn Selenium to programmatically search Amazon, and I think all my requests are now getting blocked.

nrs250 | 13 posts | April 9, 2021, 4:28 p.m. | permalink

Yes, there's an option to set the user-agent header with Chrome in Selenium:

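from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--user-agent=something")
# Pass the options via options=; the older chrome_options= keyword was deprecated and later removed.
browser = webdriver.Chrome(options=options)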

giles | 12095 posts | PythonAnywhere staff | April 9, 2021, 5:17 p.m. | permalink

Does anyone have an update on how to scrape Amazon from cloud platforms like PythonAnywhere, or any other?

litescanpy | 1 post | Dec. 18, 2023, 8:40 a.m. | permalink

Amazon does not want to be scraped from known headless machines, such as those on Amazon Web Services, which PythonAnywhere runs on.

fjl | 4614 posts | PythonAnywhere staff | Dec. 18, 2023, 9:18 a.m. | permalink