403 error, no code change

Hi, can you explain why I would get this error (urllib.error.HTTPError: HTTP Error 403: Forbidden) in code that has been working every day for 2 to 3 months?

I am a paying customer and I have been running this code every day without a problem for 3 months. But now I get this error. What's the issue here?

The code is very simple, just the two lines below:

import pandas as pd
apple_stock = pd.read_html('http://www.nasdaq.com/symbol/aapl/historical')

And below is the error message:

import pandas as pd
apple_stock = pd.read_html('http://www.nasdaq.com/symbol/aapl/historical')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/html.py", line 874, in read_html
    parse_dates, tupleize_cols, thousands, attrs, encoding)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/html.py", line 736, in _parse
    raise_with_traceback(retained)
  File "/usr/local/lib/python3.5/dist-packages/pandas/compat/__init__.py", line 333, in raise_with_traceback
    raise exc.with_traceback(traceback)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Now, all of a sudden, the code is working again. I have not made any changes. Any explanation for this?

Hmm, next time that happens, perhaps print the body of the response?

If you do that, you will see whether it is a PythonAnywhere 403 page or a 403 returned by nasdaq.com.

That should make it clearer how to troubleshoot the problem.
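One way to print the body of a 403 response is sketched below. It is only an illustration: the helper names read_error_body and fetch are made up for this example, and it uses urllib directly instead of pd.read_html so that the error object is easy to get at.

```python
import urllib.error
import urllib.request


def read_error_body(err, limit=2000):
    """Return the first `limit` characters of the body attached to an HTTPError."""
    # HTTPError is file-like, so the error page sent by the server can be read from it.
    raw = err.read()
    return raw.decode("utf-8", errors="replace")[:limit]


def fetch(url, timeout=30):
    """Fetch `url`; on an HTTP error, print the status and body, then re-raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        print("HTTP", err.code, err.reason)
        print(read_error_body(err))
        raise
```

Calling fetch('http://www.nasdaq.com/symbol/aapl/historical') in place of the read_html call should then show the error page itself, which usually makes it obvious which server produced the 403.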

I think NASDAQ has blocked PythonAnywhere's IPs.

Looking at the output of requests.get('http://www.nasdaq.com/symbol/aapl/historical').content, I see that Nasdaq returns this page:

Access Denied

You don't have permission to access "http://failoverwaf-www.nasdaq.com/failover/outage-notification-2.html?" on this server.

Reference #18.26dc6068.1535024590.92ad602

Going to the un-mangled URL gives a Nasdaq site-down maintenance page.

However, I can also confirm that running the same request from a local machine gives a different result.

Given that this is an HTML page rather than an API, it does seem likely that Nasdaq has blocked our IPs, or is discouraging people from programmatically accessing a page that they intend only for humans.

If you are interested in obtaining approval from Nasdaq to scrape their site, feel free to contact them and quote the reference number above.
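If the block shows up again, the reference number can be pulled out of the error page automatically so it is ready to quote. This is just a rough sketch: extract_reference is a made-up helper, and the pattern assumes the "Reference #..." format seen in the page quoted above, which may change.

```python
import re

# Assumes the reference looks like dot-separated hex groups, as in the page quoted above.
REFERENCE_PATTERN = re.compile(
    r"Reference\s+#([0-9a-f]+(?:\.[0-9a-f]+)*)", re.IGNORECASE
)


def extract_reference(body):
    """Return the reference number from an 'Access Denied' page, or None if absent."""
    match = REFERENCE_PATTERN.search(body)
    return match.group(1) if match else None
```

Passing the decoded body of the 403 response to extract_reference gives back the reference string, or None when the page is something else, such as a normal result.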