
Can we request to add a new website to the proxy whitelist?

I am scraping images from Reddit and I am getting a 403 error when using the requests module. I guess this is because I am scraping from preview.redd.it.

Can you add it to the whitelist?

Reddit is already on the whitelist. If you're getting a 403, it may be coming from the service itself because you have not authenticated with it.
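One way to tell those two cases apart is sketched below. It uses the standard library's urllib.request instead of requests so it has no dependencies. A 403 sent by the remote site surfaces as an `HTTPError`. An HTTPS proxy that refuses the connection surfaces as a `URLError` wrapping an `OSError` like "Tunnel connection failed: 403 Forbidden". This assumes the outbound proxy rejects non-whitelisted hosts with a 403 at CONNECT time. The function name `classify_403` is hypothetical.

```python
import urllib.error
import urllib.request


def classify_403(exc):
    """Return 'service', 'proxy', or 'other' for an exception raised by urlopen."""
    # HTTPError subclasses URLError, so check it first: here the request
    # reached the remote site, which answered with this status code.
    if isinstance(exc, urllib.error.HTTPError):
        return "service" if exc.code == 403 else "other"
    # When an HTTPS proxy refuses the CONNECT, http.client raises an OSError
    # such as "Tunnel connection failed: 403 Forbidden", and urlopen wraps it
    # in a URLError. (Assumption: the proxy signals refusal with a 403.)
    if isinstance(exc, urllib.error.URLError) and "Tunnel connection failed: 403" in str(exc.reason):
        return "proxy"
    return "other"
```

Call it from an `except urllib.error.URLError as exc:` block around `urllib.request.urlopen(...)`. With requests, the proxy case typically shows up as `requests.exceptions.ProxyError` rather than as a response with `status_code == 403`.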

reddit.com is on the whitelist, but redd.it is not.

At some point, someone must have provided us with API documentation saying that it was an official public API. Could you give us a link to anything saying that redd.it hosts an API?

Actually, Reddit hosts its images on preview.redd.it.

So when I use the official Reddit API, the JSON it returns contains image links hosted on redd.it.

If I want to download them or do anything else with them, I need access to redd.it.

In order to whitelist the site for free users, please provide a link to Reddit's API documentation that shows redd.it as one of the public API endpoints.

I couldn't find anything about it in the official documentation, but if you are going to scrape images from any subreddit, you will get media hosted at something like "https://external-preview.redd.it".

For example, if you open https://www.reddit.com/r/funny.json and search (Ctrl+F) for "redd.it", you will find many instances of it.
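The same check can be scripted. The sketch below walks any decoded JSON listing, such as the one returned by https://www.reddit.com/r/funny.json, and collects every redd.it hostname it references. This is a quick way to see which subdomains the API currently points at. The helper names and the placeholder User-Agent string are assumptions for illustration. Reddit asks clients to send a descriptive User-Agent, but the exact value is up to you.

```python
import json
import urllib.request
from urllib.parse import urlparse


def iter_strings(node):
    """Yield every string value found anywhere in a decoded JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def redd_it_hosts(listing):
    """Return the sorted list of redd.it hostnames referenced in a listing."""
    hosts = set()
    for text in iter_strings(listing):
        if not text.startswith(("http://", "https://")):
            continue  # skip titles, ids and other non-URL strings
        host = urlparse(text).hostname or ""
        if host == "redd.it" or host.endswith(".redd.it"):
            hosts.add(host)
    return sorted(hosts)


def fetch_listing(url="https://www.reddit.com/r/funny.json"):
    """Download and decode a subreddit listing (hypothetical helper)."""
    # Placeholder User-Agent; replace it with one that identifies your script.
    req = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Running `print(redd_it_hosts(fetch_listing()))` prints the sorted hostnames found in the current listing. Walking every string, rather than following specific keys such as `preview`, keeps the sketch working even if the listing layout changes.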

OK, that looks legitimate. I have added external-preview.redd.it to the whitelist.

If you could just whitelist all subdomains of redd.it, that would be great, because they keep changing them. It's your call, though.

We try to avoid whitelisting large numbers of subdomains (for example, what if they start using something like dontscrapethis.redd.it in the future?). But if their API starts serving content from a new subdomain, just post back here.
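The difference between the two policies can be illustrated with a hypothetical check. An exact-hostname allowlist admits only the subdomains that were explicitly added. A wildcard suffix match would also admit any future subdomain such as dontscrapethis.redd.it. This is not PythonAnywhere's actual proxy code. `ALLOWED_HOSTS` and both function names are invented for the example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration only; not PythonAnywhere's real list.
ALLOWED_HOSTS = {"www.reddit.com", "external-preview.redd.it"}


def is_allowed_exact(url, allowed=ALLOWED_HOSTS):
    """Allow a URL only if its hostname was added to the list explicitly."""
    return (urlparse(url).hostname or "") in allowed


def is_allowed_wildcard(url, domain="redd.it"):
    """Allow the domain and every subdomain of it (the broader policy)."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

Matching on `"." + domain` rather than a bare `endswith(domain)` keeps unrelated hosts like evilredd.it from slipping through the wildcard check.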

Can you add preview.redd.it?

Send us a link to the API documentation that shows the URL for the API, and we'll consider it for the whitelist.

It's the same thing I mentioned earlier in this thread: https://www.pythonanywhere.com/forums/topic/13699/#id_post_54550

Copying and pasting: I couldn't find anything about it in the official documentation, but if you are going to scrape images from any subreddit, you will get media hosted at something like "https://preview.redd.it".

For example, if you open https://www.reddit.com/r/funny.json and search (Ctrl+F) for "redd.it", you will find many instances of it.

OK, thanks -- that's whitelisted now.

Thanks a lot :)