
AttributeError: 'NoneType' object has no attribute 'find_all'

Hi, I have been running my webSpider locally without a problem, but when I uploaded my code to PythonAnywhere I started getting errors. I am not going to copy the full source here, but I am going to paste a prototype version of it, which also returns the same error.

My code runs fine locally and also on Google Cloud Shell.

import requests
from bs4 import BeautifulSoup


url ="https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
result = soup.find(id="mosaic-provider-jobcards")
job_elements = result.find_all("div", class_="job_seen_beacon")
print(job_elements)

Here is the error:

(jobSpider) 14:59 ~/jobSpider $ python testSoup.py
Traceback (most recent call last):
  File "/home/moodkiller2022/jobSpider/testSoup.py", line 12, in <module>
    job_elements = result.find_all("div", class_="job_seen_beacon")
AttributeError: 'NoneType' object has no attribute 'find_all'

It looks like soup.find(id="mosaic-provider-jobcards") is returning None. Perhaps you can print out the page that you're getting for soup and find out what it contains? It's possible that the site you're accessing is blocking requests from PythonAnywhere -- sites often don't like being scraped from cloud computing platforms, and it's entirely possible that they've blocked us but haven't blocked the IP of the Google Cloud shell that you're using yet.
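To make the None case concrete, here is a minimal sketch of guarding the lookup so the script reports a missing container instead of crashing. The helper name find_job_cards and the two inline HTML samples are made up for illustration; the id and class names are the ones from the code above.

```python
from bs4 import BeautifulSoup


def find_job_cards(html):
    """Return the job card elements, or an empty list if the container is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="mosaic-provider-jobcards")
    if container is None:
        # soup.find() returns None when nothing matches, so stop here
        # instead of calling find_all() on None.
        return []
    return container.find_all("div", class_="job_seen_beacon")


# Made-up sample pages so the sketch runs without network access.
real_page = (
    '<div id="mosaic-provider-jobcards">'
    '<div class="job_seen_beacon">Job A</div>'
    "</div>"
)
blocked_page = "<html><head><title>Access denied</title></head><body></body></html>"

print(len(find_job_cards(real_page)))
print(len(find_job_cards(blocked_page)))
```

In the real script you would pass page.content to the helper, and an empty result would be the cue to print the soup and see what the site actually sent back.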

Thank you for getting back to me. So there isn't any way around that?

So I tried printing the soup object to see what it contains, and I noticed this:

import requests
from bs4 import BeautifulSoup
import time

url ="https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
#result = soup.find(id="mosaic-provider-jobcards")
#job_elements = result.find_all("div", class_="job_seen_beacon")
print(soup)

Output:

(jobSpider) 15:45 ~/jobSpider $ python testSoup.py
<html>
<head>
<title>hCaptcha solve page</title>
<script async="" defer="" src="https://www.hcaptcha.com/1/api.js"></script>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
</head>
<body>
<div>
<form action="/jobs?q=Developer&amp;l=waterbury,%20CT&amp;fromage=1&amp;redirected=1" method="POST" style="margin: 80px;">
<div class="h-captcha" data-sitekey="eb27f525-f936-43b4-91e2-95a426d4a8bd" data-size="compact"></div>
<br/>
<input type="submit" value="Submit"/>
</form>
</div>
</body>
</html>
</html>

Looks like I got this captcha page. But the same soup objected returned actual HTML details, it looks like I can't even print soup anymore.

What do you mean by "the same soup objected returned actual HTML details"? The captcha page confirms what @giles said above: the site in question probably doesn't like being scraped.
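If it helps, the script can at least tell the two cases apart before parsing. This is a rough sketch only: looks_like_captcha is a hypothetical helper, and the marker strings are simply taken from the captcha page printed above, so the check is a crude heuristic and not a reliable detector.

```python
def looks_like_captcha(html):
    """Heuristic: True if the HTML resembles the hCaptcha page printed above."""
    lowered = html.lower()
    # Marker strings copied from the output above; other block pages may differ.
    markers = ("hcaptcha", "h-captcha")
    return any(marker in lowered for marker in markers)


# Made-up sample pages so the sketch runs without network access.
captcha_page = "<html><head><title>hCaptcha solve page</title></head></html>"
normal_page = '<div id="mosaic-provider-jobcards"></div>'

print(looks_like_captcha(captcha_page))
print(looks_like_captcha(normal_page))
```

In the script above you could call looks_like_captcha(page.text) right after requests.get(url) and print a clear message when it returns True, which is easier to diagnose than the AttributeError.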