I have an async function that sends HTTP GET requests to a list of URLs and returns a list of dicts with their status codes.

import asyncio
import aiohttp
import sys
from user_agent import generate_user_agent

username = 'username'
password = 'pass'
proxy = 'dc.oxylabs.io:8000'
proxy_url = f"http://{username}:{password}@{proxy}"

# Windows fix for annoying event loop warnings
if sys.platform.startswith('win'):
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

def fetch_urls(urls):
    return asyncio.run(_fetch_all(urls))

async def _fetch_all(urls):
    results = []
    async with aiohttp.ClientSession(trust_env=True) as session:
        tasks = [_fetch(session, url, results) for url in urls]
        await asyncio.gather(*tasks)
    return results

async def _fetch(session, url, results):
    try:
        user_agent = generate_user_agent(os='win', navigator='chrome')
        headers = {"user-agent": user_agent, "upgrade-insecure-requests": "1"}
        async with session.get(url, headers=headers, proxy=proxy_url) as response:
            status = response.status
            results.append({"url": url, "status": status})
    except Exception as e:
        print(f"[ERROR] {url} -> {e}")
        results.append({"url": url, "status": None})

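
A side note on the code above: because each task appends to the shared list when it finishes, the results come back in completion order, not in the order of the input URLs. If input order matters, one option is to have each coroutine return its dict and let asyncio.gather collect them. The sketch below is only an illustration of that pattern: fake_fetch is a made-up stand-in that sleeps instead of making a real request, and the status value is a placeholder.

```python
import asyncio

async def fake_fetch(url, delay):
    # Hypothetical stand-in for the real HTTP call: sleep, then report a status.
    await asyncio.sleep(delay)
    return {"url": url, "status": 200}

async def fetch_all(jobs):
    # gather() returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    return await asyncio.gather(*(fake_fetch(url, delay) for url, delay in jobs))

def run(jobs):
    return asyncio.run(fetch_all(jobs))
```

Calling run([("a", 0.02), ("b", 0.0)]) returns the entry for "a" first even though "b" finishes earlier.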
At first it didn't work due to:
Cannot connect to host www.domain.com:443 ssl:default [Connect call failed ('x.x.x.x', 443)]
I had to pass trust_env=True to the session, i.e. async with aiohttp.ClientSession(trust_env=True) as session, which I found on the forums. That fixed it.
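
For context, trust_env=True tells aiohttp to pick up proxy settings from the environment (for example the HTTP_PROXY and HTTPS_PROXY variables). A minimal sketch for checking what the environment actually exposes, using only the standard library; both helper names are made up for illustration, and the output depends entirely on the machine it runs on:

```python
import os
from urllib.request import getproxies

def proxy_env_vars():
    # Raw environment variables whose names end in "_proxy" (any case).
    return {k: v for k, v in os.environ.items() if k.lower().endswith("_proxy")}

def environment_proxies():
    # Scheme -> proxy URL mapping as resolved by the standard library.
    return getproxies()
```

Printing environment_proxies() on the server and on the local PC should show whether the two environments route traffic differently.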
Later I added a proxy, and now I get a similar error:
Cannot connect to host dc.oxylabs.io:8000 ssl:default [Connect call failed ('x.x.x.x', 8000)]
I tried running the script on my local PC and it works flawlessly. Could it be because PythonAnywhere blocks access to oxylabs.io? I'm on a free account for now, so I checked the whitelist (https://www.pythonanywhere.com/whitelist/) and found only residential-api.oxylabs.io there, not dc.oxylabs.io.
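
One way to narrow this down is a plain TCP connection attempt to the proxy host, run both locally and on the server. The sketch below uses only the standard library; can_connect is a hypothetical helper name, and a failure would be consistent with the host being blocked but would not prove it.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a direct TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Example: can_connect("dc.oxylabs.io", 8000)
```

If this returns True locally and False on the server, the difference is in the network path, not in the aiohttp code.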