Issue with selenium WebDriverWait

Hello,

I am using Selenium to scrape data from a webpage that uses AJAX. The code runs without problems on my own computer, and I have tried to adapt it to run on PythonAnywhere.

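#!/usr/bin/python2.7
# -*- coding: utf-8 -*-
import time

from bs4 import BeautifulSoup
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Run Firefox inside a virtual display, since the server has no screen.
display = Display(visible=0, size=(800, 600))
display.start()

# Firefox can fail to start on the first attempt, so try up to three times.
driver = None
for retry in range(3):
    try:
        driver = webdriver.Firefox()
        break
    except Exception:
        time.sleep(3)

try:
    driver.get("http://games.espn.com/tournament-challenge-bracket/2018/en/group?groupID=657052")

    # Wait up to 10 seconds for the AJAX-loaded table wrapper to become visible.
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, '//*[@id="groupTableWrapper"]'))
    )
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
except TimeoutException:
    print('Did not load')
finally:
    # Always release the browser (if it started) and the virtual display.
    if driver is not None:
        driver.quit()
    display.stop()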

When I run this code with the except clause (the one that prints "Did not load") removed, I get the following error (the line numbers may be off):

Traceback (most recent call last):
  File "/home/colmena14/scrapeDyn2018.py", line 33, in <module>
    EC.visibility_of_element_located((By.XPATH,'//*[@id="groupTableWrapper"]'))
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
Stacktrace:

at FirefoxDriver.findElementInternal_ (file:///tmp/tmpv8Ek7_/extensions/fxdriver@googlecode.com/components/driver-component.js:10770)
at FirefoxDriver.findElement (file:///tmp/tmpv8Ek7_/extensions/fxdriver@googlecode.com/components/driver-component.js:10779)
at DelayedCommand.executeInternal_/h (file:///tmp/tmpv8Ek7_/extensions/fxdriver@googlecode.com/components/command-processor.js:12661)
at DelayedCommand.executeInternal_ (file:///tmp/tmpv8Ek7_/extensions/fxdriver@googlecode.com/components/command-processor.js:12666)
at DelayedCommand.execute/< (file:///tmp/tmpv8Ek7_/extensions/fxdriver@googlecode.com/components/command-processor.js:12608)

I've tried increasing the wait time to 60 seconds, but the element still did not load. I could not find any similar errors in the forums. Could someone tell me what is going wrong? Do I simply need to wait longer?

Thanks!

Take a look at the page you're actually getting back. It's probably this: http://help.pythonanywhere.com/pages/403ForbiddenError/
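One way to follow this advice is to inspect the returned HTML before waiting on the element. The sketch below is only an illustration: summarize_page is a hypothetical helper (not part of Selenium or PythonAnywhere), and the sample 403 page is made up to stand in for whatever the proxy really returns. It pulls out the page title and reports whether the expected element id appears in the markup at all.

```python
import re


def summarize_page(html, element_id):
    """Return the page title and whether an element with the given id appears in the HTML."""
    title_match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = title_match.group(1).strip() if title_match else ""
    # Look for id="..." or id='...' in the raw markup.
    id_pattern = r"""\bid\s*=\s*["']%s["']""" % re.escape(element_id)
    has_element = re.search(id_pattern, html) is not None
    return {"title": title, "has_element": has_element}


# Hypothetical error page, standing in for whatever the proxy actually returns.
blocked_html = "<html><head><title>403 Forbidden</title></head><body>Forbidden</body></html>"
print(summarize_page(blocked_html, "groupTableWrapper"))
```

In the script above, the same check could be run as summarize_page(driver.page_source, "groupTableWrapper") right after driver.get(...). If the title is an error page and the element is missing, a longer wait will not help.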

I guess I would have expected the error to trace back to the "get URL" command.

I saw that ".games.espn.go.com" is whitelisted and was unsure if that would include ".games.espn.com". ESPN clearly has an API for their websites but recently made it private. I'm assuming that means ".games.espn.com" cannot be added to the whitelist?
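To make the whitelist question concrete, here is a minimal sketch of how a leading-dot entry is commonly interpreted. The matching rule is an assumption (a leading dot covers the domain itself and its subdomains); PythonAnywhere's proxy may behave differently, and host_matches is a hypothetical helper. Under that assumption, ".games.espn.go.com" would not cover "games.espn.com", because the second name is not a suffix match for the first.

```python
def host_matches(host, entry):
    """Check a hostname against a whitelist entry, assuming leading-dot suffix matching."""
    host = host.lower().rstrip(".")
    entry = entry.lower()
    if entry.startswith("."):
        # ".example.com" covers "example.com" and any subdomain of it.
        bare = entry[1:]
        return host == bare or host.endswith(entry)
    # Entries without a leading dot must match the hostname exactly.
    return host == entry


print(host_matches("games.espn.go.com", ".games.espn.go.com"))  # True
print(host_matches("games.espn.com", ".games.espn.go.com"))     # False
```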

Thanks for your help!

ESPN API info: http://www.espn.com/static/apis/devcenter/blog/read/publicretirement.html

Yup. That does look like ESPN have closed their API down, so I guess the whitelist entry is a little pointless.