Selenium and https://finance.yahoo.com/

Hello! I want to open https://finance.yahoo.com/ with Selenium. I did not forget to create a Display first, but I still get the following error:

>>> from pyvirtualdisplay import Display
>>> display = Display(visible=0, size=(800, 600))
>>> display.start()
<Display cmd_param=['Xvfb', '-br', '-screen', '0', '800x600x24', ':1255'] alias={alias} cmd=['Xvfb', '-br', '-screen', '0', '800x600x24', ':1255'] ({scmd}) oserror=None returncode=None stdout="None" stderr="None" timeout=False>
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.close()
>>> browser = webdriver.Firefox()
>>> browser.get("http://finance.yahoo.com/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 177, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 415, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 379, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

Please help me with this problem. I have a free account, but as far as I know the site https://finance.yahoo.com/ is included in the whitelist for free users (https://www.pythonanywhere.com/whitelist/).

That looks more like Selenium is having trouble communicating with Firefox, possibly because you're reusing the WebDriver or something similar.

glenn, but what should I use in Selenium to get the HTML of my page (http://finance.yahoo.com/)?

Ah. It's also possible that Firefox is not using the proxy to access the internet. You'll probably need to set the proxy on the Profile. This SO post looks like it might help, and here are our docs relating to the proxy.
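
For anyone reading later, here is a rough sketch of what setting the proxy on the Profile could look like. It is not taken from the linked post. The proxy address `proxy.server:3128` is an assumption (check the proxy docs for the real value), and `proxy_preferences` and `make_browser` are hypothetical helper names. It also assumes an older Selenium release where `webdriver.Firefox` accepts `firefox_profile`; newer releases take an options object instead.

```python
def proxy_preferences(host, port):
    """Return Firefox preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def make_browser(host="proxy.server", port=3128):
    """Start Firefox with the proxy set on its profile.

    The default host and port are assumptions, not confirmed values.
    """
    # Imported here so proxy_preferences() works without Selenium installed.
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    for name, value in proxy_preferences(host, port).items():
        profile.set_preference(name, value)
    # Assumes an older Selenium that still accepts firefox_profile.
    return webdriver.Firefox(firefox_profile=profile)
```

You would start the Display as before, call `browser = make_browser()`, then `browser.get(...)` and read `browser.page_source`. Call `browser.quit()` when you are done so a stale Firefox process is not left behind.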

I tried this suggestion and, although it helped me connect to finance.yahoo.com, the page returned was one that basically said it was not going to allow me to connect.

Is it a page from Yahoo? If so, could it be Yahoo blocking us from scraping it?

Yes, in fact even simply scraping yahoo.com, not even finance.yahoo.com, is turning out to be a real pain. Other sites appear to be working with the techniques I'm using.

Are you saying that there is nothing to be done about this if Yahoo has decided to block you?

Yes, there is nothing we can do if Yahoo decides that it doesn't like people scraping or programmatically accessing its site and starts blocking access that it detects as programmatic.

I'll accept that as my answer, then. Thanks.