Selenium and https://finance.yahoo.com/ : Forums : PythonAnywhere

Selenium and https://finance.yahoo.com/

Hello! I want to open https://finance.yahoo.com/ with Selenium. I did not forget to create a Display, but I get the following error:

>>> from pyvirtualdisplay import Display
>>> display = Display(visible=0, size=(800, 600))
>>> display.start()
<Display cmd_param=['Xvfb', '-br', '-screen', '0', '800x600x24', ':1255'] alias={alias} cmd=['Xvfb', '-br', '-screen', '0', '800x600x24', ':1255'] ({scmd}) oserror=None returncode=None stdout="None" stderr="None" timeout=False>
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.close()
>>> browser = webdriver.Firefox()
>>> browser.get("http://finance.yahoo.com/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 177, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 415, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 379, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

Please help me with this problem. I have a free account, but as far as I know the site https://finance.yahoo.com/ is included in the whitelist for free users (https://www.pythonanywhere.com/whitelist/).

deleted-user-1582990 | 2 posts | July 16, 2016, 5:25 p.m. | permalink

That looks more like Selenium is having trouble communicating with Firefox, possibly because you're reusing the WebDriver or something similar.
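One way to rule out WebDriver reuse is to give each browser its own session and always shut it down with `quit()`, which ends the whole session, rather than `close()`, which only closes the current window. The following is a minimal sketch, not a confirmed fix for the error above. `browser_session` and its `make_driver` argument are hypothetical names introduced here for illustration; they are not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Yield a fresh driver and always quit it afterwards.

    `make_driver` is a hypothetical zero-argument factory that returns
    a driver object, for example `webdriver.Firefox`.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the WebDriver session and the browser process,
        # so the next session starts from a clean state.
        driver.quit()


# Usage (assumes selenium is installed and a display is available):
#
#     from selenium import webdriver
#     with browser_session(webdriver.Firefox) as browser:
#         browser.get("http://finance.yahoo.com/")
#         html = browser.page_source  # HTML of the loaded page
```

The driver is created inside the helper so that no driver object outlives its `with` block, even when `get()` raises.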

glenn | 9718 posts | PythonAnywhere staff | July 17, 2016, 9:55 a.m. | permalink

deleted-user-1562091 | 1 post | July 17, 2016, 2:57 p.m. | permalink

glenn, but what should I use in Selenium to get the HTML of my page (http://finance.yahoo.com/)?

deleted-user-1582990 | 2 posts | July 17, 2016, 3:21 p.m. | permalink

Ah. It's also possible that Firefox is not using the proxy to access the internet. You'll probably need to set the proxy on the Profile. This SO post looks like it might help, and here are our docs relating to the proxy.
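Setting the proxy on the profile could look roughly like the sketch below. It is an illustration under assumptions, not the linked SO answer: the helper names (`firefox_proxy_prefs`, `apply_prefs`, `fetch_html_via_proxy`) are hypothetical, the proxy address `proxy.server:3128` is assumed to be the one used for free accounts and should be checked against the proxy docs, and the `firefox_profile=` argument matches the older Selenium releases seen in the traceback above (newer releases configure Firefox through an options object instead).

```python
def firefox_proxy_prefs(host, port):
    """Build Firefox preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def apply_prefs(target, prefs):
    """Copy prefs onto any object that has set_preference(name, value)."""
    for name, value in sorted(prefs.items()):
        target.set_preference(name, value)
    return target


def fetch_html_via_proxy(url, host="proxy.server", port=3128):
    """Open url in Firefox through the proxy and return the page HTML.

    The default host and port are assumptions; adjust them as needed.
    """
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    apply_prefs(profile, firefox_proxy_prefs(host, port))
    # firefox_profile= is the older-Selenium way to pass a profile.
    browser = webdriver.Firefox(firefox_profile=profile)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()


# Usage (after starting the Display as in the first post):
#
#     html = fetch_html_via_proxy("http://finance.yahoo.com/")
```

Keeping the preference dictionary separate from the browser start-up makes it easy to print and check the proxy settings before launching Firefox.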

glenn | 9718 posts | PythonAnywhere staff | July 18, 2016, 10:16 a.m. | permalink

I tried this suggestion and, although it helped me connect to finance.yahoo.com, the page returned was one that basically said it was not going to allow me to connect.

deleted-user-1973221 | 14 posts | Sept. 17, 2018, 11:33 p.m. | permalink

Is it a page from Yahoo? If so, could it be Yahoo blocking us from scraping it?

conrad | 4232 posts | PythonAnywhere staff | Sept. 18, 2018, 1:48 p.m. | permalink

Yes, in fact even simply scraping yahoo.com, not even finance.yahoo.com, is turning out to be a real pain. Other sites appear to be working with the techniques I'm using.

deleted-user-1973221 | 14 posts | Sept. 18, 2018, 1:53 p.m. | permalink

Are you saying that there is nothing to be done about this if Yahoo has decided to block you?

deleted-user-1973221 | 14 posts | Sept. 18, 2018, 2:17 p.m. | permalink

Yes, there is nothing we can do if Yahoo decides that they don't like people scraping or programmatically accessing their site and starts blocking any access that it detects as programmatic.

conrad | 4232 posts | PythonAnywhere staff | Sept. 18, 2018, 4:36 p.m. | permalink

I'll accept that as my answer, then. Thanks.

deleted-user-1973221 | 14 posts | Sept. 18, 2018, 4:53 p.m. | permalink