Is it possible to download 2GB of data into PA?

Is it possible to download 2GB of data into PA?

Yes, but we'd recommend using scp/sftp or rsync. More info: https://help.pythonanywhere.com/pages/FTP

I need to download it from the link below, which requires a login. Will this be possible? https://www.shareinvestor.com/prices/price_download_zip_file.zip?type=history_all&market=bursa

Hmm, if you need to log on to the site to get the data then, while you could try to do something clever using a headless browser on PythonAnywhere, the easiest way (if not the fastest) would probably be to download it to your local machine, then use something like Filezilla to upload it to PythonAnywhere.

I plan to use a scheduled task, so I would like to know the limitations first. What is a headless browser? Mechanize or Selenium?

Yes, you'd need to use Selenium and pyvirtualdisplay. This help page should give you the necessary code.

I've noticed that Selenium uses a lot of disk space. How do I clear Selenium's tmp files?

*I've cleared them with rm -rf, but is there any way in the code itself to avoid the heavy use of tmp files or folders? I already call browser.close(), browser.quit() and display.stop().

What else can I do?

You're right, Selenium can leave stuff lying around in /tmp. One possibility is to put something in a scheduled task to run rm -rf /tmp/* on a daily basis -- or you could even use subprocess.check_call to run the same command after you do the three calls to shut down Selenium.
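
A minimal sketch of the second option, done with the standard library instead of shelling out. Note that subprocess.check_call only expands the * in /tmp/* if the command goes through a shell (shell=True), which is one reason to do the deletion in Python. The helper name clean_tmp and the default "tmp*" pattern are assumptions for illustration; adjust the pattern to whatever Selenium actually leaves behind in your account.

```python
import glob
import os
import shutil


def clean_tmp(tmp_dir="/tmp", pattern="tmp*"):
    """Delete entries in tmp_dir matching pattern; return how many were removed.

    Hypothetical helper: the default pattern is an assumption about how the
    leftover Selenium/Firefox profile directories are named.
    """
    removed = 0
    for path in glob.glob(os.path.join(tmp_dir, pattern)):
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)   # leftover profile directory
            else:
                os.remove(path)       # plain file or symlink
            removed += 1
        except OSError:
            # Skip entries we do not own or that vanished in the meantime.
            pass
    return removed
```

Calling clean_tmp() right after display.stop() would clear up after each run, so the scheduled task does not accumulate old profiles.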

Is calling browser.get twice allowed on PA? It cannot find the element by XPath after the second browser.get.

import time

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(800, 600))
display.start()

for retry in range(5):
    try:
        browser = webdriver.Firefox()
        print "firefox"
        break
    except:
        time.sleep(3)
time.sleep(1)

browser.get("https://www.shareinvestor.com/my")
time.sleep(10)
login_main = browser.find_element_by_xpath("//*[@href='/user/login.html']").click()
print browser.current_url
username = browser.find_element_by_id("sic_login_header_username")
password = browser.find_element_by_id("sic_login_header_password")
print "find id done"
username.send_keys("bkcollection")
password.send_keys("123456")
print "log in done"
login_attempt = browser.find_element_by_xpath("//*[@type='submit']")
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url

browser.close()
browser.quit()
display.stop()

Yes, it's allowed. Are you sure that you're getting a page that has the element that you're looking for? Also, you only appear to have 2 gets in your code and no xpath lookup after the second one.

display = Display(visible=0, size=(800, 600))
display.start()

for retry in range(5):
    try:
        browser = webdriver.Firefox()
        print "firefox"
        break
    except:
        time.sleep(3)
time.sleep(1)

browser.get("https://www.shareinvestor.com/my")
time.sleep(10)
login_main = browser.find_element_by_xpath("//*[@href='/user/login.html']").click()
print browser.current_url
username = browser.find_element_by_id("sic_login_header_username")
password = browser.find_element_by_id("sic_login_header_password")
print "find id done"
username.send_keys("bkcollection")
password.send_keys("123456")
print "log in done"
login_attempt = browser.find_element_by_xpath("//*[@type='submit']")
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url
dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()
browser.close()
browser.quit()
display.stop()

But it returns an error:

selenium.common.exceptions.StaleElementReferenceException: Message: u'Unable to locate element: {"method":"xpath","selector":"//*[@href=\'/prices/price_download_zip_file.zip?type=history_all&market=bursa\']"}' ; Stacktrace:
    at FirefoxDriver.findElementInternal_ (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/driver_component.js:6993)
    at FirefoxDriver.findElementInternal_ (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/driver_component.js:8434)
    at FirefoxDriver.findChildElement (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/driver_component.js:8456)
    at DelayedCommand.executeInternal_/h (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/command_processor.js:10456)
    at DelayedCommand.executeInternal_ (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/command_processor.js:10461)
    at DelayedCommand.execute/< (file:///tmp/tmp1AqcTj/extensions/fxdriver@googlecode.com/components/command_processor.js:10401)

That just means that the element that you're looking for is not in the page that was retrieved.
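
If the link is only added to the page by JavaScript some time after browser.get returns (an assumption; nothing in this thread confirms that is the cause), retrying the lookup for a while can help. Below is a small sketch of a generic polling helper; the name wait_for and its parameters are made up for illustration. Selenium also ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_for(find, timeout=30.0, poll=0.5):
    """Call find() repeatedly until it stops raising, then return its result.

    Hypothetical helper: `find` is any zero-argument callable, for example a
    lambda wrapping a Selenium element lookup.
    """
    deadline = time.time() + timeout
    while True:
        try:
            return find()
        except Exception:
            if time.time() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(poll)
```

With the script above it could be used as link = wait_for(lambda: browser.find_element_by_xpath(xpath)), followed by link.click(), where xpath is the same selector string passed to find_element_by_xpath in the script.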

Hi Glenn, the error is gone now. The only remaining problem is that this line:

dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()

either fails to download the file, or I can't find the path the file is downloaded to. It is a large file, so I don't want to lose track of it.

I'm not sure. This might help: https://sqa.stackexchange.com/questions/2197/how-to-download-a-file-using-seleniums-webdriver

Hi Glenn,

I added this and was able to download the zip file:

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', "/home/vinasia/shKLSE/")
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/zip')
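
For anyone following along: the profile only takes effect if it is handed to the browser when it is created, which in Selenium releases of that era was done with webdriver.Firefox(firefox_profile=profile). Since the file is large, the script also should not call browser.close() straight after the click. Here is a rough sketch of a wait loop, assuming Firefox keeps a ".part" file in the download folder while the transfer is running and that the folder holds no older .zip files; the helper name wait_for_download is made up for illustration.

```python
import os
import time


def wait_for_download(directory, timeout=3600.0, poll=1.0):
    """Return the path of a finished .zip in directory, or None on timeout.

    Assumption: an in-progress Firefox download leaves a '*.part' file in
    the same directory, so 'a .zip exists and no .part exists' means done.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        names = os.listdir(directory)
        partial = [n for n in names if n.endswith(".part")]
        finished = [n for n in names if n.endswith(".zip")]
        if finished and not partial:
            return os.path.join(directory, sorted(finished)[0])
        time.sleep(poll)
    return None
```

Calling wait_for_download("/home/vinasia/shKLSE/") after the click and before browser.close() would keep the browser alive until the zip is complete, or return None if it takes longer than the timeout.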

Great! Glad you worked it out.