Selenium downloaded files go into temp folder with .part extension

Hello, I'm new to the site - awesome service!

I'm using Selenium to download CSV files via an online search tool. For whatever reason, when I download the files, the csv files are all saved as ".csv.part".

I know that I'm supposed to set the preferences of my firefox webdriver - but how do I do that within pythonanywhere? How can I avoid files being saved with the .part extension? Code samples would be appreciated.

Bonus question: how can I change the download directory for the browser? I'd prefer to avoid working in the /temp folder...

Thanks,

The .part files are an indication that the download did not complete. There may be a timeout or something else that is preventing them from finishing.
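One way to rule out the script simply exiting before the browser has finished is to wait until no partial files are left in the download directory. The following is only a minimal sketch: the function name `wait_for_downloads` and its parameters are made up for illustration, and it assumes that partial downloads end in `.part` (Firefox) or `.crdownload` (Chrome).

```python
import os
import time


def wait_for_downloads(download_dir, timeout=60.0, poll_interval=0.5):
    """Return True once download_dir holds at least one file and no partial files.

    Returns False if that state is not reached within `timeout` seconds.
    """
    # Assumption: these suffixes mark downloads that are still in progress.
    partial_suffixes = (".part", ".crdownload")
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        partial = [n for n in names if n.endswith(partial_suffixes)]
        if names and not partial:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Calling something like this after the click that triggers the download, and before `driver.quit()`, gives the browser a chance to finish writing the file. If it returns False, the download is genuinely stalling rather than being cut short by the script.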

Have a look at this for setting the download directory.
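For Firefox, the download directory and the "save without asking" behaviour are controlled by browser preferences. Here is a rough sketch of how they could be set. The helper names `firefox_download_prefs` and `make_firefox_driver` are hypothetical, the MIME type list is an assumption you would adjust to whatever the site actually serves, and the driver part assumes Selenium 4 with a working geckodriver, which has not been verified on any particular host.

```python
import os


def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Build Firefox preferences that save matching downloads without prompting."""
    return {
        # 2 means "use the custom directory given in browser.download.dir".
        "browser.download.folderList": 2,
        "browser.download.dir": os.path.abspath(download_dir),
        # Comma-separated MIME types to save straight to disk (assumed list).
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox_driver(download_dir):
    """Create a headless Firefox driver with the preferences above applied."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

The directory passed in should already exist and be somewhere under your home directory that you can write to.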

I'm having the same problem :(

How did you fix that?

I've already defined the download folder:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Ask Chrome to save downloads into download_dir
profile = {'download.default_directory' : download_dir}
options.add_experimental_option('prefs', profile)

options.add_argument('--disable-gpu')
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=options)

When I use Firefox with Selenium, the downloaded files are all saved as ".xlsx.part". Now, with Chrome, the files don't appear at all, not even in the /tmp/ folder.

Does it raise an error?

No. The files just don't appear. I've tried with both Firefox and Chrome, and nothing appeared.

What have you set download_dir to?

Yes... The same code worked fine in another cloud IDE. How do I fix it?

What is the value of the variable download_dir? Does the directory in question definitely exist?

It might also be worth taking a screenshot of the browser just after you've performed the action that should download the file in order to see if there's some kind of error page:

driver.get_screenshot_as_file("screenshot.png")

I managed to enable the download via headless chrome with the following code:

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
command_result = driver.execute("send_command", params)
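For anyone reading later: the snippet above works by patching a private command table. On newer Selenium releases (assumed here: Selenium 4 with a Chromium-based driver) the same DevTools command can be sent through `execute_cdp_cmd`. The wrapper name `allow_headless_downloads` below is just for illustration, and this is a sketch rather than a tested fix for the PDF problem described next.

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools protocol, to save downloads into download_dir."""
    params = {
        "behavior": "allow",
        # An absolute path avoids surprises about the working directory.
        "downloadPath": os.path.abspath(download_dir),
    }
    # execute_cdp_cmd is provided by Selenium 4 Chromium-based drivers.
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

It would be called once, right after creating the driver and before navigating to the page that triggers the download.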

But I was unable to download pdf files, even though I used this as preferences when running chrome:

options = Options()
profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
           "profile.default_content_settings.popups": 0, # Disable download file dialog
           "download.default_directory": download_dir,
           "download.prompt_for_download": False,  # To auto download the file
           "download.extensions_to_open": "applications/pdf",
           "download.directory_upgrade": True,
           "plugins.always_open_pdf_externally": True,  # It will not show PDF directly in chrome
           "safebrowsing.enabled": False,
           "safebrowsing.disable_download_protection": True
           }
options.add_experimental_option('prefs', profile)

So my problem now is to download pdf files :(

Does it happen only for pdfs?

Yeah. Maybe the problem is with the website I'm accessing, which may be applying some restriction because I'm using headless Chrome. But how do I find this out? :(

I'm switching the driver to the new tab that opens when I click to download, and if I try to take a screenshot or get the current URL of the new tab, I get a timeout error. So I think that new tab, that should automatically start the download, is crashing for some reason I don't know.

Could it be running out of memory? If a process hits 3GiB it will run out and exit. But if that were happening, you'd receive an email telling you; I've checked and your account is set up to receive those messages. Have you seen anything like that in your inbox?

I didn't receive any message saying anything like that :(

What can I do now?

If you're getting a timeout, that suggests that either the tab is doing its job and downloading the file, so you can't access it, or that it has crashed. I don't really have any idea how you might go about finding out which. If the service that you're trying to use has protections that prevent downloads by automated tools, there is, unfortunately, not much you can do about that.

The service has no protections that prevent downloads by automated tools. Exactly the same code I am using here works perfectly on another cloud IDE and on my personal computer as well. Perhaps some permission on your platform is preventing my program from downloading the PDF file from the website that I need. The file should be downloaded, but the download does not happen.

The only thing I can think of then is that perhaps you do not have write permissions to the directory that you're trying to download into or that, perhaps, the directory does not exist.
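Both of those can be checked from Python before starting the browser. This is a minimal sketch; `check_download_dir` is a made-up helper name, and it tests write access by actually creating and removing a throwaway file in the directory.

```python
import os
import tempfile


def check_download_dir(download_dir):
    """Return a list of problems found with download_dir (empty if none)."""
    problems = []
    path = os.path.abspath(download_dir)
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    try:
        # Creating a throwaway file is the most direct test of write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    return problems
```

Printing the result for the same `download_dir` value that is passed to the browser options would show whether either of those two explanations applies.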

I tried everything, but your service did not serve me as I expected. Unfortunately I am going to switch to a competitor that meets my needs.

I have just downgraded my account. It would be nice if you could refund the money. Thanks!

Sure, no problem -- that's done now.