
Load a pandas (xarray, that is) dataset on startup

Hi! I am trying to load a 100 MB dataset on app startup to see how fast I can run queries against it. If it is not loaded, a query takes ~15 s, which is not an option.

However, the webapp does not seem to start up whenever I do this. The reload reports "Your webapp took a long time to reload. It probably reloaded, but we were unable to check it." When I access an endpoint, I eventually get a 504 load balancer error.

I am trying to find out:

  • how much real memory this dataset needs (which does not seem possible to determine at all, really)
  • how much memory is available per user (512 MB?)
  • whether paid accounts have more memory available

Thanks! t

I get a blank page when I visit your site. Is it possible that you just needed to wait for the dataset to load?

So far there is only a fake endpoint implemented: http://tomtom101.pythonanywhere.com/countries/us

Here, the dataset is not loaded into memory but queried lazily, hence the slowness.

I have now changed the code to read the dataset into memory (app.py:128) and also mask it (app.py:129). The first operation requires some unknown amount of memory; the second takes some time. Together, they basically result in the app not running. Maybe you can see more now.
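The "unknown amount of memory" can be estimated locally before deploying. Below is a minimal sketch using only the standard library's tracemalloc module. It records the peak amount of memory allocated while a loading function runs. The names measure_peak and fake_load are hypothetical, and fake_load is only a stand-in for the real read at app.py:128 (for example something like xarray.open_dataset(path).load(), which is an assumption about the app). For xarray specifically, the nbytes attribute of a Dataset should also report the uncompressed size of its arrays, which can be far larger than the file on disk.

```python
import tracemalloc


def measure_peak(load):
    """Call load() and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = load()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def fake_load():
    # Hypothetical stand-in for the real dataset read; allocates 10 MiB.
    return bytearray(10 * 1024 * 1024)


data, peak = measure_peak(fake_load)
print(f"peak allocation while loading: {peak / 1024**2:.1f} MiB")
```

The peak matters more than the final size here, because temporary copies made while reading or masking count toward the worker's memory limit too.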

It looks like your workers are being killed when they go over 2 GB of memory. Paid accounts don't get any more than that.

Do you really need to load the entire thing into memory?

I am afraid so. I'll have to make the dataset as small as possible then! Thanks!
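One common way to shrink numeric data is to store it at lower precision. The sketch below illustrates the effect with the standard library's array module only; the values are made up. With xarray or NumPy the rough equivalent would be converting float64 variables with astype("float32"), assuming the reduced precision is acceptable for the queries. Dropping unused variables or subsetting to the needed region before loading would help in the same way.

```python
from array import array

# Made-up data: one million double-precision values (8 bytes each).
values = array("d", (i * 0.5 for i in range(1_000_000)))

# The same values stored as single precision (4 bytes each, less precise).
small = array("f", values)

size_before = values.itemsize * len(values)
size_after = small.itemsize * len(small)
print(f"float64: {size_before / 1024**2:.1f} MiB, float32: {size_after / 1024**2:.1f} MiB")
```

Halving the element size halves the in-memory footprint, at the cost of roughly seven significant decimal digits instead of about sixteen.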