wikipedia2vec on PythonAnywhere

Hi Everyone,

I'm not much of a DevOps developer; I usually use Heroku to deploy Ruby on Rails apps backed by Postgres. This would be my first Python app (but I can easily work with it). I'm looking for a place to host a basic API for wikipedia2vec and its pretrained embeddings. I'm not looking to do any training yet; I just need some space to store the embeddings and a server to run queries and serve responses.

Can I use PythonAnywhere for this? I've been trying to figure out how I would go about this...

Thank you in advance.

Best, Dol

deleted-user-6776348 | 5 posts | Jan. 17, 2020, 6:08 p.m. | permalink

Sure, that looks like it should work OK. A couple of things that might trip you up:

Memory usage -- you're limited to 3GiB RAM per process, so if the library tries to use more than that, it probably wouldn't work.
Time -- requests to websites on PythonAnywhere are limited to a few minutes duration, so if the lookups took longer than that, they'd get timed out.
Access to external websites -- I couldn't tell from the docs whether the library has all of its own data built in, or stored in some kind of training file. If it's all bundled together, there won't be a problem. If it has to access external servers to gather data then there could potentially be problems if you're using a free account, because they have restricted Internet access -- they can only access sites on our whitelist. That said, we can generally add new sites to the list if they're part of an official public API.
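As a rough way to check the memory point above before deploying, something like the following could be run in a console. It is only a sketch: the 3 GiB figure is the limit quoted above, the `headroom` factor is an arbitrary assumption, and the on-disk size is just a rough proxy (a compressed file expands when loaded, while a memory-mapped one may need less).

```python
import os
import resource

# 3 GiB per-process limit, as quoted in the reply above.
RAM_LIMIT_BYTES = 3 * 1024 ** 3


def fits_under_limit(path, limit_bytes=RAM_LIMIT_BYTES, headroom=0.8):
    """Return True if the file is no larger than headroom * limit_bytes.

    The headroom factor (an assumption) leaves room for the interpreter
    and the web framework. File size is only a rough estimate of the
    memory the loaded model will actually need.
    """
    return os.path.getsize(path) <= limit_bytes * headroom


def peak_rss_mib():
    """Peak resident memory of this process in MiB.

    On Linux, ru_maxrss is reported in KiB; other platforms may differ.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
```

Calling `peak_rss_mib()` right after loading the embeddings gives a measured figure to compare against the limit, which is more reliable than the file-size estimate.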

One useful help page, if you've not spotted it already, is this one on installing new Python modules into your account.

giles | 12095 posts | PythonAnywhere staff | Jan. 17, 2020, 7:42 p.m. | permalink

Hi Giles,

Thank you for the very informative, polite and helpful response!

Is there a way to increase the RAM? That said, I don't know whether it will go over this threshold. I've been trying to figure out which metrics to increase in the custom app section of the account upgrade. I'm guessing the disk space covers the default database. Which metric would I increase so that I can install the pretrained embeddings (they're usually a few GB)? I don't think it would be disk space, would it?
I'm guessing I'd have to get around the request time limit with web workers, if that's a problem, but I'll cross that bridge later. Thanks for mentioning it. By the way, a few minutes per request is very generous!
External websites aren't necessary because everything is used from the pretrained data, but thanks for listing this too.
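One way to keep each request well inside the time limit is to load the embeddings once when the web app starts and do only a lookup per request. Below is a minimal WSGI sketch of that idea. The toy dictionary stands in for the real model so the code runs on its own; the commented-out `Wikipedia2Vec` lines, the file path, and the `?word=` query parameter are illustrative assumptions, not a tested setup.

```python
import json
from urllib.parse import parse_qs

# Stand-in for the real model so this sketch is self-contained.
# With the library installed, this would presumably become something like:
#   from wikipedia2vec import Wikipedia2Vec
#   MODEL = Wikipedia2Vec.load("/home/yourusername/enwiki.pkl")  # hypothetical path
#   and lookup() would return MODEL.get_word_vector(word).tolist()
_TOY_VECTORS = {"python": [0.1, 0.2, 0.3]}


def lookup(word):
    """Return the vector for `word`; raises KeyError if it is unknown."""
    return _TOY_VECTORS[word]


def application(environ, start_response):
    """WSGI entry point: GET /?word=<word> returns the vector as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("word", [""])[0]
    try:
        status, body = "200 OK", {"word": word, "vector": lookup(word)}
    except KeyError:
        status, body = "404 Not Found", {"error": "unknown word: " + word}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

Because the model lives at module level, it is loaded once per worker process rather than once per request, so each worker would also need to fit within the per-process memory limit.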

Thanks for the link, very helpful. I saw you mentioned it in the Tensorflow topic. It seems there is a lot that can be done on this platform!

Thanks!

deleted-user-6776348 | 5 posts | Jan. 17, 2020, 8:03 p.m. | permalink

There's no way to increase the RAM limit. As for web workers, you'd have to see whether you need to increase their number.

fjl | 4614 posts | PythonAnywhere staff | Jan. 18, 2020, 10:23 a.m. | permalink

Thanks Fjl!

deleted-user-6776348 | 5 posts | Jan. 18, 2020, 4:09 p.m. | permalink

I'm still having trouble working out where to place the English embedding file downloaded here, which is 16 GB. It's a binary file, so not a database or a static file (which is limited to 100 MB).

How can I put it on the system?

Thanks in advance.

P.S. I managed to install the Wikipedia2Vec library, thanks. Your platform is really cool!

deleted-user-6776348 | 5 posts | Jan. 18, 2020, 7:15 p.m. | permalink

You can use an SFTP client. See the SFTP section of https://help.pythonanywhere.com/pages/UploadingAndDownloadingFiles/
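With a file this large, it may be worth confirming that the upload arrived intact. A simple approach, sketched below, is to compute a SHA-256 checksum on both the local machine and the server and compare the two values. The function name and chunk size are illustrative choices; the file is read in chunks so it never has to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`.

    Reads the file in fixed-size chunks (1 MiB by default) so that
    even a multi-gigabyte file uses only a small amount of memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this on both copies and comparing the printed digests should show whether the transfer needs to be repeated.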

fjl | 4614 posts | PythonAnywhere staff | Jan. 18, 2020, 9:59 p.m. | permalink

Thanks Fjl!

deleted-user-6776348 | 5 posts | Jan. 19, 2020, 9:11 a.m. | permalink