Forums

multitenancy super basics

I need to start understanding the basics of multitenancy, especially if I can build in a multitenant way on PA. I read through https://www.pythonanywhere.com/forums/topic/1391/ but that is probably too advanced for me and I don't think I need wildcard domains. Can you point me in the getting started direction? For example, will I need to switch from Flask to Django?

Just to make sure we're on the same page, as different people use the work "multitenancy" different ways -- do you mean you'd like to set things up so that you have different subdomains of your website for different users? For example, userone.yourdomain.com, usertwo.yourdomain.com, and so on...?

Hi Giles, no I don't think so. Maybe this is more about elasticity. Every user would run on example.com. However, I guess I'd need minimal resource with elasticity depending on what they do. For example let's say my app does linear regression. User 1 and 2 log on separately normally so I need few resources. But suddenly they are working at the same time and User 2 crunches more data. I'd need to pay PA for more capacity and possibly charge User 2 (maybe not). So I'm maybe mixing up the tenant question with an elasticity question, but does this seem possible/easy/difficult? Any advice appreciated.

Hmm, I see. PythonAnywhere web apps get their processor power from the number of worker processes they have running; so, for example, an app with 10 workers can handle 10 simultaneous requests. Web requests tend to be very short-lived things (request comes in, quick database lookup, render a template, return) so you can support a very large number of hits/second with a small number of workers. I think the current record-holder is handling several million hits a day with ten workers, for example... though that is a particularly well-optimised site.

Anyway, what you're talking about sounds like it would need to be implemented as a website with a base number of workers that auto-scaled when the workers became busy. That's certainly something we've discussed in the past, but we don't have a feature to do that yet :-(

Ah ok thanks a lot. I'm likely thinking too far ahead but yeah that architecture does sound ideal because I'd like to do a bit of math on possibly larger data. This would be a while from now, though. I want to do some simpler "analytics" in the meantime and doubt it would be resource intensive. I will run some simple tests first.

Another use case would be:

What if I want to let a user tweak something with a bit of Python code? Suppose I had linear regression (this is made up) but the user wanted to and knew how to do logistic regression for example. This shouldn't slow down anyone else, but it could be CPU intensive for a short time. The other tenants should be able to do normal workloads.

Actually, one other thing that occurs to me -- doing big calculations inside web requests is kind of risky anyway. On PythonAnywhere, any request that takes more than a few minutes is assumed to have crashed, so the worker process is killed. If you're thinking about number-crunching that would take more than a few dozen seconds, it might be better to implement as some kind of separate task -- essentially, users who wanted data processed would make a request that put the details of the work to be done into a queue (perhaps in a database table), and then when the processing was done, another view could fetch the results.

I'd be very careful about allowing users to run code they provide -- it could lead to serious security problems. It would be very easy for a hostile user to submit something that caused problems for your site -- Python isn't very well sandboxed internally.

(We use operating-system level virtualisation to keep everyone's accounts separate.)

Ok shoot that idea is off the table then. I'd have to build a PA myself instead of using a PA in that case. Too difficult. But that would be ideal for what I want to make.

For the math that takes more than a few seconds, I'd ideally want to use Apache Spark.

That's probably not a go either at the moment, I'd imagine. Shoot basically what I want to do would take the smarts and execution of the PA team. Instead of PaaS it would be sort of Analytics aaS.

In the meantime I can do a much simpler app with just Python, though.

you may be able to use pyspark and the default option of just running on a single core. But keep in mind that would just be a slower version of normal python because of all the overhead.

bfg, thanks. that might be good for learning. slower doesn't sound good for other cases, though.