Multi Tenancy on PA

Has anyone achieved anything significant here on PA (working with Django) with regard to multi-tenancy, the way that Tumblr, Strikingly, and everyone else does it (yup, that includes PA too, I guess)?

It seems like a fairly complex task to navigate, due to a combination of factors.

  • 1) PA doesn't yet support PostgreSQL, whose schemas are what most third-party multi-tenancy apps rely on.
  • 2) PA's web app mapping only allows one project per subdomain, and even if we were to use multiple web apps working together, it seems like it would be rather cumbersome to maintain multiple projects with the same source code and database.
  • 3) You can't create a web app on PA which resides at {{ wildcard }}.example.com. This means that even if you were to set up a {{ wildcard }}.example.com CNAME record on your DNS, a user accessing user.example.com would just get an "unconfigured domain" message from PA.
  • 4) Multi-tenancy inception (I don't know if this is a valid point).
  • 5) Django itself isn't especially suited for multi tenant design.

All limitations considered (especially including my ability), I would still much prefer sticking to PA so I don't have to deal with all that icky server stuff. I would probably stick to a non-subdomain way (meaning, using urls) of implementing the same functionality at the moment, but of course that wouldn't be ideal.

I would love to hear any thoughts on the matter!

*** The above is the result of an afternoon's research findings. If I sound like an idiot, let me know. I'll be glad if I've been thinking about things in too complex terms :)

{{ wildcard }} == * // don't know how to get rid of the italics

It's unlikely that you'll be able to make domain-based multi-tenancy work on PythonAnywhere, but you could get something working with a WSGI app that acts as a switching point for your other WSGI apps, where each one lives at its own path (like http://here.com/webapp1 and http://here.com/webapp2). Each app could be entirely standalone, except that it publishes its WSGI app under a particular name that the switching app can import. It's not a perfect solution, because unless you do something weird with imports and reloads, adding a new WSGI app or changing one of the apps would require a reload of the switching app.
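
A minimal sketch of such a switching app, using only the WSGI interface. The helper names (`make_dispatcher`, `make_text_app`, `not_found`) are made up for illustration, and the two stub apps stand in for real apps that would normally be imported from their own projects:

```python
def make_dispatcher(apps, default_app):
    """Return a WSGI app that routes on the first path segment.

    `apps` maps a segment such as "webapp1" to a WSGI callable.
    """
    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        # "/webapp1/foo" -> segment "webapp1", rest "foo"
        segment, _, rest = path.lstrip("/").partition("/")
        app = apps.get(segment)
        if app is None:
            return default_app(environ, start_response)
        # Move the matched prefix from PATH_INFO to SCRIPT_NAME so the
        # sub-app sees paths relative to its own mount point.
        environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + segment
        environ["PATH_INFO"] = "/" + rest
        return app(environ, start_response)
    return dispatcher


def make_text_app(text):
    """Hypothetical stand-in for a real sub-app's WSGI callable."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        body = "{} saw {}".format(text, environ["PATH_INFO"])
        return [body.encode("utf-8")]
    return app


def not_found(environ, start_response):
    """Fallback for paths that match no registered app."""
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"no such app"]


# The name PythonAnywhere's WSGI file is expected to expose.
application = make_dispatcher(
    {"webapp1": make_text_app("app one"), "webapp2": make_text_app("app two")},
    not_found,
)
```

As noted above, the mapping is built once at import time, so adding or changing an app still means reloading the switching app.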

Yup... Thought so :/ Oh well, at least I know my options clearer now! Thanks glenn!

I've added handling wildcard domains to our to-do list, I don't know how much work it would be but it's something we should at least be thinking about.

Sounds great! :)

+1 for wildcard domains. I'm using web2py and it has great support for multi-tenancy. Right now I'm just detecting the tenant after login but I'd like to use a unique subdomain for each tenant (web2py can also use the first URL arg) to show a personalized home page.

+1 for wildcard domains. Setting up multi-tenancy sounds intriguing, especially now that Postgres is available, but I don't see it being as easy as I hoped, since I don't think it is possible to set up and use subdomains on the fly on PA.

+1 noted!

+1 for wildcard domains - I have something I want to host on PA that's multitenanted based on subdomain!

+1 noted :-)

Have the same issue. Any progress on the wildcard domains?

We have made some progress and some infrastructure changes to make this more feasible (eg: how you can change your webapps to a different domain now). But we have not implemented wildcard domains yet.

It seems that today it is not possible to use e.g. "django-tenant-schemas" with Postgres on PythonAnywhere, due to the wildcard domain limitation?

See: https://django-tenant-schemas.readthedocs.io/en/latest/index.html

Then what is the suggested approach to implement multi-tenancy today without wildcard domains?

If you want an unlimited number of subdomains such as x.yourdomain.com, y.yourdomain.com, etc., PythonAnywhere doesn't support that right now.

Is there a plan or a solution to work around this?

See also my post here: http://stackoverflow.com/questions/40973589/multi-tenant-django-application-using-url-mappings-instead-of-domain-mappings

If not, I will be pushed towards other SaaS providers.

I've just posted on your Stack Overflow question -- it looks like django-multitenants may have been planned to be a fork of django-tenant-schemas that supported URL-based mapping, but that feature was never added.

It's going to be a while before we support wildcard domains -- we're taking steps in that direction, but it's a major change to the way we route requests around our system, which right now is based on the hostname in the request, so we're taking it one step at a time to make sure we don't break stuff.

Could you give a few more details about why you're looking into multi-tenancy specifically, rather than (say) just extracting a user object from the request's URL? For example, are you planning to use different databases for each user, or something like that? With a few more details we might be able to suggest something.

Why multi-tenancy:

My app is nearly ready for 1 customer. But since the business model doesn't make sense with just 1 customer, I need to foresee the application & the deployment infrastructure for multiple customers.

The most obvious way to support multiple customers without complicating the application code (extra query & additional logic in almost every view) is multi-tenancy.

Django supports this, typically by mapping different subdomains (hosts) onto one Django application while using a different Postgres schema for each customer. So every customer would have a URL like:

customer1.myapp.com
customer2.myapp.com

This requires:

  1. a Postgres DB (to have the schemas, which MySQL doesn't support)
  2. wildcard domains (to map multiple customer "hosts" onto the same Django application)

Then the django app managing the multi-tenancy maps the multiple hosts onto regular django views passing an extra parameter 'tenant' and automatically performs the extra filtering to pass the data relevant to the particular tenant to my django application:

customer1.myapp.com/view1/arg1 -> myapp.view1(arg1)  using schema 'customer1'
customer2.myapp.com/view1/arg1 -> myapp.view1(arg1)  using schema 'customer2'
customer3.myapp.com/view1/arg1 -> myapp.view1(arg1)  using schema 'customer3'
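
A minimal sketch of the host-to-schema step in that mapping. This is not the package's own code (as far as I know, django-tenant-schemas looks the hostname up in a tenant table rather than parsing it); `schema_for_host` and `BASE_DOMAIN` are hypothetical names, and the domain is taken from the example hostnames above:

```python
BASE_DOMAIN = "myapp.com"  # assumption: taken from the example hostnames


def schema_for_host(host, base_domain=BASE_DOMAIN):
    """Return the tenant schema name for a Host header value, or None."""
    hostname = host.split(":")[0].lower()  # drop any port, normalise case
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    # Only a single label counts as a tenant; "a.b.myapp.com" does not.
    if not subdomain or "." in subdomain:
        return None
    return subdomain


print(schema_for_host("customer1.myapp.com"))  # customer1
print(schema_for_host("myapp.com"))            # None
```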

I hope this clarifies the question.

hmm- how many customers do you expect to have?

Difficult to predict, but designing for a number between 10 and 100 in the first 2 years.

OK, so not really a case where you could create one website on PythonAnywhere for each user.

As it looks like the best option from the Django perspective is to use django-tenant-schemas, given that django-multitenants doesn't work, perhaps a good approach would be to trick it into thinking that the host header provided is different to what it would normally be.

If you create a Django app on PythonAnywhere, and set it up to use django-tenant-schemas, then you can actually write some WSGI code to take a look at the path part of the URL of an incoming request and change the host header appropriately, and put that in the WSGI file. For example, the following code will do this:

  • http://www.yoursite.com/ -> request sent to Django for "/" on host www.yoursite.com
  • http://www.yoursite.com/user1/ -> request sent to Django for "/" on host user1.www.yoursite.com
  • http://www.yoursite.com/user1/foo -> request sent to Django for "/foo" on host user1.www.yoursite.com

import os
import re
import sys

# add your project directory to the sys.path
project_home = "/home/username/path"
if project_home not in sys.path:
    sys.path.append(project_home)

# set environment variable to tell django where your settings.py is
os.environ['DJANGO_SETTINGS_MODULE'] = 'project_name.settings'

from django.core.wsgi import get_wsgi_application
django_application = get_wsgi_application()

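def application(environ, start_response):
    # PATH_INFO may be missing, so fall back to an empty string
    path = environ.get("PATH_INFO", "")
    # split "/user1/foo" into the user ("user1") and the remaining path ("/foo")
    user_match = re.match(r'^/([^/]+)(/.*)$', path)
    if user_match:
        user = user_match.group(1)
        path = user_match.group(2)
        # make the request look as if it arrived on user1.<original host>
        environ["HTTP_HOST"] = "{}.{}".format(user, environ["HTTP_HOST"])
        environ["PATH_INFO"] = path

    return django_application(environ, start_response)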

I imagine that's not exactly what you need (the extra "www" in the modified hostnames looks wrong) but if you let me know more about what would work for your setup I'd be happy to update it -- it's actually quite an interesting problem :-)

I have now installed Postgres locally and upgraded my PythonAnywhere account to include Postgres support.

Locally I have multi-tenancy working fine using the package “django-tenant-schemas”. I am now trying to deploy this on PythonAnywhere by applying the solution suggested above.

It somewhat works, in the sense that http://<mysite>/<customer1>/user/login/ indeed shows my login page and uses the postgres schema referring to customer1.

But there are many issues, mostly with redirects and links within the site. Those links and redirects do not include the extra level /<customer1>, so they are all wrong. Any suggestions on how to solve this transparently for the application code?

Note: I don't get what is meant by the "extra www" in the comment above. It is not causing issues so far.

I believe host based multi-tenancy with wildcard domains would be a better solution.

By the "extra www" I meant the one in the hostnames that are coming through, for example:

  • http://www.yoursite.com/user1/foo -> request sent to Django for "/foo" on host user1.www.yoursite.com

-- it's going to user1.www.yoursite.com instead of user1.yoursite.com. But if that's not causing you problems, then that should be OK.

For the redirects and links within the site -- do you mean that you have some URLs that are, for example, http://www.yourdomain.com/x/y, with no customer name, and others that are http://www.yourdomain.com/customer1/a/b, which do?

If so, then the code that I gave would have to be modified to be able to work out when the first part of a URL path was a customer name, and when it wasn't. Is there anything that would allow code to distinguish between "customer1" and "x" in the above examples? For example, if you were happy to make a small change to the WSGI file when you added customers, you could put a list in the code and switch based on whether the first part of the path was in that list.

Agreed that host-based multi-tenancy is a good idea -- it's just going to take a while for us to change the way requests are routed.

Still no solution.

The problem with the links remains: links to internal pages on the Django site are of the form http://yoursite.com/app/link, while for this solution to work they should be http://yoursite.com/customer/app/link

So while the first request to the (manually entered) URL http://yoursite.com/customer/app/link works fine (due to the WSGI script translating this into http://customer.yoursite.com/app/link), this does not work for links on that page, as the <customer> part is not part of the link.

I tried playing with HTTP_REFERER to find out where the request came from, but that is not a good solution, as HTTP_REFERER is not guaranteed to be correct under all browsers / circumstances.

Suggestions?

What I meant in my last post was that you could change the WSGI code that I originally posted so that it had some way of telling whether a particular link http://yoursite.com/A/B/ should be interpreted as "this is for customer A, so I should hack it to look like a request to http://A.yoursite.com/B" or as "this is an internal link, so I should not hack it at all, and send it to Django as http://yoursite.com/A/B/".

One way of doing that (which might be a pain to maintain) would be to have a list of the customers, so that the WSGI code would look like this:

import os
import re
import sys

# add your project directory to the sys.path
project_home = "/home/username/path"
if project_home not in sys.path:
    sys.path.append(project_home)

# set environment variable to tell django where your settings.py is
os.environ['DJANGO_SETTINGS_MODULE'] = 'project_name.settings'

from django.core.wsgi import get_wsgi_application
django_application = get_wsgi_application()

USERS = ('username1', 'username2', 'username3')

def application(environ, start_response):
    # PATH_INFO may be missing, so fall back to an empty string
    path = environ.get("PATH_INFO", "")
    user_match = re.match(r'^/([^/]+)(/.*)$', path)
    if user_match:
        user = user_match.group(1)
        # only rewrite the request when the first path segment is a known customer
        if user in USERS:
            path = user_match.group(2)
            environ["HTTP_HOST"] = "{}.{}".format(user, environ["HTTP_HOST"])
            environ["PATH_INFO"] = path

    return django_application(environ, start_response)

Of course, that would have the problem that you'd need to edit the WSGI file when you added a user.

An alternative would be to do it the other way around. For example, if all of the internal links were of the form http://yoursite.com/internal-SOMETHING/B/ then the WSGI code could be:

import os
import re
import sys

# add your project directory to the sys.path
project_home = "/home/username/path"
if project_home not in sys.path:
    sys.path.append(project_home)

# set environment variable to tell django where your settings.py is
os.environ['DJANGO_SETTINGS_MODULE'] = 'project_name.settings'

from django.core.wsgi import get_wsgi_application
django_application = get_wsgi_application()

def application(environ, start_response):
    # PATH_INFO may be missing, so fall back to an empty string
    path = environ.get("PATH_INFO", "")
    user_match = re.match(r'^/([^/]+)(/.*)$', path)
    if user_match:
        user = user_match.group(1)
        # anything not marked as internal is treated as a customer name
        if not user.startswith("internal-"):
            path = user_match.group(2)
            environ["HTTP_HOST"] = "{}.{}".format(user, environ["HTTP_HOST"])
            environ["PATH_INFO"] = path

    return django_application(environ, start_response)

Without knowing more about the actual structure of the URLs you're using, I can't be more specific.

I do not really have a problem with updating the WSGI file for each additional customer. That would cause only a few seconds of downtime (the time to reload the application) when adding a customer, I guess?

Problem is in the internal links. The application is not aware of the multi-tenancy.

So the internal links are of the form: http://yoursite.com/app/link

But to distinguish the tenant, the links should be e.g. http://customer1.yoursite.com/app/link, and this cannot be done since wildcard domains are not supported.

Alternatively, links could be of the form http://yoursite.com/customer1/app/link, but it is not clear how to do that in Django, since the application itself is not aware of which "customer" it is serving. There might be a way in Django to make this work, but it should work for all apps.

So the issue is that also internal links must point to the right tenant (customer)...
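
One possible way to get the prefix back into generated links, untested against this setup: Django derives its URL prefix from the WSGI `SCRIPT_NAME` (unless `FORCE_SCRIPT_NAME` is set), so `reverse()`, `{% url %}` and redirects built from them should include whatever the WSGI file records there. The sketch below moves the customer segment into `SCRIPT_NAME` instead of discarding it. `CUSTOMERS`, `tenant_prefix_middleware` and `echo_app` are hypothetical names, and `echo_app` only stands in for `django_application` so the example runs without Django:

```python
import re

CUSTOMERS = ("customer1", "customer2")  # hypothetical tenant names


def tenant_prefix_middleware(app, customers=CUSTOMERS):
    """Wrap a WSGI app so /<customer>/... is served as that tenant."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        match = re.match(r"^/([^/]+)(/.*)?$", path)
        if match and match.group(1) in customers:
            customer = match.group(1)
            # Present the request as coming from the tenant's subdomain.
            environ["HTTP_HOST"] = "{}.{}".format(customer, environ["HTTP_HOST"])
            # Record the stripped prefix as the mount point.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + customer
            environ["PATH_INFO"] = match.group(2) or "/"
        return app(environ, start_response)
    return wrapped


def echo_app(environ, start_response):
    """Hypothetical stand-in for django_application; echoes what it received."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    line = "{}|{}|{}".format(
        environ["HTTP_HOST"], environ.get("SCRIPT_NAME", ""), environ["PATH_INFO"]
    )
    return [line.encode("utf-8")]


application = tenant_prefix_middleware(echo_app)
```

Links hard-coded in templates as "/app/link" would still miss the prefix, and anything that builds absolute URLs from the request host would use the rewritten hostname, so this is a partial fix at best.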

Ah, I think I see -- so, when a customer is browsing their site, the system assumes that they're on "customer1.yourdomain.com", so internal links within the site are just to (say) "/a/b", rather than to "/customer1/a/b". Is that right?

In that case, and if it's hard to change the internal links (I assume it is, otherwise you wouldn't be asking) maybe this hacking of the hostname in the WSGI file isn't going to work.

I don't know what your revenue model is, or how much traffic a given customer is likely to generate, but perhaps then the simplest option on PythonAnywhere -- until we support wildcard domains -- would be to have a separate web app on the "Web" tab for each customer. A normal "hacker" account can add on web apps for $2/month each. And you can have as many web apps as you want pointing to the same code, so there wouldn't be any duplication there.

Might that work?

Yes, correct

The proposed solution looks acceptable to me as long as the webapps can all point to the same postgres database and use the same code. I'll give it a try.

Thank you for your help.

Yes, they definitely can point to the same DB and use the same code. It should be pretty easy and obvious to set that up, but do let us know if you have any problems.

I also have a multi-tenant requirement, but (in the short term) plan on handling it thru multiple webapps pointing to the same postgres instance with multiple schemas. I don't anticipate more than 5 clients in first year or so. There are certain advantages in having separate webapps for small numbers (handling of log files and defects for example).

My only question is handling wildcard certificates -- Are they currently supported on PAW?

Yes, if you have a wildcard cert, we can apply it to the domains that you specify with no problem.

Setup works fine now. Now since every customer has a separate webapp, how can we dynamically create new customers? (setting up a new webapp is a manual process...)

And: is there progress on the wildcard domains? As that is still the ultimate target here.

Thank you

We have a very experimental API that you can use to create and configure web apps. I can enable it for your account, if you like.

We have not made any progress on wildcard domains.

Yes pls enable it for my account. Where can I find the documentation?

Ok. I've enabled it. If you go to the Account page, you'll see a new tab called "API token". Once you have created a token, the (admittedly sparse) docs will be visible.

All running fine, still eager to work with wildcard domains as the overhead of multiple apps is annoying (upgrade takes longer, cannot dynamically add clients, need more ssl certificates, ...) . I would rather pay for more threads than for more apps. Any progress on wildcard domains?

No progress on wildcard domains -- we know how to implement it, but there are other things we're working on first.

Regarding the SSL certificates -- we do support wildcard certificates, so you only need one. Though, of course, it does need to be installed for each subdomain web app you add, so it's not completely transparent.

Just to check, when you say you "would rather pay for more threads than for more apps", you mean you'd rather have more worker processes, right?

Correct. Reason: increasing workers scales transparently, while adding apps requires manual work

Understood. We'll keep you posted, but no news for now I'm afraid.

How is wildcard dns progressing? I would also like this feature as it would help me set up varying environments for development, testing, etc. in my web2py applications. I currently have wildcard forwarding set up through my domain name registry but when it is forwarded to example.mydomain.com, a PA "Coming Soon!" page is shown. An example of how this could be useful would be checking the sub-domain per environment (https://stackoverflow.com/questions/6180592/web2py-with-configuration-per-environment) to properly point to different databases, settings and features while visiting development.mydomain.com, testing.mydomain.com, etc.

Please let me know if there is another way to accomplish this in the meantime, as I do not want to set up a PA web app per subdomain, since that limits the flexibility of the service.

Currently the only way to serve requests on a domain is to set up an individual web app for that domain, I'm afraid.

You can copy and paste the contents of an existing WSGI file into a new web app's WSGI file, and then it should only need minor tweaks. That might save you a little time. Hopefully, if you're just setting up www + testing + dev, that's not too much manual work.

Will keep this thread posted with updates re: multi-tenancy, but nothing is scheduled yet.

Perhaps this question is much more basic, but perhaps not. Suppose I want one domain, www.some-example.com, to host a service, but each of my user accounts (one account, possibly multiple users) needs the equivalent power of a PyAW "Hacker" account plus some additional code. Is that possible? I don't know if it's multi-tenant or single tenant or what, but it seems like every time I would hypothetically add an account, I'd need to auto-buy and auto-setup another "Hacker" account.

what's the "power" of a hacker account? do you mean the cpu seconds?

if you have multiple users all on the same site, and you need to add more computation power etc as you get more users, then you can customize your plan and add cpu seconds/web workers etc.

by "power" I mean mostly RAM, but also cpu-seconds and storage. I don't have any hard numbers, so it's only theoretical/hypothetical at the moment. if users are doing calculations (e.g., Pandas), at some point the customized plan will need to add resources. On a tangent, I'm still not sure how to think about the web worker, but that seems to be more about volume of requests, e.g., social media-type of growth/scalability, not resources used/user for some calculations?

Usefully enough, I just posted an explanation of how web app worker processes operate on another thread -- I'll copy/paste it here, just in case it clarifies things:

When a request comes in for a page on your web app, it's put on a queue. At the other end of the queue are your worker processes; when they're available, they take a request off the queue, process it, then go back to looking at the queue. It's basically the same model as many banks often use in the real world, where you have a single queue and multiple cashiers.

So if one of your workers is handling a request that takes a long time, if the other workers are handling normal requests that can be handled quickly, everything should be OK -- there might be a bit of a slowdown, but not much. In the bank metaphor, this is when one customer comes in with a time-consuming transaction, but the others are just doing normal short transactions. But if all of your workers are handling long-running requests, then there can be problems. Maybe this second case maps reasonably well to a situation in a bank where local businesses that operate in cash all close for the day and all send people to the bank to deposit the day's takings at the same time, so the queue gets long. The best solution to that is to increase the number of workers.
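
The two cases can be illustrated with a small simulation of the single-queue model. This is a sketch, not a description of how PythonAnywhere implements it: it assumes every request arrives at time 0, and `finish_times` is a made-up helper name:

```python
import heapq


def finish_times(durations, workers):
    """Simulate one FIFO queue served by `workers` identical workers.

    All requests arrive at time 0 in the given order; returns the time
    at which each request finishes.
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    finished = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# One slow request among quick ones: the quick ones are barely delayed.
print(finish_times([10, 1, 1, 1, 1], workers=2))  # [10, 1, 2, 3, 4]
# Only slow requests: the queue backs up...
print(finish_times([10, 10, 10, 10], workers=2))  # [10, 10, 20, 20]
# ...until more workers are added.
print(finish_times([10, 10, 10, 10], workers=4))  # [10, 10, 10, 10]
```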

So, getting back to your original point -- in the app you're describing, are you planning to do the pandas etc calculations inside the code that's handling requests for your web app?

Thanks giles. That is a very key question. I'm not sure. Web requests should be quick and are scalable with web workers, but pandas could be longer (and require more memory). Some ideas but I'm not sure of the tradeoffs or what's more feasible.

  • pandas code is in same code for web app.
  • "batch queue" where pandas code is separate from web app request handling.
  • some other way to manage scalability of memory for pandas vs. web app (just relatively simple requests).
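
A rough sketch of the "batch queue" option, just to show the shape of it. It uses an in-process thread and `queue.Queue` purely for illustration; in a real deployment the queue would more likely live somewhere a separate process can reach (a database table, for instance). `heavy_calculation`, `submit` and the job id are hypothetical:

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def heavy_calculation(numbers):
    """Hypothetical stand-in for a long-running pandas computation."""
    return sum(n * n for n in numbers)


def worker():
    """Process jobs one at a time, outside any web request."""
    while True:
        job_id, numbers = jobs.get()
        if job_id is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        results[job_id] = heavy_calculation(numbers)
        jobs.task_done()


def submit(job_id, numbers):
    """What a web view would do: enqueue the job and return immediately."""
    jobs.put((job_id, numbers))


thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit("job-1", [1, 2, 3])
jobs.join()  # a real view would not wait here; it would poll for the result later
print(results["job-1"])  # 14
```

The point of the split is that the web worker only pays for `submit`, so request handling stays quick however long the calculation takes.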

our help pages have some tips on dealing with heavy processing for web apps using async queues -- take a look, and if you'd like to discuss in more detail maybe start a separate forum thread?

harry, thanks a lot, will definitely check that out. still using "small data" but need to plan ahead.