Migration from mod_wsgi... confused

This is probably a very dumb question, but I cannot figure it out by myself. I have a simple WSGI application that I wrote without a framework, and it works flawlessly on my local server with Apache + mod_wsgi. I imported it here, but every request seems to be answered by the default handler _python_anywhere_com_wsgi.py: e.g. if I point the browser at /report.py, or even at an invalid URL, I still get the login page. The handler is:

# +++++++++++ CUSTOM WSGI +++++++++++
# If you have a WSGI file that you want to serve using PythonAnywhere, perhaps
# in your home directory under version control, then use something like this:
#
import os
import sys

path = '/home/tclodi/'
if path not in sys.path:
    sys.path.append(path)

from login import application

Thanks for any help

Have you clicked on 'reload web application'?

Yes, but it didn't make a difference... It just occurred to me: maybe I am supposed to multiplex all the requests through the default handler, e.g.

if environ["REQ_URL"]=="/login":
  # login stuff
elif environ["REQ_URL"]=="/report":
  # report stuff

?

Hmmm: I've never hand-built a wsgi application (I've always used the frameworks). I think one of the developers is going to have to step up and help you :-)

Any particular URL must be associated with a single WSGI application. Here on PA, this is specified on a per-domain basis - for free accounts, that effectively means a single WSGI application. So you need a single WSGI app which delegates control to other applications for specific URLs.

So yes, you need to do something like what you've suggested. I'm not sure about that REQ_URL variable - I thought the WSGI standard was PATH_INFO (and SCRIPT_NAME for applications hosted below the root), but it's been a long time since I did raw WSGI.
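For anyone landing here later, a minimal sketch of that kind of routing app, keyed on PATH_INFO and written for Python 3. The handler functions (login_app, report_app, not_found) and the ROUTES table are hypothetical stand-ins, not code from this thread:

```python
def login_app(environ, start_response):
    # Hypothetical stand-in for the real login handler.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"login page"]


def report_app(environ, start_response):
    # Hypothetical stand-in for the real report handler.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"report page"]


def not_found(environ, start_response):
    # Fallback for any path that has no entry in ROUTES.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Map request paths to the WSGI callables that should handle them.
ROUTES = {
    "/login": login_app,
    "/report": report_app,
}


def application(environ, start_response):
    """Single entry point: pick a handler based on PATH_INFO."""
    path = environ.get("PATH_INFO", "")
    handler = ROUTES.get(path, not_found)
    return handler(environ, start_response)
```

The WSGI file then only needs to expose this one application callable; adding a URL means adding an entry to ROUTES and reloading the web app.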

As an aside, I would suggest that your life is going to be a lot easier if you use some sort of framework - microframeworks such as flask, bottle or wheezy.web are lightweight enough to be helpful without constraining you too much (personally I find Django and the like a bit too prescriptive for my liking, but it's largely personal preference).

Even if you don't fancy using a full framework (and I can quite understand that) I'd still really recommend you use an object wrapper such as WebOb to make your life easier. Using raw WSGI is fine as a learning exercise, but it gets pretty tedious once your application gets beyond trivial.

@tclodi, from the code in your original post, it looks like you import the application from login. Do you have a separate application for login and for report? If you do then you would need to do some sort of routing to the different applications. Our infrastructure assumes that the application imported into the wsgi file is the application that handles all the requests. I believe you can combine several WSGI applications into a single application, but I have never tried it.

@Cartroo Thanks for the answer! That makes sense

@glenn I have separate py files for different activities: on mod_wsgi, for example, I'd call /login.py for login, /report.py for the report, /update.py to update the records in the db, /action.py for miscellaneous actions and so on. So I understand I'll just have to add a small multiplexer to the handler. Maybe you could make it clearer somewhere? I just assumed it would work like mod_wsgi does. Thanks!

Sorry to be really pedantic, but it's not that it works much differently from mod_wsgi; the only difference is that you don't have control over the webserver configuration. This is because the webserver is a global front-end to all PA users and the devs clearly can't risk one user being able to affect service for other users by breaking the configuration. Effectively it's like a mod_wsgi that someone else has set up at the root and you can't change it (actually it's nginx rather than Apache, but that's pretty incidental).

Since most people are using a framework of some sort this isn't much of an issue, because they all have some sort of request routing functionality. Even for raw WSGI, putting in the trivial "routing app" isn't hard, but I agree that you first need to know that you need to do it.

I guess some sort of FAQ entry clarifying the fact that all WSGI apps are hosted at the root of the domain might prevent confusion - I'm just not sure whether it's something that's going to crop up commonly enough to be worth a FAQ, since it'll only affect fairly advanced users.

That's not pedantic at all, I appreciate it. But I think there is a misunderstanding here: the difference I'm talking about is that with mod_wsgi, without touching the configuration, I can drop python files in the root directory and they will be executed when called with their url. Here, regardless of the url, the default py file will always be executed.

That definitely sounds odd. Would you mind if I took a look at your code to try to see what's going on?

Ahh, are you using WSGIScriptAlias to specify a directory instead of a file path? So it executes any .py file in that directory, a bit like CGI scripts? If so, that's not how I've used mod_wsgi in the past, I've always specified the path to an individual script file, which is used to serve all requests under that path.

This is more like how PA works, where you have a single entry-point for your whole domain. In my experience this is how most WSGI applications are set up, and many of the frameworks provide convenient request routing functionality so you can map URLs within your application rather than the WSGI middleware having to do it for you. Generally this makes your applications more portable since pretty much any WSGI hosting environment provides for a single entry-point, whereas the way of mapping URLs to many entry-points tends to vary quite a lot.

So yes, PA always executes the specified Python file. To do what you're hoping you'd need to have some sort of WSGI application which auto-loaded scripts from a directory yourself.

EDIT

For interest, I've had a quick stab at writing such a beast, using the Flask auto-generated WSGI file on PA as a starting point. Here's what I got:

"""Delegating WSGI application.

This application delegates the request to either a default application or one
of a set of "hook" scripts if they match the first path item.
"""

import errno
import imp
import os
import sys
import threading
import time

MIN_CHECK_TIME = 30

# Add your project and hooks directory to the path.
# (Leave initial entry alone on the assumption it's the current dir)
project_home = os.path.expanduser("~/mysite")
if project_home not in sys.path:
    sys.path.insert(1, project_home)
hooks_home = os.path.join(project_home, "hooks")
if hooks_home not in sys.path:
    sys.path.insert(2, hooks_home)

# Import "default" application to use in case no hooks apply.
from flask_app import app as default_application


class Hook(object):
    """Represents a hook script."""

    def __init__(self, name, path):
        self.name = name
        self.path = path
        self.last_used = 0
        self.mtime = 0
        self.app = None


    def get_app_func(self):
        """Return app function or None if no longer valid.

        This function checks whether the module needs to be reloaded, or
        if it's been deleted. It returns a reference to the app() function
        from the module if still valid, or None otherwise. It only checks
        the file at most every MIN_CHECK_TIME seconds, returning the most
        recent cached function reference in between these checks.
        """
        if time.time() - self.last_used < MIN_CHECK_TIME and self.app:
            return self.app
        self.last_used = time.time()
        try:
            cur_mtime = os.stat(self.path).st_mtime
        except OSError, e:
            if e.errno != errno.ENOENT:
                print >> sys.stderr, "error reading %s: %s" % (self.path, e)
            return None
        if cur_mtime <= self.mtime and self.app:
            return self.app
        try:
            module = imp.load_source(self.name, self.path)
        except Exception, e:
            print >> sys.stderr, "error loading %s: %s" % (self.path, e)
            return None
        try:
            self.app = module.app
        except AttributeError:
            print >> sys.stderr, "module %r has no app() func" % (self.name,)
            return None
        self.mtime = cur_mtime
        return self.app


# Global variables for caching hooks.
hooks = {}
hook_lock = threading.Lock()

def get_hook(name):
    """Returns the application hook for specified app, or None."""
    try:
        with hook_lock:
            hook = hooks.get(name, None)
            if hook is None:
                path = os.path.join(hooks_home, name + ".py")
                if not os.path.exists(path):
                    return None
                hook = Hook(name, path)
                hooks[name] = hook
            app = hook.get_app_func()
            if app is None:
                del hooks[name]
            return app
    except Exception, e:
        print >> sys.stderr, "error loading %r: %s\n" % (name, e)

    return None


def application(environ, start_response):
    """Main application wrapper - delegates to either default or hook app."""
    path_items = environ.get("PATH_INFO", "").lstrip("/").split("/", 1)
    app = None
    if path_items and path_items[0]:
        app = get_hook(path_items[0])
        if app is not None:
            environ["SCRIPT_NAME"] = "/" + path_items[0]
            if path_items[1:]:
                environ["PATH_INFO"] = "/" + "/".join(path_items[1:])
            else:
                environ["PATH_INFO"] = ""
    app = default_application if app is None else app
    return app(environ, start_response)

This implements the usual behaviour for the PA flask setup, which is to use the application object app from the file flask_app.py in ~/mysite to serve the request. However, it also looks in ~/mysite/hooks for Python files and if the top-level directory in the request URL matches any of those, it passes control to that instead (again it looks for an application called app in the file - this could be a Flask app but I've tried to keep the code WSGI-clean so it should work with any framework).

It attempts to amend the SCRIPT_NAME and PATH_INFO environment so the URLs in the delegated Python file will be relative to the script. For example, if you have a file ~/mysite/hooks/example.py and you make a request to http://username.pythonanywhere.com/example/foo/bar then your SCRIPT_NAME will be /example and your PATH_INFO will be /foo/bar. In a Flask application, this means you'd route the request with:

@app.route('/foo/bar')
def foo_bar_handler():
    # Code here
    return "handled /foo/bar"
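As a side note, the standard library can do the SCRIPT_NAME / PATH_INFO adjustment for you: wsgiref.util.shift_path_info moves the first path segment from PATH_INFO onto SCRIPT_NAME. A rough Python 3 sketch of the same delegation idea, where example_app, default_app and the HOOKS table are made-up placeholders rather than anything from the script above:

```python
from wsgiref.util import shift_path_info


def example_app(environ, start_response):
    # Hypothetical hook app: reports the paths it was handed.
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = "SCRIPT_NAME=%s PATH_INFO=%s" % (
        environ["SCRIPT_NAME"], environ["PATH_INFO"])
    return [body.encode("utf-8")]


def default_app(environ, start_response):
    # Hypothetical default app used when no hook matches.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"default"]


# Static table standing in for the scripts found in the hooks directory.
HOOKS = {"example": example_app}


def application(environ, start_response):
    """Delegate to a hook if the first path segment names one."""
    first = environ.get("PATH_INFO", "").lstrip("/").split("/", 1)[0]
    app = HOOKS.get(first)
    if app is None:
        return default_app(environ, start_response)
    # Move the matched segment from PATH_INFO onto SCRIPT_NAME.
    shift_path_info(environ)
    return app(environ, start_response)
```

With this, a request for /example/foo/bar reaches example_app with SCRIPT_NAME set to /example and PATH_INFO set to /foo/bar, matching the behaviour described above.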

It should automatically reload any module which you've modified, since it compares the last-modified time of the Python script against the value it cached when the script was last loaded. To stop a high request rate from thrashing the filing system, it only re-checks a given script once at least MIN_CHECK_TIME seconds (30 in the code above) have passed since the last check.
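For anyone adapting the reload logic to Python 3: the imp module used above is deprecated and was removed in Python 3.12, so the load step needs importlib instead. A minimal sketch of the same throttled mtime check, assuming each hook script exposes a callable named app as in the code above. ReloadingHook, load_source and the min_check_time parameter are illustrative names, not part of the original script:

```python
import importlib.util
import os
import time


def load_source(name, path):
    """Load the file at `path` as a fresh module object called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


class ReloadingHook:
    """Caches `app` from a script and reloads it when the file changes."""

    def __init__(self, name, path, min_check_time=30):
        self.name = name
        self.path = path
        self.min_check_time = min_check_time
        self.last_checked = 0.0
        self.mtime = 0.0
        self.app = None

    def get_app(self):
        now = time.time()
        # Inside the throttle window, trust the cached callable.
        if self.app is not None and now - self.last_checked < self.min_check_time:
            return self.app
        self.last_checked = now
        try:
            cur_mtime = os.stat(self.path).st_mtime
        except OSError:
            return None  # Script was deleted or is unreadable.
        # Load on first use, or reload if the file is newer than the cache.
        if self.app is None or cur_mtime > self.mtime:
            self.app = load_source(self.name, self.path).app
            self.mtime = cur_mtime
        return self.app
```

Unlike the original, this sketch lets import errors and a missing app attribute propagate to the caller, so you would want to wrap get_app() in the same kind of try/except logging before using it for real.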

Bear in mind this is something pretty quick I've just whipped up, but hopefully it illustrates the idea. No, it's not the cleanest code in the world; yes, it needs more comments; yes, there are a number of efficiency improvements which could be made. But it's really just an example - you can modify it as suits your needs. To use it, go to your Web tab and look for the message "It is configured via a WSGI file stored at..." - that points you at the file you need to amend.

Warning: Unless you're pretty confident with your Python, I would suggest proceeding with caution. PA does a great job of insulating users from the internals of WSGI, but this script is pretty low-level and there's a risk you might spoil your working web app by messing around with these files. If you're pretty confident in how it all works then it's not too hard to get back, but I felt I should warn people - this isn't really aimed at total novice users.

EDIT 2: I've simplified the app a bit - I realised it was being needlessly inefficient by checking for all hook scripts instead of the one being requested. I did mention I'd whipped it up pretty quickly...

EDIT 3: Tidied it up a bit more, used a class to represent hook scripts. No longer working on something with long compile cycles, however, so that'll have to do for today... (^_^)

@giles Sure, do what you want with it :-)

@Cartroo Wow, thanks! In the meantime I wrote a simple multiplexer.py, but your solution is a lot more flexible. Yes, I'm using WSGIScriptAlias, actually I wasn't aware it's not the default...

Well, it's not really about "default" behaviour. WSGIScriptAlias can accept either the path of a file or a directory - if you specify a file, that file is used for all requests; if you specify a directory, it operates more like CGI and attempts to dynamically load files from that directory. It just depends how you've configured it; both are equally valid.

(Just to clarify for anybody else reading, I'm describing Apache's behaviour; PA works slightly differently, as outlined earlier in this thread.)

A simple multiplexer works fine, but the above code has the advantage of not requiring a web app reload.

@tclodi -- I think @Cartroo has it worked out. I see your multiplexer app, and doing things that way or using his dynamic multiplexer definitely looks like the way to go under PythonAnywhere.

Thank you!