Improved GitHub integration?

Hey!

I'm missing direct GitHub integration, something similar to StackMob's custom code GitHub integration. Basically, if I create a web app, I could then use the manage web app interface to pair my app with a GitHub repository and have it update automatically whenever the repository is pushed to. In StackMob I can also configure whether I want automatic updates or not.

Cheers! --Andreas

As an interim measure you could set up an hourly scheduled job to do a git pull, but I appreciate that's not quite the same as a commit trigger. Also, it wouldn't be too hard to write a WSGI application that can support the Webhook URL interface and respond to any POST with a simple git pull. I'd give it a go myself, but I'm currently rather busy.
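A minimal sketch of the scheduled-job approach, written in Python so it can be run as a scheduled task. The repository path is a placeholder, and `pull_command` and `pull` are illustrative helper names rather than anything PythonAnywhere provides. It assumes a `git` recent enough to support the `-C` option (1.8.5 or later).

```python
import subprocess

# Placeholder path: replace with the location of your own clone.
REPO_PATH = "/home/yourusername/repos/myapp"


def pull_command(repo_path):
    """Build the git command that fast-forwards the clone at repo_path."""
    return ["git", "-C", repo_path, "pull", "--ff-only"]


def pull(repo_path, run=subprocess.call):
    """Run the pull and return True if git exited with status 0.

    `run` is injectable so the function can be exercised without git.
    """
    return run(pull_command(repo_path)) == 0
```

Calling `pull(REPO_PATH)` from the scheduled script performs the update. The `--ff-only` flag makes the pull fail rather than create a merge commit if the local clone has diverged from the remote.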

I'm sure the PA devs might consider writing a more customised PA commit hook and committing it to the github-services repo, but it's probably not a particularly high priority and also the code is in Ruby which probably requires a mental gear shift!

Hi bitwalk, that's an interesting idea for sure. It's a bit more difficult for us than for StackMob because everyone's code is written against a different framework.

Are there any git hooks that you could write that would solve your problem on the client side?

My development process normally goes slightly differently. I would actually push to my PythonAnywhere repo, make sure that works, and then push to GitHub.

I was bored during my lunch hour, so I had a go at writing a raw WSGI app which could act as a GitHub webhook:

import git
import json
import urlparse

class RequestError(Exception):
    pass

REPO_MAP = {
    "test": "/home/Cartroo/repos/test"
}

def handle_commit(payload):
    try:
        # Only pay attention to commits on master.
        if payload["ref"] != 'refs/heads/master':
            return False
        # Obtain local path of repo, if found.
        repo_root = REPO_MAP.get(payload["repository"]["name"], None)
        if repo_root is None:
            return False

    except KeyError:
        raise RequestError("422 Unprocessable Entity")

    repo = git.Repo(repo_root)
    repo.remotes.origin.pull(ff_only=True)
    return True

def application(environ, start_response):
    try:
        if environ["REQUEST_METHOD"] != 'POST':
            raise RequestError("405 Method Not Allowed")
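        # Read only as many bytes as the request declares; an unbounded
        # read() can block on some WSGI servers.
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            raise RequestError("400 Bad Request")
        post_data = urlparse.parse_qs(environ['wsgi.input'].read(length))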
        try:
            payload = json.loads(post_data["payload"][0])
        except (IndexError, KeyError, ValueError):
            raise RequestError("422 Unprocessable Entity")

        if handle_commit(payload):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return ["ok"]
        else:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return ["ignored ref"]

    except RequestError as e:
        start_response(str(e), [("Content-Type", "text/plain")])
        return ["request error"]

    except Exception as e:
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")])
        return ["unhandled exception"]

It's a bit quick and dirty, but I think it should do the job.
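To try the hook without waiting for GitHub, a request body of the shape the app above expects can be built by hand: a form-encoded `payload` field holding JSON with `ref` and `repository.name`. This is only a sketch of that shape as the script reads it, not a full GitHub payload, and `build_hook_body` is an illustrative name. The imports are written so the snippet runs on either Python 2 or Python 3.

```python
import json

try:
    from urllib.parse import urlencode, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode                  # Python 2
    from urlparse import parse_qs


def build_hook_body(repo_name, ref="refs/heads/master"):
    """Return a form-encoded body carrying a minimal push payload."""
    payload = {"ref": ref, "repository": {"name": repo_name}}
    return urlencode({"payload": json.dumps(payload)})


# Round-trip the body the same way the WSGI app decodes it.
body = build_hook_body("test")
decoded = json.loads(parse_qs(body)["payload"][0])
```

Sending `body` as a POST to the web app's URL should then trigger a pull for the `test` entry in REPO_MAP, assuming that entry points at a real clone.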

Very few comments, I'm afraid, but hopefully it's pretty obvious. One thing you'll need to change is the REPO_MAP dictionary, which maps a GitHub repository name to the path of that repository in your filespace (including the leading home directory). So, remove the example test entry I've added and replace it with one or more entries for your own repositories.

You may also wish to change refs/heads/master to a different branch if you want to use, for example, a staging branch to push to your repo.
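If different repositories should follow different branches, one possible variation is to store the ref alongside the path. This is a sketch of that idea, not part of the script above: the tuple layout, the example entries, and the `repo_root_for` helper are all assumptions made for illustration.

```python
# Hypothetical layout: GitHub repository name -> (local path, ref to follow).
REPO_MAP = {
    "myapp": ("/home/yourusername/repos/myapp", "refs/heads/master"),
    "myapp-staging": ("/home/yourusername/repos/myapp-staging",
                      "refs/heads/staging"),
}


def repo_root_for(payload, repo_map=REPO_MAP):
    """Return the local path to pull, or None if the push should be ignored.

    A payload missing "ref" or "repository" raises KeyError, which the
    caller can translate into a 422 response as the original script does.
    """
    entry = repo_map.get(payload["repository"]["name"])
    if entry is None:
        return None
    repo_root, tracked_ref = entry
    if payload["ref"] != tracked_ref:
        return None
    return repo_root
```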

EDIT

Because that looks like a quick script I feel I should add a comment for the benefit of anybody thinking "oh, that's pretty easy, the PA devs can just integrate that script". It would be considerably more difficult for the PA platform as a whole to implement such a thing for lots of reasons, mostly relating to having to cover a multitude of potential use-cases. For example, PA doesn't constrain where in your filespace you clone your repositories.

Also, I should add that, as it stands, that WSGI script is totally insecure - anybody could send POST requests to it to trigger your repository to update. They can't insert code, of course, because it's only a trigger to pull from the official repository - but conceivably one could imagine some sort of DoS attack where you force the Git repo to endlessly update itself.

A better solution would involve at least a simple fixed secret that must be passed as a query parameter by the GitHub hook (presuming you trust everyone with access to the settings of the repository). Ideally it would also use something like Beanstalk to carry out the git pulls asynchronously, so that rapid updates can be batched up and rate-limited (to mitigate just the sort of DoS attack I mentioned). Hopefully this proof-of-concept code is still a useful starting point, however.
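A rough sketch of both ideas follows. The parameter name `secret`, the value of `HOOK_SECRET`, and the helper names are assumptions for illustration. The throttle is a simple in-process stand-in for rate limiting, not a Beanstalk queue: it does not batch requests, and it only limits pulls within a single worker process.

```python
import hmac
import time

try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2

# Hypothetical shared secret: configure the hook URL as ...?secret=<this value>.
HOOK_SECRET = "change-me"


def has_valid_secret(environ, secret=HOOK_SECRET):
    """Check the 'secret' query parameter using a constant-time comparison."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    supplied = params.get("secret", [""])[0]
    return hmac.compare_digest(supplied.encode("utf-8"),
                               secret.encode("utf-8"))


class PullThrottle(object):
    """Allow at most one pull every min_interval seconds (per process)."""

    def __init__(self, min_interval, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the class can be tested
        self.last_pull = None

    def allow(self):
        now = self.clock()
        if (self.last_pull is not None
                and now - self.last_pull < self.min_interval):
            return False
        self.last_pull = now
        return True
```

In the WSGI app, a request failing `has_valid_secret(environ)` could be rejected with a 403 before the payload is parsed, and `handle_commit` could skip the pull whenever `allow()` returns False. A skipped pull means the clone stays behind until the next push or scheduled pull, which is the trade-off a proper queue would avoid.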

Thanks Cartroo! I learned a lot from your example and the reasoning around it!

--Andreas