Replace single file from biopython library

Hi,

First, I love the idea of pythonanywhere... it's a great tool for learning Python and for rapid web development.

Now, my question: I need to replace a few lines of code in a BioPython library file (specifically, /Bio/GenBank/Scanner.py) to fix a known bug. (Exact bug info here: http://biopython.org/pipermail/biopython-dev/2011-October/009270.html )

The default version of BioPython installed on pythonanywhere has the bug. What is the easiest way for me to get python to use my version of Scanner.py? Should I use a virtualenv, or can I just get python to somehow substitute this one file while using the rest of the library from the default install on pythonanywhere?

thank you for your help!

@neville: Welcome to PA. I'm glad you're here...☺

I'm not familiar with BioPython, however as long as it does not require a compiler (to install) the easiest way to run your custom copy would be noted here.

Enjoy!!!

@neville -- as a2j says, if BioPython doesn't need a compiler to install then pip install --user will pull in a version that you can change. If it does (and for bioinformatics I do suspect it will) then perhaps we should patch the version we have installed by default. Do you happen to know when/if the bugfix will be rolled into an official release? I see the bug hasn't been updated for a while...

Hi all... thanks for the friendly and helpful replies. I think the problem with pip as given in the link by a2j is that it will install from some repository. I made the patch myself in a single .py file. Do either of you know how I can replace just this one file? Or replace all of BioPython with my local version?

@giles: I'm not sure why this bug has sat around for so long. But the fix definitely works and fixes my issues with that particular piece of BioPython code.

I probably just don't understand how to use my own .py libraries with the pip command, but if you guys can break it down for a newbie on how to use pip to install a local copy (I can of course upload any directory to my pythonanywhere space), I would appreciate it.

Thank you very much for your help!

There's no easy way to patch files in the system-installed package, but you could install locally with pip install --user as Giles mentions above and then it'll be installed in your own space where you can patch it. Alternatively, if the change only affects a few classes or functions, you could probably monkey patch it. This involves importing the system-installed library as well as your own code, and then simply replacing the system versions of those classes or functions at runtime with your own versions.

To give you a simple example, the code below will make sure anything using the time module always gets a fixed value instead of the current time, so code can always live in the 70's forever:

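import time

def my_time():
    # Stand-in for time.time() that ignores the clock entirely.
    return 123.0

# Rebind the attribute on the shared module object.
time.time = my_time

# Will always print "123.0"
print(str(time.time()))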

This will even apply to any other modules which call time.time(), because Python only keeps a single copy of each module loaded at once: when you rebind an attribute on it, you change it for everything in the same process. The one exception is code that ran from time import time before your patch, because it keeps its own reference to the original function.
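To make the "single copy" point concrete, here is a minimal sketch (not taken from the thread) that patches time.time, looks the module up again through sys.modules, and then restores the original. The names fixed_time and early_reference are made up for illustration.

```python
import sys
import time

# A name bound *before* the patch keeps pointing at the original function.
from time import time as early_reference


def fixed_time():
    """Illustrative replacement that always returns the same value."""
    return 123.0


original_time = time.time
time.time = fixed_time
try:
    # Every importer shares the one module object stored in sys.modules,
    # so a fresh lookup sees the patched attribute.
    shared_module = sys.modules["time"]
    patched_value = shared_module.time()

    # The early binding was copied out of the module and is unaffected.
    early_is_original = early_reference is original_time
finally:
    # Undo the patch so the rest of the program sees the real clock again.
    time.time = original_time

print(patched_value)
print(early_is_original)
```

The try/finally is only there so the sketch cleans up after itself; in a real workaround the patch would normally stay in place for the life of the process.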

This isn't always considered the cleanest coding practice, but for occasions like this it can be at least a useful workaround. Make sure you submit your fix to the package maintainers as well if you haven't already so they can fix it in the mainline and, eventually, the PA devs can pull that official fix in here to avoid you needing to do this.

@neville -- to break it down:

  • Start a Bash console
  • In it, run pip install --user biopython
  • If it reports an error, let us know here.
  • If it doesn't report an error, use the editor to edit the biopython file that needs the patch -- it will be somewhere under /home/neville/.local/lib/python2.7/site-packages
  • Save the file, then see if your code works properly.
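If you'd rather not guess at the path in the fourth step, the standard site module can report where a --user install would go. This is only a small sketch using site.getusersitepackages() and site.ENABLE_USER_SITE; the directory it prints may not exist yet if nothing has been installed there.

```python
import os
import site

# Directory that "pip install --user" targets for this interpreter.
user_site = site.getusersitepackages()

# Falsy when the user site directory is disabled or hidden, for example
# inside a virtualenv that does not expose it.
user_site_enabled = bool(site.ENABLE_USER_SITE)

print(user_site)
print("exists:", os.path.isdir(user_site))
print("enabled:", user_site_enabled)
```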

HTH

Hi Giles & everyone else,

First, I really appreciate the help. I've learned something from each post!

Giles's super newbie-friendly break down was exactly what I needed. Unfortunately, the pip install with the user option doesn't seem to work. Here's what happens when I do it:

22:34 ~ $ pip install --user biopython
Requirement already satisfied (use --upgrade to upgrade): biopython in /usr/local/lib/python2.7/site-packages
Cleaning up...
22:34 ~ $ ls -a
 .  ..  .bash_history  .bashrc  .emacs.d  .gitconfig  .pip  .profile  .pythonstartup.py  .vimrc  Dropbox  README.txt  mysite  vcrisp

So, you can see that there is no .local folder in my home directory.

And I tried the pip command in a virtualenv and this is what I got:

22:50 ~ $ cd vcrisp/
22:50 ~/vcrisp $ source ./bin/activate
(vcrisp)22:51 ~/vcrisp $ pip install --user biopython
Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
Storing complete log in /home/neville/.pip/pip.log

And in the pip.log file:

/home/neville/vcrisp/bin/pip run on Sun May 12 22:51:36 2013
Can not perform a '--user' install. User site-packages are not visible in this virtualenv.

Any ideas how to get the pip install working? As always, thank you very much for the help!

Okay, I think I've found a way around this by giving pip the -I command:

02:25 ~ $ pip install -I biopython

Unfortunately, this too fails. There is a lot of output from this (which can easily be re-created by trying it yourself) but there are likely two reasons for the failure: 1) biopython depends on numpy, which requires a compiler and so its install fails 2) biopython itself needs some compiled elements (see: http://biopython.org/DIST/docs/install/Installation.html#htoc3 ) and so it too fails.

As far as I know, I just need to replace a single python file (i.e. not compiled C code that requires gcc). Any ideas on how to do this would be greatly appreciated. I really want to use PythonAnywhere and not have to maintain my own server just to make python web apps!

For reference, the Numpy dependency probably could be worked around, but if Biopython itself requires a C compiler then you're not going to be able to (easily) get that going. For the sake of completeness I tried building a virtualenv with --system-site-packages and then inside that doing pip install -I --no-deps biopython and indeed it definitely still requires a C compiler. Bummer.

Since we already have a compiled version installed on the system, however, we can probably engage in some underhanded hackery to clone it. Try the following - first, create a virtualenv:

source virtualenvwrapper.sh
mkvirtualenv --system-site-packages biopython-local

You can use a different name from biopython-local, but if you do then replace it in all of the commands below. Whatever name you use has to be a valid filename (and I strongly suggest sticking to letters, digits, - and _).

At this point you should see your prompt has the prefix (biopython-local) to show you're inside your virtualenv, like this:

(biopython-local)10:41 ~ $

If you don't see this on your prompt, type:

workon biopython-local

If you still don't see it, STOP and ask here for advice because something's gone wrong (probably I made a typo in the instructions or something). By the way, I strongly suggest putting that first line (source virtualenvwrapper.sh) in your .bashrc otherwise you'll have to do it each login to enable access to the mkvirtualenv and workon commands.

Once we've created your virtualenv, we can manually copy the Biopython packages into it. Paste each of these commands in turn, pressing enter after each one:

cp -r /usr/local/lib/python2.7/site-packages/Bio ~/.virtualenvs/biopython-local/lib/python2.7/site-packages
cp -r /usr/local/lib/python2.7/site-packages/BioSQL ~/.virtualenvs/biopython-local/lib/python2.7/site-packages
cp -r /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7.egg-info/ ~/.virtualenvs/biopython-local/lib/python2.7/site-packages

After you do this, you should now be able to run up an interactive Python interpreter and confirm that you're importing the version of Biopython which you just copied:

(biopython-local)10:41 ~ $ python
Python 2.7.3 (default, Apr 22 2013, 12:32:55) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> print Bio.__file__
/home/neville/.virtualenvs/biopython-local/lib/python2.7/site-packages/Bio/__init__.pyc

Note that the file is in your home directory. If you see the file is in fact in /usr/local/lib somewhere then stop and ask because again, something's gone wrong.
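The same check can be scripted, which is handy once the import happens inside a web app rather than at a prompt. The sketch below is an illustration only: module_location and is_under are hypothetical helper names, and it is demonstrated on the standard-library json package so that it runs anywhere. Inside the virtualenv you would pass "Bio" and your own home directory instead.

```python
import importlib
import os


def module_location(name):
    """Import `name` and return the absolute path of the file it came from."""
    module = importlib.import_module(name)
    return os.path.abspath(module.__file__)


def is_under(path, directory):
    """Return True if `path` lies somewhere inside `directory`."""
    directory = os.path.join(os.path.abspath(directory), "")
    return os.path.abspath(path).startswith(directory)


# Demonstration with a standard-library package.
json_path = module_location("json")
stdlib_dir = os.path.dirname(os.path.dirname(json_path))

print(json_path)
print(is_under(json_path, stdlib_dir))
```

For the case in this thread, is_under(module_location("Bio"), "/usr/local/lib") coming back True would be the "stop and ask" situation described above.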

At this point you should be able to modify the files for your local Biopython install to your heart's content - they should be located under these directories:

/home/neville/.virtualenvs/biopython-local/lib/python2.7/site-packages/Bio
/home/neville/.virtualenvs/biopython-local/lib/python2.7/site-packages/BioSQL

If you want to exit your virtualenv, just run deactivate. When you want to get back inside it (after logging off, for example) then just run workon biopython-local.

Finally, you need to also activate your virtualenv when your web app runs, otherwise it'll still see the system-installed Biopython. This tutorial covers the steps required to run the latest version of Django in a web app which is very close to what you need to do, but you'll need to make some appropriate changes. Essentially the start of your web app will look like this:

activate_this = '/home/neville/.virtualenvs/biopython-local/bin/activate_this.py'
execfile(activate_this, dict(__file__=activate_this))

As you can see, it's just a case of running the script activate_this.py which will have been created for you in your virtualenv.
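One caveat for later readers: execfile only exists in Python 2. Below is a hedged sketch of the same idea that also runs on Python 3, using exec on the file's contents. The helper name run_activation_script is made up, and the demonstration executes a small stand-in script written to a temporary directory (setting a hypothetical PA_DEMO_ACTIVATED variable) so the sketch can run without a real virtualenv.

```python
import os
import tempfile


def run_activation_script(activate_this):
    """Execute a virtualenv's activate_this.py inside the current interpreter."""
    if not os.path.isfile(activate_this):
        raise IOError("no activation script at %s" % activate_this)
    with open(activate_this) as handle:
        source = handle.read()
    # Equivalent in effect to execfile(activate_this, dict(__file__=activate_this)).
    exec(compile(source, activate_this, "exec"), dict(__file__=activate_this))


# Stand-in for a real activate_this.py: it only records where it lives.
demo_dir = tempfile.mkdtemp()
demo_script = os.path.join(demo_dir, "activate_this.py")
with open(demo_script, "w") as handle:
    handle.write(
        "import os\n"
        "os.environ['PA_DEMO_ACTIVATED'] = os.path.dirname(__file__)\n"
    )

run_activation_script(demo_script)
print(os.environ["PA_DEMO_ACTIVATED"])
```

In a real web app the call would point at the activate_this.py inside your virtualenv's bin directory and would come before any import of the patched library.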

As an aside, you can read more about virtualenv on its web page - it's an extremely useful tool for creating isolated environments so you can control which versions of libraries you use and also make your own local changes like this.

Do let us know how you get on.

@neville, you don't need the --user if you're installing into a virtualenv, but it looks like BioPython needs compilation, so you won't be able to install a version that you can edit. You might be able to put the new file into your virtualenv, and then mangle your python path so that your replacement appears earlier in the search path than the real one. Then, when the module gets imported, yours will be loaded in preference to the original one.
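As a rough illustration of the search-path idea, the sketch below creates two throwaway directories that each contain a module with the same (made-up) name, pa_shadow_demo, and shows that the directory placed earlier on sys.path wins. Note that for a package such as Bio, submodules are found through the package's own directory, so the earlier location would need a copy of the whole package and not just the one changed file.

```python
import os
import sys
import tempfile


def write_module(directory, name, body):
    """Write a one-file module called `name` into `directory`."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as handle:
        handle.write(body)
    return path


# Two stand-in locations: one playing the system install, one your patched copy.
system_dir = tempfile.mkdtemp()
patched_dir = tempfile.mkdtemp()
write_module(system_dir, "pa_shadow_demo", "ORIGIN = 'system'\n")
write_module(patched_dir, "pa_shadow_demo", "ORIGIN = 'patched'\n")

sys.path.append(system_dir)      # the "real" one, later in the search path
sys.path.insert(0, patched_dir)  # the replacement, searched first

import pa_shadow_demo

loaded_origin = pa_shadow_demo.ORIGIN
print(loaded_origin)
print(pa_shadow_demo.__file__)
```

The path has to be adjusted before the first import of the module; once a module is loaded, later changes to sys.path do not reload it.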

@Cartroo: Your instructions were perfect! Problem is solved with my amended Biopython library file.... and when I "deactivate" out of the virtualenv, the problem comes right back. I love the virtualenv idea; really an elegant solution to avoid things breaking after libraries are upgraded. If I convert to a paid PA account (which if there isn't any other critical stuff that requires gcc/compilation, I will), your astute troubleshooting will be the main reason!

I haven't tried out the last part of Cartroo's post for getting the web app stuff hooking into the virtualenv. Right now, I'm using Flask as my web framework, so I guess I will have to change things from the tutorial for Flask. Hopefully, it will be pretty smooth. I will definitely report back on my progress. Thanks again to everyone for being so helpful!

Update: I got flask to work with my hacked Biopython library. The tutorial link was helpful but pip did not let me install a local copy of flask (as is done for Django in the tutorial); I think it might be because the virtualenv was created without passing --no-site-packages.

The key was to use the code that Cartroo provided to activate the virtualenv and then to run import Bio after that.

Thanks again for all the help... I think this community of helping folks is perhaps the best feature that PA has going for it (and that's impressive given how innovative and easy to use the PA system is).

Great to hear that you got it all working -- and many thanks for letting us know how, I'm sure that'll be really useful for anyone else following in your footsteps.

Great stuff, glad you're making good progress!

Assuming you were following my instructions above, since you created a virtualenv with --system-site-packages you should be able to use the system Flask installation, so you shouldn't need to install Flask in your virtualenv (unless you needed to make changes to that too). If you did need to install it, I think the --ignore-installed option you found earlier should have worked, although I haven't tried that specifically with Flask myself.

Typically it's best practice not to use --system-site-packages (the default is --no-site-packages) because it keeps your environment under tighter control: you know precisely which versions of which libraries your code depends on. However, because you needed to use Numpy but you didn't need to modify that, I figured it was easier to use --system-site-packages instead. In principle you could have cloned it into your virtualenv like we did with Biopython, but copying files around like that is a bit of a hack and it's best to restrict it to only the cases where it's really required.

I'd just like to reiterate, if this fix is generally useful then it's probably in your own interests to get it merged into the mainline project as soon as you can - most open source projects are usually happy to accept patches / pull requests from people for legitimate bugs. That way PA can update their version once it's officially released and this workaround will no longer be required.

And yes, virtualenv is fantastic - I always try and run within a virtualenv during my development process, so I don't accidentally end up depending on a system-installed library without realising it. Unfortunately on a shared service like PA it's a lot harder for the devs to make things like compilers available, so for now there are some limits on what virtualenv can do. For future reference you always want to make activating the virtualenv the very first thing you do. For command-line scripts I often use a simple wrapper script so it's activated even before the main script is executed, just to keep such dependencies outside the main script. This is rather trickier for web apps, however.

Anyway, it sounds like you've got things working which is great. I agree with your comments on the PA community - the PA devs and users are typically very helpful and the forums are turning into a great little resource. Also, by posting feedback on what is and isn't working for you, and why, you're making a valuable contribution back, so thanks for that! (^_^)

EDIT: Heh, Giles always gets there first. Curse my verbosity...