PIL jpeg decoder not available when starting script from bash console

I have a Flask web app that loads and manipulates JPEG images. To improve performance, I would like to preprocess some of these images using a cron job. While the code works fine in a Flask context, I can't get the same code to run from a bash console or through cron. The stack trace I get looks like this:

Traceback (most recent call last):
  File "teampicscron.py", line 61, in <module>
    img = img.crop((int(x_offset), int(y_offset), int(x_offset)+int(crop_width), int(y_offset)+int(crop_height)))
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 763, in crop
    self.load()
  File "/usr/local/lib/python2.7/dist-packages/PIL/ImageFile.py", line 189, in load
    d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 385, in _getdecoder
    raise IOError("decoder %s not available" % decoder_name)
IOError: decoder jpeg not available

That is weird. The web servers and console servers have exactly the same libraries and packages installed... Can you share a bit more example code, so we can try and repro?

Is either your web app or scheduled job running in a virtualenv? If so, did you create it some time ago? Some virtualenvs were broken by the Ubuntu upgrade, but that wasn't particularly recent. Even ignoring that issue, it would be useful to know if either (or both) of your tests is within a virtualenv.
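If you're not sure whether a virtualenv is involved, one rough way to check is to ask the interpreter itself. This is only a sketch: the helper name `in_virtualenv` is made up for illustration, and the check relies on `sys.real_prefix` (set by older versions of the `virtualenv` tool) and `sys.base_prefix` (set by Python 3.3+), so it is a heuristic rather than a guarantee. It is written to run on both Python 2.7 and Python 3.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to be running in a virtualenv."""
    # Older releases of the `virtualenv` tool set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # On Python 3.3+, sys.base_prefix differs from sys.prefix inside a
    # virtual environment. Python 2 has no base_prefix, so fall back to
    # sys.prefix there.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("executable: %s" % sys.executable)
print("prefix:     %s" % sys.prefix)
print("virtualenv: %s" % in_virtualenv())
```

Running the same file once from the bash console and once from inside the web app (for example, returning the values from a temporary view) should show whether the two are using different interpreters.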

Here's the Flask code:

from flask import Flask, request, json, Response, send_file
from functools import wraps
import requests
import urllib
from PIL import Image
import ImageDraw, ImageFont
import StringIO
from os import listdir
from os.path import isfile, join
import random
import sys, string, xmlrpclib, re

@app.route('/images', methods=['GET'])
def images():

   server = xmlrpclib.ServerProxy('https://xxxxx.example.com/rpc/xmlrpc')
   token = server.confluence1.login('xxxxx', 'xxxxxxxxxxx')
   page = server.confluence2.getPage(token, 'xxxxxx', 'Team Pics')
   attachments = server.confluence1.getAttachments(token, page['id'])
   attachment = random.choice(attachments)
   data = server.confluence1.getAttachmentData(token, page['id'], attachment['fileName'], '0')
   file = StringIO.StringIO(data)
   img = Image.open(file)

   teampic = attachment['fileName']
   src_width, src_height = img.size
   src_ratio = float(src_width) / float(src_height)
   dst_width = 170
   dst_height = 220
   dst_ratio = float(dst_width) / float(dst_height)

   if dst_ratio < src_ratio:
       crop_height = src_height
       crop_width = crop_height * dst_ratio
       x_offset = float(src_width - crop_width) / 2
       y_offset = 0
   else:
       crop_width = src_width
       crop_height = crop_width / dst_ratio
       x_offset = 0
       y_offset = float(src_height - crop_height) / 3
   img = img.crop((int(x_offset), int(y_offset), int(x_offset)+int(crop_width), int(y_offset)+int(crop_height)))
   img = img.resize((int(dst_width), int(dst_height)), Image.ANTIALIAS)
   img = grayscale_image(img) # convert image to black and white

Here's the code I try to run through bash:

from functools import wraps
import requests
import urllib
from PIL import Image
from PIL import ImageDraw, ImageFont
import StringIO
from os import listdir
from os.path import isfile, join
import random
import sys, string, xmlrpclib, re

server = xmlrpclib.ServerProxy('https://xxxxxxxx.xxxx.com/rpc/xmlrpc')
token = server.confluence1.login('xxxxx', 'xxxxxxxxx')
page = server.confluence2.getPage(token, 'xxxxx', 'Team Pics')
attachments = server.confluence1.getAttachments(token, page['id'])

for attachment in attachments:
   print "load "+ attachment['fileName']
   data = server.confluence1.getAttachmentData(token, page['id'], attachment['fileName'], '0')
   file = StringIO.StringIO(data)
   img = Image.open(file)

   teampic = attachment['fileName']

   print "proc "+ attachment['fileName']
   src_width, src_height = img.size
   src_ratio = float(src_width) / float(src_height)
   dst_width = 170
   dst_height = 220
   dst_ratio = float(dst_width) / float(dst_height)

   if dst_ratio < src_ratio:
       crop_height = src_height
       crop_width = crop_height * dst_ratio
       x_offset = float(src_width - crop_width) / 2
       y_offset = 0
   else:
       crop_width = src_width
       crop_height = crop_width / dst_ratio
       x_offset = 0
       y_offset = float(src_height - crop_height) / 3
   img = img.crop((int(x_offset), int(y_offset), int(x_offset)+int(crop_width), int(y_offset)+int(crop_height)))   #HERE IT BREAKS
   img = img.resize((int(dst_width), int(dst_height)), Image.ANTIALIAS)
   img = grayscale_image(img) # convert image to black and white
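As an aside, both versions repeat the same crop arithmetic, so it could be pulled into one function shared by the Flask view and the cron script. The sketch below assumes nothing beyond the numbers already in the code above; the name `compute_crop_box` is made up for illustration. It has no PIL dependency, which also makes it easy to check separately from the decoder problem.

```python
def compute_crop_box(src_width, src_height, dst_width=170, dst_height=220):
    """Return a (left, upper, right, lower) box with the target aspect ratio."""
    src_ratio = float(src_width) / float(src_height)
    dst_ratio = float(dst_width) / float(dst_height)

    if dst_ratio < src_ratio:
        # Source is too wide: keep the full height, trim both sides equally.
        crop_height = src_height
        crop_width = crop_height * dst_ratio
        x_offset = float(src_width - crop_width) / 2
        y_offset = 0
    else:
        # Source is too tall: keep the full width and take one third of the
        # excess from the top, as the original code does.
        crop_width = src_width
        crop_height = crop_width / dst_ratio
        x_offset = 0
        y_offset = float(src_height - crop_height) / 3

    left, upper = int(x_offset), int(y_offset)
    return (left, upper, left + int(crop_width), upper + int(crop_height))


print(compute_crop_box(400, 600))
print(compute_crop_box(1000, 500))
```

The result can be passed straight to `img.crop(...)`, followed by the same `resize` call as before.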

I'm not aware of running in a virtualenv. To be honest, I'm not even really sure what that is...

OK, I can't use that code to try and reproduce the error, because I don't have access to that XMLRPC thing. Can you simplify it down and try and create a "minimal repro"?

Ideally, just 2 or 3 lines of code that try and load a local image file and call the problematic img.crop() function?

If you're feeling really curious, you could do the same inside your Flask app: create a new view and URL just to try and repro the problem with those same 3 lines of code...
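Something along these lines would do as a starting point. It is only a sketch: `try_crop` is a made-up helper and `/tmp/test.jpg` is a placeholder for any JPEG you have on disk. It returns the error text instead of raising, so the same function can be called from a script and from a Flask view and the two results compared.

```python
def try_crop(path, box=(0, 0, 400, 600)):
    """Open a local image and crop it; return None on success, else the error."""
    try:
        from PIL import Image
    except ImportError as exc:
        return "PIL import failed: %s" % exc
    try:
        img = Image.open(path)
        # Cropping forces the pixel data to be decoded, which is the step
        # where "decoder jpeg not available" was raised in the traceback.
        img.crop(box).load()
    except (IOError, OSError) as exc:
        return "%s: %s" % (type(exc).__name__, exc)
    return None


# Placeholder path: point this at a real JPEG file on your account.
print(try_crop("/tmp/test.jpg"))
```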

Here's a simplified version that also breaks when invoked through bash:

import urllib
from PIL import Image
import StringIO

file = StringIO.StringIO(urllib.urlopen('http://www3.nd.edu/~networks/HumanDynamics_20Oct05/images/Einstein_1_JPEG.jpg').read())
img = Image.open(file)
img = img.crop((0, 0, 400, 600))   #HERE IT BREAKS

I get this error, even before the "img.crop":

16:55 ~ $ python /tmp/piltestting.py
Traceback (most recent call last):
  File "/tmp/piltestting.py", line 6, in <module>
    img = Image.open(file)
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")
IOError: cannot identify image file

Same on a local Windows PC... Is there something wrong with the source image file?

I'm puzzled. An hour ago I got a different error message than I get now:

16:04 ~/crons $ python teampicscrontest.py 
Traceback (most recent call last):
  File "teampicscrontest.py", line 8, in <module>
    img = img.crop((0, 0, 400, 600))   #HERE IT BREAKS
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 763, in crop
    self.load()
  File "/usr/local/lib/python2.7/dist-packages/PIL/ImageFile.py", line 189, in load
    d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 385, in _getdecoder
    raise IOError("decoder %s not available" % decoder_name)
IOError: decoder jpeg not available
16:05 ~/crons $ 
17:19 ~/crons $ python teampicscrontest.py 
Traceback (most recent call last):
  File "teampicscrontest.py", line 6, in <module>
    img = Image.open(file)
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")
IOError: cannot identify image file
17:19 ~/crons

Did you change anything? I have to re-check how this example behaves in a Flask environment. It's still kind of strange indeed. The image file looks o.k. as far as I can tell but we can check with other images of course.

Checked the source in a Flask web app again:

from flask import Flask, request, json, Response, send_file
from functools import wraps
import requests
import urllib
from PIL import Image
import ImageDraw, ImageFont
import StringIO
from os import listdir
from os.path import isfile, join
import random
import sys, string, xmlrpclib, re

app = Flask(__name__)

@app.route('/test', methods=['GET'])
def test():

    file = StringIO.StringIO(urllib.urlopen('http://www3.nd.edu/~networks/HumanDynamics_20Oct05/images/Einstein_1_JPEG.jpg').read())
    img = Image.open(file)
    img = img.crop((0, 0, 400, 600))

    return "OK"

It seems to work flawlessly: http://npohle.pythonanywhere.com/test

Another check to demonstrate that it works in a Flask context: http://npohle.pythonanywhere.com/test2

from flask import Flask, request, json, Response, send_file
from functools import wraps
import requests
import urllib
from PIL import Image
import ImageDraw, ImageFont
import StringIO
from os import listdir
from os.path import isfile, join
import random
import sys, string, xmlrpclib, re

def serve_pil_image(pil_img):
    img_io = StringIO.StringIO()
    #pil_img.save(img_io, 'JPEG', quality=100)
    pil_img.save(img_io, 'PNG', quality=100)
    img_io.seek(0)
    return send_file(img_io, mimetype='image/png')

app = Flask(__name__)

@app.route('/test2', methods=['GET'])
def test2():

    file = StringIO.StringIO(urllib.urlopen('http://www3.nd.edu/~networks/HumanDynamics_20Oct05/images/Einstein_1_JPEG.jpg').read())
    img = Image.open(file)
    img = img.crop((0, 0, 400, 600))

    return serve_pil_image(img)

I just ran the simplified version above in my PA account and it works fine with no error. Does it work for you too, now?

If not, it appears to be either user-dependent (perhaps we're on different servers?) or perhaps you're running in a virtualenv?

@Cartroo: Thanks for trying. Did you actually use the Flask code, or did you invoke the code from the shell as demonstrated in my post from July 22, 2013, 4:06 p.m.? Just to clarify: the code works for me when invoked as a Flask app, but not when invoked as a standalone Python script.

I don't think I'm using virtualenv, unless Flask is applying some magic here. An obvious difference is that the Flask Code gets compiled into a pyc file and is then loaded as a Web App by PA. Not sure what happens behind the curtains.

Yes, your code from July 22 2013 4:06pm works fine for me in a bash prompt. For the avoidance of doubt, here is a transcript:

16:51 ~/t $ cat > test.py
import urllib
from PIL import Image
import StringIO

file = StringIO.StringIO(urllib.urlopen('http://www3.nd.edu/~networks/HumanDynamics_20Oct05/images/Einstein_1_JPEG.jpg').read())
img = Image.open(file)
img = img.crop((0, 0, 400, 600))   #HERE IT BREAKS
16:51 ~/t $ 
16:51 ~/t $ python ./test.py
16:51 ~/t $

As you can see, there's no output so presumably the code is working correctly.

I don't think that the PA system will be doing any magic which is likely to affect this, and it shouldn't be running anything in a virtualenv unless you've deliberately set it up that way.

So there must be something else about the environment which is causing issues. Just for a sanity check, could you fire up a Python interpreter from the bash prompt and run the following two commands in it and paste the output:

import PIL
print PIL.__file__
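If it's easier to run as a file (so that the same check can be repeated from cron or from the web app), a sketch like the following reports the same information. The helper name `module_location` is made up for illustration; `importlib.import_module` is available on Python 2.7 and Python 3.

```python
import importlib
import sys


def module_location(name):
    """Return the file a module is loaded from, or None if it can't be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


print("python     %s" % sys.executable)
for name in ("PIL", "PIL.Image"):
    print("%-10s %s" % (name, module_location(name)))
```

If the paths differ between the bash console and the web app, the two are not importing the same copy of PIL.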

Interesting. I do agree that this most likely has something to do with the specific account / environment then. Here's the output from the 'sanity check':

18:43 ~/crons $ python
Python 2.7.4 (default, Apr 19 2013, 18:28:01) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL
>>> PIL.__file__
'/usr/local/lib/python2.7/dist-packages/PIL/__init__.py'
>>> exit()
18:44 ~/crons $

Hm, so you're importing the same version as me, then. Can you please check if the following files are available on your installation - they should both be there:

/usr/local/lib/python2.7/dist-packages/PIL/_imaging.so
/usr/lib/x86_64-linux-gnu/libjpeg.so

Also, in a Python shell, could you please run these commands and paste the output:

from PIL import Image
Image.core
Image.core.jpeg_decoder
Image._getdecoder("RGB", "jpeg", ("RGB", None))

For reference, I'd expect the following:

>>> from PIL import Image
>>> Image.core
<module 'PIL._imaging' from '/usr/local/lib/python2.7/dist-packages/PIL/_imaging.so'>
>>> Image.core.jpeg_decoder
<built-in function jpeg_decoder>
>>> Image._getdecoder("RGB", "jpeg", ("RGB", None))
<ImagingDecoder object at 0x7f6359442110>

If Image.core isn't defined then it points to a failure to import _imaging.so. If Image.core.jpeg_decoder isn't defined then it would appear that somehow your version of PIL was compiled without JPEG support (it appears to be just a compile-time condition). If both of those are defined but the last function still returns an error then something really odd is going on...
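The three checks can also be wrapped in one function that names the first stage that fails. This is a sketch only: `diagnose_jpeg_decoder` is a made-up name, and `Image._getdecoder` is a private PIL function whose behaviour may differ between versions, which is why that call is wrapped in a broad `except`.

```python
def diagnose_jpeg_decoder():
    """Return "ok" or a short description of the first check that fails."""
    try:
        from PIL import Image
    except ImportError as exc:
        return "PIL not importable: %s" % exc

    core = getattr(Image, "core", None)
    if core is None:
        return "Image.core missing (failed to import _imaging?)"

    try:
        # Some PIL versions install a stub as Image.core that raises
        # ImportError on attribute access, so catch that as well.
        core.jpeg_decoder
    except (AttributeError, ImportError) as exc:
        return "Image.core.jpeg_decoder unavailable: %s" % exc

    try:
        Image._getdecoder("RGB", "jpeg", ("RGB", None))
    except Exception as exc:
        return "decoder lookup failed: %s" % exc

    return "ok"


print(diagnose_jpeg_decoder())
```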

Apologies for making you jump through hoops like this, but it's really puzzling that we appear to be in different environments and so we need to try and get to the bottom of how they differ before we can be confident that we can fix the problem.

This is definitely weird. Everyone should have the same environment -- in fact, there's nothing in the system that could allow different consoles to differ, unless there are virtualenvs or some other way of having different Python modules installed.

Hmm, perhaps it's the Python path? @npohle, could you run Python and then

import sys
print sys.path

...?

I wondered about sys.path which is why I suggested printing PIL.__file__, but that seemed to be correct. I don't think sys.path would affect the location that PIL would search for libjpeg - my understanding from a brief look at the code is that it's a compile-time option and it uses the standard dynamic library dependency machinery to load it (rather than using dlopen() or similar) so I wouldn't expect any Python settings to have any effect. Since the error doesn't arise loading PIL, but instead whilst searching for the JPEG decoder, it didn't seem like a Python search path issue (unless it's failing to load _imaging.so, but I'd rather expect a different error in that case). I may well be missing something obvious, however!
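One rough way to see whether the dynamic loader can find libjpeg at all from a given console is to probe it with `ctypes`. Treat this as a hint rather than proof: `ctypes.util.find_library` uses its own search (for example `ldconfig` on Linux), which is not necessarily the same lookup the loader performs for the dependencies of `_imaging.so`. The helper name `probe_shared_library` is made up for illustration.

```python
import ctypes
import ctypes.util


def probe_shared_library(name):
    """Return (resolved_name, loadable) for a shared library such as "jpeg"."""
    found = ctypes.util.find_library(name)
    if found is None:
        return (None, False)
    try:
        # Ask the dynamic loader to actually load the library.
        ctypes.CDLL(found)
    except OSError:
        return (found, False)
    return (found, True)


print("libjpeg: %r loadable=%s" % probe_shared_library("jpeg"))
```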

I suppose it might be also worth double-checking that the LD_LIBRARY_PATH environment variable isn't being set to something weird.

Right, good point -- LD_LIBRARY_PATH could be the cause. Or perhaps there's some other dependency of PIL -- perhaps even a Python one? So I guess we need both...
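To collect both in one go, something like this could be run from the bash console, from cron, and from a temporary Flask view, and the outputs compared. It is a sketch under assumptions: `environment_snapshot` and `differing_keys` are made-up names, and the set of values recorded is just a guess at what might matter here.

```python
import json
import os
import sys


def environment_snapshot():
    """Collect the interpreter and environment settings that might differ."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "sys.path": [str(entry) for entry in sys.path],
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
    }


def differing_keys(first, second):
    """Return the sorted keys whose values differ between two snapshots."""
    keys = set(first) | set(second)
    return sorted(key for key in keys if first.get(key) != second.get(key))


print(json.dumps(environment_snapshot(), indent=2, sort_keys=True))
```

Saving the JSON from each context and loading the two files back into `differing_keys` would show at a glance which settings are not the same.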

No worries. I'm glad you are following up :-) My output (see below) looks very much like yours. That's probably a bad thing, as it doesn't reveal any additional insights... Maybe it's easier to ask for a brand new account? I don't have that much code, so transferring everything wouldn't be a big deal.

Having said that, it's an interesting problem and if you want me to try some more stuff I'm happy to do so. It's not a particularly urgent issue. I'm using PA only for nice-to-have stuff anyway.

21:38 ~ $ ls /usr/local/lib/python2.7/dist-packages/PIL
PIL  PIL-1.1.7-py2.7.egg-info  _imaging.so  _imagingft.so  _imagingmath.so
21:38 ~ $ ls /usr/lib/x86_64-linux-gnu/libjpeg.so
/usr/lib/x86_64-linux-gnu/libjpeg.so
21:38 ~ $ python
Python 2.7.4 (default, Apr 19 2013, 18:28:01) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
>>> Image.core
<module '_imaging' from '/usr/local/lib/python2.7/dist-packages/PIL/_imaging.so'>
>>> Image.core.jpeg_decoder
<built-in function jpeg_decoder>
>>> Image._getdecoder("RGB", "jpeg", ("RGB", None))
<ImagingDecoder object at 0x7f18ceb281f0>
>>>

Oh... and by the way: The issue apparently was secretly solved in the meantime. The original test code from July 22 2013 4:06pm runs flawlessly when invoked through bash now.

Actually I missed your last posts @Cartroo and @giles (didn't update the tab). Looks like you hit the mark with LD_LIBRARY_PATH and somebody fixed it?

Hm, I doubt LD_LIBRARY_PATH would be set system-wide - I was mentioning it in case you'd set it yourself for some reason.

Glad it's now working for you but I've no idea why - perhaps one of the PA servers had got into a half-installed state and needed refreshing or something. Perhaps the PA devs can shed some light... Or perhaps we'll never know! As long as it doesn't break again, I guess all's well that ends well.

@npohle -- thanks for letting us know. That's all really strange, but I'm glad it's sorted now. The only explanation I can think of is that there was a bug in our last release, someone on the team here patched it on some but not all of our servers and also on our development systems, and then yesterday's release updated all of the servers with the fixed version. And that would be a great explanation if anyone here remembered doing anything like that...

Anyway, we'll keep an eye out for anything related to this in the future, and if it happens again then please do tell us.

Whenever I have a problem I can't explain on remote servers, I blame NSA hackers. That's probably why there's always an unmarked van full of men in black suits and sunglasses following me...

That must be it!