Print to terminal or file: UnicodeEncodeError

I'm trying to print a Unicode string to standard output (an SSH terminal) from a management command in a Django (Python 3.4) app, and I get:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 10: ordinal not in range(128)

The same thing happens when the output is redirected to a file (in my case, when I run the management command from a script triggered by a scheduled task and the output goes to a log file).

I opted for a cheap workaround (replacing every non-ASCII character with a question mark):

print(unicode_string.encode('ascii', 'replace'))
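
A minimal sketch of what this workaround does in Python 3, assuming a hypothetical sample string in place of unicode_string. Because encode() returns bytes, print() shows the b'...' literal; decoding back to str (the helper name to_ascii_text is made up for illustration) gives plain text with the same replacements:

```python
def to_ascii_text(text):
    """Return text with every non-ASCII character replaced by '?'."""
    # encode() yields bytes; decode() turns them back into a str so that
    # print() shows the text itself rather than a b'...' literal.
    return text.encode('ascii', 'replace').decode('ascii')


sample = 'caf\xe9 \u2019'  # hypothetical stand-in for unicode_string

print(sample.encode('ascii', 'replace'))  # b'caf? ?'
print(to_ascii_text(sample))              # caf? ?
```

Either form is lossy: the original characters cannot be recovered from the output.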


There is the following environment variable:

LANG=en_US.UTF-8

Also, sys.getdefaultencoding() gives utf-8.

And the terminal correctly prints Unicode characters from another source (curl http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt).

For that to happen in Python 3, my guess is that you're not printing a string, you're printing a byte stream. The problem is not with the terminal or the file you're printing to, it's that your Python code is trying to print a byte stream without knowing enough about it to properly encode it.

I've tried this:

print(unicode_string.encode('utf-8').decode('utf-8'))

to make sure I print a UTF-8 string, but I still have the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 23: ordinal not in range(128)

I guess it is also worth noting that the following works on my local terminal with Unicode characters:

print(unicode_string)
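
One way to see where such an error can come from, as a sketch rather than a diagnosis of this particular setup: a text stream encodes every str it is given using the stream's own encoding, so the same string succeeds on a UTF-8 stream and fails on an ASCII one. The helper write_with_encoding below is hypothetical and uses an in-memory buffer in place of the real terminal or log file:

```python
import io


def write_with_encoding(text, encoding):
    """Write text through a text stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # the str is encoded here, using the stream's encoding
    stream.flush()
    return raw.getvalue()


print(write_with_encoding('it\u2019s', 'utf-8'))  # b'it\xe2\x80\x99s'

try:
    write_with_encoding('it\u2019s', 'ascii')
except UnicodeEncodeError as exc:
    # Same kind of error as in the traceback above.
    print(exc)
```

Round-tripping the string through encode('utf-8').decode('utf-8') does not change it, so it cannot affect how the stream encodes it.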

@ArnaudRenaud, that's a bit strange -- if you try just running

python3.4 -c "print('\u2019'.encode('utf-8'))"

from a Bash console, you'll see it works fine. So regular python3.4 in a Bash console is OK. But somehow Django is overriding the encoding and forcing it to be ASCII? It's very odd...

What happens if you open a Bash console, cd into your webapp, activating your virtualenv if required, and then run

python3.4 manage.py shell
>>> print('\u2019'.encode('utf-8'))

?

Whether I type: python3.4 -c "print('\u2019'.encode('utf-8'))" in the bash shell

or: print('\u2019'.encode('utf-8')) in the Django (manage.py) shell,

the outcome is the same:

b'\xe2\x80\x99'

I wonder if the scheduled tasks run with slightly different settings from the Bash consoles? Try changing your line in the task to:

print(unicode_string.encode('utf-8'))

The following:

print(unicode_string.encode('utf-8'))

does not crash, but the non-ASCII characters are printed as escaped bytes: \xe2\x80\x99.
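
A short sketch of why the escapes appear: in Python 3, encode() returns a bytes object, and print() displays bytes using their literal form, which contains only ASCII characters. The three escaped bytes are the UTF-8 encoding of the single character U+2019:

```python
data = '\u2019'.encode('utf-8')

# print() shows a bytes object as its literal (repr) form, which is pure ASCII.
print(repr(data))                        # b'\xe2\x80\x99'
print(len(data))                         # 3
print(data.decode('utf-8') == '\u2019')  # True
```

That is why this variant never raises UnicodeEncodeError, and also why it never shows the character itself.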

Hmm, OK, let me look into it.

OK, here's a workaround. If you set the PYTHONIOENCODING environment variable before running your Python script, sys.stdout will use that encoding instead of ASCII.

so, if your script was:

python3.4 -c "import sys; print(sys.stdout.encoding)"

change it to

PYTHONIOENCODING='utf8' python3.4 -c "import sys; print(sys.stdout.encoding)"
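
The same check can be sketched from Python by starting a child interpreter with and without the variable. The helper child_stdout_encoding is hypothetical; it uses the current interpreter (sys.executable) rather than python3.4 specifically, and the child's stdout is a pipe, much like output redirected to a log file:

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a child interpreter and return the sys.stdout.encoding it reports."""
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)  # start from a known state
    env.update(extra_env)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    return out.decode('ascii').strip()


name = child_stdout_encoding({'PYTHONIOENCODING': 'utf8'})
print(codecs.lookup(name).name)   # utf-8

# Without the variable the result depends on the locale and Python version.
print(child_stdout_encoding({}))
```

codecs.lookup() is used only to normalize the spelling, since 'utf8' and 'utf-8' name the same codec.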

Setting the environment variable in the scheduled shell script before calling the Python script has no effect...

By the way, when I set the environment variable from the shell before running python3.4 -c "import sys; print(sys.stdout.encoding)", it still prints out the following instead of UTF-8: ANSI_X3.4-1968. Is this normal?

Are you sure? In my tests, setting PYTHONIOENCODING definitely changes sys.stdout.encoding. See the screenshot: the task at 14:44 prints ANSI_X3.4-1968 and the one at 14:46 prints utf8.

[Screenshot: scheduled tasks page]

How are you setting your environment variables? Could there be a typo?

Thank you Harry, everything is working now (I had omitted to put PYTHONIOENCODING and the subsequent command on the same line...)!
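
A sketch of why the line break matters, assuming a POSIX sh is available; the actual scheduled script is not shown in this thread, so the script strings below are hypothetical. A bare NAME=value on its own line only sets a shell variable, which child processes do not inherit unless it is exported; NAME=value command on one line puts it in that command's environment:

```python
import os
import shlex
import subprocess
import sys

# Child command that reports whether it received the variable.
CHILD = shlex.quote(sys.executable) + ' -c ' + shlex.quote(
    "import os; print(os.environ.get('PYTHONIOENCODING'))")


def run_script(script):
    """Run a small sh script with PYTHONIOENCODING initially unset; return its output."""
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)
    out = subprocess.check_output(['sh', '-c', script], env=env)
    return out.decode('ascii').strip()


print(run_script('PYTHONIOENCODING=utf8\n' + CHILD))         # None
print(run_script('PYTHONIOENCODING=utf8 ' + CHILD))          # utf8
print(run_script('export PYTHONIOENCODING=utf8\n' + CHILD))  # utf8
```

The export form is an alternative when the variable should apply to every command that follows in the script.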