
Scheduled task error

When I run my script with "Save and Run" (using a hashbang for Python 3.3), it executes perfectly. When I try to run it from the task scheduler, the log shows this traceback:

Traceback (most recent call last):
  File "bitlaundry.py", line 139, in <module>
    soup = BeautifulSoup(open("1.html"))
  File "/home/m3ta/.local/lib/python3.3/site-packages/bs4/__init__.py", line 166, in __init__
    markup = markup.read()
  File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 3604: ordinal not in range(128)

What's going on?

That's strange. Could it be that the environment differs between scheduled tasks and consoles, so the default language and encoding settings aren't the same? Perhaps try

print(os.environ['LANG'])

in both?
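A slightly fuller version of that check is sketched below. It prints the locale-related variables along with the encodings Python actually picked, and it uses os.environ.get, so an unset variable shows up as None instead of raising KeyError. Running it once from a console and once as a scheduled task and comparing the two outputs should show whether the environments differ.

```python
import locale
import os
import sys


def encoding_report():
    """Collect locale variables and the encodings this interpreter is using."""
    report = {}
    # .get() returns None for unset variables instead of raising KeyError
    for name in ("LANG", "LC_ALL", "LC_CTYPE"):
        report[name] = os.environ.get(name)
    # Default codec used by open() in text mode when no encoding= is given
    report["preferred_encoding"] = locale.getpreferredencoding(False)
    # Codec used for file names and command-line arguments
    report["filesystem_encoding"] = sys.getfilesystemencoding()
    return report


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("{}: {}".format(key, value))
```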

Also, take a look at the source code of '/home/m3ta/.local/lib/python3.3/site-packages/bs4/__init__.py'. Does it explicitly decide which codec to use?
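For what it's worth, the traceback suggests the decoding happens before bs4 makes any choice of its own. bs4 calls markup.read() on the file object, and that file object was opened by open("1.html") with no encoding argument, so it uses the locale's default codec, which in the scheduled task is ascii. The sketch below reproduces the same kind of error with only the standard library. The sample markup is made up, chosen so that one byte is 0xe6, as in the traceback.

```python
def decode_with(raw, codec):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Hypothetical sample markup; "日" is encoded in UTF-8 as the bytes e6 97 a5
sample = "<p>日本</p>".encode("utf-8")

if __name__ == "__main__":
    for codec in ("ascii", "utf-8"):
        text, error = decode_with(sample, codec)
        print(codec, "->", text if error is None else error)
```

The ascii attempt fails with "'ascii' codec can't decode byte 0xe6 in position 3", while the utf-8 attempt returns the original markup.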

Now I'm getting this traceback when I import os and add print(os.environ['LANG']). It happens both in the scheduler and in the 3.3 interpreter:

Traceback (most recent call last):
  File "/home/m3ta/bitlaundry.py", line 13, in <module>
    print(os.environ['LANG'])
  File "/usr/lib/python3.3/os.py", line 672, in __getitem__
    value = self._data[self.encodekey(key)]
KeyError: b'LANG'

No idea what's going on here. Do you support cron or any other method of scheduling scripts? I need a quick solution if possible. Cheers!

When I checked in my Bash console just now, the LANG environment variable wasn't set at all. I wonder if this changed as part of the recent upgrade to Ubuntu?

I'd expect the system to set this to something default (e.g. en_GB.UTF-8), but this doesn't seem to be happening for some reason. You can always add the following line to ~/.bashrc:

export LANG="en_GB.UTF-8"

Replace the value with the appropriate locale. I've read conflicting reports about whether LANG or LC_ALL is the one to set, but I've always used LANG. However, I'm not sure scheduled scripts run in an environment where your .bashrc has been executed, so for now you may also have to add the value to os.environ at the start of your script.

Note to devs: the system should probably be setting a sensible default for this, but locales are a tricky subject and I gather that Ubuntu's handling of them isn't necessarily the most elegant. You might need to re-run update-locale or locale-gen after configuring something suitable in /etc/default/locale or somewhere else. I'd be more specific, but I couldn't claim to be an expert. Ubuntu has some scanty documentation on the subject, but it's not exactly detailed.

If you set LANG to en_US.UTF-8, does that solve the problem? It's certainly something that should already be set in the environment.

import os
os.environ['LANG'] = 'en_US.UTF-8'

We'll investigate why it's different for tasks vs consoles.
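A quick way to check what setting LANG from inside a script actually changes: a running interpreter has generally already picked its encodings at startup (sys.getfilesystemencoding(), for instance, doesn't change afterwards), so editing os.environ mainly affects processes the script launches later. The sketch below sets LANG only if it's missing (en_GB.UTF-8 is an assumed value) and confirms that a child interpreter sees it. That would be consistent with the later reports in this thread that setting LANG inside the script didn't help.

```python
import os
import subprocess
import sys

# Assumption: en_GB.UTF-8 is the wanted locale; only applied if LANG is unset
os.environ.setdefault("LANG", "en_GB.UTF-8")


def child_sees(name):
    """Start a fresh interpreter and return the value of env var `name` it received."""
    code = "import os, sys; sys.stdout.write(os.environ.get({!r}, ''))".format(name)
    # check_output passes os.environ (including the change above) to the child
    return subprocess.check_output([sys.executable, "-c", code]).decode("utf-8")


if __name__ == "__main__":
    print("LANG in this process:", os.environ["LANG"])
    print("LANG seen by a child process:", child_sees("LANG"))
```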

I found that LANG wasn't set in consoles, although I haven't tried a scheduled task; I'm not sure whether this changed with the Ubuntu upgrade. If it's any help, the only locale-related variable I can see set is LC_CTYPE, which is en_US.UTF-8.

That's definitely strange. We'll look into it.

I've been having a brief poke around my own Ubuntu machines. On my work machine, it appears to be set from /etc/default/locale which itself appears to be referenced in the PAM configuration (via pam_env.so).

On my personal machine, I thought LANG was still being set even though that file doesn't exist. It turns out, however, that this was only because the machine I'd originally SSHed in from had it set, and it's one of the whitelisted environment variables that SSH passes through. Since I then started a tmux session at the other end, the value persisted even when I logged in from other machines.

So it appears that perhaps Ubuntu doesn't set it by default after all - apologies if my comments above confused the issue. If that's the case, you should be able to create a file /etc/default/locale something like this to ensure it's set:

LANG="en_GB.UTF-8"
LANGUAGE="en_GB:en"

So perhaps it's not safe for code to rely on this being set after all, if default Ubuntu doesn't set it. All the same, it might be an easy fix for the sake of saving people from these sorts of issues.

Interesting. We did have to add some stuff to get this working when we switched to Ubuntu, but obviously we missed an (important!) case. I suspect it will be set if you SSH into PythonAnywhere, anyway.

We'll see if we can add it to consoles and tasks in the next release.

So what's the story here? Is there another way to schedule tasks?

I don't understand the advice about creating a file in /etc/.

The discussion about creating files in /etc was aimed at the PA devs; users don't have permission to mess around in there. Sorry for confusing the issue.

Try what Hansel said and set the environment variable inside your Python program:

import os
os.environ["LANG"] = "en_GB.UTF-8"
# rest of your code goes here

Tried that, getting the same UnicodeDecodeError as before.

Well, if BeautifulSoup is being weird, maybe you can try reading the file yourself. Then you'll have full control over which codec is used to open it, something like this:

from bs4 import BeautifulSoup
with open("1.html", encoding="utf8") as f:
    html_string = f.read()
soup = BeautifulSoup(html_string)
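A variant of the same idea is sketched below: read the raw bytes and decode them explicitly, so the result never depends on the locale of whatever process runs the script. The helper name read_text and the optional errors="replace" fallback are illustrative choices, not anything bs4 requires; the returned string can be passed to BeautifulSoup exactly as above.

```python
import os
import tempfile


def read_text(path, encoding="utf-8", errors="strict"):
    """Read a file as bytes and decode it explicitly, bypassing the locale default."""
    with open(path, "rb") as f:
        raw = f.read()
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
    return raw.decode(encoding, errors)


if __name__ == "__main__":
    # Hypothetical demo file standing in for 1.html
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as f:
        f.write("<p>日本</p>".encode("utf-8"))
    try:
        print(read_text(path))
    finally:
        os.remove(path)
```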

Still getting the same traceback.

If you're doing exactly what Harry suggested above, the traceback should at least be slightly different. Could you please paste the exact traceback here? There may be important clues in how it differs from before.

EDIT: Also, if you're happy to share the HTML file you're parsing, we could try and reproduce the failure.

Thanks, Cartroo -- just a quick note to say +1 for the exact traceback. It's likely to help us a lot in resolving this problem.

Alternatively, perhaps we could take a look at your source code? No need to post it here, if you give your permission we can log in and take a look from our side.

Hey, guys! I'm experiencing the same issue. When I run my script through a console, it's OK, but it gives an "ascii" encoding error when run through scheduled tasks. I run it with "python3 /home/vanderloos/testrun.py". One difference I noticed is that sys.getfilesystemencoding() returns different values: UTF-8 in the console and "ascii" in the scheduled task. Setting os.environ["LANG"] does not help. Let me know if you need further details to reproduce it.

We'll take a look, but we may not be able to fix it in short order... In the meantime, have you got a specific error you need to fix? You should be able to manually specify an encoding for any I/O operations...
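To illustrate specifying encodings manually, the sketch below names the encoding explicitly when reading and writing files, and wraps a binary stream so that any text written to it is UTF-8. Swapping sys.stdout for such a wrapper is one option for print() output, assuming the task log can store UTF-8; it works around the environment rather than fixing it. The helper names are hypothetical.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so any text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


def copy_text_file(src, dst):
    """Copy a text file, naming the encoding on both ends instead of using the locale default."""
    with open(src, encoding="utf-8") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(text)


if __name__ == "__main__":
    # Assumption: the task log accepts UTF-8. detach() hands stdout's underlying
    # binary buffer to the new wrapper, so print() no longer uses the locale codec.
    sys.stdout = utf8_writer(sys.stdout.detach())
    print("Encoding-safe output: 日本")
```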

I couldn't change the default encoding settings from inside Python (I searched online, but none of the approaches worked, maybe because of my scripts, maybe because of the environment settings). In the end I called my script in Scheduled Tasks with LC_CTYPE specified, and that finally worked:

LC_CTYPE=uk_UA.utf-8 python3 /home/vanderloos/testrun.py
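If a scheduler only lets you point at a single Python file, a small launcher could do the same job as that command line: start the real script in a fresh interpreter with LC_CTYPE already set, so the child picks its encodings up at startup. This is a sketch under assumptions; the target path is taken from the post above, en_US.UTF-8 is an assumed locale that must exist on the machine, and the helper names are hypothetical.

```python
import os
import subprocess
import sys

# Assumptions: path of the real script and a UTF-8 locale installed on the host
TARGET = "/home/vanderloos/testrun.py"
LOCALE = "en_US.UTF-8"


def child_env(locale_name=LOCALE, base=None):
    """Copy of the environment (default: os.environ) with LC_CTYPE overridden."""
    env = dict(os.environ if base is None else base)
    env["LC_CTYPE"] = locale_name
    return env


def launch(script, locale_name=LOCALE):
    """Run `script` in a new interpreter started with LC_CTYPE set; return its exit code."""
    return subprocess.call([sys.executable, script], env=child_env(locale_name))


if __name__ == "__main__":
    sys.exit(launch(TARGET))
```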

Thanks for letting us know!