
run script problem

There's a script that does some computation (counting keywords and storing them in a MySQL database). On my local laptop it finishes in about 15 minutes, but on the server it has been running for hours (according to "Scheduled tasks"). Maybe I'm doing something wrong, or are the server's resources limited somehow?

If I run this script from the console it's okay, but when it runs from "Scheduled tasks" it gets stuck somehow for a long time (about an hour), as far as I can see.
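
For reference, the job is roughly this shape - a minimal sketch only, with made-up table and column names and the MySQLdb driver assumed - with timestamps printed so the scheduled-task run can be compared against a console run:

```python
# Minimal sketch of the job: count keywords, store the counts in MySQL.
# Table/column names, credentials and the MySQLdb driver are assumptions.
import time
from collections import Counter

import MySQLdb  # provided by the mysqlclient package


def log(msg):
    # Timestamped progress lines make it easy to see where a run stalls.
    print(time.strftime("%H:%M:%S"), msg, flush=True)


log("connecting")
conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="stats")

log("counting keywords")
with open("pages.txt") as f:
    counts = Counter(word.lower() for line in f for word in line.split())

log("storing %d rows" % len(counts))
cur = conn.cursor()
cur.executemany(
    "INSERT INTO keyword_counts (keyword, n) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE n = VALUES(n)",
    list(counts.items()),
)
conn.commit()
cur.close()
log("done")
```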

Hey there,

Is one of your scripts scheduled to run at midnight (00:00)? For historical reasons, that's a really bad time: it used to be our default, so lots and lots of scripts are scheduled for then.

Try changing it to another time?

Okay. There's another problem: installing PyStemmer (pip install --user pystemmer) fails with an error/exception - "unable to execute gcc" or something like that.

Please install pystemmer, or gcc.

It's not possible for users to install extensions which require compilation at the moment - the PAW staff will have to do that themselves, as I can't see them setting up a full compilation environment any time soon.

As an aside, if there are any other dependencies you have which might run into the same issue, it would probably be easier to present the full list at once, rather than encountering them one at a time. If this is the only remaining dependency you need, that's fine.

Well... actually we have been thinking about installing a compiler toolchain...

But yeah. For now we will have to install pystemmer. It will then be in the next deploy.
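
Once it's in, a quick sanity check would be something like this (assuming the standard PyStemmer API, which installs a module called Stemmer):

```python
# Quick check that PyStemmer is importable and working.
import Stemmer

stemmer = Stemmer.Stemmer("english")
print(stemmer.stemWord("counting"))                 # -> "count"
print(stemmer.stemWords(["keywords", "running"]))   # -> ["keyword", "run"]
```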

Really? That would be pretty awesome, but also quite brave! (^_^)

I can't help wondering if there's a way to abstract it out so that, instead of getting access to a real compiler (and all that CPU and IO load), people could have a web-based means to request PyPI modules. The requests would get serialised into a single queue and processed in order, and each package either installed into the general environment for everyone, or perhaps left in the user's own space with the build artifacts cached so the next user to request it gets it instantly. The latter approach would be more complicated, but would allow people to request specific versions for themselves.

Maybe it's just a crazy idea and it's definitely a little vague, but it would avoid lots of people having to spend the time (and available resources) compiling up the same modules. Almost certainly more complicated to implement, of course, which is a fairly significant downside.

Of course, you could alternatively just compile every released version of every package on PyPI and make them available in binary form... Hm, I can't decide if I'm entirely joking when I say that! (o_O)
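
To make it slightly less vague, a very rough sketch of the serialised-queue-plus-cache idea might look like this - the queueing mechanism, cache path and the use of pip's wheel building for the actual compilation are all just assumptions for illustration:

```python
# Sketch: build requests are processed one at a time, each package is compiled
# into a wheel exactly once, and the wheel is cached for later requesters.
import queue
import subprocess
import sys
from pathlib import Path

WHEEL_CACHE = Path("/var/cache/wheels")  # illustrative location


def build_request(requirement: str) -> None:
    """Build (or reuse) a cached wheel for e.g. "pystemmer" or "pystemmer==2.0.1"."""
    WHEEL_CACHE.mkdir(parents=True, exist_ok=True)
    # `pip wheel` does the compilation once and drops the .whl in the cache;
    # installing from the cache later needs no compiler at all.
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel",
         "--wheel-dir", str(WHEEL_CACHE), requirement],
        check=True,
    )


def worker(requests: "queue.Queue[str]") -> None:
    """Drain the request queue strictly in order, one build at a time."""
    while True:
        requirement = requests.get()
        try:
            build_request(requirement)
        finally:
            requests.task_done()


# A user would then install from the cache without needing gcc themselves:
#   pip install --user --no-index --find-links=/var/cache/wheels pystemmer
```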

EDIT: I suppose in hindsight having access to a compiler isn't actually any worse than being able to execute arbitrary code, and getting that working is probably the easiest and most flexible option anyway. I still like the idea of a package cache for speed and ease, but as I say it's probably quite fiddly to implement - maybe create a system protected by a writable unionfs layer on top, and then simply recover every file created during the installation process and tar them all up. Or something. One to file under "if only time were infinite...", perhaps!
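
Something along these lines, perhaps, using overlayfs (the modern in-kernel cousin of unionfs). Very much a sketch: it needs root, and the paths and package name are purely illustrative:

```python
# Sketch: shadow a read-only tree with an empty overlay "upper" layer, run the
# install, then tar up exactly the files the installation created.
import pathlib
import subprocess
import tarfile

lower = pathlib.Path("/usr/local")       # tree we want to protect (illustrative)
upper = pathlib.Path("/tmp/pkg-upper")   # every new file lands here
work = pathlib.Path("/tmp/pkg-work")

for p in (upper, work):
    p.mkdir(parents=True, exist_ok=True)

# Mount the overlay on top of the lower directory itself (needs root).
subprocess.run(
    ["mount", "-t", "overlay", "overlay",
     "-o", "lowerdir=%s,upperdir=%s,workdir=%s" % (lower, upper, work),
     str(lower)],
    check=True,
)
try:
    # Run the install as normal; its writes are diverted into `upper`.
    subprocess.run(["pip", "install", "pystemmer"], check=True)
finally:
    subprocess.run(["umount", str(lower)], check=True)

# Everything the install created is now in `upper` - archive it for reuse.
with tarfile.open("/tmp/pystemmer-files.tar.gz", "w:gz") as tar:
    tar.add(upper, arcname=".")
```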

I think so -- a package cache is a great idea, but probably more work. Your hindsight is right, though -- ultimately we allow people to upload files and execute arbitrary code, so adding a C compiler won't harm us (we think!) security-wise. A hacker could always compile something on their own machine then upload it.

The question is more, how much stuff would we need to add to put together a useful compiler toolchain, and would any of that stuff make life easier for Bad People to do Bad Things. Ultimately we want to make sure we have what's needed to compile, but no more.

If one assumes competent Bad People then I don't think a compiler really makes their life much easier. I think it perhaps makes it easier for people to accidentally consume more CPU and IO, because compilation can be a heavy-duty task, but not enough to worry about too much.

I'd probably approach this by picking a small but popular subset of packages which require compilation and then putting in the dependencies one by one until they no longer fail to build. That'll give a reasonable (but hopefully minimal) starting subset, and then more tools can be added on demand as people run into problems with specific packages.

I'm guessing you'd probably start with gcc, binutils, system and libc header files (including those for the -ld library) and Python devel header files and take it from there. You might get away with a subset of binutils, but you'll probably need at least ld, strip, ar and maybe as. You could see what the gcc package on some sample distro depends on to check for more possible dependencies.
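
As a smoke test for whichever toolchain subset ends up installed, something like this exercises gcc, the binutils assembler/linker and the Python development headers in one go (the module and file names are made up):

```python
# Smoke test: compile and import a trivial C extension. A missing toolchain
# piece shows up as a compile or link error; success prints 42.
import pathlib
import subprocess
import sys
import textwrap

pathlib.Path("toolcheck.c").write_text(textwrap.dedent("""
    #include <Python.h>

    static PyObject *ping(PyObject *self, PyObject *args) {
        return PyLong_FromLong(42);
    }

    static PyMethodDef methods[] = {
        {"ping", ping, METH_NOARGS, "Return 42."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "toolcheck", NULL, -1, methods
    };

    PyMODINIT_FUNC PyInit_toolcheck(void) {
        return PyModule_Create(&moduledef);
    }
"""))

pathlib.Path("setup.py").write_text(textwrap.dedent("""
    from setuptools import setup, Extension
    setup(name="toolcheck", ext_modules=[Extension("toolcheck", ["toolcheck.c"])])
"""))

# build_ext --inplace drives gcc, as/ld from binutils, and needs Python.h.
subprocess.run([sys.executable, "setup.py", "build_ext", "--inplace"], check=True)

sys.path.insert(0, ".")   # pick up the freshly built module from the cwd
import toolcheck
print(toolcheck.ping())   # 42 if the toolchain is complete
```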

I'm not sure if the PAW environment matches a third-party distribution at all (e.g. if it's forked off Ubuntu or something), but if so then watch out for distro packagers bundling a ton of only loosely related functionality into a package. I've had to whittle a Fedora distribution down to a minimal subset before and it took a considerable amount of effort just to remove X (given this was for a headless server). I think Debian-based distros tend to be better at compiling different variants of packages with reduced dependencies, however.

Thanks as always! That's definitely the way we're planning to go -- we just need to identify the right packages. Probably best to just search our to-do list for the ones we've done in the past.

PythonAnywhere is essentially Debian, so hopefully we can keep things limited...

Debian's much better about package splitting than Fedora, certainly. I think it's no accident that it was a Debian-derived distribution that first came up with a system which automatically fetched and installed dependencies.

That's right kids, back when I started using Linux you had to figure out and install the dependencies manually.

(Actually when I first started using Linux you had to grab tarballs off a huge stack of floppy disks, but that's probably giving away a little too much information...)

EDIT: Good grief, the wayback machine still has some fragments of my personal website from University - here is me discussing Linux way back then. Not sure if it's only hilariously awful to me, but might give someone a chuckle.

http://web.archive.org/web/19980515105948/http://ajp39.caths.cam.ac.uk/linux_main.shtml#introduction

(Note: I couldn't link it - I guess Markdown doesn't like the double http://.)

Hah! I was at Cambridge '92-'95

Me too, it's a small world :-)

I remember downloading Linux onto 20 or so floppies (bought second-hand at the Computer Lab) and installing it on my shiny new 486 PC... must have been late '92 or so...

The 486 kicked arse... At least compared to the 286 which it replaced (and which I got for free, monochrome graphics adapter and twin 5.25" drives and all). Since it was a 486SX (i.e. one whose FPU had failed production testing) it was a baptism of fire into the world of fixed-point arithmetic. The days of floppy-based installs were before my uni days, however - I matriculated in '96, so I was well into the heady days of CD-based distributions by that point. These days they probably have trouble fitting it on a DVD image!

By the way, it was only recently I discovered that the 487 FPU "coprocessor" for the 486SX was in fact a fully-functional 486DX with an extra pin which simply disabled the original 486SX. Funny what market forces do to technology.