Project Euler

Well, problem one took me about 30 seconds to code. I did brute force it.

Does that make me a bad person?

rcs1000: You're only a bad person if you brute force a 1000-digit Fibonacci number (Problem 25)... j/k

Yes, it makes you despicable... (^_^)

Nah, the first few are basically warm-up exercises, from what I remember. Also, some of them you probably can brute force or use uninteresting approaches, but half the fun is trying to come up with a decent time via a clever approach.

For example, you could probably solve problem 7 in a reasonable time with a simple Sieve of Eratosthenes, but if you pretend you're on a system with much tighter memory requirements or if you needed the billionth prime instead then you need something more clever.
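To make that concrete, here's a minimal Sieve of Eratosthenes sketch in Python for problem 7 (the 10001st prime). The upper bound of 120,000 is my own estimate from the prime number theorem, not anything the problem gives you:

```python
def sieve(limit):
    """Sieve of Eratosthenes: return all primes below `limit`."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Knock out multiples of n, starting at n*n (smaller
            # multiples were already crossed off by smaller primes).
            for m in range(n * n, limit, n):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]

# ~120,000 is a comfortable bound for the 10001st prime, since
# p_n is roughly n * (ln n + ln ln n).
primes = sieve(120000)
print(primes[10000])  # the 10001st prime
```

The memory cost of that full boolean list is exactly the limitation discussed further down the thread.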

Also, prime numbers crop up again in later puzzles so you'd benefit from having a fast prime generation algorithm, and also a good quality primality test, which is quite different (and should be a lot faster).

Psssst... Using is cheating... (^_^)

Now there's a funny thought. If you have a good ping time you could solve a lot of these with clever searching/parsing of Google queries. And the algorithms would 'solve' in record time...☺

As a web dev that's how I thought you were supposed to solve them? Why would you redo someone else's work? Also, you can then make a generic solution that is given a problem number and finds the best solution...

You guys are funny :-)

And, yes, I solved 3 with a sieve approach (before I read your post!)

Until today, I had no idea how much faster a sieve was than the (naive, simple) alternatives. I wrote three prime generators as part of solving 3, and found my sieve algorithm was 50x faster than my first attempt.

Problem 4 done :-)

Oh yes, the sieve approaches are pretty quick, although there are some common misunderstandings - I've heard some people describe the Sieve of Eratosthenes as trial-dividing by all the previous primes, as opposed to knocking out multiples ahead of time (which is rather the whole point).

The problems with them are not so much speed as storage - once you get to very high primes, having to keep a set of all integers gets pretty big. In C/C++ you can get a bit more headroom by storing the sieve as a bitfield (so only using a single bit per integer), but even so it gets fairly limiting.
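The rough Python analogue of that trick is a bytearray - a byte per candidate rather than a full Python object per candidate, so not as tight as a C bitfield, but still a big saving over a list, and slice assignment knocks out multiples at C speed. A sketch:

```python
def sieve_bytes(limit):
    """Sieve over a bytearray: one byte per candidate instead of a
    full Python object, which cuts memory use considerably."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if flags[n]:
            # Zero the whole slice of multiples in one C-level step.
            flags[n * n::n] = bytes(len(range(n * n, limit, n)))
    return [i for i, f in enumerate(flags) if f]

print(len(sieve_bytes(1000000)))  # 78498 primes below a million
```

A true one-bit-per-integer layout is possible in Python too (shifting and masking into a bytearray), but the bookkeeping usually costs more than the memory saves unless you're genuinely squeezed.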

Also, knowing how high you need to go in advance is a bit of an issue. I did once write a "chunked" sieve implementation which generates them in chunks of a million (or something) at a time - when it creates a new chunk, it has to do a pre-pass knocking out the multiples of all the ones generated so far. This makes it convenient to generate primes arbitrarily high, but obviously isn't as fast as doing a single-pass sieve.
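A rough sketch of that chunked idea - the details here are my reconstruction of the approach, not the original code. Each new chunk gets a pre-pass crossing off multiples of every prime found so far, then an ordinary sieve pass for anything new:

```python
def chunked_primes(chunk_size=1000000):
    """Generate primes indefinitely, sieving one chunk at a time."""
    primes = []
    low = 2
    while True:
        high = low + chunk_size
        flags = bytearray([1]) * (high - low)
        # Pre-pass: knock out multiples of all primes found so far,
        # starting at p*p or the first multiple inside the chunk.
        for p in primes:
            start = max(p * p, (low + p - 1) // p * p)
            for m in range(start, high, p):
                flags[m - low] = 0
        # Ordinary sieve pass over whatever survives in this chunk.
        for n in range(low, high):
            if flags[n - low]:
                primes.append(n)
                for m in range(n * n, high, n):
                    flags[m - low] = 0
                yield n
        low = high

gen = chunked_primes()
print([next(gen) for _ in range(10)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

As the post says, the pre-pass is pure overhead compared with a single fixed-size sieve, but in exchange you never have to guess an upper bound.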

Incidentally, Sieve of Atkin is better than Eratosthenes for largish sets - it does some up-front calculations to reduce the iterative cost.

All of these algorithms struggle for arbitrarily high primes, however, and at that point it's sometimes more useful to simply use a primality test such as Miller-Rabin. I think the typical approach to picking a single large prime is to pick a range of very large odd numbers, apply a standard sieve against the low primes (say, under 65,000) to weed out the obvious ones, and then try the remaining numbers in a random order against a standard primality test (e.g. Miller-Rabin again). This is if you want to pick an arbitrary prime - if you want to try them sequentially then you just do them in numerical order instead. If you don't find the one you're looking for, you try the next chunk of odd numbers and so on.
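A sketch of the Miller-Rabin part of that pipeline - this is the standard textbook form of the test, with a little trial division up front standing in for the "sieve against the low primes" step. With random bases it's probabilistic: a composite can slip through any single round, but the odds shrink exponentially with the number of rounds.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    # Cheap trial division against a few small primes first.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

print(is_probable_prime(2 ** 61 - 1))  # True: a Mersenne prime
```

A genuine prime can never fail the test, so the randomness only matters for composites; 20 rounds is far more certainty than most uses need.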


I should point out that I think I started with Eratosthenes as well, so that'll likely do you fine for a while - one of the later questions just became unbearably slow with it, however.

Ahhh...last night I struggled with (5), generating a solution that would have worked... albeit at the cost of overloading PythonAnywhere for an hour or so.

This morning - despite a slightly boozy evening last night - I had a 'doh!' moment, and coded up a solution in around a minute, which took only 0.1 seconds to execute...

Wow, problem 8 is unbelievably simple in Python...

(Also Cartroo, it's amazing what problems like these do to teach you to write more efficient functions. I just rewrote my sieve function for problem 7, and sped it up a further 4-5x. So, my current sieve is around 250x faster than my first attempt at a prime algorithm.)

Good to hear it's been a helpful exercise! (^_^)

It's useful practice in directing your optimisation to where it matters most, and also highlights some of Python's performance quirks along the way. Admittedly most of the problems are about picking a good algorithm, but sometimes you need to do a bit of profiling to figure out what it is about a seemingly good algorithm which is slowing things down.

In fact, I think it was while looking at a Project Euler problem that I encountered Python's previously awful string concatenation performance in tight loops (as in, "foo" + "bar"). This used to be pretty poor as the immutability of strings meant that each concatenation required allocating a new string object, but in Python 2.4-2.6 they did quite a lot of work in this area and now, for some use-cases at least, concatenation with addition is actually faster than the % operator.
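If you want to see this on your own interpreter rather than take anyone's word for it, a quick timeit comparison does the job. The function names here are just made up for the benchmark; the absolute numbers will vary by Python version, which is rather the point:

```python
import timeit

# Compare two common ways of building a string in a tight loop.
pieces = ["spam"] * 1000

def with_plus():
    # Repeated concatenation: relies on CPython's in-place
    # optimisation for str +=, an implementation detail.
    s = ""
    for p in pieces:
        s += p
    return s

def with_join():
    # str.join builds the result in a single allocation.
    return "".join(pieces)

for fn in (with_plus, with_join):
    print(fn.__name__, timeit.timeit(fn, number=1000))
```

str.join is generally the portable safe bet, since the += fast path isn't guaranteed by the language, only by the CPython implementation.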

Gosh - goes to show what I know. I've always used '+', on the basis that it would be quicker in execution than '%'

Well, if you're using Python 2.6 or later, you happen to be correct, so huzzah for that! I've never worried too much about this luck/judgement distinction some overly fussy people make... (^_^)

But yes, it can be a little surprising - that's why if performance really matters you get something that works and profile the hell out of it until you know which bits need improving. Nobody cares whether it takes 500ms or 5 seconds to run a function that's executed once every 24 hours, but 500ms vs 550ms might be critical in the middle of a tight loop.
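For anyone who hasn't done the "profile the hell out of it" step before, the standard library makes it painless. A minimal cProfile sketch - slow_sum is just a stand-in for whatever function you suspect:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive stand-in for the code being profiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(1000000)
profiler.disable()

# Sort by cumulative time and show the top few entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The output tells you which calls actually dominate the runtime, which is usually not where you'd have guessed.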

I've only dealt with a few cases in the past where performance of string manipulation really mattered - the classic one was a system which chewed through a constant stream of massive log files. I compressed them to reduce disk IO and then had to change Python's builtin gzip handling somewhat as it does things in a pretty inefficient way. Also, the builtin shlex.split() function is pretty sucky for that sort of load, so I ended up implementing a pure C extension which did the same job. Those few changes alone improved the throughput by a few orders of magnitude.