Forums

Count query on indexed column taking forever

[SOLVED -- the problem was apparently due to the myIsam engine. No performance issue with innoDB]

[Also want to add that support is stellar]

The table has 60+M rows. Primary index is CHAR(11).

On my laptop, SELECT COUNT(*) from myTable WHERE myIndex > '-'; runs in about 30s (for 107M rows). On PA I killed the process after 20 minutes...

SHOW INDEX myTable doesn't suggest that the index is disabled. (no such comment) so I suppose that the problem is caused by an exceedingly small index buffer.

Can someone confirm the cause and suggest a fix?

[EDIT] Actually, all SELECT queries are extremely slow, more than an order of magnitude slower than on a puny laptop.

I ended up splitting the master table in much smaller ones (60M -> 2M) and things are back to acceptable.

What do you see if you run SHOW CREATE TABLE myTable;? That should explicitly show what indexes exist -- so if one isn't mentioned, then it will be because it's not set on the table.

says there is an index. SHOW INDEX doesn't suggest that it is disabled. EXPLAIN gives:

| id | select_type | table       | type   | possible_keys | key            | key_len | ref      | rows           | Extra                    |
|  1 | SIMPLE       | myTable | index | PRIMARY         | PRIMARY | 33         | NULL  | 67282608 | Using where; Using index |

Most likely the index doesn't fit in the buffer, so mySQL has to perform a scan.

I can get decent performance by (manually) partitioning the table. Works for one script, but would be terribly inefficient for another, where I must INSERT IGNORE, which wouldn't work on partitions.

Could we take a look at your DB from our side? We can access it as sysadmins, but we always ask for permission before looking at any of your files or data.

Please do.

(I had answered the notification email. Not sure if it reached you)

Received your email, thanks!

Well, I agree that the difference you're seeing between your local machine and PythonAnywhere is perplexing. You could probably expect the PythonAnywhere server to be a bit slower -- after all, it's a shared machine while you have complete ownership of your local machine -- but not the orders of magnitude you're seeing.

I did an

EXPLAIN SELECT COUNT(*) from myTable WHERE myIndex > '-';

...and it said that it was using the index, so that all sounds good.

The only possibility that comes to mind is that the table on PythonAnywhere might not be optimized properly. Is the table here one that has grown over time, with new stuff being added over a period of months or years? If so, perhaps an OPTIMIZE TABLE might help, though of course it would lock the table for a while. Indeed, it might help regardless of how it was created.

Additions to the table were made on batches of 2M+ via LOAD DATA INFILE. Very unlikely to be an optimization problem.

Feel free to issue the query if you want. I do not use this table anymore.

As we're discussing this over email as well, I'll respond over that channel -- if there's anything that comes up that sounds like it will be broadly useful to other PythonAnywhere users, I'll post back here later.

The problem appears to be related to the engine. Changing from myIsam -> InnoDB makes all query 10 to 100+ times faster -- with the exception of full table count. The table also has a smaller footprint since there are no index redundancies (myIsam creates a separate file for the indexes).

Thanks for posting that! Out of interest, were you using myisam on your laptop too? Is there any advantage to using myisam nowadays (given that we're on MySQL 5.6)?

yep. myIsam on my laptop.

I deal with a fairly large dataset, including 2 tables with 400M+ and 1.5G+ rows and a 250G+ footprint each. myIsam makes it much faster to backup and port across platforms (flush/lock/disk snapshot of the MYI/MYD).

Now that it is possible to isolate innoDB in separate tablespaces, and with the imminent dismissal of myIsam, not 100% clear.

But for now, I can easily copy a 650G DBase on a portable drive and load or simlink whatever I want to look at, away from the office and without decent connectivity -- like now :)

Ah, I see -- thanks! I didn't consider that aspect.