Forums

Possible bug in MySQL. Is this well-known?

Hello, I was tinkering with PythonAnywhere's MySQL Full Text Search and found unexpected behaviour in Full Text Search boolean mode.

I posted a question on Stackoverflow and they believe I have found a bug. In a nutshell, the boolean operator, ~ behaves opposite of what the documentation states.

Before I submit a bug report to Oracle, I would like to ask you, whether this is something you are familiar with, or if the available MySQL version here is up to date.

Thank you very much!

Thanks for the detailed report.

We are not aware of a PythonAnywhere-specific bug in MySQL full-text boolean mode, but the ~ operator can be confusing: in MySQL boolean full-text searches it is used to reduce the relevance contribution of a term rather than exclude rows, so results can look “opposite” depending on what other terms/operators you use. The MySQL manual has the definitive description of boolean full-text operators here: https://dev.mysql.com/doc/refman/8.4/en/fulltext-boolean.html

On the “is your MySQL version up to date” part: PythonAnywhere runs MySQL server versions that we maintain, and some accounts are on different MySQL major versions. The easiest way to confirm what you are on is to run this in a MySQL console:

SELECT VERSION();

If you can send us the output (and ideally a minimal example query + sample rows showing the behaviour), we can confirm the exact version and reproduce it on our side.

If it turns out you are on an older MySQL server and you would like to move to a MySQL 8 server, we can usually migrate users on request (it involves dumping and reloading your databases). See this forum thread for background: https://www.pythonanywhere.com/forums/topic/32788/

General info on using MySQL on PythonAnywhere is here: https://help.pythonanywhere.com/pages/UsingMySQL/

Thank you for your swift response!

I am aware, that the ~ operator only reduces the relevance of the row, though in my case, it actually boosts it for some reason. I made a simple example query:

SELECT title, score FROM
    (
        SELECT title, MATCH(title) AGAINST
        ('pædagog' IN BOOLEAN MODE) AS score
        FROM jobs ORDER BY score DESC
    ) AS temp WHERE score != 0;

This returns rows, which all have a score of 2.79, including a row with this title: "Pædagog til SFO og indskoling på Vadgård Skole" which I try and demote with this query:

SELECT title, score FROM
    (
        SELECT title, MATCH(title) AGAINST
        ('pædagog ~SFO' IN BOOLEAN MODE) AS score
        FROM jobs ORDER BY score DESC
    ) AS temp WHERE score != 0;

However, this results in all rows still having a score of 2.79, except the row with "SFO" in it, which is boosted to 8.27. Here, I would expect a lower score, not a higher.

The version I am getting is 8.0.40.

Below are example rows of the result with the ~ operator.

| # | Title                                                                 | Score    |
|---|-----------------------------------------------------------------------|----------|
| 0 | Pædagog til SFO og indskoling på Vadgård Skole                        | 8.273522 |
| 1 | Angerlarsimaffik Uulineq søger Pædagog og Socialassistent             | 2.791787 |
| 2 | Pædagog til moderne daginstitution i børnehaveafdeling                | 2.791787 |
| 3 | Pædagog til vuggestueafdeling med fokus på natur og udeliv.           | 2.791787 |
| 4 | Livsværk Pædagog til døgnafdeling - Livsværk                          | 2.791787 |

Thanks again :-)

  1. You’ve confirmed you’re on MySQL 8.0.40, so this is not due to running an old MySQL version.
  2. PythonAnywhere does not modify MySQL full-text behaviour. We run stock MySQL servers.

Relevant documentation for reference:

  • https://dev.mysql.com/doc/refman/8.4/en/fulltext-boolean.html
  • https://help.pythonanywhere.com/pages/UsingMySQL/
<hr />
  • The ~ operator makes a term’s contribution negative, but the overall relevance score in boolean full-text search is not a simple linear add or subtract.

  • With InnoDB full-text search, MySQL uses BM25 or TF-IDF style ranking and the manual notes that boolean-mode relevance can behave in non-intuitive ways because of this. A term with a negative contribution can still end up producing a higher final score once normalisation and term statistics are applied.

  • The MySQL documentation is still correct in saying that ~ demotes rather than excludes, but it does not guarantee that the numeric score will always be lower in every case. What you’re seeing looks like a MySQL ranking or tokenisation edge case rather than anything PythonAnywhere-specific.

<hr />

if your goal is to rank down rows containing “SFO”, it’s often more predictable to compute relevance explicitly, for example by subtracting a second MATCH score for the unwanted term, rather than relying on ~ in boolean mode, as boolean ranking can be counter-intuitive with InnoDB.