MichD

 

MySQL: find number of matches in another table

in Blog

The problem of finding the amount of matches in another table is one that had me stumped for a while, but after googling a while, looking through concepts that were new to me, I figured it out. Since it may be a fairly common problem, I thought I’d share my solution here.

Use cases

I used this technique to find the amount of posts a user has on a forum, in a situation where I have a users table, and a forum table. Other possibilities could be amount of sales for a certain item in a webshop, where the items are in a table, and the sales in another. I’ll use the users / posts example to show the query.

Users table

user_id username
1 Mich
2 John
3 CornCob

Forum table

post_id user_id topic_id message
142 1 54 Pellentesque posuere…
143 3 42 Cras sodales molestie…
144 3 54 Phasellus condimentum euismod…
145 1 65 Aliquam placerat nibh ut…
146 3 54 Donec nec convallis…
147 1 54 Sed tincidunt fermentum…
148 1 60 Aenean id turpis…

The query

SELECT
  users.user_id,
  users.username,
  COALESCE(pc.postcount, 0) AS postcount
FROM
  users
  LEFT JOIN
  (
    SELECT
      forum.user_id,
      COUNT(forum.post_id) AS postcount
    FROM forum
    WHERE 1
    GROUP BY forum.user_id
  ) pc ON pc.user_id=users.user_id
WHERE 1

How it works

First, we select the information of the users, simply, by referring to them using table.field. Then, we also include a postcount field, but not quite like we’re used to seeing it. Since the value NULL would be retrieved for users who don’t have any posts, we have a workaround. The COALESCE function returns the first non-NULL argument it is given. So if there is no match for a certain user_id in the table 'forum', the resulting NULL will not be used, we get a regular numeric 0 instead.

The simple user information is retrieved, as expected, from the users table. After that it gets a little more tricky. We perform a LEFT JOIN with what is called a subquery. Within the parentheses after LEFT JOIN, there is a whole new query; the resultset of this query will behave itself as another table we’re joining.

In our subquery we select the user_id and postcount. Normally, using COUNT() this way would simply return a single result, with the amount of rows in the whole forum table. However, because we’re using GROUP BY forum.user_id here, it will count the postcount for each unique value of user_id encountered in forum. This gives us a simple user_id, postcount table as a result.

Now we’re outside the parentheses again. The pc you see after the parentheses is the name we assign to the resultset of our subquery. We can refer to it as if it were any regular table, as we see in the SELECT part of our main query, as well as the ON part we’re at now. We want to join the two tables where the user_id‘s match, again, like any regular join. In case there is no match for a user_id from the table users, there would be no corresponding postcount value to join in the final resultset, thus, a value of NULL is given, which is then ‘changed’ to 0 by the COALESCE function.

The resultset

Here you see what the outcome of above query would be on our given tables:

user_id username postcount
1 Mich 4
2 John 0
3 CornCob 3

Please note

This was all tested and figured out on MySQL 5+. Thus, I don’t know if it would work on other SQL servers. As far as I know, the IFNULL function is the one thing that’s usually different throughout database servers. The subquery system will mostly be the same-ish.

If you know more about this, or I’m misinforming on anything, please, by all means, let me know in the comments and I’ll change it up to be as accurate as possible. Also, if you know a more efficient way to go for this specific task, let me know as well. Obviously, feel free to ask any questions if something is unclear.

Updates

April 23rd, 2011: Changed the IFNULL to the standards-compliant COALESCE function – thanks Omar.

April 30rd, 2011: Fixed the query: COUNT(forum.postIDpost_id) AS postcount – Thanks Ringish.