MySQL: find number of matches in another table
The problem of finding the amount of matches in another table is one that had me stumped for a while, but after googling a while, looking through concepts that were new to me, I figured it out. Since it may be a fairly common problem, I thought I’d share my solution here.
Use cases
I used this technique to find the amount of posts a user has on a forum, in a situation where I have a users table, and a forum table. Other possibilities could be amount of sales for a certain item in a webshop, where the items are in a table, and the sales in another. I’ll use the users / posts example to show the query.
Users table
user_id | username |
---|---|
1 | Mich |
2 | John |
3 | CornCob |
Forum table
post_id | user_id | topic_id | message |
---|---|---|---|
142 | 1 | 54 | Pellentesque posuere… |
143 | 3 | 42 | Cras sodales molestie… |
144 | 3 | 54 | Phasellus condimentum euismod… |
145 | 1 | 65 | Aliquam placerat nibh ut… |
146 | 3 | 54 | Donec nec convallis… |
147 | 1 | 54 | Sed tincidunt fermentum… |
148 | 1 | 60 | Aenean id turpis… |
The query
SELECT users.user_id, users.username, COALESCE(pc.postcount, 0) AS postcount FROM users LEFT JOIN ( SELECT forum.user_id, COUNT(forum.post_id) AS postcount FROM forum WHERE 1 GROUP BY forum.user_id ) pc ON pc.user_id=users.user_id WHERE 1
How it works
First, we select the information of the users, simply, by referring to them using table.field
. Then, we also include a postcount field, but not quite like we’re used to seeing it. Since the value NULL
would be retrieved for users who don’t have any posts, we have a workaround. The COALESCE
function returns the first non-NULL argument it is given. So if there is no match for a certain user_id
in the table 'forum'
, the resulting NULL
will not be used, we get a regular numeric 0 instead.
The simple user information is retrieved, as expected, from the users table. After that it gets a little more tricky. We perform a LEFT JOIN
with what is called a subquery. Within the parentheses after LEFT JOIN
, there is a whole new query; the resultset of this query will behave itself as another table we’re joining.
In our subquery we select the user_id
and postcount
. Normally, using COUNT()
this way would simply return a single result, with the amount of rows in the whole forum table. However, because we’re using GROUP BY forum.user_id
here, it will count the postcount
for each unique value of user_id
encountered in forum
. This gives us a simple user_id
, postcount
table as a result.
Now we’re outside the parentheses again. The pc
you see after the parentheses is the name we assign to the resultset of our subquery. We can refer to it as if it were any regular table, as we see in the SELECT
part of our main query, as well as the ON
part we’re at now. We want to join the two tables where the user_id
‘s match, again, like any regular join. In case there is no match for a user_id
from the table users
, there would be no corresponding postcount
value to join in the final resultset, thus, a value of NULL
is given, which is then ‘changed’ to 0 by the COALESCE
function.
The resultset
Here you see what the outcome of above query would be on our given tables:
user_id | username | postcount |
---|---|---|
1 | Mich | 4 |
2 | John | 0 |
3 | CornCob | 3 |
Please note
This was all tested and figured out on MySQL 5+. Thus, I don’t know if it would work on other SQL servers. As far as I know, the IFNULL function is the one thing that’s usually different throughout database servers. The subquery system will mostly be the same-ish.
If you know more about this, or I’m misinforming on anything, please, by all means, let me know in the comments and I’ll change it up to be as accurate as possible. Also, if you know a more efficient way to go for this specific task, let me know as well. Obviously, feel free to ask any questions if something is unclear.
Updates
April 23rd, 2011: Changed the IFNULL
to the standards-compliant COALESCE
function – thanks Omar.
April 30rd, 2011: Fixed the query: COUNT(forum.
AS postcount – Thanks Ringish.postIDpost_id)