I am trying to build a query that tells me how many distinct women and men there are in a given dataset. Each person is identified by a number, 'tel'. The same 'tel' can appear multiple times, but that tel's gender should be counted only once.
7136609221 - male
7136609222 - male
7136609223 - female
7136609228 - male
7136609222 - male
7136609223 - female
This example_dataset should yield the following:
Total unique gender count: 4
Total unique male count: 3
Total unique female count: 1
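For anyone who wants to reproduce the example, a minimal setup might look like the sketch below. The column types are assumptions, since the schema is not given here; only the table name, the column names and the six rows come from the example above.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE example_dataset (
    tel    BIGINT      NOT NULL,
    gender VARCHAR(10) NOT NULL
);

-- The six sample rows, including the two repeated tels.
INSERT INTO example_dataset (tel, gender) VALUES
    (7136609221, 'male'),
    (7136609222, 'male'),
    (7136609223, 'female'),
    (7136609228, 'male'),
    (7136609222, 'male'),
    (7136609223, 'female');
```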
My attempted query:
SELECT COUNT(DISTINCT tel, gender) as gender_count,
COUNT(DISTINCT tel, gender = 'male') as man_count,
SUM(if(gender = 'female', 1, 0)) as woman_count
FROM example_dataset;
There are actually two attempts in there. COUNT(DISTINCT tel, gender = 'male') as man_count seems to just return the same as COUNT(DISTINCT tel, gender); it doesn't take the qualifier into account. And SUM(if(gender = 'female', 1, 0)) counts all the female records, but is not filtered by DISTINCT tels.
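One way to see why the qualifier appears to be ignored is to list the combinations that COUNT(DISTINCT ...) is counting. In MySQL, gender = 'male' evaluates to 1 or 0, so every tel still contributes one distinct, non-NULL (tel, flag) pair. The alias is_male below is made up for illustration.

```sql
-- List the distinct (tel, flag) pairs that COUNT(DISTINCT tel, gender = 'male') counts.
-- is_male is 1 for male rows and 0 for female rows; neither value is NULL,
-- so no tel is excluded from the count.
SELECT DISTINCT tel, gender = 'male' AS is_male
FROM example_dataset
ORDER BY tel;
```

On the sample rows this returns four pairs, one per tel, which matches the result of COUNT(DISTINCT tel, gender).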
Here's one option using a subquery with DISTINCT:
SELECT COUNT(*) gender_count,
SUM(IF(gender='male',1,0)) male_count,
SUM(IF(gender='female',1,0)) female_count
FROM (
SELECT DISTINCT tel, gender
FROM example_dataset
) t
This will also work if you don't want to use a subquery:
SELECT COUNT(DISTINCT tel) gender_count,
COUNT(DISTINCT CASE WHEN gender = 'male' THEN tel END) male_count,
COUNT(DISTINCT CASE WHEN gender = 'female' THEN tel END) female_count
FROM example_dataset
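The reason this works is that a CASE expression with no ELSE branch returns NULL when no condition matches, and COUNT(DISTINCT ...) skips NULLs. As a rough sketch, the per-row values being fed into each count can be inspected like this (the aliases male_tel and female_tel are made up for illustration):

```sql
-- Show what each COUNT(DISTINCT CASE ...) receives for every row.
-- male_tel is NULL on female rows and female_tel is NULL on male rows,
-- so each count only sees the tels of the matching gender.
SELECT tel,
       gender,
       CASE WHEN gender = 'male'   THEN tel END AS male_tel,
       CASE WHEN gender = 'female' THEN tel END AS female_tel
FROM example_dataset;
```

On the sample rows, male_tel has three distinct non-NULL values and female_tel has one. Note that the two approaches only agree when every tel has a single gender: if a tel ever appeared with both genders, the subquery version would count it twice in gender_count, while COUNT(DISTINCT tel) would count it once.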