SqlAlchemy: count of distinct over multiple columns

EoghanM picture EoghanM · May 24, 2013 · Viewed 22.3k times · Source

I can't do:

>>> session.query(
        func.count(distinct(Hit.ip_address, Hit.user_agent)).first()
TypeError: distinct() takes exactly 1 argument (2 given)

I can do:

session.query(
        func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent))).first()

Which is fine (count of unique users in a 'pageload' db table).

This isn't correct in the general case, e.g. will give a count of 1 instead of 2 for the following table:

 col_a | col_b
----------------
  xx   |  yy
  xxy  |  y

Is there any way to generate the following SQL (which is valid in postgresql at least)?

SELECT count(distinct (col_a, col_b)) FROM my_table;

Answer

RedNaxel picture RedNaxel · Sep 21, 2013

distinct() accepts more than one argument when appended to the query object:

session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()

It should generate something like:

SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1

which is even a bit closer to what you wanted.