Elasticsearch documentation suggests* that their piece of code
*documentation fixed
GET /cars/transactions/_search?search_type=count
{
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color"
}
}
}
}
corresponds to sql query
SELECT DISTINCT(color) FROM cars
but it actually corresponds to
SELECT COUNT(DISTINCT(color)) FROM cars
I don't want to know how many distinct values I have but what are the distinct values. Anyone knows how to achieve that?
Use a terms aggregation on the color
field. And you need to pay attention to how that field you want to get distinct values on is analyzed, meaning you need to make sure you're not tokenizing it while indexing, otherwise every entry in the aggregation will be a different term that is part of the field content.
If you still want tokenization AND to use the terms
aggregation you might want to look at not_analyzed
type of indexing for that field, and maybe use multi fields.
Terms aggregation for cars:
GET /cars/transactions/_search?search_type=count
{
"aggs": {
"distinct_colors": {
"terms": {
"field": "color",
"size": 1000
}
}
}
}