LINQ with GROUP BY and HAVING COUNT

Tjab · Apr 4, 2016 · Viewed 18.9k times

I'd like to understand what I am doing wrong with my GROUP BY query in LINQ. I've tried many examples (e.g. Linq with group by having count), but I still get more results than expected (as if the WHERE is skipped). My code looks like this:

var test = session.Query<SomeClass>()
                  .GroupBy(c => new { c.Var1, c.Var2, c.Var3 })
                  .Where(g => g.Count() > 1)
                  .Select(g => g.Key.Var3)
                  .ToList();

This gives 229 results (all records). The query that I'd like to build in Linq is:

SELECT Var3
FROM myTable
GROUP BY Var1, Var2, Var3
HAVING COUNT(*) > 1

The SQL query gives me 27 results, but the LINQ expression gives me 229 (all records). When I replace the Where/Select part of the LINQ expression with the following, I do get a list in which every count is 2 or higher:

.Select(g => new { Item = g.Key, Count = g.Count() })

But I don't want a list of items (and counts) that I then have to go through myself. I'd like the HAVING part to work in the LINQ expression itself.

Edit 2: The approach in LINQ Group By Multiple fields -Syntax help also works for me. However, it gives me a list of objects with Var1, Var2, Var3 and Count. From this list, I only want the Var3 of the objects with a Count higher than 1.

Anyone who can point me in the right direction?

Edit 1: As I said in my introduction, the question Linq with group by having count does not solve my problem. If I use that code, I still get a set of 229 results instead of the 27 that are actually "duplicated" (that is, the groups with a count of more than 1).

Edit 3: I am using the following at the moment. It needs two statements, which I think is weird, but as stated before, this seems to be the only way to select only the records having count > 1.

var querygroup = session.Query<SomeClass>()
                        .GroupBy(e => new { e.Var1, e.Var2, e.Var3 })
                        .Select(s => new { s.Key.Var1, s.Key.Var2, s.Key.Var3, Count = s.Count() })
                        .ToList();

var duplicates = querygroup.Where(g => g.Count > 1)
                           .Select(g => new SomeClass() { Var1 = g.Var1, Var2 = g.Var2, Var3 = g.Var3})
                           .ToList();

Note that instead of selecting only Var3, I decided to select the values of Var1 and Var2 as well and store them in a SomeClass(). This is just an addition; selecting everything doesn't help with getting this selection in one statement.

Edit 4: I can of course take the .Where(...) part from the duplicates variable and add it to the querygroup statement, thus turning the whole thing into one statement. Success? It seems like overkill, but at least it works.
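The single-statement form described in Edit 4 can be sketched as below. This is only an illustration under stated assumptions: it runs against an in-memory list (LINQ to Objects) rather than `session.Query<SomeClass>()`, so it says nothing about how the query provider translates the expression to SQL. The `DuplicateFinder` class, the `FindDuplicateVar3` method, the string-typed properties, and the sample rows are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the mapped entity; property types are assumed.
public class SomeClass
{
    public string Var1 { get; set; } = "";
    public string Var2 { get; set; } = "";
    public string Var3 { get; set; } = "";
}

public static class DuplicateFinder
{
    // Returns Var3 for every (Var1, Var2, Var3) combination that occurs more than once.
    public static List<string> FindDuplicateVar3(IEnumerable<SomeClass> rows)
    {
        return rows
            // GROUP BY Var1, Var2, Var3 (anonymous types compare by value)
            .GroupBy(e => new { e.Var1, e.Var2, e.Var3 })
            // Project the key fields together with the group size
            .Select(s => new { s.Key.Var1, s.Key.Var2, s.Key.Var3, Count = s.Count() })
            // HAVING COUNT(*) > 1
            .Where(g => g.Count > 1)
            .Select(g => g.Var3)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data: two duplicated combinations and one unique row.
        var rows = new List<SomeClass>
        {
            new SomeClass { Var1 = "a", Var2 = "b", Var3 = "x" },
            new SomeClass { Var1 = "a", Var2 = "b", Var3 = "x" },
            new SomeClass { Var1 = "a", Var2 = "c", Var3 = "y" },
            new SomeClass { Var1 = "d", Var2 = "e", Var3 = "z" },
            new SomeClass { Var1 = "d", Var2 = "e", Var3 = "z" },
            new SomeClass { Var1 = "d", Var2 = "e", Var3 = "z" },
        };

        var duplicates = DuplicateFinder.FindDuplicateVar3(rows);
        Console.WriteLine(string.Join(", ", duplicates)); // prints "x, z"
    }
}
```

In memory, the grouping, the count filter, and the projection all run in one chain. Whether the same chain yields a HAVING clause against a database depends on the LINQ provider in use.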

If anyone can find out why I need 2 statements, please elaborate :)

Answer

Viru · Apr 4, 2016

Try this:

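var test = session.Query<SomeClass>()
                  .GroupBy(c => new { c.Var1, c.Var2, c.Var3 })
                  // Project each group's key fields together with its records
                  .Select(d => new { Var1 = d.Key.Var1, Var2 = d.Key.Var2, Var3 = d.Key.Var3, records = d.ToList() })
                  // Keep only the groups that contain more than one record
                  .Where(e => e.records.Count() > 1)
                  .Select(g => g.Var3)
                  .ToList();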