Group by on IEnumerable<DataRow>

Null Head · Jul 31, 2012 · Viewed 12.7k times

I have a collection of DataRow objects. I need to select distinct rows based on the column 'URL_Link'. Following this post, I came up with the code below.
Is it possible to apply it to a DataRow collection?

    IEnumerable<DataRow> results = GetData();
    results.GroupBy(row => row.Field<string>("URL_Link")).Select(grp => grp.First());

It is syntactically correct, but it does not solve the problem. It doesn't remove duplicate rows. What am I doing wrong?

Answer

Independent · Jul 31, 2012

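Your query is fine, except for one minor error: you never assign the result of the query to a variable. GroupBy and Select do not modify results; they return a new sequence, which your code throws away.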
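As a minimal sketch of that fix applied to an IEnumerable<DataRow> like the one in the question: the class name RowFilters and the method name DistinctByUrl are hypothetical, chosen only for illustration. The query itself is the one from the question; the only change is that its result is returned so the caller can keep it.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class RowFilters
{
    // Hypothetical helper: returns one row per distinct URL_Link value.
    // The input sequence is left untouched; the caller must keep the return value.
    public static IEnumerable<DataRow> DistinctByUrl(IEnumerable<DataRow> rows)
    {
        return rows.GroupBy(row => row.Field<string>("URL_Link"))
                   .Select(grp => grp.First());
    }
}
```

The caller would then write something like results = RowFilters.DistinctByUrl(results); so the filtered sequence replaces the original one.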

Personally, I find Distinct much clearer when what you actually want is the distinct values; GroupBy is less obvious in that case. If you need the whole row back, look at the first sample below; otherwise, look at the second.

    using System;
    using System.Data;
    using System.Linq;

    class Program
    {
        // Builds a small sample table that contains duplicate URL_Link values.
        static DataTable GetData()
        {
            DataTable table = new DataTable();
            table.Columns.Add("Visits", typeof(int));
            table.Columns.Add("URL_Link", typeof(string));

            table.Rows.Add(57, "yahoo.com");
            table.Rows.Add(130, "google.com");
            table.Rows.Add(92, "google.com");
            table.Rows.Add(25, "home.live.com");
            table.Rows.Add(30, "stackoverflow.com");
            table.Rows.Add(1, "stackoverflow.com");
            table.Rows.Add(7, "mysite.org");
            return table;
        }

        static void Main(string[] args)
        {
            // Group by URL_Link and keep the first row of each group.
            // The query result is assigned to res; the source rows are not modified.
            var res = GetData()
                      .AsEnumerable()
                      .GroupBy(row => row.Field<string>("URL_Link"))
                      .Select(grp => grp.First());

            foreach (var item in res)
            {
                // Print every column value of the row, tab-separated.
                string text = "";
                foreach (var clm in item.ItemArray)
                    text += string.Format("{0}\t", clm);

                Console.WriteLine(text);
            }
            Console.ReadLine();
        }
    }

This is more or less exactly what you already had. First, you didn't assign the query result to a variable. Second, you can read the field values of each row through ItemArray. The sample above gives this output:

57    yahoo.com
130   google.com
25    home.live.com
30    stackoverflow.com
7     mysite.org

Keep in mind that First() simply returns the first row of each group. If you need a specific one of the duplicates (e.g. the one with the most visits), you may have to add the appropriate Where, OrderBy and Select clauses.
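For the "duplicate with most visits" case, one possible sketch is to order each group by Visits before taking the first row. The class name UrlGroups and the method name MostVisitedPerUrl are hypothetical; the column names and types are the ones from the sample table above.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class UrlGroups
{
    // Hypothetical helper: one row per URL_Link, keeping the row with the highest Visits.
    // Assumes Visits is a non-null int column, as in the sample table.
    public static IEnumerable<DataRow> MostVisitedPerUrl(DataTable table)
    {
        return table.AsEnumerable()
                    .GroupBy(row => row.Field<string>("URL_Link"))
                    .Select(grp => grp.OrderByDescending(row => row.Field<int>("Visits"))
                                      .First());
    }
}
```

With the sample data this happens to return the same rows as the plain First() version, because the first row of each duplicated URL is already the one with the most visits (130 before 92, 30 before 1). The difference only shows up when the rows arrive in another order.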

If URL_Link is the only field you need from the distinct result, this sample is clear and straightforward. It simply selects the field you want and then applies Distinct to it.

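    static void Main(string[] args)
    {
        // Project each row to its URL_Link value, then drop duplicate strings.
        var res = GetData()
                    .AsEnumerable()
                    .Select(d => d.Field<string>("URL_Link"))
                    .Distinct();

        // item is already a string, so no ToString() call is needed.
        foreach (var item in res)
            Console.WriteLine(item);

        Console.ReadLine();
    }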