I have a collection of DataRow objects. I need to select distinct rows based on the column 'URL_Link'. Following this post, I came up with the code below.
Is it possible to apply it to a DataRow collection?
IEnumerable<DataRow> results = GetData();
results.GroupBy(row => row.Field<string>("URL_Link")).Select(grp => grp.First());
It is syntactically correct, but it does not solve the problem. It doesn't remove duplicate rows. What am I doing wrong?
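Your query is fine, apart from one small error: you never assign the result of the query to a variable. GroupBy and Select do not modify results; they return a new sequence, so the original collection still contains the duplicates.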
Personally, I find Distinct much clearer when what you actually want is the distinct values themselves; GroupBy is less obvious in that case. If you intend to return the whole row, look at the first sample below; otherwise, see the second.
using System;
using System.Data;
using System.Linq;

class Program
{
    // Builds a small sample table that contains duplicate URL_Link values.
    static DataTable GetData()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Visits", typeof(int));
        table.Columns.Add("URL_Link", typeof(string));
        table.Rows.Add(57, "yahoo.com");
        table.Rows.Add(130, "google.com");
        table.Rows.Add(92, "google.com");
        table.Rows.Add(25, "home.live.com");
        table.Rows.Add(30, "stackoverflow.com");
        table.Rows.Add(1, "stackoverflow.com");
        table.Rows.Add(7, "mysite.org");
        return table;
    }

    static void Main(string[] args)
    {
        // Assign the query result to a variable; the source table is not modified.
        var res = GetData()
            .AsEnumerable()
            .GroupBy(row => row.Field<string>("URL_Link"))
            .Select(grp => grp.First()); // keep the first row of each URL_Link group

        foreach (var item in res)
        {
            // Print every column value of the row, separated by tabs.
            string text = "";
            foreach (var clm in item.ItemArray)
                text += string.Format("{0}\t", clm);
            Console.WriteLine(text);
        }
        Console.ReadLine();
    }
}
This is more or less exactly what you already had. The first difference is that the result of the query is assigned to a variable (res). The second is that the values of each row are read from its ItemArray. The sample above gives this output:
57 yahoo.com
130 google.com
25 home.live.com
30 stackoverflow.com
7 mysite.org
Keep in mind that you may need to adjust the Select, OrderBy and Where clauses if you want a specific row from each group of duplicates (e.g. the duplicate with the most visits) rather than simply the first one.
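As a rough illustration of picking a specific duplicate, here is a minimal sketch that keeps the row with the most visits for each URL_Link. The class name MostVisitedExample, the helpers GetSampleData and MostVisitedPerUrl, and the sample rows are hypothetical and only for illustration; the rows are ordered so that the first duplicate is not the one with the most visits.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class MostVisitedExample
{
    // Hypothetical sample data: the first row of each duplicate pair
    // has FEWER visits than the second one.
    public static DataTable GetSampleData()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Visits", typeof(int));
        table.Columns.Add("URL_Link", typeof(string));
        table.Rows.Add(92, "google.com");
        table.Rows.Add(130, "google.com");
        table.Rows.Add(1, "stackoverflow.com");
        table.Rows.Add(30, "stackoverflow.com");
        table.Rows.Add(57, "yahoo.com");
        return table;
    }

    // Hypothetical helper: returns one row per URL_Link,
    // choosing the row with the highest Visits value in each group.
    public static List<DataRow> MostVisitedPerUrl(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => row.Field<string>("URL_Link"))
            .Select(grp => grp
                .OrderByDescending(row => row.Field<int>("Visits"))
                .First())
            .ToList();
    }

    public static void Main()
    {
        foreach (DataRow row in MostVisitedPerUrl(GetSampleData()))
            Console.WriteLine("{0}\t{1}", row.Field<int>("Visits"), row.Field<string>("URL_Link"));
    }
}
```

With this data, a plain grp.First() would keep the 92 and 1 rows, while the ordered version keeps the 130 and 30 rows. A Where clause placed before GroupBy could be used in the same way to filter rows before the duplicates are collapsed.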
If URL_Link is the only field you need from the distinct result, this sample is clear and straightforward. It selects just the field you want and then applies Distinct to it.
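static void Main(string[] args)
{
    // Project each row to its URL_Link value, then drop duplicate values.
    var res = GetData()
        .AsEnumerable()
        .Select(d => d.Field<string>("URL_Link"))
        .Distinct();

    foreach (var item in res)
        Console.WriteLine(item); // item is already a string (null if the column is DBNull)
    Console.ReadLine();
}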