LinqToExcel: Distinct values in excel column

inquisitive_one picture inquisitive_one · May 2, 2012 · Viewed 7.2k times · Source

This might be a very simple thing for you gurus, but I'm not familiar with C#4 and INTEROP. Therefore, I'm stumped. Here's my problem. I have a excel column that has duplicate data and I want to trim it down to only unique values. Here's what the data looks like:

ColA    ColB
10      Adam
12      Jane
14      Adam
18      Adam
20      Eve

So, in the end I just want unique names from ColB:

Adam
Jane
Eve

I know that I can do this by getting all those values into a List and then adding the Distinct functionality to it. But I think I'm doing something wrong. Anyway, here's my program:

Application XLApp = new Microsoft.Office.Interop.Excel.Application();
var XLBook = XLApp.Workbooks.Open(@"c:\temp\Test.xls", ReadOnly: false);
// Grab the 1st sheet
var XLSheet = (Microsoft.Office.Interop.Excel.Worksheet)XLBook.Worksheets.get_Item(1);

XLApp.Visible = true;
// I think I need help with the following lines
IEnumerable<string> myCol = XLApp.Range["B2", XLApp.Range["B2"].End[XlDirection.xlDown]].Select();
myCol.ToList().Distinct();

XLBook.Close(false, System.Reflection.Missing.Value, System.Reflection.Missing.Value);
((_Application)XLApp).Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(XLApp);

How do I go about doing this using C#4 against Excel?

Thanks in advance.

Answer

Paul picture Paul · May 2, 2012

You can use LinqToExcel to easily get the distinct values in a column

var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.WorkSheet()
                     select row["ColB"]).Distinct()

EDIT:

To use Distinct in LinqToExcel, you have to use a class that corresponds to the row data.

public class WorksheetRow
{
    public string ColA { get; set; }
    public string ColB { get; set; }
}

var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.WorkSheet<WorksheetRow>()
                     select row.ColB).Distinct()