Find number of characters mutual between two strings in C#

Tommy picture Tommy · Feb 13, 2014 · Viewed 7.4k times · Source

I am looking for a method that will take two strings and return the number of characters that are common to both e.g.:

"G010" & "G1820A" should return 3 as the G, 0 and 1 chars exist in both.

If a char exists twice in both they should be counted separately as follows:

"G12AA" & "GAA2" should return 4 as the G, A, A and 2 characters exist in both.

Any help with this? Google searches haven't been too helpful thus far.

Answer

Jodrell picture Jodrell · Feb 13, 2014

Okay, how about this, it has the advantage of maximising lazy evaluation and minimising string manipulation.

public int CommonChars(string left, string right)
{
    return left.GroupBy(c => c)
        .Join(
            right.GroupBy(c => c),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l).Count())
        .Sum(); 
}

essentialy, it groups each side by char, then finds chars which have a group on both sides. The matched groups are counted in tandem, until either runs out. These counts are summed to produce the result.


It would be trivial to perform this generically for any two sequences. See below,

public static int CommomCount<T>(
        this IEnumerable<T> source,
        IEnumerable<T> sequence,
        IEqualityComparer<T> comparer = null)
{
    if (sequence == null)
    {
        return 0;
    }

    if (comparer == null)
    {
        comparer = EqualityComparer<T>.Default;
    }

    return source.GroupBy(t => t, comparer)
        .Join(
            sequence.GroupBy(t => t, comparer),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l).Count(),
            comparer)
        .Sum();
}

Which you would use like this.

"G12AA".CommonCount("GAA2")

The optional comparer parameter may prove useful if you require case insensitivity or other special treatment.


In the interest of resuability, I'd be tempted to remove the Sum() and return an IEnumerable<T>, and then add sum to the call, like this,

public static IEnumerable<T> Commom<T>(
        this IEnumerable<T> source,
        IEnumerable<T> sequence,
        IEqualityComparer<T> comparer = null)
{
    if (sequence == null)
    {
        return Enumerable.Empty<T>();
    }

    if (comparer == null)
    {
        comparer = EqualityComparer<T>.Default;
    }

    return source.GroupBy(t => t, comparer)
        .Join(
            sequence.GroupBy(t => t, comparer),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l),
            comparer)
        .SelectMany(g => g);
}

so you could easily do

Console.WriteLine(new string("G12AA".Common("GAA2").ToArray()));

or just the orgininal

"G12AA".Common("GAA2").Count();