Find common parent-path in list of files and directories

Frame91 picture Frame91 · Jul 21, 2014 · Viewed 8.6k times · Source

I got a list of files and directories List<string> pathes. Now I'd like to calculate the deepest common branch every path is sharing with each other.

We can assume that they all share a common path, but this is unknown in the beginning.

Let's say I have the following three entries:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/
  • C:/Hello/Earth/Bla/Bla/Bla

This should get the result: C:/Hello/ as Earth is breaking this "chain" of subdirectories.

Second example:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/

-> C:/Hello/World/This/Is/

How would you proceed? I tried to use string.split(@"/") and start with the first string and check if every part of this array is contained in the other strings. However, this would be a very expensive call as I'm iterating (list_of_entries)^list_of_entries. Is there any better solution available?

My current attempt would be something like the following (C# + LINQ):

    public string CalculateCommonPath(IEnumerable<string> paths)
    {
        int minSlash = int.MaxValue;
        string minPath = null;
        foreach (var path in paths)
        {
            int splits = path.Split('\\').Count();
            if (minSlash > splits)
            {
                minSlash = splits;
                minPath = path;
            }
        }

        if (minPath != null)
        {
            string[] splits = minPath.Split('\\');
            for (int i = 0; i < minSlash; i++)
            {
                if (paths.Any(x => !x.StartsWith(splits[i])))
                {
                    return i >= 0 ? splits.Take(i).ToString() : "";
                }
            }
        }
        return minPath;
    }

Answer

AlexD picture AlexD · Jul 21, 2014

A function to get the longest common prefix may look like this:

public static string GetLongestCommonPrefix(string[] s)
{
    int k = s[0].Length;
    for (int i = 1; i < s.Length; i++)
    {
        k = Math.Min(k, s[i].Length);
        for (int j = 0; j < k; j++)
            if (s[i][j] != s[0][j])
            {
                k = j;
                break;
            }
    }
    return s[0].Substring(0, k);
}

Then you may need to cut the prefix on the right hand. E.g. we want to return c:/dir instead of c:/dir/file for

c:/dir/file1
c:/dir/file2

You also may want to normalize the paths before processing. See Normalize directory names in C#.