C# - Splitting on a pipe with an escaped pipe in the data?

Frijoles · Apr 28, 2011

I've got a pipe delimited file that I would like to split (I'm using C#). For example:

This|is|a|test

However, some of the data can contain a pipe in it. If it does, it will be escaped with a backslash:

This|is|a|pip\|ed|test (this is a pip|ed test)

I'm wondering if there is a regexp or some other method to split this apart on just the "pure" pipes (that is, pipes that have no backslash in front of them). My current method is to replace the escaped pipes with a custom bit of text, split on pipes, and then replace my custom text with a pipe. Not very elegant and I can't help but think there's a better way. Thanks for any help.

Answer

Jonathan Wood · Apr 28, 2011

Just use String.IndexOf() to find the next pipe. If the previous character is not a backslash, then use String.Substring() to extract the word. Alternatively, you could use String.IndexOfAny() to find the next occurrence of either the pipe or backslash.

I do a lot of parsing like this, and it is really pretty straightforward. Done correctly, this approach will also tend to run faster.
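For illustration, the IndexOfAny() alternative mentioned above might look roughly like the sketch below. The names PipeSplitter and SplitEscaped are hypothetical. It also assumes that a backslash escapes whatever character follows it (so a doubled backslash stands for a literal backslash), which goes slightly beyond what the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PipeSplitter
{
    // Hypothetical helper, not part of the original answer.
    // Assumption: a backslash escapes whatever character follows it.
    public static List<string> SplitEscaped(string s)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        char[] special = { '|', '\\' };
        int pos = 0;

        while (pos < s.Length)
        {
            // Jump straight to the next pipe or backslash
            int next = s.IndexOfAny(special, pos);
            if (next < 0)
            {
                // No special characters left: the rest is literal text
                current.Append(s, pos, s.Length - pos);
                break;
            }

            // Copy the literal text in front of the special character
            current.Append(s, pos, next - pos);

            if (s[next] == '\\' && next + 1 < s.Length)
            {
                // Escape sequence: keep the following character as-is
                current.Append(s[next + 1]);
                pos = next + 2;
            }
            else if (s[next] == '\\')
            {
                // Lone backslash at the very end: keep it literally
                current.Append('\\');
                pos = next + 1;
            }
            else
            {
                // Unescaped pipe: the current field is complete
                fields.Add(current.ToString());
                current.Clear();
                pos = next + 1;
            }
        }

        // Add the final field (empty if the input ends with a pipe)
        fields.Add(current.ToString());
        return fields;
    }
}
```

Calling PipeSplitter.SplitEscaped(@"This|is|a|pip\|ed|test") yields five fields, the fourth being "pip|ed". Unlike the ParseWords method in this answer, this sketch keeps a trailing empty field and returns one empty field for an empty input, which matches what String.Split() does.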

EDIT

In fact, maybe something like this. It would be interesting to see how this compares performance-wise to a RegEx solution.

public List<string> ParseWords(string s)
{
    List<string> words = new List<string>();

    int pos = 0;
    while (pos < s.Length)
    {
        // Get word start
        int start = pos;

        // Get word end
        pos = s.IndexOf('|', pos);
        while (pos > 0 && s[pos - 1] == '\\')
        {
            pos++;
            pos = s.IndexOf('|', pos);
        }

        // Adjust for pipe not found
        if (pos < 0)
            pos = s.Length;

        // Extract this word, turning each escaped pipe (\|) back into a plain pipe
        words.Add(s.Substring(start, pos - start).Replace("\\|", "|"));

        // Skip over pipe
        if (pos < s.Length)
            pos++;
    }
    return words;
}
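
For comparison, a RegEx solution could be sketched along these lines. The class name RegexPipeSplitter is hypothetical. The pattern uses a negative lookbehind so that only pipes without a backslash directly in front of them count as delimiters. It assumes the data never contains an escaped backslash immediately before a real delimiter, since the lookbehind cannot tell that case apart from an escaped pipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPipeSplitter
{
    // (?<!\\)\| matches a pipe that is not immediately preceded by a backslash
    private static readonly Regex UnescapedPipe =
        new Regex(@"(?<!\\)\|", RegexOptions.Compiled);

    // Hypothetical helper: split on unescaped pipes, then unescape the rest
    public static string[] Split(string s)
    {
        string[] fields = UnescapedPipe.Split(s);
        for (int i = 0; i < fields.Length; i++)
        {
            // Turn each escaped pipe (\|) back into a plain pipe
            fields[i] = fields[i].Replace(@"\|", "|");
        }
        return fields;
    }
}
```

Which of the two runs faster would have to be measured on representative data, for example with System.Diagnostics.Stopwatch over a large number of lines.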