Parse email header with Regex in C#

Kevin Jensen picture Kevin Jensen · Apr 27, 2011 · Viewed 8.6k times · Source

I've got a webhook posting to a form on my web application and I need to parse out the email header addresses.

Here is the source text:

Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: "Lastname, Firstname" <[email protected]>
To: <[email protected]>, [email protected], [email protected]
Cc: <[email protected]>, [email protected]
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]

I'm looking to pull out the following:

<[email protected]>, [email protected], [email protected]

I'm been struggling with Regex all day without any luck.

Answer

csharptest.net picture csharptest.net · Apr 27, 2011

Contrary to some of the posts here I have to agree with mmutz, you cannot parse emails with a regex... see this article:

http://tools.ietf.org/html/rfc2822#section-3.4.1

3.4.1. Addr-spec specification

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.

The idea of "locally interpreted" means that only the receiving server is expected to be able to parse it.

If I were going to try and solve this I would find the "To" line contents, break it apart and attempt to parse each segment with System.Net.Mail.MailAddress.

    static void Main()
    {
        string input = @"Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: ""Lastname, Firstname"" <[email protected]>
To: <[email protected]>, ""Yes, this is valid""@[emails are hard to parse!], [email protected], [email protected]
Cc: <[email protected]>, [email protected]
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]";

        Regex toline = new Regex(@"(?im-:^To\s*:\s*(?<to>.*)$)");
        string to = toline.Match(input).Groups["to"].Value;

        int from = 0;
        int pos = 0;
        int found;
        string test;

        while(from < to.Length)
        {
            found = (found = to.IndexOf(',', from)) > 0 ? found : to.Length;
            from = found + 1;
            test = to.Substring(pos, found - pos);

            try
            {
                System.Net.Mail.MailAddress addy = new System.Net.Mail.MailAddress(test.Trim());
                Console.WriteLine(addy.Address);
                pos = found + 1;
            }
            catch (FormatException)
            {
            }
        }
    }

Output from the above program:

[email protected]
"Yes, this is valid"@[emails are hard to parse!]
[email protected]
[email protected]