Parsing multipart emails in Python and saving attachments

ajt · Jun 6, 2011

I am pretty new to Python and am trying to parse email from Gmail using Python's imaplib and email modules. It works well for the most part, but I am having trouble with email attachments.

I would like to extract all of the plaintext from an email, ignore any HTML that may be included as a secondary content type, and remove and save all other attachments. I have been trying the following:

...imaplib connection and mailbox selection...

typ, msg_data = c.fetch(num, '(RFC822)')
email_body = msg_data[0][1]
mail = email.message_from_string(email_body)
for part in mail.walk():
    if part.get_content_type() == 'text/plain':
        body = body + '\n' + part.get_payload()
    else:
        continue

This was my original attempt to take just the plaintext portions of an email, but when someone sends an email with a text attachment, the contents of the text file also end up in the 'body' variable above.

Can someone tell me how I can extract the plaintext portions of an email while ignoring the secondary HTML that is sometimes present, and also save all other types of file attachments as files? I apologize if this doesn't make a lot of sense. I will update the question with more clarification if needed.

Answer

robots.jpg · Jun 7, 2011

If you just need to keep text attachments out of the body variable with what you have there, it should be as simple as this:

mail = email.message_from_string(email_body)
for part in mail.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')

    # Only text/plain parts with no Content-Disposition header count as body text
    if c_type == 'text/plain' and c_disp is None:
        body = body + '\n' + part.get_payload()
    else:
        continue

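Then, if the Content-Disposition indicates that the part is an attachment, you should be able to use part.get_filename() and part.get_payload(decode=True) to handle the file; without decode=True, get_payload() returns the payload still in its transfer encoding (base64, for example). I don't know if any of this can vary between mail clients, but it's basically what I've used in the past to interface with my mail server.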
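To make the attachment handling above concrete, here is a minimal sketch for Python 3. It is not the code from the original answer: the function name split_message and the save_dir parameter are hypothetical, and it assumes you have the raw message as bytes (which is what imaplib's fetch returns in Python 3). It treats any leaf part with a filename or an "attachment" disposition as a file to save, collects the remaining text/plain parts as the body, and ignores everything else, such as an HTML alternative.

```python
import email
import os


def split_message(raw_bytes, save_dir):
    """Hypothetical helper: return (body_text, saved_paths) for one raw message."""
    mail = email.message_from_bytes(raw_bytes)
    body_parts = []
    saved = []
    for part in mail.walk():
        # Multipart containers only hold other parts; skip them.
        if part.is_multipart():
            continue
        filename = part.get_filename()
        # 'attachment', 'inline', or None when the header is absent
        disposition = part.get_content_disposition()
        if filename or disposition == 'attachment':
            # The fallback name is an assumption for parts without a filename.
            name = filename or 'attachment-%d.bin' % (len(saved) + 1)
            # basename() drops directory components from the sender-supplied name.
            path = os.path.join(save_dir, os.path.basename(name))
            with open(path, 'wb') as fh:
                # decode=True undoes base64 / quoted-printable encoding.
                fh.write(part.get_payload(decode=True) or b'')
            saved.append(path)
        elif part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True) or b''
            # Falling back to UTF-8 is an assumption when no charset is declared.
            charset = part.get_content_charset() or 'utf-8'
            body_parts.append(payload.decode(charset, errors='replace'))
        # Anything else (for example a text/html alternative) is ignored.
    return '\n'.join(body_parts), saved
```

With the fetch call from the question, this could be used as body, files = split_message(msg_data[0][1], some_directory), where some_directory is an existing directory of your choosing. Real mail can vary (unknown charsets, duplicate filenames), so treat this as a starting point rather than a complete solution.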