How to get attached files from an email, using Pentaho Kettle?

Alexander Villamil · Sep 19, 2014

I'm stuck on a difficult problem. My task is to download some emails from a server using the IMAP protocol. I do this with the "Get mails (POP3/IMAP)" job entry, which downloads the emails, but in binary format.

The binary files are .mail files containing the sender, subject, body, and encoded attachment files. I need the attachments as separate files, because I must perform further steps that take these files as input.

I've seen that there are third-party libraries or utilities to decode the .mail file and get the attachment file list. However, I want to do this without any additional utility, because that would require a shell step, which depends on the OS.

Is there any way or trick to get the attachments using only Pentaho job entries or transformation steps?

I'm using version 5.1 of Pentaho Kettle.

Answer

Marlon Abeykoon · Apr 10, 2015

I will explain the whole process so that anybody can benefit from it.


1) Add START and Get mails (POP3/IMAP) job entries, and create a hop between them.
2) Edit the Get mails entry to use your IMAP server (host name, port number, username, password, etc.), and click Test Connection to verify the settings.
3) In the Target folder section, uncheck Save message content, and check Get mail attachment and Different folder for attachment. Specify a directory for both the Target directory and the Attachment files folder fields.
4) On the Settings tab, select the IMAP folder that you want to download from. Change other settings as desired.
5) Click OK, save the job, and run it.
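
For anyone curious what the Get mail attachment option amounts to, here is a minimal Python sketch of the same idea: parse a raw message and write each attachment into a separate folder. This is not Kettle's implementation, only an illustration using Python's standard email package. The function names, the sample addresses, and the file name report.csv are hypothetical and chosen just for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def save_attachments(raw_message: bytes, attachment_dir: Path) -> list:
    """Write every attachment found in a raw message to attachment_dir.

    Returns the list of file names that were written.
    """
    attachment_dir.mkdir(parents=True, exist_ok=True)
    message = message_from_bytes(raw_message, policy=policy.default)
    saved = []
    for part in message.iter_attachments():
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if data is None:
            # Multipart attachments (e.g. a forwarded message) have no
            # single payload; this sketch simply skips them.
            continue
        # Keep only the base name so a crafted file name cannot escape the folder.
        name = Path(part.get_filename() or "unnamed").name
        (attachment_dir / name).write_bytes(data)
        saved.append(name)
    return saved


def build_sample_message() -> bytes:
    """Build a small message with one attachment (illustrative data only)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "receiver@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"id,value\n1,42\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    return msg.as_bytes()
```

Calling save_attachments(build_sample_message(), Path("attachments")) should leave report.csv in the attachments folder, which is roughly what you should see in the Attachment files folder after running the job above.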