How to decode a PDF stream?

rebel87 · Jan 17, 2015

I want to analyze a stream object in a PDF file which is encoded using /FlateDecode.

Are there any tools that can decode the stream encodings used in PDFs (ASCII85Decode, LZWDecode, RunLengthDecode, etc.)?

The stream content is most likely a PE file, which the PDF will probably use later in the exploit.

Also, the PDF contains two xref tables, which seems fine, but each of them is followed by its own %%EOF marker.

Is the presence of both all right? (Note: the second xref points to the first xref using the /Prev name.)

This xref refers to the second xref:

xref 
5 6
0000000618 00000 n
0000000658 00000 n
0000000701 00000 n
0000000798 00000 n
0000045112 00000 n
0000045219 00000 n
1 1
0000045753 00000 n
3 1
0000045838 00000 n
trailer
>
startxref
46090
%%EOF

The second xref:

xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000136 00000 n
0000000373 00000 n
trailer
>
startxref
429
%%EOF

Answer

Kurt Pfeifle · Jan 17, 2015
  1. "Two xref tables and two %%EOF"?

    This alone is not an indication of a malicious PDF file. There can be two or even more instances of each if the file was generated via the "incremental update" feature. (Every digitally signed PDF file is like that, and so is every file that was changed in Acrobat and saved with the 'Save' button/menu instead of the 'Save as...' button/menu.)

  2. "How to decode a compressed PDF stream from a specific object"?

    Have a look at Didier Stevens' Python script pdf-parser.py. With this command line tool, you can dump the decoded stream of any PDF object into a file. Example command to dump the stream of PDF object number 13:

    pdf-parser.py -o 13 -f -d obj13.dump my.pdf
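
If you would rather inspect the streams by hand, the following is a minimal Python sketch of what decoding /FlateDecode amounts to. It is not how pdf-parser.py works internally. The helper names `inflate_streams` and `looks_like_pe` are made up for this illustration. The sketch assumes the streams use only /FlateDecode (no chained filters, no predictors, no encryption), and it finds stream boundaries by searching for the `stream` and `endstream` keywords instead of honoring /Length, so it may misfire on unusual files.

```python
import re
import zlib

# Raw bytes between the "stream" keyword (plus its end-of-line) and "endstream".
# Assumption: keyword search is good enough; a real parser would use /Length.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield (offset, decoded_bytes) for every stream that zlib can inflate.

    Hypothetical helper: streams that are not valid zlib data are skipped.
    """
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() ignores the end-of-line bytes that usually
            # trail the compressed data before "endstream".
            decoded = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue  # not Flate data (other filter, or plain content)
        yield match.start(), decoded


def looks_like_pe(data):
    """Rough check: PE files start with the DOS header magic 'MZ'."""
    return data.startswith(b"MZ")


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        pdf = handle.read()
    for offset, decoded in inflate_streams(pdf):
        note = " (starts with MZ)" if looks_like_pe(decoded) else ""
        print(f"stream at byte {offset}: {len(decoded)} bytes decoded{note}")
```

Run it as `python script.py my.pdf`. Streams encoded with other filters (ASCII85Decode, LZWDecode, RunLengthDecode) are simply skipped here and need their own decoders.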