I am using Apache Tika to detect the MIME type of an input stream, and I was wondering whether there is a ready-made method to detect that the file is an executable. There's a big list of executable file types here:
http://www.file-extensions.org/filetype/extension/name/program-executable-files
and I was wondering about the best way to cover them all.
Apache Tika's MIME types have a hierarchy, so you don't need to check for every possible executable type. All you need to do is check whether the detected type is, or has as a parent, one of the handful of executable umbrella types.
For Windows, the main one is application/x-msdownload. You might also want to check for application/x-ms-installer too.
For Unix, the main one is application/x-elf, but you potentially also want to check for the scripting formats such as application/x-sh, text/x-perl, text/x-python, etc.
As for how to go from a MIME type in Tika to its parent, you'll want the existing answer "Correct use of Apache Tika MediaType". Note that you need to recurse, in case there are multiple levels between the detected MIME type and the base executable parent type.
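
To make the "walk up to the parent" step concrete, here is a minimal sketch of the check. It is written against a plain parent-lookup function so that it runs without Tika on the classpath. The class name `ExecutableTypeCheck`, the method `isExecutable`, and the `SAMPLE_PARENTS` map are all hypothetical, and the sample hierarchy is made up for the demonstration rather than copied from Tika's registry. In Tika itself, the lookup would presumably come from `MediaTypeRegistry.getSupertype(MediaType)`. The walk is a loop rather than literal recursion, which amounts to the same thing here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class ExecutableTypeCheck {

    // The umbrella types named in the answer above; extend as needed.
    static final Set<String> EXECUTABLE_TYPES = Set.of(
            "application/x-msdownload",
            "application/x-ms-installer",
            "application/x-elf",
            "application/x-sh",
            "text/x-perl",
            "text/x-python");

    // ASSUMPTION: an illustrative child -> parent map, NOT Tika's real registry.
    static final Map<String, String> SAMPLE_PARENTS = Map.of(
            "application/x-msdownload;format=pe32", "application/x-msdownload;format=pe",
            "application/x-msdownload;format=pe", "application/x-msdownload",
            "application/x-msdownload", "application/octet-stream",
            "image/png", "application/octet-stream");

    // Walks from the detected type up through its parents until it either
    // hits an umbrella type or runs out of parents (parentOf returns null).
    static boolean isExecutable(String type, UnaryOperator<String> parentOf) {
        Set<String> seen = new HashSet<>(); // guards against a cyclic lookup
        for (String t = type; t != null && seen.add(t); t = parentOf.apply(t)) {
            if (EXECUTABLE_TYPES.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two levels below the umbrella type, so the walk has to climb twice.
        System.out.println(isExecutable("application/x-msdownload;format=pe32", SAMPLE_PARENTS::get)); // true
        System.out.println(isExecutable("image/png", SAMPLE_PARENTS::get)); // false
    }
}
```

Passing the lookup in as a function keeps the walk independent of where the hierarchy comes from, so swapping the sample map for a real registry lookup should only change the call site.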