How to get file extension from content type?

lisak · Apr 4, 2011 · Viewed 31.4k times

I'm using Apache Tika, and I have files (without extensions) of a particular content type that need to be renamed so that their extension reflects the content type.

Is there something I could use instead of programming this from scratch based on content type names?

Answer

Gagravarr · Apr 4, 2011

The two key classes for you are MediaTypeRegistry and MimeTypes. Using these, you can do detection based on MIME magic, and get information on the MIME types and their relationships.

(That said, if you want to do a full detection, potentially involving some parsing of the File using extra logic in the Tika Parsers jar for container-based formats, you should be using TikaConfig.getDetector() and/or DefaultDetector.)

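import java.io.File;
import java.io.InputStream;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MimeType;

// Load your Tika config, find all the Tika classes etc
TikaConfig config = TikaConfig.getDefaultConfig();

// Do the detection. Use DefaultDetector / getDetector() for more advanced detection
// ("file" is the path of the file to inspect)
Metadata metadata = new Metadata();
InputStream stream = TikaInputStream.get(new File(file), metadata);
MediaType mediaType = config.getMimeRepository().detect(stream, metadata);

// Fetch the most common extension for the detected type
MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
String extension = mimeType.getExtension(); // includes the leading dot; empty if none is known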
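
Once you have the extension string, the rename itself needs nothing from Tika. Below is a minimal sketch of that step using only java.nio.file. The class ExtensionRenamer and its two methods are hypothetical names made up for illustration, and the sketch assumes the path refers to a regular file (not a filesystem root) and that an empty extension means "leave the file alone".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: appends a detected extension to a file name.
final class ExtensionRenamer {
    private ExtensionRenamer() {}

    // Computes the target path without touching the filesystem.
    static Path withExtension(Path file, String extension) {
        // No known extension: keep the original name.
        if (extension == null || extension.isEmpty()) {
            return file;
        }
        // Accept both ".pdf" and "pdf" as input.
        String suffix = extension.startsWith(".") ? extension : "." + extension;
        String name = file.getFileName().toString();
        // Already carries this extension: nothing to do.
        if (name.endsWith(suffix)) {
            return file;
        }
        return file.resolveSibling(name + suffix);
    }

    // Renames the file on disk and returns its new path.
    // Fails with FileAlreadyExistsException if the target name is taken.
    static Path renameWithExtension(Path file, String extension) throws IOException {
        Path target = withExtension(file, extension);
        if (target.equals(file)) {
            return file;
        }
        return Files.move(file, target);
    }
}
```

With the snippet above, you would call something like ExtensionRenamer.renameWithExtension(Paths.get(file), extension). Keeping the path computation separate from the move makes it easy to do a dry run over a directory before renaming anything.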