The word file here refers to the shell file command, not actual files. I want to determine whether a file is, for example, a video file (.mpg, .mkv, .avi). file is pretty good at returning image for image files, video for video files, and audio for audio files (and application/x-empty for some reason for text). My question is how reliable this is for identifying types. If I did a simple

file -ib deliverance.avi | grep video

would that work for all of the main video formats listed above?
The results from file are less than perfect, and it has more trouble with some types of files than others. The file command basically just looks for particular pieces of binary data (so-called magic numbers) at predictable positions to figure out file types.
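To make the "pieces of binary data" idea concrete, here is a minimal sketch of signature matching, assuming a POSIX shell with od and tr available. The function name sniff_container is hypothetical, and it checks only a handful of well-known signatures, nowhere near the size of file's real magic database. Note that it can only name the container; a matching "ftyp" or EBML header says nothing about whether the file actually holds video.

```shell
#!/bin/sh
# Hypothetical helper: guess a container format from its leading bytes,
# roughly the way file's magic patterns work (with far fewer rules).
sniff_container() {
  # First 12 bytes as one lowercase hex string, e.g. "1a45dfa3...".
  hex=$(od -An -tx1 -N12 "$1" | tr -d ' \n')
  case "$hex" in
    1a45dfa3*)                 echo matroska ;;  # EBML header (.mkv, .webm)
    52494646????????41564920*) echo avi ;;       # "RIFF" + 4-byte size + "AVI "
    ????????66747970*)         echo mp4 ;;       # "ftyp" box at offset 4 (.mp4, .m4a, ...)
    000001ba*|000001b3*)       echo mpeg ;;      # MPEG pack header / video sequence header
    4f676753*)                 echo ogg ;;       # "OggS" page header
    *)                         echo unknown ;;
  esac
}

# Usage: sniff_container deliverance.avi
```

The "ftyp" rule is checked before the MPEG rule so that an MP4 whose first box size happens to begin with 00 00 01 is not misread as MPEG.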
Unfortunately, some of the file types often used for video fall into this "problematic" category in particular. Newer container formats like .mp4 and .mkv can properly have several different MIME types, and which one is correct depends on what type of data the container holds. For example, an .mp4 could properly be identified as video/mp4, audio/mp4, or application/mp4 depending on the content.
In practice, file often makes guesses that simply conform with common usage, and it may work perfectly well for you. For example, while I mentioned some theoretical difficulties with identifying Matroska files correctly, file basically just assumes that any Matroska file is a video. On the other hand, usage of the Ogg container is more evenly split between audio and video, and I believe the current version of file just splits the difference and identifies Ogg files as application/ogg, which wouldn't fall into any of your categories.
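If you stick with file, one way to act on this is to match on the MIME type prefix rather than grepping the whole output, and to flag container-only types such as application/ogg as undecided instead of silently dropping them. A minimal sketch, assuming a build of file that supports --mime-type and -b (brief output); the function names classify_mime and classify_media are hypothetical, and the choice of which MIME types count as ambiguous is an assumption:

```shell
#!/bin/sh
# Hypothetical: map a MIME type string to a coarse media category.
classify_mime() {
  case "$1" in
    video/*)                         echo video ;;
    audio/*)                         echo audio ;;
    image/*)                         echo image ;;
    # Assumed list of container-only types that don't say what they hold.
    application/ogg|application/mp4) echo ambiguous ;;
    *)                               echo other ;;
  esac
}

# Hypothetical wrapper: --mime-type prints only the type (no "; charset=..."
# suffix that -i adds) and -b omits the filename prefix.
classify_media() {
  classify_mime "$(file --mime-type -b "$1")"
}

# Usage: classify_media deliverance.avi
```

Anything reported as ambiguous would still need a closer look at the streams inside the container.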
The one thing I can say with certainty is that you want the most up-to-date version of file you can get your hands on. The "magic" files that contain the patterns to match against, and the MIME types that result from a match, are updated fairly often to include newer file types like WebM, or just to improve accuracy for older types.