file: classify unknown files on the console
June 29th, 2008, edited by Alexey Beshenov. Article submitted by Caspar Clemens Mierau.
Somebody just sent you an email with attachments that don’t have usable file extensions, so you don’t really know how to handle them. An audio file? A PDF? What is it? The same problem can occur after a file recovery, with files uploaded through web pages, etc.
While you can try to give the file an extension and open it with whatever software you think might be suitable, the better way is to let your computer figure out what it is. As a GNU/Linux user you are probably already thinking “There is surely a command line tool for this”. Of course there is: file, by Ian Darwin.
It is often pulled in automatically as a dependency; if not, aptitude install file will install it. file depends on libmagic, which provides the patterns for so-called “magic number” detection.
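To give a rough idea of what such a pattern is, here is a minimal Python sketch of magic-number sniffing. It is not libmagic’s actual logic: the signature table is a small, hand-picked assumption covering a few of the formats that show up in this article, and the description strings only loosely imitate file’s. The function names sniff and sniff_file are hypothetical.

```python
# Minimal illustration of magic-number sniffing (not libmagic's actual logic).
# The signature table is a small, hand-picked assumption.

# (byte offset, expected bytes, description)
SIGNATURES = [
    (0, b"%PDF-", "PDF document"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x7fELF", "ELF"),
    (0, b"OggS", "Ogg data"),
]


def sniff(header: bytes) -> str:
    """Return a rough description of data based on its first bytes."""
    for offset, magic, description in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return description
    # A shebang line ("#!") names the interpreter that runs a script.
    if header.startswith(b"#!"):
        parts = header.split(b"\n", 1)[0][2:].split()
        if not parts:
            return "script"
        name = parts[0].rsplit(b"/", 1)[-1]
        # "#!/usr/bin/env perl" names the real interpreter in the second word.
        if name == b"env" and len(parts) > 1:
            name = parts[1]
        return name.decode("ascii", "replace") + " script"
    return "data"


def sniff_file(path: str, size: int = 64) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(size))
```

Binary formats are matched on exact bytes first; only then does the sketch fall back to the shebang heuristic for scripts. libmagic’s real database is far larger and can also test other offsets and chain follow-up checks.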
Let’s assume we have the following directory with unknown files:
$ ls -l
total 2152
-rw-r--r-- 1 ccm ccm    4118 2008-03-30 06:32 unknown.0
-rw-r--r-- 1 ccm ccm   10220 2008-05-06 02:23 unknown.1
-rw-r--r-- 1 ccm ccm   12693 2008-05-06 02:23 unknown.2
-rw-r--r-- 1 ccm ccm   25933 2007-10-26 07:41 unknown.3
-rw-r--r-- 1 ccm ccm    2121 2007-10-26 07:41 unknown.4
-rw-r--r-- 1 ccm ccm     185 2007-10-14 20:14 unknown.5
-rw-r--r-- 1 ccm ccm 1189011 2008-05-17 22:37 unknown.6
-rw-r--r-- 1 ccm ccm  824163 2008-02-02 05:02 unknown.7
-rw-r--r-- 1 ccm ccm   82367 2007-09-20 06:18 unknown.8
-rw-r--r-- 1 ccm ccm    8872 2006-04-24 12:43 unknown.9
Now we want to know what’s inside those black boxes, so we simply run file * on the console:
$ file *
unknown.0: XML
unknown.1: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
unknown.2: ASCII C program text
unknown.3: PDF document, version 1.4
unknown.4: LaTeX 2e document text
unknown.5: perl script text executable
unknown.6: gzip compressed data, from Unix, last modified: Wed Oct 8 16:27:09 2003
unknown.7: Ogg data, Vorbis audio, stereo, 44100 Hz, ~192003 bps, created by: Xiph.Org libVorbis I (1.0)
unknown.8: PNG image data, 492 x 417, 8-bit/color RGBA, non-interlaced
unknown.9: HTML document text
Hey, that’s all. Pretty impressive, isn’t it? file not only distinguishes binaries from text files, it even tries to guess which programming language a text file is written in. And the magic is not all that magical: in the case of the Perl script (unknown.5), for example, it simply sees a shebang on the first line pointing to perl; a PDF file typically starts with “%PDF”, and so on. It’s all about patterns.
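Some of file’s output goes beyond the magic number itself: the PNG line above also reports dimensions, bit depth, colour type and interlacing. In a PNG those values sit at fixed positions in the IHDR chunk that follows the 8-byte signature. The Python sketch below reads them using the layout from the PNG specification; the name describe_png is hypothetical, and the colour-type labels other than RGBA are assumptions rather than file’s exact wording.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Colour type codes from the PNG specification's IHDR chunk (labels are assumed).
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def describe_png(header: bytes) -> str:
    """Describe a PNG from its first 29 bytes (signature + start of IHDR)."""
    if len(header) < 29 or not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG header")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", header[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # Big-endian width and height, then five single-byte fields.
    width, height, depth, color, _compression, _filter, interlace = struct.unpack(
        ">IIBBBBB", header[16:29]
    )
    kind = COLOR_TYPES.get(color, f"color type {color}")
    laced = "interlaced" if interlace == 1 else "non-interlaced"
    return f"PNG image data, {width} x {height}, {depth}-bit/color {kind}, {laced}"
```

Fed the first 29 bytes of unknown.8, this sketch would be expected to produce the same description as file does above, since it reads the same header fields.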
file provides some command line options that make it even more useful. The most interesting one is -i, which prints MIME types instead of verbose file type descriptions. If you are a web developer and want to know the exact MIME type to use for a file download, this can save you a lot of time:
$ file -i *
unknown.0: text/xml
unknown.1: application/x-object, not stripped
unknown.2: text/x-c; charset=us-ascii
unknown.3: application/pdf
unknown.4: text/x-tex
unknown.5: application/x-perl
unknown.6: application/x-gzip
unknown.7: application/ogg
unknown.8: image/png
unknown.9: text/html
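If you want to use these MIME types from a script, one option is to run file -i and split its output. The Python sketch below does that with the standard subprocess module; it assumes the “name: type; key=value” layout shown above (dropping extras such as “, not stripped”), and the helper names parse_mime_line and mime_types are hypothetical. Output details can differ between file versions, so treat the parser as a starting point.

```python
from __future__ import annotations

import subprocess


def parse_mime_line(line: str) -> tuple[str, str, dict[str, str]]:
    """Split one line of `file -i` output into (name, MIME type, parameters)."""
    # The last ": " separates the file name from the type, even if the name has one.
    name, _, rest = line.rpartition(": ")
    mime, *params = [part.strip() for part in rest.split(";")]
    # Older file versions may append extra text after a comma, e.g. ", not stripped".
    mime = mime.split(",")[0].strip()
    parameters = {}
    for param in params:
        key, _, value = param.partition("=")
        parameters[key.strip()] = value.strip()
    return name, mime, parameters


def mime_types(paths: list[str]) -> dict[str, str]:
    """Run `file -i` on the given paths and map each path to its MIME type."""
    output = subprocess.run(
        ["file", "-i", *paths], capture_output=True, text=True, check=True
    ).stdout
    return {name: mime for name, mime, _ in map(parse_mime_line, output.splitlines())}
```

For the listing above, mime_types(["unknown.3", "unknown.8"]) would be expected to map the two names to application/pdf and image/png, provided file is installed and reports them as it did here.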
Great, isn’t it? The Apache web server’s mod_mime_magic module, which is derived from file, determines MIME types the same way. The file command itself is essentially a front end to libmagic.
file has been available in Debian and Ubuntu for a long time.
June 29th, 2008 at 12:21 pm
Elder Unix administrators knew that, of course. But yes, someone new to Linux does not even think about this magic being built in :)
So, good idea to introduce the very old file!
July 2nd, 2008 at 1:30 pm
Awesome! this saves me a trip to filext.com to look up what a file type extension is. :D
July 17th, 2008 at 1:44 pm
@schrambo
No, this won’t save you a trip to filext.com. All this program does is look at the file’s header and compare it against its file type database.
As you can see, the files have numbers as extensions, and most, if not all, of us know that extensions in Linux are meaningless, other than as a visual reference.
July 19th, 2008 at 2:18 pm
If only MIME were not so badly designed… A striking example: application/ogg (right in this article). Why not audio/ogg and video/ogg?
July 23rd, 2008 at 1:37 am
Your mention of magic made me laugh. That’s how it works: see ‘man magic’.
Now you know why most shell scripts start with pound-bang.
July 23rd, 2008 at 11:39 am
@Roland
Thanks, but the text does mention “magic number” detection and libmagic.
August 14th, 2008 at 6:58 am
UNIX rules the universe!
August 31st, 2009 at 9:45 am
Is there a way to find the length of a movie? For example, if I type:
$ file pippo.avi
pippo.avi: RIFF (little-endian) data, AVI, 608 x 336, 25.00 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
That’s OK! But I’d like to have output like this:
Length: 1.13.00 (h/m/s)
Is there a way to do that? Thanks