file: classify unknown files on the console
June 29th, 2008, edited by Alexey Beshenov
Article submitted by Caspar Clemens Mierau. Guess what? We still need you to submit good articles about software you like!
Somebody just sent you an email with attachments that lack usable file extensions, so you don’t really know how to handle them. Audio file? PDF? What is it? The same problem might occur after a file recovery, on web pages with upload features, etc.
While you can try giving the file an extension and opening it with a program you think might be suitable, the better way is to let your computer find out what it is all about. As a GNU/Linux user you probably already think “There is surely a command line tool for this”. Of course there is: file, by Ian Darwin.
It often gets installed automatically as a dependency. In any case, aptitude install file will help you. file depends on libmagic, which provides the patterns for the so-called “magic number” detection.
Let’s assume we have the following directory with unknown files:
$ ls -l
total 2152
-rw-r--r-- 1 ccm ccm    4118 2008-03-30 06:32 unknown.0
-rw-r--r-- 1 ccm ccm   10220 2008-05-06 02:23 unknown.1
-rw-r--r-- 1 ccm ccm   12693 2008-05-06 02:23 unknown.2
-rw-r--r-- 1 ccm ccm   25933 2007-10-26 07:41 unknown.3
-rw-r--r-- 1 ccm ccm    2121 2007-10-26 07:41 unknown.4
-rw-r--r-- 1 ccm ccm     185 2007-10-14 20:14 unknown.5
-rw-r--r-- 1 ccm ccm 1189011 2008-05-17 22:37 unknown.6
-rw-r--r-- 1 ccm ccm  824163 2008-02-02 05:02 unknown.7
-rw-r--r-- 1 ccm ccm   82367 2007-09-20 06:18 unknown.8
-rw-r--r-- 1 ccm ccm    8872 2006-04-24 12:43 unknown.9
Now we want to know what’s inside those black boxes. Therefore we just call file * on the console:
$ file *
unknown.0: XML
unknown.1: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
unknown.2: ASCII C program text
unknown.3: PDF document, version 1.4
unknown.4: LaTeX 2e document text
unknown.5: perl script text executable
unknown.6: gzip compressed data, from Unix, last modified: Wed Oct 8 16:27:09 2003
unknown.7: Ogg data, Vorbis audio, stereo, 44100 Hz, ~192003 bps, created by: Xiph.Org libVorbis I (1.0)
unknown.8: PNG image data, 492 x 417, 8-bit/color RGBA, non-interlaced
unknown.9: HTML document text
Hey, that’s all. Pretty impressive, isn’t it? file not only distinguishes binaries from text files, it even tries to guess what programming language a text file is written in. And the magic is not that much magic: for example, in the case of the Perl script (unknown.5) it just sees a shebang pointing to perl in the first line of the file, a PDF file typically starts with “%PDF”, and so on. It’s all about patterns.
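To make the idea of pattern matching concrete, here is a minimal Python sketch of magic-number detection. It is not how file or libmagic is implemented; the table SIGNATURES and the functions guess_type and guess_file are hypothetical names made up for this illustration, and the table covers only a handful of well-known signatures.

```python
# Hypothetical signature table: a tiny subset of what libmagic knows.
SIGNATURES = [
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"OggS", "Ogg data"),
    (b"\x7fELF", "ELF"),
]


def guess_type(data: bytes) -> str:
    """Return a rough description based on the leading bytes of a file."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    if data.startswith(b"#!"):
        # A shebang names the interpreter, e.g. "#!/usr/bin/perl -w".
        first_line = data.split(b"\n", 1)[0].decode("ascii", "replace")
        parts = first_line[2:].split()
        interpreter = parts[0].rsplit("/", 1)[-1] if parts else "unknown"
        return f"{interpreter} script"
    # Nothing matched: fall back to a generic label.
    return "data"


def guess_file(path: str) -> str:
    """Read the first bytes of the file at path and guess its type."""
    with open(path, "rb") as handle:
        return guess_type(handle.read(64))
```

Calling guess_file("unknown.3") on a PDF would return "PDF document" under these assumptions. The sketch only looks at the start of the file and reports a shebang such as "#!/usr/bin/env perl" as an "env script"; the real tool uses a far larger pattern database and many more tests.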
file provides you with some command line options that make its usage even more helpful. The most interesting is -i, as it prints out MIME types instead of verbose file types. If you are a web developer and want to know the exact MIME type for a file download, this can save you a lot of time:
$ file -i *
unknown.0: text/xml
unknown.1: application/x-object, not stripped
unknown.2: text/x-c; charset=us-ascii
unknown.3: application/pdf
unknown.4: text/x-tex
unknown.5: application/x-perl
unknown.6: application/x-gzip
unknown.7: application/ogg
unknown.8: image/png
unknown.9: text/html
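Great, isn’t it? The Apache web server can do the same kind of magic-number detection through its mod_mime_magic module, which is derived from the file command. On the command line, file is simply a thin wrapper around libmagic for the same task.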
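For the web developer case, a script can ask file for the MIME type directly. The following Python sketch is one possible way to do that, not an official interface: the function name mime_type is hypothetical, and it assumes a reasonably recent file that understands the --brief and --mime-type options (which print only the type, without the file name or charset).

```python
import os
import shutil
import subprocess
from typing import Optional


def mime_type(path: str) -> Optional[str]:
    """Ask the file command for the MIME type of path.

    Returns None if path is not a regular file, if the file command
    is not installed, or if the command fails.
    """
    if not os.path.isfile(path):
        return None
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Empty output is treated as "unknown" rather than as a type.
    return result.stdout.strip() or None
```

An upload handler could call mime_type on the stored file and compare the result against a list of allowed types instead of trusting the extension supplied by the client.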
file has been available in Debian and Ubuntu for a long time.