
file: classify unknown files on the console

June 29th, 2008 edited by Alexey Beshenov

Article submitted by Caspar Clemens Mierau. Guess what? We still need you to submit good articles about software you like!

Somebody just sent you a mail with attachments that don’t have usable file extensions, so you don’t really know how to handle them. Audio file? PDF? What is it? The same problem can occur after a file recovery, on web pages with upload features, etc.

While you can try to give the file an extension and open it with a program you think might be suitable, the better way is to let your computer find out what it is all about. As a GNU/Linux user you probably already think “There is surely a command line tool for this”. Of course there is: file, originally written by Ian Darwin.

It often gets installed automatically as a dependency. In any case, aptitude install file will help you. file depends on libmagic, which provides the patterns for the so-called “magic number” detection.

Let’s assume we have the following directory with unknown files:

$ ls -l
total 2152
-rw-r--r-- 1 ccm ccm    4118 2008-03-30 06:32 unknown.0
-rw-r--r-- 1 ccm ccm   10220 2008-05-06 02:23 unknown.1
-rw-r--r-- 1 ccm ccm   12693 2008-05-06 02:23 unknown.2
-rw-r--r-- 1 ccm ccm   25933 2007-10-26 07:41 unknown.3
-rw-r--r-- 1 ccm ccm    2121 2007-10-26 07:41 unknown.4
-rw-r--r-- 1 ccm ccm     185 2007-10-14 20:14 unknown.5
-rw-r--r-- 1 ccm ccm 1189011 2008-05-17 22:37 unknown.6
-rw-r--r-- 1 ccm ccm  824163 2008-02-02 05:02 unknown.7
-rw-r--r-- 1 ccm ccm   82367 2007-09-20 06:18 unknown.8
-rw-r--r-- 1 ccm ccm    8872 2006-04-24 12:43 unknown.9

Now we want to know what’s inside those black boxes, so we just run file * on the console:

$ file *
unknown.0: XML
unknown.1: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
unknown.2: ASCII C program text
unknown.3: PDF document, version 1.4
unknown.4: LaTeX 2e document text
unknown.5: perl script text executable
unknown.6: gzip compressed data, from Unix, last modified: Wed Oct  8 16:27:09 2003
unknown.7: Ogg data, Vorbis audio, stereo, 44100 Hz, ~192003 bps, created by: Xiph.Org libVorbis I (1.0)
unknown.8: PNG image data, 492 x 417, 8-bit/color RGBA, non-interlaced
unknown.9: HTML document text

Hey, that’s all. Pretty impressive, isn’t it? file not only distinguishes binaries from text files, it even tries to guess what programming language a text file is written in. And the magic is not that much magic: for example, in the case of the Perl script it just sees a shebang pointing to perl in the first line of the file, while a PDF file typically starts with “%PDF”, and so on. It’s all about patterns.
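To make the pattern idea concrete, here is a minimal sketch of two such checks in plain shell. The function name sniff is made up for this example, and it only knows the two patterns mentioned above; the real file consults the far larger pattern database that comes with libmagic.

```shell
#!/bin/sh
# sniff: hypothetical toy that imitates two of file's checks.
# It inspects only the first line of a file (capped at 64 bytes).
sniff() {
    # Drop NUL bytes so binary data does not upset the shell
    first=$(head -c 64 "$1" | tr -d '\000' | head -n 1)
    case $first in
        '%PDF'*)
            # PDF files typically start with "%PDF"
            printf '%s: PDF document\n' "$1" ;;
        '#!'*)
            # A shebang names the interpreter; strip the leading "#!"
            printf '%s: script (shebang: %s)\n' "$1" "${first#??}" ;;
        *)
            printf '%s: data\n' "$1" ;;
    esac
}
```

Called as sniff unknown.3, it should report a PDF document as long as the file really begins with “%PDF”. Anything it does not recognize is simply reported as data.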

file provides you with some command line options that make its usage even more helpful. The most interesting is -i, as it prints out MIME types instead of verbose file type descriptions. If you are a web developer and want to know the exact MIME type for a file download, this can save you a lot of time:

$ file -i *
unknown.0: text/xml
unknown.1: application/x-object, not stripped
unknown.2: text/x-c; charset=us-ascii
unknown.3: application/pdf
unknown.4: text/x-tex
unknown.5: application/x-perl
unknown.6: application/x-gzip
unknown.7: application/ogg
unknown.8: image/png
unknown.9: text/html

Great, isn’t it? The Apache web server can do the same kind of content-based detection: its mod_mime_magic module is derived from file. The file command itself is just a thin wrapper around libmagic for the same task.
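If you want to act on the detected type in a script, something along these lines might do. This is only a sketch: category_for_mime and classify_all are names invented for this example, the categories are arbitrary, and it assumes your version of file supports the --mime-type option (which prints the bare type without the charset part) and -b (which drops the leading file name).

```shell
#!/bin/sh
# category_for_mime: hypothetical helper that maps a MIME type string
# to a coarse category a script could branch on.
category_for_mime() {
    case $1 in
        text/*)                          echo text ;;
        image/*)                         echo image ;;
        audio/*|video/*|application/ogg) echo media ;;
        application/pdf)                 echo document ;;
        *)                               echo other ;;
    esac
}

# classify_all: ask file for the MIME type of each argument and print
# the category next to the file name. Requires file to be installed.
classify_all() {
    for f in "$@"; do
        mime=$(file -b --mime-type "$f")
        printf '%s: %s\n' "$f" "$(category_for_mime "$mime")"
    done
}
```

In the example directory you would call it as classify_all *. Keeping the mapping in its own function means the decision logic does not depend on how the MIME type was obtained.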

file has been available in Debian and Ubuntu for a long time.

Posted in Debian, Ubuntu

8 Responses

  1. bed Says:

    Veteran Unix administrators knew that already, of course. But yes, someone new to Linux would not even think that this magic is built in :)
    So, good idea to introduce the very old file!

  2. schrambo Says:

    Awesome! this saves me a trip to filext.com to look up what a file type extension is. :D

  3. erKURITA Says:

    @schrambo

    No, this won’t save you a trip to filext.com. All this program does is look at the headers of the file and compare them to its file type database.

    As you can see, the files have numbers as extensions, and most, if not all, of us know that extensions in linux are meaningless, other than just for visual reference.

  4. anon Says:

    If only MIME was not that badly designed… A glaring example: application/ogg (right in this article). Why not audio/ogg and video/ogg?

  5. Roland Says:

    Your mention of magic made me laugh. That’s how it works: see
    ‘man magic’
    Now you know why most shell scripts start with pound-bang.

  6. Al Says:

    @Roland

    Thanks, but the text does mention “magic number” detection and libmagic.

  7. bubo Says:

    UNIX rules the universe!

  8. Mighty83 Says:

    Is there a way to find the length of a movie? For example, if I type:

    $ file pippo.avi
    pippo.avi: RIFF (little-endian) data, AVI, 608 x 336, 25.00 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)

    That’s ok! But I’d like to have output like this:

    Length: 1.13.00 (h/m/s)

    Is there a way to do that? Thanks