rdiff-backup: Easy incremental backups from the command line

October 26th, 2008 edited by Vicho

Storage is becoming cheaper and cheaper: you can find hard drives that cost less than a dollar per GiB. Buying an external hard drive to make backups (or even having a backup server) is a must if you value your work and the data stored on your computer. However, doing backups should be easy enough to be done on a regular basis. The more automated, the better.

So, there is no excuse not to do regular backups, and I looked for a tool that is easy to use but powerful. rdiff-backup is a Python script that makes local and remote incremental backups. To back up your $HOME to an external hard drive mounted at /media/backup, simply do:

$ rdiff-backup $HOME /media/backup/home_backup

If, after some days, you want to back up your new files, run the same command again to update the backup.

Now /media/backup/home_backup holds an exact copy of your home directory as it was at the last backup. If you want to restore a directory, you can just copy it back:

$ cp -a /media/backup/home_backup/src/myprogram ~/src/

Which is equivalent to:

$ rdiff-backup --restore-as-of now /media/backup/home_backup/src/myprogram ~/src/

Of course, you can also restore previous versions of a file. For example, to restore the source of myprogram as it was a month ago:

$ rdiff-backup --restore-as-of 1M /media/backup/home_backup/src/myprogram ~/src/

You can list all the incremental backups you have made by executing:

$ rdiff-backup --list-increments /media/backup/home_backup

If you run out of space on your backup device and you're sure you don't need the backups you made three years ago, you can remove them with the following command (if it matches more than one increment, rdiff-backup will ask you to add --force):

$ rdiff-backup --remove-older-than 3Y /media/backup/home_backup

rdiff-backup works exactly the same with remote directories. You need SSH access, and rdiff-backup must be installed on the remote machine(s). Note that in any example above you can change the local directories to remote ones, so you can back up a remote machine locally, or back up this machine to a remote backup server. For example, say backup-server is your backup server (substitute your own hostname). You can back up regularly using:

$ rdiff-backup local-dir/ backup-server::remote-dir/

If you use key-based (RSA or DSA) authentication, you can even put that in a cron job.
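For example, a crontab entry along these lines would run the backup nightly (the schedule and paths here are just placeholders; adapt them to your setup):

```shell
# crontab -e: back up $HOME to the external drive every night at 02:30
30 2 * * * rdiff-backup $HOME /media/backup/home_backup
```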

See rdiff-backup documentation and other examples to discover all the functionality of this package.

Similar packages

Frontends for rdiff-backup:

  • keep is a KDE GUI frontend for rdiff-backup.
  • archfs is a FUSE (filesystem in userspace) virtual filesystem that lets you browse each version of an rdiff-backup repository as if it were any other directory. Adam Sloboda has stated his intention to package archfs for Debian.
  • rdiff-backup-web (not in Debian, no WNPP yet) is a web frontend for rdiff-backup.

There are a ton of other programs to make backups. I will list here some of them (though this list is nowhere near complete) that are similar to rdiff-backup:

  • backup2l also makes local backups, but seems to lack the remote-backup feature.
  • backuppc is a Perl program that also makes incremental backups, and has a web interface to help manage and restore backups.
  • duplicity also makes remote incremental backups, but encrypts the data using GnuPG. I haven't tested it myself, but it can be useful if you don't trust the remote file server.
  • storebackup also makes local incremental backups. It creates a new tree for every snapshot, but saves disk space by hard-linking unchanged files.



Pros:

  • Easy to use. Now there's no excuse not to do backups!
  • Works from the command line, so you can easily put it in a script or cron job.
  • Simple recovery from the last snapshot: you can use standard tools like cp or find.


Cons:

  • Not having a GUI may scare some users.
  • It stores the last snapshot uncompressed, so depending on what you are backing up, it can consume a lot of space. Older snapshots are compressed, which makes this con a not-so-con ;-).

rdiff-backup has been available in Debian since Sarge (perhaps even earlier), and in Ubuntu since Dapper.

Posted in Debian, Ubuntu | 14 Comments »

memstat: Identify what is using up virtual memory

October 19th, 2008 edited by Vicho

Article submitted by Todd Troxell. Guess what? We still need you to submit good articles about software you like!

This tool lets you discover which libraries and programs are using up memory. It is very simple to use. Here is an example of its output (the -w flag stops memstat from truncating lines at 80 columns):

gaius% memstat -w
    256k: PID  5465 (/lib/
    368k: PID 13019 (/var/db/nscd/passwd)
   3352k: PID 13914 (/usr/lib/gconv/gconv-modules.cache)
      8k: /usr/bin/memstat 5465
     12k: /lib/ 13019
    256k: /lib/ 13914
     88k: /lib/ 5465 13019
    256k: /lib/ 13019
   1212k: /lib/tls/ 13914
     32k: /lib/tls/ 13914
     24k: /lib/tls/ 13914
     12k: /lib/tls/ 13914
    144k: /lib/tls/ 13914
     76k: /lib/tls/ 13914
     40k: /lib/tls/ 13914
     36k: /lib/tls/ 13914
     60k: /lib/tls/ 13914
     28k: /lib/tls/ 13914
     88k: /lib/ 13914
   1212k: /lib/tls/ 5465 13019
     12k: /lib/tls/ 13019
    144k: /lib/tls/ 13019
     76k: /lib/tls/ 13019
    480k: /bin/zsh-beta 13019
    212k: /var/db/nscd/passwd 13019
    788k: /usr/bin/irssi 13914
    148k: /usr/lib/ 13019
    176k: /usr/lib/perl5/auto/Irssi/ 13914
     80k: /usr/lib/perl5/auto/Irssi/Irc/ 13914
     80k: /usr/lib/perl5/auto/Irssi/UI/ 13914
     12k: /usr/lib/gconv/ 13914
     24k: /usr/lib/gconv/gconv-modules.cache 13914
     76k: /usr/lib/ 13914
    584k: /usr/lib/ 13914
   1128k: /usr/lib/ 13914
     12k: /usr/lib/ 13914
   1240k: /usr/lib/i686/cmov/ 13914
    248k: /usr/lib/i686/cmov/ 13914
      8k: /usr/lib/zsh-beta/4.3.2-dev-1/zsh/ 13019
     24k: /usr/lib/zsh-beta/4.3.2-dev-1/zsh/ 13019
     56k: /usr/lib/zsh-beta/4.3.2-dev-1/zsh/ 13019
    116k: /usr/lib/zsh-beta/4.3.2-dev-1/zsh/ 13019
    196k: /usr/lib/zsh-beta/4.3.2-dev-1/zsh/ 13019

This output lists the libraries and processes loaded into memory, together with their sizes. First, processes are listed with the size of their private memory, which does not include shared memory. Then shared objects are listed, and finally the total.

In case you are wondering, shared objects are libraries like /lib/tls/ that are shared across all processes that need them, to save memory and make things run faster. Instead of loading such a library into memory for every process, Linux loads one copy and uses it for any process that wants the library. This is why these values sometimes do not add up to the amount of memory you see used on your system. If you look at ps(1) you will see two memory-related columns: RSS and VSZ. For each process, RSS (resident set size) is the physical memory the process is using, while VSZ (virtual size) is its total address space, including shared objects. To add up memory correctly, you must count each shared object only once for the whole system.
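You can see the two columns directly with ps; for example, for a single process (here, the current shell):

```shell
# RSS (resident set size, in KiB) vs. VSZ (virtual size, which also
# counts shared objects) for the current shell process
ps -o pid,rss,vsz,comm -p $$
```

The VSZ figure will typically be several times larger than RSS, precisely because it includes every mapped shared object.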

It is possible to output per-user statistics by running memstat as an unprivileged user. When run as root, memstat lists everything on the system.

Memstat works by scanning files in /proc and then searching for binaries in the paths listed in /etc/memstat.conf. The default file should be sufficient in most cases, but if you have libraries or binaries in a non-standard place, you may need to modify it to get accurate results.
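The general idea can be sketched with standard tools: /proc/<pid>/smaps breaks a process's memory down mapping by mapping. The snippet below is a rough illustration (not how memstat is actually implemented); it sums the resident size per mapped file for the reading process itself:

```shell
# Sum resident memory (Rss) per mapped file, using /proc/self/smaps --
# "self" here is the awk process doing the reading.
awk '
  # A mapping header line starts with an address range; remember which
  # file (if any) the following Rss line belongs to.
  $1 ~ /^[0-9a-f]+-[0-9a-f]+$/ { file = (NF >= 6) ? $6 : "[anon]" }
  $1 == "Rss:"                 { rss[file] += $2 }
  END { for (f in rss) printf "%8dk: %s\n", rss[f], f }
' /proc/self/smaps | sort -rn
```

The output resembles the per-file lines in the memstat listing above, but for one process only; memstat additionally merges this information across all processes so each shared object is counted once.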

Memstat was authored by Joshua Yelon. It is available in all current Debian and Ubuntu releases.

Posted in Debian, Ubuntu | 9 Comments »

smartmontools: monitor the health of your hard disk

October 12th, 2008 edited by Tincho

Article submitted by Noel David Torres Taño. Guess what? We still need you to submit good articles about software you like!

One of the packages I manually install on every new installation is smartmontools. I have some experience managing computers and networks, and it is a fact that malicious hackers and software bugs are not the main cause of problems in small and medium installations. Hardware is.

Thus, you have hardware that can fail, and Murphy's law says that if it can fail, it will. The point is not to avoid hardware failures, which would be impossible, but to detect them early or even anticipate them.

For hard disks in particular, the tool in charge is smartctl, from the smartmontools package. IDE disks (unless they date from the age of dinosaurs) have an integrated self-testing facility called SMART, which stands for "Self-Monitoring, Analysis and Reporting Technology". Modern SCSI disks have it too, if they are SCSI-3 or newer. Inside the disk's own electronics there are routines that check health parameters: spin-up time, number of read failures, temperature, elapsed lifetime… All of these parameters are not only recorded by the disk, they also have designated safety thresholds, and both the parameters and the thresholds can be read by software that accesses the disk using the appropriate I/O commands.

And that software is smartctl, part of the smartmontools Debian package. Of course, since these commands access the disk in a raw way, you need to be root to use them.

smartctl can ask the disk for its SMART identification:

# smartctl -i /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is

Model Family:     Fujitsu MHV series
Device Model:     FUJITSU MHV2060BH
Serial Number:    NW10T652991F
Firmware Version: 00850028
User Capacity:    60,011,642,880 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Mon May 12 02:39:31 2008 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

More interestingly, smartctl can ask the disk for its parameter values:

# smartctl -A /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       124253
  2 Throughput_Performance  0x0004   100   100   000    Old_age   Offline      -       18284544
  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1199
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       8589934592000
  7 Seek_Error_Rate         0x000e   100   087   000    Old_age   Always       -       1761
  8 Seek_Time_Performance   0x0004   100   100   000    Old_age   Offline      -       0
  9 Power_On_Seconds        0x0032   079   079   000    Old_age   Always       -       10866h+57m+47s
 10 Spin_Retry_Count        0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1199
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       283
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       6953
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       45 (Lifetime Min/Max 14/58)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       62
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       459276288
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000e   100   082   000    Old_age   Always       -       22371
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       1533257648465
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

As you can see, some attributes are marked "Pre-fail". If the normalized value of any of these attributes drops below its threshold, the disk may fail within hours, maybe minutes.

Although smartctl has more options, the last ones I will cover here are -a and -t.

smartctl -t launches a disk self-test. It takes a parameter indicating the type of test; the longest one can last tens of minutes, checking the electrical and mechanical performance as well as the read performance of the disk, going over its entire surface. smartctl -a, in turn, shows all the available information about the disk, including self-test results. Since tests span minutes or tens of minutes, we cannot watch them happen. All we get when launching a test is something like:

# smartctl -t long /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is

Sending command: “Execute SMART Extended self-test routine immediately in
off-line mode”.
Drive command “Execute SMART Extended self-test routine immediately in
off-line mode” successful.
Testing has begun.
Please wait 41 minutes for test to complete.
Test will complete after Mon May 12 05:44:03 2008

Use smartctl -X to abort test.

Here, we are being informed that we will (maybe) get slightly lower disk performance for the next 41 minutes, while the test runs. It is completely in the background, or better, 'underground', since it does not happen under kernel control at all: everything happens inside the disk itself, and all we can get is the result.

smartctl -a, in turn, shows a very large amount of SMART information about the disk: almost all the stored SMART data, parsed for us. It is usually better to use a more specific switch; see the man page for details.

Finally, I want to mention that there is a daemon in the smartmontools package, smartd, which can take care of running tests for you. It works by checking the disks periodically (typically every 30 minutes) and logging all errors and parameter value changes to syslog. The default configuration in Debian will also mail root if any problem is detected. I will not explain it here, because I want you to read its (short and easy) documentation, but remember that in order to use it you must enable it in /etc/default/smartmontools.
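To give a taste, a single smartd.conf directive can cover the common case. The line below is a sketch based on smartd.conf(5); the device and schedule are examples you should adjust to your system:

```shell
# /etc/smartd.conf -- monitor /dev/sda (-a), run a long self-test every
# Saturday between 03:00 and 04:00 (-s), and mail root on problems (-m)
/dev/sda -a -s L/../../6/03 -m root
```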

The smartmontools package has been available in both Debian and Ubuntu for a long time.

Posted in Debian, Ubuntu | 20 Comments »

logstalgia: pong-like apache log viewer

October 5th, 2008 edited by Tincho

Article submitted by Andrew Caudwell

Logstalgia (inspired by glTail) is a website traffic visualization tool that replays or streams Apache access logs as a pong-like battle between the web server and an unrelenting army of requesting hosts. It is rendered using OpenGL, so you’ll need a 3D accelerated video card to run logstalgia.


Requests appear as colored balls (the same color as the host) which travel across the screen to arrive at the requested location. Successful requests are hit by the pong paddle while unsuccessful ones (such as 404s) are missed and pass through.

The paths of requests are summarized within the available space by identifying common path prefixes.

Related paths are grouped together under headings. For instance, by default paths ending in png, gif or jpg are grouped under the heading Images. Paths that don't match any of the specified groups are lumped together under a Miscellaneous section. Groups can be customized to match the page layout of your website from the command line by specifying a heading, an associated regular expression and a screen percentage.
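For instance, a custom group can be supplied with the -g option as heading,regular-expression,percentage. The group below is made up for illustration; check logstalgia's man page for the exact option syntax in your version:

```shell
# Show requests for archive files under their own "Downloads" heading,
# allocating 20% of the screen to the group
logstalgia -g "Downloads,\.(iso|tar\.gz)$,20" /var/log/apache/access.log
```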

The simulation can be paused at any time by pressing space. While paused, individual requests can be inspected by passing over them with the mouse.

Logstalgia can read from either a file or standard input. To replay an Apache log, just run:

logstalgia /var/log/apache/access.log

You can combine Logstalgia with other tools like tail and ssh to watch your web server's access.log in real time. For example (substitute your server's hostname for example.com):

ssh example.com tail -f /var/log/apache/access.log | logstalgia -

Check out a video of Logstalgia in action:

Logstalgia has been available in Debian since Lenny and in Ubuntu since Intrepid. A version of the package for Debian Etch is available on the homepage.

Posted in Debian, Ubuntu | 4 Comments »