tmpreaper: keep your temp files under control

November 22nd, 2009 edited by Tincho

Article submitted by JP Vossen. DebADay needs you more than ever! Please submit good articles about software you like!

The tmpreaper utility cleans out your temporary file directories by recursively removing files that haven’t been accessed for a given amount of time. You can configure exclusions, and it will not follow symlinks, nor remove symlinks, sockets, FIFOs, or special files unless specifically told to.
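
For example, a dry run like the following (a sketch; --test and --protect are documented in tmpreaper(8)) shows what would be reaped from /tmp after seven days without access, while protecting pid files:

tmpreaper --test --protect '*.pid' 7d /tmp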

However, the package description contains this:

WARNING: Please do not run `tmpreaper’ on `/’. There are no protections against this written into the program, as that would prevent it from functioning the way you’d expect it to in a `chroot(8)’ environment.

After you install the package, you need to manually edit /etc/tmpreaper.conf and remove or comment out the SHOWWARNING=true line to actually activate it. Also review the other settings in that file.
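
For reference, the relevant part of the configuration might end up looking roughly like this (a sketch; the variable names below are the ones shipped in the Debian conffile, but double-check your own copy):

#SHOWWARNING=true           # commented out, so tmpreaper is active
TMPREAPER_TIME=7d           # reap files not accessed for 7 days
TMPREAPER_PROTECT_EXTRA='' # extra shell patterns to protect
TMPREAPER_DIRS='/tmp/.'     # directories to clean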

At least some versions of Ubuntu, and possibly Debian, do not install tmpreaper by default. I assume that is in accordance with the “principle of least surprise” but this policy may bother system administrators familiar with Red Hat or other systems where /tmp is automatically cleaned out by default. Note that /tmp and other directories are still cleaned at boot-time by the default /etc/init.d/bootclean (Debian) or /etc/init.d/*-bootclean.sh (Ubuntu) scripts.

The equivalent on Red Hat and its derivatives is tmpwatch, which is installed by default on those systems.

The tmpreaper package has been available in Debian since Etch, and in Ubuntu since Dapper.


lbzip2: parallel bzip2 utility

November 12th, 2009 edited by ana

Article submitted by ERSEK Laszlo. DebADay needs you more than ever! Please submit good articles about software you like!

lbzip2 is a multi-threaded bzip2 compressor/decompressor utility that can be used on its own, in pipelines, or passed to GNU tar with the --use-compress-program option (or with the --use shorthand).

The main motivation for writing lbzip2 was that I didn’t know about any parallel bzip2 decompressor that would exercise multiple cores on a single-stream bz2 file (i.e. the output of a single bzip2 run) and/or on a file read from a non-seekable source (e.g. a pipe or socket). Thus lbzip2 started out as lbunzip2, but with time it gained multiple-workers compression and single-worker decompression features. Due to the input-bound splitter of its multiple-workers decompressor, it should scale well to many cores even when decompressing.

Target audience

Originally, the target audience for lbzip2 was experienced users and system administrators: up to version 0.15, lbzip2 deliberately worked only as a filter. Now at 0.17, lbzip2 is mostly command-line compatible with bzip2, except that it doesn’t remove or overwrite files it didn’t create. If lbzip2 ever gets the chance to enter the Debian alternatives system as an alternative for bzip2, I’ll add this feature. In any case, you are encouraged always to verify lbzip2’s output manually before (or instead of automatically) removing its input, both when compressing and when decompressing. I also recommend perusing the README, installed as /usr/share/doc/lbzip2/README.gz on Debian, before eventually switching over to lbzip2.
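
As a sketch of such manual verification (assuming bzip2-compatible -k and filter behaviour; the file name is illustrative):

lbzip2 -k bigfile          # compress, keeping the original
lbzip2 -d < bigfile.bz2 \
| cmp - bigfile \
&& rm -- bigfile           # remove the input only if the round-trip verifies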

Usage examples

As lbzip2 was chiefly created for speeding up decompression of single-stream bz2 files and/or decompression from a pipe, I’ll provide examples of decompression first. Since basically all free software tarballs are available on the net as tar.bz2 files, I’ll choose (not surprisingly) a kernel tarball.

The “traditional” method:

wget \
  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.1.tar.bz2
tar --use=lbzip2 -x -f linux-2.6.31.1.tar.bz2

The overlapped method:

wget -O - \
  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.1.tar.bz2 \
| tee -i linux-2.6.31.1.tar.bz2 \
| tar --use=lbzip2 -x

If wget fails to download the tarball for some reason (at which point at least tar will complain), you should remove the partially extracted tree and fall back to the traditional method. To avoid losing the already downloaded part, pass -c to wget.
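
A sketch of that recovery path (the directory name is assumed from the tarball):

rm -rf linux-2.6.31.1      # drop the partially extracted tree
wget -c \
  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.1.tar.bz2
tar --use=lbzip2 -x -f linux-2.6.31.1.tar.bz2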

Another example might be the import of a Wikimedia Dump file, perhaps with a pipeline like this:

lbzip2 -d < enwiki-latest-pages-articles.xml.bz2 \
| php importDump.php

Finally, a compression/backup example with verification at the end:

tar --format=pax --use=lbzip2 -c -f tree.tar.bz2 tree
tar --use=lbzip2 --compare -f tree.tar.bz2 -v -v

Hypothetically, with lbzip2 as the configured bzip2 alternative, we should be able to replace --use=lbzip2 with the well-known -j GNU tar option.
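
Purely as a hypothetical sketch (no bzip2 alternatives group exists today, so the first command does not work as-is):

update-alternatives --install /usr/bin/bzip2 bzip2 /usr/bin/lbzip2 50
tar -x -j -f linux-2.6.31.1.tar.bz2   # -j would then end up invoking lbzip2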

Comparison with other bzip2 utilities

I posted a longish mail with feature analyses and performance measurements to the debian-mentors mailing list. To reiterate what I said there: fundamentally, lbzip2 was created to fill a performance gap left by pbzip2.

After working on lbzip2 for a while, I found out that p7zip decompresses single-stream bz2 files in parallel, but (the last time I checked) it couldn’t scale above four threads, and it refused to read bz2 files from a pipe.

Bzip2 compression and decompression performance is very sensitive to the cache size that is dedicated to a single worker thread (i.e. a single CPU core). To my limited knowledge, this implies that among commodity desktops, lbzip2 performs best on multi-core AMD processors.

lbzip2 does have shortcomings. They are either inherent in the design, or I deem them unimportant. I tried to document them all. Please read the debian-mentors post linked above, the README file, and the manual page.

As said above, I didn’t originally intend lbzip2 as a drop-in replacement for bzip2. Even though it is almost there now, you should nonetheless get to know it thoroughly before deciding to switch over to it.

Availability

Various versions of lbzip2 are available for Debian (squeeze and sid) and Ubuntu (karmic and lucid).

You should be able to install lbzip2 on lenny too; it shouldn’t break anything. I used the following commands:

cat >>/etc/apt/sources.list <<EOT
deb http://security.debian.org/      testing/updates main
deb http://ftp.hu.debian.org/debian/ testing         main
EOT
apt-get update
apt-get install lbzip2

Upstream releases are announced on the project’s Freshmeat page. I distribute the upstream version to end-users from my recently moved home page, which also links to other distributions’ lbzip2 packages.

A development library version is very unlikely. You can work around this by communicating with an lbzip2 child process over pipes via select(), and by checking its exit status via waitpid() after receiving EOF. This is not an unusual method; see, for example, gpg’s many --[^-]*-fd options.
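
In a shell script, the same idea collapses to a pipe plus an exit-status check, something like this sketch (file names illustrative):

if lbzip2 -d < input.bz2 > output
then
    echo "decompression succeeded"
else
    rm -f -- output            # don't keep partial output
    echo "lbzip2 failed" >&2
fi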

End-user stress-testing

I encourage you to test lbzip2. The upstream README describes the test method in general; let me instantiate that description here specifically for Debian.

Necessary packages, in alphabetical order:

  • bzip2
  • dash
  • gcc
  • lbzip2
  • perl

Recommended packages, in alphabetical order:

  • p7zip-full
  • pbzip2

Create a test directory (you will need lots of free space under that directory), and under it a well-compressible big file. For example:

mkdir -m 700 -v -- "$TMPDIR"/testdir
tar -c -v -f "$TMPDIR"/testdir/testfile.tar /usr/bin/ /usr/lib/

Then issue the following commands, utilizing the test file created above. As this could take several hours, I suggest entering a screen session first. Your machine should be otherwise unloaded during the test, both IO- and CPU-wise.

cd /usr/share/lbzip2
dash test.sh "$TMPDIR"/testdir/testfile.tar

Any error encountered during the test is either handled by the script or aborts it fatally. In particular, utilities that refuse to decompress from a pipe are handled.

Estimated disk space usage: when writing this article, I executed the above commands with a 100 MB test file. (You should aim for at least 1 GB.) The test directory ended up being 250 MB in size. Here M stands for 2^20 bytes and G for 2^30.

Estimated time span: supposing

  • your machine has N cores (each with a dedicated L2 cache),
  • the file you use for testing lbzip2 is S GB big,
  • and bzip2 takes T seconds to compress a 1 GB test file with similar contents,

then the full test should take around
S * (1879 + 2098 * 2 / N) * T / 240
seconds.
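
For example, on a quad-core machine (N = 4), with a 1 GB test file (S = 1) and T = 240 seconds, this works out to 1 * (1879 + 2098 * 2 / 4) * 240 / 240 = 2928 seconds, i.e. roughly 49 minutes.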

Estimated peak memory usage: N * 50 MB should be a very safe bet.

To view the test report:

less -- "$TMPDIR"/testdir/results/report

The only obscure entries in the table should be the “ws” ones. They mean “workers stalled” and give the percentage of times the (de)compressor worker threads tried to start munching a block but had to go to sleep because there was no block to munch. Anything above 1-2% usually implies some bottleneck and shows that lbzip2 couldn’t fully exhaust your cores. This shouldn’t occur, but if it does, and lbzip2 and pbzip2 have performed similarly in the compression tests, then the bottleneck is in your system, not in lbzip2.

The lbzip2 package has been available in Debian since Squeeze, and in Ubuntu since Karmic.


Backupninja: the ultimate data defender

November 1st, 2009 edited by Tincho

Article submitted by Андрей Парамонов (Andrey Paramonov). DebADay needs you more than ever! Please submit good articles about software you like!

Everyone knows they should do regular backups. Sooner or later, your hardware will fail, or you will accidentally delete a directory, or something else will happen.

Many people, however, ignore periodic backups because they find them too much of a hassle. That’s why the backup procedure must be fully automated and require no user intervention at all.

Backupninja is a backup system that provides excellent automation and configuration facilities. You only need to instruct Backupninja once, and it will silently take up the duty of defending your valuable data. This can be done by editing the configuration files directly, or via a nice console wizard called ninjahelper, which also helps to test the backup actions interactively.

Backupninja doesn’t do the hard work itself, but rather relies on specialized tools like rdiff-backup and duplicity, thus following the Unix way. There is built-in support for specialized backup actions, including things like backing up Subversion repositories, or LDAP, MySQL, and PostgreSQL databases. It can do remote, incremental backups, as well as burn them to CDs or ISO images.
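
To give a flavour of the configuration, here is a sketch of a MySQL action file under /etc/backup.d (the key names follow the example files shipped with backupninja, so verify them against /usr/share/doc/backupninja/examples for your version):

## /etc/backup.d/20.mysql -- sketch of a MySQL backup action;
## the .mysql extension selects the handler
databases = all
backupdir = /var/backups/mysql
sqldump = yes
compress = yes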

But the best part is that Backupninja is capable of learning new powerful skills, just by reading user-provided shell scripts. For example, I use the following script to dump important package information of my Debian system:

#!/bin/sh

dpkg --get-selections > /var/backups/dpkg-selections
if [ $? -ne 0 ]
then
   error "dpkg selections dump failed"
else
   info "dpkg selections dump done"
fi

aptitude search -F %p '~i' > /var/backups/apt-installed && \
aptitude search -F %p '~i!~M' > /var/backups/apt-installed-manual && \
aptitude search -F %p '~i ~M' > /var/backups/apt-installed-auto
if [ $? -ne 0 ]
then
   error "installed package list dump failed"
else
   info "installed package list dump done"
fi

Note the use of some special functions: debug, info, and error. They put descriptive messages into the log file, which allows me to quickly ensure that fresh backups have actually been created. I’ve been using Backupninja to back up my personal data for a long time.

Pros:

  • Fully automates the backup procedure
  • Is very easy to set up
  • Is very flexible

Cons:

  • Built-in functionality could support more features
  • Support for non-shell backup scripts is limited

The backupninja package has been available in Debian since Etch, and in Ubuntu since Dapper.
