Debian Package of the Day (static archived copy)

mmv: Mass moving and renaming files

June 13th, 2007 edited by ana

Article submitted by Ferry Boender. We are running out of articles! Please help DPOTD and submit good articles about software you like NOW!

Mmv is a command-line tool which allows the user to move, rename, copy, append and link large numbers of files with a single command. The tool is especially useful when you need to rename a lot of files that have similar filenames with subtle differences.

Although mmv does more than just renaming files, this article will focus only on renaming because that is what I use it for the most. The tool is best explained using an example.

Suppose you have the following files in a directory:

foo1.png
foo2.png
bar3.png

You want all the files that start with ‘foo’ to instead start with ‘bar’. In this case it could easily be done manually, but suppose there are hundreds of files! You’d soon be forced to start shell-scripting some solution. But mmv is the perfect tool for this job:

mmv "foo*.png" "bar#1.png"

The above command will result in the following files:

bar1.png
bar2.png
bar3.png

Explanation

From pattern

Mmv matches the files using the wildcard you gave (the ‘From’ pattern). Then it will rename the matched files according to the second argument (the ‘To’ pattern). The ‘From’ pattern can take all the usual shell wildcards such as ‘*’, ‘?’ and ‘[]’. Remember that you need to enclose the patterns in quotes, otherwise they will be expanded by the shell and mmv won’t understand them!

To pattern

The ‘#1’ in the ‘To’ pattern is a wildcard index. It matches the first wildcard found in the ‘From’ pattern. A ‘#2’ in the ‘To’ pattern would match the second wildcard, and so on. Mmv replaces every wildcard index with the text matched by the corresponding wildcard in the ‘From’ pattern. In the example above, ‘#1’ matches the number after ‘foo’ and in front of the period. Note that ‘??’ is actually two wildcards, each of which matches a single character!
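The wildcard indexing described above can be pictured with a short Python sketch. This is not mmv’s actual implementation: the helper names from_pattern_to_regex and mmv_target are made up for illustration, and the translation only covers ‘*’, ‘?’ and simple ‘[]’ sets, ignoring any special cases the real tool handles.

```python
import re


def from_pattern_to_regex(pattern):
    """Turn a simplified 'From' pattern into a regex with one group per wildcard."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append("(.*)")          # any run of characters
        elif ch == "?":
            parts.append("(.)")           # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                parts.append(re.escape(ch))   # unmatched '[' is taken literally
            else:
                parts.append("(" + pattern[i:end + 1] + ")")
                i = end                   # skip past the closing ']'
        else:
            parts.append(re.escape(ch))   # literal character
        i += 1
    return re.compile("".join(parts))


def mmv_target(name, from_pattern, to_pattern):
    """Return the new name for `name`, or None if the 'From' pattern does not match."""
    match = from_pattern_to_regex(from_pattern).fullmatch(name)
    if match is None:
        return None
    # Each '#N' in the 'To' pattern is replaced by the text of the Nth wildcard.
    return re.sub(r"#(\d+)", lambda m: match.group(int(m.group(1))), to_pattern)


if __name__ == "__main__":
    for filename in ["foo1.png", "foo2.png", "bar3.png"]:
        print(filename, "->", mmv_target(filename, "foo*.png", "bar#1.png"))
```

Files that do not match the ‘From’ pattern (bar3.png here) yield None, which mirrors the fact that mmv simply leaves them alone.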

More examples

The ‘From’ and ‘To’ patterns can also be used to swap parts of filenames around:

abc_123.txt
def_456.txt
ghi_789.txt

mmv "*_*.txt" "#2_#1.txt"

This would result in:

123_abc.txt
456_def.txt
789_ghi.txt

Another nifty trick mmv can do is changing the case of text matched by a wildcard. To do this, you place an ‘l’ (lowercase) or a ‘u’ (uppercase) between the ‘#’ and the number in the ‘To’ pattern:

john.txt
pete.txt

mmv "?*.txt" "#u1#2.txt"

This results in:

John.txt
Pete.txt
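To make the case modifiers concrete, here is a small Python sketch of how a ‘To’ pattern could be expanded once the wildcard texts are known. The function name expand_to_pattern and the idea of passing the matched texts as a list are assumptions for illustration only, not how mmv is written.

```python
import re


def expand_to_pattern(to_pattern, groups):
    """Expand '#N', '#lN' and '#uN' using the wildcard texts in `groups`.

    `groups[0]` is the text matched by the first wildcard, and so on.
    """
    def replace(m):
        modifier, index = m.group(1), int(m.group(2))
        text = groups[index - 1]
        if modifier == "u":
            return text.upper()   # '#uN': uppercase the matched text
        if modifier == "l":
            return text.lower()   # '#lN': lowercase the matched text
        return text               # plain '#N': copy the text unchanged

    return re.sub(r"#([lu]?)(\d+)", replace, to_pattern)


if __name__ == "__main__":
    # "?*.txt" splits "john.txt" into "j" (the '?') and "ohn" (the '*').
    print(expand_to_pattern("#u1#2.txt", ["j", "ohn"]))
```

Because ‘?’ captures only the first letter, uppercasing ‘#u1’ capitalises the name without touching the rest.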

Safety

Mmv tries to be as safe as possible to avoid collisions in renaming which might cause files to be deleted. If, for instance, the result of the renaming would cause two different files to get the same name (thereby overwriting one), mmv will issue a collision warning and abort.

Mmv also tries to gracefully handle a rename that causes one of the resulting filenames to be identical to one of the source filenames. For instance:

a
aa

mmv "*" "a#1"

This does not overwrite the ‘aa’ file with the ‘a’ file but instead results, as expected, in:

aa
aaa
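The two safety behaviours described in this section can be sketched in Python as checks on a rename plan (a mapping from old name to new name). This is only one way to get the results shown above; the names find_collisions and safe_order are hypothetical, and the sketch deliberately refuses cyclic renames rather than guessing how mmv deals with them.

```python
from collections import Counter


def find_collisions(plan):
    """Return the target names that more than one source would be renamed to."""
    counts = Counter(plan.values())
    return sorted(target for target, n in counts.items() if n > 1)


def safe_order(plan):
    """Order the renames so that no rename overwrites a file still waiting to move."""
    pending = dict(plan)
    ordered = []
    while pending:
        # A rename is safe once its target is no longer a pending source.
        ready = [src for src, dst in pending.items()
                 if dst not in pending or dst == src]
        if not ready:
            raise ValueError("cyclic renames would need a temporary name")
        for src in sorted(ready):
            ordered.append((src, pending.pop(src)))
    return ordered


if __name__ == "__main__":
    print(find_collisions({"x1.txt": "x.txt", "x2.txt": "x.txt"}))
    print(safe_order({"a": "aa", "aa": "aaa"}))
```

For the ‘a’ and ‘aa’ example, the ordering moves ‘aa’ to ‘aaa’ first and only then ‘a’ to ‘aa’, so nothing is overwritten.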

Availability

Mmv has been available in Debian at least since v3.1 (‘Sarge’) and in Ubuntu since Warty. apt-get install mmv will install it for you.

Notes

Renaming directories can only be done with the -r switch.
Remember to enclose the ‘To’ and ‘From’ parameters in quotes!


Debaday closed for Holidays!

June 12th, 2007 edited by ana

Dear readers,

Debaday will be closed for holidays after tomorrow’s entry until July 1st. The editorial team will be busy at DebConf, the Debian Project’s annual developer conference, for the next two weeks, and we only have a couple of entries that are worth publishing since we have not received many contributions lately. We will resume publishing entries regularly after July 1st, as we have done until now, but remember we need your contributions to keep debaday alive!

The editors team.

If you are attending DebConf7, see you there! :)


TreeLine: a versatile tree-like structured custom data manager

June 10th, 2007 edited by ana

Article submitted by Miriam Ruiz.

Do you have lots of sticky notes lying around with various useful information jotted down? Or many lists of books, movies, website logins, personal contacts, or things to do? Can you find them when you need them? TreeLine is a possible answer to all those questions. It might be an outliner, or maybe a PIM. What it basically does is store almost any kind of information (including plain text, HTML, numbers, dates, times, booleans, URLs, etc.). The data is arranged in a user-defined tree structure, so that it’s easy to keep things organized.

Each of the nodes of the tree can contain several fields, thus forming a mini-database. The output format for each node can be defined, and the output can be shown on the screen, printed, or exported to HTML. Several node types, with different sets of fields, can be included in one file.

TreeLine’s window is divided into two panes. The view on the left shows the entire tree structure, while the view on the right shows various information about the tree node that is selected in the left pane. The right pane is tabbed to show one of three different views of the data. The “Data Output” view shows the formatted text for each node and is read-only. The “Data Editor” view shows a text edit box for each data field within a node. The “Title List” view shows a list of node titles that can be modified using typical text editor methods.

[Screenshot: view of TreeLine’s main window.]

TreeLine files are XML by default, but there are options for automatically compressing or encrypting the files. The data can also be exported to HTML. You can also export an XSLT file, so you can work with the XML TreeLine files. There are many other file formats in which the data can be exported and imported: tab-delimited tables and tab-indented text files can be imported and exported. Plain text files and Treepad files can be imported. Mozilla and XBEL format bookmark files can also be imported and exported. Generic XML files can be imported and exported, allowing TreeLine to function as a crude XML editor.

The program has many other useful features, such as sorting or filtering the nodes, changing a node’s icon and output format conditionally based on its data, spell checking the text data, and automatically arranging the data. The user interface and documentation are available in different languages.

TreeLine is coded in Python, and uses the Qt toolkit library. If spell checking capability is desired, either the GNU aspell (preferred) or ispell program is required. TreeLine is available for both Ubuntu and Debian with a simple apt-get.


OTS: Command line text auto-summary

June 6th, 2007 edited by ana

Article submitted by Alex Gretlein.

Open Text Summarizer is both a library and a command line tool (developed by Nadav Rotem) that, well, summarises text. It is similar to the functionality incorporated into Microsoft Word and available in all native Mac OS X applications. The approach taken by OTS is to use word frequency to prepare a list of keywords and assign priority to sentences based on that frequency. It then outputs a summarised version of your text based on a ratio you supply. The default is 20%, i.e. the summary will be one-fifth the size of the original in terms of number of sentences.

An automated process like this can never be perfect, and some texts are more amenable to auto-summarising than others. The reliance on sentences means that a well structured prose text works best, and that it should be somewhat substantial to produce meaning. Auto-summaries can be used as a basis for abstracts or catalogue descriptions, for article summaries in RSS feeds, or for checking keyword frequency for Search Engine Optimisation. Shorter texts, lists, and internally incoherent or structurally inconsistent texts will tend to produce gibberish, which can have its own amusement value.

While the performance of OTS may not quite be up to the standards of proprietary alternatives (see this 2003 review), it is, as far as I was able to determine, the only available free or open source (specifically GPL) library for this purpose.
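The word-frequency approach described above can be sketched in a few lines of Python. This is a rough illustration of the general idea only, not the OTS algorithm itself: the function name summarize, the tiny STOP_WORDS set and the naive sentence splitting are all assumptions made for the example, whereas OTS relies on its own dictionary files.

```python
import re
from collections import Counter

# A tiny, made-up stop-word list; a real tool would use a proper dictionary.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def keywords(text):
    """Lowercased words of `text`, minus the stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, ratio=20):
    """Keep roughly `ratio` percent of the sentences, chosen by keyword frequency."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(keywords(text))

    def score(sentence):
        # A sentence scores the summed frequency of the keywords it contains.
        return sum(freq[w] for w in keywords(sentence))

    keep = max(1, round(len(sentences) * ratio / 100))
    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:keep]
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))


if __name__ == "__main__":
    sample = ("Cats sleep. Cats chase mice and cats purr. "
              "Dogs bark. Birds sing. Fish swim.")
    print(summarize(sample, 20))
```

With five sentences and a 20% ratio, only the sentence richest in the most frequent keyword survives, which also hints at why short or list-like texts summarise poorly.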

The developer has produced a screencast showing OTS in action. As a sample of the program’s output, this is a 20% summary of the “Ground Rules” section of the Ubuntu Code of Conduct.

This Code of Conduct covers your behaviour as a member of the Ubuntu Community, in any forum, mailing list, wiki, web site, IRC channel, install-fest, public meeting or private correspondence. The Ubuntu Community Council will arbitrate in any dispute over the conduct of a member of the community. We expect members of the Ubuntu community to be respectful when dealing with other contributors as well as with people outside the Ubuntu project, and with users of Ubuntu. Your work should be done transparently and patches from Ubuntu should be given back to the community when they are made, not just when the distribution releases. If you really want to go a different way, then we encourage you to make a derivative distribution or alternative set of packages available using the Ubuntu Package Management framework, so that the community can try out your changes and ideas for itself and contribute to the discussion.

You can run OTS by itself with the command —surprise!— ots:

Usage: ots [OPTIONS...] [file.txt | stdin]
  -r, --ratio=<int>      summarization % [default = 20%]
  -d, --dic=<string>     dictionary to use
  -o, --out=<string>     output file [default = stdout]
  -h, --html             output as html
  -k, --keywords         only output keywords
  -a, --about            only output the summary
  -v, --version          show version information

Help options:
  -?, --help             Show this help message
  --usage                Display brief usage message

So, for example, if I had a document called ucoc and I wanted a 10% summary of it in a file called ucoc-tiny, I would run:

$ ots -r 10 -o ucoc-tiny ucoc

The --keywords option seems to be deprecated. The --html option outputs an HTML page of the entire text with the elements that would make up the summary highlighted in yellow.

OTS uses XML-based dictionary files to provide word recognition for different languages. The latest version includes files for 37 languages, including most of the major languages written in Roman script, as well as Russian and Hebrew. It does not appear to have any means of recognising variant forms of the same word, such as verb conjugations, which matters particularly in languages like Hebrew.

OTS is available in the repositories of Debian from at least sarge on, and of every release of Ubuntu. It is available by itself under the name libots0. To install it use your favourite graphical installer or run:

$ sudo apt-get install libots0

In both distros this is version 0.4.2, released in 2003. There are also libots-dev packages. A new version, 0.5.0, was released in April 2007. The source code is available from SourceForge.

As a library, libots can also be used by other programs. There is a list on the project home page of three applications that provide summarising through OTS:

There was a plug-in in the development version of AbiWord at the time the OTS site was written. That “development version” is fairly ancient by now, and the plug-in is a fully integrated part of the version available in current Debian or Ubuntu systems. AbiWord itself is a great lightweight alternative to OpenOffice Writer. It’s also the default word processor for Xubuntu, and part of the xubuntu-desktop package. If you’re not running Xubuntu, you can install it with the package name abiword. (There are also development and plug-in packages for AbiWord.)
The second application to use OTS is Gnome-Summarizer, a GUI by the author himself. It appears from the screenshot to display the output HTML file and keywords. Even more exciting-looking than this is the “Researcher’s Tool” demonstrated in the screencast of the next version.
The third program listed is a gedit plug-in by Daniel Brodie.

To this, we may be able to add Haystack (described here), an extension for the Plone framework which identifies related content. It uses a Python wrapper called ots, which is available in Python’s Cheese Shop.

An earlier version of this article was posted at IQAG Notes.


KLone: C web programming framework

June 3rd, 2007 edited by Tincho

Article submitted by Kari Pahula.

PHP is well known for its coding style, mixing HTML with source code inside special <?php code ?> tags. There are tools and frameworks for that kind of web development using other programming languages like Perl and Python, but there is one fairly surprising choice of programming language that you could use instead, namely C.

KLone is a web application development framework that takes HTML with embedded C as its input and turns it all into a single binary that is the server and the web app in one package.

Let’s step through an example of how to make a simple Hello World type of web app. First off, apt-get install klone-package. That’ll install KLone and a few tools you can use in a Debian environment. Change to a directory where you have write permission.

$ make-klone-project create -p myhello
$ cd myhello-0.1

There will be a number of files and directories in the project directory. The important ones are debian/, which holds the files necessary for generating Debian packages containing your web app, and userdata/. We’ll return to debian/ later on. For now, let’s concentrate on userdata/.

$ cd userdata
$ mkdir etc
$ cd etc

Create and edit a file called kloned.conf.

server_list my_http
allow_root yes

my_http
{
    type      http
    addr.type IPv4
    addr.port 8880
    dir_root  /www
}

Now we’re ready for action!

$ cd ..
$ mkdir www
$ cd www

Create and edit a file called index.kl1. Any files that end with the .kl1 suffix will be treated as HTML/C files.

<%!
/* Declarations used by the code block below. */
#include <time.h>
time_t now;
%>
<html>
<head><title>A hello world app for debaday</title></head>
<body>
<h1>Hello World</h1>
<p><%
/* Print the current time into the page. */
now = time(0);
io_printf(out, "Time is now %s\n", ctime(&now));
%></p>
</body>
</html>

That’s really all there is to it. Now return to the project root directory. To build a kloned server, just run:

$ kloned-build -o myapp userdata
$ ./myapp -F

That -F switch is there so that the server won’t start as a daemon. Now you can access http://localhost:8880/ with the web browser of your choice to see a friendly greeting and to know the current time. But this isn’t the whole story, yet. Stop the web app you just created, run apt-get install dpkg-dev and then do the following:

$ dpkg-buildpackage -rfakeroot
$ sudo dpkg -i ../myhello_0.1*deb

You have just installed a Debian package containing your web app. As long as the package is installed, you can rely on having your app start at boot time.

If you want to see more examples of how to use KLone, feel free to visit KoanLogic’s web pages.

KLone is available in Debian in Etch and later, and in Ubuntu since Edgy.

