
websec: monitor websites for changes

December 30th, 2006 edited by ana

Entry submitted by Lucas Nussbaum. DPOTD needs your help, please contribute!

A lot of websites don’t provide RSS feeds yet. To monitor such sites for changes, you can only visit them from time to time, or use public services such as FEED43 or RSSPECT. Websec (Web Secretary) automates these regular visits: it typically runs from a cron job and compares the content of a web page with what it fetched on the previous run. If the content has changed, it emails you the page with the changes highlighted.
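The fetch-and-compare cycle described above can be sketched in Python roughly as follows. This is not websec’s code: the function names (`fetch`, `added_lines`, `check`), the one-snapshot-file-per-URL directory layout, decoding pages as UTF-8, and treating the first run as a baseline are all assumptions made for illustration.

```python
import difflib
import hashlib
import pathlib
import urllib.request


def fetch(url: str) -> str:
    """Download a page. Decoding as UTF-8 is an assumption; real pages may use other charsets."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def added_lines(previous: str, current: str) -> list[str]:
    """Return the lines present in `current` but not in `previous`, in page order."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes inserted lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


def check(url: str, state_dir: pathlib.Path, get=fetch) -> list[str]:
    """Compare a page with the copy saved on the previous run, then save the new copy.

    The snapshot layout (one file per URL, named after a SHA-256 of the URL)
    is a hypothetical choice for this sketch.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    snapshot = state_dir / hashlib.sha256(url.encode()).hexdigest()
    current = get(url)
    if not snapshot.exists():
        # First run: nothing to compare against yet, so only record a baseline.
        snapshot.write_text(current, encoding="utf-8")
        return []
    previous = snapshot.read_text(encoding="utf-8")
    snapshot.write_text(current, encoding="utf-8")
    return added_lines(previous, current)
```

Run from cron, a non-empty result from `check()` would be the trigger for sending a mail (for instance with Python’s `smtplib`). Websec itself highlights the changes inside the page it mails, which this sketch does not attempt.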

It works quite well, but has some limitations:

  • You cannot monitor a whole website, only single web pages;
  • You can exclude some text from the comparison (typically “Generated in 0.2s”, the current date/time, etc.), but I couldn’t get it to exclude multi-line expressions, which makes it impossible to monitor Google results, for example. See bug #402113.
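For comparison, here is how multi-line exclusion could work in a home-grown checker: strip every match of a set of regular expressions before diffing, with `re.DOTALL` so that `.` also matches newlines. This is a swapped-in technique, not how websec implements its exclusions; the `strip_ignored` name and the `<div id="stats">` markup in the example are hypothetical.

```python
import re


def strip_ignored(text: str, patterns: list[str]) -> str:
    """Remove every match of each pattern before the page is compared.

    re.DOTALL lets '.' match newlines, so a single pattern can cover
    a block that spans several lines of the page.
    """
    for pattern in patterns:
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text


# Hypothetical page fragment whose timing block spans two lines.
page = 'Results\n<div id="stats">About 1,000\nresults (0.2s)</div>\nEnd'
print(strip_ignored(page, [r'<div id="stats">.*?</div>\n']))
```

The non-greedy `.*?` matters: with a greedy `.*`, a pattern anchored on `</div>` could swallow everything up to the last `</div>` on the page.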

I couldn’t find a package that does the same as websec but generates an RSS feed instead of emailing the changes. If someone wants to write one, it might be possible to reuse some of websec’s code.
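The RSS-producing half of such a tool could be as small as the following sketch, which turns a list of changed lines (from whatever comparison step is used) into a minimal RSS 2.0 document with the standard library. The `rss_feed` function and its parameters are hypothetical; a real tool would also keep older items and add a `guid` per item.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def rss_feed(title: str, link: str, changes: list[str]) -> str:
    """Build a minimal RSS 2.0 document with one item describing the latest changes."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Changes detected on {link}"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = f"{title} changed"
    ET.SubElement(item, "link").text = link
    # RSS 2.0 uses RFC 822 dates; formatdate(usegmt=True) produces one in GMT.
    ET.SubElement(item, "pubDate").text = formatdate(usegmt=True)
    # ElementTree escapes characters such as "<" and "&" in the changed text.
    ET.SubElement(item, "description").text = "\n".join(changes)
    return ET.tostring(rss, encoding="unicode")
```

Writing the returned string to a file served by a web server would let any feed reader poll the page instead of waiting for mail.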

You can find websec’s homepage at http://baruch.ev-en.org/proj/websec/

Websec has been available in Debian and Ubuntu for several stable releases and doesn’t suffer from any really annoying bugs.

Posted in Debian, Ubuntu

2 Responses

  1. adren Says:

    Indeed, a very nice and valuable tool.

    Other limitations are:

    - pure ASCII pages (text/plain) don’t display well (e.g. http://www.rhyolite.com/anti-spam/dcc/CHANGES)
    - it cannot visit pages protected by a login/password
    - some pages are not handled well (chunked encoding)

  2. geetha Says:

    Is it necessary to monitor changes made to a website?