Article submitted by Zhao Difei.
HTTrack is a powerful tool that lets you download (mirror) a website to a local directory.
Basically, HTTrack follows the links of the original website and recursively downloads the pages to the local directory, rewriting the hyperlinks as it goes so that you can simply open a downloaded HTML file and browse the site on the local machine. In contrast, the recursive mirror function of Wget does not rewrite the links in the pages you download unless you ask it to (with its --convert-links option), so by default they may still point to remote locations.
HTTrack is powerful, but its syntax is very simple. Let’s have a look at the basic usage:
A simple example that copies the debian.org website to the local “httrack” directory:
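As a minimal sketch of the basic usage described above: the synopsis in the first comment is quoted from httrack's own help text as best I recall it, so check `httrack --help` on your version. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers added for this example only; they print the command and execute it only when `DRY_RUN=0` is set.

```shell
# General form, as shown by httrack's help text:
#   httrack <URLs> [-option] [+<URL_FILTER>] [-<URL_FILTER>] [+<mime:MIME_FILTER>] [-<mime:MIME_FILTER>]

# run is a hypothetical wrapper for this sketch (not part of httrack):
# it prints the command and executes it only when DRY_RUN=0 is set.
run() {
    printf '%s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Mirror debian.org into the local "httrack" directory; -O sets the output path.
run httrack "http://www.debian.org/" -O "./httrack"
```

Running the script as-is only prints the command line; set `DRY_RUN=0` (or type the printed command directly) to start the mirror.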
HTTrack can also apply download filters; these are the “*_FILTER” placeholders in the httrack usage line. A plus sign (+) in front of a pattern means to download matching items, and a minus sign (-) means to skip them. You can still view the things you did not download if an Internet connection is available, because HTTrack arranges the hyperlinks for you. The following examples (mirroring slashdot) show simple uses of filters. The first will not download items from the apple.slashdot.org site, and the second will not download items that have the MIME type image/jpeg:
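The two filter cases just described might look like the sketch below. The output directory `./slashdot` is an assumption of this example, and `run` / `DRY_RUN` are hypothetical helpers that only print the command unless `DRY_RUN=0` is set. The URL filter is quoted so that the shell does not try to expand the `*` itself.

```shell
# run is a hypothetical wrapper for this sketch (not part of httrack):
# it prints the command and executes it only when DRY_RUN=0 is set.
run() {
    printf '%s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1) URL filter: the leading minus sign excludes everything under apple.slashdot.org.
run httrack "http://slashdot.org/" -O "./slashdot" "-apple.slashdot.org/*"

# 2) MIME filter: the leading minus sign excludes items served as image/jpeg.
run httrack "http://slashdot.org/" -O "./slashdot" "-mime:image/jpeg"
```

Swapping the minus sign for a plus sign turns either filter into an explicit "do download" rule.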
To download two sites that share many common links, you can mirror them together in a single command:
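A sketch of mirroring two sites in one run, assuming that the start URLs are simply listed one after another on the same command line. The two URLs and the `./debian_ubuntu` directory are illustrative choices, not taken from the text above, and `run` / `DRY_RUN` are hypothetical helpers that only print the command unless `DRY_RUN=0` is set.

```shell
# run is a hypothetical wrapper for this sketch (not part of httrack):
# it prints the command and executes it only when DRY_RUN=0 is set.
run() {
    printf '%s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Both start URLs go into one project, so links between the two sites
# can be resolved inside the same local mirror.
run httrack "http://www.debian.org/" "http://www.ubuntu.com/" -O "./debian_ubuntu"
```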
There are many more options and advanced uses; interested readers can consult the manual. HTTrack is available in Debian from oldstable (Sarge) to unstable (Sid), and in Ubuntu from Dapper to Gutsy.