v0.4

GNU Copyleft

Improved Perl report generator: it can generate separate reports for
different subsections of the web tree, and can mark specific errors and
redirections as expected.

Detect a common mistake made when specifying -proxy.

v0.3

Add URL rewriting to speed checking of web hierarchies that live on
local disk.  We sacrifice the HTTPConnection reuse for the time being.
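
As a rough illustration of the idea (not the tool's actual code), URL
rewriting can be sketched as mapping a URL prefix onto a local directory.
The function name, the prefix and the directory below are hypothetical,
and the sketch assumes the prefix ends in "/" and ignores ".." segments.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def rewrite_to_local(url, url_prefix, local_root):
    """Return a local path for url if it lies under url_prefix, else None.

    Hypothetical helper: url_prefix is assumed to end in "/".
    """
    parts = urlsplit(url)
    # Drop the query string and fragment; only the path names a file.
    base = parts.scheme + "://" + parts.netloc + parts.path
    if not base.startswith(url_prefix):
        return None
    # Decode %XX escapes so the result matches the on-disk file name.
    relative = unquote(base[len(url_prefix):])
    return posixpath.join(local_root, relative) if relative else local_root

print(rewrite_to_local("http://www.example.com/docs/a%20b.html#top",
                       "http://www.example.com/", "/var/www"))
```

A checker could then test for the file's existence on disk instead of
issuing an HTTP request for it.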

With -loose, cope with http://hostname URLs that lack a final /.

The expiration date is now displayed on application startup.

Switch to a less aggressive URL encoder for our output files.

Rewrite the HTML scanner to operate on bytes instead of chars and also
cope with some less common, but legal, SGML constructs.

When we get an IOError retrieving /robots.txt, the URL will have a
status of IOError instead of the previous robotDenied, which masked
the more serious problem with the URL.

v0.2

Add ability to specify loose (prefix) or exact URLs for include/exclude.

Add -exact, -loose and -noautoinclude to govern the include effect of
URLs from the command line.
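
The difference between loose and exact matching can be sketched as
follows. This is only an illustration of prefix versus exact comparison;
the function name and signature are hypothetical and do not describe the
tool's internals.

```python
def url_matches(url, pattern, loose):
    """Hypothetical matcher: loose means prefix match, otherwise exact."""
    if loose:
        return url.startswith(pattern)
    return url == pattern

# A loose pattern covers everything beneath it; an exact one covers itself only.
print(url_matches("http://www.example.com/docs/index.html",
                  "http://www.example.com/docs/", loose=True))
print(url_matches("http://www.example.com/docs/index.html",
                  "http://www.example.com/docs/", loose=False))
```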

Add concept of harvester/nonharvester threads to better manage CPU load
and reduce forced context switching.

Kill watchdog threads before starting results dump (they're useless at
that stage and they slow things down).

Remove excessive case-sensitivity in URLClassifier.

Speed up progress reports from the WorkQueue (they weren't scaling).

Fix bug with redirections to documents with # subdocument links.
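
For illustration, one way to handle a redirect target that carries a
# subdocument link is to resolve it against the original URL and then
split off the fragment before fetching. The sketch below uses Python's
standard library with made-up example URLs; it is not the tool's own code.

```python
from urllib.parse import urljoin, urldefrag

def resolve_redirect(original_url, location):
    """Return (url_to_fetch, fragment) for a redirect Location value."""
    # Location may be relative, so resolve it against the original URL.
    absolute = urljoin(original_url, location)
    # The fragment names a spot inside the document and is not sent to the server.
    url, fragment = urldefrag(absolute)
    return url, fragment

print(resolve_redirect("http://example.com/old/page.html",
                       "../new/doc.html#sec2"))
```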

Harvest URLs from all %URI tag attributes listed in the HTML 4.0 spec
except <form action=>.
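
A minimal sketch of attribute-based harvesting, using Python's standard
html.parser rather than the tool's own scanner. The URI_ATTRS table is a
hand transcription of %URI attributes from the HTML 4 DTDs and may be
incomplete; the class name is hypothetical.

```python
from html.parser import HTMLParser

# Assumed table of %URI attributes per element; <form action=> is left out.
URI_ATTRS = {
    "a": {"href"}, "area": {"href"}, "link": {"href"}, "base": {"href"},
    "img": {"src", "longdesc", "usemap"},
    "object": {"classid", "codebase", "data", "usemap"},
    "q": {"cite"}, "blockquote": {"cite"}, "ins": {"cite"}, "del": {"cite"},
    "input": {"src", "usemap"}, "head": {"profile"}, "script": {"src"},
    "frame": {"src", "longdesc"}, "iframe": {"src", "longdesc"},
    "body": {"background"}, "applet": {"codebase"},
}

class UrlHarvester(HTMLParser):
    """Collect the values of URI-typed attributes in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = URI_ATTRS.get(tag, set())
        for name, value in attrs:
            if name in wanted and value is not None:
                self.urls.append(value)

harvester = UrlHarvester()
harvester.feed('<a href="a.html">x</a><img src="i.png">'
               '<form action="post.cgi"></form>')
print(harvester.urls)
```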

Expand SGML entity references in HTML tag attributes instead of
decoding them HTTP-style.
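
The distinction can be illustrated with Python's standard library (this
is not the tool's code): entity expansion turns &amp; into &, while
HTTP-style decoding would instead turn %XX escapes into characters and
leave entities alone.

```python
import html
from urllib.parse import unquote

raw_attribute = "search.cgi?q=a%20b&amp;page=2"

# SGML/HTML entity expansion: the correct step for attribute values.
print(html.unescape(raw_attribute))

# HTTP-style (percent) decoding: leaves &amp; in place and alters %20.
print(unquote(raw_attribute))
```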