v0.4
- GNU Copyleft.
- Improved Perl report generator:
    - Can generate separate reports for different subsections of the web tree.
    - Can mark specific errors and redirections as expected.
- Caught common error specifying -proxy.

v0.3
- Add URL rewriting to speed checking of web hierarchies that live on local disk. We sacrifice the HTTPConnection reuse for the time being.
- With -loose, cope with http://hostname URLs that lack a final /.
- The expiration date is now displayed on application startup.
- Switch to a less aggressive URL encoder for our output files.
- Rewrite the HTML scanner to operate on bytes instead of chars and also cope with some less common, but legal, SGML constructs.
- When we get an IOError retrieving /robots.txt, the URL will have a status of IOError, instead of the previous robotDenied, which masked the more serious problem with the URL.

v0.2
- Add ability to specify loose (prefix) or exact URLs for include/exclude.
- Add -exact, -loose and -noautoinclude to govern the include effect of URLs from the command line.
- Add concept of harvester/nonharvester threads to better manage CPU load and reduce forced context switching.
- Kill watchdog threads before starting results dump (they're useless at that stage and they slow things down).
- Remove excessive case-sensitivity in URLClassifier.
- Speed up progress reports from the WorkQueue (they weren't scaling).
- Fix bug with redirections to documents with # subdocument links.
- Harvest URLs from all %URI tag attributes listed in the HTML 4.0 spec except .
- Expand SGML entity references in HTML tag attributes instead of decoding them HTTP-style.
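
The entry on expanding SGML entity references describes a distinction that is easy to miss: an attribute value in HTML source is escaped with entities such as &amp;amp;, which is a different scheme from the %XX escapes used inside URLs. The sketch below illustrates the difference using the Python standard library. It is not the checker's own code, and the attribute value and function names are made up for the example.

```python
from html import unescape          # expands SGML/HTML entity references
from urllib.parse import unquote   # decodes %XX escapes ("HTTP-style")

# Hypothetical attribute value, as it might appear inside href="..." in a page.
raw_attr = "search?a=1&amp;b=2%26c"


def decode_entities(value: str) -> str:
    """Expand entity references; %XX escapes are left untouched."""
    return unescape(value)


def decode_http_style(value: str) -> str:
    """Decode %XX escapes; entity references are left untouched."""
    return unquote(value)


entity_decoded = decode_entities(raw_attr)   # "search?a=1&b=2%26c"
http_decoded = decode_http_style(raw_attr)   # "search?a=1&amp;b=2&c"

if __name__ == "__main__":
    print("entity expansion:", entity_decoded)
    print("HTTP-style decode:", http_decoded)
```

Only the entity-expanded form is the URL the page author wrote. The HTTP-style decode both leaves the stray &amp;amp; in place and turns the escaped %26 into a literal parameter separator, so a checker using it would request a different resource.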