HTML Hacking Scripts
Here are a few useful web-related programs I've written lately.
You might also want to take a look at this
interesting program by Abigail.
Shaking Up the Web
- latro
 - Latro finds idiotic PC sites open to perl.exe?FMH.pl
abuse and reports their little problem.
 
HTML Munging
- churl
 - Extract URLs and verify their validity;
currently looks only for FTP:, HTTP:, and FILE: schemata
stored in A or IMG tags.
 - striphtml
 - Strip out all the HTML bits from a document, leaving (unformatted)
plain text in its wake.
 - htdecom
 - Strip out comments from an HTML document.
 
 - htitle
 - Retrieve the title of the document at a given URL.
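
As a rough illustration of what striphtml, htdecom, and htitle do, here is a minimal sketch in Python using the standard html.parser module. It is not the original code of any of these scripts; the class TextAndTitle and the function strip_html are names made up for this example, and skipping the contents of SCRIPT and STYLE elements is an assumption about what "plain text" should mean.

```python
from html.parser import HTMLParser


class TextAndTitle(HTMLParser):
    """Hypothetical helper: collects plain text and the <title> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []          # text chunks outside <title>, <script>, <style>
        self.title = []         # text chunks inside <title>
        self._in_title = False
        self._skip = 0          # nonzero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title.append(data)
        elif not self._skip:
            self.text.append(data)

    # Comments are passed to handle_comment, which is left as the default
    # no-op here, so they never reach the output.


def strip_html(html):
    """Return (title, plain_text) for an HTML string, whitespace collapsed."""
    parser = TextAndTitle()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title).split())
    text = " ".join("".join(parser.text).split())
    return title, text
```

Calling strip_html on a page's source gives back the title and the tag-free, comment-free body text in one pass; fetching the page from a URL first is left out of the sketch.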
URL Munging
- surl
 - Given a list of URLs, sorts them by last-modified date.
 - xurl
 - Given one URL, extract all URLs it contains.  Uses the LWP
library, and is pretty complete.
 - qxurl
 - Short for ``quick xurl''.  Somewhat like xurl, but expects to
read from files, not URLs, and doesn't canonicalize relative links.
It also runs about 100x faster and doesn't require an external library.
 - reltree
 - Fix up a tree's URLs to make them all relative instead of absolute.
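
The main difference between xurl and qxurl, whether relative links get canonicalized or not, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how either script is actually written: LinkCollector and extract_urls are hypothetical names, only A and IMG tags are examined, and the standard urllib.parse.urljoin stands in for whatever canonicalization xurl performs through LWP.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href/src values from A and IMG tags."""

    # Which attribute carries the URL for each tag of interest.
    URL_ATTRS = {"a": "href", "img": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_urls(html, base=None):
    """Return the URLs found in html, in document order.

    With base=None the links come back exactly as written (qxurl-like).
    With a base URL, relative links are resolved against it (xurl-like).
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    if base is None:
        return collector.links
    return [urljoin(base, link) for link in collector.links]
```

Passing the page's own address as base turns a link such as docs/index.html into a full URL, while leaving base out returns it untouched.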
 
Netscape Munging
- ggh
 - Grovel global history.  Search or dump out the Netscape global
history file.