Here's an example of Beautiful Soup in action, fetching the titles of all the posts from this blog. There's a recent Ruby port, but I'm using the Python version:
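import urllib
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3, Python 2

archives_url = 'http://throwingbeans.org/archives.html'
archives_html = urllib.urlopen(archives_url).read()
soup = BeautifulSoup(archives_html)

# Each post title is the link text inside an <h3> on the archives page
for h3 in soup.fetch('h3'):
    if h3.a is not None:  # skip any heading that has no link in it
        print h3.a.string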
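For comparison, here is roughly the work that `soup.fetch('h3')` and `h3.a.string` save you. This is a minimal Python 3 sketch that swaps Beautiful Soup for the standard library's `html.parser.HTMLParser`. It assumes, as the snippet above does, that each post title is the first link inside an `<h3>`. The class and function names are hypothetical, and the page is passed in as a string so the parsing can be tried without a network fetch. In current Beautiful Soup (the `bs4` package), the loop above would use `soup.find_all('h3')` instead.

```python
from html.parser import HTMLParser


class H3LinkTitles(HTMLParser):
    """Collect the text of the first <a> inside each <h3>.

    Hypothetical helper: approximates soup.fetch('h3') + h3.a.string.
    Unlike .string, nested markup inside the link is flattened to text.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h3 = False
        self._in_a = False
        self._seen_a = False  # only the first link per heading, like h3.a
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self._in_h3 = True
            self._seen_a = False
        elif tag == 'a' and self._in_h3 and not self._seen_a:
            self._in_a = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_a:
            self.titles.append(''.join(self._buf).strip())
            self._in_a = False
            self._seen_a = True
        elif tag == 'h3':
            self._in_h3 = False
            self._in_a = False

    def handle_data(self, data):
        if self._in_a:
            self._buf.append(data)


def post_titles(html):
    """Return the post titles found in an archives page's HTML."""
    parser = H3LinkTitles()
    parser.feed(html)
    parser.close()
    return parser.titles


sample = ('<h3><a href="/a">First post</a></h3><p>intro</p>'
          '<h3><a href="/b">Second</a> <a href="/c">ignored</a></h3>')
print(post_titles(sample))  # ['First post', 'Second']
```

To run it against a live page, fetch the HTML first, for example with `urllib.request.urlopen(url).read().decode('utf-8')`. The encoding is an assumption and may differ per site. The state-tracking flags in the handler methods are the bookkeeping that a tree-building parser like Beautiful Soup handles for you.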
The first project spiders two levels of the Multiple Sclerosis Society's site, pulling out related links and rewriting pages for use on a 'Knowledge Store' CD, which helpdesk staff will refer to during support telephone calls. The second sidesteps an interface deficiency in our content management system, and allows me to rapidly republish swathes of content by simulating a robot editor. With screenscraping software this good, who needs APIs?
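The post doesn't show the spidering code, so the following is only a sketch of what a two-level crawl might look like, not the author's implementation. It makes several assumptions. "Two levels" is read as the start page plus the pages it links to. The crawl stays on the start page's host. The `fetch` argument is a hypothetical hook that returns a page's HTML, or `None` on failure. Link extraction again uses the standard library's `HTMLParser` rather than Beautiful Soup. The page-rewriting step for the CD is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gather every href from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def extract_links(base_url, html):
    """Return absolute link URLs with any #fragment removed."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urldefrag(urljoin(base_url, href))[0] for href in collector.hrefs]


def spider(start_url, fetch, max_depth=2):
    """Breadth-first crawl of max_depth levels on start_url's host.

    fetch(url) -> HTML string, or None if the page can't be retrieved.
    Returns a dict mapping each fetched URL to its HTML.
    """
    host = urlparse(start_url).netloc
    pages = {}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            if url in pages:
                continue  # already fetched via another link
            html = fetch(url)
            if html is None:
                continue
            pages[url] = html
            for link in extract_links(url, html):
                if urlparse(link).netloc == host and link not in pages:
                    next_frontier.append(link)
        frontier = next_frontier
    return pages


# A fake site stands in for real HTTP requests.
site = {
    'http://example.org/': ('<a href="/a.html">A</a> <a href="b.html#top">B</a>'
                            ' <a href="http://other.org/x">X</a>'),
    'http://example.org/a.html': '<a href="/c.html">C</a>',
    'http://example.org/b.html': '',
    'http://example.org/c.html': 'third level, not fetched',
}
print(sorted(spider('http://example.org/', site.get)))
```

Passing `fetch` in as a function keeps the crawl logic testable offline. A real run could supply something like `lambda u: urllib.request.urlopen(u).read().decode('utf-8')`, wrapped in error handling that returns `None`.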