The Morbidity and Mortality Weekly Report (MMWR) is available from the CDC's website via a web interface: choose a year (1996-2011) and week number (1-53) from lists, press submit, choose a table number (there are 10-12 tables per week per year), press submit, and you're presented with an HTML table containing data for a subset of the notifiable diseases. Okay, CDC, now suppose I'm interested in large-scale patterns: I want to download data for all diseases for a five-year period. That means hitting "submit" up to 6,360 times (5 years * 53 weeks * 12 tables * 2 submits per table). Sure, I could write a script to do it automatically, but the output is a bunch of HTML tables, each with a slightly different format, which makes it difficult to "scrape" out the data.
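To make that count concrete, here is a minimal Python sketch that enumerates every (year, week, table) combination such a script would have to request. It only does the counting and touches no CDC URL. The 2007-2011 window and the names `enumerate_requests` and `count_submits` are my own assumptions for illustration, and 53 weeks and 12 tables are upper bounds, so the real number would be somewhat lower.

```python
from itertools import product

# Assumed five-year window; any five years from 1996-2011 would do.
YEARS = range(2007, 2012)
WEEKS = range(1, 54)       # week numbers 1-53 (upper bound)
TABLES = range(1, 13)      # up to 12 tables per week
SUBMITS_PER_TABLE = 2      # one submit for year/week, one for the table


def enumerate_requests(years=YEARS, weeks=WEEKS, tables=TABLES):
    """Yield every (year, week, table) combination the form would need."""
    return product(years, weeks, tables)


def count_submits(years=YEARS, weeks=WEEKS, tables=TABLES,
                  per_table=SUBMITS_PER_TABLE):
    """Count the total number of times "submit" would be pressed."""
    return sum(per_table for _ in enumerate_requests(years, weeks, tables))


total_submits = count_submits()
print(total_submits)  # 6360
```

Enumerating the combinations is the easy part; the hard part, as noted above, is that each resulting table would still need its own parsing logic.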
(After toying with the idea of trying to build a scraper, I made contact with the CDC back in June to try to acquire the raw data behind the online tables. I thought this would be faster. I was mistaken. I've had a few responses, but I'm still waiting for the actual data.)
It's great that we've seen such an explosion of publicly available data, but one of the key words there is "available." For some very important datasets, we still have a long way to go before the data is available in any practical sense.