Wholesale Site Validation

Posted on January 22, 2006 2:13 AM in Web Development

The last time I checked my site's SiteScore, I noticed that several older blog entries were causing validation errors. I've been aware of this issue for some time, and even promised to fix the invalid code after someone took the liberty of pointing it out in a comment.

I almost always check the validity of my site's XHTML after I add a new entry to my blog, so I can almost guarantee that any new entries will be standards compliant. However, I have noticed that as some of the more popular posts are commented on, they lose their validity over time. This is out of my control for the most part.

I say "for the most part" because there are really two options for keeping everything compliant around here. The first option is to be proactive and check that a new comment will not break the validity of an entry's page. However, this would require either posting the comment, checking the page's validity, and backing the comment out if the page no longer validates, or writing out a temporary HTML file that could be sent to the validator for examination.
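As a rough illustration of what a proactive check could look like, here is a minimal Python sketch. It is not a full validation: it only tests whether a comment is well-formed XML once wrapped in a container element, which is a necessary condition for valid XHTML but not a sufficient one. The function name `is_well_formed_fragment` and the wrapping `<div>` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(comment_markup: str) -> bool:
    """Return True if the comment parses as well-formed XML inside a wrapper.

    This is only a cheap pre-check. A fragment can be well-formed and still
    fail real XHTML validation, and named entities such as &nbsp; are
    rejected here because no DTD is loaded.
    """
    # Hypothetical wrapper element so that bare text and sibling tags parse.
    wrapped = "<div>" + comment_markup + "</div>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True
```

A comment that fails this check would certainly break the page, so it could be flagged before it is ever posted. A comment that passes would still need to go through the real validator.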

The main problem with being proactive is that you are basically putting the onus on your visitors to write compliant comments, and in my opinion that is asking too much. Users shouldn't have to be responsible for your site's validity — you should.

So this brings us to the retroactive approach, which is doable, but not necessarily straightforward. Luckily for me, my blog is structured in such a way that iterating through every single item is as simple as replacing an integer in a URI. Every blog entry here at bernzilla.com looks something like:


Because of this, I can replace 100 with all of the identifiers that are appropriate for my blog (some are missing because of bad posts or other miscellaneous reasons).

So the next step is to select all the appropriate identifiers from the database, send each individual URI to the validator, and then compile a list of the results so it is easy to spot the entries that contain invalid markup. Once that is taken care of, fixing the invalid markup should be the easy part.
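Those steps can be sketched roughly as follows, in Python rather than the PHP the script was actually written in. Everything named here is hypothetical: the URI template is a placeholder and not this site's real pattern, and `is_valid` stands in for whatever code sends a URI to the validator and reads back the verdict.

```python
from typing import Callable, Dict, Iterable, List

# Placeholder pattern for illustration only; not the blog's actual URI scheme.
URI_TEMPLATE = "https://example.com/entry?id={entry_id}"


def build_uris(entry_ids: Iterable[int], template: str = URI_TEMPLATE) -> List[str]:
    """Turn the identifiers selected from the database into entry URIs."""
    return [template.format(entry_id=entry_id) for entry_id in entry_ids]


def compile_report(uris: Iterable[str],
                   is_valid: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Check each URI and group the results so invalid entries stand out."""
    report: Dict[str, List[str]] = {"valid": [], "invalid": []}
    for uri in uris:
        bucket = "valid" if is_valid(uri) else "invalid"
        report[bucket].append(uri)
    return report
```

Passing the validator call in as a function keeps the bookkeeping separate from the network request, so the report logic can be tried out without hitting the validator at all.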

The problem with the retroactive approach is that the validator is a tad slow, and regardless, one should strive to be a good netizen and avoid sending hundreds and hundreds of validation requests to the validator in a short period of time.
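One common way to stay polite, offered here as an assumption rather than a description of any particular script, is to pause between consecutive requests. The sketch below is a small Python generator; the name `throttled`, the default delay, and the injectable `sleep` parameter are all made up for the example.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T],
              delay_seconds: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing before every item except the first."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)
        yield item
```

Looping over `throttled(uris, delay_seconds=5.0)` instead of `uris` directly would space the validation requests out. What counts as a reasonable delay is a judgment call.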

Well, in an attempt to find all the invalid portions of my site and still be a semi-well-behaved netizen, I wrote a PHP script that does exactly what I've explained here. I've left it as a manual process so that the successive hits to the validator are kept to a minimum while still allowing me to check my site's validity from time to time.

If enough people are interested in what the script looks like, I may post an example version for perusal. Otherwise, the output of tonight's run is already available for all to see (and it of course validates, 'cause if it didn't, that would just be silly). As you can tell, I have some work to do...


Ryan on January 23, 2006 at 8:15 PM:

Wouldn't another solution be for the W3C to release the validator's code, or at least a binary, so that you can run validation locally?

Or does this already exist?


Bernie Zimmermann on January 23, 2006 at 8:34 PM:

Ryan, that was my first intention, but once I looked through their online documentation I realized it was almost geared more toward mirror-like setups than local validation. I'd love to see a simple Perl (or Pythong) app that you call from the command line rather than an elaborate web app that you have to run from a browser.

The bottom line is, I don't anticipate running this script more than a handful of times per year, so I don't think the W3C will come hunting me down anytime soon (I'm more worried about the RIAA).


Bernie Zimmermann on January 23, 2006 at 8:36 PM:

At first glance, I thought pythong was a Freudian slip ;)

