Whitehouse.Gov's Robot Exclusion File

A friend (who shall remain nameless) just learned about Robot Exclusion Files; these are wide open and you can look at them for a number of very public sites. Being the curious sort, and being particularly mindful of the current administration, it occurred to her to see what happened when she tried to look at the robots.txt file for Whitehouse.gov.

Surprise!

[Since these are paranoid times, I think I should point out pre-emptively that by its very nature, the robots.txt file is intended to be read -- that, in fact, it's read many many many times a day (just not usually by humans). So while the massively secretive and paranoid current occupants of the White House might wish otherwise, there is no conceivable legal reason why I shouldn't be able to look at it. OK?] (The preceding paragraph, and my friend's insistence on anonymity, by the way, are examples of the "chilling effect" in action.)

Typically these things are pretty short. CNN's is an exception, and an educational one. In fact, the four commercial examples I give above seem to all be pretty good examples of when a big site would use them:

  • To exclude highly dynamic content that really just shouldn't be indexed, anyway.
  • To exclude stuff that just plain doesn't need to be indexed, like login pages.
  • To prevent indexing of non-text content like images, audio/video files, and Shockwave movies. (CNN's web geeks have some fun with this. Hell, why not?)
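For the curious, here's a rough sketch of how a well-behaved robot reads one of these files, using the `urllib.robotparser` module from Python's standard library. The sample file, the paths in it, and the bot name are all made up for illustration; none of it is copied from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only.
# One rule per use case in the list above: dynamic content,
# login pages, and non-text media.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /login
Disallow: /images/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; the "*" record applies to it.
for url in (
    "http://example.com/login",
    "http://example.com/images/logo.gif",
    "http://example.com/news/story.html",
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "excluded"
    print(url, "->", verdict)
```

Note that nothing here is enforced: the file is a polite request, and it's up to each robot to check it and honor it.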

What's immediately interesting to me about the White House's robots.txt is how superficially mundane a lot of the links are -- and how suggestive others are. If you actually plug in some of those links, you'll find that they run the gamut from broken pages to 404s; the broken or ill-formed pages seem to be static, for the most part, and often old. My friend wondered why the list was so long; I set that question in the back of my mind, and a much more mundane and plausible answer flashed into my head as I laid my head down last night: Sloppiness. Their web admins are too lazy to set up a sandbox or set passwords, or they don't have the clout to get White House staffers to actually use passwords, so they're opting for security by obscurity.
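To see why this is such a weak way to hide anything, here's a minimal sketch of how little work it takes to turn a robots.txt file into a list of the very paths its owner didn't want indexed. The function name and the sample entries are my own inventions for illustration; they are not taken from the White House file.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow excludes nothing.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, invented for this example.
SAMPLE = """\
User-agent: *
Disallow: /drafts/        # not ready for the public
Disallow: /old/index.html
Disallow:
"""

for path in disallowed_paths(SAMPLE):
    print(path)
```

Anybody who can read the file gets the same list a search engine does, which is the whole problem with using it to keep things out of sight.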

Maybe someone should tell the Bushites that "security by obscurity" is an oxymoron....nah, let 'em figure it out for themselves.