InfoTech

IT, computing, etc. -- technical stuff, not political or social.

The Wages of Spam

A thought for the morning: Over the past 24 hours, about 94% of the email my company received has been some kind of spam. That means that only a little over 6% (6.18% or so, to be precise) has been legitimate.

That's about normal for recent weeks.

The category breakdown looks like this:

Category         Count   Share
Valid Mail          30   3.31%
Allowed Sender      15   1.66%
Allowed IP          11   1.21%
Phishing             2   0.22%
Meds               532  58.72%
Truncated          158  17.44%
General             60   6.62%
Adult               41   4.53%
Gambling            11   1.21%
Other Spam          11   1.21%
Abstract             8   0.88%
Scams                7   0.77%
Get Rich             6   0.66%
Debt/Credit          6   0.66%
Cable Theft          6   0.66%
NewSource            2   0.22%

It's interesting to note that medicine has so far outstripped sex. Though I have to wonder if penis enlargement is classified under "meds" or "adult."

All this having been said, I see no indication that email will go away. Most corporate environments will resort (as we have) to aggressive third-party spam filtering with whitelists. Draconian non-solutions like Serios, private email, replacing email with IM, and the like, just aren't making any headway because the value of free and open communication is so great that it easily outweighs the cost of spam mitigation. At the same time, companies like Appriver have made the process of implementing third party spam filters so seamless that even small businesses like ours can do it painlessly.



Incompetence Abets Malice In Alaska

Napoleon famously remarked that it was best not to attribute to malice that which could be explained by stupidity. But sometimes one gets a little help from the other.

AP asked for documents using Alaska's freedom of information laws. The state informed them that the tab would be over $15 million. The State of Alaska is getting a lot of these requests, and its IT staff has been "overwhelmed" by them. Superficially, the problem seems to be that they don't know what the hell they're doing:

How did the cost reach $15 million? Let's look at a typical request. When the Associated Press asked for all state e-mails sent to the governor's husband, Todd Palin, her office said it would take up to six hours of a programmer's time to assemble the e-mail of just a single state employee, then another two hours for "security" checks, and finally five hours to search the e-mail for whatever word or topic the requestor is seeking. At $73.87 an hour, that's $960.31 for a single e-mail account. And there are 16,000 full-time state employees. The cost quoted to the AP: $15,364,960.

.... And this is what they're doing every time someone makes a request. That is, they're apparently not taking any effort to save time or effort by, say, just extracting the mailbox once, or setting up a data warehouse of old emails. But hey, if they can get someone to pay for it every time, isn't it the American way to exploit the situation for gain?

I suppose I should leave it at that, but as someone with a much better than average grasp of IT operations principles, the situation irks the hell out of me. Let's just leave aside for the moment that if this description of activities required is accurate, they've got a hopelessly incompetent IT staff (both at the level of execution and architecture). Implausibly incompetent, in fact. Let's just look at the activities themselves. If we do that, we can see that somebody is scamming somebody somewhere, because there's no way it should take six hours of "programmer" [sic] time to extract a mailbox from an archive. (I can imagine that it might plausibly take an hour or two for the server to execute the extraction, but there's no way that a person should be billable for that entire time unless they've got some really gross problems with employee slacking up there.) (And by the way, "programmer" would just be a job title for that person -- at least, I bloody well hope so. If an actual programmer is required for that task, then in addition to being fired, the system architect should be stripped of any professional IT certifications he/she possesses.)

What the "security" checks are, I don't know, but I suspect that it amounts to auditing the mail spools for the presence of passwords or "secret" server names. In any case, the two hour figure is implausible in two ways. First, if they're audits for specific vulnerabilities, they should be automated, and thus would require at most 10-15 minutes of billable time, not two hours. (If they're not automated, that's a problem, because there's a probability that some steps in these "security" tests are going to be missed.) Second, the security checks would probably be more comprehensive and more difficult than the actual search operations, and take more time -- here, they're estimated at a lower cost. 

Finally, five hours to search for the offending words or phrases is quite absurd. It's true that Exchange and Outlook don't search that quickly (and I'm given to believe that AK is an Exchange shop), but they search much faster than that. And in any case, again, most of that "search" time is going to be time when the operator (who will not be a "programmer", except maybe in job title only) is just sitting at his/her console twiddling thumbs. I.e., the time bloody well should be spent doing something else.

The "five hours to search" figure, actually, though, might be the one slightly realistic figure. Redaction isn't itemized in the total above, but it is mentioned in the article, so one could imagine that the five hours includes redaction time. That said, we know that the state uses an absurdly time- and labor-intensive redaction method. Saying that they're 'not set up for' digital redation, they print hardcopies and redact those, then photocopy the redactions. Now, if you're redacting hardcopy, then yes, you absolutely photocopy, because otherwise it's possible to read the redacted text in many cases. But if you redact digitally by deleting the redacted text and replacing it with, say, the letter 'X', then there's no possibility of reading the redacted text and you haven't had to take the time and effort to to print, magic-marker and photocopy all that hardcopy.

Plus -- and here's the best part -- every PC in the AK state government is "set up" to do this. After all, they all have 'Delete' and 'X' keys.

Your Mother's Name, Redux

Bruce Schneier, Ed Felten and Steve Ragan all had reactions similar to mine regarding Sarah Palin's email account. As might be expected, the folks posting in Schneier's comment thread were even more hard-core: Most suggested just using a secure password (something like "18D*F9afgsk*", maybe) in place of an answer. But Ragan had what I thought was the most interesting and useful extension to my own practice:

 

If you can pick your own question and answer, then that is the best bet. Make the question and answer something that no one knows, and that would never appear on a personal blog, Facebook or MySpace profile, or outside a close circle of family and friends.

For example, the question could be the name of your personal doctor. This will stop many of the guessing attacks on the system, and offer a stronger level of protection. Moreover, the answer needs to be a full sentence, and use all of the available space offered by the form when signing up for the account.

Q: What is the name of your doctor?

A: Her name is actually the name of the city where she was born.

What if you cannot pick a personal question and have to select one of the offered questions and answers? The fix here is also a simple one, namely you should lie. Lie through your teeth, pick a question, make the answer the same as you would if you wrote the question yourself, and stick to this lie.

The explanation is a little unclear, IMO, so I'll re-state it: You make your answer a complete sentence that you can remember and that is as long as it can be given the size of the box. That way the complexity of the "backup password" [Schneier's phrase] is increased exponentially just by virtue of its length, but the password actually becomes more memorable, because now it's mnemonic.

This is how and why WPA passphrases work the way they do. You can have your network authentication be something like "when i was a kid we loved to eat grasshoppers in cleveland." It's absurd and counter-factual (so hard to guess), but memorable (so you don't have to write it down).
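
A quick back-of-the-envelope calculation shows why length does most of the work here. This is just a sketch of the raw keyspace, assuming roughly 72 possible characters for a random password and 27 (lowercase letters plus space) for a plain-English sentence:

    from math import log2

    random_password = "18D*F9afgsk*"  # 12 characters from a ~72-symbol alphabet
    sentence = "when i was a kid we loved to eat grasshoppers in cleveland"

    bits_random = len(random_password) * log2(72)   # about 74 bits
    bits_sentence = len(sentence) * log2(27)        # roughly 275 bits
    print(round(bits_random), round(bits_sentence))

Real English is nowhere near random, so the effective strength of the sentence is a good deal lower than the raw count suggests, but a long, memorable phrase still holds its own against a short string you'd be tempted to write down.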

Time is the new Bandwidth

I've been doing a lot of video blogging on BEYOND THE BEYOND lately, which must be annoying to readers who don't have broadband. But look: outside the crass duopoly of the USA's pitifully inadequate broadband, digital video is gushing right through the cracks. There's just no getting away from it. There is so much broadband, so cheap and so widespread, that the video pirates are going out of business. I used to walk around Belgrade and there wasn't a street-corner where some guy wasn't hawking pirated plastic disks. Those crooks and hucksters are going away, their customers are all on YouTube or LimeWire...

Bruce Sterling, WIRED Blogs: Beyond the Beyond

Broadband isn't the problem. Bruce makes his living being a visionary. I make my living doing work for other people. It's truly not the visionaries who actually change things -- it's the people who buy (into) their visions, and those people just don't have the time to look and listen at the same time to continuous "bites" of see-hear content.

Podcasts are bad enough -- I have to listen for as long as someone speaks in order to get their point, I can't really skim ahead or scan around with my eyes. I've got to buy into their narrative construction. And I'm paying for that purchase with my time and attention.

This also goes to Cory Doctorow's point about text bites. He's grazing around, taking in small chunks of text at a go, and the web is fine for that, that's his message. Great. Fine. But text can be time- and space-shifted far more effectively than audio, which in turn can be time-/space-shifted far more effectively than video.

What's really needful, as I've noted before, is a way to mode-shift text into audio without human intervention. Or video, for that matter, if you want to get visionary about it. But I'm not going to worry about video right now, because audio is something that some basement hacker could actually pull off with an evening's work, and refine with the labor of just a few weeks. Or so it seems to me. On my Mac, right now, I can select text and have the machine speak it to me, complete with sentence and paragraph pauses. The Speech service is AppleScript-able, so (if I actually knew AppleScript) I could script it to pick up blog posts and pump them into audio files that in turn could be pumped onto my audio player for listening in the gym or on the road. If I spent that much time in the gym or on the road. Which I don't.
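
In fact, you don't even need AppleScript. Here's a rough sketch of that pipeline in Python, shelling out to the macOS "say" command; the feed URL is a placeholder, and a real version would want smarter HTML stripping:

    import re
    import subprocess
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED = "https://example.com/blog/index.rss"  # placeholder feed URL

    def strip_tags(html):
        # Crude de-HTML-ing; good enough for a sketch.
        return re.sub(r"<[^>]+>", " ", html)

    rss = ET.fromstring(urllib.request.urlopen(FEED).read())
    for n, item in enumerate(rss.iter("item")):
        title = item.findtext("title", default="untitled")
        body = strip_tags(item.findtext("description", default=""))
        # macOS text-to-speech: render each post to an AIFF file for the player.
        subprocess.run(["say", "-o", f"post_{n:02d}.aiff", f"{title}. {body}"],
                       check=True)

From there it's one more step to drop the AIFF files into a playlist that syncs to the audio player.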

Why is there no decent Mac word processor?

The late Isaac Asimov famously resisted computers for many years. With good reason: Until relatively late in his life, they couldn't have kept up with him. His workspace was infamous. He kept several long tables in the attic of his town house, arranged in a big "U", with an IBM Selectric (the fastest typewriter available then or since) every few feet. Each smaller workspace was set up to work on a different project, or part of a project. When he got bored working on one thing, he'd simply roll over to another project.

I got into computers to use word processors. That's not true: I got into computers to manage prose. That was really my dream: To manage prose, which meant managing ideas, managing text, searching seamlessly through stuff that I'd written, changing on the fly, getting rid of hard copy, automating tedious tasks.... I imagined a day when I'd be able to store large amounts of text and search through them easily. I imagined a day when I'd be able to effortlessly switch back and forth between projects the way that Asimov would wheel from one Selectric to the next.

That was in the mid-80s; I'm part of the way there. I use (and have used for something around ten years) a multi-tasking computer that lets me keep multiple projects in progress (dare I say "a la Asimov"?); with wireless networking, I can get connected to the Internet in a surprising and growing number of places; I have a small, light, powerful laptop that lets me do real work when away from an "office."

But I still don't have the text tools that I really want. OS X 10.4 has nice meta-data-aware indexing, implemented in a fairly efficient way; it also has good solid multitasking and power management. But it's still lacking one thing:

It doesn't have a decent word processor.

What would a word processor need to have for me to regard it as "decent"? At a high level, it needs to fulfill three basic criteria:

  1. It has to have good usability characteristics.
  2. It has to support all of the basic, required business functionality that people nowadays expect from a word processor.
  3. It has to be able to interchange files with no meaningful loss of information or formatting with the people with whom I need to work.
Those are actually pretty loaded criteria. Let's break them down a little:
  1. Usability: By this I mean that it has to stay out of my way and let me work. It has to not require that I do a lot of things with the mouse. It has to not place unusual constraints on me, like saving everything into some proprietary "project" or "drawer".
    1. Good interaction performance: Screen writes need to be fast and free of artifacts, document navigation actions like page up and page down need to be quick.
    2. It must be easy to do basic, standard things like move to different points in a document. There are conventional ways of doing this that might be CUA, but are probably just convention: Ctrl-End to move to the end of the current document, Ctrl-Home to move to the beginning, Ctrl-Up-Arrow to go back one paragraph, etc. You will find these conventions honored on the majority of Windows (and *nix) editors and word processors, with spotty acceptance on the Mac.
    3. It must at least be possible to de-clutter the visual field -- to remove extraneous noise. As an example, many fine word processors have for many years offered a "full screen" mode that brings the page into focus and blocks out all other programs. That's an extreme example; Word and OpenOffice 2.0 have a "draft mode" that's pretty good in that regard.
  2. Features: Again, pretty loaded, but at a minimum I think a useful business word processor absolutely has to support the following -- these are things that I have found myself using again and again in preparing business documents, and they save incredible amounts of time:
    1. Automatically formatted (and numbered) lists and outlines. This might seem picky, but if you don't understand the need for it, you haven't really created many complex business documents. Consider a project plan document, that might have a list of things in order. On review, the order changes. If your list has 50 items, you might need to change 50 ordinal numbers. (This has been available in MS Word, WordPerfect, StarOffice/OpenOffice, and many others for many years.)
    2. Section-sensitive headers and footers. I.e., start a new section, you can change the presentation or content of the headers and footers.
    3. Automated tables of contents.
    4. A simple way to format the first page of a simple document differently than the subsequent pages. This has been possible for many years in Word, WordPerfect.
    5. It must implement style-based formatting at at least the character and paragraph levels; more than that (such as page styles) might be overkill, since my experience so far suggests that they don't interoperate well. Furthermore, though, it must be possible to import styles from other documents or from some kind of repository. The feature is dramatically less useful without that capability.
  3. Interoperability: The software must, must, must be able to both import and export files -- files, not text, but files (this is important, guys, please listen) -- in one or more widely used formats. For practical purposes right now, that means that it must be able to interchange files with Word 2000 and later versions on the Windows platform. OASIS OpenDocument format compatibility would be nice from a future-proofing standpoint, but I'm already seeing some indications that the OpenDocument format may go places where it's not very inter-operable with Word's native RTF. So interoperability with RTF, clumsy and locked-in as it is, is what's needful.
    1. No information should be lost in an import/export. E.g., you should never ever lose footnotes/endnotes; you should not lose change tracking; you should not lose bookmarks.
    2. No formatting should be altered in an import/export. Obviously that's easier said than done -- especially with a poorly-documented format like RTF -- but OpenOffice and Word have come surprisingly close.

It's a fact -- and this is not seriously disputable by any honest and experienced user of both platforms -- that the Windows word processors (and to a lesser extent the Linux ones) beat all but one (arguably two) of the available Mac word processors hands down on all these counts.

I leapt into using a Mac with the assumption I'd be able to find what I needed when I got here, and for the most part, that's been true. Some glaring exceptions: There really aren't any good music players (iTunes is a sad and cynical joke), and -- most glaringly -- there are no (repeat, no) capable, stable, usable, general-purpose word processors.

The field of modern word processors is pretty small to begin with. On Windows you've basically got Word, OpenOffice, and WordPerfect, with a few specialist players. Down the feature ladder a bit you've got AbiWord lurking in the shadows: It's pretty stable on Windows, and does most of what you'd need to do for basic office word processing, but it has some problems translating Word docs with unusual features like change tracking.

On *nix, you've always got OpenOffice and AbiWord. In addition, you've got kWrite, which is about on feature-par with AbiWord, but tends to remain more stable version to version.

To be fair, there are a lot of word processors available for the Mac. But few of them really meet the minimal requirements for a business word processor, and those few fall down on critical or borderline-critical extended requirements. And what's most frustrating for me is that it's been that way for years, and the situation shows no real signs of changing.

Here are the players on the Mac:

Word (Mac)

The Good: It supports all the basic, required business features.

The Ugly: Performance sucks, and so does the price.

OpenOffice 1.1.2

The Good: Supports all the basic, required business features.

The Ugly: The two big problems are that it requires X11 and that it's not up to the same version as OO on the other platforms (I don't think, anyway). Truthfully, I haven't tried it yet, but my expectation is for poor performance. In any case, OpenOffice is in general clumsier than Word on a PC; that may not be true versus MacWord. Also, it does lack some Word features I've come to be very, very fond of: chapter navigation in the sidebar, and (this is a real biggie) the Outline Mode document view.

NeoOffice/J 1.1.4

The Good: Price -- it's free. Features -- it's got all the basic features, just as OpenOffice 1.1.2 does. By all accounts, it's more stable and performs better than OOo 1.1.2 does on a Mac. This is what I use every day, for better or worse. It's very impressive for what it is; I'd just like it to be more.

The Ugly: Rendering performance is flaky. It's hard to de-clutter the visual field -- there's nothing analogous to Word or OOo 2.x's "draft mode". NO/J is somewhat unstable from build to build, though genuine stability issues seem to get fixed pretty quickly, and the software will (theoretically) prompt you when there's a new patch or version available. Unpredictable behavior with regard to application of styles -- e.g. I apply a style, and it often doesn't fully obtain. Some of these problems get addressed on a build by build basis, but it's hard to know which are bugs and which are core defects of OOo. This is OO 1.x, after all, which was kind of flaky in the best of times.

Nisus Writer Express

The Good: Small, fast, good-looking, and the drawer-palette is less obtrusive than Word 2002's right-sidebar. RTF is its native format, which gives the (false) hope that it will have a high degree of format compatibility with Word.

The Ugly: I had high hopes for this one, but it's been disappointing to learn that it fails in some really critical areas. Format compatibility with Word is hampered by the fact that it's missing some really important basic features, like automatic bullets and outlining. I use those all the time in business and technical writing -- hell, just in writing, period. I don't have time to screw around adding bullets or automating the feature with macros, and because the implementation for bulleted or numbered lists is via a hanging indent, the lists won't map to bullet lists or numbered lists in Word or OO. Ergo, NWE is useless for group work. This is intriguing to me, since they've clearly done some substantial work to make it good for handling long documents, and yet they've neglected a very basic formatting feature that's used in the most commonly created kind of long document, business and technical reports: Automatically numbered lists and outlines.

Interestingly, it also fails to import headers and footers. I would have expected those to be pretty basic. Basically, this isn't exactly a non-starter, but it's close.

AbiWord 2.x

The Good: Free.

The Ugly: Unstable and has poor import and rendering performance in the Mac version. I know the developers are working on it, but there's only one guy working on the OS X port right now so I don't have high hopes. Also, it's not as good for long technical documents as Word or OO would be.

Mellel

The Good: Don't know; haven't tried it. People swear by it for performance, but see below.

The Ugly: File compatibility. Doesn't read OpenOffice files or OpenDocument (OASIS-standard) files, and has a native format that isn't RTF. That makes me think it's a waste of time to even bother to evaluate it. I don't need to be screwing around with something new if I'm going to run up against the same file compatibility issues I have with Nisus.

MarinerWrite

The Good: Cheap. Light. Quick.

The Ugly: Features. As in, ain't got many.

Apple Pages

The Good: Inexpensive. Conforms to the Mac UI.

The Ugly: Conforms to the Mac UI -- which means that it requires finger-contorting key combinations to do basic things without using the mouse, and makes poor use of the screen. And it's severely lacking in features: Apparently it can't export very well to RTF, which is odd, considering how deeply Apple has ingrained RTF into their system.

Why am I mincing words, here? Pages, based on what I know about it, is the same kind of sad and cynical joke as iTunes. It's a piece of brainwashing; it's eye-candy; it's got nothing very useful to anyone who does anything serious with documents.

For the time being, it looks as though I'll be sticking with NeoOffice/J, and at some point installing the OO plus X11 package to see how ugly that is.

Insidious Bot-ulism

As grim and depressing as I can find the automation of spam and the proliferation of bot networks, I like to think I have some perspective on the matter. For example, I recognize that there's a real danger of incredible, profound disruption from bot networks like the one that's driving the spread of the Sober.x worm[s].

But that disruption won't come from "hacking" -- most particularly, it won't come from using the bot networks to crack encryption. As usual, Bruce Schneier has cut through a lot of the nonsense that passes for wisdom on the subject.

The very idea that the main threat from bot networks is cracking is ridiculous -- it displays a basic misunderstanding not only of how end to end security systems are designed, but also some very peculiar and extremely fuzzy thinking about how to defeat those systems. You defeat the systems by gaming them, not by cracking encryption. Sure, you may want to crack encryption at some point to get through some particular locked door -- but the hard part is finding that door in the first place. And more often than not, if you're clever and you know how to game systems, you'll find that you don't need to crack encryption: You can get someone to just give you the key, or even (figuratively) open the door wide and usher you through.

Of course, it is possible, and even likely, that computers will be or even are as I write this being used to game security systems more effectively than humans can. Some clever bloke somewhere might even be writing bots that crack systems. But bot networks -- "herds" of dumb, zombified PCs, even if harnessed into a computational grid -- are more or less irrelevant to that.
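
The arithmetic behind that dismissal is worth spelling out. As a rough sketch, grant an implausibly generous botnet -- a million machines each testing a billion keys per second -- and point it at a single 128-bit key:

    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    bots = 1_000_000           # a generously sized zombie network
    keys_per_second = 10**9    # a generous rate for each machine
    keyspace = 2**128          # one modern symmetric key

    years = keyspace / (bots * keys_per_second * SECONDS_PER_YEAR)
    print(f"{years:.1e} years to exhaust the keyspace")  # on the order of 10**16 years

That's many orders of magnitude longer than the age of the universe, which is why the interesting attacks go around the encryption rather than through it.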

Heuristics like that aren't helped by brute force. Anyone who calls himself a security expert ought to know that.

The greatest threat from bot-driven disruption is not hacking or cracking, but denial of service. The person or persons controlling the Sober zombie network alone could, should they so choose, have a significant impact on the operation of the open, civilian internet. It would be easy. It would be pointless, but it would be easy.

But again: it wouldn't be the end of civilization. We'd get by. That's what we do.

And that's the ultimate lesson of security: Unless the system is severely broken (as in Iraq after the fall of Saddam or in Rwanda in '96), people will generally act to preserve structures of civilization (as we see again and again after natural disasters throughout the world).

A Sobering Milestone

I can foresee a day when we're nostalgic about commercially-motivated spammers and mass-mailing-worms.

I get jaded about virus and worm stories. Each day seems to bring a new watershed in rate of infection, purpose, or technique. Sober is the worm du jour: It appeared sometime during the week of May 2, spread widely and rapidly, and this week started to download updates to itself. The latest variant, Sober.Q, is being used to spread "hate speech."

So let's count the milestones: Rapid spread; remote control; use for propaganda. None all that impressive anymore, on their own. But put together, they're like seeing someone walk down the street wearing sandals with black socks: It's just another sign of the end times. It's depressing.

But Seriously, Folks: Using mass-mailing worms to spread propaganda really is something to take notice of. It's a truism that spam is just about too cheap to meter, as exemplified by the fact that it's not cost effective for a spammer to even care whether most of his messages get through, much less whether you're trying to sell cialis to a woman; it was only a matter of time before the marketers of ideology grokked the significance of that fact and started using it to virtualize their lightpost handbills.

Self-updating zombie mass-mailing worms are the computing equivalent of a bio-weapon: (mind-bogglingly) cheap, effective, and devilishly hard to kill. Previously, they've been used for a rationally-accessible goal: Making money. Now, they're being used for goals that are comprehensible only in terms of the ideologies that drive their purveyors.

Still more proof, as though we needed it, that markets are dangerously deficient metaphors for understanding human social behavior.

The User Experience is the User Experience

Jakob Nielsen, among others, has remarked that "the network is the user experience." They're all wrong, and they're all right.

Browsing through UseIt.com yesterday left Nielsen's June 2000 predictions of sweeping change in the user experience loaded in my browser when I sat down at my desk this morning:

Since the late 1980s, hypertext theory has predicted the emergence of a navigation layer that would be the nexus of the user experience. Traditionally, we assumed that this would happen by integrating the browser with the operating system to create a unified interface for manipulating remote information and local files. It has always been silly to have some stuff treated specially because it happened to come in over a certain network. Browsers must die as independent applications.

It is counter-productive to have users suffer sub-standard user interfaces for applications that happen to run across the Internet as opposed to the local client-server environment. Application functionality requires more UI than document browsing: another reason browsers must die.

Silly, counter-productive: Sure. I've always thought so. But the tendency in the late 1990s was to assume that document browsing was exactly enough. And though the peculiar insanity of things like "Active Desktop" (which strove to make the Win95 desktop work just like the Web circa 1999) does seem to have passed, it remains true that the bias is toward the browser, not toward rich application-scope UIs.

Which is to say that Nielsen, in this old piece, is failing to heed his own advice. Users are inherently conservative: They continue to do what continues to work, which drives a feedback loop.

But more than that, he -- like almost everyone else I can think of, except myself -- is missing the single most important thing about modern computing life: People don't use the same computer all the time. Working from home, now, I frequently use two: My desktop, an OS X Mac, and my laptop, a Sony Picturebook running Windows 2000. In my most recent full time job (where I sometimes spent 12 hour days on a routine basis), I used two more systems: A desktop running Windows NT and a laptop running Windows 2000. And that's not even counting the Windows 2000 desktop I still occasionally use at home. (And would use more if I had an easy way to synchronize it with my Mac and my Picturebook.)

And so it's interesting to look at each of Nielsen's predictions as of June 2000:

  1. Amazon is healthier than ever, in no small part because "zero click payments everywhere" are no closer now than they were in 2000. (See [3].)
  2. Yahoo's network of services is healthier than ever, in no small part because people are less and less tied to specific machines. (See [3].)
  3. Websites know your preferences only insofar as you invest those with a particular services vendor/provider, like Yahoo or Google. That's actually a reflection of increasing network-centricity: These services are finally recognizing that people have lives that cross many machines.
  4. AOL is failing rapidly, but its proprietary messaging system is still going strong -- as are the proprietary messaging systems of Yahoo and Microsoft. Messaging aggregators like Trillian are still bleeding edge.

None of this is to say that I don't think the network is the user experience. He's sort of right about that -- or at least, he's right that it sort of should be, that things would work better if we made apps more network aware. After all, in the age of ubiquitous wireless, the network is spreading to places it's never been before. But what the 2005 situation reveals is that relatively low-impact solutions like using cell phone networks for instant messaging or logging-in to websites have trumped high-impact solutions like re-architecting the user experience to eliminate the web. Instead of using the increasingly ubiquitous broadband services to synch all our stuff from a networked drive, we're carrying around USB keychain drives and using webmail. Instead of doing micropayments, we're still living in a world of aggregated vendors a la Amazon and charity (Wikipedia) or ad-/sales-supported services (IMDB, GraceNote).

At a more fundamental level, we have to be mindful that we don't define "the network" too narrowly. Consider the old school term "sneakernet": Putting files on floppies to carry them from one person to another. It was an ironism -- sneaker-carried "networking" wasn't "networking", right? -- but it revealed a deeper truth: "Networking" describes more than just the stuff that travels across TCP/IP networks. At a trivial level, it also includes (mobile) phone networks and their SMS/IM/picture-sharing components. But at a deeper level, it covers the human connections as well. In fact, the network of people is really at least as important as the network of machines.

Understood that way, "the network is the user experience" takes on a whole new meaning.

Podcasting Is Dead. Long Live Podcasting.

I'm mildly surprised that in the storm of mutual annoyance over podcasting, there hasn't been a clearer statement of where, how and why podcasting can succeed and fail. I suppose I shouldn't be, since clear-headed analysis doesn't generally sell trackbacks, but I think it's a really interesting phenomenon that will teach us a lot if we baseline and understand it correctly. And that can start with etymology.

As it is:

podcasting = [i]Pod + [broad/multi]casting = "multi-casting to people's iPods"

As it may be:

1: [private] podcasting = pod + [narrow/multi]casting = "narrowcasting to my pod"
2: [public] podcasting = [i]Pod + broadcasting = "broadcasting to people's media players"

Podcasting "as it is" currently understood is a short transitional phase. As a popular blogging modality, it won't last beyond 2005. Yet by 2006, something or things called "podcasting" will be extremely popular, and might even drive some interesting and powerful changes in the distribution of information.

Podcasting will very soon split into two distinct types of output: One that's highly personal, targeted at people you know and who hence know your voice (and hence don't require high production values), who are in tune with your interests (and hence don't require extensive meta-data to get your point). Personal podcasting will serve to cement bonds among groups of people who are not immediately and intimately connected. The second form, pro-podcasting, will be the kind of stuff David Berlind is talking about: Professionally or quasi-professionally produced output, primarily from media outlets but also from people for whom it's cost-effective to produce output that's essentially promotional.

The reasons are really simple and kind of rock solid: it's just not cost effective for the 'caster to produce a high-quality podcast unless you've got facilities, skill and time at your command, or for the listener to spend a lot of time listening to something that s/he could apprehend a lot faster and with more flexibility by reading it. A podcast of sufficient quality that even interested strangers would want to listen to takes time to produce; furthermore, on-air reading is not something everyone can do well enough to make for a tolerable listening experience. Podcasts are also more or less invulnerable to full-text indexing (which makes it seem ironic to me that many of its proponents are also strong proponents of letting Google traffic arbitrate the importance of a resource). It's arguable that software solutions will be found to these problems, and I think there's merit to those arguments. But that's not to say that people will then actually use those solutions to blog as podcasts.

Typical "pro-podcasters" will range from Bill O'Reilly to Al Franken to Dave Barry. I wouldn't expect it to include people like Glenn Reynolds and Markos Moulitsas, because too much of their value comes from nimbleness and textual integration with the blogosphere. It may include people like Wonkette or Drudge, who could use their pro-podcasts to drive spiral traffic to their website, and vice-versa. Pro-podcasting will have a market-mover effect in terms of driving progress toward "radio TiVo" and pushing media players (and media players of all kinds, since it will rapidly start to include offline video content).

But it's the personal-podcasters who will have the most interesting effects. The obvious market is distributed groups of friends and families -- people will be able to send narrowly-targeted multicasts to groups of people with whom they share an emotional connection. But there are also tremendous potential business applications for personal-podcasting. Personal podcasting could be used to facilitate workgroup solidarity, send what amount to persistent offline voicemails, even facilitate something like non-real-time audio chat.

And I find it interesting that I haven't heard about these uses, yet. Perhaps it's that for the first-movers and strong evangelists like Curry, Searls and Winer, there really isn't a separation between the business and personal applications. Which would also be interesting, if true. But more on that another time.

Podcasting By Any Other Name

People like to find arguments. It gives them a place to plant their intellectual flags and say "I was here first!" For example, there's apparently an argument over whether "podcasting" is "significant" from an investment perspective. David Berlind weighs in on his ZDNet blog. Berlind's answer is quite oblique, and while he makes some very important points implicitly, I think he will be accused by the podcasting faithful of 'not getting' podcasting; I'll accuse him of the same thing, for different reasons.

Basically, as far as I'm concerned, "podcasting" borders on being a hoax, of sorts: It's a name concocted more or less with the sole purpose of counting coup in the blogosphere, that's been blown up as something important and significant, and in blogospheric terms, it is both, but not on the scale that's presumed on its behalf. Podcasting as practiced in blogland will have very little impact on what the thing that will be called "podcasting" will look like in the future. It's one of those things that's important for the impact it's said to have, and not for the impact it actually has. It's important, in short, for the same reason that Jessica Simpson is [sic] important: Because people say so. It's got nothing to do with her singing.

The spur to Berlind's meditation was a question from a fellow reporter, working on a story (and hence, kept anonymous -- and no, I do not find anything sinister in that). "Old media" blokes, it seems, are still wondering whether blogs are "significant", and -- here's the curious part -- what that means for "podcasting". "His perception is that the blogger phenomenon is insignificant," Berlind's colleague supplies, "making podcasting negligible." From an investment perspective, of course.

Well, it's a terrible analysis, of course, as far as it goes: Major acquisitions and strategic investments are being made that are directly motivated by the idea of blogging, and so blogging is by definition "significant", and so we have to wonder what the heck this expert really means. Even if the raw numbers of new bloggers (tens of millions in the last year alone, similar to the boom-period growth figures for internet use) don't impress him, he's myopic if he doesn't understand that blogging per se isn't the issue; it's just the nascent stage of new modes of mass-personal communication. My own nutshell evaluation of this particular analyst is that I suspect he doesn't actually know what he's talking about.

Nevertheless, there is a grain of truth in the analysis. Personalistic "morning coffee notes", produced on an ad hoc basis by random bloggers, will never be significant in this "investment" sense. (Though I can see some interesting possibilities, there, for things that will be significant.) Why? Because the medium sucks; podcasting will never, ever become popular in the way that blogging is popular. On the other hand, as Berlind rightly points out, the rather old idea of media-shifting print content to voice (which used to go by the name "radio") and then mode-shifting that from a stream to an offline file, not only will be big, but has been going on for a while. In fact, it's older than the web, even on the Internet. The only things that're new about it are, first, doing the notification and distribution through RSS, and second, automating the media load onto portable devices.

Those are important things, sure; but the podcasters didn't think of them. They just took their particular process public. And the particular "open" modality that they specified will be important during a transitional period -- but it's not where the money will be made or most of the traffic will happen. That will be on satellite. Podcasting in its current form is merely an interim step to the full realization of potential of satellite radio. "[U]sing the technology to audio-tivo satellite" would be just a start; wait until Apple or XM really get going on these ideas.

The Not-So-Hidden Truth About The iPod Shuffle

I'm not sure what's radical about the iPod Shuffle. OK, I'm lying, I know what's "radical" about it, and that's nothing: It has exactly two things that haven't appeared in previous flash-based players, and lacks a lot of things that have. Even in those two things, it breaks no new ground, since they're both attributes of the leading high-capacity product: It comes from Apple, and it integrates with iTunes. ("The Future Is Random"?!)

Those two little non-revolutionary things (Being Apple and Being [Of] iTunes) are pretty important. And the impact of the Shuffle doesn't lie within whether it's actually new or not, or even whether it's actually any good. The impact lies in how it serves to expand the iPod halo.

The random shuffle feature is nothing new -- I can do that on my iRiver. Neither is the integrated USB A-plug (I own a Virgin player, currently on permanent loan to a friend, that has a better-designed implementation of that). Recharging off the USB bus? It's been done. And though I don't troll the flash-player market, I'd be surprised to find it hadn't already all been done in the same player.

Even the "radical" step of "eliminat[ing] the user interface altogether" [sic] has been done before: There have been plenty of flash-based players that eschewed a song title display. Though usually, players that do that are actually cheaper than their competitors, instead of more expensive. But I digress.

As for what it lacks: An FM tuner, and a display. FM tuners have become big differentiators in the flash-player market in recent years; it happened because the circuitry to make them suddenly became really cheap, and not as such because of demand -- a matter of capacity converging with sub-rosa desire, as it were. But I digress, again: Apple apparently doesn't think that matters, and I think I know why. They're planning to horn in on the ground floor with Satellite Radio integration into the Digital Media Center. (Mark this, that's their real next target. The micro-workstation market will expand under its own steam for a while; the next strategic play is getting XM Radio into the iPod Halo.) How they accomplish this is yet to be determined; as iTunes grows, they're increasingly integrated into the DRM fold, and it's a mistake to think that "Rip, Mix, Burn" was any more than a marketing strategy.

I can virtually guarantee that I will never own an iPod Shuffle. But it's important. And by all the accounts I've read so far, it was done contrary to Jobs's better judgement. But again, I digress....

[sic]: Memo to David Pogue at the NYT: Buttons are a user interface.

Apple Proves Me Wrong (about a few things, at least)

The "headless iMac" is the "Mac Mini." (Close-follower branding from Apple? Or synergy from their cooperative projects with BMW -- er, I mean, Cooper? But I digress, as usual...) Of course, they'll sell millions of the buggers. That's what they do: Create cute things that people want to buy, regardless of what it is or really does. But I swear, I'm different: I swear, I actually care what it does.

But is it an earth-shattering device? Even without wireless, as it is, it could be, but in and of itself -- no. Everyone I know who's ever thought of getting a Mac wants one -- hell, I want one -- and yet, I don't think it will take over the low-end market the way it could if the price point were, say, $100 lower, or the base RAM were 256MB bigger.

But in another way, it will be revolutionary. Consider the size of the thing: It will now no longer be acceptable for PCs to be as big as they have traditionally been. Ultra-small variations on the ATX form factor, which are common now only among hobbyists and "gear fetishists", will become standard PC form factors, and will at the same time cease to command a premium price. They will drive devices the same size as (or smaller than) a Mac Mini, and they aren't inherently much more expensive to manufacture than the larger boards; since Intel and AMD chips clock higher, they'll be faster; and they'll become radically cheaper as demand soars from people who've seen the Mac Mini but still can't afford the extrapolated $800-$1000 price tag for a really capable, obsolescence-resistant Mac Mini.

It's interesting to see where the rumors went wrong. The "iHome" branding turned out to be a red herring; it would be interesting to find out where it came from, because it so effectively skewed the speculative field in the days just before the presentation that it seems as though no one even tried to get spy shots of a Mac Mini. It's a lot smaller than the hoaxed pictures. The hoaxer dubbed it 'iHome', and various rumour millers reported with confidence that it would be "branded" as an iMac; neither turned out to be true. It was said to include WiFi in its base configuration; WiFi ("Airport Extreme") is an add-on, as is Bluetooth. Performance numbers were more or less right, though the rumors missed the fact that there'd be two base processor speeds. And to illustrate just how far off the original rumor was, the "headless iMac" was said to "share the 1.5" [1U, or "one rack unit"] height of the latest Apple G5 server; it's actually 2" tall. A picky detail, but it demonstrates how completely off-mark we all were.

It's tempting to speculate (as I'm sure someone has) that Apple planted rumors to throw people off the scent. But I don't think they need to. For what other PC brand would people bother to create physical hoax models? Whatever the explanation, the community of Mac users has a hardened core of Macintosh and Apple fetishists. In fact, I think they don't really try, for the most part, to get real rumors; they just make stuff up, because it's more satisfying than the truth. Anyway, true wisdom, to the Mac zealot, is received wisdom: It issues forth every January from the Dark Steve, from a well-lit stage at the MacWorld keynote address...

Convergence through Desire, Redux

How can I remark on digital convergence without remarking on the forthcoming "headless iMac"?

More to the point, what the hell does a "headless Mac" have to do with digital convergence?

I'll explain. Gizmodo facilitated leaking a bunch of really convincing (to me) product unpacking shots of a device called "iHome", which has a buttload of ports on the back and a CD-ROM slot on the front. Alas, there's lots of smoke and steam on the Apple rumor forums to the effect that these must be fake, because the box is just so ugly. Apple's legendary industrial design staff surely couldn't have produced something so "fugly". (Um...right. Something about this presentation really offends Mac-heads, as is clear from the Engadget comments, but I'm not sure what.) But consider that any box unveiled now is most likely not a production version, and might well be camouflaged the way Detroit camouflages their long-range test models.

Be that as it may, and leaving aside the authenticity of the photos, the name would tell us volumes about how Apple sees the market-positioning of this device, and I believe they do not see it the way that 'Bob Cringely' sees it:

.... The price for that box is supposed to be $499, which would give customers a box with processor, disk, memory, and OS into which you plug your current display, keyboard, and mouse. Given that this sounds a lot like AMD's new Personal Internet Communicator, which will sell for $185, there is probably plenty of profit left for Apple in a $499 price. But what if they priced it at $399 or even $349? Now make it $249, where I calculate they'd be losing $100 per unit. At $100 per unit, how many little Macs could they sell if Jobs is willing to spend $1 billion? TEN MILLION and Apple suddenly becomes the world's number one PC company. Think of it as a non-mobile iPod with computing capability. Think of the music sales it could spawn. Think of the iPod sales it would hurt (zero, because of the lack of mobility). Think of the more expensive Mac sales it would hurt (zero, because a Mac loyalist would only be interested in using this box as an EXTRA computer they would otherwise not have bought). Think of the extra application sales it would generate and especially the OS upgrade sales, which alone could pay back that $100. Think of the impact it would have on Windows sales (minus 10 million units). And if it doesn't work, Steve will still have $5 billion in cash with no measurable negative impact on the company. I think he'll do it.

I see it different[ly].

Nobody's talking yet about what the iHome actually does have. Rumors abound, and they mostly assume it's basically an iBook without a display. I don't buy it.

The very name of the device indicates to me that iHome is not intended to be used as a general purpose computer in any really sophisticated way. It's intended as a media hub, and any other functions it fulfills are incidental, and what's more, Apple won't be enthusiastic about helping it fulfill those other uses: It will most likely be a mediocre platform for applications work. It will be somewhat more than a set-top box, only because it would cost more to dumb it down. (If I'm proven wrong, I'll certainly be taking a look at iHomes for my own use, but I don't think I'm wrong here. We'll see in a few days.)

I think it will be somehow substantially crippled, and I think I know how: It will have limited display capability, outputting by S-Video and composite only (and the latter through an extra-cost converter from S-Video); and it will not have expandable RAM. Both decisions will be defended on the basis of price, but they'll really have been taken to prevent cannibalizing iBook, iMac and eMac sales. By the way, I essentially agree with Cringely's analysis of the market impact of a fully-capable and cheap iHome, but I think he's applying a much too rational (and charitable) thought process to Apple's senior management.

I think Jobs doesn't know what to do with iTunes. It's a juggernaut he doesn't know how to stop; it's prompting people at his company to actually think about ideas that could shake up the personal computing marketplace, like, say, a genuinely cheap computer with a powerful OS and operating environment. Baseline Macs are built with remarkably inexpensive electronic components: Many still use relatively slow and old versions of the PowerPC chip (the "G4" generation), which by virtue of their vintage are dirt cheap; the "G5" models mostly use relatively slow versions of that chip, and below the most expensive levels, they all use graphics subsystems that are last year's news on PCs. Macs are cheap, cheap, cheap to build. And yet, they're hideously expensive on a bang:buck basis.

If Jobs wanted to really go big, he could have done it years ago. Opportunities like the one that Cringely describes are always there for Apple, all the time. And they never take them. Why? The only answer that's compelling to me is that Steve Jobs does not want Apple to be successful, because that would mean that Apple was no longer about him. Sure, the cult of personality would flourish for a while, but I think he understands that part of his bizarre public loveability is the fact that his exposure is limited. He'll never be as much of a self-caricature as Steve Ballmer or Larry Ellison, but the tarnish would settle pretty quickly and Apple would quickly become beset by the woes of any company that moves beyond a customer base comprised primarily of true believers.

So Cringely's right, I think, about the opportunity, and he's right about what iHome is, but I think he's wrong about what Apple will do with it. And though I predict that Jobs will be accused of not taking these steps out of greed, I think his motivation will be darker: Ego. Though I suppose the Dark Steve's flavor of ego could be cast as a kind of greed....

Convergence Through Desire

I'm sitting here in Spot Coffee looking out over the scene. I'm blogging from a coffee shop: I'm officially ... something. Not a geek, anymore, because convergence activities like logging on to the net though a coffee shop's hotspot are now officially mainstream and mundane, at least if you believe that TV reflects reality.

Which is my point, as I remind myself not to bury my lede: Convergences that actually lead somewhere tend to come not from planning toward goals, but from the accidental confluence of opportunity with desire. As a case in point, consider the Archos PMA 400.

This whole coffee-house laptop thing... how did I miss out? It was a matter of not having converged the right equipment. I've puttered at doing this for a long time -- my friend Pete's laundromat even has a hotspot -- but have tended to feel a little sour-grapish over the whole deal, since my equipment has made it a challenge: My laptop has a tiny keyboard (I've gotten used to that) and a small, dim screen; if I brightened the screen to make it readable, the battery life was relatively poor. Battery life already suffered because with only 128MB of memory, the laptop was constantly thrashing the hard drive to swap in and out of RAM. And I always seemed to have problems connecting to the WiFi hotspots.

Well, thanks to eBay I now have a bigger, stronger battery and another 128MB of RAM (both a third or less of last year's price), with updated software for my WiFi card, and I'm blogging from a coffee shop. I've leapt squarely and soundly into 1999. Or something like that.

Which brings me, roundabout, to my point. This was really a convergence issue. It was always a high-status behavior, hooking up to the net from open hot spots, but like most high-status activities, not many people really did it. Which is, of course, what's made it a high-status behavior, at a certain level.

Well, now the barriers to entry are much lower: Most open networks don't charge for connections (at least not at the moment), which we can chalk up to the proliferation of cheap bandwidth. (That will change, but we've got tons of dark fiber out there still going unused.) Good quality portable computing hardware is cheaper and lighter, and the social acceptability of hauling out a laptop has increased; now it seems relatively benign next to loud mobile-phone conversations. Networked communication from a hotspot has become easy and cheap enough that lots of people can do it, but it still hasn't outgrown its chic-factor. (And it will be slow to do so, by the way, due to latent education factors -- but that's another story for another time.)

This is the crux of it, I think: Convergence will only happen below a certain fairly low price-point, and will be driven by desire, not by need. Blogging got big when it broke $10/month (or thereabouts), and nobody really needs to blog; WiFi got big when it got free and you didn't have to buy a card for your laptop. And of course, nobody really needs to network from a coffee shop.

Convergence devices like wireless handhelds will break through, too, and soon. It will happen when you can buy the device at little (or no) additional cost over what you would have spent anyway: It will happen when you can get a thing that you wanted for some completely other purpose, and have it bring along wireless connectivity or email or word-processing as a bonus.

My thoughts turned to this train a few days ago when Gizmodo posted a note from CES about the new "convergence device" from Archos, their "Personal Media Assistant [PMA] 400" -- a Linux-based variation on their AV400 "pocket video recorder". It's a toy calculated to make geeks salivate, hitting almost all of the key requirements for a high-end PDA (color screen, built-in 802.11g wireless, browsing and email capability) along with one thing that no conventional PDA has, yet: a 30GB hard drive.

And the best part, from Archos's perspective, is that most of this capability would be there whether they wanted to make this thing into a PDA or not. Because it's not primarily a PDA. It's primarily a multimedia time-shifting device, a la TiVO, but without many of TiVO's restrictions. It includes WiFi because WiFi would make it easier to integrate into 802.11g-based home multimedia networks, not because it would make it a killer toy for the coffee shops set. And yet, that's what it will be.

There have been lots of chances for convergence, and they've mostly appeared to founder on the cost of mass storage or on battery life. Well, mass storage is now absurdly cheap; and low-power components have met improved batteries halfway to more or less solve the power problem. And battery life shouldn't have been an inhibitor to convergence for the most likely candidates, the game platforms. A PS2 or GameCube packs plenty of computing power at a much lower cost than most PCs. Why not hook them up to hard drives and keyboards and have a computer? Why, indeed; it's a mystery. So, here we have a device (a multimedia time-shifter) that's basically a general purpose computer; and contrary to the usual trend, its makers decide to go the distance and make it, of all things, a general purpose computer. Why should this be different from the false starts from Sony or Nintendo?

Perhaps because this one is personal; perhaps because this one is "adult." Games are still socially marked as "juvenile", even though the majority of players are adults. But music, TV, movies: Those are adult pastimes.

There have been lots of attempts to make a "computer for the masses." They've ranged from the geeks-only Sinclair 100, back in the dawn of the personal computing era, to more recent efforts driven by Microsoft and others. Perhaps the most radical attempt was the Simputer, which re-thought not only the user interfaces but the form factors and the assumptions about use.

The first commercial Simputers are nice, elegant devices; but they're still too expensive, and don't come near reaching their intended audience. They're toys for well-off Indian technophiles, not the village computer they were designed to be. The PMA400 is in many ways much like a Simputer with a hard drive and with far less noble goals. This device isn't intended to bring computing to the masses; it's intended to bring this week's "Survivor" or "ER" or "Six Feet Under" to the departure lounge. It didn't come from any high and noble goals. Instead, it came from a desire to be entertained.

And yet, the PMA400 has everything, literally everything, that's needed in a basic -- and even a bit more than basic -- personal computer. It's networked; it's based on an open platform with standard and open APIs, so there's already a lot of software that will run on it; it's got (LOTS of) mass storage; it can take keyboard (and presumably mouse) input; it can accept removable mass storage. It can probably even be hooked up to a printer via USB.

I don't have any illusion that Archos will make a huge success out of this; that's just not in their corporate DNA. But this device can be the model for the true "people's PC" that IBM, Microsoft, and others have been jousting at for years. The question is whether a company like, say, HP or Creative Labs or Nintendo will be clever enough to see the opportunity and seize it. Don't look to Apple or Sony or Microsoft for this device, by the way: They have a vested interest in keeping personal computing devices big, relatively costly, and relatively non-convergent.

SixApart Plus LiveJournal: The New Elephant In The Room

SixApart have announced they're acquiring LiveJournal in a friendly takeover. This is actually bigger news at a cultural level than Microsoft breaking in with "MSN Spaces" or even than Google acquiring Pyra.

Whether the merger can be successful at all will hinge largely on how seriously the "bloggers" at SixApart take the "LiveJournalers", but there are powerful synergies to be achieved here that I'm not sure either SixApart or LiveJournal really understand. There are significant cultural differences between the two "communities" that are commonly parsed as socioeconomic (by the LiveJournalers) and generational (by the MoveableType-focused bloggers). There are lots of dimensions to the cultural split, and of course it's often an error to speak of statistical humans, but the more salient long-range divide is really hands-dirty versus hands-clean: Do you open the hood, or do you rely on your mechanic? Do you mod your vehicle (or PC case or backpack), or do you leave it as-is? And when you mod, are you picking from a menu, or thinking up ideas on your own?

And that's the dimension on which the new, merged SixApart-LiveJournal entity will attain success or not: The continuum from commodity to customization -- from people who are content to buy and use off-the-shelf to the country of the hard-core modifiers. LiveJournal is off the shelf, with essentially menu-driven site customizations that are still very branded as "LiveJournal" sites. MoveableType, and TypePad to a lesser extent, are under-the-hood affairs, which are capable of driving rich visual and functional customization. They're right that they don't need to merge the products or the codebases -- the merger of the two organizations will succeed at a basic level if they can overcome cultural biases. But if they can learn to move fluidly (and cost-effectively) along that continuum from commodity to customization, they will morph into a truly powerful challenger to established players, and maybe even a cultural force to be reckoned with.

This is more than mass-customization redux; it's really the first true coming of a model that was heralded by Saturn in the '90s, but it goes beyond the product delivery to the customer's desire to make the "product" their own. Penn Jillette sang an early paean to this desire back in 1990, and Toyota recently started a whole division based on the idea that what you might really want to do is plug stuff in after the fact. But hey, they'll be happy to let the dealer do some value-adds for you, too...

But back to the merger. Technical issues are certainly important. Mena Trott plays up LiveJournal's experience with scalability, and that's important for SixApart: TypePad is probably as scalable as MoveableType could be made in the relevant timeframe, but my sense is that it doesn't achieve the economies of scale they'll need to accommodate 30 million new bloggers a year, and I'm sure this will have occurred to Ben Trott. They'll need to be cautious, though, about taking an overly-architectural tack; considering recent advances in automation and system virtualization, it's probably more cost effective (and almost certainly quicker-to-market) to build a big, comprehensive automation and virtualization infrastructure than it is to re-architect MoveableType for scalability. (Incidentally, that approach would also give them better traction while moving back and forth on that critical commodity-customization continuum.)

All this having been said, I think it's an even bet whether or not SixApart will "get it" enough to really synergize their merger. They're really good with feedback, as their quick response to last May's license fiasco demonstrates. But they also have a history of making exactly the mistake that precipitated that problem: They try to retain too much control over their user base. I would have been a big fan of MoveableType in its early days, except for one little detail: Their license forbade any licensee from charging for customization services. "That's our business," they explained. "We make money doing that." I saw that as short-sighted, and time proved me right: There are now no such restrictions, and part of the reason is that people went out and went nuts modifying MoveableType, and probably in many cases in violation of those license terms.

My point is that even though they corrected course, they did make that same mistake twice, and now they're saying things that lead me to believe they're missing some crucial points. So the real bottom line on the success of this merger might be whether people of more expansive vision will be guiding the course of the company, or whether they'll still be taking protectionist gut-checks at every step.

Old School Exploits

The 'Net is quietly abuzz with chatter about Santy. It's a worm -- an old-school worm, that travels server-to-server, running a single exploit against one of the server's exposed services. But there's something "new" that scares people about Santy, Santy.B, and all its forthcoming incarnations: It can "discover" likely targets using internet search engines. Santy's success [re-]proves two points that have been made over and over again over the years: The more services you expose, the more vulnerable you are; and as you make it easier to code software, you also make it easier to code malware.

Security in computing, as in the non-virtual world, ends up being largely a matter of how many ways there are to get in and out: If you've got lots of doors and windows, you have poorer security. You can qualify the analogy somewhat, but that's pretty much how it works. Santy works because there are not only lots of doors and windows, but also because some of them aren't as well secured as they should be -- and because some of them advertise too much about themselves. (But that's another topic for another time. And none of this should be taken as excuses for following the "one strong door" model.)

Santy's first manifestation used Google to locate likely targets; now it also uses AOL and Yahoo search interfaces. It finds its way onto a system, patches itself into vulnerable code in certain versions of phpBB, and then proceeds to run Google searches for strings that appear in those vulnerable versions of phpBB. This works because Google has a stable and easy to use API -- really, from the perspective of Santy's author, just a standard format for input and output -- and it doesn't make any distinction between clients that have a person looking at the results and clients that have a machine looking at the results. As much as I'm wary of Google in general, this is a perfectly right and proper way for their software interfaces to operate, and they're not any different in this regard from any other search engine. Or any open bookmark repository, for that matter.
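
(For the curious, here's what a "machine client" of a search engine amounts to, in a deliberately generic sketch. The endpoint and the response shape are hypothetical and the query is a harmless one; the point is only that, to the search engine, a script is just another HTTP request, distinguishable from a person's browser by nothing more than the User-Agent header it chooses to send.)

    # Generic sketch of a machine client for a search service. The endpoint
    # (search.example.com) and the {"results": [...]} response shape are hypothetical.
    import json
    import urllib.parse
    import urllib.request

    def search(query, endpoint="https://search.example.com/api"):
        # Build an ordinary GET request -- exactly what a browser would send.
        url = endpoint + "?" + urllib.parse.urlencode({"q": query, "format": "json"})
        req = urllib.request.Request(url, headers={"User-Agent": "example-client/0.1"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        for hit in search("weblog software").get("results", []):
            print(hit.get("url"))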

I repeat, this is old-school, albeit updated for new times: In 1994, someone might have written something analogous to this to use Gopher or WAIS or Archie. They'd have had to be smarter coders, and because some techniques hadn't been pioneered yet, the codebase would be larger and clumsier. But it could have been done, and probably was. I knew guys who thought that way, back in The Day. But the community of people who'd have been impressed was smaller, and frankly, servers were inherently more secure by virtue of simply having fewer exposed services.

The main thing that's actually changed is that it's now a lot easier to code this stuff. A mediocre scripter could hack together Santy in a week or less of spare time; a good coder could do it in just a few hours. Back in the day it would have taken much more skill and time. The target would have been more esoteric, and the task would have taken much more technical expertise. There was no PHP or phpBB; there were no web bulletin-board packages to exploit. But it would have been possible to do something analogous to Santy in 1994. Not to slight the effort: what's needed in both cases is the idea of how to design the thing, and that's a non-trivial point.

There has been a call for Google to "shut down" this vulnerability -- for it to block Santy's searches. That call was, frankly, ignorant: Yes, it would be possible for Google to play a game of cat and mouse with Santy coders (and it appears as though they may have been bullied into doing just that), but that would be bad for two reasons: First, because it would create an "artificial selection" environment in which Santy-coders would be forced to evolve things like source IP masking, metamorphic user-agent strings, and Bayesian pattern-matching for target identification, thus indirectly causing the other side to get better; and second, because it's an unwinnable battle, and anyone smart enough to get hired at Google already knows that.

To be sure, poor security practices are also partly to blame. If I read the advisories correctly (and I may not), you have to configure phpBB in a fairly non-secure way to be vulnerable: Namely, you have to make your comment-posting forms world-accessible. That's common practice on boards that allow anonymous posting; there are obvious downsides to disallowing anonymous posting. It's a baby-and-bathwater conundrum, in that you throw away some of the liveliness and spontaneity of your board if you don't allow anonymous posting. (BTW, to forestall concern, Antikoan.net is not vulnerable to the exploit used by Santy, since our hosting provider has already installed the relevant PHP security patches.)

And the commodification of hosting plays a huge role in exposing vulnerabilities, though it can also help alleviate them. From the mid-'90s through the present, hardware and software commodification have conspired with consumer expectation and economies of scale to thin the margins on hosting to a hair's breadth; if a hosting provider spends two minutes a day of his own energy on a single customer, that's two minutes too long for him to be making a profit. So corners get cut, processes get automated using home-brewed scripts that aren't debugged. Best-practices for security get ignored because they make things harder to manage. The same commodification, though, has driven the development of highly automated hosting maintenance systems that bake-in things like security best-practices and software version control. There's distance to go, but it's getting better very rapidly.

Final aside: Santy is, in a way, the very worst kind of robotic exploit tool, because it's scripted, not compiled. Its instructions are available to any reasonably competent sysadmin who happens to get infected. And since it's Perl exploiting PHP, the development and execution environments are nearly ubiquitous, and the exploitable platforms still richly plentiful. phpBB may be the first exploit target, but it won't be the last; my quick research indicates that the most common family of OSS CMS systems may be vulnerable at its core, and almost certainly is in some of its more popular third-party modules. (The more frightening prospect is that one of those modules is now without a maintainer, and so if exploited, would remain exploitable. I'll leave my conjectures vague for the time being because they're still just conjectures, and anyway I wouldn't want to give anyone ideas.)

What Will $50 Get You?

Sometimes, the primary driver in social change is the spread of technology. And the primary driver in the spread of technology is usually falling prices (not falling costs, that's another matter altogether).

And the primary driver in falling prices is usually theft.

If you're a business user, US$50 might get you just what you need in a desktop computing environment: A strong productivity suite, with Outlook/Exchange-like email and scheduling and centralized administration (very important for controlling IT costs). Sun and Novell currently have enterprise-level offerings in that price range, and that's generally being taken as a sign that Microsoft is in trouble. But Dave Berlind at ZDNet argues that at that level of commodification, the underlying infrastructure doesn't really matter:

When software delivers a specific utility, that utility or "layer of value" is often referred to as "the contract." Like a real contract, a software contract sets the expectations of the external entities that will interface with the software. Those entities can be other systems or software, or they can be humans. ... If software interacts with users, then the rubber meets the road at the user-interface level where users feed something in and get something out in whatever format they want it (think documents and communications like instant messaging).

In the case of desktop Linux, the contract is in the user interface (which includes the applications). After all, a lot of the attraction to desktop Linux is due to the fact that it does things out of the box that Windows does not. For example, there's no need to run out and buy a productivity suite or install an instant messaging client. Most distributions of desktop Linux include fairly robust software for each. This model is remarkably similar to that of PDAs. As with PocketPC or PalmOS-based devices, the targeted users of JDS, NLD, and whatever Red Hat comes up with next will mostly interact with the applications and not with the operating system, which in turn reduces the OS to a mostly embedded and, not coincidentally, rather trivial commodity status. [emph added]

A minor point that Berlind misses: Commodity productivity only works as long as interoperability is rock solid. Ten years of dominance by MS Office have gotten us hooked on being able to freely trade editable documents with anybody, anywhere, anytime, with no format translation necessary. Not that I think Berlind really misses this point; including it would probably just have confused the issue. But it ends up being important, nonetheless.

But Berlind's main point is that this doesn't look as bad for Microsoft as we might think:

Anybody who thinks that Microsoft is just going to lie down and die as a result of this revolution in what $50 gets you is dreaming.  If Novell, Sun, or any other company can turn a profit off of a $50 soup-to-nuts desktop offering, there's no reason Microsoft can't do it, too.  It's just that the result may not be Windows and Office as we know them in their entirety.  For example, Microsoft already has plans to offer a $36 Windows XP Starter Kit in India and will be offering copies of Office to certain schools at $2.50 per copy.

Berlind's right to say that MS wasn't driven to these tactics by Linux, per se. Linux has played a big role, particularly in emerging economies like India, and to a lesser extent in the EU. Especially in South America, new offerings to governments often have to be Linux or nothing, more or less. And Berlind is right that hardware commodification and per-seat pricing pressures in corporate IT are prime proximate drivers for this kind of offering.

The real key driver, though, as I see it, is piracy.

These cheap Windows packages that Microsoft is hawking aren't really intended to compete with Linux distros. Linux really isn't competition for this market. Much as places like Russia, Ukraine, India, China, and Brazil are hotbeds of software innovation (and they are), the bulk of users in those places are still "home" and "office" users: They're even less sophisticated, in other words, than the home users in the US market. It pains me to say this, but Linux is simply not a viable option for them. (Through no fault of the OS, let's be clear. It's a packaging and UI issue. Period.)

The alternatives aren't "buy MS" vs "install Linux for free"; they're "buy MS" or "steal MS". MS has understood this for years, and has taken localized stabs at the problem for a long time. What they seem to be realizing now is that their strategy needs to be global. After all, government purchasers in Brazil and Hyderabad can now easily communicate and compare notes on what they're hearing from their MS reps. Again, to be fair, this is probably not something that Berlind missed, so much as something that didn't fit in.

But in de-emphasizing the primary root cause (piracy) and over-focusing on the proximate cause (price wars in conjunction with hardware commodification), Berlind misses a very interesting point about information flow in the new digital world order.

All quiet on the virtual front

The onslaught of comment-spam has stopped, for now; it could pick up again tomorrow. It continued without abating for about five and a half hours, with what looks to be an average of about one message every three and a half minutes. And after the first fifteen or so -- which is to say, after I'd had a chance to mark a single one as spam -- not one single message made it in front of a site user other than myself, and all without any further intervention on my part.

One more ironic fact about this: The site being advertised by all of this comment spam is actually not accessible at the URL given in the comments. The server doesn't respond. That means one of two things: either their spamming was so effective that the servers have failed under the load, or they comment-spammed the wrong folks and earned a denial-of-service attack in return. One can only hope...

Firefox is now at 1.0. Hoo-ray.

For all my protestations, I did switch. There were plugins I wanted to use that just weren't available for Mozilla. I'm still not quite used to looking under "Tools" for my program preferences (a very Microsoftian shift, I must say), and I can't get Firefox's tabbed browsing to match Mozilla's cleaner, more intuitive behavior. And I miss the memory-resident feature that lets Mozilla pop up near-instantaneously whenever I click the dinosaur-head.

But I did switch, and I have been using Firefox consistently throughout the last few weeks. I only switched back to Mozilla when I needed to troubleshoot problems for others, or to set it up on other people's PCs. At the end of this time, I still advocate Mozilla over Firefox for casual users: It remains more solid, more bug-free, and more polished at the presentation and installer level. It's what I set Mom & Dad up with on their PCs.

My opinion of Firefox hasn't really changed. I still think it's the kewl kidz browser, and it looks and feels like it -- which is to say, stuff often doesn't work right, or plain doesn't work; it crashes frequently and churns at unexpected times; and many things still show distinct signs of the developers' egocentric contempt for objective evaluations of usability. But it's still the train that's going forward; if that's where I want to go, that's the train I get on.

LATER: Most Firefox extensions are now invalid! Mirabile dictu! And more remarkably yet, there's no way to tell (at release plus several days as I write this) which extensions are compatible with 1.0. So the process of trying to produce a browser for mass public consumption has taught the Firefox team exactly nothing! Why am I not surprised....

On the plus side, some things (like the Extensions dialog) are no longer cruelly slow. And the installer seems to work rather well under Linux.

The Next Cluetrain Test

Microsoft Windows XP SP2 will be an important, but probably unnoticed, watershed in the progress of the "cluetrain".

I've yet to see a major case where "cluetrain" customer/user-empowerment juju actually had an impact on any company's actions. There are lots of cases on the books of products doing poorly, but the vast majority reflect the same traditional feedback mechanism: The product sucks, people don't use it, the product fails; or, the product is poorly marketed (Coke C2? New Coke?), people don't buy it, the product fails.

A cluetrain feedback loop would be different. It would mean that net-empowered buyers (which doesn't necessarily mean internet-empowered buyers) had acted consciously -- as opposed to passively, by simply not buying -- to make the product fail. That action could be in the form of spreading word of the product's suckfulness via some network; in the pure Cluetrain vision, that network would be a human network, enabled by technology. (Side snark: Which network will shortly be owned and controlled by Google...)

Cluetrain thinking is quite a bit like Marxism or "singularity" theory, in that it presumes the inevitability of something which a little basic observation and some applied knowledge of human nature would tell you is highly unlikely. "But it's emergent," is one common (if foggy) response. "You won't be able to predict the shape of the future from the present." "But from what will it emerge?" would be my response. I've yet to see an "emergent phenomenon" that couldn't be traced back to properties of its culture medium.

So, what the hell does all this have to do with Microsoft? Well, they've decided not to bother doing security patches on IE for anything but XP, once they release SP2. (At least, that's what I think they mean; they might mean they're stopping now.) Many see this as a calculated move to incentivize paid upgrades (XP SP2 won't be free -- it will cost $99 for most XP users). If so, it's a very calculated move, based on the idea that they don't need to care anymore how people feel about Microsoft. It's Rock v. Hard Place. It means they think they're winning the anti-Linux fight (which may well be true).

If the Cluetrain is what its boosters have always said it is, it will stop this, and what's more, it will stop this in a particular way: It will wound Windows XP via Market Forces. Microsoft's sales of SP2 will be poor, Linux and Mac adoption will rise sharply, and Market Forces will drive radical improvements in the usability of Linux desktops. Or MS will "get on the cluetrain" and cancel plans to charge for SP2, at least -- and ideally, continue to distribute updates for Win2K. (Which is a better OS, anyway -- though it doesn't have all those wonderful hooks for MS lock-in...)

Now, I actually think it's pretty likely that XP users will be getting SP2 for free. Whether MS continues to update Win2K is another matter. This will happen because their corporate customers will communicate their profound disappointment, and telegraph a willingness to migrate to Firefox or Googlezilla. Is this a manifestation of the Cluetrain in action? I don't know; if it were, it would tend to support the view that the "cluetrain" is nothing new or emergent, because changing plans based on Big Customer feedback is as old as the PC industry, and is mediated not by networking but by traditional sales channels. And the concession would be exactly as little as Microsoft has to give up to get what they want -- and no more.

Now, all that snark having been levelled, I would love to see MS take a hard line on this. Because it would force the watershed, and make it that much more visible. Such a watershed would place more pressure on the open-source communities to come up with alternatives, whatever those might be. But whether those alternatives are really better and more empowering than Microsoft is another matter. Given the exclusive choice between a joyless overlord despised by most who still knows relatively little about me, and a beloved overlord who knows my every browsing habit, I'll pick the former -- Microsoft -- every time.

Adam Kalsey Dares To See Through The Emperor's Cloak

Adam Kalsey has had the temerity to criticize the Kewl Kidz browser, Firefox, and thinks that maybe, just maybe, aggressively marketing it prior to "1.x" isn't such a good idea: "Aggressively marketing Firefox before it is a completely stable product is dangerous. You're running the risk of having people trying it out and being put off by the bugs, never again to return." [Adam Kalsey, "Why I don't recommend Firefox"]

I agree; in addition, I wonder again why Firefox is being so aggressively marketed in preference to the more stable, more usable, more feature-rich Mozilla. Wait -- I know the answer to that already: It's basically because Firefox is COOL, and Mozilla is NOT COOL. There really are no serious technical reasons -- it's all a matter of how to best herd the cats.

The history on this is worth looking at. Mozilla and Firefox forked sometime in '02, when Firefox was still "Phoenix". The split happened because a small group of developers thought that some of the approaches used in the Mozilla core were wrong-headed, and that everything had to be rebuilt from the ground up to improve performance. They were particularly obsessed with load-time and rendering speed.

Fast forward to 2004: Mozilla still loads faster (though it's slightly -- slightly -- bigger), and renders pages faster. The Mozilla core has been modified to have more or less all the customization hooks that Firefox has. Mozilla is still significantly more usable out of the box. But those kooky Firefox kids have their own bazaar now. Oh, and, yeah, they finally did implement extension management.

In a really objective analysis, there's no strong argument for introducing Firefox to novice users, and as Adam points out, lots of reasons not to. There are also very few sound technical arguments for basing future product development on the Firefox paradigm of "distribute minimal, expect the user to do all the work." The Firefox kidz want their own kewl browser? Fine -- let them go build it, like the Camino team did. Don't turn their misbegotten hacker-bait into the core product. That's a sure way to fail.

Nevertheless, it's abundantly clear at this point that Firefox is the way of the future with regard to the Mozilla Project's browser product, and it's also abundantly clear why: The kidz wouldn't play unless they got to do things their way, and the project needed them.

Whitehouse.Gov's Robot Exclusion File

A friend (who shall remain nameless) just learned about Robot Exclusion Files; these are wide open and you can look at them for a number of very public sites. Being the curious sort, and being particularly mindful of the current administration, it occurred to her to see what happened when she tried to look at the robots.txt file for Whitehouse.gov.

Surprise!

[Since these are paranoid times, I think I should point out pre-emptively that by its very nature, the robots.txt file is intended to be read -- that, in fact, it's read many many many times a day (just not usually by humans). So while the massively secretive and paranoid current occupants of the White House might wish otherwise, there is no conceivable legal reason why I shouldn't be able to look at it. OK?] (The preceding paragraph, and my friend's insistence on anonymity, by the way, are examples of the "chilling effect" in action.)

Typically these things are pretty short. CNN's is an exception, and an educational one. In fact, the four commercial examples I give above all seem to be pretty good examples of when a big site would use them:

  • To exclude highly dynamic content that really just shouldn't be indexed, anyway.
  • To exclude stuff that just plain doesn't need to be indexed, like login pages.
  • To prevent indexing of non-text content like images, audio/video files, and Shockwave movies. (CNN's web geeks have some fun with this. Hell, why not?) A minimal sketch of what such a file looks like, and of how a crawler checks it, follows this list.
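
For anyone who has never peeked at one of these files: robots.txt is just a plain-text list of User-agent and Disallow rules, and reading it programmatically takes a few lines of standard-library Python. The sample rules below are hypothetical, not copied from any of the sites mentioned here.

    # Minimal sketch of how a polite crawler consults robots.txt.
    # The sample rules are hypothetical, not any real site's file.
    from urllib.robotparser import RobotFileParser

    SAMPLE_ROBOTS_TXT = """
    User-agent: *
    Disallow: /search
    Disallow: /login
    Disallow: /images/
    """

    rp = RobotFileParser()
    rp.parse(SAMPLE_ROBOTS_TXT.splitlines())

    # A well-behaved robot asks before fetching each path.
    for path in ("/", "/search?q=anything", "/images/seal.gif", "/login"):
        verdict = "allowed" if rp.can_fetch("*", path) else "disallowed"
        print(path, "->", verdict)

Note that the whole thing runs on the honor system: the file only tells well-behaved robots what the site would prefer they skip, and a human with a browser (or a badly-behaved robot) can ignore it entirely -- which is exactly why parking sensitive URLs there is worse than useless.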

What's immediately interesting to me about the White House's robots.txt is how superficially mundane a lot of the links are -- and how suggestive others are. If you actually plug in some of those links, you'll find that they run the gamut from broken pages to 404s; the broken or ill-formed pages seem to be static, for the most part, and often old. My friend wondered why the list was so long; I set that in the back of my mind, and a much more mundane and plausible answer flashed into my head as I laid my head down last night: Sloppiness. Their web admins are too lazy to set up a sandbox or set passwords, or they don't have the clout to get White House staffers to actually use passwords, so they're opting for security by obscurity.

Maybe someone should tell the Bushites that "security by obscurity" is an oxymoron....nah, let 'em figure it out for themselves.

John Carroll, Master Of FUD

According to John Carroll, alternative browsers will never catch on because they're being promoted by religious zealots who fail to understand that 100% compatibility with obscure IE extensions is critical to browsing success.

His main point (once you wade through an irrelevant and unnecessary retelling of the History of the Browser Wars, as told by the winning side) seems to be that because the Web Standards Project is behind BrowseHappy.com, the entire issue of browser-switching must be purely religious.

The reasoning resembles the common Conservative canard that if a "liberal" has an idea, it must by that token be a bad idea.

The truth of the matter is that there remain no important incompatibilities between IE and any of the four principal "alternative" browsers, except in areas that render Internet Explorer inherently less secure. Carroll should know this, but he apparently refuses to. Instead, he remains reflexively committed to supporting Microsoft. In this regard, he resembles Freepers: Aligned with the biggest kid on the block, for no other apparent reason than that he's the biggest.

I'll be curious to see what his reaction is to the fact that the Opera and Mozilla development teams will be incorporating Apple-driven extensions, intended to improve their ability to serve as application interfaces on a personal computer. These represent an attempt to reach a saner compromise between security and functionality than Microsoft's ActiveX. Macromedia and Adobe are on board (both heavy players in the Mac space); Mozilla (which is used most widely on PCs) will push the architecture to Windows, and KHTML will push it to Unix and Linux. Unless MS successfully embraces and extends (not likely, since they haven't got the plugin muscle to out-compete Macromedia), they'll be stuck playing on a level field. For the first time in a while.

The Next Logical Step in Bootable Media

LaCie are specialists in external storage devices (though they also make excellent flat-panel monitors). They got their start building SCSI drives for Macs and other SCSI-equipped PCs, and then were heavy early adopters of IEEE 1394 (a.k.a. "FireWire" -- still superior to USB 2.0, as far as I'm concerned, but what's a guy gonna do...).

Now they've partnered with MandrakeSoft to package one of their pocket-sized 40GB "portable" drives with a bootable, autoconfiguring version of Mandrake Linux version 10. Called the "Globe Trotter" [ZD Net story / MandrakeSoft product page], this is essentially a lineal descendant of MandrakeMove, and directly analogous to interesting and generally excellent Linux distros like Knoppix. It's designed to be booted from a CD and then auto-configure to use the system's resources.

The advantage of a gizmo like this, or of these bootable CDs, is that they let you carry your own computing environment with you without carrying (or even owning) a computer. With MandrakeMove, you carry just the CD and a one-ounce USB thumb-drive; I've typically also carried around my 20GB Archos external drive, which gives me still more capacity. The Globe Trotter gives you an even more complete computing environment with even more storage space, as well as the ability to easily install new Linux software. With the benign neglect of a helpful librarian, or by just rebooting your office PC for your lunch hour, you can escape the confines of "public" computing environments. This type of device can also be handy for students having to use PCs in computing labs.

(Aside: While you could probably figure out a way to do this with Mac OS X, it would be technically difficult and legally questionable to try it with Windows.)

I'm a huge fan of MandrakeMove, and have been planning to upgrade to their second-generation version; I might just get this instead. True, $219 is a little high for a 40GB drive (even a pocket-sized one), but the markup over the MSRP of the naked 40GB drive is only about $60. So what you have to ask yourself is whether saving that $60 is worth the time and effort of buying the naked drive and installing Linux on it yourself. For many Linux geeks, the answer will be "yes"; more power to them.

Me, I'll think seriously about this, because I've already found more uses for my MandrakeMove CD than I can begin to tell you. For example, it's been hugely useful in filling in for the deficiencies of Windows NT 4, which I still use on one of my systems at the office. The only way I have of making backups is by copying from my old PC to a slightly less old network fileserver. But since this box has USB, I can reboot using my MandrakeMove disc, and then back up my files to my 20GB Archos disk or to my 1GB Lexar thumb-drive.

Of course, there are also less savory uses for this kind of thing, such as bypassing IT policies or serving as a hacker's toolbox. But then, just as a car can be used to transport stolen goods, any of these things can be used for perfectly legitimate purposes, too -- and I, personally, do use them that way.

Your Excuses Are Gone, Browse Happy Already!

Some folks at the Web Standards Project have put together a slick little website called BrowseHappy.com (very Carbon-ish design, if I may say) that highlights the four principal alternative browsers: Firefox, Mozilla, Apple Safari and Opera. I've used all of these but Safari, though I have used recent versions of its close cousin, Konqueror. I can say with confidence that all of them are superior to IE in almost every significant regard.

The one area where they're not superior is in compatibility with applications that rely on IE's idiosyncrasies and proprietary extensions, like Outlook Netmail. (It's a Microsoft application, what do you want?) But for many such applications, the "incompatibilities" can be resolved by simply setting the user-agent string to let you masquerade as IE. (That's a standard feature on Opera and KHTML browsers like Safari and Konqueror, and an easy add-on for Mozilla and Firefox.)

I'm a Mozilla user. Since version 1.1 or so, I've used it for everything that didn't explicitly require Internet Explorer. In that time, it's gotten faster, smaller, more feature-packed, and developed better and better support for web standards. Intentionally or not, this site looks best in Mozilla, since that's what I use when I'm building pages. IE compatibility tweaks are the last step I take for my personal sites.

I'm also an enthusiastic booster of Mozilla Mail, which is baked into Mozilla. It's fast, standards-based, robust, secure, easy to learn, and provides some of the best support I've ever seen for multiple email identities. One big reason that I prefer Mozilla to Firefox is that Mozilla does come with the mail client baked in; to switch to Firefox I'd have to go and get Mozilla Thunderbird as well. I have some general gripes about this arrangement, but overall, either Mozilla or Firefox+Thunderbird is a good, safe, usable combination.

The message: Go for it, folks. Switch to a safe browser. And while you're at it, switch to a safe email client, too.

How I Want To Work, Part I

Here's how I want to work: I want to be able to just note stuff down, wherever I happen to be at that moment, and have it get stored and automatically categorized, and be available for publication wherever I want from wherever I am, whenever I want to. This has been an achievable dream for nearly ten years -- people are constantly hacking together systems to do just that. But we're stuck in a technologically-determined rut that keeps these solutions from being developed.

I've been thinking about these things a lot, and decided it was time that I wrote it all out, to organize my own ideas as much as anything else. So here's part one, where I try to unpack what it is that I'm really asking for, and start to get a sense for what's not working now, and why. So, as a separate story (because they're long, and would push everything down the page and out of sight), here's how I want to work...

How I Want To Work, Part One

[continued from blog entry]

Here's how I want to work: I want to be able to just note stuff down (in my ideal world, wherever I am at that moment) and have it get stored and automatically categorized, organized -- by timestamp, at least, but ideally also in some kind of navigable taxonomy

That Pernicious "Search Is King" Meme

There's an ever-waxing meme out there which basically boils down to this: "Forget about organizing information by subject -- let a full-text search do everything for you." The chief rationale is that such searching will help increase serendipity by locating things across subject boundaries.

Here's the problem: It's a load of crap. It throws the baby out with the bathwater, by discarding one time-honored, effective way of organizing for serendipity in exchange for another, inferior (but sexier) one.

This morning, via Wired News:

"We all have a million file folders and you can't find anything," Jobs said during his keynote speech introducing Tiger, the next iteration of Mac OS X, due next year.

"It's easier to find something from among a billion Web pages with Google than it is to find something on your hard disk," he added.

... which is bullshit, incidentally. At least, it is on my hard drive...

The solution, Jobs said, is a system-wide search engine, Spotlight, which can find information across files and applications, whether it be an e-mail message or a copyright notice attached to a movie clip. "We think it's going to revolutionize the way you use your system," Jobs declared.

In Jobs' scheme, the hierarchy of files and folders is a dreary, outdated metaphor inspired by office filing. In today's communications era, categorized by the daily barrage of new e-mails, websites, pictures and movies, who wants to file when you can simply search? What does it matter where a file is stored, as long as you can find it?

Ah, I see -- the idea of hierarchically organizing data is bad because it's "dreary" and "outdated" -- that is, of course, so quintessentially Jobsian a dismissal that we can be pretty sure the reporter took his words from The Steve, Himself.

But this highlights something important: That this is not a new issue for Jobs, or for a lot of people. Jobs was an early champion (though, let's be clear, not an "innovator") in the cause of shifting to a "document-centric paradigm". The idea was that one ought not have to think about the applications one uses to create documents -- one just ought to create documents, and then make them whatever kind of document one needs. Which, to me, seems a little like not having to care what kind of vehicle you want, when you decide to drive to the night club or go haul manure.

But I digress. This is supposed to be how Macs work, but it's actually not: Macs are just exactly as application-centric as anything else, though it doesn't appear that way at first. The few attempts at removing the application from the paradigm, like ClarisWorks and the early versions of StarOffice (now downstream from OpenOffice), merely emphasized the application-centricity even more: While word processors and spreadsheet software could generally translate single-type documents without much data loss, there was no way that they were going to be able to translate a multi-mode (i.e. word processor plus presentation plus spreadsheet) document from one format to another without significant data loss or mangling.

Take for example, Rael Dornfest, who has stopped sorting his e-mail. Instead of cataloging e-mail messages into neat mailboxes, Dornfest allows his correspondence to accumulate into one giant, unsorted inbox. Whenever Dornfest, an editor at tech publisher O'Reilly and Associates, needs to find something, he simply searches for it.

Again, a problem: It doesn't work. I do the same thing (though I do actually organize into folders -- large single-file email repositories are a data meltdown just waiting to happen). This is a good paradigmatic case, so let's think it through: I want to find out about a business trip to Paris that was being considered a year and a half ago. I search for "trip" and "paris". If my spam folder's excluded from the search, and assuming we're still just talking about email, I'm probably not going to get a lot of hits on The Simple Life 2 or the meta-tags for some other Paris Hilton <ahem!> documentary footage. In fact, unless the office was in Paris, and the emails explicitly used the term "trip", which they may well not, I probably won't find the right emails at all. Or I'll only find part of the thread, and since no email system currently in wide use threads messages, I won't have a good way of following on from there to ensure that I've checked all messages on-topic. (And that could lead into another rant about interaction protocols in business email, but I'll stop for now.)

By contrast, if I've organized my email by project, and I remember when the trip was, I can go directly to the folder where I keep that information and scan messages for the date range in question.
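
(If you want to see how little work that "scan the date range" step actually is, here's a minimal sketch, assuming the mail is archived per project in mbox files. The filename is hypothetical, and the heavy lifting is all in the standard library.)

    # Minimal sketch: list messages from one project mailbox within a date range.
    # Assumes per-project mbox archives; "paris-office-project.mbox" is hypothetical.
    from datetime import datetime, timezone
    from email.utils import parsedate_to_datetime
    import mailbox

    def messages_in_range(mbox_path, start, end):
        """Yield (date, subject) for messages whose Date header falls in [start, end)."""
        for msg in mailbox.mbox(mbox_path):
            raw_date = msg.get("Date")
            if not raw_date:
                continue
            try:
                when = parsedate_to_datetime(raw_date)
            except (TypeError, ValueError):
                continue
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            if start <= when < end:
                yield when, msg.get("Subject", "(no subject)")

    if __name__ == "__main__":
        start = datetime(2003, 1, 1, tzinfo=timezone.utc)
        end = datetime(2003, 7, 1, tzinfo=timezone.utc)
        for when, subject in messages_in_range("paris-office-project.mbox", start, end):
            print(when.date(), subject)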

The key problem here is that search makes you work, whereas with organization, you just have to follow a path. I used to train students on internet searching. This was back in the days when search engines actually let you input Boolean searches (i.e., when you could actually get precise results that hadn't been Googlewhacked into irrelevance). Invariably, students could get useful results faster by using the Yahoo-style directory drill-down, or a combination of directory search and drill-down, than they could through search.

If they wanted to get unexpected results, they were better off searching (at least, with the directory systems we had then and have now -- these aren't library catalogs, after all). And real research is all about looking for unexpected results, after all.

And that leads me to meta data.

Library catalogs achieve serendipity through thesauri and cross-referencing. (Though in the 1980s, the Library of Congress apparently deprecated cross-referencing for reasons of administrative load.)

The only way a system like Spotlight works to achieve serendipitous searching -- and it does, by the accounts I've read -- is through cataloged meta-data. That is, when a file is created, there's a meta-information section of the file that contains things like subject, keywords, copyright statement, ownership, authorship, etc. Which almost nobody ever fills out. Trust me, I'm not making this up: from my own experience, and that of others, I know that people think meta-data is a nuisance. Some software is capable of generating its own meta-data from a document (a toy sketch of that kind of extraction follows the list below), but such schemes have two obvious problems:

  1. They only include the terms in the document -- no synonyms or antonyms or related subjects, and no obvious way of mapping ownership or institutional positioning -- so they're no real help to search.
  2. They only apply to that software, and then only going forward, and then only if people actually use them.
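
To make the first problem concrete, here's a toy sketch of the kind of meta-data a program can generate on its own: nothing but the document's own most frequent terms, minus a stopword list. The filename is hypothetical. Whatever it prints, it will never include a synonym, a broader subject heading, or anything else that isn't literally in the text.

    # Toy keyword "generator": the most frequent non-stopword terms in a document.
    # "some-document.txt" is a hypothetical input file.
    from collections import Counter
    import re

    STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "was", "but", "not"}

    def naive_keywords(text, count=10):
        words = re.findall(r"[a-z]+", text.lower())
        terms = [w for w in words if len(w) > 2 and w not in STOPWORDS]
        return [term for term, _ in Counter(terms).most_common(count)]

    if __name__ == "__main__":
        with open("some-document.txt", encoding="utf-8", errors="ignore") as f:
            print(naive_keywords(f.read()))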

Now, a lot of this is wasted cycles if I take the position that filesystems aren't going away and this really all amounts to marketer wanking. But it's not wasted cycles, if I consider that the words of The Steve, dropped from On High, tend to be taken as the words of God by a community of technorati/digerati who think he's actually an innovator instead of a slick-operating second-mover with a gift for self-promotion and good taste in clothes.

This kind of thinking, in other words, can cause damage. Because people will think it's true, and they'll design things based on the idea that it's true. And since "thought leaders" like Jobs say it's important, people will use these deficient new designs, and I'll be stuck with them.

But there's little that anyone can do about it, really, except stay the course. Keep organizing your files (because otherwise you're going to lose things; trust me on this, I know a little about it). The "true way" to effective knowledge management (if there is one) will always involve a combination of effective search systems (from which I exclude systems like Google's that rely entirely on predictive weighting) with organization and meta-data (yes, I do believe in it, for certain things like automated resource discovery).

Funny, who would have thunk it: The "true way" is balance, as things almost always seem to come out, anyway. You can achieve motion through imbalance, but you cannot achieve progress unless your motions are in harmony -- in dynamic balance, as it were. What a strange concept...

I Want My Faraday Cage

From ZDNet: BAE is developing a smart wallpaper that will block some signals (e.g. WiFi) while allowing others to pass: "BAE says the material is cheap. The company will be developing it commercially through its corporate venture subsidiary."

In my rare fantasies of home-ownership (when I'm not too set in my renter's ways), my ideal cave -- er, I mean, home -- is usually encased in a Faraday Cage, to prevent the neighbors from listening in on my phone conversations and network chatter. (Of course, there is the small matter of being able to listen to the radio or talk on the cell phone...) OK, yes, I already know I'm strange, move along...
