"The future is already here. It's just not well distributed yet."
There's an ever-waxing meme out there which basically boils down to this: "Forget about organizing information by subject -- let a full-text search do everything for you." The chief rationale is that such searching will help increase serendipity by locating things across subject boundaries.
Here's the problem: It's a load of crap. It throws the baby out with the bathwater, by discarding one time-honored, effective way of organizing for serendipity in exchange for another, inferior (but sexier) one.
This morning, via Wired News:
"We all have a million file folders and you can't find anything," Jobs said during his keynote speech introducing Tiger, the next iteration of Mac OS X, due next year.
"It's easier to find something from among a billion Web pages with Google than it is to find something on your hard disk," he added.
... which is bullshit, incidentally. At least, it is on my hard drive...
The solution, Jobs said, is a system-wide search engine, Spotlight, which can find information across files and applications, whether it be an e-mail message or a copyright notice attached to a movie clip. "We think it's going to revolutionize the way you use your system," Jobs declared.
In Jobs' scheme, the hierarchy of files and folders is a dreary, outdated metaphor inspired by office filing. In today's communications era, categorized by the daily barrage of new e-mails, websites, pictures and movies, who wants to file when you can simply search? What does it matter where a file is stored, as long as you can find it?
Ah, I see -- the idea of hierarchically organizing data is bad because it's "dreary" and "outdated" -- that is, of course, so quintessentially Jobsian a dismissal that we can be pretty sure the reporter took his words from The Steve, Himself.
But this highlights something important: That this is not a new issue for Jobs, or for a lot of people. Jobs was an early champion (though, let's be clear, not an "innovator") in the cause of shifting to a "document-centric paradigm". The idea was that one ought not have to think about the applications one uses to create documents -- one just ought to create documents, and then make them whatever kind of document one needs. Which, to me, seems a little like not having to care what kind of vehicle you want, when you decide to drive to the night club or go haul manure.
But I digress. This is supposed to be how Macs work, but it's actually not: Macs are just exactly as application-centric as anything else, though it doesn't appear that way at first. The few attempts at removing the application from the paradigm, like ClarisWorks and the early versions of StarOffice (now downstream from OpenOffice), merely emphasized the application-centricity even more: While word processors and spreadsheet software could generally translate single-type documents without much data loss, there was no way that they were going to be able to translate a multi-mode (i.e. word processor plus presentation plus spreadsheet) document from one format to another without significant data loss or mangling.
Take for example, Rael Dornfest, who has stopped sorting his e-mail. Instead of cataloging e-mail messages into neat mailboxes, Dornfest allows his correspondence to accumulate into one giant, unsorted inbox. Whenever Dornfest, an editor at tech publisher O'Reilly and Associates, needs to find something, he simply searches for it.
Again, a problem: It doesn't work. I do the same thing (though I do actually organize into folders -- large sigle-file email repositories are a data meltdown just waiting to happen). This is a good paradigmatic case, so let's think it through: I want to find out about a business trip to Paris that was being considered a year and a half ago. I search for "trip" and "paris". If my spam folder's blocked, and assuming we're still just talking about email, I'm probably not going to get a lot of hits on Simple Life 2 or the meta-tags for some other Paris Hilton <ahem!> documentary footage. In fact, unless the office was in Paris, and the emails explicitly used the term "trip", which they may well not, I probably won't find the right emails at all. Or I'll only find part of the thread, and since no email system currently in wide use threads messages, I won't have a good way of linking on from there to ensure that I've checked all messages on-topic. (And that could lead into another rant about interaction protocols in business email, but I'll stop for now.)
By contrast, if I've organized my email by project, and I remember when the trip was, I can go directly to the folder where I keep that information and scan messages for the date range in question.
The key problem here is that search makes you work, whereas with organization, you just have to follow a path. I used to train students on internet searching. This was back in the days when search engines actually let you input Boolean searches (i.e., when you could actually get precise results that hadn't been Googlewhacked into irrelevance). Invariably, students could get useful results faster by using the Yahoo-style directory drill-down, or a combination of directory search and drill-down, than they could through search.
If they wanted to get unexpected results, they were better off searching (at least, with the directory systems we had then and have now -- these aren't library catalogs, after all). And real research is all about looking for unexpected results, after all.
And that leads me to meta data.
Library catalogs achieve serenditity through thesaurii and cross referencing. (Though in the 1980s, the LC apparently deprecated cross-referencing for reasons of administrative load.)
The only way a system like Spotlight works to achieve serendipitous searching -- and it does, by the accounts I've read -- is through cataloged meta-data. That is, when a file is created, there's a meta-information section of the file that contains things like subject, keywords, copyright statement, ownership, authorship, etc. Which almost nobody ever fills out. Trust me, I'm not making this up: from my own experience, and that of others, I know that people think meta-data is a nuisance. Some software is capable of generating its own meta-data from a document, but such schemes have two obvious problems:
Now, a lot of this is wasted cycles if I take the position that filesystems aren't going away and this really all amounts to marketer wanking. But it's not wasted cycles, if I consider that the words of The Steve, dropped from On High, tend to be taken as the words of God by a community of technorati/digerati who think he's actually an innovator instead of a slick-operating second-mover with a gift for self-promotion and good taste in clothes.
This kind of thinking, in other words, can cause damage. Because people will think it's true, and they'll design things based on the idea that it's true. And since "thought leaders" like Jobs say it's important, people will use these deficient new designs, and I'll be stuck with them.
But there's little that anyone can do about it, really, except stay the course. Keep organizing your files (because otherwise, you're going to lose things, trust me on this, I know a little about these things). The "true way" to effective knowledge management (if there is one) will always involve a combination of effective search systems (from which I exclude systems like Google's that rely entirely on predictive weighting) with organization and meta-data (yes, I do believe in it, for certain things like automated resource discovery).
Funny, who would have thunk it: The "true way" is balance, as things almost always seem to come out, anyway. You can achieve motion through imbalance, but you cannot achieve progress unless your motions are in harmony -- in dynamic balance, as it were. What a strange concept...