That Pernicious "Search Is King" Meme

There's an ever-waxing meme out there which basically boils down to this: "Forget about organizing information by subject -- let a full-text search do everything for you." The chief rationale is that such searching will help increase serendipity by locating things across subject boundaries.

Here's the problem: It's a load of crap. It throws the baby out with the bathwater, by discarding one time-honored, effective way of organizing for serendipity in exchange for another, inferior (but sexier) one.

This morning, via Wired News:

"We all have a million file folders and you can't find anything," Jobs said during his keynote speech introducing Tiger, the next iteration of Mac OS X, due next year.

"It's easier to find something from among a billion Web pages with Google than it is to find something on your hard disk," he added.

... which is bullshit, incidentally. At least, it is on my hard drive...

The solution, Jobs said, is a system-wide search engine, Spotlight, which can find information across files and applications, whether it be an e-mail message or a copyright notice attached to a movie clip. "We think it's going to revolutionize the way you use your system," Jobs declared.

In Jobs' scheme, the hierarchy of files and folders is a dreary, outdated metaphor inspired by office filing. In today's communications era, categorized by the daily barrage of new e-mails, websites, pictures and movies, who wants to file when you can simply search? What does it matter where a file is stored, as long as you can find it?

Ah, I see -- the idea of hierarchically organizing data is bad because it's "dreary" and "outdated" -- that is, of course, so quintessentially Jobsian a dismissal that we can be pretty sure the reporter took his words from The Steve, Himself.

But this highlights something important: That this is not a new issue for Jobs, or for a lot of people. Jobs was an early champion (though, let's be clear, not an "innovator") in the cause of shifting to a "document-centric paradigm". The idea was that one ought not have to think about the applications one uses to create documents -- one just ought to create documents, and then make them whatever kind of document one needs. Which, to me, seems a little like not having to care what kind of vehicle you want, when you decide to drive to the night club or go haul manure.

But I digress. This is supposed to be how Macs work, but it's actually not: Macs are just exactly as application-centric as anything else, though it doesn't appear that way at first. The few attempts at removing the application from the paradigm, like ClarisWorks and the early versions of StarOffice (now downstream from OpenOffice), merely emphasized the application-centricity even more: While word processors and spreadsheet software could generally translate single-type documents without much data loss, there was no way that they were going to be able to translate a multi-mode (i.e. word processor plus presentation plus spreadsheet) document from one format to another without significant data loss or mangling.

Take for example, Rael Dornfest, who has stopped sorting his e-mail. Instead of cataloging e-mail messages into neat mailboxes, Dornfest allows his correspondence to accumulate into one giant, unsorted inbox. Whenever Dornfest, an editor at tech publisher O'Reilly and Associates, needs to find something, he simply searches for it.

Again, a problem: It doesn't work. I do the same thing (though I do actually organize into folders -- large single-file email repositories are a data meltdown just waiting to happen). This is a good paradigmatic case, so let's think it through: I want to find out about a business trip to Paris that was being considered a year and a half ago. I search for "trip" and "paris". If my spam folder's blocked, and assuming we're still just talking about email, I'm probably not going to get a lot of hits on Simple Life 2 or the meta-tags for some other Paris Hilton <ahem!> documentary footage. In fact, unless the office was in Paris, and the emails explicitly used the term "trip", which they may well not, I probably won't find the right emails at all. Or I'll only find part of the thread, and since no email system currently in wide use threads messages, I won't have a good way of linking on from there to ensure that I've checked all messages on-topic. (And that could lead into another rant about interaction protocols in business email, but I'll stop for now.)

By contrast, if I've organized my email by project, and I remember when the trip was, I can go directly to the folder where I keep that information and scan messages for the date range in question.
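To make the contrast concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: the Message record, the folder names, the dates and the message text are all hypothetical, and no real mail client works exactly this way. It just shows a flat keyword search next to a folder-plus-date-range scan over the same mailbox.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Message:
    folder: str   # hypothetical project folder the message was filed under
    sent: date
    subject: str
    body: str


def keyword_search(messages, terms):
    """Return messages whose subject or body contains every search term."""
    hits = []
    for msg in messages:
        text = f"{msg.subject} {msg.body}".lower()
        if all(term.lower() in text for term in terms):
            hits.append(msg)
    return hits


def browse_folder(messages, folder, start, end):
    """Return messages filed in one folder within a date range, oldest first."""
    hits = [m for m in messages if m.folder == folder and start <= m.sent <= end]
    return sorted(hits, key=lambda m: m.sent)


# Hypothetical mailbox: three messages about the same trip (only one of
# which says so explicitly), plus one piece of unrelated noise.
MAILBOX = [
    Message("acme-rollout", date(2003, 1, 14), "Trip to Paris?",
            "Should we plan a trip to Paris in March?"),
    Message("acme-rollout", date(2003, 1, 15), "Re: travel budget",
            "Flights to CDG are cheap that week."),
    Message("acme-rollout", date(2003, 1, 17), "Re: travel budget",
            "Let's hold off until the client confirms."),
    Message("newsletter", date(2003, 1, 16), "Weekly digest",
            "Paris fashion week and trip deals inside."),
]

searched = keyword_search(MAILBOX, ["trip", "paris"])
browsed = browse_folder(MAILBOX, "acme-rollout", date(2003, 1, 1), date(2003, 1, 31))
```

In this toy mailbox the search comes back with the first message and the newsletter, and misses the two follow-ups that never use either word. The folder scan returns all three project messages in order. It's a rigged example, of course, but it's rigged the way real email threads tend to be.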

The key problem here is that search makes you work, whereas with organization, you just have to follow a path. I used to train students on internet searching. This was back in the days when search engines actually let you input Boolean searches (i.e., when you could actually get precise results that hadn't been Googlewhacked into irrelevance). Invariably, students could get useful results faster by using the Yahoo-style directory drill-down, or a combination of directory search and drill-down, than they could through search.

If they wanted to get unexpected results, they were better off searching (at least, with the directory systems we had then and have now -- these aren't library catalogs, after all). And real research is all about looking for unexpected results.

And that leads me to meta-data.

Library catalogs achieve serendipity through thesauri and cross-referencing. (Though in the 1980s, the LC apparently deprecated cross-referencing for reasons of administrative load.)
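For anyone who hasn't worked with a catalog thesaurus, here's a rough sketch of the idea in Python. The THESAURUS table and its entries are invented for illustration (a real controlled vocabulary is far larger and is maintained by catalogers), but the mechanism is the same: each search term is widened to include its "see also" terms before matching.

```python
# Hypothetical thesaurus: each heading maps to its "see also" terms.
THESAURUS = {
    "trip": {"travel", "journey", "visit"},
    "paris": {"france", "cdg"},
}


def expand_query(terms, thesaurus):
    """Return one set of acceptable words per term: the term plus its cross-references."""
    return [{t.lower()} | thesaurus.get(t.lower(), set()) for t in terms]


def matches(text, expanded):
    """True if the text contains at least one word from every expanded group."""
    words = set(text.lower().split())
    return all(group & words for group in expanded)


note = "Flights to CDG for the client visit"
with_thesaurus = matches(note, expand_query(["trip", "paris"], THESAURUS))
without_thesaurus = matches(note, expand_query(["trip", "paris"], {}))
```

With the cross-references, the note matches a search for "trip" and "paris" even though it contains neither word; without them, it doesn't. The catch is that somebody has to build and maintain the table.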

The only way a system like Spotlight works to achieve serendipitous searching -- and it does, by the accounts I've read -- is through cataloged meta-data. That is, when a file is created, there's a meta-information section of the file that contains things like subject, keywords, copyright statement, ownership, authorship, etc. Which almost nobody ever fills out. Trust me, I'm not making this up: from my own experience, and that of others, I know that people think meta-data is a nuisance. Some software is capable of generating its own meta-data from a document, but such schemes have two obvious problems:

  1. They only include the terms in the document -- no synonyms or antonyms or related subjects, and no obvious way of mapping ownership or institutional positioning -- so they're no real help to search.
  2. They only apply to that software, and then only going forward, and then only if people actually use them.
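The first problem is easy to demonstrate. Here's a deliberately naive keyword generator in Python -- my own sketch, not the scheme any particular product uses, with a hypothetical stopword list and sample text. It counts the words in a document and keeps the most frequent ones.

```python
import re
from collections import Counter

# Hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "for", "is", "that", "we", "on"}


def extract_keywords(text, limit=5):
    """Return the most frequent non-stopword terms found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


doc = ("Flights to CDG are cheap that week. "
       "Book the flights and the hotel near the client office.")
keywords = extract_keywords(doc)
```

Every keyword it produces is a word that was already in the document. The message is about a trip to Paris, but neither "trip" nor "paris" can show up in the generated meta-data, because neither appears in the text. A full-text search would have found those same words anyway, so the generated keywords add nothing.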

Now, a lot of this is wasted cycles if I take the position that filesystems aren't going away and this really all amounts to marketer wanking. But it's not wasted cycles, if I consider that the words of The Steve, dropped from On High, tend to be taken as the words of God by a community of technorati/digerati who think he's actually an innovator instead of a slick-operating second-mover with a gift for self-promotion and good taste in clothes.

This kind of thinking, in other words, can cause damage. Because people will think it's true, and they'll design things based on the idea that it's true. And since "thought leaders" like Jobs say it's important, people will use these deficient new designs, and I'll be stuck with them.

But there's little that anyone can do about it, really, except stay the course. Keep organizing your files (because otherwise, you're going to lose things, trust me on this, I know a little about these things). The "true way" to effective knowledge management (if there is one) will always involve a combination of effective search systems (from which I exclude systems like Google's that rely entirely on predictive weighting) with organization and meta-data (yes, I do believe in it, for certain things like automated resource discovery).

Funny, who would have thunk it: The "true way" is balance, as things almost always seem to come out, anyway. You can achieve motion through imbalance, but you cannot achieve progress unless your motions are in harmony -- in dynamic balance, as it were. What a strange concept...

Comments

escoles wrote:

The "true way" to effective knowledge management (if there is one) will always involve a combination of effective search systems (from which I exclude systems like Google's that rely entirely on predictive weighting) with organization and meta-data (yes, I do believe in it, for certain things like automated resource discovery).

Very interesting post, escoles. So, if I may ask, which current search systems do you consider effective?

Depends on what you're trying to do. For web search, no one has the resources to spider as deeply as Google. And my issues with the Google "algorithm" (which is really not one algorithm, but many, incorporated into a business workflow and product) are complex and warrant a long detailed rant -- suffice it to say that the only pure-search providers that I bother with are Teoma (which is another window onto the same database as Ask Jeeves) and Google. For directory search, Yahoo usually comes first (since that's my home page) and then OpenDirectory (http://dmoz.org) -- the latter has the [dis/]advantage of a less formal editing structure. For blog subjects, which is a pretty specialized application, I usually run to Daypop (they're poor, so their servers get slashdotted easily). That's just because I'm used to them; Technorati gets a ping whenever I update (think of it as reverse spidering); there are others.

But the real question here is how do we find the things that we work on. If I'm working on projects, I organize the emails into project-related folders, or by folders for that client (things often get muddled together within a client). That way, I open the folder and scan by date and subject line. So many of the work-related discussions that I get involved in via email go off on tangents, or use unsearchable terms. So dumping into a flat box and searching it would just never work for me, and I fail to understand how it could work for anyone unless they're doing repetitive searches, one after another.

It's my experience, from watching people in offices and trying different things myself, that people often settle on ways of organizing their work that "just work" or "work for them" that're actually really, really inefficient. Like this single-mailbox thing that Rael Dornfest does. How can anyone possibly find such a system productive?

Well, I think the answer, more often than people want to admit, is: They don't. Because they don't actually use those methods that they claim to use.

I have a friend who claims to do something similar with his bookmark list. He claims that he always bookmarks by just adding a bookmark to the list, and he never organizes his bookmarks by subject. He just does a simple text search to find what he's looking for. But when I take apart what he's doing, that's not what he actually does. He presents it as a simple process, but he's actually doing several things he doesn't tell you about until you dig into it:

  1. Press ctrl-D to bookmark the page (he uses Mozilla)
  2. Retype the subject to something he can search on
  3. Later, periodically, he sorts into one layer of bookmark folders.

Here's what I do:

  1. Click on the icon next to the URL
  2. Drag it to the Bookmarks icon on the "personal toolbar" (I also use Mozilla), which expands the bookmark tree
  3. Drag it to the appropriate folder -- the tree expands as you drag -- and drop it.

You can see that part of the conflict is interaction paradigm. Bill does everything by keyboard; I've moved to doing a number of things by mouse (i.e., using a spatial interaction metaphor). It's true, it's difficult to do what Bill wants by keyboard in Mozilla -- or at all, for that matter. But it's easy with a spatial interaction metaphor.

To find bookmarks, Bill switches to Netscape 4.x and opens its bookmark manager, which lets him search in ways that he prefers. (I don't quarrel with that; if you have to find bookmarks by searching -- e.g., if you organize by only very broad subjects -- then, again, Mozilla is not as useful, because its bookmark manager returns inferior search results.)

Here's what I do:

  1. Click on Bookmarks
  2. Walk through the tree looking for something like what I want to know.
  3. Click on it.

Again, I've exchanged the keyboard metaphor for a spatial one. Or, perhaps, I've added the spatial metaphor to the keyboard, character or "digital" interaction medium. It's not really sufficient to call it "character-based", because that's not really what it's about. It's really about something else -- about issuing commands that are executed, and those commands happen to be issued as key combinations.

Part of the reason that Bill prefers this method is probably that he's a musician. He's not a brilliant musician, but music penetrates so deeply into his worldview that you really can't understand him until you know that. Heaven forbid you should have something to say to him while he's playing the piano; he won't be able to hear you (or, really, won't be able to process what you're saying to him).

So... yes, there really was a point to all of this... it's that there isn't one way that works. People like Steve Jobs and yes, my friend Bill, think there is one "reasonable" way to do things like this, and really, the ways that they champion are rooted in their own subjective view of things. And they are largely unaware of this.

Which, of course, doesn't really answer your question, I think...

ASIDE: Gotta fix this damn layer problem -- can't click on any of the links right now...

escoles wrote:

Depends on what you're trying to do. For web search, no one has the resources to spider as deeply as Google.

Well, truly, escoles, web search is probably my greatest reason for searching. I'll have much fun exploring the search engines you've provided. Thank you so much! And, yes, I am able to click on the links. I had that problem early on until I changed my theme setting (ooh, that "horrid" Internet Explorer -- I know that should elicit another rant). I have no problem clicking on links now once I sign in. Some other people coming to this site who don't have accounts and view the default theme setting will undoubtedly not be able to click on all the links though.

Thanks also for the bookmarking example. I have so much to learn.

escoles wrote:

Part of the reason that Bill prefers this method is probably that he's a musician. He's not a brilliant musician, but music penetrates so deeply into his worldview that you really can't understand him until you know that. Heaven forbid you should have something to say to him while he's playing the piano; he won't be able to hear you (or, really, won't be able to process what you're saying to him). So... yes, there really was a point to all of this... it's that there isn't one way that works.

Well, yes, that's the upshot, although I'm not sure where I would fit in. As you know, music "penetrates" deeply in my worldview also, but I'm not sure you really understand me, escoles. LOL! I can fully relate to not completely hearing or processing a conversation if I'm deeply involved in the musical experience while singing or playing the piano. I'm transported to another realm, as it were...

Peggy wrote:

Some other people coming to this site who don't have accounts and view the default theme setting will undoubtedly not be able to click on all the links though.

Oooh! What a surprise -- you fixed it! No problem with links now. Very professional layout, escoles. Thank you!
