A Manifesto for Tags

September 19th, 2006

Ask Skeptic’s Mom doesn’t get tags at all. I don’t blame her. I think it’s time for a… Manifesto. Or at least an attempt to explain tagging to somebody who is new to this way of thinking.

In the beginning there was the librarian. Wise, unnoticed by members of the opposite sex and mostly under-paid, librarians did their Good Work. However, by the end of the 19th century there were a great many books that needed to be organised, and no agreed-upon way of organising them. Enter the hero of this introduction, Melvil Dewey, who invented the Dewey Decimal Classification system that we all know and love from our fine public libraries today.

The DDC (as it’s known to its friends) is a specific form of something we like to call a Taxonomy, or, if we’re feeling particularly philosophical, an Ontology. The purpose is simple: how can we take a large pile of books (or indeed any resource) and put them into some order so that when we need to recover an item we can do so easily? Further, would it not be useful for all items of a similar nature to be grouped together in the same place?

Without such a system, public libraries would be even more chaotic, noisy and party-like than they are today. We would be clambering over the beer kegs asking the jocks if they knew where that volume on architecture up to circa 300 AD was. Would they know? They wouldn’t answer, as they do now, “Dude, 722 - don’t you even know your DDC codes, you doofus?!” whilst whipping us with a towel. No, they would merely look confused.

On meandering over to section 722, we would surely find the book we were looking for, but behold! We would also find lots of other books on the same subject that we didn’t even know we were looking for in the first place! We would be standing in one place, looking at the entire majesty of the resources the local library has managed to put before us in section 722 - all two of them! OK, so public libraries are under-funded, but that’s not the point. Without the DDC, or some other taxonomy to replace it, the two books would never be near each other; we might find one, but never the other.

This is one reason the classification of resources is so important. It not only helps us find the information we’re looking for, but also groups related items together so that their ‘neighbours’ are easy to find.

Great, so what do we need tags for?

Well, the problem with any taxonomy or ontology is scope. How do you create a taxonomy so large that it can take absolutely anything humanity can come up with? The short answer is: generalise, and don’t try to be too specific. If you have a book called “How to Hunt & Cook Pigeons”, it probably belongs in 799 (hunting), 598 (birds) or 641 (food & drink). But wait, you have only one copy of the book: which section does it go in? Or do we create a new category specifically about cooking hunted birds?

This is the problem with taxonomies. At some level they need to be very basic so that they can be easily understood and referenced. However, make them too general and over-arching and you find that some things need to fit into multiple categories, or that you need to create whole new categories every time something comes along that doesn’t quite fit. What’s more, it can all be quite subjective: is shooting pigeons and cooking them hunting, or sport, or survivalism, or is it just cruel and barbaric and not the sort of thing you should have in public libraries? Who decides?

The problem gets harder when you try to get information out so you can find the book later. You may now need to navigate three different branches of the hierarchy to discover what you’re looking for, each time choosing a branch that might turn out to be wrong.

What’s more, whilst it’s nice to find neighbours, what if I want to find neighbours that don’t match the way the taxonomy was organised? What if I want a collection of books on cooking birds of all types, not just pigeons? I might need to go to several different places. What if I’m just interested in pigeons in general? Do I miss out by looking only in the birds section, not knowing there is a book I might be interested in over in the hunting or cookery section?
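To make the single-shelf problem concrete, here is a minimal Python sketch of a strict taxonomy. It assumes a simplified model in which each of the three Dewey classes mentioned above is one shelf and a book can be filed under exactly one of them; the `file_book` and `books_about` helpers, and every title other than the pigeon book, are hypothetical, made up for illustration.

```python
# A strict taxonomy: every book is filed under exactly one class.
# Class numbers follow the Dewey examples in the text; the shelf
# contents are hypothetical, for illustration only.
shelves: dict[int, list[str]] = {
    799: [],  # hunting
    598: [],  # birds
    641: [],  # food & drink
}


def file_book(title: str, ddc_class: int) -> None:
    """File a book under a single class; there is no second slot."""
    shelves[ddc_class].append(title)


file_book("How to Hunt & Cook Pigeons", 799)  # someone had to pick one
file_book("Pigeons of the World", 598)        # hypothetical title
file_book("Roasting Game Birds", 641)         # hypothetical title


def books_about(word: str, ddc_class: int) -> list[str]:
    """Browse a single shelf, the only way a strict taxonomy lets you look."""
    return [t for t in shelves[ddc_class] if word.lower() in t.lower()]


# A pigeon enthusiast browsing the birds shelf misses the cookbook.
print(books_about("pigeon", 598))  # ['Pigeons of the World']
```

Whichever class the pigeon book is filed under, readers browsing the other two shelves will not see it.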

It gets worse when you realise that you could be dealing with not just a few thousand items in a library, but the entire sum of human knowledge: every document, photo, film, sound recording, computer program and physical object. Imagine trying to classify that lot, and then later find everything in it related to pigeons, cooking and hunting.

Enter our new much-hyped, but little-understood hero: tagging.

The purpose of tagging is to replace taxonomies. We want to do this for lots of reasons, including:

  • We don’t want to have to worry about where we put stuff into the system. We want to mark the item up without having to spend an hour - or decade - debating which part of the taxonomy it belongs in.
  • We want to know it can be easily retrieved by those who may be interested in finding it at a later date.
  • We want to be able to easily find ‘neighbours’ even if they belong in a traditionally unrelated taxonomy.

Let’s look at the lifetime of a tagged object, our now familiar book on cooking pigeons. We have a book that we’re going to enter into our database with tags. We decide just a few tags will be sufficient:

cookery, birds, pigeon, hunting, book

If it’s a digital book, we would attach the file to this ‘record’ now; otherwise we might just point to a shelf location. Note that we have not referred to any taxonomy here. We’ve just put the data into the system and moved on to entering our book on architecture in the 2nd century. No debates, no discussions, no new classifications needed for a quirky book. It’s just been put in the system.
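Under the hood, one common way to store tagged records (an assumption here, not a description of any particular product) is an inverted index: a map from each tag to the set of record IDs carrying it. The Python sketch below uses a hypothetical `TagStore` class and a made-up shelf location; adding a record just adds its ID to the set for each of its tags, so nothing has to be decided about where the record ‘lives’.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical tag store: records plus an inverted index tag -> IDs."""

    def __init__(self) -> None:
        self.records: dict[int, dict] = {}
        self.index: defaultdict[str, set[int]] = defaultdict(set)
        self._next_id = 1

    def add(self, title: str, tags: set[str], location: str = "") -> int:
        """Store a record and index it under every tag; no taxonomy needed."""
        record_id = self._next_id
        self._next_id += 1
        self.records[record_id] = {
            "title": title,
            "tags": set(tags),
            "location": location,  # a shelf, or a file for a digital book
        }
        for tag in tags:
            self.index[tag].add(record_id)
        return record_id


store = TagStore()
pigeons = store.add("How to Hunt & Cook Pigeons",
                    {"cookery", "birds", "pigeon", "hunting", "book"},
                    location="shelf 12")  # hypothetical location
store.add("Architecture in the 2nd Century", {"architecture", "history", "book"})
print(sorted(store.index["book"]))  # [1, 2]
```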

Now comes the important bit: getting the data back out.

Our first custodian stumbles in, scratching his beard and thinking about doing some shooting for dinner tonight. He walks to the console and types in ‘cookery’ and ‘hunting’ as tags to search for. He gets hits for hundreds of books, and notices our book on pigeons. He selects it, looks around, and then searches for ‘cookery’ and ‘pigeon’ instead, to see if there are any other useful guides in this library. Vegetarians the world over will be pleased to hear that there aren’t, but when he de-selects ‘cookery’ and has just ‘pigeon’ on screen, he is reminded, by the billions of pictures of cute pigeons now in front of him, that this is but a mere mortal beast worthy of his mercy.

Our second custodian is thinking of doing some game hunting this weekend, but has no idea what to do with his catch. He selects ‘birds’ and ‘hunting’. Again, a selection of titles comes up, including our pigeon book, which he might otherwise have missed.
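Each of these searches boils down to set intersection: a record matches only if it carries every selected tag. A minimal Python sketch, assuming a small in-memory index from tag to titles (the `search` function and the titles other than the pigeon book are hypothetical):

```python
# Hypothetical inverted index: tag -> set of titles carrying that tag.
index: dict[str, set[str]] = {
    "cookery": {"How to Hunt & Cook Pigeons", "Vegetable Stews"},
    "hunting": {"How to Hunt & Cook Pigeons", "Deer Stalking"},
    "birds": {"How to Hunt & Cook Pigeons", "Pigeons of the World"},
    "pigeon": {"How to Hunt & Cook Pigeons", "Pigeons of the World"},
}


def search(*tags: str) -> set[str]:
    """Return titles carrying ALL the given tags (an AND query)."""
    if not tags:
        return set()
    # Copy the first set so that queries never modify the index.
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


print(sorted(search("cookery", "hunting")))  # first custodian
print(sorted(search("cookery", "pigeon")))   # narrowing to pigeons
print(sorted(search("pigeon")))              # 'cookery' de-selected
print(sorted(search("birds", "hunting")))    # second custodian
```

Adding a tag can only shrink the result set and removing one can only widen it, which is exactly the narrowing and widening the custodians do at the console.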

Our third custodian is an animal-rights protestor. She means business. Pushing our first custodian out of the way, she searches for ‘hunting’ and is bewildered by her choices until she notices ‘pigeon’ in very small letters in the tag cloud. Mortified, she then discovers the option for ‘cookery’ and decides to create a list of all books tagged ‘cookery’ and ‘birds’ to include in her letter to the chief librarian.
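The tag cloud the third custodian squints at can be derived from the same data: count how often each tag appears among the current results and scale its font size by that count, so a rare tag like ‘pigeon’ comes out small. The Python sketch below assumes a linear scale between two arbitrary pixel sizes and a made-up set of ‘hunting’ results; the `tag_cloud` helper is hypothetical, and some implementations use a logarithmic scale instead.

```python
from collections import Counter

# Hypothetical results of a 'hunting' search: title -> tags.
results: dict[str, set[str]] = {
    "Deer Stalking": {"hunting", "deer", "book"},
    "Wild Boar Country": {"hunting", "boar", "book"},
    "The Shooting Season": {"hunting", "deer", "book"},
    "How to Hunt & Cook Pigeons": {"cookery", "birds", "pigeon", "hunting", "book"},
}


def tag_cloud(results: dict[str, set[str]],
              min_px: int = 10, max_px: int = 32) -> dict[str, int]:
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tag for tags in results.values() for tag in tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts match
    return {tag: min_px + (n - lo) * (max_px - min_px) // span
            for tag, n in counts.items()}


cloud = tag_cloud(results)
print(cloud["hunting"], cloud["pigeon"])  # 32 10
```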

The point, you see, is retrieval. We don’t all think in terms of a taxonomy. Creators will create things that don’t fit into a category, and people will be able to find items, and their neighbours, that don’t belong together in the scheme devised by Dewey (or any other scheme, for that matter). Taxonomies handle this particularly badly, because ‘neighbours’ are really subjective: they depend on the context of the person doing the looking, rather than being objectively determined at the point of classification, as the Dewey view assumes.
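Because neighbours in a tagged system are worked out at query time rather than fixed on a shelf, one simple way to find them is to rank other items by how many tags they share with the one in hand. This Python sketch uses Jaccard similarity (shared tags divided by the combined set of tags); that measure, the catalogue and the `neighbours` helper are assumptions chosen for illustration, not something tagging prescribes.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tags divided by the union of both tag sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical catalogue: title -> tags.
catalogue: dict[str, set[str]] = {
    "How to Hunt & Cook Pigeons": {"cookery", "birds", "pigeon", "hunting", "book"},
    "Roast Duck at Home": {"cookery", "birds", "duck", "book"},
    "Pigeons of the World": {"birds", "pigeon", "book"},
    "Gothic Cathedrals": {"architecture", "history", "book"},
}


def neighbours(title: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k other titles whose tags overlap most with this one."""
    mine = catalogue[title]
    scored = [(other, jaccard(mine, tags))
              for other, tags in catalogue.items() if other != title]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for other, score in neighbours("Pigeons of the World"):
    print(f"{score:.2f}  {other}")
```

Starting from a general book on pigeons, the top neighbours are the pigeon cookbook and a duck cookbook, items a taxonomy would have shelved in different sections.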

This becomes particularly useful when you consider that if all content on the Internet is tagged (or at least tagged with metadata ‘keywords’), we can build search engines that are far more powerful and user-friendly than any taxonomy.

Google effectively does this for us already. In fact, Google beat the ‘taxonomy’ approach that Yahoo and others were pushing before it arrived by offering ‘keyword search’ (which is just another way of saying ‘tag search’) - but we don’t realise or recognise it.

The fact that we’re now bringing tagging into the foreground of the application in the Web 2.0 age should be something users rejoice over, rather than reject. Imagine, however, how they would feel if Google allowed only one word in its search box, and to find a particular cookery web page you had to search for ‘cookery’ and then browse through every result until you found the one you wanted.

The downside, of course, is that most current interfaces for tags only allow the selection of one tag at a time. Very few let us find intersections of tags. It is no good being able to find all books tagged ‘pigeon’ OR all items tagged ‘cookery’ - we need to find the cross-over. Current web applications have reached the first stage, but it is when they reach the next, and all content on the net has been tagged, that we will truly understand the power of tagging.
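The gap between OR and intersection is easy to see with plain sets. In this Python sketch (another hypothetical catalogue; `tagged` stands in for a one-tag-at-a-time interface), the union of ‘pigeon’ and ‘cookery’ returns every pigeon book and every cookbook, while the intersection returns only the cross-over.

```python
# Hypothetical catalogue: title -> tags.
catalogue: dict[str, set[str]] = {
    "How to Hunt & Cook Pigeons": {"cookery", "birds", "pigeon", "hunting"},
    "Pigeons of the World": {"birds", "pigeon"},
    "Vegetable Stews": {"cookery", "vegetarian"},
    "Roast Duck at Home": {"cookery", "birds", "duck"},
}


def tagged(tag: str) -> set[str]:
    """All titles carrying a single tag, as a one-tag interface offers."""
    return {title for title, tags in catalogue.items() if tag in tags}


either = tagged("pigeon") | tagged("cookery")  # OR: union of two lists
both = tagged("pigeon") & tagged("cookery")    # AND: the cross-over
print(len(either), sorted(both))  # 4 ['How to Hunt & Cook Pigeons']
```

With only the one-tag view, a user has to compute that intersection by eye, which is exactly the browsing the Google comparison warns against.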