You are reading a blog - Innovation in Software - no longer under active maintenance. These pages are kept here for archive purposes. If you wish to find out more about Vagueware please read our current website which will include links to the new blogs when live.
Kagtum: An Announcement
People have been asking about Kagtum for some time now. About 3 years, actually. When will it launch? When can we see it? Much excitement abounds, and I’ve wanted to provide firm answers from the start, but for various reasons – mainly caused by a lack of resource to throw at the development which has meant it’s been a “spare time” project – it has been constantly delayed.
As it stands right now, Kagtum could go to private beta in about a fortnight. It would be rough around the edges, half the features I want it to have would be turned off, but it would offer the ability to read news stories and have the system work out (based on where, who and what you are), what stories should appear on your front page each day. It would offer an idea of what the technology is capable of, and I believe garner interest and audience from day one.
However, a clanger has emerged. It’s something I’ve been dismissive of, simply because I didn’t believe anybody would be so stupid as to go through with it as a plan. I now think this idiocy is going to be copied by the rest of the industry, and that causes some real problems.
Yesterday, Rupert Murdoch announced that his review into pay-wall publishing was over and that all News International/News Corporation titles will go behind a pay-wall. By the end of the year, if you want to read a story from The Times, The Sun or the News of the World, or indeed any of the hundreds of other newspapers owned by the empire, you’ll have to pay. I’ve already explained the stupidity in another article, so let me just point you to the bit from the Guardian article on this that I knew might be coming, but hoped would be delayed:
He accepted that there could be a need for furious litigation to prevent stories and photographs being copied elsewhere: “We’ll be asserting our copyright at every point.”
That kills the current Kagtum model, dead.
I have no idea how litigious this could get, but at the very minimum Kagtum needs to read in a story, look at every word and word group (it recognises “swine flu pandemic” as one tag common to many stories without prompting, even when it appears in a paragraph), store it, and provide a link to the original article with the original headline.
Until it’s clear whether that’s acceptable to News Corporation and family, I can’t process any of their newspaper articles in a publicly available system: I simply don’t have the cash to go to court over this.
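As a rough illustration of the word-group idea, here is a minimal Python sketch, not Kagtum's actual code. It assumes a "tag" is any multi-word phrase that turns up in at least two stories; the function names `ngrams` and `shared_tags` and the thresholds are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, max_n=3):
    """Return the set of 1..max_n word groups found in a piece of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def shared_tags(stories, min_stories=2, min_words=2):
    """Word groups of at least min_words words seen in at least min_stories stories."""
    counts = Counter()
    for story in stories:
        # ngrams() returns a set, so each story counts a phrase only once.
        counts.update(ngrams(story))
    return {
        gram
        for gram, seen_in in counts.items()
        if seen_in >= min_stories and len(gram.split()) >= min_words
    }


stories = [
    "Fears grow over swine flu pandemic in Europe",
    "WHO declares swine flu pandemic",
]
tags = shared_tags(stories)
```

A real system would need stop-word handling and some way to prefer the longest phrase over its fragments ("swine flu" and "flu pandemic" are also returned here), but the principle of spotting phrases common to many stories without prompting is the same.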
So, all News Corporation media (including websites affiliated with magazines, newspapers, TV channels and other media they own) is now to be excluded from Kagtum until the legal position is clarified. Anybody else who puts up a paywall will also get excluded.
That has a few knock-on effects:
- It makes maintenance of the platform much more difficult as suddenly we’re dealing with websites that explicitly do not want to be linked to – something so alien to the spirit of the web, I’m still confused as to how anybody ever thought it was a good idea
- The quality of the product for the user is decreased and therefore the take-up rate of Kagtum will suffer
- People who pay for access to one newspaper will end up in the echo chamber Kagtum is specifically designed to avoid
So, what to do? Throw everything away and regret not proving the model and the technology? Hope NC change their minds, or watch them go bust? Hope people won’t mind lack of access to NC content? Or get the technology launched now and plead ignorance in court when I get sued to high-heaven?
None of the above are viable in the long-term. Being sued isn’t even viable in the short-term.
I’ve had time to think about this for a while, and stand by my arguments. I still genuinely think that pay-walls are bad for newspapers, bad for customers, and ultimately bad for democracy. I also think this is the nail in the coffin for the old media empires – News Corporation have just provided a way for the rest of the industry to kill them off permanently. However, market realities do not often take into account philosophy essays, and the situation is as it is.
If News Corporation are open to systems like Kagtum and competitors scraping data and linking to them, we can re-consider. It’s pretty easy for the site to ask you which titles you have a paid subscription for, so you don’t get links to content you can’t read and you do get links for content you like enough to hand over cash for. However without the ability to read the data in the first place, it can’t be grouped or targeted. Grouping and targeting content is all Kagtum really does, and unless it can do that freely and easily it can’t even reference that content.
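The subscription question is the easy part. A minimal Python sketch, assuming each indexed story records which title it came from and whether that title is behind a pay-wall (the `Story` fields and `visible_stories` are hypothetical names, not anything from Kagtum):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    headline: str
    url: str
    title: str       # the publication, e.g. "The Times"
    paywalled: bool  # assumed to be known at index time


def visible_stories(stories, subscriptions):
    """Keep free stories plus paid stories from titles the user subscribes to."""
    return [s for s in stories if not s.paywalled or s.title in subscriptions]


stories = [
    Story("Free story", "http://example.com/1", "Example Gazette", False),
    Story("Paid story", "http://example.com/2", "The Times", True),
    Story("Other paid story", "http://example.com/3", "The Sun", True),
]
front_page = visible_stories(stories, subscriptions={"The Times"})
```

The filter only works on stories that have already been read and indexed, which is exactly the step the pay-wall terms put in doubt.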
There is the ability to do something else with the technology, news related but perhaps softer in its definition of news. There are toolkits kicking around on my hard drive for analysing e-mail, RSS and other sources of information in order to help prioritise content, and that might be one direction to go. Building an application for your phone or desktop that does what Kagtum did, but on your machine with your subscription parameters might even work. I’m also intrigued about the prospect of a different kind of UGC-based news portal that doesn’t have the “all stories are equal” issues of Wikinews or the strong bias of Indymedia.
In essence, I’m having a re-think. I genuinely thought NC would abandon their plans by now. As recently as last week I had a drink with one journalist friend and suggested Kagtum would launch in August, and I believed it – I just did not accept this plan was going to happen. My mistake. Sorry, Sarah, I called it wrong (again).
I expect my next announcement will be firmer, but right now I need to brainstorm my options and let them brew a while.
Kagtum is dead, long live Kagtum!


Why don’t you just not include content that’s behind pay-walls? It’s not like it’s going to be readable by most users anyway. The more sites that link to free content and the less to paid content the more valuable free content will become.
Will Jessop
6 Aug 09 at 12:36
Will: for the reasons I explain in the article. It makes maintenance harder, take-up slower, and the product worse quality. It would be like Google only indexing a few websites rather than lots – News Corp don’t just own a few websites, they own *a lot*, and I fully expect in the coming weeks for their competitors to follow suit.
Of course, NC’s competitors could laugh it all off, NC could fail, lots of other things could happen which would mean the current build could ship.
All I’m saying is it won’t ship right now until I’ve thought this through, and it’s likely a different kind of tool needs to ship.
Paul Robinson
6 Aug 09 at 13:00
I am confused about the maintenance issue, surely if the content is behind a pay-wall you can’t index it anyway? Unless there are some implementation issues I don’t know about.
Will Jessop
6 Aug 09 at 13:05
Couldn’t you batten-down and wait this out?
It’s pretty clear that the pay-wall model won’t be successful, even in the short term. Even something as slow-moving as News Corp will realise sooner rather than later that the only people benefiting are lawyers.
Not knowing enough about Kagtum (though I am eager to get my hands on it), could the technology be re-targeted at another source of news? Blog networks? Microblog entries?
Jon Atkinson
6 Aug 09 at 13:12
Ah, yes that’s confusing unless you have heard the rumours around how they currently plan to pay-wall it. You won’t just get “no content”, you’ll get links to all of the content and there will be more extraneous licensing on it.
When I get a URL and I go and index it, at first it will let me. Then it’ll hit a limit (normally after a number of stories in a given time frame, possibly like FT and WSJ do at the moment), and will throw up an error that needs working out. It won’t be an HTTP error, it’ll just be a page that asks me to login or register which I need to detect somehow.
So then I need to go and take a look at the paper, work out whether it’s owned by News Corporation or a News Corporation subsidiary, if not work out the terms for accessing/indexing data, etc.
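To give an idea of what "detect somehow" might involve, here is a hedged Python sketch using only the standard library. The heuristics (a password field in the page, a redirect to a login-style URL, a few marker phrases) and the name `looks_like_paywall` are assumptions for illustration; no real pay-wall is known to behave exactly this way.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Sets .found when the page contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # An attribute without a value arrives as None, hence the "or".
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


# Hypothetical phrases; a real list would be built from pages actually seen.
MARKERS = ("subscribe to continue", "log in to read", "register to read")
LOGIN_WORDS = ("login", "register", "subscribe")


def looks_like_paywall(html, requested_url, final_url):
    """Guess whether a successful response is really a login or register page."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    redirected = final_url != requested_url and any(
        word in final_url.lower() for word in LOGIN_WORDS
    )
    has_marker = any(marker in html.lower() for marker in MARKERS)
    return finder.found or redirected or has_marker
```

A check like this will misfire on any article page that carries a login box in its header, so at best it flags pages for a human to look at rather than settling the question.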
Bear in mind, I can’t leave this to an algorithm. Until this morning I thought I might get away with it, but then the phrase “furious litigation” made its appearance, and I suddenly had a problem – I couldn’t risk leaving this to a few lines of Ruby.
I expect Google will get sued within about a week of this going live. I don’t want to be next.
It’s also not News Corp alone that’s the problem: it’s approximately 4,000 English language newspapers globally all of whom are likely to now develop different terms and conditions (because they’re likely to follow News Corp’s lead). Right now, it’s flat, that’s about to change.
I could out-source some of this, I could just work through it as I hit pay-walls, I could negotiate with each media group, but assuming 20 minutes per website, I need to find an additional 6 man-months of administration to get up and running the moment pay-walls start to appear. That’s not really tenable.
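The back-of-envelope sum can be written out in Python, assuming the 4,000 newspapers mentioned above are what is being multiplied by 20 minutes each. The hours in a working month are an assumption, and the result is sensitive to it.

```python
NEWSPAPERS = 4000        # approximate figure quoted above
MINUTES_PER_SITE = 20    # estimate quoted above
HOURS_PER_MONTH = 160    # assumption: 8 hours a day, 20 working days

total_hours = NEWSPAPERS * MINUTES_PER_SITE / 60
person_months = total_hours / HOURS_PER_MONTH

print(f"{total_hours:.0f} hours, about {person_months:.1f} person-months")
```

At 160 hours a month this comes out at roughly 1,333 hours, or a little over eight person-months; a figure of six implies something closer to 220 working hours a month. Either way, the order of magnitude is the point.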
Who knows, you might be right. I could see the pay-walls, or the T&Cs, and realise that it’s easy to skirt around. However I can’t put something up today that could accidentally get me sued in a few months’ time. I don’t think they’re going to play nice here.
However, I’ll think it through and I suspect there might be something better to come out of this. I see it as an opportunity to take another look at what I’ve got and decide whether I really did have the best possible product. News Corp’s decision – licensing aside – would make the current product less interesting, because it’s surprising how much content is theirs. Once other groups start to follow – and I fully expect the regional groups to do so – it becomes almost useless.
So, there’s another inkling here of how to just ignore them and still tick off the original goals with something more interesting. I need to think it through, it’ll take a few weeks, but I’ll get there.
Paul Robinson
6 Aug 09 at 13:23
Jon: yes, the tech can be re-targeted very easily. Anything with text it can prioritise against a user profile. Originally KT *was* an RSS aggregator and prioritisation system, it just never got released.
It may be that those lower-priority ideas I was going to release “one day” might now take the fore, and they will get released whilst awaiting the news stuff to settle down.
It’s one of those things I haven’t looked at in a while though so I need to work out where the market is today. That will take a little time. Bear in mind, there is no VC fund here, it’s all funded out of my spare cash, I need to be careful before spinning up half a dozen EC2 instances and throwing a couple of hundred gig of data into the processing engine. :-)
Paul Robinson
6 Aug 09 at 13:26
It’s not ethical to read Murdoch papers and they are shite: fact
alex
6 Aug 09 at 14:34