Microformats, hNews, the AP and the Animals

Submitted by yelvington on July 29, 2009 - 11:10pm

In 1965, before a lot of you were born, Eric Burdon of the Animals sang these lines:

But I'm just a soul whose intentions are good:
Oh Lord! Please don't let me be misunderstood ...

That's the background music. Here's the story.

Some geeks at the AP got together with some geeks in Europe and came up with a really smart idea. Unfortunately, that smart idea got sucked into the swirling vortex of panic and craziness that reigns at a lot of media companies these days. And a really smart idea has become terribly misunderstood, twisted into a really bad idea, portrayed as something it is not, sold as a cure for a questionable ailment that it can't fix.

The idea is the application of microformats to news content. A microformat, for those of you who aren't all geeked out, is a way of adding hints to HTML markup so that Web spiders and other software can precisely discover facts without having to guess. This is a name. That is an address. And so forth.

A microformat lets you indicate structure where otherwise there would be just a big messy blob of data. It's sort of an "Oh Lord! Please don't let me be misunderstood" message to web spiders and scrapers.

If you understand how important search engines are to the Internet, this probably makes sense to you. You see why structure is important. Looking for a restaurant? You care that it's a restaurant and not a dry cleaner with a similar name. And you care where it is. Making certain that a Web spider understands a location reference is the job of the geo microformat, which is used in standards called hCard (sort of like business cards) and hCalendar (events happen in a location).
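For those who are geeked out, here is a minimal sketch of what a spider might do with such hints. The restaurant markup, the `MicroformatReader` class and the `read_place` helper are all made up for illustration; only the class names (`vcard`, `fn`, `geo`, `latitude`, `longitude`) come from the hCard and geo conventions. It uses nothing beyond Python's standard library.

```python
from html.parser import HTMLParser

# Hypothetical markup: a restaurant listing carrying hCard and geo class names.
SAMPLE = """
<div class="vcard">
  <span class="fn org">Luigi's Trattoria</span>
  <span class="geo">
    <span class="latitude">32.0809</span>,
    <span class="longitude">-81.0912</span>
  </span>
</div>
"""

# The hints this toy reader cares about.
WANTED = {"fn", "latitude", "longitude"}


class MicroformatReader(HTMLParser):
    """Collects the text of elements whose class names are in WANTED.

    Assumes every start tag has a matching end tag, so void elements
    such as <br> would confuse it. Good enough for a sketch.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: the wanted names it carries
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in WANTED])

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element only.
        if self.stack:
            for name in self.stack[-1]:
                self.found[name] = self.found.get(name, "") + data.strip()


def read_place(html):
    reader = MicroformatReader()
    reader.feed(html)
    return reader.found


print(read_place(SAMPLE))
```

No guessing is involved: the reader never has to decide whether "32.0809" is a price, a phone extension or a latitude, because the markup says so.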

Collectively, all of these little hints are small parts of a broader movement toward a World Wide Web of data and meaning, referred to as the Semantic Web. But you don't need to go down that rabbit hole to understand this. If you're a journalist, you understand that a byline is significant: it clearly identifies the writer responsible for a story. A dateline is significant: it identifies the location central to the story, where the writer presumably gathered the information. Wouldn't it be great if we had a standard, machine-readable way to indicate byline and dateline in Web content? Instead of just throwing it out there and hoping for the best?

We get that from hNews, a proposal from two UK-based organizations, the Media Standards Trust and the Web Science Research Initiative, with the help of some money from the MacArthur Foundation and the Knight Foundation. The proposal has been picked up by some smart geeks inside the Associated Press. But then everything went wrong.

What the AP announced was not a smart initiative to properly encode structure into its blobbish data, but rather a harebrained scheme to "create a news registry that will tag and track all AP content online to assure compliance with terms of use." It's harebrained because it does nothing of the sort. Even worse, it's a crass harebrained scheme, an attempt to suck up to AP's base of newspaper publishers not unlike a Republican politician sucking up to the birther crazies.

(What's the connection? This: Nobody in their right mind thinks Barack Obama isn't an American citizen. Nobody in their right mind thinks newspapers are facing financial trouble because of evil content pirates on the Internet. But there's no shortage of people willing to believe any convenient nonsense that absolves them of personal responsibility for the situation in which they find themselves. Crass.)

So let's get back to hNews. It lets a news organization publish news on the Web that looks to the consumer pretty much like it always did, but behind the scenes, it's easy for fact-stripping robots to identify and extract fielded data.

One of those fields may include licensing terms. A standard for machine-readable copying conditions is a good thing.
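A rough sketch of that fact-stripping, again in standard-library Python. The story, the URL and the `StoryReader` and `read_story` names are invented for the example, and the class names (`hentry`, `entry-title`, `author vcard`, `dateline`, `rel="license"`) reflect the hNews draft as best understood here, so check the current draft before relying on them.

```python
from html.parser import HTMLParser

# Hypothetical story markup; class names assumed from the hNews draft.
STORY = """
<div class="hnews hentry">
  <h1 class="entry-title">Council approves budget</h1>
  <p class="author vcard"><span class="fn">Pat Example</span></p>
  <p class="dateline">SAVANNAH, Ga.</p>
  <div class="entry-content">The council voted Tuesday ...</div>
  <a rel="license" href="https://example.com/terms">Terms of use</a>
</div>
"""

# Map microformat class names to the newsroom terms for the same thing.
FIELDS = {"entry-title": "headline", "fn": "byline", "dateline": "dateline"}


class StoryReader(HTMLParser):
    """Pulls headline, byline, dateline and license link out of a story."""

    def __init__(self):
        super().__init__()
        self.open_fields = []   # per open element: the field names it carries
        self.story = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.open_fields.append([FIELDS[c] for c in classes if c in FIELDS])
        # Licensing terms travel as a link relation rather than a class.
        if tag == "a" and "license" in (attrs.get("rel") or "").split():
            self.story["license"] = attrs.get("href")

    def handle_endtag(self, tag):
        if self.open_fields:
            self.open_fields.pop()

    def handle_data(self, data):
        if self.open_fields:
            for field in self.open_fields[-1]:
                self.story[field] = self.story.get(field, "") + data.strip()


def read_story(html):
    reader = StoryReader()
    reader.feed(html)
    return reader.story


print(read_story(STORY))
```

The reader sees an ordinary headline, byline and dateline. The robot sees four labeled fields, one of which points at the terms under which the story may be copied.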

But if you think for a minute that such a standard prevents pirates from copying and distributing your content, you're either smoking something or you're technologically ignorant. Filtering out microformats is child's play. If you're a pirate, you will have no qualms about doing so.
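How much child's play? Here is a deliberately crude sketch, assuming the hints live in double-quoted `class` and `rel` attributes as in the hypothetical snippet below. The `strip_hints` name is made up, and a real scraper would likely use a proper HTML parser, but the point stands: one line of pattern matching removes the licensing hint and leaves the story intact.

```python
import re

# Hypothetical fragment of a story carrying microformat hints.
TAGGED = (
    '<p class="dateline">SAVANNAH, Ga.</p> '
    '<a rel="license" href="https://example.com/terms">Terms</a>'
)


def strip_hints(html):
    """Drop every double-quoted class and rel attribute; visible text is untouched."""
    return re.sub(r'\s+(?:class|rel)="[^"]*"', "", html)


print(strip_hints(TAGGED))
```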

So we have on the floor a proposal. From the perspective of building a better Internet, it's a good idea. From the perspective of stopping bad people from stealing, it's utterly ineffective. We should understand what it really does, and adopt it for what it really is, and drop the silly posturing about how it's going to make all our financial troubles vanish. Because it's not that, not at all. What it is, is a good thing.

Comments

Great explanation of why microformats are a slick technical improvement for giving context to our blobs of data. I hope that hNews and other microformats won't get tarnished unfairly by the AP's wishful thinking that it will solve all the content-stealing bogeyman nightmares. I've got to wonder if the developers in the AP saw that their PHBs were missing the point of hNews but decided to sell them the magic beans anyway. If hNews spreads widely it could give a nice kick-start to the semantic web and give us developers lots of fun new tech areas to play with. The lesson from all of this for the corporate bigwigs is to have a developer in the room who can call BS on silly ideas before the press release ;)

Interesting that AP would play to this newspaper "base," considering that dues from member newspapers make up just about a fifth of the company's revenue (see Singleton's annual address) - though they are certainly well-represented on the board!

Of course they're well-represented on the board; only newspapers can be full members of the Associated Press cooperative. Broadcasters can be associate members, and everyone else is just a commercial customer.

While AP's newspaper revenues are rapidly declining as a percentage of the total, they're still important, and AP is not (yet?) in a position to live without them, or without the newspaper stories that are fed daily to the AP for redistribution and/or rewriting. AP relies less on newspaper content than many newspaper people imagine, but it's still important.

AP's newspaper relationships are becoming somewhat precarious. Faced with a need to cut newsroom expenses, an editor looks at the obvious candidates:

  • Reporters and photographers. Cut them and you lose your exclusive local content. Preserve at all costs.
  • Editors. Cut them and you lower the quality, allow more errors into the product. Acceptable given the alternatives?
  • Syndicates and wire services. Cut them and you lose commodity nonlocal content that's available elsewhere on the Internet. Maybe you lose some older readers, but it's probably less of an issue than the other options.

Dropping AP contractually requires a substantial lead time. A number of newspapers have given notice in order to keep their options open. AP needs to fend that off before it becomes real.

Thanks, I was having trouble understanding what this "is". Now that I know what it's not, I'm excited. I'm not sure how I feel about some of the tags I saw in an AP story I dug into. If AP is somehow tracking the use of its stories, it seems to be done through JavaScript in the user's browser.

Why does Hulu.com succeed while other video sites, such as Joost, fall flat? Respect for copyright, and a system that provides an extraordinary user interface. Video, unlike text, requires codecs to get from Point A to Point B. Few realize how much Hulu.com has done to protect the rights to TV programming and its various distribution windows. I believe some form of syndication/distribution/licensing is critical for any news org to be sustainable. Just as cuetones enabled local TV advertising, so too will Twitter or Tarpipe (or something like it) provide a similar backbone for all sorts of automated copyright management workflows. You can't expect plain old HTML to provide the basis for your copyright management platform, but Adobe LiveCycle is probably too extreme. Whether the answer is microformats or solutions like Apture, I do not know. But newspapers will not get to the point where they can create multiple productions and enforce revenue rights without a critical mass buying into some sort of solution.