Microformats, hNews, the AP and the Animals
In 1965, before a lot of you were born, Eric Burdon of the Animals sang these lines:
But I'm just a soul whose intentions are good:
Oh Lord! Please don't let me be misunderstood ...
That's the background music. Here's the story.
Some geeks at the AP got together with some geeks in Europe and came up with a really smart idea. Unfortunately, that smart idea got sucked into the swirling vortex of panic and craziness that reigns at a lot of media companies these days. And a really smart idea has become terribly misunderstood, twisted into a really bad idea, portrayed as something it is not, sold as a cure for a questionable ailment that it can't fix.
The idea is the application of microformats to news content. A microformat, for those of you who aren't all geeked out, is a way of adding hints to HTML markup so that Web spiders and other software can precisely discover facts without having to guess. This is a name. That is an address. And so forth.
A microformat lets you indicate structure where otherwise there would be just a big messy blob of data. It's sort of an "Oh Lord! Please don't let me be misunderstood" message to web spiders and scrapers.
If you understand how important search engines are to the Internet, this probably makes sense to you. You see why structure is important. Looking for a restaurant? You care that it's a restaurant and not a dry cleaner with a similar name. And you care where it is. Making certain that a Web spider understands a location reference is the job of the geo microformat, which is used in standards called hCard (sort of like business cards) and hCalendar (events happen in a location).
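For the geeked-out, here is roughly what that looks like from the spider's side. This is a minimal sketch, not anybody's production crawler: the restaurant, its coordinates, and the names SAMPLE, GeoParser and extract_geo are all made up for illustration. The only things borrowed from the geo microformat are the class names "geo", "latitude" and "longitude".

```python
from html.parser import HTMLParser

# Hypothetical page fragment: an hCard for an invented restaurant,
# with its location marked up using the geo class names.
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn org">Luigi\'s Trattoria</span> '
    '<span class="geo">'
    '<span class="latitude">37.386013</span>, '
    '<span class="longitude">-122.082932</span>'
    '</span></div>'
)


class GeoParser(HTMLParser):
    """Collects the text of elements classed 'latitude' or 'longitude'."""

    def __init__(self):
        super().__init__()
        self.current = None  # coordinate name we are currently inside, if any
        self.coords = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ("latitude", "longitude"):
            if name in classes:
                self.current = name

    def handle_data(self, data):
        # Only the first non-empty text after a matching tag is used.
        if self.current and data.strip():
            self.coords[self.current] = float(data.strip())
            self.current = None


def extract_geo(html):
    parser = GeoParser()
    parser.feed(html)
    parser.close()
    return parser.coords


if __name__ == "__main__":
    print(extract_geo(SAMPLE))
```

No guessing whether "37.386013" is a price, a phone extension or a latitude. The markup says so.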
Collectively, all of these little hints are small parts of a broader movement toward a World Wide Web of data and meaning, referred to as the Semantic Web. But you don't need to go down that rabbit hole to understand this. If you're a journalist, you understand that a byline is significant: it clearly identifies the writer responsible for a story. A dateline is significant: it identifies the location central to the story, where the writer presumably gathered the information. Wouldn't it be great if we had a standard, machine-readable way to indicate byline and dateline in Web content? Instead of just throwing it out there and hoping for the best?
We get that from hNews, a proposal from two UK-based organizations, the Media Standards Trust and the Web Science Research Initiative, with the help of some money from the MacArthur Foundation and the Knight Foundation. The proposal has been picked up by some smart geeks inside the Associated Press. But then everything went wrong.
(What's the connection? This: Nobody in their right mind thinks Barack Obama isn't an American citizen. Nobody in their right mind thinks newspapers are facing financial trouble because of evil content pirates on the Internet. But there's no shortage of people willing to believe any convenient nonsense that absolves them of personal responsibility for the situation in which they find themselves. Crass.)
So let's get back to hNews. It lets a news organization publish news on the Web that looks to the consumer pretty much like it always did, but behind the scenes, it's easy for fact-stripping robots to identify and extract fielded data.
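Here is a rough sketch of what a fact-stripping robot might do with such a page. Treat the details as assumptions: the class names ("entry-title", "author", "dateline", "source-org") follow my reading of the hNews draft and its hAtom base, the story is invented, and FIELDS, FieldParser and extract_fields are hypothetical names.

```python
from html.parser import HTMLParser

# Class names we want to pull out (assumed from the hNews/hAtom drafts).
FIELDS = {"entry-title", "author", "dateline", "source-org"}
# Elements that never get a closing tag, so they must not go on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}

# Invented story markup, for illustration only.
SAMPLE = """
<div class="hnews hentry">
  <h1 class="entry-title">Council approves budget</h1>
  <p>By <span class="author vcard"><span class="fn">Jane Doe</span></span>,
     <span class="source-org vcard"><span class="fn org">Example Wire</span></span></p>
  <p><span class="dateline">SPRINGFIELD</span> - The city council voted Tuesday.</p>
</div>
"""


class FieldParser(HTMLParser):
    """Gathers the text found inside any element carrying a wanted class."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: the fields it carries
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in FIELDS])

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every wanted field that is currently open.
        for names in self.stack:
            for name in names:
                self.fields[name] = self.fields.get(name, "") + data


def extract_fields(html):
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the page layout.
    return {k: " ".join(v.split()) for k, v in parser.fields.items()}


if __name__ == "__main__":
    for name, value in sorted(extract_fields(SAMPLE).items()):
        print(f"{name}: {value}")
```

The reader sees an ordinary headline, byline and dateline. The robot gets a tidy record with named fields.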
One of those fields may include licensing terms. A standard for machine-readable copying conditions is a good thing.
But if you think for a minute that such a standard prevents pirates from copying and distributing your content, you're either smoking something, or you're technologically ignorant. Filtering out microformats is child's play. If you're a pirate, you will have no qualms about doing so.
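How much child's play? A crude sketch, assuming the hints live in class attributes as they do in most microformats. The function name strip_class_hints and the sample markup are mine, and a real scraper would more likely just grab the visible text and never look at the markup at all.

```python
import re

# Matches a class attribute with a double- or single-quoted value.
CLASS_ATTR = re.compile(r'''\s+class\s*=\s*(?:"[^"]*"|'[^']*')''', re.IGNORECASE)


def strip_class_hints(html):
    """Return the markup with every quoted class attribute removed."""
    return CLASS_ATTR.sub("", html)


if __name__ == "__main__":
    marked_up = '<p><span class="dateline">SPRINGFIELD</span> The council voted.</p>'
    print(strip_class_hints(marked_up))
```

One regular expression, and every hint is gone while the words stay put. The hints help honest software understand the page. They do nothing to stop dishonest software from copying it.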
So we have on the floor a proposal. From the perspective of building a better Internet, it's a good idea. From the perspective of stopping bad people from stealing, it's utterly ineffective. We should understand what it really does, and adopt it for what it really is, and drop the silly posturing about how it's going to make all our financial troubles vanish. Because it's not that, not at all. What it is, is a good thing.