Observation-based vs. active harvesting of human intelligence

I tripped over a reference to "artificial intelligence" the other day. I guess I tripped because it's not a term I hear very much any more. Maybe it's because I hang around with a lot of geeky people, but it seems quaint and maybe a little pretentious.

Instead, I hear about a lot of very specific techniques: Bayesian networks, collaborative filtering and the slope-one algorithm. I guess those fall under the "artificial intelligence" umbrella, but often it's really a matter of harvesting human intelligence and then acting on the results.

Google has just turned on a change that reportedly has been in a fairly wide test, giving logged-in Google users a chance to vote up/down specific items returned by search queries.

This is a huge change for Google, which made its mark by observation-based harvesting of human intelligence. Both Google Search and Google News observe the results of human decisions and use those observations to recommend items.

Google Search places a high value on inbound links, which are considered to reflect whether a page is "authoritative." If a lot of people link to a page, it must be good. This is why blog spammers prowl the net, posting comments that sneakily embed a link to their websites.
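Google's actual ranking formula is proprietary, but the underlying idea (links confer authority) can be sketched with a simplified, PageRank-style calculation. The link graph below is invented purely for illustration:

```python
# A toy sketch of the "inbound links confer authority" idea, in the spirit
# of PageRank. Google's real ranking is proprietary and far more elaborate;
# the link graph here is invented.

def link_authority(links, iterations=20, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / len(pages)
        score = new
    return score

links = {
    "blog-comment": ["spammer-site"],   # the sneakily embedded link
    "news-story": ["blog-comment"],
    "spammer-site": [],
}
print(sorted(link_authority(links).items(), key=lambda kv: -kv[1]))
```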

Google News looks at the relative prominence that editors at thousands of news-related websites have given a story, then uses that information to shape its own top-level presentation. Rather than relying on the news judgment of an editor at Google (there isn't one), Google reflects a sort of broad consensus among human editors.
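As a rough illustration of that kind of consensus (the sites, stories and scoring below are entirely made up, and Google News's real signals aren't public), you could sum the prominence each source gives a story:

```python
from collections import defaultdict

# Hypothetical data: each news site gives a prominence score to the stories
# it features (3 = lead story, 1 = buried). This only illustrates the idea
# of consensus ranking across many editorial decisions.
placements = [
    ("site-a", "election-results", 3),
    ("site-a", "storm-warning", 1),
    ("site-b", "election-results", 2),
    ("site-c", "storm-warning", 3),
    ("site-c", "election-results", 1),
]

consensus = defaultdict(int)
for site, story, prominence in placements:
    consensus[story] += prominence

for story, score in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(story, score)
```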

The new Google feature -- which it calls SearchWiki -- switches gears and asks people to take an overt action to provide it with information about human judgment.

Let me bring this home to the world of news sites. This is a good thing for us, because Google's scale and impact will significantly broaden the pool of people who are in the habit of explicitly evaluating items on the net. That's a habit we can use to our advantage.

Many news sites are adding "rate this item" features, then using those ratings to display lists of actively "top rated" stories, often paired with the observationally ranked "most emailed" and "most viewed" lists. That's one way to use the information, but it's a fairly naive one.
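The naive version is easy to sketch: average the explicit ratings and sort. (The story IDs and ratings here are invented.)

```python
from collections import defaultdict

# Hypothetical (story, rating) pairs collected from a "rate this item" widget.
ratings = [
    ("story-1", 5), ("story-1", 4),
    ("story-2", 2), ("story-2", 3),
    ("story-3", 5),
]

totals = defaultdict(lambda: [0, 0])  # story -> [sum of ratings, count]
for story, rating in ratings:
    totals[story][0] += rating
    totals[story][1] += 1

top_rated = sorted(totals, key=lambda s: totals[s][0] / totals[s][1], reverse=True)
print(top_rated)  # ['story-3', 'story-1', 'story-2']
```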

I'm far more interested in how we might use this information to generate personalized recommendations using collaborative-filtering principles and that mysterious slope-one algorithm that I mentioned.
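For the curious: slope one predicts a user's rating for an item from the average rating differences between that item and the items the user has already rated. Here's a minimal weighted slope-one sketch, with invented rating data:

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Weighted slope-one prediction.

    ratings: {user: {item: rating}} (toy data, invented for illustration).
    Predicts `user`'s rating for `target` from average pairwise deviations.
    """
    dev_sum = defaultdict(float)   # other item -> sum of (target - other) deviations
    dev_count = defaultdict(int)   # other item -> number of users rating both
    for others in ratings.values():
        if target in others:
            for item, value in others.items():
                if item != target:
                    dev_sum[item] += others[target] - value
                    dev_count[item] += 1

    numerator = 0.0
    denominator = 0
    for item, value in ratings[user].items():
        if item != target and dev_count[item]:
            deviation = dev_sum[item] / dev_count[item]
            numerator += (value + deviation) * dev_count[item]
            denominator += dev_count[item]
    return numerator / denominator if denominator else None

ratings = {
    "alice": {"story-1": 5, "story-2": 3},
    "bob":   {"story-1": 4, "story-2": 2, "story-3": 4},
    "carol": {"story-2": 4, "story-3": 5},
}
print(slope_one_predict(ratings, "alice", "story-3"))  # about 4.67
```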

As so often is the case, there's already a Drupal module for that, one that originated as a 2006 Google Summer of Code project. As we collect rankings, ratings and other overt evaluations on our websites, I'm looking forward to pointing the recommendation module at that data and seeing what comes out of it.