Drug spammers exploit newspaper site search

As newspapers work to improve their search experience and embrace Web search as well as on-site search, they're being exploited by a new round of automated blog spam that displays Internet drug listings right on the newspapers' websites.

This allows unscrupulous scammers to present their pitch under the "trusted information provider" brand of the newspaper. And it undoubtedly undermines the newspaper's brand.

Tribune Company and McClatchy sites in particular are being targeted. [Update: nytimes.com also is being exploited.]

Various "Canadian drugstore" sites are being promoted, but a minor bit of domain detective work traces much of this back to Israel, where several "businesses" listed under Russian surnames have registered a number of prescription-drug domains.

On the McClatchy sites, it's an Overture clickthrough tag that's being exploited. Here's an example, with the domain adjusted to my site in order not to promote the drug spammer:
http://www.miamiherald.com/cgi-bin/mi/overture/overture.pl?Keywords=site...

On the Tribune sites, the same trick looks like this:
http://www.orlandosentinel.com/search/dispatcher.front?Query=site:yelvin...

(Go ahead and click on those links; they're safe, and they will show you how the result set is presented.)

In both cases, a bit of checking of the HTTP request headers would probably allow the newspaper's search script to foil the spammer with minimal side effects.
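As a rough illustration of that kind of header check, here is a minimal sketch that accepts a search only when the Referer header names one of the newspaper's own hosts. The host name, the function name and the plain-dict headers are assumptions made for the example; neither chain's actual search script is known here. The Referer header can be missing or forged, so this would be one filter among several rather than a complete fix.

```python
from urllib.parse import urlparse

# Hypothetical: hosts whose pages are allowed to submit searches.
TRUSTED_HOSTS = {"www.example-newspaper.com"}


def referer_is_trusted(headers, trusted_hosts=TRUSTED_HOSTS):
    """Return True when the Referer header names one of our own hosts.

    `headers` is assumed to be a plain dict keyed by header name.
    """
    referer = headers.get("Referer", "")
    # urlparse lowercases the host name; it is None for an empty value.
    host = urlparse(referer).hostname
    return host is not None and host in trusted_hosts
```

A search script could show its normal results when the check passes and fall back to a plain search form, with no results rendered, when it fails.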

These blog spammers attack websites with automated scripts that attempt to post comments on blog entries. Typically several dozen comments containing little more than the Web links are posted at once.

There are several techniques blog sites can use to foil these attacks.

Registration-only commenting stops most of it, although a few blog spammers do register usernames, then return weeks or months later with scripts programmed to log in and post spam.

Requiring approval of the posting (as I do) prevents the spam from being made public, at the cost of some administrative overhead to delete the evil and promote the good.

A captcha, which requires users to answer a question in order to post, is the most effective technique. There are several variations. One uses a warped graphic image of a random password that the user has to type. Another asks the user to type the Nth word of a random sentence. And yet another asks the user to perform a calculation, or answer a trivia question. They're all remarkably annoying to the innocent.
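Two of the text-only variations described above are simple enough to sketch. This is an illustration under stated assumptions, not a hardened implementation: the function names are invented for the example, and the expected answer would need to be kept on the server (in a session, for instance) rather than sent to the browser.

```python
import random


def nth_word_challenge(sentence, rng=random):
    """Ask for the Nth word of a sentence; return (question, answer)."""
    words = sentence.split()
    n = rng.randrange(len(words))  # zero-based index of the chosen word
    question = f"Type word number {n + 1} of: {sentence}"
    return question, words[n]


def arithmetic_challenge(rng=random):
    """Ask for the sum of two small numbers; return (question, answer)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)


def check_answer(expected, submitted):
    """Compare answers, ignoring case and surrounding whitespace."""
    return submitted.strip().lower() == expected.lower()
```

The page would display the question, remember the answer, and run check_answer on whatever the commenter typed before accepting the post.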

Comments

Another trick against blog form spam is to use a hidden field that lures robots into inserting text into it.

You hide the field with CSS (because robots will skip over fields marked type="hidden").

If, upon submission, the field isn't blank, the form isn't processed.
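A minimal sketch of that hidden-field trap might look like the following. The field name "website", the CSS class and the form markup are all made up for the example; the only parts taken from the description are hiding an ordinary text field with CSS and refusing the form when the field comes back filled in.

```python
# Hypothetical field name, chosen to look tempting to a robot.
HONEYPOT_FIELD = "website"


def render_comment_form():
    """Return form HTML with an ordinary text field hidden by CSS."""
    return (
        '<style>.hp { display: none; }</style>\n'
        '<form method="post" action="/comment">\n'
        '  <textarea name="comment"></textarea>\n'
        f'  <input type="text" name="{HONEYPOT_FIELD}" class="hp">\n'
        '  <input type="submit" value="Post">\n'
        '</form>'
    )


def should_process(form):
    """Process the comment only if the trap field came back blank."""
    return form.get(HONEYPOT_FIELD, "").strip() == ""
```

A human never sees the field and leaves it empty, while a robot that fills in every text input gives itself away.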

Not that a newspaper site can adopt these directly, but --

a) Akismet, a WordPress plugin, does a great job of identifying and trashing spam blog comments.

b) Gmail is fantastic at catching 99 percent of the spam e-mail that comes to my address. (It is 100 times better than whatever our IT people are using on the Microsoft Exchange server, which still misses tons of spam e-mails.)

These success stories indicate that spam can be caught and stopped. The newspapers are just going to need to develop or buy -- and install and maintain -- the right systems to do the job.

I had this problem in barrowloads. In some cases I would return to pages (all of which allow reader comments to be added without registration) to find new lists of spam postings added overnight, or sometimes within hours of removing the last batch. Some of the bot postings were so voluminous that the page became a big load, with some 100 spam comments added.

So I added a randomly generated string of digits that is displayed with the form, and added that same string as a hidden field in the form. I didn't use a warped text image display; I simply ask the poster to copy or type the displayed random string into a field, which is then checked against the hidden string on posting.

I thought it wouldn't work because the string wasn't displayed with a fancy warped graphic, but it did work: no more drug, casino or pron spam, yay! It's been effective for a year or so now. I generate the string with a simple CFSET line at the page header.
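For readers who want to try the same approach, here is a rough Python sketch of the scheme described above. The original was written in ColdFusion and its code isn't shown, so the function names, the field names "expected" and "typed", and the six-digit length are all assumptions.

```python
import secrets
import string


def new_challenge(length=6):
    """Generate a random string of digits to show with the form."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


def render_challenge(code):
    """Show the digits, and carry the same digits in a hidden field."""
    return (
        f'<p>Type this number to post: {code}</p>\n'
        f'<input type="hidden" name="expected" value="{code}">\n'
        '<input type="text" name="typed">'
    )


def challenge_passed(form):
    """Accept the post only if the typed digits match the hidden ones."""
    expected = form.get("expected", "")
    typed = form.get("typed", "").strip()
    return expected != "" and typed == expected
```

Because the expected value travels in the page itself, a robot written specifically for the site could copy it across. The scheme relies on generic robots not bothering to do so.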