SiteBlog 2 Mini-Tutorial: Comment-Spam

Definition and Background
Search Engines and Comment-Spammers
Recognizing Comment-Spam
Fighting Comment-Spam


DEFINITION AND BACKGROUND

(For a quick summary of comment-spam, please refer to the General FAQ.)

Spam, as we all know, is unsolicited commercial email. Its rampant presence on the Internet costs millions, if not billions, of dollars in bandwidth and lost productivity, and has spawned whole industries devoted to promoting and fighting it.

Comment-spam (spam in blog comments and trackbacks) is a relatively new phenomenon. But with the growth of blogging, it has gone from a nuisance to a real problem for many in the Blogosphere. Given recent efforts to curtail comment-spamming, it's not surprising that spammers are countering with new, and sometimes very surreptitious, tactics.

To understand the rationale for comment-spamming, it's necessary to first understand search engines and their method of ranking Web pages.

SEARCH ENGINES AND COMMENT-SPAMMERS

Search engines, such as Google, scan the Web, index the pages they find, and store them in a database. Most are crawler-based: they use "robots" (or "spiders") to crawl through the Web (including your Web site and SiteBlog 2 page) and associate certain keywords and phrases with each site. When a user searches for a keyword or phrase, the search engine scans its database of crawled sites and ranks the matches by relevance. Each search engine uses its own algorithm for determining "relevance" (and thus the order in which results are presented to you); Google, the most used search engine, uses a process called PageRank.

Essentially, PageRank determines a Web page's relevance by how often other pages link to it, what those other pages say, and how relevant those other pages are in and of themselves. Spammers are well aware of what PageRank does, and exploit it with a technique known as Google bombing: an attempt to influence the ranking of a Web site or page by placing links to it on hundreds or thousands of other sites.

Thus was born comment-spamming. In the guise of commenting on an article in your blog, comment-spammers submit a comment that includes a link back to their commercial site. Now imagine the influence it would have on search engine relevance if that same spammer repeated the same comment on thousands of other blogs. Unlike email spam, the target audience is not you but rather the search engines. In fact, comment-spammers would prefer that you ignore their comments so that the comments stay on your blog, continue to be spidered, and keep linking back to their sites.
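
To make the effect concrete, here is a minimal sketch in Python. It assumes the simplified, publicly described form of PageRank (link structure only, a damping factor of 0.85, a fixed number of iterations); it is not Google's actual ranking code. The pagerank function and the page names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return an approximate rank for every page.

    links maps each page name to the list of pages it links to.
    This is a simplified illustration, not a production algorithm.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same rank for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs linking to each other, plus a commercial site.
clean_web = {
    "blog1": ["blog2"],
    "blog2": ["blog3"],
    "blog3": ["blog1"],
    "spamsite": [],
}

# The same web after a spammer leaves a comment with a link on every blog.
spammed_web = {
    "blog1": ["blog2", "spamsite"],
    "blog2": ["blog3", "spamsite"],
    "blog3": ["blog1", "spamsite"],
    "spamsite": [],
}

before = pagerank(clean_web)["spamsite"]
after = pagerank(spammed_web)["spamsite"]
print("spamsite rank rises after comment-spam:", after > before)
```

In this toy web, the rank of spamsite rises once each blog carries a link to it, which is exactly the effect comment-spammers are after.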

RECOGNIZING COMMENT-SPAM

Comment-spammers' tactics range from the blatantly commercial to the extremely subtle. The latter often take very benign forms such as famous quotes, commonsensical phrases, or even positive statements about your blog. So how will you know it's spam? There will be a link to the spammer's commercial site, a site that will usually be completely irrelevant to your blog and/or article.

FIGHTING COMMENT-SPAM

Fortunately, SiteBlog 2 provides several means of minimizing comment-spam.

  • You can choose to approve ALL comments so that comment-spam never gets a chance to appear on your blog.
    • This method will eliminate all spam since every comment will be moderated by you.
    • So why wouldn't this be the only tool provided? The two primary disadvantages are that the blog administrator must commit more time to comment moderation, and that some legitimate commentators may wonder why their comments aren't appearing instantaneously and impatiently leave, never to return.
    • To combat the possibility of legitimate commentator frustration, every comment page in SiteBlog 2 includes the following text: "Publication may require admin approval; please come back later to view your comment."
  • If you do not wish to moderate all comments, then you can:
    • Disallow all commenting. We do not recommend this option as reader interaction is a very important component of blogging.
    • Automatically moderate comments based on the following criteria:
      • The comment contains more than X links. One characteristic of comment-spam is the inclusion of multiple links. You can set X to any number.
      • The contents of the comment match a list (created by you) of keywords, names, URLs, email addresses, or even IP addresses of known comment-spammers.
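
The two automatic criteria above can be sketched in Python as follows. This is only an illustration of the idea, not SiteBlog 2's actual implementation, which this tutorial does not describe. The needs_moderation function, the comment fields, and the link pattern are all assumptions made for the example.

```python
import re

# Rough pattern for a link: something starting with http://, https:// or www.
# (an assumption for this sketch; real link detection may differ).
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Hypothetical comment fields checked against the blocklist.
FIELDS = ("name", "url", "email", "ip", "body")


def needs_moderation(comment, max_links, blocklist):
    """Return True if the comment should be held for approval.

    comment   -- dict with the (assumed) keys listed in FIELDS
    max_links -- the "X" above: more links than this triggers moderation
    blocklist -- your own list of keywords, names, URLs, emails, or IPs
    """
    # Criterion 1: the comment body contains more than X links.
    link_count = len(LINK_PATTERN.findall(comment.get("body", "")))
    if link_count > max_links:
        return True

    # Criterion 2: any blocklist entry appears in any field.
    # Simple case-insensitive substring matching; a real filter would
    # probably match IP and email addresses exactly.
    text = " ".join(str(comment.get(field, "")) for field in FIELDS).lower()
    return any(term and term.lower() in text for term in blocklist)


spam = {
    "name": "Bob",
    "url": "",
    "email": "bob@example.com",
    "ip": "203.0.113.7",
    "body": "Nice blog! http://a.example http://b.example http://c.example",
}
print(needs_moderation(spam, max_links=2, blocklist=["casino"]))
```

A comment that passes both checks would be published right away; one that fails either check would wait for your approval.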

 


©Copyright 2005 Hostway Corporation. All Rights Reserved.