A description of the Agent Blocker function in the Sueetie Addon Pack.

Blocking versus Filtering

Sueetie lets you block crawler and bot access to your community by matching string excerpts of the agent description. Here are a few agent string examples: Mail.Ru, googlebot, msnbox and R6_ScrapBox. Some of these agents are good and some are bad. We have two options for handling bots and crawlers in Sueetie:

  1. Block them completely by throwing up a 404 page
  2. Let them do their thing on our site, but filter them by not entering the page request into the Sueetie Request Logs

Blocking is easy enough to understand. They don't get in. Period. Filtering requires a small bit of elaboration.

Why Filter Bot Page Requests

There are two primary reasons to keep bot page requests out of the Sueetie Request Logs: 1) to slow the growth of our SQL database, and 2) to improve Sueetie Analytics reporting data.

Sueetie Analytics (in development) records all page requests. It will provide you with detailed information on your most popular pages. We should not care if bots find our pages popular. We care if PEOPLE find our pages popular and, with Sueetie Analytics, precisely which people in our community. So no bots in our Sueetie Request Logs.

Managing Bot and Crawler Access

Below is a screenshot of the Sueetie Addon Pack's Agent Access Management page. Here we enter a string excerpt found in the UserAgent request description. Our only remaining decision is whether to allow site access. Allowing site access gives the agent free rein of the site (subject to your /robots.txt file), but none of its page requests are recorded in the Sueetie Request Logs. Denying site access blocks the agent from the site entirely using the Sueetie Addon Pack AgentBlocker HttpModule.

Image
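
To make the block-or-filter decision concrete, here is a minimal sketch in Python. It is not the AgentBlocker HttpModule itself (which is a .NET component); it is a WSGI middleware analogue under stated assumptions. The names AGENT_RULES, request_log, match_rule and AgentBlocker are hypothetical, and case-insensitive substring matching is an assumption about how excerpts are compared.

```python
# Hypothetical rule table: (agent excerpt, allow_site_access).
AGENT_RULES = [
    ("googlebot", True),     # allowed in, but kept out of the request log
    ("R6_ScrapBox", False),  # blocked outright
]

request_log = []  # stand-in for the Sueetie Request Logs


def match_rule(user_agent):
    """Return the first (excerpt, allow) rule found in the user agent, else None."""
    ua = user_agent.lower()  # assumption: matching ignores case
    for excerpt, allow in AGENT_RULES:
        if excerpt.lower() in ua:
            return excerpt, allow
    return None


class AgentBlocker:
    """WSGI middleware that blocks or filters requests by user agent excerpt."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        rule = match_rule(user_agent)
        if rule is not None and not rule[1]:
            # Blocked agent: answer with a 404 and never reach the site.
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        if rule is None:
            # Unlisted agent: record the page request as usual.
            request_log.append((environ.get("PATH_INFO", "/"), user_agent))
        # Filtered agents reach the site here without being logged.
        return self.app(environ, start_response)


def site(environ, start_response):
    """Tiny stand-in for the community site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]
```

Wrapping the site as AgentBlocker(site) gives all three outcomes: a 404 for blocked agents, an unlogged page for filtered agents, and a logged page for everyone else.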

Determining Which Bots to Block

We know how to block and filter crawlers and bots, but identifying the crawler activity on our sites is a necessary prerequisite. So next to the agent blocking and filtering table is a report of all user agent activity on your site grouped by number of hits. Here's the full Agent Blocker page that includes the User Agent Activity table.

Image

The User Agent Activity table includes only agents recorded under the anonymous userID. With it we can determine which terms to use for agent blocking and filtering. Once we add the agent to our process list, we click the agent's DELETE button to purge all of its records from the Sueetie Request Logs.

Image
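
The activity report and the purge step can be sketched with an in-memory SQLite database. This is an illustration only, not the actual Sueetie schema: the request_log table, its columns, the ANONYMOUS_USER_ID value of -1, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

ANONYMOUS_USER_ID = -1  # assumption: the ID used for anonymous visitors


def demo_connection():
    """Build an in-memory request log with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE request_log (user_id INTEGER, user_agent TEXT, url TEXT)")
    rows = [
        (ANONYMOUS_USER_ID, "Mail.Ru/2.0", "/blog"),
        (ANONYMOUS_USER_ID, "Mail.Ru/2.0", "/wiki"),
        (ANONYMOUS_USER_ID, "Mail.Ru/2.0", "/forum"),
        (ANONYMOUS_USER_ID, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/blog"),
        (ANONYMOUS_USER_ID, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/wiki"),
        (7, "Mozilla/5.0 Firefox/10.0", "/forum"),  # a signed-in member
    ]
    conn.executemany("INSERT INTO request_log VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def agent_activity(conn):
    """Anonymous user agents grouped by number of hits, busiest first."""
    return conn.execute(
        "SELECT user_agent, COUNT(*) AS hits FROM request_log "
        "WHERE user_id = ? GROUP BY user_agent "
        "ORDER BY hits DESC, user_agent",
        (ANONYMOUS_USER_ID,),
    ).fetchall()


def purge_agent(conn, excerpt):
    """Delete every logged request whose agent contains the excerpt.

    instr() is used rather than LIKE so that characters such as '_' in
    'R6_ScrapBox' are matched literally instead of acting as wildcards.
    """
    cur = conn.execute(
        "DELETE FROM request_log WHERE instr(lower(user_agent), lower(?)) > 0",
        (excerpt,),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed
```

Calling agent_activity first shows which excerpts are worth adding to the process list, and purge_agent then clears that agent's existing records so they stop skewing the report.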

Checking on our Bot's Site Access Status

One final piece of the Agent Blocking and Filtering module is the ability to confirm that your agent string is being handled the way you want. Here's a screenshot of the Agent Status Checker page where we're testing access for "Googlebot." We want to give the Googlebot agent access to our site, but we don't want to include it in our request logs. In other words, we're identifying "Googlebot" as a crawler with site access.

Image
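
A status check like the one pictured can be approximated in a few lines of Python. This is a sketch under assumptions, not the Agent Status Checker's actual code: check_agent_status, the rule pairs, and the status wording are hypothetical, and case-insensitive matching is assumed so that "Googlebot" matches a "googlebot" excerpt.

```python
def check_agent_status(agent, rules):
    """Describe how an agent string would be handled under the given rules.

    rules is a list of (excerpt, allow_site_access) pairs; the first
    excerpt found in the agent string decides the outcome.
    """
    agent_lower = agent.lower()  # assumption: matching ignores case
    for excerpt, allow in rules:
        if excerpt.lower() in agent_lower:
            if allow:
                return "crawler with site access (not logged)"
            return "blocked from the site"
    return "not listed (logged like any other visitor)"


# Hypothetical rules mirroring the examples used on this page.
EXAMPLE_RULES = [("googlebot", True), ("R6_ScrapBox", False)]
```

With these example rules, check_agent_status("Googlebot", EXAMPLE_RULES) reports a crawler with site access, which is the outcome we wanted for Googlebot.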


Copyright © 2008-2012 Sueetie LLC. All rights reserved.