How to Oust Unwanted Spiders

As a whole, web spiders and robots are a good thing: they are the automated means by which search engines find and index new content on your site. In some cases, however, spiders and robots can be a bad thing. For instance, a badly written spider may index the same page multiple times or revisit your site every day. All of this adds up to wasted bandwidth and skewed traffic statistics.

I recently had a run-in with a very unwanted spider that was indexing my entire site several times a day. It also ignored my robots.txt file, which denied its user agent access to the site. Remembering a tutorial on mod_rewrite I had once read, I decided to write a few quick rewrite rules to keep that spider out.

If you have never heard of mod_rewrite or .htaccess files, now might be a good time to do some research. The rules below only work if your Apache server has mod_rewrite enabled and allows .htaccess files to override its configuration. First, create a file named ‘.htaccess’ (without the quotes, but with the leading period). Then add the following code to it, which I will explain below:
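[code]

# Turn on the rewrite engine (the rules below are ignored without it)
RewriteEngine On
# Match any request whose User-Agent header contains this pattern
RewriteCond %{HTTP_USER_AGENT} "name of user agent to block"
# Send matching requests to Google with a temporary (302) redirect
RewriteRule ^.*$ http://google.com/ [R,L]

[/code]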

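Replace “name of user agent to block” with the name (or a distinctive part of the name) of the user agent you want to block. The quotes are only required if the name contains spaces, but they do no harm otherwise. Keep in mind that the pattern is a case-sensitive regular expression matched anywhere in the User-Agent string, so characters such as periods and parentheses should be escaped with a backslash.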

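The code above checks the User-Agent header of every request to your site and redirects any request whose user agent matches the pattern to Google. To block several user agents, add one RewriteCond line per agent and put the [OR] flag on every line except the last. Without [OR], Apache combines consecutive conditions with AND, so a request would have to match every pattern at once and the rule would almost never fire: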
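[code]

RewriteEngine On
# [OR] means "this condition or the next one"; leave it off the last line
RewriteCond %{HTTP_USER_AGENT} "name of user agent to block" [OR]
RewriteCond %{HTTP_USER_AGENT} "second user agent to block" [OR]
RewriteCond %{HTTP_USER_AGENT} "third user agent to block"
RewriteRule ^.*$ http://google.com/ [R,L]

[/code]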
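To make the AND/OR behaviour concrete, here is a small Python model of how a chain of RewriteCond lines decides whether the rule fires. Apache joins consecutive conditions with AND unless a line carries the [OR] flag, in which case that line is ORed with the next one. This is a simplified sketch, not Apache's own code: it assumes each pattern is a case-sensitive regular expression searched anywhere in the User-Agent string (no [NC] flag), and the function name rule_applies and the bot names BadBot and EvilCrawler are made up for illustration.

```python
import re


def rule_applies(user_agent, conditions):
    """Return True if a chain of RewriteCond-style conditions matches.

    conditions is a list of (pattern, or_next) pairs in file order, where
    or_next is True when the line carries the [OR] flag. Simplified model:
    patterns are case-sensitive regexes searched anywhere in the
    User-Agent string, like a RewriteCond without [NC].
    """
    i = 0
    while i < len(conditions):
        pattern, or_next = conditions[i]
        matched = re.search(pattern, user_agent) is not None
        if or_next:
            if matched:
                # One member of the OR chain matched: skip the rest of the
                # chain, including the final condition that closes it.
                while i < len(conditions) and conditions[i][1]:
                    i += 1
        elif not matched:
            # A plain (ANDed) condition failed, so the whole rule fails.
            return False
        i += 1
    return True


# Hypothetical bot names, used only for illustration.
without_or = [("BadBot", False), ("EvilCrawler", False)]
with_or = [("BadBot", True), ("EvilCrawler", False)]

print(rule_applies("BadBot/1.0", without_or))        # False: both patterns must match
print(rule_applies("BadBot/1.0", with_or))           # True
print(rule_applies("EvilCrawler/2.1", with_or))      # True
print(rule_applies("Mozilla/5.0 Firefox", with_or))  # False
```

The model shows why forgetting [OR] silently disables the block: with plain ANDed conditions, a spider is only redirected if its user agent happens to contain every pattern in the list.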

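Save the .htaccess file and upload it to the root directory of your website. Requests from the blocked spiders now receive only a short redirect response instead of your pages, so they no longer eat up your bandwidth. You can have a lot of fun with this: try changing the redirect target from Google to http://localhost/, which sends the spider back to its own machine!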
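Before relaxing, it is worth checking that the rule really fires. One way is to request a page while pretending to be the blocked spider and look at the raw response without following the redirect. The Python sketch below does that with the standard library's http.client; the function name check_user_agent is hypothetical, and you would substitute your own site's URL.

```python
import http.client
from urllib.parse import urlsplit


def check_user_agent(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and return (status, Location).

    http.client never follows redirects, so a blocked user agent should
    come back with a 3xx status and the redirect target in Location.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        # Location is None when the server did not redirect.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

If the rule is working, calling `check_user_agent("http://www.example.com/", "name of user agent to block")` with your own domain and the real spider name should return a 302 status whose Location is http://google.com/, while an ordinary browser user agent such as "Mozilla/5.0" should get your page as usual.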
