Blocking Bots in IIS


Recently, an application of mine running on Apache fell victim to bot spam. To be proactive, I set out to implement the same protection on a Windows Server running IIS 7.5.

Something on the order of 60% of web traffic comes from bots. Many of them are inconsequential and can safely be blocked without damaging your SEO. I chose to block them by user agent, since many of these bots have a whole range of IP addresses they can use.

First, make sure the URL Rewrite module is added to your installation of IIS. I am writing this for IIS 7; the process should be similar for other versions.

(Microsoft URL Rewrite Module 2.0 for IIS 7 (x64))

I checked that this Rewrite Module will work for both IIS 7.0 and 7.5.

After installing the module, restart IIS Manager and click your server on the left-hand side. It will be the topmost option after “Start Page”. You can see it in the image below.

[Image: URL Rewrite icon]

Once URL Rewrite is enabled on your web server, click “Add Rules…” in the Actions pane.

A window will open with the options shown below. Click “Request blocking”, then click “OK”.

[Image: Add Rule dialog with Request blocking selected]

You will then be prompted to choose the settings for your rule.

Select User-agent Header for the “block access based on” field.

Select Using: regular expressions

[Image: rule settings with User-agent Header selected]

Then enter your pattern.

I used the pattern below for my rule. The first entry, “^$”, is the regex for an empty string. I do not allow bots to access the pages unless they identify themselves with a user agent; I found that, most often, the only things hitting these applications without a user agent were security tools gone rogue.
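As a rough illustration of how such a pattern behaves, the sketch below uses Python's `re` module as a stand-in for the URL Rewrite regex engine. It assumes the entries are joined with `|` and matched case-insensitively anywhere in the header; the three entries and the sample user agents are picked only for illustration, and `is_blocked` is a hypothetical helper name.

```python
import re

# Assumption: entries are joined with "|" into a single alternation and
# matched case-insensitively anywhere in the User-Agent header.
PATTERN = re.compile(r"^$|EasouSpider|Wget", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the header value matches any alternative."""
    return PATTERN.search(user_agent) is not None


# An empty user agent is caught by "^$".
print(is_blocked(""))
# A named tool is caught by its own entry.
print(is_blocked("Wget/1.21 (linux-gnu)"))
# An ordinary browser matches none of the alternatives.
print(is_blocked("Mozilla/5.0 (Windows NT 6.1) Firefox/115.0"))
```

Because “^$” is anchored at both ends, it only fires when the header is completely empty; the other entries match as substrings.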

My advice when blocking bots is to be very specific. A generic word like “fire” could produce a false positive for “firefox”. You could adjust the regex to fix that, but I found it much simpler to be more specific, which has the added benefit of being more informative to the next person who touches that setting.
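The false-positive risk is easy to reproduce. The sketch below, again using Python's `re` module as an approximation of the rule's behaviour, compares a generic entry with a more specific one. The bot name “fire bot” is hypothetical and exists only for this example.

```python
import re

FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:115.0) Gecko/20100101 Firefox/115.0"
FIRE_BOT_UA = "fire bot/1.0"  # hypothetical bot name, for illustration only

# Assumption: case-insensitive substring matching, as in the rule settings above.
too_generic = re.compile(r"fire", re.IGNORECASE)
specific = re.compile(r"fire bot", re.IGNORECASE)


def matches(pattern: re.Pattern, user_agent: str) -> bool:
    """Return True if the pattern occurs anywhere in the user agent."""
    return pattern.search(user_agent) is not None


# The generic entry blocks a real browser along with the bot.
print(matches(too_generic, FIREFOX_UA), matches(too_generic, FIRE_BOT_UA))
# The specific entry blocks only the bot.
print(matches(specific, FIREFOX_UA), matches(specific, FIRE_BOT_UA))
```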

Additionally, you will see I have a rule for Java/1.7.0_25. In this case, it happened to be a bot using this version of Java to slam my servers. Do be careful when blocking language-specific user agents like this: some languages, such as ColdFusion, run on the JVM and use the language's user agent and web requests to localhost to assemble things like PDFs. JRuby, Groovy, or Scala may do similar things; however, I have not tested them.
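To make the trade-off concrete, here is a small sketch comparing a version-specific entry with a blanket `Java/` entry, using Python's `re` module as a stand-in for the rule engine. The second user agent is a hypothetical example of a legitimate JVM-based application calling back into the server; it is not taken from any real log.

```python
import re

SPAM_BOT_UA = "Java/1.7.0_25"
LOCAL_JVM_UA = "Java/1.8.0_292"  # hypothetical: an app server requesting localhost

# re.escape makes the dots literal, so only this exact version string matches.
exact_version = re.compile(re.escape("Java/1.7.0_25"))
# A blanket entry catches every client that reports a Java user agent.
any_java = re.compile(r"Java/")

for ua in (SPAM_BOT_UA, LOCAL_JVM_UA):
    blocked_exact = exact_version.search(ua) is not None
    blocked_any = any_java.search(ua) is not None
    print(f"{ua}: exact={blocked_exact} blanket={blocked_any}")
```

The version-specific entry leaves the second client alone, while the blanket entry would block both.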

Below is the full list of bots blocked by my regex.

^$
EasouSpider
Add Catalog
PaperLiBot
Spiceworks
ZumBot
RU_Bot
Wget
Java/1.7.0_25
Slurp
FunWebProducts
80legs
Aboundex
AcoiRobot
Acoon Robot
AhrefsBot
aihit
AlkalineBOT
AnzwersCrawl
Arachnoidea
ArchitextSpider
archive
Autonomy Spider
Baiduspider
BecomeBot
benderthewebrobot
BlackWidow
Bork-edition
Bot mailto:craftbot@yahoo.com
botje
catchbot
changedetection
Charlotte
ChinaClaw
commoncrawl
ConveraCrawler
Covario
crawler
curl
Custo
data mining development project
DigExt
DISCo
discobot
discoveryengine
DOC
DoCoMo
DotBot
Download Demon
Download Ninja
eCatch
EirGrabber
EmailSiphon
EmailWolf
eurobot
Exabot
Express WebPictures
ExtractorPro
EyeNetIE
Ezooms
Fetch
Fetch API
filterdb
findfiles
findlinks
FlashGet
flightdeckreports
FollowSite Bot
Gaisbot
genieBot
GetRight
GetWeb!
gigablast
Gigabot
Go-Ahead-Got-It
Go!Zilla
GrabNet
Grafula
GT::WWW
hailoo
heritrix
HMView
houxou
HTTP::Lite
HTTrack
ia_archiver
IBM EVV
id-search
IDBot
Image Stripper
Image Sucker
Indy Library
InterGET
Internet Ninja
internetmemory
ISC Systems iRc Search 2.1
JetCar
JOC Web Spider
k2spider
larbin
LeechFTP
libghttp
libwww
libwww-perl
linko
LinkWalker
lwp-trivial
Mass Downloader
metadatalabs
MFC_Tear_Sample
Microsoft URL Control
MIDown tool
Missigua
Missigua Locator
Mister PiX
MJ12bot
MOREnet
MSIECrawler
msnbot
naver
Navroad
NearSite
Net Vampire
NetAnts
NetSpider
NetZIP
NextGenSearchBot
NPBot
Nutch
Octopus
Offline Explorer
Offline Navigator
omni-explorer
PageGrabber
panscient
panscient.com
Papa Foto
pavuk
pcBrowser
PECL::HTTP
PHP/
PHPCrawl
picsearch
pipl
pmoz
PredictYourBabySearchToolbar
RealDownload
Referrer Karma
ReGet
reverseget
rogerbot
ScoutJet
SearchBot
seexie
seoprofiler
Servage Robot
SeznamBot
shopwiki
sindice
sistrix
SiteSnagger
smart.apnoti.com
SmartDownload
Snoopy
Sosospider
spbot
suggybot
SuperBot
SuperHTTP
SuperPagesUrlVerifyBot
Surfbot
SurveyBot
swebot
Synapse
Tagoobot
tAkeOut
Teleport
Teleport Pro
TeleportPro
TweetmemeBot
TwengaBot
twiceler
UbiCrawler
uptimerobot
URI::Fetch
urllib
User-Agent
VoidEYE
VoilaBot
WBSearchBot
Web Image Collector
Web Sucker
WebAuto
WebCopier
WebFetch
WebGo IS
WebLeacher
WebReaper
WebSauger
Website eXtractor
Website Quester
WebStripper
WebWhacker
WebZIP
Wells Search II
WEP Search
Widow
winHTTP
WWWOFFLE
Xaldon WebSpider
Xenu
yacybot
yandex
YandexBot
YandexImages
yBot
YesupBot
YodaoBot
yolinkBot
youdao
Zao
Zealbot
Zeus
ZyBORG
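
If you maintain a list like this in a text file, one possible way to turn it into a single pattern is sketched below in Python. This is not how I built my rule, only an illustration: `build_pattern` is a hypothetical helper, it assumes the rule matches case-insensitively, and the escaping follows Python's rules, so check the result in IIS before relying on it.

```python
import re


def build_pattern(entries):
    """Join blocklist entries into one alternation, dropping any duplicates."""
    seen = set()
    parts = []
    for entry in entries:
        # Assumption: matching is case-insensitive, so compare in lower case.
        key = entry.lower()
        if key in seen:
            continue
        seen.add(key)
        # "^$" is already a regex; every other entry is treated as literal text,
        # so characters such as "." are escaped.
        parts.append(entry if entry == "^$" else re.escape(entry))
    return "|".join(parts)


# A short sample of entries, with one repeated on purpose.
sample = ["^$", "larbin", "larbin", "GetWeb!", "smart.apnoti.com", "PHP/"]
pattern = build_pattern(sample)
print(pattern)
```

Escaping matters for entries such as smart.apnoti.com, where an unescaped dot would match any character rather than a literal dot.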