Recently I had an application on Apache become the victim of bot spam. For good measure, and to be proactive, I set out to implement the same protection on a Windows Server running IIS 7.5.
Something on the order of 60% of web traffic comes from bots. Many of these are inconsequential and can safely be blocked without damaging your SEO. I chose to block them by user agent rather than by IP address, since many of these bots have a whole range of IP addresses they can use.
First off, you will need to make sure the URL Rewrite module is added to your installation of IIS. I am writing this for IIS 7; the process should be similar for other versions.
(Microsoft URL Rewrite Module 2.0 for IIS 7 (x64))
I checked that this Rewrite Module will work for both IIS 7.0 and 7.5.
After installing the module, restart IIS Manager and click your server on the left-hand side. It will be the topmost option after “Start Page”. You can see it in the image below.
Once URL Rewrite is enabled on your web server, click “Add Rules…” in the Actions pane.
A window will open with the information below. Click “Request blocking”, then click “OK”.
You will then be prompted to choose the settings for your rule.
Select User-agent Header for the “block access based on” field.
Select Using: regular expressions
Then enter your pattern.
I used the pattern below for my rule. The first entry, “^$”, is the regex for an empty string. I do not allow bots to access the pages unless they identify themselves with a user agent. I found that, most often, the only things hitting these applications without a user agent were security tools gone rogue.
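To make the “^$” entry concrete, here is a minimal sketch in Python rather than IIS. The function name is hypothetical, and treating a missing header the same as an empty one is my assumption about how the rule behaves, not something I verified against IIS.

```python
import re

# "^$" matches only when nothing sits between the start and the end of the string.
EMPTY_UA = re.compile(r"^$")

def is_blank_user_agent(user_agent):
    """Return True when the User-Agent is empty.

    Assumption: a missing header (None) is treated like an empty string.
    """
    return EMPTY_UA.search(user_agent or "") is not None

print(is_blank_user_agent(""))
print(is_blank_user_agent("Mozilla/5.0"))
```

Any request that sends a user agent at all, even a nonsense one, gets past this entry, which is why the rest of the list is still needed.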
My advice when blocking bots: be very specific. A generic word like “fire” will also match “firefox”. You can adjust the regex to avoid that, but I found it much simpler to use a more specific name, which has the added benefit of being more informative to the next person who touches that setting.
Additionally, you will see I have a rule for Java/1.7.0_25. In this case it happened to be a bot using that version of Java to slam my servers. Do be careful when blocking language-specific user agents like this. Some languages that run on the JVM, such as ColdFusion, use the language's user agent and web requests to localhost to assemble things like PDFs. JRuby, Groovy, or Scala may do similar things; however, I have not tested them.
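The “fire” versus “firefox” problem is easy to check before you save a rule. This is a rough sketch using Python's `re` module as a stand-in for the IIS regex engine. The user agent string is just a sample, “FireBot” is a made-up bot name for illustration, and I am assuming a case-insensitive match.

```python
import re

# Sample Firefox user agent, used only for illustration.
firefox_ua = "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0"

too_generic = re.compile(r"fire", re.IGNORECASE)      # matches inside "Firefox"
specific = re.compile(r"FireBot", re.IGNORECASE)      # hypothetical bot name
bounded = re.compile(r"\bfire\b", re.IGNORECASE)      # regex fix: whole word only

print(bool(too_generic.search(firefox_ua)))  # True: a real browser would be blocked
print(bool(specific.search(firefox_ua)))     # False
print(bool(bounded.search(firefox_ua)))      # False
```

The word-boundary version works, but the specific name is easier to read at a glance, which is the trade-off described above.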
Below you will see a full list of all bots that are blocked by the above regex.
^$ EasouSpider Add Catalog PaperLiBot Spiceworks ZumBot RU_Bot Wget Java/1.7.0_25 Slurp FunWebProducts 80legs Aboundex AcoiRobot Acoon Robot AhrefsBot aihit AlkalineBOT AnzwersCrawl Arachnoidea ArchitextSpider archive Autonomy Spider Baiduspider BecomeBot benderthewebrobot BlackWidow Bork-edition Bot mailto:email@example.com botje catchbot changedetection Charlotte ChinaClaw commoncrawl ConveraCrawler Covario crawler curl Custo data mining development project DigExt DISCo discobot discoveryengine DOC DoCoMo DotBot Download Demon Download Ninja eCatch EirGrabber EmailSiphon EmailWolf eurobot Exabot Express WebPictures ExtractorPro EyeNetIE Ezooms Fetch Fetch API filterdb findfiles findlinks FlashGet flightdeckreports FollowSite Bot Gaisbot genieBot GetRight GetWeb! gigablast Gigabot Go-Ahead-Got-It Go!Zilla GrabNet Grafula GT::WWW hailoo heritrix HMView houxou HTTP::Lite HTTrack ia_archiver IBM EVV id-search IDBot Image Stripper Image Sucker Indy Library InterGET Internet Ninja internetmemory ISC Systems iRc Search 2.1 JetCar JOC Web Spider k2spider larbin larbin LeechFTP libghttp libwww libwww-perl linko LinkWalker lwp-trivial Mass Downloader metadatalabs MFC_Tear_Sample Microsoft URL Control MIDown tool Missigua Missigua Locator Mister PiX MJ12bot MOREnet MSIECrawler msnbot naver Navroad NearSite Net Vampire NetAnts NetSpider NetZIP NextGenSearchBot NPBot Nutch Octopus Offline Explorer Offline Navigator omni-explorer PageGrabber panscient panscient.com Papa Foto pavuk pcBrowser PECL::HTTP PHP/ PHPCrawl picsearch pipl pmoz PredictYourBabySearchToolbar RealDownload Referrer Karma ReGet reverseget rogerbot ScoutJet SearchBot seexie seoprofiler Servage Robot SeznamBot shopwiki sindice sistrix SiteSnagger SiteSnagger smart.apnoti.com SmartDownload Snoopy Sosospider spbot suggybot SuperBot SuperHTTP SuperPagesUrlVerifyBot Surfbot SurveyBot SurveyBot swebot Synapse Tagoobot tAkeOut Teleport Teleport Pro TeleportPro TweetmemeBot TwengaBot twiceler UbiCrawler uptimerobot 
URI::Fetch urllib User-Agent VoidEYE VoilaBot WBSearchBot Web Image Collector Web Sucker WebAuto WebCopier WebCopier WebFetch WebGo IS WebLeacher WebReaper WebSauger Website eXtractor Website Quester WebStripper WebStripper WebWhacker WebZIP WebZIP Wells Search II WEP Search Widow winHTTP WWWOFFLE Xaldon WebSpider Xenu yacybot yandex YandexBot YandexImages yBot YesupBot YodaoBot yolinkBot youdao Zao Zealbot Zeus ZyBORG
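If you keep the bot names in a plain list, the combined pattern can be generated instead of typed by hand. The sketch below is one way to do it in Python. The helper name is hypothetical, only a handful of names from the list are shown, and the escaping is Python's; I have not checked every escaped character against the IIS regex syntax, so look over the output before pasting it into the rule.

```python
import re

def build_block_pattern(names):
    """Join bot names into one alternation, starting with the empty-string check.

    re.escape keeps characters like "." literal, so "Java/1.7.0_25" does not
    accidentally match other strings with different characters in those spots.
    """
    parts = ["^$"] + [re.escape(name) for name in names]
    return "|".join(parts)

# A few entries taken from the list above.
names = ["EasouSpider", "Java/1.7.0_25", "Wget", "AhrefsBot"]
pattern = build_block_pattern(names)
blocker = re.compile(pattern, re.IGNORECASE)

def is_blocked(user_agent):
    return blocker.search(user_agent) is not None

print(is_blocked("Java/1.7.0_25"))
print(is_blocked("Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0"))
```

Testing the generated pattern against a few real browser user agents first is a cheap way to catch an entry that is too generic.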