There Is No Cat - Fighting comment spam

Sunday, May 14, 2006

Fighting comment spam

Comment spammers have been attacking There Is No Cat for the past three weeks or so. It’s been kind of fun doing battle with them, although I have to say, I’m getting tired of it.

There Is No Cat runs a content management system of my own creation. One of the benefits of this is that it’s relatively immune to comment spam. I would occasionally get some manual drive-by spams, but nothing too bad. Almost nobody is going to bother to custom-code a spam tool aimed at a system that runs on a single host with only fair-to-middling Google whuffie. Almost nobody.

The first run at my server three weeks ago was clearly a test. Over about three hours on a weekend, I received 110 comments, all with no links and nonsensical text. It was clear someone was preparing for something, and that was what made me think the spammer had done some custom coding aimed at my system. I caught the spam a few hours after the initial attack ended. With so many spam comments, it was just easier for me to go into MySQL directly and nuke all the comments at once. Before I did that, I saved a copy of the database and loaded it on my computer at home so I could analyze the attack and where it came from at my leisure.
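Backing up first and then deleting an entire burst in one statement is easy to sketch. This version uses Python’s built-in `sqlite3` as a stand-in for MySQL, with a hypothetical `comments` table whose `posted_at` column holds ISO-style timestamp strings; the schema of the actual CMS isn’t described in the post.

```python
import sqlite3

def backup_and_purge(conn, start, end):
    """Copy the whole database, then delete every comment posted between start and end.

    The 'comments' table and its 'posted_at' column are hypothetical. Timestamps
    are ISO-style strings, so BETWEEN compares them correctly as text.
    Returns (backup_connection, number_of_comments_deleted).
    """
    # 1. Save a full copy first so the attack can be studied later.
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)
    # 2. Remove the whole burst with one statement instead of one comment at a time.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM comments WHERE posted_at BETWEEN ? AND ?", (start, end)
        )
    return backup, cur.rowcount
```

With MySQL itself, the rough equivalent would be a `mysqldump` of the database followed by a single `DELETE` with the same kind of `WHERE` clause.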

A few days later, the real comment spam started showing up. With each attack, I would block the class C network from which it came, which slowed things down. But I also started noting characteristics of the spam, such as a particular misspelled word, or a method of trying to include URLs. One advantage of having written my own system is that it was relatively simple for me to go into the code and add some filters for these characteristics.
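Filters keyed on characteristics like these can be very simple. The sketch below is a hypothetical Python version; the post doesn’t say which misspelled word or URL trick the spam used, so the patterns in `FILTERS` (a made-up misspelling, a BBCode `[url=` tag, and a raw HTML link) are stand-ins.

```python
import re

# Hypothetical filter rules: the actual misspelling and URL tricks are not
# given in the post, so these patterns are placeholders.
FILTERS = [
    ("misspelling", re.compile(r"\bphramacy\b", re.IGNORECASE)),
    ("bbcode_url", re.compile(r"\[url=", re.IGNORECASE)),
    ("html_link", re.compile(r"<a\s+href=", re.IGNORECASE)),
]

def first_matching_filter(comment_text):
    """Return the name of the first filter the comment trips, or None if it looks clean."""
    for name, pattern in FILTERS:
        if pattern.search(comment_text):
            return name
    return None
```

Returning the filter’s name rather than a plain yes/no makes it easy to record which rule caught each attempt.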

The attacks came more often over the next few days, and with them came some new characteristics. I added more filtering, plus a logging capability that recorded the IP address and which filter had triggered the spam-blocking code. At this point, I was catching about 98% of the spam. I could have caught 100%, but one of the phrases I would have had to filter on was one I thought had too high a probability of filtering out legitimate comments.
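A minimal sketch of that logging idea, assuming a hypothetical `check` callable that returns the name of the filter that fired (or `None`) and a tab-separated log line of timestamp, IP address, and filter name. The real system’s log format isn’t described in the post.

```python
import io
import time

def screen_comment(ip, text, check, log_file):
    """Run a comment through check(text); log and reject it if a filter fires.

    check is any callable returning a filter name or None (hypothetical
    interface). Each blocked attempt becomes one tab-separated line:
    timestamp, IP address, filter name.
    """
    triggered = check(text)
    if triggered is None:
        return True  # accept the comment
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    log_file.write(f"{stamp}\t{ip}\t{triggered}\n")
    return False  # reject; the comment is never stored
```

In practice `log_file` would be an open file in append mode; an `io.StringIO` works for trying it out.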

At this point, I looked at my server logs to see if I could discern any patterns over the previous few weeks. Inevitably, just before an attack on a particular page, that page would be fetched with a GET request from the IP address 72.232.92.142, whose reverse DNS name is 142.92.232.72.reversedns.resolve.ru. I found one instance of this IP address being mentioned on a Polish bulletin board as a source of spam. Okay, so it looks like I’m dealing with a Russian spammer. Searching the ARIN Whois database, I discovered that the net block for this IP address belongs to a company in St. Petersburg:
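Spotting the seeding pattern amounts to matching each comment submission against earlier fetches of the same page from one address. This hypothetical `seeded_pages` function does that over Apache access-log lines; it assumes comments are submitted by POSTing to the page’s own URL, which is a guess about how such a CMS might work, not something the post states.

```python
import re

# Apache common/combined log format: host ident user [time] "METHOD path protocol" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*"')

def seeded_pages(lines, seed_ip):
    """Return paths that seed_ip fetched with GET before another host POSTed to them.

    Assumes lines are in chronological order, as Apache writes them.
    """
    fetched_by_seed = set()
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't look like access-log entries
        host, method, path = m.groups()
        if host == seed_ip and method == "GET":
            fetched_by_seed.add(path)
        elif host != seed_ip and method == "POST" and path in fetched_by_seed:
            if path not in hits:
                hits.append(path)
    return hits
```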

CustName: Internet Technologies Ltd
Address: Rustavele 48/1 of. 42
Address: IP Management Department
City: Saint Petersburg
StateProv: Saint Petersburg
PostalCode: 199000
Country: RU

In fact, said company owns more than one segment of IP addresses.

I added the following lines to my .htaccess file to prevent them from accessing my site from their spam seeding host at any of their possible IP addresses:

# Russian spammer (Internet Technologies Ltd, Saint Petersburg)
Deny from 72.232.92
Deny from 72.232.93
Deny from 72.36.222
Deny from 72.36.223
Deny from 72.36.244
Deny from 72.36.245

This is actually a little broader than it needs to be; not all of the subnets this company owns are full Class C networks. But I didn’t feel like being charitable.
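Because each three-octet `Deny from` line covers a whole /24 (the old class C size), the six lines above amount to three adjacent pairs. Python’s standard `ipaddress` module can confirm that and check whether a given address would be refused; the helper names here are made up for illustration.

```python
import ipaddress

# The six partial-IP prefixes from the .htaccess lines above.
PREFIXES = ["72.232.92", "72.232.93", "72.36.222", "72.36.223", "72.36.244", "72.36.245"]

def prefix_to_network(prefix):
    """Turn a three-octet partial IP like '72.232.92' into the network 72.232.92.0/24."""
    return ipaddress.ip_network(prefix + ".0/24")

def collapsed(prefixes):
    """Merge adjacent /24 blocks into the smallest equivalent set of networks."""
    return list(ipaddress.collapse_addresses(prefix_to_network(p) for p in prefixes))

def is_denied(ip, prefixes):
    """True if ip falls inside any of the denied /24 blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in prefix_to_network(p) for p in prefixes)
```

Collapsing gives 72.232.92.0/23, 72.36.222.0/23, and 72.36.244.0/23. Apache 2.x’s `Deny from` also accepts network/nnn CIDR notation, so the block could presumably be written as three lines; the six-line form works just as well.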

This stopped the spam seeding accesses, but the actual spam attacks still came (although my countermeasures were still catching 98% of them).

After three weeks of logging the attacks, I had about 800 accesses documented. I wasn’t sure if the spammer was spoofing IP addresses, in which case the IP addresses attacking me would likely be completely random, or if he was operating a bot net of compromised hosts, in which case the same limited number of IP addresses would likely show up over and over.
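The spoofing-versus-bot-net question comes down to how often addresses repeat and how many machines share a subnet. A rough way to measure that over a list of logged attack IPs (the function name and output keys are hypothetical):

```python
from collections import Counter

def summarize_attackers(ips):
    """Summarize logged attack IPs to help tell spoofing from a bot net.

    Randomly spoofed sources should almost never repeat; a bot net reuses a
    limited set of machines, so many addresses appear several times and only
    a few distinct hosts show up per /24.
    """
    per_ip = Counter(ips)
    repeated = sum(1 for n in per_ip.values() if n > 1)
    # Count distinct hosts in each /24 (first three octets).
    per_subnet = Counter(ip.rsplit(".", 1)[0] for ip in per_ip)
    return {
        "attacks": len(ips),
        "distinct_ips": len(per_ip),
        "share_repeated": repeated / len(per_ip) if per_ip else 0.0,
        "max_hosts_per_subnet": max(per_subnet.values(), default=0),
    }
```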

Well, they weren’t all that limited, but it appeared that most of the IP addresses were used multiple times. There were a few with only single accesses, but most had between three and ten attack instances logged by my filters. And it was clear that in most cases there were only one or two machines on a subnet attacking the site. Probably a bot net, then. In any case, a limited set of IP addresses was being used. So I picked out a single line for each machine and wound up adding 249 individual hosts to my .htaccess list of hosts denied access to the site. I did that about 24 hours ago. Since then, fingers crossed, no spam, and no additions to my log file of blocked attempts. You’re welcome to look at the list of hosts; if this same scumbag is attacking you, maybe you’ll find it useful.
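Turning a log of blocked attempts into a deny list is mostly deduplication. This sketch assumes a hypothetical tab-separated log format (timestamp, IP address, filter name) and emits one `Deny from` line per distinct address, in numeric order.

```python
import ipaddress

def deny_lines(log_lines):
    """Build one 'Deny from' line per distinct IP in a blocked-attempt log.

    Assumes the hypothetical format 'timestamp<TAB>ip<TAB>filter'; lines
    without a valid IP in the second field are skipped.
    """
    hosts = set()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a log record
        try:
            hosts.add(ipaddress.ip_address(fields[1]))
        except ValueError:
            continue  # second field is not an IP address
    # Sort numerically (IPv4 before IPv6) so the list is easy to scan.
    ordered = sorted(hosts, key=lambda a: (a.version, int(a)))
    return [f"Deny from {ip}" for ip in ordered]
```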

I hope this is the end of this. Why someone would bother to attack a system that runs on only one host is beyond me. It would seem more worthwhile, from the spammer’s perspective, to attack systems like WordPress or Movable Type. Of course, it’s possible they’re using a system that just parses any random comment form and attacks it without any special knowledge of how the target is set up by default, in which case my use of a unique CMS wouldn’t afford me any extra protection; but the test run made me think that maybe that wasn’t the case here. Fortunately, after almost 20 years of using computers online, I not only know my way around networks, but also have an in-house network forensics expert to bounce ideas off of...

Tags: spam comments weblogging htaccess Internet Technologies Ltd

Posted at 10:25 PM

Comments

My blog isn’t of my own making, but it is open source. It is a PHP blog called Simplog. It is not the best or worst, and I estimate maybe 20-100 people use it at most. Maybe your spammer is guessing your CMS has some other users somewhere.

I have tweaked Simplog in my own unique way to block spammers. I offer people making comments a spot to put in their e-mail address, but if they do, they get a message telling them not to, and their comment is shipped to /dev/null. This also forces anonymity, but that is the way I run my blog anyway. Spambots don’t get this and keep trying to fill in an e-mail address, but never get through. It is a very simple and effective modification, until they learn better. By simply going against expectations, and the dominant web paradigm, I have cut out all spambots, as far as I can tell.
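The e-mail trap described here is a form of honeypot field. A minimal sketch of the server-side check, assuming a hypothetical form field named `email` and a form already parsed into a dict; Simplog’s actual code isn’t shown in the comment, so this is not its implementation.

```python
def accept_comment(form):
    """Honeypot check: humans are told to leave the 'email' field blank.

    'email' is a hypothetical field name. Anything typed there marks the
    submission as a bot (or a reader who ignored the instructions), and the
    comment is dropped instead of stored.
    """
    email = form.get("email", "").strip()
    if email:
        return False, "Please don't enter an e-mail address; this comment was discarded."
    return True, None
```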

I also had a problem with spam trackbacks. I had thousands of trackbacks on my old posts on my blog. Since no one I know ever used trackbacks, I just disabled them and deleted the links from my MySQL database. Trackbacks have never really seemed to catch on in a big way.

Posted by lilbro at 8:17 AM, May 15, 2006

I didn’t get too many trackback spams; I turned it off for older posts. But I noticed about twenty this morning for the first time in a while, and my tolerance for spam is particularly low right now. I never got many trackbacks, but probably more than you. In any case, not enough to make keeping it around worthwhile. I’ve disabled it. One less potential problem to worry about.

Goddamned spammers.

Posted by ralph at 6:58 AM, May 24, 2006