Probably the most boring post in the world


I decided the other day, out of sheer boredom and a strong desire to end it all, to take a look at my web stats. See if I could work out where my visitors were coming from, that sort of thing. I can't remember the last time I looked at my stats, but what I found was quite an eye-opener.

My top 20 referrers were all online poker sites. This is odd, I thought, so I downloaded the raw log file to examine it in more detail. I discovered that these web pages (a typical, slightly modified, example being http://xxx.poker-4all.com/online-poker.html) were all turning up as the referrer on requests for my comments page, from lots of different IP addresses. I can only guess that there are people out there loading pages on a cash-for-clicks basis, or perhaps running a trojan behind the scenes which loads pages and scans them for email addresses to spam at a later date.

This annoyed the hell out of me for two reasons. One, they're skewing my web stats out of all recognition, and two, they're using up my bandwidth. The swines. It annoyed me even more because I'd already renamed my comments script once, and the bots were loading pages via the renamed script. So it wasn't some script blindly searching for mt-comments.cgi; this script knew what I'd renamed it to.
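For anyone who fancies doing the same digging, here's a rough Python sketch of how the referrer tally could be done. It assumes the raw log is in Apache's "combined" format (client IP, timestamp, request line, status, size, referrer, user agent), which many hosts use but yours may not. It only counts requests for the comments script, since that's where the poker referrers were pointing. The script name `comments.cgi`, the function `top_referrers` and the sample log lines are all made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption about the host's log layout):
# IP ident user [time] "METHOD PATH PROTOCOL" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

def top_referrers(lines, script="comments.cgi", n=20):
    """Count referrers on requests whose path contains `script` (hypothetical name)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or script not in m.group("path"):
            continue  # skip malformed lines and requests for other pages
        ref = m.group("referrer")
        if ref != "-":  # "-" means the client sent no referrer
            counts[ref] += 1
    return counts.most_common(n)

# Made-up sample lines, for illustration only
sample = [
    '10.0.0.1 - - [14/Jul/2005:10:00:00 +0100] "GET /cgi-bin/comments.cgi?entry_id=42 HTTP/1.1" '
    '200 5120 "http://poker.example.com/online-poker.html" "Mozilla/4.0"',
    '10.0.0.2 - - [14/Jul/2005:10:05:00 +0100] "GET /cgi-bin/comments.cgi?entry_id=17 HTTP/1.1" '
    '200 4800 "http://poker.example.com/online-poker.html" "Mozilla/4.0"',
    '10.0.0.3 - - [14/Jul/2005:10:07:00 +0100] "GET /index.html HTTP/1.1" '
    '200 9000 "-" "Mozilla/4.0"',
]
print(top_referrers(sample))  # → [('http://poker.example.com/online-poker.html', 2)]
```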

First things first, I renamed the script again. This only takes a couple of minutes, and after rebuilding the first page of the site everything was working normally and visitors could leave comments again. After that, the bots would get a 404 (page not found) error, at least until they discovered the new name of the comments script, which isn't all that difficult since it's in plain view in the page source of any entry with comments on. It looks like I might be renaming the script on a regular basis from now on.

I looked at the IP addresses of the offending visitors. There were lots of different ones, of course, but some came up more often than others. I'd noticed on my web host's control panel that I can block access from specific IP addresses or URLs, so I thought I'd give this a go. I may be swimming against the tide here, but I had nothing to lose by trying! I made a list of the worst culprits and added them to my IP ban list. Nothing to do now but wait.
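Picking out the worst culprits is mostly a matter of counting. Here's a minimal Python sketch that counts requests for the comments script per IP address and returns the ones at or above a cut-off. It assumes Apache's common or combined log format, where splitting a line on whitespace puts the client IP first and the requested path seventh. The names `comments.cgi`, `ban_candidates` and `fake_line`, the threshold of 3 and the sample lines (which use documentation-only IP ranges) are all hypothetical.

```python
from collections import Counter

def ban_candidates(lines, script="comments.cgi", threshold=3):
    """Return IPs that requested `script` at least `threshold` times, busiest first.

    Assumes Apache common/combined log format: after line.split(), field 0 is
    the client IP and field 6 is the requested path.
    """
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > 6 and script in fields[6]:
            hits[fields[0]] += 1
    return [ip for ip, count in hits.most_common() if count >= threshold]

def fake_line(ip, path):
    # Made-up combined-format log line, for illustration only
    return (f'{ip} - - [14/Jul/2005:10:00:00 +0100] "GET {path} HTTP/1.1" '
            f'404 512 "http://poker.example.com/" "Mozilla/4.0"')

sample = (
    [fake_line("192.0.2.7", "/cgi-bin/comments.cgi?entry_id=1")] * 4
    + [fake_line("198.51.100.9", "/cgi-bin/comments.cgi?entry_id=2")] * 3
    + [fake_line("203.0.113.5", "/cgi-bin/comments.cgi?entry_id=3")]
    + [fake_line("203.0.113.5", "/index.html")] * 5  # ordinary page views, not counted
)
print(ban_candidates(sample))  # → ['192.0.2.7', '198.51.100.9']
```

The threshold is an arbitrary cut-off. An address shared by lots of ordinary visitors (a big ISP's proxy, say) could cross it too, so the list is worth eyeballing before it goes into a ban list.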

A few hours later, I went back to look at my stats. Sure enough, there were still lots of requests for random comments pages, still coming from poker and casino sites. But they were getting 404s now. That still used up some bandwidth (the 404 error page was served each time), but now I saw something else. The IP addresses I'd banned were still coming back, but at least now they were getting a "403 Forbidden" error. A small victory, heheh.
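To see how the 403s and 404s stack up, and roughly how much bandwidth the error pages are still using, a per-status summary is handy. This Python sketch assumes Apache's common or combined log format, where splitting a line on whitespace puts the status code ninth and the response size tenth. Note that this size field counts the response body only, not the headers, so the byte totals are an underestimate. The function `summarize_statuses` and the sample lines are made up for illustration.

```python
from collections import defaultdict

def summarize_statuses(lines):
    """Return {status: [request_count, total_bytes]} from access-log lines.

    Assumes Apache common/combined format: after line.split(), field 8 is the
    status code and field 9 is the body size ("-" when no body was sent).
    """
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or not fields[8].isdigit():
            continue  # skip malformed lines
        status, size = fields[8], fields[9]
        summary[status][0] += 1
        summary[status][1] += int(size) if size.isdigit() else 0
    return dict(summary)

# Made-up sample lines: one banned IP getting 403s, one other IP getting a 404
sample = [
    '192.0.2.7 - - [14/Jul/2005:14:00:00 +0100] "GET /cgi-bin/comments.cgi HTTP/1.1" 403 300 "-" "Mozilla/4.0"',
    '192.0.2.7 - - [14/Jul/2005:14:05:00 +0100] "GET /cgi-bin/comments.cgi HTTP/1.1" 403 300 "-" "Mozilla/4.0"',
    '203.0.113.5 - - [14/Jul/2005:14:06:00 +0100] "GET /cgi-bin/comments.cgi HTTP/1.1" 404 512 "-" "Mozilla/4.0"',
]
print(summarize_statuses(sample))  # → {'403': [2, 600], '404': [1, 512]}
```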

I'll keep an eye on this over the next few weeks. There is one small consolation for those of you who have commented on my site: your email addresses are not shown in the page source of the comments page. However, your own URLs are visible, and if a bot follows these links and picks your email address off your own page, well, there's not a lot I can do about that. That's the way the internet works, I'm afraid. Then again, anyone who puts an email address on a web page can expect spam sooner or later. Gmail seems to be pretty good at filtering it all out, which is why I use my Gmail address on my own site and when I leave comments on other sites.

Hopefully after a few days (weeks, months?) my web stats will tidy up a bit and I'll get a more accurate picture of where my visitors are coming from.

Now, was that the most boring post in the world or not?

2 Comments

annie said:

No, that's very interesting. I had heard of that, and I check my stats occasionally, but I don't see stuff like that. Where do they come from? Isn't it when you mention a certain word in your page that they find it? I had that problem once when I mentioned a certain something, but then they just went away. Hmmm...

cheryl said:

Try this:
http://english-38095918254.spampoison.com/
The blurb says: These links will redirect email harvesting bots to trap sites that will feed them an almost infinite loop of dynamically generated fake email addresses, mostly on known spammer-owned domains! This will render their harvested lists practically useless and of no commercial value.


About this Entry

This page contains a single entry by Dan published on July 14, 2005 11:46 PM.


Powered by Movable Type 4.01