Thought this could be of interest...
http://www.benzedrine.cx/relaydb.html
Annoying spammers with pf and spamd
A Polish translation of this article by l5x(at)revival(dot)pl can be found here.
Introduction
I don't like getting spam. Detecting it automatically is not the problem; tools like SpamAssassin and bmf do that very well. But even though I can automatically delete spam without reading it, the spammers still successfully deliver their mails and get paid by volume. I want to hurt them. They should not be able to deliver their mails, and they should waste as much of their resources as possible attempting to do so.
Tarpits
Tarpits like spamd are fake SMTP servers that accept connections but don't deliver mail. Instead, they keep the connections open and reply very slowly. If the peer is patient enough to actually complete the SMTP dialogue (which will take ten minutes or more), the tarpit returns a 'temporary error' code (4xx), which indicates that the mail could not be delivered and that the sender should keep the mail in his queue and retry later. If he does, the same procedure repeats until, after several attempts that waste both his queue space and socket handles for several days, he gives up. The resources I have to waste to do this are minimal.
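To make the "reply very slowly" idea concrete, here is a minimal Python sketch. It is not spamd's code: the names stutter, send_slowly and TEMPFAIL, the wording of the reply, and the one-second-per-byte delay are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterator

# Hypothetical reply text; any 4xx code tells the sender to queue and retry.
TEMPFAIL = "451 Temporary failure, please try again later."


def stutter(reply: str) -> Iterator[bytes]:
    """Yield a CRLF-terminated SMTP reply one byte at a time."""
    for value in (reply + "\r\n").encode("ascii"):
        yield bytes([value])


def send_slowly(
    send: Callable[[bytes], object],
    reply: str,
    delay: float = 1.0,  # assumed pace: one byte per second
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Send the reply byte by byte, pausing after each byte.

    Returns the number of bytes sent. With the default delay, the peer
    is held for roughly that many seconds per reply line.
    """
    sent = 0
    for chunk in stutter(reply):
        send(chunk)
        sleep(delay)
        sent += 1
    return sent
```

In a real server, send would be something like a connected socket's sendall method. At this pace, a reply of around 50 bytes keeps the peer waiting for the better part of a minute, and a full SMTP dialogue consists of several such replies.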
If the sender is badly configured, an uncooperative recipient might actually delay his entire queue handling for several minutes each time he connects to the tarpit. And many spammers use badly configured open relays.
Obviously, I only want known spammers to get connected to my tarpit instead of my real MTA.
Blacklists
I can use an externally maintained list of spammers like spamhaus to redirect senders to the tarpit selectively. But such lists may be either too slow to include new spamming hosts, or too aggressive for my taste. Some blacklists will not only include single hosts, but entire networks that contain a single spamming host, willingly hurting innocent customers of an ISP to pressure the ISP to terminate the spammer. The blacklist maintainers document such policies, and if I agree with them, it's my decision to block mail from such networks by using their blacklist.
But even if I'm comfortable with blocking mail from innocent bystanders and use the most aggressive blacklists combined, there will still be spammers getting mails delivered to me through newly discovered open relays. Those spam mails will of course be detected by my spam filters, so I'd like to use these IP addresses to build my own blacklist.
Building my own blacklist
Assume I have the following procmail configuration in place to detect (and file) spam:
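# Filter every incoming mail through the spam detector.
:0fw
| /usr/local/bin/spamc

# File mails tagged as spam into a separate mailbox.
:0:
* ^X-Spam-Status: Yes
in-x-spam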
Each incoming mail is piped through the spam detector. If it classifies the mail as spam, the message gets stored in a separate file. I could delete them instead, but I might want to check the mails for false positives every once in a while. Once the classifier is tuned right, there will be almost no false positives, and almost all spam is detected. I'm reaching 99.95% accuracy here, with maybe 0.01% false positives, which is fine for me.
Analyzing Received: headers
I'm using one additional tool, relaydb, to build a database of all hosts that send me mail. This is done after the classification by the spam detector, so I can tell the database whether the sender was sending spam or legitimate mail.
I add the following part to my procmail configuration:
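# Filter every incoming mail through the spam detector.
:0fw
| /usr/local/bin/spamc

# Pipe a copy of each detected spam to relaydb as blacklist input.
:0c
* ^X-Spam-Status: Yes
| /home/dhartmei/bin/relaydb -b

# File the spam itself; processing of this mail ends here.
:0:
* ^X-Spam-Status: Yes
in-x-spam

# Everything that gets this far is legitimate: whitelist input.
:0c
| /home/dhartmei/bin/relaydb -w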
So, detected spam gets piped through relaydb -b (blacklist), and legitimate mail through relaydb -w (whitelist). Note that only copies of mails get piped through relaydb; the program never modifies or drops a mail. All it does is build a database of hosts that sent me mail, counting spam and legitimate mail from each one.
relaydb traverses all Received: headers in a mail from top (nearest relay) to bottom. It only acts on valid numerical IP addresses in [] brackets, which is the only reliable part. And it's only reliable when I trust the previous relay in the chain, as spammers often add fake Received: headers. So relaydb starts with the top-most relay in the header and consults its database to see whether it is a known host, and if so, whether it sent me legitimate mail before. If that's the case, it increases the respective counter (spam or legitimate, as told through the -b/-w option) for that host and continues with the next relay found in the header. If the relay is a known spammer, traversal ends, as further headers cannot be trusted.
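The traversal can be sketched roughly as follows in Python. This is one reading of the description above, not relaydb's actual code: the in-memory dict standing in for the database, the function names relay_ips and record, and the handling of unknown hosts (count them, then stop, since their headers are not yet trusted) are all assumptions.

```python
import ipaddress
import re
from email import message_from_string

# A numerical IPv4 address in [] brackets, e.g. "[192.0.2.1]".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_mail: str) -> list[str]:
    """Return the bracketed IPv4 address of each Received: header, top to bottom."""
    msg = message_from_string(raw_mail)
    ips = []
    for header in msg.get_all("Received", []):
        match = BRACKETED_IP.search(header)
        if not match:
            continue  # assumption: headers without a bracketed IP are skipped
        try:
            ipaddress.IPv4Address(match.group(1))
        except ValueError:
            continue  # e.g. [999.1.1.1] is not a valid address
        ips.append(match.group(1))
    return ips


def record(db: dict, raw_mail: str, is_spam: bool, factor: int = 3) -> None:
    """Count the mail against each relay, stopping at the first untrusted one."""
    for ip in relay_ips(raw_mail):
        counts = db.setdefault(ip, {"spam": 0, "ham": 0})
        # Assumption: a relay is trusted if it has sent legitimate mail
        # before and is not over the blacklisting threshold.
        trusted = counts["ham"] > 0 and counts["spam"] < factor * counts["ham"]
        counts["spam" if is_spam else "ham"] += 1
        if not trusted:
            break  # headers further down may be forged
```

Here is_spam=True corresponds to the -b option and is_spam=False to -w. The first time a relay is seen, only that relay is counted; once it has a record of legitimate mail, the traversal continues past it to the relay it received the mail from.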
After I run this setup for a while, relaydb has built both a blacklist and a whitelist. One important detail is that a legitimate mail has more weight than a spam mail. I regularly receive spam through mailing lists. Of course, I don't consider the mailing list server a spamming host. Yet each spam I receive through it will increase the spam counter for that server. Therefore, relaydb only reports hosts as blacklisted when their spam counter is at least three times as high as the counter for legitimate mail (and the factor can be adjusted, of course). So a relay doesn't get blacklisted as long as it sends me legitimate mail to compensate for the spam it sends, which covers mailing list servers. But if I get a spam from a host that never sent me anything before, that will cause it to get blacklisted immediately (1 >= 0*3).
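The threshold rule can be written down as a small Python check. The function name is_blacklisted is hypothetical, and the extra spam > 0 condition is an assumption added so that a host with no recorded mail at all (0 >= 0*3) is not reported.

```python
def is_blacklisted(spam: int, ham: int, factor: int = 3) -> bool:
    """Report a host once its spam count is at least factor times its legitimate count."""
    # spam > 0 is an assumption: a host with no mail on record is not listed.
    return spam > 0 and spam >= factor * ham
```

A host seen for the first time with one spam is listed (1 >= 0*3). A mailing list server with 40 legitimate mails and 10 spams is not (10 < 40*3), and it would take 120 spams at that level of legitimate mail before it was.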
