Amir Lev

Security Levity

Ask Amir #5: How to deal with gray reputation?

In this week's Security Levity: a reply to a couple of reader questions about spam filtering techniques. Specifically, the types of techniques that can be used when the sender's reputation is 'gray'.

What do I mean by 'gray' reputation?
Senders have a gray reputation when messages from the same sender -- an IP address range, a domain name or an actual email address -- are a mix of legitimate email and spam. For example, most free mail services send both spam and good email and so have some level of "gray reputation". Anti-spam systems also award a gray reputation to sources that they simply do not know how to classify.

What can we do about email from gray senders?
In the good old days of spam filtering, we'd simply fall back to examining the email content. We'd look for keywords that indicated spam, or use statistical techniques to see if it looked like spam we'd seen before.

But looking at the content is, in the jargon, expensive. In other words, it uses a lot of computing resources -- fine back in the day, but not so much with today's huge spam volumes. Also, content filtering techniques can be less accurate than we'd like.

There are better techniques we can use before resorting to brute-force content scanning. They're often known as behavioral techniques, to distinguish them from examining sender reputation or content.

Here are three key behavioral techniques for detecting spam:

1. Graylisting
The SMTP protocol allows an email receiver to temporarily fail to receive a message -- the email elite refer to this as a tempfail. The receiving MTA sends back a reply code in the 4xx series, which in essence means, "Oops, something went wrong. My bad. Please try again later." This was designed for situations when, say, the MTA ran out of disk space.

However, we can deliberately return this error to a gray sender, to see what happens. The idea is that many spammers don't bother retrying. Even if they do retry, the delay might give us enough time to build a better picture of the sender's reputation.

Typically, legitimate MTAs will retry after a few minutes. So the delay isn't usually a big problem for legitimate email, especially as most legitimate senders will already have a good reputation, so shouldn't be subject to graylisting.

Sounds simple, but it adds a level of complication to the spam filter. The filter needs to maintain a database of connections that it's recently tempfailed: a series of records containing sending IP address range, normalized envelope sender, and intended envelope recipient. It also needs to purge old records from the database in the cases when senders haven't retried. (I'm glossing over a number of tricky issues, in the interests of space.)
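
To make that bookkeeping concrete, here's a minimal sketch of the tempfail database, not a production design. Everything in it is an assumption of mine for illustration: the Graylist class name, the 300-second minimum retry delay, the four-hour record lifetime, the /24 (IPv4) or /64 (IPv6) address range, the lower-casing used to "normalize" addresses, and the choice of 451 as the tempfail reply.

```python
import ipaddress
import time


class Graylist:
    """Toy in-memory graylist keyed on (network, sender, recipient)."""

    def __init__(self, min_delay=300, max_age=4 * 3600, now=time.time):
        self.min_delay = min_delay  # seconds before a retry is accepted (assumed)
        self.max_age = max_age      # seconds before an unretried record is stale (assumed)
        self._now = now             # injectable clock, handy for testing
        self._pending = {}          # key -> time of the first tempfailed attempt

    @staticmethod
    def _key(ip, sender, recipient):
        # Collapse the sending address into a range: /24 for IPv4, /64 for IPv6.
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 64
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        # Crude normalization: trim whitespace and lower-case both addresses.
        return (str(network), sender.strip().lower(), recipient.strip().lower())

    def check(self, ip, sender, recipient):
        """Return 250 to accept the message, or 451 to tempfail it."""
        key = self._key(ip, sender, recipient)
        now = self._now()
        first_seen = self._pending.get(key)
        if first_seen is None or now - first_seen > self.max_age:
            # New (or long-forgotten) triplet: record it and ask for a retry.
            self._pending[key] = now
            return 451
        if now - first_seen < self.min_delay:
            # Retried too quickly: keep the original timestamp, tempfail again.
            return 451
        # Retried after a sensible delay: accept and drop the record.
        del self._pending[key]
        return 250

    def purge(self):
        """Delete records for senders that never retried; return how many."""
        now = self._now()
        stale = [k for k, t in self._pending.items() if now - t > self.max_age]
        for k in stale:
            del self._pending[k]
        return len(stale)
```

A real filter would also remember triplets that passed, or feed the result into the reputation system, so the same sender isn't tempfailed on every message. This sketch leaves that out.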

2. Greetpause
SMTP is a classic, well-defined command/response protocol. When a sender makes a connection to a recipient, the sender should wait until the recipient emits a greeting before beginning to send its commands.

However, many spam senders don't bother waiting for the greeting. So we can deliberately pause before greeting the sender, to see what it does. If it starts talking too early, we can treat it as a spammer.

We don't need to wait for an excessive length of time, just a period that's unusually long, but still legal -- for example, 20 seconds. Again, the delay isn't usually a big problem for legitimate email. (Theoretically, the delay could be as long as five minutes, at least according to the SMTP specifications. However, some implementations of legitimate MTAs can't cope with waiting longer than 30 seconds.)

In my experience, this method is not one I'd highly recommend: at large SMTP installations it can leave too many open connections hanging around.
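
For the curious, the pause-then-greet check can be sketched in a few lines. This is an illustration under my own assumptions, not how any particular MTA implements it: the greet_with_pause name, the banner text and the hostname mx.example.test are all made up, and conn is assumed to be an already-accepted, connected socket.

```python
import select
import socket


def greet_with_pause(conn, pause_seconds=20.0,
                     banner=b"220 mx.example.test ESMTP\r\n"):
    """Delay the SMTP greeting; return False if the client talks first."""
    # Wait up to pause_seconds for the client to send anything at all.
    readable, _, _ = select.select([conn], [], [], pause_seconds)
    if readable:
        # Data (or a hang-up) arrived before our greeting: an early talker.
        return False
    # The client waited politely, so greet it and carry on as normal.
    conn.sendall(banner)
    return True
```

Note that the connection stays open for the whole pause, which is exactly the cost that bites at large installations.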

3. Tarpitting
Essentially a superset of the greetpause technique, tarpitting deliberately slows down the entire connection. The theory here is that spammers will give up trying to send to a slow connection and move on to easier pickings elsewhere. In some cases it might also keep the spammers' connections busy and reduce their spam flow.

Again, not highly recommended: on large SMTP installations it can keep too many open connections hanging around.
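
As a rough sketch of the idea, a tarpit can be as simple as pausing before every reply line. The tarpit_send name, the five-second delay and the send callback (something like a socket's sendall) are my own assumptions for illustration. The sleep function is injectable only so the sketch can be tried without actually waiting.

```python
import time


def tarpit_send(send, lines, delay_seconds=5.0, sleep=time.sleep):
    """Send each reply line after a deliberate pause; return total delay."""
    total = 0.0
    for line in lines:
        sleep(delay_seconds)   # stall the sender before every line
        send(line + b"\r\n")   # then send the line with its SMTP terminator
        total += delay_seconds
    return total
```

Every stalled line holds a connection open on our side too, which is why this gets expensive on busy servers.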

There's also a belief that tarpitting spammers is a public service, because you're tying up resources that spammers can't then use to send spam to others. However, this is basically misguided. It may have been true when spammers used their own servers to spam from. But nowadays, when most spammers use virus-infected, hijacked PCs, they are much less vulnerable to resource constraints.

So there you have it: three behavioral techniques to use when the reputation is ambiguous, without resorting to the sledgehammer of content filtering. And, of course, the results of these techniques can be fed back into the reputation system, so we can avoid needing to repeat them.

On balance, I’m not fond of any of these techniques. They all have their (quite substantial) deficiencies, so should be used only when you're lacking a good reputation system. Feel free to ask more questions in the comments below.

 
When he's not analyzing email sender behavior, Amir Lev is the CTO, President, and co-founder of Commtouch (NASDAQ:CTCH), an e-mail and Web defense technology provider.